# Political Polarisation and Economic Factors (GINI & CPI) in OECD Countries

## Project Overview

This notebook investigates the potential relationships between key economic indicators and political polarisation within OECD (Organisation for Economic Co-operation and Development) member countries. Specifically, we aim to explore whether income inequality, as measured by the **GINI coefficient**, and inflation, as measured by the **Consumer Price Index (CPI)**, correlate with or can help explain variations in political polarisation. The analysis incorporates both contemporaneous and lagged effects of these economic indicators and employs panel data modelling techniques to account for country-specific characteristics.

Political polarisation, broadly defined as the divergence of political attitudes toward ideological extremes, is a topic of considerable interest in contemporary political science. Understanding its potential drivers, including socio-economic factors, is crucial for assessing democratic health and societal stability.

**Research Question:** How do income inequality (GINI) and inflation (CPI), both contemporaneously and with a time lag, relate to the ideological dispersion of political parties in OECD countries, considering country-specific fixed effects?

**Data Sources:**
1.  **Manifesto Project Dataset (MPD):** Provides coded data from political party manifestos, including the `rile` score, which estimates a party's position on a left-right ideological scale. This is our primary source for measuring party ideology. (File: `MPDataset_MPDS2024a.csv`)
2.  **World Bank:** Provides GINI coefficient data. (File: `worldbank_gini_data.csv`)
3.  **OECD Statistics:** Provides Consumer Price Index (CPI) data. (File: `OECD_cpi_data.csv`)

**Methodology Outline:**
1.  **Data Loading and Merging:** Load the pre-merged dataset (generated by `scripts/merge_data.py`) which combines party-level manifesto data with country-level GINI (from World Bank) and CPI (from OECD) economic indicators.
2.  **Filtering:** Focus the analysis on OECD member countries.
3.  **Quantifying Polarisation:** Calculate a political polarisation score for each country-year using the vote-share weighted standard deviation of party `rile` scores.
4.  **Feature Engineering:** Create 1-year lagged versions of GINI and CPI to explore potential delayed effects of economic conditions on polarisation.
5.  **Dataset Preparation:** Aggregate data to the country-year level, creating a panel dataset suitable for time-series and cross-sectional analysis.
6.  **Exploratory Analysis:** Conduct correlation analysis to identify linear associations between polarisation and both current and lagged GINI and CPI values.
7.  **Statistical Modelling:** 
    a.  Employ Pooled Ordinary Least Squares (OLS) regression models to examine general relationships.
    b.  Employ Panel Data (Fixed Effects) regression models to control for unobserved time-invariant country-specific characteristics.
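
The polarisation measure in step 3 can be sketched as follows. This is a minimal illustration on invented numbers, not the code in `scripts/analyse_polarisation.py`. It assumes the vote share sits in a column named `pervote` and that no degrees-of-freedom correction is applied. The helper `weighted_std` is hypothetical.

```python
import numpy as np
import pandas as pd


def weighted_std(values, weights):
    """Weighted standard deviation without a degrees-of-freedom correction."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.average(values, weights=weights)
    variance = np.average((values - mean) ** 2, weights=weights)
    return float(np.sqrt(variance))


# Toy party-level data: the values are invented for illustration only.
parties = pd.DataFrame({
    "countryname": ["A", "A", "A", "B", "B"],
    "year": [2000, 2000, 2000, 2000, 2000],
    "rile": [-20.0, 0.0, 20.0, -5.0, 5.0],      # left-right position
    "pervote": [25.0, 50.0, 25.0, 50.0, 50.0],  # vote share (assumed column name)
})

# One polarisation score per country-year.
rows = [
    {
        "countryname": country,
        "year": year,
        "PolarisationScore": weighted_std(group["rile"], group["pervote"]),
    }
    for (country, year), group in parties.groupby(["countryname", "year"])
]
polarisation = pd.DataFrame(rows)
print(polarisation)
```

Country A, whose parties sit further from the vote-weighted mean position, receives a higher score than country B. `DescrStatsW` from `statsmodels` (imported in Section 1) offers an equivalent weighted standard deviation.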

This notebook documents each step of the analysis, from data preparation through to the interpretation of results.

## 1. Setup and Library Imports

This section imports the necessary Python libraries for data manipulation, numerical computation, statistical analysis, and visualisation. Each library plays a specific role:

-   `pandas`: For data manipulation and analysis, particularly for working with DataFrames.
-   `numpy`: For numerical operations, especially for array manipulations and mathematical functions.
-   `os`: For interacting with the operating system, primarily used here for constructing file paths in a system-agnostic way.
-   `statsmodels`: A powerful library for estimating and interpreting statistical models, including OLS regression and weighted statistics.
-   `linearmodels`: Specialised for panel data econometrics, used here for Fixed Effects models.
-   `matplotlib.pyplot` and `seaborn`: For creating static, interactive, and animated visualisations. Seaborn builds on Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics.

In [None]:
import pandas as pd
import numpy as np
import os
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.weightstats import DescrStatsW
from linearmodels.panel import PanelOLS # For Panel Data Regression
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style for consistency and aesthetics
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6) # Default figure size

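## 2. Configuration and File Paths

This section resolves the project base directory relative to the notebook's location, creates the `output` folder if it does not already exist, and defines the paths to the two data files used below.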


In [None]:
# Get the directory where the notebook is likely located (e.g., /path/to/project/notebooks)
try:
    notebook_dir = os.path.dirname(os.path.abspath(__vsc_ipynb_file__))
except NameError:
    notebook_dir = os.getcwd()

project_base_dir = os.path.dirname(notebook_dir)

output_folder_name = "output"
output_dir = os.path.join(project_base_dir, output_folder_name)
    
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
    print(f"Created output directory: {output_dir}")

merged_data_filename = "merged_political_oecd_data.csv" # Generated by scripts/merge_data.py
merged_data_path = os.path.join(output_dir, merged_data_filename)

power_bi_data_filename = "country_year_full_analysis_data.csv" # Generated by scripts/analyse_polarisation.py
power_bi_data_path = os.path.join(output_dir, power_bi_data_filename)

print(f"Project base directory (assumed): {project_base_dir}")
print(f"Output directory: {output_dir}")
print(f"Looking for merged data (for initial processing) at: {merged_data_path}")
print(f"Looking for comprehensive analysis data (for Power BI and panel models) at: {power_bi_data_path}")

## 3. Load Processed Data for Analysis

This notebook works primarily from the `country_year_full_analysis_data.csv` file. This file is generated by the `scripts/analyse_polarisation.py` script, which itself uses the output of `scripts/merge_data.py`.

The `country_year_full_analysis_data.csv` contains country-year level data including:
- `countryname`
- `year`
- `PolarisationScore` (calculated)
- `GINI` (current year, from World Bank via `merge_data.py`)
- `CPI` (current year, from OECD via `merge_data.py`)
- `GINI_lag1` (GINI from the previous year)
- `CPI_lag1` (CPI from the previous year)
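
One way to build the lagged columns is a self-merge on `year + 1`, sketched below on invented data. This is an assumption about how a calendar-year lag could be constructed and may differ from what `scripts/analyse_polarisation.py` does. The distinction matters when years are missing: a per-country `shift(1)` returns the previous *observation*, which is not always the previous *year*.

```python
import pandas as pd

# Toy country-year panel with a gap: country A has no row for 2002.
panel = pd.DataFrame({
    "countryname": ["A", "A", "A", "B", "B"],
    "year": [2000, 2001, 2003, 2000, 2001],
    "GINI": [30.0, 31.0, 33.0, 40.0, 41.0],
    "CPI": [2.0, 2.5, 3.0, 1.0, 1.5],
})

# Move each observation one year forward so that the value from year t-1
# lines up with year t, then rename the columns to mark them as lags.
lagged = panel[["countryname", "year", "GINI", "CPI"]].copy()
lagged["year"] = lagged["year"] + 1
lagged = lagged.rename(columns={"GINI": "GINI_lag1", "CPI": "CPI_lag1"})

# A left merge keeps every original row; rows without a previous-year value get NaN.
panel_with_lags = panel.merge(lagged, on=["countryname", "year"], how="left")
print(panel_with_lags)
```

In this toy panel the 2003 row for country A has missing lags because 2002 is absent, whereas `groupby("countryname").shift(1)` would have filled it with the 2001 values.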


In [None]:
if not os.path.exists(power_bi_data_path):
    print(f"Error: Comprehensive analysis data file not found at {power_bi_data_path}")
    print("Please ensure you have run the 'scripts/analyse_polarisation.py' script first, ")
    print("which generates 'country_year_full_analysis_data.csv' in the 'output' directory.")
    analysis_df_lagged = pd.DataFrame()
else:
    analysis_df_lagged = pd.read_csv(power_bi_data_path, low_memory=False)
    print(f"Loaded comprehensive analysis data with {analysis_df_lagged.shape[0]} rows and {analysis_df_lagged.shape[1]} columns.")
    print("First 5 rows of the loaded dataframe:")
    display(analysis_df_lagged.head())
    print("\nData types:")
    analysis_df_lagged.info()

## 4. Prepare Data Subsets for Specific Analyses

For different regression models (contemporaneous vs. lagged), we need datasets with complete observations for the specific variables involved in each model. This involves listwise deletion based on the required columns for each analysis type.

**Rationale:** Statistical models like OLS and PanelOLS require non-missing values for all variables in a given equation. Creating these subsets ensures each model runs on the maximum available valid data for its specification.

In [None]:
if not analysis_df_lagged.empty:
    cols_for_current_analysis = ['PolarisationScore', 'GINI', 'CPI']
    analysis_df_current_complete = analysis_df_lagged.dropna(subset=cols_for_current_analysis).copy()
    print(f"Shape for current variables OLS/Pooled analysis: {analysis_df_current_complete.shape}")

    cols_for_lagged_analysis = ['PolarisationScore', 'GINI_lag1', 'CPI_lag1']
    analysis_df_lagged_complete = analysis_df_lagged.dropna(subset=cols_for_lagged_analysis).copy()
    print(f"Shape for lagged variables OLS/Pooled analysis: {analysis_df_lagged_complete.shape}")

    # Data for PanelOLS (will also be subsetted per model later)
    panel_data_source = analysis_df_lagged.set_index(['countryname', 'year'])
    print(f"Panel data source shape (before model-specific NaN dropping): {panel_data_source.shape}")
else:
    print("Initial data for analysis (analysis_df_lagged) is empty. Cannot create subsets.")
    analysis_df_current_complete = pd.DataFrame()
    analysis_df_lagged_complete = pd.DataFrame()
    panel_data_source = pd.DataFrame()

## 5. Pooled OLS: Correlation Analysis (Current and Lagged Variables)

**Objective:** To conduct an initial exploratory analysis of the linear relationships between `PolarisationScore` and the economic indicators (current and lagged), without accounting for panel structure yet.

**Method: Pearson Correlation Coefficient**
-   Measures the strength and direction of a linear association between two continuous variables.
-   Values range from -1 (perfect negative linear correlation) to +1 (perfect positive linear correlation), with 0 indicating no linear correlation.

**Process:**
1.  Calculates and displays the Pearson correlation matrix for `PolarisationScore`, current `GINI`, and current `CPI` using `analysis_df_current_complete`.
2.  Calculates and displays a more comprehensive Pearson correlation matrix using `analysis_df_lagged_complete`, covering the lagged terms alongside contemporaneous GINI and CPI for comparison.
3.  Visualises key pairwise relationships using scatter plots.

In [None]:
# Correlation for Current Variables
if not analysis_df_current_complete.empty and len(analysis_df_current_complete) >= 2:
    print("--- Pooled OLS: Correlation Analysis (Current Variables) ---")
    correlation_matrix_current = analysis_df_current_complete[cols_for_current_analysis].corr(method='pearson')
    print("Pearson Correlation Matrix (Current Variables):")
    display(correlation_matrix_current)

    fig_current, axes_current = plt.subplots(1, 2, figsize=(16, 6))
    sns.scatterplot(ax=axes_current[0], data=analysis_df_current_complete, x='GINI', y='PolarisationScore')
    axes_current[0].set_title('Polarisation Score vs. Current GINI')
    axes_current[0].grid(True)
    sns.scatterplot(ax=axes_current[1], data=analysis_df_current_complete, x='CPI', y='PolarisationScore')
    axes_current[1].set_title('Polarisation Score vs. Current CPI')
    axes_current[1].grid(True)
    plt.tight_layout()
    plt.show()
    corr_current_plot_path = os.path.join(output_dir, "correlation_plots_current_notebook.png")
    try: fig_current.savefig(corr_current_plot_path); print(f"Current correlation plots saved to: {corr_current_plot_path}")
    except Exception as e: print(f"Could not save current correlation plots: {e}")
else:
    print("Skipping current variable correlation analysis due to insufficient data.")

# Correlation for Lagged Variables (on the sample where lagged data is available)
if not analysis_df_lagged_complete.empty and len(analysis_df_lagged_complete) >= 2:
    print("\n--- Pooled OLS: Correlation Analysis (Including Lagged Variables) ---")
    cols_for_extended_corr = ['PolarisationScore', 'GINI', 'CPI', 'GINI_lag1', 'CPI_lag1']
    existing_cols_for_corr = [col for col in cols_for_extended_corr if col in analysis_df_lagged_complete.columns]
    correlation_matrix_extended = analysis_df_lagged_complete[existing_cols_for_corr].corr(method='pearson')
    print("Pearson Correlation Matrix (on sample with complete lagged data, including contemporaneous GINI/CPI for comparison):")
    display(correlation_matrix_extended)

    fig_lagged, axes_lagged = plt.subplots(1, 2, figsize=(16, 6))
    sns.scatterplot(ax=axes_lagged[0], data=analysis_df_lagged_complete, x='GINI_lag1', y='PolarisationScore')
    axes_lagged[0].set_title('Polarisation Score vs. GINI (Lagged 1 Year)')
    axes_lagged[0].grid(True)
    sns.scatterplot(ax=axes_lagged[1], data=analysis_df_lagged_complete, x='CPI_lag1', y='PolarisationScore')
    axes_lagged[1].set_title('Polarisation Score vs. CPI (Lagged 1 Year)')
    axes_lagged[1].grid(True)
    plt.tight_layout()
    plt.show()
    corr_lagged_plot_path = os.path.join(output_dir, "correlation_plots_lagged_notebook.png")
    try: fig_lagged.savefig(corr_lagged_plot_path); print(f"Lagged correlation plots saved to: {corr_lagged_plot_path}")
    except Exception as e: print(f"Could not save lagged correlation plots: {e}")
else:
    print("Skipping lagged variable correlation analysis due to insufficient data.")

## 6. Pooled OLS Regression Analysis (Current and Lagged Variables)

**Objective:** To initially model the relationship between polarisation and economic indicators using Pooled OLS, which treats all country-year observations as independent. This serves as a baseline before applying more sophisticated panel data techniques.

**Models to be Estimated:**
1.  **Contemporaneous Models (using `analysis_df_current_complete`):**
    * `PolarisationScore ~ GINI`
    * `PolarisationScore ~ CPI`
    * `PolarisationScore ~ GINI + CPI`
2.  **Lagged Models (using `analysis_df_lagged_complete`):**
    * `PolarisationScore ~ GINI_lag1`
    * `PolarisationScore ~ CPI_lag1`
    * `PolarisationScore ~ GINI_lag1 + CPI_lag1`

**Interpretation Focus:** Coefficients, p-values, R-squared, and overall model fit. Regression plots are generated for simple linear models.

In [None]:
# --- Pooled OLS Regressions with Current Variables ---
if not analysis_df_current_complete.empty and len(analysis_df_current_complete) >= 2:
    print("\n--- Pooled OLS: Regression Analysis (Current Variables) ---")
    regression_data_current = analysis_df_current_complete[cols_for_current_analysis].copy()
    regression_data_current.replace([np.inf, -np.inf], np.nan, inplace=True); regression_data_current.dropna(inplace=True)

    if not regression_data_current.empty and len(regression_data_current) >=2:
        print(f"Observations for current OLS regression: {len(regression_data_current)}")
        try:
            print("\nOLS Model 1a: PolarisationScore ~ GINI")
            model_gini_current = smf.ols('PolarisationScore ~ GINI', data=regression_data_current).fit()
            print(model_gini_current.summary())
            fig, ax = plt.subplots(figsize=(8,6)); sns.regplot(x='GINI', y='PolarisationScore', data=regression_data_current, ax=ax, ci=95, line_kws={'color':'red'}); ax.set_title('OLS: Polarisation Score vs. Current GINI'); plt.grid(True); fig.tight_layout()
            path = os.path.join(output_dir, "ols_regression_gini_current_notebook.png"); plt.savefig(path); plt.show(); print(f'Plot saved: {path}')
        except Exception as e: print(f"Error in OLS GINI current regression: {e}")
            
        try:
            print("\nOLS Model 2a: PolarisationScore ~ CPI")
            model_cpi_current = smf.ols('PolarisationScore ~ CPI', data=regression_data_current).fit()
            print(model_cpi_current.summary())
            fig, ax = plt.subplots(figsize=(8,6)); sns.regplot(x='CPI', y='PolarisationScore', data=regression_data_current, ax=ax, ci=95, line_kws={'color':'red'}); ax.set_title('OLS: Polarisation Score vs. Current CPI'); plt.grid(True); fig.tight_layout()
            path = os.path.join(output_dir, "ols_regression_cpi_current_notebook.png"); plt.savefig(path); plt.show(); print(f'Plot saved: {path}')
        except Exception as e: print(f"Error in OLS CPI current regression: {e}")

        try:
            print("\nOLS Model 3a: PolarisationScore ~ GINI + CPI")
            if regression_data_current['GINI'].nunique() > 1 and regression_data_current['CPI'].nunique() > 1:
                model_multiple_current = smf.ols('PolarisationScore ~ GINI + CPI', data=regression_data_current).fit()
                print(model_multiple_current.summary())
            else: print("Skipping current OLS multiple regression due to insufficient variance.")
        except Exception as e: print(f"Error in OLS Multiple current regression: {e}")
    else: print("Not enough data for current OLS regressions after final cleaning.")
else:
    print("Skipping current variable OLS regression analysis due to insufficient initial data.")

# --- Pooled OLS Regressions with Lagged Variables ---
if not analysis_df_lagged_complete.empty and len(analysis_df_lagged_complete) >= 2:
    print("\n--- Pooled OLS: Regression Analysis (Lagged Variables) ---")
    regression_data_lagged = analysis_df_lagged_complete[['PolarisationScore', 'GINI_lag1', 'CPI_lag1']].copy()
    regression_data_lagged.replace([np.inf, -np.inf], np.nan, inplace=True); regression_data_lagged.dropna(inplace=True)

    if not regression_data_lagged.empty and len(regression_data_lagged) >=2:
        print(f"Observations for lagged OLS regression: {len(regression_data_lagged)}")
        try:
            print("\nOLS Model 1b: PolarisationScore ~ GINI_lag1")
            model_gini_lagged = smf.ols('PolarisationScore ~ GINI_lag1', data=regression_data_lagged).fit()
            print(model_gini_lagged.summary())
            fig, ax = plt.subplots(figsize=(8,6)); sns.regplot(x='GINI_lag1', y='PolarisationScore', data=regression_data_lagged, ax=ax, ci=95, line_kws={'color':'blue'}); ax.set_title('OLS: Polarisation Score vs. GINI (Lagged 1 Year)'); plt.grid(True); fig.tight_layout()
            path = os.path.join(output_dir, "ols_regression_gini_lagged_notebook.png"); plt.savefig(path); plt.show(); print(f'Plot saved: {path}')
        except Exception as e: print(f"Error in OLS GINI_lag1 regression: {e}")

        try:
            print("\nOLS Model 2b: PolarisationScore ~ CPI_lag1")
            model_cpi_lagged = smf.ols('PolarisationScore ~ CPI_lag1', data=regression_data_lagged).fit()
            print(model_cpi_lagged.summary())
            fig, ax = plt.subplots(figsize=(8,6)); sns.regplot(x='CPI_lag1', y='PolarisationScore', data=regression_data_lagged, ax=ax, ci=95, line_kws={'color':'blue'}); ax.set_title('OLS: Polarisation Score vs. CPI (Lagged 1 Year)'); plt.grid(True); fig.tight_layout()
            path = os.path.join(output_dir, "ols_regression_cpi_lagged_notebook.png"); plt.savefig(path); plt.show(); print(f'Plot saved: {path}')
        except Exception as e: print(f"Error in OLS CPI_lag1 regression: {e}")

        try:
            print("\nOLS Model 3b: PolarisationScore ~ GINI_lag1 + CPI_lag1")
            if regression_data_lagged['GINI_lag1'].nunique() > 1 and regression_data_lagged['CPI_lag1'].nunique() > 1:
                model_multiple_lagged = smf.ols('PolarisationScore ~ GINI_lag1 + CPI_lag1', data=regression_data_lagged).fit()
                print(model_multiple_lagged.summary())
            else: print("Skipping OLS lagged multiple regression due to insufficient variance.")
        except Exception as e: print(f"Error in OLS Multiple lagged regression: {e}")
    else: print("Not enough data for OLS lagged regressions after final cleaning.")
else:
    print("Skipping lagged variable OLS regression analysis due to insufficient initial data.")

## 7. Panel Data Regression (Country Fixed Effects)

**Objective:** To account for unobserved time-invariant country-specific characteristics that might be correlated with both economic factors and political polarisation. This provides a more robust estimate of the effects of GINI and CPI by essentially analysing how *changes within each country over time* in GINI/CPI relate to *changes within that same country over time* in polarisation.

**Method: PanelOLS with Entity (Country) Fixed Effects**
-   `linearmodels.panel.PanelOLS` is used.
-   `entity_effects=True` in the `PanelOLS` constructor (or, equivalently, `+ EntityEffects` in the formula, as used here) adds country fixed effects. This is equivalent to including a dummy variable for each country (except one, to avoid perfect multicollinearity), effectively absorbing all stable between-country differences.
-   **Robust Standard Errors (`cov_type='robust'`):** These are heteroskedasticity-robust (White) standard errors. They do not correct for autocorrelation within countries, which is common in panel data; standard errors clustered by country (`cov_type='clustered', cluster_entity=True`) would address that.

**Data Preparation for PanelOLS:**
The DataFrame needs a `MultiIndex` consisting of the entity identifier (`countryname`) and the time identifier (`year`).

**Interpretation Focus for Fixed Effects Models:**
-   **`R-squared (Within)`:** Indicates the proportion of the variance *within countries over time* that is explained by the model.
-   **Coefficients:** Represent the estimated change in the dependent variable for a one-unit change in an independent variable, *holding constant all time-invariant country characteristics*.
-   **`F-test for Poolability`:** A significant p-value for this test (provided by `linearmodels`) strongly suggests that fixed effects are necessary and that a simpler Pooled OLS model would likely suffer from omitted variable bias.
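
The equivalence between country dummies and the "within" interpretation can be checked on simulated data. The sketch below uses `statsmodels` rather than `linearmodels`, and every number in it is invented. It only illustrates why the fixed effects coefficient reflects within-country variation and why Pooled OLS can be biased when country effects are correlated with the regressor.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Four hypothetical countries, ten periods each, with a stable country effect
# that shifts both the regressor x and the outcome y.
countries = np.repeat(["A", "B", "C", "D"], 10)
country_effect = np.repeat([0.0, 5.0, -3.0, 8.0], 10)
x = rng.normal(size=40) + country_effect
y = 0.5 * x + 2.0 * country_effect + rng.normal(scale=0.1, size=40)
toy = pd.DataFrame({"country": countries, "x": x, "y": y})

# Pooled OLS ignores the country effect entirely.
pooled = smf.ols("y ~ x", data=toy).fit()

# Dummy-variable (LSDV) regression: one dummy per country.
lsdv = smf.ols("y ~ x + C(country)", data=toy).fit()

# Within estimator: subtract each country's mean, then regress without a constant.
demeaned = toy[["x", "y"]] - toy.groupby("country")[["x", "y"]].transform("mean")
within = smf.ols("y ~ x - 1", data=demeaned).fit()

print(f"Pooled slope: {pooled.params['x']:.3f}")
print(f"LSDV slope:   {lsdv.params['x']:.3f}")
print(f"Within slope: {within.params['x']:.3f}")
```

The LSDV and within slopes coincide and sit close to the simulated value of 0.5, while the pooled slope is pulled away from it by the between-country differences.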

In [None]:
if not panel_data_source.empty:
    print("\n--- Panel Data Regression (Country Fixed Effects) ---")
    print("Note: This uses the 'linearmodels' library.")

    # Panel Model 1: Current GINI + CPI
    print("\nPanel Model 1: PolarisationScore ~ GINI + CPI (Country Fixed Effects)")
    try:
        model1_data_panel = panel_data_source[['PolarisationScore', 'GINI', 'CPI']].dropna()
        if len(model1_data_panel) > (model1_data_panel.index.get_level_values('countryname').nunique() + 2):
            formula1 = 'PolarisationScore ~ 1 + GINI + CPI + EntityEffects'
            mod1_panel = PanelOLS.from_formula(formula1, data=model1_data_panel)
            results1_panel = mod1_panel.fit(cov_type='robust') 
            print(results1_panel)
        else: print("Not enough observations for Panel Model 1.")
    except Exception as e: print(f"Error in Panel Model 1: {e}")

    # Panel Model 2: Lagged GINI + Lagged CPI
    print("\nPanel Model 2: PolarisationScore ~ GINI_lag1 + CPI_lag1 (Country Fixed Effects)")
    try:
        model2_data_panel = panel_data_source[['PolarisationScore', 'GINI_lag1', 'CPI_lag1']].dropna()
        if len(model2_data_panel) > (model2_data_panel.index.get_level_values('countryname').nunique() + 2):
            formula2 = 'PolarisationScore ~ 1 + GINI_lag1 + CPI_lag1 + EntityEffects'
            mod2_panel = PanelOLS.from_formula(formula2, data=model2_data_panel)
            results2_panel = mod2_panel.fit(cov_type='robust') 
            print(results2_panel)
        else: print("Not enough observations for Panel Model 2.")
    except Exception as e: print(f"Error in Panel Model 2: {e}")

    # Panel Model 3: Current CPI + Lagged CPI
    print("\nPanel Model 3: PolarisationScore ~ CPI + CPI_lag1 (Country Fixed Effects)")
    try:
        model3_data_panel = panel_data_source[['PolarisationScore', 'CPI', 'CPI_lag1']].dropna()
        if len(model3_data_panel) > (model3_data_panel.index.get_level_values('countryname').nunique() + 2):
            formula3 = 'PolarisationScore ~ 1 + CPI + CPI_lag1 + EntityEffects'
            mod3_panel = PanelOLS.from_formula(formula3, data=model3_data_panel)
            results3_panel = mod3_panel.fit(cov_type='robust')
            print(results3_panel)
        else: print("Not enough observations for Panel Model 3.")
    except Exception as e: print(f"Error in Panel Model 3: {e}")
        
    # Panel Model 4: All GINI/CPI (Current and Lagged)
    print("\nPanel Model 4: PolarisationScore ~ GINI + CPI + GINI_lag1 + CPI_lag1 (Country Fixed Effects)")
    try:
        model4_data_panel = panel_data_source[['PolarisationScore', 'GINI', 'CPI', 'GINI_lag1', 'CPI_lag1']].dropna()
        if len(model4_data_panel) > (model4_data_panel.index.get_level_values('countryname').nunique() + 4): 
            formula4 = 'PolarisationScore ~ 1 + GINI + CPI + GINI_lag1 + CPI_lag1 + EntityEffects'
            mod4_panel = PanelOLS.from_formula(formula4, data=model4_data_panel)
            results4_panel = mod4_panel.fit(cov_type='robust')
            print(results4_panel)
        else: print("Not enough observations for Panel Model 4.")
    except Exception as e: print(f"Error in Panel Model 4: {e}")
else:
    print("Panel data source is empty. Skipping Panel Data Regression.")

## 8. Discussion and Conclusion of Findings (Pooled OLS and Fixed Effects)

This section synthesises the results from both Pooled OLS and Country Fixed Effects panel regressions to provide a comprehensive answer to the question: *How do income inequality (GINI) and inflation (CPI), both contemporaneously and with a 1-year lag, relate to political polarisation in OECD countries, considering country-specific characteristics?*

**Recap of Available Observations (after processing new GINI/CPI data):**
-   Contemporaneous Pooled OLS / Panel Model 1 (GINI, CPI): **~174 country-years**.
-   Lagged Pooled OLS / Panel Model 2 (GINI_lag1, CPI_lag1): **~149 country-years**.
-   Panel Model 3 (CPI, CPI_lag1): **~337 country-years** (more because GINI is not required).
-   Panel Model 4 (all four GINI/CPI terms): **~131 country-years**.

The F-test for Poolability in all panel models strongly indicated that country fixed effects are important. Pooled OLS results are therefore likely to be confounded by stable country-specific differences, and the Fixed Effects estimates are treated below as the more reliable of the two.

**Key Findings from Pooled OLS Models (Baseline):**

1.  **Income Inequality (GINI):**
    * Contemporaneous GINI: Not statistically significant (p≈0.21).
    * Lagged GINI (GINI_lag1): Not significant at the 5% level, though marginal (p≈0.06-0.07), with a negative coefficient, suggesting a potential weak delayed negative association.
2.  **Inflation (CPI):**
    * Contemporaneous CPI: Statistically significant positive predictor (p≈0.01). Higher current CPI associated with higher current polarisation.
    * Lagged CPI (CPI_lag1): Statistically significant positive predictor (p≈0.02-0.03). Higher past year's CPI associated with higher current polarisation.
    * In multiple Pooled OLS models, CPI (current or lagged) generally retained significance when GINI (current or lagged) was included.

**Key Findings from Country Fixed Effects Panel Data Models (More Robust):**

1.  **Income Inequality (GINI):**
    * **Contemporaneous GINI:** In Panel Model 1 (with current CPI), GINI was not statistically significant (p≈0.22).
    * **Lagged GINI (GINI_lag1):** In Panel Model 2 (with lagged CPI), GINI_lag1 was not statistically significant (p≈0.14).
    * In the comprehensive Panel Model 4 (all GINI/CPI terms), neither current nor lagged GINI was significant.
    * **Interpretation:** After controlling for time-invariant country-specific characteristics, there is no robust statistical evidence that changes in GINI (either current or lagged by one year) are significantly associated with changes in political polarisation within OECD countries.

2.  **Inflation (CPI):**
    * **Contemporaneous CPI:** In Panel Model 1 (with current GINI), current CPI was a statistically significant positive predictor (p≈0.013). A one-unit increase in current CPI was associated with a ~0.08 unit increase in polarisation score, within countries over time.
    * **Lagged CPI (CPI_lag1):** In Panel Model 2 (with lagged GINI), CPI_lag1 was not statistically significant at the 5% level (p≈0.10), though it showed a positive coefficient. 
    * **CPI and CPI_lag1 Together (Panel Model 3):** When both current and lagged CPI were included (N=337, more observations), neither term was individually statistically significant. This could suggest multicollinearity between CPI and its lag once country effects are controlled, or that their distinct effects are harder to disentangle with this specification.
    * **Comprehensive Model (Panel Model 4):** In the model with all four economic terms, neither current CPI nor lagged CPI was statistically significant at the 5% level.
    * **Interpretation:** The evidence for a CPI effect on polarisation becomes less consistent and generally weaker once country fixed effects are introduced. While current CPI showed significance in one fixed effects specification (Panel Model 1), this did not robustly hold across all models including CPI terms. This suggests that some of the CPI-polarisation relationship observed in Pooled OLS might have been influenced by between-country differences.

**Overall Conclusion from All Analyses:**

The initial Pooled OLS models suggested a statistically significant positive relationship between inflation (both current and lagged CPI) and political polarisation, and a potential weak, delayed negative relationship for income inequality (GINI). However, the introduction of **country fixed effects** in the panel data models provides a more rigorous test by controlling for stable, unobserved differences between countries.

With country fixed effects:
-   The evidence for an effect of **income inequality (GINI)**, whether current or lagged, on political polarisation remains **not statistically significant**.
-   The evidence for an effect of **inflation (CPI)** is **mixed and less robust** than in Pooled OLS. Current CPI was significant in one specification (alongside current GINI), but lagged CPI lost its conventional significance, and in models with both current and lagged CPI or all four economic terms, individual CPI terms were not significant. The most comprehensive fixed effects model (Panel Model 4) found none of the economic variables to be statistically significant predictors of within-country changes in polarisation.

The consistently low **R-squared (Within)** values (typically below 10%) across all fixed effects models indicate that changes in GINI and CPI (current or lagged) explain very little of the *changes in political polarisation within countries over time*. This strongly implies that other factors not captured by this model, including time-varying country-specific events, policy changes, social shifts, or institutional dynamics, are likely more influential drivers of shifts in polarisation within countries.

**Final Limitations and Considerations:**

-   **Data Sparsity:** Despite using new data, the number of observations for panel models, especially those requiring multiple lagged and current variables without NaNs, can still be a limiting factor for detecting more subtle effects or for more complex models.
-   **Model Specification:** The 1-year lag is only one possible choice; other lag structures might capture delayed effects better.
-   **Linearity Assumption:** The regression models assume linear relationships. The actual relationships might be non-linear or more complex.
-   **Measurement:** GINI and CPI each have their own nuances and limitations as measures of inequality and inflation.
-   **Measure of Polarisation:** The chosen measure (vote-share weighted standard deviation of `rile`) is one among many. Different operationalisations of polarisation (e.g., affective polarisation, legislative voting patterns) might yield different results.
-   **Correlation vs. Causation:** This analysis identifies associations. It **cannot establish causal links**. For instance, while higher CPI is associated with higher polarisation, we cannot conclude that inflation *causes* polarisation, or vice-versa. There could be a third factor influencing both, or the relationship could be bidirectional.
-   **Omitted Variable Bias:** The exclusion of other relevant variables could bias the estimated coefficients for GINI and CPI.

**Refined Future Research Directions:**

-   Explore **time fixed effects** in addition to country fixed effects (two-way fixed effects) to control for common shocks affecting all countries in a given year.
-   Investigate longer or distributed lag models if data permits.
-   Consider alternative estimation techniques for dynamic panel data if theoretical reasons suggest endogeneity or if lagged dependent variables are included as predictors.
-   Qualitative case studies or comparative analyses of smaller groups of countries might help uncover context-specific mechanisms that aggregate quantitative models might obscure.
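
As a starting point for the two-way fixed effects idea, the sketch below adds year dummies alongside country dummies and clusters standard errors by country. It uses `statsmodels` on simulated data as a stand-in; it is not a result from this project. In `linearmodels`, the analogous specification would add a `TimeEffects` term to the formula next to `EntityEffects`. All values and effect sizes are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Balanced toy panel: five hypothetical countries observed over ten years.
countries = ["A", "B", "C", "D", "E"]
years = list(range(2000, 2010))
toy = pd.DataFrame(
    [(c, y) for c in countries for y in years],
    columns=["countryname", "year"],
)

# Stable country effects and common yearly shocks (both invented).
country_effect = toy["countryname"].map(dict(zip(countries, [0, 2, -1, 3, 1])))
year_shock = toy["year"].map({yr: rng.normal() for yr in years})

toy["CPI"] = rng.normal(size=len(toy)) + year_shock
toy["PolarisationScore"] = (
    0.3 * toy["CPI"]
    + country_effect
    + 2.0 * year_shock
    + rng.normal(scale=0.1, size=len(toy))
)

# Country and year dummies absorb both kinds of effect; errors are clustered by country.
cluster_ids = pd.factorize(toy["countryname"])[0]
two_way = smf.ols(
    "PolarisationScore ~ CPI + C(countryname) + C(year)", data=toy
).fit(cov_type="cluster", cov_kwds={"groups": cluster_ids})

print(f"CPI coefficient: {two_way.params['CPI']:.3f}")
print(f"Clustered SE:    {two_way.bse['CPI']:.3f}")
```

With only a handful of clusters, as here, clustered standard errors are unreliable; the sketch shows the mechanics only.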

## 9. Next Steps for the Project

The primary analytical phase in Python, covering Pooled OLS and Country Fixed Effects Panel Data models, is now complete. The findings suggest that an increase in the CPI may correlate with increased political polarisation, but that, after controlling for stable country differences, the direct linear impact of changes in GINI and CPI on changes in political polarisation is limited within this analytical framework.

The next step involves using the generated dataset (`country_year_full_analysis_data.csv` in the `output` folder) for visualisation and further exploration in a tool like **Power BI**. 
