# Analysing the Effect of Longitude and Latitude on European Inflation Dynamics

Liam Kane, Valentin Leuthard, Panagiotis Patsias, & Liam Tessendorf 

## Abstract

This project analyzes European inflation dynamics, focusing on how longitude and latitude relate to inflation rates. Using interactive visualizations and ordinary least squares (OLS) regressions of each country's mean inflation rate on its geographic coordinates, the study examines whether location helps explain differences in inflation across 29 European countries.

## Prelims

```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.formula.api import ols
import ipywidgets as widgets
from IPython.display import display, clear_output
import plotly.graph_objects as go
import plotly.io as pio

# Choose how Plotly figures are rendered in the notebook.
pio.renderers.default = "iframe_connected"
```

## Data

For our analysis we used monthly HICP inflation rates for European countries from the [Organisation for Economic Co-operation and Development](https://data-explorer.oecd.org/vis?tm=inflation&pg=0&snb=50&vw=tb&df%5Bds%5D=dsDisseminateFinalDMZ&df%5Bid%5D=DSD_PRICES%40DF_PRICES_HICP&df%5Bag%5D=OECD.SDD.TPS&df%5Bvs%5D=1.0&dq=HRV%2BBGR%2BTUR%2BGBR%2BCHE%2BSVN%2BSWE%2BESP%2BSVK%2BPRT%2BPOL%2BNOR%2BNLD%2BLUX%2BLTU%2BLVA%2BIRL%2BITA%2BISL%2BHUN%2BGRC%2BDEU%2BFRA%2BFIN%2BEST%2BDNK%2BBEL%2BCZE%2BAUT.M.HICP.CPI.PA._T.N.GY&to%5BTIME_PERIOD%5D=false&pd=2000-01%2C2024-10). The dataset covers 29 European countries. For most countries, data are available from 2000 until September 2024, but for some, such as Switzerland, the series are shorter. We therefore kept only the time periods that are observed for every country. This yields inflation rates from December 2005 until September 2024. For more details on this data source, please see [0.02-lte-oced-european-inflation-rates-1.ipynb](./notebooks/0.02-lte-oced-european-inflation-rates-1.ipynb).
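The trimming step can be sketched in pandas as shown below. This assumes a long-format table with the columns `Reference area`, `TIME_PERIOD` and `OBS_VALUE` that the notebook uses later. The toy values are invented, and the project's processing script may implement the step differently. The idea is to keep only the months in which every country has an observation.

```python
import pandas as pd

# Toy long-format panel (invented values): country "B" starts two months later than "A".
df = pd.DataFrame(
    {
        "Reference area": ["A"] * 4 + ["B"] * 2,
        "TIME_PERIOD": ["2005-10", "2005-11", "2005-12", "2006-01", "2005-12", "2006-01"],
        "OBS_VALUE": [2.0, 2.1, 2.2, 2.3, 1.0, 1.1],
    }
)

# Count how many countries report a value in each month ...
n_countries = df["Reference area"].nunique()
countries_per_month = (
    df.dropna(subset=["OBS_VALUE"]).groupby("TIME_PERIOD")["Reference area"].nunique()
)

# ... and keep only the months observed for all countries (a balanced panel).
common_months = countries_per_month[countries_per_month == n_countries].index
df_balanced = df[df["TIME_PERIOD"].isin(common_months)]

print(sorted(df_balanced["TIME_PERIOD"].unique()))  # ['2005-12', '2006-01']
```

Filtering on the set of common months, rather than on a single start date, also removes gaps that occur in the middle of a series.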

For the longitude and latitude data, we used Google's [countries.csv](https://developers.google.com/public-data/docs/canonical/countries_csv), which provides a single latitude/longitude point per country. While processing this dataset, we removed all countries outside Europe, since they are not relevant to this analysis.
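`countries.csv` has one row per country with the columns `country` (a two-letter country code), `latitude`, `longitude` and `name`. The sketch below shows one way to filter it to Europe with `isin`. The set `EUROPEAN_CODES` is hypothetical, and the rows are a hand-typed excerpt with rounded, illustrative coordinates. The project's processing code may select European countries differently.

```python
import pandas as pd

# Hand-typed excerpt in the layout of countries.csv (coordinates rounded, illustrative only).
countries = pd.DataFrame(
    {
        "country": ["DE", "TR", "US", "JP"],
        "latitude": [51.2, 39.0, 37.1, 36.2],
        "longitude": [10.5, 35.2, -95.7, 138.3],
        "name": ["Germany", "Turkey", "United States", "Japan"],
    }
)

# Hypothetical subset of the country codes treated as "European" in this project.
EUROPEAN_CODES = {"DE", "FR", "IT", "TR", "CH", "GB"}

europe = countries[countries["country"].isin(EUROPEAN_CODES)].reset_index(drop=True)
print(europe["name"].tolist())  # ['Germany', 'Turkey']
```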

To run this notebook interactively, first run `make data` from the root directory of this project (if you have not done so already). This downloads all requirements and creates the processed data from the external data. We can then read the processed files.

```python
# Processed files created by `make data`.
df_inflation = pd.read_feather("../data/processed/inflation-data-clean.ftr")
df_long_lat = pd.read_feather("../data/processed/countries-with-long-lat-data.ftr")
```

The cell below lets you choose a country and a year range, and then plots that country's inflation time series.

```python
# Parse the "YYYY-MM" strings into datetimes and derive the calendar year.
df_inflation["TIME_PERIOD"] = pd.to_datetime(df_inflation["TIME_PERIOD"], format="%Y-%m")
df_inflation["Year"] = df_inflation["TIME_PERIOD"].dt.year
df_inflation.sort_values(by="TIME_PERIOD", inplace=True)

min_year = df_inflation["Year"].min()
max_year = df_inflation["Year"].max()

country_dropdown = widgets.Dropdown(
    options=df_inflation["Reference area"].unique(),
    description="Country:",
    value=df_inflation["Reference area"].unique()[0],
)

year_slider = widgets.IntRangeSlider(
    value=[min_year, max_year],
    min=min_year,
    max=max_year,
    step=1,
    description="Year Range:",
    continuous_update=False,
    orientation="horizontal",
    readout=True,
    readout_format="d",
)

# Use a name that later cells do not reuse: the callback looks this variable up
# when it runs, so reassigning it elsewhere would redirect the plots.
plot_output = widgets.Output()


def update_plot(change):
    """Redraw the inflation time series for the selected country and year range."""
    with plot_output:
        clear_output(wait=True)
        selected_country = country_dropdown.value
        start_year, end_year = year_slider.value
        country_data = df_inflation[
            (df_inflation["Reference area"] == selected_country)
            & (df_inflation["Year"] >= start_year)
            & (df_inflation["Year"] <= end_year)
        ]
        if country_data.empty:
            print("No data available for the selected country and year range.")
        else:
            plt.figure(figsize=(10, 5))
            plt.plot(country_data["TIME_PERIOD"], country_data["OBS_VALUE"], marker="o")
            plt.title(
                f"Inflation Rate Time Series for {selected_country} ({start_year}-{end_year})"
            )
            plt.xlabel("Time Period")
            plt.ylabel("Inflation Rate (%)")
            plt.grid(True)
            plt.show()


# Redraw whenever either widget changes.
country_dropdown.observe(update_plot, names="value")
year_slider.observe(update_plot, names="value")

display(country_dropdown, year_slider, plot_output)

update_plot(None)
```
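The filtering inside `update_plot` does not depend on the widgets. It could therefore be moved into a plain function and checked without a running notebook front end. The sketch below uses a hypothetical helper `filter_country_years` and invented toy data that have the same columns as `df_inflation`.

```python
import pandas as pd


def filter_country_years(df, country, start_year, end_year):
    """Hypothetical helper: rows of one country with start_year <= Year <= end_year, sorted by date."""
    mask = (
        (df["Reference area"] == country)
        & (df["Year"] >= start_year)
        & (df["Year"] <= end_year)
    )
    return df.loc[mask].sort_values("TIME_PERIOD")


# Toy data (invented values) with the same columns as df_inflation.
toy = pd.DataFrame(
    {
        "Reference area": ["Italy", "Italy", "Italy", "Norway"],
        "TIME_PERIOD": pd.to_datetime(["2006-01", "2010-01", "2020-01", "2010-01"], format="%Y-%m"),
        "OBS_VALUE": [2.0, 1.5, 0.1, 2.4],
    }
)
toy["Year"] = toy["TIME_PERIOD"].dt.year

subset = filter_country_years(toy, "Italy", 2008, 2024)
print(subset["OBS_VALUE"].tolist())  # [1.5, 0.1]
```

Inside the callback, one would then call `filter_country_years(df_inflation, selected_country, start_year, end_year)` and plot the result.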


## Feature Engineering

For the regressions, we compute each country's mean inflation rate over the whole sample period and merge it with the latitude/longitude data on the country name.

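```python
# Mean inflation rate per country over the whole sample period.
df_inflation_aggregated = (
    df_inflation.groupby("Reference area").agg({"OBS_VALUE": "mean"}).reset_index()
)
df_inflation_aggregated.rename(columns={"Reference area": "name"}, inplace=True)

# Inner join on the country name: countries missing from either table are dropped.
df_merged = pd.merge(df_inflation_aggregated, df_long_lat, on="name", how="inner")
```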
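The merge is an inner join on country names, so a country whose name is spelled differently in the two sources would be dropped silently. Here the regressions report 29 observations, which matches the 29 countries in the inflation data. A more explicit check can use pandas' `indicator=True` option, as sketched below. The toy tables, their spellings and their rounded coordinates are invented for illustration.

```python
import pandas as pd

# Toy inputs (invented values): the same country is spelled differently in the two tables.
inflation_means = pd.DataFrame({"name": ["Italy", "Slovak Republic"], "OBS_VALUE": [2.0, 3.5]})
coords = pd.DataFrame(
    {"name": ["Italy", "Slovakia"], "latitude": [41.9, 48.7], "longitude": [12.6, 19.7]}
)

# An outer join with indicator=True adds a "_merge" column
# ("both", "left_only" or "right_only") recording where each row was found.
check = pd.merge(inflation_means, coords, on="name", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["name", "_merge"]]
print(unmatched)
```

On the project data, an empty `unmatched` frame would confirm that the inner join dropped no country.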

## Results

### Linear Regression

To test whether longitude and latitude affect the inflation rates of European countries, we run three linear regressions:

1. Regression of inflation rate on latitude
2. Regression of inflation rate on longitude
3. Regression of inflation rate on longitude and latitude
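The three specifications can also be fitted in one loop, with their fit statistics collected into a single table for side-by-side comparison. The sketch below uses synthetic data generated with NumPy instead of the project data, so its numbers say nothing about European inflation. On the real data, one would pass `df_merged` in place of the synthetic `df`.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic data (not the project data): 29 "countries" with random coordinates.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "latitude": rng.uniform(38, 65, size=29),
        "longitude": rng.uniform(-19, 35, size=29),
    }
)
df["OBS_VALUE"] = 1.0 + 0.1 * df["longitude"] + rng.normal(0, 1, size=29)

formulas = {
    "latitude": "OBS_VALUE ~ latitude",
    "longitude": "OBS_VALUE ~ longitude",
    "combined": "OBS_VALUE ~ latitude + longitude",
}

# Fit each specification and collect its fit statistics.
rows = []
for label, formula in formulas.items():
    res = ols(formula, data=df).fit()
    rows.append(
        {
            "model": label,
            "r2": res.rsquared,
            "adj_r2": res.rsquared_adj,
            "aic": res.aic,
            "f_pvalue": res.f_pvalue,
        }
    )

comparison = pd.DataFrame(rows).set_index("model")
print(comparison.round(3))
```

Because the single-regressor models are nested in the combined one, the combined model's R-squared can never be lower than theirs. Adjusted R-squared and AIC penalize the extra parameter, which makes them the fairer basis for comparison.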

```python
lm_lat = ols("OBS_VALUE ~ latitude", data=df_merged).fit()
print(lm_lat.summary())
```

```
                            OLS Regression Results
Dep. Variable:              OBS_VALUE   R-squared:                       0.034
Model:                            OLS   Adj. R-squared:                 -0.002
Method:                 Least Squares   F-statistic:                    0.9382
Date:                Tue, 26 Nov 2024   Prob (F-statistic):              0.341
Time:                        17:36:13   Log-Likelihood:                -72.182
No. Observations:                  29   AIC:                             148.4
Df Residuals:                      27   BIC:                             151.1
Df Model:                           1
Covariance Type:            nonrobust
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      7.1080      4.023      1.767      0.0
```

```python
lm_long = ols("OBS_VALUE ~ longitude", data=df_merged).fit()
print(lm_long.summary())
```

```
                            OLS Regression Results
Dep. Variable:              OBS_VALUE   R-squared:                       0.188
Model:                            OLS   Adj. R-squared:                  0.158
Method:                 Least Squares   F-statistic:                     6.255
Date:                Tue, 26 Nov 2024   Prob (F-statistic):             0.0188
Time:                        17:36:13   Log-Likelihood:                -69.656
No. Observations:                  29   AIC:                             143.3
Df Residuals:                      27   BIC:                             146.0
Df Model:                           1
Covariance Type:            nonrobust
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.9919      0.719      2.769      0.0
```

```python
lm_comb = ols("OBS_VALUE ~ latitude + longitude", data=df_merged).fit()
print(lm_comb.summary())
```

```
                            OLS Regression Results
Dep. Variable:              OBS_VALUE   R-squared:                       0.205
Model:                            OLS   Adj. R-squared:                  0.144
Method:                 Least Squares   F-statistic:                     3.362
Date:                Tue, 26 Nov 2024   Prob (F-statistic):             0.0503
Time:                        17:36:13   Log-Likelihood:                -69.342
No. Observations:                  29   AIC:                             144.7
Df Residuals:                      26   BIC:                             148.8
Df Model:                           2
Covariance Type:            nonrobust
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.8369      3.838      1.260      0.2
```

Use the sliders below to vary latitude and longitude and see the inflation rate predicted by the combined OLS model (`lm_comb`).

```python
# The sliders span the range of the observed country coordinates.
latitude_slider = widgets.FloatSlider(
    value=df_merged["latitude"].mean(),
    min=df_merged["latitude"].min(),
    max=df_merged["latitude"].max(),
    step=0.1,
    description="Latitude:",
    continuous_update=False,
)

longitude_slider = widgets.FloatSlider(
    value=df_merged["longitude"].mean(),
    min=df_merged["longitude"].min(),
    max=df_merged["longitude"].max(),
    step=0.1,
    description="Longitude:",
    continuous_update=False,
)

# A distinct name keeps other cells from redirecting this output area.
prediction_output = widgets.Output()


def update_prediction(change):
    """Print the inflation rate predicted by lm_comb at the chosen coordinates."""
    with prediction_output:
        clear_output(wait=True)
        lat = latitude_slider.value
        lon = longitude_slider.value
        input_df = pd.DataFrame({"latitude": [lat], "longitude": [lon]})
        predicted_value = lm_comb.predict(input_df)
        print(f"Predicted OBS_VALUE (Inflation Rate): {predicted_value.iloc[0]:.2f}")


latitude_slider.observe(update_prediction, names="value")
longitude_slider.observe(update_prediction, names="value")

display(latitude_slider, longitude_slider, prediction_output)

update_prediction(None)
```
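Because the combined model is linear, the value shown by the sliders is the intercept plus each coefficient times its coordinate. The sketch below checks this on synthetic data, not the project data, by comparing `predict` with a manual computation from `params`. With the formula interface, `params` is indexed by the names `Intercept`, `latitude` and `longitude`.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in for df_merged (invented values).
rng = np.random.default_rng(1)
data = pd.DataFrame({"latitude": rng.uniform(38, 65, 29), "longitude": rng.uniform(-19, 35, 29)})
data["OBS_VALUE"] = 2.0 + 0.05 * data["longitude"] + rng.normal(0, 1, 29)

model = ols("OBS_VALUE ~ latitude + longitude", data=data).fit()

lat, lon = 50.0, 10.0
# The formula interface predicts from a DataFrame with the original column names.
via_predict = model.predict(pd.DataFrame({"latitude": [lat], "longitude": [lon]})).iloc[0]

# The same number by hand from the fitted coefficients.
b = model.params
by_hand = b["Intercept"] + b["latitude"] * lat + b["longitude"] * lon
print(f"{via_predict:.4f} {by_hand:.4f}")
```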


The 3D plot below shows the fitted regression plane of the combined model together with the 29 country means.

```python
# A distinct name keeps other cells from redirecting this output area.
plane_output = widgets.Output()


def plot_regression_plane():
    """Plot the country means and the fitted plane of lm_comb in 3D."""
    with plane_output:
        clear_output(wait=True)
        # Evaluate the fitted model on a 10 x 10 grid covering the observed coordinates.
        lat_range = np.linspace(df_merged["latitude"].min(), df_merged["latitude"].max(), 10)
        lon_range = np.linspace(df_merged["longitude"].min(), df_merged["longitude"].max(), 10)
        lat_grid, lon_grid = np.meshgrid(lat_range, lon_range)
        grid_df = pd.DataFrame({"latitude": lat_grid.ravel(), "longitude": lon_grid.ravel()})
        obs_pred = lm_comb.predict(grid_df)
        obs_pred_grid = obs_pred.values.reshape(lat_grid.shape)
        fig = go.Figure()
        # Observed mean inflation rate of each country.
        fig.add_trace(
            go.Scatter3d(
                x=df_merged["latitude"],
                y=df_merged["longitude"],
                z=df_merged["OBS_VALUE"],
                mode="markers",
                marker=dict(size=5, color="blue"),
                name="Data Points",
            )
        )
        # Fitted regression plane.
        fig.add_trace(
            go.Surface(
                x=lat_grid,
                y=lon_grid,
                z=obs_pred_grid,
                colorscale="Viridis",
                opacity=0.6,
                name="Regression Plane",
            )
        )
        fig.update_layout(
            scene=dict(
                xaxis_title="Latitude",
                yaxis_title="Longitude",
                zaxis_title="OBS_VALUE (Inflation Rate)",
            ),
            title="3D Regression Plane and Data Points",
        )
        fig.show()


display(plane_output)

plot_regression_plane()
```


## Discussion

This regression analysis investigates the relationship between mean inflation rates (dependent variable) and geographical coordinates (latitude and longitude) of European countries, estimated by ordinary least squares (OLS). Unless stated otherwise, the figures below refer to the combined model (regression 3).

### Key Findings
- **R-squared (0.205)**: The model explains 20.5% of the cross-country variation in mean inflation rates, indicating limited explanatory power.
- **Latitude**: Not significant (p = 0.457), so there is no detectable north-south effect on inflation.
- **Longitude**: Significant at the 5% level (p = 0.025), suggesting that inflation rates tend to be higher further east. This is plausibly driven, at least in part, by Turkey, which lies at the eastern edge of the sample and has a very high inflation rate.
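Influence diagnostics can show whether the longitude effect rests on a single country. The sketch below computes statsmodels' Cook's distance on invented toy data. In it, one far-eastern observation with very high inflation plays the role that Turkey might play in the real sample. The toy output implies nothing about the real data. On the project data, one would instead inspect `lm_long` or `lm_comb` and refit without the flagged country.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Invented toy data: ten countries with flat inflation plus one far-east outlier ("K").
toy = pd.DataFrame(
    {
        "name": list("ABCDEFGHIJK"),
        "longitude": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30],
        "OBS_VALUE": [2.1, 1.9, 2.1, 1.9, 2.1, 1.9, 2.1, 1.9, 2.1, 1.9, 20.0],
    }
)

full = ols("OBS_VALUE ~ longitude", data=toy).fit()

# Cook's distance measures how much the fitted coefficients change
# when a single observation is left out.
cooks_d = full.get_influence().cooks_distance[0]
most_influential = toy.loc[np.argmax(cooks_d), "name"]

# Refit without the most influential observation and compare the slopes.
reduced = ols("OBS_VALUE ~ longitude", data=toy[toy["name"] != most_influential]).fit()
print(most_influential, round(full.params["longitude"], 3), round(reduced.params["longitude"], 3))
```

In this toy example, the positive slope disappears once the outlier is removed. A real effect, by contrast, would survive dropping any single country.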

### Model Diagnostics
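- **F-statistic (3.362, p = 0.0503)**: The model is borderline significant. At the 5% level, the joint null hypothesis that both slope coefficients are zero is narrowly not rejected.
- **Adjusted R-squared (0.144)**: This is lower than for the longitude-only model (0.158), which also has the lower AIC (143.3 vs. 144.7). Once the extra parameter is accounted for, adding latitude does not improve the fit.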
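As a consistency check, the F-statistic and adjusted R-squared of the combined model follow from its $R^2 = 0.205$, the number of observations $n = 29$ and the number of regressors $k = 2$:

$$
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.205 / 2}{0.795 / 26} \approx 3.35
$$

$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} = 1 - 0.795 \cdot \frac{28}{26} \approx 0.144
$$

The small gap to the reported $F = 3.362$ comes from rounding $R^2$ to three decimals in the summary.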

### Implications
Longitude's significance points to a possible east-west pattern in inflation, while latitude shows no detectable influence. The low R-squared suggests that geographic coordinates alone are insufficient to model inflation and that other economic factors should be included.

## Conclusion
Geographical coordinates play a minor role in explaining inflation rates, with longitude showing some regional significance. Future models should integrate additional variables for better insight.