In [1]:
import pandas as pd
df = pd.read_csv(filepath_or_buffer='/kaggle/input/the-economists-big-mac-index/big-mac-adjusted-index.csv', parse_dates=['date'])
df['year'] = df['date'].dt.year
df.head()

Unnamed: 0,date,iso_a3,currency_code,name,local_price,dollar_ex,dollar_price,GDP_bigmac,adj_price,USD,EUR,GBP,JPY,CNY,year
0,2000-04-01,ARG,ARS,Argentina,2.5,1.0,2.5,7803.328512,1.922652,0.39117,,-0.06626,0.10096,0.97153,2000
1,2000-04-01,AUS,AUD,Australia,2.59,1.68,1.541667,29144.876973,2.30155,-0.28335,,-0.51898,-0.43285,0.01563,2000
2,2000-04-01,BRA,BRL,Brazil,2.95,1.79,1.648045,4822.738983,1.869734,-0.05696,,-0.36704,-0.25369,0.33645,2000
3,2000-04-01,GBR,GBP,Britain,1.9,0.632911,3.002,20932.924968,2.155755,0.48988,,0.0,0.17908,1.11143,2000
4,2000-04-01,CAN,CAD,Canada,2.85,1.47,1.938776,26087.329235,2.247266,-0.07698,,-0.38047,-0.26953,0.30809,2000


In [2]:
# how many years of data do we have and how many datapoints per year?
from plotly.express import histogram
histogram(data_frame=df, x='year')

In [3]:
histogram(data_frame=df, x='adj_price', facet_col='year', facet_col_wrap=5)

Adjusted prices are not Gaussian. They move more or less in unison through 2012 and then spread out afterward.
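
To put rough numbers on what the histograms suggest, one option is to summarize each year's distribution with its cross-country standard deviation and skewness. The sketch below is only an illustration: `yearly_spread` is a hypothetical helper, and the `toy` frame is made-up data standing in for `df` so the block runs on its own.

```python
import pandas as pd


def yearly_spread(frame: pd.DataFrame) -> pd.DataFrame:
    """Cross-country spread and skewness of adj_price for each year."""
    return (
        frame.groupby("year")["adj_price"]
        .agg(spread="std", skewness="skew", n="count")
        .reset_index()
    )


# Tiny made-up frame, for illustration only (not the Big Mac data).
toy = pd.DataFrame(
    {
        "year": [2011, 2011, 2011, 2013, 2013, 2013],
        "adj_price": [2.0, 2.1, 2.2, 1.0, 2.0, 6.0],
    }
)
spread_by_year = yearly_spread(toy)
print(spread_by_year)
```

On the real data this would be `yearly_spread(df)`. A spread that grows after 2012 would support the visual impression, and a skewness far from zero is one simple sign that a year's prices are not Gaussian.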

In [4]:
from plotly.express import choropleth
choropleth(data_frame=df[df['year'] == 2022], locations='iso_a3', color='adj_price', hover_name='name', title='2022 global Big Mac adjusted prices')

We would like to get an idea of how Big Mac prices cluster or diverge over time, by year and by country. A big line plot with one line per country seems like a good start.

In [5]:
from plotly.express import line
line(data_frame=df, x='date', y='adj_price', color='name')

We can easily see the general trend and the divergence from it over time; let's add a scatter plot with a non-linear trendline.

In [6]:
from plotly.express import scatter
scatter(data_frame=df, x='date', y='adj_price', color='name', trendline='lowess', trendline_scope='overall')

It can be hard to get a feel for dollar prices and even harder to grasp GDP-adjusted prices, since strictly speaking nobody eats a dollar burger unless they're in a country that uses the dollar. Let's make some plots to try to get a feel for what these numbers might tell us.

In [7]:
# mean adjusted price per country over all dates, sorted ascending
mean_df = df[['name', 'adj_price']].groupby(by=['name']).mean().reset_index().sort_values(by='adj_price').rename(axis=1, mapper={'adj_price': 'mean'})
scatter(data_frame=mean_df, x='name', y='mean', title='Mean GDP-adjusted price')

This plot averages out the time component of the GDP-adjusted prices to show which countries have the most expensive burgers. It is probably not surprising that this roughly separates countries with expensive labor from countries with cheaper labor.

In [8]:
# population standard deviation (ddof=0) of the adjusted price per country, sorted ascending
std_df = df[['name', 'adj_price']].groupby(by=['name']).std(ddof=0).reset_index().sort_values(by='adj_price').rename(axis=1, mapper={'adj_price': 'std'})
scatter(data_frame=std_df, x='name', y='std', title='GDP-adjusted price std (volatility)')

This graph shows each country's standard deviation around its own mean, so in a sense it measures the volatility of burger prices. It is a little surprising to see how the countries shake out.
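
The standard deviation here is computed with `ddof=0`, the population form that divides by N, whereas pandas defaults to the sample form (`ddof=1`, dividing by N - 1). A minimal sketch of the difference, using made-up numbers rather than the Big Mac data:

```python
import pandas as pd

# Made-up prices, chosen so the arithmetic is easy to follow: mean is 5.0,
# squared deviations from the mean sum to 32.
prices = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = prices.std(ddof=0)  # sqrt(32 / 8)
sample_std = prices.std(ddof=1)      # sqrt(32 / 7), the pandas default

print(population_std, sample_std)
```

With only a few dozen observations per country the two forms differ slightly, and the sample form is always the larger of the two. The choice matters little for ranking countries as long as it is applied consistently.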

In [9]:
# mean dollar price per country
dollar_df = df[['name', 'dollar_price']].groupby(by=['name']).mean().reset_index().sort_values(by='dollar_price').rename(axis=1, mapper={'dollar_price': 'USD'})
# one row per country: mean and std of the adjusted price, plus the mean dollar price
metrics_df = mean_df.merge(right=std_df, on='name', how='inner').merge(right=dollar_df, on='name', how='inner')
scatter(data_frame=metrics_df, x='mean', y='std', hover_name='name', color='USD')

If we plot the mean against the volatility, the low-volatility countries clearly separate themselves. Mean and standard deviation do not look highly correlated, even though we don't see any low-mean, high-deviation countries.
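
The per-country mean, standard deviation, and mean dollar price built in the cells above could also be computed in a single pass with pandas named aggregation, which avoids the two merges. This is a sketch of an alternative, not the approach the notebook uses: `country_metrics` is a hypothetical helper and the `toy` frame is made-up data.

```python
import pandas as pd


def country_metrics(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per country: mean and population std of adj_price, mean dollar price."""
    return (
        frame.groupby("name")
        .agg(
            mean=("adj_price", "mean"),
            std=("adj_price", lambda s: s.std(ddof=0)),
            USD=("dollar_price", "mean"),
        )
        .reset_index()
    )


# Tiny made-up frame, for illustration only.
toy = pd.DataFrame(
    {
        "name": ["A", "A", "B", "B"],
        "adj_price": [1.0, 3.0, 5.0, 5.0],
        "dollar_price": [2.0, 4.0, 1.0, 1.0],
    }
)
toy_metrics = country_metrics(toy)
print(toy_metrics)
```

On the real data, `country_metrics(df)` should produce the same columns as `metrics_df` (`name`, `mean`, `std`, `USD`), though the row order may differ.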

In [10]:
metrics_df[['mean', 'std']].corr()

Unnamed: 0,mean,std
mean,1.0,0.082205
std,0.082205,1.0


In fact they're almost uncorrelated (r ≈ 0.08).
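
The default `corr()` is Pearson correlation, which only captures linear association. Since the scatter plot hints at structure that is not linear, a rank (Spearman) correlation may be a useful cross-check. The sketch below computes it as the Pearson correlation of the ranks; `rank_correlation` is a hypothetical helper and the `toy` frame is made-up data.

```python
import pandas as pd


def rank_correlation(frame: pd.DataFrame, x: str = "mean", y: str = "std") -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    ranks = frame[[x, y]].rank()
    return ranks[x].corr(ranks[y])


# Made-up values with a monotonic but strongly non-linear relationship.
toy = pd.DataFrame({"mean": [1.0, 2.0, 3.0, 4.0], "std": [1.0, 4.0, 9.0, 100.0]})

pearson = toy["mean"].corr(toy["std"])
spearman = rank_correlation(toy)
print(pearson, spearman)
```

On the real data this would be `rank_correlation(metrics_df)`. If the rank correlation is also near zero, the "almost uncorrelated" reading holds for monotonic relationships as well as linear ones.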

In [11]:
scatter(data_frame=metrics_df, x='mean', y='USD', color='std', hover_name='name', trendline='lowess')

Surprisingly, when we swap the roles of the dollar price and the adjusted-price standard deviation (dollar price on the y-axis, standard deviation as the color), some of the low-stddev countries cluster together and so do some of the higher-stddev countries. This suggests that something about GDP changes over time drives adjusted-price volatility.
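
One rough way to probe that suggestion is to compare each country's adjusted-price volatility with how much its `GDP_bigmac` value varies over time. The sketch below uses the coefficient of variation (std divided by mean) for GDP so that rich and poor countries are comparable; that choice, the helper name `volatility_link`, and the `toy` data are all assumptions made for illustration, and a correlation here would not by itself show that GDP changes cause the volatility.

```python
import pandas as pd


def volatility_link(frame: pd.DataFrame):
    """Per-country adj_price std and GDP coefficient of variation, plus their correlation."""
    per_country = frame.groupby("name").agg(
        adj_std=("adj_price", lambda s: s.std(ddof=0)),
        gdp_cv=("GDP_bigmac", lambda s: s.std(ddof=0) / s.mean()),
    )
    return per_country, per_country["adj_std"].corr(per_country["gdp_cv"])


# Tiny made-up frame, for illustration only.
toy = pd.DataFrame(
    {
        "name": ["A", "A", "B", "B", "C", "C"],
        "adj_price": [2.0, 2.0, 1.0, 3.0, 0.0, 4.0],
        "GDP_bigmac": [100.0, 100.0, 50.0, 150.0, 10.0, 190.0],
    }
)
per_country, link = volatility_link(toy)
print(per_country)
print(link)
```

On the real data this would be `volatility_link(df)`. A clearly positive correlation would be consistent with the idea above, while a value near zero would suggest looking elsewhere, for example at exchange-rate swings.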