In [1]:
# pandas and plotly don't play nicely together; they emit FutureWarnings instead of failing, so we silence them
from warnings import filterwarnings
filterwarnings(action='ignore', category=FutureWarning)

In [2]:
import pandas as pd

INFLATION = '/kaggle/input/global-inflation-data/global_inflation_data.csv'
# our data has rows for countries and columns for years but our line plot needs the data the other way around
df = pd.read_csv(filepath_or_buffer=INFLATION).drop(columns=['indicator_name']).T.reset_index()
# when we take the transpose the country names are in the first row, and we need to fix that
# https://stackoverflow.com/a/26147330
df.columns = df.iloc[0]
df = df.tail(n=len(df)-1).astype(float)
df.head()

Unnamed: 0,country_name,Afghanistan,Albania,Algeria,Andorra,Angola,Antigua and Barbuda,Argentina,Armenia,Aruba,...,United States,Uruguay,Uzbekistan,Vanuatu,Venezuela,Vietnam,West Bank and Gaza,Yemen,Zambia,Zimbabwe
1,1980.0,13.4,,9.7,,46.7,19.0,,,,...,13.5,63.5,,11.2,21.4,25.2,,,11.7,
2,1981.0,22.2,,14.6,,1.4,11.5,,,,...,10.4,34.0,,26.8,16.2,69.6,,,14.0,5.6
3,1982.0,18.2,,6.6,,1.8,4.2,,,,...,6.2,19.0,,6.7,9.6,95.4,,,12.5,0.6
4,1983.0,15.9,,7.8,,1.8,2.3,,,,...,3.2,49.2,,1.7,6.2,49.5,,,19.7,-8.5
5,1984.0,20.4,,6.3,,1.8,3.8,,,,...,4.4,55.3,,5.5,12.2,64.9,,,20.0,-1.9
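
An alternative way to get the same shape, sketched here on a made-up two-country frame rather than the real CSV, is to move `country_name` into the index before transposing. The country names and values below are invented, and calling the first column `year` is our choice (the notebook's `df` leaves it named `country_name`).

```python
import pandas as pd

# Toy stand-in for the CSV: one row per country, one column per year.
# Country names and values are made up for illustration only.
wide = pd.DataFrame({
    'country_name': ['Aland', 'Borduria'],
    'indicator_name': ['inflation', 'inflation'],
    '1980': [1.0, 10.0],
    '1981': [2.0, None],
})

# Put the country names in the index before transposing, so they become
# the column labels directly and no header row needs to be repaired.
tidy = (
    wide.drop(columns=['indicator_name'])
        .set_index('country_name')
        .T
        .astype(float)
        .rename_axis('year')
        .reset_index()
)
tidy.columns.name = None                  # drop the leftover 'country_name' label
tidy['year'] = tidy['year'].astype(int)   # year labels arrive as strings
print(tidy)
```

This keeps the year column as integers, which may or may not matter for the plots that follow.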


Inflation is a funny thing: we don't know what an ideal level of inflation is, but we do know that very large annual inflation is bad and either leads to or accompanies social unrest. So we can investigate this data and look for how inflation rates change during periods of upheaval and whether more stable countries have higher or lower inflation rates, and we can ask whether inflation is trending up or down on a global or regional basis over time.

In [3]:
from plotly.express import line
line(data_frame=df, x=df.columns[0], y=df.columns[1:], height=900, log_y=False)

The spikes make most of our data look like zeros. The spikes tell a story, though, so it's worth keeping this graph. Let's use a log plot in the Y (inflation) direction to make the rest of the data visible.

In [4]:
line(data_frame=df, x=df.columns[0], y=df.columns[1:], height=900, log_y=True)

This plot makes our outliers - countries having a bad year or a sequence of bad years - stand out at the top of the graph; most countries, interestingly, seem to cluster around a line that isn't flat and is somewhere between 2% and 20% annually. Let's find out what that line is.
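
One caveat with the log axis: it has no position for zero or negative values, so years of deflation cannot be drawn on this plot. As a rough check, we can count the non-positive entries per country. The sketch below uses a three-year slice of the Zambia and Zimbabwe columns from the table above instead of the full `df`.

```python
import pandas as pd

# Three rows copied from the head() output above (1982-1984).
toy = pd.DataFrame({
    'year': [1982, 1983, 1984],
    'Zambia': [12.5, 19.7, 20.0],
    'Zimbabwe': [0.6, -8.5, -1.9],
})

# Count, per country, the yearly rates that a log axis cannot show.
rates = toy.drop(columns=['year'])
non_positive = (rates <= 0).sum()
print(non_positive)
```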

In [5]:
from plotly.express import scatter
median_df = pd.DataFrame(data={'year': df[df.columns[0]].values.tolist(), 'median rate': df[df.columns[1:]].median(axis=1).tolist()})
scatter(data_frame=median_df, x='year', y='median rate', log_y=False, trendline='lowess')

Because our outliers are so large we get a better sense of the typical country by looking at the median inflation rate across countries. The median inflation rate is still pretty volatile, but its LOWESS trendline slopes downward over the period of interest. This is consistent with our naive read of the messy graph above.
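
To see why the median is the safer summary here, consider a single year with one hyperinflation-style outlier. The five rates below are made up for illustration.

```python
import pandas as pd

# Made-up inflation rates for five countries in one year; the last one is
# a hyperinflation-style outlier.
rates = pd.Series([2.0, 3.0, 4.0, 5.0, 60000.0])

mean_rate = rates.mean()      # dragged far upward by the single outlier
median_rate = rates.median()  # the middle value, however extreme the outlier is

print(mean_rate, median_rate)
```

The mean lands far above four of the five rates, while the median stays at the middle country's rate.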

Before we go any further, it is important to note that the mean and median inflation rates mean very different things. The median is the middle value across all countries for a given year, so it is the rate an actual country experienced (or, when an even number of countries report, the midpoint between two such rates). The mean is the unweighted average of values from many countries of very different sizes; in other words, the mean does not tell us anything about the inflation rate across the world. For that we would need data weighted by the size of each country, measured by population, GDP, or some other sensible metric.
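
As a rough sketch of what a size-weighted rate would look like, here is a weighted average using `numpy.average`. The rates and GDP weights are hypothetical; the inflation dataset used in this notebook does not include any size column, so real weights would have to come from another source.

```python
import numpy as np
import pandas as pd

# Hypothetical one-year snapshot: three countries' inflation rates and
# made-up GDP weights. Neither column comes from the notebook's data.
rates = pd.Series({'A': 2.0, 'B': 4.0, 'C': 100.0})
gdp = pd.Series({'A': 50.0, 'B': 45.0, 'C': 5.0})

unweighted = rates.mean()
# Each rate counts in proportion to its country's share of total GDP.
weighted = np.average(rates.to_numpy(), weights=gdp.to_numpy())

print(unweighted, weighted)
```

The small economy with very high inflation dominates the unweighted mean but barely moves the weighted one.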

In [6]:
mean_df = pd.DataFrame(data={'year': df[df.columns[0]].values.tolist(), 'mean rate': df[df.columns[1:]].mean(axis=1).tolist()})
mean_median_df = mean_df.merge(right=median_df, on='year', how='inner')
line(data_frame=mean_median_df, x='year', y=['mean rate', 'median rate'], log_y=True)

Plotting the mean and median together tells us a couple of things:
* The years of the Global Financial Crisis (2007-2010) were rough all over, and the median and mean moved together.
* The last years of the Cold War were rough all over, and the first few years after the end of the Cold War were rough on the newly independent former Soviet republics.
* Venezuela is in a class by itself, with a 2018 rate that moved the mean substantially but made no apparent impact on the median rate.
* The post-COVID years were rough all over.
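
To check which country is behind a jump in the mean, as with Venezuela in 2018, one option is to pull out the largest rate per year with `idxmax`. The sketch below uses made-up numbers and placeholder country names; on the real data, `toy` would be `df` indexed by its year column.

```python
import pandas as pd

# Made-up rates for three placeholder countries over two years.
toy = pd.DataFrame({
    'year': [2017, 2018],
    'X': [3.0, 4.0],
    'Y': [5.0, 50000.0],
    'Z': [4.0, 5.0],
}).set_index('year')

# For each year: the cross-country mean, plus the country with the
# highest rate and that rate.
summary = pd.DataFrame({
    'mean': toy.mean(axis=1),
    'top_country': toy.idxmax(axis=1),
    'top_rate': toy.max(axis=1),
})
print(summary)
```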


In [7]:
scatter(data_frame=mean_median_df, x='mean rate', y='median rate', log_x=True, log_y=True, color='year')

We should always be careful when plotting two things against each other when one is not a function of the other; also, scatter plots like this can be difficult to interpret. But the takeaway here is that earlier years had higher mean/median inflation, middle years had lower mean/median inflation, and recent years have been rather volatile year over year.

Now let's look at the mean inflation on a per-country basis, along with the standard deviation of inflation on a per-country basis as a proxy for inflation volatility. It is important to note that the standard deviation treats deviations above and below a country's mean symmetrically and ignores the order of the years, so it cannot distinguish rising inflation from falling inflation, while we would not generally be indifferent to that difference.
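
To illustrate that caveat with made-up numbers: a country whose inflation doubles every year and a country whose inflation halves every year get the same mean and the same standard deviation, because both statistics ignore the order of the years.

```python
import pandas as pd

# Made-up yearly rates: one series rises, the other is the same values
# in reverse order, so it falls.
rising = pd.Series([2.0, 4.0, 8.0, 16.0])
falling = rising.iloc[::-1].reset_index(drop=True)

print(rising.mean(), falling.mean())
print(rising.std(), falling.std())
```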

In [8]:
scatter_df = pd.DataFrame(data={
    'country': df.columns.tolist()[1:],
    'mean': df[df.columns[1:]].mean(axis=0).values.tolist(),
    'stdev': df[df.columns[1:]].std(axis=0).values.tolist(),
})
scatter(data_frame=scatter_df, x='mean', y='stdev', hover_name='country', log_x=True, log_y=True, trendline='lowess')

If we view the mean/stdev data on a log-log plot we see that countries with high inflation tend to have high inflation 'volatility,' which is not surprising, and that they cluster pretty closely to a fairly straight LOWESS trendline, which is kind of surprising. 
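
A straight line on a log-log plot corresponds to a power law, stdev ≈ a · mean^b, where b is the slope of the line. One way to put a number on that slope is an ordinary least-squares fit on the logged values. This swaps LOWESS for a plain linear fit, and the data below is synthetic (built with b = 1.2 and a = 0.5), so it only shows the mechanics and says nothing about the real slope.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for scatter_df: stdev is constructed as
# 0.5 * mean ** 1.2, so the fit should recover those two numbers.
toy = pd.DataFrame({'mean': [1.0, 2.0, 5.0, 10.0, 50.0, 200.0]})
toy['stdev'] = 0.5 * toy['mean'] ** 1.2

# Keep only strictly positive pairs, because log10 is undefined otherwise.
positive = toy[(toy['mean'] > 0) & (toy['stdev'] > 0)]

# Straight-line fit in log space: log10(stdev) = slope * log10(mean) + intercept.
slope, intercept = np.polyfit(
    np.log10(positive['mean'].to_numpy()),
    np.log10(positive['stdev'].to_numpy()),
    deg=1,
)
print(slope, 10 ** intercept)
```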

In [9]:
ISO = '/kaggle/input/country-mapping-iso-continent-region/continents2.csv'
iso_df = pd.read_csv(filepath_or_buffer=ISO, usecols=['name', 'alpha-3', 'region', 'sub-region', ])
scatter_iso_df = scatter_df.merge(right=iso_df, left_on='country', right_on='name', how='inner').drop(columns=['name'])
scatter(data_frame=scatter_iso_df, x='mean', y='stdev', hover_name='country', text='alpha-3', log_x=True, log_y=True, height=800, marginal_x='box').update_traces(marker={'size': 1})

Using ISO 3166-1 alpha-3 codes as text labels makes this data easier to read at first glance, provided our audience knows the codes. The low-low region (low mean, low standard deviation) is still difficult to read, though.
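
One caveat on the merge in the cell above: an inner join silently drops any country whose name is spelled differently in the two files. A quick way to list such names is a left merge with `indicator=True`. The frames and spellings below are toy stand-ins chosen to show a mismatch; they are not taken from either dataset.

```python
import pandas as pd

# Toy stand-ins for scatter_df and iso_df with one deliberate mismatch.
stats = pd.DataFrame({'country': ['Japan', 'Korea, Rep.', 'Switzerland']})
iso = pd.DataFrame({
    'name': ['Japan', 'South Korea', 'Switzerland'],
    'alpha-3': ['JPN', 'KOR', 'CHE'],
})

# A left merge keeps every country; the _merge column records whether a
# matching ISO row was found.
checked = stats.merge(right=iso, left_on='country', right_on='name',
                      how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'country'].tolist()
print(unmatched)
```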

In [10]:
scatter(data_frame=scatter_iso_df[scatter_iso_df['mean'] < 10], x='mean', y='stdev', hover_name='country', text='alpha-3', log_x=True, log_y=True, height=800, marginal_x='box').update_traces(marker={'size': 1})

Really the only way to make that part of the graph legible is to devote a graph to it; now we can see individual countries more easily. The countries with low inflation and low inflation variability are generally Islamic countries or microstates, with Switzerland and Japan being the exceptions.

In [11]:
scatter(data_frame=scatter_iso_df[scatter_iso_df['mean'] < 10], x='mean', y='stdev', hover_name='country', color='sub-region', log_x=True, log_y=True, height=800).update_traces(marker={'size': 5})

Adding subregion data tells us at a glance that we see some clustering on a geographic basis; unfortunately adding back in the text labels is too much clutter so we have to put the country names in the hover data. 

It's easy to see that sub-Saharan African countries cluster together, and we see some interesting pairs (e.g. Luxembourg/Belgium), but generally no big stories leap out at us.
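
If we wanted to check the geographic clustering numerically and not just by eye, one possibility is to summarize each sub-region by the median of its countries' mean and stdev. The sketch below uses a made-up frame with placeholder names in the shape of `scatter_iso_df`; none of the numbers are real.

```python
import pandas as pd

# Made-up stand-in for scatter_iso_df: placeholder countries and regions.
toy = pd.DataFrame({
    'country': ['P', 'Q', 'R', 'S'],
    'sub-region': ['North', 'North', 'South', 'South'],
    'mean': [2.0, 4.0, 8.0, 12.0],
    'stdev': [1.0, 3.0, 6.0, 10.0],
})

# One row per sub-region: the median country-level mean and stdev.
by_region = toy.groupby('sub-region')[['mean', 'stdev']].median()
print(by_region)
```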