<a href="https://www.kaggle.com/code/mikedelong/male-height-eda-with-lots-of-plots?scriptVersionId=160537049" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
import pandas as pd

HEIGHT = '/kaggle/input/male-height-dataset/height_dataset.csv'
REGION = '/kaggle/input/country-mapping-iso-continent-region/continents2.csv'

df = pd.read_csv(filepath_or_buffer=HEIGHT, decimal=',', sep=';').drop(columns=['ccode'])
df = df.merge(right=pd.read_csv(filepath_or_buffer=REGION, usecols=['name', 'alpha-3', 'region', 'sub-region']), right_on='name', left_on='country.name').drop(columns=['country.name'])

df.head()

   year  value     name alpha-3  region      sub-region
0  1550  167.8  Germany     DEU  Europe  Western Europe
1  1650  169.8  Germany     DEU  Europe  Western Europe
2  1710  164.0  Germany     DEU  Europe  Western Europe
3  1720  163.7  Germany     DEU  Europe  Western Europe
4  1730  163.8  Germany     DEU  Europe  Western Europe


In [2]:
df.nunique()

year           37
value         695
name          145
alpha-3       145
region          5
sub-region     15
dtype: int64

In [3]:
from plotly.express import line
line(data_frame=df, x='year', y='value', color='name', height=2400, facet_col='region', facet_col_wrap=1)

Ideally we would see all the data together in a single graph, but that graph is incomprehensible. Splitting the data somewhat arbitrarily by region shows that:
1. We don't have data for every country in every year.
2. Generally, men are getting taller over time.
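The first point can be checked directly rather than read off the plot. The following is a minimal sketch, assuming the merged frame keeps the `name`, `year`, and `value` columns shown above; the helper names `coverage_by_year` and `coverage_share` and the toy frame are hypothetical and only illustrate the idea.

```python
import pandas as pd


def coverage_by_year(frame: pd.DataFrame) -> pd.Series:
    """Count how many distinct countries report a value in each year."""
    return frame.groupby('year')['name'].nunique().sort_index()


def coverage_share(frame: pd.DataFrame) -> float:
    """Fraction of all (country, year) combinations that hold an observation."""
    observed = len(frame[['name', 'year']].drop_duplicates())
    possible = frame['name'].nunique() * frame['year'].nunique()
    return observed / possible


# Toy data with made-up values, standing in for the merged df above.
toy = pd.DataFrame({
    'name': ['A', 'A', 'A', 'B', 'B', 'C'],
    'year': [1900, 1950, 1980, 1950, 1980, 1980],
    'value': [165.0, 168.0, 172.0, 166.0, 170.0, 169.0],
})

print(coverage_by_year(toy))   # countries per year
print(coverage_share(toy))     # share of the country-by-year grid that is filled
```

On the real data, `coverage_by_year(df)` would show which years are thinly covered, and a `coverage_share(df)` well below 1 would confirm that the panel is unbalanced.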

In [4]:
line(data_frame=df[['year', 'value']].groupby(by='year').mean().reset_index(), x='year', y='value')

If we average a different number of values at each point in time, the mean will shift simply because the set of countries reporting changes from year to year.
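One way to make this visible is to carry the number of observations alongside each yearly mean. This is a sketch under the assumption that the frame has `year` and `value` columns as above; the helper name `mean_with_count` and the toy values are hypothetical.

```python
import pandas as pd


def mean_with_count(frame: pd.DataFrame) -> pd.DataFrame:
    """Yearly mean height together with the number of values behind it."""
    return (
        frame.groupby('year')['value']
        .agg(mean='mean', n='count')   # named aggregation: one column per statistic
        .reset_index()
    )


# Toy data with made-up values, standing in for the merged df above.
toy = pd.DataFrame({
    'name': ['A', 'A', 'A', 'B', 'B', 'C'],
    'year': [1900, 1950, 1980, 1950, 1980, 1980],
    'value': [165.0, 168.0, 172.0, 166.0, 170.0, 169.0],
})

yearly = mean_with_count(toy)
print(yearly)
```

The `n` column could then be passed to a plot as a hover field or marker size, so that means built on only a handful of countries are easy to spot.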

In [5]:
from plotly.express import scatter
scatter(data_frame=df, x='year', y='value', hover_name='name', color='region', trendline='lowess')

This is probably a better plot than the global average, as different regions have data available for different ranges of years. Maybe the really striking thing here is how dramatically European men seem to have gotten taller since the Industrial Revolution.

In [6]:
from plotly.express import scatter
scatter(data_frame=df[['year', 'value']].groupby(by='year').mean().reset_index(), x='year', y='value', trendline='lowess')

This is probably the nut graf regarding global means as it implies that men have been getting taller almost monotonically since the beginning of the Industrial Revolution.

In [7]:
from plotly.express import scatter
# Per-country mean and standard deviation of height across all available years.
stats = df.groupby(by=['name', 'region'])['value'].agg(['mean', 'std']).reset_index().rename(columns={'std': 'stdev'})
scatter(data_frame=stats, x='mean', y='stdev', hover_name='name', color='region')

This plot has a drawback: it treats countries with long time series the same as countries with short ones, which is probably not sensible. Still, it shows that:
1. Men in Asian countries are generally shorter than men outside Asia.
2. Men in Africa and Europe have similar mean heights, but heights in European countries vary more over time.
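The drawback about unequal series lengths can be softened by recording how many observations each country contributes and dropping very short series. The sketch below assumes the `name`, `region`, and `value` columns used above; the helper name `country_summary`, the `min_obs` threshold, and the toy values are hypothetical choices, not part of the original analysis.

```python
import pandas as pd


def country_summary(frame: pd.DataFrame, min_obs: int = 2) -> pd.DataFrame:
    """Mean, sample standard deviation, and observation count per country,
    keeping only countries with at least min_obs observations."""
    summary = (
        frame.groupby(['name', 'region'])['value']
        .agg(mean='mean', stdev='std', n='count')
        .reset_index()
    )
    return summary[summary['n'] >= min_obs].reset_index(drop=True)


# Toy data with made-up values, standing in for the merged df above.
toy = pd.DataFrame({
    'name': ['A', 'A', 'A', 'B', 'B', 'C'],
    'region': ['Europe', 'Europe', 'Europe', 'Asia', 'Asia', 'Africa'],
    'value': [165.0, 168.0, 172.0, 166.0, 170.0, 169.0],
})

summary = country_summary(toy, min_obs=2)
print(summary)
```

Country `C` has a single observation, so its standard deviation is undefined and it is filtered out. The `n` column could also be mapped to marker size in the scatter plot instead of being used as a hard cutoff.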

In [8]:
df['year'].value_counts().head(n=5)

year
1980    90
1950    90
1960    89
1970    88
1930    86
Name: count, dtype: int64

Let's look at the data for 1980, which is tied with 1950 for the most countries (90) and is the more recent of the two.

In [9]:
from plotly.express import choropleth
choropleth(data_frame=df[df['year'] == 1980], locations='alpha-3', color='value')

When the colorbar puts the extreme values of the data at its far ends, we probably get a view that exaggerates the absolute differences between countries.
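One possible mitigation is to pin the colorbar to a wider, fixed range through the `range_color` argument of `plotly.express.choropleth`. The sketch below anchors the scale to the minimum and maximum over all years; that choice is an assumption made for illustration, and the helper names `global_color_range` and `choropleth_fixed_range` are hypothetical.

```python
import pandas as pd


def global_color_range(frame: pd.DataFrame) -> list:
    """Lowest and highest height over all years, used as fixed colorbar limits."""
    return [float(frame['value'].min()), float(frame['value'].max())]


def choropleth_fixed_range(frame: pd.DataFrame, year: int):
    """Map one year's heights on a colorbar shared across all years."""
    # Imported lazily so the range helper can be used without plotly installed.
    from plotly.express import choropleth
    return choropleth(
        data_frame=frame[frame['year'] == year],
        locations='alpha-3',
        color='value',
        range_color=global_color_range(frame),
    )


# Toy data with made-up values, standing in for the merged df above.
toy = pd.DataFrame({
    'year': [1900, 1950, 1980, 1950, 1980, 1980],
    'value': [165.0, 168.0, 172.0, 166.0, 170.0, 169.0],
})
print(global_color_range(toy))
```

With the real data, `choropleth_fixed_range(df, 1980)` would draw the same map on a scale that also spans earlier, shorter populations, so the 1980 differences take up less of the color range and maps for different years become comparable.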

In [10]:
from plotly.express import histogram
histogram(data_frame=df[df['year'] == 1980].sort_values(by='region'), y='name', x='value', height=1500, color='sub-region')

A histogram, drawn here with one horizontal bar per country, shows all the data for 1980 in the context of region and sub-region and does not overstate the differences from country to country.