Let's load the data and do a little cleanup: some of our column names contain stray whitespace.

In [1]:
import pandas as pd

LIFE = '/kaggle/input/life-expectancy/life_expectancy.csv'
df = pd.read_csv(filepath_or_buffer=LIFE)
# Trim leading/trailing whitespace and collapse double spaces inside each column name.
df.columns = [item.strip().replace('  ', ' ') for item in df.columns]
df.head()

Unnamed: 0,Country,Sum of Females Life Expectancy,Sum of Life Expectancy (both sexes),Sum of Males Life Expectancy
0,Chad,57.19,55.24,53.36
1,Nigeria,54.94,54.64,54.33
2,South Sudan,60.75,57.74,54.76
3,Lesotho,60.44,57.8,55.03
4,Central African Republic,59.56,57.67,55.51
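One caveat about the cleanup above: `str.replace('  ', ' ')` makes a single pass, so a run of three spaces still leaves two behind, and tabs are untouched. If the raw headers turn out to be messier than they look, a regular expression that collapses every whitespace run is a sturdier option. The sketch below assumes such messy headers; the names in `raw` are hypothetical, not the actual CSV headers, and `normalize_columns` is an illustrative helper, not part of pandas.

```python
import re

import pandas as pd


def normalize_columns(columns):
    """Collapse every internal run of whitespace to one space, then trim the ends."""
    return [re.sub(r"\s+", " ", str(name)).strip() for name in columns]


# Hypothetical messy headers; the real CSV's whitespace may differ.
raw = [" Country ", "Sum of  Females   Life Expectancy", "Sum of Males\tLife Expectancy"]

# A single replace pass turns three spaces into two, not one.
print(repr("Females   Life".replace("  ", " ")))  # 'Females  Life'

frame = pd.DataFrame(columns=raw)
frame.columns = normalize_columns(frame.columns)
print(list(frame.columns))
```

In the notebook itself this would be `df.columns = normalize_columns(df.columns)`.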


The obvious thing to do here is to make a scatter plot, so let's do that.

Micronesia has some data quality issues, so we're going to omit it.

In [2]:
from plotly import express
from plotly import io

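io.renderers.default = 'iframe'
# Female vs. male life expectancy per country, excluding Micronesia;
# trendline='ols' adds an ordinary least squares fit (requires statsmodels).
express.scatter(data_frame=df[df['Country'] != 'Micronesia'],
                x='Sum of Females Life Expectancy', y='Sum of Males Life Expectancy',
                hover_name='Country', color='Sum of Life Expectancy (both sexes)', trendline='ols')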

The points and the linear trendline sit below the diagonal where male and female life expectancy would be equal, which shows that women tend to live longer than men. Are there any countries where men live longer than women?
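Whether women outlive men is a question of where the points and the fitted line sit relative to the diagonal y = x, not of the slope alone. A minimal sketch of that check, assuming NumPy is available: `np.polyfit` with `deg=1` is an ordinary least squares fit of y on x, so it should give the same line as plotly's `trendline='ols'` on the same data. It uses only the five rows printed by `df.head()` as a stand-in; on the notebook's data you would take `x` and `y` from the filtered `df` instead.

```python
import numpy as np
import pandas as pd

# The five rows printed by df.head() above, used as a small stand-in for the full table.
sample = pd.DataFrame({
    "Country": ["Chad", "Nigeria", "South Sudan", "Lesotho", "Central African Republic"],
    "Sum of Females Life Expectancy": [57.19, 54.94, 60.75, 60.44, 59.56],
    "Sum of Males Life Expectancy": [53.36, 54.33, 54.76, 55.03, 55.51],
})

x = sample["Sum of Females Life Expectancy"].to_numpy()
y = sample["Sum of Males Life Expectancy"].to_numpy()

# Ordinary least squares of male on female life expectancy (a degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)

# Compare the points and the fitted line against the diagonal y = x.
share_below_diagonal = float(np.mean(y < x))
line_below_diagonal = bool(np.all(slope * x + intercept < x))

print(f"slope={slope:.3f}, intercept={intercept:.2f}")
print(f"share of countries with male < female: {share_below_diagonal:.0%}")
print(f"trendline below y = x at these x values: {line_below_diagonal}")
```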

In [3]:
# Male minus female: a negative value means women live longer.
df['sex difference'] = df['Sum of Males Life Expectancy'] - df['Sum of Females Life Expectancy']
express.histogram(data_frame=df, x='sex difference')
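The histogram answers the question visually; the same check can be made directly by counting positive differences and listing the countries nearest parity. This sketch again uses only the five `df.head()` rows as stand-in data, so its result says nothing about the full table. Sorting on the absolute difference keeps it correct even if some countries did have a positive gap.

```python
import pandas as pd

# The five rows printed by df.head() above, standing in for the full table.
sample = pd.DataFrame({
    "Country": ["Chad", "Nigeria", "South Sudan", "Lesotho", "Central African Republic"],
    "Sum of Females Life Expectancy": [57.19, 54.94, 60.75, 60.44, 59.56],
    "Sum of Males Life Expectancy": [53.36, 54.33, 54.76, 55.03, 55.51],
})
sample["sex difference"] = sample["Sum of Males Life Expectancy"] - sample["Sum of Females Life Expectancy"]

# A country where men outlive women would have a positive difference.
men_live_longer = sample[sample["sex difference"] > 0]
print(f"countries where men live longer: {len(men_live_longer)}")

# Countries nearest parity: smallest absolute difference, whatever its sign.
closest = sample.loc[sample["sex difference"].abs().nsmallest(3).index, ["Country", "sex difference"]]
print(closest)
```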

The short answer appears to be no. It is also probably not surprising that the difference is roughly normally distributed. Let's make a scatter plot to see which countries come closest to parity.

In [4]:
express.scatter(data_frame=df[df['Country'] != 'Micronesia'], x='Sum of Life Expectancy (both sexes)',
                y='sex difference', hover_name='Country', color='Sum of Females Life Expectancy')

Is it surprising that the countries combining high life expectancy with small sex differences are Bahrain, Qatar, the UAE, Norway, and Iceland?
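To check an observation like this without hovering over points, one option is to keep only countries above a life expectancy cutoff and sort them by the size of the gap. In the sketch below, `rank_small_gaps` and the cutoff of 57.5 years are hypothetical choices that only suit the five `df.head()` rows used as stand-in data (none of which are the countries named above). On the notebook's data you would call it on the Micronesia-filtered `df` with a cutoff suited to the full range.

```python
import pandas as pd

# The five rows printed by df.head() above, standing in for the full table.
sample = pd.DataFrame({
    "Country": ["Chad", "Nigeria", "South Sudan", "Lesotho", "Central African Republic"],
    "Sum of Females Life Expectancy": [57.19, 54.94, 60.75, 60.44, 59.56],
    "Sum of Life Expectancy (both sexes)": [55.24, 54.64, 57.74, 57.8, 57.67],
    "Sum of Males Life Expectancy": [53.36, 54.33, 54.76, 55.03, 55.51],
})
sample["sex difference"] = sample["Sum of Males Life Expectancy"] - sample["Sum of Females Life Expectancy"]


def rank_small_gaps(frame, cutoff):
    """Hypothetical helper: countries at or above `cutoff` years, smallest absolute sex gap first."""
    high = frame[frame["Sum of Life Expectancy (both sexes)"] >= cutoff]
    return high.assign(gap=high["sex difference"].abs()).sort_values("gap")


ranked = rank_small_gaps(sample, cutoff=57.5)
print(ranked[["Country", "Sum of Life Expectancy (both sexes)", "gap"]])
```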