In [1]:
import pandas as pd

DATA = '/kaggle/input/trachoma-cause/population-at-risk-of-trachoma new.csv'

df = pd.read_csv(filepath_or_buffer=DATA)
# Shorter column names (assigned positionally, so the CSV must have exactly three columns)
df.columns = ['Entity', 'Year', 'people treated']
df.head()

     Entity  Year  people treated
0    Africa  2005        15173419
1      Asia  2005         1387641
2    Brazil  2005           25000
3  Cambodia  2005            6000
4  Ethiopia  2005         2618488


Let's look at the large multinational aggregates (the continents and the world total) first.

In [2]:
from plotly import express
import warnings

# Hide FutureWarnings so they don't clutter the plot output
warnings.filterwarnings(action='ignore', category=FutureWarning)

# Aggregate rows in the data: the continents plus the world total
entities = ['Africa', 'Asia', 'World', 'South America', 'Oceania', 'North America', 'Europe']

express.line(data_frame=df[df['Entity'].isin(entities)], x='Year', y='people treated', color='Entity', log_y=False)

Almost all of the world's treatments take place in Africa. On a log scale we can see that the other regions sometimes account for a few million treatments as well.
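To put a number on "almost all", one could compute each region's share of the combined regional total in each year. The sketch below is a minimal illustration that uses only the two 2005 regional rows visible in the `df.head()` output; the helper `regional_shares` is hypothetical, and on the full data `World` should be left out of `regions` because it already contains the other regions.

```python
import pandas as pd

# The two 2005 regional rows from the df.head() output above
sample = pd.DataFrame({
    'Entity': ['Africa', 'Asia'],
    'Year': [2005, 2005],
    'people treated': [15173419, 1387641],
})


def regional_shares(frame, regions):
    """Hypothetical helper: each region's fraction of the summed regional total, per year.

    Do not include 'World' in `regions`; it already contains the other regions.
    """
    subset = frame[frame['Entity'].isin(regions)]
    # Broadcast each year's total back onto that year's rows
    yearly_total = subset.groupby('Year')['people treated'].transform('sum')
    return subset.assign(share=subset['people treated'] / yearly_total)


shares = regional_shares(sample, ['Africa', 'Asia'])
print(shares[['Entity', 'Year', 'share']])
```

Using `transform('sum')` keeps the result aligned with the original rows, so the share can be added as a new column without a merge.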

In [3]:
express.scatter(data_frame=df[df['Entity'].isin(entities)], x='Year', y='people treated', color='Entity', log_y=True)

Now let's look at the longitudinal data for each individual country (every entity that is not one of the aggregates above).

In [4]:
express.line(data_frame=df[~df['Entity'].isin(entities)], x='Year', y='people treated', color='Entity', log_y=False)

At the country level, most of the treatments are in Ethiopia. Next, let's look at each country's total over all years. Since there are so many countries, we'll plot only the top 50.
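The "top countries by total" step boils down to: drop the aggregate rows, sum per country over all years, and keep the largest totals. A minimal sketch of that logic on the five rows shown by `df.head()` follows; the function name `top_country_totals` is hypothetical, and it uses `DataFrame.nlargest` as a compact alternative to the `sort_values(...).head(...)` chain used in the notebook.

```python
import pandas as pd

# The five rows shown by df.head() above
sample = pd.DataFrame({
    'Entity': ['Africa', 'Asia', 'Brazil', 'Cambodia', 'Ethiopia'],
    'Year': [2005] * 5,
    'people treated': [15173419, 1387641, 25000, 6000, 2618488],
})


def top_country_totals(frame, aggregates, n):
    """Hypothetical helper: sum 'people treated' per country over all years, return the n largest.

    Rows whose Entity is in `aggregates` (e.g. 'Africa', 'World') are dropped first,
    so the regional totals do not crowd out individual countries.
    """
    countries = frame[~frame['Entity'].isin(aggregates)]
    totals = countries.groupby('Entity', as_index=False)['people treated'].sum()
    return totals.nlargest(n, 'people treated').reset_index(drop=True)


top = top_country_totals(sample, aggregates=['Africa', 'Asia', 'World'], n=2)
print(top)
```

With the full data, passing the notebook's `entities` list as `aggregates` and `TOP_N` as `n` gives the same selection as the cell below.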

In [5]:
TOP_N = 50

# Sum each country's treatments over all years, then keep the TOP_N largest totals
totals = (df[~df['Entity'].isin(entities)][['Entity', 'people treated']]
          .groupby(by='Entity').sum().reset_index()
          .sort_values(by='people treated', ascending=False).head(n=TOP_N))
express.histogram(data_frame=totals, x='Entity', y='people treated', log_y=False)


We need a log scale to see the countries at the small end of the distribution.
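A quick way to see why: compare the largest and smallest values directly. The sketch below uses only the three 2005 country values from the `df.head()` output (not the all-year totals), so the exact numbers are illustrative; the point is that on a linear axis the smallest bar is a tiny fraction of the tallest, while on a log axis the gap becomes a few orders of magnitude that remain readable.

```python
import math

# 2005 country values taken from the df.head() rows above (illustrative only)
values = {'Ethiopia': 2618488, 'Brazil': 25000, 'Cambodia': 6000}

largest = max(values.values())
smallest = min(values.values())

# On a linear axis, the smallest bar's height as a fraction of the tallest bar
linear_fraction = smallest / largest
# On a log axis, the distance between the two bars in orders of magnitude
decades = math.log10(largest / smallest)

print(f'smallest bar is {linear_fraction:.2%} of the tallest on a linear axis')
print(f'the values span {decades:.2f} orders of magnitude')
```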

In [6]:
totals = (df[~df['Entity'].isin(entities)][['Entity', 'people treated']]
          .groupby(by='Entity').sum().reset_index()
          .sort_values(by='people treated', ascending=False).head(n=TOP_N))
express.histogram(data_frame=totals, x='Entity', y='people treated', log_y=True)

Only a handful of countries outside Africa are in the top 50.