Dataset Links:
1. [Obesity Among Adults by Country (1975-2016)](https://www.kaggle.com/amanarora/obesity-among-adults-by-country-19752016/)
2. [GDP per Person (1901-2011)](https://www.kaggle.com/divyansh22/gdp-per-person-19012011?select=GDP.csv)

### 0. Import libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

### 1. Obesity Among Adults by Country (1975-2016)

#### Loading the data file

In [None]:
df_obesity = pd.read_csv("obesity_cleaned.csv", index_col=0)
df_obesity

#### Understanding data type

In [None]:
df_obesity.info()

The `Obesity (%)` column should be numeric (float), but its dtype is `object` (strings).

#### Cleaning data and formatting correctly

In [None]:
df_obesity['Obesity (%)'].value_counts()

In [None]:
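# Keep only the first whitespace-separated token of each entry
df_obesity['Obesity'] = df_obesity['Obesity (%)'].apply(lambda x: x.split()[0])

# Entries whose first token is 'No' carry no value: mark them as missing and drop them
df_obesity.loc[df_obesity['Obesity'] == 'No', 'Obesity'] = np.nan
df_obesity = df_obesity.dropna().copy()
display(df_obesity['Obesity'].value_counts())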

In [None]:
df_obesity['Obesity'] = df_obesity['Obesity'].astype(float)
df_obesity.info()
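
The cleaning above can also be done in one step with `pd.to_numeric`. The sketch below is only an illustration on made-up rows: the sample strings are an assumption about how the `Obesity (%)` entries look, not values copied from the dataset.

```python
import pandas as pd

# Toy entries imitating the "Obesity (%)" column (made-up values, assumed format).
raw = pd.Series(["0.5 [0.2-1.1]", "No data", "12.3 [10.0-14.9]"], name="Obesity (%)")

# Take the first whitespace-separated token, then turn non-numeric tokens ("No") into NaN.
obesity = pd.to_numeric(raw.str.split().str[0], errors="coerce")

print(obesity.dtype)
print(obesity.dropna().tolist())
```

With `errors="coerce"`, any token that cannot be parsed becomes `NaN`, so there is no need to look for the `'No'` marker explicitly before calling `dropna()`.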

In [None]:
df_obesity.set_index('Year', inplace=True)

#### Exploring data

- What was the average percentage of obesity by sex in the world in 2015?

In [None]:
df = df_obesity[['Sex', 'Obesity']]
df = df[df.index == 2015]
df.groupby('Sex').mean()

- Which 5 countries had the largest and the smallest increases in obesity over the observed period?

In [None]:
# First and last observed years, indexed by country so the later subtraction aligns on it
df_start = df_obesity[df_obesity.index == 1975].set_index('Country')
df_end = df_obesity[df_obesity.index == 2016].set_index('Country')

In [None]:
df_ev = df_end[df_end['Sex'] == 'Both sexes']['Obesity'] - df_start[df_start['Sex'] == 'Both sexes']['Obesity']
df_ev.sort_values().dropna().head(5)

In [None]:
df_ev.sort_values().dropna().tail(5)

- Which countries had the highest and lowest percentages of obesity in 2015?

In [None]:
df = df_obesity[df_obesity.index == 2015]
df = df[['Country', 'Obesity']]
df.groupby('Country').mean().sort_values('Obesity', ascending=False)

In [None]:
df.groupby('Country').mean().sort_values('Obesity')
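
The ranking above averages every `Sex` category for each country. As a possible variant, the sketch below keeps only the `'Both sexes'` rows and uses `nlargest` / `nsmallest`. The toy frame and its values are made up for illustration.

```python
import pandas as pd

# Toy 2015 slice (made-up values), indexed by Year like df_obesity.
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B", "C", "C"],
        "Sex": ["Both sexes", "Male"] * 3,
        "Obesity": [20.0, 18.0, 5.0, 4.0, 12.0, 11.0],
    },
    index=pd.Index([2015] * 6, name="Year"),
)

# One value per country: keep only the combined figure.
both = toy[toy["Sex"] == "Both sexes"].set_index("Country")["Obesity"]

highest = both.nlargest(2)   # countries with the highest percentages
lowest = both.nsmallest(2)   # countries with the lowest percentages
print(highest)
print(lowest)
```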

- What is the average percentage difference in obesity between the sexes over the years in Brazil?

In [None]:
df = df_obesity[df_obesity['Country'] == 'Brazil']
df_diff = df[df['Sex'] == 'Female']['Obesity'] - df[df['Sex'] == 'Male']['Obesity']
df_diff.plot()
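
The plot shows the difference year by year. To get a single average figure, the per-year differences can be reduced with `.mean()`. A minimal sketch on made-up numbers, assuming the same layout as `df_obesity` (indexed by `Year`):

```python
import pandas as pd

# Toy data (made-up values), two years for one country.
toy = pd.DataFrame(
    {
        "Country": ["Brazil"] * 4,
        "Sex": ["Female", "Male", "Female", "Male"],
        "Obesity": [10.0, 6.0, 12.0, 9.0],
    },
    index=pd.Index([1975, 1975, 1976, 1976], name="Year"),
)

female = toy.loc[toy["Sex"] == "Female", "Obesity"]
male = toy.loc[toy["Sex"] == "Male", "Obesity"]

# The subtraction aligns on the Year index, giving one difference per year.
diff = female - male
mean_diff = diff.mean()
print(mean_diff)  # 3.5
```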

- Plot a graph showing the evolution of obesity for both sexes in the world

In [None]:
df = df_obesity[df_obesity['Sex'] == 'Both sexes']
df.groupby('Year')['Obesity'].mean().plot()

### 2. GDP per Person (1901-2011)

#### Loading the data file

In [None]:
df_gdp = pd.read_csv("GDP.csv", thousands=",", decimal=".")
df_gdp

#### Understanding data type

In [None]:
df_gdp.info()

The `Year` column should be an integer, but its dtype is `object` (strings).

#### Cleaning data and formatting correctly

In [None]:
df_gdp['Year'] = pd.to_datetime(df_gdp['Year']).dt.year
df_gdp

In [None]:
df_gdp.info()

#### Exploring data

- What is the first value recorded for each country?

In [None]:
df_gdp.sort_values(['Year', 'Country']).drop_duplicates(subset='Country')[['Country', ' GDP_pp ']]

- Name the regions with the highest growth in GDP per capita in the last century.

In [None]:
# Last year recorded before 2000, used below as the end of the century
df_gdp.loc[df_gdp['Year'] < 2000, 'Year'].max()

In [None]:
df_start = df_gdp[df_gdp['Year'] == 1901]
df_end = df_gdp[df_gdp['Year'] == 1996]

In [None]:
(((df_end.groupby('Region')[' GDP_pp '].mean() / df_start.groupby('Region')[' GDP_pp '].mean()) - 1)*100).sort_values()

- Fill in the missing years in each country with an estimate, based on the difference between the next record and the previous one.

In [None]:
df_gdp

In [None]:
arr_year = np.arange(df_gdp['Year'].min(), df_gdp['Year'].max() + 1)  # +1 so the last year is included
df_all_years = pd.DataFrame(arr_year, columns=['Year'])
df_all_years.index = df_all_years['Year']
df_all_years

In [None]:
df_years_off = ~df_all_years['Year'].isin(df_gdp['Year'])
df_years_off

In [None]:
df_years_off = df_all_years.loc[df_years_off].index
df_years_off

In [None]:
df_gdp = df_gdp.sort_values(['Country', 'Year'])

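# Change in GDP and in years relative to the previous row
df_gdp['Delta_gdp'] = df_gdp[' GDP_pp '] - df_gdp[' GDP_pp '].shift(1)
df_gdp['Delta_year'] = df_gdp['Year'] - df_gdp['Year'].shift(1)

# Yearly GDP change between this row and the next one
df_gdp['gdp_year'] = (df_gdp['Delta_gdp'] / df_gdp['Delta_year']).shift(-1)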

df_gdp

In [None]:
df_gdp['next_year'] = df_gdp['Year'].shift(-1)
del df_gdp['Delta_gdp'], df_gdp['Delta_year']

df_gdp

In [None]:
new_rows = []

for idx, row in df_gdp.iterrows():
    if row['Year'] == 2011:
        continue

    years_to_add = df_years_off[(df_years_off > row['Year']) & (df_years_off < row['next_year'])]

    for new_year in years_to_add:
        add_row = row.copy()
        # Linear estimate: last real value plus the yearly change times the years elapsed
        add_row[' GDP_pp '] = (new_year - add_row['Year']) * add_row['gdp_year'] + add_row[' GDP_pp ']
        add_row['Year'] = new_year
        add_row['kind'] = 'estimated'
        new_rows.append(add_row)

# Build the frame once instead of concatenating inside the loop
df_new_data = pd.DataFrame(new_rows)
df_new_data

In [None]:
df_gdp = pd.concat([df_gdp, df_new_data])
df_gdp.sort_values(['Country', 'Year'], inplace=True)
df_gdp.index = df_gdp['Year']
df_gdp['kind'] = df_gdp['kind'].fillna('real')
df_gdp
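
The same kind of linear estimate can be sketched without an explicit row loop, by reindexing each country to consecutive years and calling `interpolate`. This is an alternative sketch rather than the method used above: the helper `fill_years`, the toy frame, and the column name `GDP_pp` (without the surrounding spaces of the real file) are assumptions for illustration.

```python
import pandas as pd

# Toy data (made-up values) with gaps between recorded years.
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B"],
        "Year": [1901, 1906, 1901, 1903],
        "GDP_pp": [100.0, 200.0, 50.0, 70.0],
    }
)

def fill_years(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reindex one country to consecutive years and interpolate."""
    years = range(group["Year"].min(), group["Year"].max() + 1)
    out = group.set_index("Year").reindex(years)
    # Rows that had a value before interpolation are real, the others are estimated.
    out["kind"] = out["GDP_pp"].notna().map({True: "real", False: "estimated"})
    out["GDP_pp"] = out["GDP_pp"].interpolate(method="linear")
    out["Country"] = out["Country"].ffill()
    return out.rename_axis("Year").reset_index()

filled = pd.concat(fill_years(g) for _, g in toy.groupby("Country"))
filled = filled.reset_index(drop=True)
print(filled)
```

Because the reindexed years are consecutive, `method="linear"` spreads the change evenly across the gap, which matches the "difference between the next record and the previous one" idea.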

- Checking if the estimate is consistent

In [None]:
fig, ax = plt.subplots(figsize=(20, 5))

country = 'Brazil'
df_gdp[(df_gdp['kind'] == 'real') & (df_gdp['Country'] == country)].plot(kind='scatter', y=' GDP_pp ', x='Year', ax=ax)
df_gdp[(df_gdp['kind'] == 'estimated') & (df_gdp['Country'] == country)].plot(kind='scatter', y=' GDP_pp ', x='Year', ax=ax, color='orange')

### 3. Comparing Both Datasets

- Create a map of GDP or obesity in the world over the years

In [None]:
df_gdp['Year'] = df_gdp['Year'].astype(int)
df_gdp[' GDP_pp '] = df_gdp[' GDP_pp '].astype(float)

In [None]:
df = px.data.gapminder()
df

In [None]:
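# One ISO code per country: drop the repeated country rows before building the mapping
dict_iso_alpha = df.drop_duplicates('country').set_index('country')['iso_alpha'].to_dict()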
dict_iso_alpha

In [None]:
dict_num = {country: num for num, country in enumerate(df_gdp['Country'].unique())}
dict_num

In [None]:
df_gdp['iso_alpha'] = df_gdp['Country'].map(dict_iso_alpha)
df_gdp['iso_num'] = df_gdp['Country'].map(dict_num)
df_gdp

In [None]:
fig = px.choropleth(
    df_gdp[df_gdp['kind'] == 'real'].reset_index(drop=True), 
    locations='iso_alpha', 
    color=' GDP_pp ', 
    hover_name='Country', 
    animation_frame='Year'
)
fig.update_layout(height=600)
fig.show()

- Is there a relationship between obesity levels and GDP per capita?

In [None]:
df_obesity['country-year'] = df_obesity['Country'] + '-' + df_obesity.reset_index()['Year'].apply(lambda x: str(int(x))).values
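# Use only the 'Both sexes' rows so each country-year key maps to a single obesity value
dict_obesity_year = df_obesity[df_obesity['Sex'] == 'Both sexes'].set_index('country-year')['Obesity'].to_dict()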
dict_obesity_year

In [None]:
df_gdp['country-year'] = df_gdp['Country'] + '-' + df_gdp['Year'].apply(lambda x: str(int(x))).values
dict_gdp_year = df_gdp.set_index('country-year').to_dict()[' GDP_pp ']
dict_gdp_year

In [None]:
df_gdp['obesity'] = df_gdp['country-year'].map(dict_obesity_year)
df_gdp

In [None]:
# Drop only the rows missing one of the two variables being compared
df_gdp_clean = df_gdp.dropna(subset=['obesity', ' GDP_pp '])
df_gdp_clean

In [None]:
df_gdp_clean.reset_index(drop=True).groupby('Year')[['obesity', ' GDP_pp ']].mean().corr()
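
The join between the two datasets can also be sketched with `DataFrame.merge` on the `Country` and `Year` columns instead of a string key. The frames below are made-up toy data, and the column name `GDP_pp` (without spaces) is an assumption for readability.

```python
import pandas as pd

# Toy frames (made-up values) standing in for the obesity and GDP tables.
obesity = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B"],
        "Year": [2000, 2001, 2000, 2001],
        "Sex": ["Both sexes"] * 4,
        "Obesity": [10.0, 11.0, 20.0, 22.0],
    }
)
gdp = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B"],
        "Year": [2000, 2001, 2000, 2001],
        "GDP_pp": [1000.0, 1100.0, 3000.0, 3300.0],
    }
)

# An inner join keeps only the country-year pairs present in both tables.
merged = obesity.merge(gdp, on=["Country", "Year"], how="inner")

# Pearson correlation across all country-year observations.
corr = merged["Obesity"].corr(merged["GDP_pp"])
print(merged)
print(corr)
```

Note that this correlates individual country-year observations, whereas the cell above correlates yearly world averages, so the two numbers answer slightly different questions.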