### Finding the relationship between GDP per capita and adult obesity levels across 180 countries in 2022

In [74]:
import plotly.io as pio

pio.renderers.default = "vscode+jupyterlab+notebook_connected"

In [75]:
import pandas as pd

import plotly.express as px


## Step 1 : Loading the datasets
- Loading the datasets for GDP (Gross Domestic Product per capita, PPP, in international dollars per person) and obesity levels (prevalence of obesity in the adult population, aged 18 and older).
- Both were downloaded from FAOSTAT for the year 2022, with 'Element' set to 'Value'.
- Both indicators come from the same FAOSTAT domain, the Suite of Food Security Indicators (FS): https://www.fao.org/faostat/en/#data/FS
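Since the later steps only use the 'Area', 'Year' and 'Value' columns, the load step could keep just those (plus 'Unit', for a sanity check) and fail early if an export is not what we expect. A minimal sketch, assuming the column names shown in the outputs below; `load_faostat` is a hypothetical helper, and the inline sample (two rows copied from the GDP output, reduced to a few columns) stands in for `gdp.csv`.

```python
import io

import pandas as pd

# Reduced stand-in for gdp.csv: a subset of its columns and two rows
# copied from the notebook output below.
SAMPLE_CSV = """Area,Year,Unit,Value,Flag
Albania,2022,Int$/cap,17261.0,X
Algeria,2022,Int$/cap,13090.9,X
"""


def load_faostat(source, expected_year=2022):
    """Hypothetical helper: read a FAOSTAT export, keep only the columns used later,
    and raise early if the file mixes years or units, or repeats an area."""
    df = pd.read_csv(source, usecols=["Area", "Year", "Unit", "Value"])
    if not (df["Year"] == expected_year).all():
        raise ValueError(f"file contains rows for a year other than {expected_year}")
    if df["Unit"].nunique() != 1:
        raise ValueError("file mixes units")
    if df["Area"].duplicated().any():
        raise ValueError("an area appears more than once")
    return df


df_sample = load_faostat(io.StringIO(SAMPLE_CSV))
print(df_sample)
```

On the real files this would be `load_faostat("gdp.csv")` and `load_faostat("obesity.csv")`.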

In [76]:
df_gdp = pd.read_csv("gdp.csv")
df_gdp

| | Domain Code | Domain | Area Code (M49) | Area | Element Code | Element | Item Code | Item | Year Code | Year | Unit | Value | Flag | Flag Description | Note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FS | Suite of Food Security Indicators | 8 | Albania | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 17261.0 | X | Figure from international organizations | |
| 1 | FS | Suite of Food Security Indicators | 12 | Algeria | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 13090.9 | X | Figure from international organizations | |
| 2 | FS | Suite of Food Security Indicators | 20 | Andorra | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 63379.0 | X | Figure from international organizations | |
| 3 | FS | Suite of Food Security Indicators | 24 | Angola | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 7407.1 | X | Figure from international organizations | |
| 4 | FS | Suite of Food Security Indicators | 28 | Antigua and Barbuda | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 27757.2 | X | Figure from international organizations | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 178 | FS | Suite of Food Security Indicators | 860 | Uzbekistan | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 8447.5 | X | Figure from international organizations | |
| 179 | FS | Suite of Food Security Indicators | 548 | Vanuatu | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 3074.1 | X | Figure from international organizations | |
| 180 | FS | Suite of Food Security Indicators | 704 | Viet Nam | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 13102.3 | X | Figure from international organizations | |
| 181 | FS | Suite of Food Security Indicators | 894 | Zambia | 6126 | Value | 22013 | Gross domestic product per capita, PPP, (const... | 2022 | 2022 | Int$/cap | 3610.7 | X | Figure from international organizations | |


In [77]:
df_obesity = pd.read_csv("obesity.csv")
df_obesity

| | Domain Code | Domain | Area Code (M49) | Area | Element Code | Element | Item Code | Item | Year Code | Year | Unit | Value | Flag | Flag Description | Note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FS | Suite of Food Security Indicators | 4 | Afghanistan | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 19.2 | X | Figure from international organizations | |
| 1 | FS | Suite of Food Security Indicators | 8 | Albania | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 23.4 | X | Figure from international organizations | |
| 2 | FS | Suite of Food Security Indicators | 12 | Algeria | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 23.8 | X | Figure from international organizations | |
| 3 | FS | Suite of Food Security Indicators | 16 | American Samoa | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 75.2 | X | Figure from international organizations | |
| 4 | FS | Suite of Food Security Indicators | 20 | Andorra | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 18.1 | X | Figure from international organizations | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 194 | FS | Suite of Food Security Indicators | 862 | Venezuela (Bolivarian Republic of) | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 22.7 | X | Figure from international organizations | |
| 195 | FS | Suite of Food Security Indicators | 704 | Viet Nam | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 2.0 | X | Figure from international organizations | |
| 196 | FS | Suite of Food Security Indicators | 887 | Yemen | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 13.7 | X | Figure from international organizations | |
| 197 | FS | Suite of Food Security Indicators | 894 | Zambia | 6121 | Value | 21042 | Prevalence of obesity in the adult population ... | 2022 | 2022 | % | 11.1 | X | Figure from international organizations | |


## Step 2 : Cleaning and Filtering the Data Sets
- Based on the steps above, the two datasets cover different numbers of countries (183 vs 199 rows).
- Every other column ('Year', 'Domain', 'Item', 'Flag Description', etc.) holds a single constant value, so each row corresponds to exactly one country and the difference in row counts must come from the countries themselves.
- Therefore, before merging, we need to filter both datasets so that they contain the same set of countries.
- This can be done as follows:

In [78]:
gdp_areas = set(df_gdp['Area'].unique())
gdp_areas

{'Albania',
 'Algeria',
 'Andorra',
 'Angola',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bermuda',
 'Bolivia (Plurinational State of)',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei Darussalam',
 'Bulgaria',
 'Burkina Faso',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'China, Hong Kong SAR',
 'China, Macao SAR',
 'China, mainland',
 'Colombia',
 'Comoros',
 'Congo',
 'Costa Rica',
 'Croatia',
 'Cyprus',
 'Czechia',
 "Côte d'Ivoire",
 'Democratic Republic of the Congo',
 'Denmark',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Fiji',
 'Finland',
 'France',
 'Gabon',
 'Gambia',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Grenada',
 'Guatemala',
 ...}

In [79]:
obesity_areas = set(df_obesity['Area'].unique())
obesity_areas

{'Afghanistan',
 'Albania',
 'Algeria',
 'American Samoa',
 'Andorra',
 'Angola',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bermuda',
 'Bhutan',
 'Bolivia (Plurinational State of)',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei Darussalam',
 'Bulgaria',
 'Burkina Faso',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo',
 'Cook Islands',
 'Costa Rica',
 'Croatia',
 'Cuba',
 'Cyprus',
 'Czechia',
 "Côte d'Ivoire",
 "Democratic People's Republic of Korea",
 'Democratic Republic of the Congo',
 'Denmark',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Fiji',
 'Finland',
 'France',
 'French Polynesia',
 'Gabon',
 ...}

In [80]:
missing_in_gdp = obesity_areas - gdp_areas
missing_in_obesity = gdp_areas - obesity_areas

In [81]:
print("Areas in Obesity but not GDP:", missing_in_gdp)
print("Areas in GDP but not Obesity:", missing_in_obesity)

Areas in Obesity but not GDP: {'French Polynesia', 'Lebanon', "Democratic People's Republic of Korea", 'Yemen', 'South Sudan', 'Tonga', 'Niue', 'Cook Islands', 'Syrian Arab Republic', 'Palau', 'Turkmenistan', 'Greenland', 'Venezuela (Bolivarian Republic of)', 'Eritrea', 'Tokelau', 'Afghanistan', 'American Samoa', 'Cuba', 'Bhutan'}
Areas in GDP but not Obesity: {'China, Macao SAR', 'China, Hong Kong SAR', 'China, mainland'}
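The two set differences can also be produced in one pass with an outer merge and pandas' `indicator=True` flag, which labels every area as `left_only`, `right_only` or `both`. A minimal sketch on small subsets of the two tables shown earlier; because these are hand-picked subsets, the mismatches in this toy example are artificial, and on the real data you would pass `df_gdp` and `df_obesity`.

```python
import pandas as pd

# Hand-picked subsets of the GDP and obesity outputs above (Area and Value only).
gdp = pd.DataFrame({"Area": ["Albania", "Algeria", "Zambia"],
                    "Value": [17261.0, 13090.9, 3610.7]})
obesity = pd.DataFrame({"Area": ["Afghanistan", "Albania", "Yemen", "Zambia"],
                        "Value": [19.2, 23.4, 13.7, 11.1]})

# The '_merge' column records which side(s) each area came from.
check = pd.merge(gdp[["Area"]], obesity[["Area"]], on="Area", how="outer", indicator=True)

only_gdp = sorted(check.loc[check["_merge"] == "left_only", "Area"])
only_obesity = sorted(check.loc[check["_merge"] == "right_only", "Area"])
both = sorted(check.loc[check["_merge"] == "both", "Area"])

print("Only in GDP:", only_gdp)
print("Only in Obesity:", only_obesity)
print("In both:", both)
```

Listing the mismatches this way also makes it easier to spot naming or aggregation differences rather than genuinely missing data: the GDP file, for instance, lists 'China, mainland', 'China, Hong Kong SAR' and 'China, Macao SAR' in addition to 'China'.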


In [82]:
common_areas = gdp_areas & obesity_areas
df_gdp_aligned = df_gdp[df_gdp['Area'].isin(common_areas)]
df_obesity_aligned = df_obesity[df_obesity['Area'].isin(common_areas)]

In [83]:
print("Aligned GDP dataset rows:", len(df_gdp_aligned))
print("Aligned Obesity dataset rows:", len(df_obesity_aligned))

Aligned GDP dataset rows: 180
Aligned Obesity dataset rows: 180
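Equal row counts only mean equal country sets because, as argued in Step 2, each area appears once per file and every column other than 'Area' and 'Value' is constant. That assumption can be checked directly rather than by eye. A minimal sketch on four rows copied from the `df_gdp` output, reduced to four columns; `varying_columns` is a hypothetical helper, and on the real data you would call it on `df_gdp` and `df_obesity`.

```python
import pandas as pd

# Four rows copied from the df_gdp output above, reduced to four columns.
df_gdp_demo = pd.DataFrame({
    "Area": ["Albania", "Algeria", "Andorra", "Zambia"],
    "Year": [2022, 2022, 2022, 2022],
    "Unit": ["Int$/cap"] * 4,
    "Value": [17261.0, 13090.9, 63379.0, 3610.7],
})


def varying_columns(df):
    """Hypothetical helper: names of columns holding more than one distinct value."""
    counts = df.nunique()
    return sorted(counts[counts > 1].index)


# Only 'Area' and the measured 'Value' should vary ...
print(varying_columns(df_gdp_demo))   # ['Area', 'Value']
# ... and each area should occur exactly once, so rows == countries.
print(df_gdp_demo["Area"].is_unique)  # True
```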


## Step 3: Increasing data readability 
- To make the datasets easier to read, we keep only the 'Area', 'Year' and 'Value' columns and rename 'Value' to 'GDP', which is more descriptive.
- The same is done for the obesity dataset, where 'Value' becomes 'Obesity'.

In [84]:
df_gdp_aligned = df_gdp_aligned[['Area', 'Year', 'Value']].rename(columns={'Value': 'GDP'})
df_gdp_aligned

| | Area | Year | GDP |
|---|---|---|---|
| 0 | Albania | 2022 | 17261.0 |
| 1 | Algeria | 2022 | 13090.9 |
| 2 | Andorra | 2022 | 63379.0 |
| 3 | Angola | 2022 | 7407.1 |
| 4 | Antigua and Barbuda | 2022 | 27757.2 |
| ... | ... | ... | ... |
| 178 | Uzbekistan | 2022 | 8447.5 |
| 179 | Vanuatu | 2022 | 3074.1 |
| 180 | Viet Nam | 2022 | 13102.3 |
| 181 | Zambia | 2022 | 3610.7 |


In [85]:
df_obesity_aligned = df_obesity_aligned[['Area', 'Year', 'Value']].rename(columns={'Value': 'Obesity'})
df_obesity_aligned

| | Area | Year | Obesity |
|---|---|---|---|
| 1 | Albania | 2022 | 23.4 |
| 2 | Algeria | 2022 | 23.8 |
| 4 | Andorra | 2022 | 18.1 |
| 5 | Angola | 2022 | 11.5 |
| 6 | Antigua and Barbuda | 2022 | 33.3 |
| ... | ... | ... | ... |
| 192 | Uzbekistan | 2022 | 30.0 |
| 193 | Vanuatu | 2022 | 21.3 |
| 195 | Viet Nam | 2022 | 2.0 |
| 197 | Zambia | 2022 | 11.1 |
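Cells In [84] and In [85] repeat the same select-and-rename step with only the new column name changing, so a small function could keep the two in sync. A sketch with a hypothetical `tidy` helper, applied here to two-row frames built from the outputs above rather than to the real aligned frames.

```python
import pandas as pd


def tidy(df, value_name):
    """Hypothetical helper: keep Area/Year/Value and rename Value to value_name."""
    return df[["Area", "Year", "Value"]].rename(columns={"Value": value_name})


# Two-row stand-ins for df_gdp_aligned and df_obesity_aligned.
raw_gdp = pd.DataFrame({"Area": ["Albania", "Algeria"], "Year": [2022, 2022],
                        "Unit": ["Int$/cap", "Int$/cap"], "Value": [17261.0, 13090.9]})
raw_obesity = pd.DataFrame({"Area": ["Albania", "Algeria"], "Year": [2022, 2022],
                            "Unit": ["%", "%"], "Value": [23.4, 23.8]})

gdp_tidy = tidy(raw_gdp, "GDP")
obesity_tidy = tidy(raw_obesity, "Obesity")
print(list(gdp_tidy.columns))      # ['Area', 'Year', 'GDP']
print(list(obesity_tidy.columns))  # ['Area', 'Year', 'Obesity']
```

On the real data the two cells would become `tidy(df_gdp_aligned, "GDP")` and `tidy(df_obesity_aligned, "Obesity")`.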


## Step 4 : Merging the Datasets 
- After aligning the rows on 'Area' and renaming the columns for readability, we now merge the datasets.
- The merge is done on the 'Area' and 'Year' columns.
- Merging the two datasets gives a combined table with the GDP and obesity values for the 180 common countries.

In [86]:
merged_data = pd.merge(df_gdp_aligned, df_obesity_aligned, on=['Area', 'Year'], how='inner')
merged_data

| | Area | Year | GDP | Obesity |
|---|---|---|---|---|
| 0 | Albania | 2022 | 17261.0 | 23.4 |
| 1 | Algeria | 2022 | 13090.9 | 23.8 |
| 2 | Andorra | 2022 | 63379.0 | 18.1 |
| 3 | Angola | 2022 | 7407.1 | 11.5 |
| 4 | Antigua and Barbuda | 2022 | 27757.2 | 33.3 |
| ... | ... | ... | ... | ... |
| 175 | Uzbekistan | 2022 | 8447.5 | 30.0 |
| 176 | Vanuatu | 2022 | 3074.1 | 21.3 |
| 177 | Viet Nam | 2022 | 13102.3 | 2.0 |
| 178 | Zambia | 2022 | 3610.7 | 11.1 |
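Because both tables were already filtered to the same set of areas, the inner merge should neither drop nor duplicate rows. pandas can enforce that with the `validate` argument of `pd.merge`, which raises `pandas.errors.MergeError` if a key is repeated. A minimal sketch on three rows copied from the tables above; on the real data the same argument would be added to the merge in In [86].

```python
import pandas as pd

gdp = pd.DataFrame({"Area": ["Albania", "Algeria", "Zambia"], "Year": [2022] * 3,
                    "GDP": [17261.0, 13090.9, 3610.7]})
obesity = pd.DataFrame({"Area": ["Albania", "Algeria", "Zambia"], "Year": [2022] * 3,
                        "Obesity": [23.4, 23.8, 11.1]})

# validate="one_to_one" refuses to merge if either side repeats an (Area, Year) key,
# instead of silently multiplying rows.
merged = pd.merge(gdp, obesity, on=["Area", "Year"], how="inner", validate="one_to_one")
assert len(merged) == len(gdp) == len(obesity)

# A duplicated key on the right-hand side is caught:
dup = pd.concat([obesity, obesity.iloc[[0]]], ignore_index=True)
try:
    pd.merge(gdp, dup, on=["Area", "Year"], validate="one_to_one")
    caught = False
except pd.errors.MergeError:
    caught = True
print(caught)  # True
```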


## Step 5 : Plotting the Figure 
- Based on the merged dataset, we now choose a figure that best represents the relationship between the two variables.
- A scatter plot is chosen because:
  1. It can display a large number of data points (180 countries) clearly.
  2. The spread and concentration of points are easy to read, which helps when drawing inferences about patterns and trends.
  3. It shows the relationship between the two variables directly.

In [87]:
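fig = px.scatter(
    merged_data,
    x='GDP',
    y='Obesity',
    hover_name='Area',   # show the country on hover instead of a 180-entry colour legend
    hover_data=['Year'],
    title='Relationship Between GDP per Capita and Adult Obesity (2022)',
    # GDP is PPP-adjusted, in international dollars per person (Unit: Int$/cap), not USD
    labels={'GDP': 'GDP per capita, PPP (Int$)', 'Obesity': 'Adult obesity prevalence (%)'}
)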

fig.show()

### Analysis 
From the scatter plot above, we can observe the following about the relationship between GDP per capita (PPP, in international dollars) and adult obesity rates (%):

1. **Positive Correlation**: 
   - There seems to be a trend where countries with higher GDP per capita tend to have higher obesity rates. This could suggest that as countries grow wealthier, diets and lifestyles change, potentially leading to increased obesity rates.

2. **Wide Variation at Lower GDP Levels**:
   - At lower GDP values (below about 20,000 Int$ per capita), obesity rates vary widely, from very low to moderately high. This indicates that factors other than GDP likely play a role in determining obesity levels in these countries.

3. **Limited Data at High GDP Levels**:
   - For countries with very high GDP values (above about 100,000 Int$ per capita), data points are sparse, but these countries appear to have relatively high obesity rates.

4. **Non-Linearity**:
   - The relationship may not be strictly linear, as obesity rates do not increase consistently across all GDP levels.
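The visual impressions above (a positive trend, wide spread at low GDP, possible non-linearity) can be put into numbers with a correlation coefficient and a per-band summary. A minimal sketch, assuming `merged_data` has the 'GDP' and 'Obesity' columns built in Step 4; here it is rebuilt from only the nine rows visible in that output, so the printed values illustrate the calls and are not results for the full 180 countries. The band edges are arbitrary choices for illustration.

```python
import pandas as pd

# The nine merged rows visible in the Step 4 output (a subset of the 180).
merged_data = pd.DataFrame({
    "Area": ["Albania", "Algeria", "Andorra", "Angola", "Antigua and Barbuda",
             "Uzbekistan", "Vanuatu", "Viet Nam", "Zambia"],
    "GDP": [17261.0, 13090.9, 63379.0, 7407.1, 27757.2, 8447.5, 3074.1, 13102.3, 3610.7],
    "Obesity": [23.4, 23.8, 18.1, 11.5, 33.3, 30.0, 21.3, 2.0, 11.1],
})

# Pearson measures linear association; Spearman (Pearson on ranks) measures
# monotonic association and is less sensitive to the long right tail of GDP.
pearson = merged_data["GDP"].corr(merged_data["Obesity"])
spearman = merged_data["GDP"].rank().corr(merged_data["Obesity"].rank())

# Count and median obesity within GDP bands (Int$ per capita), to see whether
# the trend flattens or the spread narrows at higher incomes.
bands = pd.cut(merged_data["GDP"], bins=[0, 10_000, 20_000, 50_000, float("inf")],
               labels=["<10k", "10k-20k", "20k-50k", ">50k"])
band_summary = merged_data.groupby(bands, observed=False)["Obesity"].agg(["count", "median"])

print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
print(band_summary)
```

A large gap between the Pearson and Spearman values on the full data would be one sign that the relationship is monotonic but not linear.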

### Possible Takeaway:
While GDP per capita can be an indicator of obesity prevalence, it's not the sole determinant. Other factors, such as dietary habits, cultural norms, public health policies, and urbanization levels, are likely contributing to the observed variability. A deeper analysis might require incorporating these additional variables.