# Project Title
---

Group Name

Team members:
- Maryam Mohamed
- Zahraa Mohamed 


## Introduction 📝
__Introduction to the topic__ 

Economic growth and human well-being are deeply connected. Income levels often shape access to healthcare, education, and living standards, which in turn influence how long people live. By exploring the link between GNI per capita and life expectancy, we can uncover valuable insights into global development.

## Problem Statement ❗

Despite global progress, not all countries experience equal improvements in health and wealth. Some nations achieve high income without a proportional rise in life expectancy, while others manage to improve health outcomes despite lower income levels. Understanding this relationship is crucial for identifying gaps and opportunities in development.

## Objectives: 🎯
__Questions that will guide the analysis to solve the problem__

- Analyze the relationship between GNI per capita and life expectancy across countries.

- Identify patterns and outliers where income does not match expected health outcomes.

- Highlight regional and global trends over time.

- Provide visual insights that make the link between economic growth and human well-being easy to understand for a non-technical audience.

## Exploratory Data Analysis (EDA):

### Data Info:
__Getting the data and exploring it (includes descriptive statistics)__

In [84]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt



gni = pd.read_csv("data/gni_per_cap_atlas_method_con2021.csv")
life = pd.read_csv("data/life_expectancy.csv")



In [85]:
display(gni.head())
display(life.head())

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2041,2042,2043,2044,2045,2046,2047,2048,2049,2050
0,Afghanistan,207.0,207.0,207.0,207.0,207.0,207.0,207.0,207.0,207.0,...,751,767,783,800,817,834,852,870,888,907
1,Angola,517.0,519.0,522.0,524.0,525.0,528.0,531.0,533.0,536.0,...,2770,2830,2890,2950,3010,3080,3140,3210,3280,3340
2,Albania,207.0,207.0,207.0,207.0,207.0,207.0,207.0,207.0,207.0,...,9610,9820,10k,10.2k,10.5k,10.7k,10.9k,11.1k,11.4k,11.6k
3,United Arab Emirates,738.0,740.0,743.0,746.0,749.0,751.0,754.0,757.0,760.0,...,47.9k,48.9k,50k,51k,52.1k,53.2k,54.3k,55.5k,56.7k,57.9k
4,Argentina,794.0,797.0,799.0,802.0,805.0,808.0,810.0,813.0,816.0,...,12.8k,13.1k,13.4k,13.6k,13.9k,14.2k,14.5k,14.8k,15.2k,15.5k


Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
0,Afghanistan,28.2,28.2,28.2,28.2,28.2,28.2,28.1,28.1,28.1,...,75.5,75.7,75.8,76.0,76.1,76.2,76.4,76.5,76.6,76.8
1,Angola,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,...,78.8,79.0,79.1,79.2,79.3,79.5,79.6,79.7,79.9,80.0
2,Albania,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,...,87.4,87.5,87.6,87.7,87.8,87.9,88.0,88.2,88.3,88.4
3,Andorra,,,,,,,,,,...,,,,,,,,,,
4,United Arab Emirates,30.7,30.7,30.7,30.7,30.7,30.7,30.7,30.7,30.7,...,82.4,82.5,82.6,82.7,82.8,82.9,83.0,83.1,83.2,83.3


In [86]:
life[life['country'] == 'Andorra']

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
3,Andorra,,,,,,,,,,...,,,,,,,,,,


In [87]:
type(gni)

pandas.core.frame.DataFrame

In [88]:
display(pd.DataFrame(gni.dtypes, columns=['DataType']))

display(pd.DataFrame(life.dtypes, columns=['DataType']))

Unnamed: 0,DataType
country,object
1800,float64
1801,float64
1802,float64
1803,float64
...,...
2046,object
2047,object
2048,object
2049,object


Unnamed: 0,DataType
country,object
1800,float64
1801,float64
1802,float64
1803,float64
...,...
2096,float64
2097,float64
2098,float64
2099,float64
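
The `object` dtype on the later `gni` year columns suggests that some values are stored as text rather than numbers. A minimal sketch of how such entries could be located, using a toy frame whose values are copied from the `gni.head()` output above; `toy` and `find_non_numeric` are hypothetical names, not part of the notebook:

```python
import pandas as pd

# Toy frame with values copied from the gni.head() output; not the full dataset.
toy = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "United Arab Emirates"],
    "1800": [207.0, 207.0, 738.0],
    "2043": ["783", "10k", "50k"],
})

def find_non_numeric(df, id_col="country"):
    """Return {column: [values]} for entries that pd.to_numeric cannot parse."""
    bad = {}
    for col in df.columns.drop(id_col):
        parsed = pd.to_numeric(df[col], errors="coerce")
        # Flag values that were present but failed to parse as numbers
        mask = parsed.isna() & df[col].notna()
        if mask.any():
            bad[col] = df.loc[mask, col].tolist()
    return bad

non_numeric = find_non_numeric(toy)
print(non_numeric)
```

Running the same helper on the real `gni` frame would list every value that needs custom cleaning before numeric analysis.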


In [89]:
gni.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 191 entries, 0 to 190
Columns: 252 entries, country to 2050
dtypes: float64(97), object(155)
memory usage: 376.2+ KB

In [90]:
life.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Columns: 302 entries, country to 2100
dtypes: float64(301), object(1)
memory usage: 460.2+ KB

In [91]:
gni.shape

(191, 252)

In [92]:
life.shape

(195, 302)
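
The shapes show 191 rows in `gni` and 195 in `life`, so the two files presumably do not cover exactly the same countries (assuming one row per country). A small sketch of how the mismatch could be listed; the samples below contain only the five country names visible in each `head()` output, and `unmatched_countries` is a hypothetical helper:

```python
import pandas as pd

# Country columns copied from the two head() outputs above (first five rows only).
gni_head = pd.DataFrame({"country": ["Afghanistan", "Angola", "Albania",
                                     "United Arab Emirates", "Argentina"]})
life_head = pd.DataFrame({"country": ["Afghanistan", "Angola", "Albania",
                                      "Andorra", "United Arab Emirates"]})

def unmatched_countries(left, right, key="country"):
    """Return sorted names found only in `left` and only in `right`."""
    left_names, right_names = set(left[key]), set(right[key])
    return sorted(left_names - right_names), sorted(right_names - left_names)

only_gni, only_life = unmatched_countries(gni_head, life_head)
print("Only in gni sample:", only_gni)
print("Only in life sample:", only_life)
```

On the five-row samples this says nothing about the full files; calling `unmatched_countries(gni, life)` on the loaded frames would give the actual lists.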

In [93]:
gni.columns[1:]

Index(['1800', '1801', '1802', '1803', '1804', '1805', '1806', '1807', '1808',
       '1809',
       ...
       '2041', '2042', '2043', '2044', '2045', '2046', '2047', '2048', '2049',
       '2050'],
      dtype='object', length=251)

In [95]:
gni.describe()

Unnamed: 0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,1887,1888,1889,1890,1891,1892,1893,1894,1895,1896
count,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,...,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0,190.0
mean,608.615789,608.547368,611.536842,610.689474,611.878947,611.821053,612.921053,612.284211,602.557895,603.957895,...,1053.873684,1061.473684,1071.605263,1079.3,1078.921053,1091.142105,1097.763158,1115.757895,1129.394737,1147.884211
std,670.490166,669.126775,681.331746,674.917062,681.7259,677.22908,679.940924,672.112396,627.047946,633.264088,...,1413.243605,1413.148665,1451.158106,1455.053253,1452.061654,1467.186324,1447.512653,1462.43457,1506.202147,1543.079138
min,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0,33.0,...,53.0,51.0,47.0,47.0,43.0,47.0,54.0,61.0,56.0,62.0
25%,257.0,252.0,252.0,251.75,251.5,251.0,252.5,252.5,252.25,251.75,...,318.25,320.25,321.25,323.0,322.75,323.0,324.25,328.25,326.25,329.5
50%,402.0,401.0,398.5,398.5,398.5,398.5,399.0,398.5,398.5,398.5,...,577.5,578.0,578.0,576.5,576.5,580.0,591.5,600.5,604.5,611.0
75%,644.75,645.25,646.0,646.75,648.0,649.25,651.5,655.25,660.75,666.5,...,1087.5,1090.0,1090.0,1130.0,1100.0,1107.5,1132.5,1150.0,1137.5,1180.0
max,4780.0,4690.0,4950.0,4850.0,5080.0,4780.0,4810.0,4410.0,3850.0,3850.0,...,8330.0,8240.0,8290.0,8550.0,8420.0,9060.0,8600.0,8590.0,9070.0,9610.0


In [96]:
life.describe()

Unnamed: 0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
count,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,...,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0,186.0
mean,31.503763,31.463441,31.480108,31.385484,31.460753,31.586559,31.644086,31.598387,31.385484,31.313441,...,83.361828,83.476344,83.600538,83.717742,83.838172,83.955376,84.076344,84.193548,84.312903,84.430645
std,3.80951,3.801217,3.932344,3.955872,3.928388,4.003874,4.102694,3.974506,4.08023,4.033412,...,5.803782,5.797854,5.788922,5.777904,5.770755,5.766333,5.756555,5.750616,5.743805,5.741341
min,23.4,23.4,23.4,19.6,23.4,23.4,23.4,23.4,12.5,13.4,...,66.4,66.5,66.7,66.8,66.9,67.0,67.1,67.2,67.3,67.4
25%,29.025,28.925,28.9,28.9,28.925,29.025,29.025,29.025,28.925,28.825,...,79.65,79.75,79.925,80.025,80.15,80.325,80.425,80.525,80.7,80.8
50%,31.75,31.65,31.55,31.5,31.55,31.65,31.75,31.75,31.55,31.5,...,84.0,84.1,84.25,84.3,84.5,84.6,84.7,84.8,84.9,85.0
75%,33.875,33.9,33.875,33.675,33.775,33.875,33.975,33.975,33.775,33.675,...,87.775,87.875,87.975,88.075,88.175,88.3,88.4,88.5,88.675,88.775
max,42.9,40.3,44.4,44.8,42.8,44.3,45.8,43.6,43.5,41.7,...,93.4,93.5,93.6,93.7,93.8,94.0,94.1,94.2,94.3,94.4
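
`life.describe()` reports a count of 186 for the displayed years although the frame has 195 rows, which suggests that some countries, such as Andorra, carry no values in those columns. A minimal sketch of how fully empty rows could be identified, on a toy frame with values copied from the `life.head()` output (three year columns only; `toy_life` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Toy frame: values copied from the life.head() output above.
toy_life = pd.DataFrame({
    "country": ["Afghanistan", "Andorra", "United Arab Emirates"],
    "1800": [28.2, np.nan, 30.7],
    "1801": [28.2, np.nan, 30.7],
    "2100": [76.8, np.nan, 83.3],
})

# Every column except the identifier is treated as a year column
year_cols = toy_life.columns.drop("country")

# True for rows where every year value is missing
all_missing = toy_life[year_cols].isna().all(axis=1)
empty_countries = toy_life.loc[all_missing, "country"].tolist()
print(empty_countries)
```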


### Data Handling: 
__Cleaning, transforming, and combining data__

__Function to clean numbers like 25k, 1.2M, 3b into floats__

In [97]:
print(gni.head(), "\n")

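def clean_number(value):
    """Convert strings such as '25k', '1.2M', or '3b' to floats.

    Non-string values are returned unchanged; unparseable text becomes None.
    """
    if not isinstance(value, str):
        return value
    # Normalize: trim whitespace, lowercase the suffix, drop thousands separators
    value = value.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}
    try:
        if value and value[-1] in multipliers:
            return float(value[:-1]) * multipliers[value[-1]]
        return float(value)
    except ValueError:
        # Catch only parsing failures instead of using a bare except
        return None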


                country   1800   1801   1802   1803   1804   1805   1806  \
0           Afghanistan  207.0  207.0  207.0  207.0  207.0  207.0  207.0   
1                Angola  517.0  519.0  522.0  524.0  525.0  528.0  531.0   
2               Albania  207.0  207.0  207.0  207.0  207.0  207.0  207.0   
3  United Arab Emirates  738.0  740.0  743.0  746.0  749.0  751.0  754.0   
4             Argentina  794.0  797.0  799.0  802.0  805.0  808.0  810.0   

    1807   1808  ...   2041   2042   2043   2044   2045   2046   2047   2048  \
0  207.0  207.0  ...    751    767    783    800    817    834    852    870   
1  533.0  536.0  ...   2770   2830   2890   2950   3010   3080   3140   3210   
2  207.0  207.0  ...   9610   9820    10k  10.2k  10.5k  10.7k  10.9k  11.1k   
3  757.0  760.0  ...  47.9k  48.9k    50k    51k  52.1k  53.2k  54.3k  55.5k   
4  813.0  816.0  ...  12.8k  13.1k  13.4k  13.6k  13.9k  14.2k  14.5k  14.8k   

    2049   2050  
0    888    907  
1   3280   3340  
2  11.4k

In [98]:
print(clean_number(None))

None
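
A few spot checks can confirm that the helper behaves as intended on representative inputs. The sketch below restates `clean_number` in condensed form so that the block runs on its own; the sample inputs are illustrative and are not taken from the dataset:

```python
def clean_number(value):
    """Condensed restatement of the notebook's helper, so this block runs alone."""
    if not isinstance(value, str):
        return value
    value = value.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}
    try:
        if value and value[-1] in multipliers:
            return float(value[:-1]) * multipliers[value[-1]]
        return float(value)
    except ValueError:
        return None

# Spot checks on representative inputs
assert clean_number("50k") == 50_000.0
assert clean_number("1.5M") == 1_500_000.0
assert clean_number("3b") == 3_000_000_000.0
assert clean_number(" 1,250 ") == 1250.0
assert clean_number(207.0) == 207.0   # non-strings pass through unchanged
assert clean_number("n/a") is None    # unparseable text
print("all checks passed")
```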


__Apply cleaning to all columns except "country"__

In [99]:
columns = gni.columns[1:]
for col in columns:
    gni[col] = gni[col].apply(clean_number)
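
As an alternative to the element-wise `apply` loop, the same cleaning could be sketched with pandas' vectorized string methods. This is a swapped-in technique rather than the notebook's method; `clean_column` and `SUFFIX_FACTORS` are hypothetical names, and the toy values are copied from the `gni.head()` output:

```python
import pandas as pd

# Multiplier for each supported suffix; an empty suffix means a plain number
SUFFIX_FACTORS = {"": 1.0, "k": 1e3, "m": 1e6, "b": 1e9}

def clean_column(series):
    """Vectorized variant: split '48.9k' into number and suffix, then multiply."""
    text = (series.astype(str).str.strip().str.lower()
                  .str.replace(",", "", regex=False))
    parts = text.str.extract(r"^(?P<number>-?\d+(?:\.\d+)?)(?P<suffix>[kmb]?)$")
    number = pd.to_numeric(parts["number"], errors="coerce")
    factor = parts["suffix"].map(SUFFIX_FACTORS)
    # Values that do not match the pattern end up as NaN
    return number * factor

toy = pd.DataFrame({
    "country": ["Albania", "United Arab Emirates"],
    "2042": ["9820", "48.9k"],
    "2043": ["10k", "50k"],
})

year_cols = toy.columns.drop("country")
cleaned = toy[year_cols].apply(clean_column)
print(cleaned.dtypes)
print(cleaned)
```

The vectorized form tends to be faster on wide tables, while the element-wise helper is easier to extend with special cases.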

In [101]:
YEAR = ["1800", "2025"]

print("After cleaning:")
print(life[["country"]+YEAR].head(), "\n")
print(gni[["country"]+YEAR].head(), "\n")


After cleaning:
                country  1800  2025
0           Afghanistan  28.2  65.2
1                Angola  27.0  67.1
2               Albania  35.4  79.2
3               Andorra   NaN   NaN
4  United Arab Emirates  30.7  74.7 

                country   1800     2025
0           Afghanistan  207.0    548.0
1                Angola  517.0   2120.0
2               Albania  207.0   6390.0
3  United Arab Emirates  738.0  38700.0
4             Argentina  794.0   9880.0 



In [122]:
life[life['2020'].isna()]['country']

Series([], Name: country, dtype: object)
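
The long-format table `df_long` combines both sources into one row per country and year. A minimal sketch of that reshape-and-merge step on toy tables, with values copied from the 1800 and 2025 columns shown above; `to_long` is a hypothetical helper, and the `validate` check and integer `year` conversion are optional additions rather than part of the notebook:

```python
import pandas as pd

# Toy wide tables; values copied from the head() outputs for 1800 and 2025.
life_wide = pd.DataFrame({
    "country": ["Afghanistan", "Angola", "Andorra"],
    "1800": [28.2, 27.0, None],
    "2025": [65.2, 67.1, None],
})
gni_wide = pd.DataFrame({
    "country": ["Afghanistan", "Angola", "Argentina"],
    "1800": [207.0, 517.0, 794.0],
    "2025": [548.0, 2120.0, 9880.0],
})

def to_long(wide, value_name):
    """Melt a country-by-year table into (country, year, value) rows."""
    tidy = wide.melt(id_vars="country", var_name="year", value_name=value_name)
    tidy["year"] = tidy["year"].astype(int)  # year labels arrive as strings
    return tidy

merged = pd.merge(
    to_long(life_wide, "life_expectancy"),
    to_long(gni_wide, "gni_per_capita"),
    on=["country", "year"],
    how="inner",            # keep only country-years present in both tables
    validate="one_to_one",  # fail loudly if a (country, year) pair repeats
).dropna()
print(merged)
```

With an inner join, countries that appear in only one table (Andorra and Argentina in this toy example) drop out of the result.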

In [125]:
YEARS = [str(year) for year in range(1900, 2026)]

# Keep only the country column and the selected years
life_sel = life[["country"] + YEARS]
gni_sel  = gni[["country"] + YEARS]

# Reshape from wide (one column per year) to long (one row per country-year)
life_long = life_sel.melt(id_vars="country", value_vars=YEARS,
                          var_name="year", value_name="life_expectancy")
gni_long  = gni_sel.melt(id_vars="country", value_vars=YEARS,
                         var_name="year", value_name="gni_per_capita")

# Inner join on country and year, then drop rows with missing values
df_long = pd.merge(life_long, gni_long, on=["country","year"]).dropna()
df_long

Unnamed: 0,country,year,life_expectancy,gni_per_capita
0,Afghanistan,1900,33.3,320.0
1,Angola,1900,32.6,576.0
2,Albania,1900,34.9,348.0
3,United Arab Emirates,1900,35.4,2160.0
4,Argentina,1900,37.2,2450.0
...,...,...,...,...
23935,Samoa,2025,71.2,3970.0
23936,Yemen,2025,68.4,969.0
23937,South Africa,2025,66.5,6100.0
23938,Zambia,2025,64.7,1060.0


In [123]:
# Drop remaining missing values and filter out non-positive GNI values
df_long = df_long.dropna()
df_long = df_long[df_long["gni_per_capita"] > 0]

print("After cleaning:", df_long.shape)
print(df_long.head())


After cleaning: (23940, 4)
                country  year  life_expectancy  gni_per_capita
0           Afghanistan  1900             33.3           320.0
1                Angola  1900             32.6           576.0
2               Albania  1900             34.9           348.0
3  United Arab Emirates  1900             35.4          2160.0
4             Argentina  1900             37.2          2450.0


In [124]:
print("Final dataset shape:", df_long.shape)
print(df_long.head())


Final dataset shape: (23940, 4)
                country  year  life_expectancy  gni_per_capita
0           Afghanistan  1900             33.3           320.0
1                Angola  1900             32.6           576.0
2               Albania  1900             34.9           348.0
3  United Arab Emirates  1900             35.4          2160.0
4             Argentina  1900             37.2          2450.0


In [126]:
display(gni.isnull().sum())
display(life.isnull().sum())

country    0
1800       0
1801       0
1802       0
1803       0
          ..
2046       0
2047       0
2048       0
2049       0
2050       0
Length: 252, dtype: int64

country    0
1800       0
1801       0
1802       0
1803       0
          ..
2096       0
2097       0
2098       0
2099       0
2100       0
Length: 302, dtype: int64

In [127]:
display(gni.isnull().mean()*100)
display(life.isnull().mean()*100)

country    0.0
1800       0.0
1801       0.0
1802       0.0
1803       0.0
          ... 
2046       0.0
2047       0.0
2048       0.0
2049       0.0
2050       0.0
Length: 252, dtype: float64

country    0.0
1800       0.0
1801       0.0
1802       0.0
1803       0.0
          ... 
2096       0.0
2097       0.0
2098       0.0
2099       0.0
2100       0.0
Length: 302, dtype: float64

In [128]:
gni = gni.dropna()
life = life.dropna()
gni.isnull().sum()

country    0
1800       0
1801       0
1802       0
1803       0
          ..
2046       0
2047       0
2048       0
2049       0
2050       0
Length: 252, dtype: int64

### Analysis: 
__Answering the objectives through data analysis__
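
One possible starting point for the first two objectives is a correlation between income and life expectancy, followed by a look at residuals from a simple fitted line to flag candidate outliers. The sketch below uses only the four 2025 rows that appear in both "After cleaning" outputs, so it illustrates the mechanics and supports no conclusion. The log scale for GNI and the straight-line fit are assumptions, and `sample` is a hypothetical name:

```python
import numpy as np
import pandas as pd

# Four rows for 2025 copied from the "After cleaning" output; illustrative only.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Angola", "Albania", "United Arab Emirates"],
    "life_expectancy": [65.2, 67.1, 79.2, 74.7],
    "gni_per_capita": [548.0, 2120.0, 6390.0, 38700.0],
})

# Income spans several orders of magnitude, so compare on a log10 scale (assumption)
sample["log_gni"] = np.log10(sample["gni_per_capita"])
pearson_r = sample["log_gni"].corr(sample["life_expectancy"])

# Least-squares line; residual = observed minus fitted life expectancy
slope, intercept = np.polyfit(sample["log_gni"], sample["life_expectancy"], deg=1)
sample["residual"] = sample["life_expectancy"] - (slope * sample["log_gni"] + intercept)

print(f"Pearson r (log10 GNI vs life expectancy): {pearson_r:.2f}")
print(sample.sort_values("residual")[["country", "residual"]])
```

Applied to `df_long` for a single year, large positive residuals would point to countries with longer lives than their income predicts, and large negative residuals to the opposite.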



---

## Summary
__Summarizing the key insights from the analysis__

**Note**: _Use Bullet Points_

    ...

## Recommendations/Conclusion
**Note**: _Use Bullet Points_

    ...