# **Life Span Trends: Analyzing Longevity**
Dataset: https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who/data

## Project Description:
Our aim is to uncover what contributes to longer life by analyzing factors such as economic output (GDP), disease prevalence, and national healthcare spending across countries.

## Jazzy's Research Question Focus:
What are the 5 countries with the fewest disease cases and the highest life expectancy?
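One possible way to make "fewest diseases and highest life expectancy" concrete is to rank countries on each measure and sum the ranks. The sketch below is an illustration only: it uses hypothetical toy values (not WHO data), assumes per-country averages that reuse the dataset's column names, and weights measles cases, HIV/AIDS deaths, and life expectancy equally. `least_disease_high_life` is a hypothetical helper, not the method used in this notebook.

```python
import pandas as pd

# Hypothetical toy per-country averages (NOT values from the WHO dataset).
# Column names copy the dataset's, including their stray spaces.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F", "G"],
    "Life expectancy ": [82.0, 79.5, 75.0, 68.0, 61.0, 81.0, 55.0],
    "Measles ": [2.0, 15.0, 40.0, 300.0, 900.0, 5.0, 1200.0],
    " HIV/AIDS": [0.1, 0.1, 0.3, 1.5, 8.0, 0.1, 20.0],
})


def least_disease_high_life(df, n=5):
    """Rank countries on each metric, sum the ranks, and return the n best.

    Higher life expectancy and lower disease counts both earn smaller ranks.
    Tied values share their average rank. Equal weighting is an assumption.
    """
    ranks = pd.DataFrame({
        "life": df["Life expectancy "].rank(ascending=False),
        "measles": df["Measles "].rank(ascending=True),
        "hiv": df[" HIV/AIDS"].rank(ascending=True),
    })
    scored = df.assign(rank_score=ranks.sum(axis=1))
    # Smallest combined rank = best overall
    return scored.nsmallest(n, "rank_score")


print(least_disease_high_life(toy)["Country"].tolist())
```

Applied to the real data, `toy` would be replaced by a per-country table such as the ones built in the Additional Tables section; other weightings (or dropping a metric) would change the ranking.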



In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pathlib import Path
from scipy.stats import linregress

# The path to our CSV file
life_expectancy_data = Path("Life Expectancy Data.csv")
# Read our Life Expectancy data into pandas
life_expectancy_df = pd.read_csv(life_expectancy_data)
life_expectancy_df.head()

---

## **Part 1: Analyze Hepatitis B Coverage Trends 🔎**

### Summary: 

- Countries with the highest Hepatitis B vaccination coverage have an average life expectancy about 10 years higher than the least-covered countries.


### Key Findings:
- Countries with low Hep B coverage (9% to 35%) have an average life expectancy of 63 years.
- Countries with high Hep B coverage (98%) have an average life expectancy of 73 years.

### Possible Action Steps:
- Build partnerships with pharmaceutical companies and governments to deliver Hepatitis B vaccine coverage in under-covered countries.
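The Key Findings compare average life expectancy for the 5 lowest- and 5 highest-coverage countries. A minimal sketch of that comparison as one reusable function, assuming a per-country table like `hep_children_coverage_df`; the data here are hypothetical toy values, and `compare_extremes` is a hypothetical helper. `nsmallest`/`nlargest` select the same rows as `sort_values(...).head(n)`, up to tie-breaking.

```python
import pandas as pd

# Hypothetical toy per-country averages (NOT real WHO values).
toy = pd.DataFrame({
    "Country": list("ABCDEFGHIJ"),
    "Hepatitis B": [9, 20, 28, 31, 35, 97, 98, 98, 99, 99],
    "Life expectancy ": [60, 62, 63, 64, 66, 72, 73, 73, 74, 73],
})


def compare_extremes(df, by, n=5, target="Life expectancy "):
    """Mean of `target` for the n lowest and n highest rows ranked by `by`."""
    low_mean = df.nsmallest(n, by)[target].mean()
    high_mean = df.nlargest(n, by)[target].mean()
    return low_mean, high_mean, high_mean - low_mean


low, high, gap = compare_extremes(toy, "Hepatitis B")
print(f"Low coverage: {low:.1f} | High coverage: {high:.1f} | Gap: {gap:.1f} years")
```

The same function could be reused for the measles comparison in Part 2 by passing `by="Measles "`.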


---

### **Cleaning Data 🧹...**

In [None]:
# Isolate Country, Hep B, Life Expectancy and drop NaN values
hep_children_coverage = life_expectancy_df[["Country", "Hepatitis B", "Life expectancy "]].dropna()
hep_children_coverage.head()

In [None]:
# Filter out rows where Hepatitis B or Life expectancy is 0
hep_children_coverage_filtered = hep_children_coverage[(hep_children_coverage["Hepatitis B"] != 0) & (hep_children_coverage["Life expectancy "] != 0)]
hep_children_coverage_filtered.head()

In [None]:
# Group by country and calculate the mean for each column and reset the index
hep_children_coverage_df = hep_children_coverage_filtered.groupby("Country").mean().reset_index()
hep_children_coverage_df.head()
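Several column names in this CSV carry stray leading or trailing spaces (for example `"Life expectancy "`, `" BMI "`, and `" HIV/AIDS"`), which is why every lookup in this notebook has to reproduce them exactly. A minimal sketch, on a hypothetical empty toy frame, of stripping those spaces once right after loading; if adopted, every later column reference in the notebook would need to drop the spaces too.

```python
import pandas as pd

# Hypothetical toy frame reproducing the stray spaces in the dataset's column names.
raw = pd.DataFrame(columns=["Country", "Life expectancy ", " BMI ", " HIV/AIDS", "Measles "])

# rename() applies str.strip to every column label, removing leading/trailing whitespace.
clean = raw.rename(columns=str.strip)
print(list(clean.columns))
```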

### **Lowest Covered Countries 📝...**

In [None]:
# Find the lowest Hepatitis B covered countries
lowest_hep_b_coverage = hep_children_coverage_df.sort_values(by="Hepatitis B", ascending=True)
print("THE BOTTOM 5 LOWEST HEP B COVERED COUNTRIES:")

# Return the first 5 rows (the 5 lowest-coverage countries)
lowest_hep_b_coverage.head()


In [None]:
# Find the Average Life Expectancy - bottom 5 lowest covered countries
avg_low_hep_coverage = lowest_hep_b_coverage.head()["Life expectancy "].mean()
print(f"The average life expectancy is {avg_low_hep_coverage}.")


### **Highest Covered Countries 📝...**

In [None]:
# Find the highest Hepatitis B covered countries
highest_hep_b_coverage = hep_children_coverage_df.sort_values(by="Hepatitis B", ascending=False)

print("THE TOP 5 HIGHEST HEP B COVERED COUNTRIES:")
highest_hep_b_coverage.head()

In [None]:
# Find the average Life Expectancy of the top 5 highest covered countries
avg_high_hep_coverage = highest_hep_b_coverage.head()["Life expectancy "].mean()
print(f"The average life expectancy is {avg_high_hep_coverage}.")


### **Plotting Graph 📈**

In [None]:
# Adjust the size of the graph by width and height - ensures the visual isn't too small
plt.figure(figsize=(10, 6))

# Create the scatter plot using Hepatitis B (x-axis) & Life expectancy (y-axis) 
plt.scatter(hep_children_coverage_df["Hepatitis B"], hep_children_coverage_df["Life expectancy "])

# Add country labels using a for loop
for i, country in hep_children_coverage_df["Country"].items():
    plt.text(hep_children_coverage_df["Hepatitis B"][i], 
             hep_children_coverage_df["Life expectancy "][i], 
             country)

# Set bolded labels and set title
plt.xlabel("Hepatitis B Vaccination Coverage (%)", fontweight="bold")
plt.ylabel("Life Expectancy", fontweight="bold")
plt.title("Hepatitis B Vaccination Coverage vs Life Expectancy by Country")

# Show grid
plt.grid(True)

# Save the file as a png
plt.savefig("Hep_B_Coverage_Chart")

# Show the chart
plt.show()
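The notebook imports `linregress` from `scipy.stats` but never calls it. A minimal sketch of how it could summarize the trend in the scatter plot above as a slope (years of life expectancy per percentage point of coverage) and a correlation coefficient; the arrays are hypothetical toy values, and in the notebook they would be the `"Hepatitis B"` and `"Life expectancy "` columns of `hep_children_coverage_df`. A positive slope describes an association, not proof that coverage causes longer life.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical toy data (NOT WHO values): coverage (%) and life expectancy (years).
coverage = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
life_expectancy = np.array([61.2, 62.8, 65.1, 66.9, 69.0])

# Ordinary least-squares fit of life expectancy on coverage.
fit = linregress(coverage, life_expectancy)
print(f"slope = {fit.slope:.4f} years per percentage point")
print(f"r = {fit.rvalue:.3f}, p = {fit.pvalue:.3g}")

# The fitted line could be overlaid on the scatter plot, e.g. with
# plt.plot(coverage, fit.intercept + fit.slope * coverage)
```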

---

## **Part 2: Measles Report 🦠**

### Summary: 

- Countries with the most reported measles cases have an average life expectancy about 12 years lower than those with the fewest.

### Key Findings:
- The 5 countries with the fewest reported measles cases have an average life expectancy of about 74 years.
- The 5 countries with the most reported measles cases have an average life expectancy of about 62 years.
- With better health treatment plans, countries with high measles case counts could potentially increase life expectancy by as much as 20 years.

### Possible Action Steps:
- Work with health departments to understand how to limit measles cases and support countries where they are high.
- Review global data on the effect of measles in all 10 countries.
- Hire more experts.

---

### **Cleaning Data 🧹...**

In [None]:
# Copy the view for Country, Measles, and Life Expectancy from original dataframe
measles_life_expectancy = life_expectancy_df[["Country", "Measles ", "Life expectancy "]].copy()

# Clean the data by dropping NaN values
measles_life_expectancy.dropna(inplace=True)

# Keep only rows with at least one reported measles case (drop rows equal to 0)
measles_life_expectancy = measles_life_expectancy[measles_life_expectancy["Measles "] != 0]

# Averages and grouped by country
grouped_measles_life_expectancy = measles_life_expectancy.groupby("Country").mean().reset_index()
grouped_measles_life_expectancy.head()
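Because the cell above drops zero-case rows *before* grouping, each country's averages cover only the years with at least one reported case, and a country that reported zero cases in every year disappears from the table entirely. A toy illustration with hypothetical values (one country, two years) of how the filter changes the per-country means:

```python
import pandas as pd

# Hypothetical toy rows (NOT WHO values): one country, two years.
rows = pd.DataFrame({
    "Country": ["X", "X"],
    "Measles ": [0, 10],
    "Life expectancy ": [70.0, 72.0],
})

# Average over every year.
all_years = rows.groupby("Country").mean()

# Drop zero-case years first, as the cleaning cell does, then average.
nonzero_years = rows[rows["Measles "] != 0].groupby("Country").mean()

print(all_years)
print(nonzero_years)
```

Whether to keep the zero-case years is a judgment call; keeping them would let countries with no reported measles appear in the "lowest reported" list.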

### **Lowest Reported Cases (Per 1000) Bottom 5 Countries 📝:**

In [None]:
# Find the bottom 5 countries with the lowest measles cases
lowest_reported_countries = grouped_measles_life_expectancy.sort_values(by="Measles ", ascending=True)

print("THE BOTTOM 5 LEAST REPORTED MEASLES COUNTRIES:")
lowest_reported_countries.head()

In [None]:
# Find the Average Life Expectancy of Low Case Measles
avg_low_measles_life = lowest_reported_countries.head()["Life expectancy "].mean()
print(f"The average life expectancy is {avg_low_measles_life}.")

### **Highest Reported Cases (Per 1000) Top 5 Countries 📝:**

In [None]:
# Sort the values and produce the top 5
highest_measles_reports = grouped_measles_life_expectancy.sort_values(by="Measles ")

print("THE TOP 5 MOST REPORTED MEASLES COUNTRIES:")
highest_measles_reports.tail()

In [None]:
# Find the average life expectancy of the 5 countries with the most measles cases
avg_high_measles_life = highest_measles_reports.tail()["Life expectancy "].mean()
print(f"The average life expectancy is {avg_high_measles_life}.")

### **Plotting Graph # 1 📈**

In [None]:
# Adjust the size of the scatter plot (lowest_reported_countries holds every country, sorted by measles cases)
plt.figure(figsize=(10, 6))
plt.scatter(lowest_reported_countries["Measles "], lowest_reported_countries["Life expectancy "], s=50, c="#AC2D47", alpha=0.7)
plt.title("Measles Effect On Life Expectancy")

# Bold the Labels
plt.xlabel("Measles Cases (per 1000 population)", fontweight="bold")
plt.ylabel("Average Life Expectancy", fontweight="bold")

# Add the grid
plt.grid(True)

# Save chart as a png
plt.savefig("Measles_Effect_On_Life_Expectancy_Chart_Regular")

# Show the scatter plot
plt.show()

#### **Expanded Version ⬇️**

In [None]:
# Scale each bubble's size by the country's measles cases
plt.figure(figsize=(10, 6))
plt.scatter(lowest_reported_countries["Measles "], lowest_reported_countries["Life expectancy "], s=lowest_reported_countries["Measles "]/10, c='#AC2D47', alpha=0.7)
plt.title("Measles Effect On Life Expectancy")

# Bold the Labels
plt.xlabel("Measles Cases (per 1000 population)", fontweight="bold")
plt.ylabel("Average Life Expectancy", fontweight="bold")

# Add the grid
plt.grid(True)

# Save chart as a png
plt.savefig("Measles_Effect_On_Life_Expectancy_Chart_Expanded")

# Show the scatter plot
plt.show()
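Matplotlib's `s` argument is the marker area in points squared, so dividing raw case counts by 10 can produce very large or very small bubbles when counts span several orders of magnitude. A minimal sketch of min-max scaling the sizes into a fixed range; `scale_marker_sizes` and its default bounds are hypothetical choices, and the counts are toy values.

```python
import numpy as np


def scale_marker_sizes(values, min_size=20.0, max_size=400.0):
    """Min-max scale values into [min_size, max_size] for matplotlib's `s` (area in points^2)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values equal: give every marker the midpoint size.
        return np.full_like(values, (min_size + max_size) / 2)
    return min_size + (values - values.min()) / span * (max_size - min_size)


# Hypothetical case counts spanning several orders of magnitude.
sizes = scale_marker_sizes([5, 500, 50_000], min_size=20, max_size=400)
print(sizes)
```

In the cell above this could be used as `s=scale_marker_sizes(lowest_reported_countries["Measles "])`. For heavily skewed counts, applying `np.log1p` to the values before scaling is a common alternative.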


---

## **Part 3: Obesity & Life Expectancy Analysis 💡**

### Summary: 

In general, countries with higher life expectancy also have higher average BMI, suggesting higher rates of obesity.

### Key Findings:
- The 5 countries with the lowest life expectancy had an average BMI of about 19.09.
- Those countries had an average life expectancy of about 48 years.
- The 5 countries with the highest life expectancy had an average BMI of about 47.27.
- Those countries had an average life expectancy of about 82 years.

### Possible Action Steps:
- Understand the factors that drive obesity in countries with higher life expectancy.
- Conduct population census testing to gather further data for review.
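The Summary's "in general" could be quantified with a correlation coefficient computed over all countries rather than only the top and bottom 5. A minimal sketch using pandas' `Series.corr` (Pearson by default) on hypothetical toy values; in the notebook the two series would come from `country_BMI_life_stats_grouped_df`. A strong correlation would not show that higher BMI causes longer life; a third factor could drive both.

```python
import pandas as pd

# Hypothetical toy per-country averages (NOT WHO values).
toy = pd.DataFrame({
    " BMI ": [19.0, 25.0, 40.0, 47.0, 55.0],
    "Life expectancy ": [48.0, 60.0, 72.0, 80.0, 82.0],
})

# Pearson correlation: +1 = perfectly increasing linear relationship, 0 = no linear relationship.
r = toy[" BMI "].corr(toy["Life expectancy "])
print(f"Pearson r between BMI and life expectancy: {r:.2f}")
```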


---

### **Cleaning Data 🧹...**

In [None]:
# Select "Country", "Life expectancy ", " BMI " 
country_BMI_life_stats = life_expectancy_df[["Country", "Life expectancy ", " BMI "]]

# Group by country and calculate the mean of each column
country_BMI_life_stats_grouped = country_BMI_life_stats.groupby('Country').mean().reset_index()

# Sort by life expectancy (the grouped result is already a DataFrame)
country_BMI_life_stats_grouped_df = country_BMI_life_stats_grouped.sort_values(by='Life expectancy ')

# Drop NaN values
country_BMI_life_stats_grouped_df = country_BMI_life_stats_grouped_df.dropna()


### **Best Life Expectancy & BMI Relationship 📝:**

In [None]:
# Select the top 5 (best life expectancy) 
top_5_countries = country_BMI_life_stats_grouped_df.tail(5)
print("BEST LIFE EXPECTANCY & BMI RATES")
top_5_countries


In [None]:
# Find the average life expectancy of the top 5 countries
avg_bmi_top_life = top_5_countries.head()["Life expectancy "].mean()
print(f"The average life expectancy is {avg_bmi_top_life}.")

In [None]:
# Find the average BMI of the top 5 countries
avg_bmi_top = top_5_countries.head()[" BMI "].mean()
print(f"The average BMI is {avg_bmi_top}.")

### **Worst Life Expectancy & BMI Relationship 📝:**

In [None]:
# Select the bottom 5 (worst life expectancy) countries
bottom_5_countries = country_BMI_life_stats_grouped_df.head(5)
print("WORST LIFE EXPECTANCY & BMI RATES")
bottom_5_countries

In [None]:
# Find the average life expectancy of the bottom 5 countries
avg_bmi_bottom_life = bottom_5_countries.head()["Life expectancy "].mean()
print(f"The average life expectancy is {avg_bmi_bottom_life}.")

In [None]:
# Find the average BMI of the bottom 5 countries
avg_bmi_bottom = bottom_5_countries.head()[" BMI "].mean()
print(f"The average BMI is {avg_bmi_bottom}.")

### **Plotting Graph # 1 📈**

In [None]:
# Plotting Graph 1 - Top and Bottom 5 Only
plt.figure(figsize=(10, 6))

# Scatter plot for top 5 countries
plt.scatter(top_5_countries[" BMI "], top_5_countries["Life expectancy "], color='green', label='Top 5 Countries')

# Scatter plot for bottom 5 countries
plt.scatter(bottom_5_countries[" BMI "], bottom_5_countries["Life expectancy "], color='red', label='Bottom 5 Countries')

# Bold the labels
plt.xlabel('BMI (Body Mass Index)', fontweight='bold')
plt.ylabel('Life Expectancy', fontweight='bold')

# Add the title, legend, grid
plt.title('Relationship between BMI and Life Expectancy')
plt.legend()
plt.grid(True)

# Save chart as a png
plt.savefig("Top5_Bottom5_BMI_Life_Chart")

# Show the chart
plt.show()

### **Plotting Graph # 2 📈**

In [None]:
# Plotting Graph # 2 - All Countries Included (Top and Bottom Distinctive)
plt.figure(figsize=(10, 6))

# Scatter plot for all countries in grey
plt.scatter(country_BMI_life_stats_grouped_df[" BMI "], country_BMI_life_stats_grouped_df["Life expectancy "], color='grey', label='Other Countries')

# Scatter plot for top 5 countries
plt.scatter(top_5_countries[" BMI "], top_5_countries["Life expectancy "], color='green', label='Top 5 Countries')

# Scatter plot for bottom 5 countries
plt.scatter(bottom_5_countries[" BMI "], bottom_5_countries["Life expectancy "], color='red', label='Bottom 5 Countries')


# Bold the labels
plt.xlabel('BMI (Body Mass Index)', fontweight='bold')
plt.ylabel('Life Expectancy', fontweight='bold')

# Add the title, legend, grid
plt.title('Relationship between BMI and Life Expectancy')
plt.legend()
plt.grid(True)

# Save chart as a png
plt.savefig("All_Countries__BMI_Life_Chart")

# Show the chart
plt.show()

---

## **Part 4: HIV Prevalence & Life Expectancy Analysis 🧪**

### Summary: 

As with measles, life expectancy is higher where HIV/AIDS deaths are lower. We also notice a correlation between measles and HIV, which suggests the two diseases could be occurring simultaneously within some countries; understanding this overlap could help bring both under control. Even at the lowest HIV/AIDS death rate (0.1 per 1,000), average life expectancy falls about 14 years short of the optimal life expectancy of 82 years, possibly due to stress and insufficient health measures. In countries facing the highest HIV/AIDS death rates, the effect roughly doubles, with about 30 years of life expectancy lost.
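The 14- and 30-year figures are differences from the optimal life expectancy of 82 years, rounded. The arithmetic, using only the averages reported in this section (68.5 years for the lowest-HIV countries and 51 years for the highest-HIV countries):

```python
# Figures taken from this section's Summary and Key Findings.
OPTIMAL_LIFE_EXPECTANCY = 82.0
avg_life = {"lowest HIV/AIDS deaths": 68.5, "highest HIV/AIDS deaths": 51.0}

# Gap between the reference maximum and each group's average life expectancy.
gaps = {group: OPTIMAL_LIFE_EXPECTANCY - years for group, years in avg_life.items()}
for group, gap in gaps.items():
    print(f"{group}: {gap:.1f} years below {OPTIMAL_LIFE_EXPECTANCY:.0f}")
```

The gaps come out to 13.5 and 31 years, which the Summary rounds to about 14 and 30.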

### Key Findings:
- Countries with the lowest HIV/AIDS death rates (about 0.1 per 1,000) have an average life expectancy of 68.5 years.
- Countries with the highest HIV/AIDS death rates (an average of 22.8 per 1,000) have an average life expectancy of 51 years.


### Possible Action Steps:
- Learn more about countries facing both measles and HIV, including their similar symptoms and treatments.
- Work with suppliers and governments to pursue research into, and production of, medicines for relief.
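The Summary suggests measles and HIV may be occurring together in the same countries. A simple first check is whether the same countries appear in both "highest" lists. A minimal sketch on hypothetical toy values; `top_n_countries` is a hypothetical helper, and in the notebook the input would be a per-country table holding both `"Measles "` and `" HIV/AIDS"` averages. List overlap is only a rough signal; a correlation over all countries (for example with `Series.corr`) would be a complementary check.

```python
import pandas as pd

# Hypothetical toy per-country averages (NOT WHO values).
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F", "G"],
    "Measles ": [900, 15, 1200, 40, 700, 5, 300],
    " HIV/AIDS": [18.0, 0.1, 22.0, 0.3, 9.0, 0.1, 12.0],
})


def top_n_countries(df, column, n=5):
    """Return the set of countries with the n largest values in `column`."""
    return set(df.nlargest(n, column)["Country"])


# Countries that appear in both the highest-measles and highest-HIV lists.
overlap = top_n_countries(toy, "Measles ", n=3) & top_n_countries(toy, " HIV/AIDS", n=3)
print(sorted(overlap))
```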

---

### **Cleaning Data 🧹...**

In [None]:
# Select "Country", "Life expectancy ", " HIV/AIDS"
hiv_life_expectancy = life_expectancy_df[["Country", "Life expectancy ", " HIV/AIDS"]]

# Group by country and calculate the mean of each column
hiv_life_expectancy = hiv_life_expectancy.groupby('Country').mean().reset_index()

# Sort by HIV/AIDS deaths, lowest first (the grouped result is already a DataFrame)
hiv_life_expectancy_df = hiv_life_expectancy.sort_values(by=" HIV/AIDS", ascending=True)

# Drop NaN values
hiv_life_expectancy_df = hiv_life_expectancy_df.dropna()

### **Highest HIV Rates & Life Expectancy 📝:**

In [None]:
# Show the 5 countries with the highest HIV/AIDS death rates (end of the ascending sort)
print("TOP 5 COUNTRIES WITH THE HIGHEST HIV DEATHS (PER 1000 CHILDREN 0-4 YEARS) AND LIFE EXPECTANCY")
hiv_life_expectancy_df.tail()

In [None]:
# Find the average life expectancy of the 5 countries with the highest HIV/AIDS death rates
avg_life_for_high_hiv = hiv_life_expectancy_df.tail()["Life expectancy "].mean()
print(f"The average life expectancy is {avg_life_for_high_hiv}.")

In [None]:
# Find the Average HIV RATE 
avg_highest_HIV_rate = hiv_life_expectancy_df.tail()[" HIV/AIDS"].mean()
print(f"The average HIV rate is {avg_highest_HIV_rate}.")

### **Lowest HIV Rates & Life Expectancy 📝:**

In [None]:
# Display the 5 countries with the lowest number of HIV deaths
print("TOP 5 COUNTRIES WITH THE LOWEST HIV DEATHS (PER 1000 CHILDREN 0-4 YEARS) AND LIFE EXPECTANCY")
hiv_life_expectancy_df.head()

In [None]:
# Find the average Life Expectancy 
avg_low_HIV_life = hiv_life_expectancy_df.head()["Life expectancy "].mean()
print(f"The average life expectancy is {avg_low_HIV_life}.")
avg_low_HIV_life

In [None]:
# Find the Average HIV RATE 
avg_lowest_HIV_rate = hiv_life_expectancy_df.head()[" HIV/AIDS"].mean()
print(f"The average HIV rate is {avg_lowest_HIV_rate}.")

In [None]:
# Plotting the HIV and Life Expectancy Graph
plt.figure(figsize=(10, 6))
plt.scatter(hiv_life_expectancy_df[" HIV/AIDS"], hiv_life_expectancy_df["Life expectancy "], color='skyblue')

# Bold the labels
plt.xlabel('HIV/AIDS Deaths (per 1,000 children aged 0-4)', fontweight='bold')
plt.ylabel('Life Expectancy', fontweight='bold')

# Add title & grid
plt.title('Relationship between HIV/AIDS Deaths and Life Expectancy')
plt.grid(True)

# Save the image
plt.savefig("HIV_and_Life_Chart")

# show the chart
plt.show()


---

## Additional Tables 📌

---

In [None]:
# Select the columns of interest
top_diseases_for_life_expectancy = life_expectancy_df[["Country", "Life expectancy ", " BMI "]]

# Group by country and calculate the mean of each column
grouped_by_country = top_diseases_for_life_expectancy.groupby('Country').mean().reset_index()

# Convert to DataFrame & sort
grouped_by_country_df = pd.DataFrame(grouped_by_country).sort_values(by='Life expectancy ', ascending=False)

# Drop NaN values
grouped_by_country_df = grouped_by_country_df.dropna()

# Show the 5 countries with the highest life expectancy
grouped_by_country_df.head()

In [None]:
# Select the columns of interest
top_diseases_for_life_expectancy = life_expectancy_df[["Country", "Life expectancy ",  "Measles ", "Hepatitis B", "Polio", "Diphtheria ", " HIV/AIDS"]]

# Group by country and calculate the mean for each disease
grouped_by_country = top_diseases_for_life_expectancy.groupby('Country').mean().reset_index()

# Convert to DataFrame & sort
grouped_by_country_df = pd.DataFrame(grouped_by_country).sort_values(by='Life expectancy ', ascending=False)

# Drop NaN values
grouped_by_country_df = grouped_by_country_df.dropna()

# Show the 5 countries with the highest life expectancy
print("ALL DISEASES TOP LIFE EXPECTANCY")
grouped_by_country_df.head()



In [None]:
# Show the 5 countries with the lowest life expectancy
print("ALL DISEASES BOTTOM LIFE EXPECTANCY")
grouped_by_country_df.tail()

In [None]:
# Select the columns of interest
top_diseases_for_life_expectancy = life_expectancy_df[["Country", "Life expectancy ", " BMI ", "Hepatitis B", "Measles ", "Polio", "Diphtheria ", " HIV/AIDS"]]

# Group by country and calculate the mean for each disease
grouped_by_country = top_diseases_for_life_expectancy.groupby('Country').mean().reset_index()

# Convert to DataFrame & sort
grouped_by_country_df = pd.DataFrame(grouped_by_country).sort_values(by='Life expectancy ', ascending=False)

# Drop NaN values
grouped_by_country_df = grouped_by_country_df.dropna()

# Show the 5 countries with the highest life expectancy
print("ALL DISEASES WITH BMI")
grouped_by_country_df.head()

In [None]:
# Select the columns of interest
top_diseases_for_life_expectancy = life_expectancy_df[["Country"," HIV/AIDS",  " BMI "]]

# Group by country and calculate the mean for each disease
grouped_by_country = top_diseases_for_life_expectancy.groupby('Country').mean().reset_index()

# Convert to DataFrame & sort
grouped_by_country_df = pd.DataFrame(grouped_by_country).sort_values(by=" HIV/AIDS", ascending=False)

# Drop NaN values
grouped_by_country_df = grouped_by_country_df.dropna()

# Show the 5 countries with the highest HIV/AIDS death rates
print("COUNTRIES WITH THE HIGHEST HIV DEATHS (PER 1000 CHILDREN 0-4 YEARS)")
grouped_by_country_df.head()

In [None]:
# Show the 5 countries with the lowest HIV/AIDS death rates
print("COUNTRIES WITH THE LOWEST HIV DEATHS (PER 1000 CHILDREN 0-4 YEARS)")
grouped_by_country_df.tail()

In [None]:
# Select the columns of interest
top_diseases_for_life_expectancy = life_expectancy_df[["Country", "Life expectancy ",  " HIV/AIDS"]]

# Group by country and calculate the mean for each disease
grouped_by_country = top_diseases_for_life_expectancy.groupby('Country').mean().reset_index()

# Convert to DataFrame & sort
grouped_by_country_df = pd.DataFrame(grouped_by_country).sort_values(by="Life expectancy ", ascending=False)

# Drop NaN values
grouped_by_country_df = grouped_by_country_df.dropna()

# Show the 5 countries with the highest life expectancy
print("BEST LIFE EXPECTANCY AND LOWEST HIV DEATHS (PER 1000 CHILDREN 0-4 YEARS)")
grouped_by_country_df.head()

In [None]:
# Convert to DataFrame & sort
grouped_by_country_df = pd.DataFrame(grouped_by_country).sort_values(by="Life expectancy ", ascending=True)

# Drop NaN values
grouped_by_country_df = grouped_by_country_df.dropna()

# Show the 5 countries with the lowest life expectancy
print("WORST LIFE EXPECTANCY & HIV DEATHS (PER 1000 CHILDREN 0-4 YEARS)")
grouped_by_country_df.head()
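The cells in this section repeat the same select, group, average, drop-NaN, and sort steps with different columns. A minimal sketch of folding that pattern into one function; `country_averages` is a hypothetical helper (not part of pandas), and the rows are toy values. Because `groupby(...).mean()` skips NaN, a country is dropped only when one of its columns has no values at all.

```python
import pandas as pd


def country_averages(df, columns, sort_by, ascending=False):
    """Average `columns` per country, drop countries with missing averages, and sort.

    Hypothetical helper mirroring the select -> groupby -> dropna -> sort steps above.
    """
    return (
        df[["Country", *columns]]
        .groupby("Country")
        .mean()
        .reset_index()
        .dropna()
        .sort_values(by=sort_by, ascending=ascending)
    )


# Hypothetical toy rows (NOT WHO values): two years per country.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Life expectancy ": [80.0, 82.0, 60.0, 62.0, 70.0, None],
    " HIV/AIDS": [0.1, 0.1, 5.0, 7.0, None, None],
})

summary = country_averages(toy, ["Life expectancy ", " HIV/AIDS"], sort_by="Life expectancy ")
print(summary)
```

With the real data, a call such as `country_averages(life_expectancy_df, ["Life expectancy ", " HIV/AIDS"], sort_by=" HIV/AIDS")` would stand in for one of the cells above, followed by `.head()` or `.tail()`.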