### Project Overview
This project uses machine learning to support UN Sustainable Development Goal 3 (Good Health and Well-being) by predicting life expectancy based on health indicators. Using regression algorithms like Random Forest, the model analyzes health data including mortality rates, immunization coverage, healthcare spending, and education levels to forecast life expectancy across different countries. The goal is to provide policymakers with a data-driven tool that identifies which health factors most strongly influence life expectancy, enabling better healthcare decisions and resource allocation to improve global health outcomes.

### Problem Statement
Healthcare policymakers struggle to allocate limited resources effectively because it is often unclear which health interventions have the largest effect on life expectancy. Decisions that rest on guesswork rather than data can waste resources and miss opportunities to save lives. This project therefore aims to build a predictive model that estimates life expectancy accurately, as measured by metrics such as mean absolute error, mean squared error, and R², and that shows which health indicators contribute most to its predictions so that policymakers can prioritize their investments.

### Business Problem
Healthcare policymakers need to understand which factors most strongly influence life expectancy to:

1. Allocate limited healthcare resources effectively.
2. Design targeted intervention programs.
3. Monitor progress toward SDG 3 targets.
4. Predict health outcomes for strategic planning.
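The approach described above (fit a Random Forest regressor, then rank the indicators by importance) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the column names, coefficients, and hyperparameters are all assumptions chosen only to make the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real dataset): column names and
# coefficients are invented purely for illustration.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "adult_mortality": rng.uniform(50, 400, n),
    "schooling": rng.uniform(2, 18, n),
    "immunization": rng.uniform(30, 99, n),  # has no effect on y by construction
    "noise": rng.normal(0, 1, n),            # has no effect on y by construction
})
y = 80 - 0.05 * X["adult_mortality"] + 0.3 * X["schooling"] + rng.normal(0, 1, n)

# Hold out 20% of the rows to estimate out-of-sample performance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Score on the held-out rows and rank features by impurity-based importance.
test_r2 = r2_score(y_test, model.predict(X_test))
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"Test R^2: {test_r2:.3f}")
print(importances)
```

Impurity-based importances indicate which features the forest relied on, not which interventions cause longer lives, so a ranking like this is best read as a starting point for further analysis rather than as a policy recommendation.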

In [None]:
# Data handling and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Model selection, preprocessing, candidate regressors, and evaluation metrics
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Silence all warnings to keep notebook output readable
# (note: this also hides deprecation and convergence warnings)
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('Life Expectancy Data.csv')
df.head()

| | Country | Year | Status | Life expectancy | Adult Mortality | infant deaths | Alcohol | percentage expenditure | Hepatitis B | Measles | ... | Polio | Total expenditure | Diphtheria | HIV/AIDS | GDP | Population | thinness 1-19 years | thinness 5-9 years | Income composition of resources | Schooling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2015 | Developing | 65.0 | 263.0 | 62 | 0.01 | 71.279624 | 65.0 | 1154 | ... | 6.0 | 8.16 | 65.0 | 0.1 | 584.25921 | 33736494.0 | 17.2 | 17.3 | 0.479 | 10.1 |
| 1 | Afghanistan | 2014 | Developing | 59.9 | 271.0 | 64 | 0.01 | 73.523582 | 62.0 | 492 | ... | 58.0 | 8.18 | 62.0 | 0.1 | 612.696514 | 327582.0 | 17.5 | 17.5 | 0.476 | 10.0 |
| 2 | Afghanistan | 2013 | Developing | 59.9 | 268.0 | 66 | 0.01 | 73.219243 | 64.0 | 430 | ... | 62.0 | 8.13 | 64.0 | 0.1 | 631.744976 | 31731688.0 | 17.7 | 17.7 | 0.47 | 9.9 |
| 3 | Afghanistan | 2012 | Developing | 59.5 | 272.0 | 69 | 0.01 | 78.184215 | 67.0 | 2787 | ... | 67.0 | 8.52 | 67.0 | 0.1 | 669.959 | 3696958.0 | 17.9 | 18.0 | 0.463 | 9.8 |
| 4 | Afghanistan | 2011 | Developing | 59.2 | 275.0 | 71 | 0.01 | 7.097109 | 68.0 | 3013 | ... | 68.0 | 7.87 | 68.0 | 0.1 | 63.537231 | 2978599.0 | 18.2 | 18.2 | 0.454 | 9.5 |
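In the preview, Population for Afghanistan moves from 31,731,688 (2013) to 327,582 (2014) and back to 33,736,494 (2015), and GDP rises roughly tenfold between 2011 and 2012. Jumps of that size within a single country look more like scale or entry inconsistencies than real changes, although the preview alone cannot confirm the cause. One possible way to surface such rows before modeling is sketched below; the helper `flag_jumps` and its `max_ratio` threshold are hypothetical names and values chosen for illustration, and the small frame simply re-types four columns from the preview.

```python
import pandas as pd

# Four columns re-typed from the five preview rows shown above.
preview = pd.DataFrame({
    "Country": ["Afghanistan"] * 5,
    "Year": [2015, 2014, 2013, 2012, 2011],
    "GDP": [584.25921, 612.696514, 631.744976, 669.959, 63.537231],
    "Population": [33736494.0, 327582.0, 31731688.0, 3696958.0, 2978599.0],
})


def flag_jumps(frame, column, max_ratio=5.0):
    """Return rows whose value differs from the previous year's value
    (within the same country) by more than a factor of max_ratio.

    Hypothetical helper; the default threshold is an arbitrary assumption.
    """
    ordered = frame.sort_values(["Country", "Year"])
    previous = ordered.groupby("Country")[column].shift(1)
    ratio = ordered[column] / previous
    # The first year per country has no predecessor (NaN ratio) and is never flagged.
    suspicious = (ratio > max_ratio) | (ratio < 1 / max_ratio)
    return ordered.loc[suspicious, ["Country", "Year", column]]


population_jumps = flag_jumps(preview, "Population")
gdp_jumps = flag_jumps(preview, "GDP")

print(population_jumps)
print(gdp_jumps)
```

On the full dataset the same call would be `flag_jumps(df, "Population")`, assuming the CSV's column names match the preview exactly (stray whitespace in headers would need stripping first).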
