#### We will estimate the life expectancy of people in the less-than-1-year age group, which is essentially the life expectancy at birth.

#### Let's first work with the data for a single year, so that we can develop the process on a small dataset before applying it to the full dataset.
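The single-year-first plan can be sketched by bundling this notebook's per-year steps (filter to Total and <1 year, drop rows without a FIPS code, keep counties, keep and rename columns) into one function that can later be reused on every year. `process_year` is a hypothetical helper and the toy frame below is made up; it uses `dropna(subset=["fips"])` instead of a bare `dropna()` so only the missing FIPS code triggers a drop.

```python
import pandas as pd

# Made-up toy stand-in for the raw IHME table (not real data).
raw = pd.DataFrame({
    "location_name": ["United States of America", "Alabama",
                      "Autauga County (Alabama)", "Autauga County (Alabama)"],
    "fips": [float("nan"), 1.0, 1001.0, 1001.0],
    "race_name": ["Total", "Total", "Total", "Black"],
    "age_name": ["<1 year", "<1 year", "<1 year", "<1 year"],
    "year": [2003, 2003, 2003, 2003],
    "val": [77.1, 74.3, 74.6, 72.0],
})


def process_year(df):
    """Hypothetical helper: apply the single-year steps to one raw table."""
    # Keep the Total group and the <1 year age group.
    out = df.loc[(df["race_name"] == "Total") & (df["age_name"] == "<1 year")]
    # Drop rows without a FIPS code (the national row).
    out = out.dropna(subset=["fips"])
    # State codes are at most 56, county codes start at 1001.
    out = out.loc[out["fips"] > 60]
    # Keep only the needed columns and rename the target.
    out = out[["location_name", "fips", "year", "val"]]
    return out.rename(columns={"val": "MeanLifeExpectency"})


result = process_year(raw)
print(result)
```

Once the steps are validated on one year, the yearly outputs could be stacked with `pd.concat([...], ignore_index=True)`.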

In [1]:
## importing necessary libraries

import pandas as pd

In [2]:
## Read the data.
## To read the data for a different year, replace the year in the CSV file name.

df1 = pd.read_csv('Life Expectency Data/IHME_USA_COD_COUNTY_RACE_ETHN_2000_2019_LT_2003_ALL_BOTH_Y2023M06D12.csv')
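Rather than editing the file name by hand, the path could be built from the year. This sketch assumes, without confirmation from the source, that every yearly file follows the 2003 file's naming pattern and that only the four-digit year after `_LT_` changes; `csv_path_for_year` is a hypothetical helper.

```python
def csv_path_for_year(year: int) -> str:
    # Assumption: only the year differs between the yearly file names.
    return ("Life Expectency Data/"
            f"IHME_USA_COD_COUNTY_RACE_ETHN_2000_2019_LT_{year}_ALL_BOTH_Y2023M06D12.csv")


print(csv_path_for_year(2003))
```

Usage would then be `df1 = pd.read_csv(csv_path_for_year(2010))`, provided that file actually exists under that name.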

#### The CSV file contains life expectancy estimates for a total of 19 age groups:
#### <1 year, 1 to 4, 5 to 9, 10 to 14, 15 to 19, 20 to 24, 25 to 29, 30 to 34, 35 to 39, 40 to 44,
#### 45 to 49, 50 to 54, 55 to 59, 60 to 64, 65 to 69, 70 to 74, 75 to 79, 80 to 84 and 85+. <br/><br/>

#### The data is broken down into the following race and ethnicity groups:
#### (a) Total
#### (b) Latino
#### (c) Black
#### (d) White
#### (e) AIAN (non-Hispanic American Indian and Alaska Native)
#### (f) API (not expanded in the documentation; most likely Asian and Pacific Islander) <br/><br/>

#### To get started, we will use only the Total group and the <1 year age group.



In [3]:
## Keep only the Total group and the <1 year age group.

df2=df1.loc[(df1['race_name'] == 'Total') & (df1['age_name'] == '<1 year')]
df2

,measure_id,measure_name,location_id,location_name,fips,race_id,race_name,sex_id,sex_name,age_group_id,age_name,year,metric_id,metric_name,val,upper,lower
0,26,Life expectancy,102,United States of America,,1,Total,3,Both,28,<1 year,2003,1,Number,77.095526,77.120155,77.071230
114,26,Life expectancy,523,Alabama,1.0,1,Total,3,Both,28,<1 year,2003,1,Number,74.294151,74.351420,74.237475
228,26,Life expectancy,614,Autauga County (Alabama),1001.0,1,Total,3,Both,28,<1 year,2003,1,Number,74.628765,75.048044,74.217372
234,26,Life expectancy,637,Baldwin County (Alabama),1003.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.661419,76.917529,76.416103
240,26,Life expectancy,624,Barbour County (Alabama),1005.0,1,Total,3,Both,28,<1 year,2003,1,Number,74.047811,74.496751,73.586544
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
363426,26,Life expectancy,3712,Sweetwater County (Wyoming),56037.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.282640,76.697873,75.845954
363432,26,Life expectancy,3697,Teton County (Wyoming),56039.0,1,Total,3,Both,28,<1 year,2003,1,Number,82.121084,82.802580,81.531829
363438,26,Life expectancy,3714,Uinta County (Wyoming),56041.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.259355,76.913164,75.626777
363444,26,Life expectancy,3700,Washakie County (Wyoming),56043.0,1,Total,3,Both,28,<1 year,2003,1,Number,77.543467,78.261629,76.803990
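The same selection can also be written with `DataFrame.query`, which some find easier to read. The toy frame below is made up and only checks that the boolean-mask form used above and the `query` form select the same rows.

```python
import pandas as pd

# Made-up toy rows; the non-"<1 year" label is a placeholder.
toy = pd.DataFrame({
    "race_name": ["Total", "Total", "Latino", "Total"],
    "age_name": ["<1 year", "1 to 4", "<1 year", "<1 year"],
    "val": [77.1, 76.6, 80.2, 74.3],
})

# Boolean-mask version, as in the cell above.
mask_version = toy.loc[(toy["race_name"] == "Total") & (toy["age_name"] == "<1 year")]
# Equivalent string-expression version.
query_version = toy.query("race_name == 'Total' and age_name == '<1 year'")

print(mask_version.equals(query_version))
```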


In [4]:
## Drop rows with missing values (this removes the national row, which has no FIPS code).

df3=df2.dropna()
df3

,measure_id,measure_name,location_id,location_name,fips,race_id,race_name,sex_id,sex_name,age_group_id,age_name,year,metric_id,metric_name,val,upper,lower
114,26,Life expectancy,523,Alabama,1.0,1,Total,3,Both,28,<1 year,2003,1,Number,74.294151,74.351420,74.237475
228,26,Life expectancy,614,Autauga County (Alabama),1001.0,1,Total,3,Both,28,<1 year,2003,1,Number,74.628765,75.048044,74.217372
234,26,Life expectancy,637,Baldwin County (Alabama),1003.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.661419,76.917529,76.416103
240,26,Life expectancy,624,Barbour County (Alabama),1005.0,1,Total,3,Both,28,<1 year,2003,1,Number,74.047811,74.496751,73.586544
246,26,Life expectancy,603,Bibb County (Alabama),1007.0,1,Total,3,Both,28,<1 year,2003,1,Number,73.057987,73.643055,72.502644
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
363426,26,Life expectancy,3712,Sweetwater County (Wyoming),56037.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.282640,76.697873,75.845954
363432,26,Life expectancy,3697,Teton County (Wyoming),56039.0,1,Total,3,Both,28,<1 year,2003,1,Number,82.121084,82.802580,81.531829
363438,26,Life expectancy,3714,Uinta County (Wyoming),56041.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.259355,76.913164,75.626777
363444,26,Life expectancy,3700,Washakie County (Wyoming),56043.0,1,Total,3,Both,28,<1 year,2003,1,Number,77.543467,78.261629,76.803990
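`dropna()` with no arguments removes a row if *any* of its columns is missing. Here that drops the national row because its `fips` is empty, which is the intent; restricting the check with `subset=["fips"]` states that intent explicitly and avoids silently losing rows that have a gap in some unrelated column. A toy illustration with made-up values:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "location_name": ["United States of America", "Alabama", "Autauga County (Alabama)"],
    "fips": [np.nan, 1.0, 1001.0],
    "upper": [77.12, np.nan, 75.05],  # hypothetical gap in an unrelated column
})

print(len(toy.dropna()))                 # 1: drops the US row and the Alabama row
print(len(toy.dropna(subset=["fips"])))  # 2: drops only the US row
```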


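#### Each state and each county has its own unique FIPS code. State codes have one or two digits (the largest here is 56, Wyoming).
#### A county code is the two-digit state code followed by a three-digit county code. Because `fips` is stored as a number, leading zeros are dropped, so county codes here have four or five digits (e.g. 1001 for Autauga County, Alabama).
#### FIPS codes are published by the U.S. Census Bureau.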
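Since a county code is the state code followed by three county digits, integer division by 1000 recovers the state and the remainder gives the county part, and zero-padding restores the standard five-character code. A sketch using three codes from the outputs in this notebook:

```python
import pandas as pd

fips = pd.Series([1001.0, 1003.0, 56043.0])  # Autauga (AL), Baldwin (AL), Washakie (WY)

codes = fips.astype(int)
state_code = codes // 1000                   # two-digit state part
county_code = codes % 1000                   # three-digit county part
fips_str = codes.astype(str).str.zfill(5)    # restore leading zeros

print(pd.DataFrame({"fips": fips_str, "state": state_code, "county": county_code}))
```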

In [5]:
## The dataframe also contains state-level rows. The largest state FIPS code is 56 (Wyoming) and the smallest county code is 1001,
## so keeping rows with fips > 60 leaves only the county-level data.

df4=df3.loc[(df3['fips'] > 60)]
df4

,measure_id,measure_name,location_id,location_name,fips,race_id,race_name,sex_id,sex_name,age_group_id,age_name,year,metric_id,metric_name,val,upper,lower
228,26,Life expectancy,614,Autauga County (Alabama),1001.0,1,Total,3,Both,28,<1 year,2003,1,Number,74.628765,75.048044,74.217372
234,26,Life expectancy,637,Baldwin County (Alabama),1003.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.661419,76.917529,76.416103
240,26,Life expectancy,624,Barbour County (Alabama),1005.0,1,Total,3,Both,28,<1 year,2003,1,Number,74.047811,74.496751,73.586544
246,26,Life expectancy,603,Bibb County (Alabama),1007.0,1,Total,3,Both,28,<1 year,2003,1,Number,73.057987,73.643055,72.502644
252,26,Life expectancy,588,Blount County (Alabama),1009.0,1,Total,3,Both,28,<1 year,2003,1,Number,75.053119,75.408324,74.692248
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
363426,26,Life expectancy,3712,Sweetwater County (Wyoming),56037.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.282640,76.697873,75.845954
363432,26,Life expectancy,3697,Teton County (Wyoming),56039.0,1,Total,3,Both,28,<1 year,2003,1,Number,82.121084,82.802580,81.531829
363438,26,Life expectancy,3714,Uinta County (Wyoming),56041.0,1,Total,3,Both,28,<1 year,2003,1,Number,76.259355,76.913164,75.626777
363444,26,Life expectancy,3700,Washakie County (Wyoming),56043.0,1,Total,3,Both,28,<1 year,2003,1,Number,77.543467,78.261629,76.803990
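Any cutoff strictly between the largest state code (56) and the smallest county code (1001) yields the same split. A quick check on a small made-up mix of codes, comparing the `> 60` rule with a `>= 1000` rule:

```python
import pandas as pd

fips = pd.Series([1.0, 56.0, 1001.0, 1003.0, 56043.0])  # two states, three counties

rule_60 = fips > 60
rule_1000 = fips >= 1000

print(rule_60.equals(rule_1000))  # True: both keep only the county codes
print(fips[rule_60].tolist())     # [1001.0, 1003.0, 56043.0]
```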


In [6]:
## List the column names so we can decide which columns to drop.

col_names=list(df4.columns)
col_names

['measure_id',
 'measure_name',
 'location_id',
 'location_name',
 'fips',
 'race_id',
 'race_name',
 'sex_id',
 'sex_name',
 'age_group_id',
 'age_name',
 'year',
 'metric_id',
 'metric_name',
 'val',
 'upper',
 'lower']

#### The mean life expectancy (the `val` column) will be our target variable, which we will predict from a set of predictor variables.
#### These predictor variables include:
#### (a) Weather and meteorological data
#### (b) Demographics
#### (c) Livestock data
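Once predictor tables exist, one way to line them up with the target is a join on the county FIPS code. The predictor table `weather` and its column `mean_temp` below are hypothetical placeholders with made-up values (no predictor data is loaded in this notebook); the sketch only shows the join mechanics, with `validate="one_to_one"` raising an error if either side has duplicate counties.

```python
import pandas as pd

# Target values taken from the county output in this notebook.
target = pd.DataFrame({
    "fips": [1001.0, 1003.0, 56043.0],
    "MeanLifeExpectency": [74.628765, 76.661419, 77.543467],
})
# Hypothetical predictor table keyed by FIPS (values are made up).
weather = pd.DataFrame({
    "fips": [1001.0, 1003.0, 56043.0],
    "mean_temp": [18.0, 19.5, 7.0],
})

model_df = target.merge(weather, on="fips", how="inner", validate="one_to_one")
print(model_df)
```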

In [7]:
## Drop the identifier columns that are constant after filtering, the redundant location_id, and the upper/lower bounds.
df5 = df4.drop(['measure_id',
 'location_id',              
 'measure_name',
 'race_id',
 'race_name',
 'sex_id',
 'sex_name',
 'age_group_id',
 'age_name',
 'metric_id',
 'metric_name',
 'upper',
 'lower'], axis=1)

## Rename the column val to MeanLifeExpectency.
df5 = df5.rename(columns={'val': 'MeanLifeExpectency'})
df5


,location_name,fips,year,MeanLifeExpectency
228,Autauga County (Alabama),1001.0,2003,74.628765
234,Baldwin County (Alabama),1003.0,2003,76.661419
240,Barbour County (Alabama),1005.0,2003,74.047811
246,Bibb County (Alabama),1007.0,2003,73.057987
252,Blount County (Alabama),1009.0,2003,75.053119
...,...,...,...,...
363426,Sweetwater County (Wyoming),56037.0,2003,76.282640
363432,Teton County (Wyoming),56039.0,2003,82.121084
363438,Uinta County (Wyoming),56041.0,2003,76.259355
363444,Washakie County (Wyoming),56043.0,2003,77.543467
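Instead of listing the thirteen columns to drop, the same result can be reached by selecting the four columns to keep, which stays correct even if another year's file carries extra columns. A sketch on a made-up one-row frame:

```python
import pandas as pd

toy = pd.DataFrame({
    "measure_id": [26], "location_name": ["Autauga County (Alabama)"],
    "fips": [1001.0], "year": [2003], "val": [74.628765],
    "upper": [75.048044], "lower": [74.217372],
})

# Select the columns to keep, then rename the target column.
keep = ["location_name", "fips", "year", "val"]
slim = toy[keep].rename(columns={"val": "MeanLifeExpectency"})
print(list(slim.columns))
```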


In [8]:
df5.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3127 entries, 228 to 363450
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   location_name       3127 non-null   object 
 1   fips                3127 non-null   float64
 2   year                3127 non-null   int64  
 3   MeanLifeExpectency  3127 non-null   float64
dtypes: float64(2), int64(1), object(1)
memory usage: 122.1+ KB
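`info()` shows `fips` as `float64`, a leftover of the missing value in the national row. With no missing codes left, it could be cast to an integer type so codes print as 1001 rather than 1001.0; casting a column that still contains NaN to `int64` raises an error, so the sketch checks first.

```python
import pandas as pd

fips = pd.Series([1001.0, 1003.0, 56043.0])

# Casting NaN to int64 fails, so confirm there are no missing codes.
assert fips.notna().all()
fips_int = fips.astype("int64")
print(fips_int.dtype, fips_int.tolist())  # int64 [1001, 1003, 56043]
```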


In [9]:
df5.describe()

,fips,year,MeanLifeExpectency
count,3127.0,3127.0,3127.0
mean,30297.793092,2003.0,76.335131
std,15250.52008,0.0,2.224229
min,1001.0,2003.0,65.461621
25%,18158.0,2003.0,74.80749
50%,29149.0,2003.0,76.471748
75%,45082.0,2003.0,77.949523
max,56045.0,2003.0,87.032539
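`describe()` reports the extremes (65.46 and 87.03 years) but not which counties they belong to; `idxmin` and `idxmax` return the index labels of those rows. A sketch on three rows copied from the outputs above (so the results are the extremes of these three rows only, not of the full table):

```python
import pandas as pd

toy = pd.DataFrame(
    {"location_name": ["Bibb County (Alabama)", "Teton County (Wyoming)",
                       "Baldwin County (Alabama)"],
     "MeanLifeExpectency": [73.057987, 82.121084, 76.661419]},
    index=[246, 363432, 234],
)

lowest = toy.loc[toy["MeanLifeExpectency"].idxmin(), "location_name"]
highest = toy.loc[toy["MeanLifeExpectency"].idxmax(), "location_name"]
print(lowest, "|", highest)
```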


In [10]:
df5.to_csv('LE_2003.csv')
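By default `to_csv` also writes the row index as an unnamed first column, which `read_csv` brings back as `Unnamed: 0`. The in-memory round trip below (a made-up one-row frame written to `io.StringIO` instead of a file) shows the difference `index=False` makes; alternatively, the saved file can be read back with `index_col=0`.

```python
import io

import pandas as pd

df = pd.DataFrame({"fips": [1001.0], "MeanLifeExpectency": [74.628765]}, index=[228])

# Default index=True: the row label is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written.
no_index = io.StringIO()
df.to_csv(no_index, index=False)
no_index.seek(0)
cols_without = list(pd.read_csv(no_index).columns)

print(cols_with)     # ['Unnamed: 0', 'fips', 'MeanLifeExpectency']
print(cols_without)  # ['fips', 'MeanLifeExpectency']
```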