#### We will estimate the life expectancy of people in the under-1-year age group; loosely speaking, this is life expectancy at birth.

#### Let's first load the data for a single year, so that we can develop the process on a small dataset before applying it to the full dataset.

In [1]:
## importing necessary libraries

import os
import pandas as pd

In [2]:
## Read the data.
## To read the data for a different year, replace the CSV file name with that year's file.

DATA_PATH=os.path.join("Life_Expectency_Data", 
                       "IHME_USA_COD_COUNTY_RACE_ETHN_2000_2019_LT_2000_ALL_BOTH_Y2023M06D12.csv")
df1 = pd.read_csv(DATA_PATH)
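Switching years by hand-editing the file name is easy to get wrong. A minimal sketch of a path helper follows, assuming (this is not confirmed by any documentation) that every yearly file differs from the 2000 file only in its `LT_<year>` part, and that the `2000_2019` in the name means the data covers 2000 to 2019. The names `life_table_path` and `FILE_TEMPLATE` are hypothetical.

```python
import os

# Assumption: yearly files share this name and differ only in "LT_<year>".
DATA_DIR = "Life_Expectency_Data"
FILE_TEMPLATE = ("IHME_USA_COD_COUNTY_RACE_ETHN_2000_2019_LT_{year}"
                 "_ALL_BOTH_Y2023M06D12.csv")


def life_table_path(year: int) -> str:
    """Return the assumed path of the life-table CSV for one year (hypothetical helper)."""
    # Assumption: "2000_2019" in the file name means only these years exist.
    if not 2000 <= year <= 2019:
        raise ValueError(f"year {year} is outside 2000-2019")
    return os.path.join(DATA_DIR, FILE_TEMPLATE.format(year=year))


# Usage: df1 = pd.read_csv(life_table_path(2005))
```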

#### The CSV file contains life expectancy for a total of 19 age groups:
#### <1 year, 1 to 4, 5 to 9, 10 to 14, 15 to 19, 20 to 24, 25 to 29, 30 to 34, 35 to 39, 40 to 44,
#### 45 to 49, 50 to 54, 55 to 59, 60 to 64, 65 to 69, 70 to 74, 75 to 79, 80 to 84 and 85+. <br/><br/>

#### The data is also broken down into the following race and ethnicity groups:
#### (a) Total
#### (b) Latino
#### (c) Black
#### (d) White
#### (e) AIAN (Non-Hispanic American Indian or Alaska Native)
#### (f) API (not defined in the documentation; presumably Non-Hispanic Asian or Pacific Islander) <br/><br/>
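Rather than relying on the lists above, the exact labels can be read from the data itself. A minimal sketch follows; `category_summary` is a hypothetical helper, and the toy frame is made up to show the call (only `<1 year` and `Total` are labels confirmed by the output below; the others are assumed to be spelled as in the lists above).

```python
import pandas as pd


def category_summary(df: pd.DataFrame, columns=("age_name", "race_name")) -> dict:
    """Map each column name to the sorted list of its distinct values."""
    return {col: sorted(df[col].dropna().unique()) for col in columns}


# Toy stand-in for df1 (made-up rows, only to demonstrate the call).
toy = pd.DataFrame({
    "age_name": ["<1 year", "1 to 4", "<1 year", "85+"],
    "race_name": ["Total", "Total", "Latino", "Black"],
})
summary = category_summary(toy)

# On the real data: category_summary(df1), or df1["age_name"].nunique()
```

Note that the values are sorted as strings, so `<1 year` sorts after labels beginning with digits.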

#### To get started, we will use only the Total group and the <1 year age group.



In [3]:
## Keep only the rows for the 'Total' race group and the '<1 year' age group.

df2=df1.loc[(df1['race_name'] == 'Total') & (df1['age_name'] == '<1 year')]
df2

| | measure_id | measure_name | location_id | location_name | fips | race_id | race_name | sex_id | sex_name | age_group_id | age_name | year | metric_id | metric_name | val | upper | lower |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 | Life expectancy | 102 | United States of America | | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.767298 | 76.792345 | 76.743639 |
| 114 | 26 | Life expectancy | 523 | Alabama | 1.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 74.353468 | 74.443105 | 74.266502 |
| 228 | 26 | Life expectancy | 614 | Autauga County (Alabama) | 1001.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 74.712886 | 75.310522 | 74.181133 |
| 234 | 26 | Life expectancy | 637 | Baldwin County (Alabama) | 1003.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.699950 | 77.074936 | 76.347986 |
| 240 | 26 | Life expectancy | 624 | Barbour County (Alabama) | 1005.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 74.009403 | 74.632950 | 73.436558 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 363426 | 26 | Life expectancy | 3712 | Sweetwater County (Wyoming) | 56037.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.180476 | 76.745672 | 75.641556 |
| 363432 | 26 | Life expectancy | 3697 | Teton County (Wyoming) | 56039.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 81.747642 | 82.590751 | 81.013434 |
| 363438 | 26 | Life expectancy | 3714 | Uinta County (Wyoming) | 56041.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.098530 | 76.926319 | 75.319539 |
| 363444 | 26 | Life expectancy | 3700 | Washakie County (Wyoming) | 56043.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 77.274227 | 78.111317 | 76.425021 |


In [4]:
## Drop rows with missing values (here, the national row, which has no fips code).

df3=df2.dropna()
df3

| | measure_id | measure_name | location_id | location_name | fips | race_id | race_name | sex_id | sex_name | age_group_id | age_name | year | metric_id | metric_name | val | upper | lower |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 114 | 26 | Life expectancy | 523 | Alabama | 1.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 74.353468 | 74.443105 | 74.266502 |
| 228 | 26 | Life expectancy | 614 | Autauga County (Alabama) | 1001.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 74.712886 | 75.310522 | 74.181133 |
| 234 | 26 | Life expectancy | 637 | Baldwin County (Alabama) | 1003.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.699950 | 77.074936 | 76.347986 |
| 240 | 26 | Life expectancy | 624 | Barbour County (Alabama) | 1005.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 74.009403 | 74.632950 | 73.436558 |
| 246 | 26 | Life expectancy | 603 | Bibb County (Alabama) | 1007.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 73.201889 | 73.930087 | 72.493426 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 363426 | 26 | Life expectancy | 3712 | Sweetwater County (Wyoming) | 56037.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.180476 | 76.745672 | 75.641556 |
| 363432 | 26 | Life expectancy | 3697 | Teton County (Wyoming) | 56039.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 81.747642 | 82.590751 | 81.013434 |
| 363438 | 26 | Life expectancy | 3714 | Uinta County (Wyoming) | 56041.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.098530 | 76.926319 | 75.319539 |
| 363444 | 26 | Life expectancy | 3700 | Washakie County (Wyoming) | 56043.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 77.274227 | 78.111317 | 76.425021 |


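#### Each state and each county has its own unique FIPS code. State FIPS codes have one or two digits (1 to 56 in this data).
#### County FIPS codes consist of the state code followed by a three-digit county code, so they have four or five digits here (e.g. 1001 is Alabama (1) + Autauga County (001); the leading zero is lost because the column is numeric).
#### Reference lists of FIPS codes are published by the U.S. Census Bureau.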
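Because a county FIPS code is the state code times 1000 plus a three-digit county code, integer arithmetic recovers both parts even after the leading zero has been lost. A minimal sketch; the helper names `split_county_fips` and `fips_to_string` are hypothetical.

```python
def split_county_fips(fips: float) -> tuple:
    """Split a numeric county FIPS code into (state code, county code).

    County FIPS = 2-digit state code + 3-digit county code. Read as a number,
    the leading zero of states 01-09 is dropped (01001 -> 1001), but integer
    division by 1000 still separates the two parts.
    """
    code = int(fips)
    return code // 1000, code % 1000


def fips_to_string(fips: float) -> str:
    """Format a numeric county FIPS code as the usual 5-character string."""
    return str(int(fips)).zfill(5)


# Usage: split_county_fips(1001.0) gives (1, 1); fips_to_string(1001.0) gives "01001".
```

The zero-padded string form is handy when joining with other county-level tables that store FIPS codes as text.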

In [5]:
## The dataframe also contains state-level rows. State FIPS codes end at 56 and
## county codes start at 1001, so keeping rows with fips > 60 leaves only the
## county-level data.

df4=df3.loc[(df3['fips'] > 60)]
df4

| | measure_id | measure_name | location_id | location_name | fips | race_id | race_name | sex_id | sex_name | age_group_id | age_name | year | metric_id | metric_name | val | upper | lower |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 228 | 26 | Life expectancy | 614 | Autauga County (Alabama) | 1001.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 74.712886 | 75.310522 | 74.181133 |
| 234 | 26 | Life expectancy | 637 | Baldwin County (Alabama) | 1003.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.699950 | 77.074936 | 76.347986 |
| 240 | 26 | Life expectancy | 624 | Barbour County (Alabama) | 1005.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 74.009403 | 74.632950 | 73.436558 |
| 246 | 26 | Life expectancy | 603 | Bibb County (Alabama) | 1007.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 73.201889 | 73.930087 | 72.493426 |
| 252 | 26 | Life expectancy | 588 | Blount County (Alabama) | 1009.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 75.274624 | 75.773239 | 74.771706 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 363426 | 26 | Life expectancy | 3712 | Sweetwater County (Wyoming) | 56037.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.180476 | 76.745672 | 75.641556 |
| 363432 | 26 | Life expectancy | 3697 | Teton County (Wyoming) | 56039.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 81.747642 | 82.590751 | 81.013434 |
| 363438 | 26 | Life expectancy | 3714 | Uinta County (Wyoming) | 56041.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 76.098530 | 76.926319 | 75.319539 |
| 363444 | 26 | Life expectancy | 3700 | Washakie County (Wyoming) | 56043.0 | 1 | Total | 3 | Both | 28 | <1 year | 2000 | 1 | Number | 77.274227 | 78.111317 | 76.425021 |
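A quick sanity check on the `fips > 60` filter: every remaining code should be at least 1001, and integer division by 1000 gives each county's state code, so counties can be counted per state. A minimal sketch; `check_county_rows` is a hypothetical helper and the toy frame is illustrative only.

```python
import pandas as pd


def check_county_rows(df: pd.DataFrame) -> pd.Series:
    """Assert that only county rows remain, then count counties per state code."""
    # County codes are state code * 1000 + county code, so all are >= 1001.
    assert (df["fips"] >= 1001).all(), "state-level rows are still present"
    # Integer division by 1000 recovers the state code of each county.
    state_code = (df["fips"] // 1000).astype(int)
    return df.groupby(state_code).size()


toy = pd.DataFrame({"fips": [1001.0, 1003.0, 56037.0]})
counts = check_county_rows(toy)

# On the real data: check_county_rows(df4)
```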


In [6]:
## List the column names to decide which columns to drop.

col_names=list(df4.columns)
col_names

['measure_id',
 'measure_name',
 'location_id',
 'location_name',
 'fips',
 'race_id',
 'race_name',
 'sex_id',
 'sex_name',
 'age_group_id',
 'age_name',
 'year',
 'metric_id',
 'metric_name',
 'val',
 'upper',
 'lower']

#### The mean life expectancy will be our target variable, which we will predict using a set of predictor variables.
#### These predictor variables include:
#### (a) Weather and meteorological data
#### (b) Demographics
#### (c) Livestock data
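Predictor tables will need to be joined to the target table. One plausible way, sketched under assumptions: each predictor table has one row per county and year keyed by `fips` and `year` with the same dtypes as here. The helper `attach_predictors` and the column `mean_temp` are hypothetical and invented for illustration.

```python
import pandas as pd


def attach_predictors(target: pd.DataFrame, predictors: pd.DataFrame) -> pd.DataFrame:
    """Left-join predictor columns onto the target table by county and year."""
    return pd.merge(
        target,
        predictors,
        on=["fips", "year"],
        how="left",              # keep every county in the target table
        validate="one_to_one",   # raise if a county-year appears twice on either side
    )


target = pd.DataFrame({"fips": [1001.0, 1003.0], "year": [2000, 2000],
                       "MeanLifeExpectency": [74.712886, 76.699950]})
# Hypothetical predictor table; "mean_temp" is an invented column name.
weather = pd.DataFrame({"fips": [1001.0], "year": [2000], "mean_temp": [18.0]})
merged = attach_predictors(target, weather)
```

Counties without predictor data end up with `NaN` in the predictor columns, which makes gaps easy to spot. If a predictor source stores FIPS codes as zero-padded strings, convert one side first so the key dtypes match.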

In [7]:
## Drop the columns we don't need, keeping location_name, fips, year and val.

df5 = df4.drop(['measure_id',
                'location_id',
                'measure_name',
                'race_id',
                'race_name',
                'sex_id',
                'sex_name',
                'age_group_id',
                'age_name',
                'metric_id',
                'metric_name',
                'upper',
                'lower'], axis=1)

## Rename the column 'val' to 'MeanLifeExpectency'.

df5 = df5.rename(columns={'val': 'MeanLifeExpectency'})
df5


| | location_name | fips | year | MeanLifeExpectency |
|---|---|---|---|---|
| 228 | Autauga County (Alabama) | 1001.0 | 2000 | 74.712886 |
| 234 | Baldwin County (Alabama) | 1003.0 | 2000 | 76.699950 |
| 240 | Barbour County (Alabama) | 1005.0 | 2000 | 74.009403 |
| 246 | Bibb County (Alabama) | 1007.0 | 2000 | 73.201889 |
| 252 | Blount County (Alabama) | 1009.0 | 2000 | 75.274624 |
| ... | ... | ... | ... | ... |
| 363426 | Sweetwater County (Wyoming) | 56037.0 | 2000 | 76.180476 |
| 363432 | Teton County (Wyoming) | 56039.0 | 2000 | 81.747642 |
| 363438 | Uinta County (Wyoming) | 56041.0 | 2000 | 76.098530 |
| 363444 | Washakie County (Wyoming) | 56043.0 | 2000 | 77.274227 |
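Since only four columns survive, selecting them directly is shorter than listing the thirteen to drop, and it keeps working if the raw file gains extra columns. A minimal sketch of that alternative; `select_target_columns` and `KEEP` are hypothetical names, and the toy row reuses values from the output above.

```python
import pandas as pd

# Columns kept in this notebook; everything else is discarded.
KEEP = ["location_name", "fips", "year", "val"]


def select_target_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the needed columns and rename 'val' to 'MeanLifeExpectency'."""
    return df[KEEP].rename(columns={"val": "MeanLifeExpectency"})


toy = pd.DataFrame({"location_name": ["Autauga County (Alabama)"], "fips": [1001.0],
                    "year": [2000], "val": [74.712886], "upper": [75.310522],
                    "lower": [74.181133]})
slim = select_target_columns(toy)

# On the real data: df5 = select_target_columns(df4)
```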


In [8]:
df5.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3127 entries, 228 to 363450
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   location_name       3127 non-null   object 
 1   fips                3127 non-null   float64
 2   year                3127 non-null   int64  
 3   MeanLifeExpectency  3127 non-null   float64
dtypes: float64(2), int64(1), object(1)
memory usage: 122.1+ KB
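`fips` is stored as `float64`, most likely because the national row had no FIPS code and pandas represents a missing value in a numeric column as `NaN`, which forces a float dtype. Once those rows are gone, the codes can be stored as integers. A minimal sketch; `fips_as_int` is a hypothetical helper that returns a copy instead of modifying a possible slice of `df4` in place.

```python
import pandas as pd


def fips_as_int(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with fips stored as int64 instead of float64."""
    if df["fips"].isna().any():
        raise ValueError("fips still contains missing values; drop them first")
    return df.assign(fips=df["fips"].astype("int64"))


toy = pd.DataFrame({"fips": [1001.0, 56045.0], "year": [2000, 2000]})
converted = fips_as_int(toy)

# On the real data: df5 = fips_as_int(df5)
```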


In [9]:
df5.describe()

| | fips | year | MeanLifeExpectency |
|---|---|---|---|
| count | 3127.0 | 3127.0 | 3127.0 |
| mean | 30297.793092 | 2000.0 | 76.160477 |
| std | 15250.52008 | 0.0 | 2.134849 |
| min | 1001.0 | 2000.0 | 65.326742 |
| 25% | 18158.0 | 2000.0 | 74.701514 |
| 50% | 29149.0 | 2000.0 | 76.284554 |
| 75% | 45082.0 | 2000.0 | 77.688618 |
| max | 56045.0 | 2000.0 | 86.566399 |
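`describe()` reports the minimum and maximum life expectancy but not which counties they belong to. `idxmin()` and `idxmax()` return the index labels of those rows, so `.loc` can look them up. A minimal sketch; `extreme_counties` is a hypothetical helper, and the toy rows are taken from the output above (they are not the true extremes of the full data).

```python
import pandas as pd


def extreme_counties(df: pd.DataFrame, col: str = "MeanLifeExpectency") -> pd.DataFrame:
    """Return the rows with the lowest and highest value of `col`, in that order."""
    return df.loc[[df[col].idxmin(), df[col].idxmax()], ["location_name", "fips", col]]


toy = pd.DataFrame({
    "location_name": ["Bibb County (Alabama)", "Teton County (Wyoming)",
                      "Autauga County (Alabama)"],
    "fips": [1007.0, 56039.0, 1001.0],
    "MeanLifeExpectency": [73.201889, 81.747642, 74.712886],
})
ext = extreme_counties(toy)

# On the real data: extreme_counties(df5)
```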


In [10]:
#df5.to_csv('Single_year_Dataframe.csv')
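To apply the same steps to the other years, the cells above can be wrapped in one function. A minimal sketch, assuming each yearly file has the same columns as the 2000 file; `process_year` is a hypothetical name. When saving, `index=False` keeps pandas from writing the row index as an extra column, which otherwise comes back as `Unnamed: 0` when the file is read again.

```python
import pandas as pd


def process_year(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the steps above to one year's raw life-table frame."""
    rows = raw.loc[(raw["race_name"] == "Total") & (raw["age_name"] == "<1 year")]
    rows = rows.dropna()                # drops rows with missing values, e.g. the national row
    rows = rows.loc[rows["fips"] > 60]  # keep counties, drop states
    return (rows[["location_name", "fips", "year", "val"]]
            .rename(columns={"val": "MeanLifeExpectency"}))


# Usage (the list `paths` of yearly CSV files is a placeholder):
# frames = [process_year(pd.read_csv(p)) for p in paths]
# all_years = pd.concat(frames, ignore_index=True)
# all_years.to_csv("All_years_Dataframe.csv", index=False)
```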