# The Associated Press and Life Expectancy

**Story:** [AP analysis: Unemployment, income affect life expectancy](https://www.apnews.com/66ac44186b6249709501f07a7eab36da)

**Author:** Nicky Forster, Associated Press

**Topics:** Census Data, Linear Regression

**Datasets**

* **R12221544_SL140.csv:** ACS 2015 5-year, tract level, from [Social Explorer](https://www.socialexplorer.com)
    - Table B23025: Employment Status
    - **R12221544.txt** is the data dictionary
* **R12221550_SL140.csv:** ACS 2015 5-year, tract level, from [Social Explorer](https://www.socialexplorer.com)
    - Table B23025: Employment Status
    - Table B06009: Educational Attainment
    - Table B03002: Race
    - Table B19013: Median income
    - Table C17002: Ratio of income to poverty level
    - **R12221550.txt** is the data dictionary
* **US_A.CSV:** life expectancy by census tract, from [USALEEP](https://www.cdc.gov/nchs/nvss/usaleep/usaleep.html)
    - **Record_Layout_CensusTract_Life_Expectancy.pdf** is the data dictionary

# What's the story?

We're trying to figure out how the **life expectancy in a census tract** is related to other factors, such as unemployment and income.

# PREPWORK BONUS!

Download the data yourself from Social Explorer and USALEEP (linked above) instead of relying on the data included.

In [14]:
import pandas as pd

pd.set_option('display.max_columns', 500)

import statsmodels.api as sm


## Reading in our data

### Read in `US_A.CSV`

Rename any columns with weird or hard-to-understand names to something more descriptive.

In [2]:
life = pd.read_csv('data/US_A.CSV')

life = life.rename(columns={"e(0)": "life_expectancy", "se(e(0))":"life_expect_se"})

life.head(5)

Unnamed: 0,Tract ID,STATE2KX,CNTY2KX,TRACT2KX,life_expectancy,life_expect_se,Abridged life table flag
0,1001020100,1,1,20100,73.1,2.2348,3
1,1001020200,1,1,20200,76.9,3.3453,3
2,1001020400,1,1,20400,75.4,1.0216,3
3,1001020500,1,1,20500,79.4,1.1768,1
4,1001020600,1,1,20600,73.1,1.5519,3


### Open `R12221544_SL140.csv`

You'll need to give an option to `pd.read_csv` to make sure it's read in successfully.

In [15]:
sl = pd.read_csv('data/R12221544_SL140.csv', encoding="Latin-1")
sl.head(5)

Unnamed: 0,Geo_FIPS,Geo_GEOID,Geo_NAME,Geo_QName,Geo_STUSAB,Geo_SUMLEV,Geo_GEOCOMP,Geo_FILEID,Geo_LOGRECNO,Geo_US,Geo_REGION,Geo_DIVISION,Geo_STATECE,Geo_STATE,Geo_COUNTY,Geo_COUSUB,Geo_PLACE,Geo_PLACESE,Geo_TRACT,Geo_BLKGRP,Geo_CONCIT,Geo_AIANHH,Geo_AIANHHFP,Geo_AIHHTLI,Geo_AITSCE,Geo_AITS,Geo_ANRC,Geo_CBSA,Geo_CSA,Geo_METDIV,Geo_MACC,Geo_MEMI,Geo_NECTA,Geo_CNECTA,Geo_NECTADIV,Geo_UA,Geo_UACP,Geo_CDCURR,Geo_SLDU,Geo_SLDL,Geo_VTD,Geo_ZCTA3,Geo_ZCTA5,Geo_SUBMCD,Geo_SDELM,Geo_SDSEC,Geo_SDUNI,Geo_UR,Geo_PCI,Geo_TAZ,Geo_UGA,Geo_BTTR,Geo_BTBG,Geo_PUMA5,Geo_PUMA1,ACS15_5yr_B23025001,ACS15_5yr_B23025002,ACS15_5yr_B23025003,ACS15_5yr_B23025004,ACS15_5yr_B23025005,ACS15_5yr_B23025006,ACS15_5yr_B23025007,ACS15_5yr_B23025001s,ACS15_5yr_B23025002s,ACS15_5yr_B23025003s,ACS15_5yr_B23025004s,ACS15_5yr_B23025005s,ACS15_5yr_B23025006s,ACS15_5yr_B23025007s
0,1001020100,14000US01001020100,"Census Tract 201, Autauga County, Alabama","Census Tract 201, Autauga County, Alabama",al,140,0,ACSSF,1760,,,,,1,1,,,,20100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1554,997,997,943,54,0,557,92.121212,85.454545,85.454545,83.636364,18.787879,6.666667,67.878788
1,1001020200,14000US01001020200,"Census Tract 202, Autauga County, Alabama","Census Tract 202, Autauga County, Alabama",al,140,0,ACSSF,1761,,,,,1,1,,,,20200,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1731,884,869,753,116,15,847,143.030303,115.151515,114.545455,107.272727,38.181818,14.545455,86.666667
2,1001020300,14000US01001020300,"Census Tract 203, Autauga County, Alabama","Census Tract 203, Autauga County, Alabama",al,140,0,ACSSF,1762,,,,,1,1,,,,20300,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2462,1472,1464,1373,91,8,990,169.090909,132.121212,134.545455,123.030303,31.515152,8.484848,120.606061
3,1001020400,14000US01001020400,"Census Tract 204, Autauga County, Alabama","Census Tract 204, Autauga County, Alabama",al,140,0,ACSSF,1763,,,,,1,1,,,,20400,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3424,2013,1998,1782,216,15,1411,197.575758,157.575758,161.818182,132.121212,58.787879,14.545455,127.878788
4,1001020500,14000US01001020500,"Census Tract 205, Autauga County, Alabama","Census Tract 205, Autauga County, Alabama",al,140,0,ACSSF,1764,,,,,1,1,,,,20500,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8198,5461,5258,5037,221,203,2737,321.818182,339.393939,356.969697,369.090909,89.090909,103.030303,273.939394
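
The option in question is `encoding`. Here is a minimal sketch of why it matters, using a made-up one-row CSV held in memory (the FIPS code and columns are illustrative, not taken from the Social Explorer file). Bytes written as Latin-1 fail to decode under pandas' default UTF-8 but read cleanly once the encoding is named.

```python
import io
import pandas as pd

# Hypothetical sample: one accented character is enough to break UTF-8 decoding
raw = "Geo_FIPS,Geo_NAME\n99999000100,Doña Ana\n".encode("latin-1")

try:
    pd.read_csv(io.BytesIO(raw))  # default encoding is UTF-8
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Naming the encoding lets pandas decode the accented character
sample = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(utf8_ok, sample.loc[0, "Geo_NAME"])
```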


#### Filter out any columns we aren't interested in

In [4]:
sl.info()
# Keep Geo_FIPS (the tract identifier) and drop every other Geo_* column

sl = sl.drop(['Geo_GEOID','Geo_NAME','Geo_QName','Geo_STUSAB','Geo_SUMLEV','Geo_GEOCOMP','Geo_FILEID'], axis=1)
sl = sl.drop(['Geo_LOGRECNO','Geo_US','Geo_REGION','Geo_DIVISION','Geo_STATECE','Geo_STATE','Geo_COUNTY'], axis=1)
sl = sl.drop(['Geo_COUSUB','Geo_PLACE','Geo_PLACESE','Geo_TRACT','Geo_BLKGRP','Geo_CONCIT','Geo_AIANHH'], axis=1)
sl = sl.drop(['Geo_AIANHHFP','Geo_AIHHTLI','Geo_AITSCE','Geo_AITS','Geo_ANRC','Geo_CBSA','Geo_CSA','Geo_METDIV'], axis=1)
sl = sl.drop(['Geo_MACC','Geo_MEMI','Geo_NECTA','Geo_CNECTA','Geo_NECTADIV','Geo_UA','Geo_UACP','Geo_CDCURR'], axis=1)
sl = sl.drop(['Geo_SLDU','Geo_SLDL','Geo_VTD','Geo_ZCTA3','Geo_ZCTA5','Geo_SUBMCD','Geo_SDELM','Geo_SDSEC'], axis=1)
sl = sl.drop(['Geo_SDUNI','Geo_UR','Geo_PCI','Geo_TAZ','Geo_UGA','Geo_BTTR','Geo_BTBG','Geo_PUMA5','Geo_PUMA1'], axis=1)

sl.head(5)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74001 entries, 0 to 74000
Data columns (total 69 columns):
Geo_FIPS                74001 non-null int64
Geo_GEOID               74001 non-null object
Geo_NAME                74001 non-null object
Geo_QName               74001 non-null object
Geo_STUSAB              74001 non-null object
Geo_SUMLEV              74001 non-null int64
Geo_GEOCOMP             74001 non-null int64
Geo_FILEID              74001 non-null object
Geo_LOGRECNO            74001 non-null int64
Geo_US                  0 non-null float64
Geo_REGION              0 non-null float64
Geo_DIVISION            0 non-null float64
Geo_STATECE             0 non-null float64
Geo_STATE               74001 non-null int64
Geo_COUNTY              74001 non-null int64
Geo_COUSUB              0 non-null float64
Geo_PLACE               0 non-null float64
Geo_PLACESE             0 non-null float64
Geo_TRACT               74001 non-null int64
Geo_BLKGRP              0 non-null float64
Ge

Unnamed: 0,Geo_FIPS,ACS15_5yr_B23025001,ACS15_5yr_B23025002,ACS15_5yr_B23025003,ACS15_5yr_B23025004,ACS15_5yr_B23025005,ACS15_5yr_B23025006,ACS15_5yr_B23025007,ACS15_5yr_B23025001s,ACS15_5yr_B23025002s,ACS15_5yr_B23025003s,ACS15_5yr_B23025004s,ACS15_5yr_B23025005s,ACS15_5yr_B23025006s,ACS15_5yr_B23025007s
0,1001020100,1554,997,997,943,54,0,557,92.121212,85.454545,85.454545,83.636364,18.787879,6.666667,67.878788
1,1001020200,1731,884,869,753,116,15,847,143.030303,115.151515,114.545455,107.272727,38.181818,14.545455,86.666667
2,1001020300,2462,1472,1464,1373,91,8,990,169.090909,132.121212,134.545455,123.030303,31.515152,8.484848,120.606061
3,1001020400,3424,2013,1998,1782,216,15,1411,197.575758,157.575758,161.818182,132.121212,58.787879,14.545455,127.878788
4,1001020500,8198,5461,5258,5037,221,203,2737,321.818182,339.393939,356.969697,369.090909,89.090909,103.030303,273.939394
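
The seven `drop` calls above can be written more compactly. A sketch, assuming every unwanted column starts with `Geo_` and only `Geo_FIPS` should survive. The small frame `demo` is a stand-in for `sl`, with values copied from the first row shown earlier.

```python
import pandas as pd

# Stand-in for `sl`: a few of its columns, first row only
demo = pd.DataFrame({
    "Geo_FIPS": [1001020100],
    "Geo_GEOID": ["14000US01001020100"],
    "Geo_STATE": [1],
    "ACS15_5yr_B23025001": [1554],
    "ACS15_5yr_B23025005": [54],
})

# Collect every Geo_* column except the tract identifier, then drop them at once
geo_cols = [c for c in demo.columns if c.startswith("Geo_") and c != "Geo_FIPS"]
trimmed = demo.drop(columns=geo_cols)
print(list(trimmed.columns))
```

Selecting by prefix avoids retyping dozens of column names, at the cost of relying on the naming convention staying consistent.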


#### Create a new column for percent unemployment

We'll be using the total population in the census tract as the baseline for employment.

In [5]:
# ACS15_5yr_B23025005: number unemployed
# ACS15_5yr_B23025001: table total, used as the baseline

sl["Unemployment"] = (sl['ACS15_5yr_B23025005']/sl['ACS15_5yr_B23025001'])*100
sl.head(5)

Unnamed: 0,Geo_FIPS,ACS15_5yr_B23025001,ACS15_5yr_B23025002,ACS15_5yr_B23025003,ACS15_5yr_B23025004,ACS15_5yr_B23025005,ACS15_5yr_B23025006,ACS15_5yr_B23025007,ACS15_5yr_B23025001s,ACS15_5yr_B23025002s,ACS15_5yr_B23025003s,ACS15_5yr_B23025004s,ACS15_5yr_B23025005s,ACS15_5yr_B23025006s,ACS15_5yr_B23025007s,Unemployment
0,1001020100,1554,997,997,943,54,0,557,92.121212,85.454545,85.454545,83.636364,18.787879,6.666667,67.878788,3.474903
1,1001020200,1731,884,869,753,116,15,847,143.030303,115.151515,114.545455,107.272727,38.181818,14.545455,86.666667,6.701329
2,1001020300,2462,1472,1464,1373,91,8,990,169.090909,132.121212,134.545455,123.030303,31.515152,8.484848,120.606061,3.696182
3,1001020400,3424,2013,1998,1782,216,15,1411,197.575758,157.575758,161.818182,132.121212,58.787879,14.545455,127.878788,6.308411
4,1001020500,8198,5461,5258,5037,221,203,2737,321.818182,339.393939,356.969697,369.090909,89.090909,103.030303,273.939394,2.695779
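
One thing to watch with this ratio: a tract whose B23025001 total is zero leads to a division by zero. A small sketch of how pandas handles that, using the first tract's counts from above plus one made-up empty tract (the second row is hypothetical):

```python
import pandas as pd

demo = pd.DataFrame({
    "ACS15_5yr_B23025005": [54, 0],    # unemployed; second row is hypothetical
    "ACS15_5yr_B23025001": [1554, 0],  # table total
})

# 0 / 0 yields NaN in pandas instead of raising an error
demo["Unemployment"] = demo["ACS15_5yr_B23025005"] / demo["ACS15_5yr_B23025001"] * 100
print(demo["Unemployment"].round(6).tolist())

# Keep only rows where the percentage could actually be computed
usable = demo.dropna(subset=["Unemployment"])
```

By default `sm.OLS` does not drop missing values for you, so it is worth checking for NaN rows before fitting.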


## Merging the data

Merge the dataframes together based on their census tract.

In [6]:
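# The tract identifier is called "Tract ID" in one file and "Geo_FIPS" in the other
merged = life.merge(sl, left_on='Tract ID', right_on='Geo_FIPS')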
merged.head(5)

Unnamed: 0,Tract ID,STATE2KX,CNTY2KX,TRACT2KX,life_expectancy,life_expect_se,Abridged life table flag,Geo_FIPS,ACS15_5yr_B23025001,ACS15_5yr_B23025002,...,ACS15_5yr_B23025006,ACS15_5yr_B23025007,ACS15_5yr_B23025001s,ACS15_5yr_B23025002s,ACS15_5yr_B23025003s,ACS15_5yr_B23025004s,ACS15_5yr_B23025005s,ACS15_5yr_B23025006s,ACS15_5yr_B23025007s,Unemployment
0,1001020100,1,1,20100,73.1,2.2348,3,1001020100,1554,997,...,0,557,92.121212,85.454545,85.454545,83.636364,18.787879,6.666667,67.878788,3.474903
1,1001020200,1,1,20200,76.9,3.3453,3,1001020200,1731,884,...,15,847,143.030303,115.151515,114.545455,107.272727,38.181818,14.545455,86.666667,6.701329
2,1001020400,1,1,20400,75.4,1.0216,3,1001020400,3424,2013,...,15,1411,197.575758,157.575758,161.818182,132.121212,58.787879,14.545455,127.878788,6.308411
3,1001020500,1,1,20500,79.4,1.1768,1,1001020500,8198,5461,...,203,2737,321.818182,339.393939,356.969697,369.090909,89.090909,103.030303,273.939394,2.695779
4,1001020600,1,1,20600,73.1,1.5519,3,1001020600,2855,1802,...,52,1053,206.060606,160.606061,158.787879,139.393939,63.030303,30.30303,151.515152,6.654991
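
A sketch of the same kind of join on a toy pair of frames (tract IDs and values are taken from the rows above; the frames themselves are cut down for illustration). `merge` defaults to an inner join, so tracts missing from either side drop out, which is consistent with tract 1001020300 appearing in the census data above but not in the merged result.

```python
import pandas as pd

life_demo = pd.DataFrame({
    "Tract ID": [1001020100, 1001020200, 1001020400],
    "life_expectancy": [73.1, 76.9, 75.4],
})
census_demo = pd.DataFrame({
    "Geo_FIPS": [1001020100, 1001020200, 1001020300, 1001020400],
    "Unemployment": [3.474903, 6.701329, 3.696182, 6.308411],
})

# The key columns have different names, so name each side explicitly.
# validate="one_to_one" raises if either key column contains duplicates.
merged_demo = life_demo.merge(
    census_demo, left_on="Tract ID", right_on="Geo_FIPS", validate="one_to_one"
)
print(merged_demo)
```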


## Running the regression

Using the `statsmodels` package, run a linear regression to find the coefficient relating unemployment and life expectancy.

In [7]:
merged.shape

(65662, 23)

In [35]:
# How is life expectancy in a census tract related to factors like unemployment and income?
# Independent variables: unemployment, income, etc.
# Dependent variable: life expectancy

import statsmodels.api as sm

X = merged[['Unemployment']]
X = sm.add_constant(X)
y = merged['life_expectancy']

model = sm.OLS(y, X)
result = model.fit()
result.summary()

0,1,2,3
Dep. Variable:,life_expectancy,R-squared:,0.169
Model:,OLS,Adj. R-squared:,0.169
Method:,Least Squares,F-statistic:,13360.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,23:46:10,Log-Likelihood:,-178100.0
No. Observations:,65662,AIC:,356200.0
Df Residuals:,65660,BIC:,356200.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,81.1377,0.028,2856.410,0.000,81.082,81.193
Unemployment,-0.5214,0.005,-115.595,0.000,-0.530,-0.513

0,1,2,3
Omnibus:,616.108,Durbin-Watson:,1.117
Prob(Omnibus):,0.0,Jarque-Bera (JB):,807.895
Skew:,-0.146,Prob(JB):,3.7e-176
Kurtosis:,3.459,Cond. No.,12.8


Translate that into the form **"every X percentage point change in unemployment translates to a Y change in life expectancy"**

In [9]:
# Every 1 percentage point increase in unemployment translates to a 0.52-year drop in life expectancy
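
To make the interpretation concrete, the fitted line can be evaluated by hand. A sketch using the rounded `const` and `Unemployment` coefficients from the summary above (the helper name `predict_life_expectancy` is made up for this example):

```python
CONST = 81.1377       # intercept from the regression summary
UNEMP_COEF = -0.5214  # years of life expectancy per percentage point of unemployment

def predict_life_expectancy(unemployment_pct):
    """Predicted life expectancy (years) for a tract at the given unemployment %."""
    return CONST + UNEMP_COEF * unemployment_pct

low = predict_life_expectancy(5)    # tract with 5% unemployment
high = predict_life_expectancy(10)  # tract with 10% unemployment
print(round(low, 2), round(high, 2), round(high - low, 2))
```

With an R-squared of 0.169, unemployment alone accounts for only about 17% of the variation between tracts, so these numbers are rough averages and not tract-level predictions.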

## Bringing more columns into the mix

Only dealing with unemployment seems kind of narrow-minded, so let's expand our reach a bit.

### Read in `R12221550`

It's also from the Census, and has many, many, many more columns available to you than the last dataset.

In [26]:
al = pd.read_csv('data/R12221550_SL140.csv', encoding="Latin-1")

al.head(5)

Unnamed: 0,Geo_FIPS,Geo_GEOID,Geo_NAME,Geo_QName,Geo_STUSAB,Geo_SUMLEV,Geo_GEOCOMP,Geo_FILEID,Geo_LOGRECNO,Geo_US,Geo_REGION,Geo_DIVISION,Geo_STATECE,Geo_STATE,Geo_COUNTY,Geo_COUSUB,Geo_PLACE,Geo_PLACESE,Geo_TRACT,Geo_BLKGRP,Geo_CONCIT,Geo_AIANHH,Geo_AIANHHFP,Geo_AIHHTLI,Geo_AITSCE,Geo_AITS,Geo_ANRC,Geo_CBSA,Geo_CSA,Geo_METDIV,Geo_MACC,Geo_MEMI,Geo_NECTA,Geo_CNECTA,Geo_NECTADIV,Geo_UA,Geo_UACP,Geo_CDCURR,Geo_SLDU,Geo_SLDL,Geo_VTD,Geo_ZCTA3,Geo_ZCTA5,Geo_SUBMCD,Geo_SDELM,Geo_SDSEC,Geo_SDUNI,Geo_UR,Geo_PCI,Geo_TAZ,Geo_UGA,Geo_BTTR,Geo_BTBG,Geo_PUMA5,Geo_PUMA1,ACS15_5yr_B03002001,ACS15_5yr_B03002002,ACS15_5yr_B03002003,ACS15_5yr_B03002004,ACS15_5yr_B03002005,ACS15_5yr_B03002006,ACS15_5yr_B03002007,ACS15_5yr_B03002008,ACS15_5yr_B03002009,ACS15_5yr_B03002010,ACS15_5yr_B03002011,ACS15_5yr_B03002012,ACS15_5yr_B03002013,ACS15_5yr_B03002014,ACS15_5yr_B03002015,ACS15_5yr_B03002016,ACS15_5yr_B03002017,ACS15_5yr_B03002018,ACS15_5yr_B03002019,ACS15_5yr_B03002020,ACS15_5yr_B03002021,ACS15_5yr_B03002001s,ACS15_5yr_B03002002s,ACS15_5yr_B03002003s,ACS15_5yr_B03002004s,ACS15_5yr_B03002005s,ACS15_5yr_B03002006s,ACS15_5yr_B03002007s,ACS15_5yr_B03002008s,ACS15_5yr_B03002009s,ACS15_5yr_B03002010s,ACS15_5yr_B03002011s,ACS15_5yr_B03002012s,ACS15_5yr_B03002013s,ACS15_5yr_B03002014s,ACS15_5yr_B03002015s,ACS15_5yr_B03002016s,ACS15_5yr_B03002017s,ACS15_5yr_B03002018s,ACS15_5yr_B03002019s,ACS15_5yr_B03002020s,ACS15_5yr_B03002021s,ACS15_5yr_B06009001,ACS15_5yr_B06009002,ACS15_5yr_B06009003,ACS15_5yr_B06009004,ACS15_5yr_B06009005,ACS15_5yr_B06009006,ACS15_5yr_B06009007,ACS15_5yr_B06009008,ACS15_5yr_B06009009,ACS15_5yr_B06009010,ACS15_5yr_B06009011,ACS15_5yr_B06009012,ACS15_5yr_B06009013,ACS15_5yr_B06009014,ACS15_5yr_B06009015,ACS15_5yr_B06009016,ACS15_5yr_B06009017,ACS15_5yr_B06009018,ACS15_5yr_B06009019,ACS15_5yr_B06009020,ACS15_5yr_B06009021,ACS15_5yr_B06009022,ACS15_5yr_B06009023,ACS15_5yr_B06009024,ACS15_5yr_B06009025,ACS15_5yr_B06009026,ACS15_5yr_B06009027,ACS15_5yr_B06009028,ACS15_5yr_B06009029,ACS15_5yr_B06009030,ACS15_5yr_B06009001s,ACS15_5yr_B06009002s,ACS15_5yr_B06009003s,ACS15_5yr_B06009004s,ACS15_5yr_B06009005s,ACS15_5yr_B06009006s,ACS15_5yr_B06009007s,ACS15_5yr_B06009008s,ACS15_5yr_B06009009s,ACS15_5yr_B06009010s,ACS15_5yr_B06009011s,ACS15_5yr_B06009012s,ACS15_5yr_B06009013s,ACS15_5yr_B06009014s,ACS15_5yr_B06009015s,ACS15_5yr_B06009016s,ACS15_5yr_B06009017s,ACS15_5yr_B06009018s,ACS15_5yr_B06009019s,ACS15_5yr_B06009020s,ACS15_5yr_B06009021s,ACS15_5yr_B06009022s,ACS15_5yr_B06009023s,ACS15_5yr_B06009024s,ACS15_5yr_B06009025s,ACS15_5yr_B06009026s,ACS15_5yr_B06009027s,ACS15_5yr_B06009028s,ACS15_5yr_B06009029s,ACS15_5yr_B06009030s,ACS15_5yr_C17002001,ACS15_5yr_C17002002,ACS15_5yr_C17002003,ACS15_5yr_C17002004,ACS15_5yr_C17002005,ACS15_5yr_C17002006,ACS15_5yr_C17002007,ACS15_5yr_C17002008,ACS15_5yr_C17002001s,ACS15_5yr_C17002002s,ACS15_5yr_C17002003s,ACS15_5yr_C17002004s,ACS15_5yr_C17002005s,ACS15_5yr_C17002006s,ACS15_5yr_C17002007s,ACS15_5yr_C17002008s,ACS15_5yr_B19013001,ACS15_5yr_B19013001s,ACS15_5yr_B23025001,ACS15_5yr_B23025002,ACS15_5yr_B23025003,ACS15_5yr_B23025004,ACS15_5yr_B23025005,ACS15_5yr_B23025006,ACS15_5yr_B23025007,ACS15_5yr_B23025001s,ACS15_5yr_B23025002s,ACS15_5yr_B23025003s,ACS15_5yr_B23025004s,ACS15_5yr_B23025005s,ACS15_5yr_B23025006s,ACS15_5yr_B23025007s
0,1001020100,14000US01001020100,"Census Tract 201, Autauga County, Alabama","Census Tract 201, Autauga County, Alabama",al,140,0,ACSSF,1760,,,,,1,1,,,,20100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1948,1931,1703,150,6,12,0,0,60,0,60,17,17,0,0,0,0,0,0,0,0,123.030303,128.484848,138.787879,76.363636,4.848485,9.69697,6.666667,6.666667,26.666667,6.666667,26.666667,12.727273,12.727273,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,1243.0,184.0,459.0,258.0,166.0,176.0,711.0,118.0,262.0,156.0,84.0,91.0,468.0,44.0,193.0,76.0,82.0,73.0,31.0,5.0,0.0,26.0,0.0,0.0,33.0,17.0,4.0,0.0,0.0,12.0,81.818182,44.242424,80.606061,40.606061,36.969697,42.424242,70.909091,30.909091,62.424242,36.363636,32.727273,26.666667,67.878788,24.848485,51.515152,17.575758,23.636364,24.242424,18.787879,5.454545,6.666667,17.575758,6.666667,6.666667,18.181818,12.727273,4.242424,6.666667,6.666667,11.515152,1948,26,132,81,101,125,16,1467,123.030303,18.787879,60.606061,40.606061,58.181818,60.0,10.909091,127.272727,61838.0,7212.121212,1554,997,997,943,54,0,557,92.121212,85.454545,85.454545,83.636364,18.787879,6.666667,67.878788
1,1001020200,14000US01001020200,"Census Tract 202, Autauga County, Alabama","Census Tract 202, Autauga County, Alabama",al,140,0,ACSSF,1761,,,,,1,1,,,,20200,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2156,2139,872,1149,0,50,0,0,68,0,68,17,14,0,0,0,0,3,0,0,0,162.424242,162.424242,125.454545,151.515152,6.666667,36.969697,6.666667,6.666667,37.575758,6.666667,37.575758,15.151515,13.939394,6.666667,6.666667,6.666667,6.666667,4.242424,6.666667,6.666667,6.666667,1397.0,356.0,496.0,342.0,133.0,70.0,1102.0,295.0,391.0,275.0,94.0,47.0,243.0,43.0,86.0,58.0,33.0,23.0,9.0,0.0,0.0,9.0,0.0,0.0,43.0,18.0,19.0,0.0,6.0,0.0,101.212121,69.69697,72.121212,47.272727,29.090909,21.212121,95.151515,68.484848,69.090909,46.060606,24.242424,16.969697,33.939394,22.424242,26.060606,20.0,13.939394,9.69697,9.090909,6.666667,6.666667,9.090909,6.666667,6.666667,28.484848,18.181818,20.0,6.666667,4.848485,6.666667,1983,185,320,232,58,34,25,1129,155.151515,110.909091,74.545455,88.484848,25.454545,18.181818,16.969697,144.848485,32303.0,8204.848485,1731,884,869,753,116,15,847,143.030303,115.151515,114.545455,107.272727,38.181818,14.545455,86.666667
2,1001020300,14000US01001020300,"Census Tract 203, Autauga County, Alabama","Census Tract 203, Autauga County, Alabama",al,140,0,ACSSF,1762,,,,,1,1,,,,20300,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2968,2968,2212,551,15,41,8,0,141,0,141,0,0,0,0,0,0,0,0,0,0,244.848485,244.848485,225.454545,115.151515,13.333333,37.575758,8.484848,6.666667,81.818182,6.666667,81.818182,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,2074.0,221.0,747.0,674.0,240.0,192.0,1330.0,119.0,548.0,414.0,135.0,114.0,683.0,87.0,184.0,252.0,82.0,78.0,31.0,0.0,0.0,8.0,23.0,0.0,30.0,15.0,15.0,0.0,0.0,0.0,154.545455,53.333333,97.575758,101.212121,49.090909,46.060606,123.636364,35.151515,76.363636,92.727273,34.545455,36.363636,112.121212,34.545455,59.393939,64.848485,33.939394,28.484848,20.0,6.666667,6.666667,8.484848,17.575758,6.666667,26.666667,16.969697,13.939394,6.666667,6.666667,6.666667,2968,164,213,148,207,82,520,1634,244.848485,138.181818,70.30303,60.606061,78.181818,39.393939,189.090909,175.151515,44922.0,3411.515152,2462,1472,1464,1373,91,8,990,169.090909,132.121212,134.545455,123.030303,31.515152,8.484848,120.606061
3,1001020400,14000US01001020400,"Census Tract 204, Autauga County, Alabama","Census Tract 204, Autauga County, Alabama",al,140,0,ACSSF,1763,,,,,1,1,,,,20400,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4423,3959,3662,162,69,0,0,48,18,5,13,464,30,42,0,0,0,372,20,20,0,298.787879,213.939394,207.878788,80.606061,49.090909,6.666667,6.666667,49.69697,10.30303,4.848485,9.69697,264.848485,20.0,30.30303,6.666667,6.666667,6.666667,276.363636,17.575758,17.575758,6.666667,2899.0,339.0,1044.0,806.0,453.0,257.0,1623.0,154.0,605.0,458.0,305.0,101.0,1107.0,106.0,410.0,301.0,148.0,142.0,36.0,0.0,6.0,30.0,0.0,0.0,133.0,79.0,23.0,17.0,0.0,14.0,156.363636,78.787879,117.575758,97.575758,76.363636,51.515152,129.69697,43.030303,97.575758,72.727273,67.878788,27.272727,107.272727,36.969697,89.090909,56.969697,38.787879,37.575758,17.575758,6.666667,6.060606,16.363636,6.666667,6.666667,58.181818,60.0,17.575758,12.121212,6.666667,12.727273,4423,18,74,141,182,583,201,3224,298.787879,17.575758,41.818182,53.333333,58.181818,188.484848,140.0,331.515152,54329.0,4244.242424,3424,2013,1998,1782,216,15,1411,197.575758,157.575758,161.818182,132.121212,58.787879,14.545455,127.878788
4,1001020500,14000US01001020500,"Census Tract 205, Autauga County, Alabama","Census Tract 205, Autauga County, Alabama",al,140,0,ACSSF,1764,,,,,1,1,,,,20500,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10763,10683,7368,2674,0,412,0,0,229,49,180,80,80,0,0,0,0,0,0,0,0,378.181818,373.333333,482.424242,449.69697,10.909091,146.666667,10.909091,10.909091,100.606061,44.848485,90.30303,43.030303,43.030303,10.909091,10.909091,10.909091,10.909091,10.909091,10.909091,10.909091,10.909091,6974.0,310.0,1674.0,1999.0,1829.0,1162.0,3243.0,127.0,967.0,832.0,865.0,452.0,3432.0,159.0,573.0,1129.0,884.0,687.0,127.0,0.0,14.0,38.0,52.0,23.0,172.0,24.0,120.0,0.0,28.0,0.0,265.454545,102.424242,223.636364,263.030303,255.151515,216.969697,290.30303,55.757576,192.727273,198.787879,183.636364,127.272727,283.636364,82.424242,120.606061,210.30303,152.121212,157.575758,58.787879,10.909091,14.545455,38.181818,26.666667,23.636364,93.939394,27.272727,69.69697,10.909091,29.090909,10.909091,10563,251,952,256,1064,289,89,7662,369.69697,94.545455,521.212121,113.333333,385.454545,162.424242,52.121212,641.818182,51965.0,4203.030303,8198,5461,5258,5037,221,203,2737,321.818182,339.393939,356.969697,369.090909,89.090909,103.030303,273.939394


Using this census data, create a new dataframe that includes the following columns:

* Percent unemployed
* Percents Black, White, and Hispanic
* Median Income (in increments of 10,000 dollars)
* Percent of the population with less than a high school education
* Percent of the population between 1-1.5x the poverty line

If you have to make any editorial decisions about which columns you choose or how you do your math, please explain them.

## Editorial Decision

The median income is listed in inflation-adjusted dollars, in 2015 figures.
If the question is how life expectancy depends on income, I am not sure it matters which one we look at,
but if we are looking at life expectancy "now," we should look at inflation-adjusted income.

Columns used:

* Percent unemployed: B23025005 / B23025001
* Total population: B03002001
* Black / White / Hispanic: B03002004 / B03002003 / B03002012, each divided by the total population
* Median income: B19013001
* Percent with less than a high school education: B06009002 / B06009001
* Percent between 1-1.5x the poverty line: (C17002004 + C17002005) / C17002001
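
As a sanity check on these formulas, here they are applied by hand to the first tract in the file (1001020100), with counts read off the first data row printed above. The helper `pct` is made up for this sketch.

```python
def pct(numerator, denominator):
    """Share of the denominator, expressed as a percentage."""
    return numerator / denominator * 100

# Counts for tract 1001020100, copied from the first data row above
perc_unemp = pct(54, 1554)      # B23025005 / B23025001
perc_black = pct(150, 1948)     # B03002004 / B03002001
perc_white = pct(1703, 1948)    # B03002003 / B03002001
perc_hisp = pct(17, 1948)       # B03002012 / B03002001
perc_hs = pct(184, 1243)        # B06009002 / B06009001
perc_pov = pct(81 + 101, 1948)  # (C17002004 + C17002005) / C17002001

print(round(perc_unemp, 2), round(perc_black, 2), round(perc_hs, 2), round(perc_pov, 2))
```

Note that each percentage uses the total from its own table as the denominator, since the tables cover different universes (for example, the education total of 1243 is smaller than the race total of 1948).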

In [27]:
al.info(verbose=True)
al.shape

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 74001 entries, 0 to 74000
Data columns (total 189 columns):
Geo_FIPS                int64
Geo_GEOID               object
Geo_NAME                object
Geo_QName               object
Geo_STUSAB              object
Geo_SUMLEV              int64
Geo_GEOCOMP             int64
Geo_FILEID              object
Geo_LOGRECNO            int64
Geo_US                  float64
Geo_REGION              float64
Geo_DIVISION            float64
Geo_STATECE             float64
Geo_STATE               int64
Geo_COUNTY              int64
Geo_COUSUB              float64
Geo_PLACE               float64
Geo_PLACESE             float64
Geo_TRACT               int64
Geo_BLKGRP              float64
Geo_CONCIT              float64
Geo_AIANHH              float64
Geo_AIANHHFP            float64
Geo_AIHHTLI             float64
Geo_AITSCE              float64
Geo_AITS                float64
Geo_ANRC                float64
Geo_CBSA                float64
Geo_CSA      

(74001, 189)

In [28]:
al = al.rename(columns={"ACS15_5yr_B03002001": "total_pop", 
                        "ACS15_5yr_B03002004":"black",
                        "ACS15_5yr_B03002003":"white",
                        "ACS15_5yr_B03002012":"hispanic",
                        "ACS15_5yr_B23025001":"labor",
                        "ACS15_5yr_B23025005":"unemployed",
                        "ACS15_5yr_B19013001":"income",
                        "ACS15_5yr_B06009002":"less_hs",
                        "ACS15_5yr_B06009001":"total_ed",
                        "ACS15_5yr_C17002004":"pov_1",
                        "ACS15_5yr_C17002005":"pov_2",
                        "ACS15_5yr_C17002001":"pop_pov"
                       })
al.head(5)

Unnamed: 0,Geo_FIPS,Geo_GEOID,Geo_NAME,Geo_QName,Geo_STUSAB,Geo_SUMLEV,Geo_GEOCOMP,Geo_FILEID,Geo_LOGRECNO,Geo_US,Geo_REGION,Geo_DIVISION,Geo_STATECE,Geo_STATE,Geo_COUNTY,Geo_COUSUB,Geo_PLACE,Geo_PLACESE,Geo_TRACT,Geo_BLKGRP,Geo_CONCIT,Geo_AIANHH,Geo_AIANHHFP,Geo_AIHHTLI,Geo_AITSCE,Geo_AITS,Geo_ANRC,Geo_CBSA,Geo_CSA,Geo_METDIV,Geo_MACC,Geo_MEMI,Geo_NECTA,Geo_CNECTA,Geo_NECTADIV,Geo_UA,Geo_UACP,Geo_CDCURR,Geo_SLDU,Geo_SLDL,Geo_VTD,Geo_ZCTA3,Geo_ZCTA5,Geo_SUBMCD,Geo_SDELM,Geo_SDSEC,Geo_SDUNI,Geo_UR,Geo_PCI,Geo_TAZ,Geo_UGA,Geo_BTTR,Geo_BTBG,Geo_PUMA5,Geo_PUMA1,total_pop,ACS15_5yr_B03002002,white,black,ACS15_5yr_B03002005,ACS15_5yr_B03002006,ACS15_5yr_B03002007,ACS15_5yr_B03002008,ACS15_5yr_B03002009,ACS15_5yr_B03002010,ACS15_5yr_B03002011,hispanic,ACS15_5yr_B03002013,ACS15_5yr_B03002014,ACS15_5yr_B03002015,ACS15_5yr_B03002016,ACS15_5yr_B03002017,ACS15_5yr_B03002018,ACS15_5yr_B03002019,ACS15_5yr_B03002020,ACS15_5yr_B03002021,ACS15_5yr_B03002001s,ACS15_5yr_B03002002s,ACS15_5yr_B03002003s,ACS15_5yr_B03002004s,ACS15_5yr_B03002005s,ACS15_5yr_B03002006s,ACS15_5yr_B03002007s,ACS15_5yr_B03002008s,ACS15_5yr_B03002009s,ACS15_5yr_B03002010s,ACS15_5yr_B03002011s,ACS15_5yr_B03002012s,ACS15_5yr_B03002013s,ACS15_5yr_B03002014s,ACS15_5yr_B03002015s,ACS15_5yr_B03002016s,ACS15_5yr_B03002017s,ACS15_5yr_B03002018s,ACS15_5yr_B03002019s,ACS15_5yr_B03002020s,ACS15_5yr_B03002021s,total_ed,less_hs,ACS15_5yr_B06009003,ACS15_5yr_B06009004,ACS15_5yr_B06009005,ACS15_5yr_B06009006,ACS15_5yr_B06009007,ACS15_5yr_B06009008,ACS15_5yr_B06009009,ACS15_5yr_B06009010,ACS15_5yr_B06009011,ACS15_5yr_B06009012,ACS15_5yr_B06009013,ACS15_5yr_B06009014,ACS15_5yr_B06009015,ACS15_5yr_B06009016,ACS15_5yr_B06009017,ACS15_5yr_B06009018,ACS15_5yr_B06009019,ACS15_5yr_B06009020,ACS15_5yr_B06009021,ACS15_5yr_B06009022,ACS15_5yr_B06009023,ACS15_5yr_B06009024,ACS15_5yr_B06009025,ACS15_5yr_B06009026,ACS15_5yr_B06009027,ACS15_5yr_B06009028,ACS15_5yr_B06009029,ACS15_5yr_B06009030,ACS15_5yr_B06009001s,ACS15_5yr_B06009002s,ACS15_5yr_B06009003s,ACS15_5yr_B06009004s,ACS15_5yr_B06009005s,ACS15_5yr_B06009006s,ACS15_5yr_B06009007s,ACS15_5yr_B06009008s,ACS15_5yr_B06009009s,ACS15_5yr_B06009010s,ACS15_5yr_B06009011s,ACS15_5yr_B06009012s,ACS15_5yr_B06009013s,ACS15_5yr_B06009014s,ACS15_5yr_B06009015s,ACS15_5yr_B06009016s,ACS15_5yr_B06009017s,ACS15_5yr_B06009018s,ACS15_5yr_B06009019s,ACS15_5yr_B06009020s,ACS15_5yr_B06009021s,ACS15_5yr_B06009022s,ACS15_5yr_B06009023s,ACS15_5yr_B06009024s,ACS15_5yr_B06009025s,ACS15_5yr_B06009026s,ACS15_5yr_B06009027s,ACS15_5yr_B06009028s,ACS15_5yr_B06009029s,ACS15_5yr_B06009030s,pop_pov,ACS15_5yr_C17002002,ACS15_5yr_C17002003,pov_1,pov_2,ACS15_5yr_C17002006,ACS15_5yr_C17002007,ACS15_5yr_C17002008,ACS15_5yr_C17002001s,ACS15_5yr_C17002002s,ACS15_5yr_C17002003s,ACS15_5yr_C17002004s,ACS15_5yr_C17002005s,ACS15_5yr_C17002006s,ACS15_5yr_C17002007s,ACS15_5yr_C17002008s,income,ACS15_5yr_B19013001s,labor,ACS15_5yr_B23025002,ACS15_5yr_B23025003,ACS15_5yr_B23025004,unemployed,ACS15_5yr_B23025006,ACS15_5yr_B23025007,ACS15_5yr_B23025001s,ACS15_5yr_B23025002s,ACS15_5yr_B23025003s,ACS15_5yr_B23025004s,ACS15_5yr_B23025005s,ACS15_5yr_B23025006s,ACS15_5yr_B23025007s
0,1001020100,14000US01001020100,"Census Tract 201, Autauga County, Alabama","Census Tract 201, Autauga County, Alabama",al,140,0,ACSSF,1760,,,,,1,1,,,,20100,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1948,1931,1703,150,6,12,0,0,60,0,60,17,17,0,0,0,0,0,0,0,0,123.030303,128.484848,138.787879,76.363636,4.848485,9.69697,6.666667,6.666667,26.666667,6.666667,26.666667,12.727273,12.727273,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,1243.0,184.0,459.0,258.0,166.0,176.0,711.0,118.0,262.0,156.0,84.0,91.0,468.0,44.0,193.0,76.0,82.0,73.0,31.0,5.0,0.0,26.0,0.0,0.0,33.0,17.0,4.0,0.0,0.0,12.0,81.818182,44.242424,80.606061,40.606061,36.969697,42.424242,70.909091,30.909091,62.424242,36.363636,32.727273,26.666667,67.878788,24.848485,51.515152,17.575758,23.636364,24.242424,18.787879,5.454545,6.666667,17.575758,6.666667,6.666667,18.181818,12.727273,4.242424,6.666667,6.666667,11.515152,1948,26,132,81,101,125,16,1467,123.030303,18.787879,60.606061,40.606061,58.181818,60.0,10.909091,127.272727,61838.0,7212.121212,1554,997,997,943,54,0,557,92.121212,85.454545,85.454545,83.636364,18.787879,6.666667,67.878788
1,1001020200,14000US01001020200,"Census Tract 202, Autauga County, Alabama","Census Tract 202, Autauga County, Alabama",al,140,0,ACSSF,1761,,,,,1,1,,,,20200,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2156,2139,872,1149,0,50,0,0,68,0,68,17,14,0,0,0,0,3,0,0,0,162.424242,162.424242,125.454545,151.515152,6.666667,36.969697,6.666667,6.666667,37.575758,6.666667,37.575758,15.151515,13.939394,6.666667,6.666667,6.666667,6.666667,4.242424,6.666667,6.666667,6.666667,1397.0,356.0,496.0,342.0,133.0,70.0,1102.0,295.0,391.0,275.0,94.0,47.0,243.0,43.0,86.0,58.0,33.0,23.0,9.0,0.0,0.0,9.0,0.0,0.0,43.0,18.0,19.0,0.0,6.0,0.0,101.212121,69.69697,72.121212,47.272727,29.090909,21.212121,95.151515,68.484848,69.090909,46.060606,24.242424,16.969697,33.939394,22.424242,26.060606,20.0,13.939394,9.69697,9.090909,6.666667,6.666667,9.090909,6.666667,6.666667,28.484848,18.181818,20.0,6.666667,4.848485,6.666667,1983,185,320,232,58,34,25,1129,155.151515,110.909091,74.545455,88.484848,25.454545,18.181818,16.969697,144.848485,32303.0,8204.848485,1731,884,869,753,116,15,847,143.030303,115.151515,114.545455,107.272727,38.181818,14.545455,86.666667
2,1001020300,14000US01001020300,"Census Tract 203, Autauga County, Alabama","Census Tract 203, Autauga County, Alabama",al,140,0,ACSSF,1762,,,,,1,1,,,,20300,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2968,2968,2212,551,15,41,8,0,141,0,141,0,0,0,0,0,0,0,0,0,0,244.848485,244.848485,225.454545,115.151515,13.333333,37.575758,8.484848,6.666667,81.818182,6.666667,81.818182,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,6.666667,2074.0,221.0,747.0,674.0,240.0,192.0,1330.0,119.0,548.0,414.0,135.0,114.0,683.0,87.0,184.0,252.0,82.0,78.0,31.0,0.0,0.0,8.0,23.0,0.0,30.0,15.0,15.0,0.0,0.0,0.0,154.545455,53.333333,97.575758,101.212121,49.090909,46.060606,123.636364,35.151515,76.363636,92.727273,34.545455,36.363636,112.121212,34.545455,59.393939,64.848485,33.939394,28.484848,20.0,6.666667,6.666667,8.484848,17.575758,6.666667,26.666667,16.969697,13.939394,6.666667,6.666667,6.666667,2968,164,213,148,207,82,520,1634,244.848485,138.181818,70.30303,60.606061,78.181818,39.393939,189.090909,175.151515,44922.0,3411.515152,2462,1472,1464,1373,91,8,990,169.090909,132.121212,134.545455,123.030303,31.515152,8.484848,120.606061
3,1001020400,14000US01001020400,"Census Tract 204, Autauga County, Alabama","Census Tract 204, Autauga County, Alabama",al,140,0,ACSSF,1763,,,,,1,1,,,,20400,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4423,3959,3662,162,69,0,0,48,18,5,13,464,30,42,0,0,0,372,20,20,0,298.787879,213.939394,207.878788,80.606061,49.090909,6.666667,6.666667,49.69697,10.30303,4.848485,9.69697,264.848485,20.0,30.30303,6.666667,6.666667,6.666667,276.363636,17.575758,17.575758,6.666667,2899.0,339.0,1044.0,806.0,453.0,257.0,1623.0,154.0,605.0,458.0,305.0,101.0,1107.0,106.0,410.0,301.0,148.0,142.0,36.0,0.0,6.0,30.0,0.0,0.0,133.0,79.0,23.0,17.0,0.0,14.0,156.363636,78.787879,117.575758,97.575758,76.363636,51.515152,129.69697,43.030303,97.575758,72.727273,67.878788,27.272727,107.272727,36.969697,89.090909,56.969697,38.787879,37.575758,17.575758,6.666667,6.060606,16.363636,6.666667,6.666667,58.181818,60.0,17.575758,12.121212,6.666667,12.727273,4423,18,74,141,182,583,201,3224,298.787879,17.575758,41.818182,53.333333,58.181818,188.484848,140.0,331.515152,54329.0,4244.242424,3424,2013,1998,1782,216,15,1411,197.575758,157.575758,161.818182,132.121212,58.787879,14.545455,127.878788
4,1001020500,14000US01001020500,"Census Tract 205, Autauga County, Alabama","Census Tract 205, Autauga County, Alabama",al,140,0,ACSSF,1764,,,,,1,1,,,,20500,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,10763,10683,7368,2674,0,412,0,0,229,49,180,80,80,0,0,0,0,0,0,0,0,378.181818,373.333333,482.424242,449.69697,10.909091,146.666667,10.909091,10.909091,100.606061,44.848485,90.30303,43.030303,43.030303,10.909091,10.909091,10.909091,10.909091,10.909091,10.909091,10.909091,10.909091,6974.0,310.0,1674.0,1999.0,1829.0,1162.0,3243.0,127.0,967.0,832.0,865.0,452.0,3432.0,159.0,573.0,1129.0,884.0,687.0,127.0,0.0,14.0,38.0,52.0,23.0,172.0,24.0,120.0,0.0,28.0,0.0,265.454545,102.424242,223.636364,263.030303,255.151515,216.969697,290.30303,55.757576,192.727273,198.787879,183.636364,127.272727,283.636364,82.424242,120.606061,210.30303,152.121212,157.575758,58.787879,10.909091,14.545455,38.181818,26.666667,23.636364,93.939394,27.272727,69.69697,10.909091,29.090909,10.909091,10563,251,952,256,1064,289,89,7662,369.69697,94.545455,521.212121,113.333333,385.454545,162.424242,52.121212,641.818182,51965.0,4203.030303,8198,5461,5258,5037,221,203,2737,321.818182,339.393939,356.969697,369.090909,89.090909,103.030303,273.939394


In [29]:
al['perc_unemp'] = (al.unemployed/al.labor)*100
al['perc_black'] = (al.black/al.total_pop)*100
al['perc_white'] = (al.white/al.total_pop)*100
al['perc_hisp'] = (al.hispanic/al.total_pop)*100
al['perc_hs'] = (al.less_hs/al.total_ed)*100
al['perc_pov'] = ((al.pov_1+al.pov_2)/al.pop_pov)*100

al = al[['Geo_FIPS','perc_unemp','perc_black','perc_white','perc_hisp','perc_hs','perc_pov','total_pop',"black","white","hispanic","labor","unemployed","income","less_hs","total_ed","pov_1","pov_2","pop_pov"]]
al.head(10)

Unnamed: 0,Geo_FIPS,perc_unemp,perc_black,perc_white,perc_hisp,perc_hs,perc_pov,total_pop,black,white,hispanic,labor,unemployed,income,less_hs,total_ed,pov_1,pov_2,pop_pov
0,1001020100,3.474903,7.700205,87.422998,0.872690,14.802896,9.342916,1948,150,1703,17,1554,54,61838.0,184.0,1243.0,81,101,1948
1,1001020200,6.701329,53.293135,40.445269,0.788497,25.483178,14.624307,2156,1149,872,17,1731,116,32303.0,356.0,1397.0,232,58,1983
2,1001020300,3.696182,18.564690,74.528302,0.000000,10.655738,11.960916,2968,551,2212,0,2462,91,44922.0,221.0,2074.0,148,207,2968
3,1001020400,6.308411,3.662672,82.794483,10.490617,11.693687,7.302736,4423,162,3662,464,3424,216,54329.0,339.0,2899.0,141,182,4423
4,1001020500,2.695779,24.844374,68.456750,0.743287,4.445082,12.496450,10763,2674,7368,80,8198,221,51965.0,310.0,6974.0,256,1064,10563
5,1001020600,6.654991,11.918982,72.916126,13.061542,17.487267,10.854324,3851,459,2808,503,2855,190,63092.0,412.0,2356.0,212,206,3851
6,1001020700,7.102273,19.666787,74.538211,3.766751,12.250554,11.952191,2761,543,2058,104,2112,150,34821.0,221.0,1804.0,113,217,2761
7,1001020801,5.359270,10.699718,84.028867,1.255099,7.366985,4.769376,3187,341,2678,40,2519,135,73728.0,162.0,2199.0,136,16,3187
8,1001020802,5.131058,8.401283,89.491525,1.374256,11.875902,2.999175,10915,917,9768,150,8088,415,60063.0,823.0,6930.0,215,112,10903
9,1001020900,4.332291,12.138320,85.497530,0.405787,11.082341,11.805310,5668,688,4846,23,4478,194,41287.0,428.0,3862.0,295,372,5650
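
The percentage columns divide one count by another, so a tract with a zero denominator will not produce a usable number. The sketch below is a toy illustration, not part of the original analysis: `demo` and its three made-up rows are assumptions, with the first row borrowing the counts from tract 1001020100. It shows how pandas handles these cases, which is one plausible source of missing values in the `perc_` columns.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a normal tract, an empty tract, and a tract with
# unemployed people recorded but no labor force.
demo = pd.DataFrame({"unemployed": [54, 0, 3], "labor": [1554, 0, 0]})

# Same formula as the notebook: 0/0 becomes NaN, 3/0 becomes inf.
demo["perc_unemp"] = (demo.unemployed / demo.labor) * 100

# Treat infinite percentages as missing so they can be dropped later.
demo["perc_unemp"] = demo["perc_unemp"].replace([np.inf, -np.inf], np.nan)

n_missing = int(demo["perc_unemp"].isna().sum())
print(demo)
print("rows with no usable percentage:", n_missing)
```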


### Join your datasets

Combine your life expectancy dataset with this census dataset to create a new dataframe.
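
Before trusting a join, it can help to see which tracts fail to match. The following is a minimal sketch on stand-in data, assuming the same key columns as the notebook (`Tract ID` and `Geo_FIPS`); `life_demo`, `census_demo`, and `check` are hypothetical names. An outer merge with `indicator=True` labels each row by the side it came from.

```python
import pandas as pd

# Tiny stand-in frames; the real notebook uses `life` and `al`.
life_demo = pd.DataFrame({
    "Tract ID": [1001020100, 1001020200, 1001020400],
    "life_expectancy": [73.1, 76.9, 75.4],
})
census_demo = pd.DataFrame({
    "Geo_FIPS": [1001020100, 1001020200, 1001020300],
    "perc_unemp": [3.47, 6.70, 3.70],
})

# how="outer" keeps unmatched rows; indicator=True adds a `_merge` column
# with the values "both", "left_only", or "right_only".
check = life_demo.merge(
    census_demo,
    left_on="Tract ID",
    right_on="Geo_FIPS",
    how="outer",
    indicator=True,
)

match_counts = check["_merge"].value_counts().to_dict()
print(match_counts)
```

Rows marked `right_only` are census tracts with no life expectancy estimate. A default (inner) merge silently drops them.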

In [30]:
merged2 = life.merge(al, left_on = "Tract ID", right_on = "Geo_FIPS")
merged2.head(5)

Unnamed: 0,Tract ID,STATE2KX,CNTY2KX,TRACT2KX,life_expectancy,life_expect_se,Abridged life table flag,Geo_FIPS,perc_unemp,perc_black,perc_white,perc_hisp,perc_hs,perc_pov,total_pop,black,white,hispanic,labor,unemployed,income,less_hs,total_ed,pov_1,pov_2,pop_pov
0,1001020100,1,1,20100,73.1,2.2348,3,1001020100,3.474903,7.700205,87.422998,0.87269,14.802896,9.342916,1948,150,1703,17,1554,54,61838.0,184.0,1243.0,81,101,1948
1,1001020200,1,1,20200,76.9,3.3453,3,1001020200,6.701329,53.293135,40.445269,0.788497,25.483178,14.624307,2156,1149,872,17,1731,116,32303.0,356.0,1397.0,232,58,1983
2,1001020400,1,1,20400,75.4,1.0216,3,1001020400,6.308411,3.662672,82.794483,10.490617,11.693687,7.302736,4423,162,3662,464,3424,216,54329.0,339.0,2899.0,141,182,4423
3,1001020500,1,1,20500,79.4,1.1768,1,1001020500,2.695779,24.844374,68.45675,0.743287,4.445082,12.49645,10763,2674,7368,80,8198,221,51965.0,310.0,6974.0,256,1064,10563
4,1001020600,1,1,20600,73.1,1.5519,3,1001020600,6.654991,11.918982,72.916126,13.061542,17.487267,10.854324,3851,459,2808,503,2855,190,63092.0,412.0,2356.0,212,206,3851


In [31]:
merged2.shape

(65662, 26)

In [49]:
# Also: six rows in the merged data have a NaN income.
# With more than 65,000 tracts, dropping those rows costs us almost nothing.

merged2.isnull().sum()

Geo_FIPS         0
perc_unemp     690
perc_black     690
perc_white     690
perc_hisp      690
perc_hs       1587
perc_pov       835
total_pop        0
black            0
white            0
hispanic         0
labor            0
unemployed       0
income        1062
less_hs        945
total_ed       945
pov_1            0
pov_2            0
pop_pov          0
dtype: int64

In [54]:
merged3 = merged2.dropna(subset=['income'])
merged3.shape

(65656, 26)

## Running your multivariate regression

Using the `statsmodels` package and this new dataframe, run a multivariate linear regression to find the coefficients relating your columns to life expectancy.

In [38]:
#how the life expectancy in a census tract is related to other factors like unemployment, income, and others.
#ind: living below poverty level
#dep: life expectancy

X2 = merged2[['perc_pov']]
X2 = sm.add_constant(X2)
y2 = merged2['life_expectancy']

model = sm.OLS(y2, X2)
result = model.fit()
result.summary()

0,1,2,3
Dep. Variable:,life_expectancy,R-squared:,0.216
Model:,OLS,Adj. R-squared:,0.216
Method:,Least Squares,F-statistic:,18110.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,23:48:37,Log-Likelihood:,-176190.0
No. Observations:,65662,AIC:,352400.0
Df Residuals:,65660,BIC:,352400.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,81.3293,0.026,3075.807,0.000,81.278,81.381
perc_pov,-0.3074,0.002,-134.568,0.000,-0.312,-0.303

0,1,2,3
Omnibus:,1157.924,Durbin-Watson:,1.204
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2147.294
Skew:,-0.113,Prob(JB):,0.0
Kurtosis:,3.857,Cond. No.,22.3


In [39]:
#how the life expectancy in a census tract is related to other factors like unemployment, income, and others.
#ind: less than high school education
#dep: life expectancy

X2 = merged2[['perc_hs']]
X2 = sm.add_constant(X2)
y2 = merged2['life_expectancy']

model = sm.OLS(y2, X2)
result = model.fit()
result.summary()

0,1,2,3
Dep. Variable:,life_expectancy,R-squared:,0.161
Model:,OLS,Adj. R-squared:,0.161
Method:,Least Squares,F-statistic:,12580.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,23:48:43,Log-Likelihood:,-178430.0
No. Observations:,65662,AIC:,356900.0
Df Residuals:,65660,BIC:,356900.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,80.3587,0.023,3449.582,0.000,80.313,80.404
perc_hs,-0.1447,0.001,-112.175,0.000,-0.147,-0.142

0,1,2,3
Omnibus:,699.416,Durbin-Watson:,1.028
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1181.458
Skew:,-0.048,Prob(JB):,2.8200000000000003e-257
Kurtosis:,3.65,Cond. No.,29.5


In [40]:
#how the life expectancy in a census tract is related to other factors like unemployment, income, and others.
#ind: percent black
#dep: life expectancy

X2 = merged2[['perc_black']]
X2 = sm.add_constant(X2)
y2 = merged2['life_expectancy']

model = sm.OLS(y2, X2)
result = model.fit()
result.summary()

0,1,2,3
Dep. Variable:,life_expectancy,R-squared:,0.193
Model:,OLS,Adj. R-squared:,0.193
Method:,Least Squares,F-statistic:,15710.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,23:49:13,Log-Likelihood:,-177140.0
No. Observations:,65662,AIC:,354300.0
Df Residuals:,65660,BIC:,354300.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,79.3755,0.016,4823.655,0.000,79.343,79.408
perc_black,-0.0804,0.001,-125.340,0.000,-0.082,-0.079

0,1,2,3
Omnibus:,576.421,Durbin-Watson:,1.096
Prob(Omnibus):,0.0,Jarque-Bera (JB):,841.991
Skew:,-0.097,Prob(JB):,1.46e-183
Kurtosis:,3.519,Cond. No.,30.1


In [42]:
#how the life expectancy in a census tract is related to other factors like unemployment, income, and others.
#ind: percent hispanic
#dep: life expectancy

X2 = merged2[['perc_hisp']]
X2 = sm.add_constant(X2)
y2 = merged2['life_expectancy']

model = sm.OLS(y2, X2)
result = model.fit()
result.summary()

0,1,2,3
Dep. Variable:,life_expectancy,R-squared:,0.0
Model:,OLS,Adj. R-squared:,0.0
Method:,Least Squares,F-statistic:,17.45
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,2.96e-05
Time:,23:49:40,Log-Likelihood:,-184180.0
No. Observations:,65662,AIC:,368400.0
Df Residuals:,65660,BIC:,368400.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,78.2466,0.020,4005.718,0.000,78.208,78.285
perc_hisp,0.0030,0.001,4.177,0.000,0.002,0.004

0,1,2,3
Omnibus:,887.859,Durbin-Watson:,0.952
Prob(Omnibus):,0.0,Jarque-Bera (JB):,996.881
Skew:,-0.246,Prob(JB):,3.3900000000000003e-217
Kurtosis:,3.35,Cond. No.,33.6


In [43]:
#how the life expectancy in a census tract is related to other factors like unemployment, income, and others.
#ind: percent white
#dep: life expectancy

X2 = merged2[['perc_white']]
X2 = sm.add_constant(X2)
y2 = merged2['life_expectancy']

model = sm.OLS(y2, X2)
result = model.fit()
result.summary()

0,1,2,3
Dep. Variable:,life_expectancy,R-squared:,0.051
Model:,OLS,Adj. R-squared:,0.051
Method:,Least Squares,F-statistic:,3530.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,23:50:02,Log-Likelihood:,-182470.0
No. Observations:,65662,AIC:,364900.0
Df Residuals:,65660,BIC:,365000.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,76.4224,0.035,2183.356,0.000,76.354,76.491
perc_white,0.0299,0.001,59.415,0.000,0.029,0.031

0,1,2,3
Omnibus:,428.751,Durbin-Watson:,0.937
Prob(Omnibus):,0.0,Jarque-Bera (JB):,500.939
Skew:,-0.142,Prob(JB):,1.67e-109
Kurtosis:,3.32,Cond. No.,160.0


In [55]:
#how the life expectancy in a census tract is related to other factors like unemployment, income, and others.
#ind: median income
#dep: life expectancy

X3 = merged3[['income']]
X3 = sm.add_constant(X3)
y3 = merged3['life_expectancy']

model = sm.OLS(y3, X3)
result = model.fit()
result.summary()

0,1,2,3
Dep. Variable:,life_expectancy,R-squared:,0.367
Model:,OLS,Adj. R-squared:,0.367
Method:,Least Squares,F-statistic:,38000.0
Date:,"Sat, 20 Jul 2019",Prob (F-statistic):,0.0
Time:,23:59:21,Log-Likelihood:,-169170.0
No. Observations:,65656,AIC:,338300.0
Df Residuals:,65654,BIC:,338400.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,73.3860,0.028,2613.068,0.000,73.331,73.441
income,8.493e-05,4.36e-07,194.941,0.000,8.41e-05,8.58e-05

0,1,2,3
Omnibus:,1327.431,Durbin-Watson:,1.315
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2897.509
Skew:,0.03,Prob(JB):,0.0
Kurtosis:,4.027,Cond. No.,146000.0


Translate some of your coefficients into the form **"every X percentage point change in unemployment translates to a Y change in life expectancy."** Do this with numbers that are meaningful, and in a way that is easily understandable to your reader.
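
One way to do the translation is to multiply a coefficient by a step size a reader can picture. The sketch below uses the `perc_pov` and `income` coefficients from the single-variable summaries above; the step sizes (10 percentage points, $10,000) are arbitrary choices for readability. These figures describe associations between tracts, not causal effects.

```python
# Coefficients copied from the single-variable regression summaries.
coef_pov = -0.3074       # years of life expectancy per 1 percentage point of poverty
coef_income = 8.493e-05  # years of life expectancy per $1 of median income

# Step sizes chosen so the resulting numbers are easy to talk about.
pov_step = 10          # percentage points
income_step = 10_000   # dollars

pov_effect = coef_pov * pov_step
income_effect = coef_income * income_step

print(f"A {pov_step}-point rise in the poverty rate goes with a "
      f"{abs(pov_effect):.1f}-year drop in life expectancy.")
print(f"Every ${income_step:,} in median income goes with "
      f"{income_effect:.2f} more years of life expectancy.")
```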