In this notebook we use six economic hardship variables for 77 Chicago communities (each normalized and scaled to a range of 1 to 100), measured over two five-year periods ending in 2014 and 2017, to predict the hardship index (HI) and the annual homicide count (HOM) in these communities.

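The variables, which also appear in the scatterplots, are listed below. An asterisk marks a normalized and scaled variable. In the data file these columns carry an N prefix and a two-digit year suffix; for example, NUNEMP14 is UNEMP* for the period ending in 2014.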

<ul>
    <li> HI = hardship index </li>
    <li> HOM = homicide count </li>
    <li> UNEMP* = normalized and scaled % of community residents age 16 and older who are unemployed </li>
    <li> NOHS* = normalized and scaled % of community residents age 25 and older without a high school diploma </li>
    <li> DEP* = normalized and scaled % of community residents who are dependent (under age 18 or over age 64) </li>
    <li> HOUS* = normalized and scaled % of community with overcrowded housing (more than 1 occupant per room) </li>
    <li> POV* = normalized and scaled % of community below the federal poverty line </li>
    <li> INC* = normalized and scaled per capita income </li>
</ul>
             
Data sources:
<ul>
    <li> HI (2010-2014): https://greatcities.uic.edu/wp-content/uploads/2016/07/GCI-Hardship-Index-Fact-SheetV2.pdf </li>
    <li> HI (2013-2017): https://greatcities.uic.edu/wp-content/uploads/2019/12/Hardship-Index-Fact-Sheet-2017-ACS-Final-1.pdf </li>
    <li> HOM: https://data.cityofchicago.org/Public-Safety/Homicides/k9xv-yxzs </li>
</ul>
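The variable list describes each component as "normalized and scaled" but does not give the transformation. The sketch below is an illustration only and assumes a linear min-max rescaling across communities. The helper `min_max_scale` and the four input percentages are hypothetical. The UIC fact sheets linked above are the authority on the method actually used, including whether the range is 0 to 100 or 1 to 100.

```python
import numpy as np

def min_max_scale(values, lo=0.0, hi=100.0):
    """Hypothetical helper: rescale linearly so min -> lo and max -> hi.

    This is an ASSUMED min-max normalization, not necessarily the exact
    method behind the UNEMP*, NOHS*, ... columns in the data file.
    """
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    return lo + (values - vmin) / (vmax - vmin) * (hi - lo)

# Made-up unemployment percentages for four communities
raw_unemployment = [5.0, 12.5, 20.0, 35.0]
scaled = min_max_scale(raw_unemployment)
print(scaled)
```

Under this assumption, the community with the lowest raw rate gets 0 and the community with the highest gets 100. Scaled values therefore compare communities with each other rather than measuring absolute percentage points.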
 
 

1) We begin by importing pandas (Python's data-analysis library) and NumPy (Numerical Python). Press Shift+Enter to execute each cell.

In [3]:
import pandas as pd
import numpy as np

2) We use pandas (pd) to read the Excel file 'NHIHOM20142017Regression.xlsx' into a DataFrame called "raw_hardship" and display its first two rows.

In [30]:
raw_hardship=pd.read_excel('NHIHOM20142017Regression.xlsx')
raw_hardship.head(2)

,ID,HI14,NUNEMP14,NNOHS14,NDEP14,NHOUS14,NPOV14,NINC14,HI17,NUNEMP17,NNOHS17,NDEP17,NHOUS17,NPOV17,NINC17,HOM14,HOM17
0,1,39.7,14.088398,27.16763,36.337209,43.624161,38.294011,21.434025,39.4,14.678899,29.562044,30.744337,47.368421,33.577982,23.521922,8,4
1,2,44.3,16.022099,28.516378,68.604651,46.979866,26.315789,20.384273,47.3,17.431193,37.591241,68.608414,55.639098,22.93578,22.18737,3,2


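3) Split the data by period into two DataFrames, "HIHOM14" and "HIHOM17". Each one holds the hardship index (HI), the six normalized components, and the homicide count (HOM) for its period. The ID column is renamed to ID14 or ID17 so that every column name reflects the year.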

In [31]:
# 2014 period: ID, hardship index, six normalized components, homicide count
HIHOM14 = raw_hardship[["ID","HI14","NUNEMP14","NNOHS14","NDEP14","NHOUS14","NPOV14","NINC14","HOM14"]]
HIHOM14 = HIHOM14.rename(columns={'ID': 'ID14'})
# 2017 period: same layout
HIHOM17 = raw_hardship[["ID","HI17","NUNEMP17","NNOHS17","NDEP17","NHOUS17","NPOV17","NINC17","HOM17"]]
HIHOM17 = HIHOM17.rename(columns={'ID': 'ID17'})
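Listing the columns by hand works. However, every column name except ID ends in its two-digit year, so the same split can also be done with `DataFrame.filter` and a regular expression. The sketch below shows this under a few assumptions. The two-row frame is copied from the `head(2)` output above and trimmed to a few columns for brevity; with the full file you would pass `raw_hardship` instead. The helper `split_by_year` is hypothetical.

```python
import pandas as pd

# First two rows of raw_hardship as shown by head(2), trimmed to a few columns
raw = pd.DataFrame({
    "ID": [1, 2],
    "HI14": [39.7, 44.3], "NUNEMP14": [14.088398, 16.022099], "HOM14": [8, 3],
    "HI17": [39.4, 47.3], "NUNEMP17": [14.678899, 17.431193], "HOM17": [4, 2],
})

def split_by_year(df, year):
    """Hypothetical helper: keep ID plus every column ending in the 2-digit year."""
    suffix = str(year)[-2:]
    year_cols = df.filter(regex=rf"{suffix}$")        # e.g. HI14, NUNEMP14, HOM14
    out = pd.concat([df[["ID"]], year_cols], axis=1)
    return out.rename(columns={"ID": f"ID{suffix}"})   # ID -> ID14 / ID17

HIHOM14_demo = split_by_year(raw, 2014)
HIHOM17_demo = split_by_year(raw, 2017)
print(list(HIHOM14_demo.columns))
```

The regex approach stays correct if more components are added later. It would need a stricter pattern if some column name happened to end in the same digits for another reason.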

4) Let's check the 2014 data.

In [32]:
HIHOM14.head(2)

,ID14,HI14,NUNEMP14,NNOHS14,NDEP14,NHOUS14,NPOV14,NINC14,HOM14
0,1,39.7,14.088398,27.16763,36.337209,43.624161,38.294011,21.434025,8
1,2,44.3,16.022099,28.516378,68.604651,46.979866,26.315789,20.384273,3


5) Let's also check the 2017 data.

In [33]:
HIHOM17.head(2)

,ID17,HI17,NUNEMP17,NNOHS17,NDEP17,NHOUS17,NPOV17,NINC17,HOM17
0,1,39.4,14.678899,29.562044,30.744337,47.368421,33.577982,23.521922,4
1,2,47.3,17.431193,37.591241,68.608414,55.639098,22.93578,22.18737,2


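6) Separate the independent variables (X14 and X17, the six normalized components for each period) from the dependent variables (the hardship index and the homicide count for each period).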

In [34]:
X14=HIHOM14[["NUNEMP14","NNOHS14","NDEP14","NHOUS14","NPOV14","NINC14"]]
X17=HIHOM17[["NUNEMP17","NNOHS17","NDEP17","NHOUS17","NPOV17","NINC17"]]
yHI14=HIHOM14[["HI14"]]
yHI17=HIHOM17[["HI17"]]
yHOM14=HIHOM14[["HOM14"]]
yHOM17=HIHOM17[["HOM17"]]

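7) Fit a linear regression model of the form

$$y = c_0 + c_1\,\mathrm{UNEMP}^* + c_2\,\mathrm{NOHS}^* + c_3\,\mathrm{DEP}^* + c_4\,\mathrm{HOUS}^* + c_5\,\mathrm{POV}^* + c_6\,\mathrm{INC}^*$$

for each of the four targets (HI14, HI17, HOM14, HOM17) and print the fitted coefficients c1,...,c6.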
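In scikit-learn's `LinearRegression`, the intercept c0 is stored in `intercept_` and c1,...,c6 in `coef_`. Evaluating the sum above by hand reproduces `predict`. The sketch below checks this on synthetic data (made-up values with a known c0 = 3, not the Chicago data).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 100, size=(77, 6))               # columns play the roles of UNEMP*, ..., INC*
true_c = np.array([0.2, 0.1, -0.1, 0.15, 0.05, -0.2])
y = 3.0 + X @ true_c + rng.normal(0, 1, size=77)   # c0 = 3 plus noise

model = LinearRegression().fit(X, y)
c0, c = model.intercept_, model.coef_

# y_hat = c0 + c1*UNEMP* + ... + c6*INC*, written out explicitly
manual = c0 + X @ c
print(np.allclose(manual, model.predict(X)))
```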

In [35]:
from sklearn import linear_model

# One ordinary least-squares model per target and period
regrHI14 = linear_model.LinearRegression()
regrHI17 = linear_model.LinearRegression()
regrHOM14 = linear_model.LinearRegression()
regrHOM17 = linear_model.LinearRegression()

# Fit each model on the six normalized components for its period
regrHI14.fit(X14, yHI14)
regrHI17.fit(X17, yHI17)
regrHOM14.fit(X14, yHOM14)
regrHOM17.fit(X17, yHOM17)

# coef_ holds c1,...,c6 (the intercept c0 is stored separately in intercept_)
print("y=c0+c1UNEMP*+c2NOHS*+c3DEP*+c4HOUS*+c5POV*+c6INC*")
print("Coefficients c1,...,c6 for y=HI14: ", regrHI14.coef_)
print("Coefficients c1,...,c6 for y=HI17: ", regrHI17.coef_)
print("Coefficients c1,...,c6 for y=HOM14: ", regrHOM14.coef_)
print("Coefficients c1,...,c6 for y=HOM17: ", regrHOM17.coef_)

y=c0+c1UNEMP*+c2NOHS*+c3DEP*+c4HOUS*+c5POV*+c6INC*
Coefficients c1,...,c6 for y=HI14:  [[ 0.16639838  0.16621632  0.16683639  0.16667235  0.16707147 -0.16673254]]
Coefficients c1,...,c6 for y=HI17:  [[ 0.16605811  0.16718178  0.16751533  0.16196384  0.1685895  -0.17364809]]
Coefficients c1,...,c6 for y=HOM14:  [[ 0.19335418 -0.13368914 -0.09136798  0.17149934  0.00507071 -0.0487013 ]]
Coefficients c1,...,c6 for y=HOM17:  [[ 0.09189198 -0.02705192 -0.05043781  0.00254789  0.17600118 -0.11415851]]
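The HI coefficients are all close to 1/6 ≈ 0.167 in magnitude, with a negative sign on INC*. That pattern suggests HI is essentially the average of the six components, with income entering as 100 − INC*. This reading is an inference from the coefficients, not something the notebook states. The sketch below tests it against the two communities printed by `head(2)` earlier. The helper `composite_hi` is hypothetical.

```python
import numpy as np

# (UNEMP*, NOHS*, DEP*, HOUS*, POV*, INC*) copied from the head(2) output above
components_14 = np.array([
    [14.088398, 27.16763, 36.337209, 43.624161, 38.294011, 21.434025],
    [16.022099, 28.516378, 68.604651, 46.979866, 26.315789, 20.384273],
])
components_17 = np.array([
    [14.678899, 29.562044, 30.744337, 47.368421, 33.577982, 23.521922],
    [17.431193, 37.591241, 68.608414, 55.639098, 22.93578, 22.18737],
])
reported_hi14 = np.array([39.7, 44.3])
reported_hi17 = np.array([39.4, 47.3])

def composite_hi(components):
    """Hypothetical, ASSUMED composite: mean of five components and (100 - INC*)."""
    hardship = components.copy()
    hardship[:, 5] = 100.0 - hardship[:, 5]   # invert income: higher income, less hardship
    return hardship.mean(axis=1)

print(np.round(composite_hi(components_14), 1), reported_hi14)
print(np.round(composite_hi(components_17), 1), reported_hi17)
```

For 2014 the recomputed values round to the reported 39.7 and 44.3. For 2017 they come out at about 38.7 and 46.7, against the reported 39.4 and 47.3. The approximation is therefore looser for the later period, which is consistent with the 2017 coefficients drifting further from 1/6.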
