In this notebook we use six economic hardship variables for 77 Chicago communities (normalized and scaled from 1 to 100), measured over two five-year periods ending in 2014 and in 2017, to predict the hardship index (HI) and the annual homicide count (HOM) in these communities.

The variables are as follows:

<ul>
    <li> HI = hardship index </li>
    <li> HOM = annual homicide count </li>
    <li> UNEMP* = Normalized and scaled % of community age 16 and older who are unemployed. </li>
    <li> NOHS* = Normalized and scaled % of community age 25 and older without a high school diploma. </li>
    <li> DEP* = Normalized and scaled % of community who are dependent (under age 18 or over age 64). </li>
    <li> HOUS* = Normalized and scaled % of community with overcrowded housing (more than 1 occupant per room). </li>
    <li> POV* = Normalized and scaled % of community below the federal poverty line. </li>
    <li> INC* = Normalized and scaled per capita income. </li>
</ul>
             
Data sources:

HI: https://greatcities.uic.edu/wp-content/uploads/2016/07/GCI-Hardship-Index-Fact-SheetV2.pdf (2010-2014) and https://greatcities.uic.edu/wp-content/uploads/2019/12/Hardship-Index-Fact-Sheet-2017-ACS-Final-1.pdf (2013-2017).

HOM: https://data.cityofchicago.org/Public-Safety/Homicides/k9xv-yxzs
 
 

1) We begin by importing the pandas (data analysis) and NumPy (numerical Python) libraries. Press Shift+Enter to execute each cell.

In [None]:
import pandas as pd
import numpy as np

2) We use pandas (pd) to read the data file 'NHIHOM20142017Regr.xlsx' into a dataframe called "raw_hardship" and display its first two rows.

In [None]:
raw_hardship=pd.read_excel('NHIHOM20142017Regr.xlsx')
raw_hardship.head(2)
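Before going further, it can help to confirm that the loaded table has the expected shape and columns. The sketch below is one way to do that. Because the Excel file is not available here, it runs on a placeholder frame filled with random values (not the real data); the helper name `summarize_frame` and the list `expected_cols` are hypothetical and simply mirror the column names used in the later cells.

```python
import numpy as np
import pandas as pd

# Column stems used in this notebook; each appears once per period (14 and 17).
stems = ["HI", "NUNEMP", "NNOHS", "NDEP", "NHOUS", "NPOV", "NINC", "HOM"]
expected_cols = ["ID"] + [f"{s}{yr}" for yr in ("14", "17") for s in stems]

def summarize_frame(df, expected):
    """Return row count, expected columns that are absent, and total missing values."""
    return {
        "rows": len(df),
        "missing_columns": [c for c in expected if c not in df.columns],
        "missing_values": int(df.isna().sum().sum()),
    }

# Placeholder standing in for raw_hardship: random values, NOT the real data.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.uniform(1, 100, size=(77, len(expected_cols))), columns=expected_cols
)
demo["ID"] = range(1, 78)

summary = summarize_frame(demo, expected_cols)
print(summary)
```

With the real file loaded, the same call would be `summarize_frame(raw_hardship, expected_cols)`; 77 rows and no missing columns would match the description at the top of the notebook.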

3) Let's separate the 2014 and 2017 data (hardship index, the six hardship variables, and homicide count) into two dataframes called "HIHOM14" and "HIHOM17". The ID column is renamed so that every column name reflects the year.

In [None]:
HIHOM14=raw_hardship[["ID","HI14","NUNEMP14","NNOHS14","NDEP14","NHOUS14","NPOV14","NINC14","HOM14"]]
HIHOM14 = HIHOM14.rename(columns = {'ID':'ID14'})
HIHOM17=raw_hardship[["ID","HI17","NUNEMP17","NNOHS17","NDEP17","NHOUS17","NPOV17","NINC17","HOM17"]]
HIHOM17 = HIHOM17.rename(columns = {'ID':'ID17'})
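The two selections above differ only in the year suffix, so the same split could be written once as a small helper. This is a sketch, not part of the original notebook: `select_year` and `STEMS` are hypothetical names, and the two-row frame is a placeholder rather than the real data.

```python
import pandas as pd

# Column stems shared by both periods.
STEMS = ["HI", "NUNEMP", "NNOHS", "NDEP", "NHOUS", "NPOV", "NINC", "HOM"]

def select_year(df, year):
    """Keep ID plus the columns ending in `year`, and rename ID to ID<year>."""
    cols = ["ID"] + [f"{stem}{year}" for stem in STEMS]
    return df[cols].rename(columns={"ID": f"ID{year}"})

# Tiny placeholder frame with the same column layout (NOT the real data).
all_cols = ["ID"] + [f"{s}{y}" for y in ("14", "17") for s in STEMS]
demo = pd.DataFrame([[1] + [0.0] * 16, [2] + [1.0] * 16], columns=all_cols)

demo14 = select_year(demo, "14")
demo17 = select_year(demo, "17")
print(list(demo14.columns))
print(list(demo17.columns))
```

Applied to the real table, `select_year(raw_hardship, "14")` should produce the same columns as "HIHOM14" above.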

4) Let's check the 2014 data.

In [None]:
HIHOM14.head(2)

5) Let's also check the 2017 data.

In [None]:
HIHOM17.head(2)

6) Separate the independent variables X (the six hardship variables) from the dependent variables y (HI and HOM) for each period.

In [None]:
X14=HIHOM14[["NUNEMP14","NNOHS14","NDEP14","NHOUS14","NPOV14","NINC14"]]
X17=HIHOM17[["NUNEMP17","NNOHS17","NDEP17","NHOUS17","NPOV17","NINC17"]]
yHI14=HIHOM14[["HI14"]]
yHI17=HIHOM17[["HI17"]]
yHOM14=HIHOM14[["HOM14"]]
yHOM17=HIHOM17[["HOM17"]]
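Note that selecting with double brackets, as in the cell above, returns a one-column dataframe rather than a series. The short sketch below shows the difference on a placeholder frame (the values are made up for illustration).

```python
import pandas as pd

# Placeholder frame with a single target column (NOT the real data).
demo = pd.DataFrame({"HI14": [10.0, 20.0, 30.0]})

y_2d = demo[["HI14"]]   # double brackets: DataFrame with one column
y_1d = demo["HI14"]     # single brackets: Series

print(type(y_2d).__name__, y_2d.shape)
print(type(y_1d).__name__, y_1d.shape)
```

This matters later: when scikit-learn's LinearRegression is fitted with a two-dimensional y, its `coef_` attribute has shape (n_targets, n_features), so the coefficients print as a nested array with one row.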

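7) Fit the linear model for each dependent variable y (HI or HOM) and each period:

$$y = c_0 + c_1\,\mathrm{UNEMP}^* + c_2\,\mathrm{NOHS}^* + c_3\,\mathrm{DEP}^* + c_4\,\mathrm{HOUS}^* + c_5\,\mathrm{POV}^* + c_6\,\mathrm{INC}^*$$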

In [None]:
from sklearn import linear_model

# One ordinary least-squares model per target (HI, HOM) and period (2014, 2017).
regrHI14 = linear_model.LinearRegression()
regrHI17 = linear_model.LinearRegression()
regrHOM14 = linear_model.LinearRegression()
regrHOM17 = linear_model.LinearRegression()

# Fit each model on that period's six hardship variables.
regrHI14.fit(X14, yHI14)
regrHI17.fit(X17, yHI17)
regrHOM14.fit(X14, yHOM14)
regrHOM17.fit(X17, yHOM17)

# coef_ holds c1,...,c6; the intercept c0 is stored separately in intercept_.
print("y=c0+c1UNEMP*+c2NOHS*+c3DEP*+c4HOUS*+c5POV*+c6INC*")
print("Coefficients c1,...,c6 for y=HI14: ", regrHI14.coef_)
print("Coefficients c1,...,c6 for y=HI17: ", regrHI17.coef_)
print("Coefficients c1,...,c6 for y=HOM14: ", regrHOM14.coef_)
print("Coefficients c1,...,c6 for y=HOM17: ", regrHOM17.coef_)
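The cell above prints only c1,...,c6. A fitted scikit-learn model also exposes the intercept c0 (`intercept_`), an R² goodness-of-fit value (`score`), and predictions (`predict`). The sketch below shows these on placeholder data, not the real hardship table: the features are random numbers, and the target is defined as the plain average of the six features purely so that the fit is exact and easy to check. That target definition is an assumption made for this illustration only.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Placeholder features: random values in [1, 100] for 77 rows (NOT the real data).
rng = np.random.default_rng(0)
features = ["NUNEMP14", "NNOHS14", "NDEP14", "NHOUS14", "NPOV14", "NINC14"]
X_demo = pd.DataFrame(rng.uniform(1, 100, size=(77, 6)), columns=features)

# Placeholder target: the mean of the six features, so the relationship is exactly linear.
y_demo = X_demo.mean(axis=1).to_frame("HI14")

model = linear_model.LinearRegression().fit(X_demo, y_demo)

print("Intercept c0:", model.intercept_)
print("Coefficients c1,...,c6:", model.coef_)
print("R^2 on the fitting data:", model.score(X_demo, y_demo))

# Predicted values for the first two rows.
pred = model.predict(X_demo.head(2))
print("Predictions:", pred.ravel())
```

For the models in this notebook, the equivalent calls would be, for example, `regrHOM14.intercept_`, `regrHOM14.score(X14, yHOM14)`, and `regrHOM14.predict(X14)`. Keep in mind that an R² computed on the same data used for fitting tends to be optimistic.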