In this notebook we will explore the hardship index for Woodlawn, Chicago, in relation to other Chicago neighborhoods for two five-year periods ending in 2014 and 2017.

The variables are designated as follows:

<ul>
    <li> HI = hardship index. </li>
    <li> UNEMP = % of community age 16 and older who are unemployed. </li>
    <li> NOHS = % of community age 25 and older without a high school diploma. </li>
    <li> DEP = % of community who are dependent (under age 18 or over age 64). </li>
    <li> HOUS = % of community with overcrowded housing (more than 1 occupant per room). </li>
    <li> POV = % of community below the federal poverty line. </li>
    <li> INC = per capita income. </li>
</ul>

Data sources: https://greatcities.uic.edu/wp-content/uploads/2016/07/GCI-Hardship-Index-Fact-SheetV2.pdf (2010-2014) and https://greatcities.uic.edu/wp-content/uploads/2019/12/Hardship-Index-Fact-Sheet-2017-ACS-Final-1.pdf (2013-2017).

1) We begin by importing the pandas (data analysis) and numpy (Numerical Python) libraries. (Press Shift+Enter to execute each cell.)

In [None]:
import pandas as pd
import numpy as np

2) We use pandas (pd) to import the data file 'HI20142017.xlsx' to a dataframe called "raw_hardship".

In [None]:
raw_hardship=pd.read_excel('HI20142017.xlsx')
raw_hardship.head(1)
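
Before going further, it can help to confirm that the spreadsheet has the columns the rest of the notebook relies on. The cell below is a minimal sketch: the helper `missing_columns` and the small `demo` dataframe are made up for illustration, and the expected column names are taken from the cells that follow. In the notebook you would call `missing_columns(raw_hardship)` and hope for an empty list.

```python
import pandas as pd

# Indicator names used throughout this notebook; each appears once per year.
INDICATORS = ["HI", "UNEMP", "NOHS", "DEP", "HOUS", "POV", "INC"]
EXPECTED = ["Community"] + [name + year for year in ("14", "17") for name in INDICATORS]

def missing_columns(df):
    """Hypothetical helper: list the expected columns that df does not have."""
    return [col for col in EXPECTED if col not in df.columns]

# A deliberately incomplete stand-in for raw_hardship (illustration only).
demo = pd.DataFrame(columns=["Community", "HI14", "HI17"])
print(missing_columns(demo))
```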

3) Let's separate the 2014 and 2017 hardship index (HI) data into two dataframes called "HI14" and "HI17". The column names will reflect the year.

In [None]:
HI14=raw_hardship[["Community","HI14","UNEMP14","NOHS14","DEP14","HOUS14","POV14","INC14"]]
HI14 = HI14.rename(columns = {'Community':'Community14'})
HI17=raw_hardship[["Community","HI17","UNEMP17","NOHS17","DEP17","HOUS17","POV17","INC17"]]
HI17 = HI17.rename(columns = {'Community':'Community17'})
HI14.head(2)

In [None]:
HI17.head(2)
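
The same split can be written once and reused for both years. The sketch below assumes, as in the data file above, that every year-specific column name ends in the two-digit year. The helper `split_year` and the `toy` dataframe are hypothetical, and the numbers in `toy` are made up; they are not taken from the fact sheets.

```python
import pandas as pd

# Made-up values for illustration only; they are NOT from the GCI fact sheets.
toy = pd.DataFrame({
    "Community": ["A", "B"],
    "HI14": [50.0, 30.0], "UNEMP14": [20.0, 10.0],
    "HI17": [45.0, 32.0], "UNEMP17": [18.0, 11.0],
})

def split_year(df, year):
    """Hypothetical helper: keep Community plus every column ending in `year`."""
    cols = ["Community"] + [c for c in df.columns if c.endswith(year)]
    # Rename Community so the column name reflects the year, as in step 3.
    return df[cols].rename(columns={"Community": "Community" + year})

toy14 = split_year(toy, "14")
toy17 = split_year(toy, "17")
print(list(toy14.columns))
print(list(toy17.columns))
```

With the real data, `split_year(raw_hardship, "14")` should select the same columns as the explicit list used for HI14 above, provided no other column name happens to end in "14".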

4) In the next cell we'll define a function 

makescatterplot(HI14,HI17,community_list,show_all,index1,index2,xaxislabel,yaxislabel,xrange,title) 

which uses the data in HI14 and HI17 to make scatterplots of (index1, index2). Each input to the function is explained below, with examples.

<ul>
    <li> HI14 -- the dataframe with 2014 hardship index data </li>
    <li> HI17 -- the dataframe with 2017 hardship index data </li>
    <li> community_list -- a list of names of one or more communities, e.g. ['Woodlawn','Englewood','Austin'] </li>
    <li> show_all -- either True (the scatterplot shows the names of all 77 communities, with those in community_list in larger type) or False (the scatterplot shows only the names of the communities in community_list) </li>
    <li> index1 -- name of the column (do not include the year) whose values are plotted as x coordinates, e.g. "UNEMP" </li>
    <li> index2 -- name of the column (do not include the year) whose values are plotted as y coordinates, e.g. "NOHS" </li>
    <li> xaxislabel -- label on the x-axis, e.g. "% Age 16+ Unemployed" </li>
    <li> yaxislabel -- label on the y-axis, e.g. "% Age 25+ without a High School Diploma" </li>
    <li> xrange -- labeled tick marks on the x-axis, e.g. np.arange(0,42,1) displays the numbers 0,1,2,...,41 and np.arange(0,51,2) displays the numbers 0,2,4,...,48,50 </li>
    <li> title -- title at the top of the scatterplot, e.g. "Unemployment and Education 2014(gray) 2017 (red)" </li>
</ul>

In [None]:
def makescatterplot(HI14,HI17,community_list,show_all,index1,index2,xaxislabel,yaxislabel,xrange,title):
    #import plotting tools
    import matplotlib.pyplot as plt
    
    #create a new figure and a set of axes to draw on
    fig, ax = plt.subplots(figsize=[16,20])
    
    #plot the 2014 points on those axes, with size given by the HI value
    HI14.plot(x=index1+'14', y=index2+'14', kind='scatter', c='gray', s=2*HI14['HI14'], alpha=.25, ax=ax)
    #use the requested tick marks on the x-axis
    plt.xticks(xrange)
    #Add labels to the HI14 points
    for i in HI14.index:
        if (HI14.loc[i,"Community14"] not in community_list) and show_all:
            plt.gca().text(HI14.loc[i,index1+'14'], HI14.loc[i,index2+'14']+.25,HI14.loc[i,"Community14"],ha='center', color='k', fontsize=5)
        elif HI14.loc[i,"Community14"] in community_list:
            plt.gca().text(HI14.loc[i,index1+'14'], HI14.loc[i,index2+'14']+.25,HI14.loc[i,"Community14"],ha='center', color='k', fontsize=20)
            plt.gca().text(HI14.loc[i,index1+'14'],HI14.loc[i,index2+'14']-.25,'x',ha='center', color='k', fontsize=10)
    
    #Add Labels for the HI17 points   
    for i in HI17.index:
        if (HI17.loc[i,"Community17"] not in community_list) and show_all:      
            plt.gca().text(HI17.loc[i,index1+'17'], HI17.loc[i,index2+'17']+.25,HI17.loc[i,"Community17"],ha='center', color='r', fontsize=5)
            
        elif HI17.loc[i,"Community17"] in community_list:
            plt.gca().text(HI17.loc[i,index1+'17'], HI17.loc[i,index2+'17']+.25,HI17.loc[i,"Community17"],ha='center', color='r', fontsize=20)
            plt.gca().text(HI17.loc[i,index1+'17'], HI17.loc[i,index2+'17']-.25,'x',ha='center', color='r', fontsize=10)
    #Add a title and axis labels
    plt.title(title,size=20)
    plt.xlabel(xaxislabel,size=15)
    plt.ylabel(yaxislabel,size=15)

    #Save the figure to a file
    plt.savefig(index1+index2+'.png')
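
The xrange argument is easiest to get right by looking at what np.arange actually returns. The short sketch below uses the two examples from the input list above; the variable names are chosen here for illustration only.

```python
import numpy as np

# np.arange(start, stop, step) stops before `stop`, so `stop` must be
# larger than the last tick you want to see on the x-axis.
ticks_by_one = np.arange(0, 42, 1)   # 0, 1, 2, ..., 41
ticks_by_two = np.arange(0, 51, 2)   # 0, 2, 4, ..., 48, 50

print(ticks_by_one[-1], len(ticks_by_one))   # last tick and number of ticks
print(ticks_by_two[-1], len(ticks_by_two))
```

A step of 1 over a wide range produces many tick labels, so a larger step may be easier to read when the data span more than about 40 units.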

5) Let's make three scatterplots highlighting the Woodlawn community.

SCATTERPLOT #1: Unemployment ("UNEMP") vs. No High School Diploma ("NOHS")

In [None]:
makescatterplot(HI14,HI17,['Woodlawn'],True,"UNEMP","NOHS","% Age 16+ Unemployed","% Age 25+ without a High School Diploma",np.arange(0,42,1),"Unemployment and Education 2014(gray) 2017 (red)")

SCATTERPLOT #2: Dependent Population ("DEP") vs. Per Capita Income ("INC")

In [None]:
makescatterplot(HI14,HI17,['Woodlawn'],True,"DEP","INC","% Dependent Population (Under Age 18, Over Age 64)","Per Capita Income",np.arange(0,51,1),"Dependent Population and Income")

SCATTERPLOT #3: Below Poverty Line ("POV") vs. Hardship Index ("HI")

In [None]:
makescatterplot(HI14,HI17,['Woodlawn'],True,"POV","HI","% Below Poverty Line","Hardship Index",np.arange(0,64,2),"Poverty and the Hardship Index 2014(gray) 2017 (red)")

The next four scatterplots will focus on three communities: Woodlawn, Englewood, and Austin.

SCATTERPLOT #4: Three Community Unemployment vs No High School Diploma

In [None]:
makescatterplot(HI14,HI17,['Woodlawn','Englewood','Austin'],False,"UNEMP","NOHS","% Age 16+ Unemployed","% Age 25+ without a High School Diploma",np.arange(0,42,1),"Unemployment and Education 2014(gray) 2017 (red)")

SCATTERPLOT #5: Three Community Dependent Population vs Per Capita Income

In [None]:
makescatterplot(HI14,HI17,['Woodlawn','Englewood','Austin'],False,"DEP","INC","% Dependent Population (Under Age 18, Over Age 64)","Per Capita Income",np.arange(0,51,1),"Dependent Population and Income")

SCATTERPLOT #6: Three Community Below Poverty Line and HI

In [None]:
makescatterplot(HI14,HI17,['Woodlawn','Englewood','Austin'],False,"POV","HI","% Below Poverty Line","Hardship Index",np.arange(0,64,2),"Poverty and the Hardship Index 2014(gray) 2017 (red)")

SCATTERPLOT #7: Three Community Housing and Hardship Index

In [None]:
makescatterplot(HI14,HI17,['Woodlawn','Englewood','Austin'],False,"HOUS","HI","% in Overcrowded Housing (>1/room)","Hardship Index",np.arange(0,18,2),"Housing and the Hardship Index 2014 (Gray)  2017 (Red)")
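
The scatterplots show movement between the two periods visually; a small table of differences can make the same comparison numerically. The sketch below is one possible way to do that, not part of the original analysis. The helper `change_table` is hypothetical, and the toy dataframes hold made-up numbers rather than values from the fact sheets. With the real data you might call `change_table(HI14, HI17, ['Woodlawn','Englewood','Austin'], "HI")`.

```python
import pandas as pd

# Made-up values for illustration only; they are NOT from the GCI fact sheets.
toy14 = pd.DataFrame({"Community14": ["A", "B", "C"], "HI14": [50.0, 30.0, 70.0]})
toy17 = pd.DataFrame({"Community17": ["A", "B", "C"], "HI17": [45.0, 32.0, 70.0]})

def change_table(hi14, hi17, community_list, index_name):
    """Hypothetical helper: 2014 value, 2017 value, and their difference."""
    # Line up the two years by community name.
    merged = hi14.merge(hi17, left_on="Community14", right_on="Community17")
    # Keep only the communities of interest.
    picked = merged[merged["Community14"].isin(community_list)]
    table = picked[["Community14", index_name + "14", index_name + "17"]].copy()
    # A negative change means the 2017 value is lower than the 2014 value.
    table["change"] = table[index_name + "17"] - table[index_name + "14"]
    return table.reset_index(drop=True)

result = change_table(toy14, toy17, ["A", "B"], "HI")
print(result)
```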