# Beat Demographic Analysis
## Logan Chang and Robert Yu
## 08/01/20

In this notebook, we analyze the demographics of each police beat in Chicago: race, age, household income, educational attainment, and food stamp usage. We compare the demographics of the top and bottom 20 beats by crimes, arrests, and arrest-to-crime ratio, both quantitatively and visually, and offer some of our own insights.

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
pd.options.mode.chained_assignment = None  # suppress pandas' SettingWithCopyWarning
import math
# visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/cpd-police-beat-demographics/ERRORLOG.txt
/kaggle/input/cpd-police-beat-demographics/beatage.txt
/kaggle/input/cpd-police-beat-demographics/beathi.txt
/kaggle/input/cpd-police-beat-demographics/master.csv
/kaggle/input/cpd-police-beat-demographics/outputBeats_EDA_TOP.txt
/kaggle/input/cpd-police-beat-demographics/beatse.txt
/kaggle/input/cpd-police-beat-demographics/beatea.txt
/kaggle/input/cpd-police-beat-demographics/outputBeats_EDA_LAST.txt
/kaggle/input/cpd-police-beat-demographics/beatfs.txt
/kaggle/input/cpd-police-beat-demographics/beatrace.txt
/kaggle/input/cpd-police-beat-demographics/beatpop.txt
/kaggle/input/cpd-police-beat-demographics/beathh.txt


# FILE READING/DATA COLLECTION

Read in population and square mileage:

In [2]:
df_pop = pd.read_csv('/kaggle/input/cpd-police-beat-demographics/beatpop.txt', sep=" ", skiprows = [0], header=None)
df_pop.columns = ["beat", "population", "square_mileage"]
df_pop.set_index('beat', inplace= True)
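
The files in this notebook are all read with the same options (space-separated, first line skipped, no header row) and then indexed by beat. As a minimal sketch, those steps could be wrapped in one helper. The name `read_beat_file` and the in-memory sample are our own illustration and are not part of the dataset:

```python
import io
import pandas as pd

def read_beat_file(path_or_buffer, columns):
    """Read a space-separated beat file, skip its first line, and index by beat."""
    df = pd.read_csv(path_or_buffer, sep=" ", skiprows=[0], header=None)
    df.columns = columns
    return df.set_index("beat")

# Made-up sample mimicking the assumed layout: one line to skip, then the rows.
sample = io.StringIO("2\n111 1982.5 0.10\n112 1075.2 0.08\n")
df_demo = read_beat_file(sample, ["beat", "population", "square_mileage"])
print(df_demo)
```

Returning a new frame instead of using `inplace=True` lets the call be chained or assigned directly.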

Read in and properly format race:

In [3]:
with open("/kaggle/input/cpd-police-beat-demographics/beatrace.txt", 'r') as racef:
    size = int(racef.readline())  # the first line is an integer; read_csv skips it below
df_race = pd.read_csv('/kaggle/input/cpd-police-beat-demographics/beatrace.txt', sep=" ", skiprows = [0], header=None)
df_race.columns = ["beat", "num_white", 'num_hispanic', 'num_black', 'num_asian', 'num_mixed', 'num_other']
df_race.set_index('beat', inplace = True)

In [4]:
df_beat = pd.concat([df_pop, df_race], axis = 1)
df_beat.head()

beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other
111,1982.585454,0.106797,1272.117539,189.765898,84.765805,412.651312,17.374913,5.909985
112,1075.282355,0.08606,667.891824,56.178674,116.041634,204.091402,24.85612,6.222698
113,1072.951867,0.087558,613.579992,75.685872,140.302498,210.892786,27.998407,4.492315
114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438
121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697
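
`pd.concat(..., axis=1)` lines rows up by index label rather than by position, which is why each frame is indexed by `beat` before being combined. A small sketch with made-up numbers (both toy frames are assumptions for illustration only):

```python
import pandas as pd

left = pd.DataFrame({"population": [100.0, 200.0]},
                    index=pd.Index([111, 112], name="beat"))
# Same beats, deliberately listed in a different order.
right = pd.DataFrame({"num_white": [150.0, 60.0]},
                     index=pd.Index([112, 111], name="beat"))

combined = pd.concat([left, right], axis=1)
print(combined)
```

A beat that appears in only one of the frames would still get a row, with `NaN` in the columns coming from the other frame.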


Break down the percentage of each race by beat:

In [5]:
df_beat['percent_white'] = df_beat.apply(lambda row: row['num_white']/row['population']*100, axis = 1)
df_beat['percent_hispanic'] = df_beat.apply(lambda row: row['num_hispanic']/row['population']*100, axis = 1)
df_beat['percent_black'] = df_beat.apply(lambda row: row['num_black']/row['population']*100, axis = 1)
df_beat['percent_asian'] = df_beat.apply(lambda row: row['num_asian']/row['population']*100, axis = 1)
df_beat['percent_mixed'] = df_beat.apply(lambda row: row['num_mixed']/row['population']*100, axis = 1)
df_beat['percent_other'] = df_beat.apply(lambda row: row['num_other']/row['population']*100, axis = 1)
df_beat.head()

beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,percent_white,percent_hispanic,percent_black,percent_asian,percent_mixed,percent_other
111,1982.585454,0.106797,1272.117539,189.765898,84.765805,412.651312,17.374913,5.909985,64.164575,9.571638,4.275518,20.813797,0.876376,0.298095
112,1075.282355,0.08606,667.891824,56.178674,116.041634,204.091402,24.85612,6.222698,62.113158,5.224551,10.791736,18.980261,2.31159,0.578704
113,1072.951867,0.087558,613.579992,75.685872,140.302498,210.892786,27.998407,4.492315,57.186162,7.053986,13.076309,19.655382,2.609475,0.418687
114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,66.7836,6.550324,7.719143,16.948317,1.661974,0.336643
121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,48.649349,6.675988,4.64161,37.514342,2.051079,0.467631
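
The same percentages can also be computed column-wise, without a row-by-row `apply`. This is only a sketch on made-up numbers; the toy frame and the shortened race list are assumptions:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"population": [200.0, 50.0],
     "num_white": [100.0, 10.0],
     "num_black": [50.0, 40.0]},
    index=pd.Index([111, 112], name="beat"),
)

# Dividing whole columns works element-wise, beat by beat.
for race in ["white", "black"]:
    df_demo[f"percent_{race}"] = df_demo[f"num_{race}"] / df_demo["population"] * 100

print(df_demo)
```

On a frame with a few hundred beats the difference in speed is small, but the loop keeps the six race columns from being spelled out one line at a time.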


Read in household income:

In [6]:
df_hi = pd.read_csv('/kaggle/input/cpd-police-beat-demographics/beathi.txt', sep=" ", skiprows = [0], header=None)
df_hi.columns = ["beat", "med_income"]
df_hi.set_index('beat', inplace= True)
df_beat = pd.concat([df_beat, df_hi], axis = 1)
df_beat.head()

beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,percent_white,percent_hispanic,percent_black,percent_asian,percent_mixed,percent_other,med_income
111,1982.585454,0.106797,1272.117539,189.765898,84.765805,412.651312,17.374913,5.909985,64.164575,9.571638,4.275518,20.813797,0.876376,0.298095,84026.501461
112,1075.282355,0.08606,667.891824,56.178674,116.041634,204.091402,24.85612,6.222698,62.113158,5.224551,10.791736,18.980261,2.31159,0.578704,96400.650448
113,1072.951867,0.087558,613.579992,75.685872,140.302498,210.892786,27.998407,4.492315,57.186162,7.053986,13.076309,19.655382,2.609475,0.418687,101646.757683
114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,66.7836,6.550324,7.719143,16.948317,1.661974,0.336643,110367.182242
121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,48.649349,6.675988,4.64161,37.514342,2.051079,0.467631,101550.382372


Read in food stamps:

In [7]:
df_fs = pd.read_csv('/kaggle/input/cpd-police-beat-demographics/beatfs.txt', sep=" ", skiprows = [0], header=None)
df_fs.columns = ["beat", "pop_food_stamps"]
df_fs.set_index('beat', inplace= True)
#df_fs.head()
df_beat = pd.concat([df_beat, df_fs], axis = 1)
df_beat.index.name = 'beat'
#df_beat.head()

In [8]:
df_beat['percent_on_fs'] = df_beat.apply(lambda row: row['pop_food_stamps']/row['population']*100, axis = 1)
df_beat.head()

beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,percent_white,percent_hispanic,percent_black,percent_asian,percent_mixed,percent_other,med_income,pop_food_stamps,percent_on_fs
111,1982.585454,0.106797,1272.117539,189.765898,84.765805,412.651312,17.374913,5.909985,64.164575,9.571638,4.275518,20.813797,0.876376,0.298095,84026.501461,5.047278,0.254581
112,1075.282355,0.08606,667.891824,56.178674,116.041634,204.091402,24.85612,6.222698,62.113158,5.224551,10.791736,18.980261,2.31159,0.578704,96400.650448,4.617578,0.429429
113,1072.951867,0.087558,613.579992,75.685872,140.302498,210.892786,27.998407,4.492315,57.186162,7.053986,13.076309,19.655382,2.609475,0.418687,101646.757683,5.54962,0.517229
114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,66.7836,6.550324,7.719143,16.948317,1.661974,0.336643,110367.182242,37.78907,0.234054
121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,48.649349,6.675988,4.64161,37.514342,2.051079,0.467631,101550.382372,20.313142,0.317368


Read in and append educational attainment:

In [9]:
df_ea = pd.read_csv('/kaggle/input/cpd-police-beat-demographics/beatea.txt', sep=" ", skiprows = [0], header=None)
df_ea.columns = ["beat", "bachelors", "high_school", "no_high_school"]
df_ea.set_index('beat', inplace= True)
df_beat = pd.concat([df_beat, df_ea], axis = 1)
df_beat.index.name = 'beat'
#df_beat.head()

Let's create percentages for each educational attainment category:

In [10]:
df_beat['percent_bachelors'] = df_beat.apply(lambda row: row['bachelors']/row['population']*100, axis = 1)
df_beat['percent_high_school'] = df_beat.apply(lambda row: row['high_school']/row['population']*100, axis = 1)
df_beat['percent_no_high_school'] = df_beat.apply(lambda row: row['no_high_school']/row['population']*100, axis = 1)

Read in population by age:

In [11]:
df_age = pd.read_csv('/kaggle/input/cpd-police-beat-demographics/beatage.txt', sep=" ", skiprows = [0], header=None)
df_age.columns = ["beat", '85+', '80-84', '75-79', '70-74', '67-69', '65-66', '62-64', '60-61', '55-59', '50-54', '45-49', '40-44', '35-39', '30-34', '25-29', '22-24', '21', '20', '18-19', '15-17', '10-14', '5-9', '0-4']
df_age.set_index('beat', inplace= True)
df_beat = pd.concat([df_beat, df_age], axis = 1)
df_beat.index.name = 'beat'
df_beat.head()

beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,percent_white,percent_hispanic,...,30-34,25-29,22-24,21,20,18-19,15-17,10-14,5-9,0-4
111,1982.585454,0.106797,1272.117539,189.765898,84.765805,412.651312,17.374913,5.909985,64.164575,9.571638,...,310.985968,593.988017,201.963448,11.627009,53.387115,191.216071,11.677332,15.93893,15.834807,61.32315
112,1075.282355,0.08606,667.891824,56.178674,116.041634,204.091402,24.85612,6.222698,62.113158,5.224551,...,100.422705,137.578194,95.766661,39.306165,86.095274,185.894475,12.96235,15.736303,14.521993,16.788204
113,1072.951867,0.087558,613.579992,75.685872,140.302498,210.892786,27.998407,4.492315,57.186162,7.053986,...,97.224674,137.45921,108.92085,50.295552,87.914303,205.365936,8.252637,12.904984,13.332698,18.899059
114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,66.7836,6.550324,...,2029.003225,2206.52311,680.77603,252.478695,296.659764,1159.738366,49.631733,8.773843,161.243895,784.392936
121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,48.649349,6.675988,...,1152.080654,1534.974268,822.607903,84.486978,45.530758,171.615639,28.614369,32.163771,43.740345,340.308729


Since the given age bands are too narrow for solid analysis, we create broader age bands while keeping the original ones:

In [12]:
df_beat['<=21'] = df_beat.apply(lambda row: row['21'] + row['20']+row['18-19']+row['15-17']+row['10-14']+row['5-9']+row['0-4'], axis = 1)
df_beat['22-29'] = df_beat.apply(lambda row: row['22-24'] + row['25-29'], axis = 1)
df_beat['30-39'] = df_beat.apply(lambda row: row['30-34'] + row['35-39'], axis = 1)
df_beat['40-49'] = df_beat.apply(lambda row: row['40-44'] + row['45-49'], axis = 1)
df_beat['50-59'] = df_beat.apply(lambda row: row['50-54'] + row['55-59'], axis = 1)
df_beat['60-64'] = df_beat.apply(lambda row: row['60-61'] + row['62-64'], axis = 1)
df_beat['65+'] = df_beat.apply(lambda row: row['65-66'] + row['67-69']+row['70-74']+row['75-79']+row['80-84']+row['85+'], axis = 1)
df_beat.head()

beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,percent_white,percent_hispanic,...,10-14,5-9,0-4,<=21,22-29,30-39,40-49,50-59,60-64,65+
111,1982.585454,0.106797,1272.117539,189.765898,84.765805,412.651312,17.374913,5.909985,64.164575,9.571638,...,15.93893,15.834807,61.32315,361.004414,795.951465,400.583951,110.5734,145.918029,69.170732,99.39205
112,1075.282355,0.08606,667.891824,56.178674,116.041634,204.091402,24.85612,6.222698,62.113158,5.224551,...,15.736303,14.521993,16.788204,371.304764,233.344855,125.192172,84.481662,105.068609,53.759341,102.128414
113,1072.951867,0.087558,613.579992,75.685872,140.302498,210.892786,27.998407,4.492315,57.186162,7.053986,...,12.904984,13.332698,18.899059,396.965169,246.38006,165.976057,98.781121,71.443282,48.616807,44.788713
114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,66.7836,6.550324,...,8.773843,161.243895,784.392936,2712.919232,2887.29914,2893.376072,2286.423909,2188.955616,1162.235714,2014.26601
121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,48.649349,6.675988,...,32.163771,43.740345,340.308729,746.460589,2357.582171,2035.649422,750.47024,188.22542,131.009716,191.111799
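
Each broader band is just the sum of the original columns it covers, so the mapping can be written down once and applied in a loop. A minimal sketch with made-up counts (the toy frame and the two-band mapping are assumptions; the real mapping would list all seven bands):

```python
import pandas as pd

df_demo = pd.DataFrame({
    "22-24": [10.0, 5.0], "25-29": [20.0, 15.0],
    "30-34": [7.0, 3.0], "35-39": [8.0, 2.0],
})

# Map each broader band to the original columns it combines.
bands = {"22-29": ["22-24", "25-29"], "30-39": ["30-34", "35-39"]}
for band, cols in bands.items():
    df_demo[band] = df_demo[cols].sum(axis=1)

print(df_demo)
```

One difference worth knowing: `sum(axis=1)` skips missing values by default, whereas adding columns with `+` yields `NaN` as soon as one of them is missing.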


Reading in school enrollment by age:

In [13]:
df_ea = pd.read_csv('/kaggle/input/cpd-police-beat-demographics/beatse.txt', sep=" ", skiprows = [0], header=None)
df_ea.columns = ["beat", "se_35+", "se_25-34", "se_20-24", "se_18-19", "se_15-17", "se_10-14", "se_5-9", "se_0-4"]
df_ea.set_index('beat', inplace= True)
df_beat = pd.concat([df_beat, df_ea], axis = 1)
df_beat.index.name = 'beat'
#df_beat.head()

Let's create a total school enrollment column:

In [14]:
df_beat['total_se'] = df_beat.apply(lambda row: row["se_35+"]+ row["se_25-34"]+ row["se_20-24"]+ row["se_18-19"]+ row["se_15-17"]+ row["se_10-14"]+row["se_5-9"]+row["se_0-4"], axis = 1)

Creating percentages of school enrollment for total population and each respective age band:

In [15]:
df_beat['percent_se'] = df_beat.apply(lambda row: row['total_se']/row['population']*100, axis = 1)
df_beat['percent_se_0-4'] = df_beat.apply(lambda row: row['se_0-4']/row['0-4']*100, axis = 1)
df_beat['percent_se_5-9'] = df_beat.apply(lambda row: row['se_5-9']/row['5-9']*100, axis = 1)
df_beat['percent_se_10-14'] = df_beat.apply(lambda row: row['se_10-14']/row['10-14']*100, axis = 1)
df_beat['percent_se_15-17'] = df_beat.apply(lambda row: row['se_15-17']/row['15-17']*100, axis = 1)
df_beat['percent_se_18-19'] = df_beat.apply(lambda row: row['se_18-19']/row['18-19']*100, axis = 1)
df_beat['percent_se_20-24'] = df_beat.apply(lambda row: row['se_20-24']/(row['20']+row['21']+row['22-24'])*100, axis = 1)
df_beat['percent_se_25-34'] = df_beat.apply(lambda row: row['se_25-34']/(row['25-29']+row['30-34'])*100, axis = 1)
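# Note: despite its name, 'youngPop' holds the population aged 35 and over
# (total population minus every age band below 35).
df_beat['youngPop'] = df_beat.apply(lambda row: row["population"]- (row['25-29']+row['30-34'])- (row['20']+row['21']+row['22-24'])- row["18-19"]- row["15-17"]- row["10-14"]-row["5-9"]-row["0-4"], axis = 1)
# The denominator below, population - youngPop, is therefore the under-35 population.
df_beat['percent_se_35+'] = df_beat.apply(lambda row: row['se_35+']/(row['population']-row['youngPop'])*100, axis = 1)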

In [16]:
df_beat.head()

beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,percent_white,percent_hispanic,...,percent_se,percent_se_0-4,percent_se_5-9,percent_se_10-14,percent_se_15-17,percent_se_18-19,percent_se_20-24,percent_se_25-34,youngPop,percent_se_35+
111,1982.585454,0.106797,1272.117539,189.765898,84.765805,412.651312,17.374913,5.909985,64.164575,9.571638,...,34.345585,83.002531,99.900983,100.0,100.0,97.448466,52.211998,27.498145,514.643608,0.818351
112,1075.282355,0.08606,667.891824,56.178674,116.041634,204.091402,24.85612,6.222698,62.113158,5.224551,...,44.436205,23.278445,100.0,100.0,100.0,98.856268,78.819322,26.861461,370.210031,1.228801
113,1072.951867,0.087558,613.579992,75.685872,140.302498,210.892786,27.998407,4.492315,57.186162,7.053986,...,48.505227,0.0,100.0,100.0,100.0,99.360813,84.641884,27.013032,332.381964,1.258874
114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,66.7836,6.550324,...,26.685893,91.768634,100.0,100.0,100.0,99.018453,75.201963,25.759811,8516.213013,2.683614
121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,48.649349,6.675988,...,27.821948,71.831891,58.500783,100.0,100.0,99.986358,58.954917,22.159168,2144.37336,2.849663
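
As computed above, `youngPop` is the population minus everyone under 35, so it actually holds the residents aged 35 and over, and `percent_se_35+` ends up measured against the under-35 population. If the rate is instead wanted relative to the 35+ group itself (the denominator used in the citywide school enrollment figures later in this notebook), one way to write it is sketched below. The column names `under_35` and `percent_se_35+_of_35_plus` and the numbers are hypothetical:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "population": [1000.0],
    "under_35": [600.0],   # hypothetical: sum of all age bands below 35
    "se_35+": [20.0],      # enrolled residents aged 35 and over
})

# Residents aged 35 and over = everyone not counted in the younger bands.
pop_35_plus = df_demo["population"] - df_demo["under_35"]
df_demo["percent_se_35+_of_35_plus"] = df_demo["se_35+"] / pop_35_plus * 100

print(df_demo)
```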


# EDA

We need to reset the index so that beat becomes a regular column for the EDA:

In [17]:
df_beat.reset_index(inplace = True)

In [18]:
# helper func to format beats as zero-padded strings (3-character beats get a leading '0')
def pad0(beat):
    beat = str(beat)
    return '0' + beat if len(beat) == 3 else beat

In [19]:
df_beat['beat'] = df_beat['beat'].apply(pad0)
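
For reference, Python's built-in `str.zfill` does the same kind of padding. This is a sketch on made-up values, and it is not quite identical to `pad0`: `zfill(4)` also pads one- and two-character strings, while `pad0` only touches three-character ones.

```python
import pandas as pd

beats = [111, 1011, "423"]
padded = [str(b).zfill(4) for b in beats]
print(padded)

# The pandas string accessor offers the same method for a whole column.
padded_series = pd.Series([111, 1011]).astype(str).str.zfill(4)
print(padded_series.tolist())
```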

In [20]:
df_beat.head()

,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,percent_white,...,percent_se,percent_se_0-4,percent_se_5-9,percent_se_10-14,percent_se_15-17,percent_se_18-19,percent_se_20-24,percent_se_25-34,youngPop,percent_se_35+
0,111,1982.585454,0.106797,1272.117539,189.765898,84.765805,412.651312,17.374913,5.909985,64.164575,...,34.345585,83.002531,99.900983,100.0,100.0,97.448466,52.211998,27.498145,514.643608,0.818351
1,112,1075.282355,0.08606,667.891824,56.178674,116.041634,204.091402,24.85612,6.222698,62.113158,...,44.436205,23.278445,100.0,100.0,100.0,98.856268,78.819322,26.861461,370.210031,1.228801
2,113,1072.951867,0.087558,613.579992,75.685872,140.302498,210.892786,27.998407,4.492315,57.186162,...,48.505227,0.0,100.0,100.0,100.0,99.360813,84.641884,27.013032,332.381964,1.258874
3,114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,66.7836,...,26.685893,91.768634,100.0,100.0,100.0,99.018453,75.201963,25.759811,8516.213013,2.683614
4,121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,48.649349,...,27.821948,71.831891,58.500783,100.0,100.0,99.986358,58.954917,22.159168,2144.37336,2.849663


**Getting some info on Chicago as a whole:**

First, race breakdowns:

In [21]:
totPop = df_beat['population'].sum()
wPop = df_beat['num_white'].sum()
bPop = df_beat['num_black'].sum()
hPop = df_beat['num_hispanic'].sum()
mPop = df_beat['num_mixed'].sum()
aPop = df_beat['num_asian'].sum()
oPop = df_beat['num_other'].sum()
print('CHICAGO POLICE BEAT RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

CHICAGO POLICE BEAT RACE BREAKDOWN:

Percentage of Population that is White: 32.91667230977577%
Percentage of Population that is Black: 30.56942792811464%
Percentage of Population that is Hispanic: 28.477597448300607%
Percentage of Population that is Asian: 6.115564840452076%
Percentage of Population that is Mixed: 1.6158859823115594%
Percentage of Population that is Other: 0.3048514945934356%


Now, population and square mileage:

In [22]:
print('CHICAGO POLICE BEAT POPULATION AND SQ. MILEAGE BREAKDOWN:\n')
print('Population of Chicago Police Beats: '+str(df_beat['population'].sum()))
print('Square Mileage of Chicago Police Beats: '+str(df_beat['square_mileage'].sum()))

CHICAGO POLICE BEAT POPULATION AND SQ. MILEAGE BREAKDOWN:

Population of Chicago Police Beats: 2739973.9333928674
Square Mileage of Chicago Police Beats: 222.211280823


Now, income and food stamps data:

In [23]:
tot_MI = df_beat['med_income'].sum()
tot_FS = df_beat['pop_food_stamps'].sum()
print('CHICAGO POLICE BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/len(df_beat)))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

CHICAGO POLICE BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 50009.28070012656
Percentage of Population on Food Stamps: 7.574929469674449%
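
The "Average Median Income" above is an unweighted mean of the per-beat medians, so a beat with about 1,000 residents counts as much as one with 16,000. If a figure that reflects beat size is wanted, one option is a population-weighted mean. The sketch below uses made-up numbers, and the result is still not the true citywide median:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "population": [1000.0, 3000.0],
    "med_income": [80000.0, 40000.0],
})

unweighted = df_demo["med_income"].mean()
# Each beat's median counts in proportion to its population.
weighted = np.average(df_demo["med_income"], weights=df_demo["population"])

print(unweighted, weighted)
```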


Educational Attainment:

In [24]:
bachelorsTot = df_beat['bachelors'].sum()
hsTot = df_beat['high_school'].sum()
no_hsTot = df_beat['no_high_school'].sum()
print("CHICAGO POLICE BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n")
print('Percentage of Population with at least a Bachelor\'s Degree: '+ str(bachelorsTot/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma: '+ str(hsTot/totPop*100)+'%')
print('Percentage of Population without a High School Diploma: '+ str(no_hsTot/totPop*100)+'%')

CHICAGO POLICE BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 28.58781234805012%
Percentage of Population with at most a High School Diploma: 27.65121258639815%
Percentage of Population without a High School Diploma: 11.38089725711917%


Age bands:

In [25]:
minors = df_beat['<=21'].sum()
twenties = df_beat['22-29'].sum()
thirties = df_beat['30-39'].sum()
forties = df_beat['40-49'].sum()
fifties = df_beat['50-59'].sum()
sixties = df_beat['60-64'].sum()
seniors = df_beat['65+'].sum()
print('CHICAGO POLICE BEAT AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

CHICAGO POLICE BEAT AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 27.374727683049414%
Percentage of Residents that are in their Twenties: 15.153457942012258%
Percentage of Residents that are in their Thirties: 16.696869598946286%
Percentage of Residents that are in their Forties: 12.775802026244076%
Percentage of Residents that are in their Fifties: 11.742432446851174%
Percentage of Residents that are in between 60-64: 4.917904376794001%
Percentage of Residents that are Seniors (65+): 11.338833877664651%


School Enrollment:

In [26]:
total_se = df_beat['total_se'].sum()
total_se_0to4 = df_beat['se_0-4'].sum()
total_se_5to9 = df_beat['se_5-9'].sum()
total_se_10to14 = df_beat['se_10-14'].sum()
total_se_15to17 = df_beat['se_15-17'].sum()
total_se_18to19 = df_beat['se_18-19'].sum()
total_se_20to24 = df_beat['se_20-24'].sum()
total_se_25to34 = df_beat['se_25-34'].sum()
total_se_35plus = df_beat['se_35+'].sum()
total_0to4 = df_beat['0-4'].sum()
total_5to9 = df_beat['5-9'].sum()
total_10to14 = df_beat['10-14'].sum()
total_15to17 = df_beat['15-17'].sum()
total_18to19 = df_beat['18-19'].sum()
total_20to24 = df_beat['20'].sum() + df_beat['21'].sum() + df_beat['22-24'].sum()
total_25to34 = df_beat['25-29'].sum() + df_beat['30-34'].sum()
# Despite its name, total_youngPop is the population aged 35 and over (everyone not in the bands above)
total_youngPop = totPop - total_25to34 - total_20to24 - total_18to19 - total_15to17 - total_10to14 - total_5to9 - total_0to4

print('CHICAGO POLICE BEAT SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_youngPop*100)+'%')

CHICAGO POLICE BEAT SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 28.105802681680164%
Percentage of Residents that are Enrolled Students (0-4): 57.58224091557026%
Percentage of Residents that are Enrolled Students (5-9): 96.13074077118937%
Percentage of Residents that are Enrolled Students (10-14): 98.41673714964996%
Percentage of Residents that are Enrolled Students (15-17): 95.91592553152162%
Percentage of Residents that are Enrolled Students (18-19): 78.25716987890398%
Percentage of Residents that are Enrolled Students (20-24): 41.46668623362923%
Percentage of Residents that are Enrolled Students (25-34): 14.096944178820392%
Percentage of Residents that are Enrolled Students (35+): 3.049767128372294%


**Ranking all the beats with respect to one another by the percentage of residents of each race:** (for reference, the beat with the highest percentage is ranked 1 and the beat with the lowest is ranked 271)

*Note: These columns are labeled 'rank_%' followed by the first letter of the race:*

*w = White*

*b = Black*

*a = Asian*

*h = Hispanic*

*m = Mixed*

*o = Other*



WHITE:

In [27]:
df_beat.sort_values(by = 'percent_white', ascending = False, inplace = True)
pW_beats = list(df_beat.beat)
df_beat['rank_%w'] = df_beat.beat.apply(lambda x: pW_beats.index(x)+1)
#df_beat.head()

BLACK:

In [28]:
df_beat.sort_values(by = 'percent_black', ascending = False, inplace = True)
pB_beats = list(df_beat.beat)
df_beat['rank_%b'] = df_beat.beat.apply(lambda x: pB_beats.index(x)+1)
#df_beat.head()

HISPANIC:

In [29]:
df_beat.sort_values(by = 'percent_hispanic', ascending = False, inplace = True)
pH_beats = list(df_beat.beat)
df_beat['rank_%h'] = df_beat.beat.apply(lambda x: pH_beats.index(x)+1)
#df_beat.head()

ASIAN:

In [30]:
df_beat.sort_values(by = 'percent_asian', ascending = False, inplace = True)
pA_beats = list(df_beat.beat)
df_beat['rank_%a'] = df_beat.beat.apply(lambda x: pA_beats.index(x)+1)
#df_beat.head()

MIXED:

In [31]:
df_beat.sort_values(by = 'percent_mixed', ascending = False, inplace = True)
pM_beats = list(df_beat.beat)
df_beat['rank_%m'] = df_beat.beat.apply(lambda x: pM_beats.index(x)+1)
#df_beat.head()

OTHER:

In [32]:
df_beat.sort_values(by = 'percent_other', ascending = False, inplace = True)
pO_beats = list(df_beat.beat)
df_beat['rank_%o'] = df_beat.beat.apply(lambda x: pO_beats.index(x)+1)
#df_beat.head()
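
The sort-then-`index` pattern used for each race above can also be expressed with `Series.rank`. This is a sketch on made-up data rather than a drop-in replacement: with `method="first"`, ties are broken by row order, which may differ from the order in which `sort_values` leaves tied beats.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "beat": ["0111", "0112", "0113"],
    "percent_white": [64.2, 62.1, 70.5],
    "percent_black": [4.3, 10.8, 2.0],
})

# Rank 1 = highest percentage, as in the cells above.
for col, suffix in [("percent_white", "w"), ("percent_black", "b")]:
    ranks = df_demo[col].rank(ascending=False, method="first")
    df_demo[f"rank_%{suffix}"] = ranks.astype(int)

print(df_demo)
```

Because `rank` does not reorder the frame, the rows stay in their original order throughout.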

We can now drop the percentage columns for each race from the dataset:

In [33]:
df_beat.drop(['percent_white', 'percent_hispanic', 'percent_black', 'percent_mixed', 'percent_other', 'percent_asian'], axis = 1, inplace = True)
df_beat.head()

,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,percent_se_20-24,percent_se_25-34,youngPop,percent_se_35+,rank_%w,rank_%b,rank_%h,rank_%a,rank_%m,rank_%o
26,311,3632.01419,0.289946,8.140459,0.481957,3401.528661,0.0,107.74879,114.114323,19069.978887,...,29.943084,20.644914,1221.582904,1.058275,263,58,271,271,39,1
82,732,3281.79398,0.37474,2.177894,208.17636,2936.059437,0.0,49.107646,86.272643,22130.422277,...,27.066488,9.493176,1298.158709,3.011093,267,84,163,268,117,2
116,1011,6659.731682,0.475028,14.337526,97.374433,6383.82508,11.483253,5.475101,147.236288,22207.616375,...,53.865612,5.793657,2881.443314,2.183854,264,37,224,208,263,3
241,2222,7489.3485,0.645467,49.110405,81.467717,7188.74277,12.033193,8.002146,149.992274,43549.303748,...,35.386209,21.169201,4275.302962,3.9142,253,35,238,209,257,4
245,2234,8643.572796,1.192272,66.32414,233.898121,8031.300067,0.000194,139.446441,172.603842,46347.393856,...,52.70861,16.887773,4753.334217,4.598105,248,65,204,248,110,5


Now ranking by median income:

In [34]:
df_beat.sort_values(by = 'med_income', ascending = False, inplace = True)
mI_beats = list(df_beat.beat)
df_beat['rank_income'] = df_beat.beat.apply(lambda x: mI_beats.index(x)+1)
#df_beat.head()

Now ranking by percentage on food stamps:

In [35]:
df_beat.sort_values(by = 'percent_on_fs', ascending = False, inplace = True)
fS_beats = list(df_beat.beat)
df_beat['rank_fs'] = df_beat.beat.apply(lambda x: fS_beats.index(x)+1)
#df_beat.head()

Now ranking by each category of educational attainment:

In [36]:
df_beat.sort_values(by = 'percent_bachelors', ascending = False, inplace = True)
bachelors_beats = list(df_beat.beat)
df_beat['rank_bachelors'] = df_beat.beat.apply(lambda x: bachelors_beats.index(x)+1)
df_beat.sort_values(by = 'percent_high_school', ascending = False, inplace = True)
HS_beats = list(df_beat.beat)
df_beat['rank_high_school'] = df_beat.beat.apply(lambda x: HS_beats.index(x)+1)
df_beat.sort_values(by = 'percent_no_high_school', ascending = False, inplace = True)
no_HS_beats = list(df_beat.beat)
df_beat['rank_no_high_school'] = df_beat.beat.apply(lambda x: no_HS_beats.index(x)+1)
df_beat.sort_values(by = 'beat', ascending = True, inplace = True)
#df_beat.head()

Ranking by school enrollment (total and each age group):

In [37]:
df_beat.sort_values(by = 'percent_se', ascending = False, inplace = True)
se_beats = list(df_beat.beat)
df_beat['rank_total_se'] = df_beat.beat.apply(lambda x: se_beats.index(x)+1)
df_beat.sort_values(by = 'percent_se_0-4', ascending = False, inplace = True)
se04_beats = list(df_beat.beat)
df_beat['rank_total_se_0-4'] = df_beat.beat.apply(lambda x: se04_beats.index(x)+1)
df_beat.sort_values(by = 'percent_se_5-9', ascending = False, inplace = True)
se59_beats = list(df_beat.beat)
df_beat['rank_total_se_5-9'] = df_beat.beat.apply(lambda x: se59_beats.index(x)+1)
df_beat.sort_values(by = 'percent_se_10-14', ascending = False, inplace = True)
se1014_beats = list(df_beat.beat)
df_beat['rank_total_se_10-14'] = df_beat.beat.apply(lambda x: se1014_beats.index(x)+1)
df_beat.sort_values(by = 'percent_se_15-17', ascending = False, inplace = True)
se1517_beats = list(df_beat.beat)
df_beat['rank_total_se_15-17'] = df_beat.beat.apply(lambda x: se1517_beats.index(x)+1)
df_beat.sort_values(by = 'percent_se_18-19', ascending = False, inplace = True)
se1819_beats = list(df_beat.beat)
df_beat['rank_total_se_18-19'] = df_beat.beat.apply(lambda x: se1819_beats.index(x)+1)
df_beat.sort_values(by = 'percent_se_20-24', ascending = False, inplace = True)
se2024_beats = list(df_beat.beat)
df_beat['rank_total_se_20-24'] = df_beat.beat.apply(lambda x: se2024_beats.index(x)+1)
df_beat.sort_values(by = 'percent_se_25-34', ascending = False, inplace = True)
se2534_beats = list(df_beat.beat)
df_beat['rank_total_se_25-34'] = df_beat.beat.apply(lambda x: se2534_beats.index(x)+1)
df_beat.sort_values(by = 'percent_se_35+', ascending = False, inplace = True)
se35_beats = list(df_beat.beat)
df_beat['rank_total_se_35+'] = df_beat.beat.apply(lambda x: se35_beats.index(x)+1)
#df_beat.head()
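
The ranking cells above all follow one pattern: sort by a column, then look up each beat's position in the sorted list. As a rough sketch only (not the notebook's own code), the same idea can be expressed with pandas' `Series.rank`. The `toy` frame and its values are invented for illustration; only the column names mirror `df_beat`. One caveat: ties here are broken by row order, which may not match the order `sort_values` happens to produce in the cells above.

```python
import pandas as pd

# Toy stand-in for df_beat; the values are made up for illustration.
toy = pd.DataFrame({
    "beat": ["0111", "0112", "0113", "0114"],
    "med_income": [52000.0, 31000.0, 78000.0, 31000.0],
    "percent_on_fs": [10.0, 35.0, 4.0, 28.0],
})

# Map each source column to the rank column it should produce.
rank_columns = {"med_income": "rank_income", "percent_on_fs": "rank_fs"}

for source_col, rank_col in rank_columns.items():
    # ascending=False gives rank 1 to the largest value;
    # method="first" breaks ties by row order so every rank is unique.
    toy[rank_col] = toy[source_col].rank(ascending=False, method="first").astype(int)

print(toy[["beat", "rank_income", "rank_fs"]])
```

In the notebook itself, the mapping would simply list every column and rank name used above.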

**Now, let's look at the top 20 beats in each previously measured category:**

In [38]:
topFile = open("/kaggle/input/cpd-police-beat-demographics/outputBeats_EDA_TOP.txt","r") 

First, crime:

In [39]:
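topFile.readline()  # skip the line that precedes the list of beats
t20_crimes = topFile.readline().split(' ')
t20_crimes = t20_crimes[:-1]  # drop the trailing token left by the newline
for i in range (0,20):
    t20_crimes[i] = pad0(t20_crimes[i])  # restore the leading zero (e.g. 423 -> 0423)
print(t20_crimes)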

['0423', '0421', '0624', '1533', '0823', '0511', '1112', '1522', '2533', '0414', '0621', '0612', '0825', '0321', '0522', '0713', '1532', '1011', '1531', '1122']
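
The parsing step above can be sketched in a self-contained form. This assumes that `pad0` (defined earlier in the notebook and not shown here) left-pads a beat number to four characters, which the printed list suggests. `parse_beat_line` and `sample_line` are hypothetical names introduced only for this illustration.

```python
from typing import List


def parse_beat_line(line: str, width: int = 4) -> List[str]:
    """Split a whitespace-separated line of beat numbers and zero-pad each one."""
    # str.split() with no argument discards the trailing newline and empty tokens,
    # so the last element does not need to be sliced off by hand.
    return [token.zfill(width) for token in line.split()]


sample_line = "423 421 624 1533 \n"  # hypothetical line in the file's format
beats = parse_beat_line(sample_line)
print(beats)
```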


In [40]:
df_topCrimes = df_beat[df_beat['beat'].isin(t20_crimes)]
df_topCrimes.set_index('beat', inplace = True)
df_topCrimes = df_topCrimes.reindex(t20_crimes)

In [41]:
df_topCrimes.reset_index(inplace = True)
df_topCrimes

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,423,12259.865457,1.211219,268.127668,2253.581957,9699.072054,15.333409,12.328246,11.422118,33482.887401,...,108,70,97,195,200,148,175,170,121,74
1,421,10040.261172,0.640299,437.234583,121.744455,9264.339667,17.907552,178.17195,20.862949,22492.07571,...,184,28,46,113,12,74,146,196,87,66
2,624,7480.696752,0.602045,55.70874,57.560998,7316.225802,30.675287,6.614789,13.911136,25555.556677,...,118,189,174,247,163,245,29,186,104,41
3,1533,7287.673417,0.552272,140.701888,85.730872,7045.973828,5.959837,6.96481,2.342207,25628.891687,...,69,35,135,10,161,122,62,175,59,82
4,823,21821.939688,0.946877,1353.520293,16459.706694,3682.195764,66.657679,242.505817,17.353439,35823.520938,...,14,62,200,69,135,161,194,210,200,211
5,511,13572.186143,2.497999,140.956374,31.934824,13189.019429,72.375548,125.006606,12.893347,40540.734001,...,160,90,125,30,183,70,159,29,42,15
6,1112,7155.958067,0.315592,73.67862,2470.596099,4423.679248,0.002181,75.530586,112.471337,28693.33387,...,24,92,79,145,164,204,156,232,228,270
7,1522,9822.318339,0.573595,211.159122,435.344207,9149.668723,5.669283,5.736894,14.740125,33675.668578,...,127,126,221,168,109,87,235,213,226,173
8,2533,11224.65803,1.010532,222.263184,6610.983661,4196.415104,124.271773,70.723444,0.000864,37057.687967,...,25,129,248,212,133,175,180,185,243,208
9,414,10300.357245,0.93318,72.172264,119.500289,9941.039964,0.0,161.523316,6.121423,43118.177817,...,217,165,183,27,54,12,258,39,21,3
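
The three steps used above to build `df_topCrimes` (filter, set the index, reindex) keep the rows in the same order as the ranked list. A minimal sketch of that idea on an invented `toy` frame follows. It also shows one design difference worth knowing: `.loc` with a list raises a `KeyError` for a missing beat, while `reindex` silently inserts a row of missing values.

```python
import pandas as pd

# Toy stand-in for df_beat; values are made up.
toy = pd.DataFrame({
    "beat": ["0111", "0112", "0113", "0114"],
    "population": [100.0, 200.0, 300.0, 400.0],
})
order = ["0113", "0111"]  # desired order, e.g. ranked by crime count

indexed = toy.set_index("beat")

# .loc with a list selects the rows and keeps the list's order in one step.
subset = indexed.loc[order].reset_index()

# reindex also preserves order, but an unknown beat becomes a NaN row.
with_missing = indexed.reindex(["0113", "9999"])

print(subset)
print(with_missing)
```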


Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [42]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_topCrimes.index:
    totPop += df_topCrimes['population'][ind]
    wPop += df_topCrimes['num_white'][ind]
    bPop += df_topCrimes['num_black'][ind]
    hPop += df_topCrimes['num_hispanic'][ind]
    mPop += df_topCrimes['num_mixed'][ind]
    aPop += df_topCrimes['num_asian'][ind]
    oPop += df_topCrimes['num_other'][ind]
print('TOP 20 CRIME BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

TOP 20 CRIME BEATS RACE BREAKDOWN:

Percentage of Population that is White: 2.0880394087767216%
Percentage of Population that is Black: 78.28544007068244%
Percentage of Population that is Hispanic: 18.41771635635397%
Percentage of Population that is Asian: 0.25066371683085625%
Percentage of Population that is Mixed: 0.7222361393901262%
Percentage of Population that is Other: 0.23590431963776354%
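
The loop above pools the counts across beats before dividing, so each percentage is a share of the combined population rather than an average of per-beat percentages. A compact sketch of the same calculation with column sums, using an invented two-beat frame (only the column names come from the notebook):

```python
import pandas as pd

# Two made-up beats; the numbers are illustrative only.
toy = pd.DataFrame({
    "population": [1000.0, 3000.0],
    "num_white": [100.0, 300.0],
    "num_black": [800.0, 2100.0],
    "num_hispanic": [100.0, 600.0],
})
race_cols = ["num_white", "num_black", "num_hispanic"]

# Sum each race column over all beats, then divide by the pooled population.
shares = toy[race_cols].sum() / toy["population"].sum() * 100

for col, pct in shares.items():
    print(f"{col}: {pct:.1f}%")
```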


Now median income and food stamps:

In [43]:
tot_MI = 0
tot_FS = 0
for ind in df_topCrimes.index:
    tot_MI += df_topCrimes['med_income'][ind]
    tot_FS += df_topCrimes['pop_food_stamps'][ind]
print('TOP 20 CRIME BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/len(df_topCrimes)))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

TOP 20 CRIME BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 30110.89463651152
Percentage of Population on Food Stamps: 13.31673665305361%
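
Note that the "Average Median Income" above is an unweighted mean of the per-beat medians, so a small beat counts as much as a large one. The sketch below, on invented numbers, contrasts that with a population-weighted mean. Neither figure is the true median income of the combined residents, which cannot be recovered from per-beat medians alone.

```python
import pandas as pd

# Two made-up beats of very different sizes.
toy = pd.DataFrame({
    "population": [1000.0, 9000.0],
    "med_income": [20000.0, 40000.0],
})

# Unweighted: every beat counts equally (what the cell above computes).
unweighted = toy["med_income"].mean()

# Weighted: each beat's median counts in proportion to its population.
weighted = (toy["med_income"] * toy["population"]).sum() / toy["population"].sum()

print(unweighted, weighted)
```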


Now, the educational attainment percentages:

In [44]:
bachelorsPop = df_topCrimes['bachelors'].sum()
HSPop = df_topCrimes['high_school'].sum()
no_HSPop = df_topCrimes['no_high_school'].sum()
print('TOP 20 CRIME BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

TOP 20 CRIME BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 12.033913755875185%
Percentage of Population with at most a High School Diploma 34.75242797082231%
Percentage of Population without a High School Diploma 13.938271548187686%


And school enrollment:

In [45]:
total_se = df_topCrimes['total_se'].sum()
total_se_0to4 = df_topCrimes['se_0-4'].sum()
total_se_5to9 = df_topCrimes['se_5-9'].sum()
total_se_10to14 = df_topCrimes['se_10-14'].sum()
total_se_15to17 = df_topCrimes['se_15-17'].sum()
total_se_18to19 = df_topCrimes['se_18-19'].sum()
total_se_20to24 = df_topCrimes['se_20-24'].sum()
total_se_25to34 = df_topCrimes['se_25-34'].sum()
total_se_35plus = df_topCrimes['se_35+'].sum()
total_0to4 = df_topCrimes['0-4'].sum()
total_5to9 = df_topCrimes['5-9'].sum()
total_10to14 = df_topCrimes['10-14'].sum()
total_15to17 = df_topCrimes['15-17'].sum()
total_18to19 = df_topCrimes['18-19'].sum()
total_20to24 = df_topCrimes['20'].sum() + df_topCrimes['21'].sum() + df_topCrimes['22-24'].sum()
total_25to34 = df_topCrimes['25-29'].sum() + df_topCrimes['30-34'].sum() 
# Note: despite its name, total_youngPop holds the 35+ population (total minus every age bracket under 35)
total_youngPop = totPop - (total_0to4 + total_5to9 + total_10to14 + total_15to17 + total_18to19 + total_20to24 + total_25to34)

print('TOP 20 CRIME BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_youngPop*100)+'%')

TOP 20 CRIME BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 31.664291520202635%
Percentage of Residents that are Enrolled Students (0-4): 54.056995034635%
Percentage of Residents that are Enrolled Students (5-9): 96.56860098647803%
Percentage of Residents that are Enrolled Students (10-14): 98.67822886515363%
Percentage of Residents that are Enrolled Students (15-17): 97.143694081353%
Percentage of Residents that are Enrolled Students (18-19): 67.33907105642321%
Percentage of Residents that are Enrolled Students (20-24): 32.25264858573105%
Percentage of Residents that are Enrolled Students (25-34): 13.141824799159416%
Percentage of Residents that are Enrolled Students (35+): 3.54169347960473%
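
Each enrollment rate above divides enrolled students in a bracket by the residents in that same bracket, and the 35+ denominator is obtained as a residual: total population minus every younger bracket. A hedged sketch of that logic with a mapping from enrollment columns to population columns follows. The `toy` frame is invented, and its listed brackets stand in for every bracket under 35.

```python
import pandas as pd

# Made-up counts for two beats; column names follow the notebook's convention.
toy = pd.DataFrame({
    "population": [1000.0, 2000.0],
    "18-19": [50.0, 150.0],
    "20": [20.0, 30.0],
    "21": [20.0, 30.0],
    "22-24": [60.0, 40.0],
    "se_18-19": [40.0, 100.0],
    "se_20-24": [30.0, 30.0],
    "se_35+": [26.0, 26.0],
})

# Enrollment column -> population columns that form its denominator.
brackets = {
    "se_18-19": ["18-19"],
    "se_20-24": ["20", "21", "22-24"],
}

totals = toy.sum()  # pooled counts across beats
rates = {se: totals[se] / totals[cols].sum() * 100 for se, cols in brackets.items()}

# Residual denominator: everyone not counted in a younger bracket.
younger_cols = [col for cols in brackets.values() for col in cols]
pop_35_plus = totals["population"] - totals[younger_cols].sum()
rates["se_35+"] = totals["se_35+"] / pop_35_plus * 100

print(rates)
```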


Now, we will look at the age breakdowns:

In [46]:
minors = df_topCrimes['<=21'].sum()
twenties = df_topCrimes['22-29'].sum()
thirties = df_topCrimes['30-39'].sum()
forties = df_topCrimes['40-49'].sum()
fifties = df_topCrimes['50-59'].sum()
sixties = df_topCrimes['60-64'].sum()
seniors = df_topCrimes['65+'].sum()
print('TOP 20 CRIME BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

TOP 20 CRIME BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 34.4500137675301%
Percentage of Residents that are in their Twenties: 11.51611512193443%
Percentage of Residents that are in their Thirties: 12.570126182939386%
Percentage of Residents that are in their Forties: 12.04503042627547%
Percentage of Residents that are in their Fifties: 12.17513465480023%
Percentage of Residents that are in between 60-64: 4.77224315603509%
Percentage of Residents that are Seniors (65+): 12.471320282339397%
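
A quick sanity check on the breakdown above is that the seven age shares should add up to roughly 100%. The sketch below copies the printed percentages and confirms that, and also reports which bracket is largest. The dictionary name is an illustrative choice, not something from the notebook.

```python
# Percentages copied from the printed age breakdown above.
age_shares = {
    "<=21": 34.4500137675301,
    "22-29": 11.51611512193443,
    "30-39": 12.570126182939386,
    "40-49": 12.04503042627547,
    "50-59": 12.17513465480023,
    "60-64": 4.77224315603509,
    "65+": 12.471320282339397,
}

total = sum(age_shares.values())
largest = max(age_shares, key=age_shares.get)

print(f"Shares sum to {total:.4f}%")
print(f"Largest bracket: {largest}")
```

The brackets are of unequal width (for example, 60-64 spans five years while <=21 spans twenty-two), which is worth keeping in mind when comparing their shares.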


Now, we will compute the mean and median ranking of these beats in the race, income, food stamp, educational attainment, and school enrollment categories:

In [47]:
df_topCrime_ranks = df_topCrimes[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [48]:
beatList = list(df_topCrime_ranks.beat)
df_topCrime_ranks['rank_beat'] = df_topCrime_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_topCrime_ranks.set_index('beat',inplace = True)
print('TOP 20 CRIME BEATS MEDIAN RANKINGS: ')
print(df_topCrime_ranks.median(axis = 0))
print()
print('TOP 20 CRIME BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_topCrime_ranks.mean(axis = 0))
df_topCrime_ranks

TOP 20 CRIME BEATS MEDIAN RANKINGS: 
rank_%w                222.5
rank_%b                 54.5
rank_%h                203.0
rank_%a                212.0
rank_%m                209.0
rank_%o                171.5
rank_income            213.5
rank_fs                 67.5
rank_bachelors         208.0
rank_high_school        75.0
rank_no_high_school     98.0
rank_total_se           84.5
rank_total_se_0-4      171.0
rank_total_se_5-9      164.5
rank_total_se_10-14    155.5
rank_total_se_15-17    104.5
rank_total_se_18-19    182.0
rank_total_se_20-24    183.0
rank_total_se_25-34    160.5
rank_total_se_35+      105.5
rank_beat               10.5
dtype: float64

TOP 20 CRIME BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                219.05
rank_%b                 61.00
rank_%h                175.65
rank_%a                211.60
rank_%m                197.85
rank_%o                165.30
rank_income            206.80
rank_fs                 67.85
rank_bachelors         205.05
rank_high_school       

beat,rank_%w,rank_%b,rank_%h,rank_%a,rank_%m,rank_%o,rank_income,rank_fs,rank_bachelors,rank_high_school,...,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+,rank_beat
423,196,97,96,216,259,177,186,101,186,68,...,70,97,195,200,148,175,170,121,74,1
421,167,70,232,206,102,119,254,39,173,59,...,28,46,113,12,74,146,196,87,66,2
624,249,8,247,175,261,128,237,9,190,11,...,189,174,247,163,245,29,186,104,41,3
1533,204,24,235,224,260,211,235,79,238,126,...,35,135,10,161,122,62,175,59,82,4
823,153,151,22,187,149,184,168,118,253,161,...,62,200,69,135,161,194,210,200,211,5
511,229,17,265,163,165,176,144,99,137,35,...,90,125,30,183,70,159,29,42,15,6
1112,231,110,68,244,154,10,213,60,243,145,...,92,79,145,164,204,156,232,228,270,7
1522,198,63,184,229,265,146,184,90,194,60,...,126,221,168,109,87,235,213,226,173,8
2533,203,123,37,140,199,236,160,126,225,144,...,129,248,212,133,175,180,185,243,208,9
414,250,27,236,261,116,196,126,108,109,41,...,165,183,27,54,12,258,39,21,3,10
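
The median and mean ranks printed above can also be gathered into a single table with `DataFrame.agg`, which makes the two statistics easier to compare side by side. This is only a sketch on an invented three-beat frame; the column names mirror the notebook's rank columns.

```python
import pandas as pd

# Made-up ranks for three beats (1 = highest value in the city).
toy_ranks = pd.DataFrame(
    {"rank_income": [200, 220, 240], "rank_fs": [40, 60, 110]},
    index=pd.Index(["0111", "0112", "0113"], name="beat"),
)

# One row per statistic, one column per category.
summary = toy_ranks.agg(["median", "mean"])
print(summary)
```

A noticeable gap between the two rows hints at skew within the group. In the output above, for instance, `rank_%h` has a median of 203.0 but a mean of 175.65, which suggests a few beats with a much higher Hispanic share than the rest.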


**Key Takeaways:**
* When we look at how these beats rank among all the police beats in Chicago, the mean and median rankings are relatively low for the percentage of White, Asian, and Hispanic residents, each taken individually (ranks of roughly 175 to 225, where 1 is the highest percentage)
* However, these beats rank relatively high for the percentage of Black residents and for the percentage of the population on food stamps (mean and median ranks of roughly 55 to 70)
* Black residents make up roughly 78% of these beats' population and Hispanic residents roughly 18%. Together they account for a very large portion of the demographic makeup, with Black residents significantly outnumbering every other group. It's also worthwhile to mention that White residents make up only about 2% of these beats' residents
* We also see these beats have a low average median income (about $30,100, which is slightly more than half the average income throughout all of Chicago) and about 40% of the residents in these beats are on food stamps (roughly twice the percentage of total Chicago residents on food stamps). This is well reflected in the low average and median ranking for these beats in the median income category and the beats' high ranking in the 'percentage of population on food stamps' category.
* These beats also ranked high for the share of residents who earned at most a High School Diploma or who didn't have a High School Diploma (~50 and ~90, respectively) when compared to all police beats in Chicago. We also saw that roughly 19% of the residents in these beats hold at least a Bachelor's Degree, compared to the 40% of Chicago residents with a Bachelor's Degree.
* These beats were also heavily populated by minors (counted here as residents aged 21 and under), with roughly equal percentages in every other age group

**Some Conclusions:**
* These beats are made up mostly of "poor" Black and Hispanic minorities, with significantly more Black residents than any other racial group
* Based on these observations alone, we would expect areas with larger Black populations and lower standards of living to be more active in terms of reported crime
* Residents of these beats also typically have lower levels of educational attainment, with more than half never earning a degree higher than a High School Diploma (57%)
* These beats are also predominantly populated by minors, suggesting large nuclear and extended families in which children and young adults far outnumber parents and older residents


**Now, the top 20 beats in arrests:**

In [49]:
topFile.readline()
t20_arrests = topFile.readline().split(' ')
t20_arrests = t20_arrests[:-1]
for i in range (0,20):
    t20_arrests[i] = pad0(t20_arrests[i])
print(t20_arrests)

['1533', '1112', '1522', '1532', '2533', '1531', '1122', '1011', '1113', '1121', '0624', '1132', '1523', '0421', '0423', '0312', '0713', '0825', '1133', '2535']
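
Since many beats appear in both printed lists, it can help to quantify the overlap between the top 20 crime beats and the top 20 arrest beats. The sketch below copies the two lists from the outputs above; the variable names `common` and `arrests_only` are illustrative additions.

```python
# Lists copied from the printed outputs above.
t20_crimes = ['0423', '0421', '0624', '1533', '0823', '0511', '1112', '1522',
              '2533', '0414', '0621', '0612', '0825', '0321', '0522', '0713',
              '1532', '1011', '1531', '1122']
t20_arrests = ['1533', '1112', '1522', '1532', '2533', '1531', '1122', '1011',
               '1113', '1121', '0624', '1132', '1523', '0421', '0423', '0312',
               '0713', '0825', '1133', '2535']

crime_set = set(t20_crimes)

# Keep the arrest ranking order while splitting into shared and arrest-only beats.
common = [beat for beat in t20_arrests if beat in crime_set]
arrests_only = [beat for beat in t20_arrests if beat not in crime_set]

print(f"{len(common)} beats appear in both lists: {common}")
print(f"{len(arrests_only)} beats appear only in the arrests list: {arrests_only}")
```

With this much overlap, similar aggregate demographics for the two groups are to be expected, since most of the underlying beats are the same.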


In [50]:
df_topArrests = df_beat[df_beat['beat'].isin(t20_arrests)]
df_topArrests.set_index('beat', inplace = True)
df_topArrests = df_topArrests.reindex(t20_arrests)

In [51]:
df_topArrests.reset_index(inplace = True)
df_topArrests

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1533,7287.673417,0.552272,140.701888,85.730872,7045.973828,5.959837,6.96481,2.342207,25628.891687,...,69,35,135,10,161,122,62,175,59,82
1,1112,7155.958067,0.315592,73.67862,2470.596099,4423.679248,0.002181,75.530586,112.471337,28693.33387,...,24,92,79,145,164,204,156,232,228,270
2,1522,9822.318339,0.573595,211.159122,435.344207,9149.668723,5.669283,5.736894,14.740125,33675.668578,...,127,126,221,168,109,87,235,213,226,173
3,1532,6486.946676,0.393223,88.955982,289.068497,6026.17593,0.0,72.173634,10.572636,28346.444387,...,88,39,208,28,197,23,239,233,259,35
4,2533,11224.65803,1.010532,222.263184,6610.983661,4196.415104,124.271773,70.723444,0.000864,37057.687967,...,25,129,248,212,133,175,180,185,243,208
5,1531,5855.08873,0.319057,48.218679,453.475009,5309.199351,20.86749,23.328184,0.0,27640.81894,...,78,142,259,170,140,25,222,137,246,112
6,1122,6806.230907,0.560782,70.820814,566.205451,6097.837823,4.969004,46.31679,20.081017,24959.501213,...,63,52,117,161,172,171,223,125,250,166
7,1011,6659.731682,0.475028,14.337526,97.374433,6383.82508,11.483253,5.475101,147.236288,22207.616375,...,42,27,168,14,62,2,188,41,257,177
8,1113,4035.188344,0.371758,93.304441,98.400427,3806.816884,0.486165,33.746798,2.433629,26372.036069,...,41,15,48,134,175,57,114,44,185,182
9,1121,6090.196118,0.441662,576.883731,1769.204137,3705.648036,4.438118,29.798827,4.223282,34102.379689,...,98,60,74,206,44,132,181,127,118,163


Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [52]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_topArrests.index:
    totPop += df_topArrests['population'][ind]
    wPop += df_topArrests['num_white'][ind]
    bPop += df_topArrests['num_black'][ind]
    hPop += df_topArrests['num_hispanic'][ind]
    mPop += df_topArrests['num_mixed'][ind]
    aPop += df_topArrests['num_asian'][ind]
    oPop += df_topArrests['num_other'][ind]
print('TOP 20 ARRESTS BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

TOP 20 ARRESTS BEATS RACE BREAKDOWN:

Percentage of Population that is White: 2.5159130390837747%
Percentage of Population that is Black: 78.06718239866511%
Percentage of Population that is Hispanic: 18.351130970464453%
Percentage of Population that is Asian: 0.2089970197329747%
Percentage of Population that is Mixed: 0.5261730496742908%
Percentage of Population that is Other: 0.3306035541017364%


Now median income and food stamps:

In [53]:
tot_MI = 0
tot_FS = 0
for ind in df_topArrests.index:
    tot_MI += df_topArrests['med_income'][ind]
    tot_FS += df_topArrests['pop_food_stamps'][ind]
print('TOP 20 ARRESTS BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/len(df_topArrests)))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

TOP 20 ARRESTS BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 27909.107869050476
Percentage of Population on Food Stamps: 14.301851956627399%


Now, the educational attainment percentages:

In [54]:
bachelorsPop = df_topArrests['bachelors'].sum()
HSPop = df_topArrests['high_school'].sum()
no_HSPop = df_topArrests['no_high_school'].sum()
print('TOP 20 ARRESTS BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

TOP 20 ARRESTS BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 9.919511736269582%
Percentage of Population with at most a High School Diploma 33.97339995460212%
Percentage of Population without a High School Diploma 15.068799169881425%


School enrollment percentages:

In [55]:
total_se = df_topArrests['total_se'].sum()
total_se_0to4 = df_topArrests['se_0-4'].sum()
total_se_5to9 = df_topArrests['se_5-9'].sum()
total_se_10to14 = df_topArrests['se_10-14'].sum()
total_se_15to17 = df_topArrests['se_15-17'].sum()
total_se_18to19 = df_topArrests['se_18-19'].sum()
total_se_20to24 = df_topArrests['se_20-24'].sum()
total_se_25to34 = df_topArrests['se_25-34'].sum()
total_se_35plus = df_topArrests['se_35+'].sum()
total_0to4 = df_topArrests['0-4'].sum()
total_5to9 = df_topArrests['5-9'].sum()
total_10to14 = df_topArrests['10-14'].sum()
total_15to17 = df_topArrests['15-17'].sum()
total_18to19 = df_topArrests['18-19'].sum()
total_20to24 = df_topArrests['20'].sum() + df_topArrests['21'].sum() + df_topArrests['22-24'].sum()
total_25to34 = df_topArrests['25-29'].sum() + df_topArrests['30-34'].sum() 
# Note: despite its name, total_youngPop holds the 35+ population (total minus every age bracket under 35)
total_youngPop = totPop - (total_0to4 + total_5to9 + total_10to14 + total_15to17 + total_18to19 + total_20to24 + total_25to34)

print('TOP 20 ARRESTS BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_youngPop*100)+'%')

TOP 20 ARRESTS BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 31.989831226156923%
Percentage of Residents that are Enrolled Students (0-4): 53.62660068967927%
Percentage of Residents that are Enrolled Students (5-9): 96.54014859337397%
Percentage of Residents that are Enrolled Students (10-14): 98.86494727633195%
Percentage of Residents that are Enrolled Students (15-17): 97.18158883289293%
Percentage of Residents that are Enrolled Students (18-19): 69.08311778369453%
Percentage of Residents that are Enrolled Students (20-24): 29.480844279943263%
Percentage of Residents that are Enrolled Students (25-34): 10.13457986223088%
Percentage of Residents that are Enrolled Students (35+): 3.1228224306400305%


Lastly, we will look at the age breakdowns:

In [56]:
minors = df_topArrests['<=21'].sum()
twenties = df_topArrests['22-29'].sum()
thirties = df_topArrests['30-39'].sum()
forties = df_topArrests['40-49'].sum()
fifties = df_topArrests['50-59'].sum()
sixties = df_topArrests['60-64'].sum()
seniors = df_topArrests['65+'].sum()
print('TOP 20 ARRESTS BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

TOP 20 ARRESTS BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 35.71239216190667%
Percentage of Residents that are in their Twenties: 12.417169350790696%
Percentage of Residents that are in their Thirties: 12.335004848057968%
Percentage of Residents that are in their Forties: 11.455955509640054%
Percentage of Residents that are in their Fifties: 12.110412648396318%
Percentage of Residents that are in between 60-64: 4.710738764995332%
Percentage of Residents that are Seniors (65+): 11.258346134911047%


Now, we will compute the mean and median ranking of these beats in the race, income, food stamp, educational attainment, and school enrollment categories:

In [57]:
df_topArrest_ranks = df_topArrests[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [58]:
beatList = list(df_topArrest_ranks.beat)
df_topArrest_ranks['rank_beat'] = df_topArrest_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_topArrest_ranks.set_index('beat',inplace = True)
print('TOP 20 ARRESTS BEATS MEDIAN RANKINGS: ')
print(df_topArrest_ranks.median(axis = 0))
print()
print('TOP 20 ARRESTS BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_topArrest_ranks.mean(axis = 0))
df_topArrest_ranks

TOP 20 ARRESTS BEATS MEDIAN RANKINGS: 
rank_%w                203.5
rank_%b                 64.5
rank_%h                188.0
rank_%a                224.5
rank_%m                209.5
rank_%o                150.0
rank_income            225.0
rank_fs                 50.0
rank_bachelors         230.0
rank_high_school        97.0
rank_no_high_school     73.5
rank_total_se           77.0
rank_total_se_0-4      177.0
rank_total_se_5-9      153.0
rank_total_se_10-14    159.0
rank_total_se_15-17    104.5
rank_total_se_18-19    177.5
rank_total_se_20-24    183.0
rank_total_se_25-34    212.5
rank_total_se_35+      164.5
rank_beat               10.5
dtype: float64

TOP 20 ARRESTS BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                205.55
rank_%b                 72.00
rank_%h                162.00
rank_%a                219.35
rank_%m                211.50
rank_%o                151.00
rank_income            219.10
rank_fs                 56.90
rank_bachelors         222.25
rank_high_school   

beat,rank_%w,rank_%b,rank_%h,rank_%a,rank_%m,rank_%o,rank_income,rank_fs,rank_bachelors,rank_high_school,...,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+,rank_beat
1533,204,24,235,224,260,211,235,79,238,126,...,35,135,10,161,122,62,175,59,82,1
1112,231,110,68,244,154,10,213,60,243,145,...,92,79,145,164,204,156,232,228,270,2
1522,198,63,184,229,265,146,184,90,194,60,...,126,221,168,109,87,235,213,226,173,3
1532,217,66,183,254,148,140,214,45,249,96,...,39,208,28,197,23,239,233,259,35,4
2533,203,123,37,140,199,236,160,126,225,144,...,129,248,212,133,175,180,185,243,208,5
1531,240,76,148,178,223,267,220,46,239,23,...,142,259,170,140,25,222,137,246,112,6
1122,228,82,143,225,189,98,244,30,245,109,...,52,117,161,172,171,223,125,250,166,7
1011,264,37,224,208,263,3,256,20,250,98,...,27,168,14,62,2,188,41,257,177,8
1113,192,53,208,239,179,195,230,48,267,94,...,15,48,134,175,57,114,44,185,182,9
1121,144,111,75,226,211,188,181,63,176,67,...,60,74,206,44,132,181,127,118,163,10


**Key Takeaways:**
* When we look at how these beats rank among all the police beats in Chicago, the mean and median rankings are relatively low for the percentage of every race (each taken individually) except Black residents, much like what we saw in the top 20 crime beats
* However, these beats rank relatively high for the percentage of Black residents and for the percentage of the population on food stamps. On food stamps, they rank even higher than the beats with the most reported crimes
* Black residents make up roughly 78% of these beats' population and Hispanic residents roughly 18%. Together they account for a very large portion of the demographic makeup, with significantly more Black residents, and the figures are nearly identical to what we saw in the top crime beats. It's also worthwhile to mention that White residents make up only about 2.5% of these beats' residents
* We also see these beats have a low average median income (about $27,900, which is less than half the average income throughout all of Chicago) and about 44% of the residents in these beats are on food stamps (a little more than twice the percentage of total Chicago residents on food stamps). This is well reflected in the low average and median ranking for these beats in the median income category and the beats' high ranking in the 'percentage of population on food stamps' category.
* Much like the Top Crime beats, these beats had about 57% of their residents earning a High School Diploma as their highest level of educational attainment. However, these beats had a greater percentage of residents never earning a High School Diploma and a smaller percentage of residents earning at least a Bachelor's Degree than the Top Crime beats. In the context of all of Chicago, these beats have a much greater percentage of residents who don't hold a Bachelor's Degree (about 24% less than the average of Chicago when combining the percentages of residents who have earned a High School Diploma or less)
* Minors are also the largest age group in these beats (about 36% of residents), with roughly equal percentages of all other age groups (about 11%)

**Some Conclusions:**
* These beats are made up of mostly "poor" Black and Hispanic minorities, with significantly more Black residents than any other racial group. These beats can be categorized as less affluent than the beats with the most crime, though both groups rank extremely low
* These beats also have a smaller percentage of residents with at least a Bachelor's Degree, signifying a lower level of educational attainment than even the Top Crime beats; both groups are significantly below the average level of educational attainment in Chicago
* Based on these observations alone, we would assume that areas with larger Black populations and lower standards of living will be more active in terms of arrests. We would also be led to assume that residents of the beats with more arrests have a much lower standard of living than the average Chicago resident, and even than the Chicago resident living in a 'crime-ridden' area
* These beats are also predominantly populated by minors, suggesting we have large nuclear and extended families where children and young adults far outnumber parents and older citizens, even more so than in the Top Crime beats


**Now, the top 20 beats in arrest to crime ratio:**

In [59]:
topFile.readline()
t20_ratio = topFile.readline().split(' ')
t20_ratio = t20_ratio[:-1]
for i in range (0,20):
    t20_ratio[i] = pad0(t20_ratio[i])
print(t20_ratio)

['0134', '1114', '2113', '1531', '1113', '1115', '1532', '1533', '1112', '1134', '1131', '1124', '1122', '1522', '1823', '1523', '1125', '1011', '1132', '1121']


In [60]:
df_topRatio = df_beat[df_beat['beat'].isin(t20_ratio)]
df_topRatio.set_index('beat', inplace = True)
df_topRatio = df_topRatio.reindex(t20_ratio)
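
The select-and-reorder pattern above is repeated for each group of beats. As a sketch, it could be wrapped in a small helper. `select_beats` is a hypothetical name, not something defined in this notebook, and the toy frame and beat `0999` are invented for illustration. The sketch assumes each beat appears only once, in which case `reindex` alone keeps just the requested beats and the `isin` filter is not strictly needed:

```python
import pandas as pd

def select_beats(df, beats):
    """Return the rows of df for the given beats, in the given order.

    Beats that are not present in df come back as rows of NaN.
    Assumes the 'beat' column holds unique values.
    """
    return df.set_index("beat").reindex(beats).reset_index()

# Toy stand-in for df_beat (invented values).
toy = pd.DataFrame({
    "beat": ["0621", "1112", "1134"],
    "population": [7119.7, 7156.0, 4540.9],
})

# '0999' is a made-up beat with no data, to show what a missing beat looks like.
picked = select_beats(toy, ["1134", "0999", "0621"])
print(picked)
```

Rows that come back as NaN are exactly the beats with no demographic data, which makes them easy to spot before aggregating.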

In [61]:
df_topRatio.reset_index(inplace = True)
df_topRatio

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,134,,,,,,,,,,...,,,,,,,,,,
1,1114,2191.713002,0.19976,21.21044,60.513925,2109.988108,0.0,0.000542,0.0,27976.44307,...,131.0,36.0,217.0,2.0,225.0,263.0,113.0,247.0,10.0,210.0
2,2113,,,,,,,,,,...,,,,,,,,,,
3,1531,5855.08873,0.319057,48.218679,453.475009,5309.199351,20.86749,23.328184,0.0,27640.81894,...,78.0,142.0,259.0,170.0,140.0,25.0,222.0,137.0,246.0,112.0
4,1113,4035.188344,0.371758,93.304441,98.400427,3806.816884,0.486165,33.746798,2.433629,26372.036069,...,41.0,15.0,48.0,134.0,175.0,57.0,114.0,44.0,185.0,182.0
5,1115,2493.957781,0.171909,57.348211,42.300721,2351.581591,5.548039,36.202608,0.976614,22484.559921,...,66.0,82.0,241.0,64.0,146.0,224.0,259.0,269.0,172.0,212.0
6,1532,6486.946676,0.393223,88.955982,289.068497,6026.17593,0.0,72.173634,10.572636,28346.444387,...,88.0,39.0,208.0,28.0,197.0,23.0,239.0,233.0,259.0,35.0
7,1533,7287.673417,0.552272,140.701888,85.730872,7045.973828,5.959837,6.96481,2.342207,25628.891687,...,69.0,35.0,135.0,10.0,161.0,122.0,62.0,175.0,59.0,82.0
8,1112,7155.958067,0.315592,73.67862,2470.596099,4423.679248,0.002181,75.530586,112.471337,28693.33387,...,24.0,92.0,79.0,145.0,164.0,204.0,156.0,232.0,228.0,270.0
9,1134,4540.937191,0.352457,154.202578,129.991012,4174.674919,36.964347,23.168435,21.935901,26950.148678,...,94.0,71.0,103.0,246.0,37.0,3.0,119.0,240.0,260.0,187.0


Unfortunately, we are missing demographic data for two of these beats (0134 and 2113), so we have to remove them from our aggregate analysis:

In [62]:
df_topRatio = df_topRatio.loc[(df_topRatio.beat != '0134') & (df_topRatio.beat != '2113')]
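
The two beats are hard-coded in the cell above. A minimal sketch of a more general alternative, assuming a beat with no data shows up as a row whose `population` is NaN after `reindex` (as in the table above); the toy frame is a cut-down, rounded stand-in for `df_topRatio`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_topRatio after reindexing (values rounded for illustration).
toy = pd.DataFrame({
    "beat": ["0134", "1114", "2113", "1531"],
    "population": [np.nan, 2191.7, np.nan, 5855.1],
})

# Record which beats have no demographic data, then drop them.
missing_beats = toy.loc[toy["population"].isna(), "beat"].tolist()
toy_clean = toy.dropna(subset=["population"]).reset_index(drop=True)

print("Dropped beats:", missing_beats)
print("Beats remaining:", len(toy_clean))
```

Keeping the list of dropped beats around makes it easy to report how many beats actually feed into each aggregate.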

Now for some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [63]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_topRatio.index:
    totPop += df_topRatio['population'][ind]
    wPop += df_topRatio['num_white'][ind]
    bPop += df_topRatio['num_black'][ind]
    hPop += df_topRatio['num_hispanic'][ind]
    mPop += df_topRatio['num_mixed'][ind]
    aPop += df_topRatio['num_asian'][ind]
    oPop += df_topRatio['num_other'][ind]
print('TOP 20 ARRESTS:CRIME BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

TOP 20 ARRESTS:CRIME BEATS RACE BREAKDOWN:

Percentage of Population that is White: 4.322684588819544%
Percentage of Population that is Black: 86.45954524026007%
Percentage of Population that is Hispanic: 7.772431726722815%
Percentage of Population that is Asian: 0.4610555730014877%
Percentage of Population that is Mixed: 0.5842832074440615%
Percentage of Population that is Other: 0.39999974230479646%
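
The loop above adds up each column one row at a time. As a sketch, the same percentages can be computed with column sums; the toy numbers below are invented, and only three of the race columns are included to keep the example short:

```python
import pandas as pd

# Toy stand-in for df_topRatio (invented values, not the real beat data).
toy = pd.DataFrame({
    "population": [100.0, 300.0],
    "num_white": [10.0, 30.0],
    "num_black": [80.0, 240.0],
    "num_hispanic": [10.0, 30.0],
})

race_cols = ["num_white", "num_black", "num_hispanic"]

# Sum each race column over all beats, then divide by the combined population.
race_pct = toy[race_cols].sum() / toy["population"].sum() * 100
print(race_pct.round(1))
```

Because the counts are pooled before dividing, larger beats weigh more heavily in the result, which matches what the loop does.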


Now median income and food stamps:

In [64]:
tot_MI = 0
tot_FS = 0
for ind in df_topRatio.index:
    tot_MI += df_topRatio['med_income'][ind]
    tot_FS += df_topRatio['pop_food_stamps'][ind]
print('TOP 20 ARRESTS:CRIME BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/len(df_topRatio)))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

TOP 20 ARRESTS:CRIME BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 30650.09883777042
Percentage of Population on Food Stamps: 14.780087553694802%
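
The 'Average Median Income' above is a simple mean of the beat medians, so a small beat counts as much as a large one. A minimal sketch contrasting it with a population-weighted mean, using invented toy numbers. Note that neither figure is the true median income of the combined population, which cannot be recovered from beat-level medians alone:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_topRatio (invented values).
toy = pd.DataFrame({
    "population": [1000.0, 3000.0],
    "med_income": [20000.0, 40000.0],
})

# Unweighted: every beat counts equally (this is what the cell above computes).
simple_mean = toy["med_income"].mean()

# Weighted: each beat's median counts in proportion to its population.
weighted_mean = np.average(toy["med_income"], weights=toy["population"])

print("Simple mean of medians:  ", simple_mean)
print("Weighted mean of medians:", weighted_mean)
```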


Now, the educational attainment percentages:

In [65]:
bachelorsPop = df_topRatio['bachelors'].sum()
HSPop = df_topRatio['high_school'].sum()
no_HSPop = df_topRatio['no_high_school'].sum()
print('TOP 20 ARRESTS:CRIME BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

TOP 20 ARRESTS:CRIME BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 10.786002446718355%
Percentage of Population with at most a High School Diploma 33.44165642769781%
Percentage of Population without a High School Diploma 14.6056377151908%
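
The three percentages above use the total population as the denominator, so they need not add up to 100% (here they sum to roughly 59%). A minimal sketch of how the same counts could instead be expressed as shares of the residents covered by the three education columns. The toy numbers are invented, and whether that is the right denominator for this dataset is an assumption:

```python
import pandas as pd

# Toy stand-in for df_topRatio (invented values).
toy = pd.DataFrame({
    "population": [60.0, 40.0],
    "bachelors": [6.0, 4.0],
    "high_school": [18.0, 12.0],
    "no_high_school": [6.0, 4.0],
})

edu_cols = ["bachelors", "high_school", "no_high_school"]
edu_counts = toy[edu_cols].sum()

# Share of the whole population (the approach used in the cell above).
pct_of_population = edu_counts / toy["population"].sum() * 100

# Share of only those residents counted in one of the three categories.
pct_of_counted = edu_counts / edu_counts.sum() * 100

print(pct_of_population)
print(pct_of_counted)
```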


School enrollment percentages:

In [66]:
total_se = df_topRatio['total_se'].sum()
total_se_0to4 = df_topRatio['se_0-4'].sum()
total_se_5to9 = df_topRatio['se_5-9'].sum()
total_se_10to14 = df_topRatio['se_10-14'].sum()
total_se_15to17 = df_topRatio['se_15-17'].sum()
total_se_18to19 = df_topRatio['se_18-19'].sum()
total_se_20to24 = df_topRatio['se_20-24'].sum()
total_se_25to34 = df_topRatio['se_25-34'].sum()
total_se_35plus = df_topRatio['se_35+'].sum()
total_0to4 = df_topRatio['0-4'].sum()
total_5to9 = df_topRatio['5-9'].sum()
total_10to14 = df_topRatio['10-14'].sum()
total_15to17 = df_topRatio['15-17'].sum()
total_18to19 = df_topRatio['18-19'].sum()
total_20to24 = df_topRatio['20'].sum() + df_topRatio['21'].sum() + df_topRatio['22-24'].sum()
total_25to34 = df_topRatio['25-29'].sum() + df_topRatio['30-34'].sum() 
# Residents aged 35+: total population minus every age group under 35
total_35plus = totPop - total_25to34 - total_20to24 - total_18to19 - total_15to17 - total_10to14 - total_5to9 - total_0to4

print('TOP 20 ARRESTS:CRIME BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_35plus*100)+'%')

TOP 20 ARRESTS:CRIME BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 32.5225018964701%
Percentage of Residents that are Enrolled Students (0-4): 55.790401186980944%
Percentage of Residents that are Enrolled Students (5-9): 96.54313070056169%
Percentage of Residents that are Enrolled Students (10-14): 99.16720725407569%
Percentage of Residents that are Enrolled Students (15-17): 97.38949735860052%
Percentage of Residents that are Enrolled Students (18-19): 66.44521108540046%
Percentage of Residents that are Enrolled Students (20-24): 28.651716169809227%
Percentage of Residents that are Enrolled Students (25-34): 11.2584576035831%
Percentage of Residents that are Enrolled Students (35+): 3.2176013108250667%
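
Each rate above divides the enrolled students in an age group by the residents in that same age group, and the 35+ denominator is obtained by subtracting all younger groups from the total population. A compact sketch of that logic on invented toy data, with only two age groups below 35 for brevity:

```python
import pandas as pd

# Toy stand-in for df_topRatio (invented values; real data has more age groups).
toy = pd.DataFrame({
    "population": [100.0, 200.0],
    "0-4": [10.0, 20.0],
    "5-9": [10.0, 30.0],
    "se_0-4": [5.0, 10.0],
    "se_5-9": [9.0, 27.0],
    "se_35+": [4.0, 8.0],
})

# In this toy example, everyone not in a listed age group is treated as 35+.
age_cols = ["0-4", "5-9"]

# Enrollment rate per age group: enrolled students / residents of that age.
rates = {age: toy["se_" + age].sum() / toy[age].sum() * 100 for age in age_cols}

# The 35+ denominator is the remainder of the population.
pop_35plus = toy["population"].sum() - toy[age_cols].sum().sum()
rates["35+"] = toy["se_35+"].sum() / pop_35plus * 100

for age, rate in rates.items():
    print(f"Enrolled students ({age}): {rate:.1f}%")
```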


Lastly, we will look at the age breakdowns:

In [67]:
minors = df_topRatio['<=21'].sum()
twenties = df_topRatio['22-29'].sum()
thirties = df_topRatio['30-39'].sum()
forties = df_topRatio['40-49'].sum()
fifties = df_topRatio['50-59'].sum()
sixties = df_topRatio['60-64'].sum()
seniors = df_topRatio['65+'].sum()
print('TOP 20 ARRESTS:CRIME BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

TOP 20 ARRESTS:CRIME BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 36.18842759117399%
Percentage of Residents that are in their Twenties: 12.67938282441686%
Percentage of Residents that are in their Thirties: 12.725310865532677%
Percentage of Residents that are in their Forties: 11.499033662710339%
Percentage of Residents that are in their Fifties: 11.392013515628047%
Percentage of Residents that are in between 60-64: 4.550621825411055%
Percentage of Residents that are Seniors (65+): 10.965295593611241%


Now, we will compute the average and median ranking for the race, income, food stamp, and educational attainment categories for this data:

In [68]:
df_topRatio_ranks = df_topRatio[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [69]:
beatList = list(df_topRatio_ranks.beat)
df_topRatio_ranks['rank_beat'] = df_topRatio_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_topRatio_ranks.set_index('beat',inplace = True)
print('TOP 20 ARRESTS:CRIME BEATS MEDIAN RANKINGS: ')
print(df_topRatio_ranks.median(axis = 0))
print()
print('TOP 20 ARRESTS:CRIME BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_topRatio_ranks.mean(axis = 0))
df_topRatio_ranks

TOP 20 ARRESTS:CRIME BEATS MEDIAN RANKINGS: 
rank_%w                201.5
rank_%b                 62.0
rank_%h                201.0
rank_%a                219.5
rank_%m                206.5
rank_%o                181.0
rank_income            227.0
rank_fs                 46.5
rank_bachelors         238.5
rank_high_school        97.0
rank_no_high_school     67.5
rank_total_se           65.5
rank_total_se_0-4      151.5
rank_total_se_5-9      164.5
rank_total_se_10-14    143.5
rank_total_se_15-17    104.5
rank_total_se_18-19    184.5
rank_total_se_20-24    232.5
rank_total_se_25-34    177.5
rank_total_se_35+      169.5
rank_beat                9.5
dtype: float64

TOP 20 ARRESTS:CRIME BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                197.111111
rank_%b                 65.111111
rank_%h                185.444444
rank_%a                207.166667
rank_%m                198.055556
rank_%o                158.611111
rank_income            214.055556
rank_fs                 54.500000
rank_

beat,rank_%w,rank_%b,rank_%h,rank_%a,rank_%m,rank_%o,rank_income,rank_fs,rank_bachelors,rank_high_school,...,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+,rank_beat
1114,233.0,29.0,202.0,250.0,270.0,248.0,216.0,65.0,255.0,63.0,...,36.0,217.0,2.0,225.0,263.0,113.0,247.0,10.0,210.0,1
1531,240.0,76.0,148.0,178.0,223.0,267.0,220.0,46.0,239.0,23.0,...,142.0,259.0,170.0,140.0,25.0,222.0,137.0,246.0,112.0,2
1113,192.0,53.0,208.0,239.0,179.0,195.0,230.0,48.0,267.0,94.0,...,15.0,48.0,134.0,175.0,57.0,114.0,44.0,185.0,182.0,3
1115,194.0,54.0,221.0,200.0,122.0,210.0,255.0,33.0,265.0,119.0,...,82.0,241.0,64.0,146.0,224.0,259.0,269.0,172.0,212.0,4
1532,217.0,66.0,183.0,254.0,148.0,140.0,214.0,45.0,249.0,96.0,...,39.0,208.0,28.0,197.0,23.0,239.0,233.0,259.0,35.0,5
1533,204.0,24.0,235.0,224.0,260.0,211.0,235.0,79.0,238.0,126.0,...,35.0,135.0,10.0,161.0,122.0,62.0,175.0,59.0,82.0,6
1112,231.0,110.0,68.0,244.0,154.0,10.0,213.0,60.0,243.0,145.0,...,92.0,79.0,145.0,164.0,204.0,156.0,232.0,228.0,270.0,7
1134,181.0,71.0,200.0,153.0,210.0,54.0,224.0,36.0,196.0,112.0,...,71.0,103.0,246.0,37.0,3.0,119.0,240.0,260.0,187.0,8
1131,199.0,41.0,220.0,190.0,243.0,190.0,236.0,41.0,232.0,62.0,...,89.0,190.0,182.0,141.0,81.0,264.0,249.0,70.0,125.0,9
1124,171.0,61.0,227.0,215.0,176.0,174.0,231.0,25.0,189.0,115.0,...,19.0,5.0,191.0,217.0,65.0,197.0,235.0,157.0,201.0,10
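
The median and mean rankings are printed separately above. As a sketch, both can be collected into one table with `DataFrame.agg`; the values below are the `rank_%b` and `rank_income` entries of the first three rows shown above, used only as a small example:

```python
import pandas as pd

# First three rows of two rank columns from the table above.
toy_ranks = pd.DataFrame(
    {"rank_%b": [29.0, 76.0, 53.0], "rank_income": [216.0, 220.0, 230.0]},
    index=pd.Index(["1114", "1531", "1113"], name="beat"),
)

# One row per rank column, with the median and mean side by side.
summary = toy_ranks.agg(["median", "mean"]).T
print(summary)
```

Seeing the two statistics next to each other makes it easier to notice columns where a few extreme beats pull the mean away from the median.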


**Key Takeaways:**
* Compared with all police beats in Chicago, these beats have relatively low mean and median rankings for the population share of every race (each taken individually) except Black residents, much like the top 20 crime beats
* However, these beats rank relatively high among all Chicago police beats for the percentage of Black residents and for the percentage of the population on food stamps, ranking even higher than the beats with the most reported crimes
* Black residents make up roughly 86% of these beats' population, more than in the top crime and top arrest beats. It's also worth mentioning that Hispanic and White residents make up only about 8% and 4% of these beats' population, respectively
* We also see these beats have a much lower average median income (about $30,700) than the average median income in Chicago, and about 28% of the residents in these beats are on food stamps, which is close to the average share of residents on food stamps in all of Chicago. This is well reflected in these beats' low average and median rankings in the median income category and their high ranking in the 'percentage of population on food stamps' category.
* Almost identical to the Top Arrests beats, these Top Ratio beats had about 57% of their residents earning a High School Diploma as their highest level of educational attainment, 16% earning at least a Bachelor's Degree, and about 27% never graduating high school
* These beats are also predominantly young: minors make up about 36% of these beats' residents, and every other age group makes up a roughly equal share (about 11%)

**Some Conclusions:**
* These beats are made up of mostly "poor" Black residents, with significantly more Black residents than any other racial group. These beats can be categorized as less affluent than the average Chicago residential area. Of the three categories measured thus far, these beats have the largest share of Black residents but a roughly equal standard of living (all of which are low compared to the averages for Chicago and the US).
* These beats also have a smaller percentage of residents with at least a Bachelor's Degree, signifying a lower level of educational attainment than even the Top Crime beats; both groups are significantly below the average level of educational attainment in Chicago. The educational attainment statistics for these Top Ratio beats, however, are on par with the Top Arrests beats
* Based on these observations alone, we would assume that areas with larger Black populations and lower standards of living will see more arrests relative to the amount of crime reported. We would also be led to assume that the beats with high arrest rates are among the poorest in Chicago and are predominantly African-American
* These beats are also predominantly populated by minors, suggesting we have large nuclear and extended families where children and young adults far outnumber parents and older citizens, even more so than in the Top Crime beats


**Lastly, the top 20 beats in police-caused civilian injuries:**

In [70]:
topFile.readline()
t20_injuries = topFile.readline().split(' ')
t20_injuries = t20_injuries[:-1]
for i in range (0,20):
    t20_injuries[i] = pad0(t20_injuries[i])
print(t20_injuries)

['1134', '0621', '1112', '1533', '0713', '1522', '0624', '1824', '1531', '0531', '1133', '0421', '0321', '1132', '1121', '2515', '1024', '0823', '1122', '0522']


In [71]:
df_topInjuries = df_beat[df_beat['beat'].isin(t20_injuries)]
df_topInjuries.set_index('beat', inplace = True)
df_topInjuries = df_topInjuries.reindex(t20_injuries)

In [72]:
df_topInjuries.reset_index(inplace = True)
df_topInjuries

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1134,4540.937191,0.352457,154.202578,129.991012,4174.674919,36.964347,23.168435,21.935901,26950.148678,...,94,71,103,246,37,3,119,240,260,187
1,621,7119.663074,0.569081,112.400242,67.181066,6914.143865,0.0,12.838651,13.099253,25675.795878,...,135,116,58,248,259,170,184,178,92,106
2,1112,7155.958067,0.315592,73.67862,2470.596099,4423.679248,0.002181,75.530586,112.471337,28693.33387,...,24,92,79,145,164,204,156,232,228,270
3,1533,7287.673417,0.552272,140.701888,85.730872,7045.973828,5.959837,6.96481,2.342207,25628.891687,...,69,35,135,10,161,122,62,175,59,82
4,713,5002.726959,0.487889,64.239085,178.154851,4749.410005,0.0,10.923025,0.0,29756.999864,...,83,79,184,174,49,8,157,181,206,105
5,1522,9822.318339,0.573595,211.159122,435.344207,9149.668723,5.669283,5.736894,14.740125,33675.668578,...,127,126,221,168,109,87,235,213,226,173
6,624,7480.696752,0.602045,55.70874,57.560998,7316.225802,30.675287,6.614789,13.911136,25555.556677,...,118,189,174,247,163,245,29,186,104,41
7,1824,21505.609928,0.327145,17300.316444,910.794823,1530.870378,1477.943566,189.190845,96.493876,87221.212061,...,244,268,258,255,35,31,1,96,77,161
8,1531,5855.08873,0.319057,48.218679,453.475009,5309.199351,20.86749,23.328184,0.0,27640.81894,...,78,142,259,170,140,25,222,137,246,112
9,531,4542.030363,0.616274,648.822199,258.369386,3535.439682,0.000558,99.306897,0.091641,32097.605411,...,133,138,72,186,119,76,137,111,122,178


Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [73]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_topInjuries.index:
    totPop += df_topInjuries['population'][ind]
    wPop += df_topInjuries['num_white'][ind]
    bPop += df_topInjuries['num_black'][ind]
    hPop += df_topInjuries['num_hispanic'][ind]
    mPop += df_topInjuries['num_mixed'][ind]
    aPop += df_topInjuries['num_asian'][ind]
    oPop += df_topInjuries['num_other'][ind]
print('TOP 20 INJURY BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

TOP 20 INJURY BEATS RACE BREAKDOWN:

Percentage of Population that is White: 13.099472043609332%
Percentage of Population that is Black: 57.98063112528079%
Percentage of Population that is Hispanic: 26.899429560524162%
Percentage of Population that is Asian: 1.0266449332443717%
Percentage of Population that is Mixed: 0.7188839652926318%
Percentage of Population that is Other: 0.27493838941281434%


Now median income and food stamps:

In [74]:
tot_MI = 0
tot_FS = 0
for ind in df_topInjuries.index:
    tot_MI += df_topInjuries['med_income'][ind]
    tot_FS += df_topInjuries['pop_food_stamps'][ind]
print('TOP 20 INJURY BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/20.0))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

TOP 20 INJURY BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 32180.045288375008
Percentage of Population on Food Stamps: 12.307264212241463%


Now, the educational attainment percentages:

In [75]:
bachelorsPop = df_topInjuries['bachelors'].sum()
HSPop = df_topInjuries['high_school'].sum()
no_HSPop = df_topInjuries['no_high_school'].sum()
print('TOP 20 INJURY BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

TOP 20 INJURY BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 17.17691953198873%
Percentage of Population with at most a High School Diploma 29.980552560234912%
Percentage of Population without a High School Diploma 15.06962613276492%


School Enrollment percentages:

In [76]:
total_se = df_topInjuries['total_se'].sum()
total_se_0to4 = df_topInjuries['se_0-4'].sum()
total_se_5to9 = df_topInjuries['se_5-9'].sum()
total_se_10to14 = df_topInjuries['se_10-14'].sum()
total_se_15to17 = df_topInjuries['se_15-17'].sum()
total_se_18to19 = df_topInjuries['se_18-19'].sum()
total_se_20to24 = df_topInjuries['se_20-24'].sum()
total_se_25to34 = df_topInjuries['se_25-34'].sum()
total_se_35plus = df_topInjuries['se_35+'].sum()
total_0to4 = df_topInjuries['0-4'].sum()
total_5to9 = df_topInjuries['5-9'].sum()
total_10to14 = df_topInjuries['10-14'].sum()
total_15to17 = df_topInjuries['15-17'].sum()
total_18to19 = df_topInjuries['18-19'].sum()
total_20to24 = df_topInjuries['20'].sum() + df_topInjuries['21'].sum() + df_topInjuries['22-24'].sum()
total_25to34 = df_topInjuries['25-29'].sum() + df_topInjuries['30-34'].sum() 
# Residents aged 35+: total population minus every age group under 35
total_35plus = totPop - total_25to34 - total_20to24 - total_18to19 - total_15to17 - total_10to14 - total_5to9 - total_0to4

print('TOP 20 INJURY BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_35plus*100)+'%')

TOP 20 INJURY BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 29.9494285202161%
Percentage of Residents that are Enrolled Students (0-4): 54.57447175836824%
Percentage of Residents that are Enrolled Students (5-9): 95.87068770970437%
Percentage of Residents that are Enrolled Students (10-14): 98.99434001426985%
Percentage of Residents that are Enrolled Students (15-17): 96.80867971557629%
Percentage of Residents that are Enrolled Students (18-19): 69.8312373729458%
Percentage of Residents that are Enrolled Students (20-24): 29.92122978667889%
Percentage of Residents that are Enrolled Students (25-34): 12.18266865145137%
Percentage of Residents that are Enrolled Students (35+): 2.728290057838403%


Lastly, we will look at the age breakdowns:

In [77]:
minors = df_topInjuries['<=21'].sum()
twenties = df_topInjuries['22-29'].sum()
thirties = df_topInjuries['30-39'].sum()
forties = df_topInjuries['40-49'].sum()
fifties = df_topInjuries['50-59'].sum()
sixties = df_topInjuries['60-64'].sum()
seniors = df_topInjuries['65+'].sum()
print('TOP 20 INJURY BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

TOP 20 INJURY BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 32.74093898975596%
Percentage of Residents that are in their Twenties: 13.398184875978645%
Percentage of Residents that are in their Thirties: 13.869109419286401%
Percentage of Residents that are in their Forties: 11.802905454133562%
Percentage of Residents that are in their Fifties: 11.338669822988006%
Percentage of Residents that are in between 60-64: 4.739019159196832%
Percentage of Residents that are Seniors (65+): 12.111332370095388%


Now, we will compute the average and median ranking for the race, income, food stamp, and educational attainment categories for this data:

In [78]:
df_topInjuries_ranks = df_topInjuries[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [79]:
beatList = list(df_topInjuries_ranks.beat)
df_topInjuries_ranks['rank_beat'] = df_topInjuries_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_topInjuries_ranks.set_index('beat',inplace = True)
print('TOP 20 INJURY BEATS MEDIAN RANKINGS: ')
print(df_topInjuries_ranks.median(axis = 0))
print()
print('TOP 20 INJURY BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_topInjuries_ranks.mean(axis = 0))
df_topInjuries.head()

TOP 20 INJURY BEATS MEDIAN RANKINGS: 
rank_%w                207.5
rank_%b                 70.5
rank_%h                189.0
rank_%a                224.5
rank_%m                206.5
rank_%o                138.5
rank_income            216.5
rank_fs                 57.5
rank_bachelors         208.0
rank_high_school       110.5
rank_no_high_school     92.0
rank_total_se           77.0
rank_total_se_0-4      154.5
rank_total_se_5-9      172.0
rank_total_se_10-14    127.0
rank_total_se_15-17    133.0
rank_total_se_18-19    166.0
rank_total_se_20-24    183.5
rank_total_se_25-34    161.0
rank_total_se_35+      164.5
rank_beat               10.5
dtype: float64

TOP 20 INJURY BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                191.30
rank_%b                 82.30
rank_%h                162.60
rank_%a                207.90
rank_%m                195.65
rank_%o                145.75
rank_income            203.30
rank_fs                 68.55
rank_bachelors         200.05
rank_high_school     

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1134,4540.937191,0.352457,154.202578,129.991012,4174.674919,36.964347,23.168435,21.935901,26950.148678,...,94,71,103,246,37,3,119,240,260,187
1,621,7119.663074,0.569081,112.400242,67.181066,6914.143865,0.0,12.838651,13.099253,25675.795878,...,135,116,58,248,259,170,184,178,92,106
2,1112,7155.958067,0.315592,73.67862,2470.596099,4423.679248,0.002181,75.530586,112.471337,28693.33387,...,24,92,79,145,164,204,156,232,228,270
3,1533,7287.673417,0.552272,140.701888,85.730872,7045.973828,5.959837,6.96481,2.342207,25628.891687,...,69,35,135,10,161,122,62,175,59,82
4,713,5002.726959,0.487889,64.239085,178.154851,4749.410005,0.0,10.923025,0.0,29756.999864,...,83,79,184,174,49,8,157,181,206,105


**Key Takeaways:**
* Compared with all police beats in Chicago, these beats have relatively low mean and median rankings for the population share of every race (each taken individually) except Black residents, much like the top 20 crime beats
* However, these beats rank relatively high among all Chicago police beats for the percentage of Black residents and for the percentage of the population on food stamps, though not as high as the top crime, arrest, and arrest:crime beats
* There isn't as clear a dominance of one race in this data as we've seen before, but Black residents still make up roughly 58% of these beats' total population, while Hispanic and White residents make up 27% and 13%, respectively
* We still see the beats in question have a low average median income (about $32,200, which is nearly half the average income throughout all of Chicago), and about 37% of the residents in these beats are on food stamps (about 15% more than the percentage of Chicago's total population on food stamps). This is well reflected in these beats' low average and median rankings in the median income category and their high ranking in the 'percentage of population on food stamps' category.
* These beats had a greater percentage of residents earning at most a High School Diploma compared to all of Chicago (50% and 42%, respectively) and of residents without a High School Diploma (26% and 18%, respectively). While these beats' residents have lower levels of educational attainment than the city-wide average, their average level of educational attainment is higher than that of the Top Crime, Arrest, and Arrests:Crime beats
* These beats also had a greater percentage of minors compared to the city-wide average (33% and 27%, respectively) and roughly the same percentage for every other age group (about 12%), which is fairly close to the city-wide average

**Some Conclusions:**
* While the residents of these beats are still mostly Black, "poor", younger (typically minors), and with lower levels of educational attainment than the city-wide average, the percentage of Black residents, the median income, the percentage of the population on food stamps, the levels of educational attainment, and the age distribution in these beats are considerably closer to the averages of the general population of Chicago. 
* The fact that this data is closer to the average measurements of the City of Chicago would seem to suggest that the Chicago PD upholds a similar standard when arresting citizens, regardless of their race
* The higher number of injuries in beats with greater percentages of Black residents and residents on food stamps could be explained by the fact that such areas have been identified as areas of higher crime activity
* Just based on these observations, we can make the extended assumption that areas of greater crime activity (areas with a high density of Black residents and a high density of residents on food stamps) will consequently see a higher likelihood of police-caused injuries. Thus, we cannot necessarily assume that Black citizens are at a greater risk of police brutality; rather, they are more likely to be in areas of higher crime and arrest, which lends itself to an increased chance of police brutality if we assume incidents of police brutality occur at random and are not subject to any other influences (a big caveat)
    

Now, the top 20 beats in complaints:

In [80]:
topFile.readline()
t20_complaints = topFile.readline().split(' ')
t20_complaints = t20_complaints[:-1]
for i in range (0,20):
    t20_complaints[i] = pad0(t20_complaints[i])
print(t20_complaints)
topFile.close()

['1134', '0531', '2515', '0123', '2212', '0225', '0211', '0232', '1832', '0212', '1813', '1921', '1024', '2223', '0823', '0725', '2533', '0321', '0612', '1913']


In [81]:
df_topComplaints = df_beat[df_beat['beat'].isin(t20_complaints)]
df_topComplaints.set_index('beat', inplace = True)
df_topComplaints = df_topComplaints.reindex(t20_complaints)
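
The filter-then-reindex pattern in the cell above matters because `isin` returns rows in the order they appear in `df_beat`, not in ranking order; `reindex` then restores the order of the ranked list. A minimal sketch on made-up data (the toy frame, its values, and the names `toy_beats` and `ranked` are hypothetical; only the pattern mirrors the cell above):

```python
import pandas as pd

# Toy stand-in for df_beat; beat codes and populations are made up.
toy_beats = pd.DataFrame({
    'beat': ['0123', '0531', '1134', '2515'],
    'population': [100, 200, 300, 400],
})
ranked = ['1134', '0531', '2515']  # hypothetical ranking order

# isin keeps the frame's own row order ...
subset = toy_beats[toy_beats['beat'].isin(ranked)]
# ... so reindexing on the beat column restores the ranking order.
ordered = subset.set_index('beat').reindex(ranked).reset_index()
print(list(ordered['beat']))
```

A beat that appears in the ranked list but not in the frame would come back from `reindex` as a row of missing values rather than being dropped.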

In [82]:
df_topComplaints.reset_index(inplace = True)
df_topComplaints

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1134,4540.937191,0.352457,154.202578,129.991012,4174.674919,36.964347,23.168435,21.935901,26950.148678,...,94,71,103,246,37,3,119,240,260,187
1,531,4542.030363,0.616274,648.822199,258.369386,3535.439682,0.000558,99.306897,0.091641,32097.605411,...,133,138,72,186,119,76,137,111,122,178
2,2515,17510.811575,0.879689,1547.679024,14632.102542,1060.836558,118.592599,81.518772,70.082102,42497.367713,...,8,81,213,150,25,48,145,187,266,237
3,123,12310.436842,0.328119,7208.532699,496.647618,2060.348817,1989.869277,547.310356,7.72802,92468.263216,...,265,101,70,49,65,80,51,6,40,44
4,2212,18827.990509,2.597607,8893.422376,674.215336,8825.0283,55.437843,364.027126,15.859548,76052.055309,...,234,114,28,147,165,154,90,43,53,81
5,225,2135.512643,0.321392,18.189189,6.505488,2080.601172,3.020157,24.362587,2.834045,26704.194693,...,147,24,144,136,149,213,234,165,145,196
6,211,8367.90255,0.657401,1567.804103,335.239714,4701.702383,1573.500138,174.011705,15.644523,43458.955048,...,187,10,261,43,64,72,16,3,1,17
7,232,3218.056585,0.285827,0.708162,1.974553,3109.130638,0.0,105.535069,0.708162,30513.653315,...,173,9,84,118,11,133,261,130,23,84
8,1832,8511.314889,0.17308,5708.571878,612.903004,829.10571,956.034879,384.808698,19.890714,66054.57658,...,238,242,134,9,71,19,36,58,106,110
9,212,7450.954623,0.519403,246.57136,89.219643,6969.606437,77.778942,49.013758,18.76449,25889.16624,...,140,18,90,105,227,108,59,64,61,67


Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [83]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_topComplaints.index:
    totPop += df_topComplaints['population'][ind]
    wPop += df_topComplaints['num_white'][ind]
    bPop += df_topComplaints['num_black'][ind]
    hPop += df_topComplaints['num_hispanic'][ind]
    mPop += df_topComplaints['num_mixed'][ind]
    aPop += df_topComplaints['num_asian'][ind]
    oPop += df_topComplaints['num_other'][ind]
print('TOP 20 COMPLAINTS BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

TOP 20 COMPLAINTS BEATS RACE BREAKDOWN:

Percentage of Population that is White: 25.65563090059039%
Percentage of Population that is Black: 41.546570136227544%
Percentage of Population that is Hispanic: 27.32292984343761%
Percentage of Population that is Asian: 3.379276093424484%
Percentage of Population that is Mixed: 1.869874609044559%
Percentage of Population that is Other: 0.2257184048580448%
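
The accumulation loop above pools the counts over all beats before dividing, which is not the same as averaging each beat's own percentage. As a rough sketch, the same pooled shares can be computed with column sums; the two-beat frame below and its counts are invented for illustration:

```python
import pandas as pd

# Hypothetical two-beat frame; the counts are invented for illustration.
toy = pd.DataFrame({
    'population': [1000.0, 3000.0],
    'num_white': [100.0, 900.0],
    'num_black': [800.0, 600.0],
    'num_hispanic': [100.0, 1500.0],
})

race_cols = ['num_white', 'num_black', 'num_hispanic']
# Sum each race column over all beats, then divide by the pooled population.
shares = toy[race_cols].sum() / toy['population'].sum() * 100
# For contrast: the unweighted mean of the per-beat percentages.
per_beat_mean = (toy[race_cols].div(toy['population'], axis=0) * 100).mean()
print(shares.round(2))
print(per_beat_mean.round(2))
```

In this toy case the pooled White share is 25% while the mean of the per-beat shares is 20%, because the larger beat carries more weight in the pooled figure.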


Now median income and food stamps:

In [84]:
tot_MI = 0
tot_FS = 0
for ind in df_topComplaints.index:
    tot_MI += df_topComplaints['med_income'][ind]
    tot_FS += df_topComplaints['pop_food_stamps'][ind]
print('TOP 20 COMPLAINTS BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/20.0))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

TOP 20 COMPLAINTS BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 48816.119750241385
Percentage of Population on Food Stamps: 8.788087388631952%
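
The cell above divides the summed median incomes by a hard-coded 20, which only holds while all 20 beats have data. A hedged alternative, sketched on invented numbers, lets pandas skip missing rows; the frame `toy` and its values are assumptions for illustration only:

```python
import pandas as pd

toy = pd.DataFrame({
    'med_income': [30000.0, 50000.0, None],   # None: a beat with no data
    'population': [1000.0, 3000.0, None],
    'pop_food_stamps': [300.0, 300.0, None],
})

# Series.mean() skips missing values, so no hard-coded divisor is needed.
avg_median_income = toy['med_income'].mean()
# The food stamp share is pooled: total recipients over total population.
fs_share = toy['pop_food_stamps'].sum() / toy['population'].sum() * 100
print(avg_median_income, fs_share)
```

Note that an average of beat-level medians is not the median income of the pooled residents; it weights every beat equally regardless of population.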


Now, the educational attainment percentages:

In [85]:
bachelorsPop = df_topComplaints['bachelors'].sum()
HSPop = df_topComplaints['high_school'].sum()
no_HSPop = df_topComplaints['no_high_school'].sum()
print('TOP 20 COMPLAINTS BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

TOP 20 COMPLAINTS BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 27.00509827805208%
Percentage of Population with at most a High School Diploma 25.631962173520208%
Percentage of Population without a High School Diploma 12.201106815341884%


School enrollment percentages:

In [86]:
total_se = df_topComplaints['total_se'].sum()
total_se_0to4 = df_topComplaints['se_0-4'].sum()
total_se_5to9 = df_topComplaints['se_5-9'].sum()
total_se_10to14 = df_topComplaints['se_10-14'].sum()
total_se_15to17 = df_topComplaints['se_15-17'].sum()
total_se_18to19 = df_topComplaints['se_18-19'].sum()
total_se_20to24 = df_topComplaints['se_20-24'].sum()
total_se_25to34 = df_topComplaints['se_25-34'].sum()
total_se_35plus = df_topComplaints['se_35+'].sum()
total_0to4 = df_topComplaints['0-4'].sum()
total_5to9 = df_topComplaints['5-9'].sum()
total_10to14 = df_topComplaints['10-14'].sum()
total_15to17 = df_topComplaints['15-17'].sum()
total_18to19 = df_topComplaints['18-19'].sum()
total_20to24 = df_topComplaints['20'].sum() + df_topComplaints['21'].sum() + df_topComplaints['22-24'].sum()
total_25to34 = df_topComplaints['25-29'].sum() + df_topComplaints['30-34'].sum() 
# NOTE: despite its name, total_youngPop holds the 35+ population (total minus every younger bracket)
total_youngPop = totPop - total_25to34 - total_20to24 - total_18to19 - total_15to17 - total_10to14 - total_5to9 - total_0to4

print('TOP 20 COMPLAINTS BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_youngPop*100)+'%')

TOP 20 COMPLAINTS BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 30.812824580390046%
Percentage of Residents that are Enrolled Students (0-4): 59.80174262576822%
Percentage of Residents that are Enrolled Students (5-9): 96.14934737287297%
Percentage of Residents that are Enrolled Students (10-14): 98.87326290537219%
Percentage of Residents that are Enrolled Students (15-17): 97.16710440674238%
Percentage of Residents that are Enrolled Students (18-19): 79.57159941760804%
Percentage of Residents that are Enrolled Students (20-24): 44.55022901444078%
Percentage of Residents that are Enrolled Students (25-34): 14.428225934417647%
Percentage of Residents that are Enrolled Students (35+): 3.199780920303403%
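
Each enrollment rate above is the enrolled count of a bracket divided by the residents of that same bracket, and the 35+ denominator is obtained as a remainder (total population minus all younger brackets). A compact sketch of that logic on invented counts follows; the reduced set of brackets and the `se_10+` column are hypothetical simplifications, not columns of the real data:

```python
import pandas as pd

# Hypothetical frame with invented counts and only two explicit age brackets.
toy = pd.DataFrame({
    'population': [1000.0, 2000.0],
    '0-4': [100.0, 200.0],
    '5-9': [100.0, 200.0],
    'se_0-4': [50.0, 100.0],
    'se_5-9': [95.0, 190.0],
    'se_10+': [75.0, 165.0],   # hypothetical catch-all bracket for the sketch
})

brackets = ['0-4', '5-9']
# Enrollment rate per bracket: enrolled residents / residents of that age.
rates = {b: toy['se_' + b].sum() / toy[b].sum() * 100 for b in brackets}

# Remainder bracket: total population minus every bracket listed above.
pop_rest = toy['population'].sum() - sum(toy[b].sum() for b in brackets)
rates['10+'] = toy['se_10+'].sum() / pop_rest * 100
print(rates)
```

Driving the computation from a list of bracket names keeps the numerator and denominator of each rate paired, which is easy to get wrong when every bracket has its own variable.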


Lastly, we will look at the age breakdowns:

In [87]:
minors = df_topComplaints['<=21'].sum()
twenties = df_topComplaints['22-29'].sum()
thirties = df_topComplaints['30-39'].sum()
forties = df_topComplaints['40-49'].sum()
fifties = df_topComplaints['50-59'].sum()
sixties = df_topComplaints['60-64'].sum()
seniors = df_topComplaints['65+'].sum()
print('TOP 20 COMPLAINTS BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

TOP 20 COMPLAINTS BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 30.21378821419341%
Percentage of Residents that are in their Twenties: 14.335929740502326%
Percentage of Residents that are in their Thirties: 15.946994198322075%
Percentage of Residents that are in their Forties: 12.80226600448823%
Percentage of Residents that are in their Fifties: 11.696564032678399%
Percentage of Residents that are in between 60-64: 4.617030996786221%
Percentage of Residents that are Seniors (65+): 10.387518886079027%


Now, we will compute the average and median ranking for the race, income, food stamp, and educational attainment categories for this data:

In [88]:
df_topComplaints_ranks = df_topComplaints[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [89]:
beatList = list(df_topComplaints_ranks.beat)
df_topComplaints_ranks['rank_beat'] = df_topComplaints_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_topComplaints_ranks.set_index('beat',inplace = True)
print('TOP 20 COMPLAINTS BEATS MEDIAN RANKINGS: ')
print(df_topComplaints_ranks.median(axis = 0))
print()
print('TOP 20 COMPLAINTS BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_topComplaints_ranks.mean(axis = 0))

TOP 20 COMPLAINTS BEATS MEDIAN RANKINGS: 
rank_%w                167.0
rank_%b                117.5
rank_%h                180.0
rank_%a                163.0
rank_%m                122.5
rank_%o                149.5
rank_income            164.0
rank_fs                115.5
rank_bachelors         159.0
rank_high_school       158.0
rank_no_high_school    160.0
rank_total_se           91.0
rank_total_se_0-4      109.5
rank_total_se_5-9      141.5
rank_total_se_10-14    115.5
rank_total_se_15-17    120.5
rank_total_se_18-19    157.5
rank_total_se_20-24    131.5
rank_total_se_25-34    116.0
rank_total_se_35+      128.5
rank_beat               10.5
dtype: float64

TOP 20 COMPLAINTS BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                152.25
rank_%b                110.95
rank_%h                159.45
rank_%a                156.65
rank_%m                119.00
rank_%o                143.20
rank_income            148.60
rank_fs                117.95
rank_bachelors         142.80
rank_high_sch

**Key Takeaways:**
* When we look at how these beats rank among all the police beats in Chicago, the mean and median rankings sit at just about the 50th percentile in all categories (most in the 120-160 range)
* There isn't as clear a dominance of one race in this data as we've seen before, but Black residents still make up roughly 42% of these beats' total population, while Hispanic and White residents make up 27% and 26%, respectively. The percentage of Black residents is slightly higher than the city-wide average, which is offset by a slightly lower percentage of White residents compared to the city-wide average. Otherwise, all other race percentages are close to the corresponding city-wide average
* These beats have an average median income about $500 less than the city-wide average, and about 25% of their residents are on food stamps, about 4% higher than the city-wide average 
* These beats had a slightly lower percentage (by about 2%) of residents with at least a Bachelor's Degree compared to the city-wide average, which is offset by a slightly higher percentage (about 2%) of residents who never completed High School. The percentage of residents with at most a High School Diploma is close to identical to the city-wide average
* These beats also had a slightly higher percentage of minors compared to the city-wide average (30% and 27%, respectively), with the 3% differential shared fairly evenly across all other age groups
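
To make the percentile reading of the rankings concrete, a rank can be converted to a percentile once the total number of ranked beats is known. The sketch below assumes 270 ranked beats (the largest rank visible in the tables, not a confirmed count) and assumes rank 1 marks the highest value in a category, as the tables suggest; the sample ranks and the helper `rank_to_percentile` are hypothetical:

```python
from statistics import mean, median

N_BEATS = 270  # assumption: the largest rank visible in the tables

# Hypothetical ranks of a few beats in one category (1 = highest value).
ranks = [94, 133, 8, 265, 234, 147]

def rank_to_percentile(rank, n=N_BEATS):
    """Share of beats ranked at or below this one, as a percentage."""
    return (n - rank + 1) / n * 100

med_rank = median(ranks)
mean_rank = mean(ranks)
print(med_rank, round(rank_to_percentile(med_rank), 1))
print(round(mean_rank, 1), round(rank_to_percentile(mean_rank), 1))
```

Under these assumptions, a rank near 135 corresponds to roughly the 50th percentile, so ranks in the 120-160 range fall close to the middle of the distribution.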

**Some Conclusions:**
* Compared to all the other Top Beats, the residents of these beats are extremely close to the city-wide averages. Thus, we can say that the Beats with the Most Complaints aren't too different socioeconomically from the City of Chicago as a whole.
* The fact that this data is closer to the average measurements of the City of Chicago would seem to suggest that the complaints are fairly evenly spread across the City or take place in the most "average" beats of Chicago (the beats with residents most representative of the City as a whole)
* Since we don't see trends that match the other Top Beats, we could speculate that although the beats with greater crime rates, arrests, and arrest rates would be expected to have more complaints, the fact that they don't could be because the residents of these beats lack the legal knowledge, cannot afford a lawyer who could advise on an official complaint, or simply didn't feel the need to file a complaint
    

**Now, the 20 beats at the bottom of each category:**

First, crime:

In [90]:
lastFile = open("/kaggle/input/cpd-police-beat-demographics/outputBeats_EDA_LAST.txt","r") 

In [91]:
lastFile.readline()
l20_crime = lastFile.readline().split(' ')
l20_crime = l20_crime[:-1]
for i in range (0,20):
    l20_crime[i] = pad0(l20_crime[i])
print(l20_crime)

['1935', '1934', '1221', '0114', '1234', '1214', '0225', '1215', '0121', '1921', '1225', '1915', '1235', '0215', '1654', '0235', '1653', '1652', '1655', '0430']
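
The parsing cell relies on the notebook's `pad0` helper to turn codes such as `114` into four-character beat IDs such as `0114`, and on `[:-1]` to discard the last token of the split line (presumably the trailing newline). A hypothetical stand-in for that helper, not the notebook's actual definition, could look like this:

```python
def pad0_sketch(beat):
    """Left-pad a beat code with zeros to four characters (hypothetical stand-in for pad0)."""
    return str(beat).strip().zfill(4)

# Made-up line in the same space-separated shape as the input file.
line = "1935 1934 114 225 \n"
# Splitting on a single space leaves the newline as the last token; drop it.
codes = [pad0_sketch(tok) for tok in line.split(' ')[:-1]]
print(codes)
```

Iterating over the tokens directly avoids the hard-coded `range(0, 20)` and keeps working if a line holds fewer than 20 beats.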


In [92]:
df_lastCrime = df_beat[df_beat['beat'].isin(l20_crime)]
df_lastCrime.set_index('beat', inplace = True)
df_lastCrime = df_lastCrime.reindex(l20_crime)

In [93]:
df_lastCrime.reset_index(inplace = True)
df_lastCrime

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1935,16008.772434,0.58572,12548.292157,1042.134255,766.420365,1063.489059,531.317257,57.119311,74101.195831,...,257.0,256.0,25.0,47.0,59.0,14.0,61.0,124.0,135.0,268.0
1,1934,19672.58499,0.436985,15830.242116,1018.196945,606.431035,1582.88029,613.836666,20.99795,70989.232193,...,262.0,257.0,47.0,86.0,20.0,39.0,3.0,171.0,38.0,131.0
2,1221,8780.177783,0.876067,4210.619816,2872.756274,1364.102238,208.735405,102.095102,21.86897,69107.171923,...,109.0,214.0,38.0,90.0,18.0,64.0,74.0,117.0,174.0,124.0
3,114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,110367.182242,...,267.0,177.0,7.0,19.0,56.0,13.0,24.0,11.0,16.0,145.0
4,1234,13555.578956,0.494281,1405.198982,11396.350853,483.668609,180.982367,89.377672,0.000502,36515.462423,...,18.0,26.0,19.0,99.0,267.0,210.0,134.0,97.0,54.0,218.0
5,1214,9832.465466,0.599526,7324.623726,591.281437,728.062932,972.641058,215.856261,0.0,114927.379531,...,256.0,267.0,153.0,158.0,26.0,42.0,20.0,214.0,139.0,229.0
6,225,2135.512643,0.321392,18.189189,6.505488,2080.601172,3.020157,24.362587,2.834045,26704.194693,...,147.0,24.0,144.0,136.0,149.0,213.0,234.0,165.0,145.0,196.0
7,1215,8522.572501,0.57384,4690.609493,2351.748822,1026.392824,235.810105,218.011279,0.0,84770.824603,...,210.0,233.0,121.0,183.0,95.0,63.0,39.0,202.0,164.0,109.0
8,121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,101550.382372,...,270.0,156.0,60.0,270.0,29.0,35.0,12.0,30.0,31.0,133.0
9,1921,13317.112246,0.877715,10516.682256,1523.697738,152.226201,541.494208,561.693335,21.3185,104182.775635,...,230.0,115.0,34.0,74.0,180.0,182.0,13.0,102.0,188.0,147.0


Unfortunately, some of these beats are located at Chicago O'Hare Airport and thus have no data, so we will drop them from our analysis

*We anticipate this will occur often in our analysis of these beats with the lowest amount of crime, arrests, and injuries, so expect similar exclusions to follow*

In [94]:
df_lastCrime = df_lastCrime.loc[(df_lastCrime.beat != '1653') & (df_lastCrime.beat != '1652') & (df_lastCrime.beat != '1655') & (df_lastCrime.beat != '0430')]
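
The chained `!=` comparisons above can be expressed more compactly with a negated `isin`. The sketch below uses a toy frame with invented values; representing the no-data beats as missing values is an assumption about the real data, so the `dropna` variant is only a possible alternative:

```python
import pandas as pd

# Toy frame; the None rows mimic beats with no demographic data.
toy = pd.DataFrame({
    'beat': ['1935', '1653', '1234', '0430'],
    'population': [16008.0, None, 13555.0, None],
})

no_data = ['1653', '1652', '1655', '0430']  # the four beats dropped above
# ~isin(...) keeps every row whose beat is NOT in the exclusion list.
kept = toy[~toy['beat'].isin(no_data)]
# Alternative that needs no hand-written list: drop rows lacking a population.
kept_alt = toy.dropna(subset=['population'])
print(list(kept['beat']), len(kept))
```

Keeping the exclusion list in one variable also makes it easy to reuse for the later bottom-20 categories, where the same beats are dropped again.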

Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [95]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_lastCrime.index:
    totPop += df_lastCrime['population'][ind]
    wPop += df_lastCrime['num_white'][ind]
    bPop += df_lastCrime['num_black'][ind]
    hPop += df_lastCrime['num_hispanic'][ind]
    mPop += df_lastCrime['num_mixed'][ind]
    aPop += df_lastCrime['num_asian'][ind]
    oPop += df_lastCrime['num_other'][ind]
print('LAST 20 CRIME BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

LAST 20 CRIME BEATS RACE BREAKDOWN:

Percentage of Population that is White: 56.38920917837639%
Percentage of Population that is Black: 12.008942619100486%
Percentage of Population that is Hispanic: 20.586000486268336%
Percentage of Population that is Asian: 8.417969763493788%
Percentage of Population that is Mixed: 2.3746861905502574%
Percentage of Population that is Other: 0.22319173544248042%


Now median income and food stamps:

In [96]:
tot_MI = 0
tot_FS = 0
for ind in df_lastCrime.index:
    tot_MI += df_lastCrime['med_income'][ind]
    tot_FS += df_lastCrime['pop_food_stamps'][ind]
print('LAST 20 CRIME BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/len(df_lastCrime)))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

LAST 20 CRIME BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 67448.30233610784
Percentage of Population on Food Stamps: 4.111516317771992%


Now, the educational attainment percentages:

In [97]:
bachelorsPop = df_lastCrime['bachelors'].sum()
HSPop = df_lastCrime['high_school'].sum()
no_HSPop = df_lastCrime['no_high_school'].sum()
print('LAST 20 CRIME BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

LAST 20 CRIME BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 50.603442905356346%
Percentage of Population with at most a High School Diploma 16.395925600276897%
Percentage of Population without a High School Diploma 6.238587111827135%


School enrollment percentages:

In [98]:
total_se = df_lastCrime['total_se'].sum()
total_se_0to4 = df_lastCrime['se_0-4'].sum()
total_se_5to9 = df_lastCrime['se_5-9'].sum()
total_se_10to14 = df_lastCrime['se_10-14'].sum()
total_se_15to17 = df_lastCrime['se_15-17'].sum()
total_se_18to19 = df_lastCrime['se_18-19'].sum()
total_se_20to24 = df_lastCrime['se_20-24'].sum()
total_se_25to34 = df_lastCrime['se_25-34'].sum()
total_se_35plus = df_lastCrime['se_35+'].sum()
total_0to4 = df_lastCrime['0-4'].sum()
total_5to9 = df_lastCrime['5-9'].sum()
total_10to14 = df_lastCrime['10-14'].sum()
total_15to17 = df_lastCrime['15-17'].sum()
total_18to19 = df_lastCrime['18-19'].sum()
total_20to24 = df_lastCrime['20'].sum() + df_lastCrime['21'].sum() + df_lastCrime['22-24'].sum()
total_25to34 = df_lastCrime['25-29'].sum() + df_lastCrime['30-34'].sum() 
# NOTE: despite its name, total_youngPop holds the 35+ population (total minus every younger bracket)
total_youngPop = totPop - total_25to34 - total_20to24 - total_18to19 - total_15to17 - total_10to14 - total_5to9 - total_0to4

print('LAST 20 CRIME BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_youngPop*100)+'%')

LAST 20 CRIME BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 26.47735847445493%
Percentage of Residents that are Enrolled Students (0-4): 71.95359949267655%
Percentage of Residents that are Enrolled Students (5-9): 97.91583970739673%
Percentage of Residents that are Enrolled Students (10-14): 96.77092140432822%
Percentage of Residents that are Enrolled Students (15-17): 95.48322689752924%
Percentage of Residents that are Enrolled Students (18-19): 93.7498955699954%
Percentage of Residents that are Enrolled Students (20-24): 47.79040953817867%
Percentage of Residents that are Enrolled Students (25-34): 17.822499851444164%
Percentage of Residents that are Enrolled Students (35+): 3.2141624003987617%


Lastly, we will look at the age breakdowns:

In [99]:
minors = df_lastCrime['<=21'].sum()
twenties = df_lastCrime['22-29'].sum()
thirties = df_lastCrime['30-39'].sum()
forties = df_lastCrime['40-49'].sum()
fifties = df_lastCrime['50-59'].sum()
sixties = df_lastCrime['60-64'].sum()
seniors = df_lastCrime['65+'].sum()
print('LAST 20 CRIME BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

LAST 20 CRIME BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 19.902498812654702%
Percentage of Residents that are in their Twenties: 23.84319176630273%
Percentage of Residents that are in their Thirties: 21.11625814608894%
Percentage of Residents that are in their Forties: 11.722819539555932%
Percentage of Residents that are in their Fifties: 9.425086129009104%
Percentage of Residents that are in between 60-64: 4.0368375445093285%
Percentage of Residents that are Seniors (65+): 9.953319516552964%


Now, we will compute the average and median ranking for the race, income, food stamp, and educational attainment categories for this data:

In [100]:
df_lastCrime_ranks = df_lastCrime[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [101]:
beatList = list(df_lastCrime_ranks.beat)
df_lastCrime_ranks['rank_beat'] = df_lastCrime_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_lastCrime_ranks.set_index('beat',inplace = True)
print('LAST 20 CRIME BEATS MEDIAN RANKINGS: ')
print(df_lastCrime_ranks.median(axis = 0))
print()
print('LAST 20 CRIME BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_lastCrime_ranks.mean(axis = 0))
df_lastCrime_ranks

LAST 20 CRIME BEATS MEDIAN RANKINGS: 
rank_%w                 66.0
rank_%b                185.0
rank_%h                140.0
rank_%a                 84.5
rank_%m                 81.5
rank_%o                112.5
rank_income             56.5
rank_fs                215.5
rank_bachelors          49.5
rank_high_school       219.5
rank_no_high_school    217.0
rank_total_se          161.5
rank_total_se_0-4       66.5
rank_total_se_5-9       94.0
rank_total_se_10-14    103.0
rank_total_se_15-17     81.0
rank_total_se_18-19     46.0
rank_total_se_20-24    120.5
rank_total_se_25-34    115.0
rank_total_se_35+      146.0
rank_beat                8.5
dtype: float64

LAST 20 CRIME BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                 77.7500
rank_%b                170.6250
rank_%h                129.3125
rank_%a                 87.0625
rank_%m                 91.3125
rank_%o                140.9375
rank_income             86.6250
rank_fs                187.9375
rank_bachelors          69.1250
ran

beat,rank_%w,rank_%b,rank_%h,rank_%a,rank_%m,rank_%o,rank_income,rank_fs,rank_bachelors,rank_high_school,...,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+,rank_beat
1935,15.0,197.0,159.0,67.0,20.0,84.0,49.0,252.0,16.0,256.0,...,256.0,25.0,47.0,59.0,14.0,61.0,124.0,135.0,268.0,1
1934,11.0,223.0,176.0,53.0,35.0,173.0,53.0,253.0,10.0,244.0,...,257.0,47.0,86.0,20.0,39.0,3.0,171.0,38.0,131.0,2
1221,75.0,158.0,70.0,121.0,142.0,110.0,57.0,199.0,64.0,195.0,...,214.0,38.0,90.0,18.0,64.0,74.0,117.0,174.0,124.0,3
114,37.0,184.0,158.0,25.0,109.0,87.0,10.0,270.0,11.0,246.0,...,177.0,7.0,19.0,56.0,13.0,24.0,11.0,16.0,145.0,4
1234,142.0,216.0,13.0,134.0,193.0,240.0,162.0,128.0,168.0,211.0,...,26.0,19.0,99.0,267.0,210.0,134.0,97.0,54.0,218.0,5
1214,22.0,186.0,166.0,45.0,76.0,252.0,5.0,241.0,5.0,255.0,...,267.0,153.0,158.0,26.0,42.0,20.0,214.0,139.0,229.0,6
225,236.0,14.0,263.0,213.0,145.0,157.0,226.0,24.0,193.0,107.0,...,24.0,144.0,136.0,149.0,213.0,234.0,165.0,145.0,196.0,7
1215,56.0,168.0,77.0,115.0,55.0,246.0,35.0,218.0,44.0,215.0,...,233.0,121.0,183.0,95.0,63.0,39.0,202.0,164.0,109.0,8
121,72.0,198.0,156.0,2.0,87.0,58.0,19.0,267.0,6.0,270.0,...,156.0,60.0,270.0,29.0,35.0,12.0,30.0,31.0,133.0,9
1921,13.0,260.0,124.0,101.0,6.0,141.0,15.0,258.0,42.0,237.0,...,115.0,34.0,74.0,180.0,182.0,13.0,102.0,188.0,147.0,10


**Key Takeaways:**
* When we look at how these beats rank among all the police beats in Chicago, the mean and median rankings are relatively low for the percentage of Black citizens and the percentage of residents on food stamps (both in the bottom 33% of all of Chicago)
* However, these beats rank relatively high for the percentage of White citizens in their population and for the median income of their residents (well into the top 33% of all of Chicago)
* White citizens make up roughly 56% of these beats' residents and Hispanic residents make up roughly 20%. It's also worthwhile to mention that Black citizens make up about 12% of these beats' residents. This is a significant change from the top 20 beats in crime, where about 78% of the beats' residents were Black and only about 2% were White. 
* These beats also have an average median income much higher than the average median income of Chicago (about $17000 more than the average median income of all Chicago residents), and only about 10% of their residents are on food stamps (a little less than half the percentage of total Chicago residents on food stamps). This is well reflected in the high average and median ranking for these beats in the median income category and their low ranking in the 'percentage of population on food stamps' category.

**Some Conclusions:**
* These beats are made up of mostly "affluent" White residents, although they also have a reasonable number of Hispanic and Black residents. Unsurprisingly, these beats can be categorized as more "wealthy" than the average Chicago beat and significantly wealthier than the beats with the most crime.
* Based on these observations alone, we would assume that areas with larger White populations and higher standards of living are the least active in terms of crime. Such areas would presumably be policed the lightest.

**Now, the bottom 20 beats in arrests:**

In [102]:
lastFile.readline()
l20_arrests = lastFile.readline().split(' ')
l20_arrests = l20_arrests[:-1]
for i in range (0,20):
    l20_arrests[i] = pad0(l20_arrests[i])
print(l20_arrests)

['1813', '0215', '1221', '1925', '0121', '0114', '1225', '1921', '1621', '1915', '1235', '1653', '1934', '1215', '1935', '1214', '1652', '1655', '0235', '0430']
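
The `pad0` helper used above is defined in an earlier cell and is not shown here. A minimal sketch of what such a helper might look like, assuming its only job is to left-pad beat codes to four characters (the name `pad0_sketch` and the `width` parameter are hypothetical):

```python
# Hypothetical stand-in for the notebook's pad0 helper (the real one lives in an earlier cell).
def pad0_sketch(beat: str, width: int = 4) -> str:
    """Left-pad a beat identifier with zeros to a fixed width."""
    return beat.strip().zfill(width)

raw = ['1813', '215', '121', '114']
padded = [pad0_sketch(b) for b in raw]
print(padded)
```

Keeping beat codes as zero-padded strings lets them match the string values in the `beat` column when filtering with `isin`.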


In [103]:
df_lastArrest = df_beat[df_beat['beat'].isin(l20_arrests)]
df_lastArrest.set_index('beat', inplace = True)
df_lastArrest = df_lastArrest.reindex(l20_arrests)

In [104]:
df_lastArrest.reset_index(inplace = True)

Unfortunately, some of these beats are located at Chicago O'Hare Airport and thus have no data, so we will drop them from our analysis.

In [105]:
df_lastArrest = df_lastArrest.loc[(df_lastArrest.beat != '1653') & (df_lastArrest.beat != '1652') & (df_lastArrest.beat != '1655') & (df_lastArrest.beat != '0430')]
df_lastArrest

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1813,9218.081587,0.372833,7078.574154,496.488339,729.868903,559.933237,299.448506,53.76843,137020.089595,...,258.0,186.0,14.0,76.0,81.0,24.0,213.0,112.0,231.0,225.0
1,215,3738.721383,0.342409,195.333519,77.837961,3376.979791,30.924931,49.692694,7.952487,33216.708023,...,137.0,78.0,167.0,140.0,253.0,235.0,53.0,136.0,166.0,24.0
2,1221,8780.177783,0.876067,4210.619816,2872.756274,1364.102238,208.735405,102.095102,21.86897,69107.171923,...,109.0,214.0,38.0,90.0,18.0,64.0,74.0,117.0,174.0,124.0
3,1925,24320.457442,0.429108,17974.530768,2397.553386,1015.14037,1995.019997,791.072452,147.140495,67345.909851,...,255.0,259.0,54.0,42.0,53.0,205.0,116.0,114.0,76.0,139.0
4,121,6400.496775,0.264266,3113.800044,427.296424,297.086085,2401.104246,131.279268,29.930697,101550.382372,...,270.0,156.0,60.0,270.0,29.0,35.0,12.0,30.0,31.0,133.0
5,114,16145.43461,0.66811,10782.50253,1057.578252,1246.289115,2736.379391,268.33286,54.352438,110367.182242,...,267.0,177.0,7.0,19.0,56.0,13.0,24.0,11.0,16.0,145.0
6,1225,5667.511757,0.49373,1240.751479,854.231388,2912.663544,510.814199,134.02173,15.02943,32263.371373,...,195.0,11.0,73.0,3.0,70.0,248.0,103.0,16.0,3.0,150.0
7,1921,13317.112246,0.877715,10516.682256,1523.697738,152.226201,541.494208,561.693335,21.3185,104182.775635,...,230.0,115.0,34.0,74.0,180.0,182.0,13.0,102.0,188.0,147.0
8,1621,11070.111491,2.489748,8418.392843,1372.054708,132.192747,840.76546,306.28619,0.419551,104297.189188,...,233.0,152.0,64.0,81.0,216.0,145.0,152.0,107.0,236.0,129.0
9,1915,11336.462507,0.678057,7760.272984,1505.44695,1071.215711,581.969341,368.229232,49.32829,53025.445634,...,224.0,266.0,270.0,16.0,270.0,271.0,5.0,192.0,48.0,45.0
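
The chained `!=` comparisons used to drop these beats can also be expressed with a negated `isin` mask. The sketch below uses a toy frame with illustrative values; only the four beat codes are taken from the cell above.

```python
import pandas as pd

# Toy frame standing in for df_lastArrest (populations are illustrative).
toy = pd.DataFrame({
    'beat': ['1813', '1653', '1652', '0215', '1655', '0430'],
    'population': [9218.0, None, None, 3738.0, None, None],
})

no_data = ['1653', '1652', '1655', '0430']  # beats dropped in the cell above
kept = toy[~toy['beat'].isin(no_data)]      # keep rows whose beat is NOT in the list
print(kept)
```

Listing the codes once makes it easier to extend or reuse the exclusion list in later cells.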


Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [106]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_lastArrest.index:
    totPop += df_lastArrest['population'][ind]
    wPop += df_lastArrest['num_white'][ind]
    bPop += df_lastArrest['num_black'][ind]
    hPop += df_lastArrest['num_hispanic'][ind]
    mPop += df_lastArrest['num_mixed'][ind]
    aPop += df_lastArrest['num_asian'][ind]
    oPop += df_lastArrest['num_other'][ind]
print('LAST 20 ARREST BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')


LAST 20 ARREST BEATS RACE BREAKDOWN:

Percentage of Population that is White: 65.11741987285271%
Percentage of Population that is Black: 9.801163555759507%
Percentage of Population that is Hispanic: 13.171075593989489%
Percentage of Population that is Asian: 8.895695705763936%
Percentage of Population that is Mixed: 2.7160150677311172%
Percentage of Population that is Other: 0.2986301793248909%
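
The accumulation loop above can also be written with column sums. A minimal sketch on a toy frame (the numbers are illustrative, not taken from the dataset; the column names match those used in the loop):

```python
import pandas as pd

# Toy rows; each row's race counts add up to its population.
toy = pd.DataFrame({
    'population':   [100.0, 300.0],
    'num_white':    [60.0, 150.0],
    'num_black':    [10.0, 50.0],
    'num_hispanic': [20.0, 60.0],
    'num_asian':    [5.0, 30.0],
    'num_mixed':    [4.0, 8.0],
    'num_other':    [1.0, 2.0],
})

race_cols = ['num_white', 'num_black', 'num_hispanic',
             'num_asian', 'num_mixed', 'num_other']

# Sum each race column over all beats, then divide by the pooled population.
shares = toy[race_cols].sum() / toy['population'].sum() * 100
print(shares.round(2))
```

Because the shares are computed from pooled counts, larger beats carry more weight than smaller ones, just as in the loop.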


Now median income and food stamps:

In [107]:
tot_MI = 0
tot_FS = 0
for ind in df_lastArrest.index:
    tot_MI += df_lastArrest['med_income'][ind]
    tot_FS += df_lastArrest['pop_food_stamps'][ind]
print('LAST 20 ARREST BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/20.0))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

LAST 20 ARREST BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 63570.698214887685
Percentage of Population on Food Stamps: 2.9940366059461896%
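
Note that the cell above divides the summed median incomes by a fixed 20, although four beats were dropped for missing data, leaving 16 rows. The printed average therefore likely understates the per-beat mean. A quick rescaling, assuming all 16 remaining rows carry a median income value:

```python
printed_average = 63570.698214887685  # value printed above (sum of median incomes / 20)
beats_requested = 20
beats_kept = 16                       # 20 beats minus the four dropped for missing data

# Undo the division by 20 and divide by the number of rows actually summed.
rescaled = printed_average * beats_requested / beats_kept
print(round(rescaled, 2))

# In the notebook itself, df_lastArrest['med_income'].mean() would avoid the fixed divisor.
```

Comparisons of this figure against other groups of beats should use the same divisor convention on both sides.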


Now, the educational attainment percentages:

In [108]:
bachelorsPop = df_lastArrest['bachelors'].sum()
HSPop = df_lastArrest['high_school'].sum()
no_HSPop = df_lastArrest['no_high_school'].sum()
print('LAST 20 ARRESTS BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

LAST 20 ARRESTS BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 56.09342144700178%
Percentage of Population with at most a High School Diploma 15.209279433111986%
Percentage of Population without a High School Diploma 3.9358381003988363%
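
The three percentages above use the total population as the denominator, so they sum to roughly 75% rather than 100%. One plausible reading (an assumption, not confirmed by the source) is that the attainment columns cover only part of the population, such as adults. Renormalizing over the three categories gives shares close to the figures quoted in the takeaways for this group:

```python
# Percentages printed by the cell above (denominator: total population).
printed = {
    'bachelors': 56.09342144700178,
    'high_school': 15.209279433111986,
    'no_high_school': 3.9358381003988363,
}

covered = sum(printed.values())  # share of the population covered by the three categories
normalized = {k: v / covered * 100 for k, v in printed.items()}

for name, share in normalized.items():
    print(f'{name}: {share:.1f}%')
```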


School enrollment percentages:

In [109]:
total_se = df_lastArrest['total_se'].sum()
total_se_0to4 = df_lastArrest['se_0-4'].sum()
total_se_5to9 = df_lastArrest['se_5-9'].sum()
total_se_10to14 = df_lastArrest['se_10-14'].sum()
total_se_15to17 = df_lastArrest['se_15-17'].sum()
total_se_18to19 = df_lastArrest['se_18-19'].sum()
total_se_20to24 = df_lastArrest['se_20-24'].sum()
total_se_25to34 = df_lastArrest['se_25-34'].sum()
total_se_35plus = df_lastArrest['se_35+'].sum()
total_0to4 = df_lastArrest['0-4'].sum()
total_5to9 = df_lastArrest['5-9'].sum()
total_10to14 = df_lastArrest['10-14'].sum()
total_15to17 = df_lastArrest['15-17'].sum()
total_18to19 = df_lastArrest['18-19'].sum()
total_20to24 = df_lastArrest['20'].sum() + df_lastArrest['21'].sum() + df_lastArrest['22-24'].sum()
total_25to34 = df_lastArrest['25-29'].sum() + df_lastArrest['30-34'].sum() 
# Despite its name, total_youngPop holds the 35+ population: the total minus every age bracket under 35
total_youngPop = totPop - (df_lastArrest['25-29'].sum() + df_lastArrest['30-34'].sum()) - (df_lastArrest['20'].sum() + df_lastArrest['21'].sum() + df_lastArrest['22-24'].sum()) - df_lastArrest['18-19'].sum() - df_lastArrest['15-17'].sum() -df_lastArrest['10-14'].sum() -df_lastArrest['5-9'].sum()-df_lastArrest['0-4'].sum()

print('LAST 20 ARRESTS BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_youngPop*100)+'%')

LAST 20 ARRESTS BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 24.607236003471158%
Percentage of Residents that are Enrolled Students (0-4): 72.55693043108326%
Percentage of Residents that are Enrolled Students (5-9): 98.31242887511864%
Percentage of Residents that are Enrolled Students (10-14): 98.51195034155154%
Percentage of Residents that are Enrolled Students (15-17): 96.55049085931675%
Percentage of Residents that are Enrolled Students (18-19): 94.88317987995191%
Percentage of Residents that are Enrolled Students (20-24): 46.499441835426595%
Percentage of Residents that are Enrolled Students (25-34): 16.946087286180802%
Percentage of Residents that are Enrolled Students (35+): 3.1605511113884828%
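
For the brackets whose enrollment and population columns share a suffix (`se_5-9` and `5-9`, for example), the per-bracket rates above can be computed in one loop. A minimal sketch on a toy frame with illustrative values; the 20-24, 25-34 and 35+ brackets would still need their population columns combined as in the cell above:

```python
import pandas as pd

# Toy frame: enrolled students (se_*) and residents per age bracket.
toy = pd.DataFrame({
    'se_5-9':   [40.0, 55.0],
    '5-9':      [50.0, 50.0],
    'se_15-17': [27.0, 30.0],
    '15-17':    [30.0, 30.0],
})

brackets = ['5-9', '15-17']

# Enrollment rate per bracket: pooled enrolled students / pooled residents.
rates = {b: toy[f'se_{b}'].sum() / toy[b].sum() * 100 for b in brackets}
for bracket, rate in rates.items():
    print(f'Enrolled ({bracket}): {rate:.2f}%')
```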


Lastly, we will look at the age breakdowns:

In [110]:
minors = df_lastArrest['<=21'].sum()
twenties = df_lastArrest['22-29'].sum()
thirties = df_lastArrest['30-39'].sum()
forties = df_lastArrest['40-49'].sum()
fifties = df_lastArrest['50-59'].sum()
sixties = df_lastArrest['60-64'].sum()
seniors = df_lastArrest['65+'].sum()
print('LAST 20 ARRESTS BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

LAST 20 ARRESTS BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 17.750737939996977%
Percentage of Residents that are in their Twenties: 24.65547057784104%
Percentage of Residents that are in their Thirties: 20.971875610121124%
Percentage of Residents that are in their Forties: 11.765407409170319%
Percentage of Residents that are in their Fifties: 10.079961033937813%
Percentage of Residents that are in between 60-64: 4.241728429533125%
Percentage of Residents that are Seniors (65+): 10.534795768477059%
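
As a quick sanity check, the seven age shares printed above should add up to about 100% if the age brackets partition the population. A small sketch using the printed values:

```python
# Age shares printed by the cell above, in percent.
age_shares = [
    17.750737939996977,  # <=21
    24.65547057784104,   # 22-29
    20.971875610121124,  # 30-39
    11.765407409170319,  # 40-49
    10.079961033937813,  # 50-59
    4.241728429533125,   # 60-64
    10.534795768477059,  # 65+
]

total = sum(age_shares)
print(total)  # expected to be very close to 100
```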


Now, we will compute the average and median ranking for the race, income, food stamp, and educational attainment categories for this data:

In [111]:
df_lastArrest_ranks = df_lastArrest[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [112]:
beatList = list(df_lastArrest_ranks.beat)
df_lastArrest_ranks['rank_beat'] = df_lastArrest_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_lastArrest_ranks.set_index('beat',inplace = True)
print('LAST 20 ARREST BEATS MEDIAN RANKINGS: ')
print(df_lastArrest_ranks.median(axis = 0))
print()
print('LAST 20 ARREST BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_lastArrest_ranks.mean(axis = 0))
df_lastArrest_ranks

LAST 20 ARREST BEATS MEDIAN RANKINGS: 
rank_%w                 36.0
rank_%b                185.0
rank_%h                144.5
rank_%a                 63.0
rank_%m                 50.5
rank_%o                 96.5
rank_income             51.0
rank_fs                239.0
rank_bachelors          31.0
rank_high_school       239.5
rank_no_high_school    243.5
rank_total_se          181.5
rank_total_se_0-4       57.0
rank_total_se_5-9       78.5
rank_total_se_10-14     75.5
rank_total_se_15-17     71.5
rank_total_se_18-19     46.0
rank_total_se_20-24    113.0
rank_total_se_25-34    115.0
rank_total_se_35+      136.0
rank_beat                8.5
dtype: float64

LAST 20 ARREST BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                 52.9375
rank_%b                182.7500
rank_%h                135.2500
rank_%a                 69.3125
rank_%m                 64.4375
rank_%o                118.3750
rank_income             61.1250
rank_fs                211.4375
rank_bachelors          42.0625
r

Unnamed: 0_level_0,rank_%w,rank_%b,rank_%h,rank_%a,rank_%m,rank_%o,rank_income,rank_fs,rank_bachelors,rank_high_school,...,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+,rank_beat
beat,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1813,18.0,182.0,172.0,73.0,24.0,40.0,1.0,237.0,15.0,266.0,...,186.0,14.0,76.0,81.0,24.0,213.0,112.0,231.0,225.0,1
215,160.0,78.0,215.0,152.0,133.0,115.0,193.0,43.0,136.0,97.0,...,78.0,167.0,140.0,253.0,235.0,53.0,136.0,166.0,24.0,2
1221,75.0,158.0,70.0,121.0,142.0,110.0,57.0,199.0,64.0,195.0,...,214.0,38.0,90.0,18.0,64.0,74.0,117.0,174.0,124.0,3
1925,23.0,206.0,133.0,51.0,23.0,38.0,59.0,245.0,9.0,242.0,...,259.0,54.0,42.0,53.0,205.0,116.0,114.0,76.0,139.0,4
121,72.0,198.0,156.0,2.0,87.0,58.0,19.0,267.0,6.0,270.0,...,156.0,60.0,270.0,29.0,35.0,12.0,30.0,31.0,133.0,5
114,37.0,184.0,158.0,25.0,109.0,87.0,10.0,270.0,11.0,246.0,...,177.0,7.0,19.0,56.0,13.0,24.0,11.0,16.0,145.0,6
1225,116.0,116.0,107.0,49.0,65.0,104.0,197.0,109.0,82.0,212.0,...,11.0,73.0,3.0,70.0,248.0,103.0,16.0,3.0,150.0,7
1921,13.0,260.0,124.0,101.0,6.0,141.0,15.0,258.0,42.0,237.0,...,115.0,34.0,74.0,180.0,182.0,13.0,102.0,188.0,147.0,8
1621,20.0,258.0,120.0,59.0,46.0,218.0,14.0,259.0,59.0,210.0,...,152.0,64.0,81.0,216.0,145.0,152.0,107.0,236.0,129.0,9
1915,35.0,177.0,117.0,89.0,25.0,65.0,95.0,157.0,20.0,224.0,...,266.0,270.0,16.0,270.0,271.0,5.0,192.0,48.0,45.0,10
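
The rankings above are easier to compare once converted to a position within all ranked beats. The sketch below assumes 271 beats are ranked; 271 is simply the largest rank visible in the tables above, so the true count may differ. The helper name `rank_to_percentile` is hypothetical.

```python
N_BEATS = 271  # assumption: largest rank seen in the tables, not a confirmed beat count

def rank_to_percentile(rank: float, n_beats: int = N_BEATS) -> float:
    """Return the rank as a percentage of all ranked beats (rank 1 is the highest value)."""
    return rank / n_beats * 100

# Median rankings printed above for the last 20 arrest beats.
medians = {'rank_%w': 36.0, 'rank_%b': 185.0, 'rank_income': 51.0, 'rank_fs': 239.0}
positions = {name: rank_to_percentile(rank) for name, rank in medians.items()}

for name, pos in positions.items():
    print(f'{name}: within the top {pos:.1f}% of beats')
```

A value under about 33 places a category in the top third of beats, and a value over about 67 places it in the bottom third.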


**Key Takeaways:**
* Ranked against all police beats in Chicago, these beats have relatively low mean and median rankings for percentage of Black residents and percentage of residents on food stamps (both in the bottom 33% of all Chicago beats).
* In contrast, these beats rank relatively high for percentage of White residents and for median income (both well into the top 33% of all Chicago beats).
* White residents make up roughly 65% of these beats' population, Hispanic residents roughly 13%, and Black residents about 10%. Compared to the Bottom 20 Crime beats, this is a significant increase in the percentage of White residents and a sizeable decrease in the percentages of Hispanic and Black residents, and the shift is even bigger when compared against the Top 20 Arrest Beats.
* These beats also have an average median income higher than that of Chicago as a whole (about $14,000 more than the average median income of all Chicago residents), and only about 6.5% of their residents are on food stamps (a little less than half the percentage of all Chicago residents on food stamps). This is well reflected in the high average and median rankings for these beats in the median income category and their low rankings in the 'percentage of population on food stamps' category. The average median income for these beats is lower than in the bottom 20 Crime beats, but a smaller percentage of the population is on food stamps compared to the bottom 20 Crime beats.
* Many of the residents in these beats were in their twenties and thirties (24% and 20%, respectively), and around 30% of residents were split close to equally among residents in their forties, residents in their fifties, and senior citizens. Around 17% of residents were minors, 10% lower than the percentage of minors in all of Chicago.
* These beats also ranked high for residents with at least a Bachelor's Degree (about 74% of residents) and very low for residents without one (about 20% of residents had a High School Diploma as their highest degree, and 5% didn't have a High School Diploma).

**Some Conclusions:**
* These beats are made up of mostly "affluent" White residents, although they also have a reasonable number of Hispanic and Black residents. Unsurprisingly, these beats can be categorized as more "wealthy" than the average Chicago beat and significantly wealthier than the beats with the most crime. They have a higher share of White residents and lower shares of Hispanic and Black residents than the bottom 20 Crime beats. Yet these beats are categorically "less wealthy" than the bottom 20 Crime beats, even though a smaller share of their residents is on food stamps.
* These residents also had a higher level of educational attainment than those of the Last 20 Crime Beats, a level which far exceeds the citywide average (74% of these beats' residents have at least a Bachelor's, versus 40% of all Chicago residents). Thus, the residents of these beats are not only predominantly White and "affluent"; they also have much higher levels of educational attainment on average.
* Based on these observations alone, we would assume that areas with larger White populations and higher standards of living are the least active in terms of crime and arrests. Areas with a greater share of White residents will have fewer arrests, but areas of roughly equivalent "high" affluence will be equally low in crimes and arrests. Such areas would presumably be policed the lightest.

**Now, the last 20 beats in arrests:crime ratio:**

In [113]:
lastFile.readline()
l20_ratio = lastFile.readline().split(' ')
l20_ratio = l20_ratio[:-1]
for i in range (0,20):
    l20_ratio[i] = pad0(l20_ratio[i])
print(l20_ratio)

['1911', '1631', '1613', '2411', '1424', '1611', '1614', '1934', '1933', '1621', '1215', '0430', '1922', '2333', '1323', '1813', '0235', '1814', '1214', '1935']
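
The read, split, trim and pad sequence above is repeated for every beat list. A minimal sketch of a reusable helper follows, assuming each list in the file is a header line followed by one line of space-separated codes ending in a trailing space. That layout is inferred from the `[:-1]` slice above, and `read_beat_list` is a hypothetical name.

```python
import io

def read_beat_list(handle, width: int = 4):
    """Skip one header line, then parse the next line as space-separated beat codes."""
    handle.readline()                      # discard the header line
    tokens = handle.readline().split(' ')  # the last token is the newline remnant
    return [t.zfill(width) for t in tokens[:-1]]

# In-memory stand-in for the opened file (contents are illustrative).
demo = io.StringIO('LAST 20 RATIO\n1911 1631 430 235 \n')
beats = read_beat_list(demo)
print(beats)
```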


In [114]:
df_lastRatio = df_beat[df_beat['beat'].isin(l20_ratio)]
df_lastRatio.set_index('beat', inplace = True)
df_lastRatio = df_lastRatio.reindex(l20_ratio)

In [115]:
df_lastRatio.reset_index(inplace = True)
df_lastRatio

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1911,17767.158676,0.878423,13045.537575,2591.979756,457.467469,1004.638053,642.41616,25.119627,73532.65031,...,228.0,207.0,31.0,116.0,116.0,69.0,15.0,74.0,142.0,61.0
1,1631,18763.284491,2.429306,13612.702295,4140.266891,162.340603,685.879715,160.575984,1.519013,62405.827363,...,178.0,225.0,250.0,222.0,234.0,166.0,58.0,51.0,126.0,115.0
2,1613,13346.466422,1.464612,10470.250498,1899.687615,90.135573,619.052992,191.100197,76.239566,76982.894595,...,216.0,223.0,83.0,196.0,193.0,202.0,95.0,85.0,132.0,123.0
3,2411,24720.41211,1.33716,12517.901605,3697.027598,4012.300739,3822.806287,582.762838,87.613104,65667.057217,...,181.0,118.0,101.0,203.0,212.0,216.0,118.0,70.0,140.0,119.0
4,1424,9525.016282,0.419284,6633.707739,1323.233833,647.697368,614.46359,305.913726,0.0,99156.249796,...,214.0,252.0,173.0,33.0,101.0,164.0,155.0,147.0,159.0,234.0
5,1611,17989.306241,1.993424,15631.944837,1721.878456,186.044395,205.020327,199.089549,45.328688,84946.399284,...,223.0,218.0,138.0,215.0,168.0,240.0,117.0,40.0,88.0,127.0
6,1614,15422.004847,2.782935,11574.064301,1413.802529,228.272084,1511.61998,618.638398,75.607582,55987.455487,...,192.0,215.0,128.0,156.0,14.0,142.0,57.0,65.0,109.0,88.0
7,1934,19672.58499,0.436985,15830.242116,1018.196945,606.431035,1582.88029,613.836666,20.99795,70989.232193,...,262.0,257.0,47.0,86.0,20.0,39.0,3.0,171.0,38.0,131.0
8,1933,10749.653631,0.375128,8942.440029,635.359978,291.345903,593.229421,267.705253,19.573046,114240.079245,...,263.0,246.0,68.0,146.0,3.0,60.0,17.0,159.0,158.0,235.0
9,1621,11070.111491,2.489748,8418.392843,1372.054708,132.192747,840.76546,306.28619,0.419551,104297.189188,...,233.0,152.0,64.0,81.0,216.0,145.0,152.0,107.0,236.0,129.0


Unfortunately, some of these beats are missing data, so we will drop them from our analysis.

In [116]:
df_lastRatio = df_lastRatio.loc[(df_lastRatio.beat != '0430') & (df_lastRatio.beat != '2333') & (df_lastRatio.beat != '1323') ]
df_lastRatio

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1911,17767.158676,0.878423,13045.537575,2591.979756,457.467469,1004.638053,642.41616,25.119627,73532.65031,...,228.0,207.0,31.0,116.0,116.0,69.0,15.0,74.0,142.0,61.0
1,1631,18763.284491,2.429306,13612.702295,4140.266891,162.340603,685.879715,160.575984,1.519013,62405.827363,...,178.0,225.0,250.0,222.0,234.0,166.0,58.0,51.0,126.0,115.0
2,1613,13346.466422,1.464612,10470.250498,1899.687615,90.135573,619.052992,191.100197,76.239566,76982.894595,...,216.0,223.0,83.0,196.0,193.0,202.0,95.0,85.0,132.0,123.0
3,2411,24720.41211,1.33716,12517.901605,3697.027598,4012.300739,3822.806287,582.762838,87.613104,65667.057217,...,181.0,118.0,101.0,203.0,212.0,216.0,118.0,70.0,140.0,119.0
4,1424,9525.016282,0.419284,6633.707739,1323.233833,647.697368,614.46359,305.913726,0.0,99156.249796,...,214.0,252.0,173.0,33.0,101.0,164.0,155.0,147.0,159.0,234.0
5,1611,17989.306241,1.993424,15631.944837,1721.878456,186.044395,205.020327,199.089549,45.328688,84946.399284,...,223.0,218.0,138.0,215.0,168.0,240.0,117.0,40.0,88.0,127.0
6,1614,15422.004847,2.782935,11574.064301,1413.802529,228.272084,1511.61998,618.638398,75.607582,55987.455487,...,192.0,215.0,128.0,156.0,14.0,142.0,57.0,65.0,109.0,88.0
7,1934,19672.58499,0.436985,15830.242116,1018.196945,606.431035,1582.88029,613.836666,20.99795,70989.232193,...,262.0,257.0,47.0,86.0,20.0,39.0,3.0,171.0,38.0,131.0
8,1933,10749.653631,0.375128,8942.440029,635.359978,291.345903,593.229421,267.705253,19.573046,114240.079245,...,263.0,246.0,68.0,146.0,3.0,60.0,17.0,159.0,158.0,235.0
9,1621,11070.111491,2.489748,8418.392843,1372.054708,132.192747,840.76546,306.28619,0.419551,104297.189188,...,233.0,152.0,64.0,81.0,216.0,145.0,152.0,107.0,236.0,129.0
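
The beat codes dropped above are hard-coded. If the beats without data show up as all-NaN rows after the `reindex` step (an assumption about how the missing beats appear in this frame), they can be detected and removed automatically. A sketch on a toy frame with illustrative values:

```python
import pandas as pd

# Toy stand-in for df_beat: beat '0430' has no row at all.
df_beat_toy = pd.DataFrame({
    'beat': ['1911', '1631'],
    'population': [17767.0, 18763.0],
})

wanted = ['1911', '0430', '1631']

# reindex inserts a NaN row for every requested beat that is absent from the index.
subset = df_beat_toy.set_index('beat').reindex(wanted)
missing = subset.index[subset['population'].isna()].tolist()

# Drop the rows without population data and restore 'beat' as a column.
subset = subset.dropna(subset=['population']).reset_index()
print('Dropped beats:', missing)
print(subset)
```

Printing the dropped codes keeps a record of which beats were excluded from each group.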


Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [117]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_lastRatio.index:
    totPop += df_lastRatio['population'][ind]
    wPop += df_lastRatio['num_white'][ind]
    bPop += df_lastRatio['num_black'][ind]
    hPop += df_lastRatio['num_hispanic'][ind]
    mPop += df_lastRatio['num_mixed'][ind]
    aPop += df_lastRatio['num_asian'][ind]
    oPop += df_lastRatio['num_other'][ind]
print('LAST 20 ARRESTS:CRIME BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

LAST 20 ARRESTS:CRIME BEATS RACE BREAKDOWN:

Percentage of Population that is White: 73.37929947220283%
Percentage of Population that is Black: 5.3026156083172955%
Percentage of Population that is Hispanic: 11.190545614238145%
Percentage of Population that is Asian: 7.276165219376544%
Percentage of Population that is Mixed: 2.597306954128243%
Percentage of Population that is Other: 0.25406712531363324%


Now median income and food stamps:

In [118]:
tot_MI = 0
tot_FS = 0
for ind in df_lastRatio.index:
    tot_MI += df_lastRatio['med_income'][ind]
    tot_FS += df_lastRatio['pop_food_stamps'][ind]
print('LAST 20 ARREST:CRIME BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
print('Average Median Income: '+str(tot_MI/20.0))
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

LAST 20 ARREST:CRIME BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 73989.02789528212
Percentage of Population on Food Stamps: 2.3994333679973945%
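
As in the arrests section, the cell above divides by a fixed 20 even though three beats were dropped, leaving 17 rows. The toy example below (illustrative values, hypothetical list size) shows how a fixed divisor and a per-row mean diverge once rows are removed:

```python
import pandas as pd

# Toy frame: three beats remain after rows without data were dropped.
toy = pd.DataFrame({
    'beat': ['A', 'B', 'C'],
    'med_income': [60000.0, 80000.0, 100000.0],
})

requested = 5  # hypothetical size of the original beat list

fixed_divisor_mean = toy['med_income'].sum() / requested  # divides by the original list size
per_row_mean = toy['med_income'].mean()                   # divides by the rows actually present

print(fixed_divisor_mean, per_row_mean)
```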


Now, the educational attainment percentages:

In [119]:
bachelorsPop = df_lastRatio['bachelors'].sum()
HSPop = df_lastRatio['high_school'].sum()
no_HSPop = df_lastRatio['no_high_school'].sum()
print('LAST 20 ARRESTS:CRIME BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

LAST 20 ARRESTS:CRIME BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 49.590196017887344%
Percentage of Population with at most a High School Diploma 20.024010705148967%
Percentage of Population without a High School Diploma 4.323991388941203%


School enrollment percentages:

In [120]:
total_se = df_lastRatio['total_se'].sum()
total_se_0to4 = df_lastRatio['se_0-4'].sum()
total_se_5to9 = df_lastRatio['se_5-9'].sum()
total_se_10to14 = df_lastRatio['se_10-14'].sum()
total_se_15to17 = df_lastRatio['se_15-17'].sum()
total_se_18to19 = df_lastRatio['se_18-19'].sum()
total_se_20to24 = df_lastRatio['se_20-24'].sum()
total_se_25to34 = df_lastRatio['se_25-34'].sum()
total_se_35plus = df_lastRatio['se_35+'].sum()
total_0to4 = df_lastRatio['0-4'].sum()
total_5to9 = df_lastRatio['5-9'].sum()
total_10to14 = df_lastRatio['10-14'].sum()
total_15to17 = df_lastRatio['15-17'].sum()
total_18to19 = df_lastRatio['18-19'].sum()
total_20to24 = df_lastRatio['20'].sum() + df_lastRatio['21'].sum() + df_lastRatio['22-24'].sum()
total_25to34 = df_lastRatio['25-29'].sum() + df_lastRatio['30-34'].sum() 
# Despite its name, total_youngPop holds the 35+ population: the total minus every age bracket under 35
total_youngPop = totPop - (df_lastRatio['25-29'].sum() + df_lastRatio['30-34'].sum()) - (df_lastRatio['20'].sum() + df_lastRatio['21'].sum() + df_lastRatio['22-24'].sum()) - df_lastRatio['18-19'].sum() - df_lastRatio['15-17'].sum() -df_lastRatio['10-14'].sum() -df_lastRatio['5-9'].sum()-df_lastRatio['0-4'].sum()

print('LAST 20 ARRESTS:CRIME BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_youngPop*100)+'%')

LAST 20 ARRESTS:CRIME BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 23.91704548844021%
Percentage of Residents that are Enrolled Students (0-4): 66.02324228547945%
Percentage of Residents that are Enrolled Students (5-9): 96.07864929501491%
Percentage of Residents that are Enrolled Students (10-14): 98.5356769910129%
Percentage of Residents that are Enrolled Students (15-17): 95.67355582430642%
Percentage of Residents that are Enrolled Students (18-19): 91.21354277973323%
Percentage of Residents that are Enrolled Students (20-24): 41.99948212030146%
Percentage of Residents that are Enrolled Students (25-34): 13.458695849334504%
Percentage of Residents that are Enrolled Students (35+): 2.501937025708346%


Lastly, we will look at the age breakdowns:

In [121]:
minors = df_lastRatio['<=21'].sum()
twenties = df_lastRatio['22-29'].sum()
thirties = df_lastRatio['30-39'].sum()
forties = df_lastRatio['40-49'].sum()
fifties = df_lastRatio['50-59'].sum()
sixties = df_lastRatio['60-64'].sum()
seniors = df_lastRatio['65+'].sum()
print('LAST 20 ARRESTS:CRIME BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

LAST 20 ARRESTS:CRIME BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 20.772697666974814%
Percentage of Residents that are in their Twenties: 19.66747272530641%
Percentage of Residents that are in their Thirties: 19.794229179521665%
Percentage of Residents that are in their Forties: 12.694053756605106%
Percentage of Residents that are in their Fifties: 10.718074417489854%
Percentage of Residents that are in between 60-64: 4.694881213717347%
Percentage of Residents that are Seniors (65+): 11.65858184352181%


Now, we will compute the average and median ranking for the race, income, food stamp, and educational attainment categories for this data:

In [122]:
df_lastRatio_ranks = df_lastRatio[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [123]:
beatList = list(df_lastRatio_ranks.beat)
df_lastRatio_ranks['rank_beat'] = df_lastRatio_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_lastRatio_ranks.set_index('beat',inplace = True)
print('LAST 20 ARRESTS:CRIME BEATS MEDIAN RANKINGS: ')
print(df_lastRatio_ranks.median(axis = 0))
print()
print('LAST 20 ARRESTS:CRIME BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_lastRatio_ranks.mean(axis = 0))

LAST 20 ARRESTS:CRIME BEATS MEDIAN RANKINGS: 
rank_%w                 20.0
rank_%b                223.0
rank_%h                139.0
rank_%a                 68.0
rank_%m                 46.0
rank_%o                106.0
rank_income             39.0
rank_fs                241.0
rank_bachelors          38.0
rank_high_school       244.0
rank_no_high_school    233.0
rank_total_se          223.0
rank_total_se_0-4       82.0
rank_total_se_5-9      146.0
rank_total_se_10-14    101.0
rank_total_se_15-17     88.0
rank_total_se_18-19     58.0
rank_total_se_20-24    107.0
rank_total_se_25-34    139.0
rank_total_se_35+      129.0
rank_beat                9.0
dtype: float64

LAST 20 ARRESTS:CRIME BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                 24.705882
rank_%b                215.764706
rank_%h                137.000000
rank_%a                 72.764706
rank_%m                 58.294118
rank_%o                133.529412
rank_income             37.411765
rank_fs                237.470588
rank_bachelors     

**Key Takeaways:**
* When we look at how these beats rank among all the police beats in Chicago, the mean and median rankings are relatively low for percentage of Black residents and percentage of residents on food stamps (both in the bottom 33% of all of Chicago).
* However, these beats rank relatively high for percentage of White residents and for median income (both in the top 33% of all of Chicago).
* White residents make up roughly 73% of these beats' population and Hispanic residents make up roughly 11%, a significant jump in the percentage of White residents from the Last 20 Crime and Arrests beats. It is also worth mentioning that Black residents make up about 5% of these beats' population, the lowest percentage of Black residents we have seen yet.
* These beats also have an average median income higher than the average median income of Chicago (over $24,000 more than the average median income of all Chicago residents), and only about 5.7% of the residents in these beats are on food stamps (a little less than half the percentage of total Chicago residents on food stamps). This is well reflected in the high average and median rankings for these beats in the median income category and their low ranking in the 'percentage of population on food stamps' category. The average median income for these beats is higher than in the bottom 20 Crime and Arrests beats, and a smaller percentage of the population is on food stamps compared to the bottom 20 Crime and Arrests beats.
* Minors, residents in their twenties, and residents in their thirties each make up close to 20% of these beats' population: the smallest percentage of minors we have seen thus far and the greatest percentage of residents in their twenties and thirties. Residents in their forties, residents in their fifties, and senior citizens each make up close to 11% of these beats' population, a figure almost identical to the city-wide average for these age groups.
* These beats also rank high for residents who have earned at least a Bachelor's Degree (about 66% of residents) and very low for residents who have not earned a Bachelor's Degree (about 27% of residents had a High School Diploma as their highest degree, and 6% did not have a High School Diploma). However, these beats had lower levels of educational attainment than the bottom 20 arrests beats.
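
The "top 33%" and "bottom 33%" readings above can be reproduced from a rank and the number of ranked beats. What follows is a minimal sketch, assuming rank 1 means the highest value in a category (as the takeaways imply). The function name `rank_tercile` and the total of 300 beats are hypothetical and do not come from the notebook.

```python
def rank_tercile(rank: float, n_beats: int) -> str:
    """Classify a 1-based rank (1 = highest value) into thirds of all beats."""
    if rank <= n_beats / 3:
        return "top third"
    if rank > 2 * n_beats / 3:
        return "bottom third"
    return "middle third"


# n_beats=300 is an illustrative total, not the real number of Chicago beats.
for label, rank in [("rank_%w", 20.0), ("rank_fs", 241.0), ("rank_%h", 139.0)]:
    print(label, rank_tercile(rank, n_beats=300))
```

With the real number of ranked beats substituted for `n_beats`, the same check could be applied to each median or mean ranking.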

**Some Conclusions:**
* These beats are made up of mostly "affluent" White residents and a smaller percentage of Hispanic (and a much smaller percentage of Black) residents than the other Bottom 20 beats. Unsurprisingly, these beats can be categorized as more "wealthy" than the average Chicago beat and significantly wealthier than the beats with the most crime. These beats are the "wealthiest" and have the lowest percentage of residents on food stamps we have seen thus far.
* These residents also have a higher level of educational attainment than the city-wide average. Thus, the residents of these beats are not only predominantly White and "affluent"; they also have much higher levels of educational attainment on average.
* Based on these observations alone, we would assume that areas with a greater concentration of White residents and higher standards of living (even more White-dense and affluent than the other bottom beats) will have the lowest arrest rates. Areas with well-educated, extremely affluent, and predominantly White residents will have very few arrests compared to the number of reported crimes in that area, both of which will be among the lowest in Chicago.

**Now, the last 20 beats in police-caused injuries:**

In [124]:
lastFile.readline()
l20_injuries = lastFile.readline().split(' ')
l20_injuries = l20_injuries[:-1]
for i in range (0,20):
    l20_injuries[i] = pad0(l20_injuries[i])
print(l20_injuries)

['1811', '0231', '1235', '1813', '1215', '1225', '1221', '1934', '1935', '1654', '1915', '2131', '0215', '0121', '1214', '1653', '1621', '4100', '2133', '0235']
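
The `pad0` helper used above is defined earlier in the notebook and is not shown in this section. Assuming it left-pads each beat ID with zeros to four characters (which the printed list suggests), a minimal stand-in can be sketched with `str.zfill`. The name `pad_beat` and the sample line are hypothetical.

```python
def pad_beat(beat_id: str, width: int = 4) -> str:
    """Hypothetical stand-in for pad0: left-pad a beat ID with zeros."""
    return beat_id.strip().zfill(width)


# A made-up line shaped like the ones read from lastFile:
# space-separated IDs followed by a trailing space and newline.
line = "1811 231 1235 215 \n"

# str.split() with no argument drops empty tokens, so no [:-1] slice is needed.
beats = [pad_beat(tok) for tok in line.split()]
print(beats)
```

Keeping beat IDs as zero-padded strings matters because the later `isin` and `reindex` calls compare them as strings.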


In [125]:
df_lastInjuries = df_beat[df_beat['beat'].isin(l20_injuries)]
df_lastInjuries.set_index('beat', inplace = True)
df_lastInjuries = df_lastInjuries.reindex(l20_injuries)

In [126]:
df_lastInjuries.reset_index(inplace = True)
df_lastInjuries

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1811,9028.473706,0.573418,7358.517997,595.748278,439.692705,385.375682,237.197285,11.941742,114075.905365,...,251.0,5.0,13.0,45.0,230.0,5.0,7.0,12.0,239.0,271.0
1,231,2301.134739,0.251106,9.453774,12.231179,2268.375219,4.777473,4.876113,1.420975,25186.540347,...,163.0,33.0,255.0,103.0,170.0,197.0,200.0,55.0,160.0,86.0
2,1235,9280.763918,1.048541,2131.012876,5974.735628,495.753945,534.309816,109.670191,35.281459,44545.372036,...,39.0,166.0,106.0,67.0,111.0,79.0,80.0,54.0,95.0,219.0
3,1813,9218.081587,0.372833,7078.574154,496.488339,729.868903,559.933237,299.448506,53.76843,137020.089595,...,258.0,186.0,14.0,76.0,81.0,24.0,213.0,112.0,231.0,225.0
4,1215,8522.572501,0.57384,4690.609493,2351.748822,1026.392824,235.810105,218.011279,0.0,84770.824603,...,210.0,233.0,121.0,183.0,95.0,63.0,39.0,202.0,164.0,109.0
5,1225,5667.511757,0.49373,1240.751479,854.231388,2912.663544,510.814199,134.02173,15.02943,32263.371373,...,195.0,11.0,73.0,3.0,70.0,248.0,103.0,16.0,3.0,150.0
6,1221,8780.177783,0.876067,4210.619816,2872.756274,1364.102238,208.735405,102.095102,21.86897,69107.171923,...,109.0,214.0,38.0,90.0,18.0,64.0,74.0,117.0,174.0,124.0
7,1934,19672.58499,0.436985,15830.242116,1018.196945,606.431035,1582.88029,613.836666,20.99795,70989.232193,...,262.0,257.0,47.0,86.0,20.0,39.0,3.0,171.0,38.0,131.0
8,1935,16008.772434,0.58572,12548.292157,1042.134255,766.420365,1063.489059,531.317257,57.119311,74101.195831,...,257.0,256.0,25.0,47.0,59.0,14.0,61.0,124.0,135.0,268.0
9,1654,1564.223707,0.755573,721.593449,741.754632,50.504267,38.737576,11.633778,0.0,53202.404598,...,113.0,157.0,112.0,178.0,120.0,83.0,246.0,201.0,163.0,239.0


Unfortunately, some of these beats are missing data, so we will drop them from our analysis:

In [127]:
# Drop the four beats with missing data
df_lastInjuries = df_lastInjuries.loc[~df_lastInjuries.beat.isin(['2133', '4100', '1653', '2131'])]
df_lastInjuries

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,1811,9028.473706,0.573418,7358.517997,595.748278,439.692705,385.375682,237.197285,11.941742,114075.905365,...,251.0,5.0,13.0,45.0,230.0,5.0,7.0,12.0,239.0,271.0
1,231,2301.134739,0.251106,9.453774,12.231179,2268.375219,4.777473,4.876113,1.420975,25186.540347,...,163.0,33.0,255.0,103.0,170.0,197.0,200.0,55.0,160.0,86.0
2,1235,9280.763918,1.048541,2131.012876,5974.735628,495.753945,534.309816,109.670191,35.281459,44545.372036,...,39.0,166.0,106.0,67.0,111.0,79.0,80.0,54.0,95.0,219.0
3,1813,9218.081587,0.372833,7078.574154,496.488339,729.868903,559.933237,299.448506,53.76843,137020.089595,...,258.0,186.0,14.0,76.0,81.0,24.0,213.0,112.0,231.0,225.0
4,1215,8522.572501,0.57384,4690.609493,2351.748822,1026.392824,235.810105,218.011279,0.0,84770.824603,...,210.0,233.0,121.0,183.0,95.0,63.0,39.0,202.0,164.0,109.0
5,1225,5667.511757,0.49373,1240.751479,854.231388,2912.663544,510.814199,134.02173,15.02943,32263.371373,...,195.0,11.0,73.0,3.0,70.0,248.0,103.0,16.0,3.0,150.0
6,1221,8780.177783,0.876067,4210.619816,2872.756274,1364.102238,208.735405,102.095102,21.86897,69107.171923,...,109.0,214.0,38.0,90.0,18.0,64.0,74.0,117.0,174.0,124.0
7,1934,19672.58499,0.436985,15830.242116,1018.196945,606.431035,1582.88029,613.836666,20.99795,70989.232193,...,262.0,257.0,47.0,86.0,20.0,39.0,3.0,171.0,38.0,131.0
8,1935,16008.772434,0.58572,12548.292157,1042.134255,766.420365,1063.489059,531.317257,57.119311,74101.195831,...,257.0,256.0,25.0,47.0,59.0,14.0,61.0,124.0,135.0,268.0
9,1654,1564.223707,0.755573,721.593449,741.754632,50.504267,38.737576,11.633778,0.0,53202.404598,...,113.0,157.0,112.0,178.0,120.0,83.0,246.0,201.0,163.0,239.0
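
The beats to drop are listed by hand above. As a hedged alternative, rows with missing values can be located programmatically. The sketch below uses a toy frame with made-up beat IDs and numbers; only the column names `beat`, `population`, and `med_income` mirror the notebook.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; none of these values come from the dataset.
toy = pd.DataFrame({
    "beat": ["0001", "0002", "0003", "0004"],
    "population": [1000.0, np.nan, 2500.0, np.nan],
    "med_income": [50000.0, np.nan, 42000.0, np.nan],
})

# Beats whose demographic columns contain at least one missing value.
has_gap = toy.drop(columns="beat").isna().any(axis=1)
missing = toy.loc[has_gap, "beat"].tolist()
print(missing)

# Keep only the complete rows; len(complete) is the number of beats
# that actually contribute to any later average.
complete = toy.loc[~has_gap]
print(len(complete))
```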


Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [128]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_lastInjuries.index:
    totPop += df_lastInjuries['population'][ind]
    wPop += df_lastInjuries['num_white'][ind]
    bPop += df_lastInjuries['num_black'][ind]
    hPop += df_lastInjuries['num_hispanic'][ind]
    mPop += df_lastInjuries['num_mixed'][ind]
    aPop += df_lastInjuries['num_asian'][ind]
    oPop += df_lastInjuries['num_other'][ind]
print('LAST 20 INJURIES BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

LAST 20 INJURIES BEATS RACE BREAKDOWN:

Percentage of Population that is White: 61.89162104027499%
Percentage of Population that is Black: 12.88142824393049%
Percentage of Population that is Hispanic: 14.414105469617619%
Percentage of Population that is Asian: 8.041772095985854%
Percentage of Population that is Mixed: 2.534253934169789%
Percentage of Population that is Other: 0.23681916911642126%
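
The loop above accumulates one running total per column. The same population-weighted shares can be computed with column sums. This is a sketch on toy numbers, not the notebook's data; the column names follow the notebook.

```python
import pandas as pd

# Two made-up beats; values are illustrative only.
toy = pd.DataFrame({
    "population": [1000.0, 3000.0],
    "num_white": [600.0, 900.0],
    "num_black": [300.0, 1500.0],
    "num_hispanic": [100.0, 600.0],
})

race_cols = ["num_white", "num_black", "num_hispanic"]

# Sum each race column over all beats, then divide by the pooled population.
shares = toy[race_cols].sum() / toy["population"].sum() * 100
print(shares)
```

Because the sums are pooled before dividing, larger beats weigh more heavily than they would in an average of per-beat percentages.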


Now median income and food stamps:

In [129]:
tot_MI = 0
tot_FS = 0
for ind in df_lastInjuries.index:
    tot_MI += df_lastInjuries['med_income'][ind]
    tot_FS += df_lastInjuries['pop_food_stamps'][ind]
print('LAST 20 INJURY BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
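# Caution: the divisor is fixed at 20 even though beats with missing data were dropped above,
# so this understates the mean over the remaining beats (len(df_lastInjuries) is the matching divisor)
print('Average Median Income: '+str(tot_MI/20.0))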
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

LAST 20 INJURY BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 59099.14734397987
Percentage of Population on Food Stamps: 3.829874508726393%
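
The cell above divides the summed median incomes by 20.0, although four beats were removed from `df_lastInjuries` earlier. The toy sketch below (made-up incomes, one of five beats assumed dropped) shows how a fixed divisor and `Series.mean()` diverge in that situation.

```python
import pandas as pd

# Four remaining beats out of an original five; incomes are illustrative only.
toy = pd.DataFrame({"med_income": [40000.0, 60000.0, 80000.0, 100000.0]})

# Dividing by the original count pulls the average down.
fixed_divisor_avg = toy["med_income"].sum() / 5.0

# Series.mean() divides by the number of rows actually present.
row_mean = toy["med_income"].mean()

print(fixed_divisor_avg, row_mean)
```

Note that this is an unweighted mean of per-beat medians, so it is not the median income of the pooled population.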


Now, the educational attainment percentages:

In [130]:
bachelorsPop = df_lastInjuries['bachelors'].sum()
HSPop = df_lastInjuries['high_school'].sum()
no_HSPop = df_lastInjuries['no_high_school'].sum()
print('LAST 20 INJURY BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

LAST 20 INJURY BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 51.77050969998077%
Percentage of Population with at most a High School Diploma 15.614822449332214%
Percentage of Population without a High School Diploma 4.74793807746356%


School enrollment percentages:

In [131]:
total_se = df_lastInjuries['total_se'].sum()
total_se_0to4 = df_lastInjuries['se_0-4'].sum()
total_se_5to9 = df_lastInjuries['se_5-9'].sum()
total_se_10to14 = df_lastInjuries['se_10-14'].sum()
total_se_15to17 = df_lastInjuries['se_15-17'].sum()
total_se_18to19 = df_lastInjuries['se_18-19'].sum()
total_se_20to24 = df_lastInjuries['se_20-24'].sum()
total_se_25to34 = df_lastInjuries['se_25-34'].sum()
total_se_35plus = df_lastInjuries['se_35+'].sum()
total_0to4 = df_lastInjuries['0-4'].sum()
total_5to9 = df_lastInjuries['5-9'].sum()
total_10to14 = df_lastInjuries['10-14'].sum()
total_15to17 = df_lastInjuries['15-17'].sum()
total_18to19 = df_lastInjuries['18-19'].sum()
total_20to24 = df_lastInjuries['20'].sum() + df_lastInjuries['21'].sum() + df_lastInjuries['22-24'].sum()
total_25to34 = df_lastInjuries['25-29'].sum() + df_lastInjuries['30-34'].sum() 
# Population aged 35+: total population minus every younger age band
total_35plus = totPop - total_25to34 - total_20to24 - total_18to19 - total_15to17 - total_10to14 - total_5to9 - total_0to4

print('LAST 20 INJURY BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_35plus*100)+'%')

LAST 20 INJURY BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 26.54670398864914%
Percentage of Residents that are Enrolled Students (0-4): 68.04237256350537%
Percentage of Residents that are Enrolled Students (5-9): 98.04318989380799%
Percentage of Residents that are Enrolled Students (10-14): 98.32844982544516%
Percentage of Residents that are Enrolled Students (15-17): 96.88915886508825%
Percentage of Residents that are Enrolled Students (18-19): 94.56047556951005%
Percentage of Residents that are Enrolled Students (20-24): 48.78836356013508%
Percentage of Residents that are Enrolled Students (25-34): 16.136666339070345%
Percentage of Residents that are Enrolled Students (35+): 3.134505314177717%
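
Each rate above divides enrolled students in an age band by the residents in that band, and the 35+ denominator is whatever remains after subtracting the younger bands from the total population. A compact sketch of that arithmetic follows, on toy numbers with only two younger bands. In this toy, "35+" therefore stands for everyone outside the listed bands.

```python
import pandas as pd

# Two made-up beats; values are illustrative only.
toy = pd.DataFrame({
    "population": [1000.0, 2000.0],
    "0-4": [100.0, 200.0],
    "5-9": [100.0, 100.0],
    "se_0-4": [60.0, 120.0],
    "se_5-9": [95.0, 100.0],
    "se_35+": [20.0, 30.0],
})

bands = ["0-4", "5-9"]
totals = toy.sum()

# Enrollment rate per band: enrolled students / residents in that band.
rates = {band: totals[f"se_{band}"] / totals[band] * 100 for band in bands}

# Residual population: total minus every listed younger band.
older_pop = totals["population"] - sum(totals[band] for band in bands)
rates["35+"] = totals["se_35+"] / older_pop * 100

for band, rate in rates.items():
    print(band, rate)
```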


Lastly, we will look at the age breakdowns:

In [132]:
minors = df_lastInjuries['<=21'].sum()
twenties = df_lastInjuries['22-29'].sum()
thirties = df_lastInjuries['30-39'].sum()
forties = df_lastInjuries['40-49'].sum()
fifties = df_lastInjuries['50-59'].sum()
sixties = df_lastInjuries['60-64'].sum()
seniors = df_lastInjuries['65+'].sum()
print('LAST 20 INJURIES BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

LAST 20 INJURIES BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 20.438264013128503%
Percentage of Residents that are in their Twenties: 24.665979679554216%
Percentage of Residents that are in their Thirties: 20.02262128823457%
Percentage of Residents that are in their Forties: 11.248172811279217%
Percentage of Residents that are in their Fifties: 9.368278613013507%
Percentage of Residents that are in between 60-64: 3.984207663972296%
Percentage of Residents that are Seniors (65+): 10.272355017540843%


Now, we will compute the average and median ranking for the race, income, food stamp, and educational attainment categories for this data:

In [133]:
df_lastInjuries_ranks = df_lastInjuries[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [134]:
beatList = list(df_lastInjuries_ranks.beat)
df_lastInjuries_ranks['rank_beat'] = df_lastInjuries_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_lastInjuries_ranks.set_index('beat',inplace = True)
print('LAST 20 INJURY BEATS MEDIAN RANKINGS: ')
print(df_lastInjuries_ranks.median(axis = 0))
print()
print('LAST 20 INJURY BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_lastInjuries_ranks.mean(axis = 0))

LAST 20 INJURY BEATS MEDIAN RANKINGS: 
rank_%w                 58.0
rank_%b                184.0
rank_%h                156.5
rank_%a                 76.5
rank_%m                 60.0
rank_%o                112.5
rank_income             54.5
rank_fs                221.5
rank_bachelors          56.5
rank_high_school       219.5
rank_no_high_school    228.5
rank_total_se          161.5
rank_total_se_0-4       68.5
rank_total_se_5-9       88.0
rank_total_se_10-14    103.0
rank_total_se_15-17     71.5
rank_total_se_18-19     57.0
rank_total_se_20-24    114.5
rank_total_se_25-34    149.5
rank_total_se_35+      132.0
rank_beat                8.5
dtype: float64

LAST 20 INJURY BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                 70.1250
rank_%b                168.2500
rank_%h                138.4375
rank_%a                 84.5625
rank_%m                 85.5000
rank_%o                140.0000
rank_income             77.2500
rank_fs                194.0000
rank_bachelors          61.8750
r

**Key Takeaways:**
* When we look at how these beats rank among all the police beats in Chicago, the mean and median rankings are relatively low for percentage of Black residents and percentage of residents on food stamps.
* However, these beats rank slightly higher for percentage of White residents and for median income.
* White residents make up roughly 62% of these beats' population. Hispanic and Black residents make up roughly 14% and 13% of these beats' population, respectively. This is the highest percentage of Black residents we have seen among the Bottom categories.
* These beats also have an average median income higher than the average median income of Chicago (close to $10,000 more than the average median income of all Chicago residents), and about 8.9% of the residents in these beats are on food stamps (a little less than half the percentage of total Chicago residents on food stamps). This is well reflected in the high average and median rankings for these beats in the median income category and their low ranking in the 'percentage of population on food stamps' category. The average median income for these beats is higher than in the bottom 20 Crime and Arrests beats, and a smaller percentage of the population is on food stamps compared to the bottom 20 Crime and Arrests beats.
* Minors, residents in their twenties, and residents in their thirties make up 20%, 25%, and 20% of these beats' population, respectively. Residents in their forties, residents in their fifties, and senior citizens each make up closer to 10% of these beats' population, a figure very close to the city-wide average for these age groups.
* These beats still rank high for residents who have earned at least a Bachelor's Degree (about 71% of residents) and very low for residents who have not earned a Bachelor's Degree (about 22% of residents had a High School Diploma as their highest degree, and 6% did not have a High School Diploma). However, these beats had lower levels of educational attainment than the bottom 20 arrests beats.

**Some Conclusions:**
* These beats are made up of mostly "affluent" White residents, but with a greater percentage of Black residents than the other Bottom 20 beats. These beats are still quite wealthy compared to the city-wide averages and White residents are still the majority, but the higher percentages of Hispanic and Black residents suggest that police-caused injuries are not necessarily less common in areas with larger White populations; rather, they appear contingent on other factors like standard of living.
* These residents also have a higher level of educational attainment than the city-wide average. Thus, the residents of these beats are not only predominantly White and "affluent"; they also have much higher levels of educational attainment on average.
* Based on these observations alone, we would assume that the areas with the least police brutality are predominantly White, affluent areas with well-educated residents. However, Hispanic and Black residents also make up a decent portion of the residents in these least-brutal beats. Thus, we make the preliminary assumption that areas of less police brutality are not necessarily characterized by race, but by other socioeconomic factors like income and standard of living.

Lastly, the last 20 beats in complaints:

In [135]:
lastFile.readline()
l20_complaints  = lastFile.readline().split(' ')
l20_complaints  = l20_complaints[:-1]
for i in range (0,20):
    l20_complaints[i] = pad0(l20_complaints[i])
print(l20_complaints)
lastFile.close()

['0231', '2412', '1914', '2032', '1614', '1934', '1221', '1915', '1925', '0215', '1214', '1234', '0235', '1235', '1653', '1935', '1215', '1654', '1652', '1655']


In [136]:
df_lastComplaints = df_beat[df_beat['beat'].isin(l20_complaints)]
df_lastComplaints.set_index('beat', inplace = True)
df_lastComplaints = df_lastComplaints.reindex(l20_complaints)
df_lastComplaints.reset_index(inplace = True)
df_lastComplaints

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,231,2301.134739,0.251106,9.453774,12.231179,2268.375219,4.777473,4.876113,1.420975,25186.540347,...,163.0,33.0,255.0,103.0,170.0,197.0,200.0,55.0,160.0,86.0
1,2412,17668.419833,0.797306,6943.269268,3365.387819,1784.258469,4472.869039,802.75753,299.877694,38954.62002,...,82.0,171.0,232.0,234.0,28.0,101.0,249.0,120.0,148.0,144.0
2,1914,10815.53783,0.45455,4217.962294,1318.64184,3770.963557,1230.030521,213.645738,64.293907,35360.739915,...,121.0,231.0,67.0,250.0,36.0,258.0,76.0,83.0,84.0,21.0
3,2032,9426.087159,0.418073,6487.455028,1365.986758,310.382339,903.715659,337.635821,20.911513,70510.322719,...,250.0,245.0,100.0,68.0,102.0,28.0,26.0,106.0,162.0,148.0
4,1614,15422.004847,2.782935,11574.064301,1413.802529,228.272084,1511.61998,618.638398,75.607582,55987.455487,...,192.0,215.0,128.0,156.0,14.0,142.0,57.0,65.0,109.0,88.0
5,1934,19672.58499,0.436985,15830.242116,1018.196945,606.431035,1582.88029,613.836666,20.99795,70989.232193,...,262.0,257.0,47.0,86.0,20.0,39.0,3.0,171.0,38.0,131.0
6,1221,8780.177783,0.876067,4210.619816,2872.756274,1364.102238,208.735405,102.095102,21.86897,69107.171923,...,109.0,214.0,38.0,90.0,18.0,64.0,74.0,117.0,174.0,124.0
7,1915,11336.462507,0.678057,7760.272984,1505.44695,1071.215711,581.969341,368.229232,49.32829,53025.445634,...,224.0,266.0,270.0,16.0,270.0,271.0,5.0,192.0,48.0,45.0
8,1925,24320.457442,0.429108,17974.530768,2397.553386,1015.14037,1995.019997,791.072452,147.140495,67345.909851,...,255.0,259.0,54.0,42.0,53.0,205.0,116.0,114.0,76.0,139.0
9,215,3738.721383,0.342409,195.333519,77.837961,3376.979791,30.924931,49.692694,7.952487,33216.708023,...,137.0,78.0,167.0,140.0,253.0,235.0,53.0,136.0,166.0,24.0
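
The `isin` / `set_index` / `reindex` / `reset_index` chain above puts the rows in the same order as the ranked beat list. One side effect worth knowing: `reindex` creates an all-NaN row for any label that is absent from the frame, which is one possible source of rows with missing data. The sketch below uses a toy frame; the beat IDs are made up.

```python
import pandas as pd

# Toy stand-in for df_beat; values are illustrative only.
toy = pd.DataFrame({
    "beat": ["0111", "0222", "0333"],
    "population": [10.0, 20.0, 30.0],
})

# Desired order; '9999' is deliberately absent from the toy frame.
wanted = ["0333", "9999", "0111"]

ordered = (
    toy[toy["beat"].isin(wanted)]  # keep only the listed beats
    .set_index("beat")
    .reindex(wanted)               # reorder; absent labels become NaN rows
    .reset_index()
)
print(ordered)
```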


Unfortunately, we have 3 beats in this set with missing data which we will have to drop:

In [137]:
# Drop the three beats with missing data
df_lastComplaints = df_lastComplaints.loc[~df_lastComplaints['beat'].isin(['1653', '1652', '1655'])]
df_lastComplaints

Unnamed: 0,beat,population,square_mileage,num_white,num_hispanic,num_black,num_asian,num_mixed,num_other,med_income,...,rank_no_high_school,rank_total_se,rank_total_se_0-4,rank_total_se_5-9,rank_total_se_10-14,rank_total_se_15-17,rank_total_se_18-19,rank_total_se_20-24,rank_total_se_25-34,rank_total_se_35+
0,231,2301.134739,0.251106,9.453774,12.231179,2268.375219,4.777473,4.876113,1.420975,25186.540347,...,163.0,33.0,255.0,103.0,170.0,197.0,200.0,55.0,160.0,86.0
1,2412,17668.419833,0.797306,6943.269268,3365.387819,1784.258469,4472.869039,802.75753,299.877694,38954.62002,...,82.0,171.0,232.0,234.0,28.0,101.0,249.0,120.0,148.0,144.0
2,1914,10815.53783,0.45455,4217.962294,1318.64184,3770.963557,1230.030521,213.645738,64.293907,35360.739915,...,121.0,231.0,67.0,250.0,36.0,258.0,76.0,83.0,84.0,21.0
3,2032,9426.087159,0.418073,6487.455028,1365.986758,310.382339,903.715659,337.635821,20.911513,70510.322719,...,250.0,245.0,100.0,68.0,102.0,28.0,26.0,106.0,162.0,148.0
4,1614,15422.004847,2.782935,11574.064301,1413.802529,228.272084,1511.61998,618.638398,75.607582,55987.455487,...,192.0,215.0,128.0,156.0,14.0,142.0,57.0,65.0,109.0,88.0
5,1934,19672.58499,0.436985,15830.242116,1018.196945,606.431035,1582.88029,613.836666,20.99795,70989.232193,...,262.0,257.0,47.0,86.0,20.0,39.0,3.0,171.0,38.0,131.0
6,1221,8780.177783,0.876067,4210.619816,2872.756274,1364.102238,208.735405,102.095102,21.86897,69107.171923,...,109.0,214.0,38.0,90.0,18.0,64.0,74.0,117.0,174.0,124.0
7,1915,11336.462507,0.678057,7760.272984,1505.44695,1071.215711,581.969341,368.229232,49.32829,53025.445634,...,224.0,266.0,270.0,16.0,270.0,271.0,5.0,192.0,48.0,45.0
8,1925,24320.457442,0.429108,17974.530768,2397.553386,1015.14037,1995.019997,791.072452,147.140495,67345.909851,...,255.0,259.0,54.0,42.0,53.0,205.0,116.0,114.0,76.0,139.0
9,215,3738.721383,0.342409,195.333519,77.837961,3376.979791,30.924931,49.692694,7.952487,33216.708023,...,137.0,78.0,167.0,140.0,253.0,235.0,53.0,136.0,166.0,24.0


Let's do some aggregate measurements on this data:

First, we will look at race percentages for these beats all together:

In [138]:
totPop = 0
wPop = 0
bPop = 0
hPop = 0
mPop = 0
aPop = 0
oPop = 0
for ind in df_lastComplaints.index:
    totPop += df_lastComplaints['population'][ind]
    wPop += df_lastComplaints['num_white'][ind]
    bPop += df_lastComplaints['num_black'][ind]
    hPop += df_lastComplaints['num_hispanic'][ind]
    mPop += df_lastComplaints['num_mixed'][ind]
    aPop += df_lastComplaints['num_asian'][ind]
    oPop += df_lastComplaints['num_other'][ind]
print('LAST 20 COMPLAINTS BEATS RACE BREAKDOWN:\n')
print('Percentage of Population that is White: '+str(wPop/totPop*100)+'%')
print('Percentage of Population that is Black: '+str(bPop/totPop*100)+'%')
print('Percentage of Population that is Hispanic: '+str(hPop/totPop*100)+'%')
print('Percentage of Population that is Asian: '+str(aPop/totPop*100)+'%')
print('Percentage of Population that is Mixed: '+str(mPop/totPop*100)+'%')
print('Percentage of Population that is Other: '+str(oPop/totPop*100)+'%')

LAST 20 COMPLAINTS BEATS RACE BREAKDOWN:

Percentage of Population that is White: 56.9503715619748%
Percentage of Population that is Black: 11.135582952327422%
Percentage of Population that is Hispanic: 19.819042252306616%
Percentage of Population that is Asian: 8.873448321712461%
Percentage of Population that is Mixed: 2.7865690677053365%
Percentage of Population that is Other: 0.4349858490217998%


Now median income and food stamps:

In [139]:
tot_MI = 0
tot_FS = 0
for ind in df_lastComplaints.index:
    tot_MI += df_lastComplaints['med_income'][ind]
    tot_FS += df_lastComplaints['pop_food_stamps'][ind]
print('LAST 20 COMPLAINTS BEATS INCOME AND FOOD STAMPS BREAKDOWN:\n')
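# Caution: the divisor is fixed at 20 even though three beats with missing data were dropped above,
# so this understates the mean over the remaining beats (len(df_lastComplaints) is the matching divisor)
print('Average Median Income: '+str(tot_MI/20.0))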
print('Percentage of Population on Food Stamps: '+str(tot_FS/totPop*100)+'%')

LAST 20 COMPLAINTS BEATS INCOME AND FOOD STAMPS BREAKDOWN:

Average Median Income: 49872.52597011523
Percentage of Population on Food Stamps: 5.3509156185904%


Now, the educational attainment percentages:

In [140]:
bachelorsPop = df_lastComplaints['bachelors'].sum()
HSPop = df_lastComplaints['high_school'].sum()
no_HSPop = df_lastComplaints['no_high_school'].sum()
print('LAST 20 COMPLAINTS BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:\n')
print('Percentage of Population with at least a Bachelor\'s Degree: '+str(bachelorsPop/totPop*100)+'%')
print('Percentage of Population with at most a High School Diploma '+str(HSPop/totPop*100)+'%')
print('Percentage of Population without a High School Diploma '+str(no_HSPop/totPop*100)+'%')

LAST 20 COMPLAINTS BEATS EDUCATIONAL ATTAINMENT BREAKDOWN:

Percentage of Population with at least a Bachelor's Degree: 47.305469670673375%
Percentage of Population with at most a High School Diploma 19.63178997447768%
Percentage of Population without a High School Diploma 7.331306278935937%


School enrollment percentages:

In [141]:
total_se = df_lastComplaints['total_se'].sum()
total_se_0to4 = df_lastComplaints['se_0-4'].sum()
total_se_5to9 = df_lastComplaints['se_5-9'].sum()
total_se_10to14 = df_lastComplaints['se_10-14'].sum()
total_se_15to17 = df_lastComplaints['se_15-17'].sum()
total_se_18to19 = df_lastComplaints['se_18-19'].sum()
total_se_20to24 = df_lastComplaints['se_20-24'].sum()
total_se_25to34 = df_lastComplaints['se_25-34'].sum()
total_se_35plus = df_lastComplaints['se_35+'].sum()
total_0to4 = df_lastComplaints['0-4'].sum()
total_5to9 = df_lastComplaints['5-9'].sum()
total_10to14 = df_lastComplaints['10-14'].sum()
total_15to17 = df_lastComplaints['15-17'].sum()
total_18to19 = df_lastComplaints['18-19'].sum()
total_20to24 = df_lastComplaints['20'].sum() + df_lastComplaints['21'].sum() + df_lastComplaints['22-24'].sum()
total_25to34 = df_lastComplaints['25-29'].sum() + df_lastComplaints['30-34'].sum() 
# Population aged 35+: total population minus every younger age band
total_35plus = totPop - total_25to34 - total_20to24 - total_18to19 - total_15to17 - total_10to14 - total_5to9 - total_0to4

print('LAST 20 COMPLAINTS BEATS SCHOOL ENROLLMENT BREAKDOWN:\n')
print('Percentage of Residents that are Enrolled Students (All): '+str(total_se/totPop*100)+'%')
print('Percentage of Residents that are Enrolled Students (0-4): '+str(total_se_0to4/total_0to4*100)+'%')
print('Percentage of Residents that are Enrolled Students (5-9): '+str(total_se_5to9/total_5to9*100)+'%')
print('Percentage of Residents that are Enrolled Students (10-14): '+str(total_se_10to14/total_10to14*100)+'%')
print('Percentage of Residents that are Enrolled Students (15-17): '+str(total_se_15to17/total_15to17*100)+'%')
print('Percentage of Residents that are Enrolled Students (18-19): '+str(total_se_18to19/total_18to19*100)+'%')
print('Percentage of Residents that are Enrolled Students (20-24): '+str(total_se_20to24/total_20to24*100)+'%')
print('Percentage of Residents that are Enrolled Students (25-34): '+str(total_se_25to34/total_25to34*100)+'%')
print('Percentage of Residents that are Enrolled Students (35+): '+str(total_se_35plus/total_35plus*100)+'%')

LAST 20 COMPLAINTS BEATS SCHOOL ENROLLMENT BREAKDOWN:

Percentage of Residents that are Enrolled Students (All): 24.110475878533578%
Percentage of Residents that are Enrolled Students (0-4): 61.8609119000019%
Percentage of Residents that are Enrolled Students (5-9): 96.46561578325942%
Percentage of Residents that are Enrolled Students (10-14): 97.83911497893463%
Percentage of Residents that are Enrolled Students (15-17): 96.15945790698176%
Percentage of Residents that are Enrolled Students (18-19): 88.29194381085065%
Percentage of Residents that are Enrolled Students (20-24): 41.91488686125769%
Percentage of Residents that are Enrolled Students (25-34): 15.579622300570453%
Percentage of Residents that are Enrolled Students (35+): 3.3574020482595355%
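
The per-bracket enrollment rates above all follow the same pattern: enrolled students in a bracket divided by the population of that bracket. As a minimal sketch, not the notebook's own code, the same computation could be written as a loop. The `enrollment_rates` helper and the toy `df_demo` frame are hypothetical; only the column naming pattern (`se_0-4`, `0-4`, and so on) is borrowed from the data, and the counts are made up.

```python
import pandas as pd

# Hypothetical toy data: two beats with enrolled-student counts (se_*)
# and population counts for two age brackets. Values are made up.
df_demo = pd.DataFrame({
    'se_0-4': [30, 20], '0-4': [50, 50],
    'se_5-9': [45, 50], '5-9': [50, 50],
})

def enrollment_rates(df, brackets):
    """Return {bracket: percent enrolled}, pooling counts across all rows."""
    rates = {}
    for bracket in brackets:
        enrolled = df['se_' + bracket].sum()   # enrolled students in the bracket
        population = df[bracket].sum()         # all residents in the bracket
        rates[bracket] = enrolled / population * 100
    return rates

rates = enrollment_rates(df_demo, ['0-4', '5-9'])
for bracket, pct in rates.items():
    print(f'Percentage of Residents that are Enrolled Students ({bracket}): {pct:.1f}%')
```

Brackets whose denominators are built from several columns, such as 20-24 or 35+, would need their population totals computed first before a loop like this applies.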


Lastly, we will look at the age breakdowns:

In [142]:
minors = df_lastComplaints['<=21'].sum()
twenties = df_lastComplaints['22-29'].sum()
thirties = df_lastComplaints['30-39'].sum()
forties = df_lastComplaints['40-49'].sum()
fifties = df_lastComplaints['50-59'].sum()
sixties = df_lastComplaints['60-64'].sum()
seniors = df_lastComplaints['65+'].sum()
print('LAST 20 COMPLAINTS BEATS AGE BREAKDOWN:\n')
print('Percentage of Residents that are Minors (<=21): '+str(minors/totPop*100)+'%')
print('Percentage of Residents that are in their Twenties: '+str(twenties/totPop*100)+'%')
print('Percentage of Residents that are in their Thirties: '+str(thirties/totPop*100)+'%')
print('Percentage of Residents that are in their Forties: '+str(forties/totPop*100)+'%')
print('Percentage of Residents that are in their Fifties: '+str(fifties/totPop*100)+'%')
print('Percentage of Residents that are in between 60-64: '+str(sixties/totPop*100)+'%')
print('Percentage of Residents that are Seniors (65+): '+str(seniors/totPop*100)+'%')

LAST 20 COMPLAINTS BEATS AGE BREAKDOWN:

Percentage of Residents that are Minors (<=21): 19.134654707931855%
Percentage of Residents that are in their Twenties: 23.401206218492305%
Percentage of Residents that are in their Thirties: 20.806350338957643%
Percentage of Residents that are in their Forties: 11.72550729761999%
Percentage of Residents that are in their Fifties: 9.991313560782586%
Percentage of Residents that are in between 60-64: 4.267102168826189%
Percentage of Residents that are Seniors (65+): 10.673824267133686%
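
A quick sanity check on the breakdown above is to confirm that the seven age shares add up to roughly 100%. The sketch below copies the printed percentages and sums them; it assumes the seven brackets are meant to partition the whole population, which the notebook does not state explicitly.

```python
# Age-bracket shares copied from the printed output above (percent of totPop)
age_shares = {
    '<=21': 19.134654707931855,
    '22-29': 23.401206218492305,
    '30-39': 20.806350338957643,
    '40-49': 11.72550729761999,
    '50-59': 9.991313560782586,
    '60-64': 4.267102168826189,
    '65+': 10.673824267133686,
}

# Assumption: the brackets do not overlap and together cover every resident,
# so the shares should sum to approximately 100.
total_share = sum(age_shares.values())
print(f'Sum of age shares: {total_share:.4f}%')
```

A sum noticeably different from 100% could point to overlapping brackets or to a mismatch between the age columns and `totPop`.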


Now, we will compute the average and median ranking for the race, income, food stamp, and educational attainment categories for this data:

In [143]:
df_lastComplaints_ranks = df_lastComplaints[['beat','rank_%w', 'rank_%b', 'rank_%h', 'rank_%a', 'rank_%m',
       'rank_%o', 'rank_income', 'rank_fs', 'rank_bachelors',
       'rank_high_school', 'rank_no_high_school', 'rank_total_se',
       'rank_total_se_0-4', 'rank_total_se_5-9', 'rank_total_se_10-14',
       'rank_total_se_15-17', 'rank_total_se_18-19', 'rank_total_se_20-24',
       'rank_total_se_25-34', 'rank_total_se_35+']]

In [144]:
# rank_beat: each beat's 1-based position in the existing row order of this subset
beatList = list(df_lastComplaints_ranks.beat)
df_lastComplaints_ranks['rank_beat'] = df_lastComplaints_ranks.beat.apply(lambda x: beatList.index(x)+1)
df_lastComplaints_ranks.set_index('beat',inplace = True)
print('LAST 20 COMPLAINTS BEATS MEDIAN RANKINGS: ')
print(df_lastComplaints_ranks.median(axis = 0))
print()
print('LAST 20 COMPLAINTS BEATS AVERAGE (MEAN) RANKINGS: ')
print(df_lastComplaints_ranks.mean(axis = 0))

LAST 20 COMPLAINTS BEATS MEDIAN RANKINGS: 
rank_%w                 60.0
rank_%b                186.0
rank_%h                122.0
rank_%a                 67.0
rank_%m                 55.0
rank_%o                110.0
rank_income             87.0
rank_fs                205.0
rank_bachelors          64.0
rank_high_school       211.0
rank_no_high_school    192.0
rank_total_se          215.0
rank_total_se_0-4      106.0
rank_total_se_5-9       99.0
rank_total_se_10-14     95.0
rank_total_se_15-17     88.0
rank_total_se_18-19     61.0
rank_total_se_20-24    117.0
rank_total_se_25-34    135.0
rank_total_se_35+      131.0
rank_beat                9.0
dtype: float64

LAST 20 COMPLAINTS BEATS AVERAGE (MEAN) RANKINGS: 
rank_%w                 76.058824
rank_%b                172.823529
rank_%h                124.352941
rank_%a                 81.823529
rank_%m                 83.823529
rank_%o                126.647059
rank_income             99.058824
rank_fs                175.176471
rank_bach

**Key Takeaways:**
* When we look at how these beats rank among all the police beats in Chicago, the mean and median rankings are still fairly high for percentage of White residents, median income, and percentage of residents with at least a Bachelor's Degree. Rankings are slightly below the 66th percentile for percentage of residents without a Bachelor's Degree and percentage of Black residents
* These beats are still more than 50% White, followed by 20% Hispanic and 11% White, figures which match the percentages of the other Bottom beats
* The beats in question have an average median income about $300 lower than the city-wide average, and about 13% of their residents are on food stamps, about 8% lower than the city-wide average. It is unusual to see both the median income and the percentage of residents on food stamps fall below the city-wide average
* About 47% of residents in these beats hold at least a Bachelor's Degree, about 20% have at most a High School Diploma, and about 7% have no High School Diploma. The share of Bachelor's Degree holders still ranks high among all beats (median rank of 64)
* Close to 20% of the population in these beats falls into each of the minors, twenties, and thirties categories, a trend we continue to see in the Bottom beats. The forties, fifties, and seniors groups were each close to 10%, with the 60-64 group near 4%, again continuing a trend
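
The takeaways above move between ranks and percentiles. As a minimal sketch of that conversion, the hypothetical helper below turns a 1-based rank into a percentile. It assumes that rank 1 marks the highest value, and `n_beats` is a placeholder for the total number of ranked beats; the value 100 used here is illustrative only and is not the real count of Chicago beats.

```python
def rank_to_percentile(rank, n_beats):
    """Convert a 1-based rank (assumed: 1 = highest value) into a percentile."""
    if not 1 <= rank <= n_beats:
        raise ValueError('rank must be between 1 and n_beats')
    # Share of beats ranked at or below this beat, as a percentage
    return 100 * (n_beats - rank + 1) / n_beats

# Hypothetical example with 100 beats in total (NOT the real Chicago count)
n_beats = 100
for rank in (1, 34, 100):
    print(rank, rank_to_percentile(rank, n_beats))
```

With the real number of beats substituted for `n_beats`, the median and mean ranks printed earlier could be read on the same percentile scale.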

**Some Conclusions:**
* Compared to all the other Bottom beats, the residents of these beats aren't nearly as "wealthy", but the beats are still composed largely of White residents, and many of these residents continue to be well-educated
* Thus, we are led to assume these beats are home mostly to middle-class residents with a college degree. Although they may not earn the city-wide average median income as a group, the residents of these beats maintain a self-sufficient standard of living
* The Complaints beats do not appear to follow the trends of the other Top and Bottom beats. This is likely due to the nature of complaints, which are a secondary account of police action. For these Bottom beats, we can speculate that the wealthier, more affluent beats that were so prevalent in the other Bottom categories have such light police presence that complaints rarely, if ever, occur. These "middle-class" beats, by contrast, are more likely to be areas of light crime and light police patrol, and are thus slightly more likely to garner any police complaints at all