<h1>Snow Violations</h1>
11 January 2024

This notebook explores whether 311 usage skews toward Chicago's North Side, and whether any such bias might be reflected in the snow clearance statistics.
<br>
My analysis steps:
<ol>
<li><a href="#docs">Review API Documentation</a>
<li><a href="#import">Import Libraries</a>
<li><a href="#retrieve_data">Get Data</a>
<li><a href="#review">Review Data</a>
</ol>

<h3>Preliminary Findings</h3>
<ul>
    <li>Most complaints originate from phone calls (30,697), followed by internet (17,551), mobile (16,278), and the alderman's office (3,353)
    <li>Most complaints have a status of Completed
</ul>

<a name = "docs"></a>
    <h1>1. Review Documentation</h1>

<h3>311 data</h3>
<ul>
    <li>Chicago Open Data Portal: <a href="https://data.cityofchicago.org/Service-Requests/311-Service-Requests/v6vf-nfxy">https://data.cityofchicago.org/Service-Requests/311-Service-Requests/v6vf-nfxy</a>
    <li>Chicago API: <a href="https://data.cityofchicago.org/resource/v6vf-nfxy.json">https://data.cityofchicago.org/resource/v6vf-nfxy.json</a>
    <li>API Documentation: <a href="https://dev.socrata.com/foundry/data.cityofchicago.org/v6vf-nfxy">https://dev.socrata.com/foundry/data.cityofchicago.org/v6vf-nfxy</a>
    <li><b>Developer Portal (Socrata):</b> <a href="https://dev.socrata.com/">https://dev.socrata.com/</a> (general reference for Socrata)<br>
        <ul>
            <li>SoQL Like: <a href="https://dev.socrata.com/docs/functions/like.html">https://dev.socrata.com/docs/functions/like.html</a>
        </ul>
</ul>

<a name = "import"></a>
<h1>2. Import Libraries</h1>

In [1]:
import pandas as pd
import requests
# import datetime as dt  # only needed if dates are manipulated after the API retrieval

<a name = "retrieve_data"></a>
    <h1>3. Get Data</h1>

In [2]:
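base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
# columns to request from the 311 service requests dataset
select = "SR_NUMBER, SR_TYPE, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, OWNER_DEPARTMENT, STATUS, ORIGIN"
# snow filter, currently disabled (%25 is a URL-encoded % wildcard for SoQL like)
# where = "SR_TYPE like '%25Snow%25'"
limit = 99999

# with the filter disabled, this URL returns all request types, up to the limit
url = f"{base_url}?$SELECT={select}&$LIMIT={limit}"
#url = f"{base_url}?$SELECT={select}&$WHERE={where}&$LIMIT={limit}"
print(url)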

https://data.cityofchicago.org/resource/v6vf-nfxy.json?$SELECT=SR_NUMBER, SR_TYPE, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, OWNER_DEPARTMENT, STATUS, ORIGIN&$LIMIT=99999
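
The URL above is assembled by hand, which is why the disabled snow filter has to spell the `%` wildcard as `%25`. As a minimal sketch of an alternative, the standard library's `urlencode` can do the escaping so the SoQL clause is written plainly. The helper name `build_query_url` is hypothetical, and the lowercase parameter names follow the Socrata documentation rather than the uppercase form used in the cell above.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"


def build_query_url(select, where=None, limit=1000):
    """Return a query URL with every parameter value percent-encoded.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    params = {"$select": select, "$limit": limit}
    if where:
        params["$where"] = where
    # safe="$" keeps the leading "$" of the parameter names readable;
    # spaces, commas, quotes and "%" in the values are still escaped.
    return f"{BASE_URL}?{urlencode(params, safe='$')}"


snow_url = build_query_url(
    select="SR_NUMBER,SR_TYPE,COMMUNITY_AREA,STATUS,ORIGIN",
    where="SR_TYPE like '%Snow%'",  # plain wildcard, no manual %25
    limit=99999,
)
print(snow_url)
```

The resulting string can be passed to `requests.get` in place of `url`. Whether `'%Snow%'` matches the request types of interest depends on how they are spelled in `SR_TYPE`, which this sketch does not verify.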


In [3]:
response = requests.get(url)
data = response.json()
print (response)

<Response [200]>


In [4]:
df=pd.DataFrame(data)
df.head()

Unnamed: 0,SR_NUMBER,SR_TYPE,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,COMMUNITY_AREA,OWNER_DEPARTMENT,STATUS,ORIGIN
0,SR23-01550451,Rodent Baiting/Rat Complaint,SGA,2023-09-11T10:25:34.000,3723 N HARDING AVE,16,Streets and Sanitation,Completed,Phone Call
1,SR19-01377790,Sidewalk Inspection Request,PBS,2019-04-24T09:56:50.000,243 E GRAND AVE,8,CDOT - Department of Transportation,Completed,Mobile Device
2,SR19-01328240,Sidewalk Inspection Request,PBS,2019-04-14T18:44:52.000,4046 N FRANCISCO AVE,16,CDOT - Department of Transportation,Completed,Internet
3,SR19-01328252,Sidewalk Inspection Request,PBS,2019-04-14T18:50:07.000,4046 N FRANCISCO AVE,16,CDOT - Department of Transportation,Completed,Internet
4,SR19-01459973,Business Complaints,RBL,2019-05-09T10:26:15.000,4140 N KILPATRICK AVE,15,BACP - Business Affairs and Consumer Protection,Completed,Phone Call


In [5]:
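# total number of records retrieved; the snow filter is disabled above, so this
# counts all request types and equals the $LIMIT cap (the result is truncated)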
len(df)

99999
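
A row count exactly equal to the limit suggests the response was cut off rather than complete. One way to retrieve everything is to page through the results with `$limit` and `$offset`. The sketch below is an assumption-laden illustration: `fetch_all` and `make_socrata_fetcher` are hypothetical names, the page size is arbitrary, and `$order=:id` is used on the assumption that a fixed ordering is needed for stable paging.

```python
def fetch_all(fetch_page, page_size=50000):
    """Collect rows page by page until a page comes back short.

    fetch_page is any callable taking limit= and offset= and returning a list.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we are done
            break
        offset += page_size
    return rows


def make_socrata_fetcher(base_url, select, where=None):
    """Build a fetch_page callable for a Socrata endpoint (hypothetical helper)."""
    import requests  # imported here so fetch_all can be tried without network access

    def fetch_page(limit, offset):
        params = {"$select": select, "$limit": limit,
                  "$offset": offset, "$order": ":id"}
        if where:
            params["$where"] = where
        response = requests.get(base_url, params=params)
        response.raise_for_status()  # fail loudly on HTTP errors
        return response.json()

    return fetch_page
```

With these in place, `pd.DataFrame(fetch_all(make_socrata_fetcher(base_url, select)))` would stand in for the single request above, at the cost of several round trips.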

### community 311 call count

In [16]:
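# count requests per community area
df_community = df.groupby('COMMUNITY_AREA').size().reset_index(name='complaints')
# the API returns COMMUNITY_AREA as text; cast to int so it can be joined to GEOID
df_community['COMMUNITY_AREA'] = df_community['COMMUNITY_AREA'].astype(int)
df_community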
df_community

Unnamed: 0,COMMUNITY_AREA,complaints
0,1,520
1,10,701
2,11,454
3,12,407
4,13,321
...,...,...
72,75,538
73,76,15331
74,77,627
75,8,779


### read community population

In [17]:
# retrieved on 1/11/24, but 2020 Census Population figures should be static

df_population = pd.read_csv("../data/population_cmap_2022.csv")

In [18]:
# simplify dataframe to get only essentials
df_population = df_population[['GEOID','GEOG','2020_POP']]
df_population = df_population.rename(columns={'GEOG':'COMMUNITY_NAME'})
df_population['COMMUNITY_CAPS']=df_population['COMMUNITY_NAME'].str.upper()
df_population.head()

Unnamed: 0,GEOID,COMMUNITY_NAME,2020_POP,COMMUNITY_CAPS
0,14,Albany Park,48396,ALBANY PARK
1,57,Archer Heights,14196,ARCHER HEIGHTS
2,34,Armour Square,13890,ARMOUR SQUARE
3,70,Ashburn,41098,ASHBURN
4,71,Auburn Gresham,44878,AUBURN GRESHAM


### merge

In [20]:
df_summary = pd.merge(left=df_community,right=df_population,left_on='COMMUNITY_AREA',right_on='GEOID')
df_summary.head()

Unnamed: 0,COMMUNITY_AREA,complaints,GEOID,COMMUNITY_NAME,2020_POP,COMMUNITY_CAPS
0,1,520,1,Rogers Park,55628,ROGERS PARK
1,10,701,10,Norwood Park,38303,NORWOOD PARK
2,11,454,11,Jefferson Park,26216,JEFFERSON PARK
3,12,407,12,Forest Glen,19596,FOREST GLEN
4,13,321,13,North Park,17559,NORTH PARK
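
`pd.merge` defaults to an inner join, so any community area present in only one of the two tables is dropped without warning. A quick check, sketched below with a hypothetical helper `find_unmatched`, is to run an outer join with `indicator=True` and inspect the rows that did not match on both sides.

```python
import pandas as pd


def find_unmatched(df_left, df_right, left_key, right_key):
    """Return rows that appear in only one of the two frames (illustrative helper)."""
    merged = pd.merge(
        df_left, df_right,
        how="outer", left_on=left_key, right_on=right_key,
        indicator=True,  # adds a _merge column: left_only, right_only, or both
    )
    return merged[merged["_merge"] != "both"]


# toy frames standing in for df_community and df_population
toy_complaints = pd.DataFrame({"COMMUNITY_AREA": [1, 2, 3], "complaints": [10, 20, 30]})
toy_population = pd.DataFrame({"GEOID": [1, 2, 4], "2020_POP": [100, 200, 400]})
unmatched = find_unmatched(toy_complaints, toy_population, "COMMUNITY_AREA", "GEOID")
print(unmatched)
```

Calling `find_unmatched(df_community, df_population, 'COMMUNITY_AREA', 'GEOID')` on the real tables should return an empty frame if every community area lines up.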


In [21]:
df_summary['complaint_rate']=df_summary['complaints']/df_summary['2020_POP']

In [24]:
df_summary.sort_values('complaint_rate',ascending = False)

Unnamed: 0,COMMUNITY_AREA,complaints,GEOID,COMMUNITY_NAME,2020_POP,COMMUNITY_CAPS,complaint_rate
73,76,15331,76,O'Hare,13418,O'HARE,1.142570
20,28,37374,28,Near West Side,67881,NEAR WEST SIDE,0.550581
30,37,113,37,Fuller Park,2567,FULLER PARK,0.044020
42,48,460,48,Calumet Heights,13088,CALUMET HEIGHTS,0.035147
63,67,1014,67,West Englewood,29647,WEST ENGLEWOOD,0.034202
...,...,...,...,...,...,...,...
0,1,520,1,Rogers Park,55628,ROGERS PARK,0.009348
35,41,255,41,Hyde Park,29456,HYDE PARK,0.008657
29,36,54,36,Oakland,6799,OAKLAND,0.007942
26,33,219,33,Near South Side,28795,NEAR SOUTH SIDE,0.007605
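
Rates this small are easier to compare when expressed per 1,000 residents. The sketch below recomputes two rows from the table above on that scale; the helper name `add_rate_per_1000` and the new column name are assumptions for illustration.

```python
import pandas as pd


def add_rate_per_1000(df, count_col="complaints", pop_col="2020_POP"):
    """Return a copy of df with complaints per 1,000 residents (illustrative helper)."""
    out = df.copy()
    out["complaints_per_1000"] = out[count_col] / out[pop_col] * 1000
    return out


# two rows copied from the sorted summary above
sample = pd.DataFrame({
    "COMMUNITY_NAME": ["O'Hare", "Rogers Park"],
    "complaints": [15331, 520],
    "2020_POP": [13418, 55628],
})
rates = add_rate_per_1000(sample)
print(rates[["COMMUNITY_NAME", "complaints_per_1000"]].round(2))
```

On this scale O'Hare has more requests than residents, which hints that the raw counts for some areas may include requests not tied to people living there, so the top of the ranking deserves a closer look before any north/south comparison is drawn.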
