<h1>Get Snow Violations Data</h1>
10 January 2024

This analysis looks at 311 complaints filed as "Snow – Uncleared Sidewalk Complaint," or SWSNOREM.<br>
<br>
My analysis steps:
<ol>
<li><a href="#get">Get Data</a></li>
<li><a href="#prepare">Preprocess Data</a></li>
<li><a href="#save">Save Data</a></li>
</ol>

### Data Sources
<strong>Snow violations:</strong> <a href="https://data.cityofchicago.org/resource/v6vf-nfxy.json">311 Service Requests</a><br>
<!-- <strong>Community names:</strong> <a href="https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6">Boundaries- Community Areas</a><br> -->
<strong>Community names and population:</strong> <a href="https://datahub.cmap.illinois.gov/datasets/CMAPGIS::community-data-snapshots-raw-data-2014-2022/explore?layer=21">Chicago Community Areas 2022</a> (CMAP)<br>

<a name = "get"></a>
# 1. Get Data

In [1]:
import pandas as pd
import requests
#import datetime as dt #would only need this if I manipulated dates post-API data retrieval

In [2]:
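base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
# columns to retrieve
select = "SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE"
# '%25' is the URL-encoded form of '%', the LIKE wildcard, so this matches any SR_TYPE containing "Snow"
where = "SR_TYPE like '%25Snow%25'"
# upper bound on rows returned; needs to exceed the number of matching records
limit = 99999

url = f"{base_url}?$SELECT={select}&$WHERE={where}&$LIMIT={limit}"
print(url)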

https://data.cityofchicago.org/resource/v6vf-nfxy.json?$SELECT=SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE&$WHERE=SR_TYPE like '%25Snow%25'&$LIMIT=99999
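The URL above relies on typing `%25` by hand for the `%` wildcard. As an alternative, here is a minimal sketch that lets the standard library do the escaping. The lowercase `$select`/`$where`/`$limit` spelling and the shortened column list are my choices for illustration, not what the cell above uses.

```python
from urllib.parse import urlencode, parse_qs

base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"

# Query parameters written in plain form; urlencode handles the escaping,
# so the LIKE wildcard is a literal '%' rather than a hand-typed '%25'.
params = {
    "$select": "SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, SR_TYPE",  # shortened for the example
    "$where": "SR_TYPE like '%Snow%'",
    "$limit": 99999,
}

query = urlencode(params)
url = f"{base_url}?{query}"
print(url)

# Decoding the query string recovers the plain-text filter.
decoded = parse_qs(query)
print(decoded["$where"][0])
```

Passing the same dictionary as `requests.get(base_url, params=params)` should produce an equivalent request, since `requests` encodes `params` for you.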


In [3]:
response = requests.get(url)
response.raise_for_status()  # stop here if the API returned an error status
data = response.json()
print(response)

<Response [200]>


### verify dataset looks okay. Expect about 68.5k records across four snow request types as of early January 2024; the count grows as new requests are filed

In [4]:
df=pd.DataFrame(data)
df.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,STATUS,ORIGIN,CLOSED_DATE,SR_TYPE,COMMUNITY_AREA,WARD,LATITUDE,LONGITUDE
0,SR19-00102142,SDW,2019-01-22T17:47:53.000,23 S Drake AVE,Completed,Internet,2019-03-15T07:25:34.000,Snow - Object/Dibs Removal Request,,,,
1,SR22-00050149,SDO,2022-01-10T11:05:32.000,4151 W WASHINGTON BLVD,Completed,Mobile Device,2022-01-10T11:06:41.000,Ice and Snow Removal Request,26.0,28.0,41.881691809,-87.730165409
2,SR19-00123488,SDW,2019-01-27T16:13:36.000,2320 N Luna AVE,Completed,Internet,2019-03-15T07:22:15.000,Snow - Object/Dibs Removal Request,,,,
3,SR24-00092065,SWSNOREM,2024-01-16T07:54:03.000,2315 N LINCOLN PARK WEST,Completed,Mobile Device,2024-01-23T09:53:30.000,Snow – Uncleared Sidewalk Complaint,7.0,43.0,41.922495001,-87.636906
4,SR24-00099426,SDW,2024-01-17T08:37:18.000,2719 W 18TH ST,Completed,Mobile Device,2024-01-23T06:53:42.000,Snow - Object/Dibs Removal Request,29.0,25.0,41.857218001,-87.6939525
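The "four snow types" check can be made explicit by counting rows per short code. This is a small sketch on a toy frame (`df_demo` is a stand-in I made up, using the four codes that appear in this notebook); on the real data the same calls would run against `df`.

```python
import pandas as pd

# Toy stand-in for the API result, using the short codes seen in this notebook.
df_demo = pd.DataFrame({
    "SR_SHORT_CODE": ["SDW", "SDO", "SDW", "SWSNOREM", "SNPBLBS"],
})

# Rows per request type, plus the number of distinct types.
type_counts = df_demo["SR_SHORT_CODE"].value_counts()
n_types = df_demo["SR_SHORT_CODE"].nunique()

print(type_counts)
print("distinct types:", n_types)
```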


In [5]:
len(df)

72006
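At 72,006 rows the result fits under the `$LIMIT` of 99999, but if the matching set ever grew past that cap, I would expect the request to return only the first 99,999 rows with no error. One way to guard against that is to page through results with a limit and an offset. The sketch below shows only the paging loop; `fetch_page` is a hypothetical callable (for example, a thin wrapper around `requests.get` that sets `$limit` and `$offset`), and a fake version stands in for the API so the example runs offline.

```python
def fetch_all(fetch_page, page_size=50000):
    """Collect rows page by page until a short page signals the end.

    fetch_page(limit, offset) is a hypothetical callable returning a list of rows.
    """
    rows, offset = [], 0
    while True:
        page = fetch_page(page_size, offset)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


# Offline stand-in for the API: 7 fake rows served in pages of 3.
fake_rows = [{"SR_NUMBER": f"SR-{i}"} for i in range(7)]

def fake_page(limit, offset):
    return fake_rows[offset:offset + limit]

all_rows = fetch_all(fake_page, page_size=3)
print(len(all_rows))
```

If this were wired up to the real endpoint, adding a stable sort (such as ordering by `SR_NUMBER`) would likely be needed so that pages do not overlap or skip rows.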

### read community names
Just get the community names for now; the Census population data is saved for the analysis steps.

In [6]:
# retrieved on 1/11/24, but 2020 Census Population figures should be static
df_population = pd.read_csv("../data/01-raw/population_cmap_2022.csv")

In [7]:
# simplify dataframe to get only essentials
df_population = df_population[['GEOID','GEOG']]
df_population = df_population.rename(columns={'GEOG':'COMMUNITY_NAME'})
df_population.head()

Unnamed: 0,GEOID,COMMUNITY_NAME
0,14,Albany Park
1,57,Archer Heights
2,34,Armour Square
3,70,Ashburn
4,71,Auburn Gresham


<a name = "prepare"></a>
# 2. Preprocess Data

### parse dates

In [8]:
def get_season(month, year):
    """Return the winter season label for a given month and year.

    Input:
      month (int): numeric month of the year (1-12)
      year (int): 4-digit year
    Returns:
      season in yyyy-yyyy format. E.g. 2022-2023 means winter 2022-2023, or July 2022 to June 2023.
    """
    if month >= 7:
        return f"{year}-{year + 1}"
    else:
        return f"{year - 1}-{year}"

In [9]:
# format and parse dates
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'])
df['CLOSED_DATE'] = pd.to_datetime(df['CLOSED_DATE'])
df['year'] = df['CREATED_DATE'].dt.year
df['month'] = df['CREATED_DATE'].dt.month
df['date'] = df['CREATED_DATE'].dt.date
df['season'] = df.apply(lambda row: get_season(row['month'], row['year']), axis=1)
df.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,STATUS,ORIGIN,CLOSED_DATE,SR_TYPE,COMMUNITY_AREA,WARD,LATITUDE,LONGITUDE,year,month,date,season
0,SR19-00102142,SDW,2019-01-22 17:47:53,23 S Drake AVE,Completed,Internet,2019-03-15 07:25:34,Snow - Object/Dibs Removal Request,,,,,2019,1,2019-01-22,2018-2019
1,SR22-00050149,SDO,2022-01-10 11:05:32,4151 W WASHINGTON BLVD,Completed,Mobile Device,2022-01-10 11:06:41,Ice and Snow Removal Request,26.0,28.0,41.881691809,-87.730165409,2022,1,2022-01-10,2021-2022
2,SR19-00123488,SDW,2019-01-27 16:13:36,2320 N Luna AVE,Completed,Internet,2019-03-15 07:22:15,Snow - Object/Dibs Removal Request,,,,,2019,1,2019-01-27,2018-2019
3,SR24-00092065,SWSNOREM,2024-01-16 07:54:03,2315 N LINCOLN PARK WEST,Completed,Mobile Device,2024-01-23 09:53:30,Snow – Uncleared Sidewalk Complaint,7.0,43.0,41.922495001,-87.636906,2024,1,2024-01-16,2023-2024
4,SR24-00099426,SDW,2024-01-17 08:37:18,2719 W 18TH ST,Completed,Mobile Device,2024-01-23 06:53:42,Snow - Object/Dibs Removal Request,29.0,25.0,41.857218001,-87.6939525,2024,1,2024-01-17,2023-2024


In [10]:
# review season assignment
df.groupby('season').size()

season
2018-2019      635
2019-2020    10113
2020-2021    30967
2021-2022    22396
2022-2023     4157
2023-2024     3738
dtype: int64
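The season rule (July through June) can also be computed for the whole column at once instead of row by row. Below is a sketch of that idea on a few sample timestamps; the two dates around the June/July boundary are made up to show where the label flips.

```python
import pandas as pd

created = pd.to_datetime(pd.Series([
    "2019-01-22T17:47:53.000",
    "2023-11-06T21:46:58.000",
    "2023-06-30T23:59:59.000",  # made-up boundary case: last day of a season
    "2023-07-01T00:00:00.000",  # made-up boundary case: first day of the next
]))

year = created.dt.year
month = created.dt.month

# The season starts in the same year for July-December, otherwise the year before.
start_year = year.where(month >= 7, year - 1)
season = start_year.astype(str) + "-" + (start_year + 1).astype(str)

print(season.tolist())
# ['2018-2019', '2023-2024', '2022-2023', '2023-2024']
```

On roughly 72k rows the row-wise `apply` is fast enough, so this is mostly a readability and scaling choice.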

### identify records with missing or suburban address

In [11]:
# find records with missing address
len(df[df['STREET_ADDRESS'].isna()])

202

In [12]:
len(df[df['COMMUNITY_AREA'].isna()])

178

In [13]:
len(df[df['WARD'].isna()])

172
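The three counts above overlap, since one record can be missing more than one field, so they cannot simply be added up to get the number of rows to drop. Here is a small sketch on a toy frame (`df_demo` is a made-up stand-in) that reports both the per-column counts and the number of rows missing at least one of the fields.

```python
import pandas as pd

# Toy stand-in: the first row has an address but no community area or ward.
df_demo = pd.DataFrame({
    "STREET_ADDRESS": ["23 S Drake AVE", None, "2719 W 18TH ST"],
    "COMMUNITY_AREA": [None, "26", "29"],
    "WARD": [None, "28", "25"],
})

key_cols = ["STREET_ADDRESS", "COMMUNITY_AREA", "WARD"]

# Missing values per column.
missing_per_column = df_demo[key_cols].isna().sum()

# Rows where any of the key columns is missing (each row counted once).
rows_with_any_missing = int(df_demo[key_cols].isna().any(axis=1).sum())

print(missing_per_column)
print("rows with any missing:", rows_with_any_missing)
```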

### remove invalid addresses

In [14]:
df_valid_address = df[df['STREET_ADDRESS'].notna()
                     & df['WARD'].notna()
                     & df['COMMUNITY_AREA'].notna()]
len(df_valid_address)

71712

In [15]:
# then correct object types to integer; astype on the whole frame returns a new
# DataFrame, which avoids the SettingWithCopyWarning raised when assigning to a filtered slice
df_valid_address = df_valid_address.astype({'COMMUNITY_AREA': int, 'WARD': int})


### merge in community names

In [16]:
df_clean = pd.merge(left=df_valid_address,right=df_population,left_on='COMMUNITY_AREA',right_on='GEOID')
df_clean.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,STATUS,ORIGIN,CLOSED_DATE,SR_TYPE,COMMUNITY_AREA,WARD,LATITUDE,LONGITUDE,year,month,date,season,GEOID,COMMUNITY_NAME
0,SR22-00050149,SDO,2022-01-10 11:05:32,4151 W WASHINGTON BLVD,Completed,Mobile Device,2022-01-10 11:06:41,Ice and Snow Removal Request,26,28,41.881691809,-87.730165409,2022,1,2022-01-10,2021-2022,26,West Garfield Park
1,SR24-00092065,SWSNOREM,2024-01-16 07:54:03,2315 N LINCOLN PARK WEST,Completed,Mobile Device,2024-01-23 09:53:30,Snow – Uncleared Sidewalk Complaint,7,43,41.922495001,-87.636906,2024,1,2024-01-16,2023-2024,7,Lincoln Park
2,SR24-00099426,SDW,2024-01-17 08:37:18,2719 W 18TH ST,Completed,Mobile Device,2024-01-23 06:53:42,Snow - Object/Dibs Removal Request,29,25,41.857218001,-87.6939525,2024,1,2024-01-17,2023-2024,29,North Lawndale
3,SR23-01930406,SDO,2023-11-06 21:46:58,1410 E 62ND ST,Completed,Internet,2023-11-09 18:25:10,Ice and Snow Removal Request,42,20,41.782635001,-87.5906235,2023,11,2023-11-06,2023-2024,42,Woodlawn
4,SR24-00081383,SNPBLBS,2024-01-13 14:26:54,501 W GRAND AVE,Completed,Internet,2024-02-07 07:23:00,Snow Removal - Protected Bike Lane or Bridge S...,24,42,41.891336874,-87.641257857,2024,1,2024-01-13,2023-2024,24,West Town
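`pd.merge` defaults to an inner join, so any request whose `COMMUNITY_AREA` has no matching `GEOID` would be dropped without notice. A hedged sketch of how to surface such rows: do a left join with `indicator=True` and check for `left_only`. The frames and the unmatched code 99 are made up for illustration.

```python
import pandas as pd

requests_demo = pd.DataFrame({
    "SR_NUMBER": ["SR22-00050149", "SR24-00092065", "SR-HYPOTHETICAL"],
    "COMMUNITY_AREA": [26, 7, 99],  # 99 is a made-up code with no match
})
names_demo = pd.DataFrame({
    "GEOID": [26, 7],
    "COMMUNITY_NAME": ["West Garfield Park", "Lincoln Park"],
})

checked = pd.merge(
    left=requests_demo,
    right=names_demo,
    left_on="COMMUNITY_AREA",
    right_on="GEOID",
    how="left",               # keep every request, matched or not
    validate="many_to_one",   # raise if a GEOID appears twice in the lookup
    indicator=True,           # adds a _merge column: both / left_only
)

unmatched = checked[checked["_merge"] == "left_only"]
print(len(unmatched))
```

If `unmatched` is empty on the real data, the inner join above loses nothing.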


In [18]:
# for convenience, keep only the uncleared-sidewalk complaints (SWSNOREM)
df_unshoveled = df_clean[df_clean['SR_SHORT_CODE']=='SWSNOREM']
len(df_unshoveled)

22685

<a name = "save"></a>
# 3. Export Data

In [19]:
df_clean.to_csv("../../data/02-tidied/311-complaints-snow-all-types.csv", index=False)
df_unshoveled.to_csv("../../data/02-tidied/311-complaints-unshoveled.csv", index=False)
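CSV files do not store dtypes, so the parsed dates come back as plain strings unless the reader asks for them. A minimal round-trip sketch, using a temporary file and a one-row toy frame (both made up for the example), shows the `parse_dates` option a later notebook would probably need.

```python
import tempfile
from pathlib import Path

import pandas as pd

df_demo = pd.DataFrame({
    "SR_NUMBER": ["SR24-00092065"],
    "CREATED_DATE": pd.to_datetime(["2024-01-16 07:54:03"]),
    "COMMUNITY_AREA": [7],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "roundtrip-demo.csv"  # throwaway file for the example
    df_demo.to_csv(path, index=False)
    # parse_dates restores the datetime dtype on read
    reloaded = pd.read_csv(path, parse_dates=["CREATED_DATE"])

print(reloaded.dtypes)
```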