<h1>Get Snow Violations Data</h1>
10 January 2024

This analysis looks at 311 complaints filed as "Snow – Uncleared Sidewalk Complaint," or SWSNOREM.<br>
<br>
My analysis steps:
<ol>
<li><a href="#configure">Import Libraries</a></li>
<li><a href="#get">Get Data</a></li>
<li><a href="#prepare">Prepare Data</a></li>
<li><a href="#save">Save Data</a></li>
</ol>

### Data Sources
<strong>Snow violations:</strong> <a href="https://data.cityofchicago.org/resource/v6vf-nfxy.json">311 Service Requests</a><br>
<!-- <strong>Community names:</strong> <a href="https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6">Boundaries- Community Areas</a><br> -->
<strong>Community names and population:</strong> <a href="https://datahub.cmap.illinois.gov/datasets/CMAPGIS::community-data-snapshots-raw-data-2014-2022/explore?layer=21">Chicago Community Areas 2022</a> (CMAP)<br>

<a name = "configure"></a>
<h1>1. Import Libraries</h1>

In [1]:
import pandas as pd
import requests
#import datetime as dt #would only need this if I manipulated dates post-API data retrieval

<a name = "get"></a>
<h1>2. Get Data</h1>

In [2]:
base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
select = "SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE"
where = "SR_TYPE like '%25Snow%25'"
limit = 99999

url = f"{base_url}?$SELECT={select}&$WHERE={where}&$LIMIT={limit}"
print(url)

https://data.cityofchicago.org/resource/v6vf-nfxy.json?$SELECT=SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE&$WHERE=SR_TYPE like '%25Snow%25'&$LIMIT=99999


In [3]:
response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
data = response.json()
print(response)

<Response [200]>
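
The query above hand-encodes the SQL wildcard `%` as `%25` inside the `where` string. As an alternative, the standard library can do the encoding, so the filter can be written in its readable form. This is a minimal sketch, not the notebook's method: `build_query_url` is a hypothetical helper, and the shortened `select` list is only for illustration.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"


def build_query_url(base_url, select, where, limit):
    """Return a query URL with the parameter values percent-encoded."""
    params = {"$SELECT": select, "$WHERE": where, "$LIMIT": limit}
    # Keep "$" and "," readable; everything else unsafe is percent-encoded,
    # so a literal "%" in the LIKE pattern becomes "%25" automatically.
    return f"{base_url}?{urlencode(params, safe='$,', quote_via=quote)}"


url = build_query_url(
    BASE_URL,
    select="SR_NUMBER,SR_TYPE",     # shortened column list for illustration
    where="SR_TYPE like '%Snow%'",  # plain wildcard, no manual %25
    limit=99999,
)
print(url)
```

Passing a `params` dictionary to `requests.get` should have a similar effect, since `requests` encodes parameter values itself.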


### verify the dataset looks okay; expect about 68.5k records across 4 snow request types as of early January 2024

In [4]:
df = pd.DataFrame(data)
df.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,COMMUNITY_AREA,WARD,STATUS,ORIGIN,CLOSED_DATE,LATITUDE,LONGITUDE,SR_TYPE
0,SR21-00149474,SDO,2021-01-27T13:13:50.000,1700 W 15TH ST,28,28,Completed,Mobile Device,2021-01-27T19:50:47.000,41.86145694,-87.668881384,Ice and Snow Removal Request
1,SR21-00177207,SDO,2021-02-01T10:19:34.000,1300 S HEATH AVE,28,28,Completed,Mobile Device,2021-02-01T15:12:41.000,41.864744399,-87.684402435,Ice and Snow Removal Request
2,SR21-00179217,SDO,2021-02-01T13:17:16.000,3242 W FULTON BLVD,27,28,Completed,Mobile Device,2021-02-01T21:10:54.000,41.886675065,-87.707984972,Ice and Snow Removal Request
3,SR21-00269268,SDO,2021-02-17T12:43:47.000,819 S BISHOP ST,28,28,Completed,Mobile Device,2021-02-19T00:58:56.000,41.871081463,-87.662623788,Ice and Snow Removal Request
4,SR20-05442947,SDW,2020-11-11T19:40:31.000,3401 W 53RD ST,63,14,Completed,Mobile Device,2020-11-16T06:50:26.000,41.79690071,-87.708733081,Snow - Object/Dibs Removal Request


In [5]:
len(df)

71909
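
The row count (71,909) is below the request limit (99,999), so a single request returned everything. If the dataset grew past the limit, the result would be cut off without any error. One way to guard against that is to page through results with `$limit` and `$offset`. The sketch below shows only the paging loop: `fetch_all` and `fake_fetch_page` are hypothetical names, and the in-memory list stands in for real API calls.

```python
def fetch_all(fetch_page, page_size=50000):
    """Collect rows page by page until a short page signals the end."""
    rows, offset = [], 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        rows.extend(page)
        if len(page) < page_size:  # last (possibly empty) page reached
            return rows
        offset += page_size


# Hypothetical stand-in for a real request, e.g. one that sends
# "$limit" and "$offset" parameters and returns response.json().
FAKE_ROWS = [{"SR_NUMBER": f"SR-{i}"} for i in range(125)]


def fake_fetch_page(offset, limit):
    return FAKE_ROWS[offset:offset + limit]


rows = fetch_all(fake_fetch_page, page_size=50)
print(len(rows))
```

With a real endpoint, paging is generally only reliable when the query also specifies a stable sort order.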

### read community names

In [6]:
# retrieved on 1/11/24, but 2020 Census Population figures should be static

df_population = pd.read_csv("../data/population_cmap_2022.csv")

In [7]:
# simplify dataframe to get only essentials
df_population = df_population[['GEOID','GEOG']]
df_population = df_population.rename(columns={'GEOG':'COMMUNITY_NAME'})
df_population.head()

Unnamed: 0,GEOID,COMMUNITY_NAME
0,14,Albany Park
1,57,Archer Heights
2,34,Armour Square
3,70,Ashburn
4,71,Auburn Gresham


<a name = "prepare"></a>
# 3. Prepare Data

### parse dates

In [8]:
def get_season(month, year):
    """Return the winter season for a month and year in yyyy-yyyy format.

    month: numeric month of the year (1-12)
    year (int): 4 digit year
    E.g. 2022-2023 means winter 2022-2023, or July 2022 to June 2023.
    """
    if month >= 7:
        return f"{year}-{year + 1}"
    else:
        return f"{year - 1}-{year}"
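
The season boundary falls between June and July, so it is worth checking the months on either side. The snippet below is a small self-contained check; it restates the function in a compact form so it runs on its own, and the sample months are chosen only to illustrate the boundary.

```python
def get_season(month, year):
    """Return the season as yyyy-yyyy; a season runs July through June."""
    if month >= 7:
        return f"{year}-{year + 1}"
    return f"{year - 1}-{year}"


# June still belongs to the season that started the previous July;
# July opens the next season.
examples = [(6, 2023), (7, 2023), (12, 2023), (1, 2024)]
for month, year in examples:
    print(month, year, get_season(month, year))
```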

In [9]:
# format and parse dates
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'])
df['CLOSED_DATE'] = pd.to_datetime(df['CLOSED_DATE'])
df['year'] = df['CREATED_DATE'].dt.year
df['month'] = df['CREATED_DATE'].dt.month
df['date'] = df['CREATED_DATE'].dt.date
df['season'] = df.apply(lambda row: get_season(row['month'], row['year']), axis=1)
df.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,COMMUNITY_AREA,WARD,STATUS,ORIGIN,CLOSED_DATE,LATITUDE,LONGITUDE,SR_TYPE,year,month,date,season
0,SR21-00149474,SDO,2021-01-27 13:13:50,1700 W 15TH ST,28,28,Completed,Mobile Device,2021-01-27 19:50:47,41.86145694,-87.668881384,Ice and Snow Removal Request,2021,1,2021-01-27,2020-2021
1,SR21-00177207,SDO,2021-02-01 10:19:34,1300 S HEATH AVE,28,28,Completed,Mobile Device,2021-02-01 15:12:41,41.864744399,-87.684402435,Ice and Snow Removal Request,2021,2,2021-02-01,2020-2021
2,SR21-00179217,SDO,2021-02-01 13:17:16,3242 W FULTON BLVD,27,28,Completed,Mobile Device,2021-02-01 21:10:54,41.886675065,-87.707984972,Ice and Snow Removal Request,2021,2,2021-02-01,2020-2021
3,SR21-00269268,SDO,2021-02-17 12:43:47,819 S BISHOP ST,28,28,Completed,Mobile Device,2021-02-19 00:58:56,41.871081463,-87.662623788,Ice and Snow Removal Request,2021,2,2021-02-17,2020-2021
4,SR20-05442947,SDW,2020-11-11 19:40:31,3401 W 53RD ST,63,14,Completed,Mobile Device,2020-11-16 06:50:26,41.79690071,-87.708733081,Snow - Object/Dibs Removal Request,2020,11,2020-11-11,2020-2021


In [10]:
# review season assignment
df.groupby('season').size()

season
2018-2019      635
2019-2020    10113
2020-2021    30967
2021-2022    22396
2022-2023     4157
2023-2024     3641
dtype: int64
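
The row-wise `apply` above is fine at this scale. For larger frames the same rule can be expressed on whole columns. This is a sketch of that alternative, not the notebook's method; the first two timestamps come from the sample rows above and the last two are made-up boundary cases.

```python
import pandas as pd

created = pd.to_datetime(pd.Series([
    "2021-01-27T13:13:50.000",
    "2020-11-11T19:40:31.000",
    "2023-06-30T23:59:59.000",  # made-up: last moment of a season
    "2023-07-01T00:00:00.000",  # made-up: first moment of the next season
]))

# The season starts in the calendar year itself for July-December,
# and in the previous calendar year for January-June.
start_year = created.dt.year.where(created.dt.month >= 7, created.dt.year - 1)
season = start_year.astype(str) + "-" + (start_year + 1).astype(str)
print(season.tolist())
```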

### identify records with missing or suburban address

In [11]:
# find records with missing address
len(df[df['STREET_ADDRESS'].isna()])

202

In [12]:
len(df[df['COMMUNITY_AREA'].isna()])

178

In [13]:
len(df[df['WARD'].isna()])

172
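
The three counts sum to 552, but the filter in the next step removes only 71,909 - 71,615 = 294 rows, so many records are missing more than one of these fields. The sketch below shows how the per-column counts and the number of affected rows can be computed together. It uses a made-up three-row frame, not the real data.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "STREET_ADDRESS": ["1700 W 15TH ST", None, "819 S BISHOP ST"],
    "COMMUNITY_AREA": ["28", None, None],
    "WARD": ["28", "14", None],
})
cols = ["STREET_ADDRESS", "COMMUNITY_AREA", "WARD"]

missing_per_column = toy[cols].isna().sum()
# Rows missing at least one of the three fields; overlaps are counted once.
rows_with_any_missing = int(toy[cols].isna().any(axis=1).sum())

print(missing_per_column)
print(rows_with_any_missing)
```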

### remove invalid addresses

In [14]:
df_valid_address = df[df['STREET_ADDRESS'].notna()
                     & df['WARD'].notna()
                     & df['COMMUNITY_AREA'].notna()].copy()  # copy so later assignments do not act on a slice of df
len(df_valid_address)

71615

In [15]:
# then convert object columns to integer
df_valid_address['COMMUNITY_AREA'] = df_valid_address['COMMUNITY_AREA'].astype(int)
df_valid_address['WARD'] = df_valid_address['WARD'].astype(int)


### merge in community names

In [16]:
df_clean = pd.merge(left=df_valid_address, right=df_population, left_on='COMMUNITY_AREA', right_on='GEOID')
df_clean.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,COMMUNITY_AREA,WARD,STATUS,ORIGIN,CLOSED_DATE,LATITUDE,LONGITUDE,SR_TYPE,year,month,date,season,GEOID,COMMUNITY_NAME
0,SR21-00149474,SDO,2021-01-27 13:13:50,1700 W 15TH ST,28,28,Completed,Mobile Device,2021-01-27 19:50:47,41.86145694,-87.668881384,Ice and Snow Removal Request,2021,1,2021-01-27,2020-2021,28,Near West Side
1,SR21-00177207,SDO,2021-02-01 10:19:34,1300 S HEATH AVE,28,28,Completed,Mobile Device,2021-02-01 15:12:41,41.864744399,-87.684402435,Ice and Snow Removal Request,2021,2,2021-02-01,2020-2021,28,Near West Side
2,SR21-00179217,SDO,2021-02-01 13:17:16,3242 W FULTON BLVD,27,28,Completed,Mobile Device,2021-02-01 21:10:54,41.886675065,-87.707984972,Ice and Snow Removal Request,2021,2,2021-02-01,2020-2021,27,East Garfield Park
3,SR21-00269268,SDO,2021-02-17 12:43:47,819 S BISHOP ST,28,28,Completed,Mobile Device,2021-02-19 00:58:56,41.871081463,-87.662623788,Ice and Snow Removal Request,2021,2,2021-02-17,2020-2021,28,Near West Side
4,SR20-05442947,SDW,2020-11-11 19:40:31,3401 W 53RD ST,63,14,Completed,Mobile Device,2020-11-16 06:50:26,41.79690071,-87.708733081,Snow - Object/Dibs Removal Request,2020,11,2020-11-11,2020-2021,63,Gage Park
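
`pd.merge` defaults to an inner join, so any complaint whose `COMMUNITY_AREA` has no matching `GEOID` would be dropped without notice. One way to check for that is a left join with `indicator=True`, plus `validate` to confirm that each `GEOID` appears only once in the lookup table. This is a sketch on toy frames; community area 99 is a made-up value that deliberately has no match.

```python
import pandas as pd

# Toy frames for illustration only
complaints = pd.DataFrame({
    "SR_NUMBER": ["A", "B", "C"],
    "COMMUNITY_AREA": [28, 63, 99],  # 99 is a made-up, unmatched area
})
names = pd.DataFrame({
    "GEOID": [28, 63],
    "COMMUNITY_NAME": ["Near West Side", "Gage Park"],
})

merged = complaints.merge(
    names,
    how="left",              # keep every complaint, matched or not
    left_on="COMMUNITY_AREA",
    right_on="GEOID",
    validate="many_to_one",  # raises if a GEOID is duplicated in names
    indicator=True,          # adds a _merge column describing each row
)
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```

If `unmatched` is empty on the real data, the inner join used above loses nothing.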


<a name = "save"></a>
# 4. Save Data

In [19]:
df_clean.to_csv("../data/02-prepped/311-complaints-snow.csv", index=False)
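
CSV files do not store column types, so the parsed dates come back as plain strings unless the reader is told otherwise. The sketch below round-trips a one-row frame through an in-memory buffer (standing in for the file on disk) to show how `parse_dates` restores the datetime column on reload. The frame holds a subset of the columns from the first sample row above.

```python
import io

import pandas as pd

df_clean = pd.DataFrame({
    "SR_NUMBER": ["SR21-00149474"],
    "CREATED_DATE": pd.to_datetime(["2021-01-27 13:13:50"]),
    "COMMUNITY_AREA": [28],
    "season": ["2020-2021"],
})

buffer = io.StringIO()  # in-memory stand-in for the CSV file
df_clean.to_csv(buffer, index=False)
buffer.seek(0)

# Without parse_dates, CREATED_DATE would be read back as text.
reloaded = pd.read_csv(buffer, parse_dates=["CREATED_DATE"])
print(reloaded.dtypes)
```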