<h1>Get Snow Violations Data</h1>
10 January 2024

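This analysis looks at 311 complaints filed as "Snow – Uncleared Sidewalk Complaint" (short code SWSNOREM). The retrieval step pulls every 311 request whose type contains "Snow", so the saved file also includes the other snow-related request types.<br>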
<br>
My analysis steps:
<ol>
<li><a href="#configure">Import Libraries</a></li>
<li><a href="#get">Get Data</a></li>
<li><a href="#prepare">Prepare Data</a></li>
<li><a href="#save">Save Data</a></li>
</ol>

<a name = "configure"></a>
<h1>1. Import Libraries</h1>

In [94]:
import pandas as pd
import requests
#import datetime as dt #would only need this if I manipulated dates post-API data retrieval

<a name = "get"></a>
<h1>2. Get Data</h1>

In [111]:
base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
select = "SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE"
where = "SR_TYPE like '%25Snow%25'"
limit = 99999

url = f"{base_url}?$SELECT={select}&$WHERE={where}&$LIMIT={limit}"
print(url)

https://data.cityofchicago.org/resource/v6vf-nfxy.json?$SELECT=SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE&$WHERE=SR_TYPE like '%25Snow%25'&$LIMIT=99999
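The URL above is assembled by hand, with the `%` wildcards pre-encoded as `%25` and the spaces left raw. A minimal sketch of letting the standard library do the percent-encoding instead, assuming the same parameter names as the cell above; `build_query_url` is a hypothetical helper, not part of the original notebook, and the shortened `select` list is only for illustration:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"


def build_query_url(base_url, select, where, limit):
    """Hypothetical helper: append percent-encoded query parameters to base_url."""
    params = {"$SELECT": select, "$WHERE": where, "$LIMIT": limit}
    # urlencode escapes '%', quotes and '$', and turns spaces into '+'
    return f"{base_url}?{urlencode(params)}"


# The wildcard is written as a plain '%'; urlencode turns it into '%25'.
url = build_query_url(
    BASE_URL,
    select="SR_NUMBER, SR_TYPE",
    where="SR_TYPE like '%Snow%'",
    limit=99999,
)

# Decode the query string again to confirm the filter survives the round trip.
decoded = parse_qs(urlsplit(url).query)
print(decoded["$WHERE"][0])
```

`requests.get` also accepts a `params` dictionary and encodes it itself, which would avoid building the string at all.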


In [112]:
response = requests.get(url)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error body
data = response.json()
print(response)

<Response [200]>
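The single request above works only while the number of matching rows stays under the `$LIMIT` of 99999. A minimal sketch of paging through results instead, assuming the endpoint honours an `$OFFSET` parameter alongside `$LIMIT`; `fetch_all` and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for a real `requests.get` call so the loop can run offline:

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def fetch_all(fetch_page: Callable[[int, int], List[Record]], page_size: int) -> List[Record]:
    """Call fetch_page(offset, limit) repeatedly until a short page signals the end."""
    records: List[Record] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:  # fewer rows than requested: nothing left to fetch
            return records
        offset += page_size


def fake_fetch_page(offset: int, limit: int) -> List[Record]:
    """Stand-in for a real request; serves slices of seven made-up rows."""
    fake_rows = [{"SR_NUMBER": f"SR-{i}"} for i in range(7)]
    return fake_rows[offset:offset + limit]


all_rows = fetch_all(fake_fetch_page, page_size=3)
print(len(all_rows))
```

A real `fetch_page` would send the offset and limit as query parameters and return `response.json()`. Paged queries generally also need a stable sort order so that rows are not skipped or repeated between pages.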


### verify that the dataset looks okay: expect about 68.5k records across four snow request types as of early January 2024

In [113]:
df = pd.DataFrame(data)
df.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,STATUS,ORIGIN,CLOSED_DATE,SR_TYPE,COMMUNITY_AREA,WARD,LATITUDE,LONGITUDE
0,SR19-00102142,SDW,2019-01-22T17:47:53.000,23 S Drake AVE,Completed,Internet,2019-03-15T07:25:34.000,Snow - Object/Dibs Removal Request,,,,
1,SR22-00050149,SDO,2022-01-10T11:05:32.000,4151 W WASHINGTON BLVD,Completed,Mobile Device,2022-01-10T11:06:41.000,Ice and Snow Removal Request,26.0,28.0,41.881691809,-87.730165409
2,SR19-00123488,SDW,2019-01-27T16:13:36.000,2320 N Luna AVE,Completed,Internet,2019-03-15T07:22:15.000,Snow - Object/Dibs Removal Request,,,,
3,SR23-01930406,SDO,2023-11-06T21:46:58.000,1410 E 62ND ST,Completed,Internet,2023-11-09T18:25:10.000,Ice and Snow Removal Request,42.0,20.0,41.782635001,-87.5906235
4,SR23-01979807,SDW,2023-11-14T14:38:50.000,1421 N MENARD AVE,Completed,Mobile Device,2023-11-27T11:56:32.000,Snow - Object/Dibs Removal Request,25.0,29.0,41.906376001,-87.770403


In [114]:
len(df)

68554

<a name = "prepare"></a>
# 3. Prepare Data

### parse dates

In [115]:
def get_season(month, year):
    """Return the winter season that a month and year fall in.

    Input:
      month: numeric month of the year (1-12)
      year (int): 4 digit year
    Returns:
      season in yy-yy format. E.g. 22-23 means winter 2022-2023, or July 2022 to June 2023.
    """
    if month >= 7:
        return f"{str(year)[-2:]}-{str(year + 1)[-2:]}"
    else:
        return f"{str(year - 1)[-2:]}-{str(year)[-2:]}"
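The July cutoff is easy to get off by one, so it is worth spot-checking the boundary dates. A small sketch of the same rule written against a `datetime.date`, with a few checks; `season_of` is a hypothetical variant for illustration, not the function used in the rest of this notebook:

```python
from datetime import date


def season_of(day: date) -> str:
    """Return the yy-yy season for a date, with seasons running July through June."""
    # Dates from July onward belong to the season that starts in their own year.
    start_year = day.year if day.month >= 7 else day.year - 1
    return f"{start_year % 100:02d}-{(start_year + 1) % 100:02d}"


# Boundary checks: the last day of one season and the first day of the next.
print(season_of(date(2023, 6, 30)))
print(season_of(date(2023, 7, 1)))
```

Using `% 100` with a zero-padded format gives the same two-digit years as slicing the string form of a four-digit year.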

In [116]:
# format and parse dates
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'])
df['CLOSED_DATE'] = pd.to_datetime(df['CLOSED_DATE'])
df['year'] = df['CREATED_DATE'].dt.year
df['month'] = df['CREATED_DATE'].dt.month
df['date'] = df['CREATED_DATE'].dt.date
df['season'] = df.apply(lambda row: get_season(row['month'], row['year']), axis=1)
df.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,STATUS,ORIGIN,CLOSED_DATE,SR_TYPE,COMMUNITY_AREA,WARD,LATITUDE,LONGITUDE,year,month,date,season
0,SR19-00102142,SDW,2019-01-22 17:47:53,23 S Drake AVE,Completed,Internet,2019-03-15 07:25:34,Snow - Object/Dibs Removal Request,,,,,2019,1,2019-01-22,18-19
1,SR22-00050149,SDO,2022-01-10 11:05:32,4151 W WASHINGTON BLVD,Completed,Mobile Device,2022-01-10 11:06:41,Ice and Snow Removal Request,26.0,28.0,41.881691809,-87.730165409,2022,1,2022-01-10,21-22
2,SR19-00123488,SDW,2019-01-27 16:13:36,2320 N Luna AVE,Completed,Internet,2019-03-15 07:22:15,Snow - Object/Dibs Removal Request,,,,,2019,1,2019-01-27,18-19
3,SR23-01930406,SDO,2023-11-06 21:46:58,1410 E 62ND ST,Completed,Internet,2023-11-09 18:25:10,Ice and Snow Removal Request,42.0,20.0,41.782635001,-87.5906235,2023,11,2023-11-06,23-24
4,SR23-01979807,SDW,2023-11-14 14:38:50,1421 N MENARD AVE,Completed,Mobile Device,2023-11-27 11:56:32,Snow - Object/Dibs Removal Request,25.0,29.0,41.906376001,-87.770403,2023,11,2023-11-14,23-24


In [117]:
# review season assignment
df.groupby('season').size()

season
18-19      635
19-20    10113
20-21    30967
21-22    22396
22-23     4157
23-24      286
dtype: int64

### identify records with missing or suburban address

In [118]:
# find records with missing address
len(df[df['STREET_ADDRESS'].isna()])

202

In [119]:
# find records with missing community area
len(df[df['COMMUNITY_AREA'].isna()])

175

In [120]:
# find records with missing ward
len(df[df['WARD'].isna()])

169
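The three counts above overlap only partly: filtering on all three columns takes the data from 68554 to 68263 rows, a drop of 291, which is more than any single column's count (202 at most) but less than their sum (546). A minimal sketch of counting both views in one pass, using a made-up toy frame in place of `df`:

```python
import pandas as pd

# Toy frame standing in for df; the values are invented for illustration.
toy = pd.DataFrame({
    "STREET_ADDRESS": ["23 S Drake AVE", None, "1410 E 62ND ST", None],
    "COMMUNITY_AREA": [None, None, "42", "25"],
    "WARD": [None, "28", "20", "29"],
})
cols = ["STREET_ADDRESS", "COMMUNITY_AREA", "WARD"]

# Missing values per column, computed in one call.
missing_per_column = toy[cols].isna().sum()

# Rows that would be dropped: at least one of the three columns is missing.
rows_with_any_missing = int(toy[cols].isna().any(axis=1).sum())

print(missing_per_column)
print(rows_with_any_missing)
```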

### remove invalid addresses

In [121]:
df_valid_address = df[df['STREET_ADDRESS'].notna()
                     & df['WARD'].notna()
                     & df['COMMUNITY_AREA'].notna()].copy()  # copy so the casts in the next cell do not write to a slice of df
len(df_valid_address)

68263

In [122]:
# then correct object types to integer
df_valid_address['COMMUNITY_AREA'] = df_valid_address['COMMUNITY_AREA'].astype(int)
df_valid_address['WARD'] = df_valid_address['WARD'].astype(int)


### read community area names

In [123]:
# retrieved on 1/10/24
# from https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6

df_communities = pd.read_csv("../data/CommAreas.csv")

In [124]:
# simplify dataframe to get only essentials
df_communities = df_communities[['AREA_NUMBE','COMMUNITY']]
df_communities = df_communities.rename(columns={'COMMUNITY':'COMMUNITY_NAME'})
df_communities.head()

Unnamed: 0,AREA_NUMBE,COMMUNITY_NAME
0,35,DOUGLAS
1,36,OAKLAND
2,37,FULLER PARK
3,38,GRAND BOULEVARD
4,39,KENWOOD


### merge in community names

In [125]:
df_clean = pd.merge(left=df_valid_address,right=df_communities,left_on='COMMUNITY_AREA',right_on='AREA_NUMBE')
df_clean.head()

Unnamed: 0,SR_NUMBER,SR_SHORT_CODE,CREATED_DATE,STREET_ADDRESS,STATUS,ORIGIN,CLOSED_DATE,SR_TYPE,COMMUNITY_AREA,WARD,LATITUDE,LONGITUDE,year,month,date,season,AREA_NUMBE,COMMUNITY_NAME
0,SR22-00050149,SDO,2022-01-10 11:05:32,4151 W WASHINGTON BLVD,Completed,Mobile Device,2022-01-10 11:06:41,Ice and Snow Removal Request,26,28,41.881691809,-87.730165409,2022,1,2022-01-10,21-22,26,WEST GARFIELD PARK
1,SR21-00179282,SDO,2021-02-01 13:23:33,4100 W CARROLL AVE,Completed,Mobile Device,2021-02-01 21:10:54,Ice and Snow Removal Request,26,28,41.886832912,-87.728423604,2021,2,2021-02-01,20-21,26,WEST GARFIELD PARK
2,SR24-00033667,SWSNOREM,2024-01-06 09:47:03,4015 W ARTHINGTON ST,Completed,Phone Call,2024-01-08 09:57:27,Snow – Uncleared Sidewalk Complaint,26,24,41.869818001,-87.7260465,2024,1,2024-01-06,23-24,26,WEST GARFIELD PARK
3,SR21-00149624,SDO,2021-01-27 13:28:47,4500 W MAYPOLE AVE,Completed,Mobile Device,2021-01-27 19:18:47,Ice and Snow Removal Request,26,28,41.884790069,-87.738237328,2021,1,2021-01-27,20-21,26,WEST GARFIELD PARK
4,SR21-00194056,SDO,2021-02-03 14:07:31,212 N KENNETH AVE,Completed,Mobile Device,2021-02-03 15:56:32,Ice and Snow Removal Request,26,28,41.8839303,-87.737340893,2021,2,2021-02-03,20-21,26,WEST GARFIELD PARK
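`pd.merge` defaults to an inner join, so any request whose `COMMUNITY_AREA` has no match in the lookup table is dropped without a message. A minimal sketch of one way to surface such rows, using toy frames in place of `df_valid_address` and `df_communities`; the area number 99 is made up to force a mismatch:

```python
import pandas as pd

# Toy stand-ins for df_valid_address and df_communities.
requests_toy = pd.DataFrame({
    "SR_NUMBER": ["A", "B", "C"],
    "COMMUNITY_AREA": [26, 35, 99],  # 99 is deliberately absent from the lookup
})
areas_toy = pd.DataFrame({
    "AREA_NUMBE": [26, 35],
    "COMMUNITY_NAME": ["WEST GARFIELD PARK", "DOUGLAS"],
})

merged = pd.merge(
    left=requests_toy,
    right=areas_toy,
    left_on="COMMUNITY_AREA",
    right_on="AREA_NUMBE",
    how="left",              # keep every request, matched or not
    validate="many_to_one",  # raise if an area number repeats in the lookup
    indicator=True,          # add a _merge column recording where each row came from
)

unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```

Comparing `len(merged)` with the row count before the merge then shows whether anything was lost.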


<a name = "save"></a>
# 4. Save Data

In [126]:
df_clean.to_csv("../data/311-snow-violations.csv", index=False)
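CSV stores the parsed dates as plain text, so a notebook that reads this file back needs to parse them again. A minimal sketch of the round trip, using an in-memory buffer and a one-row toy frame rather than the real file:

```python
import io

import pandas as pd

# One-row toy frame with the same date columns as df_clean.
toy = pd.DataFrame({
    "SR_NUMBER": ["SR22-00050149"],
    "CREATED_DATE": pd.to_datetime(["2022-01-10 11:05:32"]),
    "CLOSED_DATE": pd.to_datetime(["2022-01-10 11:06:41"]),
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Reading without parse_dates leaves the date columns as text.
buffer.seek(0)
as_text = pd.read_csv(buffer)

# Passing parse_dates restores datetime columns.
buffer.seek(0)
as_dates = pd.read_csv(buffer, parse_dates=["CREATED_DATE", "CLOSED_DATE"])

print(as_text["CREATED_DATE"].dtype)
print(as_dates["CREATED_DATE"].dtype)
```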