<h1>Get Snow Violations Data</h1>
10 January 2024

This analysis looks at 311 complaints filed as "Snow – Uncleared Sidewalk Complaint," or SWSNOREM.<br>
<br>
My analysis steps:
<ol>
<li><a href="#configure">Import Libraries</a></li>
<li><a href="#get">Get Data</a></li>
<li><a href="#prepare">Prepare Data</a></li>
<li><a href="#save">Save Data</a></li>
</ol>

<a name = "configure"></a>
<h1>1. Import Libraries</h1>

In [1]:
import pandas as pd
import requests
# import datetime as dt  # only needed if dates are manipulated after the API data retrieval

<a name = "get"></a>
<h1>2. Get Data</h1>

In [2]:
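# Chicago Data Portal endpoint queried with SoQL parameters
base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
# columns to retrieve
select = "SR_NUMBER, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE"
# keep only uncleared-sidewalk snow complaints
where = "SR_SHORT_CODE='SWSNOREM'"
# must exceed the number of matching records, or the result is cut off
limit = 99999

url = f"{base_url}?$SELECT={select}&$WHERE={where}&$LIMIT={limit}"
print(url)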

https://data.cityofchicago.org/resource/v6vf-nfxy.json?$SELECT=SR_NUMBER, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE&$WHERE=SR_SHORT_CODE='SWSNOREM'&$LIMIT=99999
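The printed URL contains raw spaces and quotes, which `requests` percent-encodes when it sends the request. A minimal sketch of doing that encoding explicitly with the standard library follows. The `params` dictionary and the use of `urllib.parse.urlencode` are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"

# Same query values as the cell above, held in a dict (an assumption for illustration).
params = {
    "$SELECT": "SR_NUMBER, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE",
    "$WHERE": "SR_SHORT_CODE='SWSNOREM'",
    "$LIMIT": 99999,
}

# quote_via=quote encodes spaces as %20; safe="$" leaves the "$" prefix unescaped.
encoded_url = f"{base_url}?{urlencode(params, safe='$', quote_via=quote)}"
print(encoded_url)

# Decoding the query string recovers the original values.
decoded = parse_qs(urlsplit(encoded_url).query)
```

An encoded URL is easier to paste into a browser or a log without the spaces breaking it.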


In [3]:
response = requests.get(url)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error body
data = response.json()
print(response)

<Response [200]>


### Verify the dataset looks okay. Expect about 21.5k records as of early January 2024.

In [4]:
df = pd.DataFrame(data)
df.head()

Unnamed: 0,SR_NUMBER,CREATED_DATE,STATUS,ORIGIN,CLOSED_DATE,STREET_ADDRESS,COMMUNITY_AREA,WARD
0,SR19-02929866,2019-11-12T09:31:20.000,Completed,Internet,2019-11-21T09:15:03.000,,,
1,SR20-05649092,2020-12-17T13:24:37.000,Completed,Phone Call,2020-12-18T15:12:51.000,5200 S BLACKSTONE AVE,41.0,4.0
2,SR21-00001364,2021-01-01T11:40:33.000,Completed,Mobile Device,2021-01-04T14:33:44.000,66 E CHESTNUT ST,8.0,42.0
3,SR21-00000774,2021-01-01T09:11:15.000,Completed,Internet,2021-01-06T13:37:58.000,2648 N WHIPPLE ST,22.0,32.0
4,SR20-05723378,2020-12-31T07:19:37.000,Completed,Mobile Device,2021-01-08T15:30:57.000,5015 N SPRINGFIELD AVE,14.0,39.0


In [5]:
len(df)

21547
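The count of 21,547 is well under the `limit` of 99999, so one request returned everything. If the dataset grew past that limit, the result would be cut off without an error. One way to guard against that is to page through the results. The sketch below assumes the SoQL `$limit`, `$offset`, and `$order` parameters; `fetch_all` and `socrata_page` are hypothetical helper names, not part of the original notebook.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int, int], List[Dict]], page_size: int = 1000) -> List[Dict]:
    """Collect rows page by page until a short (or empty) page signals the end."""
    rows: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return rows


def socrata_page(offset: int, limit: int) -> List[Dict]:
    """Hypothetical page fetcher for the same query as the cells above."""
    import requests  # imported here so that defining the helper has no side effects

    params = {
        "$select": "SR_NUMBER, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE",
        "$where": "SR_SHORT_CODE='SWSNOREM'",
        "$order": "SR_NUMBER",  # a fixed sort order keeps pages from overlapping
        "$limit": limit,
        "$offset": offset,
    }
    response = requests.get(
        "https://data.cityofchicago.org/resource/v6vf-nfxy.json", params=params, timeout=30
    )
    response.raise_for_status()
    return response.json()
```

With these helpers, `pd.DataFrame(fetch_all(socrata_page))` would stand in for cells 3 and 4.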

<a name = "prepare"></a>
# 3. Prepare Data

In [6]:
def get_season(month, year):
    """Return the winter season label for a given month and year.

    Args:
        month (int): numeric month of the year (1-12)
        year (int): 4-digit year

    Returns:
        str: season in yy-yy format. E.g. 22-23 means winter 2022-2023,
        or July 2022 to June 2023.
    """
    if month >= 7:
        return f"{str(year)[-2:]}-{str(year + 1)[-2:]}"
    else:
        return f"{str(year - 1)[-2:]}-{str(year)[-2:]}"

In [7]:
# format and parse dates
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'])
df['CLOSED_DATE'] = pd.to_datetime(df['CLOSED_DATE'])
df['year'] = df['CREATED_DATE'].dt.year
df['month'] = df['CREATED_DATE'].dt.month
df['date'] = df['CREATED_DATE'].dt.date
df['season'] = df.apply(lambda row: get_season(row['month'], row['year']), axis=1)
df.head()

Unnamed: 0,SR_NUMBER,CREATED_DATE,STATUS,ORIGIN,CLOSED_DATE,STREET_ADDRESS,COMMUNITY_AREA,WARD,year,month,date,season
0,SR19-02929866,2019-11-12 09:31:20,Completed,Internet,2019-11-21 09:15:03,,,,2019,11,2019-11-12,19-20
1,SR20-05649092,2020-12-17 13:24:37,Completed,Phone Call,2020-12-18 15:12:51,5200 S BLACKSTONE AVE,41.0,4.0,2020,12,2020-12-17,20-21
2,SR21-00001364,2021-01-01 11:40:33,Completed,Mobile Device,2021-01-04 14:33:44,66 E CHESTNUT ST,8.0,42.0,2021,1,2021-01-01,20-21
3,SR21-00000774,2021-01-01 09:11:15,Completed,Internet,2021-01-06 13:37:58,2648 N WHIPPLE ST,22.0,32.0,2021,1,2021-01-01,20-21
4,SR20-05723378,2020-12-31 07:19:37,Completed,Mobile Device,2021-01-08 15:30:57,5015 N SPRINGFIELD AVE,14.0,39.0,2020,12,2020-12-31,20-21


In [8]:
# review season assignment
df.groupby('season').size()

season
18-19     378
19-20    6543
20-21    6533
21-22    6426
22-23    1628
23-24      39
dtype: int64
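The season column above is built with a row-wise `apply`, which calls `get_season` once per record. A vectorized equivalent is sketched below. `season_labels` is a hypothetical name, and the sketch assumes the date column has no missing values.

```python
import pandas as pd


def season_labels(dates: pd.Series) -> pd.Series:
    """Label each date with its July-to-June season in yy-yy format."""
    # January through June belong to the season that started the previous year.
    start_year = dates.dt.year - (dates.dt.month < 7).astype(int)
    start = (start_year % 100).astype(str).str.zfill(2)
    end = ((start_year + 1) % 100).astype(str).str.zfill(2)
    return start + "-" + end


# Dates on both sides of the July 1 boundary.
sample_dates = pd.to_datetime(
    pd.Series(["2019-11-12", "2021-01-01", "2022-06-30", "2022-07-01"])
)
labels = season_labels(sample_dates)
print(labels.tolist())
```

In the notebook this would be called as `df['season'] = season_labels(df['CREATED_DATE'])`.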

### Remove records with missing address

In [24]:
# count records with a missing address
len(df[df['STREET_ADDRESS'].isna()])

25

In [27]:
df_valid_address = df[df['STREET_ADDRESS'].notna()]
len(df_valid_address)

21522
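The same filter can be written with `dropna`, which names the required column directly. The sketch below uses a three-row sample taken from the `df.head()` output above; the `sample`, `kept`, and `dropped` names are illustrative only.

```python
import pandas as pd

# Three rows from the head() output above; the first has no street address.
sample = pd.DataFrame(
    {
        "SR_NUMBER": ["SR19-02929866", "SR20-05649092", "SR21-00001364"],
        "STREET_ADDRESS": [None, "5200 S BLACKSTONE AVE", "66 E CHESTNUT ST"],
    }
)

# Drop rows only when STREET_ADDRESS is missing; gaps in other columns are ignored.
kept = sample.dropna(subset=["STREET_ADDRESS"])
dropped = len(sample) - len(kept)
print(f"dropped {dropped} of {len(sample)} rows")
```

Comparing the row counts before and after, as the cells above do, is a quick check that only the expected 25 records were removed.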

<a name = "save"></a>
# 4. Save Data

In [28]:
df_valid_address.to_csv("../data/311-snow-violations.csv", index=False)
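Two details matter when saving: `to_csv` raises an error if the target directory does not exist, and datetimes are written as plain text, so a later `read_csv` returns strings unless told to parse them. The sketch below shows one way to handle both. It writes to a temporary directory rather than `../data`, and the one-row `sample` frame is taken from the output above for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

sample = pd.DataFrame(
    {
        "SR_NUMBER": ["SR20-05649092"],
        "CREATED_DATE": pd.to_datetime(["2020-12-17T13:24:37.000"]),
        "CLOSED_DATE": pd.to_datetime(["2020-12-18T15:12:51.000"]),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "data" / "311-snow-violations.csv"
    # Create the output directory first so to_csv has somewhere to write.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    sample.to_csv(out_path, index=False)

    # parse_dates restores the datetime dtype when the file is read back.
    restored = pd.read_csv(out_path, parse_dates=["CREATED_DATE", "CLOSED_DATE"])

print(restored.dtypes)
```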