<h1>Get Snow Violations Data</h1>
10 January 2024

This analysis looks at 311 complaints filed as "Snow – Uncleared Sidewalk Complaint" (short code SWSNOREM).<br>
<br>
My analysis steps:
<ol>
<li><a href="#get">Get Data</a></li>
<li><a href="#prepare">Preprocess Data</a></li>
<li><a href="#save">Save Data</a></li>
</ol>

### Data Sources
<strong>Snow violations:</strong> <a href="https://data.cityofchicago.org/resource/v6vf-nfxy.json">311 Service Requests</a><br>
<!-- <strong>Community names:</strong> <a href="https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6">Boundaries- Community Areas</a><br> -->
<strong>Community names and population:</strong> <a href="https://datahub.cmap.illinois.gov/datasets/CMAPGIS::community-data-snapshots-raw-data-2014-2022/explore?layer=21">Chicago Community Areas 2022</a> (CMAP)<br>

<a name = "get"></a>
    <h1>1. Get Data via API</h1>

In [1]:
import pandas as pd
import requests
#import datetime as dt #would only need this if I manipulated dates post-API data retrieval

In [2]:
base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
select = "SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE"
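where = "SR_TYPE like '%25Snow%25'"  # %25 is a URL-encoded %, the SoQL wildcard
limit = 99999  # must exceed the expected record count, or results are cut off at this number

url = f"{base_url}?$SELECT={select}&$WHERE={where}&$LIMIT={limit}"
print(url)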
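
The query string above mixes a hand-encoded wildcard (`%25`) with unencoded spaces. A minimal sketch of an alternative, assuming the same endpoint and parameter names, writes the clauses in plain form and lets the standard library do all of the encoding. The column list is shortened only for illustration.

```python
from urllib.parse import urlencode, quote

base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"

# Write the SoQL clauses in plain form; the wildcard is a literal %
params = {
    "$SELECT": "SR_NUMBER, SR_SHORT_CODE, SR_TYPE",  # shortened for illustration
    "$WHERE": "SR_TYPE like '%Snow%'",
    "$LIMIT": 99999,
}

# quote_via=quote encodes spaces as %20 and the wildcard % as %25
query = urlencode(params, quote_via=quote)
encoded_url = f"{base_url}?{query}"
print(encoded_url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` should give a similarly encoded request without building the string by hand.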

In [3]:
response = requests.get(url)
response.raise_for_status()  # stop here on an HTTP error instead of parsing an error payload
data = response.json()
print(response)
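
A single request with `limit = 99999` works only while the dataset stays below that size. One way to remove that ceiling is to page through the results. The sketch below assumes the API accepts an offset alongside the limit (SoQL documents `$offset` for this purpose). `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real request so the example runs offline.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect records page by page until a short page signals the end."""
    records = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:
            break  # fewer rows than requested: no more pages
        offset += page_size
    return records


# Stand-in for a real request: slices a local list of fake rows
fake_rows = [{"SR_NUMBER": i} for i in range(2500)]

def fake_fetch(offset, limit):
    return fake_rows[offset:offset + limit]


all_rows = fetch_all(fake_fetch, page_size=1000)
print(len(all_rows))
```

In a real version, `fetch_page` would issue a request with `$LIMIT` and `$OFFSET` set, ideally with a stable sort order so that pages do not overlap.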

### Verify the dataset
Expect about 68.5k records across four snow-related request types as of early January 2024.

In [4]:
df = pd.DataFrame(data)
df.head()

In [5]:
len(df)
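
Besides the total row count, the number of distinct request types can be checked directly. This is a rough sketch on toy data: only the SWSNOREM pairing comes from this notebook, and the second code and type are hypothetical placeholders.

```python
import pandas as pd

# Toy records; the second code/type pair is a made-up placeholder
toy = pd.DataFrame({
    "SR_SHORT_CODE": ["SWSNOREM", "SWSNOREM", "HYPOTHETICAL"],
    "SR_TYPE": [
        "Snow – Uncleared Sidewalk Complaint",
        "Snow – Uncleared Sidewalk Complaint",
        "Snow – Placeholder Type",
    ],
})

# Rows per (short code, type) pair, plus the number of distinct types
type_counts = toy.groupby(["SR_SHORT_CODE", "SR_TYPE"]).size()
n_types = toy["SR_TYPE"].nunique()
print(type_counts)
print(n_types)
```

Run against `df`, the same two lines would show whether the expected four types are present and how the records split among them.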

### Read community names
Get only the community names for now. The Census population data is left for the analysis steps.

In [6]:
# retrieved on 1/11/24, but 2020 Census Population figures should be static
df_population = pd.read_csv("../data/01-raw/population_cmap_2022.csv")

In [7]:
# simplify dataframe to get only essentials
df_population = df_population[['GEOID','GEOG']]
df_population = df_population.rename(columns={'GEOG':'COMMUNITY_NAME'})
df_population.head()

<a name = "prepare"></a>
# 2. Preprocess Data

### parse dates

In [8]:
def get_season(month, year):
    """Return the winter season label for a given month and year.

    Args:
        month (int): numeric month of the year (1-12)
        year (int): 4-digit year

    Returns:
        str: season in yyyy-yyyy format. E.g. 2022-2023 means winter
        2022-2023, or July 2022 to June 2023.
    """
    if month >= 7:
        return f"{year}-{year + 1}"
    return f"{year - 1}-{year}"
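
The July cutoff is easiest to confirm at its boundaries. The sketch below restates the function in compact form so it runs on its own, then checks the months on either side of the cutoff.

```python
def get_season(month, year):
    # Same rule as above: July through December start a new season
    if month >= 7:
        return f"{year}-{year + 1}"
    return f"{year - 1}-{year}"


# June closes a season; July opens the next one
june = get_season(6, 2023)
july = get_season(7, 2023)
january = get_season(1, 2023)
print(june, july, january)
```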

In [9]:
# format and parse dates
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'])
df['CLOSED_DATE'] = pd.to_datetime(df['CLOSED_DATE'])
df['year'] = df['CREATED_DATE'].dt.year
df['month'] = df['CREATED_DATE'].dt.month
df['date'] = df['CREATED_DATE'].dt.date
df['season'] = df.apply(lambda row: get_season(row['month'], row['year']), axis=1)
df.head()
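
Row-wise `apply` is fine at this data size. For larger tables, the same season rule can be expressed with column arithmetic. This is a sketch on toy dates, not a change to the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    "CREATED_DATE": pd.to_datetime(
        ["2022-07-01", "2023-01-15", "2023-06-30", "2023-12-05"]
    )
})

# The season starts in the previous year for January through June
is_first_half = toy["CREATED_DATE"].dt.month < 7
start_year = toy["CREATED_DATE"].dt.year - is_first_half.astype(int)
toy["season"] = start_year.astype(str) + "-" + (start_year + 1).astype(str)
print(toy)
```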

In [10]:
# review season assignment
df.groupby('season').size()

### identify records with missing or suburban address

In [11]:
# find records with missing address
len(df[df['STREET_ADDRESS'].isna()])

In [12]:
len(df[df['COMMUNITY_AREA'].isna()])

In [13]:
len(df[df['WARD'].isna()])
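
The three counts can also be produced in one step by summing `isna()` over the columns of interest. A small sketch on toy rows, with made-up addresses:

```python
import pandas as pd

# Toy rows with made-up addresses; None marks a missing value
toy = pd.DataFrame({
    "STREET_ADDRESS": ["100 N EXAMPLE ST", None, "200 S EXAMPLE AVE"],
    "COMMUNITY_AREA": ["32", None, None],
    "WARD": ["42", None, "3"],
})

# One missing-value count per column
missing = toy[["STREET_ADDRESS", "COMMUNITY_AREA", "WARD"]].isna().sum()
print(missing)
```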

### remove invalid addresses

In [14]:
# copy the filtered rows so later column assignments do not trigger SettingWithCopyWarning
df_valid_address = df[df['STREET_ADDRESS'].notna()
                      & df['WARD'].notna()
                      & df['COMMUNITY_AREA'].notna()].copy()
len(df_valid_address)

In [15]:
# then correct object types to integer
df_valid_address['COMMUNITY_AREA'] = df_valid_address['COMMUNITY_AREA'].astype(int)
df_valid_address['WARD'] = df_valid_address['WARD'].astype(int)
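
`astype(int)` raises an error if any value is missing or non-numeric, which is why the invalid rows are removed first. When rows with gaps need to be kept, one option is pandas' nullable integer type. This is a sketch on toy values, not a replacement for the cell above.

```python
import pandas as pd

# Toy ward values as they might arrive from JSON: strings, with one gap
toy = pd.DataFrame({"WARD": ["12", "7", None]})

# Coerce to numbers, then to the nullable Int64 dtype so the gap stays <NA>
toy["WARD"] = pd.to_numeric(toy["WARD"], errors="coerce").astype("Int64")
print(toy["WARD"])
```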

### merge in community names

In [16]:
df_clean = pd.merge(left=df_valid_address,right=df_population,left_on='COMMUNITY_AREA',right_on='GEOID')
df_clean.head()
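
The default inner merge silently drops complaints whose community area has no match in the lookup table. To see how many rows that would affect, one approach is a left merge with an indicator column. The toy data below is illustrative only, and area 99 is a deliberately invalid value.

```python
import pandas as pd

# Toy complaints; area 99 has no entry in the lookup table
complaints = pd.DataFrame({"COMMUNITY_AREA": [1, 2, 2, 99]})
names = pd.DataFrame({
    "GEOID": [1, 2, 3],
    "COMMUNITY_NAME": ["Rogers Park", "West Ridge", "Uptown"],
})

merged = pd.merge(
    left=complaints,
    right=names,
    left_on="COMMUNITY_AREA",
    right_on="GEOID",
    how="left",              # keep every complaint, matched or not
    validate="many_to_one",  # fail if the lookup has duplicate GEOIDs
    indicator=True,          # adds a _merge column: both / left_only
)

unmatched = int((merged["_merge"] == "left_only").sum())
print(unmatched)
```

A count of zero on the real data would confirm that the inner merge loses nothing.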

In [17]:
# for convenience, get a dataset of just the unshoveled-snow complaints
df_unshoveled = df_clean[df_clean['SR_SHORT_CODE']=='SWSNOREM']
len(df_unshoveled)

<a name = "save"></a>
# 3. Export Data

In [18]:
df_clean.to_csv("../../data/01-tidied/311-complaints-snow-all-types.csv", index=False)
df_unshoveled.to_csv("../../data/01-tidied/311-complaints-unshoveled.csv", index=False)