<h1>Snow Clearance Fines, 2019-2023</h1>
23 January 2024

This notebook cleans FOIA-requested data provided by Chicago's Department of Administrative Hearings. Although the data covers the full 2018-2019 season, I look only at the 2019-2020 through 2022-2023 seasons to align with the data available for 311 complaints.<br>
<br>
My analysis steps:
<ol>
<li><a href="#read">Read Data</a>
<li><a href="#prepare">Prepare Data for Analysis</a>
<li><a href="#correct">Correct Bad Addresses</a>
<li><a href="#geocode">Geocode Data</a>
</ol>

### Summary of Data Preparation

The dataset provided by Chicago's Department of Administrative Hearings (H064920-011124.xlsx) contained 3058 records dated from 1/1/2001 to 9/12/2023.

<strong>Removed 498 records outside the date range</strong><br>
From this dataset I removed:
<ul>
<li>2 records with missing dates
<li>19 records between 1/1/2001 and 2/13/2018
<li>2 records after 6/30/2023
<li>475 records from winter 2018-2019
</ul>
    
That left 2560 records.

<strong>Corrected 117 addresses to enable geocoding</strong><br>
Of the 2560 remaining records:
<ul>
<li>117 could not initially be geocoded due to typos or ambiguities in addresses (e.g. Belmlont, Lawerence, or North instead of North Ave)
<li>4 could not be geocoded because the address was listed as "UNKNOWN"
</ul>
    
<strong>Other data preparation</strong>
<ul>
<li>parsed dates to month and year
<li>calculated season for all dates (e.g. 2019-2020 covers July 1 2019 to June 30 2020)
</ul>
    
<strong>Geocoding</strong><br>
In QGIS, outside of this notebook, I geocoded each address to a latitude and longitude, then mapped those coordinates to Chicago community areas using a spatial join.

<a name="read"></a>
# 1. Read Data

In [19]:
import pandas as pd
import requests
#import datetime as dt #would only need this if I manipulated dates post-API data retrieval

In [20]:
df = pd.read_csv("../data/fines-2001-2023.csv")
df.head()

Unnamed: 0,Docket Number,Violation Date,Violation Address,Issuing Department Code,Imposed Fine Detailed
0,18CP093079,11/27/2018,5045 N TRIPP,POLICE,$0.00
1,18DS11011L,01/05/2018,741 W 123RD ST,STRTSAN,$0.00
2,18DS16572L,02/08/2018,5033 S CICERO AVE,STRTSAN,$50.00
3,18DS23469L,02/13/2018,4100 W MADISON ST,STRTSAN,$50.00
4,18DT002526,11/30/2018,3401 W DIVERSEY,TRANPORT,$110.00


In [21]:
len(df)

3058

<a name="prepare"></a>
# 2. Prepare Data
The dataset provided by Chicago's Department of Administrative Hearings (H064920-011124.xlsx) contained 3058 records dated from 1/1/2001 to 9/12/2023.

From this dataset I removed:
<ul>
<li>2 records with missing dates
<li>19 records between 1/1/2001 and 2/13/2018
<li>2 records after 6/30/2023
<li>475 records from winter 2018-2019
</ul>
    
This leaves 2560 records.

### remove records with missing violation dates

In [22]:
# find records with a missing Violation Date (the column is still text here, so missing values are NaN)
df[df['Violation Date'].isna()]

Unnamed: 0,Docket Number,Violation Date,Violation Address,Issuing Department Code,Imposed Fine Detailed
1143,21DT00019A,,1601 E 95TH ST,TRANPORT,$0.00
1144,21DT00019A,,1601 E 95TH ST,TRANPORT,$150.00


In [23]:
# copy so the column assignments below act on an independent frame, not a slice
df = df[df['Violation Date'].notna()].copy()
len(df)

3056

### parse dates

In [24]:
def get_season(month, year):
    """Return the season in yyyy-yyyy format.

    month: numeric month of the year (1-12)
    year (int): 4 digit year

    E.g. 2022-2023 means winter 2022-2023, or July 2022 to June 2023.
    """
    if month >= 7:
        return f"{year}-{year + 1}"
    else:
        return f"{year - 1}-{year}"
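
The July-to-June rule is easiest to check at its boundary. A minimal sketch, assuming a hypothetical helper `season_for` (not part of this notebook) that applies the same rule to a `date` object:

```python
from datetime import date


def season_for(d: date) -> str:
    """Hypothetical helper: same July-to-June rule as get_season, taking a date."""
    start_year = d.year if d.month >= 7 else d.year - 1
    return f"{start_year}-{start_year + 1}"


# Dates on either side of the July 1 boundary fall in different seasons.
for d in (date(2019, 6, 30), date(2019, 7, 1), date(2020, 1, 15)):
    print(d.isoformat(), season_for(d))
```

June 30, 2019 falls in 2018-2019, while July 1, 2019 and January 15, 2020 both fall in 2019-2020.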

In [25]:
# format and parse dates
df['Violation Date'] = pd.to_datetime(df['Violation Date'])
df['year'] = df['Violation Date'].dt.year
df['month'] = df['Violation Date'].dt.month
df['date'] = df['Violation Date'].dt.date
df['season'] = df.apply(lambda row: get_season(row['month'], row['year']), axis=1)
# raw string so that \$ is passed to the regex engine as an escaped dollar sign
df['Imposed Fine Detailed'] = pd.to_numeric(df['Imposed Fine Detailed'].replace(r'[\$,]', '', regex=True), errors='coerce')
df.head()

Unnamed: 0,Docket Number,Violation Date,Violation Address,Issuing Department Code,Imposed Fine Detailed,year,month,date,season
0,18CP093079,2018-11-27,5045 N TRIPP,POLICE,0.0,2018,11,2018-11-27,2018-2019
1,18DS11011L,2018-01-05,741 W 123RD ST,STRTSAN,0.0,2018,1,2018-01-05,2017-2018
2,18DS16572L,2018-02-08,5033 S CICERO AVE,STRTSAN,50.0,2018,2,2018-02-08,2017-2018
3,18DS23469L,2018-02-13,4100 W MADISON ST,STRTSAN,50.0,2018,2,2018-02-13,2017-2018
4,18DT002526,2018-11-30,3401 W DIVERSEY,TRANPORT,110.0,2018,11,2018-11-30,2018-2019
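
Because `errors='coerce'` silently turns any fine it cannot parse into NaN, it may be worth counting those afterwards. A small sketch on made-up values (the `"N/A"` entry is an assumption for illustration, not something observed in this dataset):

```python
import pandas as pd

# Made-up fine strings; "N/A" stands in for any value that is not a dollar amount.
raw = pd.Series(["$0.00", "$150.00", "$1,250.50", "N/A"])

# Strip dollar signs and thousands separators, then convert to numbers.
# errors="coerce" turns anything unparseable into NaN instead of raising.
fines = pd.to_numeric(raw.replace(r"[\$,]", "", regex=True), errors="coerce")

print(fines.tolist())
print(int(fines.isna().sum()), "value(s) could not be parsed")
```

Running the same `isna().sum()` check on `df['Imposed Fine Detailed']` would show whether any fines were lost in the conversion.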


### summarize by season

In [26]:
df.groupby('season').size()

season
2000-2001     15
2011-2012      1
2017-2018      3
2018-2019    475
2019-2020    525
2020-2021    996
2021-2022    915
2022-2023    124
2023-2024      2
dtype: int64

In [27]:
len(df)

3056

### filter for 2019 to 2023 only

In [28]:
# copy so the City and State columns added below go onto an independent frame, not a slice
df = df[df['season'].isin(['2019-2020','2020-2021','2021-2022','2022-2023'])].copy()
len(df)

2560
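
As a cross-check, the season counts printed by the groupby above can be tallied against the removal figures in the summary. This is plain arithmetic on the numbers shown in this notebook; the variable names are mine:

```python
# Season counts as printed by the groupby above.
season_counts = {
    "2000-2001": 15,
    "2011-2012": 1,
    "2017-2018": 3,
    "2018-2019": 475,
    "2019-2020": 525,
    "2020-2021": 996,
    "2021-2022": 915,
    "2022-2023": 124,
    "2023-2024": 2,
}
kept_seasons = ["2019-2020", "2020-2021", "2021-2022", "2022-2023"]
missing_dates = 2  # records dropped earlier for having no Violation Date

kept = sum(season_counts[s] for s in kept_seasons)
removed_by_season = sum(season_counts.values()) - kept
total_removed = removed_by_season + missing_dates

print(kept, total_removed, kept + total_removed)
```

This prints `2560 498 3058`, matching the 2560 records kept, the 498 removed, and the 3058 in the original file.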

### add city and state, for geocoding

In [29]:
df['City']='Chicago'
df['State']='Illinois'
df.head()

Unnamed: 0,Docket Number,Violation Date,Violation Address,Issuing Department Code,Imposed Fine Detailed,year,month,date,season,City,State
243,19DS68300L,2019-11-13,4710 S WESTERN AVE,STRTSAN,150.0,2019,11,2019-11-13,2019-2020,Chicago,Illinois
244,19DS69216L,2019-11-13,1425 W MORSE AVE,STRTSAN,50.0,2019,11,2019-11-13,2019-2020,Chicago,Illinois
245,19DS70010L,2019-11-13,715 E 47TH ST,STRTSAN,150.0,2019,11,2019-11-13,2019-2020,Chicago,Illinois
246,19DS72153L,2019-11-12,300 W WASHINGTON ST,STRTSAN,0.0,2019,11,2019-11-12,2019-2020,Chicago,Illinois
247,19DS72153L,2019-11-12,300 W WASHINGTON ST,STRTSAN,150.0,2019,11,2019-11-12,2019-2020,Chicago,Illinois


<a name ="correct"></a>
# 3. Correct Bad Addresses

In [30]:
# strip surrounding whitespace so the address merge works; otherwise about 75% of addresses fail to match
df['Violation Address'] = df['Violation Address'].str.strip()

In [31]:
df_corrections = pd.read_csv("../data/address-corrections.csv")
df_corrections['Violation Address'] = df_corrections['Violation Address'].str.strip()
df_corrections.head()

Unnamed: 0,Violation Address,Cleaned Address,Corrected
0,10320 S CHURCH,10320 S CHURCH ST,True
1,1108 W GRAND,1108 W GRAND AVE,True
2,1111 W MARSHFIELD,1111 N MARSHFIELD,True
3,1146 W GRAND,1146 W GRAND AVE,True
4,115 W LAWERENCE,115 W LAWRENCE,True


In [32]:
df_cleaned = pd.merge(df,df_corrections, on='Violation Address', how='left')

In [33]:
# for records that needed no correction, copy the existing address to Cleaned Address
df_cleaned.loc[df_cleaned['Corrected'].isna(), 'Cleaned Address'] = df_cleaned['Violation Address']
df_cleaned.head()

Unnamed: 0,Docket Number,Violation Date,Violation Address,Issuing Department Code,Imposed Fine Detailed,year,month,date,season,City,State,Cleaned Address,Corrected
0,19DS68300L,2019-11-13,4710 S WESTERN AVE,STRTSAN,150.0,2019,11,2019-11-13,2019-2020,Chicago,Illinois,4710 S WESTERN AVE,
1,19DS69216L,2019-11-13,1425 W MORSE AVE,STRTSAN,50.0,2019,11,2019-11-13,2019-2020,Chicago,Illinois,1425 W MORSE AVE,
2,19DS70010L,2019-11-13,715 E 47TH ST,STRTSAN,150.0,2019,11,2019-11-13,2019-2020,Chicago,Illinois,715 E 47TH ST,
3,19DS72153L,2019-11-12,300 W WASHINGTON ST,STRTSAN,0.0,2019,11,2019-11-12,2019-2020,Chicago,Illinois,300 W WASHINGTON ST,
4,19DS72153L,2019-11-12,300 W WASHINGTON ST,STRTSAN,150.0,2019,11,2019-11-12,2019-2020,Chicago,Illinois,300 W WASHINGTON ST,


In [34]:
len(df_cleaned)

2561
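
The merged frame has 2561 rows, one more than the 2560 that went in. With a left merge, a row count can only grow when a key appears more than once in the right-hand table, so at least one address is probably repeated in the corrections file. A minimal sketch of how to find and guard against that, using made-up rows rather than the real data:

```python
import pandas as pd

# Made-up rows for illustration; the real corrections file is not reproduced here.
fines = pd.DataFrame(
    {"Violation Address": ["115 W LAWERENCE", "1108 W GRAND", "715 E 47TH ST"]}
)
corrections = pd.DataFrame({
    "Violation Address": ["115 W LAWERENCE", "115 W LAWERENCE", "1108 W GRAND"],
    "Cleaned Address": ["115 W LAWRENCE", "115 W LAWRENCE AVE", "1108 W GRAND AVE"],
    "Corrected": [True, True, True],
})

# A repeated key on the right side duplicates the matching left row.
merged = pd.merge(fines, corrections, on="Violation Address", how="left")
print(len(fines), len(merged))

# List every correction whose address appears more than once.
dupes = corrections[corrections.duplicated("Violation Address", keep=False)]
print(dupes)

# validate="many_to_one" makes pandas raise instead of silently adding rows.
try:
    pd.merge(fines, corrections, on="Violation Address", how="left",
             validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```

In the sketch, three fines become four rows after the merge. Applying the same `duplicated` check to `df_corrections` would identify which address accounts for the extra row here.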

In [35]:
df_cleaned.groupby('Corrected').size()

Corrected
True    117
dtype: int64

<a name="geocode"></a>
# 4. Export and Geocode Data

In [36]:
df_cleaned.to_csv("../data/fines-2019-2023.csv")

### Geocoding Data
I did the following in QGIS to assign community areas to each fine:
<ol>
<li>gathered files: points (CSV of snow clearance fines and addresses) and shapes (Chicago community areas GeoJSON)
<li>imported fines/addresses as Delimited Text Layer
<li>geocoded addresses using MMQGIS
<li>imported regions GeoJSON
<li>performed a point-in-polygon spatial join
<li>exported geocoded data as fines-geocoded-communities.csv
</ol>
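
For readers unfamiliar with the point-in-polygon step, the idea can be sketched in plain Python with a ray-casting test. This only illustrates the concept and is not how QGIS implements its spatial join. The rectangular areas and their names are hypothetical stand-ins for the community area boundaries in the GeoJSON.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular "community areas"; real boundaries come from the GeoJSON.
areas = {
    "AREA A": [(-87.70, 41.90), (-87.60, 41.90), (-87.60, 42.00), (-87.70, 42.00)],
    "AREA B": [(-87.70, 41.80), (-87.60, 41.80), (-87.60, 41.90), (-87.70, 41.90)],
}


def assign_area(lon, lat):
    """Return the name of the first area containing the point, or None."""
    for name, ring in areas.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


print(assign_area(-87.65, 41.95))
print(assign_area(-87.65, 41.85))
print(assign_area(-87.50, 41.85))
```

The first two points land in AREA A and AREA B respectively. The third falls outside both and returns `None`, the case to watch for when a geocoded address lands outside every community area.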