# Tidy Unshoveled Sidewalk Fines
3/1/24

This notebook cleans up FOIA-requested data provided by the Department of Administrative Hearings. Though a full year of data is provided for the 2018-2019 season, I chose to look only at the 2019-2020 through 2022-2023 seasons (7/1/2019 to 6/30/2023) to align with the data available for 311 complaints.

I received two FOIA responses from the Department of Administrative Hearings for dockets concerning unshoveled sidewalk citations. The first included a field indicating the issuing department, and the second included more information about respondents, hearing dates, and case dispositions. This notebook merges those two responses into one unified dataset.

This dataset contains multiple records per docket, and I have only partly normalized it into separate lookup tables.

<div style="color:red">To do: redo the section comparing record counts, update this intro, and add record-count info.<br>
Might want to normalize further if I can find a primary key in the detailed dockets.
</div>

### notes on data quality
- One docket (24DT000388) in the detailed dataset has no issuing department, because it does not appear in the simple dockets dataset.
- One docket (21DT00019A) has no violation date and was filtered out.

# Read Data

In [162]:
import pandas as pd

### read and review simple dockets dataset

In [163]:
df_dockets = pd.read_csv("../../data/01-raw/FOIA-H064920-011124-snow-dockets.csv")
df_dockets.head()

|  | Docket Number | Violation Date | Violation Address | Issuing Department Code | Imposed Fine Detailed |
|---|---|---|---|---|---|
| 0 | 22CS00058A | 02/04/2022 | 6621 S LANGLEY | BAFCONP | \$0.00 |
| 1 | 22CS00059A | 01/04/2022 | 6613 S LANGLEY AVE | BAFCONP | \$0.00 |
| 2 | 18CP093079 | 11/27/2018 | 5045 N TRIPP | POLICE | \$0.00 |
| 3 | 19CP003510 | 01/16/2019 | 5045 N TRIPP | POLICE | \$0.00 |
| 4 | 19CP004491 | 01/06/2019 | 1800 S LAKE SHORE DR | POLICE | \$0.00 |


In [164]:
len(df_dockets)

3058

### identify any dockets with missing violation dates, which the date filter will drop

In [165]:
df_dockets[df_dockets['Violation Date'].isna()]

|  | Docket Number | Violation Date | Violation Address | Issuing Department Code | Imposed Fine Detailed |
|---|---|---|---|---|---|
| 1472 | 21DT00019A |  | 1601 E 95TH ST | TRANPORT | \$0.00 |
| 1473 | 21DT00019A |  | 1601 E 95TH ST | TRANPORT | \$150.00 |


### filter for 7/1/19 to 6/30/23

In [166]:
# convert violation date to datetime so it can be filtered (missing dates become NaT)
df_dockets['Violation Date'] = pd.to_datetime(df_dockets['Violation Date'])

In [167]:
# keep 7/1/2019 through 6/30/2023, both ends inclusive; NaT dates compare False and are dropped
df_dockets = df_dockets[df_dockets['Violation Date'].between('2019-07-01', '2023-06-30')]
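
The two 21DT00019A rows need no separate step: `pd.to_datetime` turns a missing date into `NaT`, and `NaT` compares `False` against both bounds, so those rows fall out of the date-range mask. The sketch below checks that on a made-up four-row frame (only the column names come from the dataset). It uses `Series.between`, which is inclusive on both ends by default and so behaves like the `>=`/`<=` pair.

```python
import pandas as pd

# Toy data (hypothetical dockets); row C has no violation date.
toy = pd.DataFrame({
    "Docket Number": ["A", "B", "C", "D"],
    "Violation Date": ["06/30/2019", "07/01/2019", None, "06/30/2023"],
})
toy["Violation Date"] = pd.to_datetime(toy["Violation Date"], format="%m/%d/%Y")

# between() is inclusive on both ends by default, like >= and <=.
# NaT compares False against both bounds, so row C is dropped.
mask = toy["Violation Date"].between("2019-07-01", "2023-06-30")
filtered = toy[mask]
print(filtered["Docket Number"].tolist())  # ['B', 'D']
```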

In [168]:
# get number of unique dockets, for reference
df_dockets['Docket Number'].nunique()

1917

### read detailed dockets dataset

In [169]:
df_dockets_detailed = pd.read_csv("../../data/01-raw/FOIA-H067483-021424-snow-dockets-detailed.csv")
df_dockets_detailed.head()

|  | Respondent Full Name | Docket Number | Nov # | Violation Date | Hearing Date | Violation Address | Violation Description | Case Disposition Label | Imposed Fine Detailed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1010 W MADISON LLC | 23DT000582 | T000215159 | 01/31/2023 | 7/10/2023 | 1010 W MADISON | 10-8-180 Snow and ice removal. | Non-Suit | \$0.00 |
| 1 | 10-18 S HOMAN AVE HOLDINGS LLC c/o BRIAN GOTTE... | 19DS41774L | 241774L | 01/22/2019 | 6/3/2019 | 10 S HOMAN AVE | 10-8-180 Snow and ice removal. | Liable | \$0.00 |
| 2 | 10-18 S HOMAN AVE HOLDINGS LLC c/o EV HAS LLC | 19DS41774L | 241774L | 01/22/2019 | 6/3/2019 | 10 S HOMAN AVE | 10-8-180 Snow and ice removal. | Liable | \$0.00 |
| 3 | 1100 E 47TH STREET LLC, C/O PETER CASSEL | 22DT001042 | T000207342 | 01/04/2022 | 5/23/2022 | 1100 E 47TH ST | 10-8-180 Snow and ice removal. | Non-Suit | \$0.00 |
| 4 | 1101 BERWYN LLC, C/O JOHN BARRETT | 23DT000559 | T000220112 | 02/02/2023 | 3/20/2023 | 1101 W BERWYN | 10-8-180 Snow and ice removal. | Non-Suit | \$0.00 |


In [170]:
len(df_dockets_detailed)

4330

### filter detailed dockets for 7/1/19 to 6/30/23

In [171]:
# convert violation date to datetime so it can be filtered (missing dates become NaT)
df_dockets_detailed['Violation Date'] = pd.to_datetime(df_dockets_detailed['Violation Date'])

In [172]:
# keep 7/1/2019 through 6/30/2023, both ends inclusive; NaT dates compare False and are dropped
df_dockets_detailed = df_dockets_detailed[df_dockets_detailed['Violation Date'].between('2019-07-01', '2023-06-30')]

In [173]:
len(df_dockets_detailed)

3662

In [174]:
# get number of unique dockets, for reference
df_dockets_detailed['Docket Number'].nunique()

1918
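
With 3662 rows for 1918 unique dockets, many dockets span several rows in the detailed dataset. A minimal sketch of how to see that distribution, using a hypothetical stand-in frame (`detailed`) in place of `df_dockets_detailed`: `groupby(...).size()` counts rows per docket, and `value_counts()` tallies how many dockets have each row count.

```python
import pandas as pd

# Hypothetical stand-in for df_dockets_detailed.
detailed = pd.DataFrame({
    "Docket Number": ["X1", "X1", "X2", "X3", "X3", "X3"],
    "Respondent Full Name": ["P", "Q", "R", "S", "T", "U"],
})

# Rows per docket, then how many dockets have 1, 2, 3, ... rows.
rows_per_docket = detailed.groupby("Docket Number").size()
distribution = rows_per_docket.value_counts().sort_index()
print(distribution)
```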

### review any inconsistencies
As it turns out, the detailed dockets dataset contains one docket (24DT000388) that does not appear in the simple dockets dataset, so it has no issuing department.

In [175]:
df_dockets_test_merge = pd.merge(df_dockets, df_dockets_detailed, on="Docket Number", how="right")

In [176]:
df_dockets_weird = df_dockets_test_merge[df_dockets_test_merge['Issuing Department Code'].isna()]
df_dockets_weird

|  | Docket Number | Violation Date_x | Violation Address_x | Issuing Department Code | Imposed Fine Detailed_x | Respondent Full Name | Nov # | Violation Date_y | Hearing Date | Violation Address_y | Violation Description | Case Disposition Label | Imposed Fine Detailed_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6230 | 24DT000388 | NaT |  |  |  | CTLTC 008002384127 | T000215624 | 2023-01-17 | 5/6/2024 | 359 N CANAL ST | 10-8-180 Snow and ice removal. | Pending | \$0.00 |
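
Another way to surface mismatches like 24DT000388 is `pd.merge(..., indicator=True)`, which adds a `_merge` column labeling each row `left_only`, `right_only`, or `both`. An outer merge on just the unique docket numbers catches mismatches in both directions and keeps the output small. The frames below are hypothetical stand-ins for the two FOIA responses.

```python
import pandas as pd

# Hypothetical stand-ins: docket numbers from each FOIA response.
simple = pd.DataFrame({"Docket Number": ["D1", "D2", "D3"]})
detailed = pd.DataFrame({"Docket Number": ["D1", "D2", "D2", "D3", "D4"]})

# An outer merge with indicator=True records where each docket was found.
check = pd.merge(
    simple.drop_duplicates(),
    detailed.drop_duplicates(),
    on="Docket Number",
    how="outer",
    indicator=True,
)
only_in_detailed = check.loc[check["_merge"] == "right_only", "Docket Number"]
only_in_simple = check.loc[check["_merge"] == "left_only", "Docket Number"]
print(only_in_detailed.tolist())  # ['D4']
print(only_in_simple.tolist())    # []
```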


# Normalize Data

### dockets to departments

In [177]:
df_dockets_to_depts = df_dockets[['Docket Number','Issuing Department Code']].drop_duplicates()

In [178]:
df_dockets_to_depts.head()

|  | Docket Number | Issuing Department Code |
|---|---|---|
| 0 | 22CS00058A | BAFCONP |
| 1 | 22CS00059A | BAFCONP |
| 63 | 20CP00166A | POLICE |
| 65 | 20CP00167A | POLICE |
| 66 | 20CP005520 | POLICE |


In [179]:
len(df_dockets_to_depts)

1917
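
The 1917 rows here match the 1917 unique dockets counted after filtering, which implies one department per docket. A more direct check, sketched with a hypothetical helper `dockets_with_multiple` and made-up data: count distinct values per docket and return any docket with more than one. The same helper should work on `df_dockets_to_addresses` with the `Violation Address` column.

```python
import pandas as pd

def dockets_with_multiple(df: pd.DataFrame, column: str) -> list:
    """Return docket numbers that map to more than one distinct value of `column`."""
    counts = df.groupby("Docket Number")[column].nunique()
    return counts[counts > 1].index.tolist()

# Hypothetical stand-in for df_dockets_to_depts; K3 is deliberately inconsistent.
dockets_to_depts = pd.DataFrame({
    "Docket Number": ["K1", "K2", "K3", "K3"],
    "Issuing Department Code": ["BAFCONP", "POLICE", "POLICE", "TRANPORT"],
})
print(dockets_with_multiple(dockets_to_depts, "Issuing Department Code"))  # ['K3']
# is_unique is True only when no docket number repeats.
print(dockets_to_depts["Docket Number"].is_unique)  # False
```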

### dockets to addresses

In [183]:
df_dockets_to_addresses = df_dockets_detailed[['Docket Number','Violation Address']].drop_duplicates()

In [184]:
df_dockets_to_addresses.head()

|  | Docket Number | Violation Address |
|---|---|---|
| 0 | 23DT000582 | 1010 W MADISON |
| 3 | 22DT001042 | 1100 E 47TH ST |
| 4 | 23DT000559 | 1101 W BERWYN |
| 5 | 22DT001387 | 100 UNKNOWN |
| 6 | 21DS20339M | 1200 W BELMONT AVE |


In [185]:
len(df_dockets_to_addresses)

1918

### dockets to respondents

In [187]:
df_dockets_to_respondents = df_dockets_detailed[['Docket Number','Respondent Full Name']].drop_duplicates()

In [188]:
len(df_dockets_to_respondents)

2240
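
There are 2240 docket-respondent pairs for 1918 dockets, so some dockets list more than one respondent, as with the two HOMAN AVE respondents under 19DS41774L in the unfiltered preview above. One option, sketched here on hypothetical data, is a per-docket summary with a respondent count and a joined name list built with pandas named aggregation; whether that beats keeping the long table depends on the analysis.

```python
import pandas as pd

# Hypothetical stand-in for df_dockets_to_respondents.
pairs = pd.DataFrame({
    "Docket Number": ["R1", "R1", "R2", "R3"],
    "Respondent Full Name": ["OWNER LLC c/o A", "OWNER LLC c/o B", "JANE DOE", "JOHN DOE"],
})

# Named aggregation: one row per docket with a count and a sorted, joined name list.
summary = (
    pairs.groupby("Docket Number")["Respondent Full Name"]
    .agg(
        respondent_count="nunique",
        respondents=lambda names: "; ".join(sorted(names.unique())),
    )
    .reset_index()
)
multi_respondent = summary[summary["respondent_count"] > 1]
print(multi_respondent)
```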

# Merge Department Info into Detailed Dockets Dataset

In [155]:
len(df_dockets_detailed)

3662

In [156]:
df_dockets_detailed_dept = pd.merge(df_dockets_detailed, df_dockets_to_depts, on="Docket Number", how="left")

In [157]:
len(df_dockets_detailed_dept)

3662
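
The row count is unchanged at 3662 after the left merge, which is what you would expect if each docket has at most one department. pandas can enforce that assumption directly: `validate="many_to_one"` raises `pandas.errors.MergeError` when the right-hand keys are not unique. A sketch on hypothetical frames:

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical stand-ins for df_dockets_detailed and df_dockets_to_depts.
detailed = pd.DataFrame({
    "Docket Number": ["M1", "M1", "M2"],
    "Hearing Date": ["5/1/2022", "6/1/2022", "7/1/2022"],
})
depts = pd.DataFrame({
    "Docket Number": ["M1", "M2"],
    "Issuing Department Code": ["POLICE", "TRANPORT"],
})

# Passes: each docket appears once on the right, so the row count is preserved.
merged = pd.merge(detailed, depts, on="Docket Number", how="left", validate="many_to_one")
print(len(merged) == len(detailed))  # True

# Fails: M2 now has two department rows, which would duplicate detailed rows.
dup_depts = pd.concat(
    [depts, pd.DataFrame({"Docket Number": ["M2"], "Issuing Department Code": ["POLICE"]})],
    ignore_index=True,
)
rejected = False
try:
    pd.merge(detailed, dup_depts, on="Docket Number", how="left", validate="many_to_one")
except MergeError:
    rejected = True
print(rejected)  # True
```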

# Prep Data
The dataset provided by Chicago's Department of Administrative Hearings (H064920-011124.xlsx) contained 3058 records with violation dates from 1/1/2001 to 9/12/2023.

From this dataset I removed:
<ul>
    <li>2 records with missing violation dates</li>
    <li>19 records from 1/1/2001 to 2/13/2018</li>
    <li>2 records after 6/30/2023</li>
    <li>475 records from the 2018-2019 winter</li>
</ul>

This leaves 2560 records (3058 - 2 - 19 - 2 - 475 = 2560).
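
Redoing this breakdown in code could start from a season label per record. A minimal sketch, assuming seasons run July 1 through June 30 as in the date filter; `season_label` and the sample dates are hypothetical. Tallying the labels on `df_dockets['Violation Date']` before filtering would give per-season counts to compare against the figures above.

```python
import pandas as pd

def season_label(dates: pd.Series) -> pd.Series:
    """Label datetimes with a July-to-June season like '2019-2020'; NaT becomes <NA>."""
    years = dates.dt.year
    # January-June dates belong to the season that started the previous July.
    start = years.where(dates.dt.month >= 7, years - 1).astype("Int64")
    return start.astype("string") + "-" + (start + 1).astype("string")

# Hypothetical violation dates, including one missing value.
dates = pd.to_datetime(
    pd.Series(["01/15/2018", "12/01/2018", "02/10/2020", None, "07/15/2023"]),
    format="%m/%d/%Y",
)
labels = season_label(dates)
print(labels.value_counts(dropna=False))

# Seasons kept by the 7/1/2019 to 6/30/2023 filter; <NA> is not counted as kept.
kept_seasons = ["2019-2020", "2020-2021", "2021-2022", "2022-2023"]
print(labels.isin(kept_seasons).sum())  # 1
```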

### find records with missing violation dates (these are dropped by the date filter)

In [23]:
# find records with missing Violation Date. NaT means "Not a Time"
df_dockets[df_dockets['Violation Date'].isna()]

|  | Docket Number | Violation Date | Violation Address | Issuing Department Code | Imposed Fine Detailed |
|---|---|---|---|---|---|
| 1472 | 21DT00019A |  | 1601 E 95TH ST | TRANPORT | \$0.00 |
| 1473 | 21DT00019A |  | 1601 E 95TH ST | TRANPORT | \$150.00 |


# Export Data

In [190]:
df_dockets_to_depts.to_csv('../../data/02-tidied/dockets-to-depts.csv', index=False)
df_dockets_to_addresses.to_csv('../../data/02-tidied/dockets-to-addresses.csv', index=False)
df_dockets_detailed.to_csv('../../data/02-tidied/dockets-to-details.csv', index=False)
df_dockets_to_respondents.to_csv('../../data/02-tidied/dockets-to-respondents.csv', index=False)
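
`DataFrame.to_csv` raises an error rather than creating a missing output folder, so the export cell fails if `../../data/02-tidied/` does not exist yet. A sketch that creates the folder first and writes each table from a dict; `export_tables` is a hypothetical helper, demonstrated in a temporary directory rather than the real data path.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

import pandas as pd

def export_tables(tables: dict[str, pd.DataFrame], out_dir: str | Path) -> list[Path]:
    """Write each DataFrame to <out_dir>/<name>.csv, creating out_dir if needed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in tables.items():
        target = out_path / f"{name}.csv"
        frame.to_csv(target, index=False)
        written.append(target)
    return written

# Demo in a throwaway directory; in the notebook the dict would map names like
# "dockets-to-depts" to df_dockets_to_depts and out_dir to "../../data/02-tidied".
with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({"Docket Number": ["A1", "A2"]})
    paths = export_tables({"dockets-to-demo": demo}, Path(tmp) / "02-tidied")
    names = [p.name for p in paths]
    roundtrip = pd.read_csv(paths[0])
print(names)  # ['dockets-to-demo.csv']
```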