<h1>Snow Clearance Fines, 2019-2023</h1>
31 January 2024

This analysis looks at fines levied for uncleared sidewalks, based on FOIA data requested from the Department of Administrative Hearings (H064920-011124.xlsx). The dataset contains 3058 records dated from 1/1/2001 to 9/12/2023. Restricting to records between 7/1/2019 and 6/30/2023 leaves 2560 records.<br>
<br>
My analysis steps:
<ol>
<li><a href="#read">Read Data</a>
<li><a href="#prepare">Prepare Data</a>: label by season, etc.
<li><a href="#summarize">Summarize</a>: by issuing department, by year, by community area
</ol>

<div style="color:red;">
    <h3>Preliminary Findings</h3>
<ul>
    <li>I'm looking at an old version of the geocoded fines dataset, to get started. Will use updated dataset when it's ready. Should have about 2556 records.
</ul>
    </div>

<a name="read"></a>
# 1. Read Data

In [2]:
import pandas as pd
import requests
#import datetime as dt #would only need this if I manipulated dates post-API data retrieval

In [4]:
df = pd.read_csv("../data/old_fines-geocoded-communities.csv")
df.head()

Unnamed: 0,Docket Number,Violation Date,Violation Address,City,State,Issuing Department Code,Imposed Fine Detailed,result_num,osm_id,display_name,...,latlong,community,area,shape_area,perimeter,area_num_1,area_numbe,comarea_id,comarea,shape_len
0,18CP093079,2018-11-27,5045 N TRIPP,Chicago,IL,POLICE,$0.00,0,162465502,"5045, North Tripp Avenue, Mayfair, Albany Park...",...,"41.9731502,-87.73406095",ALBANY PARK,0.0,53542230.82,0.0,14.0,14.0,0.0,0.0,39339.01644
1,19CP003510,2019-01-16,5045 N TRIPP,Chicago,IL,POLICE,$0.00,0,162465502,"5045, North Tripp Avenue, Mayfair, Albany Park...",...,"41.9731502,-87.73406095",ALBANY PARK,0.0,53542230.82,0.0,14.0,14.0,0.0,0.0,39339.01644
2,19CP009688,2019-02-13,5045 N TRIPP,Chicago,IL,POLICE,$0.00,0,162465502,"5045, North Tripp Avenue, Mayfair, Albany Park...",...,"41.9731502,-87.73406095",ALBANY PARK,0.0,53542230.82,0.0,14.0,14.0,0.0,0.0,39339.01644
3,21DT000885,2021-02-19,3859 W LAWRENCE,Chicago,IL,TRANPORT,$0.00,0,34319057,"3859, West Lawrence Avenue, Albany Park, Chica...",...,"41.96819227184388,-87.72489666413433",ALBANY PARK,0.0,53542230.82,0.0,14.0,14.0,0.0,0.0,39339.01644
4,21DT000885,2021-02-19,3859 W LAWRENCE,Chicago,IL,TRANPORT,$110.00,0,34319057,"3859, West Lawrence Avenue, Albany Park, Chica...",...,"41.96819227184388,-87.72489666413433",ALBANY PARK,0.0,53542230.82,0.0,14.0,14.0,0.0,0.0,39339.01644


In [5]:
len(df)

2883

<a name="prepare"></a>
# 2. Prepare Data

### remove records with missing violation dates

In [11]:
# find records with missing Violation Date. NaT means "Not a Time"
df[df['Violation Date'].isna()]

Unnamed: 0,Docket Number,Violation Date,Violation Address,City,State,Issuing Department Code,Imposed Fine Detailed


In [12]:
df_clean = df[df['Violation Date'].notna()]
len(df_clean)

2560
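
The `isna()` check above only finds empty cells; a malformed date string would pass through it unnoticed. A minimal sketch, using made-up sample values rather than the FOIA data, of how parsing with `errors="coerce"` first turns both cases into `NaT` so they can be filtered together:

```python
import pandas as pd

# Hypothetical sample values; not taken from the FOIA file
sample = pd.DataFrame({
    "Docket Number": ["A1", "A2", "A3"],
    "Violation Date": ["2019-08-08", None, "not a date"],
})

# Missing or unparseable values become NaT instead of raising an error
parsed = pd.to_datetime(sample["Violation Date"], errors="coerce")

missing = sample[parsed.isna()]   # rows to inspect or drop
kept = sample[parsed.notna()]     # rows with a usable date
print(len(missing), len(kept))
```

Whether malformed dates actually occur in this dataset is not known from the output above; the sketch only shows how they could be caught.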

### parse dates

In [26]:
def get_season(month, year):
    """Return the snow season a date falls in, in yyyy-yyyy format.

    month: numeric month of the year (1-12)
    year (int): 4-digit year
    E.g. 2022-2023 means winter 2022-2023, or July 2022 to June 2023.
    """
    if month >= 7:
        # July-December belong to the season that starts this year
        return f"{year}-{year + 1}"
    else:
        # January-June belong to the season that started last year
        return f"{year - 1}-{year}"
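
A quick way to sanity-check the July 1 cutoff is to call the function on the boundary months. The sketch below repeats the same rule in a self-contained form and also shows a vectorized equivalent, which is an alternative to row-wise `apply` and not what the notebook uses; the dates are illustrative.

```python
import pandas as pd

def get_season(month, year):
    # Same rule as above: July-December belong to the season starting that year
    if month >= 7:
        return f"{year}-{year + 1}"
    return f"{year - 1}-{year}"

# Boundary months around the July 1 cutoff
print(get_season(6, 2023))  # June 2023 falls in season 2022-2023
print(get_season(7, 2023))  # July 2023 starts season 2023-2024

# Vectorized equivalent on a Series of dates (illustrative values only)
dates = pd.to_datetime(pd.Series(["2019-08-08", "2020-01-15", "2023-06-30"]))
start_year = dates.dt.year - (dates.dt.month < 7).astype(int)
season = start_year.astype(str) + "-" + (start_year + 1).astype(str)
print(season.tolist())
```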

In [34]:
# format and parse dates
df['Violation Date'] = pd.to_datetime(df['Violation Date'])
df['year'] = df['Violation Date'].dt.year
df['month'] = df['Violation Date'].dt.month
df['date'] = df['Violation Date'].dt.date
df['season'] = df.apply(lambda row: get_season(row['month'], row['year']), axis=1)
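# strip "$" and thousands separators (raw string avoids an invalid-escape warning), then convert to numbers
df['Imposed Fine Detailed'] = pd.to_numeric(df['Imposed Fine Detailed'].replace(r'[\$,]', '', regex=True), errors='coerce')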
df.head()

Unnamed: 0,Docket Number,Violation Date,Violation Address,Issuing Department Code,Imposed Fine Detailed,year,month,date,season
494,19DT005843,2019-08-08,1615 N FAIRFIELD,TRANPORT,0.0,2019,8,2019-08-08,2019-2020
495,19DT005843,2019-08-08,1615 N FAIRFIELD,TRANPORT,200.0,2019,8,2019-08-08,2019-2020
496,20DT001152,2019-09-05,1717 N HUMBOLDT BLVD,TRANPORT,0.0,2019,9,2019-09-05,2019-2020
497,20DT001152,2019-09-05,1717 N HUMBOLDT BLVD,TRANPORT,500.0,2019,9,2019-09-05,2019-2020
498,20DT001152,2019-09-05,1717 N HUMBOLDT BLVD,TRANPORT,1000.0,2019,9,2019-09-05,2019-2020
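
The fine-amount conversion can be checked in isolation. A small sketch on illustrative strings in the same `$1,234.00` style as the source column (the `n/a` value is an assumption, added to show what `errors="coerce"` does with non-numeric text):

```python
import pandas as pd

# Illustrative strings; "n/a" is hypothetical and not known to occur in the data
fines = pd.Series(["$0.00", "$110.00", "$1,000.00", "n/a"])

# Strip "$" and "," then convert; anything still non-numeric becomes NaN
amounts = pd.to_numeric(fines.replace(r"[\$,]", "", regex=True), errors="coerce")
print(amounts.tolist())
```

Because coerced values become NaN and `sum` skips NaN by default, it may be worth counting `amounts.isna()` before summarizing.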


### look at just full years

In [28]:
df = df[(df['season'].isin(['2019-2020','2020-2021','2021-2022','2022-2023']))]
len(df)

2560

<a name="summarize"></a>
# 3. Summarize Data

In [37]:
# by season
df.groupby('season').agg(
    sum_fine_amt=('Imposed Fine Detailed', 'sum'),
    n_fines=('Docket Number','nunique')
).reset_index()

Unnamed: 0,season,sum_fine_amt,n_fines
0,2019-2020,96640.0,357
1,2020-2021,243510.0,764
2,2021-2022,214349.0,700
3,2022-2023,31570.0,96
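
In the data preview, a single docket (for example 21DT000885) appears on more than one row with different fine amounts, so `sum` adds every row while `nunique` counts each docket once. A toy sketch of that difference; the second docket number and its amount are hypothetical:

```python
import pandas as pd

# Toy rows modeled on the preview: one docket can appear on several rows
toy = pd.DataFrame({
    "season": ["2020-2021", "2020-2021", "2020-2021"],
    "Docket Number": ["21DT000885", "21DT000885", "21DT000999"],  # last one is made up
    "Imposed Fine Detailed": [0.0, 110.0, 200.0],
})

summary = toy.groupby("season").agg(
    sum_fine_amt=("Imposed Fine Detailed", "sum"),
    n_rows=("Docket Number", "size"),        # every row, including $0 lines
    n_dockets=("Docket Number", "nunique"),  # distinct dockets, as in n_fines
).reset_index()
print(summary)
```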


In [39]:
# by issuing department
df.groupby('Issuing Department Code').agg(
    sum_fine_amt=('Imposed Fine Detailed', 'sum'),
    n_fines=('Docket Number','nunique')
).reset_index()

Unnamed: 0,Issuing Department Code,sum_fine_amt,n_fines
0,BAFCONP,0.0,2
1,POLICE,1700.0,25
2,STRTSAN,176610.0,497
3,TRANPORT,407759.0,1393


<a name="appendix"></a>
# Appendix

### read community population

In [6]:
# retrieved on 1/11/24, but 2020 Census Population figures should be static

df_population = pd.read_csv("../data/population_cmap_2022.csv")

In [7]:
# simplify dataframe to get only essentials
df_population = df_population[['GEOID','GEOG','2020_POP']]
df_population = df_population.rename(columns={'GEOG':'COMMUNITY_NAME'})
df_population['COMMUNITY_CAPS']=df_population['COMMUNITY_NAME'].str.upper()
df_population.head()

Unnamed: 0,GEOID,COMMUNITY_NAME,2020_POP,COMMUNITY_CAPS
0,14,Albany Park,48396,ALBANY PARK
1,57,Archer Heights,14196,ARCHER HEIGHTS
2,34,Armour Square,13890,ARMOUR SQUARE
3,70,Ashburn,41098,ASHBURN
4,71,Auburn Gresham,44878,AUBURN GRESHAM


<a name="ward"></a>
# 2. By Community Area by 311 Type, 2019-2023

In [8]:
df_community_by_type = df.pivot_table(index='COMMUNITY_NAME', columns='SR_TYPE', values='SR_NUMBER', aggfunc='size', fill_value=0)
df_community_by_type

SR_TYPE,Ice and Snow Removal Request,Snow - Object/Dibs Removal Request,Snow Removal - Protected Bike Lane or Bridge Sidewalk,Snow – Uncleared Sidewalk Complaint
COMMUNITY_NAME,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Albany Park,540,195,16,400
Archer Heights,205,253,0,60
Armour Square,65,52,16,77
Ashburn,1506,75,1,108
Auburn Gresham,1449,179,1,128
...,...,...,...,...
West Lawn,248,434,0,146
West Pullman,752,16,0,23
West Ridge,505,200,5,804
West Town,1029,304,34,1609
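
For reference, the same kind of community-by-type count table can be built with `pd.crosstab`, which counts rows per pair and fills missing combinations with 0. This is an alternative sketch on made-up rows, not the notebook's own call:

```python
import pandas as pd

# Made-up service requests; only the column names follow the 311 data
toy = pd.DataFrame({
    "COMMUNITY_NAME": ["Albany Park", "Albany Park", "Ashburn"],
    "SR_TYPE": [
        "Ice and Snow Removal Request",
        "Snow – Uncleared Sidewalk Complaint",
        "Ice and Snow Removal Request",
    ],
})

# One row per community, one column per request type, values are row counts
counts = pd.crosstab(toy["COMMUNITY_NAME"], toy["SR_TYPE"])
print(counts)
```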


### merge in community data

In [9]:
df_community_summary = pd.merge(left=df_community_by_type,right=df_population,on='COMMUNITY_NAME')
df_community_summary.head()

Unnamed: 0,COMMUNITY_NAME,Ice and Snow Removal Request,Snow - Object/Dibs Removal Request,Snow Removal - Protected Bike Lane or Bridge Sidewalk,Snow – Uncleared Sidewalk Complaint,GEOID,2020_POP,COMMUNITY_CAPS
0,Albany Park,540,195,16,400,14,48396,ALBANY PARK
1,Archer Heights,205,253,0,60,57,14196,ARCHER HEIGHTS
2,Armour Square,65,52,16,77,34,13890,ARMOUR SQUARE
3,Ashburn,1506,75,1,108,70,41098,ASHBURN
4,Auburn Gresham,1449,179,1,128,71,44878,AUBURN GRESHAM
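
`pd.merge` defaults to an inner join, so a community whose name is spelled differently in the two tables would silently drop out. A hedged sketch of one way to check for that, using `how="left"` with `indicator=True`; "Example Area" and the `complaints` column are made up for illustration:

```python
import pandas as pd

# "Example Area" is a hypothetical name with no match in the population table
counts = pd.DataFrame({
    "COMMUNITY_NAME": ["Albany Park", "Example Area"],
    "complaints": [400, 12],
})
population = pd.DataFrame({
    "COMMUNITY_NAME": ["Albany Park", "Archer Heights"],
    "2020_POP": [48396, 14196],
})

# how="left" keeps every row of counts; indicator=True adds a _merge column
checked = pd.merge(left=counts, right=population, on="COMMUNITY_NAME",
                   how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "COMMUNITY_NAME"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every community found a population figure.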


In [10]:
# per 10,000 capita, per year over 4 years
df_community_summary['Streets Per 10k'] = \
(10000/4)*df_community_summary['Ice and Snow Removal Request']/df_community_summary['2020_POP']

df_community_summary['Dibs Per 10k'] = \
(10000/4)*df_community_summary['Snow - Object/Dibs Removal Request']/df_community_summary['2020_POP']

df_community_summary['Sidewalks Per 10k'] = \
(10000/4)*df_community_summary['Snow – Uncleared Sidewalk Complaint']/df_community_summary['2020_POP']

df_community_summary['Bike-Bridge Per 10k'] = \
(10000/4)*df_community_summary['Snow Removal - Protected Bike Lane or Bridge Sidewalk']/df_community_summary['2020_POP']
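
As a worked check of the rate formula, here is the Albany Park row from the merged table run through the same arithmetic: count × (10,000 / 4 seasons) / population. The loop and the constant names are illustrative and not part of the notebook.

```python
import pandas as pd

# Albany Park counts and population as shown in the merged table
row = pd.DataFrame({
    "COMMUNITY_NAME": ["Albany Park"],
    "Ice and Snow Removal Request": [540],
    "Snow – Uncleared Sidewalk Complaint": [400],
    "2020_POP": [48396],
})

N_SEASONS = 4   # seasons 2019-2020 through 2022-2023
PER = 10_000    # rate is expressed per 10,000 residents

# Map each new rate column to the count column it is derived from
rate_columns = {
    "Streets Per 10k": "Ice and Snow Removal Request",
    "Sidewalks Per 10k": "Snow – Uncleared Sidewalk Complaint",
}
for new_col, count_col in rate_columns.items():
    row[new_col] = (PER / N_SEASONS) * row[count_col] / row["2020_POP"]

print(row[list(rate_columns)].round(6))
```

For Albany Park this works out to 2500 × 540 / 48396 ≈ 27.89 street requests and 2500 × 400 / 48396 ≈ 20.66 sidewalk complaints per 10,000 residents per year.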

In [11]:
df_community_summary.head()

Unnamed: 0,COMMUNITY_NAME,Ice and Snow Removal Request,Snow - Object/Dibs Removal Request,Snow Removal - Protected Bike Lane or Bridge Sidewalk,Snow – Uncleared Sidewalk Complaint,GEOID,2020_POP,COMMUNITY_CAPS,Streets Per 10k,Dibs Per 10k,Sidewalks Per 10k,Bike-Bridge Per 10k
0,Albany Park,540,195,16,400,14,48396,ALBANY PARK,27.894867,10.073147,20.662865,0.826515
1,Archer Heights,205,253,0,60,57,14196,ARCHER HEIGHTS,36.101719,44.554804,10.566357,0.0
2,Armour Square,65,52,16,77,34,13890,ARMOUR SQUARE,11.699064,9.359251,13.858891,2.87977
3,Ashburn,1506,75,1,108,70,41098,ASHBURN,91.610297,4.562266,6.569663,0.06083
4,Auburn Gresham,1449,179,1,128,71,44878,AUBURN GRESHAM,80.718838,9.971478,7.130443,0.055707


In [12]:
df_community_summary.to_csv("../results/311_community_by_type.csv", index=False)

# 3. Summary Stats for Uncleared Sidewalk Complaints

In [13]:
# remove incomplete seasons, 2018-2019 and 2023-2024
df_uncleared = df[(df['SR_SHORT_CODE']=='SWSNOREM') & (df['season'].isin(['2019-2020','2020-2021','2021-2022','2022-2023']))]
len(df_uncleared)

21079

### uncleared by season

In [14]:
df_by_season = df_uncleared.groupby('season').agg(complaints=('SR_NUMBER','count')).reset_index()
df_by_season

Unnamed: 0,season,complaints
0,2019-2020,6541
1,2020-2021,6494
2,2021-2022,6416
3,2022-2023,1628


### uncleared by origin

In [15]:
df_by_type = df_uncleared.groupby('ORIGIN').agg(complaints=('SR_NUMBER','count')).reset_index()
df_by_type

Unnamed: 0,ORIGIN,complaints
0,Alderman's Office,724
1,E-Mail,22
2,Generated In House,1
3,Internet,7068
4,Mobile Device,8260
5,Open311 Interface,1
6,Phone Call,4818
7,Salesforce Mobile App,36
8,spot-open311-Chicago+Works,97
9,spot-open311-SeeClickFix,52


In [16]:
# consolidate low-volume origins into "Other"
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning on the filtered slice
other_origins = ['E-Mail', 'Generated In House', 'Open311 Interface','Salesforce Mobile App','spot-open311-Chicago+Works','spot-open311-SeeClickFix']
df_uncleared = df_uncleared.assign(MODIFIED_ORIGIN=df_uncleared['ORIGIN'].replace(other_origins, 'Other'))


In [17]:
df_by_type = df_uncleared.groupby('MODIFIED_ORIGIN').agg(complaints=('SR_NUMBER','count')).reset_index()
df_by_type

Unnamed: 0,MODIFIED_ORIGIN,complaints
0,Alderman's Office,724
1,Internet,7068
2,Mobile Device,8260
3,Other,209
4,Phone Call,4818
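
An alternative way to consolidate is to list the categories to keep and send everything else to "Other", so any origin value that appears later is grouped automatically. A sketch on a few illustrative values:

```python
import pandas as pd

# Illustrative origin values drawn from the categories listed above
origin = pd.Series([
    "Phone Call", "E-Mail", "Internet", "spot-open311-SeeClickFix", "Mobile Device",
])

# Categories reported on their own; everything else becomes "Other"
main_origins = ["Alderman's Office", "Internet", "Mobile Device", "Phone Call"]
modified = origin.where(origin.isin(main_origins), "Other")
print(modified.tolist())
```

The trade-off is that a misspelled main category would also fall into "Other", so the explicit replace list used in the notebook is the more conservative choice.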


### uncleared by status

In [18]:
df_by_status = df_uncleared.groupby('STATUS')['SR_NUMBER'].agg('count').reset_index()
df_by_status

Unnamed: 0,STATUS,SR_NUMBER
0,Canceled,978
1,Completed,20041
2,Open,60
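
The status counts are easier to compare as shares of the total. A short sketch using the three counts from the table above:

```python
import pandas as pd

# Counts copied from the status table above
status_counts = pd.Series({"Canceled": 978, "Completed": 20041, "Open": 60})

# Share of all uncleared-sidewalk complaints, in percent
share_pct = (100 * status_counts / status_counts.sum()).round(1)
print(status_counts.sum())
print(share_pct)
```

The total of 21079 matches `len(df_uncleared)`, and roughly 95% of complaints are marked Completed.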
