<h1>Recent Uncleared Sidewalk Violations, 2023-2024</h1>
11 January 2024

This analysis looks at 311 complaints filed as "Snow – Uncleared Sidewalk Complaint" (service request code SWSNOREM) during the current 2023-2024 snow season. My previous analysis suggests that the data is current to within about an hour.<br>
<br>

<div style="color:red;">
    <h3>Preliminary Findings</h3>
<ul>
    <li>...
</ul>
    </div>

<a name="read"></a>
# 1. Read Data

In [1]:
import pandas as pd
import requests
#import datetime as dt #would only need this if I manipulated dates post-API data retrieval

In [2]:
base_url = "https://data.cityofchicago.org/resource/v6vf-nfxy.json"
select = "SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE"
# where = "SR_SHORT_CODE = 'SWSNOREM' AND CREATED_DATE >= '2023-07-01 00:00:00'"
where = "SR_SHORT_CODE = 'SWSNOREM'"
limit = 99999

url = f"{base_url}?$SELECT={select}&$WHERE={where}&$LIMIT={limit}"
print (url)

https://data.cityofchicago.org/resource/v6vf-nfxy.json?$SELECT=SR_NUMBER, SR_SHORT_CODE, CREATED_DATE, STREET_ADDRESS, COMMUNITY_AREA, WARD, STATUS, ORIGIN, CLOSED_DATE, LATITUDE, LONGITUDE, SR_TYPE&$WHERE=SR_SHORT_CODE = 'SWSNOREM'&$LIMIT=99999
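The URL above is assembled by string concatenation and still contains raw spaces and quotes. `requests` escapes unsafe characters such as spaces when it sends the request, but building the query string explicitly makes the encoding visible. A minimal sketch using the standard library's `urllib.parse.urlencode`; `build_socrata_url` is a hypothetical helper, and the date condition in the example mirrors the commented-out `where` clause, assumed (not verified here) to be accepted by the Socrata endpoint.

```python
from urllib.parse import quote, urlencode

def build_socrata_url(base_url, select, where, limit):
    """Hypothetical helper: return base_url plus a percent-encoded SoQL query string."""
    # quote_via=quote encodes spaces as %20; safe="$" keeps $select/$where/$limit readable
    query = urlencode(
        {"$select": select, "$where": where, "$limit": limit},
        safe="$",
        quote_via=quote,
    )
    return f"{base_url}?{query}"

url = build_socrata_url(
    "https://data.cityofchicago.org/resource/v6vf-nfxy.json",
    "SR_NUMBER, SR_SHORT_CODE, CREATED_DATE",
    # assumption: this date filter would move the season cut server-side
    "SR_SHORT_CODE = 'SWSNOREM' AND CREATED_DATE >= '2023-07-01T00:00:00'",
    99999,
)
print(url)
```

An alternative is `requests.get(base_url, params={...})`, which lets `requests` do the encoding itself.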


In [3]:
response = requests.get(url)
response.raise_for_status()  # stop on an HTTP error instead of trying to parse an error body
data = response.json()
print(response)

<Response [200]>
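A status of 200 does not guarantee the pull is complete: Socrata returns at most `$LIMIT` rows, so a page that is exactly `limit` rows long may have been cut off. A minimal sketch of that check; `check_page` is a hypothetical helper, exercised here on a fake record list rather than a live response.

```python
def check_page(records, limit):
    """Hypothetical helper: return False (and warn) if a Socrata page may be truncated at $LIMIT."""
    n = len(records)
    if n >= limit:
        print(f"Warning: got {n} rows, equal to $LIMIT={limit}; results may be truncated.")
        return False
    return True

# usage with the live response (not run here):
#   check_page(response.json(), limit)
print(check_page([{"SR_NUMBER": "SR24-00088099"}], 99999))
```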


In [4]:
df=pd.DataFrame(data)
df.head()

| | SR_NUMBER | SR_SHORT_CODE | CREATED_DATE | STREET_ADDRESS | COMMUNITY_AREA | WARD | STATUS | ORIGIN | LATITUDE | LONGITUDE | SR_TYPE | CLOSED_DATE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SR24-00088099 | SWSNOREM | 2024-01-15T09:54:58.000 | 3161 N LINCOLN AVE | 6 | 32 | Open | Mobile Device | 41.939343001 | -87.667812 | Snow – Uncleared Sidewalk Complaint | |
| 1 | SR24-00088247 | SWSNOREM | 2024-01-15T10:12:18.000 | 5549 W IRVING PARK RD | 15 | 30 | Open | Internet | 41.953125959 | -87.766530726 | Snow – Uncleared Sidewalk Complaint | |
| 2 | SR22-00180417 | SWSNOREM | 2022-02-03T07:55:35.000 | 1355 E 53RD ST | 41 | 4 | Completed | Internet | 41.799419522 | -87.593072508 | Snow – Uncleared Sidewalk Complaint | 2022-02-03T15:03:42.000 |
| 3 | SR24-00078179 | SWSNOREM | 2024-01-12T19:21:49.000 | 5301 N BROADWAY | 77 | 48 | Open | Internet | 41.978322001 | -87.6595185 | Snow – Uncleared Sidewalk Complaint | |
| 4 | SR24-00049709 | SWSNOREM | 2024-01-09T07:26:06.000 | 4337 W 17TH ST | 29 | 24 | Completed | Phone Call | 41.857677001 | -87.73371 | Snow – Uncleared Sidewalk Complaint | 2024-01-09T12:07:07.000 |


### parse dates 

In [5]:
def get_season(month, year):
    """Return the snow season (July through June) that a given month falls in.

    Input:
      month (int): numeric month of the year (1-12)
      year (int): 4-digit year
    Returns:
      season in yyyy-yyyy format, e.g. "2022-2023" means July 2022 through June 2023.
    """
    if month >= 7:
        return f"{year}-{year + 1}"
    else:
        return f"{year - 1}-{year}"

In [6]:
# format and parse dates
df['CREATED_DATE'] = pd.to_datetime(df['CREATED_DATE'])
df['CLOSED_DATE'] = pd.to_datetime(df['CLOSED_DATE'])
df['year'] = df['CREATED_DATE'].dt.year
df['month'] = df['CREATED_DATE'].dt.month
df['date'] = df['CREATED_DATE'].dt.date
df['season'] = df.apply(lambda row: get_season(row['month'], row['year']), axis=1)
df.head()

| | SR_NUMBER | SR_SHORT_CODE | CREATED_DATE | STREET_ADDRESS | COMMUNITY_AREA | WARD | STATUS | ORIGIN | LATITUDE | LONGITUDE | SR_TYPE | CLOSED_DATE | year | month | date | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SR24-00088099 | SWSNOREM | 2024-01-15 09:54:58 | 3161 N LINCOLN AVE | 6 | 32 | Open | Mobile Device | 41.939343001 | -87.667812 | Snow – Uncleared Sidewalk Complaint | NaT | 2024 | 1 | 2024-01-15 | 2023-2024 |
| 1 | SR24-00088247 | SWSNOREM | 2024-01-15 10:12:18 | 5549 W IRVING PARK RD | 15 | 30 | Open | Internet | 41.953125959 | -87.766530726 | Snow – Uncleared Sidewalk Complaint | NaT | 2024 | 1 | 2024-01-15 | 2023-2024 |
| 2 | SR22-00180417 | SWSNOREM | 2022-02-03 07:55:35 | 1355 E 53RD ST | 41 | 4 | Completed | Internet | 41.799419522 | -87.593072508 | Snow – Uncleared Sidewalk Complaint | 2022-02-03 15:03:42 | 2022 | 2 | 2022-02-03 | 2021-2022 |
| 3 | SR24-00078179 | SWSNOREM | 2024-01-12 19:21:49 | 5301 N BROADWAY | 77 | 48 | Open | Internet | 41.978322001 | -87.6595185 | Snow – Uncleared Sidewalk Complaint | NaT | 2024 | 1 | 2024-01-12 | 2023-2024 |
| 4 | SR24-00049709 | SWSNOREM | 2024-01-09 07:26:06 | 4337 W 17TH ST | 29 | 24 | Completed | Phone Call | 41.857677001 | -87.73371 | Snow – Uncleared Sidewalk Complaint | 2024-01-09 12:07:07 | 2024 | 1 | 2024-01-09 | 2023-2024 |
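`df.apply(..., axis=1)` calls `get_season` once per row, which is fine for a few thousand complaints but slow on larger pulls. The same July-to-June rule can be computed column-wise. A sketch assuming `CREATED_DATE` is already a datetime column with no missing values; `add_season` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def add_season(frame, date_col="CREATED_DATE"):
    """Hypothetical helper: add a 'season' column ('yyyy-yyyy', July through June) without row-wise apply."""
    dates = frame[date_col]
    # a season starts in July, so January-June belong to the season that began the previous year
    start_year = np.where(dates.dt.month >= 7, dates.dt.year, dates.dt.year - 1)
    start = pd.Series(start_year, index=frame.index)
    out = frame.copy()  # leave the caller's frame unchanged
    out["season"] = start.astype(str) + "-" + (start + 1).astype(str)
    return out

demo = pd.DataFrame({"CREATED_DATE": pd.to_datetime(
    ["2022-02-03 07:55:35", "2024-01-15 09:54:58", "2023-07-01 00:00:00"])})
print(add_season(demo)["season"].tolist())
```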


### keep only the current season

In [7]:
# CREATED_DATE was already converted to datetime above; filter on the season label
df = df[df['season'] == '2023-2024']
len(df)

308

### read community names

In [8]:
# retrieved on 1/11/24, but 2020 Census Population figures should be static

df_population = pd.read_csv("../data/population_cmap_2022.csv")

In [9]:
# simplify dataframe to get only essentials
df_population = df_population[['GEOID','GEOG']]
df_population = df_population.rename(columns={'GEOG':'COMMUNITY_NAME'})
df_population.head()

| | GEOID | COMMUNITY_NAME |
|---|---|---|
| 0 | 14 | Albany Park |
| 1 | 57 | Archer Heights |
| 2 | 34 | Armour Square |
| 3 | 70 | Ashburn |
| 4 | 71 | Auburn Gresham |


### remove records with missing location fields

In [10]:
# drop rows missing any of the street address, ward, or community area
df_valid_address = df.dropna(subset=['STREET_ADDRESS', 'WARD', 'COMMUNITY_AREA'])
len(df_valid_address)

307

In [11]:
# then convert the location columns from strings to integers; .assign returns a new
# DataFrame, which avoids the SettingWithCopyWarning raised when writing into a filtered slice
df_valid_address = df_valid_address.assign(
    COMMUNITY_AREA=df_valid_address['COMMUNITY_AREA'].astype(int),
    WARD=df_valid_address['WARD'].astype(int),
)


### merge and save

In [12]:
df_clean = pd.merge(left=df_valid_address,right=df_population,left_on='COMMUNITY_AREA',right_on='GEOID')
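`pd.merge` defaults to an inner join, so any complaint whose `COMMUNITY_AREA` has no match in the population file would silently drop out. One way to check is a left join with `indicator=True`, plus `validate="many_to_one"` to assert that each `GEOID` appears only once in the lookup table. A sketch on toy rows (the community area 99 is a deliberately unmatched, made-up value); with the real frames one could simply compare `len(df_clean)` with `len(df_valid_address)`.

```python
import pandas as pd

# toy data standing in for df_valid_address and df_population
complaints = pd.DataFrame({"SR_NUMBER": ["A", "B", "C"], "COMMUNITY_AREA": [6, 15, 99]})
areas = pd.DataFrame({"GEOID": [6, 15], "COMMUNITY_NAME": ["Lake View", "Portage Park"]})

# validate raises pandas.errors.MergeError if GEOID is duplicated in the lookup table
checked = pd.merge(complaints, areas, how="left", left_on="COMMUNITY_AREA",
                   right_on="GEOID", indicator=True, validate="many_to_one")

# rows tagged "left_only" had no matching community area and would vanish in an inner join
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["SR_NUMBER", "COMMUNITY_AREA"]])
```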

In [15]:
df_clean.to_csv("../data/complaints-2023-24.csv",index=False)

### review data

In [14]:
df_clean.groupby('COMMUNITY_NAME').size().reset_index(name='complaints').sort_values(by='complaints',ascending=False)

| | COMMUNITY_NAME | complaints |
|---|---|---|
| 56 | West Town | 33 |
| 35 | Near North Side | 22 |
| 30 | Logan Square | 21 |
| 22 | Humboldt Park | 18 |
| 49 | Uptown | 13 |
| 27 | Lake View | 11 |
| 8 | Bridgeport | 11 |
| 29 | Lincoln Square | 11 |
| 39 | North Center | 10 |
| 40 | North Lawndale | 10 |
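The `groupby(...).size()` plus `sort_values` chain can also be written with `Series.value_counts`, which counts and sorts in descending order in one call. A small sketch on made-up toy rows (not the real complaint counts); the tie order between equal counts is not something to rely on, so the toy data avoids ties.

```python
import pandas as pd

# toy rows standing in for df_clean
toy = pd.DataFrame({"COMMUNITY_NAME": [
    "West Town", "Uptown", "West Town", "Logan Square", "West Town", "Logan Square"]})

counts = (toy["COMMUNITY_NAME"].value_counts()   # counts per community, sorted descending
          .rename_axis("COMMUNITY_NAME")         # name the index so it becomes a column
          .reset_index(name="complaints"))       # turn the Series into a two-column frame
print(counts)
```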
