# Request Type Analysis

Look at the request type values from 311.  Questions to consider:

  - Counts
  - Spatial (NCs) distribution
  - Time to complete
  - Time to complete by service provider
  - Spatial (service region) distribution
  - Repeated addresses
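
As a rough sketch of the "time to complete" questions, the elapsed time can be taken as `closed_dt - created_dt` and summarized per service provider and request type. The column names match the 311 frame used below; the rows and the `BOS` owner value are made up for illustration, and using the median is my assumption, not a settled choice.

```python
import pandas as pd

# Toy rows standing in for the 311 data (values are invented)
requests = pd.DataFrame({
    "request_type": ["Graffiti Removal", "Graffiti Removal",
                     "Bulky Items", "Bulky Items"],
    "owner": ["BOS", "BOS", "LASAN", "LASAN"],
    "created_dt": pd.to_datetime(["2021-01-01 08:00", "2021-01-02 09:00",
                                  "2021-01-01 10:00", "2021-01-03 12:00"]),
    "closed_dt": pd.to_datetime(["2021-01-02 08:00", "2021-01-05 09:00",
                                 "2021-01-01 22:00", None]),
})

# Elapsed time in fractional days; requests that are still open stay NaN
requests["days_to_close"] = (
    (requests["closed_dt"] - requests["created_dt"]).dt.total_seconds() / 86400
)

# The median is less sensitive than the mean to a few very slow requests;
# NaN (still open) rows are skipped by default
median_days = (
    requests.groupby(["owner", "request_type"])["days_to_close"].median()
)
print(median_days)
```

Open requests are silently excluded here, so a provider with a large backlog could look faster than it is; that probably needs its own measure.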

Steps in this notebook:

1.  Setup
2.  Create geodataframe/dataframe from cleaned data and [census](https://data.lacity.org/Community-Economic-Development/Census-Data-by-Neighborhood-Council/nwj3-ufba)
3.  Examine the data
4.  Compute the measure
5.  Show measure as choropleth
6.  So what (next steps)

# 1 - Setup

In [1]:
%run start.py
from utils import read_new311_shape, dt_to_object

# 2 - Get Data Files

Two data sets:

  1. extended311 for point features
  2. cleaned, certified NCs for polygons

In [2]:
%%time
extended311_gdf = read_new311_shape('../data/311/extended311-geo-shape.zip')

CPU times: user 1min 48s, sys: 2.91 s, total: 1min 51s
Wall time: 1min 51s


In [3]:
extended311_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 1360751 entries, 0 to 1360750
Data columns (total 27 columns):
 #   Column           Non-Null Count    Dtype         
---  ------           --------------    -----         
 0   SRNumber         1360751 non-null  object        
 1   created_dt       1360751 non-null  datetime64[ns]
 2   updated_dt       1360751 non-null  datetime64[ns]
 3   owner            1360748 non-null  object        
 4   request_type     1360751 non-null  object        
 5   service_dt       1287904 non-null  datetime64[ns]
 6   closed_dt        1329385 non-null  datetime64[ns]
 7   address          1360751 non-null  object        
 8   street           1099991 non-null  object        
 9   zip_code         1360412 non-null  object        
 10  latitude         1360751 non-null  float64       
 11  longitude        1360751 non-null  float64       
 12  location         1360751 non-null  object        
 13  APC              1360711 non-null  object        

Loading the certified, cleaned neighborhood councils (NCs) is a common step at this stage, so ...

In [4]:
neighborhoods_gdf = gpd.read_file('../data/neighborhoods/Neighborhood_Councils_(Certified)_cleaned.shp')

neighborhoods_gdf.rename(columns={'NAME': 'name',
                        'NC_ID': 'nc_id',
                        'SERVICE_RE': 'service_region'},
              inplace=True);

In [None]:
neighborhoods_gdf.info()

# 3 - Some Data Massaging

Well, there's a discrepancy here.  The census data has 97 NCs and the certified dataset has 99 (I think the right number is 99).

Not going to agonize over this at this stage, but I do want to understand it.  Adjusting for what matches at this stage should be good enough for now.
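
One way to see which NCs fail to match is an outer merge with `indicator=True`. This is only a sketch: the two small frames below are hypothetical stand-ins for the census table and for `neighborhoods_gdf`, and I'm assuming both can be keyed on `nc_id`.

```python
import pandas as pd

# Hypothetical id lists; the real ones would come from the census table
# and from neighborhoods_gdf['nc_id']
census = pd.DataFrame({"nc_id": [6, 42, 37]})
certified = pd.DataFrame({"nc_id": [6, 42, 37, 64, 50]})

# indicator=True adds a `_merge` column recording which side each key came from
both = census.merge(certified, on="nc_id", how="outer", indicator=True)

# Keys present only in the certified set are the ones the census data lacks
missing_from_census = both.loc[both["_merge"] == "right_only", "nc_id"].tolist()
print(sorted(missing_from_census))
```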

In [8]:
extended311_gdf.iloc[27]

SRNumber                                                1-1831752501
created_dt                                       2021-01-01 03:40:39
updated_dt                                       2021-01-02 20:27:44
owner                                                          LASAN
request_type                                     Homeless Encampment
service_dt                                       2021-01-02 00:00:00
closed_dt                                        2021-01-02 15:34:24
address                                    8985 W VENICE BLVD, 90034
street                                                        VENICE
zip_code                                                       90034
latitude                                                   34.027325
longitude                                                -118.392454
location                             (34.0273245834, -118.392453575)
APC                                             West Los Angeles APC
cd                                

In [9]:
extended311_gdf.iloc[27]['created_dt'].day_of_week

4

In [10]:
extended311_gdf.iloc[27]['created_dt'].date()

datetime.date(2021, 1, 1)

In [11]:
extended311_gdf['day_of_week'] = extended311_gdf['created_dt'].dt.day_of_week

In [12]:
extended311_gdf.day_of_week.value_counts()

1    263865
0    263111
2    249475
3    213204
4    153955
6    124919
5     92222
Name: day_of_week, dtype: int64
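
The integer codes follow the pandas convention of Monday=0 through Sunday=6, so the counts above peak on Tuesday (1) and bottom out on Saturday (5). A small sketch of getting readable labels through the `.dt` accessor, using a few made-up timestamps (only the first one comes from the row shown earlier):

```python
import pandas as pd

created = pd.Series(pd.to_datetime([
    "2021-01-01 03:40:39",   # the created_dt of row 27 above
    "2021-01-04 10:00:00",   # invented
    "2021-01-05 11:30:00",   # invented
]))

day_of_week = created.dt.day_of_week   # Monday=0 ... Sunday=6
day_name = created.dt.day_name()       # English names by default

print(pd.DataFrame({"code": day_of_week, "name": day_name}))
```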

In [13]:
extended311_gdf['date'] = extended311_gdf['created_dt'].dt.date

In [14]:
extended311_gdf['date'].value_counts(sort=False)

2021-01-01    1782
2021-01-02    2826
2021-01-03    3378
2021-01-04    6406
2021-01-05    5934
2021-01-06    5850
2021-01-07    5614
2021-01-08    3814
2021-01-09    2412
2021-01-10    3218
              ... 
2021-11-24    2887
2021-11-25    1049
2021-11-26    2196
2021-11-27    1565
2021-11-28    2518
2021-11-29    5319
2021-11-30    5338
2021-12-01    5350
2021-12-02    4297
2021-12-03      41
Name: date, Length: 337, dtype: int64

In [15]:
extended311_gdf['month'] = extended311_gdf['created_dt'].dt.month

In [16]:
extended311_gdf['month'].value_counts(sort=False)

1     127379
2     116644
3     133144
4     125189
5     127800
6     131365
7     124976
8     127591
9     119994
10    109239
11    107742
12      9688
Name: month, dtype: int64

In [17]:
extended311_gdf['quarter'] = extended311_gdf['created_dt'].dt.quarter

In [18]:
extended311_gdf['quarter'].value_counts(sort=False)

1    377167
2    384354
3    372561
4    226669
Name: quarter, dtype: int64

In [19]:
still_open_gdf = extended311_gdf[extended311_gdf['closed_dt'].isnull()].reset_index()

In [None]:
pd.options.display.max_rows

In [20]:
pd.set_option("display.max_rows", 200)
pd.set_option("display.min_rows", 20)
still_open_gdf['date'].value_counts(sort=False, dropna=False).to_frame().reset_index()
#pd.reset_option("max_rows")

        index  date
0  2021-01-01     4
1  2021-01-02     6
2  2021-01-03    11
3  2021-01-04    17
4  2021-01-05    11
5  2021-01-06    18
6  2021-01-07    26
7  2021-01-08     8
8  2021-01-09    10
9  2021-01-10     5


In [21]:
extended311_gdf_info = Output(layout={'border': '1px solid black',
                            'width': '50%'})

still_open_gdf_info = Output(layout={'border': '1px solid black',
                            'width': '50%'})

with extended311_gdf_info:
    display(HTML('<center><b>created count</b></center>'))
    display(extended311_gdf['date'].value_counts(sort=False))

with still_open_gdf_info:
    display(HTML('<center><b>still open count</b></center>'))
    display(still_open_gdf['date'].value_counts(sort=False))

HBox([extended311_gdf_info, still_open_gdf_info])

HBox(children=(Output(layout=Layout(border='1px solid black', width='50%')), Output(layout=Layout(border='1px …

In [22]:
f1 = extended311_gdf['date'].value_counts(sort=False).to_frame().reset_index().rename(columns={'index': 'day', 'date': 'created count'})
f2 = still_open_gdf['date'].value_counts(sort=False).to_frame().reset_index().rename(columns={'index': 'day', 'date': 'open count'})   

merged_counts = pd.merge(f1, f2, on="day")
# fraction of that day's requests still open (multiply by 100 for a true percentage)
merged_counts['percentage'] = merged_counts['open count'] / merged_counts['created count']

In [23]:
merged_counts

          day  created count  open count  percentage
0  2021-01-01           1782           4    0.002245
1  2021-01-02           2826           6    0.002123
2  2021-01-03           3378          11    0.003256
3  2021-01-04           6406          17    0.002654
4  2021-01-05           5934          11    0.001854
5  2021-01-06           5850          18    0.003077
6  2021-01-07           5614          26    0.004631
7  2021-01-08           3814           8    0.002098
8  2021-01-09           2412          10    0.004146
9  2021-01-10           3218           5    0.001554
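
One thing to watch: `pd.merge(f1, f2, on="day")` is an inner join, so any day with no open requests drops out of the table entirely. A sketch of a variant that keeps those days, using toy frames where 2021-01-02 is deliberately left out of the open counts (that omission is invented for illustration; `open share` is a name I made up):

```python
import pandas as pd

created = pd.DataFrame({"day": ["2021-01-01", "2021-01-02", "2021-01-03"],
                        "created count": [1782, 2826, 3378]})
still_open = pd.DataFrame({"day": ["2021-01-01", "2021-01-03"],
                           "open count": [4, 11]})

# how="left" keeps every created day; days with nothing open become 0
counts = created.merge(still_open, on="day", how="left")
counts["open count"] = counts["open count"].fillna(0).astype(int)

# Column-wise division gives a fraction between 0 and 1
counts["open share"] = counts["open count"] / counts["created count"]
print(counts)
```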


In [25]:
graffiti_gdf = read_new311_shape('../data/311/graffiti.geojson.zip')

In [26]:
graffiti_counts = graffiti_gdf['nc'].value_counts().to_frame().reset_index().rename(columns={'index': 'nc_id', 'nc': 'count'})

In [27]:
graffiti_counts

   nc_id  count
0     78  26836
1     50  18197
2     52  15584
3    125  13082
4     86  10799
5     87   9303
6    109   9222
7     44   8669
8     55   7887
9    110   7870


In [28]:
len(graffiti_gdf)

315577

In [29]:
graffiti_merged = pd.merge(neighborhoods_gdf, graffiti_counts, how="left", on=["nc_id"])

In [30]:
graffiti_merged

Unnamed: 0,OBJECTID,name,WADDRESS,DWEBSITE,DEMAIL,DPHONE,nc_id,CERTIFIED,TOOLTIP,NLA_URL,service_region,region_id,color_code,geometry,count
0,1,ARLETA NC,http://www.arletanc.org/,http://empowerla.org/ANC,ANC@EmpowerLA.org,213-978-1551,6,2002-10-22,ARLETA NC,navigatela/reports/nc_reports.cfm?id=6,REGION 1 - NORTH EAST VALLEY,1,#00BFFF,"POLYGON ((-118.45006 34.24992, -118.45057 34.2...",1896
1,2,ARROYO SECO NC,http://www.asnc.us/,http://empowerla.org/ASNC,ASNC@EmpowerLA.org,213-978-1551,42,2002-10-02,ARROYO SECO NC,navigatela/reports/nc_reports.cfm?id=42,REGION 8 - NORTH EAST LA,8,#FF8C00,"POLYGON ((-118.22326 34.10393, -118.22368 34.1...",869
2,3,ATWATER VILLAGE NC,http://www.atwatervillage.org/,http://empowerla.org/AVNC,AVNC@EmpowerLA.org,213-978-1551,37,2003-02-11,ATWATER VILLAGE NC,navigatela/reports/nc_reports.cfm?id=37,REGION 7 - EAST,7,#87CEEB,"POLYGON ((-118.27577 34.15377, -118.26185 34.1...",929
3,4,BEL AIR-BEVERLY CREST NC,http://babcnc.org/,http://empowerla.org/BABCNC,BABCNC@EmpowerLA.org,213-978-1551,64,2002-10-08,BEL AIR-BEVERLY CREST NC,navigatela/reports/nc_reports.cfm?id=64,REGION 11 - WEST LA,11,#7FFFD4,"POLYGON ((-118.47487 34.12635, -118.47412 34.1...",221
4,5,BOYLE HEIGHTS NC,http://bhnc.net/,http://empowerla.org/BHNC,BHNC@EmpowerLA.org,213-978-1551,50,2002-05-21,BOYLE HEIGHTS NC,navigatela/reports/nc_reports.cfm?id=50,REGION 8 - NORTH EAST LA,8,#FF8C00,"POLYGON ((-118.21441 34.06064, -118.21305 34.0...",18197
5,6,COMMUNITY AND NEIGHBORS FOR NINTH DISTRICT UNI...,http://www.canndunc.org/,http://empowerla.org/CANNDU,CANNDU@EmpowerLA.org,213-978-1551,86,2003-03-11,COMMUNITY AND NEIGHBORS FOR NINTH DISTRICT UNI...,navigatela/reports/nc_reports.cfm?id=86,REGION 9 - SOUTH LA 2,9,#FF00FF,"POLYGON ((-118.28081 33.96237, -118.28084 33.9...",10799
6,7,CANOGA PARK NC,http://www.canogaparknc.org/,http://empowerla.org/CPNC,CPNC@EmpowerLA.org,213-978-1551,13,2002-06-18,CANOGA PARK NC,navigatela/reports/nc_reports.cfm?id=13,REGION 3 - SOUTH WEST VALLEY,3,#EE82EE,"POLYGON ((-118.58856 34.23547, -118.58845 34.1...",1103
7,8,CENTRAL ALAMEDA NC,https://centralalameda.com/,https://empowerla.org/canc/,CANC@EmpowerLA.org,213-978-1551,110,2003-09-30,CENTRAL ALAMEDA NC,navigatela/reports/nc_reports.cfm?id=110,REGION 9 - SOUTH LA 2,9,#FF00FF,"POLYGON ((-118.23771 33.98920, -118.23848 33.9...",7870
8,9,CENTRAL HOLLYWOOD NC,http://www.chnc.org/,http://empowerla.org/CHNC,CHNC@EmpowerLA.org,213-978-1551,32,2002-04-09,CENTRAL HOLLYWOOD NC,navigatela/reports/nc_reports.cfm?id=32,REGION 5 - CENTRAL 1,5,#DC143C,"POLYGON ((-118.34435 34.10154, -118.32376 34.1...",1776
9,10,CENTRAL SAN PEDRO NC,http://centralsanpedro.org/,http://empowerla.org/CENTRALSPNC,CentralSPNC@EmpowerLA.org,213-978-1551,95,2002-02-12,CENTRAL SAN PEDRO NC,navigatela/reports/nc_reports.cfm?id=95,REGION 12 - HARBOR,12,#DB7093,"POLYGON ((-118.30123 33.72792, -118.30117 33.7...",3870


# 4 - Compute the Measure

Computation is simple: use the geometry of the NC to compute its area in square miles.

For the density I'm simply using total population.  It might be interesting to examine some of the other ethnic measures, maybe with a nice pull-down to select one.  Ah... for another day.
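
A minimal sketch of the unit conversion and the density arithmetic, kept in plain Python so it runs without pyproj. One international mile is exactly 1609.344 m, which fixes the square-mile constant. The function names and the 10 km² example area are hypothetical.

```python
# 1 mile = 1609.344 m exactly, so 1 square mile = 2,589,988.110336 m^2
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2


def sq_meters_to_sq_miles(square_meters: float) -> float:
    """Convert an area in square meters to square miles."""
    return square_meters / SQ_METERS_PER_SQ_MILE


def density(count: float, square_meters: float) -> float:
    """Events (or people) per square mile."""
    return count / sq_meters_to_sq_miles(square_meters)


area_m2 = 10_000_000  # hypothetical NC covering 10 km^2
print(round(sq_meters_to_sq_miles(area_m2), 3))   # 3.861
print(round(density(1896, area_m2), 1))           # count per square mile
```

The same `density` helper works for request counts as well as population, which is handy if the measure changes later.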

In [None]:
from pyproj import Geod

geod = Geod(ellps="WGS84")

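def square_miles(geo):
    # geodesic area in square meters; the sign depends on ring orientation
    square_meters = abs(geod.geometry_area_perimeter(geo)[0])
    # 1 square mile = 1609.344 ** 2 square meters
    return square_meters / 2_589_988.110336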

In [None]:
neighborhood_merged['sq_miles'] = neighborhood_merged.apply(lambda row: square_miles(row.geometry), axis=1)

In [None]:
neighborhood_merged['density'] = neighborhood_merged.apply(lambda row: row['Total Population'] / row['sq_miles'], axis=1)

Remember I like to look at one of the values.

In [None]:
neighborhood_merged.iloc[27]

Some sanity checking on the data before we generate the display.

In the real world we'll have to do some more work on this data!

In [None]:
neighborhood_merged.density.max()

In [None]:
neighborhood_merged.density.min()

In [None]:
len(neighborhood_merged)

# 5 - Display the Choropleth

In [31]:
graffiti_gdf['address'].value_counts()

2500 S HOOPER AVE, 90011          389
12843 W FOOTHILL BLVD, 91342      317
3600 S MAIN ST, 90007             211
3400 S MAIN ST, 90007             200
3500 S MAIN ST, 90007             176
5701 S MAIN ST, 90037             173
4020 S AVALON BLVD, 90011         173
4324 S AVALON BLVD, 90011         171
5043 S NORMANDIE AVE, 90037       170
1200 N SILVER LAKE BLVD, 90026    169
                                 ... 
927 W 59TH PL, 90044                1
1049 N ALVARADO ST, 90026           1
925 W 59TH PL, 90044                1
8550 N WILLIS AVE, 91402            1
11336 N CORBIN AVE, 91326           1
1346 E 22ND ST, 90011               1
4232 S FIGUEROA ST, 90037           1
1301 E 46TH ST, 90011               1
950 S MARIPOSA AVE, 90006           1
6911 N BEN AVE, 91605               1
Name: address, Length: 100503, dtype: int64

In [32]:
graffiti_gdf[graffiti_gdf['nc_name'].str.contains('South Central', na=False)]['address'].value_counts()

2500 S HOOPER AVE, 90011        389
3600 S MAIN ST, 90007           211
3400 S MAIN ST, 90007           200
3500 S MAIN ST, 90007           176
3700 S MAIN ST, 90007           157
3600 S SAN PEDRO ST, 90011      138
2300 S SAN PEDRO ST, 90011      134
2300 S HILL ST, 90007           128
3688 S MAIN ST, 90007           122
3701 S BROADWAY, 90007          122
                               ... 
2309 S ALAMEDA ST, 90058          1
654 E 29TH ST, 90011              1
HOPE ST AT 35TH ST, 90007         1
MAPLE AVE AT 23RD ST, 90011       1
1401 E 21ST ST, 90011             1
1924 S LOS ANGELES ST, 90011      1
251 3/4 E 29TH ST, 90011          1
103 W 39TH ST, 90037              1
3708 S MAPLE AVE, 90011           1
123 E 32ND ST, 90011              1
Name: address, Length: 3396, dtype: int64
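
To put a number on the "repeated addresses" question, a sketch like the following separates addresses reported more than once and measures what share of all requests they account for. The address strings are borrowed from the output above, but the counts are invented to keep the example small, and the "more than once" threshold is an assumption.

```python
import pandas as pd

# Toy address column; real use would pass graffiti_gdf['address']
addresses = pd.Series(
    ["2500 S HOOPER AVE, 90011"] * 3
    + ["3600 S MAIN ST, 90007"] * 2
    + ["927 W 59TH PL, 90044"]
)

# Number of requests per distinct address
per_address = addresses.groupby(addresses).size()

# Addresses that show up more than once
repeated = per_address[per_address > 1]

# Share of all requests that come from a repeated address
share_from_repeats = repeated.sum() / len(addresses)
print(len(repeated), round(share_from_repeats, 3))
```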

In [33]:
from ipyleaflet import FullScreenControl

In [34]:
imagery = basemap_to_tiles(basemaps.Esri.WorldImagery)
imagery.base = True
osm = basemap_to_tiles(basemaps.OpenStreetMap.Mapnik)
osm.base = True


map_display = Map(center=(34.05, -118.25), zoom=11,
                  layers=[imagery, osm],
                  layout=Layout(height="900px"),
                  scroll_wheel_zoom=True)

#map_display.add_control(LayersControl())
#map_display += nc_layer

map_display.add_control(FullScreenControl())
map_display

Map(center=[34.05, -118.25], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom…

Refer to: https://www.youtube.com/watch?v=wjzAy_yLrdA

In [35]:
import json

from ipyleaflet import Choropleth, Map
from branca.colormap import linear

a_geojson = json.loads(graffiti_merged.to_json())

# map each NC name to its graffiti count; the features are matched on their 'id'
graffiti_density = dict(zip(graffiti_merged['name'].tolist(), graffiti_merged['count'].tolist()))
for feature in a_geojson['features']:
    feature['id'] = feature['properties']['name']

layer = Choropleth(
    geo_data=a_geojson,
    choro_data=graffiti_density,
    colormap=linear.YlOrRd_09,  # alternative: linear.Blues_05
    style={'fillOpacity': 1.0, 'color': 'black'},
)

map_display.add_layer(layer)

I need to revisit a tooltip type popup.  For now this will work.

In [36]:
geo_json = GeoJSON(
    data=a_geojson,
    style={
        'opacity': 1, 'dashArray': '9', 'fillOpacity': 0.6, 'weight': 1
    },
    hover_style={
        'color': 'white', 'dashArray': '0', 'fillOpacity': 0.5
    },
    name='NCs'
)

html = HTML('''Hover over a district''')
html.layout.margin = '0px 20px 20px 20px'
control = WidgetControl(widget=html, position='bottomright')

def update_html(feature, **kwargs):
    html.value = '''<h3><b>NC: {}</b></h3>
                    <h4>Count: {}</h4>'''.format(feature['properties']['name'],
                                                 feature['properties']['count'])
    
map_display.add_control(control)  # does += work for this?

layer.on_hover(update_html)
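
When I revisit the tooltip, the text formatting could be pulled into a small pure function so it can be checked without a live map. This is a sketch only: `nc_hover_html` is a hypothetical helper, and it assumes a missing count (which the left merge earlier may leave for an NC with no graffiti rows) should display as "no data".

```python
from html import escape


def nc_hover_html(properties: dict) -> str:
    """Build the hover text for one NC feature's `properties` dict."""
    name = escape(str(properties.get("name", "unknown")))
    count = properties.get("count")
    # NaN != NaN, so the second test also catches a float NaN
    if count is None or count != count:
        count_text = "no data"
    else:
        count_text = f"{int(count):,}"
    return f"<h3><b>NC: {name}</b></h3><h4>Count: {count_text}</h4>"


print(nc_hover_html({"name": "ARLETA NC", "count": 1896}))
print(nc_hover_html({"name": "SOME NC", "count": None}))
```

The hover callback would then reduce to assigning `nc_hover_html(feature['properties'])` to the widget's `value`.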

# 6 - So What?

I say this tongue in cheek.  Things to think about:

  1. Should we examine measures besides total population?
  2. Does it make sense to extend the 311 data as we did with the service regions?
  3. Do we just use this to select an NC then query 311 (or ...)?
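
For the third question, a sketch of what "select an NC, then query 311" might look like once an NC id is in hand. I'm assuming the 311 frame carries an `nc` column like `graffiti_gdf` does; the helper name and the toy rows are hypothetical.

```python
import pandas as pd

# Toy rows; the ids 78 and 50 appear in the counts above, the rest is invented
requests = pd.DataFrame({
    "nc": [78, 50, 78, 78],
    "request_type": ["Graffiti Removal", "Graffiti Removal",
                     "Bulky Items", "Graffiti Removal"],
})


def requests_for_nc(df: pd.DataFrame, nc_id: int) -> pd.DataFrame:
    """Return only the rows that belong to one neighborhood council."""
    return df[df["nc"] == nc_id]


# Request type breakdown for the selected NC
summary = requests_for_nc(requests, 78)["request_type"].value_counts()
print(summary)
```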
  
