# Brooklyn Story

I live in San Diego, so my mental map of Brooklyn needs work.  The geography, population density, demographics, environmental concerns, and weather (rainfall) are so very different from my world.  So what story does this data tell for last year (2021)?  Story elements are:

  1. Rainfall - Since we're taking a stormwater angle, this adds some context.
  2. NYC DEP -  Focus on 311 reports and look at the distribution in time and space.
  3. Stewards - Combining events, turfs, and organizations.


# Rainfall

Let's start this story by looking at precipitation in NYC for 2021.  It provides good historical background.

This section is a summary of a [previous analysis](03.5-rainfall.ipynb).

In [None]:
import pandas as pd
rainfall_df = pd.read_csv('../data/raw/weather/brooklyn-2021.csv', parse_dates=['DATE'])

In [None]:
rainfall_df = pd.read_parquet('../data/processed/brooklyn/brooklyn-rainfall-2021.parq')

In [None]:
rainfall_ts = pd.Series(rainfall_df['PRCP'].values, index=rainfall_df['DATE'].values)

In [None]:
rainfall_ts.plot(figsize=(20, 8));

In [None]:
print(f"Holy crap - {rainfall_df['PRCP'].sum()} inches of rain!")

For context, San Diego has a rainfall average of 10.49 inches, and we had 5.24 inches in 2021.  Proof we live in a **desert**?
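
To see which months drive an annual total like this, one option is to group the daily series by month. This is only a minimal sketch with made-up numbers, not the Brooklyn data; `daily` is a hypothetical stand-in for `rainfall_ts`.

```python
import pandas as pd

# Made-up daily precipitation (inches), standing in for rainfall_ts.
daily = pd.Series(
    [0.0, 1.2, 0.3, 0.0, 2.5, 0.1],
    index=pd.to_datetime([
        "2021-08-20", "2021-08-21", "2021-08-22",
        "2021-09-01", "2021-09-02", "2021-09-03",
    ]),
)

# Total rainfall per calendar month (the index is the month number).
monthly_total = daily.groupby(daily.index.month).sum()

# The single wettest day in the series.
wettest_day = daily.idxmax()

print(monthly_total)
print(wettest_day.date())
```

Grouping on `daily.index.month` only makes sense for a single year, which is the case here.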

# 311 - Water and Sewer

First, we'll look at all this rain through the eyes of the DEP using 311 data.  I want to examine the temporal and spatial dimensions:

  1. Calendar heatmap - When do we see the most 311 requests
  2. Spatial heatmap - Where do we see the most 311 requests
  3. Hotspots (clusters) - What polygons (turfs) have a high number of reports
  
When we're through with this section we'll have the bounding areas for high density reporting of 311 requests.

In [None]:
import geopandas as gpd
brooklyn_311_gdf = gpd.read_parquet('../data/processed/brooklyn/brooklyn-2021-311.parq').reset_index(drop=True)

len(brooklyn_311_gdf)

I want to see a distribution of the request types for DEP.

In [None]:
ax = brooklyn_311_gdf['Complaint Type'].value_counts().plot.barh(figsize=(10, 5))
ax.invert_yaxis()

So, in keeping with my goal of shrinking the data set for analysis and visualization, let's start with the `Water System` and `Sewer` request types.

In [None]:
water_sewer_311_gdf = brooklyn_311_gdf[brooklyn_311_gdf['Complaint Type'].isin(['Water System', 'Sewer'])]

len(water_sewer_311_gdf)

A little bit better?

**Note:** There is also information in the `Descriptor` column.  Let's look at that.

In [None]:
water_sewer_311_gdf.groupby(['Complaint Type','Descriptor'])['Unique Key'].count()

We see a bit more detail with this.  Once again, because I started with a focus on stormwater and catch basins, I will narrow it down to the `Sewer` request type.

In [None]:
sewer_311_gdf = water_sewer_311_gdf.query("`Complaint Type` == 'Sewer'").reset_index(drop=True)
len(sewer_311_gdf)

In [None]:
#sewer_311_gdf

So this is the data frame we'll look at.  With 11K rows it's big enough to get an idea but not so big it kills the browser.

## Calendar Heatmap

Let's start by looking at `when` the 311 requests happen.  I found this simple little package called [july](https://github.com/e-hulten/july/) that does what I want.

In [None]:
import july

In [None]:
events_df = sewer_311_gdf[['Created Date', 'Complaint Type']].copy()

In [None]:
events_df['Created Date'] = pd.to_datetime(events_df['Created Date']).dt.date

In [None]:
stats_df = events_df.groupby('Created Date').count().rename(columns={'Complaint Type': 'count'})
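
One thing to keep in mind: grouping by date only produces rows for days that had at least one request. If quiet days should show up as zeros rather than be absent, the counts can be reindexed onto a full date range. This is a hedged sketch on made-up timestamps; `created` is a hypothetical stand-in for the `Created Date` column, and I have not checked whether `july` needs this step.

```python
import pandas as pd

# Made-up timestamps standing in for the 'Created Date' column.
created = pd.Series(pd.to_datetime([
    "2021-09-01 08:15",
    "2021-09-01 21:40",
    "2021-09-02 07:05",
    "2021-09-04 13:30",
]))

# Count requests per calendar day (time of day is dropped by normalize()).
daily_counts = created.dt.normalize().value_counts().sort_index()

# Reindex onto every day in the range so days without requests become 0.
full_range = pd.date_range("2021-09-01", "2021-09-05", freq="D")
daily_counts = daily_counts.reindex(full_range, fill_value=0)

print(daily_counts)
```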

In [None]:
#stats_df

In [None]:
july.heatmap(dates=stats_df.index, 
             data=stats_df['count'],
             cmap='YlOrRd',
             month_grid=True, 
             horizontal=True,
             value_label=False,
             date_label=False,
             weekday_label=True,
             month_label=True, 
             year_label=True,
             colorbar=False,
             fontfamily="monospace",
             fontsize=12,
             title=None,
             titlesize='large',
             dpi=100);

In [None]:
july.calendar_plot(stats_df.index, stats_df['count'], value_label=True);

In [None]:
july.month_plot(stats_df.index, stats_df['count'], month=9, value_label=True);

## Heatmap

Let's look at this same data frame on a map.

In [None]:
brooklyn_gdf = gpd.read_parquet('../data/processed/brooklyn/brooklyn-boundary.parq')

In [None]:
center = brooklyn_gdf.iloc[0].geometry.centroid.y, brooklyn_gdf.iloc[0].geometry.centroid.x

In [None]:
from ipyleaflet import Heatmap, Map
from ipywidgets import Layout

In [None]:
m = Map(center=center, 
        zoom=12,
        layout=Layout(height="800px"),
        scroll_wheel_zoom=True)

#heat_data_gdf = water_sewer_311_gdf[water_sewer_311_gdf['Complaint Type'] == 'Sewer']

heat_data = [[point.y, point.x] for point in sewer_311_gdf.geometry]  # (lat, lon) pairs
heat_map = Heatmap(locations=heat_data, radius=20, blur=10)

m.add_layer(heat_map)

m
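
Many 311 requests share the exact same coordinates, so another way to feed the heatmap is one entry per unique location with the request count as a weight. The sketch below only builds that list from made-up coordinates. Passing a third value per location as an intensity follows the example in the ipyleaflet documentation; I have not tried it on this data, so treat it as an assumption.

```python
from collections import Counter

# Made-up (lat, lon) pairs; the first location is reported twice.
points = [
    (40.6782, -73.9442),
    (40.6782, -73.9442),
    (40.6501, -73.9496),
]

# Count how many requests fall on each exact coordinate.
counts = Counter(points)

# One [lat, lon, weight] entry per unique location.
weighted_locations = [[lat, lon, n] for (lat, lon), n in counts.items()]

print(weighted_locations)
```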

# Steward Organizations

At this point we can build a map that combines multiple layers.

In [None]:
from ipyleaflet import basemap_to_tiles, basemaps, TileLayer, LayersControl, FullScreenControl

imagery = basemap_to_tiles(basemaps.Esri.WorldImagery)
imagery.base = True
osm = basemap_to_tiles(basemaps.OpenStreetMap.Mapnik)
osm.base = True

google_map = TileLayer(
    url="https://mt1.google.com/vt/lyrs=m&x={x}&y={y}&z={z}",
    attribution="Google",
    name="Google Maps",
)
google_map.base = True

google_satellite = TileLayer(
    url="https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}",
    attribution="Google",
    name="Google Satellite"
)
google_satellite.base = True

map_display = Map(center=center, zoom=12,
                  layers=[google_satellite, google_map, imagery, osm],
                  layout=Layout(height="800px"),
                  scroll_wheel_zoom=True)

map_display.add_control(LayersControl())

map_display.add_control(FullScreenControl())

map_display

In [None]:
heat_data = [[point.y, point.x] for point in sewer_311_gdf.geometry]  # (lat, lon) pairs
heat_map = Heatmap(locations=heat_data, radius=20, blur=10, name='Heat')

In [None]:
#map_display.add_layer(heat_map)

I am going to add a different 311 data set for the visualization.  I created it in a [different notebook](./05.5-clustering), where I used a DEM file from the NYC Open Data portal to find the elevation closest to each 311 lat/long.

A summary of the 311 data:

   -  311 for the Sewer request type
   -  Unique locations with counts (i.e. 243 locations and 1349 requests)
   -  Included points have 3 or more requests
   -  I created 3 bins: \[0, 3\], \(3, 5\), and \[5, ∞\)
   -  I added elevation data to each of the points
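
The counting and binning steps in this list can be sketched as follows. This is not the code from the clustering notebook, just a minimal illustration on made-up coordinates. The column names `lat` and `lon` and the mapping of the three bins onto the labels `small`, `medium`, and `large` are assumptions.

```python
import pandas as pd

# Made-up request coordinates: four locations with 3, 4, 6, and 1 requests.
requests = pd.DataFrame({
    "lat": [40.60] * 3 + [40.61] * 4 + [40.62] * 6 + [40.63] * 1,
    "lon": [-73.95] * 3 + [-73.96] * 4 + [-73.97] * 6 + [-73.98] * 1,
})

# One row per unique location, with the number of requests at that location.
counts = requests.groupby(["lat", "lon"]).size().reset_index(name="count")

# Keep only locations with 3 or more requests.
frequent = counts[counts["count"] >= 3].copy()

# For integer counts, these edges give: up to 3, exactly 4, and 5 or more.
frequent["bin"] = pd.cut(
    frequent["count"],
    bins=[0, 3, 4, float("inf")],
    labels=["small", "medium", "large"],
)

print(frequent)
```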

In [None]:
more_than_three_gdf = gpd.read_parquet('../data/processed/brooklyn/brooklyn-311-elevation.parq')
len(more_than_three_gdf)

In [None]:
more_than_three_gdf['count'].sum()

In [None]:
c_map = {'small': '#4E9A26', 'medium': '#EBC621', 'large': '#AC1212'}

In [None]:
markers = list()

for i, row in tqdm(more_than_three_gdf.iterrows()):
    
    fill_color = c_map[row.bin]
    marker = CircleMarker(location=(row.geometry.y, row.geometry.x), radius=5, stroke=False, fill_color=fill_color, fill_opacity=1.0)
    msg = HTML()
    msg.value = "count: {}<br>elevation: {}".format(row['count'], row['elevation'])
    marker.popup = msg
    markers.append(marker)
    more_than_three_gdf.loc[i, 'marker'] = marker

dep_cluster = MarkerCluster(markers=markers, name='311 Call')

the_311_layer = LayerGroup(name="311", layers=markers)
#map_display.add_layer(the_311_layer)


Finally, bring in the social network part for selected turfs.

In [None]:
brooklyn_turfs_gdf = gpd.read_parquet('../data/processed/brooklyn/brooklyn-turfs.parq')

In [None]:
brooklyn_turfs_subset_gdf = brooklyn_turfs_gdf[['OrgName', 'OrgWebSite', 'PrimST', 'PopID', 'Shape_Area', 'geometry']].copy().to_crs('epsg:2263')

In [None]:
pertinent_turfs_primst = ['Waterfront / Beach / Shoreline', 
                          'Watershed / Sewershed', 
                          'Stream / River / Canal', 
                          'Salt Marsh', 
                          'Public Right of Way (Sidewalk, street ends, traffic island, public plaza)', 
                          'Freshwater Wetland']

In [None]:
primst_turfs_gdf = brooklyn_turfs_subset_gdf.query("PrimST in @pertinent_turfs_primst").reset_index(drop=True)
primst_turfs_gdf.sort_values('Shape_Area', inplace=True, ascending=False)

At this point I am going to read in a file created in one of the [hacking notebooks](./05.7-turfs-and-counts.ipynb).

It is pretty simple: it does a point-in-polygon test to get a count of 311 requests per turf.

**Note:** Turf polygons can be stacked on top of each other!
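
To make the counting idea concrete, here is a stand-alone sketch. It is not the code from the hacking notebook, which presumably relies on geopandas. Instead it uses a plain ray-casting test so it runs without any dependencies. The turf names, polygons, and points are all made up. Because the two squares overlap, one point is counted for both turfs, which is the stacking effect noted above.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Two made-up, overlapping square "turfs".
turfs = {
    "turf_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "turf_b": [(2, 2), (6, 2), (6, 6), (2, 6)],
}

# Made-up request locations; (3, 3) lies inside both turfs, (7, 7) in neither.
points = [(1, 1), (3, 3), (5, 5), (7, 7), (1, 3)]

# Count the points that fall inside each turf.
request_count = {
    name: sum(point_in_polygon(x, y, poly) for x, y in points)
    for name, poly in turfs.items()
}

print(request_count)
```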

In [None]:
primst_with_counts_gdf = gpd.read_parquet('../data/processed/brooklyn/primst-turfs-counts.parq')

In [None]:
primst_alters_gdf = gpd.read_parquet('../data/processed/brooklyn/primst-with-alters.parq')

In [None]:
import json
import random

def random_color(feature):
    return {
        'color': 'black',
        'fillColor': random.choice(['red', 'yellow', 'green', 'orange', 'purple', 'blue']),
    }

#turfs_geojson = brooklyn_turf_subset_gdf.to_json()
turfs_geojson = primst_alters_gdf.iloc[15:55].to_crs('epsg:4326').to_json()

geo_json = GeoJSON(
    data=json.loads(turfs_geojson),
    #style={
    #    'opacity': 0.1, 'dashArray': '9', 'fillOpacity': 0.1, 'weight': 1
   # },
    hover_style={
        'color': 'white', 'dashArray': '0', 'fillOpacity': 0.8
    },
    #style_callback=random_color,
    name='turfs geojson'
)

map_display.add_layer(geo_json)

turf_html = HTML('''Hover over a turf''')
turf_html.layout.margin = '0px 20px 20px 20px'
turf_control = WidgetControl(widget=turf_html, position='bottomright')

def update_turf_html(feature, **kwargs):
    # Show the hovered turf's attributes; each bold label gets its own closing </b>.
    props = feature['properties']
    turf_html.value = (
        f"<b>Name:</b> {props['OrgName']}"
        f"<br><b>Primary:</b> {props['PrimST']}"
        f"<br><b>PopID:</b> {props['PopID']}"
        f"<br>311 Requests: {props['request_count']}"
        f"<br>Alters: {props['alter_count']}"
    )
    
map_display.add_control(turf_control)  # does += work for this?

geo_json.on_hover(update_turf_html)

In [None]:
elements_df = pd.read_parquet('../data/processed/SN/elements.parq')
connections_df = pd.read_parquet('../data/processed/SN/connections.parq')

In [None]:
def org_name(popid):
    """
    Return the Label of the organization in elements_df with the given PopID.
    """
    org = elements_df.query("PopID == @popid").reset_index()
    return org.iloc[0]['Label']

In [None]:
def alters(popid):
    alter_popids = list(connections_df.query(f"`Respondent PopID` == @popid")['PopID _ALTER'])
    alter_orgs = [org_name(x) for x in alter_popids]
    return alter_orgs
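
The `alters` function runs one `query` per alter, which is fine for hacking. An alternative, sketched here on made-up data, is to build a PopID-to-Label lookup once and map the alter ids through it. The column names match the ones used above; `alters_by_lookup` is a hypothetical name, and the sketch assumes each PopID appears only once in the elements table.

```python
import pandas as pd

# Made-up stand-ins for elements_df and connections_df.
elements = pd.DataFrame({
    "PopID": [1, 2, 3],
    "Label": ["Org A", "Org B", "Org C"],
})
connections = pd.DataFrame({
    "Respondent PopID": [1, 1, 2],
    "PopID _ALTER": [2, 3, 3],
})

# Build the PopID -> Label lookup once (assumes PopID values are unique).
label_by_popid = elements.set_index("PopID")["Label"]


def alters_by_lookup(popid):
    """Return the labels of all organizations that the respondent named."""
    is_respondent = connections["Respondent PopID"] == popid
    alter_ids = connections.loc[is_respondent, "PopID _ALTER"]
    # Ids missing from the lookup map to NaN and are dropped.
    return alter_ids.map(label_by_popid).dropna().tolist()


print(alters_by_lookup(1))
```

Unlike `org_name`, this version silently skips alters that are missing from the elements table instead of raising an error, which may or may not be what you want.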

In [None]:
alters(1475)

In [None]:
interact(alters, popid=primst_alters_gdf['PopID']);

Some initial hacking on the org relationships.

In [None]:
funny = interact(alters, popid=primst_alters_gdf['PopID'])

In [None]:
foo = Output(layout=Layout(border='1px solid black', width='25%'))

In [None]:
with foo:
    display(interact(alters, popid=primst_alters_gdf['PopID']))

In [None]:
foo

In [None]:
type(funny)

In [None]:
HBox([map_display, foo])