# Extend the 311 Data

Extend the clean311 data set in these steps:

  1.  Read clean311 and certified neighborhood council data
  2.  Add service region information to clean311
  3.  Save this version
  4.  Add columns for ipyleaflet widgetry
  5.  Save this final version
  
I'm still working to keep these files under 2 GB for Git LFS.

# 1 - Data inputs

The two input files are clean311-geo-shape.zip and Neighborhood_Councils_(Certified)_cleaned.zip (see NC-service-regions.ipynb).

**Note** - I'm using the read_new311_shape utility function to read the 311 file and transform its column names.

In [None]:
%run start.py

import utils
from utils import read_new311_shape, marker_color_map, dt_to_object

import numpy as np

In [None]:
%%time
new311_gdf = read_new311_shape('../data/311/clean311-geo-shape.zip')

In [None]:
ncs_gdf = gpd.read_file('../data/neighborhoods/Neighborhood_Councils_(Certified)_cleaned.zip/')

In [None]:
ncs_gdf.rename(columns={'NAME': 'name',
                        'NC_ID': 'nc_id',
                        'SERVICE_RE': 'service_region'},
              inplace=True);

# 2 - Extend with Service Region Information

Nothing really special here.  I am using the "brute force" approach with two separate calls to apply.  I'll use an alternative in the next section.

The helper functions check for pd.NA because the nc value can be missing.  A missing value is simply passed on as missing.

In [None]:
region_id_dict = dict(zip(ncs_gdf.nc_id, ncs_gdf.region_id))
service_region_dict = dict(zip(ncs_gdf.nc_id, ncs_gdf.service_region))

In [None]:
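def service_region(nc_id):
    # Missing council IDs stay missing; unknown IDs map to None.
    if pd.isna(nc_id):
        return None
    return service_region_dict.get(nc_id)

def region_id(nc_id):
    # Guard the int() cast so a missing or unknown ID returns None
    # instead of raising a TypeError.
    value = None if pd.isna(nc_id) else region_id_dict.get(nc_id)
    return None if pd.isna(value) else int(value)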

In [None]:
%%time
new311_gdf['service_region'] = new311_gdf['nc'].apply(service_region)
new311_gdf['region_id'] = new311_gdf['nc'].apply(region_id)
new311_gdf['region_id'] = new311_gdf['region_id'].astype('Int64')
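As a possible alternative to the two apply calls above, the same lookup can be sketched with Series.map, which takes the mapping directly and leaves missing keys as missing. The frames below are small made-up stand-ins for ncs_gdf and new311_gdf (the values are invented for illustration), and I'm assuming the nc column is a nullable Int64 column.

```python
import pandas as pd

# Toy stand-ins for ncs_gdf and new311_gdf; all values are made up.
ncs = pd.DataFrame({
    "nc_id": [1, 2, 3],
    "region_id": [10, 20, 20],
    "service_region": ["REGION A", "REGION B", "REGION B"],
})
requests = pd.DataFrame({"nc": pd.array([2, 1, pd.NA, 3], dtype="Int64")})

# Index the council table by nc_id so each column can act as a lookup.
lookup = ncs.set_index("nc_id")

# map() aligns each nc value with the lookup index; a missing or unknown
# nc produces a missing value instead of raising.
requests["service_region"] = requests["nc"].map(lookup["service_region"])
requests["region_id"] = requests["nc"].map(lookup["region_id"]).astype("Int64")
```

This avoids calling a Python function per row, which may matter on a data set of this size, though I haven't timed it against the apply version here.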

So this is what the gdf looks like now.

In [None]:
new311_gdf.info()

In [None]:
new311_regions_gdf = new311_gdf.copy()

new311_regions_gdf = dt_to_object(new311_regions_gdf)

new311_regions_gdf.to_file('../data/311/clean311-regions.shp')


In [None]:
new311_regions_gdf.to_file('../data/311/clean311-regions.geojson', driver='GeoJSON')

# 3 - ipyleaflet Information

This information is used downstream to build markers for the ipyleaflet (and/or folium) map.  The two values I'm adding are the color (based on colors from H4LA) and the HTML text string for a popup.

**Note** - 1) I'm using a different approach to the apply function for this.  In this one I'm returning a series with the two columns so I only iterate over the gdf once; 2) The marker function needs refactoring.  Markers can't be saved, so it needs to be a mapping-type function applied to the gdf.

In [None]:
def popup_message(row):
    sr_number = row['SRNumber']
    request_type = row['request_type']
    when = row['created_dt'].strftime("%m/%d/%Y")
    
    dt = row['closed_dt']
    if pd.notnull(dt):
        finished = dt.strftime("%m/%d/%Y")
    else:
        finished = "Still Open"
    
    return f"Report: {sr_number}<br>Request Type: {request_type}<br>When: {when}<br>Completed: {finished}"

def marker_color(row):
    return marker_color_map[row['request_type']]

def marker_info(row):
    return pd.Series((marker_color(row), popup_message(row)))

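def marker(row):
    # CircleMarker (ipyleaflet) and HTML (ipywidgets) are assumed to be imported by start.py.
    # The local name differs from the function name to avoid shadowing it.
    circle = CircleMarker(location=(row.geometry.y, row.geometry.x), radius=5, stroke=False, fill_color=row.marker_color, fill_opacity=1.0)
    circle.popup = HTML(row.popup_message)
    
    return circle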

In [None]:
new311_gdf.info()

In [None]:
%%time
new311_gdf[['marker_color', 'popup_message']] = new311_gdf.apply(marker_info, axis=1)
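A row-wise apply over the whole gdf can be slow, so here is a minimal sketch of how the same two columns might be built with column-wise operations instead. The frame and the color_map dictionary are made-up stand-ins (the real colors come from marker_color_map in utils), and the popup text follows the format used in popup_message.

```python
import pandas as pd

# Hypothetical color map and toy data; the real values come from utils.
color_map = {"Graffiti Removal": "#2b8cbe", "Bulky Items": "#e34a33"}
df = pd.DataFrame({
    "SRNumber": ["1-0001", "1-0002"],
    "request_type": ["Graffiti Removal", "Bulky Items"],
    "created_dt": pd.to_datetime(["2021-01-05", "2021-02-10"]),
    "closed_dt": pd.to_datetime(["2021-01-07", None]),
})

# Color lookup for every row at once.
df["marker_color"] = df["request_type"].map(color_map)

# strftime leaves missing close dates as missing, so fill them afterwards.
finished = df["closed_dt"].dt.strftime("%m/%d/%Y").fillna("Still Open")

# Build the popup HTML with string concatenation on whole columns.
df["popup_message"] = (
    "Report: " + df["SRNumber"]
    + "<br>Request Type: " + df["request_type"]
    + "<br>When: " + df["created_dt"].dt.strftime("%m/%d/%Y")
    + "<br>Completed: " + finished
)
```

One difference from the dictionary lookup in marker_color: map leaves an unknown request type as a missing color rather than raising a KeyError, which may or may not be what you want.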

In [None]:
new311_gdf.info()

# 4 - Save the Extended Dataframe

Since this is a gdf, the datetime columns have to be converted before saving because the ESRI Shapefile driver can't write full datetime values.  I really need to refactor this approach.  I suspect I want to add a utility.

In [None]:
new311_gdf = dt_to_object(new311_gdf)
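A utility for this conversion might look something like the sketch below. The function name datetimes_to_strings and the format string are assumptions for illustration; this is not the dt_to_object helper from utils, whose implementation isn't shown in this notebook.

```python
import pandas as pd

def datetimes_to_strings(df, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of df with every datetime column formatted as text.

    Hypothetical helper: the name and default format are assumptions.
    """
    out = df.copy()
    # select_dtypes("datetime") picks up the datetime64 columns.
    for col in out.select_dtypes(include=["datetime"]).columns:
        # Missing timestamps stay missing after strftime.
        out[col] = out[col].dt.strftime(fmt)
    return out

# Toy frame with made-up values to show the conversion.
sample = pd.DataFrame({
    "SRNumber": ["1-0001", "1-0002"],
    "created_dt": pd.to_datetime(["2021-01-05 08:30:00", "2021-02-10 14:00:00"]),
    "closed_dt": pd.to_datetime(["2021-01-07 09:15:00", None]),
})
converted = datetimes_to_strings(sample)
```

Returning a copy keeps the original datetime columns available for the date arithmetic and formatting done elsewhere in the notebook.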

In [None]:
%%time
new311_gdf.to_file('../data/311/extended311-geo-shape.shp')