# Combining Data with Maps

This tutorial was developed to demonstrate the use of open government data to address a civic issue.  The specific use case we've focused on is local businesses and economic development in the city of LA.  Summarizing the journey so far:

  1. We started by identifying the questions we need to answer.
  2. We looked at [curation](1-data-wrangling.ipynb) strategies for different open data sources.
  3. Finally, we [looked at](2-data-fusion.ipynb) how we might combine different data sources.
  
This part of the tutorial will develop some ideas to put it all together.  We'll create an interactive map using:

  1. ipyleaflet map solution
  2. Geocoded businesses data, Opportunity Zones, and LA Business Improvement Districts
  3. The final product is a map with the Wilshire BID, OZs, and businesses.


Before we start, we need to set up the environment.  I like to do (almost) all of my imports up front.  Normally I do that with a start.py in my IPython profile_default; running it with `%run` accomplishes the same thing.

**Note:** This can be a bit slow because it initializes osmnx.  

In [None]:
#imports
%run start.py

In [None]:
businesses_gdf = gpd.read_parquet('../data/businesses-gdf.parq')
oz_zones = gpd.read_parquet('../data/opportunity-zones.parq')
bids_gdf = gpd.read_file('../data/Business Improvement Districts.zip')
la_city_councils = gpd.read_file('../data/LA_City_Council_Districts_(Adopted_2021).zip')
la_boundary_gdf = gpd.read_parquet('../data/la-boundary.parq')

# Basic Mapping

For this tutorial I am going to focus on ipyleaflet.  If you want to see similar ideas with folium, check out [this demonstration](https://github.com/hackforla/lasan).  Both of these mapping libraries are wrappers around the Leaflet.js library.

I will go through a sequence of cells showing the various steps.  You will have to bounce back and forth a bit to `see` things.  

First we'll look at a standard way to center the map.  I'll use the la_boundary_gdf to position the map at the center.
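
ipyleaflet's `Map` takes `center` as a (latitude, longitude) tuple, which is why the cell below passes the centroid's `y` before its `x`: shapely stores points as (x, y), which is (longitude, latitude) in EPSG:4326. As a plain-Python sketch (an illustration, not how shapely computes it internally), here is an area-weighted polygon centroid returned in the (lat, lon) order the map expects. `polygon_centroid` and `map_center` are hypothetical helper names, and the rectangle is a toy shape, not the real LA boundary.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a list of (x, y).

    The ring may be open or closed; vertices are assumed to be in order.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area2 = 0.0  # twice the signed area (shoelace sum)
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def map_center(ring):
    """Return the centroid as (lat, lon), the order ipyleaflet's Map expects."""
    lon, lat = polygon_centroid(ring)
    return lat, lon


# A toy lon/lat rectangle (made-up coordinates, not the LA boundary)
rect = [(-118.4, 34.0), (-118.2, 34.0), (-118.2, 34.1), (-118.4, 34.1)]
print(map_center(rect))
```

Computing the centroid directly on lon/lat degrees is fine for centering a map; for measurements, a projected CRS is the better choice.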

In [None]:
center = la_boundary_gdf.iloc[0].geometry.centroid.y, la_boundary_gdf.iloc[0].geometry.centroid.x

Now I will use the center point to position the map.

I'm adding four initial layers as the base maps.  These will be selectable and can be useful for different phases of analysis.
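
The two Google `TileLayer`s below use the XYZ ("slippy map") URL template: Leaflet substitutes `{z}` (zoom) and `{x}`/`{y}` (tile column and row) for each 256-pixel Web Mercator tile it needs. A sketch of that arithmetic, using the standard OpenStreetMap tile-numbering formula; `tile_for` and `tile_url` are hypothetical helpers, not ipyleaflet functions.

```python
import math


def tile_for(lat, lon, zoom):
    """Return the (x, y) XYZ tile indices containing lat/lon at a zoom level.

    Web Mercator tiling: there are 2**zoom tiles along each axis.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(template, lat, lon, zoom):
    """Fill an XYZ URL template such as the Google layers' url= strings."""
    x, y = tile_for(lat, lon, zoom)
    return template.format(x=x, y=y, z=zoom)


google_roads = "https://mt1.google.com/vt/lyrs=m&x={x}&y={y}&z={z}"
print(tile_url(google_roads, 34.05, -118.25, 10))
```

Each zoom level doubles the number of tiles per axis, which is why deep zooms over a whole city request many tiles.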

In [None]:
imagery = basemap_to_tiles(basemaps.Esri.WorldImagery)
imagery.base = True
osm = basemap_to_tiles(basemaps.OpenStreetMap.Mapnik)
osm.base = True

google_map = TileLayer(
    url="https://mt1.google.com/vt/lyrs=m&x={x}&y={y}&z={z}",
    attribution="Google",
    name="Google Maps",
)
google_map.base = True

google_satellite = TileLayer(
    url="https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}",
    attribution="Google",
    name="Google Satellite"
)
google_satellite.base = True

map_display = Map(center=center, zoom=10,
                  layers=[google_satellite, google_map, imagery, osm],
                  layout=Layout(height="900px"),
                  scroll_wheel_zoom=True)

map_display.add_control(LayersControl())

map_display.add_control(FullScreenControl())

map_display

There is now a basic map widget.  You can select one of the four map tiles for background.  I have defaulted to OSM.

## Overlays

We need to add the various features of the analysis as overlays on the map.  Each layer we add will show up in the LayersControl, where it can be toggled on and off.

The specific layers/controls we'll look at are:

  1. LA boundary
  2. BIDs as geodataframe
  3. BIDs as geojson
  4. Overlay popup

LA Boundary:

In [None]:
la_boundary = GeoData(geo_dataframe = la_boundary_gdf,
                   style={'color': 'black', 'fillColor': '#3366cc', 'opacity':0.05, 'weight':1.9, 'dashArray':'2', 'fillOpacity':0.6},
                   hover_style={'fillColor': 'red' , 'fillOpacity': 0.2},
                   name = 'LA Boundary')

map_display += la_boundary

Go back to the map.

Next we'll add bids_gdf using the GeoData class from ipyleaflet.  It is straightforward, but has some limitations.
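
The same `style` and `hover_style` dictionaries are repeated for each GeoData layer (the LA boundary above and the BIDs layer below). A small helper keeps them consistent. This is a sketch, and `BASE_STYLE`, `BASE_HOVER_STYLE`, and `polygon_style` are hypothetical names. The keys are Leaflet path options: `color`, `weight`, `opacity`, and `dashArray` style the outline, while `fillColor` and `fillOpacity` style the interior, so `opacity: 0.05` makes the outline nearly invisible.

```python
# Shared Leaflet path options used by the GeoData layers in this notebook
BASE_STYLE = {
    'color': 'black',        # outline (stroke) color
    'fillColor': '#3366cc',  # polygon interior color
    'opacity': 0.05,         # stroke opacity; 0.05 is nearly transparent
    'weight': 1.9,           # stroke width in pixels
    'dashArray': '2',        # dash pattern for the outline
    'fillOpacity': 0.6,      # interior opacity
}
BASE_HOVER_STYLE = {'fillColor': 'red', 'fillOpacity': 0.2}


def polygon_style(**overrides):
    """Return a copy of BASE_STYLE with any keyword overrides applied."""
    style = dict(BASE_STYLE)
    style.update(overrides)
    return style


# Example: same look, but with a fully visible outline
visible_outline = polygon_style(opacity=1.0)
print(visible_outline['opacity'], BASE_STYLE['opacity'])
```

With ipyleaflet imported, a layer could then be built as `GeoData(geo_dataframe=bids_gdf, style=polygon_style(), hover_style=dict(BASE_HOVER_STYLE), name='BIDs gdf')`.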

In [None]:
bids = GeoData(geo_dataframe = bids_gdf,
                   style={'color': 'black', 'fillColor': '#3366cc', 'opacity':0.05, 'weight':1.9, 'dashArray':'2', 'fillOpacity':0.6},
                   hover_style={'fillColor': 'red' , 'fillOpacity': 0.2},
                   name = 'BIDs gdf')

map_display += bids

Go back to the map again.  You can use the overlays control to select which are displayed.

Now I want to demonstrate how to use GeoJSON for the layer overlay.  You will see there is a bit more control over how the layers are displayed.  GeoJSON is also the layer class that ipyleaflet's Choropleth builds on.
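
`GeoDataFrame.to_json()` returns a GeoJSON `FeatureCollection` as a string, and `json.loads` turns it into the nested dict that `GeoJSON(data=...)` expects. Each row becomes a `Feature` whose non-geometry columns land in `properties`, which is what the style and hover callbacks read. A hand-built, hypothetical one-feature example shows the basic shape (the coordinates and name are made up, and `to_json()` also adds an `id` to each feature):

```python
import json

# A minimal FeatureCollection with the same basic structure to_json() produces
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"prog_name": "EXAMPLE BID"},  # made-up attribute value
            "geometry": {
                "type": "Polygon",
                # one exterior ring of [lon, lat] pairs; first == last closes it
                "coordinates": [[[-118.30, 34.06], [-118.29, 34.06],
                                 [-118.29, 34.07], [-118.30, 34.07],
                                 [-118.30, 34.06]]],
            },
        }
    ],
}

as_text = json.dumps(feature_collection)  # a string, like bids_gdf.to_json()
data = json.loads(as_text)                # the dict GeoJSON(data=...) receives
print(data["features"][0]["properties"]["prog_name"])
```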

In [None]:
import json
import random
bids_geojson = bids_gdf.to_json()

Adding this simple function to colorize polygons in the geojson map features.

**Note:** This is one way that `code development` can get messy in the jlab development mindset!
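
`random_color` (defined in the cell below) picks a new color every time the callback runs, so the colors change from run to run. If repeatable colors are wanted, one option (a sketch, not part of the original notebook) is to derive the color from a feature property. `stable_color` and `PALETTE` are hypothetical names; the sketch uses `zlib.crc32` because Python's built-in `hash()` of a string changes between sessions.

```python
import zlib

PALETTE = ['red', 'yellow', 'green', 'orange', 'purple', 'blue']


def stable_color(feature, key='prog_name'):
    """style_callback that maps a feature's property to a fixed palette color."""
    value = str(feature['properties'].get(key, ''))
    index = zlib.crc32(value.encode('utf-8')) % len(PALETTE)
    return {'color': 'black', 'fillColor': PALETTE[index]}


feature = {'properties': {'prog_name': 'WILSHIRE CENTER'}}
print(stable_color(feature))
```

It would be passed as `style_callback=stable_color` in place of `random_color`. Different districts can still share a color; this only makes the assignment repeatable.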

In [None]:
def random_color(feature):
    return {
        'color': 'black',
        'fillColor': random.choice(['red', 'yellow', 'green', 'orange', 'purple', 'blue']),
    }

In [None]:
#def random_color(feature):
#    return {
#        'color': 'black',
#        'fillColor': random.choice(['red', 'yellow', 'green', 'orange', 'purple', 'blue']),
#    }

geo_json = GeoJSON(
    data=json.loads(bids_geojson),
    style={
        'opacity': 1, 'dashArray': '9', 'fillOpacity': 0.5, 'weight': 1
    },
    hover_style={
        'color': 'white', 'dashArray': '0', 'fillOpacity': 0.8
    },
    style_callback=random_color,
    name='BIDs geojson'
)

map_display.add_layer(geo_json)

Once again, go back to the map.  Turn off the LA boundary and BIDs gdf to see just this last overlay.  It looks better because it is colored and you can see the boundaries.  You can do a similar stroke on the boundaries with BIDs gdf, but adding color to the polygons is a bit more complicated.  I am going to leave that alone.

## Information popup

One limitation of ipyleaflet is the lack of a tooltip-style control to show information about the feature under the mouse.  One common way to accomplish this is by adding an HTML widget and the appropriate callback to display the content.

I want to declutter the map so I'll remove a couple of layers before adding the HTML control.

In [None]:
map_display.remove_layer(la_boundary)
map_display.remove_layer(bids)

You can see this control is quite straightforward once you know which column you want to display.  The BID dataset is very simple, so we'll just display the prog_name attribute.
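
The hover callback in the cell below only has to turn a feature's `properties` into an HTML string. Pulled out as a plain function it is easy to test, and escaping values with `html.escape` keeps names containing `&` or `<` from breaking the markup. A sketch with a hypothetical `format_feature_html` helper:

```python
import html


def format_feature_html(feature, fields):
    """Build '<b>Label:</b> value' lines from a GeoJSON feature's properties.

    fields maps a display label to a property name.
    """
    props = feature['properties']
    lines = [
        f"<b>{html.escape(label)}:</b> {html.escape(str(props.get(name, 'n/a')))}"
        for label, name in fields.items()
    ]
    return '<br>'.join(lines)


feature = {'properties': {'prog_name': 'WILSHIRE CENTER'}}
print(format_feature_html(feature, {'BID': 'prog_name'}))
# prints: <b>BID:</b> WILSHIRE CENTER
```

Inside `update_bid_html` this would become `bid_html.value = format_feature_html(feature, {'BID': 'prog_name'})`, and the same helper fits the OZ hover display later on.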

In [None]:
bid_html = HTML('''Hover over a district''')
bid_html.layout.margin = '0px 20px 20px 20px'
bid_control = WidgetControl(widget=bid_html, position='bottomright')

def update_bid_html(feature, **kwargs):
    # show the BID name of the feature under the mouse
    bid_html.value = f"<b>{feature['properties']['prog_name']}</b>"
    
map_display.add_control(bid_control)  # map_display += bid_control should also work

geo_json.on_hover(update_bid_html)

Check out the map again.  You should see the HTML control on the bottom right of the display.  It will show the BID name as you move the mouse.

Before I move on, let's look at one of the [newer mapping](https://leafmap.org/) packages in the python ecosystem.

Interesting eh?

## leafmap

In [None]:
import leafmap

In [None]:
m = leafmap.Map()
m.add_gdf(bids_gdf, layer_name="LA BIDs", fill_colors=["red", "green", "blue"])
m

Interesting for sure.  This package warrants further investigation.

# BID analysis

Next we'll look at businesses and BIDs.  There are lots of businesses, and that is hard to map with browser-based maps.  I will show the basics, but focus on one BID in particular: Wilshire Center.

As usual, since I want to do some analysis, we'll get everything into a common CRS.

**Note:** The original CRS worked for the mapping in the previous section, but we'll convert it for the spatial operations that follow.

In [None]:
#bids_gdf.crs

## Wilshire BID

In [None]:
bids_gdf.to_crs("epsg:4326", inplace=True)

Next we'll create a gdf of length one for the Wilshire BID.

In [None]:
wilshire_gdf = bids_gdf.query("prog_name == 'WILSHIRE CENTER'").reset_index(drop=True)

In [None]:
#wilshire_gdf

Intersect (overlay) the Opportunity Zone data with the Wilshire BID to get the OZs in Wilshire.

In [None]:
wilshire_oz_intersects_gdf = wilshire_gdf.overlay(oz_zones, how='intersection')#.drop(columns=['index'])

Let's use the built-in `explore()` map to look at the tracts.

In [None]:
wilshire_oz_intersects_gdf.explore(
    column="TRACTCE",
    tooltip=["TRACTCE", "prog_name"],
    popup=True,
    tiles="CartoDB positron",
    cmap="Set1",
    style_kwds=dict(color="black"),
)

Visual inspection shows 9 polygons. 

In [None]:
len(wilshire_oz_intersects_gdf)

At first glance, a count of 13 (versus the 9 polygons I could see) seemed a bit weird to me.  Some of the [census tracts](https://www.caliper.com/glossary/what-is-a-tract.htm) are ..., well, interesting.  That is for another day.

Following along with my static analysis, I want to see these two features side by side.  I would like to understand what percentage of this BID is in an OZ.

In [None]:
count_output = Output(layout={'border': '1px solid black',
                            'width': '50%'})

density_output = Output(layout={'border': '1px solid black',
                            'width': '50%'})

with count_output:
    display(wilshire_oz_intersects_gdf.explore())

with density_output:
    display(wilshire_gdf.explore())

HBox([count_output, density_output])

Interesting.  My first question is what percentage of the BID is "covered" by the OZ?

I need to compute areas for these geodataframes and then get the percentage.
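
Area has to be computed in a projected CRS with linear units; in EPSG:4326 the coordinates are degrees, so `.area` would be in meaningless "square degrees". Once projected (the notebook uses EPSG:2229, whose units are US survey feet), the share is the intersected area divided by the BID area, and the units cancel. A plain-Python sketch using the shoelace formula on made-up planar coordinates; `ring_area` and `coverage_share` are hypothetical helpers, and summing the pieces assumes they don't overlap, as census tracts shouldn't.

```python
def ring_area(ring):
    """Shoelace formula: area of a simple polygon from projected (x, y) vertices."""
    total = 0.0
    # pair each vertex with the next one, wrapping the last back to the first
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def coverage_share(part_rings, whole_ring):
    """Fraction of whole_ring's area covered by non-overlapping part rings."""
    return sum(ring_area(r) for r in part_rings) / ring_area(whole_ring)


# Made-up planar coordinates (feet): a 100 x 100 "BID" and two pieces of it
bid = [(0, 0), (100, 0), (100, 100), (0, 100)]
oz_pieces = [
    [(0, 0), (50, 0), (50, 100), (0, 100)],    # 5,000 sq ft
    [(50, 0), (100, 0), (100, 20), (50, 20)],  # 1,000 sq ft
]
print(round(coverage_share(oz_pieces, bid), 2))  # prints: 0.6
```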

This leads to another CRS adventure.  The [California State Plane Coordinate System](https://www.conservation.ca.gov/cgs/Pages/Program-RGMP/california-state-plane-coordinate-system.aspx) is a good guide for picking a projected CRS; I selected [EPSG:2229](https://epsg.io/2229) (NAD83 / California zone 5, in US survey feet).

In [None]:
#wilshire_oz_intersects_gdf.crs

In [None]:
#wilshire_gdf.crs

In [None]:
wilshire_oz_intersects_gdf['area'] = wilshire_oz_intersects_gdf.to_crs("epsg:2229").area

In [None]:
oz_area = wilshire_oz_intersects_gdf['area'].sum() 
bid_area = wilshire_gdf.to_crs("epsg:2229").area.iloc[0]

In [None]:
round((oz_area / bid_area), 2)

So about 57% of the BID's area is also in an OZ.  Does that match your visual inspection?

## Wilshire ipyleaflet map

Now let's get back to the leaflet map to start digging in a bit more.

Clean up map_display first.

In [None]:
map_display.remove_layer(geo_json)
map_display.remove_control(bid_control)

**Note:** Exercise for the reader: at this point, the map output cell is `way up there`.  If you get tired of bouncing back and forth during this development phase, you can create a new view for the output and move it off to the side.  Try right-clicking the map_display output cell and choosing "Create New View for Output".

We'll use the same workflow to build two overlays for this map.

In [None]:
wilshire_boundary = GeoData(geo_dataframe = wilshire_gdf,
                   style={'color': 'black', 'fillColor': '#3366cc', 'opacity':0.05, 'weight':1.9, 'dashArray':'2', 'fillOpacity':0.6},
                   hover_style={'fillColor': 'red' , 'fillOpacity': 0.2},
                   name = 'Wilshire Boundary')

map_display += wilshire_boundary

In [None]:
#def random_color(feature):
#    return {
#        'color': 'black',
#        'fillColor': random.choice(['red', 'yellow', 'green', 'orange']),
#    }

oz_json = GeoJSON(
    data=json.loads(wilshire_oz_intersects_gdf.to_json()),
    style={
        'opacity': 1, 'dashArray': '9', 'fillOpacity': 0.5, 'weight': 1
    },
    hover_style={
        'color': 'white', 'dashArray': '0', 'fillOpacity': 0.8
    },
    style_callback=random_color,
    name='OZs geojson'
)

map_display.add_layer(oz_json)

For the on_hover display, which columns do we want to show?

In [None]:
wilshire_oz_intersects_gdf.columns

Since we're interested in OZs (census tracts), let's look at TRACTCE and GEOID.  GEOID is very important when dealing with census data (and other government datasets, for that matter).
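
For a census tract, `GEOID` is an 11-character code: a 2-digit state FIPS code, a 3-digit county FIPS code, and the 6-digit tract code that also appears in `TRACTCE`. The tract code carries an implied decimal point before its last two digits, so a code of 212202 is written as tract 2122.02. A sketch of splitting it apart; `parse_tract_geoid` is a hypothetical helper, and the GEOID below is illustrative (state 06 and county 037 are California and Los Angeles County).

```python
def parse_tract_geoid(geoid):
    """Split an 11-character census tract GEOID into its parts."""
    geoid = str(geoid).zfill(11)  # restore a leading zero lost when read as an int
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"not a tract GEOID: {geoid!r}")
    state, county, tract = geoid[:2], geoid[2:5], geoid[5:]
    # Implied decimal: '212202' -> '2122.02', while '212200' -> '2122'
    if tract[4:] == "00":
        name = str(int(tract[:4]))
    else:
        name = f"{int(tract[:4])}.{tract[4:]}"
    return {"state": state, "county": county, "tract": tract, "tract_name": name}


print(parse_tract_geoid("06037212202"))
```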

In [None]:
oz_html = HTML('''Hover over OZ''')
oz_html.layout.margin = '0px 20px 20px 20px'
oz_control = WidgetControl(widget=oz_html, position='bottomright')

def update_oz_html(feature, **kwargs):
    # show the tract code and full GEOID of the OZ under the mouse
    props = feature['properties']
    oz_html.value = f"<b>Census Tract:</b> {props['TRACTCE']}<br><b>GEOID:</b> {props['GEOID']}"
    
map_display.add_control(oz_control)  # map_display += oz_control should also work

oz_json.on_hover(update_oz_html)

Finally, we can center the map on the Wilshire BID and zoom in.

In [None]:
map_display.center = wilshire_gdf.iloc[0].geometry.centroid.y, wilshire_gdf.iloc[0].geometry.centroid.x
map_display.zoom = 15

So we now have the Wilshire BID with the Opportunity Zone overlays on the map (see the map_display output cell above).

# Wilshire Businesses

The final step is to add the businesses in the Wilshire BID to our map.

To accomplish this we'll:

   1. Use the Wilshire polygon to select businesses from businesses_gdf
   2. See which NAICS codes are in the district
   3. Build a color map for the markers
   4. Use the color map and the gdf to create the markers
   5. Add both the clustered and unclustered markers to the map

## Businesses in Wilshire

Build a geodataframe with the businesses in the Wilshire BID.
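
`sjoin(..., predicate='within')` keeps each business point that falls inside the Wilshire polygon. Shapely (via GEOS) does the real geometry test; for intuition, here is a plain-Python ray-casting point-in-polygon sketch. It is a simplified illustration (`point_in_polygon` is a hypothetical helper) that ignores holes and doesn't handle points exactly on the boundary consistently, whereas `within` excludes boundary points.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count edge crossings of a ray from (x, y) toward +x.

    ring is a list of (x, y) vertices; an odd number of crossings means inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y0 > y) != (y1 > y):
            # x coordinate where the edge crosses that line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Toy "BID" and two business points in (lon, lat) order, made-up values
bid = [(-118.31, 34.05), (-118.29, 34.05), (-118.29, 34.07), (-118.31, 34.07)]
businesses = {"A": (-118.30, 34.06), "B": (-118.25, 34.06)}
print({name: point_in_polygon(x, y, bid) for name, (x, y) in businesses.items()})
```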

In [None]:
wilshire_businesses_gdf = businesses_gdf.sjoin(wilshire_gdf, how='inner', predicate='within')

In [None]:
len(wilshire_businesses_gdf)

This is a reasonable number for a browser-based map.

## NAICS analysis

Next I will filter out invalid NAICS sector codes.  This makes the result easier to work with, but isn't strictly necessary.

In [None]:
#wilshire_businesses_gdf.columns

In [None]:
wilshire_businesses_gdf.sector_desc.value_counts()

So the sector codes I want to remove:

In [None]:
bad_naics_codes = ['na', '99', '88']

In [None]:
wilshire_biz_with_naics_gdf = wilshire_businesses_gdf.query("sector not in @bad_naics_codes").reset_index().drop(columns=['index', 'level_0'])

In [None]:
len(wilshire_biz_with_naics_gdf)

## Marker Color Map

In [None]:
naics_codes = wilshire_biz_with_naics_gdf.sector.value_counts().keys().to_list()

**Note:** Serious hack ahead!  I need [colors](https://stackoverflow.com/questions/1168260/algorithm-for-generating-unique-colors) for the map markers.  Found this list to use.  Ugh!

In [None]:
colors = ['#FFA6FE', '#FFDB66', '#006401', '#010067', '#95003A', '#007DB5',
          '#FF00F6', '#FFEEE8', '#774D00', '#90FB92', '#0076FF', '#D5FF00',
          '#FF937E', '#6A826C', '#FF029D', '#FE8900', '#7A4782', '#7E2DD2',
          '#85A900', '#FF0056', '#A42400', '#00AE7E', '#683D3B', '#BDC6FF',
          '#263400', '#BDD393', '#00B917', '#9E008E', '#001544', '#C28C9F',
          '#FF74A3', '#01D0FF', '#004754', '#E56FFE', '#788231', '#0E4CA1',
          '#91D0CB', '#BE9970', '#968AE8', '#BB8800', '#43002C', '#DEFF74',
          '#00FFC6', '#FFE502', '#620E00', '#008F9C', '#98FF52', '#7544B1']

In [None]:
print(f"Number of colors: {len(colors)}")
print(f"Number of colors I need: {len(naics_codes)}")

I will combine these two lists into a color map lookup.
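
`zip()` stops at the shorter of its two inputs, so if `a_cmap` ever has fewer colors than there are NAICS sectors, some sectors silently get no entry and the `cmap[row.sector]` lookup in the marker loop raises a `KeyError`. A sketch of a stricter version that fails early instead; `build_color_map` is a hypothetical name and the sector codes are examples.

```python
def build_color_map(codes, palette):
    """Map each code to a palette color, refusing to drop codes silently."""
    codes = list(codes)
    if len(palette) < len(codes):
        raise ValueError(f"need {len(codes)} colors but only have {len(palette)}")
    # every code gets exactly one color, in order; extra colors go unused
    return dict(zip(codes, palette))


# Example with sample sector codes and a few colors from the list above
print(build_color_map(['44-45', '72', '54'], ['#007DB5', '#FF00F6', '#FFEEE8', '#774D00']))
```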

In [None]:
# 21 colors (indices 5-25); zip() below drops any sectors beyond this count
a_cmap = colors[5:26]

In [None]:
cmap = dict(zip(naics_codes, a_cmap))

In [None]:
cmap

We're getting close to the end!

## Markers

What do we want to show when we select a marker?  Here are the possibilities:

In [None]:
wilshire_biz_with_naics_gdf.columns

There are some similarities between `PRIMARY NAICS DESCRIPTION` (from the dataset) and `sector_desc` (added in preprocessing).  I'll use the latter.

In [None]:
markers = list()

for i, row in tqdm(wilshire_biz_with_naics_gdf.iterrows(), total=len(wilshire_biz_with_naics_gdf)):

    # marker color comes from the sector -> color lookup built above
    fill_color = cmap[row.sector]
    # leaflet locations are (lat, lon), i.e. (y, x)
    marker = CircleMarker(location=(row.geometry.y, row.geometry.x), radius=5, stroke=False, fill_color=fill_color, fill_opacity=1.0)
    msg = HTML()
    msg.value = "Business: {}<br>Address: {}<br>Sector: {}".format(row['BUSINESS NAME'],
                                                                  row['STREET ADDRESS'],
                                                                  row['sector_desc'])
    marker.popup = msg
    markers.append(marker)
    # keep a reference to each marker alongside its row
    wilshire_biz_with_naics_gdf.loc[i, 'marker'] = marker

wilshire_businesses_cluster = MarkerCluster(markers=markers, name='Businesses - Clustered')


## Add to map

Finally!!

We're sort of right at the boundary of what ipyleaflet maps can render performantly.  Use the layer control to compare the two ways I'm showing the businesses.  Do MarkerClusters help?

In any event, play with this map and tell me what you think.

**Note:** I added both the clustered and unclustered marker sets.
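
`MarkerCluster` hands the grouping to the Leaflet.markercluster plugin, which merges nearby markers at each zoom level and splits them apart as you zoom in. The core idea can be illustrated with a much simpler grid-bucketing sketch: snap each point to a cell whose size halves with every zoom level and count points per cell. This is a hypothetical illustration (`grid_clusters` is not an ipyleaflet function), not the plugin's actual algorithm, and the points are made up.

```python
import math
from collections import Counter


def grid_clusters(points, zoom, base_cell_deg=1.0):
    """Count (lon, lat) points per grid cell; cells halve in size per zoom level."""
    cell = base_cell_deg / (2 ** zoom)
    counts = Counter()
    for lon, lat in points:
        key = (math.floor(lon / cell), math.floor(lat / cell))
        counts[key] += 1
    return counts


# Made-up business locations: three very close together, one a bit farther away
pts = [(-118.3000, 34.0599), (-118.29995, 34.05995), (-118.2999, 34.0600),
       (-118.2800, 34.0800)]
for zoom in (2, 12):
    print(zoom, sorted(grid_clusters(pts, zoom).values(), reverse=True))
```

At a low zoom all four points share one cell (one cluster of 4); at a high zoom the outlier separates from the group of three.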

In [None]:
map_display.add_layer(wilshire_businesses_cluster)

all_businesses = LayerGroup(name="All Businesses", layers=markers)
map_display += all_businesses

If you're tired of bouncing around (and haven't set up the separate output view), let's display the final Wilshire map here.

In [None]:
map_display

Whoa - crap on a map!  Try it in full screen mode (with the little square control).

The very last thing is to show how to add a legend to the map.  It can help readers match marker colors to sectors.
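
The legend labels should come from the same sector-to-color lookup as the markers, so the two cannot drift apart. A plain-Python sketch of that relabeling step, with a hypothetical `legend_from_cmap` helper and two example NAICS sector codes:

```python
def legend_from_cmap(code_to_color, code_to_label):
    """Relabel a code -> color map as label -> color for a legend."""
    legend = {}
    for code, color in code_to_color.items():
        label = code_to_label.get(code, code)  # fall back to the raw code
        legend[label] = color
    return legend


cmap_example = {'72': '#007DB5', '54': '#FF00F6'}
labels = {'72': 'Accommodation and Food Services',
          '54': 'Professional, Scientific, and Technical Services'}
print(legend_from_cmap(cmap_example, labels))
```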

In [None]:
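# build the legend from the same sector -> color lookup used for the markers,
# assuming each sector code has exactly one sector_desc
sector_to_desc = wilshire_biz_with_naics_gdf.drop_duplicates('sector').set_index('sector')['sector_desc']

legend_cmap = {sector_to_desc[code]: color for code, color in cmap.items()}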

legend = LegendControl(legend_cmap, name='Legend', position='bottomright')

map_display.add_control(legend)

# Summary/Conclusion

We've looked at some techniques to filter/combine/aggregate the datasets.  A couple of observations:

  1.  It's pretty easy to get cluttered maps.  You will need to add more focused attribution for selected features (e.g., the OZ for a business?).
  2.  Coding can get messy in notebooks.  Think about reuse and factor code into .py modules.
  3.  I didn't cover folium but you can check out [previous work](https://github.com/hackforla/lasan) for some ideas with that package.
  4.  leafmap is definitely worth a look-see.
  5.  Food-for-thought: how can we add some (biz) stats to the bids_gdf?
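
As a starting point for that last question, one approach (a sketch, not something done in this notebook) would be `businesses_gdf.sjoin(bids_gdf, predicate='within')` followed by a groupby on `prog_name` to count businesses per BID, then merging the counts back onto `bids_gdf`. The aggregation step itself, shown in plain Python on made-up joined rows (the "EXAMPLE BID" name is hypothetical):

```python
from collections import Counter, defaultdict

# Made-up (BID name, sector description) pairs, as a spatial join might produce
joined_rows = [
    ("WILSHIRE CENTER", "Retail Trade"),
    ("WILSHIRE CENTER", "Accommodation and Food Services"),
    ("WILSHIRE CENTER", "Retail Trade"),
    ("EXAMPLE BID", "Retail Trade"),
]

# Total businesses per BID
totals = Counter(bid for bid, _ in joined_rows)

# Businesses per sector within each BID
by_sector = defaultdict(Counter)
for bid, sector in joined_rows:
    by_sector[bid][sector] += 1

for bid, total in totals.most_common():
    top_sector, top_count = by_sector[bid].most_common(1)[0]
    print(f"{bid}: {total} businesses, top sector {top_sector} ({top_count})")
```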