## Easier Geospatial Exploratory Data Analysis with Folium

When getting to learn a dataset through exploratory data analysis, it is crucial that the data viz tools are easy to use and modify. If the tool gets in the way of the work during this initial stage of searching for insights, observing trends, or noting interesting features to explore within a new dataset, it can stifle the curiosity and creative thinking necessary for uncovering new areas of exploration.

This reality is particularly true when working with geospatial data. 

The inherently visual nature of geographic data raises the bar for even simple figures. We as users also bring certain expectations when evaluating a map, because we've used maps from a young age, whether imagining ourselves discovering new worlds through the maps inside the front cover of our favorite books or tracing our finger along the interstates in an atlas on a family road trip. Now, we're also accustomed to being able to interact with maps thanks to tools like Google Maps, Yelp, Airbnb, and countless others.

So, whether working individually or as part of a collaborative team, being able to quickly build and iterate on visualizations that are both functional and beautiful is an important option in our toolkit when exploring geospatial data. Folium, a Python package built on the Leaflet.js JavaScript library, offers us this balance of form and function with a lightweight grammar for creating and customizing interactive maps in only a few lines of code.

In this example, we’ll take a look at a project considering where to plant [rain gardens](https://www.epa.gov/soakuptherain/soak-rain-rain-gardens) in New Orleans, LA and use census data to compare the areas where these rain gardens would be cultivated.

<br>

#### Getting Ready - Gathering Our Data & Tools
While this tutorial will not go into the finer details of working with geospatial data, there are [several](https://www.learndatasci.com/tutorials/geospatial-data-python-geopandas-shapely/) [helpful](https://sites.northwestern.edu/researchcomputing/2022/01/04/plotting-geospatial-data-with-python/) [resources](https://geographicdata.science/book/notebooks/03_spatial_data.html) on the matter. For this project, we will use a couple of small datasets that have already been cleaned, so we can focus on the functionality that Folium provides. The first contains [census data about each neighborhood in New Orleans](https://github.com/phork37/NewOrleansRainGardens/blob/main/data/output/block_groups_output.zip), aggregated at the census block group level, while the second is a [list of the proposed rain garden locations](https://github.com/phork37/NewOrleansRainGardens/blob/main/data/output/gardens_rainfall.csv). Note that the .zip file will need to be downloaded and extracted locally, since its contents are larger than GitHub's 100 MB file size limit.

With these files stored locally, we can start working in a Python notebook to build our interactive map. Our first steps are importing Folium, Pandas, and its geospatial counterpart GeoPandas, along with Shapely's geometry tools for working with the latitude and longitude of each rain garden.

<br>

```python
import pandas as pd
import numpy as np

from shapely.geometry import Point
import geopandas as gpd

import folium

# silence warning about shapely and pygeos interaction
import warnings
warnings.filterwarnings('ignore')

# move the working directory one level up to access the data folder
import os
os.chdir('../')
```

<br>

Once the packages are loaded, we import the two datasets. Because the rain gardens are stored in a CSV file, we'll need to use Geopandas to convert the `lat` and `lon` columns into a single `geometry` column to properly plot the points.

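In the process, we set the coordinate reference system (`crs`) to EPSG:4269 (NAD83, the geographic coordinate system used by U.S. Census Bureau geography) and build the geometry from the relevant columns in our dataframe. We also take the opportunity to rename some of the census columns, whose names were truncated to 10 characters by the shapefile format, for additional clarity.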

<br>

```python
# read in the block groups shapefile with included socio-economic data
block_groups = gpd.read_file("data/output/block_groups_output.shp")
# restore readable column names (shapefile field names are limited to 10 characters)
block_groups.rename(columns={'percent_no': 'percent_nonwhite', 'percent_po': 'percent_lowincome',
                             'nonwhite_t': 'nonwhite_text', 'pov_text': 'lowincome_text'}, inplace=True)

# read in the location of rain gardens with included precipitation data
gardens_rainfall = pd.read_csv("data/output/gardens_rainfall.csv")
# build point geometries from longitude (x) and latitude (y)
gardens_rainfall = gpd.GeoDataFrame(
    gardens_rainfall,
    crs='epsg:4269',
    geometry=gpd.points_from_xy(gardens_rainfall.lon, gardens_rainfall.lat),
)
```
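One detail worth making explicit: `gpd.points_from_xy` takes x (longitude) before y (latitude), matching the (lon, lat) order used by GeoJSON and Shapely, while folium's `location` arguments expect `[lat, lon]`. Swapping them for this dataset would yield "latitudes" of about -90, nowhere near Louisiana. The pure-Python sketch below illustrates the conversion and a quick sanity check; the helper names and the rough New Orleans bounding box are our own illustrative assumptions, not part of the project's code.

```python
# Hypothetical helpers illustrating coordinate-ordering conventions.
# GeoJSON / Shapely / points_from_xy: (x, y) == (longitude, latitude)
# folium location arguments:         [latitude, longitude]

# Rough bounding box around New Orleans (an illustrative assumption)
NOLA_LAT_RANGE = (29.8, 30.2)
NOLA_LON_RANGE = (-90.2, -89.6)


def xy_to_folium_location(x, y):
    """Convert an (x=lon, y=lat) pair into folium's [lat, lon] order."""
    return [y, x]


def in_new_orleans(lat, lon):
    """Check that a [lat, lon] pair falls inside the rough bounding box."""
    return (NOLA_LAT_RANGE[0] <= lat <= NOLA_LAT_RANGE[1]
            and NOLA_LON_RANGE[0] <= lon <= NOLA_LON_RANGE[1])


# POINT (-90.12030 29.97016) from the sample output: x is longitude, y is latitude
location = xy_to_folium_location(-90.12030, 29.97016)
print(location)                              # [29.97016, -90.1203]
print(in_new_orleans(*location))             # True
print(in_new_orleans(-90.12030, 29.97016))   # False: arguments swapped
```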

```python
block_groups.sample(5)
```

| | GEOID | NAME | percent_nonwhite | percent_lowincome | nonwhite_text | lowincome_text | geometry |
|---|---|---|---|---|---|---|---|
| 327 | 220710023003 | Block Group 3, Census Tract 23, Orleans Parish... | 98.0 | 47.0 | 98% | 47% | POLYGON ((-90.05857 29.98881, -90.05262 29.989... |
| 250 | 220710119002 | Block Group 2, Census Tract 119, Orleans Paris... | 29.0 | 34.0 | 29% | 34% | POLYGON ((-90.11658 29.94083, -90.11471 29.939... |
| 252 | 220710009024 | Block Group 4, Census Tract 9.02, Orleans Pari... | 100.0 | 27.0 | 100% | 27% | POLYGON ((-90.01724 29.97470, -90.01691 29.975... |
| 357 | 220710033085 | Block Group 5, Census Tract 33.08, Orleans Par... | 89.0 | 32.0 | 89% | 32% | POLYGON ((-90.06970 30.00765, -90.06858 30.007... |
| 369 | 220710006042 | Block Group 2, Census Tract 6.04, Orleans Pari... | 69.0 | 32.0 | 69% | 32% | POLYGON ((-90.02011 29.92782, -90.01978 29.933... |


```python
gardens_rainfall.sample(5)
```

| | address | garden_area | station_id | daily_avg | monthly_avg | yearly_avg | lat | lon | daily_gallons | monthly_gallons | yearly_gallons | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 9005 Olive Street | 3600 | USC00166679 | 0.17 | 5.0 | 59.0 | 29.970157 | -90.120296 | 381.0 | 11214.0 | 132325.0 | POINT (-90.12030 29.97016) |
| 28 | 1750 Tennessee | 5160 | USC00166672 | 0.17 | 5.0 | 60.0 | 29.970975 | -90.020749 | 546.0 | 16073.0 | 192881.0 | POINT (-90.02075 29.97097) |
| 5 | 3304 Monroe Street | 7550 | USC00166679 | 0.17 | 5.0 | 59.0 | 29.967801 | -90.118395 | 800.0 | 23518.0 | 277515.0 | POINT (-90.11840 29.96780) |
| 16 | 4737 Virgilian | 5590 | USC00166678 | 0.17 | 5.0 | 54.0 | 30.017004 | -90.01675 | 592.0 | 17413.0 | 188059.0 | POINT (-90.01675 30.01700) |
| 34 | 822 Bartholomew | 4681 | US1LAOR0006 | 0.19 | 5.0 | 47.0 | 29.964089 | -90.036636 | 554.0 | 14581.0 | 137064.0 | POINT (-90.03664 29.96409) |


<br>

Looking at a sample of our census data, we can see the unique `GEOID` for each block group, along with two socio-economic features, `percent_nonwhite` and `percent_lowincome`, plus two columns (`nonwhite_text` and `lowincome_text`) that contain the same observations as strings with a percent sign appended. As we'll see below, this small bit of redundant information makes the pop-ups much easier to read.
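The `_text` columns already ship with the cleaned shapefile, so there is nothing to compute here. If you were preparing similar data yourself, a minimal pandas sketch might build them as follows. The values are illustrative, and the approach assumes the percent columns contain no missing values, since `astype(int)` raises on NaN.

```python
import pandas as pd

# Illustrative values only; the project's data already includes the *_text columns
df = pd.DataFrame({"percent_nonwhite": [98.0, 29.0, 100.0],
                   "percent_lowincome": [47.0, 34.0, 27.0]})

# Round to whole numbers, convert to strings, and append a percent sign
df["nonwhite_text"] = df["percent_nonwhite"].round().astype(int).astype(str) + "%"
df["lowincome_text"] = df["percent_lowincome"].round().astype(int).astype(str) + "%"

print(df["nonwhite_text"].tolist())   # ['98%', '29%', '100%']
```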

In the preview of our rain gardens dataset, we can see the average rainfall in inches for each day, month, and year, according to data from the NOAA weather station listed in `station_id`, along with the estimated gallons each garden would capture over those periods. We can also confirm that our transformation from a flat CSV to a geospatial dataset worked: the `geometry` column stores Point data combining the `lon` and `lat` columns.
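The source doesn't spell out how the gallon columns were computed, but every row in the sample above is consistent with a standard conversion: one inch of rain over one square foot is 144 cubic inches, and a US gallon is 231 cubic inches, so each square foot captures about 0.623 gallons per inch of rain. The sketch below checks that inference against the sample rows, assuming `garden_area` is in square feet. Treat the formula as our reconstruction, not the project's documented method.

```python
# Reconstructed (not documented) relationship between garden area, rainfall, and gallons.
# 1 inch of rain over 1 sq ft = 144 cubic inches; 1 US gallon = 231 cubic inches.
GALLONS_PER_SQFT_INCH = 0.623  # 144 / 231 is about 0.6234; the data appears to use 0.623


def gallons_captured(area_sqft, rainfall_inches):
    """Estimate the gallons of rain falling on a garden of the given area."""
    return round(area_sqft * rainfall_inches * GALLONS_PER_SQFT_INCH)


# 9005 Olive Street: garden_area=3600, daily_avg=0.17, monthly_avg=5.0, yearly_avg=59.0
print(gallons_captured(3600, 0.17))   # 381, matches daily_gallons
print(gallons_captured(3600, 5.0))    # 11214, matches monthly_gallons
print(gallons_captured(3600, 59.0))   # 132325, matches yearly_gallons
```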

With the necessary packages and our data in place, we can start building the structure of our interactive map.

<br>

#### Building Interactive Maps with Folium

The first step in creating a Folium map is to create an instance of the `folium.Map` class and store it in a variable. Storing it matters because every subsequent layer we add to the interactive display is attached to this original map object. When instantiating a new `Map`, we can set its default center with a `[latitude, longitude]` pair (`location`) along with the starting zoom level (`zoom_start`), as shown below. We could also set a basemap at this stage with the `tiles` parameter, but in this case we'll set `tiles=None` for now and add our own basemap later.

<br>

```python
# initialize the map centered on New Orleans; basemap tiles are added later
m = folium.Map(location=[29.995, -90.05], zoom_start=12, tiles=None)
```

<br>

#### Feature Groups - Easy Layers in Folium

When working with plotting tools, it's easy to lose track of the stacking order of various layers and waste time reorganizing elements just so everything displays correctly. With `FeatureGroup`, another Folium class, we can group individual elements together. Not only do these feature groups give us a cleaner mental map of our codebase, they also make for more intuitive data visualizations. They will be particularly helpful in setting up our interactive layers later on.

In this code snippet, we create two feature groups, one for each of the census variables we are interested in for this project. The relevant parameters are the group's `name` and `overlay`. Setting `overlay=False` makes each group a base layer rather than an overlay, so the layer control presents the groups as mutually exclusive options and only one is displayed at a time. This setting is crucial: when we display the map, the colors of the two choropleth layers won't overlap and interfere with one another. Also note that each feature group is added to our original map object.

<br>

```python
# create two feature groups that will each contain a choropleth;
# overlay=False makes them mutually exclusive base layers
fg1 = folium.FeatureGroup(name='Percent Non-White', overlay=False).add_to(m)
fg2 = folium.FeatureGroup(name='Percent Below Poverty Line', overlay=False).add_to(m)
```

<br>

#### Creating Choropleths

Now that we have our infrastructure in place (with only three lines of code!), it's time to start visualizing our data. One of the most familiar ways to present geographic data is a choropleth, in which areas are shaded with different hues or intensities according to a categorical or continuous variable, highlighting how trends change across distances or geographic boundaries. In our case, we want to see how each neighborhood's socio-economic features are distributed across the greater New Orleans area.

Similar to the Map and Feature Groups, a Choropleth map is just another class within the folium package. As such, we can build both layers relatively easily, just changing out a couple of variables in our dataset. Let's briefly look at each parameter and the purpose it serves.


- `name`: The name of the layer, analogous to the `name` we gave each feature group above.
- `geo_data`: The dataset where our geometry column is stored.
- `data`: The dataset where the variable we want to display in the choropleth is stored. In this case, it's the same as our `geo_data` parameter, but it doesn't have to be, as long as each dataset has the same unique ID that is passed into the `key_on` input.
- `columns`: A list of the column names needed; in this case, only the unique ID and the variable of interest.
- `key_on`: The path to the unique ID within each GeoJSON feature (here `'feature.properties.GEOID'`). Folium uses it to match each shape in `geo_data` with its row in `data`, connecting the geographic and non-geographic data.
- `bins`: The number of equal-width classes the variable is divided into, and thus how many different hues or shades will be displayed on the map.
- `fill_color`: The name of a ColorBrewer palette (the same family of palettes available in ggplot and other tools), such as `'Purples'` or `'YlOrRd'`. The palette must offer enough colors to cover the number of bins set above.
- `fill_opacity`: Bounded between 0 and 1, this sets the opacity of the choropleth fill. We want to see just a bit of the underlying map, so we set it to 0.6.
- `line_opacity`: Similarly, sets the opacity of the borders. We set this lower since the borders aren't the focal point of the display.
- `highlight`: Controls whether each shape is highlighted when it's hovered over. Since we're encouraging interaction, we set this to `True`.
- `line_color`: Sets the color of the border lines.
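To make `key_on` less abstract: each row of `block_groups` becomes a GeoJSON feature whose non-geometry columns live under `properties`, so the path `'feature.properties.GEOID'` points at the block group's ID. The pure-Python sketch below walks such a path through a hand-built feature. `resolve_key_on` is a hypothetical helper that mimics the idea behind the parameter, not folium's internal implementation, and the triangle geometry is made up for illustration.

```python
# A minimal, hand-built GeoJSON feature (geometry simplified for illustration)
feature = {
    "type": "Feature",
    "properties": {"GEOID": "220710023003", "percent_nonwhite": 98.0},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-90.06, 29.98], [-90.05, 29.98],
                         [-90.05, 29.99], [-90.06, 29.98]]],
    },
}


def resolve_key_on(feature, key_on):
    """Follow a dotted path such as 'feature.properties.GEOID' into a feature dict.

    Hypothetical helper: illustrates the idea behind key_on, not folium's code.
    """
    parts = key_on.split(".")
    if parts[0] == "feature":
        parts = parts[1:]  # the leading 'feature' refers to the feature itself
    value = feature
    for part in parts:
        value = value[part]
    return value


print(resolve_key_on(feature, "feature.properties.GEOID"))  # 220710023003
```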

<br>

```python
# choropleth of the share of non-white residents, stored in the first feature group
ch1 = folium.Choropleth(
            name='Percent Non-White',
            geo_data=block_groups,
            data=block_groups,
            columns=['GEOID', 'percent_nonwhite'],
            key_on='feature.properties.GEOID',
            bins=9,
            fill_color='Purples',
            fill_opacity=0.6,
            line_opacity=0.3,
            highlight=True,
            line_color='black').geojson.add_to(fg1)

# choropleth of the share of residents below the poverty line, stored in the second feature group
ch2 = folium.Choropleth(
            name='Percent Below Poverty Line',
            overlay=False,
            geo_data=block_groups,
            data=block_groups,
            columns=['GEOID', 'percent_lowincome'],
            key_on='feature.properties.GEOID',
            fill_color='YlOrRd',
            fill_opacity=0.6,
            line_opacity=0.3,
            highlight=True,
            line_color='black').geojson.add_to(fg2)
```
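Note that `ch1` passes `bins=9` while `ch2` omits `bins` and falls back to folium's default of 6. According to folium's documentation, an integer `bins` defines that many equal-width bins between the minimum and maximum of the values, following the conventions of `numpy.histogram`. The sketch below uses `np.histogram_bin_edges` on a handful of made-up percentages to preview what such edges look like; it approximates the binning rather than calling folium itself.

```python
import numpy as np

# Made-up percentages standing in for a column like percent_nonwhite
values = np.array([0.0, 12.0, 29.0, 47.0, 69.0, 89.0, 98.0, 100.0])

# Equal-width bin edges between the min and max, as numpy.histogram computes them
edges_9 = np.histogram_bin_edges(values, bins=9)  # like ch1 (bins=9)
edges_6 = np.histogram_bin_edges(values, bins=6)  # like ch2 (folium's default)

print(np.round(edges_9, 1))
print(np.round(edges_6, 1))
```

With values spanning 0 to 100, six bins produce breaks roughly every 16.7 percentage points, while nine bins produce breaks about every 11.1 points. Because `bins` also accepts a sequence of edges, passing an explicit list is an option when round-number breaks read better.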


<br>

With these parameters set and customized, two more small details are worth noting. First, after creating each choropleth, we don't add it directly to the map object, because we want it stored in the relevant feature group. Second, one admittedly odd nuance in folium is that, to place a choropleth inside a feature group, we add only its `geojson` attribute (the underlying `GeoJson` layer that draws the shapes) rather than the `Choropleth` object itself. Because `add_to()` returns the object it was called on, `ch1` and `ch2` end up holding these `GeoJson` layers. With our new map layers saved to their feature groups, we can turn our attention to transforming our maps into interactive exploratory tools.

<br>

#### Encouraging Data Exploration Through Pop-Ups

Folium offers two primary ways to add contextual data to interactive visualizations. The first is tooltips, which are windows that appear when a user hovers over a particular element; we will use these with our rain garden locations below. For our neighborhood choropleth maps, we'll deploy pop-ups, which work similarly to tooltips but appear when a user clicks rather than hovers.

To attach a pop-up to each layer, we call the `add_child()` method on the stored choropleth layers, so that the right pop-up appears on the right layer. Within `GeoJsonPopup`, we provide a list of the relevant fields. These are read from the properties of the features in the layer the pop-up is attached to, so it already knows where to find our variables! We can then change how the field names appear with the `aliases` parameter. We set `labels=True` so that the names appear, pre-formatted in bold with adequate spacing and alignment, without any fuss or custom print statements.

<br>

```python
# Create the on-click popup with additional details for each block group
ch1.add_child(
    folium.features.GeoJsonPopup(['GEOID', 'nonwhite_text', 'lowincome_text'], labels=True,
                                 aliases=['GEOID', 'Non-White Population', 'Below Poverty Line'])
)

ch2.add_child(
    folium.features.GeoJsonPopup(['GEOID', 'nonwhite_text', 'lowincome_text'], labels=True,
                                 aliases=['GEOID', 'Non-White Population', 'Below Poverty Line'])
)
```

<br>

Just like that, our choropleth layers are created, stored, and enhanced with contextual data. With this suite of intuitive parameters at our disposal, it's easy to see how, once these tools are familiar, exploratory data analysis becomes a quick, iterative process of prototyping and investigating geospatial data.

But we're not done yet, because we can add on another layer to see how the features stored in the choropleths compare to the locations of the proposed rain gardens.

<br>

#### Adding Individual Locations & Interactive Tooltips

Displaying a series of individual points on our newly created folium map is only slightly more involved than the choropleth layers we've built so far. Because each point is unique, we loop over the rows of the `gardens_rainfall` GeoDataFrame so that each marker's parameters are tailored to the correct point.

After creating our for loop, we call a new folium class, the `Marker`. We then use the location parameter and pass in a list that includes the latitude and longitude for the currently indexed row. Then, within the marker, we can directly add a tooltip as part of the initial instantiation process. The code may look like a lot, but because we can use HTML directly within the tooltip call, we create a bit of formatting such that our value names are bold and the spacing makes our variables easier to read. We then add each point and its tooltip to the map.

<br>

```python
# add rain gardens as points with hover tooltips including formatted text
for _, row in gardens_rainfall.iterrows():
    folium.Marker(
        # folium expects locations as [latitude, longitude]
        location=[row['lat'], row['lon']],
        # for readability, cast each average and total to int, then include it as a string
        tooltip=('<b>Address:</b> ' + str(row['address'])
                 + '<br>'
                 + '<br>' + '<b>Avg. Monthly Rainfall:</b> ' + str(int(row['monthly_avg'])) + ' inches'
                 + '<br>' + '<b>Avg. Monthly Rainfall Captured:</b> ' + str(int(row['monthly_gallons'])) + ' gallons'
                 + '<br>'
                 + '<br>' + '<b>Avg. Annual Rainfall:</b> ' + str(int(row['yearly_avg'])) + ' inches'
                 + '<br>' + '<b>Avg. Annual Rainfall Captured:</b> ' + str(int(row['yearly_gallons'])) + ' gallons'
                 ),
    ).add_to(m)
```
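If the string concatenation in the marker loop feels unwieldy, one option is to move the formatting into a small function built on f-strings. The sketch below is a hypothetical refactor, not the original project's code: `make_tooltip` accepts any mapping with the same keys as a row of `gardens_rainfall` (for example, a row from `iterrows()`), and it escapes the address with Python's standard `html.escape` so that characters like `&` or `<` can't break the tooltip's markup.

```python
from html import escape


def make_tooltip(row):
    """Build the rain garden tooltip HTML from a row-like mapping.

    Hypothetical helper: expects the keys 'address', 'monthly_avg',
    'monthly_gallons', 'yearly_avg', and 'yearly_gallons'.
    """
    return (
        f"<b>Address:</b> {escape(str(row['address']))}"
        "<br>"
        f"<br><b>Avg. Monthly Rainfall:</b> {int(row['monthly_avg'])} inches"
        f"<br><b>Avg. Monthly Rainfall Captured:</b> {int(row['monthly_gallons'])} gallons"
        "<br>"
        f"<br><b>Avg. Annual Rainfall:</b> {int(row['yearly_avg'])} inches"
        f"<br><b>Avg. Annual Rainfall Captured:</b> {int(row['yearly_gallons'])} gallons"
    )


# Values taken from the 9005 Olive Street row in the sample output
example = {"address": "9005 Olive Street", "monthly_avg": 5.0,
           "monthly_gallons": 11214.0, "yearly_avg": 59.0, "yearly_gallons": 132325.0}
print(make_tooltip(example))
```

Inside the loop, the `tooltip` argument would then shrink to a single call such as `tooltip=make_tooltip(row)`.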



<br>

#### Final Touches

With our feature groups, choropleths, and point layers created, there are only a couple more elements to add: the basemap and the layer control that lets users switch between layers.

There are a number of options for how the basemap can appear, each set through folium's `TileLayer` class. In this case, we use a fairly minimalistic basemap provided by CartoDB. We set `overlay=True` so the basemap is treated as an overlay rather than a base layer; otherwise it would join the two choropleth feature groups as one of the mutually exclusive options, and selecting it would hide the choropleths. We then give it a name and add it to the map object.

Similarly, we use the `LayerControl` class to create the window that displays our feature groups as options users can select between. We set `collapsed=False` so that it's easier to see and interact with and (say it with me) add it to our map object.

Finally, we simply call our map object and, with fewer than 100 lines of code, we can begin exploring our multi-layered interactive map.

<br>

```python
# add basemap overlay
folium.TileLayer('cartodbpositron', overlay=True, name="Basemap").add_to(m)

# add layer controls to map, expanded by default
folium.LayerControl(collapsed=False).add_to(m)

# display the map layers
m
```

<br>

#### Sharing Our Findings

While the actual analysis is beyond the scope of this tutorial, it's helpful to know that our new data visualization is easy to share. Yet another helpful feature of folium is that the entire map, interactions and all, can be saved as a single self-contained HTML file. All we have to do is save the map; we can then share it via email or cloud storage, and a colleague can open the file in a browser and explore our interactive visualization without needing any of the data, packages, or code on their local machine. The page does still load the basemap tiles and folium's JavaScript libraries from the web, so an internet connection is needed to view it.

<br>

```python
m.save("raingarden_map.html")
```

<br>

#### Making Maps Faster

Folium, like any new tool, can take a little getting used to and might require a few forays into the documentation. But this brief tutorial highlights how it can serve as an efficient tool for creating effective interactive data visualizations. The above example serves our purpose of facilitating dynamic exploration of the interaction between our variables of interest with much less code than it takes to launch a Shiny or Streamlit app of similar scope.

Of course, there are trade-offs in both power and customization in comparison to these more exhaustive tools, but that is a feature, rather than a bug, in the early stages of data exploration. Once we familiarize ourselves with the language of the folium library, it can quickly become a valuable tool we reach for when working with geospatial data.