# HEATMAPS FOR EPIDEMIOLOGY USING FOLIUM

## Packages

We import `folium` and the `plugins` module from `folium` to create heatmaps. The `pandas` library is used for data manipulation and analysis, and the `geopandas` library is used for working with geospatial data. The `numpy` library is imported for numerical operations and to generate random data. The `shapely.geometry` module is used to create geometric objects, specifically points in this case. Finally, the `ipywidgets` library is imported to create interactive widgets for displaying the heatmaps side by side.

In [1]:
import folium
from folium.plugins import HeatMap, HeatMapWithTime
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
from ipywidgets import HBox, VBox, Output

## What is a Spatial Heatmap?

A spatial heatmap is a data visualization technique that represents the density or intensity of data points in a geographic area using color gradients. In epidemiology, heatmaps are often used to visualize the distribution of disease cases, helping to identify hotspots and patterns of spread. Other use cases in epidemiology include tracking vaccination rates, visualizing environmental health risks, and monitoring healthcare access.

An example of a spatial heatmap could be a map showing the concentration of COVID-19 cases in a city, where areas with higher case counts are represented with warmer colors (e.g., red) and areas with lower case counts are represented with cooler colors (e.g., blue). This allows public health officials to quickly identify regions that may require more resources or interventions.

In a heatmap, each data point contributes to the overall intensity of color in its vicinity, with areas of higher density appearing more prominently. This visual representation helps in quickly identifying patterns and trends in spatial data, making it a valuable tool for epidemiologists and public health professionals. Optional weights can be applied to data points to reflect their relative importance or severity, enhancing the interpretability of the heatmap.
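One way to picture how each point adds intensity to its neighborhood is a weighted sum of Gaussian "bumps" evaluated on a grid. The sketch below is a simplified kernel density illustration written for this explanation. The function `weighted_density_grid` and the bandwidth value are hypothetical choices, and this is not the exact algorithm Leaflet.heat (the library behind folium's `HeatMap`) runs in the browser. It compares where the hottest cell lands with and without weights.

```python
import numpy as np

def weighted_density_grid(lats, lons, weights, grid_lat, grid_lon, bandwidth):
    """Sum one Gaussian bump per point, scaled by its weight (illustrative sketch)."""
    # 2-D grid of evaluation locations: rows = latitude, columns = longitude
    glat, glon = np.meshgrid(grid_lat, grid_lon, indexing="ij")
    density = np.zeros_like(glat, dtype=float)
    for lat, lon, w in zip(lats, lons, weights):
        # Squared distance in degrees; a rough approximation over a small area
        d2 = (glat - lat) ** 2 + (glon - lon) ** 2
        density += w * np.exp(-d2 / (2 * bandwidth ** 2))
    return density

# Two neighbouring mild cases and one isolated, severe case
lats = np.array([40.62, 40.621, 40.68])
lons = np.array([-73.98, -73.981, -73.92])
weights = np.array([1.0, 1.0, 5.0])

grid_lat = np.linspace(40.6, 40.7, 51)
grid_lon = np.linspace(-74.0, -73.9, 51)

# Weighted: the severe case dominates
density = weighted_density_grid(lats, lons, weights, grid_lat, grid_lon, bandwidth=0.005)
i, j = np.unravel_index(np.argmax(density), density.shape)
peak_lat, peak_lon = grid_lat[i], grid_lon[j]

# Unweighted: the two nearby cases reinforce each other and dominate
density_u = weighted_density_grid(lats, lons, np.ones_like(weights), grid_lat, grid_lon, bandwidth=0.005)
iu, ju = np.unravel_index(np.argmax(density_u), density_u.shape)
peak_lat_u, peak_lon_u = grid_lat[iu], grid_lon[ju]

print("weighted peak:", peak_lat, peak_lon)
print("unweighted peak:", peak_lat_u, peak_lon_u)
```

The same three cases produce a hotspot in a different place depending on whether severity is taken into account, which is the effect the optional weights have on a real heatmap.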

## Generating Epidemiological Data

In this section, we generate a sample dataset of $300$ simulated disease cases as a pandas DataFrame. Latitudes are drawn uniformly between $40.6$ and $40.7$ and longitudes between $-74.0$ and $-73.9$, roughly covering Brooklyn, NY. Each case is assigned a random integer weight from $1$ to $49$ to indicate its severity or importance. This synthetic data is used to create the heatmaps in the subsequent sections.

In [14]:
# Generate sample epidemiological data for 300 cases in Brooklyn, NY
# Create the data as a pandas DataFrame assigned to the variable name `data`
data = pd.DataFrame({
    'lat': np.random.uniform(40.6, 40.7, 300),
    'lon': np.random.uniform(-74.0, -73.9, 300),
    'weight': np.random.randint(1, 50, 300)
})
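Because the cell above draws from NumPy's global random state without a seed, the cases (and every map built from them) change each time the notebook runs. A minimal sketch of a reproducible variant uses a seeded `numpy.random.Generator`; the seed value `42` is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# A seeded Generator returns the same "random" cases on every run
rng = np.random.default_rng(seed=42)  # arbitrary seed
n_cases = 300

data = pd.DataFrame({
    "lat": rng.uniform(40.6, 40.7, n_cases),     # latitude band over Brooklyn
    "lon": rng.uniform(-74.0, -73.9, n_cases),   # longitude band over Brooklyn
    "weight": rng.integers(1, 50, n_cases),      # integers 1..49 (upper bound excluded)
})

print(data.head())
```

`Generator.integers`, like the legacy `randint`, excludes its upper bound, so the weights range from 1 to 49.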

The DataFrame is converted to a geopandas GeoDataFrame by creating Point geometries from the latitude and longitude columns. This allows for easier spatial analysis and visualization using geospatial libraries.

In [15]:
# Convert the DataFrame to a GeoDataFrame assigned to the variable name `gdf`
geometry = [Point(xy) for xy in zip(data['lon'], data['lat'])]
gdf = gpd.GeoDataFrame(data, geometry=geometry)
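The GeoDataFrame above has no coordinate reference system (CRS) attached. A sketch of an alternative, assuming the coordinates are WGS 84 longitude/latitude (EPSG:4326), builds the geometries with the vectorized `gpd.points_from_xy` and declares the CRS. Reprojecting to UTM zone 18N (EPSG:32618), which covers Brooklyn, then allows distances in metres; the 1 km radius is an illustrative assumption.

```python
import numpy as np
import pandas as pd
import geopandas as gpd

data = pd.DataFrame({
    "lat": np.random.uniform(40.6, 40.7, 300),
    "lon": np.random.uniform(-74.0, -73.9, 300),
    "weight": np.random.randint(1, 50, 300),
})

# points_from_xy takes x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data["lon"], data["lat"]),
    crs="EPSG:4326",  # WGS 84 longitude/latitude in degrees
)

# Reproject to a metric CRS so that distances come out in metres
gdf_m = gdf.to_crs(epsg=32618)

# Example: how many cases lie within 1 km of the first case (including itself)
within_1km = gdf_m.distance(gdf_m.geometry.iloc[0]) <= 1000
print(gdf.crs, int(within_1km.sum()))
```

Folium itself still expects latitude/longitude in degrees, so the heatmap input should keep coming from the `lat` and `lon` columns rather than from the projected geometries.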

We display the first five rows of the GeoDataFrame to verify its structure and content.

In [16]:
# Display the first five rows of the GeoDataFrame
gdf.head()

|   | lat | lon | weight | geometry |
|---|---|---|---|---|
| 0 | 40.600206 | -73.90778 | 32 | POINT (-73.90778 40.60021) |
| 1 | 40.604828 | -73.969623 | 41 | POINT (-73.96962 40.60483) |
| 2 | 40.655285 | -73.973781 | 17 | POINT (-73.97378 40.65528) |
| 3 | 40.671015 | -73.93652 | 16 | POINT (-73.93652 40.67101) |
| 4 | 40.691082 | -73.920519 | 7 | POINT (-73.92052 40.69108) |


## Generating a Base Map

The `Map` object from the `folium` library is used to create a base map centered at specified latitude and longitude coordinates, with a defined zoom level. This base map serves as the canvas for adding heatmap layers and other geographical features.

In [17]:
# Create a base map centered around Brooklyn, NY assigned to the variable name `base_map`
base_map = folium.Map(location=[40.65, -73.94], zoom_start=12, tiles='CartoDB Positron')

# Display the map
base_map

## Plotting the Heatmap

The `HeatMap` plugin from the `folium.plugins` module creates a heatmap layer on top of the base map. The heatmap visualizes the density of epidemiological data points from their latitude and longitude coordinates, with optional weights to indicate the severity of each case. Its appearance is controlled by parameters such as `radius`, `blur`, and `max_zoom` (the zoom level at which points reach their maximum intensity), which affect how clearly spatial patterns come through.

In [18]:
# Plot a heatmap (without weights for simplicity) assigned to the variable name `m`
m = folium.Map(location=[40.65, -73.94], zoom_start=12, tiles='CartoDB Positron')
heat_data = [[row['lat'], row['lon']] for index, row in gdf.iterrows()]
HeatMap(heat_data, radius=15, blur=10, max_zoom=1).add_to(m)
m

The heatmap above illustrates the distribution of epidemiological cases, with areas of higher density shown in warmer colors. The heat reflects point density, not exact case counts: nearby points reinforce each other, so a cluster of cases appears hotter than any single case. This visualization helps identify hotspots and patterns in the spread of disease, informing public health interventions and resource allocation.

Weighted heatmaps can be particularly useful in epidemiology, as they allow for the representation of varying levels of disease severity or case importance. By assigning weights to each data point, the heatmap can highlight areas with not only a high number of cases but also those with more severe outcomes, enabling more targeted public health responses.

We create a weighted heatmap below, where each data point's influence on the heatmap is determined by its assigned weight. This approach allows for a more nuanced visualization of epidemiological data, emphasizing areas with more significant health impacts. Note that weights are relative, not absolute values.

In [19]:
# Plot the weighted data Heatmap assigned to the variable name `m`
m = folium.Map(location=[40.65, -73.94], zoom_start=12, tiles='CartoDB Positron')
heat_data = [[row['lat'], row['lon'], row['weight']] for index, row in gdf.iterrows()]
HeatMap(heat_data, radius=15, blur=10, max_zoom=1).add_to(m)
m
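Building `heat_data` with `iterrows()` works, but pandas can produce the same nested list directly from the selected columns, which is shorter and much faster on large case files. The sketch below uses a stand-in DataFrame (column selection behaves the same on a GeoDataFrame) and checks both forms against each other. It also shows one optional way to make the relative nature of the weights explicit, by dividing by the largest weight; the column name `weight_rel` is hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for the notebook's `gdf`
gdf = pd.DataFrame({
    "lat": np.random.uniform(40.6, 40.7, 300),
    "lon": np.random.uniform(-74.0, -73.9, 300),
    "weight": np.random.randint(1, 50, 300),
})

# Row-by-row construction, as in the cells above
heat_data_loop = [[row["lat"], row["lon"], row["weight"]] for index, row in gdf.iterrows()]

# Vectorized construction: select the columns, then convert to nested Python lists
heat_data = gdf[["lat", "lon", "weight"]].to_numpy().tolist()

# Optional: rescale weights to (0, 1] by dividing by the largest weight.
# Ratios between cases are unchanged; only the scale becomes explicit.
w = gdf["weight"].astype(float)
gdf["weight_rel"] = w / w.max()
heat_data_rel = gdf[["lat", "lon", "weight_rel"]].to_numpy().tolist()

print(heat_data[:2])
print(heat_data_rel[:2])
```

Either list can be passed to `HeatMap` unchanged; dividing by the maximum (rather than min-max scaling) keeps the lowest-weight cases above zero so they remain visible.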

## Visualization Parameters

Keyword arguments allow us to control the appearance and behavior of the heatmap. Some of the key parameters include:

- `radius`: The radius of each point in the heatmap, affecting the spread of color around each data point.
- `blur`: The amount of blur applied to the heatmap points, influencing the smoothness of the heatmap.
- `max_zoom`: The zoom level at which points reach their maximum intensity (intensity scales with zoom). With `max_zoom=1`, as in the cells here, points are already drawn at full intensity at the city-scale zoom level of $12$.
- `min_opacity`: The minimum opacity of the heatmap, controlling its transparency.

Below, we set the `radius` to $20$ and `blur` to $15$ to create a heatmap with a broader spread and smoother appearance, enhancing the visualization of epidemiological data.

In [21]:
# Plot the weighted data Heatmap with different parameters assigned to the variable name `m`
# Set the radius to 20 and blur to 15
m = folium.Map(location=[40.65, -73.94], zoom_start=12, tiles='CartoDB Positron')
heat_data = [[row['lat'], row['lon'], row['weight']] for index, row in gdf.iterrows()]
HeatMap(heat_data, radius=20, blur=15, max_zoom=1).add_to(m)
m

We can plot both heatmaps side by side (using ipywidgets) to compare the effects of different parameters on the visualization. The first heatmap uses a radius of $15$ and a blur of $10$, while the second heatmap uses a radius of $20$ and a blur of $15$. This comparison highlights how adjusting these parameters can influence the appearance and interpretability of the heatmap, allowing for better insights into the spatial distribution of epidemiological data.

In [22]:
# Plot two heatmaps side by side for comparison
from IPython.display import display

# Build the weighted point list once and reuse it for both maps
heat_data = gdf[['lat', 'lon', 'weight']].values.tolist()

out1 = Output()
out2 = Output()

with out1:
    # Left: radius 15, blur 10
    m1 = folium.Map(location=[40.65, -73.94], zoom_start=12, tiles='CartoDB Positron')
    HeatMap(heat_data, radius=15, blur=10, max_zoom=1).add_to(m1)
    display(m1)
with out2:
    # Right: radius 20, blur 15
    m2 = folium.Map(location=[40.65, -73.94], zoom_start=12, tiles='CartoDB Positron')
    HeatMap(heat_data, radius=20, blur=15, max_zoom=1).add_to(m2)
    display(m2)

HBox([out1, out2])


## Time-varying Heatmaps

Time-varying heatmaps can be created by generating multiple heatmap layers, each representing data from different time periods. By using a time slider or animation, users can visualize how the distribution of epidemiological cases changes over time. This dynamic representation can help identify trends, outbreaks, and the effectiveness of interventions.

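To simulate a time series, we draw five independent random samples of $75$ cases from the GeoDataFrame and store them as a list of five nested lists, one per time step. Because the samples are random, the resulting frames illustrate the mechanics of an animated heatmap rather than a real temporal trend.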

In [23]:
# Create a nested list of five lists each with 75 random cases from the geodataframe
# Assign the result to the variable `heat_data_time`
heat_data_time = [
    gdf.sample(75)[["lat", "lon"]].values.tolist()
    for _ in range(5)
]

In [24]:
# Confirm that there are five lists nested in the `heat_data_time` list
len(heat_data_time)

5
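With real surveillance data, each frame would usually hold the cases reported in one period rather than a random sample. A minimal sketch, assuming a hypothetical `onset` date column (simulated here over five weeks from an arbitrary start date), groups cases by week and builds both the nested list and matching labels for the time slider:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_cases = 300
start = pd.Timestamp("2024-01-01")  # arbitrary start date for the simulation

cases = pd.DataFrame({
    "lat": rng.uniform(40.6, 40.7, n_cases),
    "lon": rng.uniform(-74.0, -73.9, n_cases),
    # Hypothetical onset dates spread uniformly over 35 days (five weeks)
    "onset": start + pd.to_timedelta(rng.integers(0, 35, n_cases), unit="D"),
})

# Week number 1-5, counted from the start date
cases["week"] = (cases["onset"] - start).dt.days // 7 + 1

heat_data_time = []
index = []
# groupby sorts its keys, so frames come out in chronological order.
# Weeks without any cases would be missing from the output.
for week, group in cases.groupby("week"):
    heat_data_time.append(group[["lat", "lon"]].to_numpy().tolist())
    index.append(f"Week {week}")

print(index, [len(frame) for frame in heat_data_time])
```

Unlike the sampled frames, each case appears in exactly one frame here, and the frame sizes vary with the number of cases reported that week.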

Now we use the `HeatMapWithTime` plugin to create a map that animates the five time steps, drawing one heatmap frame per nested list.

In [25]:
# Create a time series plot
m_time = folium.Map(location=[40.65, -73.94], zoom_start=12, tiles='CartoDB Positron')
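# Add one heatmap frame per nested list (time step)
HeatMapWithTime(
    heat_data_time,
    radius=20,
    auto_play=True,   # start the animation when the map loads
    max_opacity=0.8,  # upper limit on the layer's opacity
).add_to(m_time)
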
m_time

## Exporting Maps

Folium maps can be saved as standalone HTML files with the `save` method. Below, we export the time series heatmap to an HTML file.

In [26]:
# Export the time series heatmap to an HTML file
# Use the file name "epidemiological_heatmap_time_series.html"
m_time.save("epidemiological_heatmap_time_series.html")