## Introduction

This notebook is an introductory guide to using PySAL and its associated libraries for spatial data analysis and visualization. It walks through essential techniques in sequence: loading spatial data, creating customized choropleth and interactive maps, building spatial weights to represent spatial relationships, and measuring spatial autocorrelation. Each section builds on the previous one, so the techniques can be applied together in a range of spatial contexts.


The contents are organized as follows:

- Reading Spatial Data: GeoPandas and libpysal Basics - An introduction to working with GeoDataFrames and loading spatial datasets.
- Visualizing Spatial Data: Mapping - Techniques for creating and styling various maps, including choropleths and interactive maps.
- Representing Spatial Structure: Spatial Weights - Methods to define and visualize spatial relationships, covering contiguity-based weights and graphs.
- Spatial Lag - Calculating and interpreting spatial lags to explore spatial dependence.
- Global and Local Spatial Autocorrelation - Tools for measuring spatial patterns, including Moran’s I and local indicators of spatial association (LISA).



## Reading Spatial Data: GeoPandas and libpysal

This section covers the foundational steps for handling spatial data: loading and managing datasets with GeoPandas and libpysal. GeoDataFrames let us store and manipulate spatial data efficiently, and libpysal's built-in example datasets give us real-world data to explore. By following these steps, users will learn to manage spatial datasets and prepare them for visualization and analysis.

In [None]:
import libpysal

In [None]:
libpysal.examples.available()

In [None]:
db = libpysal.examples.available()

In [None]:
db.Name.values

In [None]:
libpysal.examples.explain('chicagoSDOH')

In [None]:
SDOH = libpysal.examples.load_example('chicagoSDOH')

In [None]:
import geopandas as gpd

In [None]:
gdf = gpd.read_file(SDOH.get_path('Chi-SDOH.shp'))

In [None]:
gdf.head()

In [None]:
gdf.shape

In [None]:
gdf.info()

## Visualizing Spatial Data

Effective visualization is crucial for spatial analysis because it lets us interpret spatial patterns and trends. In this section, users will learn how to create and style several map types, from basic geographic representations to choropleth and interactive maps. Mapping is the first step in exploring spatial data and is essential for communicating analysis results to others.

### Visualizing spatial support

Here, we delve into the concept of spatial support and how to visualize it. Spatial support refers to the underlying spatial structure or boundaries of a dataset, which provides essential context for spatial relationships. Visualizing these structures enhances our understanding of how data points interact spatially, setting the stage for more advanced analyses.

In [None]:
gdf.plot()

In [None]:
gdf.crs

### Choropleth Mapping

Choropleth maps are a powerful tool for visualizing regional data, particularly when examining patterns across geographic boundaries. This section focuses on creating choropleth maps of the Years of Potential Life Lost (YPLL) rate, a public health measure of years lost due to premature death. Users will learn techniques to fine-tune these maps, making them both informative and visually appealing.

*YPLL*: Years of Potential Life Lost, measuring years lost due to premature death per 100,000 population below age 75.[^1]

[^1]: https://pmc.ncbi.nlm.nih.gov/articles/instance/6991288/bin/jamanetwopen-3-e1919928-s001.pdf
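
The maps later in this section group the rates into classes with `scheme='quantiles'`. As a rough illustration of the idea, the sketch below assigns values to `k` classes of roughly equal size by rank. The function `quantile_classes` and the sample `rates` are made up for illustration; mapclassify computes break values from the data and may treat ties differently, so this is not its implementation.

```python
def quantile_classes(values, k):
    """Assign each value a class 0..k-1 so classes hold about n/k values each."""
    n = len(values)
    # Indices of the observations, ordered from smallest to largest value
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, i in enumerate(order):
        # Integer arithmetic maps ranks 0..n-1 onto classes 0..k-1
        classes[i] = rank * k // n
    return classes


# Made-up rates for eight hypothetical areas (not taken from the dataset)
rates = [5200, 11800, 7400, 3100, 9600, 15000, 6100, 8300]
classes = quantile_classes(rates, k=4)
print(classes)
```

With eight values and four classes, each class receives two areas, which is why quantile maps tend to look balanced even when the underlying distribution is skewed.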

In [None]:
gdf.YPLL_rate.describe()

In [None]:
import seaborn as sns
sns.displot(gdf, x='YPLL_rate');



In [None]:
gdf.plot(column='YPLL_rate')

In [None]:
gdf.plot(column='YPLL_rate', legend=True)

In [None]:
gdf.plot(column='YPLL_rate', legend=True, scheme='quantiles', k=10)

In [None]:
gdf.plot(column='YPLL_rate', legend=True, scheme='quantiles', k=10,
        legend_kwds={'bbox_to_anchor': (2, 1)})

In [None]:
gdf.plot(column='YPLL_rate', legend=True, scheme='fisherjenks', k=10,
        legend_kwds={'bbox_to_anchor': (2, 1)})

#### Turning the axis off

In [None]:
ax = gdf.plot(column='YPLL_rate', legend=True, scheme='fisherjenks', k=10,
        legend_kwds={'bbox_to_anchor': (2, 1)})
ax.set_axis_off()

#### Legend title

In [None]:
ax = gdf.plot(column='YPLL_rate', legend=True, scheme='fisherjenks', k=10,
              legend_kwds={'bbox_to_anchor': (2, 1), 
                           'title': 'Years of Potential Life Lost (YPLL)'})
ax.set_axis_off()

#### Legend formatting

In [None]:
import matplotlib.ticker as mticker

ax = gdf.plot(column='YPLL_rate', legend=True, scheme='fisherjenks', k=10,
              legend_kwds={
                  'bbox_to_anchor': (2, 1),
                  'title': 'Years of Potential Life Lost (YPLL)',
                  'fmt': "{:,.0f}"  # thousands separator, no decimals
              })
ax.set_axis_off()

### Interactive Mapping
Interactive maps add a dynamic element to spatial analysis, allowing users to explore data in real time. This section introduces tools for creating interactive maps where users can zoom, pan, and interact with specific data points, making the data more accessible and engaging.

In [None]:
y_name = 'YPLL_rate'
gdf.explore(column=y_name, tooltip=[y_name],
           legend=True,
           scheme='naturalbreaks',
           k=10,
           legend_kwds=dict(colorbar=False))

#### Multiple layers

In [None]:
import geodatasets
groceries = gpd.read_file(geodatasets.get_path("geoda.groceries")).explode(ignore_index=True)

In [None]:
import folium

m = gdf.explore(
    column=y_name,  # make choropleth based on the y_name ('YPLL_rate') column
    scheme="naturalbreaks",  # use mapclassify's natural breaks scheme
    legend=True,  # show legend
    k=10,  # use 10 bins
    tooltip=False,  # hide tooltip
    popup=[y_name],  # show popup (on-click)
    legend_kwds=dict(colorbar=False),  # do not use colorbar
    name="chicago",  # name of the layer in the map
)

groceries.explore(
    m=m,  # pass the map object
    color="red",  # use red color on all points
    marker_kwds=dict(radius=5, fill=True),  # make marker radius 5px with fill
    tooltip="Address",  # show "Address" column in the tooltip
    tooltip_kwds=dict(labels=False),  # do not show column label in the tooltip
    name="groceries",  # name of the layer in the map
)

# Use folium to add an alternative tile layer and a layer control
folium.TileLayer("CartoDB positron", show=False).add_to(m)
folium.LayerControl().add_to(m)
m

## Representing Spatial Structure: Spatial Weights

Spatial weights are fundamental to spatial analysis, defining the structure and connectivity between spatial units. This section explains how to create and visualize spatial weights using different methods, such as contiguity and kernel weights. Understanding spatial weights is essential for analyzing how spatial units influence each other, paving the way for spatial dependence and autocorrelation analyses.
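
To make the two contiguity criteria used below concrete, here is a minimal pure-Python sketch on a regular grid: rook neighbors share an edge, while queen neighbors share an edge or a corner. The helper `lattice_neighbors` is hypothetical and written only for this illustration; it is not part of libpysal, which derives contiguity from the polygon geometries.

```python
def lattice_neighbors(nrows, ncols, queen=False):
    """Return {cell_id: sorted neighbor ids} for a grid with row-major ids."""
    # Rook moves: up, down, left, right (shared edges)
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        # Queen adds the four diagonal moves (shared corners)
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            ids = []
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                # Keep only moves that stay inside the grid
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    ids.append(rr * ncols + cc)
            neighbors[r * ncols + c] = sorted(ids)
    return neighbors


rook_nb = lattice_neighbors(3, 3)
queen_nb = lattice_neighbors(3, 3, queen=True)
print(rook_nb[4])   # center cell under the rook criterion
print(queen_nb[4])  # center cell under the queen criterion
```

On a 3x3 grid the center cell has four rook neighbors and eight queen neighbors, so a queen graph is never sparser than the rook graph built from the same units.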

In [None]:
wq = libpysal.weights.Queen.from_dataframe(gdf, use_index=True)

In [None]:
wq.n

In [None]:
wq.neighbors[0]

In [None]:
wq.neighbors[544]

In [None]:
from splot.libpysal import plot_spatial_weights

In [None]:
plot_spatial_weights(wq, gdf, figsize=(20,20));

In [None]:
wr = libpysal.weights.Rook.from_dataframe(gdf, use_index=True)

In [None]:
plot_spatial_weights(wr, gdf, figsize=(20,20));

In [None]:
import matplotlib.pyplot as plt
fig, axs = plt.subplots(1, 2, figsize=(20, 20))
plot_spatial_weights(wq, gdf, ax=axs[0])
axs[0].set_title('Queen Graph')
plot_spatial_weights(wr, gdf, ax=axs[1])
axs[1].set_title('Rook Graph');

In [None]:
wker10 = libpysal.weights.Kernel.from_dataframe(gdf, fixed=False, k=10)

In [None]:
wker10.weights[0]

In [None]:
wker10.histogram

### A Note on the Graph Class
This section introduces the Graph class in libpysal, which allows users to represent spatial structures as graphs. By using graphs, users can visualize the relationships between spatial units more intuitively, an important feature for spatial network analysis. This functionality enriches spatial analysis by offering a way to capture and interpret the connectivity of spatial features.

In [None]:
from libpysal import graph
gq = graph.Graph.from_W(wq)

In [None]:
gq.explore(gdf)

In [None]:
gq.explore(gdf, focal=170)

In [None]:
m = gdf.loc[gq[170].index].explore(color="#25b497")
gdf.loc[[170]].explore(m=m, color="#fa94a5")
gq.explore(gdf, m=m, focal=170)

                      

## Spatial Lag
The spatial lag of a variable at a location is a weighted combination of the variable's values at neighboring locations; with row-standardized weights it is the weighted average of the neighbors' values. It therefore summarizes the local context of each spatial unit and provides insight into spatial dependence. Here, we calculate spatial lags using different spatial weights and compare the results.
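
To make the arithmetic concrete, the following sketch computes a row-standardized spatial lag by hand for four units arranged in a line. The `neighbors` dictionary, the values in `y`, and the function `spatial_lag` are made up for illustration; they mimic, but are not, what `libpysal.weights.lag_spatial` does with a row-standardized weights object.

```python
# Four units in a line: 0 - 1 - 2 - 3 (hypothetical toy example)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
y = [10.0, 20.0, 30.0, 40.0]


def spatial_lag(neighbors, y):
    """Average of neighboring values, i.e. a row-standardized spatial lag."""
    lag = []
    for i in sorted(neighbors):
        nbrs = neighbors[i]
        if nbrs:
            # Each neighbor gets weight 1 / (number of neighbors)
            lag.append(sum(y[j] for j in nbrs) / len(nbrs))
        else:
            # A unit without neighbors (an island) gets a lag of zero here
            lag.append(0.0)
    return lag


lag = spatial_lag(neighbors, y)
print(lag)
```

Unit 1 has neighbors 0 and 2, so its lag is (10 + 30) / 2 = 20, while the end units simply take the value of their single neighbor.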

In [None]:
y_name = 'YPLL_rate'

In [None]:
wq.weights[0]

In [None]:
wq.transform='r'

In [None]:
wq.weights[0]

In [None]:
slag = libpysal.weights.lag_spatial(wq, gdf[y_name])

In [None]:
slag[:5]

In [None]:
gdf['slag'] = slag

In [None]:
gdf[[y_name, 'slag']].head()

In [None]:
wker10.transform='r'
slag_kern = libpysal.weights.lag_spatial(wker10, gdf[y_name])

In [None]:
gdf['slag_kern'] = slag_kern

In [None]:
gdf[[y_name, 'slag', 'slag_kern']].head()

In [None]:
gdf.plot.scatter(x='slag', y='slag_kern', figsize=(8,6),
                     title='Spatial Lags');

In [None]:
ax = gdf.plot.scatter(x='slag', y='slag_kern', figsize=(8, 6), title='Spatial Lags')

# Get the limits for x and y axes to ensure the line spans the plot range
xlims = ax.get_xlim()
ylims = ax.get_ylim()

# Plot the diagonal line
min_lim = min(xlims[0], ylims[0])  # Start from the minimum of both axes
max_lim = max(xlims[1], ylims[1])  # End at the maximum of both axes

ax.plot([min_lim, max_lim], [min_lim, max_lim], 'r--')  # 'r--' makes a dashed red line
ax.set_xlim(xlims)
ax.set_ylim(ylims)

plt.show()

In [None]:
import matplotlib.pyplot as plt
import geopandas as gpd
import libpysal
import esda
import numpy as np



# Define the variable of interest (change this to analyze a different column)
variable = 'YPLL_rate'

# Create a spatial weights matrix based on Queen contiguity
w = libpysal.weights.Queen.from_dataframe(gdf, use_index=True)
w.transform = 'r'  # Row-standardize the weights

# Calculate the spatial lag of the variable
gdf['lag_variable'] = libpysal.weights.lag_spatial(w, gdf[variable])

# Generate a spatially permuted version of the variable
np.random.seed(123)  # Set a seed for reproducibility
gdf['permuted_variable'] = np.random.permutation(gdf[variable])

# Calculate the spatial lag of the permuted variable
gdf['lag_permuted_variable'] = libpysal.weights.lag_spatial(w, gdf['permuted_variable'])




# Plot the maps in a 2x2 grid
fig, axs = plt.subplots(2, 2, figsize=(12, 12))

# First row: Original variable and its spatial lag
gdf.plot(column=variable, cmap='viridis', legend=True, ax=axs[0, 0])
axs[0, 0].set_title(f'Choropleth of {variable}')
axs[0, 0].set_axis_off()  # Turn off axis for this subplot

gdf.plot(column='lag_variable', cmap='viridis', legend=True, ax=axs[0, 1])
axs[0, 1].set_title(f'Spatial Lag of {variable}')
axs[0, 1].set_axis_off()  # Turn off axis for this subplot

# Second row: Permuted variable and its spatial lag
gdf.plot(column='permuted_variable', cmap='viridis', legend=True, ax=axs[1, 0])
axs[1, 0].set_title(f'Spatial Permutation of {variable}')
axs[1, 0].set_axis_off()  # Turn off axis for this subplot

gdf.plot(column='lag_permuted_variable', cmap='viridis', legend=True, ax=axs[1, 1])
axs[1, 1].set_title(f'Spatial Lag of Permuted {variable}')
axs[1, 1].set_axis_off()  # Turn off axis for this subplot


# Display the plots
plt.tight_layout()
plt.show()


In [None]:
import matplotlib.pyplot as plt
import geopandas as gpd
import libpysal
import esda
import numpy as np



# Define the variable of interest (change this to analyze a different column)
variable = 'YPLL_rate'

# Create a spatial weights matrix based on Queen contiguity
w = libpysal.weights.Queen.from_dataframe(gdf, use_index=True)
w.transform = 'r'  # Row-standardize the weights

# Calculate the spatial lag of the variable
gdf['lag_variable'] = libpysal.weights.lag_spatial(w, gdf[variable])

# Generate a spatially permuted version of the variable
np.random.seed(123)  # Set a seed for reproducibility
gdf['permuted_variable'] = np.random.permutation(gdf[variable])

# Calculate the spatial lag of the permuted variable
gdf['lag_permuted_variable'] = libpysal.weights.lag_spatial(w, gdf['permuted_variable'])

# Define a common color range for all plots
vmin = min(gdf[[variable, 'lag_variable', 'permuted_variable', 'lag_permuted_variable']].min())
vmax = max(gdf[[variable, 'lag_variable', 'permuted_variable', 'lag_permuted_variable']].max())

# Plot the maps in a 2x2 grid with the same color range
fig, axs = plt.subplots(2, 2, figsize=(12, 12))

# First row: Original variable and its spatial lag
gdf.plot(column=variable, cmap='viridis', legend=True, ax=axs[0, 0], vmin=vmin, vmax=vmax)
axs[0, 0].set_title(f'Choropleth of {variable}')
axs[0, 0].set_axis_off()  # Turn off axis for this subplot

gdf.plot(column='lag_variable', cmap='viridis', legend=True, ax=axs[0, 1], vmin=vmin, vmax=vmax)
axs[0, 1].set_title(f'Spatial Lag of {variable}')
axs[0, 1].set_axis_off()  # Turn off axis for this subplot

# Second row: Permuted variable and its spatial lag
gdf.plot(column='permuted_variable', cmap='viridis', legend=True, ax=axs[1, 0], vmin=vmin, vmax=vmax)
axs[1, 0].set_title(f'Spatial Permutation of {variable}')
axs[1, 0].set_axis_off()  # Turn off axis for this subplot

gdf.plot(column='lag_permuted_variable', cmap='viridis', legend=True, ax=axs[1, 1], vmin=vmin, vmax=vmax)
axs[1, 1].set_title(f'Spatial Lag of Permuted {variable}')
axs[1, 1].set_axis_off()  # Turn off axis for this subplot




# Display the plots
plt.tight_layout()
plt.show()


## Global Spatial Autocorrelation
Global spatial autocorrelation quantifies the degree of clustering or dispersion in a spatial dataset. This section introduces Moran’s I, a measure of spatial autocorrelation, to assess the overall spatial pattern of a variable. By analyzing global spatial autocorrelation, users gain a high-level understanding of spatial structure across the study area.
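
As a rough guide to what the statistic measures, the sketch below computes Moran's I by hand with binary weights, using I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), where z are deviations from the mean and S0 is the sum of all weights. The function `morans_i` and the toy data are assumptions made for this illustration; esda additionally provides permutation-based inference, and its value will differ when the weights are row-standardized.

```python
def morans_i(neighbors, y):
    """Moran's I with binary weights (w_ij = 1 if j is a neighbor of i)."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]  # deviations from the mean
    # S0: sum of all weights, here the total number of neighbor links
    s0 = sum(len(nbrs) for nbrs in neighbors.values())
    # Cross-products of deviations over all neighboring pairs
    cross = sum(z[i] * z[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(v * v for v in z)


# Four units in a line: 0 - 1 - 2 - 3 (hypothetical toy example)
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
i_trend = morans_i(chain, [1.0, 2.0, 3.0, 4.0])      # similar values adjacent
i_checker = morans_i(chain, [1.0, 4.0, 1.0, 4.0])    # dissimilar values adjacent
print(i_trend, i_checker)
```

The smooth trend gives a positive value and the alternating pattern a negative one, which is the contrast between clustering and dispersion described above.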

In [None]:
import esda

In [None]:
moran_res = esda.moran.Moran(gdf[y_name], wq)

In [None]:
moran_res.I, moran_res.p_sim

In [None]:
moran_res = esda.moran.Moran(gdf[y_name], wq, permutations=9999)
moran_res.I, moran_res.p_sim

In [None]:
from splot.esda import plot_moran

plot_moran(moran_res, zstandard=True);

In [None]:
from splot.esda import plot_moran

# Moran's I for the spatially permuted values, as a reference without spatial structure
moran_res_r = esda.moran.Moran(gdf['permuted_variable'], wq)
plot_moran(moran_res_r, zstandard=True);

In [None]:
moran_res_r.I, moran_res_r.p_sim

In [None]:
import splot.esda

In [None]:
from splot.esda import moran_scatterplot
from esda.moran import Moran_Local
import matplotlib.pyplot as plt

# calculate Moran_Local and plot
moran_loc = Moran_Local(gdf[y_name], wq)
fig, ax = moran_scatterplot(moran_loc, zstandard=True)
ax.set_xlabel(f"{y_name}")
ax.set_ylabel(f"Spatial Lag of {y_name}")
plt.show()

## Local Spatial Autocorrelation
While global measures offer an overview, local spatial autocorrelation techniques reveal spatial patterns at a finer scale. This section demonstrates how to use Local Indicators of Spatial Association (LISA) to identify hotspots, clusters, and outliers within the dataset. This localized approach is particularly valuable for identifying specific areas of interest within a larger spatial context.
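
The cluster maps below rely on the quadrants of the Moran scatterplot. As a simplified sketch, the code here assigns each unit to a quadrant from the sign of its deviation from the mean and the sign of its spatial lag, using the coding 1 = high-high, 2 = low-high, 3 = low-low, 4 = high-low that esda reports in `Moran_Local.q`. The function `moran_quadrants` and the toy data are made up for illustration, and the sketch omits the permutation-based significance filter that the `p=0.05` argument applies.

```python
def moran_quadrants(neighbors, y):
    """Quadrant per unit: 1 = HH, 2 = LH, 3 = LL, 4 = HL."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]  # deviations from the mean
    quadrants = []
    for i in range(n):
        nbrs = neighbors[i]
        # Row-standardized lag of the deviations (zero for islands)
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        high, high_lag = z[i] > 0, lag > 0
        if high and high_lag:
            quadrants.append(1)  # high value, high neighbors
        elif not high and high_lag:
            quadrants.append(2)  # low value, high neighbors
        elif not high and not high_lag:
            quadrants.append(3)  # low value, low neighbors
        else:
            quadrants.append(4)  # high value, low neighbors
    return quadrants


# Six units in a line with made-up values
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
values = [9.0, 8.0, 7.0, 1.0, 2.0, 10.0]
quads = moran_quadrants(chain, values)
print(quads)
```

Units 0 and 1 form a high-high group, units 3 and 4 a low-low group, and units 2 and 5 are high values next to low neighbors, which is the kind of spatial outlier a LISA map highlights.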

In [None]:
from splot.esda import moran_scatterplot
from esda.moran import Moran_Local
import matplotlib.pyplot as plt

# calculate Moran_Local and plot
moran_loc = Moran_Local(gdf[y_name], wq)
fig, ax = moran_scatterplot(moran_loc, zstandard=True, p=0.05)
ax.set_xlabel(f"{y_name}")
ax.set_ylabel(f"Spatial Lag of {y_name}")
plt.show()

In [None]:
from splot.esda import lisa_cluster

lisa_cluster(moran_loc, gdf, p=0.05, figsize = (9,9))
plt.show()



In [None]:
from splot.esda import plot_local_autocorrelation
plot_local_autocorrelation(moran_loc, gdf, y_name, figsize=(9, 9))
plt.show()

In [None]:
from splot.esda import plot_local_autocorrelation
plot_local_autocorrelation(moran_loc, gdf, y_name, figsize=(9,9), quadrant=1)
plt.show()

In [None]:
from splot.esda import plot_local_autocorrelation
plot_local_autocorrelation(moran_loc, gdf, y_name, figsize=(9,9), quadrant=3)
plt.show()