# Modeling Neighborhood Dynamics with `geosnap`

The geosnap package is designed for geodemographic analysis and regionalization applied to longitudinal data. Following those analyses, it also provides tools for modeling neighborhood composition into the future using spatial and temporal transition rules learned from the past.

In [None]:
%load_ext watermark
%watermark -v -a "author: eli knaap" -d -u -p segregation,libpysal,geopandas,geosnap

In [None]:
from geosnap import DataStore
from geosnap.io import get_acs
from geosnap.analyze import cluster, regionalize

In [None]:
from geosnap.visualize import plot_timeseries, animate_timeseries

In [None]:
import geopandas as gpd

## Examining Data

In [None]:
store = DataStore()

The `DataStore` class provides access to hundreds of neighborhood indicators for the U.S. collected from federal agencies. These datasets are stored in the cloud and streamed on demand, but if you plan on doing repeated analyses you can store the data locally (which we've already done on the JupyterHub).

In [None]:
dir(store)

Each dataset in the datastore covers the entire country for a single time period. To generate a dataset for a single place, geosnap provides several convenience functions, such as `get_acs`:

In [None]:
chicago = get_acs(store, county_fips='17031', level='tract', years=list(range(2013, 2017)))  # without specifying a subset of years, we get everything

In [None]:
chicago

(If several people hit the server at once, things can slow down. In that case, uncomment the next cell to load a local copy of the data instead.)

In [None]:
# chicago = gpd.read_parquet("data/chicago_acs.parquet")

In [None]:
chicago.info()

In [None]:
chicago.head()

There are also convenient plotting methods for looking at change over time. A useful feature here is that the choropleth bins are the same for each time period, making it easy to see change over time

In [None]:
plot_timeseries(chicago, "median_home_value", scheme='quantiles', k=7, nrows=2, ncols=2, cmap='YlOrBr')
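To see why shared bins matter, here is a minimal sketch of the idea in plain Python. It is not how `plot_timeseries` is implemented; the toy values and the `classify` helper are made up for illustration. The break points are computed once from the values pooled across all years, so a given class means the same thing in every period.

```python
from bisect import bisect_left
from statistics import quantiles

# Hypothetical toy data: median home values for the same four tracts in two years
values_by_year = {
    2013: [100, 150, 200, 250],
    2016: [120, 180, 260, 400],
}

# Pool every year's values and compute a single set of quartile break points
pooled = [v for vals in values_by_year.values() for v in vals]
breaks = quantiles(pooled, n=4, method="inclusive")  # three cut points


def classify(value, breaks):
    """Return the index of the bin that `value` falls into."""
    return bisect_left(breaks, value)


# Apply the same break points to each year
binned = {
    year: [classify(v, breaks) for v in vals]
    for year, vals in values_by_year.items()
}
print(breaks)
print(binned)
```

If the breaks were instead recomputed per year, every map would show the same share of tracts in each class, and a region-wide rise in values would be invisible.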

Still, it can be difficult to see minute changes across the various maps. The `animate_timeseries` function can make it easier to see what's happening, like the steady decline in home values in Midlothian near the southern edge of the region.

In [None]:
animate_timeseries(chicago, 'median_home_value', scheme='quantiles', k=7, cmap='YlOrBr', filename='figs/chicago_income_change.gif', fps=1.5)

In [None]:
from IPython.display import Image

In [None]:
Image("figs/chicago_income_change.gif", width=800)

Note here that we're comparing overlapping samples from the ACS 5-year survey, which the Census Bureau recommends against. Here it just makes a good example :)

## Modeling Neighborhood Types

With `geosnap`, it's possible to look at *temporal* geodemographics without writing much code. Under the hood, the package provides tools for scaling each dataset within its own time period, adjusting currency values for inflation, and ensuring that times, variables, and geometries stay aligned properly. Together those tools make it easy to explore how different portions of the region transition into different neighborhood types over time, and if desired, model the evolution of neighborhood change as a spatial Markov process.
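As a rough illustration of what "scaling each dataset within its own time period" means, the sketch below standardizes a variable separately for each year. It assumes z-score scaling and uses made-up records; the scaler geosnap actually applies depends on the arguments passed to `cluster`.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical long-form records: (tract id, year, median household income)
records = [
    ("a", 2013, 40_000), ("b", 2013, 60_000), ("c", 2013, 80_000),
    ("a", 2016, 50_000), ("b", 2016, 70_000), ("c", 2016, 90_000),
]

# Group the values by year so each period gets its own mean and spread
by_year = defaultdict(list)
for _, year, value in records:
    by_year[year].append(value)

stats = {year: (mean(vals), pstdev(vals)) for year, vals in by_year.items()}

# z-score each observation against the statistics of its own year
zscores = {
    (tract, year): (value - stats[year][0]) / stats[year][1]
    for tract, year, value in records
}
print(zscores)
```

In this toy example every tract's income rises by the same amount, so each tract keeps the same standardized score in both years: the scaling removes the region-wide shift and leaves only each tract's position relative to the others.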

Any variables could be used to examine neighborhood transitions, but we'll return to the simple set of sociodemographic variables used before to understand if and how patterns of racial and socioeconomic segregation and neighborhood partitioning unfold over time.

In [None]:
columns = ['median_household_income', 'median_home_value', 'p_asian_persons', 'p_hispanic_persons', 'p_nonhisp_black_persons', 'p_nonhisp_white_persons']

In [None]:
cluster?

In [None]:
chicago_ward = cluster(chicago, columns=columns, method='ward', n_clusters=6)

The simplest version of the function returns the geodataframe with new cluster labels appended

In [None]:
chicago_ward.head()

In [None]:
plot_timeseries(chicago_ward, 'ward', categorical=True, nrows=2, ncols=2)

In [None]:
animate_timeseries(chicago_ward, 'ward', categorical=True, filename='figs/chicago_type_change.gif', fps=1.5)

The vast majority of tracts are assigned to the same geodemographic type in each time period, but some transition into different types over time. The ones that *do* transition tend to be those on the edges of large contiguous groups (i.e. change tends to happen along the periphery and move inward, implying a certain kind of spatial dynamic)

In [None]:
Image('figs/chicago_type_change.gif', width=800)

If we add the argument `return_model=True`, the function returns the same geodataframe as before along with a `ModelResults` object that holds additional diagnostic measures, plus plotting and simulation methods.

In [None]:
chicago_ward, chi_model = cluster(chicago, columns=columns, method='ward', n_clusters=6, return_model=True)

In [None]:
chi_model?

In [None]:
type(chi_model)

For example, the `silhouette_scores` attribute makes computing a silhouette coefficient for the cluster model a one-liner:

In [None]:
chi_model.silhouette_scores

Each observation is given its own silhouette score to identify potential spatial outliers, or the measures can be summarized to provide an aggregate statistic

In [None]:
chi_model.silhouette_scores.silhouette_score.mean()
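For reference, the silhouette coefficient of an observation is `(b - a) / max(a, b)`, where `a` is its mean distance to the other members of its own cluster and `b` is its mean distance to the members of the nearest other cluster. The sketch below computes it by hand on made-up one-dimensional data. It illustrates the definition only, not the code geosnap runs, and it assumes every cluster has at least two members.

```python
from statistics import mean

# Hypothetical 1-D toy data with two well-separated clusters
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]


def silhouette(i):
    """Silhouette coefficient for observation i using absolute distance."""
    # a: mean distance to the other members of i's own cluster
    a = mean([
        abs(points[i] - points[j])
        for j in range(len(points))
        if labels[j] == labels[i] and j != i
    ])
    # b: mean distance to the members of the closest other cluster
    b = min(
        mean([
            abs(points[i] - points[j])
            for j in range(len(points))
            if labels[j] == other
        ])
        for other in set(labels)
        if other != labels[i]
    )
    return (b - a) / max(a, b)


scores = [silhouette(i) for i in range(len(points))]
print(scores)
print(mean(scores))
```

Scores near 1 indicate observations that sit comfortably inside their cluster; scores near 0 or below flag observations that are about as close to another cluster as to their own.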

Since the data are indexed by time, we can also examine whether certain time periods have a poorer fit versus others:

In [None]:
chi_model.silhouette_scores.groupby('year').silhouette_score.mean()

## Analyzing Neighborhood Change

With the cluster model in hand, each census tract is represented as a series of neighborhood types over time (i.e. what we plotted above). To understand which neighborhoods have experienced the most change, the `ModelResults` class implements a method called "LINCS", the Local Indicator of Neighborhood Change. The `lincs` attribute measures how often a given spatial unit shares its cluster assignment with the other units over time.

If a "neighborhood" is grouped with many different neighborhoods over time (rather than joining a single group with the same members repeatedly), then it shows more variation and thus a higher LINC score
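One way to make that description concrete is a Jaccard-style calculation: collect the set of units that share a tract's cluster in each period, then compare the units it is *always* grouped with against the units it is *ever* grouped with. The sketch below follows that reading on made-up labels. It is an assumption about the logic, not a copy of geosnap's implementation, and the `co_members` and `linc` helpers are hypothetical.

```python
# Hypothetical cluster labels for five tracts in two time periods
labels_by_period = [
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
    {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1},
]


def co_members(labels, unit):
    """Other units that share `unit`'s cluster label in one period."""
    return {u for u, lab in labels.items() if lab == labels[unit] and u != unit}


def linc(unit):
    """1 minus the share of co-members that persist across every period."""
    sets = [co_members(labels, unit) for labels in labels_by_period]
    shared = set.intersection(*sets)  # grouped with `unit` in every period
    ever = set.union(*sets)           # grouped with `unit` in any period
    return 1 - len(shared) / len(ever) if ever else 0.0


scores = {unit: linc(unit) for unit in labels_by_period[0]}
print(scores)
```

Here tract `c` switches clusters and shares no co-members across the two periods, so it receives the maximum score, while the tracts that merely gain or lose `c` as a co-member score lower.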

In [None]:
chi_lincs = chi_model.lincs

In [None]:
chi_lincs

In [None]:
chi_lincs.plot('linc',legend=True, cmap='plasma')

Yellow places have changed the most in our cluster model, and blue places have remained the most stagnant. We can use the LISA statistics from `esda` to locate hotspots of change or stagnation

In [None]:
chi_lincs.linc.plot(kind='density')

In [None]:
from esda import Moran_Local

In [None]:
from libpysal.weights import Queen

In [None]:
w = Queen.from_dataframe(chi_model.lincs)

In [None]:
linc_lisa = Moran_Local(chi_lincs.linc, w)

Recall that the LISA statistic measures the association between a focal observation and its neighbors. When we have spatial units (i.e. tracts) with a high LINC score, and their neighboring tracts *also* have high LINC scores, then we've found a local pocket of neighborhood change.
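The textbook form of the local Moran statistic is the product of a unit's deviation from the mean and the average deviation of its neighbors, scaled by the variance. The sketch below computes it by hand for made-up scores on a line of five tracts, assuming row-standardized weights. The scaling constant used by `esda` may differ from this version, so treat the magnitudes as illustrative.

```python
# Hypothetical LINC scores for five tracts arranged along a line
y = [0.9, 0.8, 0.5, 0.2, 0.1]
# Hypothetical contiguity: each tract neighbors the tracts next to it
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

n = len(y)
y_bar = sum(y) / n
z = [v - y_bar for v in y]        # deviations from the mean
m2 = sum(v * v for v in z) / n    # variance of the deviations


def spatial_lag(i):
    """Row-standardized lag: the mean deviation among i's neighbors."""
    return sum(z[j] for j in neighbors[i]) / len(neighbors[i])


# Positive values: a unit and its neighbors deviate in the same direction
local_i = [z[i] * spatial_lag(i) / m2 for i in range(n)]
print(local_i)
```

Both ends of the line produce positive values: one end is a high-high pocket and the other a low-low pocket. The sign alone does not distinguish them, which is why the quadrant plots further down are needed.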

In [None]:
linc_lisa.Is

In [None]:
chi_lincs.assign(i=linc_lisa.Is).plot('i', legend=True)

In [None]:
from splot.esda import plot_local_autocorrelation, lisa_cluster

In [None]:
plot_local_autocorrelation(linc_lisa, chi_lincs.to_crs(3857), 'linc')

In [None]:
import contextily as ctx

In [None]:
fig, ax = lisa_cluster(linc_lisa, chi_lincs.to_crs(3857), alpha=0.6, figsize=(8,10))
ctx.add_basemap(ax=ax, source=ctx.providers.CartoDB.Positron, zoom=11)
fig.tight_layout()

Red areas (high-high clusters of LINC scores) are places undergoing change, whereas blue areas (low LINC scores surrounded by low scores) are those that have changed very little over time. Orange places are particularly interesting, as they represent local pockets of change surrounded by larger pockets of stagnation.

Substantively, this example shows that Chicago's famously segregated South Side and West Side form large regions of the city that demonstrate little demographic/socioeconomic change, particularly in neighborhoods like Rosewood and West Garfield. By contrast, places like Bridgeport and Portage Park have witnessed substantial change over the last decade according to this model.

## Modeling Neighborhood Transitions

We can also use the sequence of labels to create a spatial Markov transition model. These models examine how often one neighborhood type transitions into another, and then how those transition rates change under different conditions of spatial context.

In [None]:
from geosnap.visualize import plot_transition_matrix

In [None]:
plot_transition_matrix(chicago_ward, cluster_col='ward')
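The quantity behind each panel is a row-normalized table of transition counts. The sketch below estimates the plain (aspatial) version from made-up label sequences. A spatial Markov model repeats the same counting separately for each class of neighborhood context, which this sketch leaves out.

```python
from collections import Counter

# Hypothetical label sequences, one per tract, ordered by time
sequences = {
    "a": [0, 0, 1],
    "b": [0, 1, 1],
    "c": [1, 1, 1],
    "d": [0, 0, 0],
}

states = sorted({s for seq in sequences.values() for s in seq})

# Count every (origin, destination) pair between consecutive periods
counts = Counter()
for seq in sequences.values():
    counts.update(zip(seq, seq[1:]))

# Row-normalize so P[i][j] estimates the probability of moving from i to j
P = []
for i in states:
    row_total = sum(counts[(i, j)] for j in states)
    P.append([counts[(i, j)] / row_total if row_total else 0.0 for j in states])

print(P)
```

Each row sums to one, and a heavy diagonal corresponds to the earlier observation that most tracts keep the same type from one period to the next.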

And we can use those transition rates to make predictions about future conditions

In [None]:
future = chi_model.predict_markov_labels(time_steps=10, increment=1)
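Conceptually, predicting forward means repeatedly drawing each unit's next label from the row of the transition matrix that matches its current label. The sketch below does that with a made-up matrix and ignores spatial context entirely, so it is a simplification of the idea rather than a description of what `predict_markov_labels` does internally. The `simulate` function and its `seed` parameter are hypothetical.

```python
import random

# Hypothetical transition matrix: P[i][j] is the probability of moving from type i to type j
P = [[0.6, 0.4], [0.0, 1.0]]
# Hypothetical current labels for three tracts
current = {"a": 0, "b": 0, "c": 1}


def simulate(current, P, time_steps, seed=0):
    """Draw a sequence of future labels for every unit, one step at a time."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    states = list(range(len(P)))
    history = [dict(current)]
    for _ in range(time_steps):
        previous = history[-1]
        history.append({
            unit: rng.choices(states, weights=P[label])[0]
            for unit, label in previous.items()
        })
    return history


history = simulate(current, P, time_steps=10)
for step, labels in enumerate(history):
    print(step, labels)
```

Because the draws are random, a single run is one possible future rather than a forecast. Repeating the simulation with different seeds gives a sense of how much the outcomes vary.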

In [None]:
animate_timeseries(future, 'predicted', categorical=True, filename='figs/chicago_predictions.gif', fps=1.5)

In [None]:
Image('figs/chicago_predictions.gif', width=800)

From a social equity perspective, these predictions can help direct investments to the places likely to provide the greatest return, such as providing place-based affordable housing in high-opportunity places with a low likelihood of change, or providing displacement protections in places that show large potential for change.