# Segregation Analysis with PySAL

In [None]:
%load_ext watermark
%load_ext autoreload
%autoreload 2

In [None]:
%watermark -v -a "author: eli knaap" -d -u -p segregation,libpysal,geopandas

Here, we'll use PySAL's `segregation` module to analyze racial segregation in Southern California.

In [None]:
import geopandas as gpd

## Data Prep

In [None]:
scag = gpd.read_parquet("data/scag_region.parquet")

We need to reproject the data into a more appropriate coordinate system. We can estimate the appropriate UTM zone using the `estimate_utm_crs` method on the GeoDataFrame.
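
For intuition, the arithmetic behind a UTM zone can be sketched in a few lines. This is a simplified illustration, not how `estimate_utm_crs` works internally: the helper name `utm_epsg` and the sample coordinates (roughly downtown Los Angeles) are assumptions for the example, and the formula ignores the special-case zones around Norway and Svalbard.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone containing a lon/lat point.

    Hypothetical helper for illustration; ignores the Norway/Svalbard exceptions.
    """
    zone = int(math.floor((lon + 180) / 6)) + 1  # zones are 6 degrees wide, numbered from 1
    base = 32600 if lat >= 0 else 32700  # northern vs. southern hemisphere code ranges
    return base + zone


# Approximate coordinates for downtown Los Angeles (assumed for the example)
print(utm_epsg(-118.24, 34.05))
```

For a point near Los Angeles this lands in UTM zone 11N (EPSG:32611), which is the kind of projected, meter-based system the distance-based measures later in the notebook rely on.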

In [None]:
scag = scag.to_crs(scag.estimate_utm_crs())

In [None]:
scag.crs

In [None]:
scag.dropna(subset=['p_hispanic_persons']).explore(column='p_hispanic_persons',
                                                scheme='quantiles', 
                                                cmap='Blues',
                                                k=8,
                                                tooltip=['p_hispanic_persons'], 
                                                style_kwds={'stroke':False})

Some background on [FIPS codes](https://www.policymap.com/2012/08/tips-on-fips-a-quick-guide-to-geographic-place-codes-part-iii/): the first five characters of each `geoid` identify the state and county.

In [None]:
scag['county'] = scag.geoid.str[:5]
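
As a quick illustration of why that slice works: a census tract GEOID is an 11-character string made of a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The GEOID below is a made-up value in that layout, not one taken from the dataset.

```python
# Hypothetical tract GEOID in the state(2) + county(3) + tract(6) layout
geoid = "06037123456"

state_fips = geoid[:2]    # "06" is California
county_fips = geoid[:5]   # state + county, the same slice used for scag['county']
tract_code = geoid[5:]    # the remaining six digits identify the tract within the county

print(state_fips, county_fips, tract_code)
```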

In [None]:
scag.county.unique()

In [None]:
# names must be listed in the same order as the codes returned by scag.county.unique() above
county_names = ["Los Angeles", "Imperial", "Orange", "San Bernardino", "San Diego", "Riverside", "Ventura"]

In [None]:
namer = dict(zip(scag.county.unique(), county_names))

In [None]:
namer

Now that we know which county is which, we could just use these codes to divide the region into pieces. But let's go ahead and replace the codes with their names. It's more to type, but if we want to subset later, we won't have to look up the codes again.

In [None]:
scag['county'] = scag.county.replace(to_replace=namer)

In [None]:
scag.county

In [None]:
coastal = scag[scag.county.isin(["Los Angeles", "Orange", "San Diego", "Ventura"])]

In [None]:
inland = scag[scag.county.isin(["Riverside", "San Bernardino", "Imperial"])]

In [None]:
coastal.plot(column='county')

In [None]:
inland.plot(column='county')

In [None]:
pop_groups = ['n_asian_persons', 'n_hispanic_persons', 'n_nonhisp_black_persons', 'n_nonhisp_white_persons']

In [None]:
scag[pop_groups] = scag[pop_groups].astype(int)
scag['n_total_pop'] = scag['n_total_pop'].astype(int)

## Calculating Segregation Measures

The `segregation` package calculates dozens of segregation indices, each of which captures something different about the ways that population groups interact or remain separated in space. Most of the commonly used statistics are global or aggregate measures, meaning they summarize the total level of segregation across all units in a study region.

### Classic (aspatial) Single-Group Indices

Single-group indices measure the partitioning of one population group relative to everyone else. 
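
To make that concrete, here is a hand-rolled sketch of the textbook dissimilarity index, `D = 0.5 * sum(|x_i / X - y_i / Y|)`, where `x_i` is the group's count in unit `i`, `y_i` is everyone else's count, and `X` and `Y` are the regional totals. The function name and the toy counts are made up for illustration; for real analysis the package's own classes are the right tool.

```python
def dissimilarity(group, total):
    """Dissimilarity index for one group versus everyone else.

    group[i] and total[i] are the group count and total population of unit i.
    """
    other = [t - g for g, t in zip(group, total)]
    group_sum, other_sum = sum(group), sum(other)
    return 0.5 * sum(
        abs(g / group_sum - o / other_sum) for g, o in zip(group, other)
    )


# Toy region with four equally sized units (made-up counts)
total = [100, 100, 100, 100]
print(dissimilarity([25, 25, 25, 25], total))  # group spread evenly across units
print(dissimilarity([80, 20, 0, 0], total))    # group concentrated in two units
print(dissimilarity([100, 0, 0, 0], total))    # group fills one unit and is absent elsewhere
```

The index is 0 when every unit mirrors the regional composition and approaches 1 as the group becomes completely separated from everyone else.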

In [None]:
from segregation.singlegroup import Dissim, Gini, Entropy

In [None]:
dissim_hisp = Dissim(scag, "n_hispanic_persons", "n_total_pop")
dissim_black = Dissim(scag, "n_nonhisp_black_persons", "n_total_pop")

gini_hisp = Gini(scag, "n_hispanic_persons", "n_total_pop")
gini_black = Gini(scag, "n_nonhisp_black_persons", "n_total_pop")

entropy_hisp = Entropy(scag, "n_hispanic_persons", "n_total_pop")
entropy_black = Entropy(scag, "n_nonhisp_black_persons", "n_total_pop")

Each index object has a `statistic` attribute that holds the computed value of its segregation measure.

In [None]:
dissim_hisp.statistic

In [None]:
dissim_black.statistic

In [None]:
gini_hisp.statistic

In [None]:
gini_black.statistic

In [None]:
entropy_hisp.statistic

In [None]:
entropy_black.statistic

According to the Dissimilarity and Gini indices, the Black population in Southern California is more segregated than the Latinx/Hispanic population, but the reverse is true according to the Entropy index.
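
Rankings can differ like this because the indices are built on different formulas. As a point of comparison, here is a minimal sketch of the entropy-based index as it is commonly defined in the segregation literature: `H = 1 - sum(t_i * E_i) / (T * E)`, where `E` is the entropy of the group's regional share and `E_i` is the entropy of its share in unit `i`. The function names and toy counts are made up, and the sketch is assumed (not verified here) to correspond to what the `Entropy` class computes.

```python
import math


def binary_entropy(p):
    """Entropy of a two-group split with share p (0 * log(0) is treated as 0)."""
    if p <= 0 or p >= 1:
        return 0.0
    return p * math.log(1 / p) + (1 - p) * math.log(1 / (1 - p))


def entropy_index(group, total):
    """Population-weighted shortfall of unit entropy relative to regional entropy."""
    region_entropy = binary_entropy(sum(group) / sum(total))
    weighted_unit_entropy = sum(
        t * binary_entropy(g / t) for g, t in zip(group, total)
    )
    return 1 - weighted_unit_entropy / (sum(total) * region_entropy)


total = [100, 100, 100, 100]
print(entropy_index([25, 25, 25, 25], total))  # every unit mirrors the region
print(entropy_index([80, 20, 0, 0], total))    # group concentrated in two units
print(entropy_index([100, 0, 0, 0], total))    # group confined to a single unit
```

Unlike the dissimilarity index, which sums absolute differences in shares, this measure depends on how mixed each unit is internally, so the two need not rank groups the same way.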

#### Batch Computation

To examine several indices at once, `segregation` provides a set of "batch_compute" functions. 

In [None]:
from segregation.batch import batch_compute_singlegroup

In [None]:
scag_all_singlegroup = batch_compute_singlegroup(scag.dropna(subset=['n_hispanic_persons']), "n_hispanic_persons", "n_total_pop")

In [None]:
from segregation.singlegroup import BoundarySpatialDissim

In [None]:
BoundarySpatialDissim(scag, "n_hispanic_persons", "n_total_pop").statistic

In [None]:
scag_all_singlegroup

### Multigroup Indices

Multigroup measures capture the partitioning of several groups simultaneously.
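
To see what "simultaneously" means in practice, the sketch below computes a multigroup information theory index by hand: regional diversity is measured with Theil's entropy over all group shares, and the index is the population-weighted share of that diversity that is missing inside the individual units. It is a toy illustration with made-up counts and hypothetical function names, not the package's implementation.

```python
import math


def theil_entropy(counts):
    """Theil's entropy of a list of group counts (empty groups contribute 0)."""
    n = sum(counts)
    return sum((c / n) * math.log(n / c) for c in counts if c > 0)


def multi_info_theory(units):
    """Multigroup information theory index for a list of per-unit group counts."""
    region_counts = [sum(col) for col in zip(*units)]
    region_entropy = theil_entropy(region_counts)
    region_pop = sum(region_counts)
    within = sum(sum(u) * theil_entropy(u) for u in units)
    return 1 - within / (region_pop * region_entropy)


# Each row is one unit; columns are four population groups (made-up counts)
mixed = [[25, 25, 25, 25], [25, 25, 25, 25]]
paired = [[50, 50, 0, 0], [0, 0, 50, 50]]
separate = [[100, 0, 0, 0], [0, 100, 0, 0], [0, 0, 100, 0], [0, 0, 0, 100]]

for name, units in [("mixed", mixed), ("paired", paired), ("separate", separate)]:
    print(name, round(multi_info_theory(units), 3))
```

The fully mixed region scores 0, the region where every unit holds a single group scores 1, and the region where groups share units in pairs falls halfway between.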

In [None]:
from segregation.multigroup import MultiInfoTheory, MultiGini, MultiDiversity

In [None]:
multi_div_coast = MultiDiversity(coastal, pop_groups)
multi_div_inland = MultiDiversity(inland, pop_groups)


multi_info_coast = MultiInfoTheory(coastal, pop_groups)
multi_info_inland = MultiInfoTheory(inland, pop_groups)

For multigroup diversity:

In [None]:
print(f"coast: {multi_div_coast.statistic}")
print(f"inland: {multi_div_inland.statistic}")

For multigroup information theory:

In [None]:
print(f"coast: {multi_info_coast.statistic}")
print(f"inland: {multi_info_inland.statistic}")

Regardless of which index is used, multigroup segregation is higher in the coastal region than in the inland one.

#### Batch Computation

Again, the measures can be "batch computed".

In [None]:
from segregation.batch import batch_compute_multigroup

In [None]:
scag_all_multigroup = batch_compute_multigroup(scag, groups=pop_groups)

In [None]:
scag_all_multigroup

### Spatial Segregation Indices

Every index in the `segregation` package can leverage spatial relationships in its computation. Some segregation indices have a spatially explicit formulation, e.g. the [spatial dissimilarity index](https://journals.sagepub.com/doi/abs/10.1080/00420989320080551?). Others can be generalized into spatial versions using the logic of [Reardon et al.](https://link.springer.com/article/10.1353/dem.0.0019), in which case we adopt the notion of ['egohoods'](https://escholarship.org/uc/item/71m5522z).
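
The egohood idea can be sketched without any spatial libraries: replace each unit's counts with the counts summed over the unit and its neighbors, then compute the usual aspatial index on those smoothed counts. The adjacency dictionary, counts, and function names below are made up, and this is a simplified reading of the approach rather than the package's implementation.

```python
def dissimilarity(group, total):
    """Aspatial dissimilarity index for one group versus everyone else."""
    other = [t - g for g, t in zip(group, total)]
    group_sum, other_sum = sum(group), sum(other)
    return 0.5 * sum(
        abs(g / group_sum - o / other_sum) for g, o in zip(group, other)
    )


def egohood_sum(values, neighbors):
    """Sum each unit's value with the values of its neighbors (its 'egohood')."""
    return [values[i] + sum(values[j] for j in neighbors[i]) for i in range(len(values))]


# Four units in a row; each unit neighbors the ones next to it (toy adjacency)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
group = [100, 100, 0, 0]
total = [100, 100, 100, 100]

aspatial = dissimilarity(group, total)
spatial = dissimilarity(egohood_sum(group, neighbors), egohood_sum(total, neighbors))
print(round(aspatial, 3), round(spatial, 3))
```

Because units near the boundary between the two halves share neighbors, the smoothed index comes out lower than the aspatial one in this toy example.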

In [None]:
from libpysal import weights

In [None]:
from segregation.singlegroup import SpatialDissim

In [None]:
w_queen = weights.Queen.from_dataframe(scag)
w_dist = weights.DistanceBand.from_dataframe(scag, 2500)

#### Single Group

In [None]:
# aspatial (the plain Dissim class, with no weights or distance supplied)
dissim = Dissim(scag, 'n_hispanic_persons', 'n_total_pop')

In [None]:
dissim.statistic

In [None]:
# spatially-explicit index (using queen neighborhoods)

dissim_queen = SpatialDissim(scag, 'n_hispanic_persons', 'n_total_pop', w=w_queen)

In [None]:
dissim_queen.statistic

In [None]:
# spatially-explicit index (using distance-based neighborhoods of 2500m)
# everyone inside the distance-band has the same interaction potential

dissim_dist = SpatialDissim(scag, 'n_hispanic_persons', 'n_total_pop', w=w_dist)

In [None]:
dissim_dist.statistic

In [None]:
# spatially-implicit Dissimilarity index
# the interaction potential among people inside the distance-band is weighted by proximity

dissim_implicit_linear = Dissim(scag, 'n_hispanic_persons', 'n_total_pop', distance=2500)

In [None]:
dissim = Dissim(scag, 'n_hispanic_persons', 'n_total_pop')

In [None]:
dissim.statistic

In [None]:
dissim_implicit_linear.statistic

In [None]:
dissim_implicit_gaussian = Dissim(scag, 'n_hispanic_persons', 'n_total_pop', distance=2500, function='gaussian')

In [None]:
dissim_implicit_gaussian.statistic
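
The difference between these variants comes down to how much weight a neighbor at a given distance receives. The sketch below compares three weighting shapes for the 2500m bandwidth used above: a flat distance band, a linear (triangular) decay, and a Gaussian-shaped decay. The function names are hypothetical, the Gaussian is left unnormalized, and the exact scaling used by the package's kernels may differ.

```python
import math


def uniform_weight(d, bandwidth):
    """Distance band: full weight inside the band, none outside."""
    return 1.0 if d <= bandwidth else 0.0


def triangular_weight(d, bandwidth):
    """Linear decay from 1 at distance 0 to 0 at the bandwidth."""
    return max(0.0, 1.0 - d / bandwidth)


def gaussian_weight(d, bandwidth):
    """Unnormalized Gaussian-shaped decay, cut off at the bandwidth."""
    z = d / bandwidth
    return math.exp(-0.5 * z**2) if z <= 1 else 0.0


bandwidth = 2500
for d in [0, 625, 1250, 2500, 3000]:
    print(
        f"{d:>5} m",
        uniform_weight(d, bandwidth),
        round(triangular_weight(d, bandwidth), 3),
        round(gaussian_weight(d, bandwidth), 3),
    )
```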

In [None]:
# spatially-implicit Dissimilarity index
dissim_implicit = Dissim(scag, 'n_hispanic_persons', 'n_total_pop', distance=3000)

In [None]:
dissim_implicit.statistic

#### Multi Group

In [None]:
spatial_info_queen = MultiInfoTheory(scag, pop_groups, w=w_queen)
spatial_info_dist = MultiInfoTheory(scag, pop_groups, w=w_dist)

In [None]:
info_spatial = MultiInfoTheory(scag, groups=pop_groups, distance=2000)

In [None]:
spatial_info_queen.statistic

In [None]:
spatial_info_dist.statistic

In [None]:
info_spatial.statistic

## Spatial Segregation Dynamics

### Multiscalar Profile

The multiscalar segregation profile is a way of measuring how global versus local the segregation patterns are in a region. As stylized examples, consider a city where one population group lives on the eastern half and another group lives on the western half (large-scale/macro segregation) versus a city full of dense apartment buildings, where each building is occupied exclusively by members of a single population group (small-scale/micro segregation).
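
The two stylized cities can be simulated with a toy example: eight units along a line, with the index recomputed as each unit's neighborhood widens. Everything here (the positions, counts, and function names) is made up for illustration, and the dissimilarity index stands in for whichever index is being profiled.

```python
def dissimilarity(group, total):
    """Aspatial dissimilarity index for one group versus everyone else."""
    other = [t - g for g, t in zip(group, total)]
    group_sum, other_sum = sum(group), sum(other)
    return 0.5 * sum(
        abs(g / group_sum - o / other_sum) for g, o in zip(group, other)
    )


def egohood_sum(values, positions, distance):
    """For each unit, sum the values of all units within `distance` of it."""
    return [
        sum(v for v, q in zip(values, positions) if abs(p - q) <= distance)
        for p in positions
    ]


def profile(group, total, positions, distances):
    """Index value at each neighborhood size."""
    return [
        dissimilarity(
            egohood_sum(group, positions, d), egohood_sum(total, positions, d)
        )
        for d in distances
    ]


positions = list(range(8))   # eight units spaced 1 km apart along a line
total = [100] * 8
macro = [100] * 4 + [0] * 4  # the group occupies one half of the city
micro = [100, 0] * 4         # the group occupies every other unit
distances = [0, 1, 2, 3]

macro_profile = profile(macro, total, positions, distances)
micro_profile = profile(micro, total, positions, distances)
for d, m, n in zip(distances, macro_profile, micro_profile):
    print(f"{d} km  macro={m:.3f}  micro={n:.3f}")
```

Both cities start at the maximum value when each unit is considered alone, but the checkerboard city's index falls much faster as the neighborhood widens, which is the contrast the profile is designed to reveal.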

![](https://knaaptime.com/images/macromicro.jpeg)

In [None]:
from segregation.dynamics import compute_multiscalar_profile

In [None]:
distances = [1500., 2500., 3500., 4500., 5500.]

In [None]:
prof = compute_multiscalar_profile(scag,segregation_index=MultiInfoTheory, groups=pop_groups, distances=distances)

In [None]:
prof.plot()

We can also look at how the segregation profiles differ by region. If we plot them all on the same graph, we can compare the slopes of the lines to see how the shape of segregation differs between places in the Southern California region.

In [None]:
coastal_prof = compute_multiscalar_profile(coastal, segregation_index=MultiInfoTheory, groups=pop_groups, distances=distances)
inland_prof = compute_multiscalar_profile(inland, segregation_index=MultiInfoTheory, groups=pop_groups, distances=distances)

In [None]:
import pandas as pd

In [None]:

pd.Series(prof, name='socal').plot(legend=True)
pd.Series(coastal_prof, name='coastal').plot(legend=True)
pd.Series(inland_prof, name='inland').plot(legend=True)

## Local Segregation Measures

Unlike global measures, local segregation statistics produce a value for each unit in the study region rather than a single regional summary.
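
A simple example of a local measure is the location quotient, which compares each group's share in a unit to its share in the region as a whole. The sketch below uses made-up counts and a hypothetical function name; it is assumed to capture the idea behind `MultiLocationQuotient` rather than reproduce its implementation.

```python
def location_quotients(units):
    """Return, for each unit, each group's local share divided by its regional share."""
    region = [sum(col) for col in zip(*units)]
    region_pop = sum(region)
    return [
        [(c / sum(u)) / (r / region_pop) for c, r in zip(u, region)]
        for u in units
    ]


# Each row is one unit; columns are two population groups (made-up counts)
units = [[80, 20], [20, 80], [50, 50]]
for row in location_quotients(units):
    print([round(q, 2) for q in row])
```

Values above 1 mean a group is over-represented in that unit relative to the region, and values below 1 mean it is under-represented, so the result is one row of numbers per unit rather than one number for the region.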

In [None]:
from segregation.local import LocalDistortion, MultiLocationQuotient

In [None]:
d = LocalDistortion(scag, groups=pop_groups)

In [None]:
d.data

In [None]:
import contextily as ctx

In [None]:
d.data.crs

In [None]:
ax = d.data.to_crs(3857).plot('distortion',  scheme='fisherjenks', cmap='RdBu_r', alpha=0.6, figsize=(10,10) )
ctx.add_basemap(ax=ax)

In [None]:
d.data.explore('distortion',cmap ='RdBu_r', style_kwds={'stroke':False}, scheme='fisherjenks', tiles="CartoDB Positron")

## Single-Value Inference

The multiscalar profiles computed earlier show that segregation in the coastal region is considerably higher than in the inland region at every scale, although the two regions' profiles have similar shapes.

In [None]:
from segregation.inference import SingleValueTest

In [None]:
entropy_test = SingleValueTest(entropy_black)

In [None]:
entropy_test.p_value

In [None]:
entropy_test.plot()
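
The logic of a simulation-based test can be sketched in pure Python: compute the observed index, simulate many regions in which the group is spread evenly by chance, and ask how often the simulated index is at least as large as the observed one. This is a toy version with made-up counts and hypothetical function names; it illustrates a null hypothesis of evenness and is not the implementation of `SingleValueTest`.

```python
import random


def dissimilarity(group, total):
    """Aspatial dissimilarity index for one group versus everyone else."""
    other = [t - g for g, t in zip(group, total)]
    group_sum, other_sum = sum(group), sum(other)
    return 0.5 * sum(
        abs(g / group_sum - o / other_sum) for g, o in zip(group, other)
    )


def simulate_even(total, share, rng):
    """Draw each unit's group count as if every resident had the same chance of group membership."""
    return [sum(rng.random() < share for _ in range(t)) for t in total]


rng = random.Random(0)  # seeded so the sketch is repeatable
group = [80, 20, 0, 0]
total = [100, 100, 100, 100]
observed = dissimilarity(group, total)
share = sum(group) / sum(total)

simulated = [
    dissimilarity(simulate_even(total, share, rng), total) for _ in range(500)
]
p_value = sum(s >= observed for s in simulated) / len(simulated)
print(round(observed, 3), round(max(simulated), 3), p_value)
```

When the observed value sits far in the tail of the simulated distribution, as it does here, the pseudo p-value is small and the observed segregation is unlikely to have arisen from an even allocation.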

## Comparative Inference

In [None]:
from segregation.inference import TwoValueTest

In [None]:
info_test = TwoValueTest(
    MultiInfoTheory(coastal, pop_groups),
    MultiInfoTheory(inland, pop_groups),
)

In [None]:
info_test.est_point_diff

In [None]:
info_test.plot()
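
One way to think about comparing two regions is random labelling: pool the units from both regions, shuffle which region each unit belongs to, and see how often the shuffled difference is as large as the observed one. The sketch below does this for two made-up regions with a hand-rolled dissimilarity index. It is only assumed to be similar in spirit to the null approaches that `TwoValueTest` offers, and all names and counts are hypothetical.

```python
import random


def dissimilarity(units):
    """Dissimilarity index for a list of (group_count, total_count) units."""
    group = [g for g, _ in units]
    other = [t - g for g, t in units]
    group_sum, other_sum = sum(group), sum(other)
    return 0.5 * sum(
        abs(g / group_sum - o / other_sum) for g, o in zip(group, other)
    )


# Two toy regions of six units each, as (group count, total population) pairs
region_a = [(100, 100), (0, 100), (100, 100), (0, 100), (90, 100), (10, 100)]
region_b = [(50, 100), (50, 100), (45, 100), (55, 100), (50, 100), (50, 100)]
observed_diff = dissimilarity(region_a) - dissimilarity(region_b)

rng = random.Random(0)  # seeded so the sketch is repeatable
pooled = region_a + region_b
null_diffs = []
for _ in range(1000):
    rng.shuffle(pooled)  # randomly relabel which region each unit belongs to
    null_diffs.append(dissimilarity(pooled[:6]) - dissimilarity(pooled[6:]))

p_value = sum(abs(d) >= abs(observed_diff) for d in null_diffs) / len(null_diffs)
print(round(observed_diff, 3), p_value)
```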

## Exercise

1. Which county in the Southern California region has the greatest level of multiracial segregation (using the four categories above) according to the multigroup Information Theory index?

2. According to the Gini index, is Hispanic/Latino segregation in Riverside County greater or less than in Ventura County? Is that difference significant?


In [None]:
#%load solutions/06.py