# Exploring Disparities in Exposure to Freeways


In this notebook, we explore the following questions:

1. What percentage of the San Diego County population resided within 1,000 feet of the state highway network in 2010?
2. Are there disparities across racial and ethnic groups in these percentages?

(Please note that we are using the Census definitions of race and ethnicity. More details on these definitions are available [here](https://www.census.gov/newsroom/blogs/random-samplings/2021/08/measuring-racial-ethnic-diversity-2020-census.html)).

We will carry out a disparity analysis using several geocomputational techniques:

- choropleth mapping
- interactive visualization
- clipping
- buffering
- areal interpolation

In [None]:
from IPython.display import IFrame
IFrame('https://www.latimes.com/projects/la-me-freeway-how-close-map/', width=700, height=350)

## Background

- [LA Times Interactive Map](https://www.latimes.com/projects/la-me-freeway-how-close-map/)
- [LA Times 2016-11-07](https://www.latimes.com/local/lanow/la-me-ln-freeway-building-pollution-20161107-story.html)
- [LA Times Freeway Pollution Project](https://www.latimes.com/projects/la-me-freeway-pollution/)

In [None]:
import base  # local helper module: tract data (base.gdf), highways (base.roads), plotting helpers
import geopandas
import warnings
warnings.filterwarnings('ignore')

## Mapping San Diego

We will be using data for San Diego County from the 2010 US Census, provided through the package [GeoSNAP](https://spatialucr.github.io/geosnap-guide/content/home.html).

We first read in a dataset that contains attributes from the census together with the geometries for the [Census tracts](https://www2.census.gov/geo/pdfs/education/CensusTracts.pdf).

In [None]:
sd = base.gdf

This is a GeoDataFrame that has a large number of attributes we can explore:

In [None]:
sd.shape

The first number tells us that there are 627 census tracts, and for each tract we have data on 195 attributes.

We can peek at the first five records:

In [None]:
sd.head()

We can examine the spatial arrangement of the census tracts:


In [None]:
sd.explore(tooltip=False)

The map is interactive and allows for:

- panning (click and drag)
- zooming in (double click, or scroll-forward)
- zooming out (shift-double click or scroll-backwards)
- hovering (a tooltip pops up; turned off for now)

So let's explore one particular attribute by mapping its spatial distribution using a choropleth map. We will pick the median home value (in thousands of dollars) for each tract and use a decile classification:
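The `scheme='quantiles'` option is handled by mapclassify, which geopandas uses for choropleth classification. As a rough illustration of what a decile (k=10) quantile scheme does, here is a minimal plain-Python sketch under our own assumptions: the helper `quantile_classes` and the 20 home values are hypothetical, and this is not mapclassify's implementation. Each class should receive roughly the same number of tracts, which is why quantile maps spread colors evenly even when values are skewed.

```python
import bisect
import statistics


def quantile_classes(values, k):
    """Assign each value a class 0..k-1 so classes hold roughly equal counts.

    A minimal sketch of a quantile scheme (hypothetical helper, not mapclassify).
    """
    # k - 1 interior cut points that split the values into k groups
    cuts = statistics.quantiles(values, n=k, method="inclusive")
    # bisect_left: a value equal to a cut point falls in the lower class
    return [bisect.bisect_left(cuts, v) for v in values]


# Hypothetical median home values (thousands of dollars) for 20 tracts
values = list(range(100, 1100, 50))  # 100, 150, ..., 1050
classes = quantile_classes(values, k=10)
print(classes)  # each of the 10 classes receives 2 of the 20 tracts
```

Changing `k` to 5, as in the next map, would yield quintiles with about four tracts per class in this toy example.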

In [None]:
sd.explore(column='median_home_value', scheme='quantiles', k=10, legend=True,
           tooltip=['median_home_value'],
           legend_kwds=dict(colorbar=False))

We can modify the map in a number of ways:

- change the number of classes (k=5)
- change the cmap (colormap, cmap='Greens')

In [None]:
sd.explore(column='median_home_value', scheme='quantiles', k=5, legend=True,
           cmap='Greens', tooltip=['median_home_value'],
           legend_kwds=dict(colorbar=False))

## Exploring San Diego's Spatial Sociodemographic Structure

For our disparities research, we want to explore how different population groups are spatially distributed relative to the freeway network. So we can first look at the spatial distribution of three groups:

- non-Hispanic white
- Hispanic
- non-Hispanic black

These categories follow the Census definitions of race and ethnicity.

In [None]:
count_vars = ['n_total_pop', 'n_nonhisp_white_persons', 'n_hispanic_persons', 'n_nonhisp_black_persons']

In [None]:
county_totals = sd[count_vars].sum()  # population totals for the county
county_totals

Composition for the county:

In [None]:
county_totals / county_totals['n_total_pop']  # share of the county population in each group

Overall, then, the non-Hispanic white population was 48.5 percent of the total population in 2010, people identifying as Hispanic represented 32 percent, and individuals identifying as non-Hispanic black represented 4.8 percent. (There are other groups that we do not include in what follows, so these three do not represent the entire population.)

For each tract, we want to see the composition of the tract's population, expressed as the percentage of the tract's population in each of these three groups:
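The `p_` columns mapped below express each group as a share of the tract population. Assuming they are computed as `100 * n_group / n_total_pop` (we have not checked exactly how GeoSNAP derives them), a minimal sketch of that calculation for a single tract might look like this; `tract_composition` and the counts are hypothetical.

```python
def tract_composition(counts, total_key="n_total_pop"):
    """Percentage of a tract's population in each group (hypothetical helper).

    Assumes p_* = 100 * n_* / n_total_pop.
    """
    total = counts[total_key]
    groups = [k for k in counts if k != total_key]
    if total == 0:
        # unpopulated tracts (e.g., parks or water) have no defined composition
        return {k.replace("n_", "p_", 1): None for k in groups}
    return {k.replace("n_", "p_", 1): 100 * counts[k] / total for k in groups}


# Hypothetical counts for one tract
tract = {
    "n_total_pop": 4000,
    "n_nonhisp_white_persons": 1800,
    "n_hispanic_persons": 1600,
    "n_nonhisp_black_persons": 200,
}
print(tract_composition(tract))
```

Because other groups are left out, the three percentages need not sum to 100.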

In [None]:
base.choro3(sd, 'p_nonhisp_white_persons', 'p_hispanic_persons', 'p_nonhisp_black_persons');

Here lighter colors indicate a higher percentage of the tract's population is in a particular group.

In [None]:
import seaborn as sns

pct_vars = ['p_nonhisp_white_persons', 'p_hispanic_persons', 'p_nonhisp_black_persons']

sns.set_theme(style='ticks')
sns.pairplot(sd[pct_vars]);  # scatterplot matrix of the three tract-level percentages

## Integrating Road Networks for Environmental Justice Analysis

### California Highway Network

In [None]:
from base import roads

In [None]:
roads.plot();

### Clipping to Select San Diego Components of State Network
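`geopandas.clip` keeps only the portions of each road geometry that fall inside the clipping mask, here the dissolved county polygon. To show the idea without geopandas, the sketch below clips a single line segment to a rectangular window using the Liang-Barsky algorithm. This is a swapped-in simplification: the real county boundary is an irregular polygon, and `clip_segment` is a hypothetical helper, not part of geopandas.

```python
def clip_segment(p0, p1, xmin, ymin, xmax, ymax):
    """Clip segment p0-p1 to a rectangle (Liang-Barsky); None if fully outside."""
    x0, y0 = p0
    dx, dy = p1[0] - x0, p1[1] - y0
    t0, t1 = 0.0, 1.0  # parameter range of the part that stays visible
    # each (p, q) pair encodes one rectangle edge: left, right, bottom, top
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            t0 = max(t0, t)  # segment enters through this edge
        else:
            t1 = min(t1, t)  # segment leaves through this edge
        if t0 > t1:
            return None
    return ((x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy))


# A hypothetical road crossing a 10 x 10 window is trimmed to the window
print(clip_segment((-5, 5), (15, 5), 0, 0, 10, 10))  # ((0.0, 5.0), (10.0, 5.0))
# A road entirely outside the window is dropped
print(clip_segment((20, 20), (30, 30), 0, 0, 10, 10))  # None
```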

In [None]:
sd_county = sd.unary_union  # dissolve all tracts into a single county polygon

In [None]:
sd_county

In [None]:
sd_roads = geopandas.clip(roads, sd_county)  # keep only the highway segments inside the county

In [None]:
sd_roads.plot();

In [None]:
sd_roads.explore()

## Buffering: Defining Areas of Concern 

We will define a [*buffer*](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.buffer.html) that contains all the points within 1,000 feet of a freeway.
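A buffer of distance d around a line is the set of points no farther than d from it. Since 1,000 feet is 304.8 meters, the call `sd_roads.buffer(304.8)` means 1,000 feet only if the data's CRS uses meters, which we assume here. The plain-Python sketch below tests buffer membership directly from point-to-segment distances; the road coordinates, the `home` point and the helper names are hypothetical. It also shows why per-segment buffers overlap: a home near a junction lies inside two of them, so counting people segment by segment would double count, whereas merging the buffers with `unary_union` gives a single area in which each location counts once.

```python
import math

FEET_TO_METERS = 0.3048
BUFFER_M = 1000 * FEET_TO_METERS  # 304.8 m (assumes a CRS measured in meters)


def dist_point_segment(p, a, b):
    """Euclidean distance from point p to segment a-b, in projected units."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment (a point)
    # projection of p onto the line through a-b, clamped to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_buffer(p, segments, d=BUFFER_M):
    """True if p lies within d of ANY segment, i.e. inside the dissolved buffer."""
    return any(dist_point_segment(p, a, b) <= d for a, b in segments)


# Hypothetical freeways (meters) meeting at the origin
toy_roads = [((0, 0), (1000, 0)), ((0, 0), (0, 1000))]
home = (200, 250)  # 250 m from the first road, 200 m from the second

print([dist_point_segment(home, a, b) <= BUFFER_M for a, b in toy_roads])  # [True, True]
print(in_buffer(home, toy_roads))  # True: counted once in the union
```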


In [None]:
b1000 = sd_roads.buffer(304.8)  # 1,000 feet = 304.8 meters; assumes the CRS units are meters

In [None]:
b1000.explore() # buffer segments will occlude one another in places

In [None]:
# dissolve the overlapping segment buffers into a single polygon
bdf = geopandas.GeoDataFrame(geometry=[b1000.unary_union], crs=sd.crs)

In [None]:
bdf.explore() # zoom in to see the buffer overlay on the network

### Visual Analysis of Disparities
Recall that the Census data we examined above are reported for census tracts.

We could visually compare the spatial distribution of the different groups relative to the freeway network:

In [None]:
base.choro3roads();

We can also explore the Hispanic population in more detail:

In [None]:
import folium

opacity = 0.3
# choropleth of the Hispanic share by tract
m = sd.explore(column='p_hispanic_persons',
               tooltip=['p_hispanic_persons'],
               style_kwds={'fillOpacity': opacity},
               cmap='viridis', scheme='quantiles', k=5,
               legend=True, legend_kwds=dict(colorbar=False))
# overlay the 1,000-foot freeway buffer on the same map
bdf.explore(m=m, color='blue', style_kwds={'fillOpacity': opacity})

folium.TileLayer('Stamen Toner', control=True).add_to(m)  # use folium to add alternative tiles
folium.LayerControl().add_to(m)  # use folium to add layer control

m

## Who Lives in the Buffer?

Now that we have defined our area of concern as the highway buffer, we would like to explore whether the racial and ethnic composition of the population residing within the buffer differed from that of the population residing elsewhere.

That is, were people of color disproportionately residing in the buffer region?

To answer this question, we will compare the composition of the overall county population to that of the buffer.

The overall composition from above is:

In [None]:
base.county_composition

Unfortunately, no official published data report the composition of the population within the buffer.

## Areal Interpolation
We can, however, use [*areal interpolation*](https://github.com/pysal/tobler#pysal-tobler) to estimate the population inside the buffer.
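Area-weighted interpolation assumes each tract's population is spread uniformly over its area, so the estimate for the buffer is the sum over tracts of population × (area of the tract inside the buffer / area of the tract). The sketch below applies that rule with axis-aligned rectangles standing in for tracts and buffer; `area_weighted_estimate`, the rectangle helpers and the tract numbers are hypothetical. Dividing by the full tract area corresponds to `allocate_total=False` in tobler's `area_interpolate`. With `allocate_total=True`, tobler instead divides by the portion of each tract covered by the target geometries, so with a single buffer as the only target every intersecting tract would have its whole population assigned to the buffer.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax); 0 if empty."""
    xmin, ymin, xmax, ymax = r
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def rect_intersection(r, s):
    """Overlap of two rectangles (may be empty, which gives zero area)."""
    return (max(r[0], s[0]), max(r[1], s[1]), min(r[2], s[2]), min(r[3], s[3]))


def area_weighted_estimate(sources, target, variable):
    """Estimate an extensive variable inside `target` from (rect, attrs) zones."""
    estimate = 0.0
    for rect, attrs in sources:
        # share of this source zone's area that lies inside the target
        weight = rect_area(rect_intersection(rect, target)) / rect_area(rect)
        estimate += weight * attrs[variable]
    return estimate


# Hypothetical 1 km x 1 km tracts (meters) and a 400 m wide buffer strip
toy_tracts = [
    ((0, 0, 1000, 1000), {"n_total_pop": 4000}),
    ((1000, 0, 2000, 1000), {"n_total_pop": 2000}),
]
toy_buffer = (800, 0, 1200, 1000)
print(area_weighted_estimate(toy_tracts, toy_buffer, "n_total_pop"))  # 1200.0
```

Each tract contributes 20 percent of its population here, because 20 percent of its area falls inside the strip.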

In [None]:
import tobler
ae = tobler.area_weighted.area_interpolate

In [None]:
extensive_variables = ['n_total_pop',
                       'n_nonhisp_white_persons',
                       'n_hispanic_persons',
                       'n_nonhisp_black_persons'
                       ]

In [None]:
estimates = ae(source_df=sd, target_df=bdf,
               extensive_variables=extensive_variables,
               allocate_total=False)

In [None]:
estimates.head() # estimates of population in buffer

In [None]:
county_totals

In [None]:
estimates[extensive_variables].sum() / county_totals  # share of each group living in the buffer

Overall, we see that 23.7 percent of San Diego's population lived within 1,000 feet of a freeway in 2010.

Loosely speaking, if you randomly selected a person from San Diego County in 2010, the probability that the individual lived within the buffer would be 0.237.

If we now condition on race/ethnicity, we see differences emerge. If you randomly selected a person who identified as non-Hispanic white on the census, that probability drops to 0.222.

The probability that a randomly selected person from the Hispanic population in the county resided in the buffer is estimated at 0.274.

And for individuals who identified as non-Hispanic black, the probability of residing in the freeway buffer is 0.221.

Another way to look at this is to compare the population composition for the county as a whole to the population composition for the buffers.

In [None]:
base.county_composition

In [None]:
estimates[extensive_variables] / estimates.n_total_pop.iloc[0]  # composition of the buffer population

What we find is that the non-Hispanic white population is underrepresented within the buffer (45.5 percent of the buffer population versus 48.5 percent of the county population), the Hispanic population is overrepresented (37 percent of the buffer population versus 32 percent of the county population), and the non-Hispanic black population is slightly underrepresented (4.5 percent of the buffer population versus 4.8 percent of the county population).
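The two views of disparity are linked by Bayes' rule: a group's share of the buffer population equals its share of the county population times the ratio P(buffer | group) / P(buffer). As a quick consistency check, the sketch below plugs in the rounded figures reported in this notebook; the dictionary names are our own. A ratio above 1 indicates that a group is overrepresented in the buffer.

```python
# Rounded figures reported earlier in this notebook
county_share = {"nonhisp_white": 0.485, "hispanic": 0.32, "nonhisp_black": 0.048}
p_buffer_given_group = {"nonhisp_white": 0.222, "hispanic": 0.274, "nonhisp_black": 0.221}
p_buffer = 0.237  # share of the whole county population living in the buffer

# ratio > 1: the group is overrepresented in the buffer relative to the county
ratio = {g: p / p_buffer for g, p in p_buffer_given_group.items()}
# Bayes' rule: P(group | buffer) = P(group) * P(buffer | group) / P(buffer)
buffer_share = {g: county_share[g] * ratio[g] for g in county_share}

for g in county_share:
    print(f"{g:14s} ratio={ratio[g]:.3f} buffer share={buffer_share[g]:.3f}")
```

The resulting buffer shares (about 0.454, 0.370 and 0.045) match the buffer composition reported above up to rounding of the inputs.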

## Summary

This notebook introduces a number of spatial methods used in environmental disparities research.
These methods focus on exploratory analysis, which lets students interact with the spatial patterns underlying the world they experience.