### Background

My wife Caitlyn Van Heest and I worked together to create this first-draft interactive map of American Community Survey census tracts (2018 data courtesy of the Colorado Information Marketplace). The map looks at the index of Black / African American (hereafter "B/AA") household frequency among census blocks. If you're unfamiliar with indexing, the goal is to identify higher concentrations of some occurrence (here, B/AA households) within the data, regardless of the scale at which you are looking.

*For example:
Census block A has 500 total inhabitants and 50 are B/AA.
Census block B has 100 total inhabitants and 40 are B/AA.*


*Among these two blocks (600 total inhabitants), 90 identify as B/AA, an overall concentration of 90/600 = 0.15. The index for block A is (50/500) / (90/600) = 0.10 / 0.15, or approximately 0.67. The index for block B is (40/100) / (90/600) = 0.40 / 0.15, or approximately 2.67. Even though block B is smaller than block A, it has a higher concentration of B/AA inhabitants. In fact, block B has over two-and-a-half times the mean concentration of B/AA inhabitants among all census blocks in this example.*
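
The arithmetic in this example can be checked with a short sketch. The function name `concentration_index` and the dictionary layout are illustrative assumptions, not code from the original analysis.

```python
def concentration_index(group_count, total_count, group_overall, total_overall):
    """Share of the group in one block divided by its share across all blocks."""
    block_share = group_count / total_count
    overall_share = group_overall / total_overall
    return block_share / overall_share

# Example blocks from the text: (B/AA inhabitants, total inhabitants)
blocks = {"A": (50, 500), "B": (40, 100)}
group_overall = sum(group for group, _ in blocks.values())   # 90
total_overall = sum(total for _, total in blocks.values())   # 600

indices = {
    name: concentration_index(group, total, group_overall, total_overall)
    for name, (group, total) in blocks.items()
}
for name, index_value in indices.items():
    print(f"Block {name}: {index_value:.2f}x")
# Block A: 0.67x
# Block B: 2.67x
```

An index of 1.0 means a block matches the overall concentration; values above 1.0 mean the group is over-represented in that block relative to the whole.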


The map layer with race/ethnicity indices uses the mean concentration of households per race/ethnicity in Colorado as the denominator, not that of the United States at large.
In the map below, separate layers cover ranges of the index, from 1.01x up to the maximum of about 15.9x the average concentration of B/AA households across the whole state. Each layer can be toggled using the small layer pane in the upper right corner.

![Layer Toggles](../../references/map_layer_toggle.JPG)
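
As a rough sketch of how an index value maps to one of these layers, the lookup below uses the lower bounds that appear in the layer names in this notebook. The helper `layer_for_index` is hypothetical, and how a value sitting exactly on a boundary is assigned is an assumption; the actual layer files were prepared outside this notebook.

```python
from bisect import bisect_right

# Lower bound of each layer, taken from the layer names used in this notebook.
# Assigning a boundary value (e.g. exactly 5.0) to the higher layer is an assumption.
LOWER_BOUNDS = [1.01, 5.0, 10.0, 13.0]
LAYER_NAMES = ["1.01x to 5x", "5x to 10x", "10x to 13x", "13x or more"]

def layer_for_index(index_value):
    """Return the layer label for an index value, or None if it is below 1.01x."""
    position = bisect_right(LOWER_BOUNDS, index_value)
    return LAYER_NAMES[position - 1] if position > 0 else None

for value in (0.5, 2.5, 7.5, 15.9):
    print(value, "->", layer_for_index(value))
```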

The individual markers throughout the map are healthcare locations likely to be providing care for COVID-19 cases (limited to those inside blocks with a B/AA index of 1x or more). Click on a marker to see the name, phone number, and whether the location accepts Medicaid (y/n).

![Tooltips](../../references/tooltip.JPG)

-------

In [225]:
import geopandas as gpd
import descartes
import folium
from folium.plugins import MarkerCluster
import branca.colormap as cm
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

In [None]:
#read in geoJSON data for the ACS blocks and healthcare locations
hc_df = gpd.read_file('../../data/output/healthcare.geojson')
ACS_df = gpd.read_file('../../data/output/ACS.geojson')

# high_indexing = gpd.read_file('../../data/output/black_aa_indexed.geojson')
hc_df_clipped = gpd.read_file('../../data/output/clipped_healthcare.geojson')

#read in polygons of black / aa households indexed in colorado
index_101_500 = gpd.read_file('../../data/output/index_101_500.geojson')
index_500_1000 = gpd.read_file('../../data/output/index_500_1000.geojson')
index_1000_1300 = gpd.read_file('../../data/output/index_1000_1300.geojson')
index_1301_plus = gpd.read_file('../../data/output/index_1301_plus.geojson')

In [None]:
#launch OSM basemap focused on colorado at state level
map_osm = folium.Map(location=[39.3324, -105.1420], zoom_start=7)

#add highest indexing polygons
folium.GeoJson(index_1301_plus.to_json(),
               name='13x (or more) concentration of Black/AA households relative to Overall Colorado',
               tooltip=folium.features.GeoJsonTooltip(
                   fields=['pop', 'black_indexed',
                           'pct_black'], localize=True)).add_to(map_osm)

#add 2nd highest indexing polygons
folium.GeoJson(index_1000_1300.to_json(),
               name='10x to 13x concentration of Black/AA households',
               tooltip=folium.features.GeoJsonTooltip(
                   fields=['pop', 'black_indexed',
                           'pct_black'], localize=True)).add_to(map_osm)

#add 3rd highest indexing polygons
folium.GeoJson(index_500_1000.to_json(),
               name='5x to 10x concentration of Black/AA households',
               tooltip=folium.features.GeoJsonTooltip(
                   fields=['pop', 'black_indexed',
                           'pct_black'], localize=True)).add_to(map_osm)

#add 4th highest indexing polygons
folium.GeoJson(index_101_500.to_json(),
               name='1.01x to 5x concentration of Black/AA households',
               tooltip=folium.features.GeoJsonTooltip(
                   fields=['pop', 'black_indexed',
                           'pct_black'], localize=True)).add_to(map_osm)

# use a clustered view of markers
marker_cluster = MarkerCluster().add_to(map_osm)

# add a marker for every record in the filtered data
for _, row in hc_df_clipped.iterrows():
    # popups render as HTML, so use <br> for line breaks;
    # the f-string also tolerates non-string values such as missing phones
    popup_text = (f"{row['FAC_NAME']}<br>Phone: {row['PHONE']}"
                  f"<br>MedicaidAccept: {row['MEDICAID']}")
    folium.Marker(
        [row['Latitude'], row['Longitude']],
        popup=popup_text).add_to(marker_cluster)

folium.LayerControl().add_to(map_osm)


map_osm

In [None]:
# write the rendered map to a standalone HTML file
map_osm.save("../../data/output/map_html.html")

## Visualizing Distributions of Population Indices

Visualizing the distributions of indices gives insight into how often different concentrations of race/ethnicity groups occur in the state. For example, the distribution of white households indexed in Colorado looks like this:

![white index histogram](../../references/figures/white_index_hist.png)

You can see that the most frequent indices are around 1.25x. In other words, blocks in Colorado most often have about 1.25x the average statewide concentration of white households.

In contrast, here are the distributions for the indices of Black/African American, Asian, Hawaiian/Pacific Islander, and Native American race/ethnicities among those same ACS blocks.

![histograms by groups](../../references/figures/hist_by_group.png)


## Commentary


### Plotting ECDF of Race/Ethnicity Index Values

The empirical cumulative distribution function (ECDF) shows how much of a feature's distribution falls at or below a given value. We plotted the ECDF as an additional view of the distributions to emphasize how localized different groups are in Colorado. Groups whose households are concentrated in relatively few blocks have curves that rise steeply and then approach the limit of 100% very slowly. See Hawaiian / Pacific Islanders, for example: more than 90% of indexed values are 0, meaning that 90+% of blocks in Colorado have no Hawaiian / Pacific Islander households. Because the average concentration across the state is so low, the blocks that do have such households show very large index values (up to 175x the statewide mean concentration). Race/ethnic groups that are more common across blocks have a more slowly rising curve that eventually reaches 100%.

Visualizing these distributions does not by itself reveal any particular inequity. However, very high concentrations of minority communities are often associated with historical practices such as [redlining](https://en.wikipedia.org/wiki/Redlining), and [upward mobility is impacted by one's family geography, especially among Black Americans](https://www.nber.org/papers/w24441.pdf). For the purposes of this analysis, the focus is on geography and access to care among these minority communities so that individual awareness, giving, and support can be better focused on communities likely to be facing disparities in health equity.

![ecdf by group](../../references/figures/ecdf_by_group.png)

Tips for interpretation:
The y-axis shows the percentage of blocks. The x-axis shows the index (the concentration of households of the specific group per block compared to the average concentration in the state) for the specified group.

Example: 90% of blocks contain less than ~3x the average Colorado concentration of Black / African American households.

The overall shape of the curve is important. A curve that rises sharply at or near x=0 implies that a large fraction of all blocks have few to no households of the specified group. A curve that rises steadily from bottom left to top right implies a race/ethnicity that is geographically dispersed and occupies blocks at many different concentrations within the state.
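
To make the reading of these curves concrete, here is a minimal sketch of evaluating an ECDF at a given index value. The helper `ecdf_percent` and both lists of index values are hypothetical illustrations, not the data or code behind the figure above.

```python
from bisect import bisect_right

def ecdf_percent(values, x):
    """Percent of observations that are less than or equal to x."""
    ordered = sorted(values)
    return 100.0 * bisect_right(ordered, x) / len(ordered)

# Hypothetical index values for ten blocks (not real data): mostly zero,
# with a few blocks holding very high concentrations.
sparse_group = [0, 0, 0, 0, 0, 0, 0, 0, 4.0, 20.0]
# Hypothetical index values for a widely dispersed group.
dispersed_group = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

print(ecdf_percent(sparse_group, 0))        # 80.0
print(ecdf_percent(dispersed_group, 0))     # 0.0
print(ecdf_percent(dispersed_group, 1.0))   # 50.0
```

The sparse group jumps to 80% at x=0 and then climbs slowly, while the dispersed group rises steadily across its range, which mirrors the two curve shapes described above.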

### Choropleth Mapping


In [None]:
#read in separated datasets
acs_geometry = gpd.read_file('../../data/output/acs_geometry_only.geojson').to_json()
acs_metrics = pd.read_csv('../../data/output/acs_data_only.csv')

In [None]:
m = folium.Map(location=[39.3324, -105.1420],
               tiles='cartodbpositron',
               zoom_start=7)

# def add_polygon_layer(json_geometry, dataframe, column, map_var, layer_name):
#     frame = dataframe.set_index('geonum')[column]
#     print(frame.describe())
#     folium.GeoJson(json_geometry,
#                    name = layer_name,
#                    style_function=lambda x: {
#                        'fillcolor': step(frame[x['properties']['geonum']]),
#                        'color': 'grey',
#                        'weight': '0.25',
#                        'fillopacity': 0.8
#                    }).add_to(map_var)

# layer_names = [
#     'Black/African American Index', 'Asian American Index',
#     'Hawaiian/Pacific Islander Index', 'Native American Index', 'Other Index',
#     'Non-White Households Index']

# for position, group in enumerate(['black_nh_index','asian_nh_index','hawpi_nh_index','ntvam_nh_index','other_nh_index','non_white_nh_index']):
#     add_polygon_layer(acs_geometry, acs_metrics, group, m, layer_names[position])

# Each layer gets its own variable names so its style function keeps the
# matching data and color scale, even if styles are evaluated at render time.

# B/AA Populations
black_frame = acs_metrics.set_index('geonum')['black_nh_index']
black_step = cm.linear.PuBuGn_09.to_step(8).scale(1, black_frame.max())
folium.GeoJson(acs_geometry,
               name='Black/African American Index',
               style_function=lambda x: {
                   'fillColor': black_step(black_frame[x['properties']['geonum']]),
                   'color': 'grey',
                   'weight': 0.25,
                   'fillOpacity': 0.5
               }).add_to(m)
# add the B/AA color scale to the map as a legend
m.add_child(black_step)

# Asian American Populations
asian_frame = acs_metrics.set_index('geonum')['asian_nh_index']
asian_step = cm.linear.PuBuGn_09.to_step(8).scale(1, asian_frame.max())
folium.GeoJson(acs_geometry,
               name='Asian Index',
               style_function=lambda x: {
                   'fillColor': asian_step(asian_frame[x['properties']['geonum']]),
                   'color': 'grey',
                   'weight': 0.25,
                   'fillOpacity': 0.5
               }).add_to(m)

folium.LayerControl().add_to(m)
m