# Census 2020

## 2020 US Census Data

The United States Census is conducted every 10 years, as mandated by Article I, Section 2 of the Constitution. The 2020 Census collected detailed demographic information for over 330 million people across the USA. This data is crucial for several purposes, including:

1. Apportioning seats in the U.S. House of Representatives among the 50 states.
2. Redistricting and adjusting electoral boundaries based on population changes.
3. Allocating federal funds to local communities for various programs.

In this notebook, we will explore the 2020 Census dataset using the [hvPlot](https://hvplot.holoviz.org/) library, which allows for easy, interactive plotting of large datasets. We will employ [Datashader](https://datashader.org/) within hvPlot to visualize racial demographics and population density, following a similar approach to our analysis of the [2010 Census data](https://examples.holoviz.org/gallery/census/census.html#). Using this 2020 dataset, we aim to investigate whether there have been significant shifts in population or racial distributions over the past decade.

The 2020 dataset, available [here](#), provides racial information and approximate locations for individuals, with privacy maintained by randomizing the precise coordinates at the census block level.

### Load data and set up

The census data has been saved in a [Parquet](https://parquet.apache.org)-format file, a columnar storage format, which we will load into a [Dask](https://dask.org/) dataframe.

In [None]:
import datashader as ds
import hvplot.dask # noqa
import dask.dataframe as dd

In [None]:
df = dd.read_parquet('data/census2020.parq')
df = df.persist()
df.head(3)

This dataset contains over 334 million rows, with each row representing a person counted in the census. Each record includes a location in Web Mercator format and a race category for the individual.
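
Web Mercator coordinates are expressed in meters rather than degrees. As a rough illustration of how a longitude/latitude pair maps onto the kind of `x` and `y` values stored in this dataset, the sketch below applies the standard spherical Web Mercator (EPSG:3857) formulas. The function name `lon_lat_to_web_mercator` and the sample coordinates for downtown Chicago are assumptions made for illustration; they are not part of the dataset or its tooling.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)

def lon_lat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Approximate coordinates for downtown Chicago (assumed for illustration)
x, y = lon_lat_to_web_mercator(-87.63, 41.88)
print(f"x = {x:.0f} m, y = {y:.0f} m")
```

The geographic ranges used in this notebook are given in these projected units, which is why they look like large numbers rather than degrees.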

Let's define some default plot options as well as some geographic ranges to look at later.

In [None]:
from holoviews import opts

plot_width  = 900
plot_height = 525
opts.defaults(
    opts.Points(width=plot_width, height=plot_height,
                xaxis=None, yaxis=None, bgcolor='black'))

CITIES = {
    'USA': ((-13900000, -7350000), (2880000, 6318000)), # Continental USA
    'Lake Michigan': ((-10205770.92, -9347497.64), (4975536.36, 5477830.73)),
    'Chicago': ((-9828397.84, -9718191.55), (5096783.58, 5160979.44)),
    'Chinatown': ((-9759379.76, -9754926.98), (5137042.27, 5140031.14)),
    'New York City': ((-8281056.92, -8175303.40), (4940333.23, 4999075.73)),
    'Los Angeles': ((-13194699.24, -13114549.21), (3979227.61, 4023432.27)),
    'Houston': ((-10692237.09, -10539729.39), (3433046.58, 3517697.17)),
    'Austin': ((-10899291.34, -10855876.74), (3525420.53, 3551199.11)),
    'New Orleans': ((-10059942.38, -10006509.03), (3480433.44, 3509978.50)),
    'Atlanta': ((-9448798.38, -9355290.01), (3955187.40, 4007338.12)),
}

### Population Density

To handle the rendering of such a large dataset efficiently using hvPlot, we will use the `rasterize=True` parameter. This aggregates individual data points into pixels, allowing for smooth and interactive visualization of large datasets. The resulting aggregated grid will be rendered with a linear color normalization (`cnorm='linear'`) and a grayscale colormap (`gray`). Denser areas will appear in lighter shades (white), while lower-density areas will be represented by darker shades of gray:


In [None]:
from colorcet import fire, gray, dimgray

x_range, y_range = CITIES['USA']

df.hvplot.points('x', 'y', rasterize=True, cnorm='linear', xlim=x_range, ylim=y_range, cmap=gray)

As you can (barely) see, the plot provides little visual information, as the lower-density areas blend into the dark background, making them indistinguishable. To better highlight these lower-density regions, we will switch to a lighter gray colormap (`dimgray`), which will improve the visibility of less populated areas without overwhelming the higher-density ones:

In [None]:
df.hvplot.points('x', 'y', rasterize=True, cnorm='linear', xlim=x_range, ylim=y_range, cmap=dimgray)

The plot above reveals the boundaries of the dataset and shows that many areas in the western USA have such low population density that some pixels contain no data at all. The key takeaway is that the population distribution is highly non-uniform, with dense urban areas and sparse rural regions. Given this uneven distribution, it makes sense to switch to a non-uniform color mapping by using a logarithmic scale (`cnorm='log'`), which better shows variations in population density that span several orders of magnitude:

In [None]:
df.hvplot.points('x', 'y', rasterize=True, cnorm='log', xlim=x_range, ylim=y_range, cmap=dimgray)

With the logarithmic mapping, we can now see much more structure in the dataset, revealing geographic patterns across varying population densities. Small towns in rural areas are now visible, while densely populated urban centers stand out sharply.

However, choosing `log` was somewhat arbitrary. It effectively highlights local patterns at different densities, but other nonlinear mappings might reveal different insights. Instead of experimenting with various functions, we can use [histogram equalization](https://en.wikipedia.org/wiki/Histogram_equalization) to adjust the mapping automatically. This technique remaps the aggregated values so that each color covers a similar number of pixels, making it useful for any dataset regardless of its distribution:

In [None]:
df.hvplot.points('x', 'y', rasterize=True, cnorm='eq_hist', xlim=x_range, ylim=y_range, cmap=dimgray)
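
To make the difference between the `linear`, `log`, and `eq_hist` options used above more concrete, here is a minimal pure-Python sketch that rescales a handful of made-up per-pixel counts to the range 0 to 1 in each of the three ways. The function names and the sample counts are hypothetical, and the rank-based equalization shown here is a simplification of the idea, not Datashader's actual implementation.

```python
import math
from bisect import bisect_right

def linear_norm(counts):
    """Rescale counts linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(counts), max(counts)
    return [(c - lo) / (hi - lo) for c in counts]

def log_norm(counts):
    """Rescale the logarithm of the counts to the range 0..1."""
    logs = [math.log1p(c) for c in counts]
    lo, hi = min(logs), max(logs)
    return [(v - lo) / (hi - lo) for v in logs]

def eq_hist_norm(counts):
    """Simplified equalization: map each count to the fraction of pixels <= it."""
    ordered = sorted(counts)
    return [bisect_right(ordered, c) / len(ordered) for c in counts]

# Made-up counts for eight non-empty pixels, from rural to dense urban
counts = [1, 2, 3, 10, 50, 200, 5000, 100000]

linear = linear_norm(counts)
log_scaled = log_norm(counts)
equalized = eq_hist_norm(counts)

for c, a, b, e in zip(counts, linear, log_scaled, equalized):
    print(f"{c:>7}  linear={a:.4f}  log={b:.4f}  eq_hist={e:.4f}")
```

With the linear mapping, nearly all of these pixels land at the dark end of the colormap, whereas the equalized values are spread evenly from low to high regardless of how skewed the counts are.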

While histogram equalization helps reveal patterns across the full range of population densities, the choice of colormap is equally important in ensuring that these patterns are perceived correctly. The previous grayscale colormap (`dimgray`) was useful for distinguishing between high- and low-density areas, but grayscale often falls short when it comes to highlighting subtle variations in data values.

To improve visual clarity and make these differences more perceptible, we will switch to a **perceptually uniform colormap**, such as **`fire`**. Perceptually uniform colormaps are designed so that changes in data values are reflected as uniform changes in color intensity, making patterns easier to spot across the entire range of values.

In [None]:
df.hvplot.points('x', 'y', rasterize=True, cnorm='eq_hist', xlim=x_range, ylim=y_range, cmap=fire)

We can also highlight the top 1% of the pixels by population density. Because histogram equalization assigns roughly the same number of pixels to each color, a custom colormap with 100 colors lets us emphasize the densest areas by giving the last color, which covers the top 1%, a distinct hue. The first 99 colors form a grayscale ramp, while the highest-density pixels stand out in red:

In [None]:
import numpy as np
gray2 = [(int(i),int(i),int(i)) for i in np.linspace(0,255,99)]
gray2 += ['red']

df.hvplot.points('x', 'y', rasterize=True, cnorm='eq_hist', xlim=x_range, ylim=y_range, cmap=gray2)
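
The reason a 100-color map isolates the top 1% can be checked on toy data. The sketch below is a simplified illustration under stated assumptions: it ranks made-up pixel counts, assigns each rank to one of 100 colors in equal-sized groups, and counts how many pixels receive the final red entry. The helper names `gray_ramp_with_highlight` and `colors_by_rank` are hypothetical, and the rank-based assignment only approximates what `cnorm='eq_hist'` does.

```python
def gray_ramp_with_highlight(n_gray=99, highlight="red"):
    """Build n_gray grayscale hex colors from black to white, plus one highlight color."""
    colors = []
    for k in range(n_gray):
        level = round(255 * k / (n_gray - 1))
        colors.append(f"#{level:02x}{level:02x}{level:02x}")
    return colors + [highlight]

def colors_by_rank(counts, colors):
    """Assign colors so that each color covers an equal share of the ranked pixels."""
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    assigned = [None] * len(counts)
    for rank, i in enumerate(order):
        assigned[i] = colors[rank * len(colors) // len(counts)]
    return assigned

palette = gray_ramp_with_highlight()

# 1000 made-up pixel counts: a shuffled permutation of 0..999
toy_counts = [(i * 37) % 1000 for i in range(1000)]
toy_colors = colors_by_rank(toy_counts, palette)

n_red = sum(color == "red" for color in toy_colors)
print(f"{n_red} of {len(toy_counts)} pixels are red")
```

Only the pixels with the very highest counts end up in the last color slot, which is why they appear in red while everything else stays on the gray ramp.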

In general, it is clear that the population distribution is dominated by densely populated urban centers, with more sparsely populated rural areas surrounding them. While population density gives a broad overview, it doesn't provide a full picture of the underlying demographics.

To gain deeper insights into the makeup of the population, we will now focus on the distribution of racial categories. By examining the racial demographics within the dataset, we can uncover patterns and trends in different regions, which may reveal more nuanced population dynamics that aren't apparent from density alone.

### Categorical Data (Race)

To accurately visualize the racial demographics, we will use the same color key for each race that was applied to the 2010 dataset. We will also introduce a few new colors to represent the extra racial categories included in this dataset:

In [None]:
print(list(df['race'].unique().compute()))

In [None]:
color_key = {
    'Asian': 'red',
    'Black' : 'lime',
    'Hawaiian Pacific Islander': 'orange',
    'Hispanic or Other' : 'fuchsia',
    'Mixed' : 'purple',
    'Native American' : 'yellow',
    'White' : 'aqua',
    }

Now, we will aggregate the data using the `ds.by()` reduction, passed to hvPlot through the `aggregator` parameter, which groups the dataset by the `race` column. This produces a separate aggregate for each racial category, each colored using the specified `color_key`.
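
Conceptually, a categorical aggregation keeps one count per category in every pixel, and the pixel's color is then a blend of the category colors weighted by those counts. The following pure-Python sketch illustrates that idea on a few made-up points. It is not Datashader's implementation; the helper names, the grid cell size, and the three-category RGB table are assumptions for illustration.

```python
from collections import Counter, defaultdict

# RGB values for three of the named colors in the color key (illustrative subset)
COLOR_RGB = {"Asian": (255, 0, 0), "Black": (0, 255, 0), "White": (0, 255, 255)}

def aggregate_by_category(points, cell_size):
    """Count points per category in each grid cell (one Counter per pixel)."""
    grid = defaultdict(Counter)
    for x, y, category in points:
        pixel = (int(x // cell_size), int(y // cell_size))
        grid[pixel][category] += 1
    return grid

def blend(counter, colors):
    """Mix category colors in proportion to their counts within one pixel."""
    total = sum(counter.values())
    return tuple(
        round(sum(colors[cat][ch] * n for cat, n in counter.items()) / total)
        for ch in range(3)
    )

# Made-up points: (x, y, race)
toy_points = [
    (0.5, 0.5, "Asian"), (0.1, 0.9, "White"), (0.3, 0.3, "White"),
    (0.7, 0.2, "White"), (1.5, 0.5, "Black"),
]

toy_grid = aggregate_by_category(toy_points, cell_size=1.0)
for pixel, counter in sorted(toy_grid.items()):
    print(pixel, dict(counter), blend(counter, COLOR_RGB))
```

A pixel containing a single category takes that category's pure color, while a mixed pixel takes an intermediate color, which is how blended hues arise in integrated neighborhoods.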

Additionally, we will enable the **`dynspread=True`** parameter to dynamically adjust the size of rendered points. When the visible points are sparse, for example after zooming in on a rural area, each point is spread over a few neighboring pixels so that it remains visible. Where points are already densely packed, little or no spreading is applied. This enhances visibility without overwhelming the plot, allowing us to capture both the subtle variations in rural areas and the dense clusters in urban regions.

You can see how this works by zooming into specific regions of the map if you are working in a live Python environment. The plot will dynamically re-render, adjusting the point sizes and density to provide clearer visibility in the zoomed areas.
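
As a rough intuition for what spreading does, the sketch below copies every non-empty pixel of a small grid onto its neighbors within a fixed radius. This is only the basic building block: `dynspread` chooses the amount of spreading adaptively from the local density, and the real implementation differs in details such as the shape of the spreading mask and how overlapping values are combined. The function name `spread` and the toy grid are assumptions for illustration.

```python
def spread(grid, px=1):
    """Copy every nonzero cell onto its neighbors within `px` cells (square mask)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:
                continue
            for rr in range(max(0, r - px), min(rows, r + px + 1)):
                for cc in range(max(0, c - px), min(cols, c + px + 1)):
                    # Keep the larger value where spread regions overlap
                    out[rr][cc] = max(out[rr][cc], grid[r][c])
    return out

# A 5x5 grid with a single isolated non-empty pixel in the center
sparse = [[0] * 5 for _ in range(5)]
sparse[2][2] = 7

spread_once = spread(sparse, px=1)
for row in spread_once:
    print(row)
```

The single isolated pixel becomes a 3x3 block, which is much easier to see on screen than one lit pixel on a black background.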

In [None]:
import holoviews as hv
import xyzservices.providers as xyz
from IPython.display import display

label = hv.Tiles(xyz.CartoDB.PositronOnlyLabels())

def plot_city(longitude_range, latitude_range, w=plot_width, h=plot_height, bgcolor='black'):
    """Plot population density aggregated by race within the given ranges.

    Despite their names, both ranges are Web Mercator x and y extents in meters, as stored in CITIES.
    """
    return df.hvplot.points('x', 'y', rasterize=True, dynspread=True, cnorm='eq_hist',
                            xlim=longitude_range, ylim=latitude_range,
                            cmap=color_key, bgcolor=bgcolor, colorbar=False, hover=False,
                            aggregator=ds.by('race'), width=w, height=h) * label

In [None]:
%%time
display(plot_city(*CITIES['USA']))

The result shows that, in geographic terms, the continental USA is overwhelmingly White, apart from some predominantly Hispanic regions along the southern border, some regions with high densities of Black residents in the Southeast, and a few isolated areas with Native American populations in the West.

If you're working in a live Python environment, you can zoom into specific areas of the map, and the plot will automatically re-render to show more details at that zoom level. We can also zoom into specific regions using predefined coordinates:

In [None]:
%%time
display(plot_city(*CITIES['New York City']))

In [None]:
%%time
display(plot_city(*CITIES['Lake Michigan']))

In [None]:
%%time
display(plot_city(*CITIES['Chicago']))

In [None]:
%%time
display(plot_city(*CITIES['Chinatown']))

Here we can see that the Chinatown region of Chicago has a high concentration of Asian residents, as expected, along with Mixed residents, a pattern not present in the 2010 data. However, it's unclear whether the Mixed group reflects a new classification of an existing population or an entirely new demographic in the area.

We can also observe that different cities have different racial make-ups while also being highly segregated:

In [None]:
%%time
display(plot_city(*CITIES['Houston']))

In [None]:
%%time
display(plot_city(*CITIES['Atlanta']))

In [None]:
%%time
display(plot_city(*CITIES['Los Angeles']))

### Interactive dashboard

We can combine some of the features of the plots above into a dashboard using a [Panel template](https://panel.holoviz.org/reference/templates/Bootstrap.html).

We will manually construct a legend and add a base map to the plots to help orient the data points. We will also create a single-color colormap for each racial category, so that we can select a particular race and see its density distribution across the continental USA or at a specific location within it.

In [None]:
import param
import panel as pn
from matplotlib.colors import LinearSegmentedColormap, rgb2hex

pn.extension(sizing_mode='stretch_width')

In [None]:
# construct legend labels
legend = hv.NdOverlay({k: hv.Points([-100,40]).opts(color=v, size=0) for k, v in color_key.items()})
legend.opts(clone=True, xaxis=None, yaxis=None, height=250, width=250)

In [None]:
def create_colormap_hex_list(color_name, N=256):
    """Create a hex colormap from a single named color"""
    cmap = LinearSegmentedColormap.from_list('custom_cmap', ['black', color_name], N=N)
    hex_colors = [rgb2hex(cmap(i / (N - 1))) for i in range(N)]
    return hex_colors
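
For readers who want to see what such a black-to-color ramp contains, the sketch below builds a comparable list of hex colors by plain linear interpolation, without Matplotlib. It is an approximation for illustration only: the function name `black_to_color_ramp` is hypothetical, it takes an RGB tuple rather than a color name, and individual entries may differ from Matplotlib's output by a rounding step.

```python
def black_to_color_ramp(rgb, n=256):
    """Return `n` hex colors linearly interpolated from black to `rgb` (0-255 ints)."""
    ramp = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at black, 1.0 at the target color
        r, g, b = (round(channel * t) for channel in rgb)
        ramp.append(f"#{r:02x}{g:02x}{b:02x}")
    return ramp

# (0, 255, 255) is the RGB value of the named color 'aqua'
aqua_ramp = black_to_color_ramp((0, 255, 255))
print(aqua_ramp[0], aqua_ramp[128], aqua_ramp[-1])
```

Starting each ramp at black keeps low-density pixels close to the black plot background, so only the denser areas of the selected category stand out.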

In [None]:
races = ["All", *df['race'].unique().compute()]

class USPopulation(pn.viewable.Viewer):
    race      = param.Selector(default='All', objects=races)
    city      = param.Selector(default='USA', objects=list(CITIES.keys()))
    map_tile  = param.Boolean(default=False, doc="Turn the map tiles on/off")
    map_alpha = param.Magnitude(default=0.5, doc="Alpha value for the map opacity")
    
    points = param.Parameter()
    base_map = param.Parameter()
    plot = param.Parameter()

    
    def __init__(self, **params):
        super().__init__(**params)
        self.points = None
        self.base_map = None
        self._update_data_points()
        self._update_map()

    def get_data(self, race):
        if race == 'All':
            return df
        else:
            return df[df['race'] == race]
    
    @pn.cache
    def get_colormap(self, race):
        if race == 'All':
            return color_key
        else:
            color = color_key.get(race, 'aqua')
            return create_colormap_hex_list(color)
    
    @param.depends('race', 'city', watch=True)
    def _update_data_points(self):
        """Update the data points based on race and city."""
        df = self.get_data(self.race)
        cmap = self.get_colormap(self.race)
        x_range, y_range = CITIES[self.city]
        
        plot_options = {
            'x': 'x',
            'y': 'y',
            'rasterize': True,
            'dynspread': True,
            'cnorm': 'eq_hist',
            'xlim': x_range,
            'ylim': y_range,
            'cmap': cmap,
            'bgcolor': 'black',
            'colorbar': False,
            'hover': False,
            'aggregator': ds.by('race') if self.race == 'All' else None,
            'width': plot_width,
            'height': plot_height,  
        }

        
        self.points = df.hvplot.points(**plot_options) # * label (causes the plot to go all white)
        
        if self.race == 'All':
            self.points *= legend
        
        self._update_plot()
        
    @param.depends('map_tile', 'map_alpha', 'city', watch=True)
    def _update_map(self):
        """Update the map tiles based on user interaction."""

        if self.map_tile:
            if not self.base_map:
                self.base_map = hv.element.tiles.EsriImagery().opts(alpha=self.map_alpha) # maybe a better map tile?
            else:
                self.base_map.opts(alpha=self.map_alpha)
        else:
            self.base_map = None
        
        self._update_plot()
    
    def _update_plot(self):
        """Combine map and data points."""
        if self.base_map:
            self.plot = self.base_map * self.points
        else:
            self.plot = self.points
        
    @param.depends('plot')
    def plot_view(self):
        return self.plot
    
    def __panel__(self, **kwargs):
        return self.plot_view

In [None]:
pop_map = USPopulation()

widgets = pn.WidgetBox(
            pop_map.param.race,
            pop_map.param.city,
            pop_map.param.map_tile,
            pop_map.param.map_alpha,
            width=350)

intro_text = """
The United States Census is conducted every 10 years, as mandated by Article I, Section 2 of the Constitution.
The 2020 Census collected detailed demographic information for over 330 million people across the USA.

This dashboard uses [Panel](https://panel.holoviz.org) and [hvPlot](https://hvplot.holoviz.org/) 
to visualize the population density data from the census, allowing you to explore demographic patterns across different races and cities.

You can interact with the dashboard by selecting a specific race and city using the dropdown menus on the left.
The Race selector lets you choose between “All” races or a specific racial group, 
while the City selector allows you to focus on a particular urban area or view the entire USA.

Toggling the Map Tile option will display or hide the underlying map, 
and adjusting the Map Alpha slider changes the transparency of the map tiles.

You can also zoom in and out or pan across the map to explore regions of interest in more detail.
"""
    
background = pn.Accordion(
    ('Background', pn.pane.Markdown(intro_text, renderer_options={'breaks': False}))
)

dashboard = pn.template.BootstrapTemplate(
    title="US population density map",
    sidebar = [widgets],
)

dashboard.main.append(background)
dashboard.main.append(pop_map)

dashboard.servable();

We mark the dashboard as `.servable()` to allow us to serve it as a standalone app using a command like `panel serve --show census2020.ipynb`.

### Conclusion

In this notebook, we used **hvPlot** and **Datashader** to efficiently visualize the 2020 US Census data, focusing on both population density and racial demographics. These tools allowed us to explore geographic patterns and identify regions where specific racial groups predominantly reside, providing insights into the spatial distribution of the population.

### Future prospects

Looking ahead, there are several ways to extend this analysis. For example, we could perform deeper investigations into **segregation patterns** within specific metropolitan areas or **compare demographic shifts across time** using historical census data (for example, the [2010 data](../census_one/census_one.ipynb)).

Additionally, by integrating external datasets such as economic indicators, education levels, or voting patterns, we could look for correlations between racial demographics and socio-economic factors. Exploring **dynamic interactions** with these visualizations, such as linking census data with interactive dashboards, would also enable more detailed, interactive analysis of regional population characteristics. These tools offer flexibility and scalability, making them powerful assets for anyone looking to conduct advanced geospatial analysis on large datasets.