# Visualizing COVID-19 hospitalizations in the US with *choromap*

**Javier Orman**  
GitHub:   
LinkedIn: https://www.linkedin.com/in/javierorman/

Latest update: 9/11/2020

## 1. Introduction
In this notebook I showcase the use of the module **choromap**, which produces **animated choropleth maps**, to visualize the spread of COVID-19 in the United States. Data will be retrieved from the API of *The COVID Tracking Project at The Atlantic* and merged with US Census population data to calculate **hospitalization rates** for each state.

### Import dependencies

Data will be imported and organized with *pandas*. *Geopandas* will be used to work with geospatial data and visualizations will be made with *matplotlib*.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)

import geopandas as gpd

import requests

import os
import sys

from matplotlib import colors
from matplotlib import pyplot as plt

Set *module_path* to the directory containing the module *choromap* (here, the parent of the working directory) and add it to *sys.path*.

In [2]:
from pathlib import Path

module_path = str(Path.cwd().parent)

if module_path not in sys.path:
    sys.path.append(module_path)

Next, import classes *ChoroMapBuilder* and *DataFramePrepper* from module *choromap*. The module is [available on GitHub](https://github.com/javierorman/choromap).

In [4]:
from choromap import ChoroMapBuilder, DataFramePrepper

## 2. COVID-19 data for each State

The data used here is available from the API of *The COVID Tracking Project at The Atlantic*. More information is available [here](https://covidtracking.com/data/api).

In [None]:
us_covid_df = pd.read_csv('https://covidtracking.com/api/v1/states/daily.csv')
us_covid_df

Here are the variables available in the data:

In [None]:
us_covid_df.columns

Next we change the format of the dates from *yyyymmdd* to *yyyy-mm-dd*.

In [None]:
def to_iso(date):
    """
    Convert a date (int or str) from yyyymmdd to yyyy-mm-dd,
    e.g. 20200128 -> 2020-01-28
    """
    x = str(date)
    return x[0:4] + '-' + x[4:6] + '-' + x[6:8]

In [None]:
us_covid_df['date'] = us_covid_df['date'].apply(to_iso)
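
As an alternative to slicing strings by hand, the same conversion can be sketched with *pandas* date parsing. This is only an illustration under stated assumptions: the `sample` DataFrame and the `date_iso` column are hypothetical names, and the values are made up to mimic the integer *yyyymmdd* format.

```python
import pandas as pd

# Hypothetical sample mimicking integer yyyymmdd dates
sample = pd.DataFrame({'date': [20200128, 20200301, 20200911]})

# Parse with an explicit format, then render as ISO yyyy-mm-dd strings
parsed = pd.to_datetime(sample['date'].astype(str), format='%Y%m%d')
sample['date_iso'] = parsed.dt.strftime('%Y-%m-%d')

print(sample['date_iso'].tolist())
```

Parsing with an explicit format has the side benefit of raising an error on malformed dates, whereas plain slicing would pass them through silently.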

In [None]:
# First 5 rows
us_covid_df.head(5)

## 3. Population data

Because we are interested in the *rate of hospitalizations* in each state, we need to import population data. Here we use a dataset from the US Census, [available here](https://www2.census.gov/programs-surveys/popest/tables/2010-2019/state/totals/nst-est2019-01.xlsx).

In [None]:
census_df = pd.read_excel('datasets/nst-est2019-01.xlsx')

In [None]:
# Eliminate unnecessary rows and columns
census_df = census_df.iloc[8:59, [0, -1]].reset_index(drop=True)
census_df.columns = ['state_full', 'population']

# First 5 rows
census_df.head()

In [None]:
# Remove the dots from the state names.
# regex=False treats '.' as a literal dot rather than a regex wildcard.
census_df['state_full'] = census_df['state_full'].str.replace('.', '', regex=False)
census_df.head()
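
The dot removal above depends on how *pandas* interprets the pattern, so here is a small sketch of two explicit ways to do it. It assumes the state names carry a leading dot, as the replacement above suggests; the `names` Series is a made-up sample, not the Census data.

```python
import pandas as pd

# Hypothetical sample of state names with leading dots
names = pd.Series(['.Alabama', '.Alaska', 'District of Columbia'])

# Option 1: literal replacement of every dot
literal = names.str.replace('.', '', regex=False)

# Option 2: strip dots only from the start of each name
leading_only = names.str.lstrip('.')

print(literal.tolist())
print(leading_only.tolist())
```

Both options give the same result here. `str.lstrip` is the narrower choice if a name could legitimately contain a dot elsewhere.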

Because the states in *us_covid_df* are represented by 2-letter codes, we will use [this dataset](https://worldpopulationreview.com/states/state-abbreviations) to properly format *census_df*.

In [None]:
codes_df = pd.read_csv('datasets/codes_data.csv')

In [None]:
codes_df.head()

In [None]:
# Merge census_df and codes_df
pop_df = codes_df.merge(census_df, left_on='State', right_on='state_full', how='inner')

# Eliminate unnecessary columns
pop_df = pop_df.iloc[:, [2, 4]]

# Rename columns
pop_df.columns = ['state', 'population']

# Show the first 5 rows
pop_df.head()

In [None]:
# Merge pop_df into us_covid_df
us_covid_df = us_covid_df.merge(pop_df, on='state', how='left')
us_covid_df.head()
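
Because this is a left merge, any location in the COVID data without a match in the population table ends up with a missing population. The sketch below shows one way to surface such rows using the `indicator` and `validate` options of `merge`. The frames `covid` and `pop` and all their values are made up for illustration.

```python
import pandas as pd

# Made-up stand-ins for us_covid_df and pop_df
covid = pd.DataFrame({
    'state': ['NY', 'CA', 'PR'],
    'hospitalizedCurrently': [500, 3000, 400],
})
pop = pd.DataFrame({
    'state': ['NY', 'CA'],
    'population': [2_000_000, 4_000_000],
})

# indicator=True adds a '_merge' column; validate checks that each
# state appears at most once in the population table
checked = covid.merge(pop, on='state', how='left',
                      indicator=True, validate='many_to_one')

# Locations that found no population match
unmatched = checked.loc[checked['_merge'] == 'left_only', 'state'].unique().tolist()
print(unmatched)
```

Rows listed in `unmatched` would produce missing hospitalization rates later on, so it can be worth inspecting them before computing the rate.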

In [None]:
# Create column 'hosp_curr_100000' with the rate of hospitalizations per 100000 residents in each State.
us_covid_df['hosp_curr_100000'] = (us_covid_df['hospitalizedCurrently'] / us_covid_df['population']) * 100000
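
As a quick worked example of the rate calculation, the sketch below applies the same formula to two made-up rows (the state codes and numbers are hypothetical). It also shows that a missing hospitalization count propagates as a missing rate rather than as zero.

```python
import pandas as pd

# Made-up rows: one with a count, one with a missing count
df = pd.DataFrame({
    'state': ['AA', 'BB'],
    'hospitalizedCurrently': [250, None],
    'population': [1_000_000, 2_000_000],
})

# Same formula as above: hospitalizations per 100,000 residents
df['hosp_curr_100000'] = (df['hospitalizedCurrently'] / df['population']) * 100000

print(df)
```

Here 250 hospitalizations in a population of 1,000,000 correspond to roughly 25 per 100,000, while the second row stays missing.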

## 4. Geographic data

The geospatial shapefiles for drawing the map of the United States are available [here](https://www.arcgis.com/home/item.html?id=f7f805eb65eb4ab787a0a3e1116ca7e5#). 

In [None]:
filepath = 'shapefiles/states_21basic/states.shp'
us_geom_df = gpd.read_file(filepath)

Take out Alaska and Hawaii to keep a practical view of the map.

In [None]:
us_geom_df = us_geom_df[~us_geom_df['STATE_ABBR'].isin(['AK', 'HI'])]
us_geom_df.plot()

## 5. Map

The module *choromap* includes two classes: ChoroMapBuilder, which has the methods necessary to make the maps, and DataFramePrepper, which prepares the DataFrames to pass to ChoroMapBuilder.

DataFramePrepper takes two parameters: *info_df* and *geom_df*. 
1. *us_covid_df* is our informational DataFrame with dates, locations and values (rate of hospitalizations in this case) and is passed as *info_df*. 
2. *us_geom_df* is our geometric DataFrame with locations and the vector information to draw the shapes of the states. It's passed as *geom_df*.

In [None]:
prepper = DataFramePrepper(info_df=us_covid_df, geom_df=us_geom_df)

In [None]:
# Prepare the informational DataFrame by selecting the columns to focus on. 
# roll_avg being True means that daily values of hospitalizations will be smoothed out through a 7-day window average.
prepper.prep_info_df(category='hosp_curr_100000', col_dates='date', col_location='state', roll_avg=True)

# Prepare geometric DataFrame
prepper.prep_geom_df(location_col='STATE_ABBR', geometry_col='geometry')

# Merge info_df and geom_df
merged_df = prepper.merge_info_geom()

# Show top 5 rows of merged_df
merged_df.head()
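
To give an idea of what a 7-day window average does to the data, here is a minimal sketch using plain *pandas*. It is not the code inside *choromap*, whose implementation may differ; the DataFrame, the `rolling_7d` column and the choice of `min_periods=1` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per state per day
dates = pd.date_range('2020-03-01', periods=10, freq='D').strftime('%Y-%m-%d')
df = pd.DataFrame({
    'date': list(dates) * 2,
    'state': ['AA'] * 10 + ['BB'] * 10,
    'hosp_curr_100000': list(range(10)) + [10.0] * 10,
})

# Sort so that each state's rows are in chronological order
df = df.sort_values(['state', 'date'])

# Average each value with up to six preceding days, separately per state
df['rolling_7d'] = (
    df.groupby('state')['hosp_curr_100000']
      .transform(lambda s: s.rolling(window=7, min_periods=1).mean())
)

print(df[df['state'] == 'AA'])
```

Grouping by state before rolling keeps one state's values from leaking into another's window. With `min_periods=1`, the first six days of each state are averaged over fewer than seven values instead of being left empty.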

In [None]:
builder = ChoroMapBuilder(merged_df=merged_df)

Next we build the maps using the ChoroMapBuilder method *make_map*. Here are the possible arguments that we can pass to it:

In [None]:
help(ChoroMapBuilder.make_map)

In [None]:
builder.make_map(title='Rate of current COVID-19 hospitalizations in each state',
                 subtitle='Source: The COVID Tracking Project at The Atlantic',
                 unit='COVID-19 hospitalizations per 100,000 residents',
                 save_name='hosp_curr_us_100000',
                 count='all', begin_date='2020-03-01',
                 color='Reds', fps=6)