# Tracking Progress Towards Herd Immunity 
> How far do we have to go? 

In [25]:
#@title
#hide
import pandas as pd
import numpy as np
import altair as alt
import datetime as dt

In [23]:
#@title
#hide_input
inf_df = pd.read_csv(
    "https://raw.githubusercontent.com/youyanggu/covid19_projections/master/infection_estimates/past_estimates/2021-01-23_all_estimates_states.csv"
)

In [24]:
#@title
#hide_input
vacc_df = pd.read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/us_state_vaccinations.csv")

In [6]:
#@title
#hide_input
us_state_abbrev = {
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
}

In [7]:
#@title
#hide_input
vacc_df["location"] = vacc_df.location.replace("New York State", "New York")
vacc_df['state'] =  vacc_df['location'].map(us_state_abbrev)

In [8]:
#@title
#hide_input
df = inf_df.merge(vacc_df, how="left", on = ["date", "state"])

df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
df["pct_vaccinated"] = df["people_vaccinated_per_hundred"]/100
df["pct_fully_vaccinated"] = df["people_fully_vaccinated_per_hundred"]/100

In [9]:
#@title
#hide_input
# Gu's infection estimates lag data by 14 days 
# Going to estimate current infections by using his projections to identify the ratio of identified cases
state_pct_id = inf_df[~inf_df['current_infected_mean'].isna()] # get data where Gu estimates are not null 
state_pct_id = state_pct_id[state_pct_id.groupby("state")['date'].transform("max") == state_pct_id['date']].copy() # get latest date data; copy so the column assignments below do not write to a slice
state_pct_id["pct_new_identified"] = state_pct_id["daily_positive_7day_ma"]/state_pct_id["new_infected_mean"] # detection rate: ratio of positive tests to estimated new infections
state_pct_id["population"] = state_pct_id["total_infected_mean"]/state_pct_id["perc_total_infected_mean"]
state_pct_id["latest_total_infected"] = state_pct_id["total_infected_mean"]
state_pct_id = state_pct_id[["state", "pct_new_identified", "population", "latest_total_infected"]]

df = df.merge(state_pct_id, how = "left", on="state")

In [10]:
#@title
#hide_input
df["new_infected_estimate"] = np.where(df["new_infected_mean"].isna(), 
                                       df["daily_positive_7day_ma"]/df["pct_new_identified"], 
                                       df["new_infected_mean"])

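# Sort so the running total accumulates in date order within each state
df = df.sort_values(["state", "date"]).reset_index(drop=True)
# Missing daily estimates count as 0; the result keeps df's index, so the assignment aligns row by row
df["new_infected_est_cummulative"] = df["new_infected_estimate"].fillna(0).groupby(df["state"]).cumsum()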

df["total_infected_estimate"] = np.where(df["total_infected_mean"].isna(), 
                                         df["new_infected_est_cummulative"],
                                         df["total_infected_mean"])

df["pct_total_infected_estimate"] = df["total_infected_estimate"]/df["population"]

# Vaccine data begins on Jan 12 and is spotty, going to do linear interpolation to fill in gaps
df["pct_vaccinated"] = np.where(df["date"] == pd.to_datetime("2020-12-14"), 
                                1/df["population"], # assuming all states began vaccination with a single individual on 12/14
                                df["pct_vaccinated"])

# transform (rather than apply) returns a Series with df's own index, so the assignment aligns
df["pct_vaccinated"] = df.groupby("state")["pct_vaccinated"].transform(lambda group: group.interpolate())

In [11]:
#@title
#hide_input
melt_df = df[["date", "state", "population", "pct_total_infected_estimate", "pct_vaccinated"]]. \
            melt(id_vars=['date', 'state', "population"], var_name='immunity_source', value_name='pct_of_pop')

In [12]:
#@title
#hide_input
lab_dict = {"pct_total_infected_estimate":"Infected", "pct_vaccinated":"Vaccinated"}

melt_df["immunity_source"] = melt_df['immunity_source'].map(lab_dict)

In [13]:
#@title
#hide_input
#Make a US row of data 
us_pop = state_pct_id.population.sum()

melt_df["est_people"] = melt_df["population"] * melt_df["pct_of_pop"]

# Sum estimated people across states for each date and immunity source
us_df = melt_df.groupby(["date", "immunity_source"], as_index=False)["est_people"].sum()
us_df["population"] = us_pop
us_df["pct_of_pop"] = us_df["est_people"]/us_df["population"]
us_df["state"] = "US"

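# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
melt_df = pd.concat([melt_df, us_df], ignore_index=True)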

In [14]:
#@title
#hide_input
state_array = melt_df.state.unique()
melt_df["herd_immunity_estimate"] = .7 

As we move into what is (hopefully) the endgame of the COVID-19 pandemic, the most important question becomes: when will we be safe from the virus? 

Our strategy to control the virus is to reach herd immunity, which occurs when enough people are immune that any individual who contracts COVID-19 is unlikely to spread it to another person. At that point, while there may be flare-ups in certain areas, the nationwide pandemic will be suppressed.

Both prior infection with COVID-19 and vaccination provide immunity from future infection, so by tracking both the cumulative number of people who have been infected and the cumulative number who have been vaccinated, we can estimate the overall level of population immunity. The [Mayo Clinic](https://www.mayoclinic.org/diseases-conditions/coronavirus/in-depth/herd-immunity-and-coronavirus/art-20486808) estimates that herd immunity will be reached when 70% of the population is immune from either source.
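
As a minimal sketch of that arithmetic, the snippet below adds the two immunity shares and measures the remaining gap to the 70% threshold. The function name and the input figures are hypothetical illustrations, not values taken from the dashboard's data, and the sketch assumes the infected and vaccinated groups do not overlap.

```python
HERD_IMMUNITY_THRESHOLD = 0.70  # Mayo Clinic estimate cited above


def immunity_gap(pct_infected: float, pct_vaccinated: float,
                 threshold: float = HERD_IMMUNITY_THRESHOLD) -> float:
    """Return the share of the population still needed to reach the threshold.

    Assumes the infected and vaccinated groups do not overlap, so their
    shares can simply be added.
    """
    pct_immune = pct_infected + pct_vaccinated
    return max(threshold - pct_immune, 0.0)


# Hypothetical inputs: 25% previously infected, 5% vaccinated.
gap = immunity_gap(0.25, 0.05)
print(f"Remaining gap to herd immunity: {gap:.0%}")
```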

Below is an estimate of the path towards herd immunity for the United States and for each individual state.

In [21]:
#@title
#hide_input
alt.data_transformers.disable_max_rows()

dropdown = alt.binding_select(options=state_array)
select = alt.selection_single(fields=['state'], bind=dropdown, name = "Select", init={'state': "US"})

alt.Chart(melt_df, title="Path to Immunity").mark_area().encode(
    x=alt.X(
        "date",
        title=""
    ),
    y=alt.Y(
        "pct_of_pop",
        title="% Population Immune (Estimated)",
        axis=alt.Axis(format=".0%"),
    ),
    color=alt.Color(
        "immunity_source",
        legend=alt.Legend(title="Immunity Source"),
    ),
    tooltip=['date', 'pct_of_pop', 'immunity_source', 'est_people'],
).add_selection(
    select
).transform_filter(
    select
).configure_title(anchor = "start")

## Data Sources
For vaccinations, this dashboard relies on data from the CDC via [Our World in Data](https://github.com/owid/covid-19-data). The data only begin on January 12th, though US vaccinations began on [December 14th](https://www.bbc.com/news/world-us-canada-55305720). To fill in the missing data, I assume each state began vaccinations on December 14th with a single individual and linearly interpolate until January 12th. There are also some missing dates in the data, which I have linearly interpolated. The most recent data is taken directly from the source and is not interpolated.
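
The gap-filling step can be sketched on toy data as follows. The state code `XX`, the values, and the `pct_vaccinated_filled` column are hypothetical; the sketch also assumes one row per day, since pandas' default linear interpolation treats rows as equally spaced.

```python
import numpy as np
import pandas as pd

# Toy data for one hypothetical state: a seeded starting value,
# three missing days, then the first reported value.
toy = pd.DataFrame({
    "state": ["XX"] * 5,
    "date": pd.date_range("2020-12-14", periods=5, freq="D"),
    "pct_vaccinated": [0.0, np.nan, np.nan, np.nan, 0.04],
})

# Interpolate within each state, in date order. transform keeps the
# original index, so the result lines up with the rows of `toy`.
toy["pct_vaccinated_filled"] = (
    toy.sort_values("date")
       .groupby("state")["pct_vaccinated"]
       .transform(lambda s: s.interpolate())
)
print(toy[["date", "pct_vaccinated", "pct_vaccinated_filled"]])
```

Rows that already hold a reported value are left unchanged; only the missing days between two known points are filled.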

For infection estimates, I'm relying on the excellent [covid19-projections.com](https://covid19-projections.com) by Youyang Gu. Gu's projections are based *only* on data on COVID-19 deaths (not hospitalizations or tests). Gu's work is [used by the CDC](https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html).

One limitation of Gu's approach is that the projections lag the current date by 14 days. To estimate infections over the last 2 weeks, I take Gu's latest estimate of the case detection rate (positive tests / new infections) for each state and apply it to the 7-day moving average of positive tests. This approach is somewhat vulnerable to the ebb and flow of testing, but two things mitigate that: the 7-day moving average smooths out daily swings, and testing is more stable than it was earlier in the pandemic.
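
Here is a rough sketch of that extrapolation for a single state. The column names follow the ones used in the notebook, but the numbers are made up for illustration and the model is assumed to stop after the second day.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=4, freq="D"),
    "daily_positive_7day_ma": [1000.0, 1100.0, 1200.0, 900.0],
    # Hypothetical model output that is missing for the last two days.
    "new_infected_mean": [4000.0, 4400.0, np.nan, np.nan],
})

# Detection rate on the last date the model covers:
# positive tests divided by estimated new infections.
last_modelled = toy.dropna(subset=["new_infected_mean"]).iloc[-1]
detection_rate = (
    last_modelled["daily_positive_7day_ma"] / last_modelled["new_infected_mean"]
)

# Use the model where it exists; otherwise scale positive tests
# up by the detection rate.
toy["new_infected_estimate"] = toy["new_infected_mean"].fillna(
    toy["daily_positive_7day_ma"] / detection_rate
)
print(toy)
```

With a detection rate of 25%, each positive test in the unmodelled days stands in for four estimated infections.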

## Assumptions 
Beyond assuming that Gu's infection numbers are accurate, and that we can predict the last 14 days of infections by using his implied detection rates, this dashboard also assumes that immunity from infection and immunity from vaccinations are both 100% effective at preventing future COVID-19 infections.  

While COVID-19 reinfections have been observed, I think the [balance of the evidence](https://thezvi.wordpress.com/2021/01/20/covid-the-question-of-immunity-from-infection/) finds that prior infection provides nearly complete protection from subsequent symptomatic infection. 

The vaccines currently authorized for use in the US are quite good but not perfect, with efficacy estimates of 94-95%. There is some question as to whether they prevent one from spreading the virus to others, but [preliminary evidence](https://www.ynetnews.com/health_science/article/H1jaK7mkd) suggests that they do.

In addition to 100% efficacy, the dashboard also assumes vaccine-caused immunity kicks in immediately. The currently available vaccines are both two-dose vaccines, and while the first dose provides a [significant measure of protection](https://www.bmj.com/content/372/bmj.n18), it takes time (approximately 14 days) to do so.
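
To see what relaxing these two assumptions might look like, the sketch below delays a vaccination series by 14 days and scales it by an efficacy figure. This is not what the dashboard does; the series values and the constant names are hypothetical, and 95% is simply the upper end of the efficacy range mentioned above.

```python
import pandas as pd

EFFICACY = 0.95  # assumed: upper end of the efficacy range quoted above
LAG_DAYS = 14    # assumed: approximate delay before protection

# Hypothetical cumulative share of the population vaccinated, weekly.
vaccinated = pd.Series(
    [0.00, 0.01, 0.02, 0.03],
    index=pd.date_range("2021-01-01", periods=4, freq="7D"),
)

# Move each observation forward by the lag, then scale by efficacy.
protected = vaccinated.copy()
protected.index = protected.index + pd.Timedelta(days=LAG_DAYS)
protected = protected * EFFICACY
print(protected)
```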

The dashboard also assumes that no one who was previously infected receives a vaccine. While people who were previously infected may choose to forgo a vaccine at this time, there is not much of an effort to exclude the previously infected from receiving doses, and in fact the CDC is currently [actively recommending](https://www.cdc.gov/coronavirus/2019-ncov/vaccines/facts.html#:~:text=If%20I%20have%20already%20had,already%20had%20COVID%2D19%20infection.) that previous infection not preclude one from getting vaccinated. This is probably the most impactful of the assumptions, given the large percentage of people who have been infected. 
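
To get a feel for how much this assumption matters, here is a rough sketch comparing the dashboard's simple sum with an alternative in which vaccination is independent of prior infection. The independence assumption, the function names, and the inputs are all hypothetical illustrations; the dashboard itself uses the simple sum.

```python
def immune_share_no_overlap(pct_infected: float, pct_vaccinated: float) -> float:
    """Dashboard-style estimate: nobody is both infected and vaccinated."""
    return min(pct_infected + pct_vaccinated, 1.0)


def immune_share_independent(pct_infected: float, pct_vaccinated: float) -> float:
    """Alternative: vaccination is independent of prior infection.

    P(infected or vaccinated) = 1 - P(neither).
    """
    return 1.0 - (1.0 - pct_infected) * (1.0 - pct_vaccinated)


# Hypothetical inputs: 25% previously infected, 20% vaccinated.
simple = immune_share_no_overlap(0.25, 0.20)
independent = immune_share_independent(0.25, 0.20)
print(f"No overlap: {simple:.0%}, independent: {independent:.0%}")
```

The difference between the two estimates grows as either share grows, which is why the overlap assumption matters more as vaccination ramps up.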