<font face = "Verdana" size ="5">Coronaviruses are a family of viruses named after their spiky crown. The novel coronavirus, also known as SARS-CoV-2, is a contagious respiratory virus that was first reported in Wuhan, China. On 2/11/2020, the World Health Organization designated the name COVID-19 for the disease caused by the novel coronavirus. This notebook aims to explore COVID-19 through data analysis and projections. 
    
   <br> This notebook is forked from Xingyu Bian's <a href='https://www.kaggle.com/therealcyberlord/coronavirus-covid-19-visualization-prediction'>notebook</a>
 
   <br>Data is provided by the <a href='https://github.com/nytimes/covid-19-data'>New York Times</a>, <a href='https://covidtracking.com/'>COVID Tracking Project</a>, <a href="https://github.com/CSSEGISandData/COVID-19">Johns Hopkins University</a>, and the <a href="https://github.com/govex/COVID-19/">CDC</a>
   <br>Learn more from the <a href='https://www.who.int/emergencies/diseases/novel-coronavirus-2019'>WHO</a>
   <br>Learn more from the <a href='https://www.cdc.gov/coronavirus/2019-ncov'>CDC</a>
   
   <font face = "Verdana" size ="4">
    <br> Last update: 9/19/21 20:45 EST
    <br><i> New Updates: Update for 9/18 data. NOTE: County Metrics plot is having runtime issues and is currently unavailable </i>
   </font>

   <font face = "Verdana" size ="1">
    <center><img src='https://www.statnews.com/wp-content/uploads/2020/02/Coronavirus-CDC-645x645.jpg'>
     Source: https://www.statnews.com/wp-content/uploads/2020/02/Coronavirus-CDC-645x645.jpg </center> 
    </font>

<br>
<font face = "Verdana" size ="6"> Sections </font>
* <a href='#features'>List of Features</a>
* <a href='#load_us_data'>Load latest Data</a>
* <a href='#world_metrics'>World Metrics</a>    
* <a href='#us_summary'>US Summary</a>
* <a href='#us_features'>Visualizing US Data</a>
* <a href='#build_train_ML'>Building and Training ML models for data extrapolation</a>

<font size="7"><b>Features to be examined</b></font>
 <a id='features'></a>

* (1-3) Stability indices for cases, hospitalizations and deaths
* (4-6) Doubling times for cases, hospitalizations and deaths
* (7-10) % increase in cases, hospitalizations, deaths and recovered (if available) over the last week
* (11-14) % positive cases, Hospitalizations/Capacity, ICU beds/Capacity, Intubations/Ventilator Capacity
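
As an illustration of features 7-10, the sketch below computes a week-over-week percent increase from a cumulative series. The function name and the exact definition (latest value compared with the value 7 days earlier) are assumptions made for this example, not necessarily the formula used later in the notebook.

```python
def percent_increase_last_week(cumulative):
    """Percent change between the latest value and the value 7 days earlier.

    Hypothetical helper: assumes one cumulative value per day.
    """
    if len(cumulative) < 8:
        raise ValueError("need at least 8 daily values")
    week_ago = cumulative[-8]
    latest = cumulative[-1]
    if week_ago == 0:
        # Avoid dividing by zero when nothing had been reported a week ago
        return float("inf") if latest > 0 else 0.0
    return (latest - week_ago) / week_ago * 100.0


# Made-up cumulative case counts for 8 consecutive days
cases = [100, 110, 120, 130, 140, 150, 160, 175]
print(percent_increase_last_week(cases))  # 75.0
```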

<font size="5">Import Packages</font>

In [None]:
import numpy as np 
import matplotlib.pyplot as plt

import pandas as pd 
import random
import math
import time

from sklearn.preprocessing import PolynomialFeatures, MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.linear_model import LinearRegression

from scipy.signal import dlsim, TransferFunction, argrelextrema, find_peaks
import tensorflow

import datetime
from datetime import date
import operator 

import collections

# Progress bars for loops
from tqdm import tqdm

# Libraries for creating interactive plots
import plotly.tools as tls
import plotly.graph_objs as go
import plotly
import plotly.figure_factory as ff
import plotly.express as px
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode, plot, iplot, download_plotlyjs
init_notebook_mode(connected=True)
plotly.offline.init_notebook_mode(connected=True)

from urllib.request import urlopen
import json

import sys
import warnings

if not sys.warnoptions:
    warnings.simplefilter("ignore")

# !/opt/conda/bin/python3.7 -m pip install --upgrade pip

!pip install us
import us

from datetime import timedelta

In [None]:
# Select proper dates
pastGMTMidnight = True

if (not pastGMTMidnight):
    today = date.today()
else:
    today = (date.today() - timedelta(days = 1))

#yesterday = (today - timedelta(days = 1)).strftime('%Y-%m-%d')
yesterday = today - timedelta(days = 1)
print("Today:", today)
print("Yesterday: ", yesterday)

<font size="6"><b>Loading Latest Data (along with normalization)</b></font>
<a id='load_us_data'></a>

<br>
<font face = "Verdana" size ="6"> Datasets </font>
* <a href='https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv'>US Data</a>
* <a href='https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv'>US State Data</a>
* <a href='https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv'>US County Data</a>    
* <a href='https://covidtracking.com/api/v1/states/daily.csv'>Daily Covid Tracking Data</a>
* <a href='https://docs.google.com/spreadsheets/d/e/2PACX-1vR_xmYt4ACPDZCDJcY12kCiMiH0ODyx3E1ZvgOHB8ae1tRcjXbs_yWBOA4j4uoCEADVfC1PS2jYO68B/pub?gid=43720681&single=true&output=csv'>US Racial Data</a>
* <a href='https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'>World Cases Data</a>
* <a href='https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'>World Deaths Data</a>
* <a href='https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'>World Recoveries Data</a>

In [None]:
# Load latest GitHub data for US
usData = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv')
usStateData = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv')
usCountyData = pd.read_csv('https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv')

# Load latest data from COVID Tracking Project (has hospitalization data)
covidTrackingDaily = pd.read_csv('https://covidtracking.com/api/v1/states/daily.csv')
covidTrackingDaily = covidTrackingDaily.fillna(0)

# Load in latest positivity data (after some sites stopped updating on 3/07)
usPositivityData = pd.read_csv('https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/testing_data/time_series_covid19_US.csv')
usPositivityData = usPositivityData.fillna(0)

# Load racial data for COVID Tracking Project - NOTE: NOT updated since 3/07/21
racialDataUS = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vS8SzaERcKJOD_EzrtCDK1dX1zkoMochlA9iHoHg_RSw3V8bkpfk1mpw4pfL5RdtSOyx_oScsUtyXyk/pub?gid=43720681&single=true&output=csv')
racialDataUS = racialDataUS.fillna(0)

# Load in WORLD data from Johns Hopkins, along with FIPS/ISO lookup table for world geo plots
worldCases = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
worldDeaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
worldRecoveries = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
isoLookupTable = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv')

# Load in "Our World in Data" dataset
owidData = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv')
worldVaccinationData = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv')
worldVaccinationData = worldVaccinationData.fillna(0)

# Load in CDC Vaccination Data
usVaccinationData = pd.read_csv('https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/us_state_vaccinations.csv')
usVaccinationData = usVaccinationData.fillna(0)

# Don't display decimals
pd.set_option('display.precision', 0)   # Full option name; the short form 'precision' can be ambiguous in newer pandas versions

In [None]:
usData.loc[usData.shape[0] - 14:].style.background_gradient(cmap='RdYlGn_r')

In [None]:
usStateData[usStateData.date == yesterday.strftime('%Y-%m-%d')].style.background_gradient(cmap='RdYlGn_r', subset=['cases','deaths'])
# usStateData.loc[usStateData.shape[0] - 14:].style.background_gradient(cmap='RdYlGn_r', subset=['cases','deaths'])

In [None]:
#usCountyData[usCountyData.date == yesterday].style.background_gradient(cmap='RdYlGn_r', subset=['cases','deaths'])
usCountyData.loc[usCountyData.shape[0] - 14:].style.background_gradient(cmap='RdYlGn_r', subset=['cases','deaths'])

In [None]:
dates = covidTrackingDaily.date
dates_fixed = []
for i in range(dates.shape[0]):
    date_str = str(dates[i])
    fixed_date = datetime.datetime(year=int(date_str[0:4]), month=int(date_str[4:6]), day=int(date_str[6:]))
    dates_fixed.append(fixed_date.strftime("%Y-%m-%d"))
covidTrackingDaily.date = dates_fixed

covidTrackingDaily.loc[0:55].style.background_gradient(cmap='RdYlGn_r')
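
The date conversion above slices each `YYYYMMDD` value by character position. A minimal alternative sketch, using only the standard library, parses the same format with `datetime.strptime`; the helper name `yyyymmdd_to_iso` is hypothetical.

```python
from datetime import datetime


def yyyymmdd_to_iso(value):
    """Convert an integer or string such as 20210307 to '2021-03-07'."""
    return datetime.strptime(str(value), "%Y%m%d").strftime("%Y-%m-%d")


print(yyyymmdd_to_iso(20210307))  # 2021-03-07
```

Unlike positional slicing, `strptime` raises a `ValueError` on malformed input, which may help surface bad rows early.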

In [None]:
dates = racialDataUS.Date
dates_fixed = []
for i in range(dates.shape[0]):
    date_str = str(dates[i])
    fixed_date = datetime.datetime(year=int(date_str[0:4]), month=int(date_str[4:6]), day=int(date_str[6:]))
    dates_fixed.append(fixed_date.strftime("%Y-%m-%d"))
racialDataUS.Date = dates_fixed

racialDataUS.loc[0:55].style.background_gradient(cmap='RdYlGn_r')

<font size="4"><b>Only last 7 days of world data are shown (to save space)</b></font>

In [None]:
### Change date needed
# Get the last 7 days as strings
last7Days_str = []
# day = datetime.datetime(2020,9,15)              # Latest date available in data
day = today - timedelta(days = 1)                 # Named `day` so the imported `date` class is not shadowed

for i in range(7): 
    day -= datetime.timedelta(days=1)
    last7Days_str.append(day.strftime('%-m/%-d/%Y'))
    
numCols = worldRecoveries.shape[1]
cols = [0,1]
for i in range(7,0,-1):
    cols.append(numCols - i)

<font size="4"><b>World Cases</b></font>

In [None]:
# worldCases.iloc[0:14,cols].style.background_gradient(cmap='RdYlGn_r', axis=1)
worldCases[worldCases.columns[cols]].style.background_gradient(cmap='RdYlGn_r', axis=1)

<font size="4"><b>World Deaths</b></font>

In [None]:
# worldDeaths.iloc[0:14,cols].style.background_gradient(cmap='RdYlGn_r', axis=1)
worldDeaths[worldDeaths.columns[cols]].style.background_gradient(cmap='RdYlGn_r', axis=1)

<font size="4"><b>World Recoveries</b></font>

In [None]:
# worldRecoveries.iloc[0:14,cols].style.background_gradient(cmap='RdYlGn', axis=1)
worldRecoveries[worldRecoveries.columns[cols]].style.background_gradient(cmap='RdYlGn', axis=1)

<font size="4"><b>Data Normalization</b></font>

In [None]:
usCases = np.array(usData.cases)
usDeaths = np.array(usData.deaths)
dates =  np.array(usData.date)

min_max_scaler = MinMaxScaler()
usCases_norm = min_max_scaler.fit_transform(usCases.reshape(-1,1))
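
With its default settings, `MinMaxScaler` rescales a column to the range [0, 1] using (x - min) / (max - min). The dependency-free sketch below shows that arithmetic on a small made-up list; the function name is an assumption for illustration.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([0, 5, 10]))  # [0.0, 0.5, 1.0]
```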

<font size="7"><b>World Metrics</b></font>
<a id='world_metrics'></a>

In [None]:
# Clean OWID dataframe
# - Remove data before 1-22-2020
owidData = owidData[owidData.date >= '2020-01-22']

# - Fill missing data (NaN) with 0
owidData = owidData.fillna(0)

In [None]:
# Change state name to abbreviation format (e.g. Connecticut -> CT)
def fullStateNameToAbbrev(fullStateNames):
    abbrevs = []
    
    for i in range(fullStateNames.shape[0]):
        abbrevs.append(us.states.lookup(str(fullStateNames[i])).abbr)   # Direct call; eval is unnecessary here
    
    return abbrevs

In [None]:
# Create n-day average for an array
def nDayAverage(data, n):
    nDayAvgData = np.zeros(data.shape)
    dataLen = nDayAvgData.shape[0]
    
    for i in range(dataLen):
        idxs = np.arange(-n + 1 + i, i + 1)
        idxs = np.clip(idxs, a_min=0, a_max=np.inf)
        idxs = idxs.astype(int)
        
        nDayAvgData[i] = np.mean(data[idxs])
        
    return nDayAvgData
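
To make the windowing in `nDayAverage` concrete, here is a pure-Python sketch of the same idea: each output is the mean of the `n` indices ending at position `i`, with negative indices clipped to 0, so the first value is repeated in early windows. The name `trailing_average` is hypothetical.

```python
def trailing_average(data, n):
    """Mean over the n positions ending at each index, clipping indices below 0."""
    out = []
    for i in range(len(data)):
        # Window of n indices ending at i; negative indices are clipped to 0
        idxs = [max(j, 0) for j in range(i - n + 1, i + 1)]
        out.append(sum(data[j] for j in idxs) / n)
    return out


print(trailing_average([1, 2, 3, 4], 2))  # [1.0, 1.5, 2.5, 3.5]
```

Because of the clipping, the first `n - 1` outputs are weighted toward the first data point rather than averaged over a shorter window.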

In [None]:
def allDoublingTimes(data):
    allDoublingTimes = np.zeros(data.shape[0],)
    
    for i in range(-1,-allDoublingTimes.shape[0],-1):
        currentValue = data[i]
        halfOfCurrentValue = currentValue / 2.0
        
        dataMinusHalfOfCurrentCases = abs(data - halfOfCurrentValue)
        
        idxApproxHalfOfValue = np.where(dataMinusHalfOfCurrentCases == np.min(dataMinusHalfOfCurrentCases))[0][0]
        
        allDoublingTimes[i] = (i + data.shape[0]) - idxApproxHalfOfValue
        
    return allDoublingTimes
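
The doubling-time estimate above can be read as: for each day, find the first day whose value is closest to half of the current value, and report the gap in days. A small pure-Python sketch of that logic follows, with a hypothetical function name and made-up series.

```python
def doubling_times(data):
    """Days since the series was closest to half of each day's value."""
    n = len(data)
    result = [0] * n
    for i in range(1, n):  # index 0 is left at 0, as in the cell above
        half = data[i] / 2.0
        distances = [abs(v - half) for v in data]
        idx_half = distances.index(min(distances))  # first closest match
        result[i] = i - idx_half
    return result


print(doubling_times([1, 2, 4, 8, 16]))  # [0, 1, 1, 1, 1]
print(doubling_times([10, 20, 30, 40]))  # [0, 1, 2, 2]
```

A series that doubles every day yields a doubling time of 1 throughout, while linear growth yields doubling times that lengthen over time.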

In [None]:
# -1: State has already recovered
# 365: State is getting worse, no recovery in progress

def findTimeForRecovery(DTData):
    # Find global max (right before things went bad)
    if ((np.diff(DTData[-45:]) > -0.01).all()):  # This is required for good states where DT is continuously rising (at least for the last 45 days)
        maximas = DTData.shape[0] - 1
    else:
        maximas = find_peaks(DTData)[0]
    dataAtMaximas = DTData[maximas]
    globalMaxIdx = np.where(DTData == np.amax(dataAtMaximas))[0][0]
    globalMax = np.mean(DTData[globalMaxIdx])                        # To account for two of the same maxs found
    
    # Then find minima on the data that comes after the maxima
    clippedData = DTData[globalMaxIdx:]
    minimas = argrelextrema(clippedData, np.less_equal)[0]
    latestMinIdxClipped = np.where(clippedData == np.amin(clippedData[minimas]))[0][0]
    latestMinIdx = globalMaxIdx + latestMinIdxClipped
    latestMin = np.mean(clippedData[latestMinIdxClipped])

    # Find avg slope for increase
    avgIncPerDay = 0
    if (latestMinIdx != DTData.shape[0] and DTData[-1] - DTData[latestMinIdx] != 0):
        avgIncPerDay = (DTData[-1] - DTData[latestMinIdx]) / (DTData.shape[0] - latestMinIdx)
        incReqForRecovery = 50 - DTData[-1]                                              # Take today's DT into account (find time to reach 50 days, NOT PEAK)
        timeForRecovery = incReqForRecovery / avgIncPerDay                                         
    if (latestMinIdx == DTData.shape[0] or avgIncPerDay < 0.1):
        timeForRecovery = 365
    if (DTData[-1] - DTData[latestMinIdx] == 0):
        timeForRecovery = 365
    if (globalMax == latestMin):
        timeForRecovery = -1

    return timeForRecovery
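
The final step of `findTimeForRecovery` is a linear extrapolation: the average daily rise in doubling time since the latest minimum is used to estimate how long it takes to reach 50 days. The sketch below isolates only that step under simplified assumptions; the function name and arguments are hypothetical, and the "already recovered" (-1) case is omitted.

```python
def days_to_target(dt_at_min, dt_today, days_since_min,
                   target=50, min_slope=0.1, no_recovery=365):
    """Extrapolate days until the doubling time reaches `target`."""
    if days_since_min <= 0:
        return no_recovery
    slope = (dt_today - dt_at_min) / days_since_min  # average DT increase per day
    if slope < min_slope:
        # Rising too slowly (or falling): treated as no recovery in progress
        return no_recovery
    return (target - dt_today) / slope


print(days_to_target(20, 35, 30))  # 30.0
print(days_to_target(20, 21, 30))  # 365
```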

In [None]:
# Functions for pandas DataFrame cell conditioning of DTs
redHigh = 120
yellowLow = redHigh
yellowHigh = 135
greenLow = yellowHigh
def redColorscale(s):
    is_red = s < redHigh
    return ['background-color: red' if v else '' for v in is_red]

def yellowColorscale(s):
    is_yellow = [1 if x >= yellowLow and x <= yellowHigh else 0 for x in s]
    return ['background-color: yellow' if v else '' for v in is_yellow]

def greenColorscale(s):
    is_green = s > greenLow
    return ['background-color: green' if v else '' for v in is_green]


def fullColorscale(s):
    output = []
    for val in s:
        if (val < redHigh):
            output.append("red")
        elif (val >= yellowLow and val <= yellowHigh):
            output.append("goldenrod")
        else:
            output.append("green")

    return output
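
The thresholds above split doubling times into three bands: below 120 is red, 120 through 135 is yellow, and above 135 is green. A compact sketch of the same banding for a single value, with hypothetical constant and function names:

```python
RED_HIGH = 120      # Mirrors redHigh above
YELLOW_HIGH = 135   # Mirrors yellowHigh above


def dt_color(value):
    """Map a doubling time to the color band used by fullColorscale."""
    if value < RED_HIGH:
        return "red"
    if value <= YELLOW_HIGH:
        return "goldenrod"
    return "green"


print([dt_color(v) for v in (100, 120, 135, 136)])
# ['red', 'goldenrod', 'goldenrod', 'green']
```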

In [None]:
# This function detects if a list has zeros at the end (e.g. [1,9,2,5,0,0,0]) and fills the zeros with the last non-zero value from the end (e.g. [1,9,2,5,5,5,5])
def fillZeroTail(data):
    needsFilling = data[-1] == 0
    if (needsFilling):
        # Find idx of first non-zero value from the end and its value
        firstNonZeroIdxFromEnd = data.shape[0]
        firstNonZeroValFromEnd = 0
        for i in range(data.shape[0] - 1,0,-1):
            if (data[i] != 0):
                firstNonZeroIdxFromEnd = i
                firstNonZeroValFromEnd = data[i]
                break
                
        # Fill the trailing zeros with firstNonZeroValFromEnd
        for i in range(firstNonZeroIdxFromEnd + 1, data.shape[0]):
            data[i] = firstNonZeroValFromEnd
    
    return data
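
A quick illustration of the tail-filling behavior, written as a standalone pure-Python sketch. Unlike `fillZeroTail`, this hypothetical variant returns a copy rather than modifying its input, and it also considers index 0 when searching for the last non-zero value.

```python
def fill_zero_tail(values):
    """Replace trailing zeros with the last non-zero value."""
    filled = list(values)
    last = len(filled) - 1
    # Walk backwards to the last non-zero entry
    while last >= 0 and filled[last] == 0:
        last -= 1
    if last < 0:
        return filled  # All zeros: nothing to carry forward
    for i in range(last + 1, len(filled)):
        filled[i] = filled[last]
    return filled


print(fill_zero_tail([1, 9, 2, 5, 0, 0, 0]))  # [1, 9, 2, 5, 5, 5, 5]
```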

In [None]:
#### Need to change numDates
# Choropleth for world with animation
allCountries = np.unique(worldCases['Country/Region'])
numCountries = allCountries.shape[0]

d0 = datetime.datetime(2020, 1, 22)
d1 = datetime.datetime(today.year, today.month, today.day)
delta = d1 - d0
numDates = delta.days

numEntries = numCountries * numDates
blankTableSpace = np.zeros(numEntries,).tolist()

# Get all dates as strings
allDates = []
currentDate = datetime.datetime(2020,1,21)       # Day before the first date in the data (1/22/2020)
for i in range(numDates): 
    currentDate += datetime.timedelta(days=1)
    allDates.append(currentDate.strftime('%-m/%-d/%Y'))

# Duplicate country names, ISOs, and Pop for each day
allCountriesRep = np.zeros(numEntries,).astype(str)
allCountryISOsRep = np.zeros(numEntries,).astype(str)
allCountryPopRep = np.zeros(numEntries,)
for i in range(numCountries):
    idxs = np.arange(i*numDates,(i+1)*numDates)
    allCountriesRep[idxs] = allCountries[i]  
    allCountryISOsRep[idxs] = np.array(isoLookupTable.loc[isoLookupTable['Country_Region'] == allCountries[i]].iso3)[0]
    allCountryPopRep[idxs] = np.array(isoLookupTable.loc[isoLookupTable['Country_Region'] == allCountries[i]].Population)[0]

# Create data table for choropleth plot
dummyData = np.zeros(numEntries,).tolist()
d = {'Country': allCountriesRep.tolist(), 'Date': np.ones(numEntries,).astype(str).tolist(), 'ISO': allCountryISOsRep, 'Population': allCountryPopRep, 
     'Cases': dummyData, 'NewCasesPer100k': dummyData, 'DT_Cases': dummyData, 
     'Hospitalizations': dummyData, 'NewHospitalizationsPer100k': dummyData, 'DT_Hos': dummyData, 
     'Deaths': dummyData,'NewDeathsPer100k': dummyData, 'DT_Deaths': dummyData,
     'Recoveries': dummyData, 'NewRecoveriesPer100k': dummyData, 'MortalityRate(%)': dummyData}
worldGeoPlotData = pd.DataFrame(data=d)
    
# Find/compute world data for each day
for i in tqdm(range(numCountries)):       
    # Find rows for current country
    countryIdxs = np.array(worldGeoPlotData.loc[worldGeoPlotData.Country == allCountries[i]].index)
    countryPopulation = np.array(worldGeoPlotData.Population)[countryIdxs[0]]
    
    # Set dates properly
    worldGeoPlotData.at[countryIdxs, 'Date'] = allDates
    
    # Get cases for country, but first concatenate it if necessary
    countryDFCases = worldCases.loc[worldCases['Country/Region'] == allCountries[i]]
    countryCasesIdxs = np.array(countryDFCases.index)
    concatCountryDataCases = countryDFCases.groupby('Country/Region').sum().reset_index()
    
    countryCases = np.array(concatCountryDataCases.iloc[0,3:]).astype(int)
    countryCasesPer100k = countryCases / (countryPopulation / 100000)
    allCountryDT_Cases = allDoublingTimes(countryCases)
    countryNewCasesPer100k = np.concatenate(([0],np.diff(countryCasesPer100k)), axis=0)
    worldGeoPlotData.at[countryIdxs, 'Cases'] = countryCases 
    worldGeoPlotData.at[countryIdxs, 'DT_Cases'] = allCountryDT_Cases
    worldGeoPlotData.at[countryIdxs, 'NewCasesPer100k'] = np.round(countryNewCasesPer100k, 3)
    
    # Get deaths for country, but first concatenate it if necessary
    countryDFDeaths = worldDeaths.loc[worldDeaths['Country/Region'] == allCountries[i]]
    countryDeathsIdxs = np.array(countryDFDeaths.index)
    concatCountryDataDeaths = countryDFDeaths.groupby('Country/Region').sum().reset_index()
    
    countryDeaths = np.array(concatCountryDataDeaths.iloc[0,3:]).astype(int)
    countryDeathsPer100k = countryDeaths / (countryPopulation / 100000)
    allCountryDT_Deaths = allDoublingTimes(countryDeaths)
    countryNewDeathsPer100k = np.concatenate(([0],np.diff(countryDeathsPer100k)), axis=0)
    worldGeoPlotData.at[countryIdxs, 'Deaths'] = countryDeaths
    worldGeoPlotData.at[countryIdxs, 'DT_Deaths'] = allCountryDT_Deaths
    worldGeoPlotData.at[countryIdxs, 'NewDeathsPer100k'] = np.round(countryNewDeathsPer100k, 3)
    
    # Get hospitalizations for country  (no concatenation needed)
    # Since not all countries have published hos data, check if country exists in OWID data first
    if (owidData.loc[owidData.location == allCountries[i]].shape[0] != 0 or allCountries[i] == 'US'):
        if (allCountries[i] != 'US'):      # US abbreviation used in Johns Hopkins data, but not in OWID data
            countryHos = owidData.loc[owidData.location == allCountries[i]]['hosp_patients'][:-1]
            firstDate = np.datetime64(np.array(owidData.loc[owidData.location == allCountries[i]]['date'])[0])
            needsEmptyData = firstDate != '2020-01-22'
        else:
            country = 'United States'
            countryHos = owidData.loc[owidData.location == country]['hosp_patients'][:-1]
            firstDate = np.datetime64(np.array(owidData.loc[owidData.location == country]['date'])[0])
            needsEmptyData = firstDate != '2020-01-22'

        # Append empty data (if country doesn't have data since 1-22-2020)
        if (needsEmptyData):
            numAdditionalRows = len(countryIdxs) - countryHos.shape[0]
            emptyData = np.zeros(numAdditionalRows,)
            countryHos = np.concatenate((emptyData,countryHos))
    else:
        countryHos = np.zeros(len(countryIdxs),)
    
    countryHos = fillZeroTail(countryHos)
    countryDT_Hos = allDoublingTimes(countryHos)
    countryHosPer100k = countryHos / (countryPopulation / 100000)
    countryNewHosPer100k = np.concatenate(([0],np.diff(countryHosPer100k)), axis=0)
    worldGeoPlotData.at[countryIdxs, 'Hospitalizations'] = countryHos
    worldGeoPlotData.at[countryIdxs, 'NewHospitalizationsPer100k'] = np.round(countryNewHosPer100k, 3)
    worldGeoPlotData.at[countryIdxs, 'DT_Hos'] = countryDT_Hos
    
    # Get recoveries for country, but first concatenate it if necessary
    countryDFRec = worldRecoveries.loc[worldRecoveries['Country/Region'] == allCountries[i]]
    countryRecIdxs = np.array(countryDFRec.index)
    concatCountryDataRec = countryDFRec.groupby('Country/Region').sum().reset_index()
    
    countryRec = np.array(concatCountryDataRec.iloc[0,3:]).astype(int)
    countryRecPer100k = countryRec / (countryPopulation / 100000)
    countryNewRecPer100k = np.concatenate(([0],np.diff(countryRecPer100k)), axis=0)
    worldGeoPlotData.at[countryIdxs, 'Recoveries'] = countryRec
    worldGeoPlotData.at[countryIdxs, 'NewRecoveriesPer100k'] = np.round(countryNewRecPer100k, 3)
    
    # Get mortality rate for country (deaths/cases)
    if (countryCases.all()):                                           # If every value is non-zero, dividing directly is safe
        countryMortalityRate = countryDeaths / countryCases * 100
    else:
        countryMortalityRate = countryDeaths / (countryCases + 0.1) * 100
    worldGeoPlotData.at[countryIdxs, 'MortalityRate(%)'] = np.round(countryMortalityRate, 3)

    
# Sort dates (and convert to date_time to ensure proper date sorting)
worldGeoPlotData.Date = pd.to_datetime(worldGeoPlotData.Date)
worldGeoPlotData = worldGeoPlotData.sort_values(by=['Date'], ascending=False)
worldGeoPlotData.Date = (worldGeoPlotData.Date).astype(str)

    
# Now create the choropleth data
fig = px.choropleth(worldGeoPlotData, locations="ISO",
                    color="DT_Cases",
                    hover_name="Country",
                    hover_data=['Cases','NewCasesPer100k','DT_Cases',
                                'Hospitalizations','NewHospitalizationsPer100k', 'DT_Hos',
                                'Deaths','NewDeathsPer100k','DT_Deaths',
                                'Recoveries','NewRecoveriesPer100k', 'MortalityRate(%)'],
                    animation_frame='Date',
                    color_continuous_scale="RdYlGn")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.update_layout(title='World Metrics')
fig.show()

In [None]:
### Change date needed
## First get world clustering data
# todayDate = '2020-09-15'
todayDate = yesterday.strftime("%Y-%m-%d")
allCountries = np.unique(worldGeoPlotData.Country)
numCountries = allCountries.shape[0]

worldDataToday = worldGeoPlotData.loc[worldGeoPlotData.Date == todayDate]
preExistingData = worldDataToday.sort_values(by=['DT_Cases','DT_Deaths'], ascending=True)

d = {'Country': preExistingData.Country, 'DT_Cases': preExistingData.DT_Cases, 
     'DT_Deaths': preExistingData.DT_Deaths, 'DT_Hos': preExistingData.DT_Hos,
     'SeverityIndex': np.zeros(numCountries,).tolist()}
worldClusteringData = pd.DataFrame(data=d)

# Calculate 7 day avg for new cases per 100k for all countries
for i in range(numCountries):
    countryIdx = worldClusteringData.loc[worldClusteringData.Country == allCountries[i]].index[0]
    countryData = worldGeoPlotData.loc[worldGeoPlotData.Country == allCountries[i]]
    countryNewCasesPer100k_7DayAvg = np.mean(np.array(countryData.NewCasesPer100k)[0:7])   # Most recent 7 days (data is sorted newest first)
    worldClusteringData.at[countryIdx, 'SeverityIndex'] = countryNewCasesPer100k_7DayAvg
    
# # Normalize the 7 day avg data
# min_max_scaler = MinMaxScaler()
# worldClusteringData.SeverityIndex = min_max_scaler.fit_transform(worldClusteringData.SeverityIndex.to_numpy().reshape(-1,1))
# worldClusteringData.SeverityIndex = np.round(worldClusteringData.SeverityIndex / np.amax(worldClusteringData.SeverityIndex), 3)
worldClusteringData.SeverityIndex = worldClusteringData.SeverityIndex.fillna(0) 
median = np.median(np.array(worldClusteringData.SeverityIndex))
worldClusteringData.SeverityIndex = np.clip(np.divide(worldClusteringData.SeverityIndex, median), a_min = 0, a_max = np.inf)
    

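# Round the values (round() returns a new Series, so assign the result), then display the table
worldClusteringData['DT_Cases'] = worldClusteringData['DT_Cases'].round(0)
worldClusteringData['DT_Deaths'] = worldClusteringData['DT_Deaths'].round(0)
worldClusteringData['DT_Hos'] = worldClusteringData['DT_Hos'].round(0)
worldClusteringData['SeverityIndex'] = worldClusteringData['SeverityIndex'].round(2)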

worldClusteringData.style.\
    apply(redColorscale, subset=['DT_Cases','DT_Deaths','DT_Hos']).\
    apply(yellowColorscale, subset=['DT_Cases','DT_Deaths','DT_Hos']).\
    apply(greenColorscale, subset=['DT_Cases','DT_Deaths','DT_Hos']).\
    format({
        'DT_Cases': '{:,.0f}'.format,
        'DT_Deaths': '{:,.0f}'.format,
        'SeverityIndex': '{:,.2f}'.format,
    }).\
    background_gradient(cmap='RdYlGn_r', subset='SeverityIndex')

<font size="5"><b>World Clustering</b></font>

In [None]:
# Now create the 3D scatter plot
colors = fullColorscale(worldClusteringData.DT_Cases.astype(float))                    # Find colors for all data points
worldClusteringData['DT_Cases'] = worldClusteringData['DT_Cases'].astype(str)          # Convert DT_Cases values to string
fig = px.scatter_3d(worldClusteringData, x='DT_Cases', y='DT_Deaths', z='SeverityIndex', color='DT_Cases', color_discrete_sequence=colors, hover_name='Country')
fig.update_layout(title='World Doubling Time Clustering by Country', width=800, showlegend=False)
fig.show()

<font size="5"><b>World Sunburst Plot</b></font>

<font size="4">Click on inner pie rings to expand sunburst for a certain continent</font>

In [None]:
# Grab key columns for today's data
owidKeyData = owidData[['continent','location','date','total_cases','total_deaths','new_cases_per_million']]
owidKeyDataToday = owidKeyData.loc[owidKeyData.date == todayDate].copy()   # Copy so the assignment below does not act on a slice
owidKeyDataToday.new_cases_per_million = np.round(owidKeyDataToday.new_cases_per_million, 3)

owidKeyDataToday = owidKeyDataToday[owidKeyDataToday.continent != 0]   # Drop rows with continent zero
owidKeyDataToday = owidKeyDataToday.fillna(0.1)

In [None]:
# Sunburst Plot for World Data
fig = px.sunburst(owidKeyDataToday, path=['continent','location'], values='total_cases',
                  #color='new_cases_per_million', 
                  hover_data=['total_cases','total_deaths','new_cases_per_million'],)
                  #color_continuous_scale='rdylgn_r')   
fig.show()

<font size="5"><b>World Vaccination Data</b></font>

In [None]:
# worldVaccinationData['total_vaccinations'] = worldVaccinationData['total_vaccinations'].astype(int)
# worldVaccinationData['date'] = pd.to_datetime(worldVaccinationData['date']).astype(str)
# worldVaccinationData = worldVaccinationData.sort_values(by=['date'], ascending=False)
# worldVaccinationData = worldVaccinationData.ffill(axis = 0)     # Fill in missing data w/ last observed data

# # Trim off last day so that some countries do not disappear in the plot (due to yesterday's data not being reported)
# worldVaccinationData = worldVaccinationData.loc[worldVaccinationData.date >= '2020-12-13']
# worldVaccinationData = worldVaccinationData.loc[worldVaccinationData.date <= todayDate]
worldVaccinationData = worldVaccinationData.ffill(axis = 0)     # Fill in missing data w/ last observed data

worldVaccinationData['date'] = pd.to_datetime(worldVaccinationData['date'])
worldVaccinationData.sort_values(by=['date'], ascending=False, inplace=True)
worldVaccinationData['date'] = worldVaccinationData['date'].astype(str)

# Create choropleth plot
fig = px.choropleth(worldVaccinationData, locations="iso_code",
                    color="people_fully_vaccinated_per_hundred",
                    hover_name="location",
                    animation_frame="date",
                    color_continuous_scale="RdYlGn")
fig.show()

<font size="5"><b>World Doubling Time Time Series Plot by Country</b></font>

In [None]:
# Helper functions to create the appropriate buttons
def makePlotButtons(options, nTracesPerOption=1):    
    buttons = []
    for i, opt in enumerate(options):
        visibleArr = np.full((nTracesPerOption*len(options),), 
                             False, dtype=bool)
        
        startIdx = nTracesPerOption*i
        for idx in range(nTracesPerOption):
            visibleArr[startIdx + idx] = True
            
        buttons.append(dict(label=str(opt),
                            method='restyle',
                            args=[{'visible': list(visibleArr)}]))    # 'Visible' arg determines which plots are shown 
                                                                      # depending on which dropdown is selected 
    return buttons    
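
Each dropdown button built by `makePlotButtons` carries a boolean mask with one entry per trace, and only the traces belonging to the selected option are set to visible. The sketch below reproduces just the mask construction in plain Python, without plotly; the function name is hypothetical.

```python
def visibility_masks(n_options, traces_per_option=1):
    """Build one visibility mask per option; each mask covers every trace."""
    masks = []
    for i in range(n_options):
        mask = [False] * (n_options * traces_per_option)
        start = i * traces_per_option
        for k in range(traces_per_option):
            mask[start + k] = True  # Show only this option's traces
        masks.append(mask)
    return masks


for mask in visibility_masks(2, 2):
    print(mask)
# [True, True, False, False]
# [False, False, True, True]
```

This relies on traces being appended in the same order as the options, with a fixed number of traces per option.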

In [None]:
def putUSInFront(countries):
    a = countries                         # Use the argument rather than the global allCountries
    b = np.delete(a, np.where(a == 'US'))
    return np.concatenate((['US'], b), axis=0)

In [None]:
allCountries = np.unique(worldGeoPlotData.Country)
allCountries = putUSInFront(allCountries)

allTraces = []
for country in tqdm(allCountries):
    countryDTData = worldGeoPlotData.loc[worldGeoPlotData.Country == country][['Date','DT_Cases', 'DT_Deaths', 'DT_Hos']]
    countryDTData = countryDTData[::-1]
    
    # Clip data between 0 and inf (due to small countries having drops in cumulative cases/deaths)
    countryDTData.DT_Cases = np.clip(countryDTData.DT_Cases, a_min=0, a_max=np.inf)
    countryDTData.DT_Deaths = np.clip(countryDTData.DT_Deaths, a_min=0, a_max=np.inf)
    
    countryDT_Cases = go.Scatter(x=countryDTData.Date, y=countryDTData.DT_Cases, name='DT_Cases')
    countryDT_Deaths = go.Scatter(x=countryDTData.Date, y=countryDTData.DT_Deaths, name='DT_Deaths')
    #countryDT_Hos = go.Scatter(x=countryDTData.Date, y=countryDTData.DT_Hos, name='DT_Hos')
    
    # Append traces to traces list
    allTraces.append(countryDT_Cases)
    allTraces.append(countryDT_Deaths)

fig = go.Figure(data=allTraces)
fig.update_layout(title='Doubling Times for Selected Country', 
                  xaxis_title='Date', yaxis_title='Doubling Time [Days]',
                      updatemenus=[
                        dict(
                            active=-1,
                            #buttondefaults=makeDTPlotButtons()[0],
                            buttons=makePlotButtons(allCountries, 2),
                            direction="down",
                            pad={"r": 10, "t": 10},
                            showactive=True,
                            x=1,
                            xanchor="left",
                            y=1.25,
                            yanchor="top"
                        )
                      ],
                  width=800, height=600)

fig.show()
#plt.plot(countryDTData_Rev.Date, countryDTData_Rev.DT_Cases)

<font size="7"><b>US Summary</b></font>
<a id='us_summary'></a>

<font size="4.5">DT is an abbreviation for Doubling Time</font>

<font size="4.5">Severity Index is the 7 Day Average of New Cases per 100k for all US States divided by the median of that data </font>
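
As a worked example of the Severity Index definition, the sketch below divides each state's 7-day average of new cases per 100k by the median across states and clips negative results to 0. The state labels and values are made up, and the function name is hypothetical.

```python
from statistics import median


def severity_indices(avg_new_cases_per_100k):
    """Divide each 7-day average by the median of all averages, clipping at 0."""
    med = median(avg_new_cases_per_100k.values())
    if med == 0:
        raise ValueError("median is zero; the index is undefined")
    return {state: max(value / med, 0.0)
            for state, value in avg_new_cases_per_100k.items()}


# Made-up 7-day averages of new cases per 100k
print(severity_indices({"A": 10.0, "B": 20.0, "C": 40.0}))
# {'A': 0.5, 'B': 1.0, 'C': 2.0}
```

A value of 1.0 marks the median state, so values above 1 indicate a state faring worse than the median.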

### <font size="5"><b>US Summary Table (for 9/18/21)</b></font>

In [None]:
allStateNames = np.unique(usStateData.state)

#################################################################################################
# DT_Cases/DT_Deaths
numStates = allStateNames.shape[0]
stateCaseData = np.array(usStateData.cases)
stateDeathData = np.array(usStateData.deaths)

doublingTimesCases = np.zeros(numStates,)
doublingTimesDeaths = np.zeros(numStates,)

for i in range(numStates):
    caseData = stateCaseData[np.where(usStateData.state == allStateNames[i])]
    deathData = stateDeathData[np.where(usStateData.state == allStateNames[i])]
    
    doublingTimesCases[i] = allDoublingTimes(caseData)[-1]
    doublingTimesDeaths[i] = allDoublingTimes(deathData)[-1]
#################################################################################################
# Severity Index
severityIdxs = np.zeros(numStates,)

populationData = pd.read_csv('../input/population-data/Population_v2.csv')

for i in range(numStates):
    stateCases = stateCaseData[np.where(usStateData.state == allStateNames[i])]
    
    if (populationData.loc[np.where(populationData == allStateNames[i])[0]].shape[0] != 0):       
        # Check if population data is available
        stateRow = populationData.loc[np.where(populationData == allStateNames[i])[0]]
        statePop = stateRow.iloc[0,1]
    
        newCasesData = np.diff(stateCases)
        newCasesPer100k = (newCasesData / (statePop / 100000)).astype(int)
        newCasesPer100kFiltered = nDayAverage(newCasesPer100k, 7)
        severityIdxs[i] = np.mean(newCasesPer100k[-7:])
#################################################################################################    
# Put all the computed data into a table
d = {'State': allStateNames.tolist(), 'DT_Cases': doublingTimesCases.tolist(), 'DT_Deaths': doublingTimesDeaths.tolist(), 'SeverityIndex': severityIdxs.tolist()}
usSummary = pd.DataFrame(data=d)

# Sort the data
usSummary_Sorted = usSummary.sort_values(by=['SeverityIndex'], ascending=False)

median = np.median(np.array(usSummary_Sorted.SeverityIndex))
usSummary_Sorted.SeverityIndex = np.divide(usSummary_Sorted.SeverityIndex, median)

# Remove any negative severity index data
usSummary_Sorted.SeverityIndex = np.clip(usSummary_Sorted.SeverityIndex, a_min=0, a_max=np.inf)

columns_titles = ['State', 'DT_Cases', 'DT_Deaths', 'SeverityIndex']
usSummary_Sorted = usSummary_Sorted.reindex(columns=columns_titles)

# Apply discrete color scale to data        
usSummary_Sorted.style.\
    apply(redColorscale, subset=['DT_Cases','DT_Deaths']).\
    apply(yellowColorscale, subset=['DT_Cases','DT_Deaths']).\
    apply(greenColorscale, subset=['DT_Cases','DT_Deaths']).\
    background_gradient(cmap='RdYlGn_r', subset='SeverityIndex').\
    format({
        'DT_Cases': '{:,.0f}'.format,
        'DT_Deaths': '{:,.0f}'.format,
        'SeverityIndex': '{:,.2f}'.format,
    })

<font size="5"><b>Clustering Test</b></font>

In [None]:
colors = fullColorscale(usSummary_Sorted.DT_Cases.astype(float))                       # Find colors for all data points
usSummary_Sorted['DT_Cases'] = usSummary_Sorted['DT_Cases'].astype(str)                # Convert DT_Cases values to string
fig = px.scatter_3d(usSummary_Sorted, x='DT_Cases', y='DT_Deaths', z='SeverityIndex', color='DT_Cases', color_discrete_sequence=colors, hover_name='State')
fig.update_layout(title='Doubling Time Clustering by State', width=800, showlegend=False)
fig.show()

<font size="7"><b>US Data Visualization</b></font>
<a id='us_features'></a>

<font size="5"><b>Doubling Times per US State (Geographical Plot followed by Bar Plot)</b></font>

In [None]:
## DT for Cases/Deaths/Hos (selected via drop-down menu)

# Obtain case/death data for ALL days for ALL 50 states
allStateNames = np.array(usStateData.state)
allStateNames = np.unique(allStateNames)

stateCaseData = np.array(usStateData.cases)
stateDeathData = np.array(usStateData.deaths)

numStates = allStateNames.shape[0]
doublingTimesCases = np.zeros(numStates,)
doublingTimesDeaths = np.zeros(numStates,)
timeSincePeak = np.zeros(numStates,)
recoveryTime = np.zeros(numStates,)

for i in tqdm(range(numStates)):
    caseData = stateCaseData[np.where(usStateData.state == allStateNames[i])]
    deathData = stateDeathData[np.where(usStateData.state == allStateNames[i])]
    
    # Get data to calculate days since peak
    allDTsCases = allDoublingTimes(caseData)
    allDTsDeaths = allDoublingTimes(deathData)
    
    DTPeaks = np.array(find_peaks(nDayAverage(allDTsCases, 5)))[0]
    valuesAtPeaks = allDTsCases[DTPeaks]
    if (np.amax(valuesAtPeaks) > 50):
        latestPeak = DTPeaks[np.where(valuesAtPeaks == np.amax(valuesAtPeaks[np.where(valuesAtPeaks >= 50)[0]]))[0][0]]        # Ensures that latest peak started at a DT above 50
    else:
        latestPeak = DTPeaks[-1]
    timeSincePeak[i] = caseData.shape[0] - latestPeak
    
    # Get recovery time data
    filteredDTsCases = nDayAverage(allDTsCases, 7)
    recoveryTime[i] = np.round(findTimeForRecovery(filteredDTsCases), 0)
    
    doublingTimesCases[i] = allDTsCases[-1]
    doublingTimesDeaths[i] = allDTsDeaths[-1]

#########################################################################################################################
## DT for Hospitalizations
# Get ALL data, state, and hospitalizedCumulative data and make new DataFrame
covidTrackingDates = np.array(covidTrackingDaily.date)
covidTrackingState = np.array(covidTrackingDaily.state)
covidTrackingHosCumul = np.array(covidTrackingDaily.hospitalizedCumulative)
dataLen = covidTrackingState.shape[0]

# Get full state names
fullStateNames = []
for i in range(dataLen):
    fullName = getattr(us.states, covidTrackingState[i]).name    # Attribute lookup by abbreviation, without eval
    fullStateNames.append(fullName)
    
covidTrackingState = fullStateNames

d = {'Date': covidTrackingDates.tolist(), 'State': covidTrackingState, 'HosCumul': covidTrackingHosCumul.tolist()}
hosCumulData = pd.DataFrame(data=d)

doublingTimesHos = np.zeros(numStates,)

# Get hosCumul for each state for all days and compute the doubling times
for i in range(numStates):
    hosCumulAllDays = np.array(hosCumulData.HosCumul[np.where(hosCumulData.State == allStateNames[i])[0]])
    currentHosCumul = hosCumulAllDays[0]
    halfOfCurrentHosCumul = currentHosCumul / 2.0
    
    hosCumul_MinusHalfOfCurrentHosCumul = abs(hosCumulAllDays - halfOfCurrentHosCumul)
    
    idxApproxHalfOfHos = np.where(hosCumul_MinusHalfOfCurrentHosCumul == np.min(hosCumul_MinusHalfOfCurrentHosCumul))[0][0]

    doublingTimesHos[i] = idxApproxHalfOfHos
    
    
# Concatenate the data
d = {'State': allStateNames.tolist(), 
     'DT_Cases': doublingTimesCases.tolist(), 
     'DT_Deaths': doublingTimesDeaths.tolist(),
     'DT_Hos': doublingTimesHos.tolist()}
doublingTimeData = pd.DataFrame(data=d)
#doublingTimeData.DT_Hos = doublingTimeData[DT_Hos > 0]

maxDT_Cases = np.amax(doublingTimeData.DT_Cases)
maxDT_Deaths = np.amax(doublingTimeData.DT_Deaths)

doublingTimeHos = doublingTimeData[doublingTimeData.DT_Hos > 0]            # Remove states with no data (hence 0  doubling time)
maxDT_Hos = np.amax(doublingTimeHos.DT_Hos)
########################################################################################################################
# Create choropleth plots for selected doubling times
choroplethDTCases = go.Choropleth(
    name='Cases',
    locations=fullStateNameToAbbrev(doublingTimeData.State),
    locationmode='USA-states',
    z = doublingTimeData.DT_Cases,
    zmin = 0,
    zmax = maxDT_Cases,
    #colorscale=["red","yellow","green"],
    colorscale=[[0,'red'], [(redHigh - 1)/maxDT_Cases,'red'], [yellowLow/maxDT_Cases,'yellow'], [yellowHigh/maxDT_Cases,'yellow'], [greenLow/maxDT_Cases,'green'], [1,'green']],
    autocolorscale=False,
    text='Doubling Time (Days)', 
    marker_line_color='grey',
    colorbar_title="Doubling Time"
)
choroplethDTDeaths = go.Choropleth(
    name='Deaths',
    locations=fullStateNameToAbbrev(doublingTimeData.State),
    locationmode='USA-states',
    z = doublingTimeData.DT_Deaths,
    zmin = 0,
    zmax = maxDT_Deaths,
    #colorscale=["red","yellow","green"],
    colorscale=[[0,'red'], [(redHigh - 1)/maxDT_Deaths,'red'], [yellowLow/maxDT_Deaths,'yellow'], [yellowHigh/maxDT_Deaths,'yellow'], [greenLow/maxDT_Deaths,'green'], [1,'green']],
    autocolorscale=False,
    text='Doubling Time (Days)', 
    marker_line_color='grey',
    colorbar_title="Doubling Time"
)
choroplethDTHos = go.Choropleth(
    name='Hospitalizations',
    locations=fullStateNameToAbbrev(np.array(doublingTimeHos.State)),           # Convert to numpy because df indexes will crash when 0 DT states have been removed
    locationmode='USA-states',
    z = doublingTimeHos.DT_Hos,
    zmin = 0,
    zmax = maxDT_Hos,
    #colorscale=["red","yellow","green"],
    colorscale=[[0,'red'], [(redHigh - 1)/maxDT_Hos,'red'], [yellowLow/maxDT_Hos,'yellow'], [yellowHigh/maxDT_Hos,'yellow'], [greenLow/maxDT_Hos,'green'], [1,'green']],
    autocolorscale=False,
    text='Doubling Time (Days)', 
    marker_line_color='grey',
    colorbar_title="Doubling Time"
)

DTPlot = go.Figure(data=[choroplethDTCases,choroplethDTDeaths,choroplethDTHos])

opts = ['Cases','Deaths','Hospitalizations']
DTPlot.update_layout(
    title_text='Doubling Times per US State',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True,
        lakecolor='rgb(255, 255, 255)'),
    updatemenus=[
        dict(
            active=-1,
            #buttondefaults=makeDTPlotButtons()[0],
            buttons=makePlotButtons(opts),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0,
            xanchor="left",
            y=1.1,
            yanchor="top"
        )
    ]
)

DTPlot.show()

# Create SORTED doubling time plot with colors for selection option

# First create traces
options = ['DT_Cases','DT_Deaths','DT_Hos']
traces = []
for i, opt in enumerate(options):
    if (opt != 'DT_Hos'):                          # DT_Hos uses doublingTimeHos variable, NOT doublingTimeData dataframe
        dt_data = doublingTimeData.sort_values(by=[opt], ascending=False)
        dt_Int = (dt_data[opt]).astype(int)
        colors = np.full((dt_data[opt].shape[0],), 'blue')
    else:
        dt_data = doublingTimeHos.sort_values(by=[opt], ascending=False)
        dt_Int = dt_data.DT_Hos.astype(int)
        colors = np.full((dt_data[opt].shape[0],), 'blue')
        
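    # Color each bar by its doubling-time bin (dtValue keeps the outer loop index i untouched)
    for dtValue in range (0,redHigh):
        colors = np.where(dt_Int == dtValue, 'red', colors)
    for dtValue in range (yellowLow,yellowHigh + 1):
        colors = np.where(dt_Int == dtValue, 'yellow', colors)
    colors = np.where(dt_Int >= greenLow, 'green', colors)         # greenLow itself is green, matching the choropleth colorscale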

    traces.append(go.Bar(x=dt_data[opt], y=dt_data.State, orientation='h', marker=dict(color=colors.tolist()), name=opt))
    

# Now create figure with all the traces
doublingTimeLayoutSORTED = go.Layout(title='Average Doubling Time in each US State/Territory - SORTED', 
                                     xaxis_title='Doubling Time', yaxis_title='State Name',
                                     width=800, height=1100)

doublingTimePlotSORTED = go.Figure(data=traces, layout=doublingTimeLayoutSORTED)

doublingTimePlotSORTED.update_layout(
    updatemenus=[
        dict(
            active=-1,
            #buttondefaults=makeDTPlotButtons()[0],
            buttons=makePlotButtons(opts),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=-0.3,
            xanchor="right",
            y=1,
            yanchor="top"
        )
    ]
)
doublingTimePlotSORTED.add_shape(
    dict(
        type="line",
        x0=40,
        y0=-1,
        x1=40,
        y1=56,
        line=dict(
            color="Black",
            width=2
        )
))
doublingTimePlotSORTED.add_shape(
    dict(
        type="line",
        x0=redHigh,
        y0=-1,
        x1=yellowLow,
        y1=56,
        line=dict(
            color="Black",
            width=2
        )
))
doublingTimePlotSORTED.update_xaxes(showgrid=True)
                 
doublingTimePlotSORTED.show()

<font size="5"><b>ALL Doubling Times for all US States (with 7 Day Predictions)</b></font>

<font size="4.5">NOTE: Click on data series in legend to toggle data series on/off</font>

In [None]:
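def kalmanStep(x_k_k, v_k_k, z_kp1, P_k_k, twoState, a_k_k=0, q=1, r=1):
    # One predict/update step of a Kalman filter on the scalar measurement z_kp1.
    # twoState=True uses a position/velocity state; otherwise position/velocity/acceleration.
    # q scales the process noise, r is the measurement noise variance.
    # Returns the updated state estimates, the updated covariance, and the predicted standard deviations.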
    if (twoState):
        I = np.identity(2)
        F = np.array([[1,1],[0,1]])
        H = np.array([1,0]).reshape(1,2)
        Q = np.array([[0,0],[0,q]])

        temp = np.array([x_k_k,v_k_k]).reshape(2,1)
        temp1 = np.matmul(F, temp)

        P_kp1_k = np.matmul(np.matmul(F, P_k_k), np.transpose(F)) + Q
        sigmaX = math.sqrt(P_kp1_k[0,0])
        sigmaV = math.sqrt(P_kp1_k[1,1])

        S_kp1 = np.matmul(np.matmul(H, P_kp1_k), np.transpose(H)) + r

        Ws = np.matmul(P_kp1_k, np.transpose(H)) / S_kp1
        Ws = Ws.reshape(2,1)         # For proper matrix multiplication
        
        temp = temp1 + np.matmul(Ws, z_kp1 - np.matmul(H, temp1))
        
        x_kp1_kp1 = temp[0]
        v_kp1_kp1 = temp[1]

        I_minus_WH = I - np.matmul(Ws,H)
        P_kp1_kp1 = np.matmul(np.matmul(I_minus_WH, P_kp1_k), np.transpose(I_minus_WH)) + np.matmul(Ws, r*np.transpose(Ws))
        
        return x_kp1_kp1, v_kp1_kp1, P_kp1_kp1, sigmaX, sigmaV
    else:
        # Three state model
        I = np.identity(3)
        
        F = np.array([[1,1,0.5],[0,1,1],[0,0,1]])
        gamma = np.array([0.1667,0.5,1]).reshape(3,1)
        Q = q * np.matmul(gamma, np.transpose(gamma))
        H = np.array([1,0,0]).reshape(1,3)
        
        temp = np.array([x_k_k,v_k_k,a_k_k]).reshape(3,1)
        temp1 = np.matmul(F, temp)

        P_kp1_k = np.matmul(np.matmul(F, P_k_k), np.transpose(F)) + Q
           
        sigmaX = math.sqrt(P_kp1_k[0,0])
        sigmaV = math.sqrt(P_kp1_k[1,1])
        sigmaA = math.sqrt(P_kp1_k[2,2])

        S_kp1 = np.matmul(np.matmul(H, P_kp1_k), np.transpose(H)) + r

        Ws = np.matmul(np.matmul(P_kp1_k, np.transpose(H)), np.linalg.inv(S_kp1))
        Ws = Ws.reshape(3,1)         # For proper matrix multiplication
        
        temp = temp1 + np.matmul(Ws, z_kp1 - np.matmul(H, temp1))
        
        x_kp1_kp1 = temp[0]
        v_kp1_kp1 = temp[1]
        a_kp1_kp1 = temp[2]

        I_minus_WH = I - np.matmul(Ws,H)
        P_kp1_kp1 = np.matmul(np.matmul(I_minus_WH, P_kp1_k), np.transpose(I_minus_WH)) + np.matmul(Ws, r*np.transpose(Ws))
        
        return x_kp1_kp1, v_kp1_kp1, a_kp1_kp1, P_kp1_kp1, sigmaX, sigmaV, sigmaA
    
    return None

In [None]:
def kalmanPrediction(x_final, v_final, p_final, L, F, Q, a_final=0):
    #Q = np.array([[0,0],[0,q]])
    #F = np.array([[1,1],[0,1]])
    numStates = F.shape[0]
    
    pPreds = np.zeros((L,numStates,numStates))
    pPreds[0] = p_final
    
    if (numStates == 2):
        xPreds = np.zeros(L,)
        vPreds = np.zeros(L,)
        sigmaPredXs = np.zeros(L,)
        sigmaPredVs = np.zeros(L,)
        xPreds[0] = x_final 
        vPreds[0] = v_final
        
        for i in range(1,L):
            xPreds[i] = xPreds[i-1] + vPreds[i-1]
            vPreds[i] = vPreds[i-1]
            pPreds[i] = np.matmul(np.matmul(F, pPreds[i-1]), np.transpose(F)) + Q
            sigmaPredXs[i] = math.sqrt(pPreds[i,0,0])
            sigmaPredVs[i] = math.sqrt(pPreds[i,1,1])
        
        return xPreds, vPreds, pPreds, sigmaPredXs, sigmaPredVs
    else:
        xPreds = np.zeros(L,)
        vPreds = np.zeros(L,)
        aPreds = np.zeros(L,)
        sigmaPredXs = np.zeros(L,)
        sigmaPredVs = np.zeros(L,)
        sigmaPredAs = np.zeros(L,)
        xPreds[0] = x_final 
        vPreds[0] = v_final
        aPreds[0] = a_final
        
        for i in range(1,L):
            xPreds[i] = xPreds[i-1] + vPreds[i-1] + 0.5*aPreds[i-1]
            vPreds[i] = vPreds[i-1] + aPreds[i-1]
            aPreds[i] = aPreds[i-1]
            pPreds[i] = np.matmul(np.matmul(F, pPreds[i-1]), np.transpose(F)) + Q
            sigmaPredXs[i] = math.sqrt(pPreds[i,0,0])
            sigmaPredVs[i] = math.sqrt(pPreds[i,1,1])
            sigmaPredAs[i] = math.sqrt(pPreds[i,2,2])
        
        return xPreds, vPreds, aPreds, pPreds, sigmaPredXs, sigmaPredVs, sigmaPredAs
    
    return None

In [None]:
def kalmanFilter(measuredData, numSteps, numPreds, twoState, a_0_0=1, q=1, r=1):  
    x_0_0 = measuredData[0]
    v_0_0 = np.diff(measuredData)[0]
    
    if (twoState):
        F = np.array([[1,1],[0,1]])
        Q = np.array([[0,0],[0,q]])
        
        xs = np.zeros(numSteps,)
        vs = np.zeros(numSteps,)
        Ps = np.zeros((numSteps,2,2))
        sigmaXs = np.zeros(numSteps + numPreds,)
        sigmaVs = np.zeros(numSteps + numPreds,)
        
        xs[0] = x_0_0
        vs[0] = v_0_0
        Ps[0] = 1e6 * np.identity(2)
        
        # Call step function to filter on measured data
        for i in range(1,numSteps):
            # The fifth positional argument is twoState; q and r are passed by keyword
            xs[i], vs[i], Ps[i], sigmaXs[i], sigmaVs[i] = kalmanStep(xs[i-1], vs[i-1], measuredData[i], Ps[i-1], True, q=q, r=r)
            
        # Then compute numPreds day predictions
        xPred, vPred, pPred, sigmaPredXs, sigmaPredVs = kalmanPrediction(xs[-1], vs[-1], Ps[-1], numPreds, F, Q)
        
        # Add sigmas from predictions to main sigma arrays
        sigmaXs[numSteps:] = sigmaPredXs
        sigmaVs[numSteps:] = sigmaPredVs
        
        # Return necessary variables
        return xs, vs, Ps, xPred, vPred, pPred, sigmaXs, sigmaVs
    else:
        gamma = np.array([0.1667,0.5,1]).reshape(3,1)
        Q = q * np.matmul(gamma, np.transpose(gamma))
        F = np.array([[1,1,0.5],[0,1,1],[0,0,1]])
        
        xs = np.zeros(numSteps,)
        vs = np.zeros(numSteps,)
        As = np.zeros(numSteps,)
        Ps = np.zeros((numSteps,3,3))
        sigmaXs = np.zeros(numSteps + numPreds,)
        sigmaVs = np.zeros(numSteps + numPreds,)
        sigmaAs = np.zeros(numSteps + numPreds,)
        
        xs[0] = x_0_0
        vs[0] = v_0_0
        As[0] = a_0_0
        Ps[0] = 1e6 * np.identity(3)
        
        # Call step function to filter on measured data
        for i in range(1,numSteps):
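            # Filter on the measured data, carrying the previous acceleration estimate forward
            xs[i], vs[i], As[i], Ps[i], sigmaXs[i], sigmaVs[i], sigmaAs[i] = kalmanStep(xs[i-1], vs[i-1], measuredData[i], Ps[i-1], False, a_k_k=As[i-1], q=q, r=r)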
        
        # Then compute numPreds day predictions
        xPred, vPred, aPred, pPred, sigmaPredXs, sigmaPredVs, sigmaPredAs = kalmanPrediction(xs[-1], vs[-1], Ps[-1], numPreds, F, Q, a_final=As[-1])
    
        # Add sigmas from predictions to main sigma arrays
        sigmaXs[numSteps:] = sigmaPredXs
        sigmaVs[numSteps:] = sigmaPredVs
        sigmaAs[numSteps:] = sigmaPredAs
        
        # Return necessary variables
        return xs, vs, As, Ps, xPred, vPred, aPred, pPred, sigmaXs, sigmaVs, sigmaAs
    
    return None
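
<font size="4.5">For readers who want to see the two-state filter in one place, here is a compact sketch using the same constant-velocity model as the twoState branch above (F = [[1,1],[0,1]], H = [1,0], Q = [[0,0],[0,q]]). The function name cv_kalman_forecast and the noise-free test ramp are assumptions for illustration, and the sketch is not meant to replace kalmanFilter.</font>

```python
import numpy as np

def cv_kalman_forecast(z, n_preds, q=0.1, r=0.1):
    """Hypothetical helper: filter z with a position/velocity model, then extrapolate."""
    z = np.asarray(z, dtype=float)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])    # State transition (position += velocity)
    H = np.array([[1.0, 0.0]])                # Only the position is measured
    Q = np.array([[0.0, 0.0], [0.0, q]])      # Process noise on the velocity
    x = np.array([[z[0]], [z[1] - z[0]]])     # Initial position and velocity
    P = 1e6 * np.identity(2)                  # Large initial uncertainty

    for meas in z[1:]:
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update
        S = H @ P @ H.T + r                   # Innovation variance, shape (1, 1)
        W = P @ H.T / S                       # Kalman gain, shape (2, 1)
        x = x + W * (meas - (H @ x)[0, 0])
        A = np.identity(2) - W @ H
        P = A @ P @ A.T + r * (W @ W.T)       # Joseph-form covariance update

    # Extrapolate; the first prediction is the last filtered value, as in kalmanPrediction
    preds = np.zeros(n_preds)
    preds[0] = x[0, 0]
    for i in range(1, n_preds):
        preds[i] = preds[i - 1] + x[1, 0]
    return preds

ramp = 10 + 2 * np.arange(20, dtype=float)    # Noise-free ramp with slope 2
print(cv_kalman_forecast(ramp, 3))
```

<font size="4.5">On a noise-free ramp every innovation is zero, so the forecast simply continues the ramp; on real doubling-time data the balance between q and r controls how quickly the velocity estimate follows recent changes.</font>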

In [None]:
def confidence(value, error, desired):
    maxValue = value + error
    errorRange = 2 * error
    
    confidence = min(max((maxValue - desired) / errorRange * 100, 0), 100)
    
    return confidence
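
<font size="4.5">One way to read the function above: it reports what percentage of the interval from value - error to value + error lies above the desired threshold, clamped to the range 0 to 100. The sketch below restates that arithmetic with a few worked inputs. The name share_of_interval_above and the guard for a zero error are assumptions added for this example and are not part of the notebook's function.</font>

```python
def share_of_interval_above(value, error, desired):
    """Hypothetical restatement: percent of [value - error, value + error] above desired."""
    if error <= 0:
        # Assumed guard: with no error band, the answer is all or nothing
        return 100.0 if value >= desired else 0.0
    upper = value + error
    share = (upper - desired) / (2 * error) * 100
    return min(max(share, 0), 100)

print(share_of_interval_above(10, 2, 9))     # Threshold inside the interval
print(share_of_interval_above(10, 2, 13))    # Threshold above the interval
print(share_of_interval_above(10, 2, 7))     # Threshold below the interval
```

<font size="4.5">With value = 10 and error = 2 the interval is 8 to 12, so a threshold of 9 leaves three quarters of the interval above it.</font>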

In [None]:
datetime.datetime(2020,9,14)
(today - timedelta(days = 1))

In [None]:
## Doubling times for cases/deaths per state with 7 day Kalman Filter predictions (selected via DROPDOWN MENU)
# Obtain case/death data for ALL days for ALL 50 states
allStateNames = np.array(usStateData.state)
allStateNames = np.unique(allStateNames)

stateCaseData = np.array(usStateData.cases)
stateDeathData = np.array(usStateData.deaths)

numStates = allStateNames.shape[0]
doublingTimesCases = np.zeros(numStates,)
doublingTimesDeaths = np.zeros(numStates,)
timeSincePeak = np.zeros(numStates,)
recoveryTime = np.zeros(numStates,)

for i in range(numStates):
    caseData = stateCaseData[np.where(usStateData.state == allStateNames[i])]
    deathData = stateDeathData[np.where(usStateData.state == allStateNames[i])]
    
    # Get data to calculate days since peak
    allDTsCases = allDoublingTimes(caseData)
    allDTsDeaths = allDoublingTimes(deathData)
    
    DTPeaks = np.array(find_peaks(nDayAverage(allDTsCases, 5)))[0]
    valuesAtPeaks = allDTsCases[DTPeaks]
    if (np.amax(valuesAtPeaks) > 50):
        latestPeak = DTPeaks[np.where(valuesAtPeaks == np.amax(valuesAtPeaks[np.where(valuesAtPeaks >= 50)[0]]))[0][0]]        # Ensures that latest peak started at a DT above 50
    else:
        latestPeak = DTPeaks[-1]
    timeSincePeak[i] = caseData.shape[0] - latestPeak
    
    # Get recovery time data
    filteredDTsCases = nDayAverage(allDTsCases, 7)
    recoveryTime[i] = np.round(findTimeForRecovery(filteredDTsCases), 0)
    
    doublingTimesCases[i] = allDTsCases[-1]
    doublingTimesDeaths[i] = allDTsDeaths[-1]

# Concatenate the data
d = {'State': allStateNames.tolist(), 
     'DT_Cases': doublingTimesCases.tolist(), 
     'DT_Deaths': doublingTimesDeaths.tolist(),}
dt_data = pd.DataFrame(data=d)

# Maximum doubling times, taken from the table built in this cell
maxDT_Cases = np.amax(dt_data.DT_Cases)
maxDT_Deaths = np.amax(dt_data.DT_Deaths)


allStates = np.unique(dt_data.State)
numStates = allStates.shape[0]

twoState = True
numPreds = 7

# Get next7Days as string
next7Days_str = []
# date = datetime.datetime(2020,9,15)              # Latest date available in data
date = today- timedelta(days = 1)

for i in range(numPreds): 
    next7Days_str.append(date.strftime('%Y-%m-%d'))
    date += timedelta(days=1)

allTraces = []
for i in tqdm(range(numStates)):
    stateData = usStateData.loc[usStateData.state == allStates[i]]
    
    caseData = np.array(stateData.cases)
    deathData = np.array(stateData.deaths)
    dates = stateData.date
    allDT_Cases = allDoublingTimes(caseData)
    allDT_Deaths = allDoublingTimes(deathData)
    
    numDays = allDT_Cases.shape[0]
    
    filteredDT_Cases = nDayAverage(allDT_Cases, 5)
    filteredDT_Deaths = nDayAverage(allDT_Deaths, 5)
    
    # Run Kalman Filter for predictions
    kalFilt_DTCases, kalFilt_DeltaDTCases, kalFiltPs, DT_CasesPreds, vPreds, pPreds, sigmaXs, sigmaVs = kalmanFilter(filteredDT_Cases, numDays, numPreds, twoState, q=0.1, r=0.1)
    kalFilt_DTDeaths, kalFilt_DeltaDTDeaths, kalFiltPs, DT_DeathsPreds, vPreds, pPreds, sigmaXs, sigmaVs = kalmanFilter(filteredDT_Deaths, numDays, numPreds, twoState, q=0.1, r=0.1)
    DT_CasesPreds = np.round(DT_CasesPreds, 3)
    DT_DeathsPreds = np.round(DT_DeathsPreds, 3)
    
    
    allDT_CasesData = go.Scatter(x=dates, y=allDT_Cases, name='Doubling Times for Cases')
    allDT_CasesFilteredData = go.Scatter(x=dates, y=filteredDT_Cases, name='Doubling Times for Cases (5 Day Average)')
    DT_CasesPredsData = go.Scatter(x=next7Days_str, y=DT_CasesPreds, name='Doubling Times for Cases (Kalman Filter Pred)')
    
    allDT_DeathsData = go.Scatter(x=dates, y=allDT_Deaths, name='Doubling Times for Deaths')
    allDT_DeathsFilteredData = go.Scatter(x=dates, y=filteredDT_Deaths, name='Doubling Times for Deaths (5 Day Average)')
    DT_DeathsPredsData = go.Scatter(x=next7Days_str, y=DT_DeathsPreds, name='Doubling Times for Deaths (Kalman Filter Pred)')
    
    allTraces.append(allDT_CasesData)
    allTraces.append(allDT_CasesFilteredData)
    allTraces.append(DT_CasesPredsData)
    allTraces.append(allDT_DeathsData)
    allTraces.append(allDT_DeathsFilteredData)
    allTraces.append(DT_DeathsPredsData)
    
# Update layout and show the plot
allDTsLayout = go.Layout(title='ALL Doubling Times in Selected State', 
                         xaxis_title='Date', yaxis_title='Doubling Time',
                         width=800, height=600)
allDTsPlot = go.Figure(data=allTraces, layout=allDTsLayout)

allDTsPlot.update_layout(
    updatemenus=[
        dict(
            active=-1,
            #buttondefaults=makeDTPlotButtons()[0],
            buttons=makePlotButtons(allStateNames, 6),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        )
    ]
)
    
allDTsPlot.show()

In [None]:
### Debug Console (do not run when not needed)
#data = filteredDTs
stateName = 'Alaska'
caseData = stateCaseData[np.where(usStateData.state == stateName)]
allDTs = allDoublingTimes(caseData)
    
filteredDTs = nDayAverage(allDTs, 5)
data = filteredDTs

# Find global max (right before things went bad)
if ((np.diff(data[-45:]) > -0.01).all()):  # This is required for good states where DT is continuously rising (at least for the last 45 days)
    maximas = data.shape[0] - 1
else:
    maximas = find_peaks(data)[0]
    #maximas = argrelextrema(data, np.greater_equal)
dataAtMaximas = data[maximas]
globalMaxIdx = np.where(data == np.amax(dataAtMaximas))[0][0]
globalMax = np.mean(data[globalMaxIdx])                        # To account for two of the same maxs found

# Then find minima on the data that comes after the maxima
clippedData = data[globalMaxIdx:]
minimas = argrelextrema(clippedData, np.less_equal)[0]
latestMinIdxClipped = np.where(clippedData == np.amin(clippedData[minimas]))[0][0]
latestMinIdx = globalMaxIdx + latestMinIdxClipped
latestMin = np.mean(clippedData[latestMinIdxClipped])

# Find avg slope for increase
avgIncPerDay = 0
if (latestMinIdx != data.shape[0] and data[-1] - data[latestMinIdx] != 0):
    avgIncPerDay = (data[-1] - data[latestMinIdx]) / (data.shape[0] - latestMinIdx)
    incReqForRecovery = globalMax - data[-1]
    timeForRecovery = incReqForRecovery / avgIncPerDay                                         # Make sure to take into account of today's DT
if (latestMinIdx == data.shape[0] or avgIncPerDay < 0.1):
    timeForRecovery = 365
if (data[-1] - data[latestMinIdx] == 0):
    timeForRecovery = 365
if (globalMax == latestMin):
    timeForRecovery = -1
    
plt.plot(data)
plt.plot(globalMaxIdx,globalMax,'ko')
plt.plot(latestMinIdx,latestMin,'ro')
plt.title('Doubling Times of Cases in ' + stateName)
plt.xlabel('Days since Data Collection')
plt.ylabel ('Doubling Times')
plt.legend(['Data','Doubling Time Peak','Worst Doubling Time Since Peak'])

print(globalMax,latestMin,avgIncPerDay,timeForRecovery)

<font size="5"><b>Find the Percentage of Positive Tests for each state (latest available)</b></font>

In [None]:
def positivityRates(positive, negative, pending, latestFirst=True):  
    if (latestFirst):
        newPositiveCases = -np.diff(positive)
        newNegativeCases = -np.diff(negative)
        newPendingCases = -np.diff(pending)
    else:
        newPositiveCases = np.diff(positive)
        newNegativeCases = np.diff(negative)
        newPendingCases = np.diff(pending)
    
    # Remove any negative numbers
    newPositiveCases = np.clip(newPositiveCases, a_min=0, a_max=np.inf)
    newNegativeCases = np.clip(newNegativeCases, a_min=0, a_max=np.inf)
    newPendingCases = np.clip(newPendingCases, a_min=0, a_max=np.inf)
    
    positivityRates = np.zeros(newPositiveCases.shape[0],)
    
    totalNewCases = newPositiveCases + newNegativeCases + newPendingCases
    
    positivityRates = np.divide(newPositiveCases, totalNewCases) * 100
    
    #print(newPositiveCases,totalNewCases)
    
#     if (not totalNewCases.any()):
#         positivityRates = np.divide(newPositiveCases, totalNewCases) * 100
#     else:
#         positivityRates = np.divide(newPositiveCases, totalNewCases + 1) * 100
    
    # if any div by 0 errors
    positivityRates[np.isnan(positivityRates)] = 0
    positivityRates[positivityRates == np.inf] = 0
    
    return positivityRates
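
<font size="4.5">A small worked example may help in checking the arithmetic. The sketch below computes daily percent positive from cumulative positive and negative test counts (oldest first) and leaves days with no new tests at 0 rather than dividing by zero. The function name daily_positivity and the toy counts are assumptions for illustration; it omits the pending-test term and the latestFirst option of the function above.</font>

```python
import numpy as np

def daily_positivity(cum_positive, cum_negative):
    """Hypothetical helper: daily percent positive from oldest-first cumulative counts."""
    # Daily new counts; negative corrections in the source data are clipped to 0
    new_pos = np.clip(np.diff(np.asarray(cum_positive, dtype=float)), 0, None)
    new_neg = np.clip(np.diff(np.asarray(cum_negative, dtype=float)), 0, None)
    total = new_pos + new_neg
    rates = np.zeros_like(total)
    # Divide only where tests were reported; other days stay at 0
    np.divide(new_pos, total, out=rates, where=total > 0)
    return rates * 100

cum_pos = [10, 15, 15, 25]       # Cumulative positive tests
cum_neg = [90, 135, 135, 165]    # Cumulative negative tests
print(daily_positivity(cum_pos, cum_neg))
```

<font size="4.5">Here the three daily totals are 50, 0, and 40 tests, giving roughly 10%, 0%, and 25% positive.</font>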

<font size="4"><b>Time Series of Positive Tests and Positivity Rate for Selected State</b></font>

In [None]:
from plotly.subplots import make_subplots

allStateNames = np.array(usStateData.state)
allStateNames = np.unique(allStateNames)
allStateNamesAbbrev = fullStateNameToAbbrev(allStateNames)

# Update dates in data
usPositivityData.date = pd.to_datetime(usPositivityData.date)

allTraces = []
fig = make_subplots(specs=[[{"secondary_y": True}]])
for state in tqdm(allStateNamesAbbrev):
    dates = usPositivityData.loc[usPositivityData['state'] == state].date

    posCasesForState = usPositivityData.loc[usPositivityData['state'] == state].tests_viral_positive
    negCasesForState = usPositivityData.loc[usPositivityData['state'] == state].tests_viral_negative

    newPositiveCases = np.diff(posCasesForState)
    newNegativeCases = np.diff(negCasesForState)

    newPositiveCases = np.clip(newPositiveCases, a_min=0, a_max=np.inf)
    newNegativeCases = np.clip(newNegativeCases, a_min=0, a_max=np.inf)

    totalNewCases = newPositiveCases + newNegativeCases

    # posRates = np.divide(newPositiveCases, totalNewCases) * 100
    # posRates[np.isnan(posRates)] = 0
    # posRates[posRates == np.inf] = 0
    # posRates[posRates == 100] = 0            # remove 100 % pos

    posRates = positivityRates(posCasesForState, negCasesForState, np.zeros(posCasesForState.shape[0]), latestFirst=False)
    #posRates[posRates == 100] = 0            # remove 100 % pos

    fig.add_trace(go.Bar(x=dates[1:], y=newPositiveCases, name='Pos Cases'), secondary_y=False)
    fig.add_trace(go.Bar(x=dates[1:], y=totalNewCases, name='Total Cases'), secondary_y=False)
    fig.add_trace(go.Scatter(x=dates[1:], y=posRates, name='Positivity Rates'), secondary_y=True)
    fig.add_trace(go.Scatter(x=dates[::-1][1:], y=nDayAverage(posRates[::-1],7), name='7 Day Avg'), secondary_y=True)
    #fig.add_trace(go.Scatter(x=dates[1:], y=moving_averages, name='7 Day Avg v2'), secondary_y=True)

posRatesLayout = go.Layout(title='Test Plot')

#fig = go.Figure(data=[data1,data2,avg], layout=posRatesLayout)
fig.update_layout(barmode='stack', 
                  title='Positive Tests and Positivity Rate for Selected State',
                  updatemenus=[
                      dict(
                        active=-1,
                        #buttondefaults=makeDTPlotButtons()[0],
                        buttons=makePlotButtons(allStateNames, 4),
                        direction="down",
                        pad={"r": 10, "t": 10},
                        showactive=True,
                        x=1,
                        xanchor="left",
                        y=1.25,
                        yanchor="top"
                    )
                ])
fig.update_yaxes(title_text="Number of Tests", secondary_y=False)
fig.update_yaxes(title_text="Positivity Rates", secondary_y=True)

# Show the plot
fig.show()

In [None]:
### Change date needed
# Get today's data for covidTrackingDaily data table
# todayDate = '2020-09-14'
todayDate = yesterday.strftime("%Y-%m-%d")
dates = np.array(usPositivityData.date)

# Get state names
states = np.array(usPositivityData.state)
states = np.unique(states)
numStates = states.shape[0]

percentPositive = np.zeros(numStates,)

for i in tqdm(range(numStates)):
    posCasesForState = usPositivityData.loc[usPositivityData['state'] == states[i]].tests_viral_positive
    negCasesForState = usPositivityData.loc[usPositivityData['state'] == states[i]].tests_viral_negative
    
    posRates = positivityRates(posCasesForState, negCasesForState, np.zeros(posCasesForState.shape[0],), latestFirst=False)
    posRates[posRates == 100] = 0                                                          # Remove 100%s
    percentPositive[i] = np.mean(posRates[-7:])                                            # 7 Day Average, as stated in the plot titles

# Concatenate the data
d = {'State': states.tolist(), 'PercentPos': percentPositive.tolist()}
percentPositiveData = pd.DataFrame(data=d)

fullStateNames = []
for i in range(numStates):
    fullName = getattr(us.states, states[i]).name    # Attribute lookup by abbreviation, without eval
    fullStateNames.append(fullName)
    
percentPositiveData.State = fullStateNames
    

# Choropleth plot for % positive cases
choroplethPerPosData = go.Choropleth(
    locations=fullStateNameToAbbrev(percentPositiveData.State),
    locationmode='USA-states',
    z = np.round(percentPositiveData.PercentPos, 2),
    zmin = 0,
    zmax = 100,
    #colorscale=["green","yellow","red"],
    colorscale=[[0,'green'], [0.05,'green'], [0.06,'yellow'], [0.1,'yellow'], [0.11,'red'], [1,'red']],
    autocolorscale=False,
    text='% Positive', 
    marker_line_color='grey',
    colorbar_title="Percent Positive"
)
choroplethPerPos = go.Figure(data=choroplethPerPosData)

choroplethPerPos.update_layout(
    title_text='Latest Percent Positive Tests per US State (7 Day Average)',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True,
        lakecolor='rgb(255, 255, 255)'),
)

choroplethPerPos.show()

# Plot % positive cases SORTED with colors
percentPositiveData_Sorted = percentPositiveData.sort_values(by=['PercentPos'], ascending=True)
percentPositiveData_Sorted_Int = percentPositiveData_Sorted.PercentPos.astype(int)

# Bin sorting
colors = np.full((percentPositiveData_Sorted.PercentPos.shape[0],), 'blue')
for i in range (0,6):
    colors = np.where(percentPositiveData_Sorted_Int == i, 'green', colors)
for i in range (5,11):
    colors = np.where(percentPositiveData_Sorted_Int == i, 'yellow', colors)
colors = np.where(percentPositiveData_Sorted.PercentPos > 10, 'red', colors) 

perPosDataSorted = go.Bar(x=percentPositiveData_Sorted.PercentPos, y=percentPositiveData_Sorted.State, orientation='h', 
                          marker=dict(color=colors))
perPosDataSortedLayout = go.Layout(title='% Positive Tests (7 Day Average) for all States - SORTED', 
                                    xaxis_title='% Positive Tests', yaxis_title='State Name',
                                    width=800, height=1100)
perPosSortedPlot = go.Figure(data=perPosDataSorted, layout=perPosDataSortedLayout)
perPosSortedPlot.add_shape(
    dict(
        type="line",
        x0=5,
        y0=-1,
        x1=5,
        y1=56,
        line=dict(
            color="Black",
            width=3
        )
))
perPosSortedPlot.add_shape(
    dict(
        type="line",
        x0=10,
        y0=-1,
        x1=10,
        y1=56,
        line=dict(
            color="Black",
            width=3
        )
))
perPosSortedPlot.show()
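
The bar colors in the sorted plot follow the 5% and 10% thresholds marked by the two vertical lines. The binning can also be written without loops. This is a minimal sketch, assuming a rate at or below 5 counts as green, a rate above 5 and up to 10 as yellow, and anything higher as red; `bin_colors` is a hypothetical helper and not part of the notebook.

```python
import numpy as np

def bin_colors(percent_pos, low=5.0, high=10.0):
    """Map percent-positive values to 'green', 'yellow' or 'red' (assumed thresholds)."""
    rates = np.asarray(percent_pos, dtype=float)
    # Nested np.where: test the lowest bin first, then the middle bin, else red
    return np.where(rates <= low, 'green',
                    np.where(rates <= high, 'yellow', 'red'))

colors = bin_colors([2.4, 5.0, 7.9, 10.0, 12.3])
print(colors.tolist())
```

Working on the float values directly avoids the integer cast, so a rate such as 5.4 is binned by its actual value rather than by its truncated integer part.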

<font size="5"><b>US Vaccination Data</b></font>

In [None]:
uniqueStates = np.array(populationData.State)
uniqueStatesAbbrev = np.array(fullStateNameToAbbrev(uniqueStates))

# Set up empty dataframe for plot
blankSpace = np.zeros((uniqueStatesAbbrev.shape[0],))
d = {'State': uniqueStates, 'Total_Doses_Admin_Per100': blankSpace, 'People_Vaccinated_Per100': blankSpace,
     'People_Fully_Vaccinated_Per100': blankSpace, 'New_Vaccinations_Per100': blankSpace}
choroplethVaccinationData = pd.DataFrame(data=d)

idx = 0
for state in tqdm(uniqueStates):
    stateVaccinationData = usVaccinationData.loc[usVaccinationData['location'] == state]
    stateVaccinationDataToday = stateVaccinationData.loc[stateVaccinationData.date == yesterday]

    # .max() added since some states just report numbers on one day (and don't show cumulative)
    totalPer100 = stateVaccinationData['total_vaccinations_per_hundred'].max()
    peoplePer100 = stateVaccinationData['people_vaccinated_per_hundred'].max()
    fullPer100 = stateVaccinationData['people_fully_vaccinated_per_hundred'].max()
    newPer100 = stateVaccinationData['daily_vaccinations_per_million'].max() / 10000

    # Add data to choropleth dataframe
    choroplethVaccinationData.at[idx, 'Total_Doses_Admin_Per100'] = np.round(totalPer100, 3)
    choroplethVaccinationData.at[idx, 'People_Vaccinated_Per100'] = np.round(peoplePer100, 3)
    choroplethVaccinationData.at[idx, 'People_Fully_Vaccinated_Per100'] = np.round(fullPer100 , 3)
    choroplethVaccinationData.at[idx, 'New_Vaccinations_Per100'] = np.round(newPer100, 3)
    
    idx += 1

# Create choropleth plots for each option
opts = ['Total Doses Per 100','People Per 100','Fully Vaccinated Per 100','New Per 100']
choroplethTotalPer100 = go.Choropleth(
    name=opts[0],
    locations=uniqueStatesAbbrev,
    locationmode='USA-states',
    z = choroplethVaccinationData.Total_Doses_Admin_Per100,
    colorscale=["red","yellow","green"],
    autocolorscale=False,
    marker_line_color='grey',
)
choroplethPeoplePer100 = go.Choropleth(
    name=opts[1],
    locations=uniqueStatesAbbrev,
    locationmode='USA-states',
    z = choroplethVaccinationData.People_Vaccinated_Per100,
    colorscale=["red","yellow","green"],
    autocolorscale=False,
    marker_line_color='grey',
    colorbar_title="% of Population Vaccinated"
)
choroplethFullyVac = go.Choropleth(
    name=opts[2],
    locations=uniqueStatesAbbrev,
    locationmode='USA-states',
    z = choroplethVaccinationData.People_Fully_Vaccinated_Per100,
    colorscale=["red","yellow","green"],
    autocolorscale=False,
    marker_line_color='grey',
    colorbar_title="% of Population Vaccinated"
)
choroplethNewPer100 = go.Choropleth(
    name=opts[3],
    locations=uniqueStatesAbbrev,
    locationmode='USA-states',
    z = choroplethVaccinationData.New_Vaccinations_Per100,
    colorscale=["red","yellow","green"],
    autocolorscale=False,
    marker_line_color='grey',
    colorbar_title="New Vaccinations per 100 People"
)

vaccinationPlot = go.Figure(data=[choroplethTotalPer100, choroplethPeoplePer100, choroplethFullyVac, choroplethNewPer100])

vaccinationPlot.update_layout(
    title_text='Vaccination Data per US State',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True,
        lakecolor='rgb(255, 255, 255)'),
    updatemenus=[
        dict(
            active=-1,
            #buttondefaults=makeDTPlotButtons()[0],
            buttons=makePlotButtons(opts),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0,
            xanchor="left",
            y=1.1,
            yanchor="top"
        )
    ]
)

vaccinationPlot.show()

<font size="5"><b>Kalman Filter for US Cases</b></font>

<font size="4">Two State Model</font>

In [None]:
numDays = usCases.shape[0]
twoState = True
numPreds = 7
kalFiltXs, kalFiltVs, kalFiltPs, xPreds, vPreds, pPreds, sigmaXs, sigmaVs = kalmanFilter(usCases, numDays, numPreds, twoState, q=100, r=10)
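
The `kalmanFilter` helper is defined elsewhere in the notebook, so its internals are not shown here. As a rough illustration of what a two-state filter of this kind does, the sketch below tracks a level (cumulative cases) and its daily change, then projects both forward with a growing uncertainty. The function name `kalman_two_state`, the diagonal process noise, and the initial covariance are assumptions made for the example and may differ from the notebook's implementation.

```python
import numpy as np

def kalman_two_state(measurements, num_preds, q=100.0, r=10.0):
    """Constant-velocity Kalman filter sketch: state = [level, daily change]."""
    z = np.asarray(measurements, dtype=float)
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # level grows by the daily change
    H = np.array([[1.0, 0.0]])               # only the level is measured
    Q = q * np.eye(2)                        # process noise (assumed diagonal)
    x = np.array([z[0], 0.0])                # initial state estimate
    P = np.eye(2) * 1e3                      # large initial uncertainty (assumed)

    filtered = []
    for zk in z:
        # Predict one day ahead
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the new measurement
        innovation = zk - (H @ x)[0]
        S = (H @ P @ H.T)[0, 0] + r
        K = (P @ H.T) / S                    # Kalman gain, shape (2, 1)
        x = x + K[:, 0] * innovation
        P = (np.eye(2) - K @ H) @ P
        filtered.append(x[0])

    preds, sigmas = [], []
    for _ in range(num_preds):
        # No measurements available: predict only, so the covariance keeps growing
        x = F @ x
        P = F @ P @ F.T + Q
        preds.append(x[0])
        sigmas.append(np.sqrt(P[0, 0]))
    return np.array(filtered), np.array(preds), np.array(sigmas)

# Noise-free linear growth: the forecast should continue the same line
z = 100.0 + 50.0 * np.arange(60)
filtered, preds, sigmas = kalman_two_state(z, num_preds=7)
print(preds.round(1), sigmas.round(1))
```

The standard deviations returned for the prediction steps play the same role as the error bars passed to `error_y` in the plots.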

In [None]:
kalFiltCumulCasesData = go.Scatter(x=usData.date, y=kalFiltXs, name="Kalman Filter (Actual Data)")
actualCumulCasesData = go.Scatter(x=usData.date, y=usCases, name="Actual Data")

kalFiltCumulCasesPreds = go.Scatter(x=next7Days_str, y=xPreds, name="Kalman Filter (Predictions)", 
                                    error_y = dict(
                                    type = 'data', # value of error bar given in data coordinates
                                    array = sigmaXs[numDays:],
                                    visible = True)
                                   )

kalFiltAndActualLayout = go.Layout(title='US Cases over Time (with 7-day Predictions)', xaxis_title='Date', yaxis_title='# of Cases')
kalFiltAndActualPlot = go.Figure(data=[kalFiltCumulCasesData,actualCumulCasesData,kalFiltCumulCasesPreds], layout=kalFiltAndActualLayout)

kalFiltAndActualPlot.show()

In [None]:
kalFiltNewCasesData = go.Scatter(x=usData.date[1:], y=kalFiltVs, name="Kalman Filter (Actual Data)")
actualNewCasesData = go.Scatter(x=usData.date[1:], y=np.diff(usCases), name="Actual Data")
kalFiltNewCasesPreds = go.Scatter(x=next7Days_str, y=vPreds, name="Kalman Filter (Predictions)",
                                    error_y = dict(
                                    type = 'data', # value of error bar given in data coordinates
                                    array = sigmaVs[numDays:],
                                    visible = True)
                                 )

kalFiltAndActualLayout = go.Layout(title='New US Cases over Time (with 7-day Predictions)', xaxis_title='Date', yaxis_title='# of New Cases')
kalFiltAndActualPlot = go.Figure(data=[kalFiltNewCasesData,actualNewCasesData,kalFiltNewCasesPreds], layout=kalFiltAndActualLayout)

kalFiltAndActualPlot.show()

<font size="4">Three State Model</font>

In [None]:
# Three state model
numDays = usCases.shape[0]
numPreds = 7
twoState = False

kalFiltXs, kalFiltVs, kalFiltAs, kalFiltPs, xPreds, vPreds, aPreds, pPreds, sigmaXs, sigmaVs, sigmaAs = kalmanFilter(usCases, numDays, numPreds, twoState, q=0.75, r=0.1)

In [None]:
kalFiltCumulCasesData = go.Scatter(x=usData.date, y=kalFiltXs, name="Kalman Filter (Actual Data)")
actualCumulCasesData = go.Scatter(x=usData.date, y=usCases, name="Actual Data")

kalFiltCumulCasesPreds = go.Scatter(x=next7Days_str, y=xPreds, name="Kalman Filter (Predictions)", 
                                    error_y = dict(
                                    type = 'data', # value of error bar given in data coordinates
                                    array = sigmaXs[numDays:],
                                    visible = True)
                                   )

kalFiltAndActualLayout = go.Layout(title='US Cases over Time (with 7-day Predictions)', xaxis_title='Date', yaxis_title='# of Cases')
kalFiltAndActualPlot = go.Figure(data=[kalFiltCumulCasesData,actualCumulCasesData,kalFiltCumulCasesPreds], layout=kalFiltAndActualLayout)

kalFiltAndActualPlot.show()

In [None]:
kalFiltNewCasesData = go.Scatter(x=usData.date[1:], y=kalFiltVs, name="Kalman Filter (Actual Data)")
actualNewCasesData = go.Scatter(x=usData.date[1:], y=np.diff(usCases), name="Actual Data")
kalFiltNewCasesPreds = go.Scatter(x=next7Days_str, y=vPreds, name="Kalman Filter (Predictions)",
                                    error_y = dict(
                                    type = 'data', # value of error bar given in data coordinates
                                    array = sigmaVs[numDays:],
                                    visible = True)
                                 )

kalFiltAndActualLayout = go.Layout(title='New US Cases over Time (with 7-day Predictions)', xaxis_title='Date', yaxis_title='# of New Cases')
kalFiltAndActualPlot = go.Figure(data=[kalFiltNewCasesData,actualNewCasesData,kalFiltNewCasesPreds], layout=kalFiltAndActualLayout)

kalFiltAndActualPlot.show()

In [None]:
deltaDailyCases = np.diff(np.diff(usCases))

kalFiltNewCasesData = go.Scatter(x=usData.date[2:], y=kalFiltAs, name="Kalman Filter (Actual Data)")
actualNewCasesData = go.Scatter(x=usData.date[2:], y=deltaDailyCases, name="Actual Data")
actualNewCasesFilteredData =go.Scatter(x=usData.date[2:], y=nDayAverage(deltaDailyCases, 7), name="Actual Data (7 Day Average)")
kalFiltNewCasesPreds = go.Scatter(x=next7Days_str, y=aPreds, name="Kalman Filter (Predictions)",
                                    error_y = dict(
                                    type = 'data', # value of error bar given in data coordinates
                                    array = sigmaAs[numDays:],
                                    visible = True)
                                 )

kalFiltAndActualLayout = go.Layout(title='Change in Daily US Cases over Time (with 7-day Predictions)', xaxis_title='Date', yaxis_title='Change in Daily Cases')
kalFiltAndActualPlot = go.Figure(data=[kalFiltNewCasesData,actualNewCasesData,actualNewCasesFilteredData,kalFiltNewCasesPreds], layout=kalFiltAndActualLayout)

kalFiltAndActualPlot.show()
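
The three-state model adds the change in daily cases as a third state. A short sketch of how such a state could be projected forward over the prediction horizon is given below, assuming a standard constant-acceleration transition matrix with a one-day step; whether the notebook's `kalmanFilter` uses exactly this matrix is not shown in this section, and `propagate` is a hypothetical helper.

```python
import numpy as np

# Assumed constant-acceleration transition for a one-day step:
# state = [cumulative cases, daily new cases, change in daily cases]
F3 = np.array([[1.0, 1.0, 0.5],
               [0.0, 1.0, 1.0],
               [0.0, 0.0, 1.0]])

def propagate(state, num_steps):
    """Project the state forward num_steps days with no new measurements."""
    x = np.asarray(state, dtype=float)
    trajectory = []
    for _ in range(num_steps):
        x = F3 @ x
        trajectory.append(x.copy())
    return np.array(trajectory)

forecast = propagate([1000.0, 100.0, 10.0], 7)
print(forecast[:, 0])   # projected cumulative cases for each of the 7 days
```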

<font size="5"><b>New Cases/Deaths per 100k per US State (Geographical Plot)</b></font>

In [None]:
# ADDED DROPDOWN MENU code
allStateNames = np.array(usStateData.state)
allStateNames = np.unique(allStateNames)
numStates = allStateNames.shape[0]

populationData = pd.read_csv('../input/population-data/Population_v2.csv')
populationData = np.array(populationData)

avgNewStateCasesPer100k = np.zeros(numStates,)
avgNewStateDeathsPer100k = np.zeros(numStates,)

allStateNames_abbrev = []
for i in tqdm(range(numStates)):
    allStateCases = np.array(usStateData.loc[usStateData.state == allStateNames[i]].cases)
    allStateDeaths = np.array(usStateData.loc[usStateData.state == allStateNames[i]].deaths)
    
    # Get population data
    stateRow = populationData[np.where(populationData == allStateNames[i])[0]]
    statePop = stateRow[0,1]
    
    allNewCases = np.diff(allStateCases)
    allNewDeaths = np.diff(allStateDeaths)
    
    avgNewStateCasesPer100k[i] = np.mean(allNewCases[-7:]) / (statePop / 100000)
    avgNewStateDeathsPer100k[i] = np.mean(allNewDeaths[-7:]) / (statePop / 100000)
    
    allStateNames_abbrev.append(us.states.lookup(str(allStateNames[i])).abbr)
    
maxNewCasesPer100k = np.amax(avgNewStateCasesPer100k)
maxNewDeathsPer100k = np.amax(avgNewStateDeathsPer100k)
    
# Create choropleth plots for cases
choroplethNewCases = go.Choropleth(
    name='New Cases Per 100k',
    locations=allStateNames_abbrev,
    locationmode='USA-states',
    z = np.round(avgNewStateCasesPer100k, 3),
    zmin = 0,
    zmax = maxNewCasesPer100k,
    colorscale=[[0,'green'], [min(1,3/maxNewCasesPer100k),'green'], [min(1,5/maxNewCasesPer100k),'yellow'], [min(1,10/maxNewCasesPer100k),'yellow'], 
                [min(1,31/maxNewCasesPer100k),'red'], [1,'red']],
    autocolorscale=False,
    text='New Cases per 100k', 
    marker_line_color='grey',
    colorbar_title="New Cases Per 100k (7 Day Average)"
)
choroplethNewDeaths = go.Choropleth(
    name='New Deaths Per 100k',
    locations=allStateNames_abbrev,
    locationmode='USA-states',
    z = np.round(avgNewStateDeathsPer100k, 3),
    colorscale=["green","yellow","red"],
    autocolorscale=False,
    text='New Deaths per 100k', 
    marker_line_color='grey',
    colorbar_title="New Deaths per 100k (7 Day Average)"
)

newCasesDeathsPlot = go.Figure(data=[choroplethNewCases,choroplethNewDeaths])

opts = ['New Cases Per 100k','New Deaths Per 100k']
newCasesDeathsPlot.update_layout(
    title_text='New Cases/Deaths per 100k per State (7 Day Average)',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True,
        lakecolor='rgb(255, 255, 255)'
    ),
    updatemenus=[
    dict(
        active=-1,
        #buttondefaults=makeDTPlotButtons()[0],
        buttons=makePlotButtons(opts, 1),
        direction="down",
        pad={"r": 10, "t": 10},
        showactive=True,
        x=0,
        xanchor="left",
        y=1.1,
        yanchor="top"
    )
    ]
)

newCasesDeathsPlot.show()
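
The per-state values in the map are the mean of the last seven daily increases, scaled to a population of 100,000. A minimal standalone sketch of that calculation, using plain Python lists and made-up numbers (the function name `avg_new_per_100k` and the sample data are illustrative only):

```python
def avg_new_per_100k(cumulative_counts, population, window=7):
    """Average of the last `window` daily increases, per 100,000 residents."""
    # Daily increases are differences between consecutive cumulative counts
    daily_new = [b - a for a, b in zip(cumulative_counts, cumulative_counts[1:])]
    recent = daily_new[-window:]
    return (sum(recent) / len(recent)) / (population / 100000)

# Illustrative cumulative counts for 8 days (7 daily increases)
cumulative = [1000, 1100, 1250, 1400, 1500, 1650, 1800, 2000]
rate = avg_new_per_100k(cumulative, population=3_500_000)
print(round(rate, 3))
```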

<font size="5"><b>Hospitalization Metrics per US State (last update on 3/07)</b></font>

In [None]:
# Get key hospitalization data
# Hospitalized Currently
# In ICU Currently
# On Ventilator Currently
# covidTrackingDaily.date = pd.to_datetime(covidTrackingDaily.date)
covidTrackingToday = covidTrackingDaily.loc[covidTrackingDaily.date == 20210307]

hosFields = {'State': covidTrackingToday.state, 'inHospital': covidTrackingToday.hospitalizedCurrently,
             'inICU': covidTrackingToday.inIcuCurrently, 'onVentilator': covidTrackingToday.onVentilatorCurrently}
hosData = pd.DataFrame(data=hosFields)

# Remove states with 0 people (due to poor data reporting)
hosDataHospitalized = hosData[hosData.inHospital != 0][['State','inHospital']]
hosDataICU          = hosData[hosData.inICU != 0][['State','inICU']]
hosDataVentilator   = hosData[hosData.onVentilator != 0][['State','onVentilator']]

# Create choropleths for each field
opts = ['Hospitalized', 'In ICU', 'On Ventilator']
choroplethHos = go.Choropleth(
    name=opts[0],
    locations=hosDataHospitalized.State,
    locationmode='USA-states',
    z = hosDataHospitalized.inHospital,
    colorscale=["green","yellow","red"],
    autocolorscale=False,
    marker_line_color='grey',
    colorbar_title="People in Hospital"
)
choroplethICU = go.Choropleth(
    name=opts[1],
    locations=hosDataICU.State,
    locationmode='USA-states',
    z = hosDataICU.inICU,
    colorscale=["green","yellow","red"],
    autocolorscale=False,
    marker_line_color='grey',
    colorbar_title="People in ICU"
)
choroplethVent = go.Choropleth(
    name=opts[2],
    locations=hosDataVentilator.State,
    locationmode='USA-states',
    z = hosDataVentilator.onVentilator,
    colorscale=["green","yellow","red"],
    autocolorscale=False,
    marker_line_color='grey',
    colorbar_title="People on Ventilator"
)


# Create figure and set up layout
hosPlot = go.Figure(data=[choroplethHos,choroplethICU,choroplethVent])

hosPlot.update_layout(
    title_text='Hospitalization Metrics per US State',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True,
        lakecolor='rgb(255, 255, 255)'),
    updatemenus=[
        dict(
            active=-1,
            buttons=makePlotButtons(opts, 1),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0,
            xanchor="left",
            y=1.1,
            yanchor="top"
        )
    ]
)

# Show plot
hosPlot.show()

<font size="5"><b>Current Metrics per US County (Geographical Plot)</b></font>

In [None]:
arr = np.empty((5,), dtype=str)
arr[0] = 'CT'
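
The small test cell above exposes a NumPy detail that matters for the county tables: an array created with `dtype=str` and no explicit length holds one-character strings, so longer values are silently truncated on assignment. The sketch below contrasts this with `dtype=object`, which stores full Python strings; the variable names are illustrative.

```python
import numpy as np

truncated = np.empty((2,), dtype=str)      # becomes dtype '<U1'
truncated[0] = 'CT'                        # silently stored as 'C'

names = np.empty((2,), dtype=object)       # holds arbitrary Python strings
names[0] = 'Fairfield, Connecticut'

print(truncated.dtype, repr(truncated[0]))
print(repr(names[0]))
```

A truncated name can never equal the full "County, State" string, so any later lookup by full name would find no matching row.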

In [None]:
# Obtain unique FIPS codes
uniqueFIPS = np.unique(usCountyData['fips'])
uniqueFIPS = uniqueFIPS[~np.isnan(uniqueFIPS)]
length = uniqueFIPS.shape[0]

countyCases = np.zeros(length,)
countyDeaths = np.zeros(length,)

# dtype=object keeps full strings (dtype=str would truncate every name to one character)
allStates = np.empty((length,), dtype=object)
allCounties = np.empty((length,), dtype=object)
fullNames = np.empty((length,), dtype=object)

# Load in US County Population Data
usCountyPopData = pd.read_csv('../input/populationdatauscounties/usCountyPopulationData.csv')

for i in tqdm(range(length)):
    allCasesForCounty = np.array(usCountyData.loc[usCountyData['fips'] == uniqueFIPS[i]].cases)
    allDeathsForCounty = np.array(usCountyData.loc[usCountyData['fips'] == uniqueFIPS[i]].deaths)
    
    countyCases[i] = allCasesForCounty[-1]
    countyDeaths[i] = allDeathsForCounty[-1]
    
    stateName = np.array(usCountyData.loc[usCountyData['fips'] == uniqueFIPS[i]].state)[0]
    countyName = np.array(usCountyData.loc[usCountyData['fips'] == uniqueFIPS[i]].county)[0]
    fullName = countyName + ", " + stateName
    
    allStates[i] = stateName
    allCounties[i] = countyName
    fullNames[i] = fullName

# Fix FIPS codes to length 5 strings
int_fips = [int(fips) for fips in uniqueFIPS]
form_fips = ["%05d" % fips for fips in int_fips]
numEntries = len(fullNames)
blankTableSpace = np.zeros(numEntries,).tolist()

d = {'State': allStates, 'County': allCounties, 'FullName': fullNames, 'FIPS': form_fips, 'Population': blankTableSpace, 
     'Cases': countyCases.tolist(), 'NewCasesPer100k': blankTableSpace, 'DT_Cases': blankTableSpace,
     'Deaths': countyDeaths.tolist(), 'NewDeathsPer100k': blankTableSpace,'DT_Deaths': blankTableSpace, 
     'SeverityIndex': blankTableSpace}
choroplethData = pd.DataFrame(data=d)    
    
maxDT_Cases = np.amax(choroplethData.DT_Cases)
        
# Compute doubling times for cases and deaths per county
stateNames = np.unique(usCountyData.state)
numStates = stateNames.shape[0]

for i in tqdm(range(numStates)):
    stateData = usCountyData.loc[usCountyData.state == stateNames[i]]
    stateCounties = np.unique(stateData.county)
    stateCounties = np.delete(stateCounties, np.where(stateCounties == 'Unknown'))           # Don't add unknown county to county list
    
    for j in range(stateCounties.shape[0]):
        # Get ALL county data
        countyData = stateData.loc[stateData.county == stateCounties[j]]
        
        # Get cases and deaths for the county
        countyCases = np.array(countyData.cases)
        countyDeaths = np.array(countyData.deaths)
        
        fullName = stateCounties[j] + ', ' + stateNames[i]
        
        # Normalize cases/deaths per 100k and obtain severity index
        if (usCountyPopData.loc[usCountyPopData.FullName == fullName].size != 0):
            countyPop = np.array(usCountyPopData.loc[usCountyPopData.FullName == fullName].Population)[0]
            countyCasesPer100k = countyCases[-1] / (countyPop / 100000)
            countyDeathsPer100k = countyDeaths[-1] / (countyPop / 100000)
            
            newCasesData = np.diff(countyCases)
            newCasesPer100k = (newCasesData / (countyPop / 100000)).astype(int)
            newCasesPer100kFiltered = nDayAverage(newCasesPer100k, 7)
            
            newDeathsData = np.diff(countyDeaths)
            newDeathsPer100k = (newDeathsData / (countyPop / 100000)).astype(int)
            newDeathsPer100kFiltered = nDayAverage(newDeathsPer100k, 7)
            
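            severity_index = np.mean(newCasesPer100kFiltered[-7:])
        else:
            # No population entry for this county: skip it instead of reusing the previous county's values
            continue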
        
        try:
            county_Latest_DT_Cases = allDoublingTimes(countyCases)[-1]
            county_Latest_DT_Deaths = allDoublingTimes(countyDeaths)[-1]
        except Exception:                                      # Fall back to 0 when a doubling time cannot be computed
            county_Latest_DT_Cases = 0
            county_Latest_DT_Deaths = 0
        
        whereToAddData = choroplethData.loc[choroplethData.FullName == fullName]
        
        if (whereToAddData.size != 0 and newCasesPer100kFiltered.shape[0] != 0):                        # Only write data if choroplethData has an entry for that county
            whereToAddDataIdx = whereToAddData.index[0]

            choroplethData.at[whereToAddDataIdx, 'Population'] = countyPop.astype(int)
            
            choroplethData.at[whereToAddDataIdx, 'DT_Cases'] = county_Latest_DT_Cases
            choroplethData.at[whereToAddDataIdx, 'DT_Deaths'] = county_Latest_DT_Deaths
            
            choroplethData.at[whereToAddDataIdx, 'NewCasesPer100k'] = np.round(newCasesPer100kFiltered[-1], 3)
            choroplethData.at[whereToAddDataIdx, 'NewDeathsPer100k'] = np.round(newDeathsPer100kFiltered[-1], 3)
            
            choroplethData.at[whereToAddDataIdx, 'SeverityIndex'] = severity_index

# Normalize severity index data before plotting
severityIdxs = np.array(choroplethData.SeverityIndex)
min_max_scaler = MinMaxScaler()
severityIdxsNorm = min_max_scaler.fit_transform(severityIdxs.reshape(-1,1))
choroplethData['SeverityIndex'] = np.round(severityIdxsNorm, 2).ravel()   # .at only accepts scalar labels, so assign the whole column

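# Load in county geometry (parsed to a dict: Plotly expects a GeoJSON object or a URL string)
import json
import urllib.request
with urllib.request.urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.loads(response.read().decode('utf-8'))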

maxDT_Cases = np.amax(np.array(choroplethData.DT_Cases))

# # Get colors for DT colors
# DT_Cases_Int = (choroplethData.DT_Cases).astype(int)

# colors = np.full((choroplethData.DT_Cases.shape[0],), 'blue')
# for i in range (0,redHigh):
#     colors = np.where(DT_Cases_Int == i, 'red', colors)
# for i in range (yellowLow,yellowHigh + 1):
#     colors = np.where(DT_Cases_Int == i, 'yellow', colors)
# colors = np.where(DT_Cases_Int > greenLow, 'green', colors) 
    
# Create Choropleth Plot
# fig = px.choropleth(choroplethData, geojson=counties, locations='FIPS', color='DT_Cases',
#                     color_continuous_scale="RdYlGn",
#                     scope="usa",
#                     hover_name='FullName', hover_data=['Population','Cases','CasesPer100k','Deaths','DeathsPer100k','DT_Cases','DT_Deaths','SeverityIndex'],
#                     )
# fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
# fig.show()

c = choroplethData.astype(str)

text = '<b>' + c['FullName'] + '</b>' + '<br>' + \
    'Cases: ' + c['Cases'] + '<br>' + \
    'NewCasesPer100k: ' + c['NewCasesPer100k'] + '<br>' + \
    'DT_Cases: ' + c['DT_Cases'] + '<br>' + \
    'Deaths: ' + c['Deaths'] + '<br>' + \
    'NewDeathsPer100k: ' + c['NewDeathsPer100k'] + '<br>' + \
    'DT_Deaths: ' + c['DT_Deaths'] + '<br>' + \
    'Severity Index: ' + c['SeverityIndex']

choroplethCountyMetrics = go.Choropleth(
    locations=choroplethData.FIPS,
    geojson=counties,
    z = np.round(choroplethData.DT_Cases, 3),
    zmin = 0,
    zmax = maxDT_Cases,
    colorscale=[[0,'red'], [redHigh/maxDT_Cases,'red'], [yellowLow/maxDT_Cases,'yellow'], [yellowHigh/maxDT_Cases,'yellow'], [greenLow/maxDT_Cases,'green'], [1,'green']],
    autocolorscale=False,
    text=text, 
    marker_line_color='black',
    colorbar_title="Doubling Time for Cases",
)
countyMetricsPlot = go.Figure(data=choroplethCountyMetrics)

countyMetricsPlot.update_layout(
    title_text='US County Metrics',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
        showlakes=True,
        lakecolor='rgb(255, 255, 255)'),
)

countyMetricsPlot.show()
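
County FIPS codes are read as floats, but the county GeoJSON is keyed by five-character strings with leading zeros. A minimal sketch of the conversion step, with `format_fips` as a hypothetical helper name:

```python
def format_fips(code):
    """Convert a numeric FIPS code (int or float) to a zero-padded 5-character string."""
    return f"{int(code):05d}"

print([format_fips(c) for c in (1001.0, 9001.0, 36061.0)])
```

Codes for states with a single-digit prefix, such as Alabama (01) or Connecticut (09), would otherwise lose their leading zero and fail to match any county shape.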

<font size="5"><b>New Cases Per 100k and Positivity Rates for Each State (Time Series)</b></font>

In [None]:
def stabilityIndices(filteredNewDataPer100k):
    """Day-over-day ratio of the filtered series; entries 0 and 1, and days whose previous value is not positive, stay 0."""
    length = filteredNewDataPer100k.shape[0]
    
    indices = np.zeros(length,)          # Separate name so the function itself is not shadowed
    for i in range(length - 1,1,-1):
        if (filteredNewDataPer100k[i - 1] > 0):
            indices[i] = filteredNewDataPer100k[i] / filteredNewDataPer100k[i - 1]
    
    return indices
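
The stability index is the ratio of each day's filtered value to the previous day's, so values above 1 indicate growth and values below 1 indicate decline. The same idea can be sketched without an explicit loop. Note that this version, with the hypothetical name `stability_ratio`, also fills index 1, which the loop in the cell above leaves at zero.

```python
import numpy as np

def stability_ratio(series):
    """Ratio of each value to the previous one; 0 where the previous value is not positive."""
    values = np.asarray(series, dtype=float)
    ratios = np.zeros_like(values)
    prev, cur = values[:-1], values[1:]
    positive = prev > 0
    # ratios[1:] is a view, so the masked assignment writes into ratios itself
    ratios[1:][positive] = cur[positive] / prev[positive]
    return ratios

ratios = stability_ratio([10, 12, 0, 6, 9])
print(ratios)
```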

In [None]:
### Check whether today is Wednesday (isoweekday: Monday = 1, so Wednesday = 3)
# date.today().isoweekday() == 3
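
For reference, `isoweekday()` numbers the days from Monday = 1 to Sunday = 7, so a value of 3 corresponds to Wednesday. A small sketch of the weekday check with fixed dates (the helper name `is_wednesday` is illustrative):

```python
import datetime

def is_wednesday(day):
    """True when the given date falls on a Wednesday (ISO weekday 3)."""
    return day.isoweekday() == 3

print(is_wednesday(datetime.date(2020, 12, 23)))   # a Wednesday
print(is_wednesday(datetime.date(2020, 12, 22)))   # a Tuesday
```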

In [None]:
### Change dates needed: Update every Wednesday (09/09; 09/16; ...)
# Obtain case/death data for ALL days for ALL 50 states
allStateNames = np.array(usStateData.state)
allStateNames = np.unique(allStateNames)

allDates = np.array(usStateData.date)

stateCaseData = np.array(usStateData.cases)
stateDeathData = np.array(usStateData.deaths)

numStates = allStateNames.shape[0]

# Get population data
populationData = pd.read_csv('../input/population-data/Population_v2.csv')
populationData = np.array(populationData)

# Get next14Days as string
numPreds = 14
next14Days_str = []
date = datetime.datetime(2020,12,22)              # Latest date available in data
for i in range(numPreds): 
    next14Days_str.append(date.strftime('%Y-%m-%d'))
    date += datetime.timedelta(days=1)

# Create starter table for predicted CT Travel Ban states
blankTableSpace = np.zeros(numStates,).tolist()
d = {'State': allStateNames, 'NewCasesPer100k': blankTableSpace, 'StdDev_Cases': blankTableSpace, 'PositivityRate': blankTableSpace}
CT_TravelBan_12_22 = pd.DataFrame(data=d)
CT_TravelBan_12_29 = pd.DataFrame(data=d)
    
severity_index = np.zeros(numStates,)     # For cases
allTraces = []
for i in tqdm(range(numStates)):   
    if (populationData[np.where(populationData == allStateNames[i])[0]].shape[0] != 0):
        
        # Check if population data is available
        stateRow = populationData[np.where(populationData == allStateNames[i])[0]]
        statePop = stateRow[0,1]
    
        stateData = usStateData.loc[usStateData.state == allStateNames[i]]
        
        caseData = np.array(stateData.cases)
        dates = stateData.date
        
        newCasesData = np.diff(caseData)
        newCasesPer100k = (newCasesData / (statePop / 100000)).astype(int)
        newCasesPer100kFiltered = nDayAverage(newCasesPer100k, 7)
        severity_index[i] = np.mean(newCasesPer100kFiltered[-7:])
        
        deathData = stateDeathData[np.where(usStateData.state == allStateNames[i])]
        newDeathsData = np.diff(deathData)
        newDeathsPer100k = (newDeathsData / (statePop / 100000)).astype(int)
        newDeathsPer100kFiltered = nDayAverage(newDeathsPer100k, 7)
    
        dates = allDates[np.where(usStateData.state == allStateNames[i])]
        #datesNum = np.arange(1,dates.shape[0])
        
        # Calculate positivity rate (filter the tracking data for this state once)
        stateAbbrev = fullStateNameToAbbrev(np.array([allStateNames[i]]))[0]
        stateTracking = covidTrackingDaily.loc[covidTrackingDaily['state'] == stateAbbrev]
        dates_covidtracking = stateTracking.date
        posCasesForState = stateTracking.positive
        negCasesForState = stateTracking.negative
        pendingCasesForState = stateTracking.pending
        posRates = positivityRates(posCasesForState, negCasesForState, pendingCasesForState)

        
        # Run 14 day Kalman Predictions
        #dates = dates[:np.where(dates == '2020-06-24')[0][0] + 1]                  
        numDays = dates.shape[0] - 1
        twoState = True
        kalFilt_NewCPer100k, kalFilt_DeltaCPer100k, kalFiltPs, NewCPer100kPreds, vPreds, pPreds, sigmaNewCases, sigmaVs = kalmanFilter(newCasesPer100kFiltered, numDays, numPreds, twoState, q=0.05, r=0.1)
        newCasesPer100kPreds = go.Scatter(x=next14Days_str, y=NewCPer100kPreds, name="Predicted New Cases Per 100k (14 Days)",
                                        error_y = dict(
                                        type = 'data', # value of error bar given in data coordinates
                                        array = sigmaNewCases[numDays:],   # Use this state's sigmas, not the US-level sigmaXs
                                        visible = False))
        kalFilt_PosRate, kalFilt_DeltaPosRates, kalFiltPs, PosRatePreds, vPreds, pPreds, sigmaPosRates, sigmaVs = kalmanFilter(posRates[::-1], posRates.shape[0], numPreds, twoState, q=1, r=1)
        posRatePreds = np.clip(PosRatePreds, a_min=0, a_max=100)
        posRatePredsData = go.Scatter(x=next14Days_str, y=posRatePreds, name="Predicted Positivity Rates (14 Days)")
        
        # Add predicted new cases per 100k on 12/22 and 12/29 to CT travel ban data table
        idx12_22 = np.where(np.array(next14Days_str) == '2020-12-22')[0][0]
        idx12_29 = np.where(np.array(next14Days_str) == '2020-12-29')[0][0]
        
        newCasesPer100kPred_12_22 = NewCPer100kPreds[idx12_22]
        newCasesPer100kPred_12_29 = NewCPer100kPreds[idx12_29]
        
        predPosRate_12_22 = posRatePreds[idx12_22]
        predPosRate_12_29 = posRatePreds[idx12_29]
        
        CT_TravelBan_12_22.at[i, 'NewCasesPer100k'] = newCasesPer100kPred_12_22
        CT_TravelBan_12_22.at[i, 'StdDev_Cases'] = sigmaNewCases[np.where(sigmaNewCases == 0)[0][-1] + idx12_22]
        CT_TravelBan_12_22.at[i, 'PositivityRate'] = predPosRate_12_22
        
        CT_TravelBan_12_29.at[i, 'NewCasesPer100k'] = newCasesPer100kPred_12_29
        CT_TravelBan_12_29.at[i, 'StdDev_Cases'] = sigmaNewCases[np.where(sigmaNewCases == 0)[0][-1] + idx12_29]
        CT_TravelBan_12_29.at[i, 'PositivityRate'] = predPosRate_12_29
        
        
        # Now create the interactive plots
        # np.diff drops the first day, so the daily series line up with dates[1:]
        newCasesPer100kData = go.Scatter(x=dates[1:], y=newCasesPer100k, name="New Cases Per 100k")
        newDeathsPer100kData = go.Scatter(x=dates[1:], y=newDeathsPer100k, name="New Deaths Per 100k")
        newCasesPer100kData_Filtered = go.Scatter(x=dates[1:], y=newCasesPer100kFiltered, name="New Cases Per 100k (7 Day Average)")
        newDeathsPer100kData_Filtered = go.Scatter(x=dates[1:], y=newDeathsPer100kFiltered, name="New Deaths Per 100k (7 Day Average)")
        stabilityIndicesCases = go.Scatter(x=dates[1:], y=stabilityIndices(newCasesPer100kFiltered), name="Stability Indices")
        positivityRateData = go.Scatter(x=dates_covidtracking[1:], y=nDayAverage(posRates, 7), name="Positivity Rates (7 Day Average)")
        #tenNewCDPer100k = go.Scatter(x=np.concatenate((dates,next14Days_str)), y=np.ones(dates.shape[0] + numPreds)*10, name="10 New Cases/Deaths Per 100k Line")
        
        # Append plot traces
        allTraces.append(newCasesPer100kData)
        allTraces.append(newDeathsPer100kData)
        allTraces.append(newCasesPer100kData_Filtered)
        allTraces.append(newDeathsPer100kData_Filtered)
        allTraces.append(stabilityIndicesCases)
        allTraces.append(positivityRateData)

# Create Plotly figure and layout object
per100kLayout = go.Layout(title='Metrics Per 100k in Selected State over Time', xaxis_title='Date', yaxis_title='Value',
                              updatemenus=[
                                dict(
                                    active=-1,
                                    #buttondefaults=makeDTPlotButtons()[0],
                                    buttons=makePlotButtons(allStateNames, 6),
                                    direction="down",
                                    pad={"r": 10, "t": 10},
                                    showactive=True,
                                    x=1,
                                    xanchor="left",
                                    y=1.25,
                                    yanchor="top"
                                )
                               ],
                          width=800, height=600)
per100kPlot = go.Figure(data=allTraces, layout=per100kLayout)     
per100kPlot.update_xaxes(range=['2020-01-01', '2021-04-11'])
        
# Show the plot
per100kPlot.show()

<font size="5"><b>CT Travel Ban List for 12/22</b></font>

In [None]:
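### Change dates needed: every Tuesday
pd.set_option('display.precision', 2)
# Count only the states above the 10 new cases per 100k threshold that are displayed
travelBanList_12_22 = CT_TravelBan_12_22[CT_TravelBan_12_22.NewCasesPer100k > 10]
print(str(len(travelBanList_12_22)) + ' states:')
travelBanList_12_22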

<font size="5"><b>CT Travel Ban List for 12/29</b></font>

In [None]:
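### Change dates needed: every Tuesday
pd.set_option('display.precision', 2)
# Count only the states above the 10 new cases per 100k threshold that are displayed
travelBanList_12_29 = CT_TravelBan_12_29[CT_TravelBan_12_29.NewCasesPer100k > 10]
print(str(len(travelBanList_12_29)) + ' states:')
travelBanList_12_29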

<font size="5"><b>US New Cases (all days)</b></font>

In [None]:
newCases = np.diff(usCases)
allDates = np.array(usData.date)[1:]
#allDates = np.arange(allDates.shape[0])
dates_len = allDates.shape[0]

# Set up low pass filter for newCases
a = 0.3
Ts = 1
num = [1-a]
den = [1,-a]
fLP = TransferFunction(num,den,dt=Ts)

#newCasesLPF = dlsim(fLP, newCases.tolist(), np.arange(0,13))[1]

# Set up n-day average for newCases
newCases7DayAvg = nDayAverage(newCases, 7)


allNewCaseData = go.Bar(x=allDates, y=newCases, name="Actual Data")
allNewCaseData7DayAvg = go.Scatter(x=allDates, y=newCases7DayAvg, name="7 Day Average")

allNewCasesLayout = go.Layout(title='New US Cases over Time', xaxis_title='Date', yaxis_title='# of New Cases')
allNewCasesPlot = go.Figure(data=[allNewCaseData,allNewCaseData7DayAvg], layout=allNewCasesLayout)

allNewCasesPlot.show()
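
The low pass filter set up in the cell above is never applied, because the `dlsim` call is commented out. As a minimal sketch of what it would do, the example below runs the same transfer function over a short, made-up series (`new_cases_demo` is illustrative data, not notebook data) and checks it against the equivalent recursion y[n+1] = a*y[n] + (1-a)*u[n]. It assumes `scipy.signal` is available, as the `TransferFunction` call in the notebook implies.

```python
import numpy as np
from scipy.signal import TransferFunction, dlsim

# Same first-order discrete filter as in the cell above: H(z) = (1 - a) / (z - a)
a = 0.3
fLP = TransferFunction([1 - a], [1, -a], dt=1)

# Illustrative daily counts (made up for this sketch)
new_cases_demo = np.array([100.0, 120.0, 90.0, 150.0, 130.0, 160.0, 140.0])

# For a transfer function, dlsim returns (time, output); output has shape (n, 1)
t_out, filtered = dlsim(fLP, new_cases_demo)
filtered = filtered.ravel()

# The same filter written as a recursion, starting from a zero state
manual = np.zeros_like(new_cases_demo)
for n in range(1, len(new_cases_demo)):
    manual[n] = a * manual[n - 1] + (1 - a) * new_cases_demo[n - 1]

print(np.allclose(filtered, manual))
```

With this form of the filter the output lags the input by one sample and starts at zero, so the first few filtered values understate the series.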

<font size="5"><b>US New Cases over the last 14 days</b></font>

In [None]:
days_in_past = 14

days_in_future = 10
future_forcast = np.arange(len(dates) + days_in_future)
adjusted_dates = future_forcast[:-days_in_future]

dates_len = dates.shape[0]
dates_zoom = dates[dates_len - days_in_past - 1:]
adjusted_dates_zoom = adjusted_dates[dates_len - days_in_past:]

# One extra day is needed so that np.diff returns days_in_past values
usCases_zoom = usCases[-(days_in_past + 1):]
newCases = np.diff(usCases_zoom)
dates = np.array(usData.date)[-days_in_past:]

# # Set up low pass filter for newCases
# a = 0.3
# Ts = 1
# num = [1-a]
# den = [1,-a]
# fLP = TransferFunction(num,den,dt=Ts)

# newCasesLPF = dlsim(fLP, newCases.tolist(), np.arange(0,14))[1]

# Set up n-day average for newCases
newCases7DayAvg = nDayAverage(newCases, 7)

# Create interactive plot
newCaseData = go.Bar(x=dates, y=newCases, name="Actual Data")
newCaseData7DayAvg = go.Scatter(x=dates, y=newCases7DayAvg, name="7 Day Average")

newCasesLayout = go.Layout(title='New US Cases over Time (last 14 days)', xaxis_title='Date', yaxis_title='# of New Cases')
newCasesPlot = go.Figure(data=[newCaseData,newCaseData7DayAvg], layout=newCasesLayout)

newCasesPlot.show()
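
`nDayAverage` is defined earlier in the notebook and is not shown in this section. As a rough illustration only, the sketch below implements a trailing n-day average that returns one value per input day, shrinking the window at the start of the series. The name `trailing_average` is hypothetical, and the notebook's own helper may handle the first days differently.

```python
import numpy as np

def trailing_average(values, n):
    """Trailing mean over the last n values (hypothetical stand-in for nDayAverage)."""
    values = np.asarray(values, dtype=float)
    # Cumulative sum with a leading zero so any window sum is one subtraction
    csum = np.cumsum(np.insert(values, 0, 0.0))
    out = np.empty_like(values)
    for i in range(len(values)):
        lo = max(0, i - n + 1)          # window start, clipped at the first day
        out[i] = (csum[i + 1] - csum[lo]) / (i + 1 - lo)
    return out

demo = trailing_average([1, 2, 3, 4], 2)
print(demo)
```

Note that a 7 day average over only 14 days of data relies on shortened windows for the first 6 values, so the start of the smoothed line is noisier than the end.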

<font size="5"><b>Simple linear regression on US cases (last 14 days)</b></font>

In [None]:
dates_len = np.array(usData.date).shape[0]
# Day indices of the last 14 rows, so they line up with usCases[-14:]
adjusted_dates_zoom = np.arange(dates_len - 14, dates_len)
adjusted_dates_zoom_fix = adjusted_dates_zoom.reshape(-1,1)
usCases_zoom = usCases[-14:]
dates_zoom = np.array(usData.date)[-14:]

# Create linear regression object
regr = LinearRegression()

# Train linear regression model on case data
regr.fit(adjusted_dates_zoom_fix, usCases_zoom)

# Get case predictions from linear regression model
usCases_zoom_pred = regr.predict(adjusted_dates_zoom_fix)

# Get R^2 score
r_score = regr.score(adjusted_dates_zoom_fix, usCases_zoom)

# Print out equation (coef_ is an array, so take its single slope value)
print('# of cases = ' + str(regr.coef_[0]) + '*days_since_1_22_2020' + ' + ' + str(regr.intercept_))

usCasesZoomData = go.Scatter(x=dates_zoom, y=usCases_zoom, name="Actual Data")
usCasesZoomDataLinReg = go.Scatter(x=dates_zoom, y=usCases_zoom_pred, name="Linear Regression")

usCasesLinRegLayout = go.Layout(title='US Cases over Time (last 14 days) - R^2 = ' + str(np.round(r_score,3)), xaxis_title='Date', yaxis_title='# of Cases')
usCasesLinReg = go.Figure(data=[usCasesZoomData,usCasesZoomDataLinReg], layout=usCasesLinRegLayout)

usCasesLinReg.show()
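
The R^2 value in the plot title comes from `regr.score`, which reports the coefficient of determination. The sketch below computes the same quantity by hand on made-up data, as 1 minus the ratio of the residual sum of squares to the total sum of squares. It uses `np.polyfit` in place of `LinearRegression` to keep the example small; both fit an ordinary least squares line. The `days` and `cases` arrays are illustrative, not notebook data.

```python
import numpy as np

# Illustrative 14 day series: a linear trend plus small deviations (made up)
days = np.arange(14, dtype=float)
noise = np.array([3, -2, 1, 0, -1, 2, -3, 1, 0, -2, 2, -1, 1, -1], dtype=float)
cases = 1000.0 + 50.0 * days + noise

# Least squares line (degree 1 polynomial): returns slope, then intercept
slope, intercept = np.polyfit(days, cases, 1)
fitted = slope * days + intercept

# Coefficient of determination: 1 - SS_res / SS_tot
ss_res = np.sum((cases - fitted) ** 2)
ss_tot = np.sum((cases - cases.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(round(r_squared, 3))
```

Cumulative case counts almost always rise steadily over two weeks, so a high R^2 on this window says little about how well the line will extrapolate.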

<font size="5"><b>US Case Predictions for the next 7 Days</b></font>

In [None]:
### Change dates needed: to today's dates
start = adjusted_dates_zoom[-1] + 1
end = adjusted_dates_zoom[-1] + 8
next7Days = np.arange(start, end).reshape(-1,1)

predictions = regr.predict(next7Days)

# Get next7Days as string
next7Days_str = pd.date_range(start=(today + timedelta(days = 1)), end=(today + timedelta(days = 7)))
    
# Now plot
next7DaysData = go.Scatter(x=next7Days_str, y=predictions, name="Predictions")

next7DaysLayout = go.Layout(title='US Cases over Time (last 14 days) with 7 Day Prediction', xaxis_title='Date', yaxis_title='# of Cases')
next7DaysPlot = go.Figure(data=[usCasesZoomData,usCasesZoomDataLinReg,next7DaysData], layout=next7DaysLayout)

next7DaysPlot.show()


# Also print out the predictions
print('Date:' + '\t\t' + 'Cases:')
for i in range(7):
    print(str(next7Days_str[i].strftime("%Y-%m-%d")) + '\t' + str(int(predictions[i])))
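
The prediction step above can be sketched end to end on made-up numbers: fit a line to 14 day indices, evaluate it at the next 7 indices, and pair each value with a calendar date. Everything here is an assumption for illustration: the `cases` series, the starting index of 100, and `last_date`. `np.polyfit` again stands in for the notebook's `LinearRegression`.

```python
from datetime import date, timedelta
import numpy as np

# Illustrative cumulative counts for 14 consecutive day indices (made up)
n_obs = 14
day_index = np.arange(100, 100 + n_obs)
cases = 2_000_000 + 25_000 * (day_index - 100)

# Fit the least squares line on the observed window
slope, intercept = np.polyfit(day_index, cases, 1)

# Evaluate the line on the 7 day indices that follow the last observation
next_index = np.arange(day_index[-1] + 1, day_index[-1] + 8)
predicted = slope * next_index + intercept

# Calendar dates for the predictions, starting the day after the last data point
last_date = date(2021, 9, 18)  # hypothetical date of the last observation
next_dates = [last_date + timedelta(days=k) for k in range(1, 8)]

print('Date:' + '\t\t' + 'Cases:')
for d, p in zip(next_dates, predicted):
    print(d.strftime("%Y-%m-%d") + '\t' + str(int(round(p))))
```

The sketch rounds before converting to an integer, whereas `int()` alone truncates toward zero.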

<font size="5"><b>Demographics Data for US (last update on 3/07)</b></font>

In [None]:
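### Change dates needed: check for latest date marked before
# Obtain latest US racial data
latestDate = '2021-03-07'
# Take an explicit copy so the cleanup below does not modify a slice of racialDataUS
racialDataUSToday = racialDataUS.loc[racialDataUS.Date == 20210307].copy()

# Strip thousands-separator commas so the count columns can be cast to float
racialDataUSToday = racialDataUSToday.replace(',', '', regex=True)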

# Get racial data for all states (combined)
#allRaces = ['White','Black','LatinX','Asian','AIAN','NHPI','Multi','Other','Unknown','Hisp','NonHisp','EthUnknown']
allRaces = ['White','Black','Latinx','Asian','AIAN','NHPI','Hisp']

totalRaceCases = []
for race in allRaces:
    if (race != 'Hisp'):
        colName = 'Cases_' + race
    else:
        colName = 'Cases_Ethnicity_Hispanic'
    totalRaceCases.append(np.sum(racialDataUSToday[colName].astype(float)))
allRacialDataCases = pd.DataFrame([totalRaceCases], columns=allRaces)

totalRaceDeaths = []
for race in allRaces:
    if (race != 'Hisp'):
        colName = 'Deaths_' + race
    else:
        colName = 'Deaths_Ethnicity_Hispanic'
    totalRaceDeaths.append(np.sum(racialDataUSToday[colName].astype(float)))
    
allRacialDataDeaths = pd.DataFrame([totalRaceDeaths], columns=allRaces)
        
    
    
# Create pie charts
colorsRace ={'White':'lightcyan',
             'Black':'cyan',
             'Latinx':'royalblue',
             'Asian':'blueviolet',
             'AIAN':'darkblue',
             'NHPI':'crimson',
             #'Multi':'darkolivegreen',
             #'Other':'deeppink',
             #'Unknown':'goldenrod',
             'Hisp':'lightcoral',
             #'NonHisp':'darkred',
             #'EthUnknown':'violet',
            }

# Racial data for Cases
agePlotCases = px.pie(allRacialDataCases, values=totalRaceCases, names=allRaces, color=allRaces,
             color_discrete_map=colorsRace)
agePlotCases.update_traces(title='Racial Data for US Cases')
agePlotCases.show()

# Racial data for Deaths
agePlotDeaths = px.pie(allRacialDataDeaths, values=totalRaceDeaths, names=allRaces, color=allRaces,
             color_discrete_map=colorsRace)
agePlotDeaths.update_traces(title='Racial Data for US Deaths')
agePlotDeaths.show()

<font size="7"><b>Building and Training ML Models</b></font>
<a id='build_train_ML'></a>

<font size="5"><b>Define functions needed for RNN/LSTM model visualization</b></font>

In [None]:
def univariate_data(dataset, start_index, end_index, history_size, target_size):
    """Slice a 1-D series into (history window, label) pairs for the RNN/LSTM."""
    data = []
    labels = []

    # The first usable position needs history_size values before it
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i-history_size, i)
        # Reshape data from (history_size,) to (history_size, 1)
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        labels.append(dataset[i+target_size])
    
    return np.array(data), np.array(labels)
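
As a quick check of what `univariate_data` produces, the sketch below runs it on a toy series. The function is repeated so the example is self-contained, and `toy_series` with a history of 3 is made up for illustration.

```python
import numpy as np

def univariate_data(dataset, start_index, end_index, history_size, target_size):
    data = []
    labels = []

    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size

    for i in range(start_index, end_index):
        indices = range(i - history_size, i)
        # Each sample is the history_size values before position i
        data.append(np.reshape(dataset[indices], (history_size, 1)))
        # The label is the value target_size steps after the window
        labels.append(dataset[i + target_size])

    return np.array(data), np.array(labels)

# Toy series 0..9 with a history of 3 and the next value as the label
toy_series = np.arange(10, dtype=float)
x_demo, y_demo = univariate_data(toy_series, 0, None, 3, 0)

print(x_demo.shape, y_demo.shape)
print(x_demo[0].ravel(), y_demo[0])
```

Each sample has shape `(history_size, 1)`, which is the `(timesteps, features)` layout an LSTM input layer expects.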

In [None]:
univariate_past_history = days_in_past
univariate_future_target = 0
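# NOTE: univariate_data starts at start_index + history_size, so TRAIN_SPLIT must be
# larger than univariate_past_history. With both set to days_in_past, the training
# call below loops over an empty range and returns no windows.
TRAIN_SPLIT = days_in_past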

x_train_uni, y_train_uni = univariate_data(usCases_norm, 0, TRAIN_SPLIT,
                                           univariate_past_history,
                                           univariate_future_target)
x_val_uni, y_val_uni = univariate_data(usCases_norm, TRAIN_SPLIT, None,
                                       univariate_past_history,
                                       univariate_future_target)

<font size="5"><b>Building a Simple LSTM model</b></font>

In [None]:
# # Create the model
# cases_lstm = Sequential()
# cases_lstm.name = "LSTM Model for US Case Prediction"

# cases_lstm.add(LSTM(8, input_shape=(5,1)))
# cases_lstm.add(Dense(1))

# cases_lstm.compile(optimizer='adam', loss='mae')

# # Display the layers
# cases_lstm.summary()

<font size="5"><b>Train the simple LSTM model</b></font>