# Covid19 County-Level EDA Notebook
---
> ### [County-Level EDA Notebook](https://github.com/speediedan/covid19/blob/master/covid19_county_level_EDA.ipynb)
> * **This notebook can be used for manual EDA of county-level hotspot data**

> ### [The "Real-Time" County-Level Dashboard](county_covid_explorer.html):
> * **A "real-time"<sup>[1](#daily-onset-estimation)</sup> county-level dashboard focused on the estimated effective reproduction number (R<sub>t</sub>)<sup>[2](#effective-reproduction-number-estimation)</sup>, 2nd-order growth rates and confirmed infection density for most US counties (those with > 0.03% confirmed infection density and > 1000 cases)**

> ### [The "Real-Time" Choropleth Dashboard](choropleth_covid_county_explorer.html):
> * **State and national choropleths for exploring the geographic distribution of "real-time"<sup>[1](#daily-onset-estimation)</sup> county-level R<sub>t</sub><sup>[2](#effective-reproduction-number-estimation)</sup> along with other relevant epidemiological statistics. Due to resource constraints, the national choropleth represents exclusively R<sub>t</sub> data while the state choropleths include additional county-level metrics. The national choropleth can currently be temporally evolved over a 14-day horizon.**



### Daily Onset Estimation
* It's important to be clear about the sense in which these county-level R<sub>t</sub> estimates are "real-time": the approach outlined in [(Bettencourt & Ribeiro, 2008)](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002185) is applied to estimated daily onset values, which are obtained by convolving the latest [onset-confirmed latency distribution](https://github.com/beoutbreakprepared/nCoV2019/tree/master/latest_data) onto daily reported cases and then adjusting for right-censoring. Because of the latency between case onset and confirmation/reporting, significant changes in local conditions still take some days to be fully reflected in the R<sub>t</sub> estimates. The estimate for a given point in time should improve with each passing day, to a degree roughly correlated with the aforementioned onset-delay distribution.
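
The onset estimation described above can be sketched in plain Python. This is a minimal illustration, not the code in `c19_analysis`: the function names and the three-day delay distribution are hypothetical, and cases whose estimated onset falls before the start of the window are simply dropped.

```python
def confirmed_to_onset(confirmed, p_delay):
    """Redistribute daily confirmed counts back to their estimated onset days.

    confirmed[t] is the number of cases reported on day t.
    p_delay[d] is the probability that confirmation lags onset by d days.
    """
    onset = [0.0] * len(confirmed)
    for t, cases in enumerate(confirmed):
        for d, p in enumerate(p_delay):
            if t - d >= 0:  # onsets before the window are dropped
                onset[t - d] += cases * p
    return onset


def adjust_for_right_censoring(onset, p_delay):
    """Scale up recent days, whose onsets have only partly been reported so far."""
    cumulative, total = [], 0.0
    for p in p_delay:
        total += p
        cumulative.append(total)
    last_day = len(onset) - 1
    adjusted = []
    for t, value in enumerate(onset):
        days_elapsed = last_day - t
        # Fraction of day-t onsets expected to be confirmed by the last day
        observed = cumulative[min(days_elapsed, len(cumulative) - 1)]
        adjusted.append(value / observed if observed > 0 else float("nan"))
    return adjusted


# Hypothetical delay distribution: 20% same day, 50% next day, 30% two days later
p_delay = [0.2, 0.5, 0.3]
reported = [10, 10, 10, 10, 10]
raw_onset = confirmed_to_onset(reported, p_delay)
estimated_onset = adjust_for_right_censoring(raw_onset, p_delay)
```

With a constant reported series, the raw onset curve sags over the last two days because their confirmations have not arrived yet; the right-censoring adjustment restores those days to the level of the earlier ones.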

### Effective Reproduction Number Estimation
   * I've extended [this great notebook](https://github.com/k-sys/covid-19/blob/master/Realtime%20R0.ipynb) to the county level.
   * Importantly, as of 2020.05.12, access to testing is still increasing and test positivity rates are therefore changing at a [substantial rate](https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html). While this testing bias continues to evolve in the near term, point R<sub>t</sub> estimates will be biased above the ground-truth R<sub>t</sub>. There are approaches that can [mitigate this bias to a limited extent](http://freerangestats.info/blog/2020/05/09/covid-population-incidence), but fundamentally we don't have sufficient data to eliminate the bias at this point, so I've deprioritized those model adjustments for now (I may make testing-related adjustments in the future, and PRs are welcome!). Fortunately, as testing access and bias stabilize at a level that increases the validity of confirmed case counts, these R<sub>t</sub> estimates should become increasingly accurate. I expect hotspot monitoring tools such as this one to have utility for a number of months, so this initial period of testing volatility does not nullify their value.
   * The most salient change I've made in the process of the extension is that rather than using a prior of gamma-distributed generation intervals to estimate R (which seems totally reasonable), I'm experimenting with incorporating more locally-relevant information by calculating an R<sub>0</sub> using initial incidence data from each locality.
   * For compute-constrained execution environments, I've also provided (but left disabled) some performance-enhancing functions that cut execution time by about 50% at the cost of ~5% accuracy.
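
As a rough illustration of the Bettencourt & Ribeiro style of estimate referenced above, the sketch below computes a posterior over a grid of candidate R<sub>t</sub> values from a short series of daily onset counts. It is a simplification and not the implementation in `bayesian_rt_est`: it assumes a flat prior, a single constant R<sub>t</sub> over the window, and a hypothetical serial interval of 7 days (`gamma = 1/7`); the function names are illustrative.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of observing k events with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def rt_posterior(cases, r_grid, gamma=1 / 7):
    """Posterior over candidate R_t values given daily case counts.

    Expected cases follow lam = k_prev * exp(gamma * (R_t - 1)),
    where gamma is the reciprocal of the assumed serial interval.
    """
    log_post = [0.0] * len(r_grid)  # flat prior on the grid
    for prev, curr in zip(cases[:-1], cases[1:]):
        if prev <= 0:
            continue  # expected count is undefined without prior-day cases
        for i, r in enumerate(r_grid):
            lam = prev * math.exp(gamma * (r - 1.0))
            log_post[i] += poisson_logpmf(curr, lam)
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # stabilized exponentiation
    total = sum(weights)
    return [w / total for w in weights]


r_grid = [i / 100 for i in range(0, 501)]  # candidate R_t values from 0.00 to 5.00
growing = rt_posterior([100, 110, 121, 133, 146], r_grid)
shrinking = rt_posterior([146, 133, 121, 110, 100], r_grid)
rt_growing = r_grid[growing.index(max(growing))]
rt_shrinking = r_grid[shrinking.index(max(shrinking))]
```

Under these assumptions, a series growing by roughly 10% per day yields a most likely R<sub>t</sub> of about 1.66, while the reversed series yields a value well below 1.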
   
### SEIR Model Notes
* #### At the time the SEIR model component of this notebook was written (2020.03.30), significant uncertainty remained regarding some SARS-CoV-2 parameters. The data fit varied substantially by county, so I used what I perceived (N.B.: with no personal epidemiological expertise!) to be the consensus values, documented below:

| Parameter   | Source  | Reference Value (days, except rate) |
| :---        | :----:  |     ---:            |
| Latent Period   | [Lin et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30117-X/fulltext) | 3   |
| Latent Period   | [Wu et al., 2020](https://www.sciencedirect.com/science/article/pii/S0140673620302609) | 3     |
| Latent Period   | [Li et al., 2020](https://www.medrxiv.org/content/10.1101/2020.03.06.20031880v1.full.pdf) | 2 |
| Serial Interval | [Nishiura et al., 2020](https://www.ijidonline.com/article/S1201-9712(20)30119-3/pdf) | 4.6 |
| Serial Interval | [Li et al., 2020](https://www.nejm.org/doi/pdf/10.1056/NEJMoa2001316?articleTools=true) | 7.5 |
| Incubation Period | [Li et al., 2020](https://www.nejm.org/doi/pdf/10.1056/NEJMoa2001316?articleTools=true) | 5.2 |
| Infectious Period | [Li et al., 2020](https://www.nejm.org/doi/pdf/10.1056/NEJMoa2001316?articleTools=true) | 2.3 |
| Infectious Period | [Zhou et al., 2020](https://www.medrxiv.org/content/10.1101/2020.02.24.20026773v1.full.pdf) | 6 |
| Infectious Period | [Bi et al., 2020](https://www.medrxiv.org/content/10.1101/2020.03.03.20028423v3) | 1.5 |
| Infectious Period | [Kucharski et al., 2020](https://cmmid.github.io/topics/covid19/current-patterns-transmission/wuhan-early-dynamics.html) | 2.9 |
| Time to Hospitalization | [Huang et al., 2020](https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext) | 8 |
| Mean Hospitalization Period | [Wang et al., 2020](https://jamanetwork.com/journals/jama/fullarticle/2761044?guestAccessKey=f61bd430-07d8-4b86-a749-bec05bfffb65) | 12 |
| Hospitalization Rate | [Ferguson et al., 2020](https://spiral.imperial.ac.uk/bitstream/10044/1/77482/5/Imperial%20College%20COVID19%20NPI%20modelling%2016-03-2020.pdf) (weighted by US demographics by [Covid Act Now](https://covidactnow.org/model)) | 0.073 |
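
To show how parameters like these enter a compartmental model, here is a minimal SEIR sketch integrated with simple Euler steps. It is not the notebook's `cust_seir` implementation: the function name, the population, the initial infected count and the R<sub>0</sub> of 2.5 are hypothetical, while the latent period (3 days) and infectious period (2.9 days) are taken from the table.

```python
def simulate_seir(population, initial_infected, r0,
                  latent_days, infectious_days, days, dt=0.1):
    """Return one (S, E, I, R) tuple per day from an Euler-integrated SEIR model."""
    sigma = 1.0 / latent_days      # rate of leaving the exposed compartment
    gamma = 1.0 / infectious_days  # rate of leaving the infectious compartment
    beta = r0 * gamma              # transmission rate implied by R0
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    history = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical county of one million people seeded with 10 infectious cases
history = simulate_seir(population=1_000_000, initial_infected=10, r0=2.5,
                        latent_days=3.0, infectious_days=2.9, days=60)
```

Because every flow leaves one compartment and enters another, the four compartments always sum to the population, which is a convenient sanity check when changing parameters.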

In [6]:
import os
from IPython.core.debugger import set_trace
import datetime
from pathlib import Path
import pandas as pd
import c19_analysis.dataprep_utils as covid_utils
from c19_analysis.dataprep_flow import build_latest_case_data
import config
import c19_analysis.bayesian_rt_est as bayes_rt
# import c19_analysis.cust_seir_model as cust_seir
import dashboard.rt_explorer as rt_explorer
import dashboard.choropleth_explorer as choropleth_explorer
import dashboard.static_mpl_viz as static_mpl_viz

In [7]:
# Build/Update Core Case Data
# Override eda_tmp_dir to test a new feature branch
def reset_stage_paths(eda_tmp: str) -> None:
    """Point every staged-artifact path in `config` at the given temp directory."""
    eda_tmp_dir = eda_tmp
    config.ds_meta = Path(f"{eda_tmp_dir}/ds_meta.json")
    config.repo_patient_onset_zip = Path(f"{eda_tmp_dir}/latestdata.tar.gz")
    config.repo_patient_onset_csv = Path(f"{eda_tmp_dir}/latestdata.csv")
    config.latest_case_data_zip = Path(f"{eda_tmp_dir}/latest_case_data.tar.gz")
    config.county_rt_calc_zip = Path(f"{eda_tmp_dir}/latest_county_rt_data.tar.gz")
    config.county_covid_explorer_tags = Path(f"{eda_tmp_dir}/county_covid_explorer_tags.html")
    config.choro_covid_explorer_tags = Path(f"{eda_tmp_dir}/choropleth_covid_explorer_tags.html")
    config.national_layout_png_tmp = Path(f"{eda_tmp_dir}/national_layout_tmp.png")
    config.cpath_counties_zip = Path(f"{eda_tmp_dir}/cpath_counties_df.tar.gz")
    config.exported_rtdf_json = Path(f"{eda_tmp_dir}/rtdf_export_json.tar.gz")
    config.exported_rtdf_csv = Path(f"{eda_tmp_dir}/rtdf_export_csv.tar.gz")

branch_name = "local_eda"
# Path.home() avoids a KeyError on platforms where HOME is not set
config.eda_tmp_dir = f"{Path.home()}/datasets/covid19/{branch_name}"
reset_stage_paths(config.eda_tmp_dir)
covid_delta_df, updated = build_latest_case_data()

Done downloading.



In [16]:
# Explore specific counties in more detail
target_counties = ['New York County, NY', 'King County, WA']
time_delta = 8  # look-back window in days
cutoff = datetime.datetime.today() - datetime.timedelta(days=time_delta)
for c in target_counties:
    names = covid_delta_df.index.get_level_values('name')
    dates = covid_delta_df.index.get_level_values('Date')
    s = covid_delta_df[(names == c) & (dates > cutoff)]
    s = s.style.apply(covid_utils.color_mask, subset=['2nd_order_growth'], thresh=0.0)
    display(s)

| id | estimated_pop | name | stateAbbr | Date | Estimated Onset Cases | Confirmed New Cases | Total Estimated Cases | node_start_dt | node_days | daily new cases ma | growth_rate | growth_period_n | growth_period_n-1 | 2nd_order_growth |
| :--- | :--- | :--- | :--- | :--- | ---: | ---: | ---: | :--- | :--- | ---: | ---: | ---: | ---: | ---: |
| 36061 | 1628701 | New York County, NY | NY | 2021-12-12 00:00:00 | 1634.066894 | 923.0 | 29847 | 2021-09-21 00:00:00 | 82 days 00:00:00 | 1228.0 | 0.0579 | 0.049743 | 0.034743 | 0.4317 |
| 36061 | 1628701 | New York County, NY | NY | 2021-12-13 00:00:00 | 1732.147166 | 810.0 | 31579 | 2021-09-21 00:00:00 | 83 days 00:00:00 | 1358.0 | 0.058 | 0.0525 | 0.036129 | 0.4531 |
| 36061 | 1628701 | New York County, NY | NY | 2021-12-14 00:00:00 | 1787.116956 | 686.0 | 33366 | 2021-09-21 00:00:00 | 84 days 00:00:00 | 1478.0 | 0.0566 | 0.054471 | 0.037857 | 0.4389 |
| 36061 | 1628701 | New York County, NY | NY | 2021-12-15 00:00:00 | 2259.831983 | 1507.0 | 35626 | 2021-09-21 00:00:00 | 85 days 00:00:00 | 1645.0 | 0.0677 | 0.057343 | 0.039871 | 0.4382 |
| 36061 | 1628701 | New York County, NY | NY | 2021-12-16 00:00:00 | 2915.655642 | 2615.0 | 38542 | 2021-09-21 00:00:00 | 86 days 00:00:00 | 1885.0 | 0.0819 | 0.061757 | 0.041943 | 0.4724 |
| 36061 | 1628701 | New York County, NY | NY | 2021-12-17 00:00:00 | 3549.775907 | 3514.0 | 42092 | 2021-09-21 00:00:00 | 87 days 00:00:00 | 2201.0 | 0.0921 | 0.067343 | 0.0444 | 0.5167 |
| 36061 | 1628701 | New York County, NY | NY | 2021-12-18 00:00:00 | 3768.0 | 3768.0 | 45860 | 2021-09-21 00:00:00 | 88 days 00:00:00 | 2521.0 | 0.0895 | 0.071957 | 0.047186 | 0.525 |


| id | estimated_pop | name | stateAbbr | Date | Estimated Onset Cases | Confirmed New Cases | Total Estimated Cases | node_start_dt | node_days | daily new cases ma | growth_rate | growth_period_n | growth_period_n-1 | 2nd_order_growth |
| :--- | :--- | :--- | :--- | :--- | ---: | ---: | ---: | :--- | :--- | ---: | ---: | ---: | ---: | ---: |
| 53033 | 2233163 | King County, WA | WA | 2021-12-12 00:00:00 | 266.157153 | 0.0 | 29409 | 2021-09-21 00:00:00 | 82 days 00:00:00 | 332.0 | 0.0092 | 0.011843 | 0.012743 | -0.0706 |
| 53033 | 2233163 | King County, WA | WA | 2021-12-13 00:00:00 | 641.341495 | 906.0 | 30050 | 2021-09-21 00:00:00 | 83 days 00:00:00 | 360.0 | 0.0218 | 0.0126 | 0.012014 | 0.0488 |
| 53033 | 2233163 | King County, WA | WA | 2021-12-14 00:00:00 | 338.926282 | 254.0 | 30389 | 2021-09-21 00:00:00 | 84 days 00:00:00 | 362.0 | 0.0113 | 0.012529 | 0.012071 | 0.0379 |
| 53033 | 2233163 | King County, WA | WA | 2021-12-15 00:00:00 | 386.319485 | 371.0 | 30775 | 2021-09-21 00:00:00 | 85 days 00:00:00 | 375.0 | 0.0127 | 0.012814 | 0.012329 | 0.0394 |
| 53033 | 2233163 | King County, WA | WA | 2021-12-16 00:00:00 | 507.867781 | 591.0 | 31283 | 2021-09-21 00:00:00 | 86 days 00:00:00 | 397.0 | 0.0165 | 0.013386 | 0.011671 | 0.1469 |
| 53033 | 2233163 | King County, WA | WA | 2021-12-17 00:00:00 | 633.193531 | 737.0 | 31916 | 2021-09-21 00:00:00 | 87 days 00:00:00 | 433.0 | 0.0202 | 0.014386 | 0.011714 | 0.228 |
| 53033 | 2233163 | King County, WA | WA | 2021-12-18 00:00:00 | 0.0 | 0.0 | 31916 | 2021-09-21 00:00:00 | 88 days 00:00:00 | 396.0 | 0.0 | 0.0131 | 0.011786 | 0.1115 |
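
The `2nd_order_growth` values in the output above appear consistent with the relative change between the `growth_period_n` and `growth_period_n-1` columns. That relationship is inferred from the displayed numbers, not taken from `dataprep_utils`, and the function name below is hypothetical.

```python
def second_order_growth(growth_period_n, growth_period_prev):
    """Relative change of the current growth period versus the previous one."""
    if growth_period_prev == 0:
        return float("nan")  # relative change is undefined against a zero baseline
    return (growth_period_n - growth_period_prev) / growth_period_prev


# Values copied from the 2021-12-12 rows of the two county outputs above
new_york = second_order_growth(0.049743, 0.034743)
king = second_order_growth(0.011843, 0.012743)
print(round(new_york, 4), round(king, 4))
```

A positive value means growth is accelerating relative to the previous period, and a negative value means it is decelerating, which is presumably why the cell above applies `color_mask` with `thresh=0.0`.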


In [9]:
# Update Effective R Estimates
# Bayesian Rt Estimation
rt_df = bayes_rt.gen_rt_df(covid_delta_df)


In [16]:
# The applicable scope of this particular SEIR model was only the initial, exponential stage of the outbreak and remains here for reference only.
# cust_seir.gen_seir_viz(rt_df)

| id | estimated_pop | name | stateAbbr | Date | Estimated Onset Cases | Confirmed New Cases | Total Estimated Cases | node_start_dt | node_days | daily new cases ma | growth_rate | growth_period_n | growth_period_n-1 | 2nd_order_growth |
| :--- | :--- | :--- | :--- | :--- | ---: | ---: | ---: | :--- | :--- | ---: | ---: | ---: | ---: | ---: |
| 12021 | 378488 | Collier County, FL | FL | 2021-09-24 | 376.656885 | 1178.0 | 691 | 2021-09-21 | 3 days | 173.0 | 1.2006 |  |  |  |
| 12021 | 378488 | Collier County, FL | FL | 2021-09-25 | 48.134527 | 0.0 | 739 | 2021-09-21 | 4 days | 156.0 | 0.0695 | 0.662925 |  |  |
| 12021 | 378488 | Collier County, FL | FL | 2021-09-26 | 55.608972 | 0.0 | 794 | 2021-09-21 | 5 days | 143.0 | 0.0744 | 0.43815 |  |  |
| 12021 | 378488 | Collier County, FL | FL | 2021-09-27 | 61.675068 | 0.0 | 856 | 2021-09-21 | 6 days | 136.0 | 0.0781 | 0.35565 |  |  |
| 12021 | 378488 | Collier County, FL | FL | 2021-09-28 | 67.023583 | 0.0 | 923 | 2021-09-21 | 7 days | 58.0 | 0.0783 | 0.075075 |  |  |
| 12021 | 378488 | Collier County, FL | FL | 2021-09-29 | 64.269832 | 0.0 | 987 | 2021-09-21 | 8 days | 62.0 | 0.0693 | 0.075025 | 0.662925 | -0.8868 |
| 12021 | 378488 | Collier County, FL | FL | 2021-09-30 | 53.688804 | 0.0 | 1041 | 2021-09-21 | 9 days | 62.0 | 0.0547 | 0.0701 | 0.43815 | -0.84 |
| 12021 | 378488 | Collier County, FL | FL | 2021-10-01 | 221.766342 | 694.0 | 1263 | 2021-09-21 | 10 days | 102.0 | 0.2133 | 0.1039 | 0.35565 | -0.7079 |
| 12021 | 378488 | Collier County, FL | FL | 2021-10-02 | 28.216628 | 0.0 | 1291 | 2021-09-21 | 11 days | 92.0 | 0.0222 | 0.089875 | 0.075075 | 0.1971 |
| 12021 | 378488 | Collier County, FL | FL | 2021-10-03 | 32.60593 | 0.0 | 1324 | 2021-09-21 | 12 days | 84.0 | 0.0256 | 0.07895 | 0.075025 | 0.0523 |


In [12]:
# Prepare dataframes necessary for downstream dashboard generation
# rt_df = core rt_explorer dataframe source
# viz_df_instances = used to generate temporal evolution of national choropleth
# status_df = rt_explorer table dataframe source
# county_date_instances: list of dates for which to generate national choropleth
rt_df, viz_df_instances, status_df, county_date_instances = covid_utils.prep_dashboard_dfs(rt_df)

In [14]:
# feel free to explore the data using the dataframes generated above
# e.g.:
# view 5 largest Rt
s = status_df.nlargest(5, 'Rt')
s = s.style.apply(covid_utils.color_mask, subset=['Rt'], thresh=1.0)
display(s)

| id | estimated_pop | name | stateAbbr | Date | Estimated Onset Cases | Total Estimated Cases | node_start_dt | daily new cases ma | Confirmed New Cases | growth_rate | growth_period_n | growth_period_n-1 | 2nd_order_growth | Rt | 90_CrI_LB | 90_CrI_UB | confirmed %infected |
| :--- | :--- | :--- | :--- | :--- | ---: | ---: | :--- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 48157 | 787858 | Fort Bend County, TX | TX | 2021-12-18 00:00:00 | 508.0 | 12883 | 2021-09-21 00:00:00 | 282.0 | 508.0 | 0.0411 | 0.024071 | 0.013757 | 74.97 | 1.37 | 1.16 | 1.54 | 1.64 |
| 15003 | 980080 | Honolulu County, HI | HI | 2021-12-18 00:00:00 | 622.0 | 10591 | 2021-09-21 00:00:00 | 424.0 | 622.0 | 0.0624 | 0.048214 | 0.025386 | 89.93 | 1.34 | 1.15 | 1.47 | 1.08 |
| 48201 | 4698619 | Harris County, TX | TX | 2021-12-18 00:00:00 | 2542.0 | 59216 | 2021-09-21 00:00:00 | 1593.0 | 2542.0 | 0.0449 | 0.0303 | 0.016743 | 80.97 | 1.33 | 1.24 | 1.42 | 1.26 |
| 36061 | 1628701 | New York County, NY | NY | 2021-12-18 00:00:00 | 3768.0 | 45860 | 2021-09-21 00:00:00 | 2521.0 | 3768.0 | 0.0895 | 0.071957 | 0.047186 | 52.5 | 1.32 | 1.23 | 1.38 | 2.82 |
| 34013 | 799767 | Essex County, NJ | NJ | 2021-12-18 00:00:00 | 1163.0 | 16993 | 2021-09-21 00:00:00 | 729.0 | 1163.0 | 0.0735 | 0.052429 | 0.040414 | 29.73 | 1.31 | 1.16 | 1.42 | 2.12 |
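
The `confirmed %infected` column in the output above appears to equal `Total Estimated Cases` divided by `estimated_pop`, expressed as a percentage. The sketch below reproduces that arithmetic and pairs it with one possible reading of the dashboard inclusion rule mentioned at the top of this notebook (> 0.03% confirmed infection density and > 1000 cases). Both function names are hypothetical, and the actual filter in the project code may differ.

```python
def confirmed_pct_infected(total_cases, estimated_pop):
    """Confirmed cases as a percentage of the estimated population."""
    return 100.0 * total_cases / estimated_pop


def meets_dashboard_threshold(total_cases, estimated_pop,
                              min_pct=0.03, min_cases=1000):
    """Assumed inclusion rule: both the density and the case-count floor must be exceeded."""
    density_ok = confirmed_pct_infected(total_cases, estimated_pop) > min_pct
    return density_ok and total_cases > min_cases


# Values copied from the Fort Bend County, TX row above
fort_bend_pct = confirmed_pct_infected(12883, 787858)
fort_bend_included = meets_dashboard_threshold(12883, 787858)
print(round(fort_bend_pct, 2), fort_bend_included)
```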
