# Washington State Covid-19 Cases

## Purpose



## Methodology

Data are provided by the Washington State Department of Health as an XLSX spreadsheet.
The spreadsheet is updated weekly on Sundays and its URL changes with each update, so get the current URL from the [Data Downloads](https://www.doh.wa.gov/Emergencies/Coronavirus#34225) section of the department's [2019 Novel Coronavirus Outbreak (COVID-19)](https://www.doh.wa.gov/Emergencies/Coronavirus) page.

## Results

## Suggested next steps


# Setup

**The next two lines make Jupyter display the output of every expression in a cell, not just the last one.**

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Library import
We import all the required Python libraries.

In [2]:
# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# Visualizations
import plotly
import plotly.graph_objs as go
import plotly.offline as ply
import plotly.express as px
plotly.offline.init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(theme='white')

import matplotlib.pyplot as plt

# Autoreload extension
if 'autoreload' not in get_ipython().extension_manager.loaded:
    %load_ext autoreload
    
%autoreload 2

## Local library import
We import all the required local libraries.

In [3]:
# Include local library paths
import sys
# sys.path.append('path/to/local/lib') # uncomment and fill to import local libraries

# Import local libraries
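
A minimal sketch of filling in the commented-out `sys.path` line above, assuming the local modules live in a hypothetical `lib` folder inside the notebook's working directory. Resolving the path with `pathlib` and checking membership first keeps re-runs of the cell from adding the same entry repeatedly.

```python
# Sketch: register a hypothetical local library folder on the import path.
import sys
from pathlib import Path

# Hypothetical folder name; replace with the real location of your local modules.
LOCAL_LIB_DIR = Path.cwd() / "lib"

# Prepend only once, so re-running the cell does not keep growing sys.path.
if str(LOCAL_LIB_DIR) not in sys.path:
    sys.path.insert(0, str(LOCAL_LIB_DIR))
```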

# Parameter definition
We set all relevant parameters for our notebook. By convention, parameter names are uppercase, while all other variables follow Python's naming guidelines (PEP 8).
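
As a sketch, a parameter cell following this convention could hold the spreadsheet URL (taken from the data import cell) and the case-count column names (as listed by `df.info()`). The names `DATA_URL`, `AGE_COLUMNS` and `TOTAL_COLUMN` are hypothetical; the notebook itself stores the URL in `latestData`.

```python
# Sketch of a parameter cell: uppercase names mark values meant to be edited.
# DATA_URL, AGE_COLUMNS and TOTAL_COLUMN are hypothetical names.

# Weekly DOH spreadsheet; the ver= query string appears to change with each release.
DATA_URL = ("https://www.doh.wa.gov/Portals/1/Documents/1600/coronavirus/"
            "data-tables/PUBLIC_CDC_Event_Date_SARS.xlsx?ver=20200712123123")

# Age-group count columns, as listed by df.info() in the data import section.
AGE_COLUMNS = ["Age 0-19", "Age 20-39", "Age 40-59", "Age 60-79", "Age 80+"]

# Column holding the total number of new positive cases per county and week.
TOTAL_COLUMN = "NewPos_All"
```

The data import cell would then read `df = pd.read_excel(DATA_URL)`, so updating the weekly URL touches only this cell.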


# Data import
We retrieve all the required data for the analysis.

In [59]:
latestData = ("https://www.doh.wa.gov/Portals/1/Documents/1600/coronavirus/"
              # "data-tables/PUBLIC_CDC_Event_Date_SARS.xlsx?ver=20200706121119"
              "data-tables/PUBLIC_CDC_Event_Date_SARS.xlsx?ver=20200712123123")
df = pd.read_excel(latestData)

In [60]:
df.info()
df.describe(percentiles=[0.05, 0.15, 0.25, 0.50, 0.75, 0.85, 0.95])
df.head()
df.tail()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 571 entries, 0 to 570
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   County           571 non-null    object        
 1   WeekStartDate    571 non-null    object        
 2   NewPos_All       571 non-null    int64         
 3   Age 0-19         571 non-null    int64         
 4   Age 20-39        571 non-null    int64         
 5   Age 40-59        571 non-null    int64         
 6   Age 60-79        571 non-null    int64         
 7   Age 80+          571 non-null    int64         
 8   Positive UnkAge  571 non-null    int64         
 9   dtm_updated      571 non-null    datetime64[ns]
dtypes: datetime64[ns](1), int64(7), object(2)
memory usage: 44.7+ KB


| | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | Positive UnkAge |
|---|---|---|---|---|---|---|---|
| count | 571.0 | 571.0 | 571.0 | 571.0 | 571.0 | 571.0 | 571.0 |
| mean | 62.859895 | 5.929947 | 23.262697 | 19.371278 | 10.269702 | 3.989492 | 0.036778 |
| std | 148.238587 | 15.930176 | 56.350697 | 46.961262 | 27.085261 | 12.071711 | 0.222536 |
| min | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5% | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 15% | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2.0 | 0.0 | 0.5 | 1.0 | 0.0 | 0.0 | 0.0 |
| 50% | 9.0 | 1.0 | 3.0 | 3.0 | 2.0 | 0.0 | 0.0 |
| 75% | 42.5 | 4.0 | 15.0 | 13.5 | 8.0 | 2.0 | 0.0 |
| 85% | 97.0 | 9.0 | 33.0 | 28.0 | 16.0 | 6.0 | 0.0 |


| | County | WeekStartDate | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | Positive UnkAge | dtm_updated |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adams County | 2020-03-08 | 4 | 0 | 1 | 1 | 2 | 0 | 0 | 2020-07-05 15:36:42.278 |
| 1 | Adams County | 2020-03-15 | 3 | 0 | 0 | 3 | 0 | 0 | 0 | 2020-07-05 15:36:42.278 |
| 2 | Adams County | 2020-03-22 | 9 | 0 | 1 | 4 | 4 | 0 | 0 | 2020-07-05 15:36:42.278 |
| 3 | Adams County | 2020-03-29 | 17 | 2 | 8 | 4 | 3 | 0 | 0 | 2020-07-05 15:36:42.278 |
| 4 | Adams County | 2020-04-05 | 8 | 1 | 2 | 4 | 1 | 0 | 0 | 2020-07-05 15:36:42.278 |


| | County | WeekStartDate | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | Positive UnkAge | dtm_updated |
|---|---|---|---|---|---|---|---|---|---|---|
| 566 | Unassigned | 2020-05-31 | 3 | 1 | 1 | 0 | 0 | 0 | 1 | 2020-07-05 15:36:42.278 |
| 567 | Unassigned | 2020-06-07 | 5 | 1 | 1 | 3 | 0 | 0 | 0 | 2020-07-05 15:36:42.278 |
| 568 | Unassigned | 2020-06-14 | 9 | 0 | 7 | 1 | 0 | 0 | 1 | 2020-07-05 15:36:42.278 |
| 569 | Unassigned | 2020-06-21 | 27 | 2 | 12 | 8 | 5 | 0 | 0 | 2020-07-05 15:36:42.278 |
| 570 | Unassigned | 2020-06-28 | 81 | 7 | 40 | 25 | 7 | 2 | 0 | 2020-07-05 15:36:42.278 |
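
`df.info()` lists `WeekStartDate` as an `object` column, so it most likely holds strings or Python date objects rather than pandas datetimes; converting it with `pd.to_datetime` gives a proper date axis and allows date arithmetic. The head and tail rows also suggest that `NewPos_All` equals the sum of the five age-group columns plus `Positive UnkAge`, and the column means from `describe()` agree (they add up to about 62.86). A minimal sketch of both steps, run on a few rows copied from the output above rather than on the full `df`:

```python
import pandas as pd

# A few rows copied from the df.head()/df.tail() output above; on the real data, use df instead
sample = pd.DataFrame({
    "County": ["Adams County", "Adams County", "Unassigned", "Unassigned", "Unassigned"],
    "WeekStartDate": ["2020-03-08", "2020-03-15", "2020-06-14", "2020-06-21", "2020-06-28"],
    "NewPos_All": [4, 3, 9, 27, 81],
    "Age 0-19": [0, 0, 0, 2, 7],
    "Age 20-39": [1, 0, 7, 12, 40],
    "Age 40-59": [1, 3, 1, 8, 25],
    "Age 60-79": [2, 0, 0, 5, 7],
    "Age 80+": [0, 0, 0, 0, 2],
    "Positive UnkAge": [0, 0, 1, 0, 0],
})

# Parse the week start into a real datetime column
sample["WeekStartDate"] = pd.to_datetime(sample["WeekStartDate"])

# Check that the age groups plus the unknown-age column add up to the weekly total
count_cols = ["Age 0-19", "Age 20-39", "Age 40-59", "Age 60-79", "Age 80+", "Positive UnkAge"]
mismatch = sample[sample[count_cols].sum(axis=1) != sample["NewPos_All"]]
print(f"{len(mismatch)} rows where the age groups do not sum to NewPos_All")
# → 0 rows where the age groups do not sum to NewPos_All
```

Running the same check on the full `df` would confirm whether the relationship holds for all 571 rows, not just the ones shown here.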


# Data processing

* Use pd.melt to make data tidy

In [61]:
counties = df['County'].unique()
print(counties)
dft = pd.melt(df.reset_index(),
              id_vars=['WeekStartDate', 'NewPos_All', 'Age 0-19', 'Age 20-39',
                       'Age 40-59', 'Age 60-79', 'Age 80+'],
              value_vars=['County'])
del dft['variable']
dft.head()
dft.tail()

['Adams County' 'Asotin County' 'Benton County' 'Chelan County'
 'Clallam County' 'Clark County' 'Columbia County' 'Cowlitz County'
 'Douglas County' 'Franklin County' 'Grant County' 'Grays Harbor County'
 'Island County' 'Jefferson County' 'King County' 'Kitsap County'
 'Kittitas County' 'Klickitat County' 'Lewis County' 'Mason County'
 'Okanogan County' 'Pacific County' 'Pend Oreille County' 'Pierce County'
 'San Juan County' 'Skagit County' 'Skamania County' 'Snohomish County'
 'Spokane County' 'Stevens County' 'Thurston County' 'Wahkiakum County'
 'Walla Walla County' 'Whatcom County' 'Whitman County' 'Yakima County'
 'Unassigned']


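| | WeekStartDate | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | value |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-08 | 4 | 0 | 1 | 1 | 2 | 0 | Adams County |
| 1 | 2020-03-15 | 3 | 0 | 0 | 3 | 0 | 0 | Adams County |
| 2 | 2020-03-22 | 9 | 0 | 1 | 4 | 4 | 0 | Adams County |
| 3 | 2020-03-29 | 17 | 2 | 8 | 4 | 3 | 0 | Adams County |
| 4 | 2020-04-05 | 8 | 1 | 2 | 4 | 1 | 0 | Adams County |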


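| | WeekStartDate | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | value |
|---|---|---|---|---|---|---|---|---|
| 566 | 2020-05-31 | 3 | 1 | 1 | 0 | 0 | 0 | Unassigned |
| 567 | 2020-06-07 | 5 | 1 | 1 | 3 | 0 | 0 | Unassigned |
| 568 | 2020-06-14 | 9 | 0 | 7 | 1 | 0 | 0 | Unassigned |
| 569 | 2020-06-21 | 27 | 2 | 12 | 8 | 5 | 0 | Unassigned |
| 570 | 2020-06-28 | 81 | 7 | 40 | 25 | 7 | 2 | Unassigned |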
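
With `value_vars=['County']`, only the county column is melted, so `dft` still has one column per age group (as the output shows) and `Positive UnkAge` is dropped. A long ("tidy") layout with one row per county, week and age group would instead keep `County` and `WeekStartDate` as identifiers and melt the count columns. A minimal sketch on two rows copied from the `df.head()` output; the new column names `AgeGroup` and `NewPositives` are our own choice.

```python
import pandas as pd

# Two rows copied from the df.head() output above
wide = pd.DataFrame({
    "County": ["Adams County", "Adams County"],
    "WeekStartDate": ["2020-03-08", "2020-03-15"],
    "NewPos_All": [4, 3],
    "Age 0-19": [0, 0],
    "Age 20-39": [1, 0],
    "Age 40-59": [1, 3],
    "Age 60-79": [2, 0],
    "Age 80+": [0, 0],
    "Positive UnkAge": [0, 0],
})

age_cols = ["Age 0-19", "Age 20-39", "Age 40-59", "Age 60-79", "Age 80+", "Positive UnkAge"]

# Keep county and week as identifiers; each count column becomes its own set of rows.
# NewPos_All is in neither list, so melt drops it (it can be recomputed with groupby).
tidy = pd.melt(wide,
               id_vars=["County", "WeekStartDate"],
               value_vars=age_cols,
               var_name="AgeGroup",        # former column name
               value_name="NewPositives")  # former cell value

print(tidy.shape)  # → (12, 4): 2 weeks x 6 age groups, 4 columns
```

Summing `NewPositives` per county and week reproduces `NewPos_All` (4 and 3 for these two weeks).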


In [62]:
px.scatter(df, x="WeekStartDate", y='NewPos_All', color='County')

In [63]:
px.scatter(df, x="WeekStartDate", y="Age 0-19", color='County')

In [64]:
px.scatter(df, x="WeekStartDate", y='Age 20-39', color='County')

In [65]:
px.scatter(df, x="WeekStartDate", y='Age 40-59', color='County')

In [66]:
px.scatter(df, x="WeekStartDate", y='Age 60-79', color='County')

In [67]:
px.scatter(df, x="WeekStartDate", y='Age 80+', color='County')