# Washington State COVID-19 Cases

## Purpose



## Methodology

Data are provided by the Washington State Department of Health as an XLSX spreadsheet.
The spreadsheet is updated weekly on Sundays, so get the current URL from the [Data Downloads](https://www.doh.wa.gov/Emergencies/Coronavirus#34225) section of the department's [2019 Novel Coronavirus Outbreak (COVID-19)](https://www.doh.wa.gov/Emergencies/Coronavirus) page.

## Results

## Suggested next steps


# Setup

**The next two lines make each cell display the output of every expression, not just the last one.**

In [6]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Library import
We import all the required Python libraries.

In [13]:
# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# Visualizations
import plotly
import plotly.graph_objs as go
import plotly.offline as ply
import plotly.express as px
plotly.offline.init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(theme='white')

import matplotlib.pyplot as plt

# Autoreload extension
if 'autoreload' not in get_ipython().extension_manager.loaded:
    %load_ext autoreload
    
%autoreload 2

## Local library import
We import all the required local libraries.

In [4]:
# Include local library paths
import sys
# sys.path.append('path/to/local/lib') # uncomment and fill to import local libraries

# Import local libraries

# Parameter definition
We set all relevant parameters for our notebook. By convention, parameters are uppercase, while all the 
other variables follow Python's guidelines.
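
As a minimal sketch of this convention, the cell below collects values that the rest of the notebook hard-codes. The names `DATA_URL`, `AGE_COLUMNS`, and `PERCENTILES` are hypothetical; the values are copied from the cells further down.

```python
# Hypothetical parameter names; the values are taken from the cells in this notebook.

# URL passed to pd.read_excel in the "Data import" section.
# It changes with each weekly release, so it is a natural parameter.
DATA_URL = (
    "https://www.doh.wa.gov/Portals/1/Documents/1600/coronavirus/data-tables/"
    "PUBLIC_CDC_Event_Date_SARS.xlsx?ver=20200610134713"
)

# Age-group columns as they appear in the spreadsheet.
AGE_COLUMNS = ["Age 0-19", "Age 20-39", "Age 40-59", "Age 60-79", "Age 80+"]

# Percentiles requested from DataFrame.describe().
PERCENTILES = [0.05, 0.15, 0.25, 0.50, 0.75, 0.85, 0.95]
```

With these in place, the import cell could read `pd.read_excel(DATA_URL)` and only this cell would need editing when a new file is published.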


# Data import
We retrieve all the required data for the analysis.

In [5]:
df = pd.read_excel("https://www.doh.wa.gov/Portals/1/Documents/1600/coronavirus/data-tables/PUBLIC_CDC_Event_Date_SARS.xlsx?ver=20200610134713")

In [11]:
df.info()
df.describe(percentiles=[0.05, 0.15, 0.25, 0.50, 0.75, 0.85, 0.95])
df.head()
df.tail()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 423 entries, 0 to 422
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   County           423 non-null    object        
 1   WeekStartDate    423 non-null    object        
 2   NewPos_All       423 non-null    int64         
 3   Age 0-19         423 non-null    int64         
 4   Age 20-39        423 non-null    int64         
 5   Age 40-59        423 non-null    int64         
 6   Age 60-79        423 non-null    int64         
 7   Age 80+          423 non-null    int64         
 8   Positive UnkAge  423 non-null    int64         
 9   dtm_updated      423 non-null    datetime64[ns]
dtypes: datetime64[ns](1), int64(7), object(2)
memory usage: 33.2+ KB


| | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | Positive UnkAge |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 423.0 | 423.0 | 423.0 | 423.0 | 423.0 | 423.0 | 423.0 |
| mean | 56.06383 | 3.747045 | 18.44208 | 18.550827 | 10.910165 | 4.385343 | 0.028369 |
| std | 141.69085 | 9.771553 | 47.159363 | 48.012801 | 29.764024 | 13.240949 | 0.179913 |
| min | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5% | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 15% | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 50% | 7.0 | 0.0 | 2.0 | 2.0 | 1.0 | 0.0 | 0.0 |
| 75% | 38.5 | 3.0 | 11.0 | 12.5 | 7.0 | 2.0 | 0.0 |
| 85% | 75.7 | 6.0 | 26.0 | 24.0 | 15.0 | 6.7 | 0.0 |


| | County | WeekStartDate | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | Positive UnkAge | dtm_updated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Adams County | 2020-03-08 | 4 | 0 | 1 | 1 | 2 | 0 | 0 | 2020-06-07 14:32:42.117 |
| 1 | Adams County | 2020-03-15 | 3 | 0 | 0 | 3 | 0 | 0 | 0 | 2020-06-07 14:32:42.117 |
| 2 | Adams County | 2020-03-22 | 9 | 0 | 1 | 4 | 4 | 0 | 0 | 2020-06-07 14:32:42.117 |
| 3 | Adams County | 2020-03-29 | 17 | 2 | 8 | 4 | 3 | 0 | 0 | 2020-06-07 14:32:42.117 |
| 4 | Adams County | 2020-04-05 | 8 | 1 | 2 | 4 | 1 | 0 | 0 | 2020-06-07 14:32:42.117 |


| | County | WeekStartDate | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | Positive UnkAge | dtm_updated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 418 | Unassigned | 2020-04-19 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2020-06-07 14:32:42.117 |
| 419 | Unassigned | 2020-05-10 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 2020-06-07 14:32:42.117 |
| 420 | Unassigned | 2020-05-17 | 7 | 0 | 3 | 4 | 0 | 0 | 0 | 2020-06-07 14:32:42.117 |
| 421 | Unassigned | 2020-05-24 | 5 | 0 | 3 | 2 | 0 | 0 | 0 | 2020-06-07 14:32:42.117 |
| 422 | Unassigned | 2020-05-31 | 17 | 1 | 8 | 4 | 2 | 1 | 1 | 2020-06-07 14:32:42.117 |
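
The `df.info()` output reports `WeekStartDate` as `object`, meaning the dates were read as strings. A minimal sketch of converting the column to a datetime dtype follows; it uses a small `sample` frame (a hypothetical name) built from the first rows shown above, where the notebook itself would operate on `df`.

```python
import pandas as pd

# Small sample copied from the head() output; in the notebook this would be `df`.
sample = pd.DataFrame({
    "County": ["Adams County"] * 3,
    "WeekStartDate": ["2020-03-08", "2020-03-15", "2020-03-22"],
    "NewPos_All": [4, 3, 9],
})

# Parse the ISO-formatted strings into proper timestamps so that sorting,
# resampling, and plot axes treat the column as dates rather than text.
sample["WeekStartDate"] = pd.to_datetime(sample["WeekStartDate"])

print(sample.dtypes)
```

Whether the conversion is needed depends on how the spreadsheet stores the column, so it is worth rechecking `df.info()` after each weekly download.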


# Data processing
Put the core of the notebook here. Feel free to split this section further into subsections.

In [23]:
counties = df['County'].unique()
print(counties)
dft = pd.melt(df.reset_index(),
              id_vars=['WeekStartDate', 'NewPos_All', 'Age 0-19', 'Age 20-39', 'Age 40-59', 'Age 60-79', 'Age 80+'],
              value_vars=['County'])
dft.head()
dft.tail()

['Adams County' 'Asotin County' 'Benton County' 'Chelan County'
 'Clallam County' 'Clark County' 'Cowlitz County' 'Douglas County'
 'Franklin County' 'Grant County' 'Grays Harbor County' 'Island County'
 'Jefferson County' 'King County' 'Kitsap County' 'Kittitas County'
 'Klickitat County' 'Lewis County' 'Mason County' 'Okanogan County'
 'Pacific County' 'Pierce County' 'San Juan County' 'Skagit County'
 'Snohomish County' 'Spokane County' 'Stevens County' 'Thurston County'
 'Walla Walla County' 'Whatcom County' 'Whitman County' 'Yakima County'
 'Unassigned']


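| | WeekStartDate | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | variable | value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2020-03-08 | 4 | 0 | 1 | 1 | 2 | 0 | County | Adams County |
| 1 | 2020-03-15 | 3 | 0 | 0 | 3 | 0 | 0 | County | Adams County |
| 2 | 2020-03-22 | 9 | 0 | 1 | 4 | 4 | 0 | County | Adams County |
| 3 | 2020-03-29 | 17 | 2 | 8 | 4 | 3 | 0 | County | Adams County |
| 4 | 2020-04-05 | 8 | 1 | 2 | 4 | 1 | 0 | County | Adams County |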


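| | WeekStartDate | NewPos_All | Age 0-19 | Age 20-39 | Age 40-59 | Age 60-79 | Age 80+ | variable | value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 418 | 2020-04-19 | 1 | 0 | 0 | 1 | 0 | 0 | County | Unassigned |
| 419 | 2020-05-10 | 2 | 0 | 0 | 2 | 0 | 0 | County | Unassigned |
| 420 | 2020-05-17 | 7 | 0 | 3 | 4 | 0 | 0 | County | Unassigned |
| 421 | 2020-05-24 | 5 | 0 | 3 | 2 | 0 | 0 | County | Unassigned |
| 422 | 2020-05-31 | 17 | 1 | 8 | 4 | 2 | 1 | County | Unassigned |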
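
The `melt` call above keeps the age columns as identifiers and melts only `County`, so the result is still one row per county and week, with `County` renamed to `value`. If the goal is a long table with one row per county, week, and age group, a possible alternative is sketched below. This is an assumption about the intent, not the notebook's method; `sample`, `age_columns`, `long_df`, `AgeGroup`, and `NewPos` are hypothetical names.

```python
import pandas as pd

# Two rows copied from the head() output; in the notebook this would be `df`.
sample = pd.DataFrame({
    "County": ["Adams County", "Adams County"],
    "WeekStartDate": ["2020-03-08", "2020-03-15"],
    "NewPos_All": [4, 3],
    "Age 0-19": [0, 0],
    "Age 20-39": [1, 0],
    "Age 40-59": [1, 3],
    "Age 60-79": [2, 0],
    "Age 80+": [0, 0],
})

age_columns = ["Age 0-19", "Age 20-39", "Age 40-59", "Age 60-79", "Age 80+"]

# Keep county and week as identifiers and stack the age columns,
# giving one row per (county, week, age group).
long_df = sample.melt(
    id_vars=["County", "WeekStartDate"],
    value_vars=age_columns,
    var_name="AgeGroup",
    value_name="NewPos",
)

print(long_df.head())
```

In this shape the age group becomes an ordinary column, which makes grouping, filtering, and faceted plotting by age straightforward.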


In [25]:
px.scatter(df, x="WeekStartDate", y="NewPos_All", color='County')

In [26]:
px.scatter(df, x="WeekStartDate", y="Age 0-19", color='County')

In [28]:
px.scatter(df, x="WeekStartDate", y='Age 20-39', color='County')

In [29]:
px.scatter(df, x="WeekStartDate", y='Age 40-59', color='County')

In [31]:
px.scatter(df, x="WeekStartDate", y='Age 60-79', color='County')

In [33]:
px.scatter(df, x="WeekStartDate", y='Age 80+', color='County')