<a href="https://colab.research.google.com/github/nalbarr/covid19-cases-deaths/blob/master/covid19_cases_deaths.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# COVID-19 Cases, Deaths analysis
This is a Python notebook used to better understand COVID-19 case and death counts.

#### Versions
- 2020.03.16. Version 1.  Explore Splunk, Johns Hopkins datasets.
- 2020.03.19. Version 2.  Explore Splunk, Johns Hopkins datasets.
- 2020.04.03. Version 3.  Use state-level NY Times dataset.
- 2020.04.11. Version 4.  User test with mentoring group.
- 2020.04.20. Version 5.  Update for external meetings.
- 2020.06.12. Version 6.  Strip down for public GitHub.

#### References:
- https://covid-19.splunkforgood.com/coronavirus__covid_19_
- https://github.com/CSSEGISandData/COVID-19
- https://github.com/nytimes/covid-19-data
- https://medium.com/@minaienick/why-you-should-be-using-geopandas-to-visualize-data-on-maps-aka-geo-visualization-fd1e3b6211b4

## Dependencies

In [0]:
# import dependencies
import numpy as np 
import pandas as pd
import datetime
import plotly.graph_objects as go

## Load Data

In [0]:
# Pull source data
!git clone https://github.com/nytimes/covid-19-data

Cloning into 'covid-19-data'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 1149 (delta 28), reused 8 (delta 4), pack-reused 1091
Receiving objects: 100% (1149/1149), 11.67 MiB | 6.51 MiB/s, done.
Resolving deltas: 100% (628/628), done.


In [0]:
# Explore data files
!ls covid-19-data

excess-deaths  live		       README.md	us.csv
LICENSE        PROBABLE-CASES-NOTE.md  us-counties.csv	us-states.csv


In [0]:
# Read data
df=pd.read_csv("covid-19-data/us-states.csv")
df

Unnamed: 0,date,state,fips,cases,deaths
0,2020-01-21,Washington,53,1,0
1,2020-01-22,Washington,53,1,0
2,2020-01-23,Washington,53,1,0
3,2020-01-24,Illinois,17,1,0
4,2020-01-24,Washington,53,1,0
...,...,...,...,...,...
4684,2020-05-26,Virginia,51,39342,1236
4685,2020-05-26,Washington,53,21278,1091
4686,2020-05-26,West Virginia,54,1854,74
4687,2020-05-26,Wisconsin,55,15923,517


## Exploratory Data Analysis

In [0]:
print(df.columns)

Index(['date', 'state', 'fips', 'cases', 'deaths'], dtype='object')


In [0]:
# explore data

# number of total rows
# (df.count is a method; len(df) gives the row count directly)
total_rows = len(df)
print("Total rows: {}".format(total_rows))

# number of distinct dates
dates = np.unique(df['date'])
total_dates = len(dates)
print("Total (unique) dates: {}".format(total_dates))

# number of distinct states (includes D.C. and territories)
states = np.unique(df['state'])
total_states = len(states)
print("Total (unique) states: {}".format(total_states))

Total rows: 4689
Total (unique) dates: 127
Total (unique) states: 55
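
The same counts can also be read straight from pandas. The sketch below is only an illustration: it uses a tiny hand-made frame with the same column names as `us-states.csv` (the values are made up), and `summarize` is a hypothetical helper name, not part of this notebook.

```python
import pandas as pd

def summarize(frame):
    """Return the row count and the number of distinct dates and states."""
    return {
        "rows": len(frame),
        "dates": frame["date"].nunique(),
        "states": frame["state"].nunique(),
    }

# Illustrative sample only; values are made up, not taken from the NYT data.
sample = pd.DataFrame({
    "date": ["2020-01-24", "2020-01-24", "2020-01-25"],
    "state": ["Illinois", "Washington", "Illinois"],
    "fips": [17, 53, 17],
    "cases": [1, 1, 1],
    "deaths": [0, 0, 0],
})

summary = summarize(sample)
print(summary)
```

Calling `summarize(df)` on the full frame should give the same three numbers as the cell above.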


## Helper Functions

In [0]:
# NAA. 
# Strive for:
# - Self describing
# - Function composition

def get_dates(df):
  """ Return dates. """
  return df['date']

def get_cases(df):
  """ Return cases. """
  return df['cases']

def get_deaths(df):
  """ Return deaths. """
  return df['deaths']

def get_states(df):
  """ Return states. """
  return df['state']

In [0]:
# slice for just state as 'Illinois'
def get_df_by_state(df, state):
  """Returns cases, deaths by state"""
  return df[df['state'] == state]

df2 = get_df_by_state(df, 'Illinois')
print(df2.columns)
print(df2)

Index(['date', 'state', 'fips', 'cases', 'deaths'], dtype='object')
            date     state  fips   cases  deaths
3     2020-01-24  Illinois    17       1       0
6     2020-01-25  Illinois    17       1       0
10    2020-01-26  Illinois    17       1       0
14    2020-01-27  Illinois    17       1       0
18    2020-01-28  Illinois    17       1       0
...          ...       ...   ...     ...     ...
4428  2020-05-22  Illinois    17  105710    4740
4483  2020-05-23  Illinois    17  108100    4817
4538  2020-05-24  Illinois    17  110541    4884
4593  2020-05-25  Illinois    17  112248    4912
4648  2020-05-26  Illinois    17  113486    4960

[124 rows x 5 columns]


In [0]:
# NAA.
# Create basic color map.

all_states = df['state']
print(all_states)

unique_states = [state for state in np.unique(all_states)]
print(unique_states)

all_colors = """aliceblue, antiquewhite, aqua, aquamarine, azure, beige, bisque, black, blanchedalmond, blue, blueviolet, brown, burlywood, cadetblue, chartreuse, chocolate, coral, cornflowerblue, cornsilk, crimson, cyan, darkblue, darkcyan, darkgoldenrod, darkgray, darkgrey, darkgreen, darkkhaki, darkmagenta, darkolivegreen, darkorange, darkorchid, darkred, darksalmon, darkseagreen, darkslateblue, darkslategray, darkslategrey, darkturquoise, darkviolet, deeppink, deepskyblue, dimgray, dimgrey, dodgerblue, firebrick, floralwhite, forestgreen, fuchsia, gainsboro, ghostwhite, gold, goldenrod, gray, grey, green, greenyellow, honeydew, hotpink, indianred, indigo, ivory, khaki, lavender, lavenderblush, lawngreen, lemonchiffon, lightblue, lightcoral, lightcyan, lightgoldenrodyellow, lightgray, lightgrey, lightgreen, lightpink, lightsalmon, lightseagreen, lightskyblue, lightslategray, lightslategrey, lightsteelblue, lightyellow, lime, limegreen,linen, magenta, maroon, mediumaquamarine, mediumblue, mediumorchid, mediumpurple, mediumseagreen, mediumslateblue, mediumspringgreen, mediumturquoise, mediumvioletred, midnightblue, mintcream, mistyrose, moccasin, navajowhite, navy, oldlace, olive, olivedrab, orange, orangered, orchid, palegoldenrod, palegreen, paleturquoise, palevioletred, papayawhip, peachpuff, peru, pink, plum, powderblue, purple, red, rosybrown, royalblue, rebeccapurple, saddlebrown, salmon, sandybrown, seagreen, seashell, sienna, silver, skyblue, slateblue, slategray, slategrey, snow, springgreen, steelblue, tan, teal, thistle, tomato, turquoise, violet, wheat, white, whitesmoke, yellow, yellowgreen"""
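# split on commas and strip whitespace so entries like "limegreen,linen" are separated correctly
colors = [color.strip() for color in all_colors.split(',')]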

color_map = dict(zip(unique_states, colors))
print(type(color_map))

print(color_map['Illinois'])

0          Washington
1          Washington
2          Washington
3            Illinois
4          Washington
            ...      
4684         Virginia
4685       Washington
4686    West Virginia
4687        Wisconsin
4688          Wyoming
Name: state, Length: 4689, dtype: object
['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Guam', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Northern Mariana Islands', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Puerto Rico', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virgin Islands', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']
<class 'dict'>
chartreuse
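
Note that `zip` stops at the shorter of its two inputs, so the mapping above only works because there are more colors than states. It also assigns very pale colors (for example `aliceblue` to Alabama), which are hard to see on a white plot. A minimal sketch of an alternative follows, assuming a short hand-picked palette that is reused in a cycle; `build_color_map` and the palette are hypothetical and not part of this notebook.

```python
from itertools import cycle

def build_color_map(names, palette):
    """Map each name to a palette color, reusing colors if names outnumber them."""
    return dict(zip(names, cycle(palette)))

# Hypothetical palette of CSS color names chosen to be visible on a white background.
palette = ["crimson", "royalblue", "seagreen"]
names = ["Illinois", "New York", "Ohio", "Texas"]

demo_map = build_color_map(names, palette)
print(demo_map["Texas"])  # the fourth name wraps around to the first color
```

The trade-off is that colors repeat once the palette is exhausted, so this suits plots of a handful of states better than a plot of all 55.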

In [0]:
# get time-series slices by state
def get_dfs_by_state(df, state):
  """Return (dates, cases, deaths) series for one state."""
  df_by_state = get_df_by_state(df, state)

  dates_by_state = df_by_state['date']
  cases_by_state = df_by_state['cases']
  deaths_by_state = df_by_state['deaths']

  return dates_by_state, cases_by_state, deaths_by_state

In [0]:
# cases by state
def plot_by_state(title, xs, ys, z):
  fig = go.Figure()
  fig.update_layout(title=title)

  fig.add_trace(go.Scatter(x=xs, y=ys,
                                mode='lines+markers',
                                name=z,
                                marker_color=color_map[z],
                                text=z))

  fig.show()

In [0]:
# Additional functions to set up all states

def get_unique_states(df):
  """ Return states. """
  unique_states = np.unique(df['state'])
  return unique_states

# Discussion using Visualization

### COVID-19 cases, deaths for 'Illinois', plotted separately

In [0]:
# Plot COVID-19 cases, deaths for 'Illinois' - separately

z = 'Illinois'
dates_il, cases_il, deaths_il = get_dfs_by_state(df, z)
plot_by_state('COVID-19 - Cases by State', dates_il, cases_il, z)
plot_by_state('COVID-19 - Deaths by State', dates_il, deaths_il, z)

In [0]:
# cases, deaths by state
def plot_all_by_state(title, df, state):
  fig = go.Figure()
  fig.update_layout(title=title)

  dates, cases, deaths = get_dfs_by_state(df, state)

  xs = dates
  ys = cases
  fig.add_trace(go.Scatter(x=xs, y=ys,
                                mode='lines+markers',
                                name='cases',
                                marker_color='red',
                                text=ys))

  xs = dates
  ys = deaths
  fig.add_trace(go.Scatter(x=xs, y=ys,
                                mode='lines+markers',
                                name='deaths',
                                marker_color='blue',
                                text=ys))

  fig.show()

### COVID-19 cases, deaths for multiple states, overlaid:

#### Top 3 deaths states
- New York
- New Jersey
- Louisiana

#### Local focus
- Illinois


In [0]:
# Plot COVID-19 cases, deaths for 'New York' - overlaid
z = 'New York'
plot_all_by_state('COVID-19 - Cases, Deaths by State', df, z)

In [0]:
# Plot COVID-19 cases, deaths for 'Illinois' - overlaid
z = 'Illinois'
plot_all_by_state('COVID-19 - Cases, Deaths by State', df, z)

In [0]:
# Plot COVID-19 cases, deaths for 'New Jersey' - overlaid
z = 'New Jersey'
plot_all_by_state('COVID-19 - Cases, Deaths by State', df, z)

In [0]:
# Plot COVID-19 cases, deaths for 'Louisiana' - overlaid
z = 'Louisiana'
plot_all_by_state('COVID-19 - Cases, Deaths by State', df, z)


### COVID-19 cases, deaths by all states

In [0]:
# cases and deaths for several states on one figure
# NOTE: reads the module-level `df` loaded above
def plot_all(title, zs):
  fig = go.Figure()
  fig.update_layout(title=title)

  for z in zs:
    dates, cases, deaths = get_dfs_by_state(df, z)

    xs = dates
    ys = cases
    fig.add_trace(go.Scatter(x=xs, y=ys,
                                mode='lines',
                                name="{} - cases".format(z),
                                marker_color=color_map[z],
                                text=ys))
    xs = dates
    ys = deaths
    fig.add_trace(go.Scatter(x=xs, y=ys,
                              mode='lines+markers',
                              name="{} - deaths".format(z),
                              marker_color=color_map[z],
                              text=ys))

  fig.show()

In [0]:
# zs = ['Illinois', 'New York', 'New Jersey', 'Louisiana', 'Ohio', 'Florida']
zs = ['Illinois','Ohio', 'Florida', 'New York', 'Minnesota', 'Michigan', 'Massachusetts']
plot_all('COVID-19 - Cases by State', zs)

In [0]:
# AI4M - study group (KR, AS, NK, NA)
zs = ['Illinois','California']
plot_all('COVID-19 - Cases by State', zs)

In [0]:
plot_all('COVID-19 - Cases by State', get_unique_states(df))

# External presentation

## AI Strategy is fundamentally driven by scientific method and systems thinking.
- Scientific method: Hypothesize -> Experiment -> Synthesize/Evaluation/Feedback
- Systems Thinking:  A system with known inputs and outputs with feedback control designed towards a goal
- In U.S. healthcare, this drives the original Learning Health System (IOM, Friedman, 2014) and the more recent "AI + Healthcare" work (NAP, 2019), foundational to U.S. value-based care policy.

### Example
- Comparatively, how well is each of the states we live in doing against COVID-19?
  - COVID-19 by state (https://www.beckershospitalreview.com/rankings-and-ratings/states-ranked-by-confirmed-covid-19-cases.html)

### How do we think about this problem?
- What data?
- What assumptions?
- How do we analyze and visualize?
- How do we know we are right?
- How do we make decisions? 
- How would we design systems around this?
- What would that look like?



In [0]:
# Apex Expert Group (Minnesota, Colorado, Illinois)
## 1.  Visualize
## 2.  Discuss
## 3.  Zoom
## 4.  Feedback and Challenge

zs = ['Minnesota', 'Colorado', 'Illinois']
plot_all('COVID-19 - Cases by State', zs)

Thursday, October 17th, 2019 

Boston has the best ...

To identify the "best hospital cities" in the U.S. and the world, Medbelle compared cities on 12 metrics that fell into one of the following three categories: quality of care, infrastructure and access. The metrics range from surgeons per capita to cost of medicine. Each factor was graded on a 100-point scale, and the cities were ranked from highest overall score to lowest. Access additional information on the methodology used for the ranking here.

"The U.S. has the highest number of top-ranking hospital cities and leads the world in medical universities, however, U.S. cities rank low among developed countries when it comes to access," Daniel Kolb, co-founder and managing director of Medbelle, said in a release.

Boston ranked No. 2 on the list of 100 best hospital cities in the world. It was the only U.S. city to crack the top 10.

Here are the top 10 hospital cities in the U.S. based on the analysis:

1. Boston (total score of 99.64 out of 100)
2. Los Angeles (89.89)
3. New York (89.33)
4. Baltimore (86.58)
5. Chicago (84.78)
6. San Francisco (84.14)
7. Ann Arbor, Mich. (82.65)
8. San Jose, Calif. (82.35)
9. Houston (81.63)
10. Seattle (81.28)

## COVID-19 cases, deaths for states with top hospital cities

In [0]:
# NAA. MxD, PN demo context
# e.g., Illinois vs. 
# - https://www.beckershospitalreview.com/rankings-and-ratings/10-best-hospital-cities-in-the-us-101719.html

In [0]:
# NAA.
# NOTE:
# - California has 3 top cities
zs = ['Massachusetts', 'California', 'New York', 'Maryland', 'Illinois', 'Michigan', 'Texas', 'Washington']
plot_all('COVID-19 - Cases, Deaths by State (states with top 10 hospital cities)', zs)

### Just focus on death counts for comparative analysis

In [0]:
# only deaths
# NOTE: reads the module-level `df` loaded above
def plot_deaths(title, zs):
  fig = go.Figure()
  fig.update_layout(title=title)

  for z in zs:
    dates, cases, deaths = get_dfs_by_state(df, z)

    xs = dates
    ys = deaths
    fig.add_trace(go.Scatter(x=xs, y=ys,
                              mode='lines+markers',
                              name="{} - deaths".format(z),
                              marker_color=color_map[z],
                              text=ys))

  fig.show()

In [0]:
# NAA. 
# - Focus only on deaths.
zs = ['Massachusetts', 'California', 'New York', 'Maryland', 'Illinois', 'Michigan', 'Texas', 'Washington']
plot_deaths('COVID-19 - Deaths by State (states with top 10 hospital cities)', zs)
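
The death counts plotted above only ever grow, which suggests they are cumulative totals. For comparing states it may also help to look at day-over-day changes. The sketch below assumes cumulative counts and uses a hypothetical helper, `add_daily_deaths`, on a small made-up sample; it is not part of the original analysis.

```python
import pandas as pd

def add_daily_deaths(frame):
    """Add a 'new_deaths' column: day-over-day change in cumulative deaths per state."""
    out = frame.sort_values(["state", "date"]).copy()
    # diff() is NaN on each state's first row; fall back to that day's cumulative count.
    out["new_deaths"] = out.groupby("state")["deaths"].diff().fillna(out["deaths"])
    return out

# Illustrative sample only; counts are made up.
sample = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-01", "2020-03-02"],
    "state": ["Illinois", "Illinois", "Illinois", "Ohio", "Ohio"],
    "deaths": [1, 4, 9, 0, 2],
})

daily = add_daily_deaths(sample)
print(daily[["date", "state", "deaths", "new_deaths"]])
```

The resulting `new_deaths` series could be passed to `go.Scatter` as `y` in the same way `deaths` is in `plot_deaths`.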

# Next Ideas

## Introduce basic machine learning (ML)
- https://medium.com/datadriveninvestor/covid19-time-series-forecasting-using-lstm-rnn-753a0494448

## Quiver plots - to visualize gradients

In [0]:
# map COVID-19 mitigation strategy types

# - https://www.cnn.com/2020/03/23/us/coronavirus-which-states-stay-at-home-order-trnd/index.html
# - https://en.wikipedia.org/wiki/List_of_United_States_governors
# - https://simple.wikipedia.org/wiki/List_of_U.S._states_by_population
# - https://state.1keydata.com/state-population-density.php
# - https://www.cnn.com/2020/03/31/us/states-travel-restrictions-list/index.html
# - https://www.globalair.com/airport/state.aspx

In [0]:
# Alternative visualization
# - https://plotly.com/python/quiver-plots/#basic-quiver-plot

import plotly.figure_factory as ff

import numpy as np

x,y = np.meshgrid(np.arange(0, 2, .2), np.arange(0, 2, .2))
u = np.cos(x)*y
v = np.sin(x)*y

fig = ff.create_quiver(x, y, u, v)
fig.show()
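
Since the stated goal of the quiver plot is to visualize gradients, the arrow components could come from a numerical gradient rather than the `cos`/`sin` demo values. A minimal sketch with NumPy only, assuming a hypothetical scalar field `f = x**2 + y**2` picked purely for illustration:

```python
import numpy as np

# Same grid as the quiver example above.
step = 0.2
x, y = np.meshgrid(np.arange(0, 2, step), np.arange(0, 2, step))

# Hypothetical scalar field; its exact gradient is (2x, 2y).
f = x**2 + y**2

# With meshgrid's default 'xy' indexing, axis 0 varies with y and axis 1 with x,
# so np.gradient returns the y-derivative first.
grad_y, grad_x = np.gradient(f, step, step)

# These arrays have the same shape as x and y and could be used as the
# arrow components (u, v) of a quiver plot.
u, v = grad_x, grad_y
print(u[1:-1, 1:-1].round(2))
```

Interior points use central differences, which are exact for this quadratic field; the values on the grid boundary use one-sided differences and are less accurate.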