# COVID-19 Analysis: A Deeper Dive into the Stats
## ( + easy interactive figures with plot.ly) 

3/9/20

According to the "Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE" dashboard, there are currently 113,584 confirmed cases, 3,996 deaths, and 62,496 recoveries. From the media reports, it feels like outbreaks are growing at an exponential rate. However, having heard these numbers thrown around and used both to support and to refute the need to panic, I decided to dive deeper into the numbers myself. How reliable are these reports? What do they really say about the threat of the virus? I don't know...
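
As a first sanity check, the dashboard figures quoted above can be turned into simple shares. This is only a rough sketch: the variable names are my own, and a naive deaths/confirmed ratio ignores both unresolved cases and infections that were never tested, so it should not be read as a fatality rate.

```python
# Figures quoted above from the Johns Hopkins CSSE dashboard
confirmed = 113584
deaths = 3996
recovered = 62496

# Cases that are not yet counted as either a death or a recovery
unresolved = confirmed - deaths - recovered

# Naive shares of all confirmed cases
death_share = deaths / confirmed
recovered_share = recovered / confirmed

print("unresolved cases: {}".format(unresolved))
print("deaths / confirmed: {:.2%}".format(death_share))
print("recovered / confirmed: {:.2%}".format(recovered_share))
```

A large share of cases is still unresolved, which is one reason the raw ratio can move a lot as outcomes come in.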
   

Some initial notes/thoughts/findings from browsing the internet:
    - Confirmed cases include presumptive cases
    - Confirmed cases are laboratory-confirmed using PCR
        - (side note: from what experience I've had with qRT-PCR, results can be finicky and may vary significantly if proper mixing and sampling aren't done)
        - According to the WHO daily situation reports, a confirmed case is "A person with laboratory confirmation of COVID-19 infection, irrespective of clinical signs and symptoms."
        - There is no single protocol.
            - Following the link to the laboratory testing page, there are several different protocols from several different countries, including the US, China, Thailand, etc.
            - The primer/probe combinations differ between protocols, and so do the targets...
        - Some recovered patients with consecutive negative test results test positive again after an additional quarantine period?! ["Positive RT-PCR Test Results in Patients Recovered From COVID-19"](https://jamanetwork.com/journals/jama/fullarticle/2762452)

In [1]:
import numpy as np
import pandas as pd
import scipy as sp

import plotly.graph_objects as go 
import plotly.figure_factory as ff
import plotly.express as px
# import plotly.offline as py
# py.init_notebook_mode(connected=True)


pd.set_option("display.min_rows", 15)
pd.set_option("display.max_rows", 101)
pd.set_option("display.max_columns", 101)

In [2]:
import plotly.io as pio  # offline plotting
pio.renderers
pio.renderers.default = 'notebook'
%load_ext autoreload
%autoreload 2

In [3]:
"""
# reload all changed modules before executing a new line
%load_ext autoreload
%autoreload 2

# save figures as static images
fig = go.FigureWidget(data=go.Bar(y=[2, 3, 1]))
fig.write_image('figure.png')
"""

#### Mapping out the Deaths

Out of all the stats, I would say the number of deaths can be "trusted" the most (i.e., if someone is reported to have died from the virus, it is highly probable that they had been infected).


In [4]:
df = pd.read_csv('csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv')
today=df.columns[-1]

In [5]:
total_deaths=df[today].sum()
print("total deaths as of {} : {} ".format(df.columns[-1],total_deaths))

total deaths as of 3/12/20 : 4720 


In [6]:
# Fill in missing country and province/state names first, so every row has a label
df['Country/Region'] = df['Country/Region'].ffill()
df['Province/State'] = df['Province/State'].fillna(value=df['Country/Region'])
states = df['Province/State'].values

# Check that cumulative deaths never decrease for any location
not_monotonic = []
dft = df[df.columns[4:]].T  # rows are dates, columns are locations
for col in dft.columns:
    if not dft[col].is_monotonic_increasing:
        print(states[int(col)], ": INCONSISTENT")
        not_monotonic.append(int(col))
print("Inconsistencies: {}".format(len(not_monotonic)))

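
To see on which dates a cumulative series goes down, one option is to take differences along the date axis and look for negative steps. A minimal sketch on made-up numbers (the `toy` frame and its location names are hypothetical, not taken from the CSSE data):

```python
import pandas as pd

# Hypothetical cumulative counts for two made-up locations
toy = pd.DataFrame(
    {"1/22/20": [0, 1], "1/23/20": [2, 3], "1/24/20": [1, 3], "1/25/20": [4, 5]},
    index=["Location A", "Location B"],
)

# Day-over-day change; the first date has no predecessor, so it becomes NaN
steps = toy.diff(axis=1)

# True wherever the cumulative total dropped (NaN compares as False)
went_down = steps.lt(0)
inconsistent = went_down.any(axis=1)

for location in toy.index[inconsistent.values]:
    dates = list(toy.columns[went_down.loc[location].values])
    print(location, "decreases on", dates)
```

Here only "Location A" is flagged, because its count falls from 2 to 1 on the third date.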

In [None]:
locations=df[df.columns[:4]].reset_index()
locations.head()



In [None]:
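# Day-over-day change in deaths per location. The slice [4:-2] skips the
# four metadata columns and the last two columns.
daily_changes = df[df.columns[4:-2]].diff(axis=1)
# Net change over the whole period
daily_changes['sum diff'] = daily_changes.sum(axis=1)
# Re-attach the location columns and keep only locations with an increase
daily_changes = daily_changes.reset_index().merge(locations, on='index')
daily_changes = daily_changes[daily_changes['sum diff'] > 0]
print(daily_changes.head())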

In [None]:
# df['bin_lat']=pd.cut(df['Lat'], bins=18)
# df['bin_long']=pd.cut(df['Long'], bins=18)
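# Bin locations into a coarse lat/long grid (12 edges -> 11 bins per axis).
# Latitude only spans -90 to 90, so the outermost latitude bins stay empty.
df['bin_lat'], blat = pd.cut(df['Lat'], bins=np.linspace(-180, 180, 12), precision=0, retbins=True)
df['bin_long'], blong = pd.cut(df['Long'], bins=np.linspace(-180, 180, 12), precision=0, retbins=True)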

df.head()

In [None]:
#centers=[int(np.average([blat[i],blat[i-1]])) for i in np.arange(1,18)]

In [None]:
initial=df.groupby(['bin_long','bin_lat'])['3/8/20'].sum().sort_values(ascending=False).dropna().reset_index() #.fillna(0)
print(initial.head(20))

initial_total=df['3/8/20'].sum()
print('total deaths as of 3/8/20 (start of analysis): ', initial_total)

In [None]:
# today=df.columns[-3]

latest=df.groupby(['bin_lat','bin_long'])[today].sum().sort_values(ascending=False).dropna().reset_index() #.fillna(0)
print(latest.head())
print(latest.tail())

print('\n\nRemoving 0s')
latest=latest[latest[today]>0]
print(latest.head())
print(latest.tail())

In [None]:

countries=list(df.groupby('Country/Region').groups.keys())
print(countries)


In [None]:
# Group cumulative stats by lat/long bins, since deaths by province/state are sparse.
# Each bin is represented by the midpoint of its interval.
latest['long'] = latest['bin_long'].map(lambda i: i.mid).astype(float)
latest['lat'] = latest['bin_lat'].map(lambda i: i.mid).astype(float)

print(latest)

# Location with the highest cumulative death count
max_row = df.loc[df[today].idxmax()]
max_country = max_row['Country/Region']
max_province = max_row['Province/State']
print('\n Maximum death toll: {} in {}, {}\n'.format(df[today].max(), max_province, max_country))
total_deaths = df[today].sum()
print('Total deaths: ', total_deaths)
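
The binning idea used above (cut coordinates into intervals, sum per grid cell, then plot each cell at its interval midpoint) may be easier to follow on a tiny example. This is a sketch with made-up points and a deliberately coarse grid; the `toy` and `grid` names and the bin edges are my own assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical points: two close together, one far away
toy = pd.DataFrame({
    "Lat": [30.0, 31.0, -25.0],
    "Long": [112.0, 114.0, 134.0],
    "deaths": [5, 7, 2],
})

# Four equal-width bins per axis, chosen only for illustration
toy["bin_lat"] = pd.cut(toy["Lat"], bins=np.linspace(-90, 90, 5))
toy["bin_long"] = pd.cut(toy["Long"], bins=np.linspace(-180, 180, 5))

# Sum the counts per occupied grid cell
grid = (toy.groupby(["bin_lat", "bin_long"], observed=True)["deaths"]
           .sum()
           .reset_index())

# Represent each cell by the midpoint of its interval
grid["lat"] = [interval.mid for interval in grid["bin_lat"]]
grid["long"] = [interval.mid for interval in grid["bin_long"]]
print(grid[["lat", "long", "deaths"]])
```

The two nearby points collapse into one cell with a combined count, which is what makes the density map readable when per-province counts are sparse.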

In [None]:
states=df['Province/State'].values

In [None]:
# Global density plot of Deaths
latest['scaled_deaths']=np.log(latest[today])

fig = px.density_mapbox(latest, lat='lat', lon='long', z='scaled_deaths', title="Map of Death Counts",  hover_name=today ,hover_data=["scaled_deaths",today], radius=25, height=500,
                        center=dict(lat=30, lon=110), zoom=1,
                        mapbox_style="carto-positron")
fig

In [None]:
# fig_mod = go.Figure(fig)

# max_deaths=max(latest['scaled_deaths'])
# fig_mod.update_layout(hovertext='today')
# fig_mod.show()

In [None]:
# #scatter
# import math
# hover_text = []
# bubble_size = []

# for index, row in df.iterrows():
#     hover_text.append(('Country/Region: {country}<br>'+
#                       'Date: {date}<br>'+
#                       'Number of Deaths: {death}<br>').format(country=df["Country/Region"],
#                                             date=today,
#                                             death=row[today]))
#     bubble_size.append(math.sqrt(row[today]))

# df['text'] = hover_text
# df['size'] = bubble_size
# sizeref = 2.*max(df['size'])/(100**2)

print("Scatter Plot of Countries/Regions where Deaths>0 as of {}".format(today))
# fig_scat = px.scatter_mapbox(df, lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", today], color_discrete_sequence=["fuchsia"], zoom=1, height=300)
fig_scat = px.scatter_mapbox(df[df[today]>0], lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", 'Province/State',today],color_discrete_sequence=["fuchsia"],zoom=1, height=600)
fig_scat.update_layout(mapbox_style="open-street-map")
fig_scat  #.show()

# fig_mod = go.Figure(fig_scat)
# fig_mod.update_layout(hovertext='today')


#### Alternative Mapbox Styles (raster tiles)
- Maps that do not require an API token:
    - `mapbox_style` set to `"open-street-map"`, `"carto-positron"`, `"carto-darkmatter"`, `"stamen-terrain"`, `"stamen-toner"`, or `"stamen-watercolor"`
    - Base tiles from the USGS:
    ```
    fig.update_layout(
        mapbox_style="white-bg",
        mapbox_layers=[
            {
                "below": 'traces',
                "sourcetype": "raster",
                "source": [
                    "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
                ]
            }
          ])
    ```
    - Base tiles from the USGS with a radar overlay from Environment Canada:
    ```
    fig.update_layout(
        mapbox_style="white-bg",
        mapbox_layers=[
            {
                "below": 'traces',
                "sourcetype": "raster",
                "source": [
                    "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
                ]
            },
            {
                "sourcetype": "raster",
                "source": ["https://geo.weather.gc.ca/geomet/?"
                           "SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&BBOX={bbox-epsg-3857}&CRS=EPSG:3857"
                           "&WIDTH=1000&HEIGHT=1000&LAYERS=RADAR_1KM_RDBR&TILED=true&FORMAT=image/png"],
            }
          ])
    ```
- Maps that require a signup or token: `"basic"`, `"streets"`, `"outdoors"`, `"light"`, `"dark"`, `"satellite"`, or `"satellite-streets"`
    - To provide the token, set `layout.mapbox.access_token` (or, if using Plotly Express, call the `px.set_mapbox_access_token()` configuration function)

Generally, if your `layout.mapbox.style` does not use Mapbox service data, you do not need to register for a Mapbox account.



In [None]:
fig.add_trace(fig_scat.data[0])

# fig.write_html("mapbox_scatter-density_plot_deaths.html")


fig

# uncomment to save figure
#fig_scat.write_html("mapbox_scatter_plot_deaths.html")


In [None]:
# png = go.FigureWidget(data=fig)
# png.write_image('mapbox_scatter_plot_deaths.png')

In [None]:
# Compare to confirmed cases

dfc = pd.read_csv('csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv')
dfc['Country/Region'] = dfc['Country/Region'].ffill()
# Labels for reporting: fall back to the country name where no province/state is given
confirmed_states = dfc['Province/State'].fillna(dfc['Country/Region']).values

# Check that cumulative confirmed cases never decrease for any location
not_monotonic = []
dfct = dfc[dfc.columns[4:]].T  # rows are dates, columns are locations
for col in dfct.columns:
    if not dfct[col].is_monotonic_increasing:
        print(confirmed_states[int(col)], ": INCONSISTENT")
        not_monotonic.append(int(col))
print("Inconsistencies: {}".format(len(not_monotonic)))



In [None]:
figc = px.density_mapbox(dfc, lat='Lat', lon='Long', z=today, radius=50,
                        center=dict(lat=30, lon=110), zoom=1,
                        mapbox_style="carto-positron")
figc

In [None]:
print("Scatter Plot of Confirmed Cases as of {}".format('3/8/20'))
# fig_scat = px.scatter_mapbox(df, lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", today], color_discrete_sequence=["fuchsia"], zoom=1, height=300)
ifigc_scat = px.scatter_mapbox(dfc[dfc['3/8/20']>0], lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", 'Province/State','3/8/20'],color_discrete_sequence=["fuchsia"],zoom=1, height=600)
ifigc_scat.update_layout(mapbox_style="open-street-map")
ifigc_scat

In [None]:
print("Scatter Plot of Confirmed Cases as of {}".format(today))
# fig_scat = px.scatter_mapbox(df, lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", today], color_discrete_sequence=["fuchsia"], zoom=1, height=300)
figc_scat = px.scatter_mapbox(dfc[dfc[today]>0], lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=['Province/State',today],color_discrete_sequence=["fuchsia"],zoom=1, height=600)
figc_scat.update_layout(mapbox_style="open-street-map")
figc_scat



In [None]:
figc.add_trace(ifigc_scat.data[0])
# figc.write_html("mapbox_scatter-density_plot_confirmed.html")
figc


In [None]:
#troubleshooting 
#print(vars(figc))
print(figc.layout.coloraxis)

#### Exporting as images
To export the figure/graph as an image, you must have orca installed. The official guide recommends using conda with the command:

`$ conda install -c plotly plotly-orca`

But I had better luck using the npm install: 

`$ npm install -g electron@6.1.4 orca`

In [None]:
# set default export options (otherwise my figures were saved zoomed in and cropped) 

pio.orca.config.default_format="png"    # "png", "jpeg", "webp", "svg", "pdf", or "eps"
pio.orca.config.default_scale=1
pio.orca.config.default_height=800
pio.orca.config.default_width=1200
print(pio.orca.config)

# save default size
pio.orca.config.save()

In [None]:
# Export as image
# pio.write_image(figc, file='mapbox_scatter-density_plot_confirmed.png', format='png')