# COVID-19 Analysis: A Deeper Dive into the Stats
## ( + easy interactive figures with plot.ly) 

3/9/20

According to the "Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE" dashboard, as of right now, there are a total of 113584 confirmed cases, 3996 deaths, and 62496 recovered. From the media reports, it feels like outbreaks are happening at an exponential rate. However, hearing these numbers being thrown everywhere and used to both support or refute the need to panic, I decided to dive deeper into the numbers myself. How reliable are these reports? What do they really say about the threat of the virus? I don't know...
   

Some initial notes/thoughts/findings from browsing the internet:
    - Confirmed cases include presumptive cases
    - Confirmed cases are laboratory-confirmed using PCR 
        - (sidenote: with what experience I've had with QT-PCR, results can be finicky and may vary significantly if proper mixing and sampling isn't done) 
        - According to the WHO daily situation reports, a confirmed case is "A person with laboratory confirmation of COVID-19 infection, irrespective of clinical signs and symptoms."
        - There is no single protocol.
            - Following the link to the laboratory testing page, there are several different protocols coming from several different countries including the US, China, Thailand, etc. 
            - The primers/probe combinations used for different protocols are different. The targets are different... 
        - Recovered patients who have consecutive negative test results test positive after an additional quarentine period?! ["Positive RT-PCR Test Results in Patients Recovered From COVID-19"](https://jamanetwork.com/journals/jama/fullarticle/2762452)

In [150]:
import numpy as np
import pandas as pd
import scipy as sp

import plotly.graph_objects as go 
import plotly.figure_factory as ff
import plotly.express as px
# This package is optional  for figures to be uploaded to the Chart Studio cloud service.
# import chart_studio.plotly as py    

pd.set_option("display.max_rows", 101)
pd.set_option("display.max_columns", 101)

In [151]:
"""
# reload all changed modules before executing a new line
%load_ext autoreload
%autoreload 2

# save figures as static images
fig = go.FigureWidget(data=go.Bar(y=[2, 3, 1]))
fig.write_image('figure.png')
"""

"\n# reload all changed modules before executing a new line\n%load_ext autoreload\n%autoreload 2\n\n# save figures as static images\nfig = go.FigureWidget(data=go.Bar(y=[2, 3, 1]))\nfig.write_image('figure.png')\n"

#### Mapping out the Deaths

Out of all the stats, I would say the number of deaths can be "trusted" most (ie if someone is said to have died from the virus, it is highly probable that they had been infected).


In [153]:
df = pd.read_csv('csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv')

In [154]:
total_deaths=df[df.columns[-1]].sum()
print("total deaths as of {} : {} ".format(df.columns[-1],total_deaths))

total deaths as of 3/8/20 : 3803 


In [155]:
# Check that deaths are only increasing
not_monotonic=[]
dft=df.T[4:]
for col in dft.columns:
    monotonic=dft[col].is_monotonic_increasing
    if monotonic==False:
        print(states[int(col)],  ": INCONSISTANT")
        not_monotonic.append([int(col)])
print("Inconsistancies: {}".format(len(not_monotonic)))

# fill in countries
df['Country/Region']=df['Country/Region'].fillna(method="ffill")

Inconsistancies: 0


In [156]:
# df['bin_lat']=pd.cut(df['Lat'], bins=18)
# df['bin_long']=pd.cut(df['Long'], bins=18)
df['bin_lat'],blat=pd.cut(df['Lat'], bins=np.linspace(-180, 180, 18), precision=0,retbins=True)
df['bin_long'],blong=pd.cut(df['Long'], bins=np.linspace(-180, 180, 18),precision=0,retbins=True)

df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20,2/26/20,2/27/20,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20,bin_lat,bin_long
0,Anhui,Mainland China,31.8257,117.2264,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,3,4,4,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,"(32.0, 53.0]","(116.0, 138.0]"
1,Beijing,Mainland China,40.1824,116.4142,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,5,7,8,8,8,8,8,8,8,8,8,"(32.0, 53.0]","(95.0, 116.0]"
2,Chongqing,Mainland China,30.0572,107.874,0,0,0,0,0,0,0,0,0,0,1,2,2,2,2,2,2,2,2,2,3,3,4,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,"(11.0, 32.0]","(95.0, 116.0]"
3,Fujian,Mainland China,26.0789,117.9874,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,"(11.0, 32.0]","(116.0, 138.0]"
4,Gansu,Mainland China,36.0611,103.8343,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,"(32.0, 53.0]","(95.0, 116.0]"


In [157]:
#centers=[int(np.average([blat[i],blat[i-1]])) for i in np.arange(1,18)]

In [158]:
latest=df.groupby(['bin_lat','bin_long'])['3/8/20'].sum().sort_values(ascending=False).dropna().reset_index() #.fillna(0)
latest.head()

Unnamed: 0,bin_lat,bin_long,3/8/20
0,"(11.0, 32.0]","(95.0, 116.0]",3023.0
1,"(32.0, 53.0]","(11.0, 32.0]",367.0
2,"(32.0, 53.0]","(53.0, 74.0]",194.0
3,"(32.0, 53.0]","(116.0, 138.0]",80.0
4,"(32.0, 53.0]","(-11.0, 11.0]",41.0


In [159]:
# {i:i.mid for i in latest['bin_lat']}

In [160]:

countries=list(df.groupby('Country/Region').groups.keys())
print(countries)


['Afghanistan', 'Algeria', 'Andorra', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahrain', 'Bangladesh', 'Belarus', 'Belgium', 'Bhutan', 'Bosnia and Herzegovina', 'Brazil', 'Bulgaria', 'Cambodia', 'Cameroon', 'Canada', 'Chile', 'Colombia', 'Costa Rica', 'Croatia', 'Czech Republic', 'Denmark', 'Dominican Republic', 'Ecuador', 'Egypt', 'Estonia', 'Faroe Islands', 'Finland', 'France', 'French Guiana', 'Georgia', 'Germany', 'Gibraltar', 'Greece', 'Hong Kong', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy', 'Japan', 'Jordan', 'Kuwait', 'Latvia', 'Lebanon', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Macau', 'Mainland China', 'Malaysia', 'Maldives', 'Malta', 'Martinique', 'Mexico', 'Moldova', 'Monaco', 'Morocco', 'Nepal', 'Netherlands', 'New Zealand', 'Nigeria', 'North Macedonia', 'Norway', 'Oman', 'Others', 'Pakistan', 'Palestine', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Qatar', 'Republic of Ireland', 'Romania'

In [161]:
# Group cumulative stats by lat/long coordinates since the number of deaths by province/state is sparse

latest['long']=latest['bin_long'].map({i:i.mid for i in latest['bin_long']})

latest['lat']=latest['bin_lat'].map({i:i.mid for i in latest['bin_lat']})
latest.head()

Unnamed: 0,bin_lat,bin_long,3/8/20,long,lat
0,"(11.0, 32.0]","(95.0, 116.0]",3023.0,105.5,21.5
1,"(32.0, 53.0]","(11.0, 32.0]",367.0,21.5,42.5
2,"(32.0, 53.0]","(53.0, 74.0]",194.0,63.5,42.5
3,"(32.0, 53.0]","(116.0, 138.0]",80.0,127.0,42.5
4,"(32.0, 53.0]","(-11.0, 11.0]",41.0,0.0,42.5


In [162]:
states=df['Province/State'].values

In [164]:
# Global density plot of Deaths
fig = px.density_mapbox(latest, lat='lat', lon='long', z='3/8/20', radius=100,
                        center=dict(lat=30, lon=110), zoom=1,
                        mapbox_style="stamen-terrain")
fig.show()

In [165]:
# uncomment to save figure
# fig.write_html("mapbox_density_plot.html")

In [167]:
#compare to confirmed

dfc = pd.read_csv('csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv')
dfc['Country/Region']=dfc['Country/Region'].fillna(method="ffill")
# Check that deaths are only increasing
not_monotonic=[]
dfct=dfc.T[4:]
for col in dfct.columns:
    monotonic=dft[col].is_monotonic_increasing
    if monotonic==False:
        print(states[int(col)],  ": INCONSISTANT")
        not_monotonic.append([int(col)])
print("Inconsistancies: {}".format(len(not_monotonic)))



Anhui : INCONSISTANT
Beijing : INCONSISTANT
Chongqing : INCONSISTANT
Fujian : INCONSISTANT
Gansu : INCONSISTANT
Guangdong : INCONSISTANT
Guangxi : INCONSISTANT
Guizhou : INCONSISTANT
Hainan : INCONSISTANT
Hebei : INCONSISTANT
Heilongjiang : INCONSISTANT
Henan : INCONSISTANT
Hubei : INCONSISTANT
Hunan : INCONSISTANT
Inner Mongolia : INCONSISTANT
Jiangsu : INCONSISTANT
Jiangxi : INCONSISTANT
Jilin : INCONSISTANT
Liaoning : INCONSISTANT
Ningxia : INCONSISTANT
Qinghai : INCONSISTANT
Shaanxi : INCONSISTANT
Shandong : INCONSISTANT
Shanghai : INCONSISTANT
Shanxi : INCONSISTANT
Sichuan : INCONSISTANT
Tianjin : INCONSISTANT
Tibet : INCONSISTANT
Xinjiang : INCONSISTANT
Yunnan : INCONSISTANT
Zhejiang : INCONSISTANT
nan : INCONSISTANT
nan : INCONSISTANT
nan : INCONSISTANT
Taiwan : INCONSISTANT
King County, WA : INCONSISTANT
Cook County, IL : INCONSISTANT
Macau : INCONSISTANT
Hong Kong : INCONSISTANT
nan : INCONSISTANT
nan : INCONSISTANT
nan : INCONSISTANT
nan : INCONSISTANT
nan : INCONSISTANT
Toro

In [168]:
figc = px.density_mapbox(dfc, lat='Lat', lon='Long', z='3/8/20', radius=100,
                        center=dict(lat=30, lon=110), zoom=1,
                        mapbox_style="stamen-terrain")
figc.show()