# COVID-19 Analysis: A Deeper Dive into the Stats
## ( + easy interactive figures with plot.ly) 

3/9/20

According to the "Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE" dashboard, as of right now, there are a total of 113584 confirmed cases, 3996 deaths, and 62496 recovered. From the media reports, it feels like outbreaks are happening at an exponential rate. However, hearing these numbers being thrown everywhere and used to both support or refute the need to panic, I decided to dive deeper into the numbers myself. How reliable are these reports? What do they really say about the threat of the virus? I don't know...
   

Some initial notes/thoughts/findings from browsing the internet:
    - Confirmed cases include presumptive cases
    - Confirmed cases are laboratory-confirmed using PCR 
        - (sidenote: with what experience I've had with QT-PCR, results can be finicky and may vary significantly if proper mixing and sampling isn't done) 
        - According to the WHO daily situation reports, a confirmed case is "A person with laboratory confirmation of COVID-19 infection, irrespective of clinical signs and symptoms."
        - There is no single protocol.
            - Following the link to the laboratory testing page, there are several different protocols coming from several different countries including the US, China, Thailand, etc. 
            - The primers/probe combinations used for different protocols are different. The targets are different... 
        - Recovered patients who have consecutive negative test results test positive after an additional quarentine period?! ["Positive RT-PCR Test Results in Patients Recovered From COVID-19"](https://jamanetwork.com/journals/jama/fullarticle/2762452)
<a href="https://jamanetwork.com/journals/jama/fullarticle/2762452" target="_blank">link</a>

In [1]:
import numpy as np
import pandas as pd
import scipy as sp

import plotly.graph_objects as go 
import plotly.figure_factory as ff
import plotly.express as px
# import plotly.offline as py
# py.init_notebook_mode(connected=True)


pd.set_option("display.min_rows", 15)
pd.set_option("display.max_rows", 101)
pd.set_option("display.max_columns", 101)

In [2]:
import plotly.io as pio  # offline plotting
pio.renderers
pio.renderers.default = 'notebook'
%load_ext autoreload
%autoreload 2

In [3]:
"""
# reload all changed modules before executing a new line
%load_ext autoreload
%autoreload 2

# save figures as static images
fig = go.FigureWidget(data=go.Bar(y=[2, 3, 1]))
fig.write_image('figure.png')
"""

"\n# reload all changed modules before executing a new line\n%load_ext autoreload\n%autoreload 2\n\n# save figures as static images\nfig = go.FigureWidget(data=go.Bar(y=[2, 3, 1]))\nfig.write_image('figure.png')\n"

#### Mapping out the Deaths

Out of all the stats, I would say the number of deaths can be "trusted" most (ie if someone is said to have died from the virus, it is highly probable that they had been infected).


In [4]:
df = pd.read_csv('csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv')
today=df.columns[-1]

# fill in missing values
df['Country/Region']=df['Country/Region'].fillna(method="ffill")
df['Province/State']=df['Province/State'].fillna(value=df['Country/Region'])
df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20,2/26/20,2/27/20,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20,3/9/20,3/10/20,3/11/20,3/12/20
0,Thailand,Thailand,15.0,101.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1
1,Japan,Japan,36.0,138.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,2,4,4,5,6,6,6,6,6,6,6,6,10,10,15,16
2,Singapore,Singapore,1.2833,103.8333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Nepal,Nepal,28.1667,84.25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Malaysia,Malaysia,2.5,112.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [5]:
total_deaths=df[today].sum()
print("total deaths as of {} : {} ".format(df.columns[-1],total_deaths))

total deaths as of 3/12/20 : 4720 


In [6]:
locations=df[df.columns[:4]].reset_index()
print(locations.head())

states=df['Province/State'].values

   index Province/State Country/Region      Lat      Long
0      0       Thailand       Thailand  15.0000  101.0000
1      1          Japan          Japan  36.0000  138.0000
2      2      Singapore      Singapore   1.2833  103.8333
3      3          Nepal          Nepal  28.1667   84.2500
4      4       Malaysia       Malaysia   2.5000  112.5000


In [7]:
# Check that deaths are only increasing
not_monotonic=[]
dft=df.T[4:]
for col in dft.columns:
    monotonic=dft[col].is_monotonic_increasing
    if monotonic==False:
        print(states[int(col)],  ": INCONSISTANT\n")
        not_monotonic.append({int(col):[int(col),dft[col]]})
print("Inconsistancies: {}".format(len(not_monotonic)))
# print(not_monotonic)
# fill in countries
df['Country/Region']=df['Country/Region'].fillna(method="ffill")
df['Province/State']=df['Province/State'].fillna(value=df['Country/Region'])


Lee County, FL : INCONSISTANT

Grant County, WA : INCONSISTANT

Santa Rosa County, FL : INCONSISTANT

Placer County, CA : INCONSISTANT

Snohomish County, WA : INCONSISTANT

King County, WA : INCONSISTANT

Inconsistancies: 6


#### Notes:
- Several (6) US states report decreases in the number of deaths in the beginning of March, which doesn't make sense
- Between the last day in Feb to Mar 9, the deaths in King County, WA rose from 0 to 17 but fell to 0 on 3/10/20


In [8]:
# day to day differences 
daily_changes=df[df.columns[4:-2]].diff(axis=1)
daily_changes['sum_diff']=daily_changes.sum(axis=1)
daily_changes=daily_changes.reset_index().merge(locations, left_on='index', right_on='index')
daily_changes=daily_changes[daily_changes['sum_diff']>0].set_index('index')
print(daily_changes[daily_changes.columns[-6:]].head())

       3/10/20  sum_diff    Province/State Country/Region      Lat      Long
index                                                                       
0          0.0       1.0          Thailand       Thailand  15.0000  101.0000
1          0.0      10.0             Japan          Japan  36.0000  138.0000
5          0.0       1.0  British Columbia         Canada  49.2827 -123.1207
6          0.0       2.0   New South Wales      Australia -33.8688  151.2093
11         0.0       2.0           Germany        Germany  51.0000    9.0000


In [9]:
# trend (2nd order diff) - are the rates increasing or decreasing?
trend=daily_changes[daily_changes.columns[2:-5]].diff(axis=1)
trend=trend.reset_index().merge(daily_changes[daily_changes.columns[-5:]].reset_index(), left_on='index', right_on='index').set_index('index')
trend.drop(['1/24/20','sum_diff'], axis=1, inplace=True)
trend.head()

Unnamed: 0_level_0,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20,2/26/20,2/27/20,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20,3/9/20,3/10/20,Province/State,Country/Region,Lat,Long
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Thailand,Thailand,15.0,101.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,-2.0,1.0,0.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,-4.0,Japan,Japan,36.0,138.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,-1.0,British Columbia,Canada,49.2827,-123.1207
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,-1.0,0.0,0.0,1.0,-1.0,0.0,New South Wales,Australia,-33.8688,151.2093
11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,-2.0,Germany,Germany,51.0,9.0


In [10]:
# merging strings from 2 columns using '|' as a separator
place_idx=trend[trend["Province/State"]!=trend["Country/Region"]][["Province/State","Country/Region"]].agg('|'.join, axis=1)
place_idx

index
5           British Columbia|Canada
6         New South Wales|Australia
50      Western Australia|Australia
100                   Washington|US
102                   California|US
108                      Florida|US
109                   New Jersey|US
156                     Hubei|China
160                 Guangdong|China
161                     Henan|China
162                  Zhejiang|China
163                     Hunan|China
164                     Anhui|China
165                   Jiangxi|China
166                  Shandong|China
167    Diamond Princess|Cruise Ship
169                 Chongqing|China
170                   Sichuan|China
171              Heilongjiang|China
172               UK|United Kingdom
174                   Beijing|China
175                  Shanghai|China
176                     Hebei|China
177                    Fujian|China
178                   Guangxi|China
179                   Shaanxi|China
180                    Yunnan|China
181                   

In [11]:
# df['bin_lat']=pd.cut(df['Lat'], bins=18)
# df['bin_long']=pd.cut(df['Long'], bins=18)
df['bin_lat'],blat=pd.cut(df['Lat'], bins=np.linspace(-180, 180, 36), precision=0,retbins=True)
df['bin_long'],blong=pd.cut(df['Long'], bins=np.linspace(-180, 180, 36),precision=0,retbins=True)

df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,2/1/20,2/2/20,2/3/20,2/4/20,2/5/20,2/6/20,2/7/20,2/8/20,2/9/20,2/10/20,2/11/20,2/12/20,2/13/20,2/14/20,2/15/20,2/16/20,2/17/20,2/18/20,2/19/20,2/20/20,2/21/20,2/22/20,2/23/20,2/24/20,2/25/20,2/26/20,2/27/20,2/28/20,2/29/20,3/1/20,3/2/20,3/3/20,3/4/20,3/5/20,3/6/20,3/7/20,3/8/20,3/9/20,3/10/20,3/11/20,3/12/20,bin_lat,bin_long
0,Thailand,Thailand,15.0,101.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,"(5.0, 15.0]","(98.0, 108.0]"
1,Japan,Japan,36.0,138.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,2,4,4,5,6,6,6,6,6,6,6,6,10,10,15,16,"(26.0, 36.0]","(129.0, 139.0]"
2,Singapore,Singapore,1.2833,103.8333,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"(-5.0, 5.0]","(98.0, 108.0]"
3,Nepal,Nepal,28.1667,84.25,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"(26.0, 36.0]","(77.0, 87.0]"
4,Malaysia,Malaysia,2.5,112.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"(-5.0, 5.0]","(108.0, 118.0]"


In [12]:
bin_idx=df[['Province/State','bin_long','bin_lat']]
bin_idx['long']=bin_idx['bin_long'].map({i:i.mid for i in bin_idx['bin_long']})

bin_idx['lat']=bin_idx['bin_lat'].map({i:i.mid for i in bin_idx['bin_lat']})
print(bin_idx.head())

  Province/State        bin_long       bin_lat   long   lat
0       Thailand   (98.0, 108.0]   (5.0, 15.0]  103.0  10.0
1          Japan  (129.0, 139.0]  (26.0, 36.0]  134.0  31.0
2      Singapore   (98.0, 108.0]   (-5.0, 5.0]  103.0   0.0
3          Nepal    (77.0, 87.0]  (26.0, 36.0]   82.0  31.0
4       Malaysia  (108.0, 118.0]   (-5.0, 5.0]  113.0   0.0


In [13]:
# Stats for the first day this analysis began (just for reference)
initial=df.groupby(['bin_long','bin_lat'])['3/8/20'].sum().sort_values(ascending=False).dropna().reset_index() #.fillna(0)
print(initial.head(20))

initial_total=df['3/8/20'].sum()
print('total deaths as of 3/8/20 (start of analysis): ', initial_total)

            bin_long       bin_lat  3/8/20
0     (108.0, 118.0]  (26.0, 36.0]  3021.0
1        (5.0, 15.0]  (36.0, 46.0]   367.0
2       (46.0, 57.0]  (26.0, 36.0]   194.0
3     (118.0, 129.0]  (26.0, 36.0]    54.0
4        (-5.0, 5.0]  (36.0, 46.0]    36.0
5     (108.0, 118.0]  (36.0, 46.0]    24.0
6     (108.0, 118.0]  (15.0, 26.0]    18.0
7   (-129.0, -118.0]  (46.0, 57.0]    18.0
8     (118.0, 129.0]  (46.0, 57.0]    13.0
9      (98.0, 108.0]  (26.0, 36.0]    11.0
10    (129.0, 139.0]  (26.0, 36.0]     6.0
11      (36.0, 46.0]  (26.0, 36.0]     6.0
12    (139.0, 149.0]  (26.0, 36.0]     6.0
13       (5.0, 15.0]  (46.0, 57.0]     5.0
14      (77.0, 87.0]  (36.0, 46.0]     3.0
15       (-5.0, 5.0]  (46.0, 57.0]     3.0
16     (98.0, 108.0]  (36.0, 46.0]     2.0
17    (-87.0, -77.0]  (26.0, 36.0]     2.0
18     (98.0, 108.0]  (15.0, 26.0]     2.0
19    (118.0, 129.0]  (36.0, 46.0]     2.0
total deaths as of 3/8/20 (start of analysis):  3802


In [14]:
# today=df.columns[-3]

latest=df.groupby(['bin_lat','bin_long'])[today].sum().reset_index().sort_values(today,ascending=False).dropna() #.fillna(0)
# latest=df.groupby(['bin_lat','bin_long']).agg({today:['sum']})#.reset_index().sort_values(today,ascending=False).dropna() #.fillna(0)
# .set_index('Country/Region',append=True)
print(latest.head())
print(latest.tail())

print('\n\nRemoving 0s')
latest=latest[latest[today]>0]
print(latest.head())
print(latest.tail())

          bin_lat        bin_long  3/12/20
728  (26.0, 36.0]  (108.0, 118.0]   3092.0
753  (36.0, 46.0]     (5.0, 15.0]    830.0
722  (26.0, 36.0]    (46.0, 57.0]    429.0
752  (36.0, 46.0]     (-5.0, 5.0]    103.0
729  (26.0, 36.0]  (118.0, 129.0]     70.0
          bin_lat          bin_long  3/12/20
675  (15.0, 26.0]    (-77.0, -67.0]      0.0
674  (15.0, 26.0]    (-87.0, -77.0]      0.0
672  (15.0, 26.0]   (-108.0, -98.0]      0.0
667  (15.0, 26.0]  (-159.0, -149.0]      0.0
831  (57.0, 67.0]      (87.0, 98.0]      0.0


Removing 0s
          bin_lat        bin_long  3/12/20
728  (26.0, 36.0]  (108.0, 118.0]   3092.0
753  (36.0, 46.0]     (5.0, 15.0]    830.0
722  (26.0, 36.0]    (46.0, 57.0]    429.0
752  (36.0, 46.0]     (-5.0, 5.0]    103.0
729  (26.0, 36.0]  (118.0, 129.0]     70.0
            bin_lat        bin_long  3/12/20
518  (-36.0, -26.0]  (108.0, 118.0]      1.0
824    (57.0, 67.0]    (15.0, 26.0]      1.0
657     (5.0, 15.0]   (98.0, 108.0]      1.0
623     (-5.0, 5.0] 

In [15]:

countries=list(df.groupby('Country/Region').groups.keys())
print(countries)



['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahrain', 'Bangladesh', 'Belarus', 'Belgium', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Brazil', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Cambodia', 'Cameroon', 'Canada', 'Chile', 'China', 'Colombia', 'Congo (Kinshasa)', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cruise Ship', 'Cuba', 'Cyprus', 'Czechia', 'Denmark', 'Dominican Republic', 'Ecuador', 'Egypt', 'Estonia', 'Finland', 'France', 'French Guiana', 'Georgia', 'Germany', 'Greece', 'Guyana', 'Holy See', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan', 'Korea, South', 'Kuwait', 'Latvia', 'Lebanon', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Malaysia', 'Maldives', 'Malta', 'Martinique', 'Mexico', 'Moldova', 'Monaco', 'Mongolia', 'Morocco', 'Nepal', 'Netherlands', 'New Zealand', 'Nigeria', 'North Macedonia', 'Norway', 'Oman', 'Pakistan'

In [16]:
# Group cumulative stats by lat/long coordinates since the number of deaths by province/state is sparse

latest['long']=latest['bin_long'].map({i:i.mid for i in latest['bin_long']})

latest['lat']=latest['bin_lat'].map({i:i.mid for i in latest['bin_lat']})

print(latest)
max_country=df[df[today]==df[today].max()]['Country/Region'].values[0]
max_province=df.iloc[df[df[today]==df[today].max()]['Country/Region'].index]['Province/State'].values[0]
print('\n Maximum death toll: {} in {}, {}\n'.format(df[today].max(),max_province, max_country))
total_deaths=df[today].sum()
print('Total deaths: ', total_deaths)

            bin_lat          bin_long  3/12/20   long   lat
728    (26.0, 36.0]    (108.0, 118.0]   3092.0  113.0  31.0
753    (36.0, 46.0]       (5.0, 15.0]    830.0   10.0  41.0
722    (26.0, 36.0]      (46.0, 57.0]    429.0   51.5  31.0
752    (36.0, 46.0]       (-5.0, 5.0]    103.0    0.0  41.0
729    (26.0, 36.0]    (118.0, 129.0]     70.0  123.5  31.0
775    (46.0, 57.0]  (-129.0, -118.0]     32.0 -123.5  51.5
763    (36.0, 46.0]    (108.0, 118.0]     24.0  113.0  41.0
693    (15.0, 26.0]    (108.0, 118.0]     19.0  113.0  20.5
730    (26.0, 36.0]    (129.0, 139.0]     16.0  134.0  31.0
788    (46.0, 57.0]       (5.0, 15.0]     13.0   10.0  51.5
799    (46.0, 57.0]    (118.0, 129.0]     13.0  123.5  51.5
787    (46.0, 57.0]       (-5.0, 5.0]     11.0    0.0  51.5
727    (26.0, 36.0]     (98.0, 108.0]     11.0  103.0  31.0
721    (26.0, 36.0]      (36.0, 46.0]      8.0   41.0  31.0
731    (26.0, 36.0]    (139.0, 149.0]      7.0  144.0  31.0
720    (26.0, 36.0]      (26.0, 36.0]   

In [47]:
# Global density plot of Deaths
latest['log_scaled_deaths']=np.log(latest[today])

fig = px.density_mapbox(latest, lat='lat', lon='long', z='log_scaled_deaths', title="Map of Death Counts (Binned)",  hover_name=today ,hover_data=["log_scaled_deaths",today], color_continuous_scale="Temps",radius=25,
                        center=dict(lat=30, lon=110), zoom=1,
                        mapbox_style="carto-positron")
fig

In [18]:
# fig_mod = go.Figure(fig)

# max_deaths=max(latest['scaled_deaths'])
# fig_mod.update_layout(hovertext='today')
# fig_mod.show()

In [48]:
# #scatter
# import math
# hover_text = []
# bubble_size = []

# for index, row in df.iterrows():
#     hover_text.append(('Country/Region: {country}<br>'+
#                       'Date: {date}<br>'+
#                       'Number of Deaths: {death}<br>').format(country=df["Country/Region"],
#                                             date=today,
#                                             death=row[today]))
#     bubble_size.append(math.sqrt(row[today]))

# df['text'] = hover_text
# df['size'] = bubble_size
# sizeref = 2.*max(df['size'])/(100**2)

print("Scatter Plot of Countries/Regions where Deaths>0 as of {}".format(today))
df['latest_str']=df[today].astype(str)
df['hovernames']=df[["Province/State","Country/Region"]].fillna('').agg('\n'.join, axis=1)
df['hovernames']=df[["hovernames","latest_str"]].agg(' | '.join, axis=1)
# fig_scat = px.scatter_mapbox(df, lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", today], color_discrete_sequence=["fuchsia"], zoom=1, height=300)
# fig_scat = px.scatter_mapbox(df[df[today]>0], lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", 'Province/State',today],color=today,color_continuous_scale="Temps",zoom=1, height=600)
fig_scat = px.scatter_mapbox(df[df[today]>0], lat="Lat", lon="Long", hover_name="hovernames" ,hover_data=[today,"Long","Lat", 'Province/State'],color_discrete_sequence=["fuchsia"],zoom=1)

fig_scat.update_layout(mapbox_style="open-street-map")
fig_scat  #.show()

# fig_mod = go.Figure(fig_scat)
# fig_mod.update_layout(hovertext='today')


Scatter Plot of Countries/Regions where Deaths>0 as of 3/12/20


#### Alternative Mapbox Styles (raster tiles)
- maps that do not require an API token: 
    - `mapbox_style`=`"open-street-map"`, `"carto-positron"`, `"carto-darkmatter"`, `"stamen-terrain"`, `"stamen-toner"`, or `"stamen-watercolor" `
    - Base Tiles from the USGS: 
    ```
    fig.update_layout(
        mapbox_style="white-bg",
        mapbox_layers=[
            {
                "below": 'traces',
                "sourcetype": "raster",
                "source": [
                    "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
                ]
            }
          ])```
- maps that require a signup or token: "basic", "streets", "outdoors", "light", "dark", "satellite", or "satellite-streets"
    - Base Tiles from the USGS, radar overlay from Environment Canada: no token needed:
    ``` 
    fig.update_layout(
        mapbox_style="white-bg",
        mapbox_layers=[
            {
                "below": 'traces',
                "sourcetype": "raster",
                "source": [
                    "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
                ]
            },
            {
                "sourcetype": "raster",
                "source": ["https://geo.weather.gc.ca/geomet/?"
                           "SERVICE=WMS&VERSION=1.3.0&REQUEST=GetMap&BBOX={bbox-epsg-3857}&CRS=EPSG:3857"
                           "&WIDTH=1000&HEIGHT=1000&LAYERS=RADAR_1KM_RDBR&TILED=true&FORMAT=image/png"],
            }
          ]) ```
    - to provide token, set `layout.mapbox.access_token` (or, if using Plotly Express, via the `px.set_mapbox_access_token()` configuration function)
- 

##### Generally, if your layout.mapbox.style does not use Mapbox service data, you do not need to register for a Mapbox account.



In [20]:
vars(fig_scat)

{'_grid_str': 'This is the format of your plot grid:\n[ (1,1) mapbox ]\n',
 '_grid_ref': [[(SubplotRef(subplot_type='mapbox', layout_keys=('mapbox',), trace_kwargs={'subplot': 'mapbox'}),)]],
 '_data_validator': <plotly.validators.DataValidator at 0x7fd3b8c727f0>,
 '_data_objs': [Scattermapbox({
      'customdata': array([[101.0, 15.0, 'Thailand', 1],
                           [138.0, 36.0, 'Japan', 16],
                           [-123.1207, 49.2827, 'British Columbia', 1],
                           ...,
                           [113.9448, 44.0935, 'Inner Mongolia', 1],
                           [121.0, 23.7, 'Taiwan*', 1],
                           [-58.75, 5.0, 'Guyana', 1]], dtype=object),
      'hoverlabel': {'namelength': 0},
      'hovertemplate': ('<b>%{hovertext}</b><br><br>Lon' ... ']}<br>3/12/20=%{customdata[3]}'),
      'hovertext': array(['Thailand', 'Japan', 'Canada', 'Australia', 'Germany', 'Philippines',
                          'India', 'Italy', 'Sweden', 'Spain

In [49]:
fig.add_trace(fig_scat.data[0])

fig.write_html("mapbox_scatter-density_plot_deaths.html")
# uncomment to save figure
#fig_scat.write_html("mapbox_scatter_plot_deaths.html")


In [50]:
fig
# png = go.FigureWidget(data=fig)
# png.write_image('mapbox_scatter_plot_deaths.png')

In [23]:
#compare to confirmed

dfc = pd.read_csv('csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv')
dfc['Country/Region']=dfc['Country/Region'].fillna(method="ffill")
# Check that deaths are only increasing
not_monotonic=[]
dfct=dfc.T[4:]
for col in dfct.columns:
    monotonic=dft[col].is_monotonic_increasing
    if monotonic==False:
        print(states[int(col)],  ": INCONSISTANT")
        not_monotonic.append([int(col)])
print("Inconsistancies: {}".format(len(not_monotonic)))



Lee County, FL : INCONSISTANT
Grant County, WA : INCONSISTANT
Santa Rosa County, FL : INCONSISTANT
Placer County, CA : INCONSISTANT
Snohomish County, WA : INCONSISTANT
King County, WA : INCONSISTANT
Inconsistancies: 6


In [24]:
figc_title="Scatter Plot of Confirmed Cases as of {}".format(today)

figc = px.density_mapbox(dfc, lat='Lat', lon='Long', z=today, radius=50,
                        center=dict(lat=30, lon=110), zoom=1, title=figc_title,
                        mapbox_style="carto-positron",color_continuous_scale="Temps")
figc

In [25]:
figc_title="Scatter Plot of Confirmed Cases as of {}".format(today)

dfc['log_scaled_deaths']=np.log(dfc[today])
figc2 = px.density_mapbox(dfc, lat='Lat', lon='Long', z='log_scaled_deaths', radius=25,
                        center=dict(lat=30, lon=110), zoom=1, title=figc_title,
                        mapbox_style="carto-positron",color_continuous_scale="Temps")
figc2


divide by zero encountered in log



In [51]:
print("Scatter Plot of Confirmed Cases as of {}".format('3/8/20'))
# fig_scat = px.scatter_mapbox(df, lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", today], color_discrete_sequence=["fuchsia"], zoom=1, height=300)
ifigc_scat = px.scatter_mapbox(dfc[dfc['3/8/20']>0], lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", 'Province/State','3/8/20'],color_discrete_sequence=["blue"],zoom=1)
ifigc_scat.update_layout(mapbox_style="open-street-map")
ifigc_scat

Scatter Plot of Confirmed Cases as of 3/8/20


In [52]:
figc_title="Scatter Plot of Confirmed Cases as of {}".format(today)
dfc['latest_str']=dfc[today].astype(str)
dfc['hovernames']=dfc[["Province/State","Country/Region"]].fillna('').agg('\n'.join, axis=1)
dfc['hovernames']=dfc[["hovernames","latest_str"]].agg(' | '.join, axis=1)
# print(dfc.head())
# fig_scat = px.scatter_mapbox(df, lat="Lat", lon="Long", hover_name="Country/Region" ,hover_data=["Long","Lat", today], color_discrete_sequence=["fuchsia"], zoom=1, height=300)
figc_scat = px.scatter_mapbox(dfc[dfc[today]>0], lat="Lat", lon="Long", title=figc_title, hover_name="hovernames" ,hover_data=[today,],color_discrete_sequence=["blue"],zoom=1)
figc_scat.update_layout(mapbox_style="open-street-map")
figc_scat



In [54]:
figc2.add_trace(figc_scat.data[0])
# figc2_fn="mapbox_scatter-density_plot_confirmed-{}.html".format(today)
figc2.write_html("mapbox_scatter-density_plot_confirmed.html")
figc2


In [35]:
#troubleshooting 
#print(vars(figc))
print(figc.layout.coloraxis)

layout.Coloraxis({
    'colorbar': {'title': {'text': '3/12/20'}},
    'colorscale': [[0.0, 'rgb(0, 147, 146)'], [0.16666666666666666, 'rgb(57, 177,
                   133)'], [0.3333333333333333, 'rgb(156, 203, 134)'], [0.5,
                   'rgb(233, 226, 156)'], [0.6666666666666666, 'rgb(238, 180,
                   121)'], [0.8333333333333334, 'rgb(232, 132, 113)'], [1.0,
                   'rgb(207, 89, 126)']]
})


#### Exporting as images
To export the figure/graph as an image, you must have orca installed. The official guide recommends using conda with the command:

`$ conda install -c plotly plotly-orca`

But I had better luck using the npm install: 

`$ npm install -g electron@6.1.4 orca`

In [36]:
# # set default export options (otherwise my figures were saved zoomed in and cropped) 

# pio.orca.config.default_format="png"    # "png", "jpeg", "webp", "svg", "pdf", or "eps"
# pio.orca.config.default_scale=1
# pio.orca.config.default_height=800
# pio.orca.config.default_width=1200
# print(pio.orca.config)

# # save default size
# pio.orca.config.save()

orca configuration
------------------
    server_url: None
    executable: orca
    port: None
    timeout: None
    default_width: 1200
    default_height: 800
    default_scale: 1
    default_format: png
    mathjax: https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js
    topojson: None
    mapbox_access_token: None
    use_xvfb: auto

constants
---------
    plotlyjs: /Users/kaixiwang/.virtualenvs/COVID-19/lib/python3.6/site-packages/plotly/package_data/plotly.min.js
    config_file: /Users/kaixiwang/.plotly/.orca




In [37]:
# Export as image
# pio.write_image(figc, file='mapbox_scatter-density_plot_confirmed.png', format='png')