# Playing with Coronavirus Timeseries -v3

- https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset


- 07/10/20
- James M. Irving, Ph.D.

## Notes:

- This notebook is the 3rd iteration of the dashboard.
- I will run only the minimal code needed to load the data and begin visualizing.
- The plan is to add additional information and plots.

In [1]:
!pip install -U fsds
from fsds.imports import *

fsds v0.2.22 loaded.  Read the docs: https://fs-ds.readthedocs.io/en/latest/ 


Handle,Package,Description
dp,IPython.display,Display modules with helpful display and clearing commands.
fs,fsds,Custom data science bootcamp student package
mpl,matplotlib,Matplotlib's base OOP module with formatting artists
plt,matplotlib.pyplot,Matplotlib's matlab-like plotting module
np,numpy,scientific computing with Python
pd,pandas,High performance data structures and tools
sns,seaborn,High-level data visualization library based on matplotlib


[i] Pandas .iplot() method activated.


In [2]:
import os,glob,sys
import re
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_dark"

import cufflinks as cf
cf.go_offline()
cf.set_config_file(sharing='public',theme='solar',offline=True)

import functions as fn

%load_ext autoreload
%autoreload 2

# Main Kaggle Dataset - Get US States

# 📦class `CoronaData`

In [3]:
corona = fn.CoronaData(verbose=True,run_workflow=True)
corona

[i] DOWNLOADING DATA USING KAGGLE API
	https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
	- Downloaded dataset .zip and extracted to:"New Data/"
	- Extraction Complete.


Unnamed: 0,Date,Province/State,Country/Region,Confirmed,Deaths,Recovered
0,2020-01-22,Anhui,Mainland China,1.0,0.0,0.0
1,2020-01-22,Beijing,Mainland China,14.0,0.0,0.0
2,2020-01-22,Chongqing,Mainland China,6.0,0.0,0.0
3,2020-01-22,Fujian,Mainland China,1.0,0.0,0.0
4,2020-01-22,Gansu,Mainland China,0.0,0.0,0.0


[i] There are 223 countries in the datatset
[i] Dates Covered:
	From 01-22-2020 to 07-08-2020


------------------------------------------------------------
[i] CovidTrackingProject Contents:
------------------------------------------------------------

METHODS:
	calculate_per_capita
	download_coronavirus_data
	get_and_clean_US
	get_data_fpath
	get_group_ts
	load_raw_df
	load_us_reference_info
	set_datetime_index

ATTRIBUTES
	STATES
	df
	df_us
	raw_df
	reference_data

In [4]:
df_world = corona.df

In [5]:
# pfig = fn.plot_group_ts(corona.df,group_list=['US','Italy','Canada',
#                                   'Germany',
#                                         'Mainland China'],group_col='Country/Region',
#                      new_only=True,plot_scatter=False,width=900,height=600)
# pfig

In [6]:
## Get WORLD dictionary with all countries
grouping_col = 'Country/Region'
countries = list(df_world.groupby(grouping_col).groups.keys())

WORLD = {}
for country in countries:
#     print(country)
    WORLD[country] = fn.get_group_ts(df_world,country, grouping_col)
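
The per-country dictionary above can also be built by iterating the groupby object directly. This is a minimal sketch on made-up toy data: it only splits the frame per country and sets the date index, and it does not reproduce whatever `fn.get_group_ts` does internally. The names `toy_world` and `toy_WORLD` are illustrative.

```python
import pandas as pd

# Toy stand-in for corona.df (values are made up for illustration)
toy_world = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-22", "2020-01-23"]),
    "Country/Region": ["US", "US", "Italy", "Italy"],
    "Confirmed": [1.0, 1.0, 0.0, 2.0],
})

# Iterating a groupby yields (key, sub-DataFrame) pairs
toy_WORLD = {
    country: group.set_index("Date")
    for country, group in toy_world.groupby("Country/Region")
}
print(sorted(toy_WORLD))
```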

# 🗺Adding Mapping - 07/08

https://plotly.com/python/mapbox-county-choropleth/

In [7]:
df_states = corona.df_us
df_states

Date,Province/State,Country/Region,Confirmed,Deaths,Recovered,state,Confirmed Per Capita,Deaths Per Capita,Recovered Per Capita
2020-01-22,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
2020-01-23,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
2020-01-24,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
2020-01-25,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
2020-01-26,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
...,...,...,...,...,...,...,...,...,...
2020-07-04,Puerto Rico,US,7787.0,155.0,0.0,PR,2.438242e-03,0.000049,0.0
2020-07-05,Puerto Rico,US,7916.0,155.0,0.0,PR,2.478634e-03,0.000049,0.0
2020-07-06,Puerto Rico,US,8585.0,155.0,0.0,PR,2.688110e-03,0.000049,0.0
2020-07-07,Puerto Rico,US,8714.0,157.0,0.0,PR,2.728502e-03,0.000049,0.0


In [8]:
## Get maximum value for cases by state
max_corona = df_states.groupby('state').max().reset_index()
max_corona.head()

Unnamed: 0,state,Province/State,Country/Region,Confirmed,Deaths,Recovered,Confirmed Per Capita,Deaths Per Capita,Recovered Per Capita
0,AK,Alaska,US,1222.0,17.0,0.0,0.00167,2.3e-05,0.0
1,AL,Alabama,US,46962.0,1058.0,0.0,0.009578,0.000216,0.0
2,AR,Arkansas,US,25246.0,305.0,0.0,0.008366,0.000101,0.0
3,AZ,"Tempe, AZ",US,108614.0,1963.0,1.0,0.014922,0.00027,1.373868e-07
4,CA,"Yolo County, CA",US,292560.0,6718.0,6.0,0.007404,0.00017,1.518517e-07
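
In the output above, `Province/State` for AZ and CA shows "Tempe, AZ" and "Yolo County, CA". That is most likely because `.max()` on a text column returns the alphabetically last label in the group. A minimal sketch on made-up toy data shows the effect, along with one possible way around it (selecting the numeric columns before aggregating):

```python
import pandas as pd

# Toy rows (made-up counts) that mix state-level and county-level labels
toy_states = pd.DataFrame({
    "state": ["AZ", "AZ", "CA", "CA"],
    "Province/State": ["Arizona", "Tempe, AZ", "California", "Yolo County, CA"],
    "Confirmed": [100.0, 1.0, 250.0, 6.0],
    "Deaths": [3.0, 0.0, 7.0, 0.0],
})

# max() on a string column returns the alphabetically last label
mixed_max = toy_states.groupby("state").max()

# Selecting the numeric columns first keeps only the case counts
numeric_cols = ["Confirmed", "Deaths"]
numeric_max = toy_states.groupby("state")[numeric_cols].max().reset_index()
print(numeric_max)
```

The choropleth only needs `state` and the numeric columns, so the second form should be enough for plotting.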


In [9]:
df_states.reset_index(inplace=True)
df_states

Unnamed: 0,Date,Province/State,Country/Region,Confirmed,Deaths,Recovered,state,Confirmed Per Capita,Deaths Per Capita,Recovered Per Capita
0,2020-01-22,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
1,2020-01-23,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
2,2020-01-24,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
3,2020-01-25,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
4,2020-01-26,Washington,US,1.0,0.0,0.0,WA,1.313216e-07,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...
7160,2020-07-04,Puerto Rico,US,7787.0,155.0,0.0,PR,2.438242e-03,0.000049,0.0
7161,2020-07-05,Puerto Rico,US,7916.0,155.0,0.0,PR,2.478634e-03,0.000049,0.0
7162,2020-07-06,Puerto Rico,US,8585.0,155.0,0.0,PR,2.688110e-03,0.000049,0.0
7163,2020-07-07,Puerto Rico,US,8714.0,157.0,0.0,PR,2.728502e-03,0.000049,0.0


In [10]:
import plotly.express as px

color_column = 'Confirmed'
pfig = px.choropleth(max_corona,color=color_column,locations='state',
              hover_data=['Confirmed','Deaths','Recovered'], 
              hover_name='state',
              locationmode="USA-states", scope='usa',
              title=f"Total {color_column} Cases by State", #projection='natural earth',
              color_continuous_scale=px.colors.sequential.Reds)


pfig.show(config={'scrollZoom': False})

In [11]:
# date_index = df_states.index.to_series()
# date_index[-7:]

In [12]:
from datetime import datetime
date_range = pd.date_range(end=datetime.today(),
                           start = datetime.today()-pd.Timedelta('7 days'),
                          normalize=True,freq='D')
date_range


DatetimeIndex(['2020-07-03', '2020-07-04', '2020-07-05', '2020-07-06',
               '2020-07-07', '2020-07-08', '2020-07-09', '2020-07-10'],
              dtype='datetime64[ns]', freq='D')

In [13]:
def plot_map_corona(df_states,color_column = 'Confirmed',
                   hover_data=['Confirmed','Deaths','Recovered']):
    
    ## Get maximum value for cases by state
    max_corona = df_states.groupby('state').max().reset_index()

    pfig = px.choropleth(max_corona,color=color_column,locations='state',
                  hover_data=hover_data, 
                  hover_name='state',
                  locationmode="USA-states", scope='usa',
                  title=f"Total {color_column} Cases by State",
                  color_continuous_scale=px.colors.sequential.Reds)
    pfig.update_layout(autosize=True)#,zoom=False)
    pfig.show(config={'scrollZoom': False})
    return pfig
pmap = plot_map_corona(df_states)

## 📕📕SLICING OUT LAST 7 DAYS 


In [14]:
def last_N_days(N=7):
    return datetime.today()-pd.Timedelta(f'{N} days')
last_N_days()

datetime.datetime(2020, 7, 3, 22, 40, 30, 21485)
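
`last_N_days` counts back from the moment the cell runs, so the window drifts away from the data if the notebook is rerun long after the download. One possible alternative, sketched here on toy data, is to anchor the window to the latest date present in the frame. The helper name `last_n_days_of` is hypothetical and not part of `functions.py`.

```python
import pandas as pd

def last_n_days_of(df, n=7, date_col="Date"):
    """Return the rows from the final n calendar days present in df[date_col]."""
    cutoff = df[date_col].max() - pd.Timedelta(days=n)
    return df.loc[df[date_col] > cutoff].copy()

# Ten consecutive days of made-up data
toy = pd.DataFrame({
    "Date": pd.date_range("2020-06-28", "2020-07-07", freq="D"),
    "Confirmed": range(10),
})
recent = last_n_days_of(toy, n=7)
print(recent["Date"].min(), recent["Date"].max())
```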

In [15]:
def iplot_map(df_states,color_column = 'Confirmed',
                   hover_data=['Confirmed','Deaths','Recovered'],n_days=3):
    
    
    ## Keep only the rows from the last n_days
    df_recent = df_states.loc[ df_states['Date'] >last_N_days(n_days)]
    
    ## Get maximum value for cases by state
    max_corona = df_recent.groupby('state').max().reset_index()

    pfig = px.choropleth(max_corona,color=color_column,locations='state',
                  hover_data=hover_data, 
                  hover_name='state',
                  locationmode="USA-states", scope='usa',
                  title=f"Total {color_column} Cases by State",
                  color_continuous_scale=px.colors.sequential.Reds)
    pfig.update_layout(autosize=True)#,zoom=False)
    pfig.show(config={'scrollZoom': False})
#     return pfig
# pmap = plot_map_corona(df_states)

In [16]:
## SLICING OUT LAST 7 DAYS 
df_sliced = df_states.loc[ df_states['Date'] >last_N_days(7)].copy()
# df_sliced = df_sliced.set_index('Province/State').drop(columns=['Country/Region'])

## GROUP SLICED DATA BY STATE
grouper = df_sliced.set_index('Date').groupby('state')

## GET EACH STATE TOTALS FOR PERIOD
STATES = {}
for group in grouper.groups:
    
    group_df = grouper.get_group(group).select_dtypes('number')

    ## Sum the daily changes; the Date index stays out of the sum
    STATES[group] = group_df.diff().sum()
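
Because the counts are cumulative, the total for a period is the sum of the day-to-day differences within each state. The loop above can be sketched without an explicit dictionary by using a grouped `diff`. The data below is made up, and `value_cols` is an illustrative name.

```python
import pandas as pd

# Toy cumulative counts for two states over three days
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-07-05", "2020-07-06", "2020-07-07"] * 2),
    "state": ["MD"] * 3 + ["NY"] * 3,
    "Confirmed": [10.0, 12.0, 15.0, 100.0, 100.0, 104.0],
    "Deaths": [1.0, 1.0, 2.0, 9.0, 10.0, 10.0],
})

value_cols = ["Confirmed", "Deaths"]
toy = toy.sort_values(["state", "Date"])

# Daily change within each state; the first day of each state is NaN
daily_new = toy.groupby("state")[value_cols].diff()

# Sum the daily changes per state (sum skips the NaN rows)
new_in_period = daily_new.groupby(toy["state"]).sum().reset_index()
print(new_in_period)
```

Note that the first day in the window contributes no difference, so a 7-day slice yields six daily increments.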

In [17]:
df_last_week = pd.DataFrame.from_dict(STATES,orient='index')
df_last_week = df_last_week.reset_index().rename({'index':'state'},axis=1)
# df_last_week.index.to_series().rename('state')

In [18]:
plot_map_corona(df_last_week)

### Mini Dash 

In [19]:
from jupyter_dash import JupyterDash
import dash_core_components as dcc
import dash_html_components as html

app = JupyterDash()
plot_map_corona(df_states)

# 07/09/20 - Updating get_methods, etc to work with plotly fig

In [None]:


def get_methods(obj,private=False):
    """
    Retrieves a list of all non-private methods (default) from inside of obj.
    - If private==False: only returns methods whose names do NOT start with a '_'
    
    Args:
        obj (object): Object to retrieve methods from.
        private (bool): Whether to retrieve private methods or public.

    Returns:
        list: the names of all of the retrieved methods.
    """
    method_list = [func for func in dir(obj) if callable(getattr(obj, func))]
    if private:
        filt_methods = [name for name in method_list if name.startswith('_')]
    else:
        filt_methods = [name for name in method_list if not name.startswith('_')]
    return filt_methods

def get_attributes(obj,private=False):
    """
    Retrieves a list of all non-private attributes (default) from inside of obj.
    - If private==False: only returns attributes whose names do NOT start with a '_'
    
    Args:
        obj (object): Object to retrieve attributes from.
        private (bool): Whether to retrieve private attributes or public.
    
    Returns:
        list: the names of all of the retrieved attributes.
    """
    attr_list = [name for name in dir(obj) if not callable(getattr(obj, name))]
    if private:
        filt_attrs = [name for name in attr_list if name.startswith('_')]
    else:
        filt_attrs = [name for name in attr_list if not name.startswith('_')]
    return filt_attrs

def get_methods_attributes_df(obj,include_private=False):
    """
    Retrieves all attributes and methods (with docstrings)
    and returns them in a DataFrame. By default only retrieves
    non-private methods/attributes, unless include_private==True.

    Args:
        obj (object): object to retrieve methods/attributes from
        include_private (bool): Whether to include private methods/attributes
    
    Returns:
        DataFrame: one row per method/attribute with its type and docstring.
    """
    import pandas as pd
    methods = get_methods(obj,private=False)
    method_types = ['Method' for item in methods]

    attrs = get_attributes(obj,private=False)
    att_types =['Attribute' for item in attrs]
    
    if include_private:
        private_methods = get_methods(obj,private=True)
        methods.extend(private_methods)
        method_types.extend(['Private Method' for item in private_methods])
        
        private_attrs = get_attributes(obj,private=True)
        attrs.extend(private_attrs)
        att_types.extend(['Private Attribute' for item in private_attrs])
    
    
    docs=[]
    for m in methods:
        att = getattr(obj,m)
        docs.append(att.__doc__)

    all_res = [*methods,*attrs]
    res_type = [*method_types,*att_types]#['Method' for item in methods]+['Attribute' for item in attrs]
    docstrings= docs + ['na' for i in attrs]

    df_obj = pd.DataFrame({'Object':all_res,'Type':res_type,'Doc':docstrings})
    return df_obj
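
For comparison, the standard library's `inspect.getmembers` returns `(name, value)` pairs in one call, which avoids calling `dir` and `getattr` separately. This is only a sketch of an alternative, not a replacement for the functions above; the `Demo` class and the `public_members` helper are hypothetical.

```python
import inspect

class Demo:
    """Toy class used only for this illustration."""
    scale = 2

    def double(self, x):
        """Return x multiplied by scale."""
        return x * self.scale

    def _hidden(self):
        return None

def public_members(obj):
    """Split the public names of obj into (methods, attributes)."""
    methods, attributes = [], []
    for name, value in inspect.getmembers(obj):
        if name.startswith("_"):
            continue  # skip private and dunder names
        (methods if callable(value) else attributes).append(name)
    return methods, attributes

methods, attributes = public_members(Demo())
print(methods, attributes)
```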


In [None]:
for obj in dir(pmap):
    print(obj)
    

## Geocoding

In [None]:
df = corona.df_us
df

In [None]:
# !pip install geopandas
# !pip install geopy

In [None]:
from geopy.geocoders import Nominatim
locator = Nominatim(user_agent="myGeocoder")
res = locator.geocode('Baltimore')
res.latitude,res.longitude

## LEFTOVERS

In [None]:
# covid.df_us[['positive','negative','death','recovered',
# 'hospitalizedCurrently', 'hospitalizedCumulative',
#  'inIcuCurrently', 'inIcuCumulative', 
#  'onVentilatorCurrently','onVentilatorCumulative', 
#  'states','pending','dateChecked', 'hash',]]

In [None]:
covid.columns['good']

In [None]:
covid.df_states

In [None]:
df_us = covid.df_us.copy()
# sorted(list(df_us.columns))
df_us.columns

In [None]:
# df_us['fips']

In [None]:
good_us_cols = ['dateChecked','death', 'hash', 'hospitalizedCumulative',
 'hospitalizedCurrently','inIcuCumulative', 'inIcuCurrently',
 'negative', 'onVentilatorCumulative', 'onVentilatorCurrently',
 'pending','positive','recovered','states']

dep_us_cols = ['hospitalized', 'lastModified', 'total', 
             'totalTestResults', 'posNeg', 'deathIncrease',
            'hospitalizedIncrease', 'negativeIncrease', 'positiveIncrease', 
            'totalTestResultsIncrease']#[col for col in df_us.columns if col not in good_us_cols]
# print(dep_cols)

In [None]:
df = covid.df_us[covid.columns_us['good']].copy()
df[good_us_cols]

In [None]:
covid

In [None]:
# covid.US

#  📕Covid Tracking Project Data

https://covidtracking.com/api

`/api/v1/states/{state}/screenshots.csv`

## 📦 class `CovidTrackingProject`

In [None]:
covid = fn.CovidTrackingProject(download=True,verbose=True)
covid

In [None]:
df_us = covid.get_df(which='us')
df_us

### def `iplot_cols`

In [None]:

def iplot_cols(df_us,cols='icu'):
    """Plot every column of df_us whose lowercased name contains `cols`."""
    pfig = df_us[[col for col in df_us.columns if cols in col.lower()]].iplot()#kind=kind)
    return pfig
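
The column selection inside `iplot_cols` can be tried on its own, without plotting. A minimal sketch, assuming only pandas: `DataFrame.filter` with a regex does the same substring match, and the `(?i)` flag makes it case-insensitive in both directions. The toy values are made up, and `select_cols` is a hypothetical helper name.

```python
import pandas as pd

# Toy frame using a few of the Covid Tracking Project column names
toy_us = pd.DataFrame({
    "hospitalizedCurrently": [5, 6],
    "inIcuCurrently": [1, 2],
    "inIcuCumulative": [1, 3],
    "onVentilatorCurrently": [0, 1],
})

def select_cols(df, pattern):
    """Return the columns whose names contain pattern, ignoring case."""
    return df.filter(regex=f"(?i){pattern}")

icu = select_cols(toy_us, "icu")
print(list(icu.columns))
```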

In [None]:
iplot_cols(df_us,'hospital')

In [None]:
iplot_cols(df_us,'icu')

In [None]:
iplot_cols(df_us,'vent')

In [None]:
covid.columns_us['good']

In [None]:
df_states = covid.get_df()
df_states

# APPENDIX

In [None]:
## Load in Fips Data
fips = pd.read_csv('Reference Data/ZIP-COUNTY-FIPS_2018-03.csv')
fips.groupby('STATE').get_group("NY")['STCOUNTYFP'].value_counts()

In [None]:
fips.loc[fips['STCOUNTYFP']==36]
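
`STCOUNTYFP` appears to hold 5-digit county FIPS codes (a 2-digit state code followed by a 3-digit county code), so comparing it directly with 36 would probably match nothing. A hedged sketch, assuming the column is stored as integers: integer division by 1000 recovers the state code, and 36 is the state FIPS code for New York. The rows below are toy data built from real county codes.

```python
import pandas as pd

# Toy rows; column names follow the reference CSV
toy_fips = pd.DataFrame({
    "STATE": ["NY", "NY", "MD"],
    "STCOUNTYFP": [36001, 36061, 24510],  # Albany, New York County, Baltimore city
})

# The leading digits of a 5-digit county FIPS code are the state FIPS code
toy_fips["state_fips"] = toy_fips["STCOUNTYFP"] // 1000
ny_rows = toy_fips.loc[toy_fips["state_fips"] == 36]
print(ny_rows)
```

If the column is read as strings instead, slicing the first two characters of the zero-padded code would serve the same purpose.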

In [None]:

df = covid.STATES
df['fips']

In [None]:
# #     def __init__(self):
# tracking = CovidTrackingProject()
# states_daily = tracking.download_state_daily()
# us_daily=tracking.download_us_daily()
# state_meta = tracking.download_state_meta()
# display(states_daily.head(),us_daily.head(),state_meta.head())

In [None]:
covid = CovidTrackingProject(download=True)
state_meta = covid.data['states_metadata']
states_daily = covid.data['states']
state_list = state_meta['state'].unique()
states_daily

In [None]:
from pandas_profiling import ProfileReport

In [None]:
report  = ProfileReport(states_daily)


## NOTES: COLUMNS TO PLOT

- Basic Stats:
    - death: cumulative number of deaths
    - positive: cumulative number of positive cases
    - negative
    - recovered

- Hospitalization:
    - hospitalizedCumulative: total number hospitalized so far (including those since recovered or dead)
    - hospitalizedCurrently: number currently hospitalized
    - hospitalizedIncrease

- ICU:
    - inIcuCumulative: total number admitted to the ICU so far (including those since recovered or dead)
    - inIcuCurrently: number currently in the ICU

- Ventilator:
    - onVentilatorCumulative
    - onVentilatorCurrently
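
One way to make the groupings above usable from code is a plain dictionary keyed by category. This is a sketch only: `PLOT_GROUPS` and `available_cols` are hypothetical names, and the column lists are copied from the notes.

```python
# Column names copied from the notes above
PLOT_GROUPS = {
    "Basic Stats": ["death", "positive", "negative", "recovered"],
    "Hospitalization": ["hospitalizedCumulative", "hospitalizedCurrently",
                        "hospitalizedIncrease"],
    "ICU": ["inIcuCumulative", "inIcuCurrently"],
    "Ventilator": ["onVentilatorCumulative", "onVentilatorCurrently"],
}

def available_cols(group, columns):
    """Return the columns of one group that are present in `columns`."""
    return [col for col in PLOT_GROUPS[group] if col in columns]

print(available_cols("ICU", ["inIcuCurrently", "death"]))
```

Passing `df.columns` as the second argument would guard against columns that are missing from a given download.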


In [None]:

covid.columns

In [None]:
NY = states_daily.groupby('state').get_group('NY')[covid.columns['good']]
NY

## Folium

In [None]:
# import folium
# center = (res.latitude,res.longitude) #(resp['region']['center']['latitude'],resp['region']['center']['longitude'])

# popup = folium.Popup(f"Latitude={center[0]}, Longitude={center[1]}")
# marker = folium.Marker(center,popup)
# mymap = folium.Map(center)
# marker.add_to(mymap)
# mymap