# Playing with Coronavirus Timeseries

- https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset


## Notes

- This notebook uses two classes, both built on a `BaselineData` base class, to load data from the Kaggle dataset (Novel Corona Virus 2019) and from the Covid Tracking Project.

## To Do:

- [x] Add data from Covid Tracking Project's API
    - https://covidtracking.com/api
    
- [ ] Move app styling to a CSS file in a new `assets/` folder

- Note: functions and classes are in `functions.py`

### RESOURCES FOR FUTURE
- RAFAEL STUDY GROUP FOR MAKING A MAP
    - https://www.youtube.com/watch?v=MAhK7NHXEOg&feature=emb_logo
    - https://github.com/erdosn/additional-topic-plotly

In [26]:
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_dark"

import cufflinks as cf
cf.go_offline()
cf.set_config_file(sharing='public',theme='solar',offline=True)

In [27]:
import os,glob,sys
import re

!pip install -U fsds
from fsds.imports import *

Requirement already up-to-date: fsds in /opt/anaconda3/envs/learn-env/lib/python3.6/site-packages (0.2.16)




In [28]:
import functions as fn

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [29]:
help(fn)

Help on module functions:

NAME
    functions

CLASSES
    builtins.object
        BaselineData
            CoronaData
            CovidTrackingProject
    
    class BaselineData(builtins.object)
     |  #Make a base class
     |  
     |  Methods defined here:
     |  
     |  __repr__(self)
     |      Return repr(self).
     |  
     |  __str__(self)
     |      Return str(self).
     |  
     |  get_group_ts(self, group_name, group_col='state', ts_col=None, df=None, freq='D', agg_func='sum')
     |      Take df_us and extracts state's data as then Freq/Aggregation provided
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  __dict__
     |      dictionary for instance variables (if defined)
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)
     |  
     |  df
    
    class CoronaData(BaselineData)
     |  Dataset from the Novel Coronavirus Kaggle r

# Main Kaggle Dataset - Get US States

# 📦 class `CoronaData`

In [30]:
from functions import BaselineData
from functions import CoronaData
# fs.ihelp(CoronaData,0)

In [31]:
corona = CoronaData(verbose=True,run_workflow=True)

[i] DOWNLOADING DATA USING KAGGLE API
	https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
	- Downloaded dataset .zip and extracted to:"New Data/"
	- Extraction Complete.


Unnamed: 0,Date,Province/State,Country/Region,Confirmed,Deaths,Recovered
0,2020-01-22,Anhui,Mainland China,1.0,0.0,0.0
1,2020-01-22,Beijing,Mainland China,14.0,0.0,0.0
2,2020-01-22,Chongqing,Mainland China,6.0,0.0,0.0
3,2020-01-22,Fujian,Mainland China,1.0,0.0,0.0
4,2020-01-22,Gansu,Mainland China,0.0,0.0,0.0


[i] There are 223 countries in the datatset
[i] Dates Covered:
	From 01-22-2020 to 06-30-2020


In [32]:
df_world = corona.df.copy()
countries = list(df_world.groupby('Country/Region').groups.keys())
len(countries)

223

## 07/02 - Making these methods into standalones

In [33]:
def set_datetime_index(df_, col='Date', drop=True):
    """Returns a copy of df_ with the specified column as a datetime index."""
    import pandas as pd

    ## Copy to avoid edits to orig
    df = df_.copy()

    ## Convert to datetime
    df[col] = pd.to_datetime(df[col])

    ## Set as index; drop=True also removes the original column
    df.set_index(col, drop=drop, inplace=True)

    return df


def set_freq_resample(df, date_col='Date', freq='D', agg_func='sum'):
    """Resamples df to freq using agg_func, setting a datetime index first if needed."""
    import pandas as pd

    if not isinstance(df.index, pd.DatetimeIndex):
        df = set_datetime_index(df, col=date_col)

    ts = df.resample(freq).agg(agg_func).copy()
    return ts
    
    
    
def get_group_ts(df, group_name, group_col='state',
                 ts_col=None, freq='D', agg_func='sum'):
    """Extracts the rows of df where group_col == group_name and resamples
    them with the freq/agg_func provided. Returns None if the group is missing."""
    from IPython.display import display

    try:
        ## Get the group's rows
        group_df = df.groupby(group_col).get_group(group_name).copy()
    except KeyError:
        ## Unknown column or group name: show the data to help debugging
        display(df.head())
        return None

    ## Resample and aggregate the group's data
    group_df = set_freq_resample(group_df, freq=freq, agg_func=agg_func)

    ## Prefix every column with the group name, e.g. "Italy - Confirmed"
    group_df = group_df.add_prefix(f"{group_name} - ")

    if ts_col is not None:
        ts_cols_selected = [col for col in group_df.columns if ts_col in col]
        group_df = group_df[ts_cols_selected]

    return group_df
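
The group-then-resample idea behind `get_group_ts` can be tried on a toy table. This is only a sketch: `group_daily_sum` is a hypothetical compact stand-in (not the function in `functions.py`), the numbers are made up, and it keeps only numeric columns before summing because the handling of text columns in `resample(...).sum()` may differ between pandas versions.

```python
import pandas as pd

# Toy data standing in for the Kaggle table (values are made up for illustration)
toy = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "Country/Region": ["Italy", "US", "Italy", "US"],
    "Confirmed": [10.0, 5.0, 15.0, 9.0],
    "Deaths": [1.0, 0.0, 2.0, 1.0],
})


def group_daily_sum(df, group_name, group_col, date_col="Date", freq="D"):
    """Hypothetical compact equivalent of get_group_ts for numeric columns."""
    # Keep only the rows belonging to the requested group
    group_df = df[df[group_col] == group_name].copy()
    # Use the date column as a DatetimeIndex so the frame can be resampled
    group_df[date_col] = pd.to_datetime(group_df[date_col])
    numeric = group_df.set_index(date_col).select_dtypes("number")
    # Aggregate per period and prefix each column with the group name
    return numeric.resample(freq).sum().add_prefix(f"{group_name} - ")


italy_ts = group_daily_sum(toy, "Italy", "Country/Region")
print(italy_ts)
```

The prefixed column names are what later allow several groups to be concatenated side by side without name collisions.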

In [34]:
ts_world = set_datetime_index(df_world)
ts_world

Date,Province/State,Country/Region,Confirmed,Deaths,Recovered
2020-01-22,Anhui,Mainland China,1.0,0.0,0.0
2020-01-22,Beijing,Mainland China,14.0,0.0,0.0
2020-01-22,Chongqing,Mainland China,6.0,0.0,0.0
2020-01-22,Fujian,Mainland China,1.0,0.0,0.0
2020-01-22,Gansu,Mainland China,0.0,0.0,0.0
...,...,...,...,...,...
2020-06-30,Zacatecas,Mexico,908.0,96.0,626.0
2020-06-30,Zakarpattia Oblast,Ukraine,2889.0,91.0,943.0
2020-06-30,Zaporizhia Oblast,Ukraine,572.0,17.0,418.0
2020-06-30,Zhejiang,Mainland China,1269.0,1.0,1267.0


In [35]:
get_group_ts(df_world,'Italy','Country/Region')

Date,Italy - Confirmed,Italy - Deaths,Italy - Recovered
2020-01-31,2.0,0.0,0.0
2020-02-01,2.0,0.0,0.0
2020-02-02,2.0,0.0,0.0
2020-02-03,2.0,0.0,0.0
2020-02-04,2.0,0.0,0.0
...,...,...,...
2020-06-26,239961.0,34708.0,187615.0
2020-06-27,240136.0,34716.0,188584.0
2020-06-28,240310.0,34738.0,188891.0
2020-06-29,240436.0,34744.0,189196.0


In [36]:
# isinstance(ts_world.index,pd.DatetimeIndex)

# isinstance(df_world.index, pd.Timestamp)

# isinstance(df_world.index, pd.RangeIndex)

In [37]:
set_datetime_index(df_world)

Date,Province/State,Country/Region,Confirmed,Deaths,Recovered
2020-01-22,Anhui,Mainland China,1.0,0.0,0.0
2020-01-22,Beijing,Mainland China,14.0,0.0,0.0
2020-01-22,Chongqing,Mainland China,6.0,0.0,0.0
2020-01-22,Fujian,Mainland China,1.0,0.0,0.0
2020-01-22,Gansu,Mainland China,0.0,0.0,0.0
...,...,...,...,...,...
2020-06-30,Zacatecas,Mexico,908.0,96.0,626.0
2020-06-30,Zakarpattia Oblast,Ukraine,2889.0,91.0,943.0
2020-06-30,Zaporizhia Oblast,Ukraine,572.0,17.0,418.0
2020-06-30,Zhejiang,Mainland China,1269.0,1.0,1267.0


## Making World Version of Corona Dash

In [38]:
grouper = df_world.groupby('Country/Region')
countries = list(grouper.groups.keys())

WORLD = {}
for country in countries:
#     print(country)
    WORLD[country] = get_group_ts(df_world,country, "Country/Region")
    

In [39]:
def plot_group_ts(df, group_list, group_col, plot_cols=['Confirmed'],
                  df_only=False, new_only=False, plot_scatter=True,
                  show=False, width=1000, height=700):
    """Plots the plot_cols for every group in group_list.
    Returns a plotly figure, or the combined dataframe if df_only=True.
    New as of 06/21"""
    import pandas as pd
    import numpy as np
    ## Get group dataframes
    
    concat_dfs = []  
    GROUPS = {}
    
    ## Get each group
    for group in group_list:

        # Grab each group's df and save to GROUPS
        dfs = get_group_ts(df,group,group_col)
        GROUPS[group] = dfs

        ## for each plot_cols, find all columns that contain that col name
        for plot_col in plot_cols:
            concat_dfs.append(dfs[[col for col in dfs.columns if col.endswith(plot_col)]])#plot_col in col]])

    ## Concatenate final dfs
    plot_df = pd.concat(concat_dfs,axis=1)#[STATES[s] for s in plot_states],axis=1).iplot()
    
    
    ## Set title and df if new_only
    if new_only:
        plot_df = plot_df.diff()
        title = "Coronavirus Cases by State - New Cases"
    else:
        title = 'Coronavirus Cases by State - Cumulative'
    
    ## Reset Index
    plot_df.reset_index(inplace=True)
    
    
    ## Return Df or plot
    if not df_only:

        if np.any(['per capita' in x.lower() for x in plot_cols]):
            value_name = "# of Cases - Per Capita"
        else:
            value_name='# of Cases'
        pfig_df_melt = plot_df.melt(id_vars=['Date'],var_name='Group',
                                    value_name=value_name)
        
        if plot_scatter:
            plot_func = px.scatter
        else:
            plot_func = px.line
            
            
        # Plot concatenated dfs
        pfig = plot_func(pfig_df_melt,x='Date',y=value_name,color='Group',
                      title=title,template='plotly_dark',width=width,height=height)        
#         pfig.update_xaxes(rangeslider_visible=True)

#         pfig.update_layout(legend_orientation="h")

#         pfig.update_layout(
#             xaxis=dict(
#                 rangeselector=dict(
#                     buttons=list([
#                         dict(count=7,
#                              label="1week",
#                              step="day",
#                              stepmode="backward"),
#                         dict(count=14,
#                              label="2weeks",
#                              step="day",
#                              stepmode="backward"),
#                         dict(count=1,
#                              label="1m",
#                              step="month",
#                              stepmode="backward"),
#                         dict(count=6,
#                              label="6m",
#                              step="month",
#                              stepmode="backward"),

#                         dict(step="all")
#                     ])
#                 ),
#                 rangeslider=dict(
#                     visible=True
#                 ),
#                 type="date"
#             )
#         )
        
        if show:
            pfig.show()
                
        return pfig
    
    else:
        return plot_df#.reset_index()
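
Two reshaping steps inside `plot_group_ts` are easy to miss: `diff()` turns cumulative counts into new cases per day, and `melt()` converts the wide table (one column per group) into the long layout passed to `px.line` / `px.scatter`. A minimal sketch with made-up numbers, using pandas only:

```python
import pandas as pd

# Made-up cumulative counts in the wide layout that plot_group_ts builds
wide = pd.DataFrame(
    {"US - Confirmed": [1.0, 4.0, 9.0], "Italy - Confirmed": [2.0, 2.0, 5.0]},
    index=pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]),
)
wide.index.name = "Date"

# new_only=True path: day-over-day differences (the first row becomes NaN)
new_cases = wide.diff()

# Long layout: one row per (Date, Group) pair
long_df = new_cases.reset_index().melt(
    id_vars=["Date"], var_name="Group", value_name="# of Cases"
)
print(long_df)
```

In the long layout the `Group` column can be mapped to `color`, which is how each country gets its own trace.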

In [40]:
import plotly.express as px
# px.scatter()

In [41]:
pfig = plot_group_ts(df_world,group_list=['US','Italy','Canada',
                                  'Germany'],group_col='Country/Region',
                     new_only=True,plot_scatter=False,height=500)
pfig

In [42]:
# WORLD['US'].diff().plot()

In [43]:
# WORLD['Italy'].diff().plot()

In [44]:
# df = corona.df_us.copy()

# ## Report Total Cases
# total_cases = df.groupby('state').sum()[['Confirmed','Deaths']]
# total_cases.sort_values('Confirmed',0,0).head(20).style.bar(['Deaths','Confirmed'])

# 📕 Covid Tracking Project Data

API documentation: https://covidtracking.com/api

Example endpoint (per-state screenshots): `/api/v1/states/{state}/screenshots.csv`
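
As a rough sketch, the `{state}` placeholder in the endpoint template above could be filled in like this. `build_screenshots_url` is a hypothetical helper, and lower-casing the state code is an assumption about what the API expects; no request is made here.

```python
BASE_URL = "https://covidtracking.com"  # site root taken from the link above


def build_screenshots_url(state, base_url=BASE_URL):
    """Fill the {state} placeholder in the screenshots endpoint template.

    Hypothetical helper; lower-casing the state code is an assumption.
    """
    template = "/api/v1/states/{state}/screenshots.csv"
    return base_url + template.format(state=state.strip().lower())


url = build_screenshots_url("NY")
print(url)
```

If the endpoint is being served, the resulting URL could then be handed to `pd.read_csv(url)`.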

In [45]:
from fsds.imports import *
import datetime as dt
import requests
import json,urllib
pd.set_option('display.max_columns',0)

### Get US Daily


## 📦 class `CovidTrackingProject`

In [46]:
from functions import CovidTrackingProject

In [47]:
covid=CovidTrackingProject(download=True,verbose=True)
covid

[i] DOWNLOADING DATASETS FROM COVID TRACKING PROJECT
	https://covidtracking.com/data
	- File saved as: "New Data/states_metadata.csv"
ERROR
	- File saved as: "New Data/us.csv"
	- File saved as: "New Data/states.csv"
states


------------------------------------------------------------
[i] CovidTrackingProject Contents:
------------------------------------------------------------

METHODS:
	download_state_daily
	download_state_meta
	download_us_daily
	get_csv_save_load
	get_group_ts
	help

ATTRIBUTES
	base_folder
	base_url
	columns
	columns_us
	df
	df_states
	df_us
	urls

In [48]:
df_us = covid.df_us.copy()

In [49]:
df_us

date,positive,negative,death,recovered,hospitalizedCurrently,hospitalizedCumulative,inIcuCurrently,inIcuCumulative,onVentilatorCurrently,onVentilatorCumulative,states,pending,dateChecked,hash
2020-07-03,2786059,31427438,122158.0,790404.0,37750.0,247284.0,5589.0,10936.0,2049.0,1059.0,56,2237.0,2020-07-03T00:00:00Z,feabd1e5940e81315d4dd8c7f997d4dff1a777b4
2020-07-02,2728497,30763946,121523.0,781970.0,37473.0,245722.0,5624.0,10816.0,2105.0,1041.0,56,2208.0,2020-07-02T00:00:00Z,926d949599a345bb66d55718439690522560dfed
2020-07-01,2674813,30152546,120853.0,729994.0,36360.0,243846.0,5509.0,10752.0,2098.0,1027.0,56,2604.0,2020-07-01T00:00:00Z,6befb05965c58b3902f12beec20245037c802bba
2020-06-30,2621831,29584414,120152.0,720631.0,35231.0,242408.0,5421.0,10669.0,2044.0,1008.0,56,2432.0,2020-06-30T00:00:00Z,922a874533f71e362aaa52c7522136e00ef03c00
2020-06-29,2577473,28979934,119556.0,705203.0,33567.0,240826.0,5378.0,10542.0,2011.0,990.0,56,2194.0,2020-06-29T00:00:00Z,330ca1484f02baa6b75cbabb8beb1c3a27c92ba8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-01-26,2,0,,,,,,,,,1,,2020-01-26T00:00:00Z,e1cf59ab48e1cf367c4a6798a508a23d9d36bd18
2020-01-25,2,0,,,,,,,,,1,,2020-01-25T00:00:00Z,bef2a1d5f2a13491e0e0369bbd46c10cdd12973b
2020-01-24,2,0,,,,,,,,,1,,2020-01-24T00:00:00Z,bfffe76fc0b7cf11efe8aecd3cc7b22598d77d61
2020-01-23,2,0,,,,,,,,,1,,2020-01-23T00:00:00Z,cee36ebf3174bf1df0daa36e1e8088a157406fad


In [50]:
covid.columns_us['good']

['positive',
 'negative',
 'death',
 'recovered',
 'hospitalizedCurrently',
 'hospitalizedCumulative',
 'inIcuCurrently',
 'inIcuCumulative',
 'onVentilatorCurrently',
 'onVentilatorCumulative',
 'states',
 'pending',
 'dateChecked',
 'hash']

In [None]:
# covid.df_us[['positive','negative','death','recovered',
# 'hospitalizedCurrently', 'hospitalizedCumulative',
#  'inIcuCurrently', 'inIcuCumulative', 
#  'onVentilatorCurrently','onVentilatorCumulative', 
#  'states','pending','dateChecked', 'hash',]]

In [None]:
covid.columns['good']

In [None]:
covid.df_states

In [None]:
df_us = covid.df_us.copy()
# sorted(list(df_us.columns))
df_us.columns

In [None]:
# df_us['fips']

In [None]:
good_us_cols = ['dateChecked','death', 'hash', 'hospitalizedCumulative',
 'hospitalizedCurrently','inIcuCumulative', 'inIcuCurrently',
 'negative', 'onVentilatorCumulative', 'onVentilatorCurrently',
 'pending','positive','recovered','states']

dep_us_cols = ['hospitalized', 'lastModified', 'total', 
             'totalTestResults', 'posNeg', 'deathIncrease',
            'hospitalizedIncrease', 'negativeIncrease', 'positiveIncrease', 
            'totalTestResultsIncrease']#[col for col in df_us.columns if col not in good_us_cols]
# print(dep_cols)
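
The commented-out comprehension above hints that the deprecated list can be derived rather than typed by hand. A small sketch of that idea follows; `all_cols` is a made-up stand-in for `df_us.columns`, containing only two of the deprecated names for brevity.

```python
good_us_cols = ['dateChecked', 'death', 'hash', 'hospitalizedCumulative',
                'hospitalizedCurrently', 'inIcuCumulative', 'inIcuCurrently',
                'negative', 'onVentilatorCumulative', 'onVentilatorCurrently',
                'pending', 'positive', 'recovered', 'states']

# Stand-in for df_us.columns: the good columns plus two deprecated ones
all_cols = good_us_cols + ['hospitalized', 'posNeg']

# Everything not whitelisted counts as deprecated; original order is preserved
good_set = set(good_us_cols)
dep_cols = [col for col in all_cols if col not in good_set]
print(dep_cols)
```

Deriving the list this way keeps it in sync if the upstream file gains or loses columns.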

In [None]:
df = covid.df_us[covid.columns_us['good']].copy()
df[good_us_cols]

In [None]:
covid

In [None]:
# covid.US

# APPENDIX

In [None]:
## Load in Fips Data
fips = pd.read_csv('Reference Data/ZIP-COUNTY-FIPS_2018-03.csv')
fips.groupby('STATE').get_group("NY")['STCOUNTYFP'].value_counts()

In [None]:
fips.loc[fips['STCOUNTYFP']==36]

In [None]:

## `STATES` is not among the attributes listed in the class contents above; `df_states` is
df = covid.df_states
df['fips']

In [None]:
# #     def __init__(self):
# tracking = CovidTrackingProject()
# states_daily = tracking.download_state_daily()
# us_daily=tracking.download_us_daily()
# state_meta = tracking.download_state_meta()
# display(states_daily.head(),us_daily.head(),state_meta.head())

In [None]:
covid = CovidTrackingProject(download=True)
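## `data` is not among the attributes listed in the class contents above;
## use the listed download method and `df_states` attribute instead
state_meta = covid.download_state_meta()
states_daily = covid.df_states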
state_list = state_meta['state'].unique()
states_daily

In [None]:
from pandas_profiling import ProfileReport

In [None]:
report  = ProfileReport(states_daily)


## NOTES: COLUMNS TO PLOT

- Basic Stats:
    - death: cumulative total of people who have died
    - positive: total number of people who have tested positive so far
    - negative
    - recovered

- Hospitalization:
    - hospitalizedCumulative: total number of people hospitalized so far (including those since recovered or dead)
    - hospitalizedCurrently: number of people currently hospitalized
    - hospitalizedIncrease

- ICU:
    - inIcuCumulative: total number of people admitted to the ICU so far (including those since recovered or dead)
    - inIcuCurrently: number of people currently in the ICU

- Ventilator:
    - onVentilatorCumulative
    - onVentilatorCurrently
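
One possible way to carry the grouping above into code is a plain dictionary keyed by category. The names `PLOT_COLUMN_GROUPS` and `available_plot_cols` are hypothetical, and the empty `demo` frame only stands in for the real data.

```python
import pandas as pd

# Column groups copied from the notes above
PLOT_COLUMN_GROUPS = {
    "Basic Stats": ["death", "positive", "negative", "recovered"],
    "Hospitalization": ["hospitalizedCumulative", "hospitalizedCurrently",
                        "hospitalizedIncrease"],
    "ICU": ["inIcuCumulative", "inIcuCurrently"],
    "Ventilator": ["onVentilatorCumulative", "onVentilatorCurrently"],
}


def available_plot_cols(df, group):
    """Return the columns of `group` that actually exist in df."""
    return [col for col in PLOT_COLUMN_GROUPS[group] if col in df.columns]


# Empty frame with a few of the columns, just to exercise the lookup
demo = pd.DataFrame(columns=["positive", "death", "inIcuCurrently"])
basic_cols = available_plot_cols(demo, "Basic Stats")
icu_cols = available_plot_cols(demo, "ICU")
print(basic_cols, icu_cols)
```

Filtering against `df.columns` avoids a `KeyError` when a listed column, such as one of the deprecated ones, is missing from a particular download.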


In [None]:

covid.columns

In [None]:
NY = states_daily.groupby('state').get_group('NY')[covid.columns['good']]
NY

# 🗺 Adding Mapping

## Geocoding

In [None]:
df = corona.df_us
df

In [None]:
# !pip install geopandas
# !pip install geopy

In [None]:
from geopy.geocoders import Nominatim
locator = Nominatim(user_agent="myGeocoder")
res = locator.geocode('Baltimore')
res.latitude,res.longitude

## Folium

In [None]:
# import folium
# center = (res.latitude,res.longitude) #(resp['region']['center']['latitude'],resp['region']['center']['longitude'])

# popup = folium.Popup(f"Latitude={center[0]}, Longitude={center[1]}")
# marker = folium.Marker(center,popup)
# mymap = folium.Map(center)
# marker.add_to(mymap)
# mymap