## COVID-19 Analysis, Visualization and Forecasting

Coronaviruses are a family of viruses that cause illnesses ranging from the common cold and cough to more severe disease. Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV) are examples of such severe outbreaks that the world has already faced.

SARS-CoV-2 (the novel coronavirus, first referred to as 2019-nCoV) is a new member of the coronavirus family that was first identified in 2019 and had not previously been found in humans. This contagious virus first spread from Wuhan in December 2019, and the WHO later declared it a pandemic because of how rapidly it spread across the world. As of 29 Aug 2020, it had caused more than 800K deaths across the globe.

As the pandemic spreads across the world, it becomes increasingly important to understand how it spreads. This notebook analyzes the cumulative data of confirmed, death, and recovered cases over time, with the main focus on the worldwide spread trend of the virus and on forecasting it.


In [6]:
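### Downloading and Installing Prerequisites
# !pip install pycountry_convert
# !pip install folium
# !pip install calmap
# !pip install altair
# !pip install fbprophet==0.6   # Prophet 0.x is published on PyPI as "fbprophet"
# !pip install pmdarima
# !pip install beautifulsoup4 lxml   # used for the population web scraping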



In [5]:
# Importing the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker 
import pycountry_convert as pc
import folium
from datetime import datetime, timedelta,date
import plotly.express as px
import json, requests
import calmap

import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

## Sourcing and loading data

### 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE ([LINK](https://github.com/CSSEGISandData/COVID-19)) 
<hr>
The dataset consists of time-series data from 22 Jan 2020 to date (updated daily).<br>
**Three time-series datasets (deprecated):**
* time_series_19-covid-Confirmed.csv ([Link Raw File](https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv))
* time_series_19-covid-Deaths.csv ([Link Raw File](https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv))
* time_series_19-covid-Recovered.csv ([Link Raw File](https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv))

**New time-series datasets:**
* time_series_covid19_confirmed_global.csv ([Link Raw File](https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv))
* time_series_covid19_deaths_global.csv ([Link Raw File](https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv))

**New dataset (updated more frequently by the JHU web crawler):**
* cases_country.csv ([Link Raw File](https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv))

In [3]:
# Retrieving the datasets
df_confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
df_deaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')

# Deprecated
df_recovered = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv")
#df_covid19 = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv")
#df_table = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_time.csv",parse_dates=['Last_Update'])

In [4]:
df_confirmed.head()

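|   | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 8/19/20 | 8/20/20 | 8/21/20 | 8/22/20 | 8/23/20 | 8/24/20 | 8/25/20 | 8/26/20 | 8/27/20 | 8/28/20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 37599 | 37856 | 37894 | 37953 | 37999 | 38054 | 38070 | 38113 | 38129 | 38140 |
| 1 |  | Albania | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 7812 | 7967 | 8119 | 8275 | 8427 | 8605 | 8759 | 8927 | 9083 | 9195 |
| 2 |  | Algeria | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 39847 | 40258 | 40667 | 41068 | 41460 | 41858 | 42228 | 42619 | 43016 | 43403 |
| 3 |  | Andorra | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1024 | 1024 | 1045 | 1045 | 1045 | 1060 | 1060 | 1098 | 1098 | 1124 |
| 4 |  | Angola | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2015 | 2044 | 2068 | 2134 | 2171 | 2222 | 2283 | 2332 | 2415 | 2471 |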


## Preprocessing

In [3]:
# Standardise the location column names
location_cols = {"Province/State": "state", "Country/Region": "country"}
df_confirmed = df_confirmed.rename(columns=location_cols)
df_deaths = df_deaths.rename(columns=location_cols)
df_recovered = df_recovered.rename(columns=location_cols)

## Change the country names as required by the pycountry_convert library
country_name_map = {
    "US": "USA",
    "Korea, South": "South Korea",
    "Congo (Kinshasa)": "Democratic Republic of the Congo",
    "Cote d'Ivoire": "Côte d'Ivoire",
    "Reunion": "Réunion",
    "Congo (Brazzaville)": "Republic of the Congo",
    "Bahamas, The": "Bahamas",
    "Gambia, The": "Gambia",
}

# Apply the same renaming to all three tables (exact value matches only)
for df in (df_confirmed, df_deaths, df_recovered):
    df["country"] = df["country"].replace(country_name_map)




## Merging confirmed, deaths and recovered data

### Melting dataframes
1. Use `state`, `country`, `Lat`, `Long` (the renamed `Province/State` and `Country/Region` columns) as identifier variables.
2. Unpivot the date columns (`df_confirmed.columns[4:]`) into a variable column `Date` and a value column `Confirmed` (`Deaths` and `Recovered` for the other two tables).

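To make the two steps above concrete, here is a minimal sketch on a tiny, made-up wide table with the same layout as the JHU files (four identifier columns followed by one column per date). The countries, coordinates and counts are invented for illustration only.

```python
import pandas as pd

# Toy wide table: identifier columns followed by one column per date (made-up numbers)
wide = pd.DataFrame({
    "state": [None, "Ontario"],
    "country": ["Albania", "Canada"],
    "Lat": [41.15, 51.25],
    "Long": [20.17, -85.32],
    "1/22/20": [0, 0],
    "1/23/20": [2, 5],
})

# Everything after the first four identifier columns is a date column
date_cols = wide.columns[4:]

long_df = wide.melt(
    id_vars=["state", "country", "Lat", "Long"],
    value_vars=date_cols,
    var_name="Date",
    value_name="Confirmed",
)

# One row per (location, date): 2 locations x 2 dates = 4 rows
print(long_df.shape)  # (4, 6)
print(long_df)
```

Each date column becomes a block of rows, so the melted table has (number of locations) x (number of dates) rows, which is why the three melted tables above have tens of thousands of rows.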
In [4]:
dates = df_confirmed.columns[4:]
confirmed_df_melt = df_confirmed.melt(
    id_vars=['state', 'country', 'Lat', 'Long'],
    value_vars=dates,
    var_name='Date',
    value_name='Confirmed'
)


deaths_df_melt = df_deaths.melt(
    id_vars=['state', 'country', 'Lat', 'Long'],
    value_vars=dates,
    var_name='Date',
    value_name='Deaths'
)


recovered_df_melt = df_recovered.melt(
    id_vars=['state', 'country', 'Lat', 'Long'],
    value_vars=dates,
    var_name='Date',
    value_name='Recovered'
)

print(confirmed_df_melt.shape)
print(deaths_df_melt.shape)
print(recovered_df_melt.shape)


(52668, 6)
(52668, 6)
(50094, 6)


In [5]:
confirmed_df_melt

|   | state | country | Lat | Long | Date | Confirmed |
|---|---|---|---|---|---|---|
| 0 |  | Afghanistan | 33.939110 | 67.709953 | 1/22/20 | 0 |
| 1 |  | Albania | 41.153300 | 20.168300 | 1/22/20 | 0 |
| 2 |  | Algeria | 28.033900 | 1.659600 | 1/22/20 | 0 |
| 3 |  | Andorra | 42.506300 | 1.521800 | 1/22/20 | 0 |
| 4 |  | Angola | -11.202700 | 17.873900 | 1/22/20 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 52663 |  | Sao Tome and Principe | 0.186400 | 6.613100 | 8/6/20 | 878 |
| 52664 |  | Yemen | 15.552727 | 48.516388 | 8/6/20 | 1768 |
| 52665 |  | Comoros | -11.645500 | 43.333300 | 8/6/20 | 396 |
| 52666 |  | Tajikistan | 38.861000 | 71.276100 | 8/6/20 | 7665 |


In [6]:
recovered_df_melt

|   | state | country | Lat | Long | Date | Recovered |
|---|---|---|---|---|---|---|
| 0 |  | Afghanistan | 33.939110 | 67.709953 | 1/22/20 | 0 |
| 1 |  | Albania | 41.153300 | 20.168300 | 1/22/20 | 0 |
| 2 |  | Algeria | 28.033900 | 1.659600 | 1/22/20 | 0 |
| 3 |  | Andorra | 42.506300 | 1.521800 | 1/22/20 | 0 |
| 4 |  | Angola | -11.202700 | 17.873900 | 1/22/20 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 50089 |  | Sao Tome and Principe | 0.186400 | 6.613100 | 8/6/20 | 797 |
| 50090 |  | Yemen | 15.552727 | 48.516388 | 8/6/20 | 898 |
| 50091 |  | Comoros | -11.645500 | 43.333300 | 8/6/20 | 340 |
| 50092 |  | Tajikistan | 38.861000 | 71.276100 | 8/6/20 | 6443 |


In [7]:
## Remove Canada from the recovered data: Canada's recovered cases are reported for the whole country rather than per province/state, so they cannot be matched with the province-level confirmed and deaths rows
recovered_df_melt=recovered_df_melt[recovered_df_melt['country']!= 'Canada']

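Dropping Canada discards its recovered counts entirely. One possible alternative, which this notebook does not use, is to collapse Canada's province-level rows into a single country-level row per date so they line up with the country-level recovered series. A minimal sketch on made-up numbers; note that the later merge would then have to join on `country` and `Date` only, because a summed row has no single `Lat`/`Long`.

```python
import pandas as pd

# Made-up province-level confirmed counts for one date (illustration only)
confirmed = pd.DataFrame({
    "state": ["Ontario", "Quebec", None],
    "country": ["Canada", "Canada", "Albania"],
    "Date": ["4/1/20", "4/1/20", "4/1/20"],
    "Confirmed": [100, 200, 50],
})

# Sum all provinces of each country per date, giving one row per (country, Date)
country_level = (
    confirmed.groupby(["country", "Date"], as_index=False)["Confirmed"].sum()
)
print(country_level)
```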
In [8]:
## Merge the three dataframes one after another (left joins keep every confirmed row)
full_table = pd.merge(left=confirmed_df_melt, right=deaths_df_melt, how='left',
                      on=['state', 'country', 'Date', 'Lat', 'Long'])

full_table = pd.merge(left=full_table, right=recovered_df_melt, how='left',
                      on=['state', 'country', 'Date', 'Lat', 'Long'])
full_table.sample(4)

|   | state | country | Lat | Long | Date | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|---|
| 50157 |  | Liechtenstein | 47.14 | 9.55 | 7/28/20 | 87 | 1 | 81.0 |
| 26033 | Diamond Princess | Canada | 0.0 | 0.0 | 4/28/20 | 0 | 1 | NaN |
| 32823 |  | Fiji | -17.7134 | 178.065 | 5/24/20 | 18 | 0 | 15.0 |
| 36386 |  | Togo | 8.6195 | 0.8248 | 6/6/20 | 487 | 13 | 240.0 |

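Because the merges are left joins, every confirmed row is kept, and any (`state`, `country`, `Date`, `Lat`, `Long`) key with no partner in the recovered table gets `NaN` in `Recovered`; this is where the missing `Recovered` values counted in the next cell come from. `pd.merge` can label such rows directly through its `indicator` argument. A minimal sketch on toy data (values made up; the Canada row deliberately has no recovered counterpart):

```python
import pandas as pd

keys = ["state", "country", "Date", "Lat", "Long"]

# Toy confirmed table (made-up values)
confirmed = pd.DataFrame({
    "state": ["None", "Ontario"],
    "country": ["Fiji", "Canada"],
    "Date": ["5/24/20", "5/24/20"],
    "Lat": [-17.71, 51.25],
    "Long": [178.07, -85.32],
    "Confirmed": [18, 100],
})
# Toy recovered table containing only the Fiji key
recovered = confirmed.loc[[0], keys].assign(Recovered=15)

merged = pd.merge(confirmed, recovered, how="left", on=keys, indicator=True)

# '_merge' is 'both' where a recovered row matched and 'left_only' where it did not
print(merged[["country", "Recovered", "_merge"]])
```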

In [9]:
# Convert the Date strings (e.g. '1/22/20', month/day/two-digit year) to datetime
full_table['Date'] = pd.to_datetime(full_table['Date'], format='%m/%d/%y')
# checking for missing value
full_table.isnull().sum()

state        36630
country          0
Lat              0
Long             0
Date             0
Confirmed        0
Deaths           0
Recovered     3762
dtype: int64

In [10]:
#fill na with 0
full_table['Recovered']=full_table['Recovered'].fillna(0)
##Handling the missing values
full_table[['state']]=full_table[['state']].fillna('None')

In [11]:

#checking datatypes
full_table.dtypes

state                object
country              object
Lat                 float64
Long                float64
Date         datetime64[ns]
Confirmed             int64
Deaths                int64
Recovered           float64
dtype: object

In [12]:
#fixing dtypes
full_table['Recovered']=full_table['Recovered'].astype(int)


In [13]:
full_table.sample(6)

|   | state | country | Lat | Long | Date | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|---|
| 49442 |  | Dominica | 15.415 | -61.371 | 2020-07-25 | 18 | 0 | 18 |
| 7169 |  | Burundi | -3.3731 | 29.9189 | 2020-02-17 | 0 | 0 | 0 |
| 14328 |  | Zimbabwe | -19.015438 | 29.154857 | 2020-03-15 | 0 | 0 | 0 |
| 46197 |  | Papua New Guinea | -6.314993 | 143.95555 | 2020-07-13 | 11 | 0 | 8 |
| 23145 |  | Andorra | 42.5063 | 1.5218 | 2020-04-18 | 704 | 35 | 205 |
| 3342 |  | Lithuania | 55.1694 | 23.8813 | 2020-02-03 | 0 | 0 | 0 |


In [14]:
# Group by date and country; select columns with a list (tuple-style selection fails in pandas 2.x)
datewise = full_table.groupby(['Date', 'country'])[['Confirmed', 'Deaths', 'Recovered']].sum().reset_index()

In [15]:
# Calculate the mortality rate, recovery rate, and active and closed cases
datewise["Mortality Rate"]=(datewise["Deaths"]/datewise["Confirmed"])*100
datewise["Recovery Rate"]=(datewise["Recovered"]/datewise["Confirmed"])*100
datewise["Active Cases"]=datewise["Confirmed"]-datewise["Recovered"]-datewise["Deaths"]
datewise["Closed Cases"]=datewise["Recovered"]+datewise["Deaths"]

print("Average Mortality Rate",datewise["Mortality Rate"].mean())
print("Median Mortality Rate",datewise["Mortality Rate"].median())
print("Average Recovery Rate",datewise["Recovery Rate"].mean())
print("Median Recovery Rate",datewise["Recovery Rate"].median())

datewise.sample(10)


Average Mortality Rate 3.2652770206775066
Median Mortality Rate 1.8991331491796961
Average Recovery Rate 42.711529603023564
Median Recovery Rate 40.12704541387


|   | Date | country | Confirmed | Deaths | Recovered | Mortality Rate | Recovery Rate | Active Cases | Closed Cases |
|---|---|---|---|---|---|---|---|---|---|
| 28199 | 2020-06-19 | Zimbabwe | 479 | 4 | 63 | 0.835073 | 13.152401 | 412 | 67 |
| 11275 | 2020-03-21 | West Bank and Gaza | 48 | 0 | 17 | 0.0 | 35.416667 | 31 | 17 |
| 13372 | 2020-04-02 | Brunei | 133 | 1 | 56 | 0.75188 | 42.105263 | 76 | 57 |
| 21531 | 2020-05-15 | Liechtenstein | 82 | 1 | 81 | 1.219512 | 98.780488 | 0 | 82 |
| 25782 | 2020-06-07 | Burkina Faso | 889 | 53 | 765 | 5.961755 | 86.051744 | 71 | 818 |
| 33797 | 2020-07-19 | Sao Tome and Principe | 746 | 14 | 451 | 1.876676 | 60.455764 | 281 | 465 |
| 24112 | 2020-05-29 | Djibouti | 2914 | 20 | 1241 | 0.686342 | 42.587509 | 1653 | 1261 |
| 13430 | 2020-04-02 | Ireland | 3849 | 98 | 5 | 2.546116 | 0.129904 | 3746 | 103 |
| 1487 | 2020-01-29 | Trinidad and Tobago | 0 | 0 | 0 | NaN | NaN | 0 | 0 |
| 23485 | 2020-05-25 | Turkey | 157814 | 4369 | 120015 | 2.768449 | 76.048386 | 33430 | 124384 |

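The rates above are computed per country and per day, so the printed averages are unweighted means over country-days (rows with 0 confirmed cases give `NaN` and are skipped by `mean`). A pooled rate, total deaths divided by total confirmed cases, weights large outbreaks more heavily and can differ considerably. A small sketch with made-up numbers contrasting the two:

```python
import numpy as np
import pandas as pd

# Made-up country-day rows (illustration only)
df = pd.DataFrame({
    "Confirmed": [1000, 10, 0],
    "Deaths": [10, 1, 0],
})

# Per-row mortality rate in percent; zero Confirmed is turned into NaN
# so the rate is NaN (not inf) for rows with no cases
df["Mortality Rate"] = df["Deaths"] / df["Confirmed"].replace(0, np.nan) * 100

unweighted = df["Mortality Rate"].mean()                   # mean of 1% and 10%, NaN skipped
pooled = df["Deaths"].sum() / df["Confirmed"].sum() * 100  # 11 / 1010 in percent

print(round(unweighted, 2), round(pooled, 2))  # 5.5 1.09
```

The small country with 10 cases pulls the unweighted mean up to 5.5%, while the pooled rate stays close to the large country's 1%.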

In [16]:
#filling missing value
colms=['Mortality Rate','Recovery Rate']
datewise[colms]=datewise[colms].fillna(0)

In [17]:
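# Aggregate the number of cases per date across all countries.
# Note: 'Mortality Rate' and 'Recovery Rate' here are sums of per-country percentages, not global rates.
datewise_agg = datewise.groupby(["Date"]).agg({"Confirmed": 'sum', "Recovered": 'sum', "Deaths": 'sum', "Mortality Rate": 'sum',
                                               "Recovery Rate": 'sum', "Active Cases": 'sum', "Closed Cases": 'sum'})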

In [18]:
datewise_agg

| Date | Confirmed | Recovered | Deaths | Mortality Rate | Recovery Rate | Active Cases | Closed Cases |
|---|---|---|---|---|---|---|---|
| 2020-01-22 | 555 | 28 | 17 | 3.102190 | 5.109489 | 510 | 45 |
| 2020-01-23 | 654 | 30 | 18 | 2.799378 | 4.665630 | 606 | 48 |
| 2020-01-24 | 941 | 36 | 26 | 2.826087 | 3.913043 | 879 | 62 |
| 2020-01-25 | 1434 | 39 | 42 | 2.987198 | 2.773826 | 1353 | 81 |
| 2020-01-26 | 2118 | 52 | 56 | 2.698795 | 52.361446 | 2010 | 108 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2020-08-02 | 18079723 | 10584823 | 690065 | 583.348796 | 12271.460212 | 6804835 | 11274888 |
| 2020-08-03 | 18282208 | 10807151 | 694396 | 582.975753 | 12415.934763 | 6780661 | 11501547 |
| 2020-08-04 | 18540789 | 11028008 | 701347 | 582.449568 | 12509.246657 | 6811434 | 11729355 |
| 2020-08-05 | 18811953 | 11249203 | 708424 | 579.680783 | 12586.733134 | 6854326 | 11957627 |

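Because `datewise_agg` sums the per-country percentages, its `Mortality Rate` and `Recovery Rate` columns (for example 583.35 on 2020-08-02) are not global rates. A sketch of recomputing them from the summed counts instead; the two rows reuse the first two dates' counts from the table above, and the new column names are illustrative.

```python
import pandas as pd

# Daily global totals, copied from the first two rows of datewise_agg above
agg = pd.DataFrame(
    {"Confirmed": [555, 654], "Recovered": [28, 30], "Deaths": [17, 18]},
    index=pd.to_datetime(["2020-01-22", "2020-01-23"]),
)

# Global rates from the summed counts, not from summed per-country percentages
agg["Global Mortality Rate"] = agg["Deaths"] / agg["Confirmed"] * 100
agg["Global Recovery Rate"] = agg["Recovered"] / agg["Confirmed"] * 100

print(agg.round(2))
```

For 2020-01-22 this gives about 3.06% mortality (17 / 555), close to but not the same as the 3.10 obtained by summing per-country rates; the gap grows to absurd values once many countries report cases.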

In [19]:
## Average daily increase in cases
print("Average increase in number of Confirmed Cases every day: ",np.round(datewise_agg["Confirmed"].diff().fillna(0).mean()))
print("Average increase in number of Recovered Cases every day: ",np.round(datewise_agg["Recovered"].diff().fillna(0).mean()))
print("Average increase in number of Deaths Cases every day: ",np.round(datewise_agg["Deaths"].diff().fillna(0).mean()))

Average increase in number of Confirmed Cases every day:  96447.0
Average increase in number of Recovered Cases every day:  57768.0
Average increase in number of Deaths Cases every day:  3611.0


In [20]:
# New cases: day-over-day difference of the cumulative counts per country
temp = datewise.groupby(['country', 'Date'])[['Confirmed', 'Deaths', 'Recovered']]
temp = temp.sum().diff().reset_index()

mask = temp['country'] != temp['country'].shift(1)

temp.loc[mask, 'Confirmed'] = np.nan
temp.loc[mask, 'Deaths'] = np.nan
temp.loc[mask, 'Recovered'] = np.nan

# renaming columns
temp.columns = ['country', 'Date', 'New cases', 'New deaths', 'New recovered']

# merging new values

df_covid19 = pd.merge(datewise, temp, on=['country', 'Date'])

# filling na with 0

df_covid19 = df_covid19.fillna(0)

# fixing data types

cols = ['New cases', 'New deaths', 'New recovered']
df_covid19[cols] = df_covid19[cols].astype('int')
# Clip negative daily new cases to 0 (cumulative counts are sometimes revised downwards)
df_covid19['New cases'] = df_covid19['New cases'].apply(lambda x: 0 if x<0 else x)

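The shift-and-mask step above computes day-over-day differences on the whole table and then blanks the first row of each country, so one country's last value is never subtracted from the next country's first. As an alternative (not the code used above), `groupby(...).diff()` computes the differences within each country directly and leaves the first day of each country as `NaN`. A sketch on toy cumulative counts:

```python
import pandas as pd

# Toy cumulative confirmed counts for two countries (made-up values)
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03",
                            "2020-03-01", "2020-03-02"]),
    "Confirmed": [1, 4, 3, 10, 15],
})
df = df.sort_values(["country", "Date"])

# Differences are computed within each country; the first day of each is NaN
new_cases = df.groupby("country")["Confirmed"].diff()

# Same post-processing as above: NaN -> 0, then clip negative revisions to 0
df["New cases"] = new_cases.fillna(0).astype(int).clip(lower=0)
print(df)
```

Country A's drop from 4 to 3 would give -1 and is clipped to 0, matching the clipping applied to `New cases` above.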
In [21]:
df_covid19.sample(4)

|   | Date | country | Confirmed | Deaths | Recovered | Mortality Rate | Recovery Rate | Active Cases | Closed Cases | New cases | New deaths | New recovered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28850 | 2020-06-23 | Japan | 17879 | 965 | 16096 | 5.397394 | 90.027406 | 818 | 17061 | 59 | 10 | 139 |
| 7667 | 2020-03-02 | Senegal | 1 | 0 | 0 | 0.0 | 0.0 | 1 | 0 | 1 | 0 | 0 |
| 5322 | 2020-02-19 | Ethiopia | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 |
| 16647 | 2020-04-19 | Madagascar | 121 | 0 | 39 | 0.0 | 32.231405 | 82 | 39 | 1 | 0 | 4 |


In [22]:
# Next, scrape the population data from Worldometer
# and save it to a CSV file
#pop_data_c =pop_data.to_csv(r'/Users/ajesh_mahto/Desktop/springboard/global_pop_data.csv',index=False)
import requests
from bs4 import BeautifulSoup
url="https://www.worldometers.info/world-population/population-by-country/"
r= requests.get(url)
soup=BeautifulSoup(r.content)
countries=soup.find_all("table")[0]
pop_data=pd.read_html(str(countries))[0]

# Keep only the needed columns, in a fixed order
pop_cols = ['Country (or dependency)', 'Population (2020)', 'Yearly Change', 'Net Change', 'Density (P/Km²)',
            'Land Area (Km²)', 'Migrants (net)', 'Fert. Rate', 'Med. Age', 'Urban Pop %', 'World Share']
pop_data = pop_data[pop_cols].copy()



In [23]:
# Save next to the notebook so that the relative read in the next cell finds the file
pop_data.to_csv('global_pop_data.csv', index=False)

In [24]:
# Load the world population file
world_population=pd.read_csv('global_pop_data.csv')

#subsetting
world_population = world_population[['Country (or dependency)', 'Population (2020)', 'Density (P/Km²)', 'Land Area (Km²)', 'Med. Age', 'Urban Pop %']]
world_population.columns = ['Country (or dependency)', 'Population (2020)', 'Density', 'Land Area', 'Med Age', 'Urban Pop']

# Rename 'United States' to 'USA' to match the COVID-19 data
world_population.loc[world_population['Country (or dependency)']=='United States', 'Country (or dependency)'] = 'USA'

# Remove the % character from Urban Pop values
world_population['Urban Pop'] = world_population['Urban Pop'].str.rstrip('%')

## Replace "N.A." in Urban Pop and Med Age with their respective modes, then convert to int
world_population.loc[world_population['Urban Pop']=='N.A.', 'Urban Pop'] = int(world_population.loc[world_population['Urban Pop']!='N.A.', 'Urban Pop'].mode()[0])
world_population['Urban Pop'] = world_population['Urban Pop'].astype('int16')
world_population.loc[world_population['Med Age']=='N.A.', 'Med Age'] = int(world_population.loc[world_population['Med Age']!='N.A.', 'Med Age'].mode()[0])
world_population['Med Age'] = world_population['Med Age'].astype('int16')

# Left-join the population data onto the COVID-19 data by country name
final_data=pd.merge(
    left=df_covid19,
    right=world_population,
    left_on='country',
    right_on='Country (or dependency)',
    how='left'
)

# Drop the redundant 'Country (or dependency)' column
final_dataset=final_data.drop('Country (or dependency)',axis=1)


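The `Urban Pop` cleanup above strips the `%` sign and replaces `'N.A.'` with the column mode. An alternative sketch, not the code used above, relies on `pd.to_numeric(..., errors='coerce')` so that any non-numeric placeholder becomes `NaN` before filling. The sample strings are made up but mimic the scraped "value %" format.

```python
import pandas as pd

# Made-up values in the scraped style: percentages with a '%' sign and 'N.A.' gaps
urban = pd.Series(["83 %", "N.A.", "57 %", "83 %"])

# Strip the percent sign and whitespace, then coerce anything non-numeric to NaN
numeric = pd.to_numeric(
    urban.str.replace("%", "", regex=False).str.strip(),
    errors="coerce",
)

# Fill gaps with the most frequent value and store as a small integer type
cleaned = numeric.fillna(numeric.mode()[0]).astype("int16")
print(cleaned.tolist())  # [83, 83, 57, 83]
```

Coercing first means other placeholders (for example an empty string) are handled the same way as `'N.A.'` instead of making the integer conversion fail.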
In [25]:
final_dataset.sample(5)

|   | Date | country | Confirmed | Deaths | Recovered | Mortality Rate | Recovery Rate | Active Cases | Closed Cases | New cases | New deaths | New recovered | Population (2020) | Density | Land Area | Med Age | Urban Pop |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32965 | 2020-07-15 | Germany | 200890 | 9080 | 186000 | 4.519887 | 92.587983 | 5810 | 195080 | 434 | 2 | 900 | 83783942.0 | 240.0 | 348560.0 | 46.0 | 76.0 |
| 8763 | 2020-03-08 | Montenegro | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 628066.0 | 47.0 | 13450.0 | 39.0 | 68.0 |
| 28778 | 2020-06-23 | Barbados | 97 | 7 | 85 | 7.216495 | 87.628866 | 5 | 92 | 0 | 0 | 0 | 287375.0 | 668.0 | 430.0 | 40.0 | 31.0 |
| 26029 | 2020-06-08 | Jamaica | 599 | 10 | 405 | 1.669449 | 67.612688 | 184 | 415 | 1 | 0 | 0 | 2961167.0 | 273.0 | 10830.0 | 31.0 | 55.0 |
| 32504 | 2020-07-12 | Thailand | 3217 | 58 | 3088 | 1.802922 | 95.990053 | 71 | 3146 | 1 | 0 | 0 | 69799978.0 | 137.0 | 510890.0 | 40.0 | 51.0 |


I will use the `df_covid19` dataset for exploratory data analysis and `final_dataset` for modelling, since the latter has more features.

In [26]:
#getting all countries
countries = np.asarray(df_confirmed["country"])
countries1 = np.asarray(df_covid19["country"])

# Map continent codes to continent names ('OC', Oceania, is labelled 'Australia' here)
continents = {
    'NA': 'North America',
    'SA': 'South America',
    'AS': 'Asia',
    'OC': 'Australia',
    'AF': 'Africa',
    'EU': 'Europe',
    'na': 'Others'
}

# Define a function that returns the continent code for a country name,
# or 'na' if pycountry_convert cannot resolve it
def country_to_continent_code(country):
    try:
        return pc.country_alpha2_to_continent_code(pc.country_name_to_country_alpha2(country))
    except Exception:
        return 'na'

# Collect continent information
df_confirmed.insert(2,"continent", [continents[country_to_continent_code(country)] for country in countries[:]])
df_deaths.insert(2,"continent",  [continents[country_to_continent_code(country)] for country in countries[:]])
#df_recovered.insert(2,"continent",  [continents[country_to_continent_code(country)] for country in countries[:]])   
df_covid19.insert(1,"continent",  [continents[country_to_continent_code(country)] for country in countries1[:]])


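The list comprehensions above call `country_to_continent_code` once per row, i.e. tens of thousands of times for `df_covid19`. Resolving each distinct country once and mapping the result back onto every row produces the same column with far fewer lookups. A minimal sketch; `lookup_code` is a hypothetical stand-in for `country_to_continent_code` (with a hard-coded table) so the example runs without pycountry_convert, and it records its calls to show how many lookups happen.

```python
import pandas as pd

continents = {"EU": "Europe", "AS": "Asia", "na": "Others"}

# Hypothetical stand-in for country_to_continent_code that records each call
calls = []
def lookup_code(country):
    calls.append(country)
    return {"Germany": "EU", "Japan": "AS"}.get(country, "na")

countries = pd.Series(["Germany", "Japan", "Germany", "Taiwan*", "Japan"])

# Resolve each distinct country once, then map the results back onto every row
continent_by_country = {c: continents[lookup_code(c)] for c in countries.unique()}
continent_col = countries.map(continent_by_country)

print(continent_col.tolist())  # ['Europe', 'Asia', 'Europe', 'Others', 'Asia']
print(len(calls))              # 3
```

With the real function, `continent_col` could be passed to `df_covid19.insert(1, "continent", ...)` in place of the per-row list.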
In [27]:

# checking data with continent
df_covid19.sample(10)

|   | Date | continent | country | Confirmed | Deaths | Recovered | Mortality Rate | Recovery Rate | Active Cases | Closed Cases | New cases | New deaths | New recovered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21252 | 2020-05-14 | Australia | Australia | 7019 | 98 | 6334 | 1.39621 | 90.240775 | 587 | 6432 | 30 | 0 | 37 |
| 35013 | 2020-07-26 | Africa | Democratic Republic of the Congo | 8831 | 204 | 5510 | 2.310044 | 62.39384 | 3117 | 5714 | 30 | 0 | 205 |
| 11232 | 2020-03-21 | Africa | Rwanda | 17 | 0 | 0 | 0.0 | 0.0 | 17 | 0 | 0 | 0 | 0 |
| 5805 | 2020-02-21 | Others | Taiwan* | 26 | 1 | 2 | 3.846154 | 7.692308 | 23 | 3 | 2 | 0 | 0 |
| 31309 | 2020-07-06 | Europe | Luxembourg | 4542 | 110 | 4016 | 2.421841 | 88.419199 | 416 | 4126 | 20 | 0 | 0 |
| 33592 | 2020-07-18 | Asia | Pakistan | 263496 | 5568 | 204276 | 2.113125 | 77.525276 | 53652 | 209844 | 1579 | 46 | 5767 |
| 32109 | 2020-07-10 | Africa | Seychelles | 100 | 0 | 11 | 0.0 | 11.0 | 89 | 11 | 6 | 0 | 0 |
| 5654 | 2020-02-21 | North America | Barbados | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 |
| 18596 | 2020-04-29 | Africa | Tunisia | 980 | 40 | 294 | 4.081633 | 30.0 | 646 | 334 | 5 | 0 | 15 |
| 7736 | 2020-03-03 | Africa | Burundi | 0 | 0 | 0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 |


In [28]:
df_covid19.to_csv(r'/Users/ajesh_mahto/Desktop/springboard/capstone_1/df_covid19.csv',index=False)
final_dataset.to_csv(r'/Users/ajesh_mahto/Desktop/springboard/capstone_1/final_dataset.csv',index=False)
df_confirmed.to_csv(r'/Users/ajesh_mahto/Desktop/springboard/capstone_1/df_confirmed.csv',index=False)
df_deaths.to_csv(r'/Users/ajesh_mahto/Desktop/springboard/capstone_1/df_deaths.csv',index=False)