## background

The code below builds an animated choropleth map showing the spread of COVID-19 around the world. The data runs through April 4, 2020 and is available on Kaggle: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset

The code follows this Medium article: https://towardsdatascience.com/visualizing-the-coronavirus-pandemic-with-choropleth-maps-7f30fccaecf5

## load libs

In [15]:
import pandas as pd
import plotly.express as px
from plotly.offline import init_notebook_mode

# load plotly.js from the CDN so figures render inline in the notebook
init_notebook_mode(connected=True)

# show all rows and columns when displaying a DataFrame
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

## read in and clean

In [16]:
# read in data
df = pd.read_csv("covid_19_data.csv")
df.head()

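|   | SNo | ObservationDate | Province/State | Country/Region | Last Update | Confirmed | Deaths | Recovered |
|---|-----|-----------------|----------------|----------------|-------------|-----------|--------|-----------|
| 0 | 1 | 1/22/20 | Anhui | Mainland China | 1/22/20 17:00 | 1 | 0 | 0 |
| 1 | 2 | 1/22/20 | Beijing | Mainland China | 1/22/20 17:00 | 14 | 0 | 0 |
| 2 | 3 | 1/22/20 | Chongqing | Mainland China | 1/22/20 17:00 | 6 | 0 | 0 |
| 3 | 4 | 1/22/20 | Fujian | Mainland China | 1/22/20 17:00 | 1 | 0 | 0 |
| 4 | 5 | 1/22/20 | Gansu | Mainland China | 1/22/20 17:00 | 0 | 0 | 0 |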


In [17]:
# look at data types of cols
df.dtypes

SNo                 int64
ObservationDate    object
Province/State     object
Country/Region     object
Last Update        object
Confirmed           int64
Deaths              int64
Recovered           int64
dtype: object

In [18]:
# convert the observation date from string to datetime
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])
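
The conversion above lets pandas infer the date format. A minimal sketch of the same step with an explicit format, assuming every row follows the `M/D/YY` style shown in the `df.head()` preview (if the full file mixes formats, this would raise an error instead of guessing). The `raw` series is toy data, not read from the CSV.

```python
import pandas as pd

# Toy values in the same M/D/YY style as the ObservationDate preview
raw = pd.Series(["1/22/20", "4/4/20"])

# An explicit format makes the parsing rule visible and fails loudly on rows that do not match
parsed = pd.to_datetime(raw, format="%m/%d/%y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```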

In [19]:
# rename columns to shorter names
df = df.rename(columns={'Country/Region': 'Country', 'ObservationDate': 'Date'})
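
The map later matches the `Country` column against country names, so labels that are not plain country names may fail to plot. A hedged sketch of one way to normalize them: the `NAME_FIXES` mapping is hypothetical, and which labels actually need fixing is an assumption that has not been checked against plotly here.

```python
import pandas as pd

# Hypothetical mapping: the labels that need fixing are an assumption, not verified against plotly
NAME_FIXES = {"Mainland China": "China"}

# Toy frame standing in for the renamed DataFrame
toy = pd.DataFrame({"Country": ["Mainland China", " Azerbaijan", "US"]})

# Strip stray whitespace, then replace exact matches from the mapping
toy["Country"] = toy["Country"].str.strip().replace(NAME_FIXES)

print(toy["Country"].tolist())
```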

In [20]:
# keep only rows with at least one confirmed case
df_country = df[df['Confirmed'] > 0]

# sum the case counts per date and country (text columns and SNo are left out),
# then sort by date from oldest to most recent
case_cols = ['Confirmed', 'Deaths', 'Recovered']
df_country = df_country.groupby(['Date', 'Country'])[case_cols].sum().reset_index().sort_values('Date', ascending=True)
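
To see what the filter and aggregation do, here is a small sketch on made-up rows. The `toy` frame and its values are illustrative only. It sums just the three case-count columns, which is a choice made for this sketch so that text columns never enter the sum.

```python
import pandas as pd

# Made-up rows: two provinces for one country on the same date, plus a zero-case row
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-22", "2020-01-23"]),
    "Country": ["Mainland China", "Mainland China", "US", "US"],
    "Province/State": ["Anhui", "Beijing", "Washington", "Washington"],
    "Confirmed": [1, 14, 0, 1],
    "Deaths": [0, 0, 0, 0],
    "Recovered": [0, 0, 0, 0],
})

toy_country = (
    toy[toy["Confirmed"] > 0]                              # drop rows with no confirmed cases
    .groupby(["Date", "Country"])[["Confirmed", "Deaths", "Recovered"]]
    .sum()                                                 # provinces collapse into one row per country and date
    .reset_index()
    .sort_values("Date")
)

print(toy_country)
```

The two provinces are combined into a single row per country and date, and the zero-case row is dropped before grouping.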

In [21]:
# change date back to string for choropleth map
df_country['Date'] = df_country['Date'].astype(str)
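
One property worth checking after turning dates into strings is that the frame labels still sort chronologically. A minimal sketch, using `dt.strftime` as an explicit alternative to `astype(str)` (a substitution made for this sketch, not the notebook's method): ISO-style `YYYY-MM-DD` labels sort in the same order as the underlying dates.

```python
import pandas as pd

# Toy dates, deliberately out of order
dates = pd.Series(pd.to_datetime(["2020-04-04", "2020-01-22", "2020-03-15"]))

# Format each date as a YYYY-MM-DD label
labels = dates.dt.strftime("%Y-%m-%d")

# Sorting the strings gives the same order as sorting the datetimes
print(sorted(labels))
```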

In [22]:
# create plot
fig = px.choropleth(df_country, 
                    locations="Country", 
                    locationmode = "country names",
                    color="Confirmed", 
                    hover_name="Country", 
                    animation_frame="Date",
                    color_continuous_scale="Reds"
                   )
fig.update_layout(
    title_text = 'COVID-19 Spread as of April 4th, 2020',
    title_x = 0.5,
    geo=dict(
        showframe = False,
        showcoastlines = False,
    ))
    
fig.show()

This plot shows the spread across the world beginning on January 22nd, 2020. China has the most confirmed cases until about late March, when the US overtakes it. The countries hit hardest by the virus are easy to spot by their darker red shading.
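
The observation about which country leads on each date can also be checked in a table rather than by eye. A sketch under stated assumptions: `leader_by_date` is a hypothetical helper, and the numbers in `toy` are made up for illustration, not taken from the dataset. The helper expects the same `Date`, `Country` and `Confirmed` columns as `df_country`.

```python
import pandas as pd

def leader_by_date(frame):
    """Return the row with the highest Confirmed count for each Date."""
    idx = frame.groupby("Date")["Confirmed"].idxmax()
    return frame.loc[idx, ["Date", "Country", "Confirmed"]].reset_index(drop=True)

# Made-up counts, for illustration only
toy = pd.DataFrame({
    "Date": ["2020-03-25", "2020-03-25", "2020-03-27", "2020-03-27"],
    "Country": ["Mainland China", "US", "Mainland China", "US"],
    "Confirmed": [100, 80, 101, 120],
})

leaders = leader_by_date(toy)
print(leaders)
```

Calling `leader_by_date(df_country)` on the real data would list the leading country per date, which makes the date of any changeover explicit.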