# COVID-19 Visualizations

The purpose of this project was to present an analysis of COVID-19 data in an interactive, visual way. We see statistics all the time, but turning the numbers into graphs matters because it shows us the bigger picture. The data is from the morning of April 13th, 2020.

# Importing Libraries and Dataset

In [1]:
#import all libraries
import numpy as np
import pandas as pd
import seaborn as sb
import plotly as py
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

In [2]:
#import and read the data
#data source: https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide
data = pd.read_excel("COVID-19-geographic-disbtribution-worldwide-2020-04-13.xlsx", sheet_name = 'COVID-19-geographic-disbtributi')

In [3]:
#view the first five rows to see the variables
data.head(5)

|   | dateRep | day | month | year | cases | deaths | countriesAndTerritories | geoId | countryterritoryCode | popData2018 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-04-13 | 13 | 4 | 2020 | 52 | 0 | Afghanistan | AF | AFG | 37172386.0 |
| 1 | 2020-04-12 | 12 | 4 | 2020 | 34 | 3 | Afghanistan | AF | AFG | 37172386.0 |
| 2 | 2020-04-11 | 11 | 4 | 2020 | 37 | 0 | Afghanistan | AF | AFG | 37172386.0 |
| 3 | 2020-04-10 | 10 | 4 | 2020 | 61 | 1 | Afghanistan | AF | AFG | 37172386.0 |
| 4 | 2020-04-09 | 9 | 4 | 2020 | 56 | 3 | Afghanistan | AF | AFG | 37172386.0 |


In [4]:
#see the column types and non-null counts
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10537 entries, 0 to 10536
Data columns (total 10 columns):
dateRep                    10537 non-null datetime64[ns]
day                        10537 non-null int64
month                      10537 non-null int64
year                       10537 non-null int64
cases                      10537 non-null int64
deaths                     10537 non-null int64
countriesAndTerritories    10537 non-null object
geoId                      10507 non-null object
countryterritoryCode       10328 non-null object
popData2018                10369 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(5), object(3)
memory usage: 823.3+ KB
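
The `info()` output shows that `geoId` (10507), `countryterritoryCode` (10328) and `popData2018` (10369) have fewer non-null values than the 10537 rows. One possible cause worth checking (an assumption, not verified against this file): pandas treats the string `"NA"` as missing by default, and `NA` is also Namibia's two-letter ISO code. The sketch below uses `read_csv` on a made-up in-memory string so it runs on its own; `read_excel` accepts the same `keep_default_na` and `na_values` parameters.

```python
import io
import pandas as pd

# Two made-up rows in the ECDC column layout; "NA" is Namibia's ISO alpha-2 code
csv_text = """countriesAndTerritories,geoId,countryterritoryCode
Afghanistan,AF,AFG
Namibia,NA,NAM
"""

# Default parsing: the string "NA" is treated as a missing value
default = pd.read_csv(io.StringIO(csv_text))
print(default.isna().sum())  # missing values per column

# Disabling the default NA strings keeps "NA" as a real code;
# only empty fields are then read as missing
strict = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
print(strict.isna().sum())
```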


In [5]:
#rename columns
data = data.rename(columns = {'countriesAndTerritories': 'country', 'dateRep': 'date'})
#replace the underscores in country names with spaces
data['country'] = data['country'].str.replace('_', ' ', regex=False)
data.head()

|   | date | day | month | year | cases | deaths | country | geoId | countryterritoryCode | popData2018 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-04-13 | 13 | 4 | 2020 | 52 | 0 | Afghanistan | AF | AFG | 37172386.0 |
| 1 | 2020-04-12 | 12 | 4 | 2020 | 34 | 3 | Afghanistan | AF | AFG | 37172386.0 |
| 2 | 2020-04-11 | 11 | 4 | 2020 | 37 | 0 | Afghanistan | AF | AFG | 37172386.0 |
| 3 | 2020-04-10 | 10 | 4 | 2020 | 61 | 1 | Afghanistan | AF | AFG | 37172386.0 |
| 4 | 2020-04-09 | 9 | 4 | 2020 | 56 | 3 | Afghanistan | AF | AFG | 37172386.0 |


# Heat Map

In [6]:
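#grouping the data for the heat map: total deaths and cases per country
#(select the value columns with a list; tuple-style selection fails in recent pandas)
data2 = data.groupby('country')[['deaths', 'cases']].sum().reset_index().sort_values('country', ascending=True)
data2.head(10)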

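|   | country | deaths | cases |
|---|---|---|---|
| 0 | Afghanistan | 18 | 607 |
| 1 | Albania | 23 | 446 |
| 2 | Algeria | 293 | 1914 |
| 3 | Andorra | 29 | 638 |
| 4 | Angola | 2 | 19 |
| 5 | Anguilla | 0 | 3 |
| 6 | Antigua and Barbuda | 2 | 21 |
| 7 | Argentina | 95 | 2208 |
| 8 | Armenia | 13 | 1013 |
| 9 | Aruba | 0 | 92 |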
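
Raw totals on a heat map tend to favor populous countries. Because the sheet already carries `popData2018`, one possible refinement is to normalize by population. This is a minimal sketch on made-up rows, assuming the renamed `country` column; the column names `population` and `cases_per_100k` are hypothetical.

```python
import pandas as pd

# Made-up daily rows in the renamed layout
sample = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "cases": [10, 30, 200, 300],
    "deaths": [1, 1, 5, 5],
    "popData2018": [100_000.0, 100_000.0, 10_000_000.0, 10_000_000.0],
})

# Sum the daily counts per country; population is constant per country, so keep the first value
per_country = sample.groupby("country").agg(
    cases=("cases", "sum"),
    deaths=("deaths", "sum"),
    population=("popData2018", "first"),
).reset_index()

# Cases per 100,000 people, so small and large countries become comparable
per_country["cases_per_100k"] = per_country["cases"] * 100_000 / per_country["population"]
print(per_country)
```

Here country A has fewer raw cases (40 vs. 500) but a higher rate (40 vs. 5 per 100,000), which is the kind of difference a raw-count map hides.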


In [7]:
#creating the world heat map (choropleth) of total confirmed cases per country
fig = go.Figure(data = go.Choropleth(
    locations = data2['country'],
    locationmode = 'country names',
    z = data2['cases'],
    colorscale = 'Reds',
    marker_line_color = 'black',
    marker_line_width = 0.5,
))

fig.update_layout(
    title_text = 'Confirmed COVID-19 Cases',
    title_x = 0.5,
    geo = dict(
        showframe = False,
        showcoastlines = False,
        projection_type = 'equirectangular'
    )
)

In [8]:
#creating the world heat map (choropleth) of total deaths per country
fig = go.Figure(data = go.Choropleth(
    locations = data2['country'],
    locationmode = 'country names',
    z = data2['deaths'],
    colorscale = 'Reds',
    marker_line_color = 'black',
    marker_line_width = 0.5,
))

fig.update_layout(
    title_text = 'Confirmed COVID-19 Deaths',
    title_x = 0.5,
    geo = dict(
        showframe = False,
        showcoastlines = False,
        projection_type = 'equirectangular'
    )
)
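
Matching on `locationmode = 'country names'` depends on the ECDC spelling agreeing with the names plotly recognizes; a mismatch would presumably leave that country blank. The sheet also has ISO-3 codes in `countryterritoryCode`, and `go.Choropleth` supports `locationmode='ISO-3'`. A minimal sketch of the aggregation step, on made-up rows (the second territory is hypothetical):

```python
import pandas as pd

# Made-up rows: one country on two dates, plus a hypothetical territory without an ISO-3 code
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Example territory"],
    "countryterritoryCode": ["AFG", "AFG", None],
    "cases": [52, 34, 5],
    "deaths": [0, 3, 0],
})

# Group on the ISO-3 code (keeping the name for hover text);
# groupby drops rows whose key is missing, so uncoded rows fall out here
by_code = (
    sample.groupby(["countryterritoryCode", "country"])[["cases", "deaths"]]
    .sum()
    .reset_index()
)
print(by_code)
# by_code could then feed go.Choropleth(locations=by_code["countryterritoryCode"],
#                                       locationmode="ISO-3", z=by_code["cases"], ...)
```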

# Bar Graph

In [9]:
#creating a stacked bar graph of daily new cases per country to show the spike
bar_data = data.groupby(['country', 'date'])[['deaths', 'cases']].sum().reset_index().sort_values('date', ascending=True)

fig = px.bar(bar_data, x="date", y="cases", color='country', text = 'cases', orientation='v', height=600,
             title='Daily New COVID-19 Cases by Country')
fig.show()

In [10]:
#creating a stacked bar graph of daily deaths per country to show the spike
bar_data = data.groupby(['country', 'date'])[['deaths', 'cases']].sum().reset_index().sort_values('date', ascending=True)

fig = px.bar(bar_data, x="date", y="deaths", color='country', text = 'deaths', orientation='v', height=600,
             title='Daily COVID-19 Deaths by Country')
fig.show()
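
The `cases` and `deaths` columns hold counts reported on each date (summing Afghanistan's rows gives the 607 in the grouped table), so the bars above show daily figures. If a running total per country is wanted instead, a minimal sketch on made-up rows (the column name `cumulative_cases` is hypothetical):

```python
import pandas as pd

# Made-up daily counts for two countries, deliberately out of date order
sample = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2020-04-02", "2020-04-01", "2020-04-03", "2020-04-01", "2020-04-02"]),
    "cases": [20, 10, 5, 7, 3],
})

# Sort chronologically first, because cumsum accumulates in row order
sample = sample.sort_values(["country", "date"])
# Running total within each country
sample["cumulative_cases"] = sample.groupby("country")["cases"].cumsum()
print(sample)
```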

# Time Series

In [11]:
#create a time series graph of worldwide daily new cases (all countries summed per date)
timeseriesdata = data.groupby(['date'])['cases'].sum().reset_index().sort_values('date', ascending=True)

fig = px.line(timeseriesdata, x = 'date', y='cases', title='COVID-19 Cases over Time')
fig.show()

In [12]:
#create a time series graph of worldwide daily deaths (all countries summed per date)
timeseriesdata = data.groupby(['date'])['deaths'].sum().reset_index().sort_values('date', ascending=True)

fig = px.line(timeseriesdata, x = 'date', y='deaths', title='COVID-19 Deaths over Time')
fig.show()
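
Daily reported counts jump around from one day to the next, which can hide the overall trend in the line charts. A common smoothing step is a trailing 7-day mean; the window length is a choice, not something taken from the original analysis. A minimal sketch on made-up totals (`cases_7day_avg` is a hypothetical column name):

```python
import pandas as pd

# Made-up worldwide daily totals over ten consecutive days
timeseries = pd.DataFrame({
    "date": pd.date_range("2020-04-01", periods=10, freq="D"),
    "cases": [70, 0, 140, 70, 70, 70, 70, 140, 0, 70],
})

# Trailing 7-day mean; the first six days have fewer than 7 values and stay NaN
timeseries["cases_7day_avg"] = timeseries["cases"].rolling(window=7).mean()
print(timeseries)
```

The smoothed column can be plotted with `px.line` in the same way as `cases`.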


# Pie Chart

In [13]:
#creating the dataset for the pie chart: total cases per country
pieData = data.groupby(['country'])['cases'].sum().reset_index()

#creating a pie chart to display the percent of cases per country
fig = px.pie(pieData, values = 'cases', names='country', height=600,
             title='Share of Confirmed COVID-19 Cases by Country')
fig.update_traces(textposition='inside', textinfo='percent+label')

#center the title
fig.update_layout(title_x = 0.5)
#showing the figure
fig.show()

In [14]:
#creating the dataset for the pie chart: total deaths per country
pieData2 = data.groupby(['country'])['deaths'].sum().reset_index()

#creating a pie chart to display the percent of deaths per country
fig = px.pie(pieData2, values = 'deaths', names='country', height=600,
             title='Share of COVID-19 Deaths by Country')
fig.update_traces(textposition='inside', textinfo='percent+label')

#center the title
fig.update_layout(title_x = 0.5)
#showing the figure
fig.show()
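
With one slice per country, most slices are too thin to read. One option is to keep the largest few countries and fold the rest into a single "Other" slice before calling `px.pie`. A minimal sketch on made-up totals, where `TOP_N` and `pie_compact` are hypothetical names:

```python
import pandas as pd

# Made-up per-country totals
pie_data = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "cases": [500, 300, 100, 60, 40],
})

TOP_N = 3  # hypothetical cut-off

# Keep the N largest countries and fold everything else into one "Other" row
top = pie_data.nlargest(TOP_N, "cases")
other_total = pie_data["cases"].sum() - top["cases"].sum()
pie_compact = pd.concat(
    [top, pd.DataFrame({"country": ["Other"], "cases": [other_total]})],
    ignore_index=True,
)
print(pie_compact)
```

The total is preserved (1000 here), so the percentages of the kept slices match the full chart.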

# Summary

What I learned through this analysis is that countries with densely populated areas have been hit the hardest and have seen a drastic rise in the number of cases and deaths. Countries like India haven't been hit as hard even though their cities are densely populated. That may be because of a lack of testing, or possibly because the country's warm climate slows the disease. So, in my opinion, the reported numbers aren't an accurate depiction of how widespread the virus really is. We need to take precautions so that this ends sooner rather than later.

**Stay indoors as much as you can! Wear masks and gloves when you go outside! Wash your hands frequently!**