*Notebook created by: Thabor Walbeek, March 2020 for learning purposes*

# Covid-19

![](covid.jpg)

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus.

Most people infected with the COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment.  Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness.

The best way to prevent and slow down transmission is to be well informed about the COVID-19 virus, the disease it causes and how it spreads. Protect yourself and others from infection by washing your hands or using an alcohol-based rub frequently and not touching your face.

The COVID-19 virus spreads primarily through droplets of saliva or discharge from the nose when an infected person coughs or sneezes, so it’s important that you also practice respiratory etiquette (for example, by coughing into a flexed elbow).

At this time, there are no specific vaccines or treatments for COVID-19. However, there are many ongoing clinical trials evaluating potential treatments. WHO will continue to provide updated information as soon as clinical findings become available.

Source: https://www.who.int/health-topics/coronavirus#tab=tab_1


## 1. Exploratory Data Analysis

This notebook walks beginners through several steps for exploring three COVID-19 data sets. The three data sets are available on Kaggle (www.kaggle.com):

- time_series_covid19_confirmed_global.csv
- time_series_covid19_deaths_global.csv
- time_series_covid19_recovered_global.csv

The data sets are a snapshot taken on 28 March 2020.

For the most current data, please see:

- https://www.worldometers.info/
- https://experience.arcgis.com/experience/685d0ace521648f8a5beeeee1b9125cd (WHO)

### 1.1 Load the data sets

In [1]:
import pandas as pd                # data loading and manipulation
import numpy as np                 # numerical operations
import seaborn as sns              # visualisation
import matplotlib.pyplot as plt    # visualisation

In [2]:
confirmed = pd.read_csv("time_series_covid19_confirmed_global.csv")
deaths = pd.read_csv("time_series_covid19_deaths_global.csv")
recovered = pd.read_csv("time_series_covid19_recovered_global.csv")

In the first cell we **import** the packages whose pre-defined functions we will use. In the second cell we load the three data sets, so that the data is available to explore.
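Before exploring, it can help to confirm that the loaded files share the same column layout, since the later steps assume they do. The following is a minimal sketch, not part of the original notebook: the `same_layout` helper is hypothetical, and the two tiny in-memory CSV excerpts are made-up stand-ins for the real files so that the example runs on its own.

```python
import io

import pandas as pd

# Made-up two-row excerpts that only mimic the header style of the real files.
CONFIRMED_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Afghanistan,33.0,65.0,0,0
New South Wales,Australia,-33.8688,151.2093,0,0
"""
DEATHS_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Afghanistan,33.0,65.0,0,0
New South Wales,Australia,-33.8688,151.2093,0,0
"""


def same_layout(frames):
    """Return True if all DataFrames have identical column labels in the same order."""
    column_lists = [list(df.columns) for df in frames.values()]
    return all(columns == column_lists[0] for columns in column_lists)


frames = {
    "confirmed": pd.read_csv(io.StringIO(CONFIRMED_CSV)),
    "deaths": pd.read_csv(io.StringIO(DEATHS_CSV)),
}
layout_ok = same_layout(frames)
print(layout_ok)
```

With the real data, the same helper could be called on the `confirmed`, `deaths` and `recovered` frames loaded above.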

We will first have a look at what the data sets look like, to understand what information is available to us:

### 1.2 First glance at the data

In [3]:
confirmed.head(10)

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,3/18/20,3/19/20,3/20/20,3/21/20,3/22/20,3/23/20,3/24/20,3/25/20,3/26/20,3/27/20
0,,Afghanistan,33.0,65.0,0,0,0,0,0,0,...,22,22,24,24,40,40,74,84,94,110
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,59,64,70,76,89,104,123,146,174,186
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,74,87,90,139,201,230,264,302,367,409
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,39,53,75,88,113,133,164,188,224,267
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,0,0,1,2,2,3,3,3,4,4
5,,Antigua and Barbuda,17.0608,-61.7964,0,0,0,0,0,0,...,1,1,1,1,1,3,3,3,7,7
6,,Argentina,-38.4161,-63.6167,0,0,0,0,0,0,...,79,97,128,158,266,301,387,387,502,589
7,,Armenia,40.0691,45.0382,0,0,0,0,0,0,...,84,115,136,160,194,235,249,265,290,329
8,Australian Capital Territory,Australia,-35.4735,149.0124,0,0,0,0,0,0,...,3,4,6,9,19,32,39,39,53,62
9,New South Wales,Australia,-33.8688,151.2093,0,0,0,0,3,4,...,267,307,353,436,669,669,818,1029,1219,1405


This first look shows one row per country (or per province/state) and, after a few descriptive columns, one column per date. Let's describe the columns and the values they contain:

### 1.3 Data Dictionary

| Column Name | Description |
| -- | -- |
| Province/State | Some countries report data per province or state and therefore have one row for each of them |
| Country/Region | The name of the country or region |
| Lat | The latitude of the country/region or of the specific province/state |
| Long | The longitude of the country/region or of the specific province/state |
| 1/22/20 | The first available date with information on confirmed cases |
| ... | ... |
| 3/27/20 | The last available date with information on confirmed cases |

In total there are 70 columns, each as described above. Let's check the number of rows:

### 1.4 Shape of the data

In [5]:
confirmed.shape

(249, 70)
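The second number in the shape is the column count. As a rough cross-check, the sketch below separates the descriptive columns from the date columns and counts the days between the first and last date header. It assumes the file holds exactly the four descriptive columns from the data dictionary plus one column per day; the `toy` frame and the `ID_COLUMNS` name are illustrative, with values copied from the preview above.

```python
import pandas as pd

# Descriptive (non-date) columns, as listed in the data dictionary.
ID_COLUMNS = ["Province/State", "Country/Region", "Lat", "Long"]

# Toy frame with the same header style as the confirmed-cases file.
toy = pd.DataFrame({
    "Province/State": [None, "New South Wales"],
    "Country/Region": ["Afghanistan", "Australia"],
    "Lat": [33.0, -33.8688],
    "Long": [65.0, 151.2093],
    "3/26/20": [94, 1219],
    "3/27/20": [110, 1405],
})

# Everything that is not a descriptive column is treated as a date column.
date_columns = [c for c in toy.columns if c not in ID_COLUMNS]

# The headers use month/day/two-digit-year, so parse them with an explicit format.
parsed_dates = pd.to_datetime(date_columns, format="%m/%d/%y")

# Number of days from the first to the last date header, both inclusive.
expected_days = len(pd.date_range("2020-01-22", "2020-03-27"))
print(len(ID_COLUMNS) + expected_days)
```

Under that assumption, 4 descriptive columns plus 66 daily columns gives 70, which agrees with the shape reported above.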

There are 249 rows in total in the data set. Because some countries have multiple rows (one per province/state), we want to check the number of unique countries. For that we can do the following:

In [8]:
confirmed['Country/Region'].nunique()

176

So the 249 rows in the data set for confirmed cases cover 176 unique countries.
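To see which countries account for the extra rows, one option is to count the rows per country and, where needed, sum the province/state rows into one figure per country. The sketch below uses a three-row toy frame with values copied from the preview above; the variable names are illustrative, and the Australian total covers only the two rows shown here, not the whole country.

```python
import pandas as pd

# Toy excerpt: one single-row country and one country split into provinces/states.
toy = pd.DataFrame({
    "Province/State": [None, "Australian Capital Territory", "New South Wales"],
    "Country/Region": ["Afghanistan", "Australia", "Australia"],
    "3/27/20": [110, 62, 1405],
})

# Number of rows per country; more than one means data is split by province/state.
rows_per_country = toy["Country/Region"].value_counts()
multi_row = rows_per_country[rows_per_country > 1]

# Sum the province/state rows so that each country has a single value.
country_totals = toy.groupby("Country/Region")["3/27/20"].sum()

print(multi_row)
print(country_totals)
```

Applied to the full `confirmed` frame, the same `groupby` would yield one row per unique value of `Country/Region`.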