<h1>
    <center>Novel Corona Virus 2020 - Exploratory Data Analysis (EDA)</center>
</h1>
<br/>
<center><h3>PROJECT DONE BY</h3></center>
<center><h5>Sunil Kumar Mano</h5></center>
<center><h5>AI and Machine Learning practitioner</h5></center>
<center><h5>Email: sunilkumarm.182@gmail.com</h5></center>

## 1. Introduction

2019-nCoV is a contagious coronavirus that originated in Wuhan (Hubei province), China. This new strain has struck fear in many countries as cities are quarantined and hospitals become overcrowded.

"Coronaviruses (CoV) are a large family of viruses that cause illness ranging from the common cold to more severe diseases. Some coronaviruses transmit between animals, some between animals and people, and others from people to people." (https://www.canada.ca/en/public-health/services/diseases/coronavirus.html)


#### Purpose of this notebook:

The purpose of this notebook is to provide insights into data scraped from a dashboard created by Johns Hopkins University.

__Note__: The virus and information available are relatively new, which means the information available now might change in the future.

## 2. Import the packages

#### 2.1 Import the packages

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

These packages cover the data handling (NumPy, pandas) and plotting (seaborn, Matplotlib) needed for the analysis.

#### 2.2 Import the dataset

Next, let's load the dataset and look at the first few rows.

In [3]:
corona_data = pd.read_csv("2019_nCoV_data.csv")
corona_data.head()

|   | Sno | Date | Province/State | Country | Last Update | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 01/22/2020 12:00:00 | Anhui | China | 01/22/2020 12:00:00 | 1.0 | 0.0 | 0.0 |
| 1 | 2 | 01/22/2020 12:00:00 | Beijing | China | 01/22/2020 12:00:00 | 14.0 | 0.0 | 0.0 |
| 2 | 3 | 01/22/2020 12:00:00 | Chongqing | China | 01/22/2020 12:00:00 | 6.0 | 0.0 | 0.0 |
| 3 | 4 | 01/22/2020 12:00:00 | Fujian | China | 01/22/2020 12:00:00 | 1.0 | 0.0 | 0.0 |
| 4 | 5 | 01/22/2020 12:00:00 | Gansu | China | 01/22/2020 12:00:00 | 0.0 | 0.0 | 0.0 |


The data is now loaded. Let's start the analysis by examining each column.

## 3. Data Understanding

#### 3.1 Checking the columns in the dataset

Let's start by listing the columns in the dataset, then describe each one and its purpose.

In [5]:
corona_data.columns

Index(['Sno', 'Date', 'Province/State', 'Country', 'Last Update', 'Confirmed',
       'Deaths', 'Recovered'],
      dtype='object')

From the output above, our dataset has 8 columns. The description and purpose of each column is given below.

__Column Description:__

1. __Sno__ - Serial number
2. __Date__ - Date and time of the observation in MM/DD/YYYY HH:MM:SS
3. __Province/State__ - Province or state of the observation
4. __Country__ - Country of observation
5. __Last Update__ - Time in UTC at which the data is updated for the given province or country.
6. __Confirmed__ - Number of confirmed cases
7. __Deaths__ - Number of deaths
8. __Recovered__ - Number of recovered cases

#### 3.2 Checking the Null Values in the dataset

Let's check whether the data contains any null values, since missing values can distort the results of the analysis.

In [7]:
corona_data.isnull().sum()

Sno                 0
Date                0
Province/State    290
Country             0
Last Update         0
Confirmed           0
Deaths              0
Recovered           0
dtype: int64

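From the output above, our dataset has 290 null/empty values in the __Province/State__ column; every other column is complete.

The most likely explanation is that __Province/State__ is left empty for countries that are reported as a whole, without a provincial breakdown. This should be verified rather than assumed: the summary in section 3.6 shows 837 rows with a province but only 553 rows for Mainland China, which suggests that some rows outside Mainland China also carry a province.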
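
One way to check where the missing provinces come from is to count them per country. The sketch below runs on a small made-up frame (`sample`) that reuses the dataset's column names; the rows and values are invented for illustration and are not taken from the file.

```python
import pandas as pd

# Hypothetical rows that mimic the dataset's columns; values are invented.
sample = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None, None, "Washington"],
    "Country": ["Mainland China", "Mainland China", "Japan", "Thailand", "US"],
    "Confirmed": [444.0, 14.0, 2.0, 3.0, 1.0],
})

# Flag rows with a missing province, then count the flags per country.
missing_by_country = (
    sample["Province/State"].isnull()
    .groupby(sample["Country"])
    .sum()
    .sort_values(ascending=False)
)
print(missing_by_country)
```

Applying the same expression to `corona_data` should show which countries account for the 290 missing values.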

#### 3.3 Checking the Shape of the dataset

Let's check the shape of the dataset i.e., the number of observations/rows and features/columns present in the dataset.

In [9]:
corona_data.shape

(1127, 8)

Our dataset has 1127 observations/rows and 8 features/columns present.

#### 3.4 Checking the datatype present in the dataset

Let's check the datatype of the features present in the dataset.

In [10]:
corona_data.dtypes

Sno                 int64
Date               object
Province/State     object
Country            object
Last Update        object
Confirmed         float64
Deaths            float64
Recovered         float64
dtype: object

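Our dataset has __1 integer__ column, __4 object (string)__ columns and __3 float__ columns. Note that __Date__ and __Last Update__ are stored as strings rather than datetimes, so only __Province/State__ and __Country__ are truly categorical.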
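
Since the two timestamp columns are read as plain strings, converting them makes sorting and filtering by date reliable. The sketch below is a minimal example on a made-up frame (`sample`), assuming every value follows the MM/DD/YYYY HH:MM:SS layout given in the column description; if some values use a different layout, passing `errors="coerce"` to `pd.to_datetime` would turn them into `NaT` instead of raising.

```python
import pandas as pd

# Hypothetical rows using the timestamp layout from the column description.
sample = pd.DataFrame({
    "Date": ["01/22/2020 12:00:00", "01/23/2020 12:00:00"],
    "Last Update": ["01/22/2020 12:00:00", "01/23/2020 12:00:00"],
})

# An explicit format avoids day/month ambiguity during parsing.
timestamp_format = "%m/%d/%Y %H:%M:%S"
for column in ["Date", "Last Update"]:
    sample[column] = pd.to_datetime(sample[column], format=timestamp_format)

print(sample.dtypes)
```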

#### 3.5 Concise summary of the dataset

Let's check the concise summary of the dataset.

In [11]:
corona_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1127 entries, 0 to 1126
Data columns (total 8 columns):
Sno               1127 non-null int64
Date              1127 non-null object
Province/State    837 non-null object
Country           1127 non-null object
Last Update       1127 non-null object
Confirmed         1127 non-null float64
Deaths            1127 non-null float64
Recovered         1127 non-null float64
dtypes: float64(3), int64(1), object(4)
memory usage: 70.5+ KB


#### 3.6 Statistical summary of the dataset

Let's check the statistical summary of the dataset.

In [13]:
corona_data.describe(include = "all")

|   | Sno | Date | Province/State | Country | Last Update | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|---|
| count | 1127.0 | 1127 | 837 | 1127 | 1127 | 1127.0 | 1127.0 | 1127.0 |
| unique | | 19 | 59 | 33 | 293 | | | |
| top | | 02/08/2020 23:04:00 | Qinghai | Mainland China | 01/31/2020 19:00:00 | | | |
| freq | | 72 | 19 | 553 | 63 | | | |
| mean | 564.0 | | | | | 255.912156 | 5.443656 | 12.04614 |
| std | 325.481182 | | | | | 1796.841832 | 52.486877 | 83.674768 |
| min | 1.0 | | | | | 0.0 | 0.0 | 0.0 |
| 25% | 282.5 | | | | | 2.0 | 0.0 | 0.0 |
| 50% | 564.0 | | | | | 10.0 | 0.0 | 0.0 |
| 75% | 845.5 | | | | | 80.0 | 0.0 | 2.0 |


## 4. Exploratory Data Analysis

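#### 4.1 Total Number of Corona cases Worldwide

Let's sum the confirmed, death and recovered counts across every row of the dataset. Keep in mind that the dataset covers 19 distinct dates and that the source dashboard reports running totals. If each row is a cumulative count, these sums add the same cases once per reporting date and overstate the actual worldwide totals.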

In [16]:
print("Total number of confirmed cases Worldwide : ", corona_data.Confirmed.sum())
print("Total number of Death cases Worldwide : ", corona_data.Deaths.sum())
print("Total number of Recovered cases Worldwide : ", corona_data.Recovered.sum())

Total number of confirmed cases Worldwide :  288413.0
Total number of Death cases Worldwide :  6135.0
Total number of Recovered cases Worldwide :  13576.0
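
If the counts are cumulative per location, a figure closer to the actual worldwide total comes from keeping only the most recent row for each location before summing. The sketch below illustrates the difference on a made-up frame (`sample`); the values, the helper column `Location` and the placeholder `"(whole country)"` are assumptions introduced for this example.

```python
import pandas as pd

# Hypothetical cumulative counts for two locations over two dates; values are invented.
sample = pd.DataFrame({
    "Date": ["01/22/2020 12:00:00", "01/22/2020 12:00:00",
             "01/23/2020 12:00:00", "01/23/2020 12:00:00"],
    "Province/State": ["Hubei", None, "Hubei", None],
    "Country": ["Mainland China", "Japan", "Mainland China", "Japan"],
    "Confirmed": [444.0, 2.0, 549.0, 2.0],
})
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y %H:%M:%S")

# Summing every row counts each location once per date.
naive_total = sample["Confirmed"].sum()

# Give rows without a province a placeholder so they form their own location.
sample["Location"] = sample["Province/State"].fillna("(whole country)")

# Keep only the most recent row for each (country, location) pair.
latest = (
    sample.sort_values("Date")
    .drop_duplicates(subset=["Country", "Location"], keep="last")
)
latest_total = latest["Confirmed"].sum()

print(naive_total, latest_total)
```

This approach assumes a location keeps the same name across dates and keeps being reported; a location that is renamed or dropped would need separate handling.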


#### 4.2 Number of Corona cases Confirmed - Countrywise

Let's group the Corona cases __confirmed__ based on every country.

In [23]:
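# Sum Confirmed per country (this adds the rows of every reporting date).
corona_confirmed = corona_data.groupby("Country").Confirmed.sum().to_frame()
# DataFrame.sort() no longer exists in pandas; sort_values() replaces it (largest first here).
corona_confirmed.sort_values(by="Confirmed", ascending=False)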
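
Two details in the data affect a country-wise ranking. The first rows of the dataset use the label `China` while the statistical summary shows `Mainland China`, and the rows span several reporting dates. The sketch below is one possible way to handle both, on a made-up frame (`sample`) with invented values. It assumes that the two labels refer to the same entity and that counts are cumulative, so only each country's latest date is summed.

```python
import pandas as pd

# Hypothetical cumulative rows; values are invented for illustration.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-23",
                            "2020-01-23", "2020-01-23"]),
    "Province/State": ["Hubei", "Beijing", "Hubei", "Beijing", None],
    "Country": ["China", "China", "Mainland China", "Mainland China", "Japan"],
    "Confirmed": [444.0, 14.0, 549.0, 22.0, 2.0],
})

# Assumption: "China" and "Mainland China" refer to the same reporting entity.
sample["Country"] = sample["Country"].replace({"China": "Mainland China"})

# For each row, look up the latest date reported for its country.
latest_date = sample.groupby("Country")["Date"].transform("max")

# Keep only rows from that latest date, then sum the provinces per country.
country_totals = (
    sample[sample["Date"] == latest_date]
    .groupby("Country")["Confirmed"]
    .sum()
    .sort_values(ascending=False)
    .to_frame()
)
print(country_totals)
```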