# Covid-19 Data Analysis

### Tasks
1. Show the number of Confirmed, Deaths and Recovered cases in each Region.
2. Remove all records where the number of Confirmed cases is less than 10.
3. In which Region was the maximum number of Confirmed cases recorded?
4. In which Region was the minimum number of Deaths recorded?
5. How many Confirmed, Deaths and Recovered cases were reported from India up to 29 April 2020?
6. Sort the entire data by number of Confirmed cases in ascending order.
7. Sort the entire data by number of Recovered cases in descending order.


In [1]:
import pandas as pd
import os
from pathlib import Path

In [2]:
p = Path(os.getcwd())
p

WindowsPath('f:/admin/Documents/DataScience_Projects/Covid_DataAnalysis')

In [5]:
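# The notebook runs from the project folder, so the data folder is a direct child of p.
# Joining with pathlib avoids hand-written backslashes and invalid escape sequences.
covid_dataset = p / 'Covid_DataSet'
df_covid = pd.read_csv(covid_dataset / 'covid-dataset.csv')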
df_covid.head()

Unnamed: 0,Date,State,Region,Confirmed,Deaths,Recovered
0,4/29/2020,,Afghanistan,1939,60,252
1,4/29/2020,,Albania,766,30,455
2,4/29/2020,,Algeria,3848,444,1702
3,4/29/2020,,Andorra,743,42,423
4,4/29/2020,,Angola,27,2,7
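
The path handling in the load step can be sketched on its own. This is a minimal illustration, assuming the folder layout printed above; `PureWindowsPath` is used only so the example behaves the same on any operating system, and nothing is read from disk.

```python
from pathlib import PureWindowsPath

# Project folder as printed by the notebook (taken from the output above).
project_dir = PureWindowsPath('f:/admin/Documents/DataScience_Projects/Covid_DataAnalysis')

# The "/" operator joins path segments without manual separators or escapes.
csv_path = project_dir / 'Covid_DataSet' / 'covid-dataset.csv'

print(csv_path.as_posix())
```

In the notebook itself a concrete `Path` would be passed straight to `pd.read_csv`, which accepts path objects as well as strings.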


### 1. Show the number of Confirmed, Deaths and Recovered cases in each Region.

In [17]:
df_covid.count()

Date         321
State        140
Region       321
Confirmed    321
Deaths       321
Recovered    321
dtype: int64
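
`count()` reports non-null values per column, which is why `State` shows 140 while every other column shows 321. A small sketch of that behaviour, using a three-row sample with values copied from rows shown in this notebook (not the full dataset):

```python
import pandas as pd

# Two country-level rows have no State; one US row does.
sample = pd.DataFrame({
    'State': [None, None, 'Wyoming'],
    'Region': ['Afghanistan', 'Albania', 'US'],
    'Confirmed': [1939, 766, 545],
})

non_null = sample.count()          # non-null entries per column
missing = sample.isna().sum()      # missing entries per column

print(non_null)
print(missing)
```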

In [26]:
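# Sum only the numeric case columns; Date and State are not meaningful to add up.
df_covid.groupby('Region')[['Confirmed', 'Deaths', 'Recovered']].sum().head()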

Region,Confirmed,Deaths,Recovered
Afghanistan,1939,60,252
Albania,766,30,455
Algeria,3848,444,1702
Andorra,743,42,423
Angola,27,2,7
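
The first five regions each have a single row, so the output above does not show any actual aggregation. As a sketch, the sample below (rows copied from outputs in this notebook, not the full dataset) includes two `Mainland China` rows to make the per-region sum visible.

```python
import pandas as pd

# Sample rows taken from tables shown in this notebook.
sample = pd.DataFrame({
    'Date': ['4/29/2020'] * 4,
    'State': [None, 'Xinjiang', 'Yunnan', None],
    'Region': ['Afghanistan', 'Mainland China', 'Mainland China', 'India'],
    'Confirmed': [1939, 76, 185, 33062],
    'Deaths': [60, 3, 2, 1079],
    'Recovered': [252, 73, 181, 8437],
})

numeric_cols = ['Confirmed', 'Deaths', 'Recovered']

# Select the numeric columns before summing so text columns are left out.
region_totals = sample.groupby('Region')[numeric_cols].sum()

print(region_totals)
```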


### 2. Remove all records where the number of Confirmed cases is less than 10.

In [22]:
# Keep rows with 10 or more confirmed cases; only values below 10 are removed.
cases_confi_big_10 = df_covid[df_covid['Confirmed'] >= 10]
cases_confi_big_10


Unnamed: 0,Date,State,Region,Confirmed,Deaths,Recovered
0,4/29/2020,,Afghanistan,1939,60,252
1,4/29/2020,,Albania,766,30,455
2,4/29/2020,,Algeria,3848,444,1702
3,4/29/2020,,Andorra,743,42,423
4,4/29/2020,,Angola,27,2,7
...,...,...,...,...,...,...
316,4/29/2020,Wyoming,US,545,7,0
317,4/29/2020,Xinjiang,Mainland China,76,3,73
318,4/29/2020,Yukon,Canada,11,0,0
319,4/29/2020,Yunnan,Mainland China,185,2,181


In [25]:
df_covid2 = cases_confi_big_10.copy()
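
The task removes records with fewer than 10 confirmed cases, so a row with exactly 10 should stay. The sketch below checks that boundary on hypothetical values (the region labels `A` to `D` are made up for illustration) and shows that filtering with a mask and dropping by index give the same result.

```python
import pandas as pd

# Hypothetical values chosen to sit around the threshold of 10.
sample = pd.DataFrame({
    'Region': ['A', 'B', 'C', 'D'],
    'Confirmed': [9, 10, 11, 27],
})

# Option 1: keep the rows that satisfy the condition.
kept = sample[sample['Confirmed'] >= 10]

# Option 2: drop the rows that violate it.
dropped = sample.drop(sample[sample['Confirmed'] < 10].index)

print(kept)
```

Both options return a new DataFrame and leave `sample` unchanged.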

### 3. In which Region was the maximum number of Confirmed cases recorded?

In [35]:
df_covid.groupby('Region')['Confirmed'].sum().sort_values(ascending=False).head(10)

Region
US                1039909
Spain              236899
Italy              203591
France             166543
UK                 166441
Germany            161539
Turkey             117589
Russia              99399
Iran                93657
Mainland China      82862
Name: Confirmed, dtype: int64
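
Sorting gives the full ranking; if only the top region is needed, `idxmax()` returns its label directly. A minimal sketch using the first three totals from the output above:

```python
import pandas as pd

# Regional totals copied from the ranking shown above.
confirmed_by_region = pd.Series(
    {'US': 1039909, 'Spain': 236899, 'Italy': 203591},
    name='Confirmed',
)

top_region = confirmed_by_region.idxmax()   # label of the largest value
top_value = confirmed_by_region.max()       # the largest value itself

print(top_region, top_value)
```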

### 4. In which Region was the minimum number of Deaths recorded?

In [39]:
df_covid.groupby('Region').Deaths.sum().sort_values(ascending=True).head(30)

Region
Laos                                0
Mongolia                            0
Mozambique                          0
Cambodia                            0
Fiji                                0
Namibia                             0
Nepal                               0
Madagascar                          0
Macau                               0
Papua New Guinea                    0
Rwanda                              0
Saint Kitts and Nevis               0
Bhutan                              0
Dominica                            0
Central African Republic            0
Saint Lucia                         0
Holy See                            0
Sao Tome and Principe               0
Yemen                               0
Western Sahara                      0
Eritrea                             0
Vietnam                             0
Saint Vincent and the Grenadines    0
Timor-Leste                         0
Uganda                              0
Grenada                             0
South
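
Many regions tie at zero deaths, so there is no single minimum region, and `head(30)` may cut the tied group at an arbitrary point. One way to list every tied region, sketched here on a few totals taken from outputs in this notebook:

```python
import pandas as pd

# Deaths totals for four single-row regions shown in this notebook.
deaths_by_region = pd.Series(
    {'Laos': 0, 'Mongolia': 0, 'Angola': 2, 'Albania': 30},
    name='Deaths',
)

min_deaths = deaths_by_region.min()

# Keep every region whose total equals the minimum, not just the first one.
regions_at_min = deaths_by_region[deaths_by_region == min_deaths].index.tolist()

print(min_deaths, regions_at_min)
```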

### 5. How many Confirmed, Deaths and Recovered cases were reported from India up to 29 April 2020?

In [45]:
date = pd.to_datetime(df_covid['Date'], format='%m/%d/%Y')
date


0     2020-04-29
1     2020-04-29
2     2020-04-29
3     2020-04-29
4     2020-04-29
         ...    
316   2020-04-29
317   2020-04-29
318   2020-04-29
319   2020-04-29
320   2020-04-29
Name: Date, Length: 321, dtype: datetime64[ns]

In [47]:
df_covid3 = df_covid.copy()
df_covid3['Date'] = date
df_covid3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 321 entries, 0 to 320
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       321 non-null    datetime64[ns]
 1   State      140 non-null    object        
 2   Region     321 non-null    object        
 3   Confirmed  321 non-null    int64         
 4   Deaths     321 non-null    int64         
 5   Recovered  321 non-null    int64         
dtypes: datetime64[ns](1), int64(3), object(2)
memory usage: 15.2+ KB


In [49]:
# Compare against an explicit Timestamp; a day-first string such as '29-04-2020' is ambiguous to parse.
df_covid3[(df_covid3['Region'] == 'India') & (df_covid3['Date'] <= pd.Timestamp('2020-04-29'))]


Unnamed: 0,Date,State,Region,Confirmed,Deaths,Recovered
74,2020-04-29,,India,33062,1079,8437


In [50]:
df_covid3['Date'].value_counts()

2020-04-29    321
Name: Date, dtype: int64
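
Because every row carries the same date, the date filter does not change the result here, but on a multi-day dataset the India rows would need to be filtered and then aggregated. A hedged sketch of that pattern, using two rows copied from this notebook; the `cutoff` name is introduced only for illustration.

```python
import pandas as pd

# Two rows taken from tables shown in this notebook.
sample = pd.DataFrame({
    'Date': ['4/29/2020', '4/29/2020'],
    'Region': ['India', 'Iran'],
    'Confirmed': [33062, 93657],
    'Deaths': [1079, 5957],
    'Recovered': [8437, 73791],
})
sample['Date'] = pd.to_datetime(sample['Date'], format='%m/%d/%Y')

# An explicit Timestamp avoids any day-first versus month-first ambiguity.
cutoff = pd.Timestamp('2020-04-29')
mask = (sample['Region'] == 'India') & (sample['Date'] <= cutoff)

india_totals = sample.loc[mask, ['Confirmed', 'Deaths', 'Recovered']].sum()

print(india_totals)
```

Whether summing across dates is appropriate depends on whether the source figures are daily or cumulative, which this notebook does not state.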

### 6. Sort the entire data by number of Confirmed cases in ascending order.

In [55]:
df_covid.sort_values(by='Confirmed', ascending=True)

Unnamed: 0,Date,State,Region,Confirmed,Deaths,Recovered
285,4/29/2020,Recovered,US,0,0,120720
284,4/29/2020,Recovered,Canada,0,0,20327
203,4/29/2020,Diamond Princess cruise ship,Canada,0,1,0
305,4/29/2020,Tibet,Mainland China,1,0,1
289,4/29/2020,Saint Pierre and Miquelon,France,1,0,0
...,...,...,...,...,...,...
57,4/29/2020,,France,165093,24087,48228
168,4/29/2020,,UK,165221,26097,0
80,4/29/2020,,Italy,203591,27682,71252
153,4/29/2020,,Spain,236899,24275,132929
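
Several rows tie at zero confirmed cases, so their relative order depends on the original row order. A sketch of breaking ties with a second sort key, using the four lowest rows from the output above; choosing `Deaths` as the tie-breaker is an assumption made for illustration.

```python
import pandas as pd

# The four lowest rows from the output above, deliberately shuffled.
sample = pd.DataFrame({
    'State': ['Tibet', 'Diamond Princess cruise ship', 'Recovered', 'Recovered'],
    'Region': ['Mainland China', 'Canada', 'US', 'Canada'],
    'Confirmed': [1, 0, 0, 0],
    'Deaths': [0, 1, 0, 0],
    'Recovered': [1, 0, 120720, 20327],
})

# Sort by Confirmed first, then by Deaths; ignore_index renumbers the rows 0..n-1.
ordered = sample.sort_values(by=['Confirmed', 'Deaths'], ascending=True, ignore_index=True)

print(ordered)
```

`sort_values` returns a new DataFrame, so `sample` keeps its original order unless the result is assigned back.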


### 7. Sort the entire data by number of Recovered cases in descending order.

In [56]:
df_covid.sort_values(by='Recovered', ascending=False)

Unnamed: 0,Date,State,Region,Confirmed,Deaths,Recovered
153,4/29/2020,,Spain,236899,24275,132929
285,4/29/2020,Recovered,US,0,0,120720
61,4/29/2020,,Germany,161539,6467,120400
76,4/29/2020,,Iran,93657,5957,73791
80,4/29/2020,,Italy,203591,27682,71252
...,...,...,...,...,...,...
248,4/29/2020,Maryland,US,20849,1078,0
246,4/29/2020,Manitoba,Canada,275,6,0
243,4/29/2020,Louisiana,US,27660,1845,0
241,4/29/2020,Kentucky,US,4537,234,0
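
When only the first few rows of a descending sort are of interest, `nlargest` is a compact alternative to sorting the whole table. A minimal sketch on five rows copied from the output above:

```python
import pandas as pd

# Rows taken from the sorted output above.
sample = pd.DataFrame({
    'State': [None, 'Recovered', None, None, 'Kentucky'],
    'Region': ['Spain', 'US', 'Germany', 'Iran', 'US'],
    'Recovered': [132929, 120720, 120400, 73791, 0],
})

# The three rows with the highest Recovered values, largest first.
top3 = sample.nlargest(3, 'Recovered')

print(top3)
```

Note that the second row is the `Recovered` entry for the US, which the data reports as a separate record alongside the individual states.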
