# Data analysis on a Covid sample dataset

### This data analysis uses Pandas

* A small dataset related to the Covid-19 pandemic is analyzed in a simple way.
* Questions are posed in the project and then answered with Python.

    * Q. 1) Show the number of Confirmed, Deaths and Recovered cases in each Region.
    * Q. 2) Remove all the records where Confirmed cases are less than 10.
    * Q. 3) In which Region was the maximum number of Confirmed cases recorded?
    * Q. 4) In which Region was the minimum number of Deaths recorded?
    * Q. 5) How many Confirmed, Deaths and Recovered cases were reported from India up to 29 April 2020?
    * Q. 6-A) Sort the entire data by number of Confirmed cases in ascending order.
    * Q. 6-B) Sort the entire data by number of Recovered cases in descending order.

In [95]:
import pandas as pd

In [96]:
data = pd.read_csv(r'C:\Savithri\HyperIsland\DA23_Projects\Project6_Python\youtube_DSL_Python_Projects\4. covid_19_data.csv')

In [97]:
data.columns

Index(['Date', 'State', 'Region', 'Confirmed', 'Deaths', 'Recovered'], dtype='object')

In [102]:
data.head()

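|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 0 | 4/29/2020 | NaN | Afghanistan | 1939 | 60 | 252 |
| 1 | 4/29/2020 | NaN | Albania | 766 | 30 | 455 |
| 2 | 4/29/2020 | NaN | Algeria | 3848 | 444 | 1702 |
| 3 | 4/29/2020 | NaN | Andorra | 743 | 42 | 423 |
| 4 | 4/29/2020 | NaN | Angola | 27 | 2 | 7 |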


In [104]:
data.isnull().any()

Date         False
State         True
Region       False
Confirmed    False
Deaths       False
Recovered    False
dtype: bool
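
`isnull().any()` only reports whether a column has any missing value. A minimal sketch of counting them per column follows; the `sample` frame is a hypothetical three-row stand-in for the real CSV, not the author's data file.

```python
import pandas as pd

# Hypothetical sample: three rows, two of them without a State value.
sample = pd.DataFrame({
    "State": [None, None, "Hubei"],
    "Region": ["Afghanistan", "Albania", "Mainland China"],
    "Confirmed": [1939, 766, 68128],
})

# isnull() marks missing cells as True; sum() counts them per column.
missing_counts = sample.isnull().sum()
print(missing_counts)
```

Here only `State` has a non-zero count, which matches the `True` reported for that column above.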

In [105]:
data.shape

(321, 6)

### Q. 1) Show the number of Confirmed, Deaths and Recovered cases in each Region.

In [103]:
data.groupby('Region')[['Confirmed','Deaths','Recovered']].sum()


| Region | Confirmed | Deaths | Recovered |
|---|---|---|---|
| Afghanistan | 1939 | 60 | 252 |
| Albania | 766 | 30 | 455 |
| Algeria | 3848 | 444 | 1702 |
| Andorra | 743 | 42 | 423 |
| Angola | 27 | 2 | 7 |
| ... | ... | ... | ... |
| West Bank and Gaza | 344 | 2 | 71 |
| Western Sahara | 6 | 0 | 5 |
| Yemen | 6 | 0 | 1 |
| Zambia | 97 | 3 | 54 |
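
The grouping step can be tried on a few rows without the CSV. This is a sketch on a hypothetical `sample` frame built from rows shown elsewhere in this notebook. Selecting several columns after `groupby` takes a list, hence the double brackets.

```python
import pandas as pd

# Hypothetical sample: three Mainland China provinces plus India.
sample = pd.DataFrame({
    "State": ["Hubei", "Xinjiang", "Yunnan", None],
    "Region": ["Mainland China", "Mainland China", "Mainland China", "India"],
    "Confirmed": [68128, 76, 185, 33062],
    "Deaths": [4512, 3, 2, 1079],
    "Recovered": [63616, 73, 181, 8437],
})

# Sum the three case columns within each Region.
totals = sample.groupby("Region")[["Confirmed", "Deaths", "Recovered"]].sum()
print(totals)
```

Rows that share a Region (the three provinces here) collapse into a single row of totals.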


### Q. 2) Remove all the records where Confirmed cases are less than 10.

In [100]:
#Solution1
data[~(data.Confirmed < 10)]

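|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 0 | 4/29/2020 | NaN | Afghanistan | 1939 | 60 | 252 |
| 1 | 4/29/2020 | NaN | Albania | 766 | 30 | 455 |
| 2 | 4/29/2020 | NaN | Algeria | 3848 | 444 | 1702 |
| 3 | 4/29/2020 | NaN | Andorra | 743 | 42 | 423 |
| 4 | 4/29/2020 | NaN | Angola | 27 | 2 | 7 |
| ... | ... | ... | ... | ... | ... | ... |
| 316 | 4/29/2020 | Wyoming | US | 545 | 7 | 0 |
| 317 | 4/29/2020 | Xinjiang | Mainland China | 76 | 3 | 73 |
| 318 | 4/29/2020 | Yukon | Canada | 11 | 0 | 0 |
| 319 | 4/29/2020 | Yunnan | Mainland China | 185 | 2 | 181 |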


In [13]:
#Get all the indexes based on given criteria
data[data['Confirmed'] < 10].index

Int64Index([ 18,  98, 105, 126, 140, 177, 178, 184, 192, 194, 203, 272, 284,
            285, 288, 289, 305],
           dtype='int64')

In [14]:
#Solution2
data.drop(data[data['Confirmed'] < 10].index, inplace = True)
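
Solution1 returns a filtered copy, while Solution2 modifies `data` in place. A minimal sketch of a non-destructive variant follows, assuming a hypothetical four-row `sample` frame with illustrative values. It keeps rows with `Confirmed >= 10` and renumbers the index.

```python
import pandas as pd

# Hypothetical sample: two rows below the threshold, two at or above it.
sample = pd.DataFrame({
    "Region": ["Angola", "Western Sahara", "Yemen", "Canada"],
    "Confirmed": [27, 6, 6, 11],
})

# Keep rows with at least 10 confirmed cases; the original frame is untouched.
kept = sample[sample["Confirmed"] >= 10].reset_index(drop=True)
print(kept)
print(len(sample))  # the source frame still has all of its rows
```

Assigning the result to a new name avoids the side effect of `inplace = True`, which also changes what every later cell sees.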

### Q. 3) In which Region was the maximum number of Confirmed cases recorded?


In [85]:
#Solution1
data.groupby('Region')['Confirmed'].sum().sort_values(ascending = False).head(10)

Region
US                1039909
Spain              236899
Italy              203591
France             166536
UK                 166432
Germany            161539
Turkey             117589
Russia              99399
Iran                93657
Mainland China      82861
Name: Confirmed, dtype: int64

In [79]:
data.groupby('Region')[['Confirmed']].sum()

| Region | Confirmed |
|---|---|
| Afghanistan | 1939 |
| Albania | 766 |
| Algeria | 3848 |
| Andorra | 743 |
| Angola | 27 |
| ... | ... |
| Venezuela | 331 |
| Vietnam | 270 |
| West Bank and Gaza | 344 |
| Zambia | 97 |
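
Sorting and reading the first row answers the question. As an alternative sketch, `idxmax()` returns the label of the largest value directly. The `region_totals` Series below is a hypothetical stand-in holding three of the per-region totals from the Solution1 output, not the full grouped result.

```python
import pandas as pd

# Hypothetical stand-in: three per-region totals taken from the Solution1 output.
region_totals = pd.Series(
    {"Spain": 236899, "US": 1039909, "Italy": 203591},
    name="Confirmed",
)
region_totals.index.name = "Region"

# idxmax() gives the index label of the maximum; max() gives the value itself.
top_region = region_totals.idxmax()
top_count = region_totals.max()
print(top_region, top_count)
```

On the real data the same two calls could be applied to `data.groupby('Region')['Confirmed'].sum()`.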


### Q. 4) In which Region was the minimum number of Deaths recorded?


In [47]:
data.Deaths.min()

0

In [91]:
#Solution1
data.groupby('Region').Deaths.sum().sort_values(ascending = True).head(10)

Region
Cambodia                    0
Seychelles                  0
Saint Lucia                 0
Central African Republic    0
Saint Kitts and Nevis       0
South Sudan                 0
Rwanda                      0
Grenada                     0
Macau                       0
Madagascar                  0
Name: Deaths, dtype: int64
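
Because several regions share the minimum of 0, `head(10)` shows only some of them. The sketch below lists every region tied at the minimum. It assumes a hypothetical `deaths_by_region` Series with four illustrative per-region totals.

```python
import pandas as pd

# Hypothetical stand-in: per-region death totals for four regions.
deaths_by_region = pd.Series(
    {"Albania": 30, "Cambodia": 0, "Seychelles": 0, "Zambia": 3},
    name="Deaths",
)

# Find the minimum, then keep every region whose total equals it.
min_deaths = deaths_by_region.min()
regions_at_min = deaths_by_region[deaths_by_region == min_deaths].index.tolist()
print(min_deaths, regions_at_min)
```

Comparing against the minimum keeps all ties, whereas `idxmin()` would return only the first one.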

### Q. 5) How many Confirmed, Deaths and Recovered cases were reported from India up to 29 April 2020?


In [50]:
data.head()

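|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 0 | 4/29/2020 | NaN | Afghanistan | 1939 | 60 | 252 |
| 1 | 4/29/2020 | NaN | Albania | 766 | 30 | 455 |
| 2 | 4/29/2020 | NaN | Algeria | 3848 | 444 | 1702 |
| 3 | 4/29/2020 | NaN | Andorra | 743 | 42 | 423 |
| 4 | 4/29/2020 | NaN | Angola | 27 | 2 | 7 |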


In [56]:
data.dtypes

Date         object
State        object
Region       object
Confirmed     int64
Deaths        int64
Recovered     int64
dtype: object

In [60]:
data['Date'] = pd.to_datetime(data['Date'])

In [62]:
data.head(2)

|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 0 | 2020-04-29 | NaN | Afghanistan | 1939 | 60 | 252 |
| 1 | 2020-04-29 | NaN | Albania | 766 | 30 | 455 |


In [92]:
data[(data.Region == 'India')]

|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 74 | 2020-04-29 | NaN | India | 33062 | 1079 | 8437 |
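
The filter above selects by Region only, which is enough here because every row carries the same date. A sketch that also applies the date cut-off and sums the three columns might look like this; the two-row `sample` frame and the `cutoff` name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: two rows copied from the outputs in this notebook.
sample = pd.DataFrame({
    "Date": ["4/29/2020", "4/29/2020"],
    "Region": ["Afghanistan", "India"],
    "Confirmed": [1939, 33062],
    "Deaths": [60, 1079],
    "Recovered": [252, 8437],
})
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")

# Keep India rows dated on or before the cut-off, then total each column.
cutoff = pd.Timestamp("2020-04-29")
mask = (sample["Region"] == "India") & (sample["Date"] <= cutoff)
india_totals = sample.loc[mask, ["Confirmed", "Deaths", "Recovered"]].sum()
print(india_totals)
```

Summing matters only if India spans several rows (for example, one per state); with a single row the totals equal that row.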


### Q. 6-A) Sort the entire data by number of Confirmed cases in ascending order.


In [64]:
data.sort_values(by=['Confirmed'], inplace=True)

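In [94]:
# Note: this cell ran after the Q. 6-B sort (In [93]), so the output below is ordered by Recovered (descending), not by Confirmed (ascending).
data.head()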

|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 153 | 2020-04-29 | NaN | Spain | 236899 | 24275 | 132929 |
| 61 | 2020-04-29 | NaN | Germany | 161539 | 6467 | 120400 |
| 76 | 2020-04-29 | NaN | Iran | 93657 | 5957 | 73791 |
| 80 | 2020-04-29 | NaN | Italy | 203591 | 27682 | 71252 |
| 229 | 2020-04-29 | Hubei | Mainland China | 68128 | 4512 | 63616 |


### Q. 6-B) Sort the entire data by number of Recovered cases in descending order.

In [68]:
data.sort_values(by=['Recovered'], inplace=True, ascending=False)

In [93]:
data.head()

|   | Date | State | Region | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|
| 153 | 2020-04-29 | NaN | Spain | 236899 | 24275 | 132929 |
| 61 | 2020-04-29 | NaN | Germany | 161539 | 6467 | 120400 |
| 76 | 2020-04-29 | NaN | Iran | 93657 | 5957 | 73791 |
| 80 | 2020-04-29 | NaN | Italy | 203591 | 27682 | 71252 |
| 229 | 2020-04-29 | Hubei | Mainland China | 68128 | 4512 | 63616 |
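
With `inplace=True`, each sort overwrites the order left by the previous one, so the result depends on which cell ran last. A minimal sketch that keeps both orderings side by side is shown below. It uses a hypothetical `sample` frame built from four of the rows above.

```python
import pandas as pd

# Hypothetical sample: four rows taken from the output above.
sample = pd.DataFrame({
    "Region": ["Spain", "Germany", "Iran", "Italy"],
    "Confirmed": [236899, 161539, 93657, 203591],
    "Recovered": [132929, 120400, 73791, 71252],
})

# sort_values returns a new frame by default, so both orderings can coexist.
by_confirmed = sample.sort_values(by="Confirmed")                   # ascending
by_recovered = sample.sort_values(by="Recovered", ascending=False)  # descending

print(by_confirmed["Region"].tolist())
print(by_recovered["Region"].tolist())
```

Each result has its own name, so displaying one with `head()` does not depend on whether the other sort has been run.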
