# Assignment - Data Wrangling using the Covid-19 Dataset (Updated)


The Coronavirus disease 2019 (COVID-19), formerly known as 2019-nCoV acute respiratory disease, is an infectious disease caused by SARS-CoV-2, a virus closely related to the SARS virus. The disease is the cause of the 2019–20 coronavirus outbreak. It is primarily spread between people via respiratory droplets from infected individuals when they cough or sneeze. Time from exposure to onset of symptoms is generally between 2 and 14 days. Spread can be limited by handwashing and other hygiene measures.*

In this assignment, we will use the Novel Corona Virus 2019 Dataset from Kaggle to perform some data wrangling exercises.

This dataset consists of three files:
* time_series_2019_ncov_confirmed.csv
* time_series_2019_ncov_recovered.csv
* time_series_2019_ncov_deaths.csv

Together, these files contain the number of confirmed cases, recoveries, and deaths that resulted from the COVID-19 disease, one measure per file. The affected areas are classified by province/state and by country/region.

* Source - https://en.wikipedia.org/wiki/Coronavirus_disease_2019

Take a minute to explore the various columns and rows in the dataset.

# Import the Packages

In [1]:
# ***ADD YOUR ANSWER HERE***
import pandas as pd
import numpy as np
import datetime

# Setting the Pandas Print Option

In [2]:
# Raise the display limits so pandas prints up to 80 rows and 40 columns before truncating
# ***ADD YOUR ANSWER HERE***
pd.options.display.max_rows = 80
pd.options.display.max_columns = 40
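
The two settings above are global and still truncate anything longer than 80 rows. As a minimal sketch (the toy frame `df_demo` is an assumption, not part of the assignment data), `pd.option_context` can lift the limit for a single block only and restore the previous value afterwards:

```python
import pandas as pd

# Toy frame with 100 rows, used only for illustration.
df_demo = pd.DataFrame({"value": range(100)})

original_limit = pd.get_option("display.max_rows")

# Temporarily remove the row limit; None means "no limit".
with pd.option_context("display.max_rows", None):
    full_text = repr(df_demo)

# Temporarily use a small limit to see the truncated form.
with pd.option_context("display.max_rows", 10):
    truncated_text = repr(df_demo)

# Outside the blocks the earlier setting is back in force.
restored_limit = pd.get_option("display.max_rows")
print(len(full_text.splitlines()), len(truncated_text.splitlines()))
```

This keeps the notebook-wide setting untouched while still allowing one full printout where it is needed.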

# Load the CSV Files

In [3]:
# ***ADD YOUR ANSWER HERE***
df_conf = pd.read_csv('time_series_2019_ncov_confirmed.csv')
df_death = pd.read_csv('time_series_2019_ncov_deaths.csv')
df_recovered = pd.read_csv('time_series_2019_ncov_recovered.csv')

# View the DataFrames

Observe the various rows and columns of the loaded dataframes.

In [56]:
# ***ADD YOUR ANSWER HERE***
#time_series_2019_ncov_confirmed
df_conf

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/21/20 22:00,1/22/20 12:00,1/23/20 12:00,1/24/20 0:00,1/24/20 12:00,1/25/20 0:00,1/25/20 12:00,1/25/20 22:00,1/26/20 11:00,1/26/20 23:00,1/27/20 9:00,1/27/20 19:00,1/27/20 20:30,1/28/20 13:00,1/28/20 18:00,1/28/20 23:00,1/29/20 13:30,1/29/20 14:30,1/29/20 21:00,1/30/20 11:00,1/31/20 14:00,02/01/20 10:00,02/02/20 21:00,02/03/20 21:00,02/04/20 9:40,02/04/20 22:00,02/05/20 9:00,02/05/20 23:00,02/06/20 9:00,02/06/20 14:20,02/07/20 20:13,02/07/20 22:50,02/08/20 22:04,02/08/20 23:04,02/09/20 10:30,02/09/20 23:20
0,Anhui,Mainland China,31.82571,117.2264,,1.0,9.0,15.0,15.0,39.0,39.0,60.0,60.0,70.0,70.0,70.0,106.0,106.0,106.0,152.0,152.0,152.0,200.0,200.0,237.0,297.0,408.0,480.0,480.0,530.0,530.0,591.0,591.0,591.0,665,733,733,779,779,830
1,Beijing,Mainland China,40.18238,116.4142,10.0,14.0,22.0,26.0,36.0,36.0,41.0,51.0,68.0,68.0,72.0,80.0,80.0,91.0,91.0,91.0,111.0,111.0,111.0,114.0,139.0,168.0,191.0,212.0,212.0,228.0,253.0,274.0,274.0,274.0,297,315,315,326,326,337
2,Chongqing,Mainland China,30.05718,107.874,5.0,6.0,9.0,27.0,27.0,57.0,57.0,75.0,75.0,110.0,110.0,110.0,132.0,132.0,132.0,147.0,147.0,147.0,165.0,182.0,211.0,247.0,300.0,337.0,337.0,366.0,376.0,389.0,400.0,400.0,415,426,428,446,450,468
3,Fujian,Mainland China,26.07783,117.9895,,1.0,5.0,5.0,10.0,10.0,18.0,18.0,35.0,35.0,56.0,59.0,59.0,80.0,80.0,82.0,84.0,84.0,101.0,101.0,120.0,144.0,159.0,179.0,179.0,194.0,205.0,215.0,215.0,215.0,224,239,239,250,250,261
4,Gansu,Mainland China,36.0611,103.8343,,,2.0,2.0,2.0,4.0,4.0,7.0,7.0,14.0,14.0,14.0,19.0,19.0,19.0,24.0,24.0,24.0,26.0,26.0,29.0,35.0,51.0,55.0,55.0,57.0,57.0,62.0,62.0,62.0,67,71,79,79,79,83
5,Guangdong,Mainland China,23.33841,113.422,17.0,26.0,32.0,53.0,53.0,78.0,78.0,98.0,111.0,146.0,151.0,151.0,151.0,207.0,207.0,241.0,277.0,277.0,311.0,354.0,436.0,535.0,683.0,725.0,797.0,870.0,895.0,944.0,970.0,970.0,1034,1075,1095,1120,1131,1151
6,Guangxi,Mainland China,23.82908,108.7881,,2.0,5.0,13.0,23.0,23.0,23.0,33.0,36.0,46.0,46.0,46.0,51.0,51.0,51.0,58.0,58.0,58.0,78.0,78.0,87.0,100.0,127.0,139.0,139.0,150.0,150.0,168.0,168.0,168.0,172,183,183,195,195,210
7,Guizhou,Mainland China,26.81536,106.8748,,1.0,3.0,3.0,3.0,4.0,4.0,5.0,5.0,7.0,7.0,7.0,9.0,9.0,9.0,9.0,9.0,9.0,12.0,12.0,29.0,29.0,46.0,56.0,56.0,64.0,64.0,69.0,71.0,71.0,81,89,89,96,99,109
8,Hainan,Mainland China,19.19673,109.7455,,4.0,5.0,8.0,8.0,17.0,19.0,19.0,22.0,22.0,33.0,33.0,33.0,40.0,40.0,43.0,43.0,43.0,43.0,46.0,52.0,62.0,71.0,79.0,79.0,91.0,91.0,100.0,106.0,106.0,117,124,124,128,131,136
9,Hebei,Mainland China,38.0428,114.5149,,1.0,1.0,2.0,2.0,8.0,8.0,13.0,13.0,18.0,18.0,18.0,33.0,33.0,33.0,48.0,48.0,48.0,65.0,65.0,82.0,96.0,113.0,126.0,126.0,135.0,135.0,157.0,157.0,157.0,172,195,195,206,206,218


In [57]:
# ***ADD YOUR ANSWER HERE***
# time_series_2019_ncov_deaths
df_death

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/21/20 22:00,1/22/20 12:00,1/23/20 12:00,1/24/20 0:00,1/24/20 12:00,1/25/20 0:00,1/25/20 12:00,1/25/20 22:00,1/26/20 11:00,1/26/20 23:00,1/27/20 9:00,1/27/20 19:00,1/27/20 20:30,1/28/20 13:00,1/28/20 18:00,1/28/20 23:00,1/29/20 13:30,1/29/20 14:30,1/29/20 21:00,1/30/20 11:00,1/31/20 14:00,02/01/20 10:00,02/02/20 21:00,02/03/20 21:00,02/04/20 9:40,02/04/20 22:00,02/05/20 9:00,02/05/20 23:00,02/06/20 9:00,02/06/20 14:20,02/07/20 20:13,02/07/20 22:50,02/08/20 10:24,02/08/20 23:04,02/09/20 10:30,02/09/20 23:20
0,Anhui,Mainland China,31.82571,117.2264,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,1,1,3
1,Beijing,Mainland China,40.18238,116.4142,,,,,,,,,,,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2,2,2,2
2,Chongqing,Mainland China,30.05718,107.874,,,,,,,,,,,,,,,,,,,,,,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2,2,2,2
3,Fujian,Mainland China,26.07783,117.9895,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,0,0,0
4,Gansu,Mainland China,36.0611,103.8343,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1,1,1,2
5,Guangdong,Mainland China,23.33841,113.422,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,1,1,1,1
6,Guangxi,Mainland China,23.82908,108.7881,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,1,1,1
7,Guizhou,Mainland China,26.81536,106.8748,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,1.0,1.0,1.0,1.0,1,1,1,1
8,Hainan,Mainland China,19.19673,109.7455,,,,,,,,,,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,2.0,2,2,3,3
9,Hebei,Mainland China,38.0428,114.5149,,,,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1,2,2,2


In [58]:
# ***ADD YOUR ANSWER HERE***
#time_series_2019_ncov_recovered
df_recovered

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/21/20 22:00,1/22/20 12:00,1/23/20 12:00,1/24/20 0:00,1/24/20 12:00,1/25/20 0:00,1/25/20 12:00,1/25/20 22:00,1/26/20 11:00,1/26/20 23:00,1/27/20 9:00,1/27/20 19:00,1/27/20 20:30,1/28/20 13:00,1/28/20 18:00,1/28/20 23:00,1/29/20 13:30,1/29/20 14:30,1/29/20 21:00,1/30/20 11:00,1/31/20 14:00,02/01/20 10:00,02/02/20 21:00,02/03/20 21:00,02/04/20 9:40,02/04/20 22:00,02/05/20 9:00,02/05/20 23:00,02/06/20 9:00,02/06/20 14:20,02/07/20 20:13,02/07/20 22:50,02/08/20 10:24,02/08/20 23:04,02/09/20 10:30,02/09/20 23:20
0,Anhui,Mainland China,31.82571,117.2264,,,,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,3.0,5.0,7.0,14.0,14.0,20.0,23.0,23.0,34.0,34.0,47.0,47.0,59,59,72,73
1,Beijing,Mainland China,40.18238,116.4142,,,,,1.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,5.0,9.0,9.0,12.0,12.0,23.0,24.0,31.0,31.0,31.0,33.0,34.0,34,37,37,44
2,Chongqing,Mainland China,30.05718,107.874,,,,,,,,,,,,,,,,1.0,,1.0,1.0,1.0,1.0,3.0,7.0,9.0,9.0,14.0,15.0,15.0,24.0,24.0,31.0,31.0,39,39,50,51
3,Fujian,Mainland China,26.07783,117.9895,,,,,,,,,,,,,,,,,,,,,,,,1.0,1.0,3.0,11.0,11.0,14.0,14.0,20.0,20.0,24,26,35,35
4,Gansu,Mainland China,36.0611,103.8343,,,,,,,,,,,,,,,,,,,,,,,3.0,3.0,3.0,4.0,4.0,6.0,6.0,6.0,9.0,9.0,12,12,15,16
5,Guangdong,Mainland China,23.33841,113.422,,,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,4.0,4.0,4.0,4.0,5.0,5.0,5.0,6.0,10.0,11.0,14.0,15.0,21.0,21.0,32.0,49.0,49.0,69.0,69.0,88.0,97.0,112,125,141,147
6,Guangxi,Mainland China,23.82908,108.7881,,,,,,,,,,,,,,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,7.0,7.0,10.0,13.0,14.0,14.0,14.0,17.0,17.0,17,18,18,18
7,Guizhou,Mainland China,26.81536,106.8748,,,,,,,,,,,,,,,,,,1.0,1.0,1.0,2.0,2.0,2.0,2.0,2.0,5.0,9.0,6.0,6.0,6.0,6.0,6.0,7,7,7,7
8,Hainan,Mainland China,19.19673,109.7455,,,,,,,,,,,,,,,,,1.0,,,1.0,1.0,1.0,4.0,4.0,4.0,4.0,5.0,5.0,8.0,8.0,10.0,11.0,14,15,19,19
9,Hebei,Mainland China,38.0428,114.5149,,,,,,,,,,,,,,,,,,,,,,,3.0,3.0,3.0,4.0,6.0,7.0,13.0,13.0,22.0,25.0,30,30,34,35


# Extract the names of columns containing dates and times

For each dataframe, extract the names of all the columns that contain a date and time, e.g., 1/21/20 22:00, 1/22/20 12:00, etc.

![7.JPG](attachment:7.JPG)

In [55]:
# ***ADD YOUR ANSWER HERE***
dates_columns_conf = df_conf.columns[4:]
dates_columns_death = df_death.columns[4:]
dates_columns_recovered = df_recovered.columns[4:]
print(dates_columns_conf)
print(dates_columns_death)
print(dates_columns_recovered)

Index(['1/21/20 22:00', '1/22/20 12:00', '1/23/20 12:00', '1/24/20 0:00',
       '1/24/20 12:00', '1/25/20 0:00', '1/25/20 12:00', '1/25/20 22:00',
       '1/26/20 11:00', '1/26/20 23:00', '1/27/20 9:00', '1/27/20 19:00',
       '1/27/20 20:30', '1/28/20 13:00', '1/28/20 18:00', '1/28/20 23:00',
       '1/29/20 13:30', '1/29/20 14:30', '1/29/20 21:00', '1/30/20 11:00',
       '1/31/20 14:00', '02/01/20 10:00', '02/02/20 21:00', '02/03/20 21:00',
       '02/04/20 9:40', '02/04/20 22:00', '02/05/20 9:00', '02/05/20 23:00',
       '02/06/20 9:00', '02/06/20 14:20', '02/07/20 20:13', '02/07/20 22:50',
       '02/08/20 22:04', '02/08/20 23:04', '02/09/20 10:30', '02/09/20 23:20'],
      dtype='object')
Index(['1/21/20 22:00', '1/22/20 12:00', '1/23/20 12:00', '1/24/20 0:00',
       '1/24/20 12:00', '1/25/20 0:00', '1/25/20 12:00', '1/25/20 22:00',
       '1/26/20 11:00', '1/26/20 23:00', '1/27/20 9:00', '1/27/20 19:00',
       '1/27/20 20:30', '1/28/20 13:00', '1/28/20 18:00', '1/28/20 23:0
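
Slicing by position works as long as the first four columns are always the identifier columns. A slightly more defensive sketch, assuming the identifier column names shown in the dataframes above and using a small made-up frame `df_toy`, selects the date columns by name instead:

```python
import pandas as pd

# Toy frame that mimics the layout of the confirmed-cases file.
df_toy = pd.DataFrame({
    "Province/State": ["Anhui", "Beijing"],
    "Country/Region": ["Mainland China", "Mainland China"],
    "Lat": [31.82571, 40.18238],
    "Long": [117.2264, 116.4142],
    "1/21/20 22:00": [None, 10.0],
    "1/22/20 12:00": [1.0, 14.0],
})

# Identifier columns that are not timestamps.
id_columns = ["Province/State", "Country/Region", "Lat", "Long"]

# Every remaining column is treated as a date/time column, in file order.
date_columns = [col for col in df_toy.columns if col not in id_columns]
print(date_columns)
```

This variant does not depend on the column order in the CSV file.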

# Unpivot a DataFrame from wide format to long format

Unpivot the dataframes so that the dates are no longer represented as columns. Instead, the dates should be stored as values in a single column, say Date, and the number of cases should be stored in a corresponding value column: Confirmed, Deaths, or Recovered, depending on the dataframe.

In [59]:
df_conf_melted = df_conf.melt(id_vars=['Province/State','Country/Region','Lat','Long'], # columns to keep as identifiers
                   value_vars=dates_columns_conf, # columns to unpivot
                   var_name="Date", # name of the new column holding the former column names
                   value_name='Confirmed') # name of the new column holding the values

df_death_melted = df_death.melt(id_vars=['Province/State','Country/Region','Lat','Long'], # columns to keep as identifiers
                   value_vars=dates_columns_death, # columns to unpivot
                   var_name="Date", # name of the new column holding the former column names
                   value_name='Deaths') # name of the new column holding the values

df_recovered_melted = df_recovered.melt(id_vars=['Province/State','Country/Region','Lat','Long'], # columns to keep as identifiers
                   value_vars=dates_columns_recovered, # columns to unpivot
                   var_name="Date", # name of the new column holding the former column names
                   value_name='Recovered') # name of the new column holding the values
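
To see what `melt` does on a small scale, here is a minimal sketch on a made-up two-row frame (`df_wide` is hypothetical; its values are taken from the first two timestamps of the confirmed data, with the missing Anhui value written as 0.0):

```python
import pandas as pd

# Wide format: one column per timestamp.
df_wide = pd.DataFrame({
    "Province/State": ["Anhui", "Beijing"],
    "1/21/20 22:00": [0.0, 10.0],
    "1/22/20 12:00": [1.0, 14.0],
})

# Long format: one row per (province, timestamp) pair.
# When value_vars is omitted, every non-id column is unpivoted.
df_long = df_wide.melt(
    id_vars=["Province/State"],
    var_name="Date",
    value_name="Confirmed",
)
print(df_long)
```

Two rows and two timestamp columns become four rows, one for each combination.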


# View the unpivoted DataFrames

Display all the unpivoted dataframes

In [60]:
df_conf_melted


Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed
0,Anhui,Mainland China,31.82571,117.2264,1/21/20 22:00,
1,Beijing,Mainland China,40.18238,116.4142,1/21/20 22:00,10.0
2,Chongqing,Mainland China,30.05718,107.8740,1/21/20 22:00,5.0
3,Fujian,Mainland China,26.07783,117.9895,1/21/20 22:00,
4,Gansu,Mainland China,36.06110,103.8343,1/21/20 22:00,
...,...,...,...,...,...,...
2587,"Boston, MA",US,42.36010,-71.0589,02/09/20 23:20,1.0
2588,"San Benito, CA",US,36.57610,-120.9876,02/09/20 23:20,2.0
2589,,Belgium,50.50390,4.4699,02/09/20 23:20,1.0
2590,"Madison, WI",US,43.07310,-89.4012,02/09/20 23:20,1.0


In [7]:
df_death_melted


Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Deaths
0,Anhui,Mainland China,31.82571,117.2264,1/21/20 22:00,
1,Beijing,Mainland China,40.18238,116.4142,1/21/20 22:00,
2,Chongqing,Mainland China,30.05718,107.8740,1/21/20 22:00,
3,Fujian,Mainland China,26.07783,117.9895,1/21/20 22:00,
4,Gansu,Mainland China,36.06110,103.8343,1/21/20 22:00,
...,...,...,...,...,...,...
2587,"Boston, MA",US,42.36010,-71.0589,02/09/20 23:20,0.0
2588,"San Benito, CA",US,36.57610,-120.9876,02/09/20 23:20,0.0
2589,,Belgium,50.50390,4.4699,02/09/20 23:20,0.0
2590,"Madison, WI",US,43.07310,-89.4012,02/09/20 23:20,0.0


In [8]:
df_recovered_melted


Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Recovered
0,Anhui,Mainland China,31.82571,117.2264,1/21/20 22:00,
1,Beijing,Mainland China,40.18238,116.4142,1/21/20 22:00,
2,Chongqing,Mainland China,30.05718,107.8740,1/21/20 22:00,
3,Fujian,Mainland China,26.07783,117.9895,1/21/20 22:00,
4,Gansu,Mainland China,36.06110,103.8343,1/21/20 22:00,
...,...,...,...,...,...,...
2587,"Boston, MA",US,42.36010,-71.0589,02/09/20 23:20,0.0
2588,"San Benito, CA",US,36.57610,-120.9876,02/09/20 23:20,0.0
2589,,Belgium,50.50390,4.4699,02/09/20 23:20,0.0
2590,"Madison, WI",US,43.07310,-89.4012,02/09/20 23:20,0.0


# Combine all the unpivoted dataframes into one single dataframe

Combine the figures for Confirmed, Recovered, and Deaths into a single dataframe.

![12.JPG](attachment:12.JPG)

In [63]:
# ***ADD YOUR ANSWER HERE***
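# With no `on=` argument, merge joins on every shared column:
# Province/State, Country/Region, Lat, Long, and Date.
# An inner join keeps only timestamps that appear in all three files.
df_merged = pd.merge(df_recovered_melted, df_death_melted, how="inner")
df_overall = pd.merge(df_merged, df_conf_melted, how="inner")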
df_overall


Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Recovered,Deaths,Confirmed
0,Anhui,Mainland China,31.82571,117.2264,1/21/20 22:00,,,
1,Beijing,Mainland China,40.18238,116.4142,1/21/20 22:00,,,10.0
2,Chongqing,Mainland China,30.05718,107.8740,1/21/20 22:00,,,5.0
3,Fujian,Mainland China,26.07783,117.9895,1/21/20 22:00,,,
4,Gansu,Mainland China,36.06110,103.8343,1/21/20 22:00,,,
...,...,...,...,...,...,...,...,...
2515,"Boston, MA",US,42.36010,-71.0589,02/09/20 23:20,0.0,0.0,1.0
2516,"San Benito, CA",US,36.57610,-120.9876,02/09/20 23:20,0.0,0.0,2.0
2517,,Belgium,50.50390,4.4699,02/09/20 23:20,0.0,0.0,1.0
2518,"Madison, WI",US,43.07310,-89.4012,02/09/20 23:20,0.0,0.0,1.0
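
The merged dataframe is shorter than the unpivoted ones. The column headers shown earlier suggest why: the confirmed file has a reading at `02/08/20 22:04`, while the deaths and recovered files have one at `02/08/20 10:24`, so an inner join on Date drops both. As a hedged sketch on a made-up two-row sample (the Anhui values are copied from the tables above; the frame names are hypothetical), an outer join with `indicator=True` can reveal such unmatched rows:

```python
import pandas as pd

keys = ["Province/State", "Date"]

df_rec = pd.DataFrame({
    "Province/State": ["Anhui", "Anhui"],
    "Date": ["02/08/20 10:24", "02/08/20 23:04"],
    "Recovered": [59, 59],
})
df_con = pd.DataFrame({
    "Province/State": ["Anhui", "Anhui"],
    "Date": ["02/08/20 22:04", "02/08/20 23:04"],
    "Confirmed": [733, 779],
})

# The inner join keeps only the timestamp present in both frames.
inner = pd.merge(df_rec, df_con, on=keys, how="inner")

# The outer join keeps every row; the _merge column records its origin.
df_check = pd.merge(df_rec, df_con, on=keys, how="outer", indicator=True)

# Rows that the inner join dropped.
unmatched = df_check[df_check["_merge"] != "both"]
print(unmatched[["Date", "_merge"]])
```

In this assignment the dropped readings do not affect the final daily figures, because a later reading on the same day (`02/08/20 23:04`) exists in all three files.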


# Replace all NAs with 0s

Replace every empty (NaN) cell in the dataframe with 0.

![13.JPG](attachment:13.JPG)

In [64]:
# ***ADD YOUR ANSWER HERE***
df_overall = df_overall.fillna(0)
df_overall

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Recovered,Deaths,Confirmed
0,Anhui,Mainland China,31.82571,117.2264,1/21/20 22:00,0.0,0.0,0.0
1,Beijing,Mainland China,40.18238,116.4142,1/21/20 22:00,0.0,0.0,10.0
2,Chongqing,Mainland China,30.05718,107.8740,1/21/20 22:00,0.0,0.0,5.0
3,Fujian,Mainland China,26.07783,117.9895,1/21/20 22:00,0.0,0.0,0.0
4,Gansu,Mainland China,36.06110,103.8343,1/21/20 22:00,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
2515,"Boston, MA",US,42.36010,-71.0589,02/09/20 23:20,0.0,0.0,1.0
2516,"San Benito, CA",US,36.57610,-120.9876,02/09/20 23:20,0.0,0.0,2.0
2517,0,Belgium,50.50390,4.4699,02/09/20 23:20,0.0,0.0,1.0
2518,"Madison, WI",US,43.07310,-89.4012,02/09/20 23:20,0.0,0.0,1.0
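
Note that `fillna(0)` on the whole dataframe also turns a missing Province/State into 0, as the Belgium row above shows. If that is not wanted, one possible alternative (a sketch on a made-up frame `df_toy`, not the required answer) is to fill only the count columns:

```python
import numpy as np
import pandas as pd

df_toy = pd.DataFrame({
    "Province/State": [np.nan, "Anhui"],
    "Country/Region": ["Belgium", "Mainland China"],
    "Confirmed": [1.0, np.nan],
    "Deaths": [np.nan, np.nan],
    "Recovered": [0.0, np.nan],
})

# Fill only the numeric count columns; leave text columns untouched.
count_columns = ["Confirmed", "Deaths", "Recovered"]
df_toy[count_columns] = df_toy[count_columns].fillna(0)
print(df_toy)
```

Here the missing province stays missing while all three counts become 0 where they were empty.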


# Change the Date column to Datetime format

Observe that the Date column contains both a date and a time, e.g., 1/25/20 12:00. Some locations are reported several times a day. For this reason, it is useful to:
* Convert the date and time string into the datetime format.
* Remove the time so that data reported several times a day can later be combined into a single day.
* If there is more than one reading in a day, keep only the last reading.

![a1.JPG](attachment:a1.JPG)

In [66]:
# ***ADD YOUR ANSWER HERE***
df_overall["Date"] = pd.to_datetime(df_overall["Date"]).dt.date
# keep='last' picks the latest reading only if rows are still in chronological order
df_new = (
    df_overall.sort_values(["Province/State", "Country/Region"])
    .drop_duplicates(subset=["Province/State", "Country/Region", "Date"], keep="last")
)
df_new


Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Recovered,Deaths,Confirmed
69,0,Belgium,50.50390,4.4699,2020-01-21,0.0,0.0,0.0
141,0,Belgium,50.50390,4.4699,2020-01-22,0.0,0.0,0.0
213,0,Belgium,50.50390,4.4699,2020-01-23,0.0,0.0,0.0
357,0,Belgium,50.50390,4.4699,2020-01-24,0.0,0.0,0.0
573,0,Belgium,50.50390,4.4699,2020-01-25,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...
1974,Zhejiang,Mainland China,29.18251,120.0985,2020-02-05,81.0,0.0,954.0
2118,Zhejiang,Mainland China,29.18251,120.0985,2020-02-06,94.0,0.0,954.0
2262,Zhejiang,Mainland China,29.18251,120.0985,2020-02-07,127.0,0.0,1048.0
2334,Zhejiang,Mainland China,29.18251,120.0985,2020-02-08,185.0,0.0,1075.0
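
The answer above relies on the rows already being in chronological order when `keep='last'` is applied. A more explicit sketch, assuming the timestamp format `%m/%d/%y %H:%M` seen in the column names and using a made-up frame `df_toy` whose rows are deliberately out of order, sorts by the full timestamp first:

```python
import pandas as pd

# Anhui readings from the confirmed data, listed out of order on purpose.
df_toy = pd.DataFrame({
    "Province/State": ["Anhui", "Anhui", "Anhui"],
    "Date": ["1/25/20 22:00", "1/25/20 0:00", "1/26/20 11:00"],
    "Confirmed": [60.0, 39.0, 60.0],
})

# Keep the full timestamp in a helper column, and the plain date in Date.
df_toy["Timestamp"] = pd.to_datetime(df_toy["Date"], format="%m/%d/%y %H:%M")
df_toy["Date"] = df_toy["Timestamp"].dt.date

# Sort chronologically, then keep the last reading per location and day.
df_daily = (
    df_toy.sort_values(["Province/State", "Timestamp"])
    .drop_duplicates(subset=["Province/State", "Date"], keep="last")
    .drop(columns="Timestamp")
    .reset_index(drop=True)
)
print(df_daily)
```

Without the sort, the 0:00 reading (39 cases) would be kept for 25 January; with it, the 22:00 reading (60 cases) is kept.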


# Display the daily number of confirmed, recovered, and deaths 

Display the number of Confirmed, Recovered, and Deaths cases for each day.

![15.JPG](attachment:15.JPG)

In [70]:
# ***ADD YOUR ANSWER HERE***

df_new_ordered = df_new.sort_index()  # restore the original row order
columns_to_show = ['Country/Region', 'Confirmed', 'Recovered', 'Deaths']
# Group by the Date Series itself (not a one-element list) so each key is a plain date
cases_grouped_date = df_new_ordered[columns_to_show].groupby(df_new_ordered['Date'])

for date, group in cases_grouped_date:
    print("=" * len(str(date)))
    print(date)
    print("=" * len(str(date)))
    print(group)
    print("")

2020-01-21
          Country/Region  Confirmed  Recovered  Deaths
0         Mainland China        0.0        0.0     0.0
1         Mainland China       10.0        0.0     0.0
2         Mainland China        5.0        0.0     0.0
3         Mainland China        0.0        0.0     0.0
4         Mainland China        0.0        0.0     0.0
5         Mainland China       17.0        0.0     0.0
6         Mainland China        0.0        0.0     0.0
7         Mainland China        0.0        0.0     0.0
8         Mainland China        0.0        0.0     0.0
9         Mainland China        0.0        0.0     0.0
10        Mainland China        0.0        0.0     0.0
11        Mainland China        0.0        0.0     0.0
12        Mainland China      270.0       25.0     0.0
13        Mainland China        1.0        0.0     0.0
14        Mainland China        0.0        0.0     0.0
15        Mainland China        0.0        0.0     0.0
16        Mainland China        2.0        0.0     0.0

# Display the total daily number of confirmed, recovered, and deaths for each country

Display the number of Confirmed, Recovered, and Deaths cases for each day. This time, we are only interested in the <b>total</b> numbers for each <b>country</b>.

![16.JPG](attachment:16.JPG)

In [72]:
# ***ADD YOUR ANSWER HERE***
# Group by date and country, then sum the three count columns
cases_grouped_date_country = (
    df_new_ordered
    .groupby(['Date', 'Country/Region'])[['Confirmed', 'Recovered', 'Deaths']]
    .sum()
)

cases_grouped_date_country

Date,Country/Region,Confirmed,Recovered,Deaths
2020-01-21,Australia,0.0,0.0,0.0
2020-01-21,Belgium,0.0,0.0,0.0
2020-01-21,Cambodia,0.0,0.0,0.0
2020-01-21,Canada,0.0,0.0,0.0
2020-01-21,Finland,0.0,0.0,0.0
...,...,...,...,...
2020-02-09,Thailand,32.0,10.0,0.0
2020-02-09,UK,3.0,0.0,0.0
2020-02-09,US,12.0,3.0,0.0
2020-02-09,United Arab Emirates,7.0,0.0,0.0
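
The result above is indexed by both Date and Country/Region. If a flat table is easier to work with, a possible variant (a sketch on made-up numbers in a hypothetical frame `df_toy`) passes `as_index=False` so the grouping keys remain ordinary columns:

```python
import pandas as pd

# Made-up sample: two US locations and one UK location on the same day.
df_toy = pd.DataFrame({
    "Date": ["2020-02-09", "2020-02-09", "2020-02-09", "2020-02-08"],
    "Country/Region": ["US", "US", "UK", "US"],
    "Confirmed": [1.0, 2.0, 3.0, 1.0],
    "Recovered": [0.0, 1.0, 0.0, 0.0],
    "Deaths": [0.0, 0.0, 0.0, 0.0],
})

# as_index=False keeps Date and Country/Region as regular columns.
totals_flat = df_toy.groupby(["Date", "Country/Region"], as_index=False)[
    ["Confirmed", "Recovered", "Deaths"]
].sum()
print(totals_flat)
```

The two US rows for 9 February are summed into one row, and the output has a single header line.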


# Display the data for the most recent day

Show the data for each country for the most recent day.
![Screenshot%202020-04-09%20at%201.53.57%20PM.png](attachment:Screenshot%202020-04-09%20at%201.53.57%20PM.png)


In [74]:
# Get the Date column (one entry per row)
# ***ADD YOUR ANSWER HERE***
dates = df_new_ordered["Date"]

# Find the latest date
# ***ADD YOUR ANSWER HERE***
most_recent_date = dates.max()
# Select all rows for that date from the first index level
# ***ADD YOUR ANSWER HERE***
df_recent_date = cases_grouped_date_country.loc[most_recent_date,:]
df_recent_date

Country/Region,Confirmed,Recovered,Deaths
Australia,15.0,2.0,0.0
Belgium,1.0,0.0,0.0
Cambodia,1.0,0.0,0.0
Canada,7.0,0.0,0.0
Finland,1.0,0.0,0.0
France,11.0,0.0,0.0
Germany,14.0,0.0,0.0
Hong Kong,36.0,0.0,1.0
India,3.0,0.0,0.0
Italy,3.0,0.0,0.0
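
The same selection can also be written against the index of the grouped dataframe. The following sketch uses a made-up two-level frame `totals` (ISO date strings and illustrative counts, both assumptions) and `DataFrame.xs` to take the cross-section for the latest date:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("2020-02-08", "UK"),
        ("2020-02-08", "US"),
        ("2020-02-09", "UK"),
        ("2020-02-09", "US"),
    ],
    names=["Date", "Country/Region"],
)
# Illustrative counts only.
totals = pd.DataFrame({"Confirmed": [2.0, 11.0, 3.0, 12.0]}, index=index)

# Latest value found in the Date level of the index.
latest = totals.index.get_level_values("Date").max()

# Cross-section: all countries for that date, with the Date level dropped.
df_latest = totals.xs(latest, level="Date")
print(df_latest)
```

Naming the level makes it explicit which part of the index is being filtered.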


# Top 10 countries with confirmed cases

Display the 10 countries with the most confirmed cases on the most recent day.
![Screenshot%202020-04-09%20at%201.54.40%20PM.png](attachment:Screenshot%202020-04-09%20at%201.54.40%20PM.png)


In [77]:
# ***ADD YOUR ANSWER HERE***
# Sort by confirmed cases in descending order, then take the first 10 rows
df_confirmed_top = df_recent_date.sort_values("Confirmed", ascending = False)
df_confirmed_top.head(10)

Country/Region,Confirmed,Recovered,Deaths
Mainland China,40160.0,3286.0,908.0
Others,64.0,0.0,0.0
Singapore,43.0,2.0,0.0
Hong Kong,36.0,0.0,1.0
Thailand,32.0,10.0,0.0
South Korea,27.0,3.0,0.0
Japan,26.0,1.0,0.0
Taiwan,18.0,1.0,0.0
Malaysia,18.0,1.0,0.0
Australia,15.0,2.0,0.0
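
Sorting and then calling `head` is one way to get the top rows. As an alternative sketch (the frame `df_toy` is made up from four of the rows above), `DataFrame.nlargest` does both steps in one call:

```python
import pandas as pd

df_toy = pd.DataFrame(
    {"Confirmed": [43.0, 40160.0, 36.0, 64.0]},
    index=pd.Index(
        ["Singapore", "Mainland China", "Hong Kong", "Others"],
        name="Country/Region",
    ),
)

# The 3 rows with the largest Confirmed values, in descending order.
top3 = df_toy.nlargest(3, "Confirmed")
print(top3)
```

For the assignment data, the equivalent call would use 10 instead of 3.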


***The End***