To check that your coronavirus exercise is correct, use the latest '.csv' file from the following URL:

https://covid.ourworldindata.org/data/ecdc/full_data.csv

When you run all your code, it must execute without errors and update the graphs and outputs with the latest available data.

### Tasks: 

- Create a new dataframe with all the locations except the median one. Call it "no_median_df". All locations must be distinct.

#### Try to create a function that returns the answer to each of the following tasks and questions. One function per task:

- Which countries have fewer cases than the country at the 75th percentile?
- Imagine that the median location is the 175th row. Create a dataframe with all the locations from row 165 to row 185. Call it "median_df".
- With "median_df", plot the top 10 in reverse order as bar, line, scatter (point) and pie charts, in both seaborn and matplotlib.
- Create a new dataframe by concatenating "median_df" and "no_median_df". Then create a function that receives a dataframe as a parameter and returns the number of non-NaN values per column and per row.
- Show, in a graph, which location appears most often in the dataset.
- Which country has the lowest frequency?
- Are there any unusual values? Why?
- Delete all NaN values and repeat the same tasks using the functions you created.
- Do you find any differences? Why?
- What conclusions can we draw?
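
For the non-NaN counting task, a minimal sketch could rely on `DataFrame.count`, which skips missing values. It assumes "per each file" means "per row", and the function name `count_non_nan` and the toy data are hypothetical; in the notebook it would be called on the result of `pd.concat([median_df, no_median_df])`.

```python
import numpy as np
import pandas as pd


def count_non_nan(df: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Return the non-NaN counts per column and per row of `df` (hypothetical helper)."""
    # DataFrame.count() ignores NaN/None; axis=0 counts down each column,
    # axis=1 counts across each row.
    per_column = df.count(axis=0)
    per_row = df.count(axis=1)
    return per_column, per_row


# Hypothetical toy data standing in for the concatenated dataframe.
toy = pd.DataFrame(
    {"new_cases": [1, np.nan, 3], "total_cases": [1, 2, np.nan]},
    index=["A", "B", "C"],
)
cols, rows = count_non_nan(toy)
print(cols.to_dict())  # {'new_cases': 2, 'total_cases': 2}
print(rows.to_dict())  # {'A': 2, 'B': 1, 'C': 1}
```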

In [14]:
```python
import pandas as pd
import matplotlib.pyplot as plt
```

In [2]:
```python
df = pd.read_csv("https://covid.ourworldindata.org/data/ecdc/full_data.csv")
```

In [3]:
```python
df
```

| | date | location | new_cases | new_deaths | total_cases | total_deaths |
|---|---|---|---|---|---|---|
| 0 | 2019-12-31 | Afghanistan | 0 | 0 | 0 | 0 |
| 1 | 2020-01-01 | Afghanistan | 0 | 0 | 0 | 0 |
| 2 | 2020-01-02 | Afghanistan | 0 | 0 | 0 | 0 |
| 3 | 2020-01-03 | Afghanistan | 0 | 0 | 0 | 0 |
| 4 | 2020-01-04 | Afghanistan | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 6447 | 2020-03-22 | Zambia | 0 | 0 | 2 | 0 |
| 6448 | 2020-03-23 | Zambia | 1 | 0 | 3 | 0 |
| 6449 | 2020-03-21 | Zimbabwe | 1 | 0 | 1 | 0 |
| 6450 | 2020-03-22 | Zimbabwe | 1 | 0 | 2 | 0 |


### Q.1. Create a new dataframe with all the locations except the median one. Call it "no_median_df". All locations must be distinct.

- First, we check the median values of the original dataframe. Looking at the data, we want to drop the location 'World', since it is not a specific location but the aggregate of all the countries listed. To do so, we create a dataframe 'df_no_world' that groups the rows by location and drops the 'World' group.
- With 'World' removed, we compute the median and select the location that has that value. In this case, we use the 'total_cases' column. The median location turns out to be 'Faeroe Islands'.
- Finally, we create the required dataframe 'no_median_df' by dropping the 'Faeroe Islands' row.
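
Since the tasks ask for one function per task, the three steps above could be wrapped into a single function. This is a sketch under assumptions: the name `make_no_median_df` is hypothetical, and instead of requiring an exact match with the median it drops the location whose value is closest to it, so the code still works if an updated CSV has an even number of locations and the median falls between two values.

```python
import pandas as pd


def make_no_median_df(df: pd.DataFrame, column: str = "total_cases") -> pd.DataFrame:
    """Aggregate per location, then drop 'World' and the location nearest the median."""
    # One row per location; numeric_only=True avoids concatenating the 'date' strings.
    per_location = df.groupby("location").sum(numeric_only=True)
    # 'World' aggregates all countries, so it is not a location of its own.
    per_location = per_location.drop(index="World", errors="ignore")
    median = per_location[column].median()
    # The location whose value is closest to the median (exact match not required).
    median_location = (per_location[column] - median).abs().idxmin()
    return per_location.drop(index=median_location)


# Hypothetical toy rows in the same layout as full_data.csv.
toy = pd.DataFrame({
    "date": ["2020-03-01"] * 4,
    "location": ["A", "B", "C", "World"],
    "new_cases": [1, 5, 9, 15],
    "total_cases": [1, 5, 9, 15],
})
result = make_no_median_df(toy)
print(list(result.index))  # ['A', 'C']
```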

In [33]:
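```python
# numeric_only=True skips the text columns 'date' and 'location'
# (required in pandas >= 2.0, where they would otherwise raise a TypeError)
df.median(numeric_only=True)
```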

new_cases       0.0
new_deaths      0.0
total_cases     1.0
total_deaths    0.0
dtype: float64

In [34]:
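```python
# Sum the numeric columns per location, then drop the 'World' aggregate row
df_no_world = df.groupby('location').sum(numeric_only=True).drop(index="World")
df_no_world
```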

| location | new_cases | new_deaths | total_cases | total_deaths |
|---|---|---|---|---|
| Afghanistan | 34 | 0 | 213 | 0 |
| Albania | 89 | 2 | 635 | 17 |
| Algeria | 102 | 15 | 782 | 66 |
| Andorra | 113 | 0 | 440 | 0 |
| Angola | 2 | 0 | 4 | 0 |
| ... | ... | ... | ... | ... |
| Vatican | 1 | 0 | 9 | 0 |
| Venezuela | 36 | 0 | 265 | 0 |
| Vietnam | 118 | 0 | 1413 | 0 |
| Zambia | 3 | 0 | 11 | 0 |
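
Note that 'total_cases' and 'total_deaths' are cumulative (Zambia goes from 2 to 3 total cases between 2020-03-22 and 2020-03-23 while reporting 1 new case), so summing them over all dates adds up running totals rather than counting cases. If the intended measure is the latest count per location, a sketch like the one below takes the last cumulative value by date; the helper name `latest_totals` and the toy data are hypothetical. In a consistent dataset the summed 'new_cases' column should equal that latest total.

```python
import pandas as pd


def latest_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Return the most recent cumulative totals per location (hypothetical helper)."""
    # Parse dates so sorting is chronological, then keep the last row per location.
    ordered = df.assign(date=pd.to_datetime(df["date"])).sort_values("date")
    # GroupBy.last() returns the last non-null value of each column in each group.
    return ordered.groupby("location")[["total_cases", "total_deaths"]].last()


# Hypothetical toy rows (deliberately unsorted) in the layout of full_data.csv.
toy = pd.DataFrame({
    "date": ["2020-03-22", "2020-03-21", "2020-03-23", "2020-03-21", "2020-03-22"],
    "location": ["A", "A", "A", "B", "B"],
    "new_cases": [0, 2, 1, 1, 1],
    "total_cases": [2, 2, 3, 1, 2],
    "total_deaths": [0, 0, 0, 0, 0],
})
latest = latest_totals(toy)
summed = toy.groupby("location").sum(numeric_only=True)
print(latest["total_cases"].to_dict())  # {'A': 3, 'B': 2}
print(summed["total_cases"].to_dict())  # {'A': 7, 'B': 3}
print(summed["new_cases"].to_dict())    # {'A': 3, 'B': 2}
```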


In [35]:
```python
df_no_world.median()
```

new_cases        64.0
new_deaths        0.0
total_cases     359.0
total_deaths      0.0
dtype: float64

In [36]:
```python
# Boolean mask: True for the location(s) whose total_cases equals the median
is_median = df_no_world['total_cases'] == df_no_world['total_cases'].median()
df_no_world_median = df_no_world[is_median]
df_no_world_median
```

| location | new_cases | new_deaths | total_cases | total_deaths |
|---|---|---|---|---|
| Faeroe Islands | 115 | 0 | 359 | 0 |


In [37]:
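```python
# df_no_world is already indexed by location, so no second groupby is needed.
# Drop the median location(s) found above instead of hard-coding the name.
no_median_df = df_no_world.drop(index=df_no_world_median.index)
no_median_df
```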

| location | new_cases | new_deaths | total_cases | total_deaths |
|---|---|---|---|---|
| Afghanistan | 34 | 0 | 213 | 0 |
| Albania | 89 | 2 | 635 | 17 |
| Algeria | 102 | 15 | 782 | 66 |
| Andorra | 113 | 0 | 440 | 0 |
| Angola | 2 | 0 | 4 | 0 |
| ... | ... | ... | ... | ... |
| Vatican | 1 | 0 | 9 | 0 |
| Venezuela | 36 | 0 | 265 | 0 |
| Vietnam | 118 | 0 | 1413 | 0 |
| Zambia | 3 | 0 | 11 | 0 |
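
The task also requires that all locations in "no_median_df" be distinct. A small check along these lines could confirm both conditions; the function `check_no_median_df` and the toy tables are hypothetical, and in the notebook it would be called with `no_median_df` and the median location name.

```python
import pandas as pd


def check_no_median_df(no_median: pd.DataFrame, median_location: str) -> bool:
    """Return True if all locations are distinct and the median one was removed."""
    # Index.is_unique is False as soon as any location label repeats.
    return no_median.index.is_unique and median_location not in no_median.index


# Hypothetical per-location tables: 'B' plays the role of the median location.
good = pd.DataFrame({"total_cases": [1, 9]}, index=["A", "C"])
bad = pd.DataFrame({"total_cases": [1, 1, 5]}, index=["A", "A", "B"])
print(check_no_median_df(good, "B"))  # True
print(check_no_median_df(bad, "B"))   # False
```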


### Q.2. Try to create a function that returns the answer to each of the following tasks and questions. One function per task:


### Which countries have fewer cases than the country at the 75th percentile?

First, we compute the 75% quantile and then filter the dataframe with a boolean mask based on that value:
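
Wrapped as a function (one per task), the same idea could compute the threshold from the data instead of typing it in, so the answer updates with each new CSV. This is a sketch; the name `countries_below_quantile` and the toy data are hypothetical.

```python
import pandas as pd


def countries_below_quantile(
    df: pd.DataFrame, column: str = "total_cases", q: float = 0.75
) -> pd.Index:
    """Return the locations whose `column` value is below its q-quantile."""
    threshold = df[column].quantile(q)  # linear interpolation by default
    return df.index[df[column] < threshold]


# Hypothetical per-location table.
toy = pd.DataFrame({"total_cases": [10, 20, 30, 40, 50]}, index=list("ABCDE"))
print(list(countries_below_quantile(toy)))  # ['A', 'B', 'C']
```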

In [60]:
```python
df_no_world.quantile(q=0.75)
```

new_cases        348.5
new_deaths         3.0
total_cases     2238.0
total_deaths      14.0
Name: 0.75, dtype: float64

In [47]:
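```python
# Use the computed 75th percentile rather than a hard-coded number,
# so the filter stays correct when the CSV is updated
q3 = df_no_world['total_cases'].quantile(0.75)
less_3q = df_no_world['total_cases'] < q3
df_no_world[less_3q]
```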

| location | new_cases | new_deaths | total_cases | total_deaths |
|---|---|---|---|---|
| Afghanistan | 34 | 0 | 213 | 0 |
| Albania | 89 | 2 | 635 | 17 |
| Algeria | 102 | 15 | 782 | 66 |
| Andorra | 113 | 0 | 440 | 0 |
| Angola | 2 | 0 | 4 | 0 |
| ... | ... | ... | ... | ... |
| Vatican | 1 | 0 | 9 | 0 |
| Venezuela | 36 | 0 | 265 | 0 |
| Vietnam | 118 | 0 | 1413 | 0 |
| Zambia | 3 | 0 | 11 | 0 |


### Imagine that the median location is the 175th row. Create a dataframe with all the locations from row 165 to row 185. Call it "median_df".
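
One way to read "from 165 to 185" is as 0-based positions with both ends included, which `iloc` can select once its exclusive stop is accounted for. This sketch assumes that reading and assumes it is applied to a per-location table such as `df_no_world`; with 1-based row numbers the slice would be `iloc[164:185]` instead. The helper `make_median_df` and the toy data are hypothetical, and if the table has fewer rows, `iloc` simply returns fewer.

```python
import pandas as pd


def make_median_df(df: pd.DataFrame, start: int = 165, end: int = 185) -> pd.DataFrame:
    """Return rows at 0-based positions start..end, both ends included."""
    # iloc excludes its stop position, hence end + 1.
    return df.iloc[start:end + 1]


# Hypothetical table with 300 rows.
toy = pd.DataFrame({"total_cases": range(300)})
median_df = make_median_df(toy)
print(len(median_df))                            # 21
print(median_df.index[0], median_df.index[-1])  # 165 185
```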