# WHO Coronavirus Disease (COVID-19) Dashboard

https://covid19.who.int

- Go to the website
- Download the CSV file into a new **local** folder
- Start Jupyter Notebook
- Navigate to the folder
- Open a new Notebook there

## Python text strings

In [None]:
'https://' 'covid19.who.int'

In [None]:
who_url = 'https://covid19.who.int'

In [None]:
who_url

In [None]:
print?

In [None]:
print(who_url)

#### Introducing the indexing operator '[ ]'

In [None]:
who_url[1]

In [None]:
who_url[-1]

In [None]:
slice?

In [None]:
x = slice(5)
x

In [None]:
who_url[x]

In [None]:
who_url[:5]

In [None]:
who_url[-3:-1]

In [None]:
x = slice(-3, None)
x

In [None]:
who_url[x]

In [None]:
who_url[::-1]

In [None]:
who_url.upper()

#### Introducing the Python list '[ ]'

In [None]:
l = who_url.split('.')
l

In [None]:
l = l + ['csv']

In [None]:
l += [print]
l

In [None]:
l[-1](l[0])

In [None]:
csv_file = 'WHO-COVID-19-global-data.csv'

In [None]:
csv_url = who_url + '/' + csv_file

## Import a third-party package

In [None]:
import pandas as pd

In [None]:
dir(pd)

In [None]:
pd.__version__

In [None]:
type(dir(pd))

In [None]:
len(dir(pd))

### What read methods do we have?
_Readability counts._

In [None]:
range(10)

In [None]:
list(range(10))

In [None]:
# C-style loop
dir_pd = dir(pd)
for i in range(len(dir_pd)):
    if dir_pd[i].startswith('read'):
        print(dir_pd[i])

In [None]:
# Python-style loop
for item in dir(pd):
    if item.startswith('read'):
        print(item)

In [None]:
x = 1 if 7 < 3 else 2
x

In [None]:
[x for x in dir(pd) if x.startswith('read')]

In [None]:
import this

In [None]:
pd.read_csv?

#### Firewall issues

In [None]:
# The following code doesn't work behind a firewall:
# df = pd.read_csv(csv_url)

# Instead use the 'proxify' function
import proxify
df = pd.read_csv(proxify.proxify(csv_url))

# Alternatively, download the CSV file via a web browser
# and import it directly:
# df = pd.read_csv(csv_file)

## Understanding the DataFrame

In [None]:
print(df)

In [None]:
df.info()

In [None]:
df.index

In [None]:
df.columns

In [None]:
# take care of the empty spaces in the column names
df.columns = df.columns.str.strip()
df.columns

In [None]:
df.values

In [None]:
# surprisingly there are some negative deaths 
# (aka "resurrections") in the data?
df.describe()

### Selection via column attribute

In [None]:
df.Country

In [None]:
df.Country.unique()

## Selection via indexing operator '[ ]'  
- select columns by label

In [None]:
df['Country'].to_frame()

In [None]:
df[['Country', 'WHO_region']]

- select rows by slice

In [None]:
df[-3:]

- select rows by bool

In [None]:
ix = df.Country == 'Germany'
ix

In [None]:
df[ix]

In [None]:
df[(df.Country == 'Germany') & (df.New_cases > 5000)]

In [None]:
df.query('Country == "Germany" and New_cases > 5000')

### With a DataFrame you can't select a row by index <br>using the indexing operator only!
With a Series you can.

In [None]:
try:
    df[0]
except KeyError as err:
    print('KeyError:', err)
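
A minimal sketch of the difference, assuming a small made-up frame (the values are illustrative, not WHO data): on a Series `[ ]` looks up index labels, while on a DataFrame it looks up column labels, so a row by position needs `.iloc`.

```python
import pandas as pd

# Illustrative toy data, not taken from the WHO file
toy = pd.DataFrame({
    'Country': ['Germany', 'France'],
    'New_cases': [10, 30],
})

# Series: [ ] looks up index labels; with the default index, label 0 is the first entry
first_value = toy['New_cases'][0]

# DataFrame: [ ] looks up column labels; use .iloc to get a row by position
first_row = toy.iloc[0]

print(first_value)
print(first_row)
```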

- Select rows and columns by chained indexing

In [None]:
df[ix].New_cases

In [None]:
df.New_cases[ix]

### Warning: Don't use chained indexing for assignments!

In [None]:
df.New_cases[:1] = 7
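
A safer alternative, sketched on a small made-up frame (the values are illustrative, not WHO data): a single `.loc` call selects rows and column in one step, so the assignment reaches the original DataFrame instead of a possible temporary copy.

```python
import pandas as pd

# Illustrative toy data, not taken from the WHO file
toy = pd.DataFrame({
    'Country': ['Germany', 'Germany', 'France'],
    'New_cases': [10, 20, 30],
})

# One indexing step: row mask and column label in a single .loc call
toy.loc[toy.Country == 'Germany', 'New_cases'] = 0

# "First row" variant: look up the first index label, then assign via .loc
toy.loc[toy.index[0], 'New_cases'] = 7

print(toy)
```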

## Plotting data

In [None]:
%matplotlib inline

In [None]:
df[ix].plot()

In [None]:
df.New_cases[ix].plot(title='New cases Germany', grid=True);

In [None]:
df.Date_reported = pd.to_datetime(df.Date_reported)

In [None]:
df.dtypes

In [None]:
df.Date_reported[0].value

## inplace=True
- inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
- inplace does not work with method chaining
- inplace is a common pitfall for beginners, so removing this option will simplify the API

**I don't advise setting this parameter as it serves little purpose.**

```Python
df.set_index(['Date_reported'], inplace=True)
df.sort_index(inplace=True)
```
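
Why `inplace` and method chaining don't mix can be seen in a small sketch, assuming a made-up two-row frame (illustrative values, not WHO data): with `inplace=True` the method returns `None`, so there is nothing to chain the next call onto.

```python
import pandas as pd

# Illustrative toy data, not taken from the WHO file
toy = pd.DataFrame({
    'Date_reported': pd.to_datetime(['2020-03-02', '2020-03-01']),
    'New_cases': [5, 3],
})

# With inplace=True the call modifies its object and returns None
result = toy.copy().set_index('Date_reported', inplace=True)
print(result)

# Without inplace each call returns a new DataFrame, so the chain works
chained = toy.set_index('Date_reported').sort_index()
print(chained)
```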

In [None]:
df = df.set_index(['Date_reported']).sort_index()
df

In [None]:
ax = (df
      .New_cases[df.Country=='Germany']
      .plot(title='New cases Germany', grid=True))
ax.set_ylim(0, 8000)

In [None]:
# rebuild the mask: `ix` still carries the old integer index
df[df.Country == 'Germany'].plot(
    x='Cumulative_cases', 
    y='Cumulative_deaths', 
    title='Deaths vs. Infections', grid=True);

### Select rows and columns via loc attribute

In [None]:
df.loc['2020-01-11', 'Cumulative_cases']

In [None]:
df.loc['2020-01', 'Cumulative_cases']

### Select rows and columns via iloc attribute

In [None]:
df.iloc[[1, 3, 7], [1, 2]]
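
The contrast between the two accessors, sketched on a small made-up frame with a date index (illustrative values, not WHO data): `.loc` selects by label, including partial date strings, while `.iloc` selects by integer position.

```python
import pandas as pd

# Illustrative toy data, not taken from the WHO file
toy = pd.DataFrame(
    {'New_cases': [1, 2, 3], 'New_deaths': [0, 0, 1]},
    index=pd.to_datetime(['2020-01-30', '2020-01-31', '2020-02-01']),
)

# .loc is label-based: a partial date string selects every row of that month
january = toy.loc['2020-01', 'New_cases']

# .iloc is position-based: rows 0 and 2, column 1
picked = toy.iloc[[0, 2], [1]]

print(january)
print(picked)
```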

## Grouping rows

In [None]:
# numeric_only=True skips the text columns instead of concatenating them
df.groupby('Country').sum(numeric_only=True)

In [None]:
df.groupby('Country')[['New_cases', 'Cumulative_cases']].sum()

In [None]:
sum, max

### Introducing the Python dictionary (dict) '{ }'

In [None]:
rule = {'New_cases': sum, 'Cumulative_cases': max}
rule

In [None]:
cases = (df
 .groupby(['Country'])[['New_cases', 'Cumulative_cases']]
 .agg(rule))
cases

In [None]:
all(cases.New_cases == cases.Cumulative_cases)

In [None]:
cases[cases.New_cases != cases.Cumulative_cases]

In [None]:
df[df.New_cases < 0]

### Group by 2 columns

https://en.wikipedia.org/wiki/WHO_regions

In [None]:
df.WHO_region.unique()

In [None]:
s = df.groupby(['Date_reported', 'WHO_region'])['New_cases'].sum()
s

### Multiindex Series

In [None]:
type(s)

In [None]:
s[:, 'EURO']

In [None]:
piv = s.unstack('WHO_region')
piv

In [None]:
piv.loc[:, 'EURO'].dropna().astype(int)

In [None]:
piv.plot(title='New_cases')

In [None]:
ax = piv.plot(
    title='New_cases', kind='bar', stacked=True, figsize=(12, 4))
ax.set_xticks([])

In [None]:
pd.pivot_table(
    df, 
    values='New_cases', 
    index='Date_reported', 
    columns='WHO_region', 
    aggfunc='sum').plot()

In [None]:
pd.pivot_table(
    df, 
    values='New_deaths', 
    index='Date_reported', 
    columns='WHO_region', 
    aggfunc='sum').plot()

## Add new columns

In [None]:
df['Mortality'] = df.Cumulative_deaths / df.Cumulative_cases
df

In [None]:
df.Mortality.max()

In [None]:
df.Mortality.idxmax()

In [None]:
df.loc[df.Mortality.idxmax()]

In [None]:
df.index.is_unique

### Multiindex DataFrame

In [None]:
df2 = df.set_index('Country', append=True).sort_index()
df2

In [None]:
df2.index.is_unique

In [None]:
df2.Mortality.idxmax()

In [None]:
df2.loc[df2.Mortality.idxmax()]

In [None]:
df2.loc[(slice(None), 'Germany'), :]  # you have to specify both axes here!
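
`pd.IndexSlice` offers a more readable spelling of the same selection. A minimal sketch, assuming a small made-up frame (illustrative values, not WHO data; the names `toy`, `toy2` and `idx` are only used here):

```python
import pandas as pd

# Illustrative toy data, not taken from the WHO file
toy = pd.DataFrame({
    'Date_reported': pd.to_datetime(
        ['2020-04-01', '2020-04-01', '2020-04-02', '2020-04-02']),
    'Country': ['France', 'Germany', 'France', 'Germany'],
    'New_cases': [1, 2, 3, 4],
})
toy2 = toy.set_index(['Date_reported', 'Country']).sort_index()

# idx[:, 'Germany'] means: all dates on level 0, 'Germany' on level 1
idx = pd.IndexSlice
germany = toy2.loc[idx[:, 'Germany'], :]

print(germany)
```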

In [None]:
df2.xs('Germany', level=1).loc['2020-04']

## Joining 2 Series

Create a plot showing New_cases Germany vs. EURO

In [None]:
de = df.loc[df.Country == 'Germany', 'New_cases']
de.name = 'New cases Germany'
de

In [None]:
# 1st option:
eu = (df
      .loc[df.WHO_region == 'EURO', 'New_cases']
      .groupby('Date_reported')
      .sum())
eu.name = 'New cases Europe'      
eu

In [None]:
# 2nd option
eu = (df
      .groupby(['Date_reported', 'WHO_region'])
      .sum()
      .loc[(slice(None), 'EURO'), 'New_cases']
      .droplevel('WHO_region'))
eu.name = 'New cases Europe'     
eu

In [None]:
pd.concat([de, eu], axis=1).plot()

In [None]:
pd.concat([de, eu], axis=1).rolling(window=7).mean().plot()
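
What `concat` and `rolling` do here can be checked on two tiny made-up Series (illustrative values and names, not WHO data): `concat(axis=1)` aligns on the index and fills gaps with `NaN`, and a rolling mean stays `NaN` until the window holds enough valid values.

```python
import pandas as pd

# Illustrative toy data, not taken from the WHO file
de_toy = pd.Series([1, 2, 3], index=[0, 1, 2], name='de')
eu_toy = pd.Series([10, 20], index=[1, 2], name='eu')

# Side-by-side join on the index; 'eu' has no value for label 0, so it becomes NaN
both = pd.concat([de_toy, eu_toy], axis=1)

# Mean over the current and the previous row; incomplete windows give NaN
smooth = both.rolling(window=2).mean()

print(both)
print(smooth)
```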

## Exercises

1. How many Corona cases were registered worldwide in 2020-Q1?

In [None]:
# your code here ...

In [None]:
%load solutions/covid_01.py

2. What was the average number of new cases in Germany over the past 30 days?

In [None]:
# your code here ...

In [None]:
%load solutions/covid_02.py

3. Print: "The number of Corona deaths in the first half of this year for Sweden was xxxx."  
_Hint: Check the print() arguments and the df.values attribute._

In [None]:
# your code here ...

In [None]:
%load solutions/covid_03.py

4. What was the average number of new cases in Europe over the past 30 days?

In [None]:
# your code here ...

In [None]:
%load solutions/covid_04.py

5. Make a 'pie' chart from the average number of new cases per WHO region over the past 30 days.  
_Hint: df.plot(kind='pie')_

In [None]:
# your code here ...

In [None]:
%load solutions/covid_05.py