# "European Covid data exploration"
> "Exploring which countries have had the highest and lowest covid numbers in Europe"

- toc: true
- branch: master
- badges: true
- comments: false
- author: Ifan Johnston
- categories: [covid]

# Importing and preparing the data

We will be looking at data from the following countries:

- Italy
- Austria
- Germany
- Belgium
- France
- United Kingdom

We begin by importing the data and calculating some new features so that we can compare countries with different population sizes. For example, we calculate 'confirmed cases per 100k population', 'deaths per 100k' and 'new cases', since these are not initially in the dataset.

In [1]:
#collapse
from covid19dh import covid19
import altair as alt
import datetime

countries = ["Italy", 
             "Austria",
             "Germany",
             "Belgium",
             "France",
             "United Kingdom"
            ]

yesterday = datetime.date.today() - datetime.timedelta(days=1)

x, src = covid19(countries, raw=False, verbose=False, end=yesterday)

# rename() returns a new DataFrame, so the columns added below do not write into a slice of x
x_small = x.loc[:, ['administrative_area_level_1', 'date', 'vaccines', 'confirmed', 'tests', 'recovered', 'deaths', 'population']]
x_small = x_small.rename(columns={'administrative_area_level_1': 'id'})

x_small['confirmed_per'] = 100000 * x_small['confirmed'] / x_small['population']
x_small['deaths_per'] = 100000 * x_small['deaths'] / x_small['population']
x_small['ratio'] = 100 * (x_small['deaths']) / (x_small['confirmed'])
x_small['tests_per'] = 100000 * (x_small['tests']) / (x_small['population'])

# Daily values are the day-to-day difference of the cumulative counts within each country
x_small['new_cases'] = x_small.groupby('id').confirmed.diff().fillna(0)
x_small['new_cases_per'] = x_small.groupby('id').confirmed_per.diff().fillna(0)
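
To make the derived columns concrete, here is a minimal sketch on a toy frame. The two countries `A` and `B` and all of their numbers are made up for illustration; only the column names follow the ones used above.

```python
import pandas as pd

# Made-up numbers for two hypothetical countries, purely for illustration.
toy = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "confirmed": [100, 150, 225, 10, 10, 40],
    "population": [1_000_000] * 3 + [200_000] * 3,
})

# Sort so that diff() compares each day with the previous day of the same country.
toy = toy.sort_values(["id", "date"]).reset_index(drop=True)

# Cumulative cases scaled to a common population of 100,000.
toy["confirmed_per"] = 100_000 * toy["confirmed"] / toy["population"]

# New cases: difference of the cumulative count within each country.
# The first row of each country has no predecessor, so diff() gives NaN, which becomes 0.
toy["new_cases"] = toy.groupby("id")["confirmed"].diff().fillna(0)

print(toy)
```

Country `B` has far fewer cases than `A` in absolute terms, but after scaling by population its final value (20 per 100k) is close to that of `A` (22.5 per 100k), which is the point of the per-100k columns.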

Here is a random sample of 5 rows from the dataset.

In [2]:
x_small.sample(5)

| | id | date | vaccines | confirmed | tests | recovered | deaths | population | confirmed_per | deaths_per | ratio | tests_per | new_cases | new_cases_per |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6118 | Belgium | 2020-02-04 | 0.0 | 0 | 0 | 0 | 0 | 11433256 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 |
| 5119 | Austria | 2021-03-15 | 1123152.0 | 494803 | 18485146 | 447096 | 8775 | 8840521 | 5596.989137 | 99.258856 | 1.773433 | 209095.663027 | 2797.0 | 31.638407 |
| 41554 | Italy | 2021-01-05 | 272665.0 | 2181619 | 27139378 | 1536129 | 76329 | 60421760 | 3610.651196 | 126.327005 | 3.498732 | 44916.563172 | 15375.0 | 25.446131 |
| 22416 | Germany | 2021-02-24 | 5636560.0 | 2417575 | 43963626 | 2343393 | 74132 | 82905782 | 2916.051139 | 89.417165 | 3.066378 | 53028.419658 | 11324.0 | 13.658878 |
| 6092 | Belgium | 2020-01-09 | 0.0 | 0 | 0 | 0 | 0 | 11433256 | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 |


# Plotting the data

We will first look at the cumulative number of cases and deaths per 100,000 population in each country, before moving on to new cases per 100,000 and the case fatality rate.

{% include info.html text="In each of the charts below, you can click on the legend to filter the lines shown" %}

## Total cases per 100,000

In [3]:
#collapse

leg_selection = alt.selection_multi(fields=['id'], bind='legend')

alt.Chart(x_small).mark_line().encode(
    x=alt.X("yearmonthdate(date):T", axis=alt.Axis(title='Date')),
    y=alt.Y("confirmed_per:Q", axis=alt.Axis(title='Confirmed per 100k')),
    tooltip='id',
    color=alt.Color('id', legend=alt.Legend(title="Countries")),
    opacity=alt.condition(leg_selection, alt.value(1), alt.value(0.2))
).add_selection(leg_selection).properties(title='Total number of cases per 100,000 population for selected European Countries', width=600).interactive()


## Total deaths per 100,000

In [4]:
#collapse
alt.Chart(x_small).mark_line().encode(
    x=alt.X("yearmonthdate(date):T", axis=alt.Axis(title='Date')),
    y=alt.Y("deaths_per:Q", axis=alt.Axis(title='Deaths per 100k')),
    tooltip='id',
    color=alt.Color('id', legend=alt.Legend(title="Countries")),
    opacity=alt.condition(leg_selection, alt.value(1), alt.value(0.2))
).add_selection(leg_selection).properties(title='Number of deaths per 100,000 population for selected European Countries', width=600).interactive()

## New cases per 100,000

In [5]:
#collapse
brush = alt.selection(type='interval', encodings=['x'])

base = alt.Chart(x_small).mark_line().transform_window(
    # 7-day incidence: sum of new cases per 100k over the current day and the six before it
    rolling_sum='sum(new_cases_per)',
    frame=[-6, 0],
    groupby=['id']
).encode(
    x=alt.X("yearmonthdate(date):T",
            axis=alt.Axis(title='Date')
           ),
    y=alt.Y("rolling_sum:Q",
            axis=alt.Axis(title='7-day incidence per 100k')
           ),
    tooltip='id',
    color=alt.Color('id', legend=alt.Legend(title="Countries")),
    opacity=alt.condition(leg_selection, alt.value(1), alt.value(0.2))
).add_selection(leg_selection).properties(
    width=600,
    height=400,
    title='Number of new cases per 100,000 for selected countries'
)

upper = base.encode(
    alt.X('yearmonthdate(date):T',axis=alt.Axis(title='Date'),
          scale=alt.Scale(domain=brush))
)

lower = base.properties(
    height=60
).add_selection(brush)

upper & lower
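
The windowed aggregation in the chart above can also be reproduced directly in pandas, which is handy for checking the plotted values. This is a rough sketch rather than the code behind the chart: it assumes a seven-row window (the current day plus the six before it), and the country `A`, its values and the column name `incidence_7d` are hypothetical.

```python
import pandas as pd

# Hypothetical series: one new case per 100k every day for ten days.
daily = pd.DataFrame({
    "id": ["A"] * 10,
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "new_cases_per": [1.0] * 10,
})

# The window is defined over rows, so the rows must be in date order per country.
daily = daily.sort_values(["id", "date"]).reset_index(drop=True)

# Rolling sum per country. min_periods=1 lets the first days use a shorter
# window instead of returning NaN until seven rows are available.
daily["incidence_7d"] = daily.groupby("id")["new_cases_per"].transform(
    lambda s: s.rolling(window=7, min_periods=1).sum()
)

print(daily[["date", "incidence_7d"]])
```

With a constant one case per day, the rolling sum climbs from 1 to 7 over the first week and then stays at 7, which makes it easy to see how many rows the window covers.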

The ratio of deaths to confirmed cases gives an indication of the case fatality rate. It seems to be between 2 and 3%, assuming that the countries listed here are catching all positive cases. They probably aren't, so the fatality rate among all infections is likely lower than this.
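
A small worked example shows why undetected cases push the true rate down. All numbers here are invented, and the `detection_fraction` of 0.5 is purely an assumption for illustration, not an estimate for any of the countries above.

```python
import math


def percent_of(deaths, infections):
    """Deaths as a percentage of infections; NaN when there are no infections."""
    if infections == 0:
        return math.nan
    return 100 * deaths / infections


# Invented figures for a hypothetical country.
deaths = 2_500
confirmed = 100_000

# Fatality rate among confirmed cases only.
cfr = percent_of(deaths, confirmed)

# Assumption: testing finds only half of all infections,
# so the real number of infections is twice the confirmed count.
detection_fraction = 0.5
estimated_infections = confirmed / detection_fraction
ifr = percent_of(deaths, estimated_infections)

print(f"rate among confirmed cases: {cfr}%")
print(f"rate among estimated infections: {ifr}%")
```

The number of deaths stays the same while the denominator grows, so the rate among all infections is always at or below the rate among confirmed cases.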

## Case fatality rate

In [6]:
#collapse
base = alt.Chart(x_small).mark_line().encode(
    x=alt.X("yearmonthdate(date):T", axis=alt.Axis(title='Date')),
    # 'ratio' is stored as a percentage: 100 * deaths / confirmed
    y=alt.Y("ratio:Q", axis=alt.Axis(title='Deaths per confirmed case (%)')),
    tooltip='id',
    color=alt.Color('id', legend=alt.Legend(title="Countries")),
    opacity=alt.condition(leg_selection, alt.value(1), alt.value(0.2))
).add_selection(leg_selection).properties(title='The ratio of deaths to confirmed cases (case fatality rate)', width=600)

upper = base.encode(
    alt.X('yearmonthdate(date):T',axis=alt.Axis(title='Date'),
          scale=alt.Scale(domain=brush))
)

lower = base.properties(
    height=60
).add_selection(brush)

upper & lower