Covid-19 Data analysis for Austria

What we will do today:
* Pie plot
* Bar plot
* Plotting on a map
* Line plot

The data is taken from here: https://github.com/statistikat/coronaDAT.git

The map data for Austria is taken from here: https://www.data.gv.at/katalog/dataset/stat_gliederung-osterreichs-in-gemeinden14f53

The additional info about the districts is taken from Wikipedia: https://de.wikipedia.org/wiki/Liste_der_Bezirke_und_Statutarst%C3%A4dte_in_%C3%96sterreich


The official data was finally published and can be found here: https://www.data.gv.at/covid-19/

In [None]:
from IPython.display import clear_output

Install the libraries geopandas and gitpython

In [None]:
!pip install geopandas
!pip install gitpython
clear_output()

Import all needed libraries

In [None]:
import geopandas
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import urllib.request
import pathlib

from zipfile import ZipFile
import git
import os

%matplotlib inline

Clone the Git repository that contains the Covid data

The data was originally scraped from https://info.gesundheitsministerium.at/ before the layout of the dashboard was completely changed. In the meantime, the data has become officially available here: https://www.data.gv.at/covid-19/

In [None]:
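repo_dir = "/content/covid-data"

# Clone only once, so that this cell can be re-run without errors
if not os.path.isdir(os.path.join(repo_dir, ".git")):
    repo = git.Repo.clone_from("https://github.com/statistikat/coronaDAT.git", repo_dir, no_checkout=True)
else:
    repo = git.Repo(repo_dir)

# Pin the data to a fixed commit (18.04.2021) so that the results are reproducible
repo.git.checkout("8a439f28bbe817090b9583148ce8a64e06800814")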

In [None]:
# Create a ZipFile object for the archive with the age-group CSV files
with ZipFile('/content/covid-data/archive/20210417/data/20210417_140201_orig_csv_ages.zip', 'r') as zipObj:
    # Extract all files of the archive into a separate directory
    zipObj.extractall('/content/covid-data/latest')

Have a look at the downloaded data. How is it structured? What kind of data does it contain?
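
One way to get a first overview is to list the extracted files before opening any of them. The following is a minimal sketch; the helper `list_data_files` is a hypothetical name, and the demonstration runs on a throwaway directory with empty placeholder files instead of the real download.

```python
import pathlib
import tempfile


def list_data_files(directory, pattern="*.csv"):
    """Return the sorted names of the files in `directory` that match `pattern`."""
    return sorted(p.name for p in pathlib.Path(directory).glob(pattern))


# Demonstration on a temporary directory with two empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("CovidFaelle_GKZ.csv", "CovidFaelle_Altersgruppe.csv"):
        (pathlib.Path(tmp) / name).touch()
    demo_files = list_data_files(tmp)

print(demo_files)
```

In this notebook the call would be `list_data_files("/content/covid-data/latest")`.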

Start with a [pie plot](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.pie.html) of the gender data


In [None]:
agegroup = pd.read_csv("/content/covid-data/latest/CovidFaelle_Altersgruppe.csv", sep=";")

In [None]:
agegroup.head()

In [None]:
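# Combine both conditions into one mask; chaining two boolean filters triggers a reindexing warning
is_austria = agegroup['Bundesland'] == "Österreich"
sum_m = agegroup[is_austria & (agegroup['Geschlecht'] == "M")]['Anzahl'].sum()
sum_f = agegroup[is_austria & (agegroup['Geschlecht'] == "W")]['Anzahl'].sum()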

gender = pd.DataFrame([["m", sum_m], ["f", sum_f]], columns=['gender', 'freq'])
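
The same totals can probably be obtained in one step with `groupby`. This is only a sketch on a toy table: the column names match `CovidFaelle_Altersgruppe.csv`, but the rows and numbers are made up for illustration.

```python
import pandas as pd

# Toy rows with the column names of CovidFaelle_Altersgruppe.csv; all numbers are made up
toy = pd.DataFrame({
    "Bundesland": ["Österreich", "Österreich", "Österreich", "Österreich", "Wien", "Wien"],
    "Altersgruppe": ["<5", "<5", "5-14", "5-14", "<5", "<5"],
    "Geschlecht": ["M", "W", "M", "W", "M", "W"],
    "Anzahl": [10, 12, 30, 29, 3, 4],
})

# Keep the national rows only, then sum the cases per gender
national = toy[toy["Bundesland"] == "Österreich"]
cases_by_gender = national.groupby("Geschlecht")["Anzahl"].sum()
print(cases_by_gender)
```

The result is a Series indexed by `Geschlecht`, so `cases_by_gender.plot.pie()` could draw the pie chart directly.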

In [None]:
# use the .plot.pie() function
plot = gender.plot.pie(y='freq', figsize=(5, 5), autopct='%1.0f%%', pctdistance=0.3, labeldistance=1.2, labels=gender['gender'])

[PiePlot](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.pie.html) or [BarPlot](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.axes.Axes.bar.html#matplotlib.axes.Axes.bar) for age data?

In [None]:
agegroup = agegroup[agegroup['Bundesland'] == "Österreich"]

In [None]:
age = pd.DataFrame()
age['agegroup'] = list(dict.fromkeys(list(agegroup['Altersgruppe'])))
age['male'] = list(agegroup[agegroup['Geschlecht'] == "M"]['Anzahl'])
age['male_inhabitants'] = list(agegroup[agegroup['Geschlecht'] == "M"]['AnzEinwohner'])
age['female'] = list(agegroup[agegroup['Geschlecht'] == "W"]['Anzahl'])
age['female_inhabitants'] = list(agegroup[agegroup['Geschlecht'] == "W"]['AnzEinwohner'])

age['sum'] = age['male'] + age['female']
age['sum_inhabitants'] = age['male_inhabitants'] + age['female_inhabitants']
age['sum_norm'] = (age['male'] + age['female']) / age['sum_inhabitants']

In [None]:
age.head()

In [None]:
# use the .plot.pie() function
plot = age.plot.pie(y='sum', figsize=(5, 5), autopct='%1.0f%%', pctdistance=0.5, labeldistance=1.2, labels=age['agegroup'])

In [None]:
# use the .plot.bar() function
plot = age.plot.bar(x='agegroup', y='sum', rot=40)

In [None]:
plot = age.plot.bar(x='agegroup', y='sum_norm', rot=40)
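
The values in `sum_norm` are small fractions, which can be hard to read on an axis. A common alternative is to express them as cases per 100,000 inhabitants. The sketch below uses a hypothetical helper `cases_per_100k` and made-up numbers; the column names follow the `age` table above.

```python
import pandas as pd


def cases_per_100k(cases, inhabitants):
    """Scale case counts to cases per 100,000 inhabitants."""
    return cases / inhabitants * 100_000


# Toy table with the column names of the `age` DataFrame; all numbers are made up
toy_age = pd.DataFrame({
    "agegroup": ["<5", "5-14"],
    "sum": [50, 300],
    "sum_inhabitants": [100_000, 400_000],
})
toy_age["per_100k"] = cases_per_100k(toy_age["sum"], toy_age["sum_inhabitants"])
print(toy_age)
```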

In [None]:
# use the .plot.bar() function
age.plot.bar(x='agegroup', y=['male', 'female'], rot=40)

Show the data for the individual districts on a map

In [None]:
cov_district = pd.read_csv("/content/covid-data/latest/CovidFaelle_GKZ.csv", sep=";")

In [None]:
cov_district.head()

Download the shapefile for Austria. It is a file that describes the borders of the states and municipalities in Austria.

In [None]:
#https://www.data.gv.at/katalog/dataset/stat_gliederung-osterreichs-in-gemeinden14f53
url = "http://data.statistik.gv.at/data/OGDEXT_GEM_1_STATISTIK_AUSTRIA_20200101.zip"
urllib.request.urlretrieve(url, '/content/Shapefile.zip')

In [None]:
# Create a ZipFile object for the downloaded shapefile archive
with ZipFile('/content/Shapefile.zip', 'r') as zipObj:
    # Extract all files into /content/Shapefile, the directory that is read in the next cell
    zipObj.extractall('/content/Shapefile')

In [None]:
# load the .shp file into the variable "austria"
austria = geopandas.read_file('/content/Shapefile/STATISTIK_AUSTRIA_GEM_20200101Polygon.shp', encoding='utf-8')

Look at the shapefile dataframe, plot it

In [None]:
austria.head()

In [None]:
austria.plot()

In [None]:
austria[austria.name == "Linz"].plot()

In [None]:
# combine with the Covid data

In [None]:
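# Derive the district id (GKZ) from each municipality id by dropping its last two digits
ids = []
for municipality_id in austria['id']:
  district = municipality_id[:-2]
  if district[0] == '9':
    # ids starting with 9 (Vienna) are mapped to the single id 900
    ids.append(900)
  else:
    ids.append(int(district))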

In [None]:
austria['GKZ'] = ids # district_id
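
The mapping in the loop above can also be written as a small function, which makes it easy to try on single values. This is a sketch; `district_id` is a hypothetical name and the ids in the demonstration are illustrative strings, not read from the shapefile.

```python
def district_id(municipality_id):
    """Map a municipality id string to the district id used for the merge."""
    prefix = str(municipality_id)[:-2]  # drop the last two digits
    if prefix.startswith("9"):
        return 900  # every id starting with 9 collapses to 900
    return int(prefix)


# Illustrative ids only
demo_ids = [district_id(m) for m in ("40101", "90101", "10101")]
print(demo_ids)
```

With such a function the column could be filled with `austria['id'].map(district_id)` instead of the explicit loop.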

In [None]:
austria = austria.merge(cov_district, on='GKZ', how="left").fillna(0)
austria.head()
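
A left merge followed by `fillna(0)` hides districts that found no match in the Covid data, because they simply show up with 0 cases. One possible check, sketched here on made-up tables, is the `indicator=True` option of `merge`, which adds a `_merge` column stating where each row came from.

```python
import pandas as pd

# Toy stand-ins for the shapefile table and the Covid table; all values are made up
shapes = pd.DataFrame({"GKZ": [101, 401, 900], "name": ["A", "B", "C"]})
cases = pd.DataFrame({"GKZ": [101, 900], "Anzahl": [5, 50]})

# indicator=True marks each row as 'both', 'left_only' or 'right_only'
checked = shapes.merge(cases, on="GKZ", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "GKZ"].tolist()
print(unmatched)
```

An empty list would mean that every district in the shapefile has a matching row in the Covid data.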

In [None]:
# plot the Austria map with the case counts as values
austria.plot(column='Anzahl', legend=True)

Show the cases relative to the number of inhabitants

In [None]:
# calculate the relative cases, store them in the column 'relative_cases'
austria['relative_cases'] = austria['Anzahl'] / austria['AnzEinwohner']
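
Because the merge above fills missing values with 0, a district without a match would have 0 inhabitants, and the division would then produce NaN or infinite values. A minimal sketch of one way to guard against this, on made-up numbers, is to mask non-positive populations before dividing.

```python
import pandas as pd

# Toy rows; the second one mimics a district that was filled with zeros
toy = pd.DataFrame({"Anzahl": [5.0, 0.0], "AnzEinwohner": [1000.0, 0.0]})

# where() keeps positive populations and replaces everything else with NaN
population = toy["AnzEinwohner"].where(toy["AnzEinwohner"] > 0)
toy["relative_cases"] = toy["Anzahl"] / population
print(toy)
```

Rows with a NaN result can then be inspected or dropped explicitly instead of silently distorting the colour scale of the map.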

In [None]:
# plot the relative cases on the Austria map
# (the result is named ax, because plt is already the alias for matplotlib.pyplot)
ax = austria.plot(column='relative_cases', legend=True)

[Lineplot](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.plot.html) of general data

In [None]:
general = pd.read_csv("/content/covid-data/latest/CovidFaelle_Timeline.csv", sep=";")
casenumbers = pd.read_csv("/content/covid-data/latest/CovidFallzahlen.csv", sep=";")

In [None]:
general = general[general['BundeslandID'] == 10]
casenumbers = casenumbers[casenumbers['BundeslandID'] == 10]

In [None]:
casenumbers = casenumbers.rename(columns={'MeldeDatum': 'Time'})

In [None]:
merged = pd.merge(general, casenumbers, on=['Time'], how='left').fillna(0)
merged.head()
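
The merge above joins on `Time` as plain text, so it only matches rows whose date strings are identical in both files. A more robust variant might parse the column into real dates first. The sketch below assumes the format `%d.%m.%Y %H:%M:%S`; this is an assumption that should be checked against the actual files, and the rows are made up.

```python
import pandas as pd

# Toy tables; dates and numbers are made up
left = pd.DataFrame({
    "Time": ["01.04.2021 00:00:00", "02.04.2021 00:00:00"],
    "AnzahlFaelle": [100, 120],
})
right = pd.DataFrame({"Time": ["02.04.2021 00:00:00"], "FZHosp": [30]})

# Assumed date format (day.month.year hour:minute:second)
fmt = "%d.%m.%Y %H:%M:%S"
for frame in (left, right):
    frame["Time"] = pd.to_datetime(frame["Time"], format=fmt)

# Left merge keeps every day of the timeline; days without a match get NaN
toy_merged = left.merge(right, on="Time", how="left").sort_values("Time")
print(toy_merged)
```

Parsed dates also let the line plots place the x-axis ticks by calendar date rather than by row position.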

In [None]:
# call the .plot.line() function (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.line.html)
lines = merged.plot.line(x="Time", y=['AnzahlFaelle', 'FZHosp', 'FZICU', 'AnzahlTotTaeglich'], rot=45)

In [None]:
lines = merged.plot.line(subplots=True, x="Time", y=['AnzahlFaelle', 'FZHosp', 'FZICU', 'AnzahlTotTaeglich'])

Line plot including recovered people

In [None]:
merged['AnzahlAktuell'] = merged['AnzahlFaelleSum'] - merged['AnzahlGeheiltSum'] - merged['AnzahlTotSum']

In [None]:
lines = merged.plot.line(x="Time", y=['AnzahlFaelleSum', 'AnzahlGeheiltSum', 'AnzahlAktuell', 'FZHosp', 'FZICU', 'AnzahlTotSum'])