# Compare and Analyze Data on COVID-19 Infections Provided by the [Robert Koch Institute (RKI)](https://www.rki.de/EN/Home/homepage_node.html)

The [Robert Koch Institute (RKI)](https://www.rki.de/EN/Home/homepage_node.html) is the federal government agency responsible for disease control and prevention in Germany. It publishes data about COVID-19 for all of Germany through various channels (see [this](https://www.rki.de/EN/Content/infections/epidemiology/outbreaks/COVID-19/COVID19.html) page for an overview).

This notebook analyzes and compares data from two sources that the RKI updates daily but that differ in their level of detail. The main objective is to understand how the fine-grained numbers provided via GitHub can be aggregated so that they match what is shown on the [RKI's COVID-19 dashboard](https://corona.rki.de/).

## Preliminaries

In [1]:
import json

import datetime as dt
import numpy as np
import pandas as pd

import local_constants as LC
from urllib.request import urlopen

## Load and Analyze Data From the [NPGEO Corona Hub 2020](https://npgeo-corona-npgeo-de.hub.arcgis.com/)

For all German districts, up-to-date COVID-19 data is available via [this](https://opendata.arcgis.com/datasets/917fc37a709542548cc3be077a786c17_0) page. This data appears to be the basis for the [COVID-19 dashboard](https://corona.rki.de/).

### Read Data

In [2]:
# load main data, but restrict the created dataframe to the most relevant columns 

RKI_ARCGIS_COVID_BY_DISTRICT = \
    pd.read_csv(LC.RKI_ARCGIS_URL, usecols=list(LC.RKI_ARCGIS_COLUMN_NAME_MAPPER.keys()),
                converters=LC.RKI_ARCGIS_VALUE_CONVERTERS)\
        .rename(columns=LC.RKI_ARCGIS_COLUMN_NAME_MAPPER)\
        .sort_values(by="district ID")\
        .set_index("district ID")

# add column for the number of deaths within the last seven days adjusted to a population size of 100,000 people

RKI_ARCGIS_COVID_BY_DISTRICT["deaths last 7 days per 100k"] = \
    10**5 * RKI_ARCGIS_COVID_BY_DISTRICT["deaths last 7 days"] \
    / RKI_ARCGIS_COVID_BY_DISTRICT["population"]

In [3]:
# print date of most recent entry in the loaded data

print(RKI_ARCGIS_COVID_BY_DISTRICT["update time"].max().strftime("last update is from %Y-%m-%d"))

last update is from 2021-09-29


## Load and Analyze RKI's Data on COVID-19 From [GitHub](https://github.com/robert-koch-institut/SARS-CoV-2_Infektionen_in_Deutschland)

Repository ["SARS-CoV-2 Infektionen in Deutschland"](https://github.com/robert-koch-institut/SARS-CoV-2_Infektionen_in_Deutschland) (SARS-CoV-2 Infections in Germany) contains
up-to-date numbers of COVID-19 cases in Germany. The data appears to be what is reported by the districts to the RKI as it lists new cases based on the reporting date, beginning of the disease (reference date), age group, sex and district.

In [Readme.md](https://github.com/robert-koch-institut/SARS-CoV-2_Infektionen_in_Deutschland/blob/master/Readme.md), an explanation for the data is provided (in German). 

### Read Metadata

The RKI publishes metadata in [Zenodo's](https://about.zenodo.org/) JSON format. Here, it is used to determine the publication date of the data, which is typically shortly after 3 AM of the current day (local time in Germany).

In [4]:
RKI_GITHUB_METADATA = json.loads(urlopen(LC.RKI_GITHUB_RAW_DATA_BASE_URL + LC.RKI_GITHUB_ZENODO_REL_URL).read())
RKI_GITHUB_PUBLICATION_DATE = pd.to_datetime(RKI_GITHUB_METADATA["publication_date"])
print(f"publication date is {RKI_GITHUB_PUBLICATION_DATE:%Y-%m-%d %H:%M:%S%z}")

publication date is 2021-09-29 03:30:37+0200


### Read Data

Load the data describing COVID-19 cases, deaths and recoveries.

In [5]:
RKI_GITHUB_COVID_INFECTIONS = pd.read_csv(LC.RKI_GITHUB_RAW_DATA_BASE_URL + LC.RKI_GITHUB_COVID_INFECTIONS_REL_URL, 
                                    converters=LC.RKI_GITHUB_VALUE_CONVERTERS)\
                                .rename(columns=LC.RKI_GITHUB_COLUMN_NAME_MAPPER)\
                                .astype(LC.RKI_GITHUB_COLUMN_TYPES_MAPPER)
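The module `local_constants` is not part of this notebook, so the contents of `LC.RKI_GITHUB_COLUMN_NAME_MAPPER` and `LC.RKI_GITHUB_COLUMN_TYPES_MAPPER` are not shown. The following is a minimal sketch of the read/rename/cast chain used above, assuming that the name mapper is a plain dict from the original German column names to the English names used in this notebook. The rows are made up, the type mapper is a hypothetical stand-in, and only the four column pairs named in the text below are included.

```python
import io

import pandas as pd

# Hypothetical stand-ins for LC.RKI_GITHUB_COLUMN_NAME_MAPPER and LC.RKI_GITHUB_COLUMN_TYPES_MAPPER
COLUMN_NAME_MAPPER = {
    "NeuerFall": "is new case",
    "AnzahlFall": "cases",
    "AnzahlTodesfall": "deaths",
    "AnzahlGenesen": "recovered",
}
COLUMN_TYPES_MAPPER = {name: "int64" for name in COLUMN_NAME_MAPPER.values()}

# made-up rows using the original (German) column names
csv_text = """NeuerFall,AnzahlFall,AnzahlTodesfall,AnzahlGenesen
0,5,0,5
1,3,0,0
-1,-2,0,0
"""

df = (
    pd.read_csv(io.StringIO(csv_text))
    .rename(columns=COLUMN_NAME_MAPPER)  # German -> English column names
    .astype(COLUMN_TYPES_MAPPER)         # enforce the expected dtypes
)
print(df)
```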

### Trying to Understand [Readme.md](https://github.com/robert-koch-institut/SARS-CoV-2_Infektionen_in_Deutschland/blob/master/Readme.md)

According to [Readme.md](https://github.com/robert-koch-institut/SARS-CoV-2_Infektionen_in_Deutschland/blob/master/Readme.md), the values for the number of 
infected (column `cases`, `AnzahlFall` in the original data), deceased (column `deaths`, `AnzahlTodesfall` in the original data) and recovered 
(column `recovered`, `AnzahlGenesen` in the original data) people are **[natural numbers](https://en.wikipedia.org/wiki/Natural_number)** (i.e. elements of {1, 2, 3, ...}). 

However, negative values seem to occur in the data. 

In [6]:
# show that there are rows for which the value in one of the columns "cases", "deaths" or "recovered" is negative
# (.any() also returns False instead of raising a KeyError if there are no such rows)

((RKI_GITHUB_COVID_INFECTIONS["cases"] < 0) |
    (RKI_GITHUB_COVID_INFECTIONS["deaths"] < 0) |
    (RKI_GITHUB_COVID_INFECTIONS["recovered"] < 0)).any()

True

Hence, despite what's stated in `Readme.md`, values of the respective columns are **integers** (i.e. elements of {... -3, -2, -1, 0,1,2,3 ...}) and **not** natural numbers. 

To determine the total number of COVID-19 cases reported in Germany, `Readme.md` seems to imply that only rows for which the value in column 
`is new case` (`NeuerFall` in the original data) is not 0 should be considered, because 0 indicates that the respective case was already reported previously. 

Additionally, one would think that rows for which the value in column `is new case` is -1 should be subtracted, because this value indicates that cases have been falsely reported in the past.

However, it appears that
$$
n = -1 \Leftrightarrow c < 0,
$$
where $n$ stands for the value in column `is new case` and $c$ for the value in column `cases` in a row. This means that values for such rows should **not be subtracted but added** (since they are negative).

In [7]:
# verify that in all rows of dataframe RKI_GITHUB_COVID_INFECTIONS, the value in column "is new case" is -1 
# iff the value in column "cases" is less than 0

equiv = lambda a,b: ((~a) | b) & ((~b) | a) 
equiv(RKI_GITHUB_COVID_INFECTIONS["is new case"] == -1, RKI_GITHUB_COVID_INFECTIONS["cases"] < 0).all()

True
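The `equiv` lambda above spells out logical equivalence as $(\lnot a \lor b) \land (\lnot b \lor a)$. For boolean Series this is the same as element-wise equality, so the check can also be written as `(a == b).all()`. A small sketch on made-up values comparing both formulations:

```python
import pandas as pd

# made-up "is new case" flags and "cases" values
flags = pd.Series([0, 1, -1, 1, -1])
cases = pd.Series([4, 2, -1, 3, -5])
a = flags == -1
b = cases < 0

# the lambda from the cell above: (not a or b) and (not b or a)
equiv = lambda a, b: ((~a) | b) & ((~b) | a)

# for boolean Series, logical equivalence is element-wise equality
print(equiv(a, b).all(), (a == b).all())  # True True

# the two formulations also agree on all four combinations of truth values
a2 = pd.Series([True, True, False, False])
b2 = pd.Series([True, False, True, False])
print(equiv(a2, b2).equals(a2 == b2))  # True
```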

Hence, it should be correct to simply add all values of column `cases`, where `is new case` is not 0 to get the total number of cases in Germany ...

In [25]:
# print the sum for column "cases", filtered by "is new case" == 1 (i.e. is new case) 

print(f'{RKI_GITHUB_COVID_INFECTIONS.loc[RKI_GITHUB_COVID_INFECTIONS["is new case"] == 1, "cases"].sum():,d}')
RKI_GITHUB_COVID_INFECTIONS.loc[RKI_GITHUB_COVID_INFECTIONS["is new case"] == 1].sort_values(by="reporting date")

11,981


| | district ID | age group | sex | reporting date | reference date | is start of desease | is new case | is new death | is new recovered | cases | deaths | recovered |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1320917 | 8317 | A60-A79 | M | 2020-05-04 | 2020-05-04 | 0 | 1 | -9 | 1 | 1 | 0 | 1 |
| 839802 | 6431 | A05-A14 | F | 2020-10-23 | 2020-10-23 | 0 | 1 | -9 | 1 | 1 | 0 | 1 |
| 1320499 | 8316 | A35-A59 | F | 2020-11-16 | 2021-09-27 | 1 | 1 | -9 | -9 | 1 | 0 | 0 |
| 1663266 | 9376 | A15-A34 | M | 2020-11-25 | 2020-11-22 | 1 | 1 | -9 | 1 | 1 | 0 | 1 |
| 1815991 | 9761 | A15-A34 | F | 2020-12-15 | 2020-12-15 | 0 | 1 | -9 | 1 | 1 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1052845 | 7315 | A60-A79 | F | 2021-09-28 | 2021-09-28 | 0 | 1 | -9 | -9 | 1 | 0 | 0 |
| 1052844 | 7315 | A05-A14 | F | 2021-09-28 | 2021-09-28 | 0 | 1 | -9 | -9 | 3 | 0 | 0 |
| 1052843 | 7315 | A35-A59 | F | 2021-09-28 | 2021-09-28 | 0 | 1 | -9 | -9 | 6 | 0 | 0 |
| 1056109 | 7317 | A15-A34 | F | 2021-09-28 | 2021-09-28 | 0 | 1 | -9 | -9 | 2 | 0 | 0 |


... but this number is **way too small**. 

The number that is shown on the dashboard can be retrieved from the data by adding _all positive values_ in column `cases`. 
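Given the equivalence $n = -1 \Leftrightarrow c < 0$ verified above, and assuming no row has a `cases` value of exactly 0, the rows with a positive `cases` value are exactly the rows whose `is new case` value is 0 or 1. So "all positive values" is the same as "all rows except the corrections". The filter `is new case == 1` instead keeps only the rows flagged as new, which appears consistent with the 11,981 cases from cell [25] being close to the daily increase of 11,780 computed near the end of this notebook. A toy illustration with made-up numbers:

```python
import pandas as pd

# made-up rows; "is new case" == -1 exactly when "cases" < 0, as verified for the RKI data above
toy = pd.DataFrame({
    "is new case": [0, 0, 1, 1, -1],
    "cases":       [10, 7, 3, 2, -1],
})

sum_positive = toy.loc[toy["cases"] > 0, "cases"].sum()                    # 10 + 7 + 3 + 2
sum_flag_0_or_1 = toy.loc[toy["is new case"].isin([0, 1]), "cases"].sum()  # the same rows
sum_flag_1 = toy.loc[toy["is new case"] == 1, "cases"].sum()               # only 3 + 2

print(sum_positive, sum_flag_0_or_1, sum_flag_1)  # 22 22 5
```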

In [9]:
print("total cases for Germany:")
# print the sum for all positive values in column "cases" 
print(f' GitHub: {RKI_GITHUB_COVID_INFECTIONS.loc[RKI_GITHUB_COVID_INFECTIONS["cases"] > 0, "cases"].sum():,d}')
# sum of all cases in the data from ARCGIS
print(f' ARCGIS: {RKI_ARCGIS_COVID_BY_DISTRICT["cases"].sum():,d}')

total cases for Germany:
 GitHub: 4,215,351
 ARCGIS: 4,215,351


### Compute Totals

Based on the interpretation of the data described above, the sums of columns `cases`, `deaths` and `recovered` are calculated for each district.

For the sake of convenience, a new dataframe is defined, in which negative values for the number of COVID-19 cases, deaths and recovered patients are set to zero. This is used later for the aggregation of data.

In [10]:
# define a dataframe without the negative values in RKI_GITHUB_COVID_INFECTIONS

RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES = \
    pd.concat([RKI_GITHUB_COVID_INFECTIONS[["district ID", "age group", "sex", "reporting date", "reference date"]], 
              RKI_GITHUB_COVID_INFECTIONS[["cases", "deaths", "recovered"]].apply(lambda a: np.maximum(a,0))], 
              axis="columns")
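The `apply(lambda a: np.maximum(a, 0))` call above replaces every negative value by zero, column by column. pandas offers the same operation directly as `DataFrame.clip(lower=0)`; a small check on made-up values that both variants give identical results:

```python
import numpy as np
import pandas as pd

values = pd.DataFrame({"cases": [3, -1, 0], "deaths": [0, -2, 1], "recovered": [-4, 5, 2]})

# variant used in the cell above
via_maximum = values.apply(lambda a: np.maximum(a, 0))
# built-in equivalent
via_clip = values.clip(lower=0)

assert via_maximum.equals(via_clip)
print(via_clip)
```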

In [11]:
#  sum up "cases", "deaths", "recovered" for each district
RKI_GITHUB_COVID_BY_DISTRICT_TOTALS = \
    RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES[["district ID", "cases", "deaths", "recovered"]].groupby(by="district ID").sum()

# copy population size for each district from RKI_ARCGIS_COVID_BY_DISTRICT
RKI_GITHUB_COVID_BY_DISTRICT_TOTALS["population"] = RKI_ARCGIS_COVID_BY_DISTRICT["population"]

### Compute Numbers per 100K People 

The sums in columns `cases`, `deaths` and `recovered` are normalized by the population size of the district. 
For the district's population size, the data from the [NPGEO Corona Hub 2020](https://npgeo-corona-npgeo-de.hub.arcgis.com/) is used.

In [12]:
# define a new dataframe by dividing the totals by the population size of each district and multiplying that with 100,000

RKI_GITHUB_COVID_BY_DISTRICT_PER_100K = \
    pd.DataFrame(data=10**5 * RKI_GITHUB_COVID_BY_DISTRICT_TOTALS[["cases", "deaths", "recovered"]].values \
                 / np.array(3 * [RKI_GITHUB_COVID_BY_DISTRICT_TOTALS["population"].values]).T,
                 index=RKI_GITHUB_COVID_BY_DISTRICT_TOTALS.index,
                 columns=["cases per 100k", "deaths per 100k", "recovered per 100k"])
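The transposed `np.array(3 * [...]).T` in the cell above repeats the population column three times so that NumPy can divide each row of counts by its district's population. pandas can express the same row-wise division with `DataFrame.div(..., axis=0)`, which aligns on the index and keeps the labels. A minimal sketch using the SK Bremerhaven totals from the district table further below; the index label is chosen for illustration only:

```python
import pandas as pd

# totals for a single district, taken from the district output table (SK Bremerhaven)
totals = pd.DataFrame(
    {"cases": [6006], "deaths": [110], "recovered": [5022], "population": [113557]},
    index=["SK Bremerhaven"],
)

counts = ["cases", "deaths", "recovered"]
# divide every count column by the population of the same row (axis=0 aligns on the index)
per_100k = (10**5 * totals[counts]).div(totals["population"], axis=0).add_suffix(" per 100k")
print(per_100k.round(1))
```

The rounded values (5289.0, 96.9, 4422.4) match the corresponding columns of the district table.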

### Compute Totals for the Last Seven Days

The values in columns `cases`, `deaths` and `recovered` of the last seven days are summed up.
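The window is defined relative to the publication date: with the publication timestamp printed above, the cut-off date is 2021-09-22. The sketch below assumes that the most recent reporting date in the data is the day before publication (2021-09-28 in the sorted output further above), in which case "reporting date >= cut-off date" covers exactly seven full days:

```python
import datetime as dt

import pandas as pd

# publication timestamp as printed by the metadata cell above
publication_date = pd.Timestamp("2021-09-29 03:30:37+0200")
number_of_days = 7
cut_off_date = publication_date.date() - dt.timedelta(days=number_of_days)

# assumption: the latest reporting date in the data is the day before publication
latest_reporting_date = publication_date.date() - dt.timedelta(days=1)
window = pd.date_range(pd.Timestamp(cut_off_date), pd.Timestamp(latest_reporting_date), freq="D")

print(cut_off_date, len(window))  # 2021-09-22 7
```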

In [13]:
# define the date of the earliest data that should be considered
number_of_days = 7 
cut_off_date = np.datetime64(RKI_GITHUB_PUBLICATION_DATE.date() - dt.timedelta(days=number_of_days))

# compute sums of the numerical columns for data that has been reported on or after the cut_off_date
# (selecting the columns explicitly avoids summing the non-numeric columns, which newer pandas versions do not drop silently)
RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS = \
        RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES\
                .loc[RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES["reporting date"] >= cut_off_date,
                     ["district ID", "cases", "deaths", "recovered"]]\
                .groupby(by="district ID").sum()\
                .rename(columns={"cases": "cases last 7 days", "deaths": "deaths last 7 days", "recovered": "recovered last 7 days"})

# ensure that there is data for each district by filling missing data with zeros
# define a dataframe containing the most recent reporting date of each district
df = RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES[["district ID", "reporting date"]].sort_values(by=["district ID", "reporting date"])\
        .groupby(["district ID"]).last()

missing_rows = df.loc[df["reporting date"] < cut_off_date].index.shape[0]
if missing_rows > 0:
    # there are districts that have not reported data within the last 7 days
    # create a dataframe containing the zeros filling the missing data
    ddf = pd.DataFrame(data=np.zeros((missing_rows, RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS.shape[1]), dtype=np.int64),
                       index=df.loc[df["reporting date"] < cut_off_date].index,
                       columns=RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS.columns)
    # append the zeros (pd.concat, since DataFrame.append no longer exists in pandas 2.0)
    RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS = pd.concat([RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS, ddf])
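Districts without any report on or after `cut_off_date` are missing from the grouped result, which is why the cell above adds rows of zeros for them. A more compact alternative is `DataFrame.reindex(..., fill_value=0)` with the complete list of district IDs (in this notebook, for instance, the index of `RKI_GITHUB_COVID_BY_DISTRICT_TOTALS`). A sketch on made-up data with hypothetical district IDs:

```python
import pandas as pd

# hypothetical district IDs; district 3 has no report on or after the cut-off date
all_district_ids = pd.Index([1, 2, 3], name="district ID")
last_7_days = pd.DataFrame(
    {"cases last 7 days": [4, 9], "deaths last 7 days": [0, 1], "recovered last 7 days": [2, 0]},
    index=pd.Index([1, 2], name="district ID"),
)

# add a row of zeros for every district that is missing from the grouped result
completed = last_7_days.reindex(all_district_ids, fill_value=0)
print(completed)
```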

### Compute Numbers for the Last Seven Days per 100K People 

Normalize the sums for the last seven days by the districts' population sizes.

In [14]:
# define a new dataframe computing values by dividing the totals for the last seven days by the population size 
# of each district and multiplying it with 100,000
 
RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_PER_100K = \
    pd.DataFrame(data=10**5 * RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS.values / np.array(3 * \
        [RKI_GITHUB_COVID_BY_DISTRICT_TOTALS.loc[RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS.index, "population"]]).T,
        index=RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS.index,
        columns=["cases last 7 days per 100k", "deaths last 7 days per 100k", "recovered last 7 days per 100k"])

### Create a DataFrame Containing all Values Derived From the COVID-19 Data from GitHub

In [15]:
# combine all of the dataframes with data that has been derived from RKI_GITHUB_COVID_INFECTIONS into one dataframe, 
# only columns "district name" and "state name" are taken from RKI_ARCGIS_COVID_BY_DISTRICT

RKI_GITHUB_COVID_BY_DISTRICT = \
    pd.concat([RKI_ARCGIS_COVID_BY_DISTRICT[["district name", "state name"]],
               RKI_GITHUB_COVID_BY_DISTRICT_TOTALS, 
               RKI_GITHUB_COVID_BY_DISTRICT_PER_100K, 
               RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_TOTALS,
               RKI_GITHUB_COVID_BY_DISTRICT_LAST_7_DAYS_PER_100K], axis="columns")

## Verify that Derived Data From the [COVID-19 Data from GitHub](https://github.com/robert-koch-institut/SARS-CoV-2_Infektionen_in_Deutschland) and the Data From [NPGEO Corona Hub 2020](https://npgeo-corona-npgeo-de.hub.arcgis.com/) are (More or Less) Identical

In [16]:
EPSILON = 10**-11 # threshold for treating float values as zero
common_columns = list(set(RKI_ARCGIS_COVID_BY_DISTRICT.columns) & set(RKI_GITHUB_COVID_BY_DISTRICT.columns)) 
common_numerical_columns = [c for c in common_columns if RKI_GITHUB_COVID_BY_DISTRICT[c].dtypes != object]

# return True if all absolute differences between the values in the common numerical columns are smaller than the threshold

np.amax(np.abs(RKI_ARCGIS_COVID_BY_DISTRICT[common_numerical_columns].values \
    - RKI_GITHUB_COVID_BY_DISTRICT[common_numerical_columns].values)) < EPSILON

True
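The hand-written comparison above checks that the largest absolute difference is below `EPSILON`. NumPy's `np.allclose` performs essentially the same test when the relative tolerance is switched off (`rtol=0`), except that it uses `<=` instead of `<`. A sketch on made-up arrays with tiny floating-point noise:

```python
import numpy as np

arcgis = np.array([[265.9, 0.0], [5289.0, 96.867652]])
github = arcgis + np.array([[1e-13, 0.0], [0.0, -2e-12]])  # tiny floating-point noise

EPSILON = 10**-11
# hand-written check as in the cell above
max_abs_diff_ok = np.amax(np.abs(arcgis - github)) < EPSILON
# np.allclose with a purely absolute tolerance: |a - b| <= atol
allclose_ok = np.allclose(arcgis, github, rtol=0, atol=EPSILON)

print(max_abs_diff_ok, allclose_ok)  # True True
```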

## Show Some Data

Based on the result of the verification above, it seems that the interpretation of the data provided via GitHub is correct ... or at least identical to what is shown
in the COVID-19 dashboard. Hence, the following list of the most affected districts in Germany can be assumed to be correct.

In [17]:
n = 30
RKI_GITHUB_COVID_BY_DISTRICT.sort_values(by="cases last 7 days per 100k", ascending=False)\
    .head(n).style.hide_index().format(LC.FORMAT_MAPPER)

| district name | state name | cases | deaths | recovered | population | cases per 100k | deaths per 100k | recovered per 100k | cases last 7 days | deaths last 7 days | recovered last 7 days | cases last 7 days per 100k | deaths last 7 days per 100k | recovered last 7 days per 100k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SK Bremerhaven | Bremen | 6006 | 110 | 5022 | 113557 | 5289.0 | 96.867652 | 4422.4 | 302 | 0 | 1 | 265.9 | 0.0 | 0.9 |
| LK Traunstein | Bayern | 12847 | 220 | 11886 | 177485 | 7238.4 | 123.954137 | 6696.9 | 432 | 0 | 9 | 243.4 | 0.0 | 5.1 |
| LK Rosenheim | Bayern | 16456 | 472 | 14868 | 261721 | 6287.6 | 180.344718 | 5680.9 | 511 | 0 | 6 | 195.2 | 0.0 | 2.3 |
| LK Berchtesgadener Land | Bayern | 7277 | 102 | 6902 | 106327 | 6844.0 | 95.930479 | 6491.3 | 183 | 0 | 3 | 172.1 | 0.0 | 2.8 |
| SK Pforzheim | Baden-Württemberg | 8960 | 210 | 8162 | 126016 | 7110.2 | 166.645505 | 6477.0 | 209 | 1 | 5 | 165.9 | 0.794 | 4.0 |
| SK Schweinfurt | Bayern | 3655 | 95 | 3419 | 53319 | 6855.0 | 178.172884 | 6412.3 | 86 | 0 | 0 | 161.3 | 0.0 | 0.0 |
| LK Miesbach | Bayern | 5173 | 85 | 4721 | 100183 | 5163.6 | 84.844734 | 4712.4 | 160 | 0 | 0 | 159.7 | 0.0 | 0.0 |
| SK Offenbach | Hessen | 10934 | 185 | 10121 | 130892 | 8353.5 | 141.337897 | 7732.3 | 188 | 0 | 4 | 143.6 | 0.0 | 3.1 |
| SK Stuttgart | Baden-Württemberg | 34623 | 499 | 32692 | 630305 | 5493.1 | 79.168022 | 5186.7 | 869 | 0 | 0 | 137.9 | 0.0 | 0.0 |
| SK Hagen | Nordrhein-Westfalen | 14434 | 334 | 13247 | 188687 | 7649.7 | 177.012725 | 7020.6 | 256 | 0 | 1 | 135.7 | 0.0 | 0.5 |


Show sums by state

In [18]:
base_columns = ["cases", "deaths", "recovered"]
last_7_days_columns = [c + " last 7 days" for c in base_columns]
index_column = "state name"
columns = [index_column, "population"] + base_columns + last_7_days_columns
RKI_GITHUB_COVID_BY_STATE = RKI_GITHUB_COVID_BY_DISTRICT[columns].groupby(by=index_column).sum()

per_100k_columns = [c + " per 100k" for c in base_columns]
last_7_days_per_100k_columns = [c + " per 100k" for c in last_7_days_columns]
RKI_GITHUB_COVID_BY_STATE[per_100k_columns + last_7_days_per_100k_columns] = \
    10**5 * RKI_GITHUB_COVID_BY_STATE[base_columns + last_7_days_columns].values\
        / np.array(6 * [RKI_GITHUB_COVID_BY_STATE["population"]]).T
RKI_GITHUB_COVID_BY_STATE.style.format(LC.FORMAT_MAPPER)

| state name | population | cases | deaths | recovered | cases last 7 days | deaths last 7 days | recovered last 7 days | cases per 100k | deaths per 100k | recovered per 100k | cases last 7 days per 100k | deaths last 7 days per 100k | recovered last 7 days per 100k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baden-Württemberg | 11103043 | 569854 | 10670 | 538933 | 8911 | 5 | 229 | 5132.4 | 96.099781 | 4853.9 | 80.3 | 0.045 | 2.1 |
| Bayern | 13140183 | 726236 | 15630 | 686653 | 11021 | 2 | 134 | 5526.8 | 118.948115 | 5225.6 | 83.9 | 0.015 | 1.0 |
| Berlin | 3657463 | 205244 | 3625 | 195035 | 2420 | 0 | 66 | 5611.6 | 99.112418 | 5332.5 | 66.2 | 0.0 | 1.8 |
| Brandenburg | 2531071 | 115913 | 3849 | 109697 | 874 | 4 | 19 | 4579.6 | 152.070013 | 4334.0 | 34.5 | 0.158 | 0.8 |
| Bremen | 680130 | 32654 | 509 | 30399 | 788 | 0 | 3 | 4801.1 | 74.838634 | 4469.6 | 115.9 | 0.0 | 0.4 |
| Hamburg | 1852478 | 91195 | 1724 | 85413 | 1004 | 0 | 9 | 4922.9 | 93.064533 | 4610.7 | 54.2 | 0.0 | 0.5 |
| Hessen | 6293154 | 332014 | 7736 | 311212 | 4030 | 3 | 75 | 5275.8 | 122.927232 | 4945.2 | 64.0 | 0.048 | 1.2 |
| Mecklenburg-Vorpommern | 1610774 | 48636 | 1199 | 46492 | 628 | 1 | 14 | 3019.4 | 74.436265 | 2886.3 | 39.0 | 0.062 | 0.9 |
| Niedersachsen | 8003421 | 299121 | 5953 | 285082 | 3591 | 3 | 84 | 3737.4 | 74.380693 | 3562.0 | 44.9 | 0.037 | 1.0 |
| Nordrhein-Westfalen | 17925570 | 955827 | 17802 | 906484 | 9667 | 6 | 256 | 5332.2 | 99.31065 | 5056.9 | 53.9 | 0.033 | 1.4 |
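The state-level rates above are computed by first summing counts and populations per state and then dividing. Averaging the district rates instead would give a small district the same weight as a large one. A sketch using the state of Bremen, which consists of the two districts SK Bremen and SK Bremerhaven: the Bremerhaven figures are taken from the district table, and the SK Bremen figures are derived here as the Bremen state total minus Bremerhaven.

```python
import pandas as pd

# SK Bremerhaven from the district table; SK Bremen derived as state total minus SK Bremerhaven
districts = pd.DataFrame({
    "state name": ["Bremen", "Bremen"],
    "population": [113557, 680130 - 113557],
    "cases last 7 days": [302, 788 - 302],
}, index=["SK Bremerhaven", "SK Bremen"])

districts["cases last 7 days per 100k"] = \
    10**5 * districts["cases last 7 days"] / districts["population"]

# pooled: sum counts and populations first, then divide (as in the cell above)
by_state = districts.groupby("state name")[["population", "cases last 7 days"]].sum()
pooled = 10**5 * by_state["cases last 7 days"] / by_state["population"]

# naive: unweighted mean of the district rates
naive = districts.groupby("state name")["cases last 7 days per 100k"].mean()

print(round(pooled["Bremen"], 1), round(naive["Bremen"], 1))  # 115.9 175.9
```

Only the pooled value reproduces the 115.9 shown for Bremen in the table above.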


Similarly, the following totals for all of Germany appear to be correct:

In [19]:
columns = ["population", "cases", "cases last 7 days", "deaths", "deaths last 7 days", "recovered", "recovered last 7 days"]
RKI_GITHUB_COVID_BY_DISTRICT[columns].sum().to_frame().T.style.hide_index().format(LC.FORMAT_MAPPER)

| population | cases | cases last 7 days | deaths | deaths last 7 days | recovered | recovered last 7 days |
|---|---|---|---|---|---|---|
| 83148406 | 4215351 | 50711 | 93571 | 26 | 3989672 | 1059 |


Likewise, the sums for the last 7 days per 100,000 people can be computed

In [20]:
columns = ["cases last 7 days", "deaths last 7 days", "recovered last 7 days"]
(10**5 * RKI_GITHUB_COVID_BY_DISTRICT[columns].sum() / RKI_GITHUB_COVID_BY_DISTRICT["population"].sum())\
    .to_frame().T.rename(columns={c:c+" per 100k" for c in columns}).style.hide_index().format(LC.FORMAT_MAPPER)

| cases last 7 days per 100k | deaths last 7 days per 100k | recovered last 7 days per 100k |
|---|---|---|
| 61.0 | 0.031 | 1.3 |


To compute the increase in the number of cases, deaths and recovered people within the last day, the data of the previous day is loaded from the archive and its totals are subtracted from the totals of the current day.

In [21]:
PREVIOUS_DAY = pd.Timestamp(RKI_GITHUB_PUBLICATION_DATE.date() - dt.timedelta(days=1))
RKI_GITHUB_COVID_INFECTIONS_PREVIOUS_DAY \
    = pd.read_csv(LC.RKI_GITHUB_RAW_DATA_BASE_URL + "/Archiv" + \
                  PREVIOUS_DAY.strftime("/%Y-%m-%d_Deutschland_SarsCov2_Infektionen.csv"), 
                  converters=LC.RKI_GITHUB_VALUE_CONVERTERS)\
        .rename(columns=LC.RKI_GITHUB_COLUMN_NAME_MAPPER)\
        .astype(LC.RKI_GITHUB_COLUMN_TYPES_MAPPER)

RKI_GITHUB_COVID_INFECTIONS_PREVIOUS_DAY_WITHOUT_NEGATIVES = \
    pd.concat([RKI_GITHUB_COVID_INFECTIONS_PREVIOUS_DAY[["district ID", "age group", "sex", "reporting date", "reference date"]], 
               RKI_GITHUB_COVID_INFECTIONS_PREVIOUS_DAY[["cases", "deaths", "recovered"]].apply(lambda a: np.maximum(a,0))], 
               axis="columns")
               
(RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES[["cases", "deaths", "recovered"]].sum() 
    - RKI_GITHUB_COVID_INFECTIONS_PREVIOUS_DAY_WITHOUT_NEGATIVES[["cases", "deaths", "recovered"]].sum())\
        .to_frame().T.style.hide_index().format(LC.FORMAT_MAPPER)

| cases | deaths | recovered |
|---|---|---|
| 11780 | 67 | 11499 |


### Compute Totals by Age Group and Sex

To close this notebook with something less mysterious, the total numbers of cases, deaths and recovered people are computed by age group and sex. 
This is again in sync with the data of the COVID-19 dashboard.

In [22]:
RKI_GITHUB_COVID_BY_AGE_GROUP_AND_SEX_TOTALS = \
    RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES[["age group", "sex", "cases", "deaths", "recovered"]]\
        .groupby(by=["age group", "sex"]).sum()
RKI_GITHUB_COVID_BY_AGE_GROUP_AND_SEX_TOTALS.style.format(LC.FORMAT_MAPPER)

| age group | sex | cases | deaths | recovered |
|---|---|---|---|---|
| A00-A04 | F | 55892 | 9 | 53159 |
| A00-A04 | M | 60137 | 3 | 57294 |
| A00-A04 | unknown | 1447 | 0 | 1305 |
| A05-A14 | F | 172825 | 4 | 159509 |
| A05-A14 | M | 188746 | 3 | 174381 |
| A05-A14 | unknown | 4083 | 0 | 3406 |
| A15-A34 | F | 632359 | 77 | 610954 |
| A15-A34 | M | 647247 | 136 | 624944 |
| A15-A34 | unknown | 7718 | 0 | 7230 |
| A35-A59 | F | 803068 | 1263 | 781252 |
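Grouping by two keys yields a two-level index (age group, sex), so each age group takes up several rows. If a wide layout with one column per sex is preferred, the result can be pivoted with `unstack`. A sketch on made-up rows in the shape of `RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES`:

```python
import pandas as pd

# made-up rows in the shape of RKI_GITHUB_COVID_INFECTIONS_WITHOUT_NEGATIVES
rows = pd.DataFrame({
    "age group": ["A00-A04", "A00-A04", "A05-A14", "A05-A14", "A05-A14"],
    "sex":       ["F", "M", "F", "M", "M"],
    "cases":     [2, 1, 4, 3, 5],
})

totals = rows.groupby(by=["age group", "sex"]).sum()  # MultiIndex (age group, sex)
# move the "sex" level into the columns: one row per age group
wide = totals["cases"].unstack("sex", fill_value=0)
print(wide)
```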
