# Do RKI numbers change retrospectively?

Anecdotal evidence suggests that the RKI numbers change retrospectively. In particular, the numbers for the most recent reported day may be a little low, presumably because not all reports have arrived yet.

As a countermeasure, we always ignore the last day of reported data, so the RKI data appear to be one day behind the Johns Hopkins data in our plots.
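To make the idea of ignoring the last reported day concrete, here is a minimal sketch. The helper `drop_last_day` is hypothetical and only illustrates the idea; it is not necessarily how `oscovida` implements it. It assumes a DataFrame indexed by reporting date, and the numbers are made up for illustration.

```python
import pandas as pd


def drop_last_day(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without the rows belonging to the most recent date.

    Hypothetical helper; assumes the index holds the reporting dates.
    """
    if df.empty:
        return df
    last_day = df.index.max()
    return df[df.index < last_day]


# Made-up numbers purely for illustration, not real RKI data.
toy = pd.DataFrame(
    {"AnzahlFall": [5, 7, 6, 2]},
    index=pd.to_datetime(["2020-05-06", "2020-05-07", "2020-05-08", "2020-05-09"]),
)
trimmed = drop_last_day(toy)
print(trimmed.index.max().date())  # 2020-05-08
```

Filtering on the index rather than dropping the final row keeps the helper correct when several rows share the same reporting date.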

Here, we try to gather some quantitative evidence for this.

# Data gathering

We need to store the data downloaded on several consecutive days.
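One possible way to organise the daily snapshots is sketched here. The helper `save_snapshot` and its parameters are assumptions made for illustration; only the file name pattern `rki-downloaded-YYYY-MM-DD.csv.gz` comes from this notebook. The sketch skips the write if a snapshot for the given day already exists, so re-running the notebook on the same day does not replace the earlier download.

```python
import datetime
from pathlib import Path

import pandas as pd


def save_snapshot(df: pd.DataFrame, directory=".", today=None) -> Path:
    """Write df to rki-downloaded-YYYY-MM-DD.csv.gz unless the file exists.

    Hypothetical helper; `directory` and `today` are illustrative parameters.
    """
    today = today or datetime.date.today()
    path = Path(directory) / f"rki-downloaded-{today:%Y-%m-%d}.csv.gz"
    if not path.exists():
        # pandas infers gzip compression from the .gz suffix
        df.to_csv(path)
    return path
```

With the `germany` DataFrame fetched in the cells that follow, the call would be `save_snapshot(germany)`.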

In [None]:
%config InlineBackend.figure_formats = ['svg']

In [None]:
import oscovida as cv

In [None]:
cv.clear_cache()

In [None]:
germany = cv.fetch_data_germany(include_last_day=True)

In [None]:
import datetime


In [None]:
date_time = datetime.datetime.now().strftime("%Y-%m-%d")
date_time

In [None]:
germany.to_csv(f'rki-downloaded-{date_time}.csv.gz')

# Compare data sets

In [None]:
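import glob
import pandas as pd

# One snapshot file per download date. Process the newest file first so that
# the index of `totals` comes from the snapshot that should cover the most dates.
files = glob.glob("rki-downloaded*csv.gz")
data_sets = {}
totals = pd.DataFrame()
for file in sorted(files, reverse=True):
    # Extract the download date from the file name
    date = file.split("rki-downloaded-")[1].split('.csv.gz')[0]
    df = pd.read_csv(file)
    data_sets[date] = df
    # Cases per reporting date as known on this download date
    totals[date] = df.groupby('date')['AnzahlFall'].sum()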

In [None]:
totals

In [None]:
data_sets['2020-05-09']['Meldedatum'].max()

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 10))
# One line per reporting date (the last 10), traced across download dates
totals.iloc[-10:].T.plot(ax=ax)
ax.set_ylim([0, 1400])

In [None]:

tails = totals.tail(n=totals.shape[1]+15)
tails

In [None]:
totals.corr()

In [None]:
fig, axes = plt.subplots(figsize=(10, 10))

In [None]:
#pd.plotting.scatter_matrix(totals)
pd.plotting.scatter_matrix(tails.diff(), figsize=(12,12))
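The snapshots could also be summarised numerically. The sketch below compares, for each reporting date, the case count in the earliest snapshot that contains that date with the count in the latest snapshot. The function `revision_summary` is a hypothetical helper. It assumes a table shaped like `totals` (rows are reporting dates, columns are download dates whose labels sort chronologically), and the toy numbers are made up for illustration.

```python
import pandas as pd


def revision_summary(totals: pd.DataFrame) -> pd.DataFrame:
    """Compare first and latest known case counts per reporting date.

    Hypothetical helper; assumes columns are download dates that sort
    chronologically (e.g. ISO date strings) and rows are reporting dates.
    """
    ordered = totals[sorted(totals.columns)]
    # Earliest snapshot that contains a value for each reporting date
    first = ordered.bfill(axis=1).iloc[:, 0]
    # Most recent snapshot that contains a value for each reporting date
    latest = ordered.ffill(axis=1).iloc[:, -1]
    out = pd.DataFrame({"first": first, "latest": latest})
    out["change"] = out["latest"] - out["first"]
    out["relative_change"] = out["change"] / out["first"]
    return out


# Made-up numbers purely for illustration, not real RKI data.
toy_totals = pd.DataFrame(
    {
        "2020-05-08": [100.0, 40.0, float("nan")],
        "2020-05-09": [101.0, 55.0, 30.0],
    },
    index=pd.to_datetime(["2020-05-06", "2020-05-07", "2020-05-08"]),
)
summary = revision_summary(toy_totals)
print(summary)
```

If the hypothesis holds, the largest positive `change` values should sit on the reporting dates that were most recent when first downloaded. A reporting date whose first count is zero yields an infinite or undefined `relative_change`, so those rows need separate handling.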