Rework how publication date stuff is handled
Retractions are tricky. The previous approach did not consider that they may easily come from the distant past. And we don't know exactly from when:

1. Either a case was introduced before the "historic data available" cutoff (i.e. before the point from which we have daily full case files): then it was recorded at the Meldedatum in the record.

2. Or a case was introduced after the "historic data available" cutoff. In that case, it was recorded in the dataset at the exact date at which the dataset was published.

Unfortunately, we lack sufficient data to resolve the second case: we do not know the publication date of any recorded record. We have to guess, working our way forward from the reported date until we find a timeslot where at least as many cases have been added as are being retracted.

This is obviously not without potential flaws. For instance, suppose a case group is reported with 4 new cases on day X and 3 cases on day X+1, and later on a retraction aimed at the case group on day X+1 comes in and retracts all three cases. Then we'll remove those cases from day X, because it is the first bin with enough matching cases available. If another retraction comes in and attempts to remove the case group from day X, it will not find a matching bin: the one at day X only has 1 case left, and the one on day X+1 only had 3 cases to begin with.

In such cases, we'll now log a warning. Originally, I wanted to make this panic, but it appears that at least one dataset has the issue of retracting a case *which had never been reported* [1]! Hence, we cannot be strict about this and need to hope that we'll not run into such a situation too often. (We can still detect it at a later point, because we'll see too many cases in {cases,deaths,recovered}_pub_cum compared to the respective ref series.)

[1]: robert-koch-institut/SARS-CoV-2-Infektionen_in_Deutschland_Archiv#11
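The forward search described above can be illustrated with a minimal sketch. This is not the code from this commit: the function name `apply_retraction`, the `bins` mapping (publication date to number of cases still attributed to that date) and the logger name are all hypothetical, and Python is used purely for illustration. It only assumes the behaviour stated in the message: start at the reported date, take the first bin with enough cases, and log a warning instead of failing when no bin matches.

```python
import logging
from datetime import date, timedelta

logger = logging.getLogger("retractions")  # hypothetical logger name


def apply_retraction(bins, reported, count):
    """Remove `count` cases from the first bin on or after `reported`
    that still holds at least `count` cases.

    `bins` maps a (guessed) publication date to the number of cases still
    attributed to it for one case group. Returns the date of the bin that
    was reduced, or None if no bin matched.
    """
    for day in sorted(d for d in bins if d >= reported):
        if bins[day] >= count:
            bins[day] -= count
            return day
    # No matching bin: warn rather than abort, since a dataset may retract
    # a case that was never reported in the first place.
    logger.warning(
        "no bin with >= %d cases on or after %s; retraction ignored",
        count, reported,
    )
    return None


# The example from the message: 4 cases on day X, 3 cases on day X+1.
x = date(2020, 4, 1)  # arbitrary illustrative date
bins = {x: 4, x + timedelta(days=1): 3}

first = apply_retraction(bins, x, 3)   # meant for day X+1, lands on day X
second = apply_retraction(bins, x, 4)  # meant for day X, finds no bin
```

In this sketch the first retraction reduces day X to 1 case even though it was aimed at day X+1, so the second retraction finds no bin with 4 cases and only produces a warning, which is the flaw the message describes.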
Showing 5 changed files with 222 additions and 279 deletions.