Wrong padding multiplier/wrong number of users for 2020-08-18? #2

PalminX · 2020-08-19T08:27:27Z

It seems that the data for 2020-08-18 is not quite right:

https://ctt.pfstr.de/users/2020-08-18.txt shows a detected padding number of 1, resulting in 379 users
The graphs for number of users and number of keys show 98 users and 952 keys, which doesn't match the numbers in the approx. users file
The dashboard from @micb25 https://micb25.github.io/dka/ shows 75 users, 749 keys and a padding of 5 for 2020-08-18, which seems to be about right.

Is there a problem in the downloaded source data? If so, why does https://micb25.github.io/dka/ show more reasonable values?

PalminX · 2020-08-19T08:33:21Z

Hm, there are 3749 keys in the 2020-08-18 file, which obviously is not a multiple of 5. So it is maybe more a question for @micb25 how he handles these discrepancies.
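The divisibility check described here can be sketched as follows. This is a minimal sketch, assuming that every real key in a package is padded with (multiplier - 1) fake keys, so a clean package has a key count that is an exact multiple of the multiplier. The helper names `is_consistent` and `real_key_estimate` are hypothetical and are not part of any tool mentioned in this thread.

```python
def is_consistent(key_count: int, multiplier: int = 5) -> bool:
    """Return True if key_count is an exact multiple of the assumed padding multiplier."""
    return key_count % multiplier == 0


def real_key_estimate(key_count: int, multiplier: int = 5) -> int:
    """Estimate the number of real (unpadded) keys by integer division.

    Hypothetical helper: assumes each real key was padded with
    (multiplier - 1) fake keys, and silently drops any remainder.
    """
    return key_count // multiplier


# 3749 is the key count reported above for the 2020-08-18 daily file.
keys = 3749
print(is_consistent(keys))      # False, because 3749 % 5 == 4
print(real_key_estimate(keys))  # 749
```

With a fixed multiplier of 5, integer division gives 749 real keys, the same number of keys the @micb25 dashboard shows for that day; the remainder of 4 is what makes the file inconsistent.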

PalminX · 2020-08-19T08:40:01Z

OK, I saw that @micb25 sometimes manually corrected the multiplier in the past.
So I think here you should also have some way of handling or flagging these inconsistent values, because currently the number of users for 2020-08-18 is probably too high.

micb25 · 2020-08-19T09:22:50Z

> OK, I saw that @micb25 sometimes manually corrected the multiplier in the past.

Yes, I had to correct this manually for one of yesterday's hourly packages as well as for one package in the past (2020-08-04). I wonder what situation causes these issues. Fortunately, it seems to happen very rarely. However, as you spotted, the impact on the statistics can be quite significant.

Edit: As a consequence, I manually check the statistics every day before uploading the new data. And I think this will remain necessary in the future, at least as long as fake diagnosis keys are being generated.

mh- · 2020-08-19T10:01:26Z

For one specific case, there was an explanation here: corona-warn-app/cwa-server#693
A user submitted twice, the original keys were accepted only once (to avoid duplicate keys), but 2x4 random padding keys were added.
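The arithmetic of this double-submission case can be sketched as below. This is a simplified model, assuming a padding multiplier of 5 (4 padding keys per real key and submission) and that only the real keys are deduplicated, as described above. `package_key_count` is a hypothetical helper, and the 7 real keys are an arbitrary illustrative value, not data from the thread.

```python
def package_key_count(real_keys: int, submissions: int, multiplier: int = 5) -> int:
    """Keys published for one user who submitted the same real keys `submissions` times.

    Simplified model (assumption):
    - the real keys are deduplicated, so they are published only once;
    - every submission still adds (multiplier - 1) padding keys per real key.
    """
    padding_per_submission = real_keys * (multiplier - 1)
    return real_keys + submissions * padding_per_submission


single = package_key_count(7, submissions=1)  # 7 + 1 * 28 = 35
double = package_key_count(7, submissions=2)  # 7 + 2 * 28 = 63
print(single % 5)   # 0: a normal submission keeps the total divisible by 5
print(double % 5)   # 3: the duplicate submission breaks divisibility
print(double // 5)  # 12 estimated real keys, although only 7 were published
```

In this model a single duplicate submission is enough to make a whole package's key count non-divisible by 5, and it also inflates the real-key estimate even when the factor is fixed at 5.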
If someone uses my diagnosis-keys tools, I'd suggest not always using the auto-detect feature, but fixing the factor to 5 for the moment and changing it when required.

janpf · 2020-08-19T10:46:28Z

> https://ctt.pfstr.de/users/2020-08-18.txt shows a detected padding number of 1, resulting in 379 users

> The graphs for number of users and number of keys show 98 users and 952 keys, which doesn't match the numbers in the approx. users file

The https://ctt.pfstr.de/X/Y.txt files are generated from the published daily package, while the graphs are based on the hourly packages, so there will be a discrepancy.
This is because there is no point in making the end user click through 24 hourly files per day, but the analysis of the hourly files is of course more precise.

I've now changed it so that the https://ctt.pfstr.de/X/Y.txt files are always analysed with a fixed multiplier of 5. If the multiplier is wrongly detected, or actually changes, this will now be visible by comparing those files to the graphs (a difference of 1 or 2 users will nearly always be present).
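A flagging rule based on that comparison could look like the sketch below. It is hypothetical: it assumes a per-day user estimate from the daily package and one estimate per hourly package are already available, and it treats any difference larger than the expected "1 or 2 users" as suspicious. The function name and the numbers in the example are illustrative only.

```python
from typing import Sequence


def needs_review(daily_users: int, hourly_users: Sequence[int], tolerance: int = 2) -> bool:
    """Return True if the daily-package estimate and the summed hourly estimates
    differ by more than `tolerance` users.

    Hypothetical rule: the default tolerance of 2 mirrors the small difference
    that is expected between the daily and the hourly analysis.
    """
    return abs(daily_users - sum(hourly_users)) > tolerance


print(needs_review(100, [40, 30, 31]))  # False: difference of 1 user
print(needs_review(140, [40, 30, 31]))  # True: difference of 39 users
```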

> Is there a problem in the downloaded source data? If so, why does https://micb25.github.io/dka/ show more reasonable values?

Kind of, yes. If the padding is detected strictly automatically, the value jumps all over the place for the hourly packages, as you correctly noted:

> There are 3749 keys in the 2020-08-18 file, which obviously is not a multiple of 5.

I wanted to keep the process of updating the page and analyzing new data as automated and "hands-off" as possible, so these cases were handled incorrectly on my end.
I did this to generate the data as transparently as possible, without any manual interventions.
Everybody can replicate my numbers by running the commands defined in the workflow file in that order.

> So it is maybe more a question for @micb25 how he handles these discrepancies.

It seems the only way to handle this is to set some reasonable hard-coded values like @micb25 did.

> So I think here you should also have some way of handling or flagging these inconsistent values, because currently the number of users for 2020-08-18 is probably too high.

> If someone uses my diagnosis-keys tools, I'd suggest not always using the auto-detect feature, but fixing the factor to 5 for the moment and changing it when required.

I've placed some safeguards, which should fix it for the moment.
I will use -n -a -m 5 (i.e. with automatic detection activated, but capped at 5) on new packages every day, and when an issue appears I will manually flag the file to be reanalyzed with a fixed multiplier of 5 by adding it to this list.
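The combination of "automatic detection capped at 5" and a manual override list can be sketched as follows. This is a hypothetical illustration, not how the diagnosis-keys tools actually detect the multiplier: the detection step here is a simple divisibility heuristic chosen for clarity, and `detect_multiplier`, `multiplier_for` and `FIXED_MULTIPLIER_OVERRIDES` are made-up names. The override entry for 2020-08-18 is only an example.

```python
# Hypothetical list of packages flagged by hand to be reanalysed with a fixed multiplier.
FIXED_MULTIPLIER_OVERRIDES = {"2020-08-18": 5}


def detect_multiplier(key_count: int, cap: int = 5) -> int:
    """Return the largest multiplier <= cap that divides key_count evenly.

    Simplified heuristic for illustration only. Since every count is divisible
    by 1, an inconsistent package silently falls back to a low multiplier,
    which inflates the user estimate.
    """
    for candidate in range(cap, 0, -1):
        if key_count % candidate == 0:
            return candidate
    return 1  # only reached if cap < 1


def multiplier_for(package: str, key_count: int, cap: int = 5) -> int:
    """Use the manual override if the package was flagged, otherwise auto-detect."""
    if package in FIXED_MULTIPLIER_OVERRIDES:
        return FIXED_MULTIPLIER_OVERRIDES[package]
    return detect_multiplier(key_count, cap)


print(detect_multiplier(3749))             # 1: no multiplier from 5 down to 2 fits
print(multiplier_for("2020-08-18", 3749))  # 5: taken from the override list
```

With this heuristic, the 3749 keys of 2020-08-18 fall back to a multiplier of 1, the same value reported in the users file above; capping the search and keeping an explicit override list limits the damage without giving up automation for normal days.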

Thank you for notifying me about the issue!

janpf added the "bug" (Something isn't working) label on Aug 19, 2020

janpf closed this as completed Aug 22, 2020
