Differences in vaccination figures #148

Closed
lucasrodes opened this issue Jan 10, 2022 · 10 comments

Comments

@lucasrodes

lucasrodes commented Jan 10, 2022

Hi,
First of all, thanks for sharing your work. We are currently using it in our project https://github.com/owid/covid-19-data.

I have a question regarding some slight differences in the reported figures between these two files:

  1. Vaccination/OpenData_Slovakia_Vaccination_Regions.csv
  2. OpenData_Slovakia_Vaccination_AgeGroup_District.csv

In particular, there appear to be some small differences in the total number of reported first, second and third doses between these files:

  • File 1) reports 2,734,109, 2,453,823 and 1,120,296, respectively.
  • File 2) reports 2,759,500, 2,471,554 and 1,116,910, respectively.
Here is Python code to reproduce these results:

# Read data
>>> import pandas as pd
>>> url_region = "https://github.com/Institut-Zdravotnych-Analyz/covid19-data/raw/main/Vaccination/OpenData_Slovakia_Vaccination_Regions.csv"
>>> url_agedis = "https://github.com/Institut-Zdravotnych-Analyz/covid19-data/raw/main/Vaccination/OpenData_Slovakia_Vaccination_AgeGroup_District.csv"
>>> df_region = pd.read_csv(url_region, sep=";")
>>> df_agedis = pd.read_csv(url_agedis, sep=";")

# Display values
>>> df_region[['first_dose', 'second_dose', 'third_dose']].sum()
first_dose     2734109
second_dose    2453823
third_dose     1120296
dtype: int64

>>> df_agedis.groupby('dose').doses_administered.sum()

dose
1        2759500
2        2471554
3        1116910
fully    2603672
Name: doses_administered, dtype: int64

I also ran some further checks for the Janssen vaccine:

>>> msk = df_region.Vaccine_name.isin([
    "COVID-19 Vaccine Janssen injekčná suspenzia sus inj 10x2,5 ml (liek.inj.skl.)",
    "COVID-19 Vaccine Janssen injekčná suspenzia sus inj 20x2,5 ml (liek.inj.skl.)",
])
>>> df_region[msk][['first_dose', 'second_dose', 'third_dose']].sum()
first_dose     142292
second_dose         0
third_dose       1512
dtype: int64
>>> df_agedis[df_agedis.vaccine == "JANSSEN"].groupby('dose').doses_administered.sum()
dose
1        167232
3          2087
fully    164151
Name: doses_administered, dtype: int64

The relative difference in first doses here seems significant: 142,292 vs 167,232 (file 1 is about 15% lower than file 2).
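The gaps quoted above can be put side by side with a little arithmetic. This is only a sketch using the totals reported in this comment; the `relative_gap` helper and the dictionary labels are illustrative names, and expressing the gap relative to file 2 is just one possible convention.

```python
# Totals copied from the outputs above, as (file 1, file 2) pairs.
reported = {
    "first_dose": (2734109, 2759500),
    "second_dose": (2453823, 2471554),
    "third_dose": (1120296, 1116910),
    "janssen_first_dose": (142292, 167232),
}


def relative_gap(file1: int, file2: int) -> float:
    """Return (file2 - file1) / file2, i.e. the gap as a share of file 2."""
    return (file2 - file1) / file2


for label, (f1, f2) in reported.items():
    # A positive diff means file 2 reports more doses than file 1.
    print(f"{label}: diff={f2 - f1:+d}, gap={relative_gap(f1, f2):.1%}")
```

A negative gap (as for third doses) means file 1 reports the larger figure.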


We are currently using file 1, but recently a user noted the differences with file 2 in owid/covid-19-data#2237. Any help or suggestion on this matter would be highly appreciated.

Thanks!

@MartinHBA

@KristianSufliarsky can you take a look at this, please? It is important because we are reporting figures to the whole world that are lower than the reality.

@KristianSufliarsky
Contributor

Hi Lucas,

Thank you for opening this issue. As far as I know, these two datasets are from two separate data sources.

Kristian

@mhudec

mhudec commented Jan 11, 2022

@KristianSufliarsky - any reason in particular why general practitioners are missing in OpenData_Slovakia_Vaccination_Regions.csv? NCZI obviously has this data, they are publishing it as well... what source are you using if not NCZI? If this OpenData_Slovakia_Vaccination_Regions.csv is incorrect, why are you publishing it at all?

OpenData_Slovakia_Vaccination_AgeGroup_District.csv seems to contain weekly (not daily) aggregate data, without distinguishing between weeks of 2021 and 2022, unless week 54 is actually the current week 2. One other thing: the dose column has the values 1, 2, 3 and fully, while doses_administered shows a cumulative number for these. What is the difference between dose 2 and dose fully in the case of Comirnaty (or Moderna)? And what is age group NA? The age has to be known at the time of vaccination (it is used in registration, it is also shown on the ID card at the vaccination reception, etc.).

Thank you for your time.

@mhudec

mhudec commented Jan 11, 2022

An example of dubious data from OpenData_Slovakia_Vaccination_AgeGroup_District.csv:

...
"week";"vaccine";"gender";"AgeGroup";"region";"district";"district_code";"dose";"doses_administered"
...
43;"COMIRNATY";"M";"5-9";"Bratislavský";"Bratislava";"SK0101";"fully";1
45;"COMIRNATY";"M";"5-9";"Žilinský";"Žilina";"SK031B";"fully";1
46;"COMIRNATY";"F";"5-9";"Košický";"Košice";"SK0422";"fully";1
46;"COMIRNATY";"F";"5-9";"Prešovský";"Prešov";"SK0417";"fully";1
46;"COMIRNATY";"F";"5-9";"Žilinský";"Žilina";"SK031B";"fully";1
46;"COMIRNATY";"M";"5-9";"Prešovský";"Prešov";"SK0417";"fully";1
...

Weeks 43-46 were weeks from ~25th October up to ~21st November. EMA had recommended approval for Comirnaty vaccination for age group 5 to 11 years old ~25th November.

So what is this, please?

@KristianSufliarsky
Contributor

@mhudec the reason behind this issue is that there is no regulation requiring GPs to report this data into this data source (ISZI), and they aren't keen on doing so.
To answer your second question, NCZI publishes data that are processed by me, or rather by a script I wrote, and as I mentioned previously, OpenData_Slovakia_Vaccination_AgeGroup_District.csv is the one that is officially published daily (including GPs).
Week represents the number of weeks from the beginning of vaccination.
The value fully means that a person is considered fully vaccinated, which requires 2 shots plus 14 days after the second dose or, in the case of J&J, 1 dose plus 21 days.
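The rule for fully can be sketched in code as follows. This is an illustration of the definition as stated here, not the script used to produce the published file; the function name `fully_vaccinated_on` and the assumption that every vaccine other than "JANSSEN" follows the two-dose rule are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Waiting periods as described in this comment.
DAYS_AFTER_SECOND_DOSE = 14   # two-dose vaccines
DAYS_AFTER_JANSSEN_DOSE = 21  # single-dose Janssen (J&J)


def fully_vaccinated_on(vaccine: str, first_dose: date,
                        second_dose: Optional[date] = None) -> Optional[date]:
    """Return the date from which a person counts as 'fully', or None
    if a two-dose vaccine has no second dose recorded yet."""
    if vaccine.upper() == "JANSSEN":
        return first_dose + timedelta(days=DAYS_AFTER_JANSSEN_DOSE)
    if second_dose is None:
        return None
    return second_dose + timedelta(days=DAYS_AFTER_SECOND_DOSE)
```

For example, a Comirnaty second dose given on 2021-06-01 would count as fully from 2021-06-15 under this rule.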

And if you have any questions about this dubious data, feel free to contact NCZI, since they are the owner of this data and are responsible for its quality. Even if I knew that something in this database was incorrect, I don't have write access, so the only thing I can do is tell them that there is an issue. But to answer that particular question: vaccination under the age of 11 was allowed in special cases before 25th November.

I have tried to answer all of your questions, so I consider this issue closed.

@mhudec

mhudec commented Jan 12, 2022

Thank you, @KristianSufliarsky.

So basically records for e.g. Comirnaty dose "2" in week X move to dose "fully" in week X+2, right?

Regarding NCZI, who can be contacted about this data, please? The generic NCZI email addresses are a no-go (I've reported several issues with the PowerBI data through those channels, and unfortunately there was no response).

@lucasrodes
Author

lucasrodes commented Jan 12, 2022

Thanks for the clarifications, we will switch the source for Slovakia and will be using OpenData_Slovakia_Vaccination_AgeGroup_District.csv.

"Week represents the number of weeks from the beginning of vaccination."

@KristianSufliarsky Could you point me to the reference date for this computation, i.e. the date on which the vaccination campaign started?

Thanks!

@MartinHBA

@lucasrodes Slovakia not Slovenia, just to avoid more misunderstandings :)

@lucasrodes
Author

@MartinHBA My bad! 🙏

@MartinHBA

"Week represents the number of weeks from the beginning of vaccination" means counting from the first calendar week of 2021, as far as I know, but let's have @KristianSufliarsky confirm that.
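If that reading is right, a week number can be mapped to calendar dates roughly like this. It is a minimal sketch under the unconfirmed assumption that week 1 is the first ISO calendar week of 2021 (starting Monday 2021-01-04) and that the numbering simply continues past week 53; `WEEK_1_START` and `week_to_date_range` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Tuple

# ASSUMPTION (unconfirmed in this thread): week 1 is ISO week 1 of 2021,
# which starts on Monday 2021-01-04.
WEEK_1_START = date(2021, 1, 4)


def week_to_date_range(week: int) -> Tuple[date, date]:
    """Return the (Monday, Sunday) covered by a 1-based week number."""
    start = WEEK_1_START + timedelta(weeks=week - 1)
    return start, start + timedelta(days=6)


start, end = week_to_date_range(43)
print(start.isoformat(), end.isoformat())
```

Under this assumption, weeks 43 to 46 run from 2021-10-25 to 2021-11-21, which matches the dates quoted earlier in the thread, and week 54 starts on 2022-01-10.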
