Data: Start using WHO as main source #2792

Merged
merged 4 commits into from
Mar 8, 2023

Conversation

lucasrodes
Member

Main and internal datasets will use WHO's data instead of JHU's.

@lucasrodes lucasrodes linked an issue Feb 27, 2023 that may be closed by this pull request
9 tasks
@@ -13,14 +13,21 @@
VAX_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv"
TESTING_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/testing/covid-testing-all-observations.csv"
HOSP_URL = "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/hospitalizations/covid-hospitalizations.csv"
FULL_URL = "https://covid.ourworldindata.org/data/owid-covid-data.csv"
FULL_URL_CSV = "https://covid.ourworldindata.org/data/owid-covid-data.csv"

Sorry to bother you. I am not sure whether this specific file is the finalized new dataset mirroring the WHO dataset, but I wanted to point out what looks like a large discrepancy in the total number of cases in China, if I am reading this correctly. Here is what I have graphed from 2020 through 27 February 2023:

image

The series ends at a cumulative total of 2,023,904 cases for 22 January 2020 through 27 February 2023.

However looking on the WHO dashboard here: https://covid19.who.int/region/wpro/country/cn

Their accounting shows 99,030,129 confirmed cases between 3 January 2020 and 6:06pm CET, 28 February 2023.

image

This represents a discrepancy of 97,006,225 cases: the WHO total is roughly 49 times the total in this file.
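As a quick check of the arithmetic above, here is a minimal sketch that uses only the two totals quoted in this thread (the variable names are illustrative, not taken from the repo). The gap is about 48 times the smaller total, which makes the WHO figure about 49 times as large:

```python
# Totals quoted in this thread (27-28 February 2023).
jhu_total = 2_023_904   # cumulative cases for China in the JHU-based file
who_total = 99_030_129  # confirmed cases for China on the WHO dashboard

difference = who_total - jhu_total  # absolute gap between the two sources
ratio = who_total / jhu_total       # how many times larger the WHO total is

print(f"difference: {difference:,}")
print(f"WHO total is about {ratio:.1f}x the JHU-based total")
```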

I am not sure whether this also affects the following file, which, from what I can tell in the README, appears to be an R tracking filter:

scripts/src/cowidev/megafile/steps/core.py

file_url="https://github.com/crondonm/TrackingR/raw/main/Estimates-Database/database_7.csv"

Full disclosure, and again, sorry to bother you; I hope this is helpful. I am taking part in a prediction contest run by the University of Texas at Austin, and a large number of participants are currently watching this repo. Here is the discussion in case you are interested: https://salemcenter.manifold.markets/SalemCenter/china-reaches-100000-covid-cases-by

Collaborator

@edomt edomt Mar 2, 2023

Hi @pwdel

Thanks for getting in touch. I just scrolled through the latest comments on Manifold and can see how complicated this situation is. We won't give any particular opinion on how to handle the resolution of the market, but if it helps, here's more information below.

The data published by Johns Hopkins University (JHU) currently shows 2,023,904 confirmed cases in China.
The data published by the WHO currently shows a total of 99,030,129 confirmed cases in China.
The reasons for the discrepancy between the two sources are explained by the JHU team here.
Note that we (Our World in Data) do not collect data on confirmed cases ourselves; we've always relied on third-party sources for this data.

The file you mentioned (https://covid.ourworldindata.org/data/owid-covid-data.csv) is our primary COVID dataset aggregating data from multiple sources. This file still relies on the JHU data for confirmed cases and will do so until 8 March.

On 8 March, we'll merge our pull request to start relying on WHO data instead, and the entire time series (all the way back to early 2020) will be updated in this file.

I hope this helps!

@edomt thank you so much for the fantastic work you have done. Obviously you are under no obligation to solve any other institution's problems at all and in truth I feel bad about even approaching you about this. I think your answer completely clears up the question I had though, thank you.

@lucasrodes lucasrodes merged commit 2810f8f into master Mar 8, 2023
Development

Successfully merging this pull request may close these issues.

Cases and deaths: New source, technical details
3 participants