Differences with NYState Calculation #11
Comments
NYC and NYS report cases at different hours. One is a few hours before the other; if I remember correctly, it gets corrected after 7 pm daily.
… On Apr 2, 2020, at 6:17 PM, Dhruv Madeka ***@***.***> wrote:
If I look at the daily figures from nyc.gov <https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary.pdf>
It reports 48462 total cases, whereas NYState's official NYC count <https://coronavirus.health.ny.gov/county-county-breakdown-positive-cases> is 51809
What's the source of the discrepancy? I've been noticing it for a few days
|
They don't seem to match for days, though. This is NYState's historical estimates vs NYC's historical estimates. |
They will not match until after 7 pm, and then only for a moment; they will need to be corrected again the next day. |
I don't think this is a timing issue. The data shown on the NYS site usually matches what is presented during a Governor briefing. For the past 3 days, the number shown for NYC during the noonish briefing has been ~2000 higher than the evening NYC number. |
I'm with @psylum: the afternoon numbers seem higher, and the implied growth rates are very different. Here's a bar chart from the NYC data (I took the last three points and added them to the 31st), whereas on wiki, NYState has an implied growth rate of 13-10% over the last few days. There seem to be big differences. |
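For anyone who wants to compare the implied growth rates of the two series directly, here is a minimal sketch of how a day-over-day growth rate can be derived from cumulative counts. The function name and the numbers are made up for illustration; they are not taken from the NYC or NYS data.

```python
def daily_growth_rates(cumulative):
    """Return day-over-day growth of a cumulative series, as fractions.

    cumulative: list of running totals, one per day, oldest first.
    """
    rates = []
    for prev, curr in zip(cumulative, cumulative[1:]):
        if prev <= 0:
            raise ValueError("cumulative counts must be positive")
        rates.append((curr - prev) / prev)
    return rates


# Toy numbers only (hypothetical), chosen so the arithmetic is easy to check.
toy_cumulative = [1000, 1130, 1243]
growth = daily_growth_rates(toy_cumulative)
for rate in growth:
    print(f"{rate:.1%}")
```

Running the same function over each agency's cumulative series for the same dates would show whether the gap is a constant offset or a real difference in growth.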
Hello all, please review the issue thread started when the NYC Health Dept began using GitHub as data storage for their web page ("Counts vary differently from Yesterday"). At the same time as the switch to GitHub, the Health Dept changed its reporting methodology, using "Diagnosis Date" instead of "Reporting Date". I am sure the State Health Department is stuck with only the "Reporting Date" because it collects from too many different sources. The City is now attempting to show new cases as of the date of diagnosis. The original diagnosis occurs when the doctor suspects the patient has the virus and orders the test. The lab provides data on the reporting date, and lab results may take 3 to 14 days (OUCH). I have looked at the lag time between Diagnosis Date and Reporting Date, see here |
This is causing a whole lot of confusion. It’s the responsibility of NYC to make these data differences crystal clear, and to provide both sets of data. |
Also @DTPOTO ... why would there be more total positive tests in the reporting date methodology? Is that just due to the time of day the results are generated? |
I agree the NYC health dept should supply both data sets. The time of day has a minor impact, more so when you are using the Report-Date methodology. The Reporting date methodology shows higher numbers because it is focused on the current date (today), while the data files are being restated by back-dating. It's a little like the government revising last month's unemployment number. The total number of cases is identical; the only difference is the date to which each case is assigned. @joansobo demonstrated that the total cases were the same, and was able to calculate new reported cases by comparing Case-Hosp-Deaths.csv across two different days. The issue is daily restatement. Getting the latest version of Case-Hosp-Deaths.csv may be your best bet in terms of predictive modeling. I don't like either, but we may get that clarity or better information in a timely way. |
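As a rough illustration of the snapshot comparison described above, here is a minimal sketch that diffs two daily copies of the file. The column names `DATE_OF_INTEREST` and `NEW_COVID_CASE_COUNT` are assumptions about the CSV layout, the helper names are hypothetical, and the counts are toy values rather than real data.

```python
import csv
import io


def load_counts(csv_text, date_col="DATE_OF_INTEREST",
                count_col="NEW_COVID_CASE_COUNT"):
    """Parse one snapshot into {date: new cases assigned to that date}.

    The default column names are assumed, not confirmed by this thread.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[date_col]: int(row[count_col]) for row in reader}


def restatement(old, new):
    """Per-date change between two snapshots (back-dated additions)."""
    dates = sorted(set(old) | set(new))
    return {d: new.get(d, 0) - old.get(d, 0) for d in dates}


def newly_reported(old, new):
    """Cases that appeared between snapshots, whatever date they landed on."""
    return sum(new.values()) - sum(old.values())


# Toy snapshots (hypothetical values) taken one day apart.
snapshot_day1 = """DATE_OF_INTEREST,NEW_COVID_CASE_COUNT
2020-03-30,100
2020-03-31,80
2020-04-01,20
"""
snapshot_day2 = """DATE_OF_INTEREST,NEW_COVID_CASE_COUNT
2020-03-30,110
2020-03-31,120
2020-04-01,90
2020-04-02,15
"""

old = load_counts(snapshot_day1)
new = load_counts(snapshot_day2)
changes = restatement(old, new)
reported_today = newly_reported(old, new)
print(changes)
print(reported_today)
```

In this toy case most of the new cases are assigned to earlier diagnosis dates, so the latest date looks small even though the total reported between the two snapshots is much larger. That is the restatement effect in miniature.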
Data from NYC and NYS will always be different for a number of reasons, including the time of day the dataset is cut, de-duplication procedures that differ between the agencies, and data cleaning and QA procedures. |