Prevalence runs failing #1654

apiology · 2022-11-26T14:12:26Z

Our daily prevalence updates have failed for the last four days (11/23-11/26). This is an example log, which is unfortunately hidden from folks who aren't part of the org:

Run yarn prevalence -b -c -k *** -v /home/runner/.virtualenvs/.venv
yarn run v1.22.19
$ ./scripts/prevalence_helper.sh -b -c -k *** -v /home/runner/.virtualenvs/.venv
Using a manual virtualenv directory: /home/runner/.virtualenvs/.venv
Branch will be based on the currently checked out branch
Switched to a new branch 'auto-update-prevalence-2022-11-23--14-07-30'
remote: 
remote: Create a pull request for 'auto-update-prevalence-2022-11-23--14-07-30' on GitHub by visiting:        
remote:      https://github.com/microCOVID/microCOVID/pull/new/auto-update-prevalence-2022-11-23--14-07-30        
remote: 
To https://github.com/microCOVID/microCOVID
 * [new branch]          auto-update-prevalence-2022-11-23--14-07-30 -> auto-update-prevalence-2022-11-23--14-07-30
branch 'auto-update-prevalence-2022-11-23--14-07-30' set up to track 'origin/auto-update-prevalence-2022-11-23--14-07-30'.
Created branch auto-update-prevalence-2022-11-23--14-07-30
Activating the virtualenv
Activating virtulenv: /home/runner/.virtualenvs/.venv/bin/activate
Running prevalence script
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv...
read 4322 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-22-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-21-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-20-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-19-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-18-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-17-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-16-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-15-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-14-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-13-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-12-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-11-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-10-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-09-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-08-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-07-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/11-06-2022.csv...
read 4017 objects
Fetching https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/global_data/time_series_covid19_vaccine_global.csv...
read 123105 objects
Traceback (most recent call last):
  File "update_prevalence.py", line 1863, in <module>
    main()
  File "update_prevalence.py", line 1746, in main
    parse_jhu_vaccines_global(cache, data)
  File "update_prevalence.py", line 1457, in parse_jhu_vaccines_global
    raise ValueError(f"Not able to gain data from {JHUVaccinesTimeseriesGlobal.SOURCE}")
ValueError: Not able to gain data from https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/global_data/time_series_covid19_vaccine_global.csv
Sentry is attempting to send 2 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Error: Process completed with exit code 1.


apiology · 2022-11-26T14:16:15Z

If anyone thinks they could take a crack at figuring out some Python code and would like to pair on fixing this, please reach out, or just book some time with me.

There's some information on setting up local development in our README to get started.

shawnbiesan2 · 2022-11-27T03:14:11Z

Eyeballing it, it looks like the world vaccine data (https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/global_data/time_series_covid19_vaccine_global.csv) has not been updated since the 21st.

For issues like this, is it typical to follow up with the source to figure out why it hasn't been updated (changes in release cadence, no longer maintained, etc.)? Or more so just make the script handle the lack of data and move on?
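The "not updated since the 21st" observation above could be checked programmatically rather than by eye. Here is a minimal sketch, assuming the vaccine CSV has a `Date` column holding ISO `YYYY-MM-DD` values; the function names, the two-day threshold, and the sample rows are all made up for illustration and are not taken from `update_prevalence.py`.

```python
import csv
import io
from datetime import date, timedelta


def latest_report_date(csv_text: str, date_column: str = "Date") -> date:
    """Return the most recent date found in `date_column` of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    dates = [
        date.fromisoformat(row[date_column])
        for row in reader
        if row.get(date_column)
    ]
    if not dates:
        raise ValueError(f"no values found in column {date_column!r}")
    return max(dates)


def is_stale(latest: date, today: date, max_age_days: int = 2) -> bool:
    """True when the newest row is more than `max_age_days` older than `today`."""
    return today - latest > timedelta(days=max_age_days)


# Illustrative rows only (hypothetical values, assumed column layout).
sample = """Country_Region,Date,Doses_admin
Canada,2022-11-20,100
Canada,2022-11-21,110
"""

latest = latest_report_date(sample)
print(latest, is_stale(latest, today=date(2022, 11, 26)))
```

Keeping the parsing separate from the download makes it easy to run the same check against a cached copy of the file.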

apiology · 2022-11-27T04:13:21Z

Yeah, I've made a practice of following up with the upstream source when things like this happen, which has been pretty effective in general.

Note that population vaccination numbers currently don't affect the risk values in the model much, because it's easy to catch and spread Omicron even when you're vaccinated.

Given that, I wouldn't have a problem making changes to the safety check, especially if it's done in a way that balances the risk of things failing silently as a result. We've encountered data feeds being retired, data formats radically changing, upstream providers having issues they don't fix until we talk to them, etc...having something to tell us about those is useful.

Ideally we'd have a low-noise way to publish warnings about things like this without failing prevalence entirely. We don't today - the Sentry references in the code aren't configured to go to an account I have access to. I've thought about adding a Sentry Slack integration or even just a direct Slack integration from the Python script, so we can at least get those piped into places that active contributors can see.
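One possible shape for "warn without failing prevalence entirely" is sketched below. It is only an illustration under assumptions: `run_optional_step`, the `notify` callback, and the stand-in `parse_vaccines` function are hypothetical names, not code from `update_prevalence.py`, and treating `ValueError` as the only recoverable failure is a choice made for the example.

```python
import logging
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("prevalence")


def run_optional_step(
    name: str,
    step: Callable[[], T],
    notify: Optional[Callable[[str], None]] = None,
) -> Optional[T]:
    """Run a non-critical step; on ValueError, warn and return None.

    Other exception types still propagate, so unexpected failures
    are not silently swallowed.
    """
    try:
        return step()
    except ValueError as exc:
        message = f"{name} skipped: {exc}"
        logger.warning(message)
        if notify is not None:
            notify(message)  # e.g. forward to a chat channel or error tracker
        return None


def parse_vaccines() -> dict:
    """Hypothetical stand-in for a parser that finds no recent data."""
    raise ValueError("no recent vaccine data")


warnings_seen = []
result = run_optional_step("global vaccines", parse_vaccines, notify=warnings_seen.append)
print(result, warnings_seen)
```

The warning still reaches a human through `notify`, which is the part that guards against feeds being retired or changing format without anyone noticing.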

shawnbiesan2 · 2022-11-27T19:41:56Z

Left a GitHub issue comment for the upstream source, but given the state of the other issues I'm not expecting a near-term response 🤞

Gotcha, makes sense. Sentry does allow open source projects to apply for a free account via https://sentry.io/for/open-source if a new account for current contributors is needed. I'm assuming the Slack you refer to is an instance used for contributors?

apiology · 2022-11-28T14:59:07Z

> Left a GitHub issue comment for the upstream source, but given the state of the other issues I'm not expecting a near-term response 🤞

Thanks for filing that!

They don't provide source code or logs for their data ingestion pipeline. That said, I notice in their README they list three sources for the upstream data:

US Centers for Disease Control and Prevention (CDC): https://covid.cdc.gov/covid-data-tracker/#vaccinations
Our World in Data (OWiD): https://ourworldindata.org/covid-vaccinations
World Health Organization (WHO): https://covid19.who.int/who-data/vaccination-data.csv

Wonder if there's an obvious point where things are stuck upstream.

> Sentry does allow open source projects to apply for a free account via https://sentry.io/for/open-source if a new account for current contributors is needed. I'm assuming the Slack you refer to is an instance used for contributors?

Right on. Be aware that their free plan does have a 50k monthly error limit, which I suspect we'd blow through with the current configuration on what gets logged. Maybe the open source plan has a higher limit...

Yeah, we have a Slack instance we can use - I can get you access if you like. It's a ghost town in terms of actual discussion, but may be useful for integrations like this to post into.
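For the direct Slack integration idea, a Slack incoming webhook accepts a JSON body with a `text` field via HTTP POST. The following is a rough sketch using only the standard library; the function names and the placeholder webhook URL are hypothetical, and a real webhook URL would have to be created in the workspace and kept secret.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder URL for illustration only; nothing is sent here.
request = build_slack_request(
    "https://hooks.slack.com/services/PLACEHOLDER",
    "Prevalence warning: vaccine data looks stale",
)
print(request.get_method(), request.data.decode("utf-8"))
```

Splitting request construction from sending keeps the message format testable without network access.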

coachnate · 2022-11-29T03:22:26Z

I'm happy to take a look. I'm not a Python expert, but I know my way around well enough. I only took a cursory look, but I didn't see any try/except handling going on. @apiology I'm going to grab some time on Calendly with you for later this week to get better acquainted with the code. I also know GitHub Actions, FWIW.

apiology · 2022-11-29T14:35:41Z

Fixed upstream--thanks to @shawnbiesan2 for alerting folks!

apiology mentioned this issue Nov 29, 2022

Cleanly drop state data upon failures and continue processing #1571

Draft

apiology closed this as completed Nov 29, 2022

shawnbiesan2 mentioned this issue Nov 29, 2022

Log to sentry and continue if JHU vaccine data is unavailable #1655

Closed

apiology mentioned this issue Dec 1, 2022

Reduce WARNING-level prevalence logging #1659

Closed
