Automate ingest and phylogenetic workflows #38
Currently just runs the ingest workflow and uploads the results to AWS S3. Subsequent commits will add automation for the phylogenetic workflow. Copied commit from Zika PR #52 nextstrain/zika@d44f2ae
The phylogenetic workflow will run after the ingest workflow has completed successfully to use the latest available data. Subsequent commits will check if the ingest results included new data to only run the phylogenetic workflow when there's new data. Copied commit from Zika PR #52 nextstrain/zika@2c415e7
Uses the GitHub Actions cache to store a file that contains the `Metadata.sha256sum` of the ingest files on S3, and uses the `hashFiles` function to create a unique cache key. The existence of the cache key then indicates that the ingest file contents have not been updated since a previous run on GH Actions. This comes with a big caveat: GH will remove any cache entries that have not been accessed in over 7 days.¹ If the workflow is not automatically run within 7 days, then it will always run the phylogenetic job. If this works well, we may want to consider moving this into the `pathogen-repo-build` reusable workflow to have the same functionality across pathogen automation workflows. ¹ https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy Copied commit from Zika PR #52 nextstrain/zika@eb5e76d
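The cache check described in this commit could look roughly like the fragment below. This is a minimal sketch under stated assumptions, not the workflow from the PR: the job and step IDs, the `ingest` job it depends on, the file name `ingest-output-sha256sum`, and the S3 bucket and key are all illustrative. It also assumes the S3 object is publicly readable and that a variable named `AWS_DEFAULT_REGION` is configured for the repository or organization.

```yaml
# Fragment of a workflow file. Assumes a job with the ID `ingest`
# is defined in the same `jobs:` section.
jobs:
  check-new-data:
    needs: [ingest]
    runs-on: ubuntu-latest
    outputs:
      cache-hit: ${{ steps.check-cache.outputs.cache-hit }}
    steps:
      - name: Record sha256sum metadata of the ingest output
        env:
          # Assumption: this variable exists at the repo or organization level.
          AWS_DEFAULT_REGION: ${{ vars.AWS_DEFAULT_REGION }}
        run: |
          # Hypothetical S3 location; replace with the real ingest output.
          bucket="example-bucket"
          key="files/workflows/example/metadata.tsv.zst"
          # Read the sha256sum stored in the object's S3 metadata.
          # --no-sign-request assumes the object is publicly readable.
          aws s3api head-object \
            --no-sign-request \
            --bucket "$bucket" \
            --key "$key" \
            --query Metadata.sha256sum \
            --output text > ingest-output-sha256sum
          cat ingest-output-sha256sum

      - name: Check cache
        id: check-cache
        uses: actions/cache@v4
        with:
          path: ingest-output-sha256sum
          # The key changes whenever the recorded hash changes.
          key: ingest-output-sha256sum-${{ hashFiles('ingest-output-sha256sum') }}
          # Only check whether the key exists; skip downloading the entry.
          lookup-only: true

  phylogenetic:
    needs: [check-new-data]
    # A cache hit means the ingest output is unchanged, so skip this job.
    if: needs.check-new-data.outputs.cache-hit != 'true'
    runs-on: ubuntu-latest
    steps:
      # Placeholder for the real phylogenetic build steps.
      - run: echo "Run the phylogenetic workflow here"
```

On a cache miss, the cache action saves the file under the new key when the job finishes, so the next run with unchanged ingest output gets a hit. If the entry is evicted after 7 days without access, the lookup misses and the phylogenetic job runs, which matches the caveat above.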
Add individual inputs per workflow to override the default Docker image used by `nextstrain build`. Having this input has been extremely helpful for continuing to run pathogen workflows when we run into new bugs that are not present in older nextstrain-base images. There are separate image inputs for the two workflows because they use different tools and may require different versions of images. Copied commit from Zika PR #52 nextstrain/zika@65a8acc
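A sketch of how per-workflow image inputs might be declared and passed along is shown below. The input names `ingest_image` and `phylogenetic_image`, the job layout, and the placeholder steps are assumptions for illustration. The sketch relies on the Nextstrain CLI reading the `NEXTSTRAIN_DOCKER_IMAGE` environment variable, and further assumes that an empty value falls back to the CLI's default image.

```yaml
on:
  workflow_dispatch:
    inputs:
      # Hypothetical input names, one per workflow.
      ingest_image:
        description: 'Container image for the ingest workflow (overrides the "nextstrain build" default)'
        required: false
      phylogenetic_image:
        description: 'Container image for the phylogenetic workflow (overrides the "nextstrain build" default)'
        required: false

jobs:
  ingest:
    runs-on: ubuntu-latest
    env:
      # Assumption: an empty value is treated as unset by the CLI.
      NEXTSTRAIN_DOCKER_IMAGE: ${{ inputs.ingest_image }}
    steps:
      # Placeholder for CLI setup and `nextstrain build ingest`.
      - run: echo "ingest image override is '${NEXTSTRAIN_DOCKER_IMAGE}'"

  phylogenetic:
    needs: [ingest]
    runs-on: ubuntu-latest
    env:
      NEXTSTRAIN_DOCKER_IMAGE: ${{ inputs.phylogenetic_image }}
    steps:
      # Placeholder for CLI setup and `nextstrain build phylogenetic`.
      - run: echo "phylogenetic image override is '${NEXTSTRAIN_DOCKER_IMAGE}'"
```

Keeping the two inputs separate lets a manual run pin an older image for one workflow without changing the other.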
Copied daily schedule of mpox ingest https://github.com/nextstrain/mpox/blob/e439235ff1c1d66e7285b774e9536e2896d9cd2f/.github/workflows/fetch-and-ingest.yaml#L4-L21 Daily runs seem fine since the ingest workflow currently takes less than 2 minutes to complete and it will not trigger the phylogenetic workflow if there's no new data. We can bring this down to once a week if it seems like overkill. Copied commit from Zika PR #52 nextstrain/zika@77ca1d4
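For reference, a daily trigger in GitHub Actions can be sketched as follows. The time shown is illustrative only and is not taken from the mpox workflow; GitHub evaluates cron expressions in UTC.

```yaml
on:
  schedule:
    # Illustrative time (an assumption): every day at 07:00 UTC.
    - cron: '0 7 * * *'
  # Also allow manual runs from the Actions tab.
  workflow_dispatch:
```

Dropping to a weekly run would only require changing the last cron field, for example `'0 7 * * 1'` for Mondays.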
Force-pushed from 642a310 to 795546d.
Yay, the test run completed with plenty of time to spare for the phylogenetic workflow 🎉
I know this will be fixed in #18, but I would just update the phylo outputs with the hardcoded
FYI, after the initial trigger with
Changes look good to me 👍
A couple things to follow up on before merging:
- drop the `pull_request` triggers added in 5daf8b7
- include hash fixes added in Fix automation hash zika#54
Copied fix from zika, guided by @joverlee521 nextstrain/zika@cdd071e
Uses the variable `AWS_DEFAULT_REGION` that @joverlee521 added to the Nextstrain GitHub organization variables.¹ ¹ https://github.com/organizations/nextstrain/settings/variables/actions Copied fix from zika, guided by @joverlee521 nextstrain/zika@cdd071e
Force-pushed from 9839842 to 02504d6.
The dengue workflow is now showing up on the pathogen workflow status page
I missed this in review of #38. The phylogenetic workflow was still pulling from old S3 URLs and not the ingest workflow output data. This commit corrects the S3 URL to the ingest output files and updates the `strain_id_field` config param to use the appropriate ID column from the ingest output.
Description of proposed changes
Coordinated with @joverlee521 to copy commits from zika PR: nextstrain/zika#52
Adds a single GH Action workflow to automate the ingest and phylogenetic workflows, set to run daily at the same time as the automated mpox ingest.
Uses GH Action caches to store a hash of the ingest results' `Metadata.sha256sum` values added to the S3 metadata within upload-to-s3. If the cache contains a match from previous runs of the GH Action workflow, then the workflow will skip the phylogenetic job. See commits for details.
Related issue(s)
Based on discussion in nextstrain/pathogen-repo-guide#25
Checklist
The manual run completed successfully, although it does not push to the live site since the output files do not have "_genome" suffixes in their filenames: