GISAID workflow hitting max_session_duration #446
Comments
It may be time to revisit #240. My comment from a related thread on Slack:

> Ah, this is also not considering the full run that gets triggered when there's a new Nextclade dataset released. The last full run, on April 16, 2024, ran for ~15h.
I've only been bumping the memory, not the CPUs, for the fetch-and-ingest workflows. Might as well use all the compute that we are paying for: GenBank should be using c5.9xlarge and GISAID should be using c5.12xlarge, so I'm bumping the CPUs to match the instances.¹ Maybe this will magically help #446?

¹ <https://aws.amazon.com/ec2/instance-types/c5/>
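For reference, a hypothetical helper (not from the repo) spelling out what "bumping the CPUs to match the instances" means; the vCPU counts come from the C5 instance table linked in the footnote:

```python
# Hypothetical mapping, not from the repo: vCPUs per instance type,
# per the AWS C5 instance table linked in the footnote above.
INSTANCE_VCPUS = {
    "c5.9xlarge": 36,   # GenBank fetch-and-ingest
    "c5.12xlarge": 48,  # GISAID fetch-and-ingest
}

def snakemake_cores(instance_type: str) -> int:
    """vCPU count to request from Snakemake (--cores) on this instance."""
    return INSTANCE_VCPUS[instance_type]
```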
Bumping the CPUs in #447 decreased the build time by ~1h, which came from parallelizing the download of data at the beginning of the workflow. We will still run over the 12h limit for full Nextclade runs, so I'm going to work on nextstrain/ingest#41.
Thanks @joverlee521 for the summary! How hard is it to parallelize `transform-gisaid`? As it operates on ndjson lines, it might be parallelizable, or at least some part of it. The obvious way to do so would be a split rule that divides the input files into N chunks, runs transform on each, and merges the results back.
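For illustration, a minimal sketch of that split → transform → merge idea in plain Python, assuming the transform really is independent per ndjson line (which may not hold for the real `transform-gisaid`); `transform_record` and the file names are placeholders:

```python
import json
from concurrent.futures import ProcessPoolExecutor
from itertools import islice

def transform_record(line: str) -> str:
    # Placeholder for the real per-record transformation.
    record = json.loads(line)  # one GISAID record per ndjson line
    return json.dumps(record)

def chunked(lines, size):
    # Yield successive chunks of `size` lines.
    it = iter(lines)
    while chunk := list(islice(it, size)):
        yield chunk

def transform_chunk(chunk):
    return [transform_record(line) for line in chunk]

def parallel_transform(in_path, out_path, n_workers=8):
    with open(in_path) as f:
        lines = f.read().splitlines()
    size = max(1, len(lines) // n_workers)
    with ProcessPoolExecutor(n_workers) as pool:
        # map() preserves input order, so the merge keeps record order.
        results = pool.map(transform_chunk, chunked(lines, size))
    with open(out_path, "w") as out:
        for chunk in results:
            out.writelines(line + "\n" for line in chunk)

if __name__ == "__main__":
    parallel_transform("gisaid.ndjson", "gisaid.transformed.ndjson")
```

In the actual workflow this would presumably be three Snakemake rules (split, transform per chunk, merge) rather than one script.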
@corneliusroemer I honestly have no idea... I made #448 to track this separately.
Speeding up `upload-to-s3` is not as straightforward as initially thought... For now, I'm sidestepping the issue by creating a nextstrain-ncov-ingest IAM user and adding its credentials to the repo secrets, so the workflow will be able to run without any time limits. I've added a post clean-up list above to remove those credentials and delete the user once we've resolved this issue.
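A sketch of that stopgap with boto3, assuming admin credentials; the user name is from the comment above, and the policy attachment is elided:

```python
import boto3

iam = boto3.client("iam")
iam.create_user(UserName="nextstrain-ncov-ingest")
# ...attach the same permissions the role-based workflow uses...
key = iam.create_access_key(UserName="nextstrain-ncov-ingest")["AccessKey"]

# key["AccessKeyId"] / key["SecretAccessKey"] go into the repo secrets;
# unlike STS session credentials, these static keys have no 12h expiry.
```

Static IAM user keys don't expire, but they are long-lived secrets, hence the clean-up TODO above.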
Adding this as part of #240 to help collect more data for tackling #446. One unexpected behavior I ran into when testing the `--stats` option: Snakemake doesn't generate the stats file if the workflow exits with an error at any step. Note that the `--stats` option was removed in Snakemake v8, so this will need to be removed when we eventually upgrade Snakemake in our runtimes.
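As an example of what one could do with the output, here's a sketch that ranks rules by runtime, assuming the stats JSON layout from Snakemake 7 (a "rules" mapping with per-rule "mean-runtime" in seconds) and a hypothetical stats.json path:

```python
import json

with open("stats.json") as f:  # path passed to --stats; hypothetical
    stats = json.load(f)

# Sort rules by mean runtime to see where the wall-clock time goes.
slowest = sorted(
    stats["rules"].items(),
    key=lambda item: item[1]["mean-runtime"],
    reverse=True,
)
for rule, times in slowest[:10]:
    print(f"{rule:40s} {times['mean-runtime'] / 3600:.2f} h")
```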
Latest GISAID full run that included complete re-runs of both Nextclade datasets was >21h.
Context
Our automated workflows use short-lived AWS credentials for sessions, which are limited by a max_session_duration of 12 hours, the maximum allowed by AWS.
The GISAID workflow hit this max yesterday and ran into errors.
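For concreteness, a sketch of where that ceiling bites, assuming boto3; the role ARN and session name are hypothetical:

```python
import boto3

# STS rejects DurationSeconds beyond the role's max_session_duration,
# which AWS caps at 43200 s (12 h).
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/ncov-ingest",  # hypothetical
    RoleSessionName="gisaid-fetch-and-ingest",
    DurationSeconds=12 * 60 * 60,  # 43200, the hard AWS maximum
)["Credentials"]
# creds["Expiration"] is 12 h out; any workflow still running past it
# starts failing with expired-token errors.
```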
TODOs
- `upload-to-s3`: nextstrain/ingest#41
- `transform-gisaid`: #448

Post clean-up
- Remove the nextstrain-ncov-ingest IAM user credentials from the repo secrets and delete the user once this issue is resolved.