Limit concurrent deploys to ~2 #1045

Merged (1 commit) on Feb 2, 2023

@tsibley tsibley commented Feb 1, 2023

This should help avoid the CloudFront limit, which we recently ran into.¹ Full rationale in the new comment.

The availability of Snakemake's workflow.global_resources is something I uncovered by reading through the source code after searching the documentation for a way to set defaults for --resources.² Though it's not documented, I feel comfortable using it because 1) Snakemake's version control history shows this property has been available, unchanged, for a long time, and 2) a GitHub search for workflow.global_resources found many examples of others using it.

¹ https://bedfordlab.slack.com/archives/C0159227X7Y/p1675268246299799?thread_ts=1675215501.875519&cid=C0159227X7Y

² Note that this is a different thing than --default-resources, which
sets defaults for what resources are consumed by rules, not what
resources are available to the workflow.
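
As a rough illustration of "setting a default for --resources", the sketch below uses a stand-in object rather than Snakemake itself. It assumes `workflow.global_resources` behaves like a plain dict mapping resource names to integer limits and that it is already populated from `--resources` when the Snakefile body runs; since the property is undocumented, treat that as an assumption. `make_workflow`, `apply_default_limit`, and the limit values are hypothetical names and numbers chosen for the example.

```python
from types import SimpleNamespace


def make_workflow(cli_resources):
    """Stand-in for Snakemake's workflow object (hypothetical, illustration only).

    Assumes global_resources is a dict already filled from --resources.
    """
    return SimpleNamespace(global_resources=dict(cli_resources))


def apply_default_limit(workflow, name, limit):
    """Set a workflow-wide limit only if the user did not pass one."""
    workflow.global_resources.setdefault(name, limit)
    return workflow.global_resources[name]


# No --resources given: the default of 2 applies.
wf_default = make_workflow({})
default_limit = apply_default_limit(wf_default, "concurrent_deploys", 2)

# User passed --resources concurrent_deploys=5: their value wins.
wf_override = make_workflow({"concurrent_deploys": 5})
override_limit = apply_default_limit(wf_override, "concurrent_deploys", 2)

print(default_limit, override_limit)
```

Using `setdefault` rather than plain assignment is one way to keep an explicit `--resources` value from being overwritten; whether the workflow should allow that override is a design choice.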

Testing

Tested that this technique works with Snakemake.

Have not run a full trial build, but that also seems unnecessary.

@tsibley tsibley requested review from corneliusroemer and a team February 1, 2023 23:21
Comment on lines +420 to +421
    resources:
        concurrent_deploys = 1
Member

After looking through the Resources section of Snakemake docs, I understand that standard resources such as mem_mb are used by Snakemake itself. However, I still don't see where custom resources such as concurrent_deploys are used. Can you help me understand?

Member Author

Custom resources are taken into account by Snakemake's scheduler, the same way that it takes into account CPUs and memory allocated to the workflow.

The relevant bits from the doc section you linked to are:

> [T]he scheduler will ensure that the given resources are not exceeded by running jobs.

> In general, resources are just names to the Snakemake scheduler, i.e., Snakemake does not check on the resource consumption of jobs in real time. Instead, resources are used to determine which jobs can be executed at the same time without exceeding the limits specified at the command line.

> Resources can have any arbitrary name, and must be assigned int or str values.
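
To make the "resources are just names" idea concrete, here is a toy model of the check described in those passages. It is not Snakemake's scheduler, which is more sophisticated and starts new jobs as running ones finish; it only groups jobs into waves whose summed resources stay within the limits. The functions `fits` and `schedule_in_waves` and the five deploy jobs are hypothetical.

```python
def fits(running, candidate, limits):
    """Return True if starting `candidate` keeps every limited resource within its cap.

    running:   list of dicts of resources consumed by jobs already selected
    candidate: dict of resources the new job would consume
    limits:    dict of workflow-wide caps; resources without a cap are unconstrained
    """
    for name, cap in limits.items():
        in_use = sum(job.get(name, 0) for job in running)
        if in_use + candidate.get(name, 0) > cap:
            return False
    return True


def schedule_in_waves(jobs, limits):
    """Greedily group jobs into waves that could run at the same time."""
    pending = list(jobs)
    waves = []
    while pending:
        wave, deferred = [], []
        for job in pending:
            (wave if fits(wave, job, limits) else deferred).append(job)
        if not wave:
            # A single job asks for more than the workflow has been allocated.
            raise ValueError("a job needs more than the workflow allows")
        waves.append(wave)
        pending = deferred
    return waves


# Five hypothetical deploy jobs, each consuming one unit of the custom resource.
deploy_jobs = [{"concurrent_deploys": 1} for _ in range(5)]
waves = schedule_in_waves(deploy_jobs, {"concurrent_deploys": 2})
wave_sizes = [len(wave) for wave in waves]
print(wave_sizes)  # [2, 2, 1]
```

With a cap of 2 and each job consuming 1, at most two deploy jobs are ever selected together, regardless of how many cores are available.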

Member

So if the value of a custom resource is int, then the sum of those values, from all jobs where it's set, is capped by a globally defined value with the same name. Am I understanding this right? How are limits applied to str values?

Member Author

You're understanding it right. Numeric custom resources are used in the scheduling solutions, the ~same way the builtin CPU and memory resources are used.

A job's consumed resources are specified via the resources: block, --default-resources option, --set-resources option, or profile config.

A workflow's allocated resources are specified via --resources or workflow.global_resources.

The Snakemake docs aren't super clear on str-typed resources (though they may be technically accurate on a close reading). The code is clearer. While a job's consumed resources can be an int, a string, or a callable, the resources allocated to the workflow can really only be ints, because the scheduler needs to deal in numbers at the end of the day. The only builtin resource to take a string is tmpdir, and that's a special, non-scheduled resource. It's not clear to me whether it'd be useful, or even possible, to have a job consume a custom string-valued resource; if it were, that resource wouldn't be involved in scheduling.
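
The split between what a job consumes and what the workflow is allocated can be sketched in plain Python. This is a simplified model, not Snakemake's code: the helper names are hypothetical, callables and profile config are left out, and the precedence used (`--set-resources` over the rule's `resources:` block over `--default-resources`) is an assumption of the sketch.

```python
def consumed_resources(rule_block, default_resources=None, set_resources=None):
    """Resolve what one job consumes.

    Assumed precedence: --set-resources, then the rule's resources: block,
    then --default-resources.
    """
    resolved = dict(default_resources or {})
    resolved.update(rule_block)
    resolved.update(set_resources or {})
    return resolved


def schedulable(resources):
    """Keep only integer-valued entries, the ones a scheduler can sum."""
    return {
        name: value
        for name, value in resources.items()
        if isinstance(value, int) and not isinstance(value, bool)
    }


# Consumed by one hypothetical deploy job.
job = consumed_resources(
    rule_block={"concurrent_deploys": 1, "tmpdir": "/tmp/deploy"},
    default_resources={"mem_mb": 1000},
)

# Allocated to the workflow, e.g. via --resources or workflow.global_resources.
allocated = {"concurrent_deploys": 2}

scheduled = schedulable(job)
print(scheduled)
```

The string-valued `tmpdir` entry is still part of what the job consumes, but it drops out before any comparison against `allocated`, which mirrors the point that only numeric resources take part in scheduling.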

Member

Thanks, that explains it. I was confused at first because I thought there was a single concurrent_deploys taking the value of 1 or 2 as a default, without being used anywhere else. Now I understand that the global value is a cap on the sum of the values set at the job level.

@tsibley tsibley merged commit bb8874c into master Feb 2, 2023
@tsibley tsibley deleted the trs/limit-concurrent-deploys branch February 2, 2023 23:01
j23414 added a commit to nextstrain/dengue that referenced this pull request Mar 29, 2023
Since fetch_from_genbank can query NCBI up to 5 times for each of the serotypes, try to limit concurrent queries to under 3. Using 2 to be cautious.

Following the format shown at:
nextstrain/ncov#1045