Add support for CPU millis for cpus directive #2516

Closed
wants to merge 30 commits into from


pditommaso
Member

This commit adds the ability to use CPU millis (thousandths of a CPU) as a unit for the cpus directive.

Target execution platforms that support it can take advantage of this feature; otherwise, the value falls back to integer units representing the number of cores.

In more detail:

  • when providing an integer value for the cpus directive, nothing changes
  • when providing a decimal number, e.g. 1.5, platforms supporting CPU millis
    take the value 1500m; on platforms not supporting it, the value is
    rounded up to the next integer, i.e. 2
  • when providing an integer number ending with m, e.g. 1500m, the value is
    interpreted as CPU millis: platforms supporting it take the value 1500m
    as is, while legacy platforms round it up to the next integer, i.e. 2.
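The three parsing rules above can be sketched as follows. This is a minimal Python illustration, not the PR's Groovy implementation: `parse_cpus` and `legacy_cores` are hypothetical names, and rounding any sub-millicore remainder up (e.g. for 1.0005) is an assumption the PR does not specify.

```python
import math
from decimal import Decimal
from typing import Union


def parse_cpus(value: Union[int, float, str]) -> int:
    """Convert a cpus value into CPU millis (1 CPU = 1000m).

    Hypothetical helper covering the three cases described in the PR:
    an integer (whole cores), a decimal such as 1.5 (fractional cores),
    or a string ending with 'm' (already in millis).
    """
    if isinstance(value, str) and value.endswith("m"):
        return int(value[:-1])  # '1500m' -> 1500
    # Decimal(str(...)) avoids binary float artefacts such as 1.1 * 1000
    millis = Decimal(str(value)) * 1000
    # Assumption: a sub-millicore remainder is rounded up
    return math.ceil(millis)


def legacy_cores(millis: int) -> int:
    """Fallback for platforms without millis support: round up to whole cores."""
    return -(-millis // 1000)  # integer ceil division, no float rounding


print(parse_cpus(1.5), legacy_cores(parse_cpus(1.5)))  # 1500 2
```

Integer ceil division keeps the fallback exact, so 1001m maps to 2 cores and 2000m stays at 2.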

pditommaso linked an issue on Dec 22, 2021 that may be closed by this pull request
Member

bentsherman left a comment


Do you want me to run the tests myself as part of the review? I assume they work on your end.

Some future items for the record:

  • demonstrate this feature in the Nextflow docs
  • add a millis unit so that you can do cpus = 1000.m

@pditommaso
Member Author

Since we are moving to a world with fractional CPU requests, I've modelled this by adding a CpuUnit class, as is already done for memory.
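A minimal sketch of what such a value type could look like, assuming the request is stored as a single integer number of millis. This `CpuUnit` is a hypothetical Python stand-in, not the class added in the PR; its methods (`of_cores`, `to_cores`, rendering as e.g. `1500m`) are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CpuUnit:
    """Hypothetical CPU request value type, stored as integer millis.

    Keeping one canonical integer field makes equality and ordering exact,
    avoiding float comparisons between e.g. 1.5 and 1500m.
    """
    millis: int

    @classmethod
    def of_cores(cls, cores: int) -> "CpuUnit":
        return cls(cores * 1000)

    def __add__(self, other: "CpuUnit") -> "CpuUnit":
        return CpuUnit(self.millis + other.millis)

    def __mul__(self, factor: int) -> "CpuUnit":
        return CpuUnit(self.millis * factor)

    def to_cores(self) -> int:
        """Whole cores, rounded up (fallback for platforms without millis)."""
        return -(-self.millis // 1000)

    def __str__(self) -> str:
        # Whole cores render as a plain integer, anything else in millis
        if self.millis % 1000 == 0:
            return str(self.millis // 1000)
        return f"{self.millis}m"


print(CpuUnit(750) * 2, CpuUnit.of_cores(2))  # 1500m 2
```

A single integer representation is the same design choice that makes memory units easy to compare and scale.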

@pditommaso
Member Author

> Do you want me to run the tests myself as part of the review? I assume they work on your end.

Yep, it does need to be tested.

@bentsherman
Member

Tests pass for me. Ready to merge.

I will hold off on documentation until we add request/limit support for cpus. Side note for the docs: you must use '1000m' instead of 1000.m, because 1000.m already means 1000 minutes (a duration).

@bentsherman
Member

Going over the related issue #773: @pditommaso, is the situation you described there still covered?

> Problem 2) this could be managed in the cpus directive, but then how would the actual command/tool use that value? I mean, cpus is reflected in the task.cpus property, which is generally used in the command script to set the number of threads/CPUs that a multithreaded program should use. See for example here. It would not make sense to provide a decimal value there.

In other words, what will task.cpus be if cpus was provided in millis? If the executor doesn't support millis then I assume task.cpus will already be rounded up. But if it does support millis, I'm not sure what it should be.

@bentsherman
Member

bentsherman commented Dec 23, 2021

From the same thread, this is what I recommend:

> Another thing is how many threads/processes a program should use to perform a computation. Usually it can be set to the fraction of CPU time described above, rounded down when it is larger than 1 and rounded up when it is less than 1: num_processes = max(1, floor(cpu_fraction)).

We can explain this in the docs. Basically the cpus directive specifies the number of cores to allocate, and task.cpus should give the number of threads/processes that are available to use. If you're actually using multi-threading then you probably don't want to specify a fractional CPU anyway.

Edit: To clarify, I don't think you would ever need to have the exact milli-cpus available in the task object.
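To make the difference between the two rounding policies concrete, the sketch below compares the floor-based rule quoted from #773 with always rounding up, which is what this PR ends up keeping for task.cpus per the later comments. The function names are hypothetical; for cpus 1.9 the first policy gives 1 thread and the second gives 2.

```python
import math


def threads_floor(cpus: float) -> int:
    """Rule quoted from #773: num_processes = max(1, floor(cpu_fraction))."""
    return max(1, math.floor(cpus))


def threads_ceil(cpus: float) -> int:
    """Alternative rule: always round up to the next whole core."""
    return math.ceil(cpus)


# The two policies agree for values <= 1 and for whole numbers,
# and differ by one for any fractional value above 1.
for cpus in (0.5, 1.0, 1.9, 2.5):
    print(f"cpus={cpus}: floor -> {threads_floor(cpus)}, ceil -> {threads_ceil(cpus)}")
```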

@pditommaso
Member Author

Um, no: in this PR task.cpus always sees the rounded-up integer value. I think it's the safest thing to do. Tools generally parallelise by the number of CPUs/cores, so I don't think it would be useful to expose it as a fraction. If needed, we can expose a task.cpusMillis.

@pditommaso
Member Author

I'm not sure I understand the rationale of the last comment. Why, if cpus 1.9 was specified, should the task see 1?

@bentsherman
Member

Honestly... I don't think it matters whether task.cpus equals 1 or 2 in that situation. I don't think the user should specify a fractional CPU if the process is doing multi-threading anyway. Perhaps best to leave it as is.

@pditommaso
Member Author

I disagree. I've seen many use cases in which a task declares, let's say, 4 CPUs but in reality only peaks at 4 CPUs and uses fewer on average. In these cases, I think declaring a fraction can help tune the request to the real usage.

@bentsherman
Member

So in that case we still want to round up the fractional CPU for task.cpus, right? Fine by me.

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman
Member

@pditommaso This PR is ready to be merged. We agreed that task.cpus should be rounded up, which is what it was already doing. The docs are updated now. I think we are good to go. This feature will be useful for testing the resource optimization.

@bentsherman
Member

@pditommaso This PR is updated with the latest changes. I also added fractional CPU support to AWS Batch. Ready for you to review.

@bentsherman
Member

Just confirmed that decimal values don't work for AWS Batch:

Value 1.1 for type VCPU in resourceRequirement is not valid. Please provide a numeric value. (Service: AWSBatch; Status Code: 400; Error Code: ClientException; Request ID: 84dfe037-5b3e-4e8e-ac58-a504de75a906; Proxy: null)

So it looks like decimal values are only accepted when they are less than 1, and those are only supported by Fargate. I will revert these changes.

pditommaso force-pushed the master branch 2 times, most recently from 0d59b4c to b93634e, on March 11, 2023 11:20
bentsherman self-assigned this Apr 26, 2023
@bentsherman
Member

@pditommaso I updated and cleaned up this PR. To recap, it adds fractional CPU support to the Local, K8s, and Google Batch executors; on other executors, fractional requests are rounded up to the next integer. task.cpus and the cpus trace field are always rounded up for backwards compatibility.

I think it is good to go. Do you have any remaining concerns?
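The executor behaviour summarized in this comment can be sketched as a small resolution function. This is an illustrative Python model under assumptions: the executor labels in `FRACTIONAL_EXECUTORS` and the `Resolved` fields are hypothetical, and only the rules stated above are encoded (pass millis through on Local, K8s, and Google Batch; round up elsewhere; always round up task.cpus and the trace field).

```python
from typing import NamedTuple

# Executors this PR lists as accepting fractional CPU requests.
# The labels are illustrative, not necessarily Nextflow's internal identifiers.
FRACTIONAL_EXECUTORS = {"local", "k8s", "google-batch"}


class Resolved(NamedTuple):
    requested_millis: int  # what is sent to the execution platform
    task_cpus: int         # value exposed as task.cpus
    trace_cpus: int        # value recorded in the cpus trace field


def resolve_cpus(executor: str, millis: int) -> Resolved:
    """Sketch: resolve a cpus request (in millis) for a given executor."""
    whole_cores = -(-millis // 1000)  # ceil division: 1500m -> 2
    if executor in FRACTIONAL_EXECUTORS:
        requested = millis                # fractional value passed through
    else:
        requested = whole_cores * 1000    # other executors: round up
    # task.cpus and the trace field are always rounded up
    return Resolved(requested, whole_cores, whole_cores)


print(resolve_cpus("k8s", 1500))
print(resolve_cpus("awsbatch", 1500))
```

Keeping task.cpus integral regardless of executor means existing scripts that pass task.cpus as a thread count keep working unchanged.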

@bentsherman
Member

@pditommaso I think we can close this PR. I think the original use cases can be covered in other ways:

  • setting pod CPUs to vary between 0.9 and 1.1 was really about not setting a CPU limit, so that tasks can burst CPU usage when needed, which Nextflow now does

  • submitting many small tasks with e.g. cpus = 0.1: I think this is better handled by task batching (#3909)

```groovy
// submit 100 tasks with 0.01 cpus, lots of overhead
cpus = '10m'

// submit one task batch with 1 cpu, much more efficient
batch = 100
cpus = 1
```
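The trade-off in the snippet above comes down to how many jobs the scheduler has to handle versus how much CPU each one requests. The hypothetical helper below models it, assuming a batch requests the summed CPU of its members, which the example's 100 × 10m = 1 × 1000m implies; it does not model the actual batching implementation proposed in #3909.

```python
from typing import Tuple


def jobs_and_request(num_tasks: int, batch_size: int, millis_per_task: int) -> Tuple[int, int]:
    """Return (jobs submitted, CPU millis requested per job).

    Hypothetical model: each job runs up to `batch_size` tasks and requests
    the summed CPU of a full batch.
    """
    jobs = -(-num_tasks // batch_size)  # ceil division
    return jobs, batch_size * millis_per_task


print(jobs_and_request(100, 1, 10))    # unbatched: (100, 10)
print(jobs_and_request(100, 100, 10))  # one batch: (1, 1000)
```

The aggregate CPU is the same in both cases; what batching removes is the per-job scheduling overhead of 99 extra submissions.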

@pditommaso
Member Author

Agree

pditommaso closed this Apr 30, 2024
Linked issue that this pull request may close: Scheduling using K8S
3 participants