[node-core-library] Add allowOversubscription option #5355

ethanburrelldd · 2025-09-12T14:21:12Z

Summary

Adds allowOversubscription property to control whether weighted operations can exceed concurrency limits.

Details

Previously, weighted operations could be started in a case where the sum of all running operations would exceed the concurrency limit.

This PR adds an allowOversubscription property that gives more control over concurrency behavior while remaining backward compatible with the current behavior. The default keeps the current behavior because it lets the CPU sit idle for fewer cycles.

allowOversubscription = true (default): keeps the existing behavior; the sum of the running operations' weights can exceed the limit
allowOversubscription = false: enforces the concurrency limit, preventing an operation from starting until enough concurrency units are available

In my company's repo, we're running into resource issues where many small projects get queued alongside a large project that should be running alone. In these cases the expensive operation eats up CPU resources, causing the smaller operations to time out. For this expensive project we will now use allowOversubscription: false, weight: 8, where 8 is the parallelism set when running rush build -p 8. This should allow the expensive project to run with a higher level of isolation.

Example

With maxConcurrency = 8 and concurrentUnitsInProgress = 4 after the last task exited

Previously: an operation of weight 5 could start
Now (allowOversubscription=true): an operation of weight 5 could start
Now (allowOversubscription=false): an operation of weight 5 will wait to start until concurrentUnitsInProgress <= 3
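
The rule in this example can be written as a small predicate. This is only a sketch of the behavior described in this PR, not the code in Async.ts: the function name `canStartOperation` and the `AdmissionInput` shape are hypothetical, and clipping the weight to the limit is an assumption taken from the later discussion in this thread.

```typescript
// Hypothetical input shape, used only for this illustration.
interface AdmissionInput {
  weight: number;
  concurrentUnitsInProgress: number;
  maxConcurrency: number;
  allowOversubscription: boolean;
}

// Returns true if the next operation may start right now.
function canStartOperation(input: AdmissionInput): boolean {
  const { concurrentUnitsInProgress, maxConcurrency, allowOversubscription } = input;
  // Assumption: a weight above the limit is clipped so the operation can still run alone.
  const weight: number = Math.min(input.weight, maxConcurrency);

  if (allowOversubscription) {
    // Existing behavior as described above: start whenever any capacity is free,
    // even if the total then exceeds the limit.
    return concurrentUnitsInProgress < maxConcurrency;
  }
  // Strict behavior: the whole weight must fit under the limit.
  return concurrentUnitsInProgress + weight <= maxConcurrency;
}

const base = { weight: 5, concurrentUnitsInProgress: 4, maxConcurrency: 8 };
const startsWhenAllowed: boolean = canStartOperation({ ...base, allowOversubscription: true });
const startsWhenStrict: boolean = canStartOperation({ ...base, allowOversubscription: false });
const startsAtThree: boolean = canStartOperation({
  ...base,
  concurrentUnitsInProgress: 3,
  allowOversubscription: false
});
```

With the numbers from the example, `startsWhenAllowed` is true, `startsWhenStrict` is false (4 + 5 > 8), and `startsAtThree` is true (3 + 5 = 8).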

Changes:

Added allowOversubscription option to command-line.schema.json (defaults to true)
Updated CommandLineJson.ts class to support this option
Propagated allowOversubscription through the operation lifecycle
Updated _forEachWeightedAsync to handle the cases when this is set
Added test coverage for this new option

How it was tested

Can I please get a pre-release so that I can test this version on our repo?

integration testing by patching the library in production repo
testing in rush-redis-cobuild-plugin-integration-test
unit tests

Impacted documentation

update required for this page: https://rushjs.io/pages/configs/command-line_json/

ethanburrelldd · 2025-09-12T17:31:56Z

@microsoft-github-policy-service agree company="DoorDash"

D4N14L

Overall looks good, though would like @dmichon-msft to take a look here.

common/changes/@rushstack/node-core-library/eb-concurrency-bug-fix_2025-09-11-15-24.json

libraries/node-core-library/src/Async.ts

dmichon-msft · 2025-09-12T23:04:43Z

For clarity, this isn't a bugfix, it's a behavior change. Exceeding the concurrency was the original intended design for how to handle large tasks. Waiting for sufficient capacity to take the entire job results in the CPUs spending more time idling and in general is expected to slow down overall completion.

libraries/node-core-library/src/test/Async.test.ts

dmichon-msft · 2025-09-12T23:14:52Z

I think the safest way to address the competing priorities would be to add an extra option maxOversubscription or similar (with a default of 0), which affects how far a large operation is allowed to push the max concurrency over the limit temporarily.

ethanburrelldd · 2025-09-15T15:52:45Z

@dmichon-msft I'd like to get clarity on the intended behavior for the concurrency parameter to check that I'm understanding this library's expected behavior correctly.

The JSDoc for concurrency states it should "limit the maximum number of concurrent promises to the specified number." My change enforces this as a strict limit, but I wanted to understand if the previous behavior that allowed tasks to exceed this limit was intentional (and the docs need updating) or if it was a bug.

Here are the options we have:

1. Allow oversubscription: Let large tasks exceed the limit (previous behavior)
2. Strict limit: Wait until sufficient capacity is available (current approach)
3. Task-level control: Let individual tasks opt into exceeding limits

If we want to go ahead with #3, I'd suggest using a boolean allowExceedingConcurrency (default = ~~false~~ true) to configure this behavior at the task level. I think the numeric approach creates unpredictable and hard to configure behavior. Example: if running a cobuild on 2 agents with concurrency=4 and tasks [1,2,3,4], depending on scheduling you might get:

1, 3 and 2, 4 (second agent exceeding limit by 2)
- if maxOversubscription = 1 then 4 would wait for 2 to finish before executing
1, 4 and 2, 3 (both agents exceeding limit by 1)
- if maxOversubscription = 1 then 4 would execute despite being over concurrency of 4
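
The two schedules above can be checked with a small predicate. This is a sketch of one possible reading of the proposed maxOversubscription option (total weight may exceed the limit by at most that amount); the function name `canStartWithCap` is hypothetical and nothing here is taken from the library.

```typescript
// Hypothetical numeric-cap rule: the running total may exceed `concurrency`
// by at most `maxOversubscription` units.
function canStartWithCap(
  unitsInProgress: number,
  weight: number,
  concurrency: number,
  maxOversubscription: number
): boolean {
  return unitsInProgress + weight <= concurrency + maxOversubscription;
}

const concurrency: number = 4;
const maxOversubscription: number = 1;

// Agent already running the weight-2 task, next task has weight 4: 2 + 4 = 6 > 5.
const fourAfterTwo: boolean = canStartWithCap(2, 4, concurrency, maxOversubscription);
// Agent already running the weight-1 task, next task has weight 4: 1 + 4 = 5 <= 5.
const fourAfterOne: boolean = canStartWithCap(1, 4, concurrency, maxOversubscription);
```

Here `fourAfterTwo` is false and `fourAfterOne` is true, so the same weight-4 task either waits or runs over the limit depending only on which agent picked it up.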

I think a boolean keeps the behavior deterministic and simpler for developers to reason about, whereas an overage amount is harder to reason about. I agree that changing this behavior could affect build times of existing repos. Let me know the best way to land this safely while exposing this isolation logic to project maintainers.

Please let me know which approach we'd like to go forward with and I can update this PR.

dmichon-msft · 2025-09-15T19:56:31Z

I can work with a boolean control, seems simple enough.

Oversubscription was supported in the original design to deal with the scenario of "what happens if you specify a max concurrency of less than the largest operation weight", but arguably that gets handled by clipping the weights to the max concurrency. The other consideration is that if you have 16 cores, are running a long operation that takes 1, and have a queued operation that can use 16 cores (but in practice uses up to that, whatever it can get), then having to wait for that long running operation is wasteful.
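
The 16-core scenario can be made concrete with a toy simulation. This is a sketch under stated assumptions, not the scheduler in Async.ts: the `SimTask` shape, the `makespan` function, the durations, and the strict in-order queue are all hypothetical, and weights are clipped to the limit as suggested above.

```typescript
// Hypothetical task description for the simulation.
interface SimTask {
  name: string;
  weight: number;
  duration: number;
}

// Computes the total completion time when tasks are started in queue order.
function makespan(tasks: SimTask[], maxConcurrency: number, allowOversubscription: boolean): number {
  const running: { end: number; weight: number }[] = [];
  let now: number = 0;
  let unitsInProgress: number = 0;
  let finish: number = 0;

  for (const task of tasks) {
    // Assumption: clip the weight so any single task can run alone.
    const weight: number = Math.min(task.weight, maxConcurrency);
    const fits = (): boolean =>
      allowOversubscription
        ? unitsInProgress < maxConcurrency
        : unitsInProgress + weight <= maxConcurrency;

    // Retire the earliest-finishing task (advancing the clock if needed) until the next one fits.
    while (!fits()) {
      running.sort((a, b) => a.end - b.end);
      const done = running.shift()!;
      now = Math.max(now, done.end);
      unitsInProgress -= done.weight;
    }

    const end: number = now + task.duration;
    running.push({ end, weight });
    unitsInProgress += weight;
    finish = Math.max(finish, end);
  }
  return finish;
}

const queue: SimTask[] = [
  { name: 'long-running', weight: 1, duration: 10 },
  { name: 'wide', weight: 16, duration: 5 }
];
const oversubscribed: number = makespan(queue, 16, true);
const strict: number = makespan(queue, 16, false);
```

With these made-up durations, `oversubscribed` is 10 and `strict` is 15: under the strict limit the wide task cannot start until the long-running one finishes. The simulation does not model the CPU contention that motivated this PR.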

Arguably the best way to handle heavy jobs is probably to tune your configured operation weights to take better advantage of the hardware (or better yet, to shard that expensive operation so that it can be scheduled more easily).

dmichon-msft · 2025-09-16T23:48:46Z

I apologize for the miscommunication; I think allowOversubscription should be a flag in the options to Async.forEachAsync, not something we try to specify for individual tasks. If you try to do it on individual tasks the algorithm gets really confusing, because theoretically you should only engage in oversubscription if all currently executing tasks allow it.
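
To illustrate why a per-task flag complicates the algorithm, here is a sketch of what such a check might look like. It is one interpretation of the comment above, with hypothetical names (`RunningTask`, `canStartPerTask`); it is not code from the PR.

```typescript
// Hypothetical per-task shape: each task carries its own opt-in flag.
interface RunningTask {
  weight: number;
  allowOversubscription: boolean;
}

function canStartPerTask(running: RunningTask[], next: RunningTask, maxConcurrency: number): boolean {
  const unitsInProgress: number = running.reduce((sum, task) => sum + task.weight, 0);
  const weight: number = Math.min(next.weight, maxConcurrency);

  // A task that fits under the limit can always start.
  if (unitsInProgress + weight <= maxConcurrency) {
    return true;
  }
  // Otherwise, exceeding the limit requires consent from the next task
  // and from every task that is already running.
  return (
    unitsInProgress < maxConcurrency &&
    next.allowOversubscription &&
    running.every((task) => task.allowOversubscription)
  );
}

const tolerant: RunningTask[] = [{ weight: 4, allowOversubscription: true }];
const isolated: RunningTask[] = [{ weight: 4, allowOversubscription: false }];
const heavy: RunningTask = { weight: 5, allowOversubscription: true };

const startsNextToTolerant: boolean = canStartPerTask(tolerant, heavy, 8);
const startsNextToIsolated: boolean = canStartPerTask(isolated, heavy, 8);
```

Here `startsNextToTolerant` is true and `startsNextToIsolated` is false: the answer depends on every running task, whereas a single flag in the options passed to Async.forEachAsync needs only the running total.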

aramissennyeydd · 2025-09-17T15:29:28Z

@dmichon-msft Chiming in here a little late (I've been sick the past few days), hopefully adding a little more context on how we ended up here. We've been investigating a whole slew of unit test flakes recently that are pretty easy to track down to "this test phase ran with our other expensive unit test phase" or "this test phase ran with our expensive NextJS app build phase". In those cases the test phase is weight 1 and the expensive phases are already weight 8. (the unit tests we're running have already been sharded and the NextJS app build can't be :( )

To address the flakes, we have a few options:

Try to drop rush parallelism to 1 so that all phases are run in isolation and don't impact other phases. We've tested this and it's caused a very significant slowdown in CI execution (upwards of 3-4x slower).
Update all test phase weights to 8. Basically treating all tests as the problem and ensuring only 1 runs at a time. Also causes a significant slowdown.
This PR to try and isolate the known offending phases so that they don't impact the rest of the executing operations.

ethanburrelldd · 2025-09-17T18:14:34Z

@dmichon-msft

Thanks for the feedback. I've added the allowOversubscription option to command-line.json, which gives users more granular control over how their parallelism works.

I'd appreciate another review on this. Whenever this is in a state close to approval, it would be great to have a preview release so I can test on my team's repo.

common/changes/@microsoft/rush/eb-concurrency-bug-fix_2025-09-16-19-06.json

common/changes/@rushstack/node-core-library/eb-concurrency-bug-fix_2025-09-11-15-24.json

libraries/node-core-library/src/Async.ts

ethanburrelldd · 2025-09-19T16:34:21Z

@iclanton is this PR good to merge? I can test via preview in our repo if you'd like more testing before this goes in.

ethanburrelldd · 2025-09-24T17:51:52Z

@dmichon-msft @iclanton @D4N14L

Hey Team, this PR seems to be a solid fix for the test flakes and performance issues we're experiencing when several expensive projects run on the same build agent. Could you provide a timeline for a merge or a dev preview so we can test it out? Thanks for your patience with my pings, we're just really excited to get this resolved. 😃

octogonz · 2025-09-25T18:32:58Z

libraries/node-core-library/src/Async.ts

+   * If true (default), will start operations even when they would exceed the limit.
+   * If false, waits until sufficient capacity is available.
+   */
+  allowOversubscription?: boolean;


/**
 * Controls whether operations can start even if doing so would exceed the total concurrency limit.
 * If true (default), will start operations even when they would exceed the limit.
 * If false, waits until sufficient capacity is available.
 */
allowOversubscription?: boolean;

Rush Stack's convention is that optional booleans always default to false.

The allowOversubscription=false behavior seems like a more natural/intuitive operation, so maybe we should make false the default?

Although that's technically a "breaking" change, a bit less parallelism in an edge case is unlikely to break anyone's existing code. In fact, it's arguably a bugfix.

@dmichon-msft

I've updated the comments and changed the default to false for Async but not for Rush.

@ethanburrelldd Do you think we should change the default for Rush as well?

Changing the default for Async to false makes sense, but I'm worried about changing the default for Rush. It might slow things down for repo maintainers who update their version. Since the previous behavior allowed for oversubscription, keeping true as the default for Rush seems like the right move to avoid breaking things for other maintainers.
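
The split defaults discussed here can be sketched as two resolution steps. This is only an illustration of the outcome described in this thread (false for Async, true for Rush); the constant and function names are hypothetical, and how Rush actually forwards the value to Async is an assumption.

```typescript
// Defaults as described in this thread; the names are hypothetical.
const ASYNC_LIBRARY_DEFAULT: boolean = false;
const RUSH_COMMAND_DEFAULT: boolean = true;

// Rush side: an unset command-line.json value falls back to the Rush default.
function resolveForRush(commandLineSetting: boolean | undefined): boolean {
  return commandLineSetting ?? RUSH_COMMAND_DEFAULT;
}

// Library side: an unset option falls back to the library default.
function resolveForAsync(optionValue: boolean | undefined): boolean {
  return optionValue ?? ASYNC_LIBRARY_DEFAULT;
}

// Assumption: Rush always passes an explicit value, so the library default never applies to it.
const directLibraryCaller: boolean = resolveForAsync(undefined);
const rushWithoutSetting: boolean = resolveForAsync(resolveForRush(undefined));
const rushOptedOut: boolean = resolveForAsync(resolveForRush(false));
```

Under these assumptions, `directLibraryCaller` is false, `rushWithoutSetting` is true, and `rushOptedOut` is false, so existing Rush repos keep the old scheduling unless they opt out.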


octogonz · 2025-09-25T20:38:35Z

🚀 @microsoft/rush version 5.158.1-pr5355.0 has been published.

@ethanburrelldd Let us know if it solves your problem.

iclanton · 2025-10-06T18:24:45Z

@ethanburrelldd - Did that release work for you?

ethanburrelldd · 2025-10-07T16:02:05Z

Thanks for generating the pre-release! We're seeing decreased parallelism alongside tasks that have weight = concurrency limit (e.g., @org/main-web-app (cotest) - shard 1/4). This gives a small subset of expensive tests from projects with weight < concurrency limit better performance, since they no longer run at the same time as the long-running expensive test cases.

Here's a snapshot comparing build plans before / after the preview release.

rushVersion = 5.157.0:
   @org/lib-ui-components (build-storybook) ----------------####----------------------------------------------------------------  59.4s
                @org/service-web-app (test) -----------------#######------------------------------------------------------------  78.6s
                      @org/chat-app (build) -----------------###----------------------------------------------------------------  28.0s
     @org/main-web-app (cotest) - shard 1/4 -----------------#####################################------------------------------ 632.7s

rushVersion = 5.158.1-pr5355.0:
                @org/small-utility (build) --#---------------------------------------------------------------------------------    0.1s
    @org/main-web-app (cotest) - shard 1/4 ---#####################################################---------------------------- 643.7s

octogonz · 2025-10-07T16:31:31Z

Great, thanks for following up!

ethanburrelldd requested review from D4N14L, apostolisms, dmichon-msft, iclanton and octogonz as code owners September 12, 2025 14:21

github-project-automation bot added this to Bug Triage Sep 12, 2025

github-project-automation bot moved this to Needs triage in Bug Triage Sep 12, 2025

D4N14L reviewed Sep 12, 2025

dmichon-msft reviewed Sep 12, 2025

ethanburrelldd changed the title ~~[node-core-library] Fix weighted oversubscription~~ [node-core-library] Add allowOversubscription option Sep 16, 2025

ethanburrelldd added 9 commits September 16, 2025 18:35

Fix concurrency bug for max weighted operation scheduling

3e4fc31

create Peekable iterator

534076e

changelog

ba1c694

go back to singleton next iterator

c50ea4f

reviews

5a84065

reviews

7688a32

update changelogs

1fce468

api reviews

381b718

test cleanup before review

18960cd

ethanburrelldd force-pushed the eb/concurrency-bug-fix branch from 36c1370 to 18960cd Compare September 16, 2025 22:39

ethanburrelldd requested a review from dmichon-msft September 16, 2025 23:02

move allowOversubscription to command-line.json

69c60b5

remove un-needed changes

5b20bcf

dmichon-msft approved these changes Sep 18, 2025

iclanton reviewed Sep 18, 2025

iclanton approved these changes Sep 18, 2025

ethanburrelldd added 3 commits September 18, 2025 15:00

reviews

d1bd1f2

pull bump type from version-policies

9bdaab2

change to none

44f794f

Merge branch 'main' into eb/concurrency-bug-fix

39ed33b

octogonz reviewed Sep 25, 2025

Change IAsyncParallelismOptions.allowOversubscription default to false; improve docs

2530f74

iclanton moved this from Needs triage to In Progress in Bug Triage Oct 6, 2025

octogonz merged commit 3a5cc0e into microsoft:main Oct 7, 2025
5 checks passed

github-project-automation bot moved this from In Progress to Closed in Bug Triage Oct 7, 2025
