[Feature]: Option to distribute projects evenly among shards

### 🚀 Feature Request

I would like an option for sharding where each project is evenly distributed among the shards.

I'm picturing this as distributing tests from each project in a round-robin way across the shards, but a randomized seed would be a suitable alternative.

### Example

Currently, if I have 2 projects with 4 tests each and split them into 4 shards, they'll be divided like so:

- Project 1 / Test 1 -> Shard 1
- Project 1 / Test 2 -> Shard 1
- Project 1 / Test 3 -> Shard 2
- Project 1 / Test 4 -> Shard 2
- Project 2 / Test 1 -> Shard 3
- Project 2 / Test 2 -> Shard 3
- Project 2 / Test 3 -> Shard 4
- Project 2 / Test 4 -> Shard 4

The burden of Project 1 is entirely borne by Shards 1 and 2, while Project 2 goes entirely to Shards 3 and 4. Projects and Shards are always correlated.

Instead, I would like the tests of each project to be allocated evenly across the 4 shards, like so:

- Project 1 / Test 1 -> Shard 1
- Project 1 / Test 2 -> Shard 2
- Project 1 / Test 3 -> Shard 3
- Project 1 / Test 4 -> Shard 4
- Project 2 / Test 1 -> Shard 1
- Project 2 / Test 2 -> Shard 2
- Project 2 / Test 3 -> Shard 3
- Project 2 / Test 4 -> Shard 4

Instead of "filling up" one shard before moving to the next, we spread the projects evenly. 
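To make the difference concrete, here is a minimal TypeScript sketch of both strategies. It is only an illustration of the two lists above, not Playwright's actual sharding code: the `TestCase` type and the `shardContiguous` and `shardRoundRobin` functions are hypothetical names I made up for this example, and the "fill up" version assumes the current behaviour amounts to splitting the ordered test list into contiguous chunks.

```typescript
// Hypothetical types and helpers for illustration only; not Playwright internals.
interface TestCase {
  project: string;
  title: string;
}

// "Fill up" strategy (assumed model of the current behaviour):
// split the ordered test list into contiguous chunks, one chunk per shard.
function shardContiguous(tests: TestCase[], shardCount: number): TestCase[][] {
  const shards: TestCase[][] = Array.from({ length: shardCount }, () => []);
  const perShard = Math.ceil(tests.length / shardCount);
  tests.forEach((test, index) => {
    shards[Math.floor(index / perShard)].push(test);
  });
  return shards;
}

// Requested strategy: deal each project's tests out to the shards in turn,
// restarting at the first shard for every project (as in the second list).
function shardRoundRobin(tests: TestCase[], shardCount: number): TestCase[][] {
  const shards: TestCase[][] = Array.from({ length: shardCount }, () => []);
  const seenPerProject = new Map<string, number>();
  for (const test of tests) {
    const seen = seenPerProject.get(test.project) ?? 0;
    shards[seen % shardCount].push(test);
    seenPerProject.set(test.project, seen + 1);
  }
  return shards;
}

// The example from this issue: 2 projects with 4 tests each, 4 shards.
const tests: TestCase[] = ["Project 1", "Project 2"].flatMap(project =>
  [1, 2, 3, 4].map(n => ({ project, title: `Test ${n}` })),
);

const summarize = (shards: TestCase[][]): string[][] =>
  shards.map(shard => shard.map(t => `${t.project} / ${t.title}`));

console.log(summarize(shardContiguous(tests, 4)));
console.log(summarize(shardRoundRobin(tests, 4)));
```

Restarting the counter for every project matches the lists above, but it would favour the first shards when projects have fewer tests than there are shards. A single running counter across all projects would be one way around that.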

### Motivation

The existing algorithm is fine when projects are all approximately equal, e.g. in the default case where 3 projects each represent a different browser but all run the same set of tests.

However, I'm working on a product that has one set of slow tests that I've isolated into their own project since I only want to run it in one browser. My project list looks like:

- Project 1: All tests in Chrome except the tests reserved for Project 4
- Project 2: All tests in Firefox except the tests reserved for Project 4
- Project 3: All tests in WebKit except the tests reserved for Project 4
- Project 4: Slow tests only run in a single browser

When parallelizing this, all the slow tests being grouped together means the shards will _never_ have equal runtimes. Even if I add more shards to spread the load, the last shard will _always_ take _much_ longer to run (typically by a factor of 10 in my case).

I am aware there are other proposals to alter sharding behaviour, e.g. letting the user modify it directly (#33386) and multiple versions of using timing data (#20116, #17969). A seed to randomize the distribution was added in #30817 but then reverted in #31260 in favour of #30962, which would actually implement the algorithm I've described, but that PR was closed without being merged. Then #33032 attempted it again but was also closed without being merged. I think the latest idea is the one to allow programmatically specifying any logic you want, but to me "round-robin" and "randomized" are two pretty obvious alternatives. It really feels to me like we keep blocking incremental improvement here while we wait for a perfect solution.


[Feature]: Option to distribute projects evenly among shards #36253