[Feature]: Option to distribute projects evenly among shards #36253

@gpaciga

🚀 Feature Request

I would like an option for sharding in which the tests of each project are distributed evenly among the shards.

I'm picturing this as distributing the tests from each project across the shards in a round-robin way, but a seeded random distribution would be a suitable alternative.

Example

Currently, if I have 2 projects with 4 tests each and split them into 4 shards, they'll be divided like so:

  • Project 1 / Test 1 -> Shard 1
  • Project 1 / Test 2 -> Shard 1
  • Project 1 / Test 3 -> Shard 2
  • Project 1 / Test 4 -> Shard 2
  • Project 2 / Test 1 -> Shard 3
  • Project 2 / Test 2 -> Shard 3
  • Project 2 / Test 3 -> Shard 4
  • Project 2 / Test 4 -> Shard 4

The burden of Project 1 is entirely borne by Shards 1 and 2, while Project 2 goes entirely to Shards 3 and 4. Projects and Shards are always correlated.

Instead, I would like to have the tests of each project allocated evenly across the 4 shards like so:

  • Project 1 / Test 1 -> Shard 1
  • Project 1 / Test 2 -> Shard 2
  • Project 1 / Test 3 -> Shard 3
  • Project 1 / Test 4 -> Shard 4
  • Project 2 / Test 1 -> Shard 1
  • Project 2 / Test 2 -> Shard 2
  • Project 2 / Test 3 -> Shard 3
  • Project 2 / Test 4 -> Shard 4

Instead of "filling up" one shard before moving to the next, we spread the projects evenly.
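To make the difference concrete, here is a minimal sketch of the two strategies as I've described them. The `TestCase` type and both function names are hypothetical and for illustration only; this is not Playwright's internal implementation or API.

```typescript
// Hypothetical type for illustration; not part of Playwright's API.
type TestCase = { project: string; title: string };

// "Fill up" strategy: tests are taken in order and each shard receives
// one consecutive slice, so a project tends to land on a few shards.
function contiguousShards(tests: TestCase[], shardCount: number): TestCase[][] {
  const shards: TestCase[][] = Array.from({ length: shardCount }, () => []);
  const perShard = Math.ceil(tests.length / shardCount);
  tests.forEach((test, i) => shards[Math.floor(i / perShard)].push(test));
  return shards;
}

// Round-robin strategy: each project keeps its own counter, so the
// project's tests are dealt out one shard at a time.
function roundRobinShards(tests: TestCase[], shardCount: number): TestCase[][] {
  const shards: TestCase[][] = Array.from({ length: shardCount }, () => []);
  const nextIndex = new Map<string, number>();
  for (const test of tests) {
    const i = nextIndex.get(test.project) ?? 0;
    shards[i % shardCount].push(test);
    nextIndex.set(test.project, i + 1);
  }
  return shards;
}

// The example from this issue: 2 projects, 4 tests each, 4 shards.
const tests: TestCase[] = [];
for (const project of ["Project 1", "Project 2"]) {
  for (let n = 1; n <= 4; n++) {
    tests.push({ project, title: `Test ${n}` });
  }
}

const label = (t: TestCase) => `${t.project} / ${t.title}`;
console.log(contiguousShards(tests, 4).map(shard => shard.map(label)));
console.log(roundRobinShards(tests, 4).map(shard => shard.map(label)));
```

In this sketch every project starts dealing at the first shard, which matches the table above. A real implementation would probably want to offset the starting shard per project so that small projects don't all pile onto Shard 1.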

Motivation

The existing algorithm is fine when projects are all approximately equal, e.g. in the default case where 3 projects each represent a different browser but all run the same set of tests.

However, I'm working on a product that has one set of slow tests that I've isolated into their own project, since I only want to run them in one browser. My project list looks like:

  • Project 1: All tests in Chrome except the tests reserved for Project 4
  • Project 2: All tests in Firefox except the tests reserved for Project 4
  • Project 3: All tests in WebKit except the tests reserved for Project 4
  • Project 4: Slow tests only, run in a single browser

When parallelizing this, all the slow tests being grouped together means the shards will never have equal runtimes. Even if I add more shards to spread the load, the last shard will always take much longer to run (typically by a factor of 10 in my case).
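As a rough illustration of that imbalance, the sketch below totals the runtime per shard under both strategies. All durations and test counts are made-up numbers picked to reproduce a factor of 10, not measurements from my suite, and `TimedTest` and `shardRuntimes` are hypothetical names.

```typescript
// Hypothetical durations in seconds, chosen only to illustrate the imbalance.
type TimedTest = { project: string; seconds: number };

// Projects 1-3: four fast tests each (1s). Project 4: four slow tests (10s).
const timedTests: TimedTest[] = [];
for (const project of ["Project 1", "Project 2", "Project 3"]) {
  for (let n = 0; n < 4; n++) {
    timedTests.push({ project, seconds: 1 });
  }
}
for (let n = 0; n < 4; n++) {
  timedTests.push({ project: "Project 4", seconds: 10 });
}

// Sums the runtime of each shard, given a function that maps a test's
// position (in the whole run, and within its project) to a shard index.
function shardRuntimes(
  tests: TimedTest[],
  shardCount: number,
  shardOf: (indexInRun: number, indexInProject: number) => number,
): number[] {
  const totals = new Array<number>(shardCount).fill(0);
  const seenPerProject = new Map<string, number>();
  tests.forEach((test, indexInRun) => {
    const indexInProject = seenPerProject.get(test.project) ?? 0;
    seenPerProject.set(test.project, indexInProject + 1);
    totals[shardOf(indexInRun, indexInProject)] += test.seconds;
  });
  return totals;
}

const shardCount = 4;
const perShard = Math.ceil(timedTests.length / shardCount);

// Contiguous slices: the slow project ends up alone on the last shard.
const contiguousTotals = shardRuntimes(timedTests, shardCount, i => Math.floor(i / perShard));
// Per-project round-robin: every shard gets one slow test.
const roundRobinTotals = shardRuntimes(timedTests, shardCount, (_i, j) => j % shardCount);

console.log(contiguousTotals); // totals: 4, 4, 4, 40
console.log(roundRobinTotals); // totals: 13, 13, 13, 13
```

With these assumed numbers the total work is the same (52 seconds) in both cases, but the slowest shard drops from 40 seconds to 13.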

I am aware there are other proposals to alter sharding behaviour, e.g. letting the user modify it directly (#33386) and multiple versions of using timing data (#20116, #17969). A seed to randomize the distribution was added in #30817 but then reverted in #31260 in favour of #30962, which would actually have implemented the algorithm I've described, but that PR was closed without being merged. Then #33032 attempted it again and was also closed without being merged. I think the latest idea is to allow programmatically specifying any logic you want, but to me "round-robin" and "randomized" are two pretty obvious alternatives. It really feels to me like we keep blocking incremental improvement here while we wait for a perfect solution.
