feat(test runner): improve sharding algorithm to better spread similar tests among shards #30962
base: main
Maybe it's better to make this an option to allow restoring the old behaviour. ¯\_(ツ)_/¯
Do you think you can achieve the same improvement with your sharding seed? Or are you looking for an additional bias against subsequent tests being put into the same group?
Not sure yet. But I will test this new sharding logic in our test setup to gather some results.
The seeded shuffle is basically just a quick and easy way to influence the test-group-to-shard assignment… it's random, so its results may vary. This change, however, aims to improve the sharding logic so that it generally yields better results, which has yet to be proven. 😅 Currently this sharding algorithm uses the number of tests per test group as a cost metric. It would be great if we could use the test durations of a previous run (when available) to distribute the tests among the shards even better. The algorithm would be quite similar, though.
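For context, a seeded shuffle of test groups can be sketched as a Fisher-Yates shuffle driven by a small deterministic PRNG. This is a minimal sketch under assumptions: `mulberry32` and `seededShuffle` are illustrative names, not Playwright APIs, and this is not the code from the seed PR.

```typescript
// Hypothetical sketch, not Playwright's implementation.
// mulberry32: a tiny 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy: the same seed always yields the same order,
// so every shard computes the same group order independently.
function seededShuffle<T>(items: readonly T[], seed: number): T[] {
  const result = items.slice();
  const random = mulberry32(seed);
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}
```

The shuffle only changes which groups end up next to each other before they are split into shards; as noted above, a seed that balances one suite well may balance another one poorly.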
I think your seed change allows users to experiment with seeds and arrive at a better state than they have today. Any other changes without timing feedback are going to yield similar results, so there is no need to experiment with biases.
This requires a feedback loop with test timing stats, which we don't have today. We recently started storing the last run stats in
Yes, I would like to work on that. I was not yet aware of .last-run.json. Is it also written by the merge-reports command? We need the stats combined from all shard runs. I was thinking about adding a separate reporter for that purpose, but if those last run stats are already there, there might be no need to create a separate reporter.
Shaping this code as a reporter sounds good, but Playwright core would need to consume the output of that reporter, so it needs to be baked in. Merging those should not be a hard problem, reporter or not. Unfortunately, merging mangles test ids today, so we'd need to figure that out, maybe by not using the ids at all and falling back to file names and test titles. There are also some tricky edge cases, such as tests that are fast on Chromium but slow on Firefox...
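A minimal sketch of that fallback: key durations by file and test titles instead of generated test ids. Including the project name in the key is an assumption, meant to keep Chromium and Firefox timings of the same test apart. `KeyedTest`, `durationKey` and `collectDurations` are hypothetical names.

```typescript
// Hypothetical shape of what is needed to build a stable key.
interface KeyedTest {
  projectName: string; // e.g. "chromium" or "firefox"
  file: string;        // test file path
  titlePath: string[]; // describe() titles followed by the test title
}

// JSON-encoding the parts keeps the key unambiguous even if titles
// contain spaces or separator-like characters.
function durationKey(test: KeyedTest): string {
  return JSON.stringify([test.projectName, test.file, ...test.titlePath]);
}

// Combines durations from several shard runs into one map.
// Later entries for the same key overwrite earlier ones.
function collectDurations(entries: Iterable<{ test: KeyedTest; duration: number }>): Map<string, number> {
  const map = new Map<string, number>();
  for (const { test, duration } of entries)
    map.set(durationKey(test), duration);
  return map;
}
```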
I've added a … Surprisingly, when merging the reports, the test ids just had a one-character suffix that I was able to strip off, but it doesn't feel like the right way to do this. What's the reason for modifying test ids when merging blobs? Couldn't this be done in a way that only modifies a test id when there is a collision?
```ts
const testDurations = testRun.rootSuite?.allTests().reduce((map, t) => {
  if (t.results.length)
    map[t.id] = t.results.reduce((a, b) => a + b.duration, 0);
  return map;
}, {} as { [testId: string]: number });
```
I'm actually not sure summing all the durations is the right way… maybe it makes more sense to calculate the average? Or to only include durations from successful test runs… 🤔
@pavelfeldman
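To make the options concrete, here is a sketch of the three aggregations (sum, average, average over passed runs only). `SketchTestCase` and `SketchTestResult` are simplified stand-ins that mirror only the fields the snippet above uses (`id`, `results`, `duration`) plus a `status` field; they are not the real reporter types.

```typescript
type ResultStatus = "passed" | "failed" | "timedOut" | "skipped" | "interrupted";

interface SketchTestResult { duration: number; status: ResultStatus; }
interface SketchTestCase { id: string; results: SketchTestResult[]; }

type Aggregation = "sum" | "average" | "average-passed";

// Returns one duration (ms) per test id; tests with nothing to aggregate are omitted.
function aggregateDurations(tests: SketchTestCase[], mode: Aggregation): Record<string, number> {
  const out: Record<string, number> = {};
  for (const t of tests) {
    const results = mode === "average-passed"
      ? t.results.filter(r => r.status === "passed")
      : t.results;
    if (!results.length)
      continue;
    const total = results.reduce((acc, r) => acc + r.duration, 0);
    out[t.id] = mode === "sum" ? total : total / results.length;
  }
  return out;
}
```

Summing counts retries as cost, which reflects the time a flaky test actually consumed on its shard; averaging (especially over passed runs) estimates a single attempt instead.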
Agreed. Note that for PGO-like behavior, we would probably point to the file explicitly and commit it to the repo. It would be the same format as last-run, but the user would copy it into some playwright/ folder and point to it explicitly.
Do you have a recommendation for how to name the CLI parameter / configuration option? Something like
When merging blob reports, test ids are patched to make sure there are no collisions between reports that might have overlapping test ids. However, even when merging reports that have no overlapping ids, all test ids are modified, which is an undesirable side effect. This PR only modifies a test id when the same id has already been used in a previous blob report. ---- This change is also part of #30962
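A minimal sketch of the collision-only approach: ids stay unchanged unless an earlier report (or an earlier rename) already claimed them. The `-<n>` suffix scheme and the `remapTestIds` name are assumptions for illustration, not necessarily what this PR does.

```typescript
// Each blob report is represented only by the list of test ids it contains;
// ids are assumed to be unique within a single report.
// Returns, per report, a map from original id to the id used after merging.
function remapTestIds(reports: string[][]): Map<string, string>[] {
  const used = new Set<string>();
  return reports.map(ids => {
    const mapping = new Map<string, string>();
    for (const id of ids) {
      let newId = id;
      // Only rename when the id is already taken; try -1, -2, ... until free.
      for (let n = 1; used.has(newId); n++)
        newId = `${id}-${n}`;
      used.add(newId);
      mapping.set(id, newId);
    }
    return mapping;
  });
}
```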
Test results for "tests 1": 27501 passed, 608 skipped. Merge workflow run.
…icrosoft#30817)" This reverts commit 825e0e4. API review notes: it sounds like this change did not solve the problem for the contributor; a new approach is under development in microsoft#30962
Adds alternative algorithms to assign test groups to shards to better distribute tests.
Problem
Currently the way sharding works is something like this…
Tests are ordered as they are discovered, which is mostly alphabetical. As a result, similar tests end up next to each other. For example, you may first have 6 tests that test the logged-in state and then 6 tests that test the logged-out state. The first 6 tests need more setup time because they test logged-in behaviour. With the current sharding algorithm, shards 1 and 2 get those slow logged-in tests, while shards 3 and 4 get the quicker ones…
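A simplified sketch of that contiguous assignment, reproducing the 12-test example with 4 shards. It assumes, as I understand the current `partition` behaviour, that the ordered groups are split into contiguous chunks of roughly equal test count. The costs (3 units per logged-in test, 1 per logged-out test) and the helper names are made up for illustration.

```typescript
// Splits groups (given as test counts) into contiguous chunks with roughly
// equal numbers of tests, in discovery order. Returns the shard index per group.
function partitionShards(groupSizes: number[], shardTotal: number): number[] {
  const totalTests = groupSizes.reduce((a, b) => a + b, 0);
  const assignment: number[] = [];
  let shard = 0;
  let testsSoFar = 0;
  for (const size of groupSizes) {
    // Move on once the current shard has its fair share of tests.
    while (shard < shardTotal - 1 && testsSoFar >= ((shard + 1) * totalTests) / shardTotal)
      shard++;
    assignment.push(shard);
    testsSoFar += size;
  }
  return assignment;
}

// Sums the (hypothetical) cost of every group per shard.
function shardLoads(assignment: number[], costs: number[], shardTotal: number): number[] {
  const loads = new Array<number>(shardTotal).fill(0);
  assignment.forEach((shard, i) => { loads[shard] += costs[i]; });
  return loads;
}

// 6 slow logged-in tests (cost 3) followed by 6 fast logged-out tests (cost 1).
const costs = [...new Array<number>(6).fill(3), ...new Array<number>(6).fill(1)];
console.log(shardLoads(partitionShards(new Array<number>(12).fill(1), 4), costs, 4)); // [9, 9, 3, 3]
```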
Solution
This PR adds a new `shardingMode` configuration option that specifies which sharding algorithm to use…
`shardingMode: 'partition'`
That's the current behaviour, which is the default. Let me know if you have a better name for the current algorithm...
`shardingMode: 'round-robin'`
Distributes the test groups more evenly. It…
Here is a simple example where every test group represents a single test (e.g. `--fully-parallel`)…
…or a more complex scenario where test groups have different numbers of tests…
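A minimal sketch of plain round-robin assignment, applied to the same 12-test example with made-up costs (3 per logged-in test, 1 per logged-out test). The PR's actual `round-robin` mode may also weigh groups by their number of tests; this sketch does not, and the helper names are hypothetical.

```typescript
// Assigns group i to shard i % shardTotal, so neighbouring (similar) groups
// land on different shards.
function roundRobinShards(groupCount: number, shardTotal: number): number[] {
  return Array.from({ length: groupCount }, (_, i) => i % shardTotal);
}

// Sums the (hypothetical) cost of every group per shard.
function shardLoads(assignment: number[], costs: number[], shardTotal: number): number[] {
  const loads = new Array<number>(shardTotal).fill(0);
  assignment.forEach((shard, i) => { loads[shard] += costs[i]; });
  return loads;
}

// 6 slow logged-in tests (cost 3) followed by 6 fast logged-out tests (cost 1).
const costs = [...new Array<number>(6).fill(3), ...new Array<number>(6).fill(1)];
console.log(shardLoads(roundRobinShards(12, 4), costs, 4)); // [7, 7, 5, 5]
```

With contiguous chunks the same costs would give 9/9/3/3, so round-robin narrows the spread without fully closing it, which is in line with the round-robin numbers in the appendix.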
`shardingMode: 'duration-round-robin'`
It's very similar to `round-robin`, but it uses the duration of a test's previous run as the cost factor. The duration is read from `.last-run.json` when available. When a test cannot be found in `.last-run.json`, the average duration of the available tests is used. When no last-run info is available, the behaviour would be identical to `round-robin`.
Other changes
- Added `testDurations?: { [testId: string]: number }` to `.last-run.json`
- Added a `lastrun` reporter, which allows `merge-reports` to generate a `.last-run.json`
Appendix
Below are some runtime stats from a project I've been working on, which show the potential benefit of this change.
Each test run had to complete 161 tests. Single test durations range from a few seconds to over 2 minutes.
The partition run gives the baseline performance and illustrates the problem quite well: one shard takes almost 16 min, while another completes in under 5 min.
The round-robin algorithm performs somewhat better, but it still has a shard that takes twice as long as another shard.
The duration-round-robin run used the duration info from a previous run and achieves by far the best result: all shards complete in 10-11 minutes. 🏆 🎉
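For reference, a sketch of how the `duration-round-robin` mode described under Solution could work. Assumptions: `LastRunInfo` mirrors only the `testDurations` field this PR adds to `.last-run.json` (the real file has other fields); groups are visited in their original order and each goes to the shard with the lowest accumulated duration, which is my reading of "very similar to round-robin"; all function names are hypothetical.

```typescript
// Only the field added by this PR; other .last-run.json fields are omitted.
interface LastRunInfo {
  testDurations?: { [testId: string]: number };
}

// Cost of one test: its last duration, or the average of the known durations
// when the test is missing from the last run.
function estimateCost(testId: string, durations: { [testId: string]: number }, average: number): number {
  return durations[testId] ?? average;
}

function roundRobinShards(groupCount: number, shardTotal: number): number[] {
  return Array.from({ length: groupCount }, (_, i) => i % shardTotal);
}

// groups: each test group is a list of test ids. Returns the shard per group.
function durationRoundRobinShards(groups: string[][], shardTotal: number, lastRun?: LastRunInfo): number[] {
  const durations = lastRun?.testDurations ?? {};
  const known = Object.values(durations);
  // Without any last-run data, behave exactly like plain round-robin.
  if (!known.length)
    return roundRobinShards(groups.length, shardTotal);
  const average = known.reduce((a, b) => a + b, 0) / known.length;
  const loads = new Array<number>(shardTotal).fill(0);
  return groups.map(group => {
    // Pick the least-loaded shard; ties go to the lowest index.
    let target = 0;
    for (let s = 1; s < shardTotal; s++) {
      if (loads[s] < loads[target])
        target = s;
    }
    loads[target] += group.reduce((sum, id) => sum + estimateCost(id, durations, average), 0);
    return target;
  });
}

// One slow test followed by three fast ones, two shards: the slow test gets
// a shard to itself (30 vs 30) instead of the 40 vs 20 of plain round-robin.
console.log(durationRoundRobinShards([["a"], ["b"], ["c"], ["d"]], 2, { testDurations: { a: 30, b: 10, c: 10, d: 10 } })); // [0, 1, 1, 1]
```

Visiting groups in their original order, rather than sorting them by duration first as a longest-processing-time heuristic would, keeps the result identical to round-robin when all group costs are equal; sorting largest-first often balances better but gives up that property.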