Fix #4255: Add support for sharding app module Gradle tests in CI #4256
Explanation
Fixes #4255
This fixes the recent Gradle flakes we've seen by sharding app module tests into 4 separate runners.
Introduction of Gradle commands to facilitate sharding
Gradle doesn't have sharding built in, so I needed to take a hacky approach: when executing a particular shard, the test files belonging to the other shards are excluded, leaving only the tests corresponding to that shard. Since Gradle doesn't expect its source sets to change between builds, a clean build is required between shard runs. Running a shard is as simple as:
The current shards and their respective tests can be listed using:
(Note that some of the tests that are excluded for localization are also included in the list but won't actually be executed).
Another requirement is ensuring that test shard bucketing is deterministic and consistent between runs and environments. To this end, it made sense to use the hash of test files' paths relative to the app module project directory. However, simply using the hash to bucket the tests (such as might happen in a hash table) resulted in a less even distribution of tests (probably due to Java's string hashCode function being fairly weak for uniqueness), so I decided to instead use the index as a seed of a PRNG. This resulted in a better distribution of tests among shards, and determinism.
CI changes & verifying that it works
The shards are fixed rather than computed as they are for Bazel, so 4 new workflows were added to execute each shard separately. The results are then combined in the same way as for Bazel, and the result of that combining workflow is what blocks PR submission. I reused the existing app module workflow for the latter part, so we don't need to change the CI check requirements (which means no one needs to update their branches for submission).
See this workflow for an example failure, and this workflow to see that the required check fails due to the shard failing.
Each shard uploads its test results separately (as demonstrated by this run).
The sharding also significantly speeds up running the app module tests (to the point where they're now probably faster than their Bazel counterparts), so this should be a nice quality-of-life improvement beyond just the flake going away.
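For readers who want a concrete picture of the deterministic bucketing described earlier, here is a minimal Java sketch. It is not the code from this PR. It assumes the seed is the hashCode of the test file's path relative to the project directory, and that path separators are normalized first so Windows and Unix agree. The class name, method names, and example paths are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names come from the PR.
public class ShardBucketing {
    /** Returns a shard index in [0, shardCount) for a test file's relative path. */
    static int shardFor(String relativePath, int shardCount) {
        // Assumption: normalize separators so the same file maps to the same
        // shard regardless of the operating system.
        String normalized = relativePath.replace('\\', '/');
        // String.hashCode() is specified by the Java API, so it is stable
        // across JVMs and runs.
        long seed = normalized.hashCode();
        // java.util.Random's algorithm is also specified, so the same seed
        // always produces the same first value.
        return new Random(seed).nextInt(shardCount);
    }

    /** Returns the paths that belong to the given shard; all others would be excluded. */
    static List<String> testsInShard(List<String> paths, int shard, int shardCount) {
        List<String> result = new ArrayList<>();
        for (String path : paths) {
            if (shardFor(path, shardCount) == shard) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical test file paths, relative to the module directory.
        List<String> paths = Arrays.asList(
            "src/test/java/org/example/FooTest.kt",
            "src/test/java/org/example/BarTest.kt",
            "src/test/java/org/example/BazTest.kt");
        int shardCount = 4;
        for (int shard = 0; shard < shardCount; shard++) {
            System.out.println("shard " + shard + ": " + testsInShard(paths, shard, shardCount));
        }
    }
}
```

Because the result depends only on the path string and the shard count, every runner computes the same assignment without coordination. Changing the shard count would reshuffle the assignment.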
Essential Checklist
For UI-specific PRs only
N/A -- This is affecting only build infrastructure.