Why run automated bisections?
KernelCI periodically monitors a series of kernel trees (mainline, stable, next...), and builds them when it detects some changes in them. It then runs some tests (boot at least) with the resulting kernel binaries on a variety of platforms. When a test fails, it compares the results with previous kernel revisions from that same branch on the same platform. If it was working previously, then KernelCI has detected a new failure and stores it as a regression.
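The regression rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `find_regression` and the `(revision, passed)` history format are assumptions made for the example, not KernelCI's actual data model.

```python
from typing import Optional


def find_regression(
    history: list[tuple[str, bool]],
) -> Optional[tuple[str, str]]:
    """Return (last_good, first_bad) if the newest result is a new failure.

    `history` holds (revision, passed) results for one test on one platform
    and one branch, ordered from oldest to newest (hypothetical format).
    """
    if len(history) < 2:
        return None  # nothing to compare against
    (prev_rev, prev_ok), (last_rev, last_ok) = history[-2], history[-1]
    if prev_ok and not last_ok:
        # It was working previously and fails now: a new failure.
        return prev_rev, last_rev
    return None  # still passing, or already failing before
```

The returned pair gives the known good and known bad revisions that a bisection would start from.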
As there may have been a topic branch merge with many commits between the last working revision and the now failing one, a bisection is needed to isolate the individual commit that introduced the failure. At least, this is the idea; it can get more complicated if several different failures were introduced in the meantime or if the branch was rebased.
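To illustrate the idea, here is a minimal bisection sketch over a linear list of commits. It assumes a single failure and a history that was not rebased, which are exactly the simplifications the paragraph above warns about; tools such as git bisect also handle merges. The names `bisect` and `is_good` are hypothetical.

```python
from typing import Callable, Sequence


def bisect(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    `commits` is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad. `is_good` builds and tests one revision.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # the failure was introduced after mid
        else:
            hi = mid  # the failure was introduced at or before mid
    return commits[hi]


# Example: ten commits, with the failure introduced in "c6".
commits = [f"c{i}" for i in range(10)]
first_bad = bisect(commits, lambda c: int(c[1:]) < 6)
```

Each iteration halves the range, so the number of kernels to build and test grows with the logarithm of the number of commits rather than linearly.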
How does it work?
The KernelCI automated bisection is implemented as a Jenkins Pipeline job, with some functionality in Python.
The current status of automated bisection is as follows:
- triggered for each regression found
- only run with plain ramdisk boot tests for now
- run on mainline, stable, next and several maintainer trees
- several checks are in place to avoid false positives due to board issues:
  - check the initial good and bad revisions coming from the found regression
  - when the bisection finds a commit, check that it does fail 3 times
  - revert the found commit in-place and check that it does boot 3 times
  - when started manually, it's also possible to test each kernel iteration several times
- send an email report to a set of recipients determined from the breaking commit found
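The checks in the list above, which confirm a found commit before reporting it, can be sketched as follows. This is a hedged illustration rather than the pipeline's real code: `boot_with_commit` and `boot_with_revert` are hypothetical callables that each run one boot test and return True on success.

```python
from typing import Callable


def fails_every_time(boot: Callable[[], bool], tries: int = 3) -> bool:
    """True if no attempt boots; stops early at the first successful boot."""
    return not any(boot() for _ in range(tries))


def passes_every_time(boot: Callable[[], bool], tries: int = 3) -> bool:
    """True if every attempt boots; stops early at the first failed boot."""
    return all(boot() for _ in range(tries))


def confirm_commit(
    boot_with_commit: Callable[[], bool],
    boot_with_revert: Callable[[], bool],
    tries: int = 3,
) -> bool:
    """Accept the found commit only if it fails consistently and
    reverting it makes the boot pass consistently."""
    return fails_every_time(boot_with_commit, tries) and passes_every_time(
        boot_with_revert, tries
    )
```

A board that fails intermittently for unrelated reasons would break one of the two conditions, so the result is discarded instead of being reported as a false positive.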
Where are the results?
The bisection results are only shared by email, at least on LKML. They could also be added to the kernelci.org web front-end next to test results.
What's left to do?
Potential improvements to the automated bisection are:
- extend to stable-rc (branches get rebased when pushed to the tree)
- extend to test plans other than boot (with fixed version of the test to run while iterating kernel revisions)
- possibility to manually start semi-automatic bisections for special cases
In order to maximize the build throughput of KernelCI, many of the kernel builds are run with a low number of parallel processes. This helps in particular during the linking stages, which have to be run on a single core. So ideally, a server with 8 CPU cores would reach its maximum throughput when building 8 kernels in parallel with make -j1. In practice, we're running with make -j4 to keep long builds like allmodconfig short enough, and run ncores / 4 builds in parallel.
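The arithmetic behind this trade-off is simple enough to write down. The helper names below are made up for the example, and the floor of one build per machine is an assumption rather than something the text states.

```python
def build_slots(ncores: int, jobs_per_build: int = 4) -> int:
    """Number of kernel builds to run in parallel on one builder."""
    # Assumption: always allow at least one build, even on small machines.
    return max(1, ncores // jobs_per_build)


def make_command(jobs_per_build: int = 4) -> list[str]:
    """Command line for one build using the given number of processes."""
    return ["make", f"-j{jobs_per_build}"]


# An 8-core server: 8 builds at -j1 in the ideal case, 2 builds at -j4 in practice.
ideal = build_slots(8, jobs_per_build=1)
practical = build_slots(8, jobs_per_build=4)
```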
For bisections, the use-case is rather different as we need to wait for each build to be tested before building the next one. The current approach is therefore to use a build lock: many bisections run in parallel on a single builder, but only one build runs at a time, using all the CPU cores.
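A rough model of the build lock, using Python threads in place of Jenkins jobs, might look like this. Everything here is a stand-in: the `Builder` class, the sleeps that replace real builds and tests, and the counters used to observe the behaviour are all assumptions made for the sketch.

```python
import threading
import time


class Builder:
    """One build machine shared by several bisections."""

    def __init__(self) -> None:
        self._build_lock = threading.Lock()
        # Counters are only touched while holding the build lock.
        self._active = 0
        self.max_active = 0
        self.builds_done = 0

    def build(self, revision: str) -> None:
        with self._build_lock:  # only one build at a time, on all cores
            self._active += 1
            self.max_active = max(self.max_active, self._active)
            time.sleep(0.01)  # stand-in for the kernel build itself
            self._active -= 1
            self.builds_done += 1


def run_bisection(builder: Builder, revisions: list[str]) -> None:
    for revision in revisions:
        builder.build(revision)  # serialized across all bisections
        time.sleep(0.01)  # stand-in for the test; the lock is not held


def run_all(n_bisections: int = 3, steps: int = 2) -> Builder:
    builder = Builder()
    threads = [
        threading.Thread(
            target=run_bisection,
            args=(builder, [f"bisect{i}-step{j}" for j in range(steps)]),
        )
        for i in range(n_bisections)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return builder
```

While one bisection waits for its test result, another can take the lock and build, so the builder stays busy even though each individual bisection is sequential.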
The drawback of this approach is that builders have to be dedicated to either regular builds or bisections. Sharing the builders between both use-cases would require some dynamic way of allocating parallel processes to builds, for example by starting a bisection kernel build at -j4 and increasing it to -j8 when a regular build completes, while reserving the slot normally allocated to a regular build. Jenkins isn't really designed to do this, so we just have to keep monitoring the builders' usage to assign them to either regular or bisection builds when needed.