Topic/kbuild output pr #67
Conversation
Now that we accept KBUILD_OUTPUT we no longer need the `vmlinux_btf` input argument to some actions -- we can just assume `KBUILD_OUTPUT/vmlinux`. Hence, deprecate said argument until all clients have been adjusted to no longer pass it in (at which point it can be removed). Signed-off-by: Daniel Müller <deso@posteo.net>
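The deprecation described above boils down to deriving the vmlinux path instead of requiring it. A minimal sketch of that fallback logic, assuming a helper script that previously took the path as a positional argument (the default directory and variable names here are purely illustrative):

```shell
# Backward-compatible derivation of the vmlinux path: honor the
# deprecated positional argument if a caller still passes it,
# otherwise fall back to the kbuild output directory.
KBUILD_OUTPUT="${KBUILD_OUTPUT:-/tmp/kbuild-output}"  # illustrative default
VMLINUX_BTF="${1:-}"
if [ -z "${VMLINUX_BTF}" ]; then
  VMLINUX_BTF="${KBUILD_OUTPUT}/vmlinux"
fi
echo "using vmlinux at ${VMLINUX_BTF}"
```

Once no client passes the argument anymore, the first branch can simply be dropped.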
force-pushed from 89a8170 to 0b1e2f3
With upcoming changes we would like to separate kernel build artifacts from kernel sources. Unfortunately, it turns out the two concerns are not really separated by prepare-rootfs/run.sh. Specifically, the script assumes `make -C "${BUILDDIR}" <xxx>` to work. However, the build directory does not necessarily contain any makefiles at all. What the script really means to do is `make -C <kernel-source> <xxx>`. Unfortunately, currently the script does not receive the location of the kernel source tree to begin with. To fix this state of affairs, this change introduces the `--source` option, which can be used to provide the location of the kernel source tree. Signed-off-by: Daniel Müller <deso@posteo.net>
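The `--source` option described above could be wired up roughly as follows (a sketch only; the option handling and variable names are illustrative, not the script's actual interface):

```shell
# Illustrative option parsing for a script like prepare-rootfs/run.sh:
# accept --source <dir> for the kernel source tree alongside the
# existing positional build directory.
parse_args() {
  SOURCEDIR=""
  BUILDDIR=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --source)
        SOURCEDIR="$2"
        shift 2
        ;;
      *)
        BUILDDIR="$1"
        shift
        ;;
    esac
  done
}

parse_args --source /path/to/kernel-src /path/to/build
# The script would then invoke make against the source tree, e.g.:
#   make -C "${SOURCEDIR}" O="${BUILDDIR}" <target>
echo "source=${SOURCEDIR} build=${BUILDDIR}"
```

The key point is that `make -C` receives the source tree, while the build directory only ever holds artifacts.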
If we ever want to get incremental kernel build support rolled out to CI, we need a way to separate build artifacts from source code. That is necessary because source code is checked out by GitHub Actions as part of the CI run and it would be counterproductive to keep it around and transfer it between builds. But even if we decide against incremental builds long-term, it is not a bad practice to separate those two concerns -- it makes for a logical separation and can act as a forcing factor to be more explicit about the paths to artifacts and binaries. With that in mind, this change adjusts our logic to honor KBUILD_OUTPUT [0], which controls where the kernel build system stores its build artifacts. Several of our actions now accept it as an input parameter and the underlying scripts make sure that the build system sees this variable. [0]: https://www.kernel.org/doc/html/latest/kbuild/kbuild.html#kbuild-output Signed-off-by: Daniel Müller <deso@posteo.net>
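In shell terms, honoring KBUILD_OUTPUT mostly amounts to exporting the variable before invoking the kernel build system, roughly like this (paths are placeholders, not the actions' actual defaults):

```shell
# Export KBUILD_OUTPUT so every child make process sees it; the kernel
# build system then writes .config, object files, vmlinux, etc. into
# this directory instead of into the source tree.
export KBUILD_OUTPUT="${KBUILD_OUTPUT:-/tmp/kbuild-output}"  # placeholder
mkdir -p "${KBUILD_OUTPUT}"

# The actual build would then be along the lines of:
#   make -C "${KERNEL_SRC}" olddefconfig
#   make -C "${KERNEL_SRC}" -j"$(nproc)" all
echo "artifacts will land in ${KBUILD_OUTPUT}"
```

Exporting the variable (rather than passing `O=` on each invocation) keeps nested make calls consistent.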
force-pushed from 8402c13 to 72915cc
@@ -8,6 +8,7 @@ source "${THISDIR}"/../helpers.sh

 ARCH="$1"
 TOOLCHAIN="$2"
+export KBUILD_OUTPUT="$3"
No strong opinion here, but instead of being a required parameter, could it have been an optional env var?
export KBUILD_OUTPUT=${KBUILD_OUTPUT:-"."}
To be honest, I really want to do away with global variables being accessed at random locations and make them explicit arguments instead.
Edit: And in this case it's guaranteed to be set by the action itself. And I think that's how we should keep it.
I suppose this will need a matching kernel-patches/vmtest diff to stop using those arguments.
Where is KERNELSRC being used? I see it being declared, but never used.
That was in diff 1.
Yep.
As of libbpf/ci#67 a bunch of actions honor KBUILD_OUTPUT. Doing so will make it possible to separate source code from build artifacts, which in turn may allow us to support incremental kernel compilation in CI down the line. Irrespective of these future changes, actions pertaining to the kernel build now ask for an additional input defining where to store or expect build artifacts. Provide it. Signed-off-by: Daniel Müller <deso@posteo.net>
This change introduces the means for performing incremental kernel builds in CI in order to decrease overall CI turn-around times. We piggyback on (and rely on) the kernel's own make-based build infrastructure for that purpose. Specifically, with libbpf/ci#67 we have set the stage for separating build artifacts from source code. With the change at hand we use this capability in conjunction with the `actions/cache` [0] GitHub Actions action to share intermediate build artifacts between CI runs, enabling the kernel build to rebuild only what has changed, as opposed to everything unconditionally.

There are several wrinkles to doing so. The first one is that GitHub Actions checks out the source code anew every time a workflow is run. Because git does not store file metadata such as time stamps, but the kernel's build system relies solely on last-modified time stamps, we need a way to approximate the source code time stamps from the previous build. We do that by remembering both the time stamp and the SHA-1 at which we built and carrying them over to subsequent builds. These subsequent builds then check out said SHA-1, adjust the last-modified time stamps of the entire tree, and then check out the code that is actually meant to be tested by the workflow run in question.

Second, `actions/cache` imposes several restrictions on cached artifacts: only artifacts produced on certain branches can actually be used (to achieve cache isolation) [1], there is a 10 GiB cache-artifact limit per repository [2], and the API is questionable. To circumvent the isolation-imposed requirements, we use to our advantage the fact that we maintain a base branch against which pull requests are created. Specifically, once this branch is updated (i.e., pushed to) we trigger a CI run which builds the kernel (already potentially incrementally) and then stores intermediate build artifacts. Pull requests created against this branch (i.e., every one created by Kernel Patches Daemon) will be able to use those artifacts.

Doing so indirectly also addresses the space constraints: the packed build artifacts are <1 GiB in size. Because we only need to keep one set of artifacts for each arch and toolchain combination (currently: three) we don't need to worry about thrashing (as a side note, these limitations don't seem to be enforced, certainly not rigorously or in a timely manner; I've had >34 GiB of build artifacts cached for several hours without an eviction happening). Because of the questionable API surface, we have to jump through hoops to prevent cache creation for regular pull requests: we remove the KBUILD_OUTPUT contents in a separate step, which causes no cache to be created.

As a result of this infrastructure, we see a significant decrease in the duration of the "Build Kernel Image" step, which is part of every CI run (happening three times): on GitHub-hosted runners we get down from somewhere in the vicinity of 20 minutes to under 2 minutes. Note that these results depend on what changes the patches in a series make: if a patch modifies a header included by every .c file then the result will be close to a full rebuild irrespective of this infrastructure. Conversely, if a patch only updates documentation the incremental rebuild can be close to instant. In practice, based on a weekend's worth of observations, we seem to end up closer to "doing almost nothing" than to a full rebuild almost all the time.

Note that incremental builds are not bulletproof and to a somewhat large degree we are at the mercy of them working correctly. However, when we encounter a build failure we first attempt a 'make clean' followed by a full rebuild [4]. This is expected to solve build-related problems in all conceivable cases. There is, however, the chance of miscompilation or similar issues introduced by buggy incremental recompilation logic. A brief survey within Meta's kernel group suggests that developers generally rely on incremental builds and anecdotal problems were always in the "build fails" realm. #dogfood

Note furthermore that selftest and sample builds are not covered by the incremental build machinery. The main reason is that neither of them honors KBUILD_OUTPUT -- a deficiency known to upstream Linux folks but not yet addressed. We could conceivably work around that, but the payoff is smaller for these steps and an upstream fix could get us there without added complexity in the CI itself.

Lastly, should CI, despite all efforts to the contrary, get stuck because a cache somehow contains corrupted data causing us to fail builds, the GitHub UI allows for the deletion of cache artifacts [3]. If no caches are present, this feature is basically a no-op from an initial build perspective (i.e., a full build will be performed).

[0] https://github.com/actions/cache
[1] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
[2] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
[3] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#deleting-cache-entries
[4] libbpf/ci#73

Signed-off-by: Daniel Müller <deso@posteo.net>
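The time-stamp restoration dance described in the commit message could look roughly like this (a sketch under the assumption that the previous build's SHA-1 and time stamp were cached alongside the artifacts; the function and variable names are illustrative):

```shell
# Restore a "last built" state so make's mtime comparisons work:
# 1) check out the commit we previously built,
# 2) stamp the whole tree with the recorded build time,
# 3) check out the commit under test; only files that differ between
#    the two commits receive a fresh (newer) mtime and get rebuilt.
restore_mtimes() {
  prev_sha="$1"      # SHA-1 recorded by the previous CI run
  prev_stamp="$2"    # time stamp recorded by the previous CI run
  test_ref="$3"      # what this workflow run actually tests

  git checkout --quiet "${prev_sha}"
  find . -path ./.git -prune -o -type f \
    -exec touch -d "${prev_stamp}" {} +
  git checkout --quiet "${test_ref}"
}
```

Files untouched between the two commits keep the old stamp and are skipped by make; files that changed come back with a current mtime and are rebuilt.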
This change introduces the means for performing incremental kernel builds in CI in order to decrease overall CI turn-around times. We piggy back (and rely) on the kernel's own make-based build infrastructure for that purpose. Specifically, with libbpf/ci#67 we have set the stage for separating build artifacts from source code. With the change at hand we use this capability in conjunction with the `actions/cache` [0] GitHub Actions action to share intermediate build artifacts between CI runs, enabling the kernel build to only rebuild what has changed, as opposed to everything unconditionally. There are several wrinkles to doing so. The first one is that GitHub Actions checks out the source code anew every time a workflow is run. Because git does not store file meta data like time stamps, but the kernel's build system relies solely on last-modified time stamps, we need a way to approximate the source code time stamps from the previous build. The way we do that is by first remembering both the time stamp and the SHA-1 at which we built and carry that over to subsequent builds. These subsequent builds then make sure to check out said SHA-1, adjust the last-modified time stamps of the entire tree, and then check out the code as it is meant to be tested by the very workflow run in question. Second, `actions/cache` imposes several restrictions on cached artifacts: only artifacts produced on certain branches can actually be used (to achieve cache isolation) [1], there is a 10 GiB cache-artifact limit per repository [2], and the API is questionable. To circumvent the isolation-imposed requirements, we use the fact that we are maintaining a base branch against which pull requests are created to our advantage. Specifically, once this branch is updated (i.e., pushed to) we trigger a CI run which builds the kernel (already potentially incrementally) and then stores intermediate build artifacts. 
Pull requests created against this branch (i.e., every one created by Kernel Patches Daemon) will be able to use those artifacts. Doing so indirectly also addresses the space constraints: the packed build artifacts are <1GiB in size. Because we only need to keep one set of artifacts for each arch and toolchain combo (currently: three) we don't need to worry about trashing (as a side note, these limitations don't seem to be enforced, certainly not rigorously or in a timely manner; I've had >34GiB of build artifacts cached for several hours without an eviction happening). Because of the questionable API surface, we have to jump through hoops in order to prevent cache creation for regular pull requests: we remove the KBUILD_OUTPUT contents in a separate step, which causes no cache to be created. As a result of this introduced infrastructure, we see a significant decrease of the "Build Kernel Image" step, which is part of every CI run (happening three times): on GitHub-hosted runners we get down from somewhere in the vicinity of 20minutes to <2minutes. Note that these results depend on what changes the patches in a series make: if a patch modifies a header included by each .c file then the result will be close to a full rebuild irrespective of this infrastructure. Conversely, if a patch only updates documentation the incremental rebuild can be close to instant. In practice, based on a weekend's worth of observations, we seem to be ending up close to "doing almost nothing" almost all the time than to doing a full rebuild. Note that incremental builds are not bullet proof and to a somewhat large degree we are at the mercy of them working correctly. However, when we encounter a build failure we first attempt a 'make clean' followed by a full rebuild [4]. This is expected to solve build related problems in all conceivable cases. There is, however, the chance of miscompilation or similar issues introduced as part of buggy incremental recompilation logic. 
A brief survey within Meta's kernel group suggests that developers generally rely on incremental builds, and that anecdotal problems were always in the "build fails" realm.

Note furthermore that selftest and sample builds are not covered by the incremental build machinery. The main reason is that neither of them honors KBUILD_OUTPUT -- a deficiency known to upstream Linux folks but not yet addressed. We could conceivably work around that, but the payoff is smaller for these steps, and an upstream fix could get us there without added complexity in the CI itself.

Lastly, it is important to note that, should CI, despite all efforts to the contrary, get stuck because a cache somehow contains corrupted data causing builds to fail, the GitHub UI allows for the deletion of cache artifacts [3]. If no caches are present, this feature is basically a no-op from an initial build perspective (i.e., a full build will be performed).

[0] https://github.com/actions/cache
[1] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
[2] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
[3] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#deleting-cache-entries
[4] libbpf/ci#73

Signed-off-by: Daniel Müller <deso@posteo.net>
As of libbpf/ci#67 a number of actions honor KBUILD_OUTPUT. Doing so makes it possible to separate source code from build artifacts, which in turn may allow us to support incremental kernel compilation in CI down the line. Irrespective of these future changes, actions pertaining to the kernel build now ask for an additional input defining where to store or expect build artifacts. Provide it. Signed-off-by: Daniel Müller <deso@posteo.net>
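The separation KBUILD_OUTPUT buys us can be illustrated with a toy stand-in for the kernel build. The mini Makefile below merely mimics the kernel's `O=`/KBUILD_OUTPUT handling; it is not the kernel build system itself:

```shell
# Toy illustration: sources live in one tree, artifacts land in another,
# so the source checkout stays pristine between CI runs.
src=$(mktemp -d) && out=$(mktemp -d)

# $(O) mimics KBUILD_OUTPUT: every artifact goes under it, never next to
# the sources (the real kernel build derives this from KBUILD_OUTPUT/O=)
printf 'O ?= .\nall:\n\techo artifact > $(O)/vmlinux\n' > "$src/Makefile"

make -s --no-print-directory -C "$src" O="$out"
ls "$out"    # the artifact lives in $out; $src contains only Makefile
```

With the real kernel, `KBUILD_OUTPUT=$out make -C $src` (or `make O=$out`) achieves the same split, which is what lets CI cache `$out` independently of the freshly checked-out sources.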
This change introduces the means for performing incremental kernel builds in CI in order to decrease overall CI turn-around times. We piggy back (and rely) on the kernel's own make-based build infrastructure for that purpose. Specifically, with libbpf/ci#67 we have set the stage for separating build artifacts from source code. With the change at hand we use this capability in conjunction with the `actions/cache` [0] GitHub Actions action to share intermediate build artifacts between CI runs, enabling the kernel build to only rebuild what has changed, as opposed to everything unconditionally. There are several wrinkles to doing so. The first one is that GitHub Actions checks out the source code anew every time a workflow is run. Because git does not store file meta data like time stamps, but the kernel's build system relies solely on last-modified time stamps, we need a way to approximate the source code time stamps from the previous build. The way we do that is by first remembering both the time stamp and the SHA-1 at which we built and carry that over to subsequent builds. These subsequent builds then make sure to check out said SHA-1, adjust the last-modified time stamps of the entire tree, and then check out the code as it is meant to be tested by the very workflow run in question. Second, `actions/cache` imposes several restrictions on cached artifacts: only artifacts produced on certain branches can actually be used (to achieve cache isolation) [1], there is a 10 GiB cache-artifact limit per repository [2], and the API is questionable. To circumvent the isolation-imposed requirements, we use the fact that we are maintaining a base branch against which pull requests are created to our advantage. Specifically, once this branch is updated (i.e., pushed to) we trigger a CI run which builds the kernel (already potentially incrementally) and then stores intermediate build artifacts. 
Pull requests created against this branch (i.e., every one created by Kernel Patches Daemon) will be able to use those artifacts. Doing so indirectly also addresses the space constraints: the packed build artifacts are <1GiB in size. Because we only need to keep one set of artifacts for each arch and toolchain combo (currently: three) we don't need to worry about trashing (as a side note, these limitations don't seem to be enforced, certainly not rigorously or in a timely manner; I've had >34GiB of build artifacts cached for several hours without an eviction happening). Because of the questionable API surface, we have to jump through hoops in order to prevent cache creation for regular pull requests: we remove the KBUILD_OUTPUT contents in a separate step, which causes no cache to be created. As a result of this introduced infrastructure, we see a significant decrease of the "Build Kernel Image" step, which is part of every CI run (happening three times): on GitHub-hosted runners we get down from somewhere in the vicinity of 20minutes to <2minutes. Note that these results depend on what changes the patches in a series make: if a patch modifies a header included by each .c file then the result will be close to a full rebuild irrespective of this infrastructure. Conversely, if a patch only updates documentation the incremental rebuild can be close to instant. In practice, based on a weekend's worth of observations, we seem to be ending up close to "doing almost nothing" almost all the time than to doing a full rebuild. Note that incremental builds are not bullet proof and to a somewhat large degree we are at the mercy of them working correctly. However, when we encounter a build failure we first attempt a 'make clean' followed by a full rebuild [4]. This is expected to solve build related problems in all conceivable cases. There is, however, the chance of miscompilation or similar issues introduced as part of buggy incremental recompilation logic. 
A brief survey within Meta's kernel group suggests that developers generally rely on incremental builds, and anecdotal problems were always in the "build fails" realm. #dogfood

Note furthermore that selftest and sample builds are not covered by the incremental build machinery. The main reason is that neither of them honors KBUILD_OUTPUT -- a deficiency known to upstream Linux folks but not yet addressed. We could conceivably work around that, but the payoff is smaller for these steps and an upstream fix could get us there without added complexity in the CI itself.

Lastly, it is important to note that, should CI, despite all efforts to the contrary, get stuck because a cache somehow contains corrupted data causing us to fail builds, the GitHub UI allows for the deletion of cache artifacts [3]. If no caches are present, this feature is basically a no-op from an initial build perspective (i.e., a full build will be performed).

[0] https://github.com/actions/cache
[1] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
[2] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
[3] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#deleting-cache-entries
[4] libbpf/ci#73

Signed-off-by: Daniel Müller <deso@posteo.net>
Please see individual commits for description.