
topic/kbuild-output-pr #67

Merged: 3 commits into libbpf:master, Nov 4, 2022

Conversation

danielocfb
Collaborator

Please see the individual commits for descriptions.

Now that we accept KBUILD_OUTPUT, we no longer need the `vmlinux_btf`
input argument to some actions -- we can just assume
`KBUILD_OUTPUT/vmlinux`.
Hence, deprecate said argument until all clients have been adjusted to
no longer pass it in (at which point it can be removed).

Signed-off-by: Daniel Müller <deso@posteo.net>
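As a rough illustration of the new assumption (the helper below is hypothetical, not part of the actual actions), resolving vmlinux from KBUILD_OUTPUT could look like:

```shell
#!/bin/sh
# Hypothetical sketch: derive the vmlinux location from KBUILD_OUTPUT
# instead of taking a separate vmlinux_btf input argument.
set -eu

vmlinux_path() {
  # ${VAR:?} aborts with an error message if the variable is unset.
  printf '%s/vmlinux\n' "${KBUILD_OUTPUT:?KBUILD_OUTPUT must be set}"
}

KBUILD_OUTPUT=/tmp/kbuild
vmlinux_path   # prints /tmp/kbuild/vmlinux
```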
@danielocfb
Collaborator Author

danielocfb commented Nov 2, 2022

@danielocfb danielocfb force-pushed the topic/kbuild-output-pr branch 2 times, most recently from 89a8170 to 0b1e2f3 Compare November 2, 2022 22:23
With upcoming changes we would like to separate kernel build artifacts
from kernel sources. Unfortunately, it turns out the two concerns are
not really separated by prepare-rootfs/run.sh. Specifically, the script
assumes `make -C "${BUILDDIR}" <xxx>` to work. However, the build
directory does not necessarily contain any makefiles at all. What the
script really means to do is `make -C <kernel-source> <xxx>`.
Unfortunately, the script currently does not receive the location of
the kernel source tree to begin with.
To fix this state of affairs, this change introduces the `--source`
option, which can be used to provide the location of the kernel source
tree.

Signed-off-by: Daniel Müller <deso@posteo.net>
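A minimal sketch of the distinction (option handling simplified, variable names assumed): make must be pointed at the kernel source tree, with the build directory passed separately via `O=`.

```shell
#!/bin/sh
# Sketch only: the real run.sh takes more options. This just shows why
# `make -C <kernel-source> O=<build-dir>` is the invocation that works
# when the build directory contains no makefiles at all.
set -eu

build_invocation() {
  SOURCEDIR=""
  BUILDDIR="."
  while [ $# -gt 0 ]; do
    case "$1" in
      --source) SOURCEDIR="$2"; shift 2 ;;  # option added by this change
      *) BUILDDIR="$1"; shift ;;
    esac
  done
  # Point make at the source tree; keep artifacts in the build directory.
  echo "make -C ${SOURCEDIR} O=${BUILDDIR}"
}

build_invocation --source /usr/src/linux /tmp/kbuild
```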
If we ever want to get incremental kernel build support rolled out to
CI, we need a way to separate build artifacts from source code. That is
necessary because source code is checked out by GitHub Actions as part
of the CI run and it would be counter-productive to keep it around and
transferred between builds.
But even if we decide against incremental builds long-term, it is not a
bad practice to separate those two concerns -- it makes for a logical
separation and can act as a forcing factor to be more explicit about the
paths to artifacts and binaries.
With that in mind, this change adjusts our logic to honor
KBUILD_OUTPUT [0], which controls where the kernel build system stores its
build artifacts. Several of our actions now accept it as an input parameter
and the underlying scripts make sure that the build system sees this
variable.

[0]: https://www.kernel.org/doc/html/latest/kbuild/kbuild.html#kbuild-output

Signed-off-by: Daniel Müller <deso@posteo.net>
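The plumbing can be sketched like this (argument positions assumed, loosely mirroring the diff below): the output directory arrives as an explicit argument and is exported so every child `make` invocation sees it.

```shell
#!/bin/sh
# Sketch of the pattern used by the action scripts: KBUILD_OUTPUT comes
# in as a positional argument and is exported for the kernel build
# system. Argument positions and defaults here are assumptions.
set -eu

ARCH="${1:-x86_64}"
TOOLCHAIN="${2:-gcc}"
export KBUILD_OUTPUT="${3:-/tmp/kbuild-demo}"

mkdir -p "${KBUILD_OUTPUT}"
echo "building for ${ARCH}/${TOOLCHAIN}, artifacts in ${KBUILD_OUTPUT}"
```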
@danielocfb danielocfb force-pushed the topic/kbuild-output-pr branch 2 times, most recently from 8402c13 to 72915cc Compare November 2, 2022 23:27
@danielocfb danielocfb marked this pull request as ready for review November 3, 2022 16:01
@@ -8,6 +8,7 @@ source "${THISDIR}"/../helpers.sh

ARCH="$1"
TOOLCHAIN="$2"
export KBUILD_OUTPUT="$3"
Collaborator


No strong opinion here, but instead of being a required parameter, could it have been an optional env var?

export KBUILD_OUTPUT=${KBUILD_OUTPUT:-"."}

Collaborator Author

@danielocfb danielocfb Nov 4, 2022


Honestly, I really want to do away with global variables being accessed in random locations and instead make them explicit arguments.

Edit: And in this case it's guaranteed to be set by the action itself. And I think that's how we should keep it.

Collaborator

@chantra chantra left a comment


I suppose this will need a matching kernel-patches/vmtest diff to stop using those arguments.

Collaborator

@chantra chantra left a comment


Where is KERNELSRC being used? I see it being declared, but never used.

@chantra
Collaborator

chantra commented Nov 3, 2022

Where is KERNELSRC being used? I see it being declared, but never used.

that was in diff 1.

@danielocfb
Collaborator Author

Where is KERNELSRC being used? I see it being declared, but never used.

that was in diff 1.

Yeah, I don't know why GitHub orders commits in pull requests according to some date and not by their actual order. Anyway, we use it here and here.

@danielocfb
Collaborator Author

I suppose this will need a matching kernel-patches/vmtest diff to stop using those arguments.

Yep.

@danielocfb danielocfb merged commit 3f93ac5 into libbpf:master Nov 4, 2022
danielocfb pushed a commit to danielocfb/vmtest that referenced this pull request Nov 4, 2022
As of libbpf/ci#67 a bunch of actions honor
KBUILD_OUTPUT. Doing so will make it possible to separate source code
from build artifacts, which in turn may allow us to support incremental
kernel compilation in CI down the line.
Irrespective of these future changes, actions pertaining to the kernel
build now ask for an additional input defining where to store or expect
build artifacts. Provide it.

Signed-off-by: Daniel Müller <deso@posteo.net>
@danielocfb danielocfb deleted the topic/kbuild-output-pr branch November 4, 2022 17:13
anakryiko pushed a commit to libbpf/libbpf that referenced this pull request Nov 7, 2022
anakryiko pushed a commit to kernel-patches/vmtest that referenced this pull request Nov 7, 2022
danielocfb pushed a commit to danielocfb/vmtest that referenced this pull request Nov 7, 2022
danielocfb pushed a commit to danielocfb/vmtest that referenced this pull request Nov 8, 2022
This change introduces the means for performing incremental kernel
builds in CI in order to decrease overall CI turnaround times. We
piggyback on (and rely on) the kernel's own make-based build
infrastructure for
that purpose. Specifically, with libbpf/ci#67 we
have set the stage for separating build artifacts from source code. With
the change at hand we use this capability in conjunction with the
`actions/cache` [0] GitHub Actions action to share intermediate build
artifacts between CI runs, enabling the kernel build to only rebuild
what has changed, as opposed to everything unconditionally.

There are several wrinkles to doing so. The first one is that GitHub
Actions checks out the source code anew every time a workflow is run.
Because git does not store file meta data like time stamps, but the
kernel's build system relies solely on last-modified time stamps, we
need a way to approximate the source code time stamps from the previous
build. We do that by remembering both the time stamp and the SHA-1 at
which we built and carrying them over to subsequent
builds. These subsequent builds then make sure to check out said SHA-1,
adjust the last-modified time stamps of the entire tree, and then check
out the code as it is meant to be tested by the very workflow run in
question.
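The time stamp dance described above could be sketched roughly as follows (the metadata file format and helper name are assumptions; the actual CI logic differs in detail):

```shell
#!/bin/sh
# Sketch (assumed metadata format "<epoch> <sha>"): restore last-modified
# time stamps so make treats cached objects as up to date, then check out
# the code under test so only genuinely changed files look newer.
set -eu

restore_timestamps() {
  # $1: metadata file recorded by the previous successful build
  read -r stamp sha < "$1"
  git checkout --quiet "$sha"
  # Stamp every tracked file with the previous build's time.
  git ls-files -z | xargs -0 touch -d "@${stamp}"
  # Return to the commit under test; files changed since $sha are
  # rewritten by git and thus get fresh time stamps.
  git checkout --quiet -
}

# Self-contained demo in a throwaway repo: two commits, one file changed.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email ci@example.com
git config user.name ci
echo a > changed; echo b > stable
git add .
git commit -qm base
printf '1000000000 %s\n' "$(git rev-parse HEAD)" > .kbuild-meta  # untracked
echo c >> changed
git commit -qam update
restore_timestamps .kbuild-meta
stat -c %Y stable   # untouched files keep the previous build's stamp
```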

Second, `actions/cache` imposes several restrictions on cached
artifacts: only artifacts produced on certain branches can actually be
used (to achieve cache isolation) [1], there is a 10 GiB cache-artifact
limit per repository [2], and the API is questionable.
To circumvent the isolation-imposed requirements, we use the fact that
we are maintaining a base branch against which pull requests are created
to our advantage. Specifically, once this branch is updated (i.e.,
pushed to) we trigger a CI run which builds the kernel (already
potentially incrementally) and then stores intermediate build artifacts.
Pull requests created against this branch (i.e., every one created by
Kernel Patches Daemon) will be able to use those artifacts.
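Under this scheme, a cache entry only needs to be distinguished by arch and toolchain; a hypothetical key could look like:

```shell
#!/bin/sh
# Hypothetical cache-key helper (not the actual workflow's key format):
# one cache entry per arch/toolchain combo, refreshed on base-branch pushes.
set -eu

cache_key() {
  # $1: arch, $2: toolchain
  printf 'kbuild-%s-%s\n' "$1" "$2"
}

cache_key x86_64 gcc   # prints kbuild-x86_64-gcc
```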

Doing so indirectly also addresses the space constraints: the packed
build artifacts are <1 GiB in size. Because we only need to keep one set
of artifacts for each arch and toolchain combo (currently: three) we
don't need to worry about thrashing (as a side note, these limitations
don't seem to be enforced, certainly not rigorously or in a timely
manner; I've had >34 GiB of build artifacts cached for several hours
without an eviction happening).

Because of the questionable API surface, we have to jump through hoops
in order to prevent cache creation for regular pull requests: we remove
the KBUILD_OUTPUT contents in a separate step, which causes no cache to
be created.
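That suppression step might look roughly like this (the exact workflow step layout is not shown; the demo uses a temporary directory as a stand-in):

```shell
#!/bin/sh
# Sketch: emptying KBUILD_OUTPUT before actions/cache's save step runs
# leaves nothing to upload, so no cache entry is created for regular PRs.
set -eu

KBUILD_OUTPUT=$(mktemp -d)
touch "${KBUILD_OUTPUT}/vmlinux" "${KBUILD_OUTPUT}/.config"  # fake artifacts

# ${VAR:?} guards against an unset variable turning this into `rm -rf /*`.
rm -rf "${KBUILD_OUTPUT:?}"/* "${KBUILD_OUTPUT:?}"/.[!.]*

ls -A "${KBUILD_OUTPUT}"   # empty: nothing left for the cache step
```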

As a result of this introduced infrastructure, we see a significant
decrease in the duration of the "Build Kernel Image" step, which is part
of every CI run (happening three times): on GitHub-hosted runners we get
down from somewhere in the vicinity of 20 minutes to <2 minutes. Note
that these results depend on what changes the patches in a series make:
if a patch modifies a header included by each .c file then the result
will be close to a full rebuild irrespective of this infrastructure.
Conversely, if a patch only updates documentation the incremental
rebuild can be close to instant. In practice, based on a weekend's worth
of observations, we seem to end up closer to "doing almost nothing" than
to doing a full rebuild almost all the time.

Note that incremental builds are not bulletproof and to a somewhat
large degree we are at the mercy of them working correctly. However,
when we encounter a build failure we first attempt a 'make clean'
followed by a full rebuild [4]. This is expected to solve build-related
problems in all conceivable cases. There is, however, the chance of
miscompilation or similar issues introduced as part of buggy incremental
recompilation logic. A brief survey within Meta's kernel group suggests
that generally developers rely on incremental builds and anecdotal
problems were always in the "build fails" realm. #dogfood
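The clean-and-retry fallback can be sketched generically with stand-in commands (the real logic wraps the kernel `make` invocation and is referenced as [4] below):

```shell
#!/bin/sh
# Sketch of the fallback: try the (incremental) build; on failure, clean
# and retry once from scratch. Stand-in commands are used for the demo.
set -eu

with_clean_fallback() {
  # $1: build command, $2: clean command (passed as strings for brevity)
  sh -c "$1" || { sh -c "$2"; sh -c "$1"; }
}

# Demo: a stale marker file makes the first attempt fail; "cleaning"
# removes it and the retry succeeds.
marker=$(mktemp)
with_clean_fallback "[ ! -e $marker ]" "rm -f $marker"
echo "build ok"
```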

Note furthermore that selftest and sample builds are not covered by the
incremental build machinery. The main reason is that neither of them
honors KBUILD_OUTPUT -- a deficiency known to upstream Linux folks but
not yet addressed. We could work around that conceptually, but the
payoff is smaller for these steps and an upstream fix could get us there
without added complexity in the CI itself.

Lastly, should CI, despite all efforts to the contrary, get stuck
because a cache somehow contains corrupted data causing us to fail
builds, the GitHub UI allows for the deletion of cache artifacts [3]. If
no caches are present, this feature is basically a no-op from an initial
build perspective (i.e., a full build will be performed).

[0] https://github.com/actions/cache
[1] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
[2] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
[3] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#deleting-cache-entries
[4] libbpf/ci#73

Signed-off-by: Daniel Müller <deso@posteo.net>
danielocfb pushed a commit to danielocfb/vmtest that referenced this pull request Nov 9, 2022
This change introduces the means for performing incremental kernel
builds in CI in order to decrease overall CI turn-around times. We piggy
back (and rely) on the kernel's own make-based build infrastructure for
that purpose. Specifically, with libbpf/ci#67 we
have set the stage for separating build artifacts from source code. With
the change at hand we use this capability in conjunction with the
`actions/cache` [0] GitHub Actions action to share intermediate build
artifacts between CI runs, enabling the kernel build to only rebuild
what has changed, as opposed to everything unconditionally.

There are several wrinkles to doing so. The first one is that GitHub
Actions checks out the source code anew every time a workflow is run.
Because git does not store file metadata such as time stamps, while the
kernel's build system relies solely on last-modified time stamps, we
need a way to approximate the source code time stamps from the previous
build. We do that by first remembering both the time stamp and the SHA-1
at which we built and carrying that over to subsequent builds. These
subsequent builds then make sure to check out said SHA-1, adjust the
last-modified time stamps of the entire tree, and then check out the
code as it is meant to be tested by the very workflow run in question.
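The time-stamp juggling just described can be sketched roughly as
follows (a sketch, not the actual CI code; `PREV_SHA`, `PREV_TS`, and
the helper name are illustrative assumptions):

```shell
#!/bin/sh
# Sketch of the time-stamp restoration described above. PREV_SHA and
# PREV_TS stand for the commit and time stamp remembered from the
# previous build; both names are illustrative, not the actual CI's.
set -eu

restore_timestamps() {
    dir=$1     # root of the tree to adjust
    stamp=$2   # time stamp in a format GNU `touch -d` accepts
    # Reset every file's mtime so that, relative to the cached build
    # artifacts, make considers the whole tree unchanged.
    find "$dir" -type f -exec touch -d "$stamp" {} +
}

# Full flow (illustrative):
#   git checkout "$PREV_SHA"         # tree as it was when we last built
#   restore_timestamps . "$PREV_TS"  # pretend nothing changed since then
#   git checkout "$TEST_SHA"         # git rewrites only files that
#                                    # differ, so make rebuilds just those
```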

Second, `actions/cache` imposes several restrictions on cached
artifacts: only artifacts produced on certain branches can actually be
used (to achieve cache isolation) [1], there is a 10 GiB cache-artifact
limit per repository [2], and the API is questionable.
To circumvent the isolation-imposed requirements, we turn to our
advantage the fact that we maintain a base branch against which pull
requests are created. Specifically, once this branch is updated (i.e.,
pushed to) we trigger a CI run which builds the kernel (already
potentially incrementally) and then stores intermediate build artifacts.
Pull requests created against this branch (i.e., every one created by
the Kernel Patches Daemon) will be able to use those artifacts.

Doing so indirectly also addresses the space constraints: the packed
build artifacts are <1 GiB in size. Because we only need to keep one set
of artifacts for each arch and toolchain combo (currently: three) we
don't need to worry about thrashing (as a side note, these limitations
don't seem to be enforced, certainly not rigorously or in a timely
manner; I've had >34 GiB of build artifacts cached for several hours
without an eviction happening).

Because of the questionable API surface, we have to jump through hoops
in order to prevent cache creation for regular pull requests: we remove
the KBUILD_OUTPUT contents in a separate step, which causes no cache to
be created.
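That separate cleanup step can be sketched like this (a sketch; the
helper name is an assumption -- the point is simply that an empty
KBUILD_OUTPUT leaves `actions/cache` with nothing to store):

```shell
#!/bin/sh
# Sketch: empty the build output directory on regular pull-request runs
# so that the subsequent actions/cache step finds nothing to save.
set -eu

clear_build_output() {
    out=$1
    # Delete the contents but keep the directory itself, so later steps
    # that reference KBUILD_OUTPUT do not trip over a missing path.
    find "$out" -mindepth 1 -delete
}
```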

As a result of this introduced infrastructure, we see a significant
decrease in the duration of the "Build Kernel Image" step, which is part
of every CI run (happening three times): on GitHub-hosted runners we get
down from somewhere in the vicinity of 20 minutes to <2 minutes. Note
that these results depend on what changes the patches in a series make:
if a patch modifies a header included by each .c file then the result
will be close to a full rebuild irrespective of this infrastructure.
Conversely, if a patch only updates documentation the incremental
rebuild can be close to instant. In practice, based on a weekend's worth
of observations, we seem to end up much closer to "doing almost nothing"
than to doing a full rebuild almost all the time.

Note that incremental builds are not bulletproof, and to a fairly large
degree we are at the mercy of them working correctly. However, when we
encounter a build failure we first attempt a 'make clean' followed by a
full rebuild [4]. This is expected to solve build-related problems in
all conceivable cases. There is, however, the chance of miscompilation
or similar issues introduced as part of buggy incremental recompilation
logic. A brief survey within Meta's kernel group suggests that
developers generally rely on incremental builds and anecdotal problems
were always in the "build fails" realm. #dogfood
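The clean-rebuild fallback can be sketched as follows (a sketch under
the assumption that the build is driven through make with an
out-of-tree output directory; the helper name is illustrative):

```shell
#!/bin/sh
# Sketch of the fallback: try an incremental build first; only if it
# fails, run `make clean` and do a full rebuild from scratch.
set -eu

build_with_fallback() {
    src=$1   # source tree containing the makefiles
    out=$2   # build output directory
    if ! make -C "$src" O="$out" -j"$(nproc)"; then
        echo "incremental build failed; retrying from clean" >&2
        make -C "$src" O="$out" clean
        make -C "$src" O="$out" -j"$(nproc)"
    fi
}
```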

Note furthermore that selftest and sample builds are not covered by the
incremental build machinery. The main reason is that neither of them
honors KBUILD_OUTPUT -- a deficiency known to upstream Linux folks but
not yet addressed. We could work around that conceptually, but the
payoff is smaller for these steps, and an upstream fix could get us
there without adding complexity to the CI itself.

Lastly, it is important to note that, should CI, despite all efforts to
the contrary, get stuck because a cache somehow contains corrupted data
that causes builds to fail, the GitHub UI allows for the deletion of
cache artifacts [3]. If no caches are present, this feature is basically
a no-op from an initial-build perspective (i.e., a full build will be
performed).

[0] https://github.com/actions/cache
[1] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
[2] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
[3] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#deleting-cache-entries
[4] libbpf/ci#73

Signed-off-by: Daniel Müller <deso@posteo.net>
chantra pushed a commit to chantra/kernel-patches-vmtest that referenced this pull request Nov 9, 2022
As of libbpf/ci#67 a bunch of actions honor
KBUILD_OUTPUT. Doing so will make it possible to separate source code
from build artifacts, which in turn may allow us to support incremental
kernel compilation in CI down the line.
Irrespective of these future changes, actions pertaining to the kernel
build now ask for an additional input defining where to store or expect
build artifacts. Provide it.

Signed-off-by: Daniel Müller <deso@posteo.net>
danielocfb pushed a commit to danielocfb/vmtest that referenced this pull request Nov 9, 2022
This change introduces the means for performing incremental kernel
builds in CI in order to decrease overall CI turn-around times. We
piggyback on (and rely on) the kernel's own make-based build
infrastructure for
that purpose. Specifically, with libbpf/ci#67 we
have set the stage for separating build artifacts from source code. With
the change at hand we use this capability in conjunction with the
`actions/cache` [0] GitHub Actions action to share intermediate build
artifacts between CI runs, enabling the kernel build to only rebuild
what has changed, as opposed to everything unconditionally.
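The source/artifact separation this relies on can be sketched as follows; the paths and the commented-out `make` invocations are illustrative assumptions, not the exact commands the CI runs:

```shell
#!/bin/sh
# Sketch: keep kernel sources and build artifacts in distinct directories
# (paths are illustrative; the real CI uses its own layout).
src=$(mktemp -d)/linux     # source checkout, recreated on every workflow run
out=$(mktemp -d)/kbuild    # build artifacts -- this is what actions/cache saves
mkdir -p "$src" "$out"
export KBUILD_OUTPUT="$out"

# A real step would then build out-of-tree, e.g. (not run here):
#   make -C "$src" O="$KBUILD_OUTPUT" olddefconfig
#   make -C "$src" O="$KBUILD_OUTPUT" -j"$(nproc)" all
# and cache only "$KBUILD_OUTPUT" between runs, discarding "$src".
echo "sources:   $src"
echo "artifacts: $KBUILD_OUTPUT"
```

Only the artifact directory needs to survive between runs; the sources come back for free via the checkout action.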

There are several wrinkles to doing so. The first one is that GitHub
Actions checks out the source code anew every time a workflow is run.
Because git does not store file metadata such as time stamps, while the
kernel's build system relies solely on last-modified time stamps, we
need a way to approximate the source code time stamps from the previous
build. We do that by remembering both the time stamp and the SHA-1 at
which we built and carrying them over to subsequent builds. These
subsequent builds then make sure to check out said SHA-1,
adjust the last-modified time stamps of the entire tree, and then check
out the code as it is meant to be tested by the very workflow run in
question.
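The dance described above can be sketched with plain git and GNU `touch`; a throwaway repository stands in for the kernel tree, and `cached_sha`/`cached_ts` stand in for the values the CI would actually carry over from the previous run's cache:

```shell
#!/bin/sh
# Sketch of the time-stamp restoration between CI runs (all names and the
# repository itself are stand-ins, not the real CI code).
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q .
echo v1 > file.c
git add file.c
git -c user.email=ci@example.com -c user.name=ci commit -q -m v1
cached_sha=$(git rev-parse HEAD)      # SHA-1 remembered from the last build
cached_ts='2022-11-01 00:00:00'       # time stamp remembered from the last build
echo v2 > file.c
git add file.c
git -c user.email=ci@example.com -c user.name=ci commit -q -m v2
head_sha=$(git rev-parse HEAD)        # the code this workflow run should test

git checkout -q "$cached_sha"         # 1) tree as of the cached build
# 2) pretend everything was last modified when the cache was populated
find . -path ./.git -prune -o -type f -exec touch -d "$cached_ts" {} +
git checkout -q "$head_sha"           # 3) only changed files get fresh stamps
```

After the final checkout, only files that differ from `cached_sha` carry new time stamps, so make rebuilds just those.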

Second, `actions/cache` imposes several restrictions on cached
artifacts: only artifacts produced on certain branches can actually be
used (to achieve cache isolation) [1], there is a 10 GiB cache-artifact
limit per repository [2], and the API is questionable.
To circumvent the isolation-imposed requirements, we use the fact that
we are maintaining a base branch against which pull requests are created
to our advantage. Specifically, once this branch is updated (i.e.,
pushed to) we trigger a CI run which builds the kernel (already
potentially incrementally) and then stores intermediate build artifacts.
Pull requests created against this branch (i.e., every one created by
Kernel Patches Daemon) will be able to use those artifacts.

Doing so indirectly also addresses the space constraints: the packed
build artifacts are <1GiB in size. Because we only need to keep one set
of artifacts for each arch and toolchain combo (currently: three) we
don't need to worry about thrashing (as a side note, these limitations
don't seem to be enforced, certainly not rigorously or in a timely
manner; I've had >34GiB of build artifacts cached for several hours
without an eviction happening).

Because of the questionable API surface, we have to jump through hoops
in order to prevent cache creation for regular pull requests: we remove
the KBUILD_OUTPUT contents in a separate step, which causes no cache to
be created.
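A minimal sketch of that separate cleanup step (the directory contents and the pull-request check are illustrative): since the cache action's post step uploads whatever is on disk when the job ends, emptying the directory for regular pull requests leaves nothing to store:

```shell
#!/bin/sh
# Sketch: empty KBUILD_OUTPUT late in the job so actions/cache's post step
# has nothing to upload (directory layout and the PR check are illustrative).
KBUILD_OUTPUT=$(mktemp -d)
touch "$KBUILD_OUTPUT/vmlinux" "$KBUILD_OUTPUT/.config"  # artifact stand-ins

is_pull_request=true   # a real workflow would derive this from the event type
if [ "$is_pull_request" = true ]; then
    # ':?' guards against an unset variable expanding into 'rm -rf /*'
    rm -rf "${KBUILD_OUTPUT:?}"/* "${KBUILD_OUTPUT:?}"/.[!.]* 2>/dev/null || true
fi
```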

As a result of this introduced infrastructure, we see a significant
decrease of the "Build Kernel Image" step, which is part of every CI run
(happening three times): on GitHub-hosted runners we get down from
somewhere in the vicinity of 20 minutes to under 2 minutes. Note that these
results depend on what changes the patches in a series make: if a patch
modifies a header included by each .c file then the result will be close
to a full rebuild irrespective of this infrastructure. Conversely, if a
patch only updates documentation the incremental rebuild can be close to
instant. In practice, based on a weekend's worth of observations, we
seem to end up closer to "doing almost nothing" most of the time than
to doing a full rebuild.

Note that incremental builds are not bulletproof and to a somewhat
large degree we are at the mercy of them working correctly. However,
when we encounter a build failure we first attempt a 'make clean'
followed by a full rebuild [4]. This is expected to solve build-related
problems in all conceivable cases. There is, however, the chance of
miscompilation or similar issues introduced as part of buggy incremental
recompilation logic. A brief survey within Meta's kernel group suggests
that generally developers rely on incremental builds and anecdotal
problems were always in the "build fails" realm. #dogfood
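The fall-back from [4] amounts to a retry wrapper along these lines; the `kbuild` function is a stand-in for the real build invocation and is scripted here to fail once, the way a stale incremental state might:

```shell
#!/bin/sh
# Sketch of the 'make clean'-and-retry fall-back; `kbuild` stands in for the
# real `make -C "$src" O="$KBUILD_OUTPUT" ...` call and fails on first use.
attempts=0
kbuild() {
    attempts=$((attempts + 1))
    [ "$attempts" -gt 1 ]   # simulate: first build fails, clean rebuild works
}

if ! kbuild; then
    # incremental state is suspect: drop it, then do one full rebuild
    : 'a real script would run make ... clean here'
    kbuild
fi
```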

Note furthermore that selftest and sample builds are not covered by the
incremental build machinery. The main reason for that is that neither of
them honors KBUILD_OUTPUT -- a deficiency known to upstream Linux folks
but not yet addressed. We could work around that in principle, but the
payoff is smaller for these steps, and an upstream fix would get us
there without added complexity in the CI itself.

Lastly, it is important to note that, should CI, despite all efforts to
the contrary, get stuck because a cache somehow contains corrupted data
causing us to fail builds, the GitHub UI allows for the deletion of
cache artifacts [3]. If no caches are present, this feature is basically a
no-op from an initial build perspective (i.e., a full build will be
performed).

[0] https://github.com/actions/cache
[1] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
[2] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy
[3] https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#deleting-cache-entries
[4] libbpf/ci#73

Signed-off-by: Daniel Müller <deso@posteo.net>
danielocfb pushed a commit to kernel-patches/vmtest that referenced this pull request Nov 10, 2022