new(proposals): driver kernel testing framework #1131

incertum · 2023-06-01T00:09:56Z

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap-engine-udig

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Formalize a driver kernel testing framework.
@falcosecurity/libs-maintainers

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

incertum · 2023-06-01T00:10:43Z

/milestone 0.12.0

Andreagit97 · 2023-06-01T16:26:00Z

proposals/20230530-driver-kernel-testing-framework.md

+
+*Architectures*
+
+Place higher priority on testing for `x86_64` compared to `aarch64`.


Hmm, aarch64 is becoming more common every day, so I'm not sure we really want a preferred architecture. Moreover, we are also trying to support the s390x architecture; the only issue is that we don't have runners :/

Fair point, can tweak.

Perhaps another good discussion point: s390x is supported by libs but not officially by Falco. Maybe I can say that because of this initially we prioritize x86 and aarch64 tests, subject to change in future iterations of the test framework ...

@hbrueckner would you have ideas on how we can test s390x specifically for libs better? And ideally cut down the durations of s390x libs builds as well, they take a fairly long time atm. Thank you!

My recommendation is:

Architectures officially supported by Falco should be prioritized by their adoption level (i.e., x86_64 is P0, aarch64 is P1).

Any other architecture supported by libs is best effort only, at low priority (i.e., s390x is nice-to-have).

Love it, this way we clearly articulate priorities.
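
A minimal sketch of how the priorities agreed in this thread could be written down for a test scheduler. The names `Priority`, `ARCH_PRIORITY`, and `order_by_priority` are hypothetical and not part of libs or the proposal; treating unknown architectures as nice-to-have is also an assumption.

```python
from enum import IntEnum


class Priority(IntEnum):
    P0 = 0            # highest priority
    P1 = 1
    NICE_TO_HAVE = 2  # best effort only


# Priorities as stated in the review thread; the mapping itself is illustrative.
ARCH_PRIORITY = {
    "x86_64": Priority.P0,
    "aarch64": Priority.P1,
    "s390x": Priority.NICE_TO_HAVE,
}


def order_by_priority(archs):
    """Return architectures sorted so that higher-priority ones come first.

    Architectures missing from ARCH_PRIORITY are treated as nice-to-have
    (an assumption made for this sketch).
    """
    return sorted(
        archs,
        key=lambda arch: (ARCH_PRIORITY.get(arch, Priority.NICE_TO_HAVE), arch),
    )
```

For example, `order_by_priority(["s390x", "aarch64", "x86_64"])` puts `x86_64` first and `s390x` last.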

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

Co-authored-by: Leonardo Grasso <me@leonardograsso.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

maxgio92 · 2023-06-13T18:53:01Z

proposals/20230530-driver-kernel-testing-framework.md

+
+## Outlook
+
+The following possibilities serve as an outlook for future enhancements. These potential improvements are anticipated after the release of Falco 0.36.


Just as a possible future improvement, we could think about having the kernels grid dynamic.

ACK, I'll work this in with the next cleanup commit as we are collecting feedback.

@maxgio92 I have worked your suggestions in.

Thank you :)

maxgio92 · 2023-06-13T19:05:24Z

proposals/20230530-driver-kernel-testing-framework.md

+
+Select the most appropriate compiler version and build container for the CI-integrated tests. Apart from the compiler version, the GLIBC version in the build container can also have an impact on the ability to compile the driver for a given kernel.
+
+> The expanded CI tests may necessitate the use of approximately 30 low-resource virtual machines (VMs) that run continuously 24/7. These VMs would be distributed across multiple third-party cloud providers. To adequately cover the condensed kernel test grid, it is estimated that up to 70 test runs would be required for each testing cycle. These tests can be launched using GitHub workflows leveraging SSH remote commands. The test results are then retrieved through this method as well. Initially, it would be logical to support these tests on demand only to avoid simultaneous runs that may try to access the same VM at the same time. In addition to the test VMs, it may be necessary to expand the CI workflows in terms of builder containers.


We could also consider spinning up VMs with KVM on hosted GitHub runners, or with something like actuated (on microVMs).
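
A rough sketch of the constraint described in the quoted paragraph: runs are launched over SSH, and two runs must never hit the same VM at the same time. It assumes key-based SSH access is already configured. `run_grid` and `ssh_execute` are hypothetical helpers, not part of the proposal or of any existing CI workflow.

```python
import subprocess
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def ssh_execute(vm, remote_command):
    """Run a command on a VM through the ssh CLI and return its exit code.

    The VM host name and the remote command are supplied by the caller;
    nothing here is specific to the Falco test VMs.
    """
    return subprocess.run(["ssh", vm, remote_command], check=False).returncode


def run_grid(test_runs, execute=ssh_execute, max_workers=8):
    """Execute (vm, command) pairs without overlapping runs on one VM.

    Runs that target the same VM are executed one after another;
    different VMs are processed in parallel.
    """
    per_vm = defaultdict(list)
    for vm, command in test_runs:
        per_vm[vm].append(command)

    def drain(vm):
        # Serial loop: one VM never sees two commands at the same time.
        return [(vm, command, execute(vm, command)) for command in per_vm[vm]]

    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for vm_results in pool.map(drain, per_vm):
            results.extend(vm_results)
    return results
```

Grouping by VM keeps the scheduling rule simple. An on-demand or nightly trigger would still be needed so that two workflow invocations do not overlap with each other.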

maxgio92 · 2023-06-13T19:08:32Z

Love this proposal, @incertum, thank you for this great work!

TheFoxAtWork · 2023-06-14T14:24:03Z

Love this! Thank you so much for pulling this together!

Co-authored-by: Massimiliano Giovagnoli <me@maxgio.it> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

incertum · 2023-06-14T17:14:49Z

> Love this! Thank you so much for pulling this together!

Thank you @TheFoxAtWork!

incertum · 2023-06-14T17:16:55Z

@falcosecurity/libs-maintainers I have moved this PR out of WIP, it is now ready for final review. Thanks in advance!

FedeDP

Left some comments; LGTM, thank you @incertum ! Great work!

A minor question: are we talking about adding nightly tests? I think we do not want to run these tests on every PR/master change.
Perhaps weekly tests can be enough too.
Thank you again btw!

FedeDP · 2023-06-15T09:41:22Z

proposals/20230530-driver-kernel-testing-framework.md

+
+First, let's clarify a few definitions and provide further context.
+
+- `kernel versions`: In the context of the testing framework, kernel versions refer to changes in the major and minor version of the kernel (e.g., 5.15 or 6.4). These version changes are specifically relevant for testing the Falco drivers, with a particular emphasis on testing with Long-Term Support (LTS) releases.


I'd use `kernel releases` here; that's what we use elsewhere, and what test-infra dbg, driverkit and kernel-crawler use.

Tried to give it a spin, WDYT?
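
To make the definition discussed in this thread concrete, here is a small sketch that reduces a full kernel release string to its major and minor numbers, the granularity the proposal cares about. The helper names and the sample release strings are illustrative assumptions, not taken from the proposal.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Leading "<major>.<minor>" of a release string such as "5.15.0-76-generic".
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)")


def major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a kernel release string."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def group_by_major_minor(releases: Iterable[str]) -> Dict[Tuple[int, int], List[str]]:
    """Group full release strings under their (major, minor) key."""
    groups: Dict[Tuple[int, int], List[str]] = {}
    for release in releases:
        groups.setdefault(major_minor(release), []).append(release)
    return groups
```

Two distro kernels such as `5.15.0-76-generic` and `5.15.117` would land in the same `(5, 15)` group, which matches the idea that the test grid is organized by major and minor number rather than by patch level.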

FedeDP · 2023-06-15T09:42:11Z

proposals/20230530-driver-kernel-testing-framework.md

+<details>
+    <summary>Build kernel drivers</summary>
+		<ul>
+			<li>Latest Archlinux kernel to spot possible incompatibilities with the latest kernel tree changes</li>


We also nightly-test the build against latest mainline kernel now: https://github.com/falcosecurity/libs/blob/master/.github/workflows/latest-kernel.yml

:) adjusted and slick btw!

FedeDP · 2023-06-15T09:42:39Z

proposals/20230530-driver-kernel-testing-framework.md

+			<li>linux-3.10</li>
+			<li>linux-4.18</li>
+			<li>linux-5.19</li>
+			<li>linux-6.2</li>


All of these are on x86_64.
I am looking into adding the same tests for arm64 too.

Co-authored-by: Federico Di Pierro <nierro92@gmail.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

incertum · 2023-06-15T15:33:07Z

> A minor question: are we talking about adding nightly tests? I think we do not want to run these tests on every PR/master change. Perhaps weekly tests can be enough too. Thank you again btw!

Where I mentioned that it would initially be "on demand", I added "(such as nightly tests)".

FedeDP · 2023-06-15T15:37:48Z

Thanks for the clarification about the on demand term!

incertum · 2023-06-20T20:30:48Z

What needs to happen for us to move forward with this?

incertum · 2023-06-21T18:32:59Z

> What needs to happen for us to move forward with this?

@TheFoxAtWork as a first step as suggested we created a new dedicated CNCF Service Desk ticket https://cncfservicedesk.atlassian.net/servicedesk/customer/portal/1/CNCFSD-1822. Thank you!

leogr

@incertum Thank you so much for this amazing proposal. 🙏

LGTM, just left a minor suggestion.

I'll give it a second look early next week then I will give my final approval.

proposals/20230530-driver-kernel-testing-framework.md

Co-authored-by: Leonardo Grasso <me@leonardograsso.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

leogr

LGTM 👍

poiana · 2023-06-27T09:14:44Z

LGTM label has been added.

Git tree hash: d98ddee4f8accbc223b8c4c0b7eaef6d92b873e9

Andreagit97 · 2023-06-29T09:30:58Z

proposals/20230530-driver-kernel-testing-framework.md

+
+### Test Category 2
+
+Verifying that the kernel driver can load, run, and capture events without errors. This is determined through [scap-open](https://github.com/falcosecurity/libs/tree/master/userspace/libscap/examples/01-open) and unit tests conducted in virtual machine (VM) environments. In essence, when we mention that the "driver loads and runs", it implies that the scap-open counter for captured events during a test run is positive and that the [drivers_test](https://github.com/falcosecurity/libs/tree/master/test/drivers) unit tests pass. The latter tests not only load the driver live but also simulate syscall events and verify that the expected information is extracted from the kernel tracepoint and retrieved by the libscap driver type-specific engine in userspace.


> run is positive and that the drivers_test unit tests

Just one minor note about this. We know that on older kernel versions (roughly < 5.0) drivers_test could fail due to conflicting events being captured. Let me explain better: it is possible that while we search for a close event spawned by our test we face another clone event generated by the system, causing a test failure. As far as I saw, this is very frequent when we use tracepoints instead of raw_tracepoints in the old probe, so on kernels between 4.14 and 4.17. So we probably won't be able to rely on drivers_test as a source of truth on all machines. We will try to fix drivers_test to avoid these failures, but in the meantime I would consider scap-open the only source of truth and use drivers_test as a best-effort check. WDYT?

Thanks @Andreagit97, I agree: drivers_test should be best effort at first, and we can stabilize it subsequently.
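
The outcome of this thread can be sketched as a small decision rule: the scap-open event counter decides pass or fail, while a drivers_test failure is only reported. `DriverRunResult` and `evaluate` are hypothetical names, and how the counter and the unit test status are collected from a VM is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DriverRunResult:
    # Number of captured events reported by scap-open for the test run.
    scap_open_events: int
    # Outcome of the drivers_test unit tests; None if they were not run.
    drivers_test_passed: Optional[bool] = None


def evaluate(result: DriverRunResult) -> Tuple[bool, List[str]]:
    """Return (passed, warnings) for one driver/kernel combination.

    Only a positive scap-open counter decides the verdict; a failing
    drivers_test run is surfaced as a warning (best-effort check).
    """
    passed = result.scap_open_events > 0
    warnings: List[str] = []
    if result.drivers_test_passed is False:
        warnings.append("drivers_test failed (best effort, not blocking)")
    return passed, warnings
```

Keeping the unit test outcome as a warning means flaky runs on older kernels stay visible without turning the whole grid red.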

Andreagit97

/approve

I left just one comment on the drivers_test part. Please unhold if you are fine with this :)
Thank you for the amazing job!

/hold

poiana · 2023-06-29T09:33:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, incertum, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Andreagit97,incertum,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

incertum · 2023-06-29T15:02:46Z

/unhold

new(proposals): driver kernel testing framework

896e3ce

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

poiana added do-not-merge/work-in-progress kind/design release-note-none dco-signoff: yes area/proposals labels Jun 1, 2023

poiana requested review from Andreagit97 and hbrueckner June 1, 2023 00:10

poiana added approved size/L labels Jun 1, 2023

poiana added this to the 0.12.0 milestone Jun 1, 2023

Andreagit97 reviewed Jun 1, 2023

cleanup(proposals): add example kernel test grid

ce88b2d

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

Andreagit97 modified the milestones: 0.12.0, libs-backlog Jun 7, 2023

poiana added size/XL and removed size/L labels Jun 8, 2023

cleanup(proposals): formatting and editing

78992e3

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

incertum force-pushed the proposal-driver-kernel-testing-framework branch from e06e6b3 to 78992e3 Compare June 8, 2023 04:04

cleanup(proposals): tweak supported architectures section

014fb89

Co-authored-by: Leonardo Grasso <me@leonardograsso.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

Andreagit97 modified the milestones: libs-backlog, next-driver Jun 12, 2023

maxgio92 reviewed Jun 13, 2023

cleanup(proposals): minor enhancements

89ce1ad

Co-authored-by: Massimiliano Giovagnoli <me@maxgio.it> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

incertum changed the title ~~wip: new(proposals): driver kernel testing framework~~ new(proposals): driver kernel testing framework Jun 14, 2023

poiana removed the do-not-merge/work-in-progress label Jun 14, 2023

FedeDP reviewed Jun 15, 2023

cleanup(proposals): minor technical clarifications

118c70d

Co-authored-by: Federico Di Pierro <nierro92@gmail.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

incertum mentioned this pull request Jun 20, 2023

[UMBRELLA] Falco collaboration with CNCF tag-env-sustainability falcosecurity/falco#2435

Open

leogr reviewed Jun 23, 2023

proposals/20230530-driver-kernel-testing-framework.md (outdated)

cleanup(proposals): adjust references

de4df0f

Co-authored-by: Leonardo Grasso <me@leonardograsso.com> Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

leogr approved these changes Jun 27, 2023

poiana assigned leogr Jun 27, 2023

poiana added the lgtm label Jun 27, 2023

Andreagit97 reviewed Jun 29, 2023

Andreagit97 approved these changes Jun 29, 2023

poiana added the do-not-merge/hold label Jun 29, 2023

poiana assigned Andreagit97 Jun 29, 2023

poiana removed the do-not-merge/hold label Jun 29, 2023

poiana merged commit 3da143a into falcosecurity:master Jun 29, 2023

leogr modified the milestones: next-driver, 0.12.0 Jul 5, 2023

leogr mentioned this pull request Jul 5, 2023

Falco Graduation Path: Progress Tracker falcosecurity/evolution#281

Closed

incertum deleted the proposal-driver-kernel-testing-framework branch December 8, 2023 20:39
