Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new(proposals): driver kernel testing framework #1131

Merged

Conversation

incertum
Copy link
Contributor

@incertum incertum commented Jun 1, 2023

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap-engine-udig

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Formalize a driver kernel testing framework.
@falcosecurity/libs-maintainers

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@incertum
Copy link
Contributor Author

incertum commented Jun 1, 2023

/milestone 0.12.0

@poiana poiana added this to the 0.12.0 milestone Jun 1, 2023

*Architectures*

Place higher priority on testing for `x86_64` compared to `aarch64`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhm aarch64 is every day more common so not sure we really want to have a preferred architecture...moreover we are also trying to support s390x architecture, the only issue is that we don't have runners :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, can tweak.

Perhaps another good discussion point: s390x is supported by libs but not officially by Falco. Maybe I can say that because of this initially we prioritize x86 and aarch64 tests, subject to change in future iterations of the test framework ...

@hbrueckner would you have ideas on how we can test s390x specifically for libs better? And ideally cut down the durations of s390x libs builds as well, they take a fairly long time atm. Thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recommendation is:

  • Falco officially supported architectures should be prioritized by their adoption level (ie. x86_64 is P0, aarch64 is P1)
  • just best effort (low priority) for any other arch supported by libs (ie. s390x is Nice-To-Have).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it, this way we clearly articulate priorities.

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@Andreagit97 Andreagit97 modified the milestones: 0.12.0, libs-backlog Jun 7, 2023
@poiana poiana added size/XL and removed size/L labels Jun 8, 2023
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@incertum incertum force-pushed the proposal-driver-kernel-testing-framework branch from e06e6b3 to 78992e3 Compare June 8, 2023 04:04
Co-authored-by: Leonardo Grasso <me@leonardograsso.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@Andreagit97 Andreagit97 modified the milestones: libs-backlog, next-driver Jun 12, 2023

## Outlook

The following possibilities serve as an outlook for future enhancements. These potential improvements are anticipated after the release of Falco 0.36.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a possible future improvement, we could think about having the kernels grid dynamic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, I'll work this in with the next cleanup commit as we are collecting feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxgio92 I have worked your suggestions in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you :)


Select the most appropriate compiler version and build container for the CI-integrated tests. Apart from the compiler version, the GLIBC version in the build container can also have an impact on the ability to compile the driver for a given kernel.

> The expanded CI tests may necessitate the use of approximately 30 low-resource virtual machines (VMs) that run continuously 24/7. These VMs would be distributed across multiple third-party cloud providers. To adequately cover the condensed kernel test grid, it is estimated that up to 70 test runs would be required for each testing cycle. These tests can be launched using GitHub workflows leveraging SSH remote commands. The test results are then retrieved through this method as well. Initially, it would be logical to support these tests on demand only to avoid simultaneous runs that may try to access the same VM at the same time. In addition to the test VMs, it may be necessary to expand the CI workflows in terms of builder containers.
Copy link
Member

@maxgio92 maxgio92 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider also to spin up VMs with KVM on hosted GitHub runners, or with something like actuated (on microVMs).

@maxgio92
Copy link
Member

Love this proposal, @incertum, thank you for this great work!

@TheFoxAtWork
Copy link

Love this! Thank you so much for pulling this together!

Co-authored-by: Massimiliano Giovagnoli <me@maxgio.it>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@incertum
Copy link
Contributor Author

Love this! Thank you so much for pulling this together!

Thank you @TheFoxAtWork!

@incertum incertum changed the title wip: new(proposals): driver kernel testing framework new(proposals): driver kernel testing framework Jun 14, 2023
@incertum
Copy link
Contributor Author

@falcosecurity/libs-maintainers I have moved this PR out of WIP, it is now ready for final review. Thanks in advance!

Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments; LGTM, thank you @incertum ! Great work!

A minor question: are we talking about adding nightly tests? I think we do not want to run these tests on every PR/master change.
Perhaps weekly tests can be enough too.
Thank you again btw!


First, let's clarify a few definitions and provide further context.

- `kernel versions`: In the context of the testing framework, kernel versions refer to changes in the major and minor version of the kernel (e.g., 5.15 or 6.4). These version changes are specifically relevant for testing the Falco drivers, with a particular emphasis on testing with Long-Term Support (LTS) releases.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd use kernel releases here; that's what we use elsewhere, and what test-infra dbg, driverkit and kernel-crawler use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to give it a spin, WDYT?

<details>
<summary>Build kernel drivers</summary>
<ul>
<li>Latest Archlinux kernel to spot possible incompatibilities with the latest kernel tree changes</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also nightly-test the build against latest mainline kernel now: https://github.com/falcosecurity/libs/blob/master/.github/workflows/latest-kernel.yml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) adjusted and slick btw!

<li>linux-3.10</li>
<li>linux-4.18</li>
<li>linux-5.19</li>
<li>linux-6.2</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these are on x86_64.
I am looking into adding same tests for arm64 too.

Co-authored-by: Federico Di Pierro <nierro92@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@incertum
Copy link
Contributor Author

A minor question: are we talking about adding nightly tests? I think we do not want to run these tests on every PR/master change. Perhaps weekly tests can be enough too. Thank you again btw!

where I mentioned it to be "on demand" initially I added "(such as nightly tests)"

@FedeDP
Copy link
Contributor

FedeDP commented Jun 15, 2023

Thanks for the clarification about the on demand term!

@incertum
Copy link
Contributor Author

What needs to happen for us to move forward with this?

@incertum
Copy link
Contributor Author

What needs to happen for us to move forward with this?

@TheFoxAtWork as a first step as suggested we created a new dedicated CNCF Service Desk ticket https://cncfservicedesk.atlassian.net/servicedesk/customer/portal/1/CNCFSD-1822. Thank you!

Copy link
Member

@leogr leogr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@incertum Thank you so much for this amazing proposal. 🙏

LGTM, just left a minor suggestion.

I'll give it a second look early next week then I will give my final approval.

proposals/20230530-driver-kernel-testing-framework.md Outdated Show resolved Hide resolved
Co-authored-by: Leonardo Grasso <me@leonardograsso.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Copy link
Member

@leogr leogr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@poiana
Copy link
Contributor

poiana commented Jun 27, 2023

LGTM label has been added.

Git tree hash: d98ddee4f8accbc223b8c4c0b7eaef6d92b873e9


### Test Category 2

Verifying that the kernel driver can load, run, and capture events without errors. This is determined through [scap-open](https://github.com/falcosecurity/libs/tree/master/userspace/libscap/examples/01-open) and unit tests conducted in virtual machine (VM) environments. In essence, when we mention that the "driver loads and runs", it implies that the scap-open counter for captured events during a test run is positive and that the [drivers_test](https://github.com/falcosecurity/libs/tree/master/test/drivers) unit tests pass. The latter tests not only load the driver live but also simulate syscall events and verify that the expected information is extracted from the kernel tracepoint and retrieved by the libscap driver type-specific engine in userspace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run is positive and that the drivers_test unit tests

Just one minor note about this. We know that on older kernel versions ~<5.0 driver_tests could fail due to conflicting events captured. Let me explain better, is possible that while we search for a close event spawned by our test we face another clone event generated by the system causing a test failure (as far as i saw this is very frequent when we use tracepoints instead of raw_traceoints in the old probe, so in kernels between 4.14 and 4.17) ... so probably we won't be able to rely on the drivers_test on all machines as a source of truth. We will try to fix driver_tests to avoid these failures but in the meanwhile, I would consider scap-open as the only source of truth and use the drivers_tests as a best-effort check. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Andreagit97 I agree, driver_tests should be best effort at first and we can stabilize it subsequently.

Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

I left just one comment on the driver_tests part. Please unhold if you are fine with this :)
Thank you for the amazing job!

/hold

@poiana
Copy link
Contributor

poiana commented Jun 29, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, incertum, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,incertum,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@incertum
Copy link
Contributor Author

/unhold

@poiana poiana merged commit 3da143a into falcosecurity:master Jun 29, 2023
@leogr leogr modified the milestones: next-driver, 0.12.0 Jul 5, 2023
@incertum incertum deleted the proposal-driver-kernel-testing-framework branch December 8, 2023 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants