
refactor(cri): fast-track add containers to cache (synchronous lookup case, Falco run w/ --disable-cri-async) #1595

Merged
merged 4 commits into master from refactor/cri-sync-add-container-fasttrack on Feb 28, 2024

Conversation

@incertum (Contributor) commented Dec 20, 2023

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Fast-track addition of enriched containers (fields successfully retrieved from the container runtime socket) to the container cache, bypassing the round-trip process:

`source_callback` -> `notify_new_container` ->
`container_to_sinsp_event(container_to_json(container_info), ...)` ->
`parse_container_json_evt` -> `m_inspector->m_container_manager.add_container()`

Although we still re-add the container in `parse_container_json_evt` to keep supporting native 'container' events, the round trip introduces an avoidable delay in the incoming syscall event stream. Syscall events do not explicitly require container events; instead, they retrieve container details directly from the container cache. This delay could therefore contribute to the issues noted by adopters, such as the absence of container images in syscall events.
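
For illustration only, a minimal self-contained sketch of the fast-track idea, using stand-in types rather than the real libsinsp API (the real flow goes through the CRI engine and the container manager):

#include <map>
#include <memory>
#include <string>

struct container_info_stub              // stand-in for sinsp_container_info
{
	std::string m_id;
	std::string m_image;
};

struct container_cache_stub             // stand-in for the container manager cache
{
	std::map<std::string, std::shared_ptr<container_info_stub>> m_containers;

	void replace_container(const std::shared_ptr<container_info_stub>& info)
	{
		m_containers[info->m_id] = info;
	}
};

// Called once the synchronous CRI lookup has filled `result`.
void on_sync_lookup_done(container_cache_stub& cache, const container_info_stub& result)
{
	// Fast-track: the syscall event stream can resolve container fields immediately.
	cache.replace_container(std::make_shared<container_info_stub>(result));

	// The CONTAINER_JSON event is still produced afterwards (omitted here), so
	// parse_container_json_evt re-adds the container and native 'container'
	// events / trace files keep working.
}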

Part of #1589

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

* events and instead directly retrieve container details from the container cache. This behavior could potentially
* contribute to the issues noted by adopters, such as the absence of container images in syscall events.
*/
cache->add_container(std::make_shared<sinsp_container_info>(result), nullptr);
Contributor Author:

Maybe we should introduce a new function as this one also loops over callbacks. In general, I dislike how certain functions are overloaded in the container engine.

CC @gnosek ?

Contributor Author:

Using existing replace_container instead. This should be safe and do what we want.
@mstemm would you have additional info on the callback or the general flow here? Eager to understand subtleties better. Thank you!

Contributor:

I'm not sure why we even have a synchronous lookup option. That would surely lead to event drops, wouldn't it? That's why we made the lookups asynchronous in the first place.

If you're asking how the asynchronous lookup flow works, you got the general steps right:

  1. The container engine does an asynchronous lookup of container metadata.
  2. The container metadata is packed into a CONTAINER_JSON event.
  3. The event is enqueued to the inspector by the container engine thread.
  4. On the next call to sinsp::next() by the event processing thread, the enqueued event is read and parsed. Parsing the event adds the container info to libsinsp.

I think step 2 is probably more confusing than it needs to be.
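
A rough, self-contained sketch of these four steps (using a stand-in queue and plain JSON strings instead of the real sinsp event types; only the shape of the flow is meant to match):

#include <mutex>
#include <queue>
#include <string>

struct pending_container_event { std::string json; };      // stand-in for a CONTAINER_JSON event

static std::mutex g_queue_mtx;
static std::queue<pending_container_event> g_event_queue;  // engine thread -> inspector thread

// Steps 1-3: container engine thread.
void engine_thread_lookup(const std::string& container_id)
{
	// 1. Asynchronous metadata lookup against the runtime socket (stubbed here).
	std::string metadata_json = "{\"id\":\"" + container_id + "\"}";

	// 2 + 3. Pack the metadata into a CONTAINER_JSON-like event and enqueue it;
	// the queue exists because the engine thread must not touch libsinsp state directly.
	std::lock_guard<std::mutex> lk(g_queue_mtx);
	g_event_queue.push({metadata_json});
}

// Step 4: event-processing thread, inside the next()/parse loop.
void inspector_next_sketch()
{
	std::lock_guard<std::mutex> lk(g_queue_mtx);
	while(!g_event_queue.empty())
	{
		// In the real code this is where parse_container_json_evt adds the
		// container info to the libsinsp container cache.
		g_event_queue.pop();
	}
}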

Contributor Author:

The async lookup is not working well in production; you have many events with missing container enrichment. My follow-up question: isn't the synchronous mode still not truly synchronous? We do the CRI API calls and parsing synchronously, but then we don't add the container to the cache right away and still do the container event round trip before we add it to the container cache. WDYT?

Contributor:

I thought all the container engine lookups were done in a separate thread, using the mechanisms in https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/container_engine/container_async_source.tpp. The reason for creating the CONTAINER_JSON event and enqueuing it to the inspector is that the lookup runs in a separate thread and we can't directly access the libsinsp state, since there is no locking/mutexes at all for those data structures.

There may be bugs in the async lookup path, and I agree that the current code is unnecessarily confusing, but you definitely don't want to go back to synchronous lookups as a solution for that. Some container engines had a delay of up to 500ms before container info was available for a new container, and stalling the event loop was definitely causing dropped syscalls while the synchronous lookup was underway.

Contributor Author:

If no one else can or wants to, sure, I can help. In this case, since we're talking about the container engine, "master of disaster" seems more accurate.

I think, as a first step, we should start with storing the container before we convert it to JSON in any case. That way we don't break the trace files, and delays don't matter for the native container events anyway, but storing the container as fast as possible does matter a lot for the syscall event stream.

After that we can rework the more general flow.

Sounds reasonable?

Contributor:

Yeah, if the container cache is protected across threads it makes sense to add it immediately and then create the CONTAINER_JSON event afterward.

Contributor Author:

OK, one more question: what do the callbacks do, or what can they do? Was there a reason the callbacks should only be called from the enqueued container event?

Contributor:

I guess you mean the on_new_container callback? I believe it was a general mechanism for other parts of libsinsp to be notified when there is a new container. And for this, it's used to pass control back from the container engines to libsinsp (I think; I haven't looked at the code in a while).
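
For illustration, a minimal stand-alone sketch of that kind of notification mechanism (stand-in types only; the real on_new_container callback plumbing lives in the container manager):

#include <functional>
#include <string>
#include <vector>

struct new_container_info { std::string m_id; };   // stand-in for sinsp_container_info

using new_container_cb = std::function<void(const new_container_info&)>;

struct container_manager_sketch
{
	std::vector<new_container_cb> m_new_container_callbacks;

	// Other parts of the engine register here to learn about new containers.
	void subscribe_on_new_container(new_container_cb cb)
	{
		m_new_container_callbacks.push_back(std::move(cb));
	}

	// In the current flow this runs while parsing the enqueued container event;
	// the question above is whether it could be invoked earlier ("upfront").
	void notify_new_container(const new_container_info& info)
	{
		for(const auto& cb : m_new_container_callbacks)
		{
			cb(info);
		}
	}
};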

Contributor Author:

OK, sounds like there was no particular reason why they were invoked from the enqueued event, right? Hence we could invoke them upfront as well; I think other container engines do that. I will look into it.

@incertum (Contributor Author):

Maybe

/milestone 0.15.0

@poiana poiana added this to the 0.15.0 milestone Dec 20, 2023
@incertum incertum force-pushed the refactor/cri-sync-add-container-fasttrack branch 2 times, most recently from f685d0e to e6895e3 Compare December 21, 2023 20:15
@incertum incertum changed the title refactor(cri): fast-track add containers to cache (synchronous lookup case) wip: refactor(cri): fast-track add containers to cache (synchronous lookup case) Jan 8, 2024
@incertum (Contributor Author) commented Jan 8, 2024

I need to do another re-deploy with some preview patches from https://github.com/falcosecurity/libs/pull/1600/files#diff-8c61b5c69dd08557db645a44eed01006cb658f4099c0339262fa7ed5ff567e35R732 to get to the bottom of what causes missing container images.

In general, I will expand the scope of this PR a bit once I have better data around what the actual problem is, related to #291

@incertum incertum force-pushed the refactor/cri-sync-add-container-fasttrack branch from e6895e3 to ef937ec Compare January 23, 2024 04:54
@incertum incertum changed the title wip: refactor(cri): fast-track add containers to cache (synchronous lookup case) refactor(cri): fast-track add containers to cache (synchronous lookup case) Jan 23, 2024
userspace/libsinsp/container_engine/cri.cpp (review thread resolved)
userspace/libsinsp/container_engine/cri.cpp (review thread resolved)
@@ -5083,22 +5083,11 @@ void sinsp_parser::parse_container_json_evt(sinsp_evt *evt)
const Json::Value& lookup_state = container["lookup_state"];
if(check_json_val_is_convertible(lookup_state, Json::uintValue, "lookup_state"))
{
container_info->set_lookup_status(static_cast<sinsp_container_lookup::state>(lookup_state.asUInt()));
Contributor Author:

Not removed, just moved to the bottom with updated logic.

userspace/libsinsp/parsers.cpp (outdated, resolved)
@incertum incertum force-pushed the refactor/cri-sync-add-container-fasttrack branch from ef937ec to cba3555 Compare February 11, 2024 02:41
@leogr (Member) commented Feb 13, 2024

/assign

mstemm previously approved these changes Feb 13, 2024

@mstemm (Contributor) left a comment:

The changes look okay to me. To be clear, it seems that the big change in the PR is that, when doing synchronous container lookups (I guess this might be at startup or in other uncommon cases), the container info is added to the cache immediately instead of creating a fake event and then immediately parsing it.

@poiana (Contributor) commented Feb 13, 2024

LGTM label has been added.

Git tree hash: a0f3fec8db9788ce489e0f703ec6c1068af11806

@therealbobo (Contributor):

I took a look at the failing test: given that we changed the behaviour of the CRI runtime engine, we have to change the test. Now we should send a different JSON to sinsp (e.g. {"value":1,"container":{"image":"test_image"}}). This is because, with this patch, we will filter out all the events that are not owned by a pod sandbox or that have no image. cc @gnosek @leogr

Comment on lines 5235 to 5246
// note: The container image is the most crucial field from a security incident response perspective.
// We aim to raise the bar for successful container lookups. Conversely, pod sandboxes do not include
// a container image in the API response.
if (!container_info->m_image.empty() || container_info->is_pod_sandbox())
{
// Marked as SUCCESSFUL if and only if the container image field present or
// if it's a pod sandbox container
container_info->set_lookup_status(sinsp_container_lookup::state::SUCCESSFUL);
} else
{
container_info->set_lookup_status(sinsp_container_lookup::state::FAILED);
}
Contributor:

Sorry, but I disagree. This is hardcoding your policy ("all containers must have an image name"), which looks sensible but https://xkcd.com/1172/ .

How is the FAILED state an improvement here? ->this<- is the metadata we have; if it contains an empty image name, so be it, no reason to pretend all the other metadata does not exist. Given how fluid the CRI spec seems to be, I can totally imagine some-new-fancy-wrapper-over-runc run sha256:abcdef not exposing the image name at all (since we don't have it), but still, e.g., the labels might be valuable.

Not to mention, this is generic code, for all container engines, some of which may not even have a notion of image names (I'm not sure e.g. LXC does). If I am in the minority and we decide to change the policy to require an image name, please at the very least place it directly in the container engine(s).

I understand you need the image name for anything interesting, but the real fix would be to make metadata queries fast and reliable (which we know is impossible in all cases but we can keep trying). This is just shooting the messenger: given how the docker/cri engines are written, if you get a success without an image name, I'd assume we have not received an image name from the API and there's nothing we can do about it. OK, maybe retrying the query a bit later might help (doubt it but stranger things have happened), but that's entirely the business of a particular engine.

Contributor Author:

Hey @gnosek, I respectfully disagree. Here's how I feel: I'm simply trying to ensure the tool functions effectively in real-life production environments. However, it often encounters issues with the container engine, which leads security analysts to conclude that the tool isn't working properly. It seems that this harsh reality isn't receiving much attention from anyone around here. Therefore, all I'm attempting to do is to make improvements to address these issues.

As of today, we still add it to the cache even if it failed. However, now it's marked as not successful, which lays the foundation for future improvements.

Are you making guesses about future changes? I am fighting today's fire-drills.

Contributor:

Still, how does this chunk help? I'm not opposing the patch in general. I like the idea, I just still need to look at the implementation. I just can't imagine this particular bit doing anything good. I'm not even entirely sure it has an observable effect inside Falco (the callbacks are a different story, they'd be sad).

Combined with the other suggestion (of dropping FAILED lookups from the cache), it would keep retrying the lookups until one did return an image name, burning a ton of CPU in the process (both in Falco and the container runtime).

In the harsh reality I am so carefully ignoring, have you observed an API query that:

  • returned metadata other than image name initially (i.e. not just raised an error)
  • returned the image name too later on?

Like I said in the other comment, I'm all for marking the metadata for a container as "yeah, we got something but not everything we'd like" and retrying later (for some definition of later, probably other than "the next syscall"), but that has to be done in an explicit manner.

Contributor:

Jumping in late in this discussion. From my high-level understanding, it seems we're deciding between:

  • Discarding all partially incomplete metadata when the container image is not available and forcing a new fetch attempt later on, thus consuming extra CPU in the hope of getting more complete data
  • Considering the fetch successful even when some data (such as the container image) is missing

Both come with evident pros and cons. If we struggle to find an agreement, to me this feels like the kind of behavior that should be configurable and not be enforced on any of the libs adopters. Of course I agree that security-related use cases have top priority overall for the Falco project, but I also see that the libs themselves are an arsenal of system inspection tools that can be the foundation of different projects and products as well. To me, this feels like something we could negotiate as a new "retry policy" configuration point. Just my two cents, I trust both of you with your own expertise and evaluations.

Not to mention, this is generic code, for all container engines, some of which may not even have a notion of image names (I'm not sure e.g. LXC does). If I am in the minority and we decide to change the policy to require an image name, please at the very least place it directly in the container engine(s).

This one I'm very onboard with -- I think this should not affect the behavior of other container runtimes.

retrying later (for some definition of later, probably other than "the next syscall"), but that has to be done in an explicit manner.

Agree also on this, this "retry policy" definitely has to happen in a controlled or throttled way.
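
Purely as an illustration of that hypothetical configuration point (none of these fields exist in libs today; names and defaults are invented for discussion only), such a retry policy could look roughly like:

#include <chrono>
#include <cstdint>

// Hypothetical knob set for re-fetching incomplete container metadata.
struct container_metadata_retry_policy
{
	bool retry_on_missing_image = false;        // opt-in, off by default
	uint32_t max_retries = 3;                   // hard cap on re-lookups per container
	std::chrono::milliseconds backoff{500};     // throttle between attempts
};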

Contributor Author:

Given that we have the new ticket #1708, let's hold off on any changes in this regard in the parser; someone can refactor it in the future. Please be aware of #1707. It's incorrect to initialize a lookup to SUCCESSFUL, by the way. I believe this is the strongest indication of a very suboptimal shared design across the container engines we support.

@incertum incertum force-pushed the refactor/cri-sync-add-container-fasttrack branch 2 times, most recently from f3ac0ac to ba164f1 Compare February 26, 2024 18:13
@incertum (Contributor Author):

Needed another rebase.

@incertum (Contributor Author):

Hmm, I want to re-push the last commit, but GitHub is acting up.
/hold

It needs to be

result.set_lookup_status(sinsp_container_lookup::state::STARTED);
// note: The cache should not have SUCCESSFUL as lookup status at this point,
// else `parse_container_json_evt` would wrongly exit early.
cache->replace_container(std::make_shared<sinsp_container_info>(result));
}
// note: On the other hand, `parse_container_json_evt` expects SUCCESSFUL as the lookup state for the incoming
// container event / the not-yet-cached container; this was previously done within `lookup_sync`.
result.set_lookup_status(sinsp_container_lookup::state::SUCCESSFUL);

@incertum incertum force-pushed the refactor/cri-sync-add-container-fasttrack branch from a0ccfdc to 1439f01 Compare February 26, 2024 18:55
incertum and others added 3 commits February 26, 2024 18:57
… case)

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
* Now more precise, logical and clean, set STARTED at the beginning
and SUCCESSFUL at the end of `parse_container_json_evt` if and only if
the container image was successfully retrieved and present before calling
`add_container` or if it's a pod sandbox container.
* In `lookup_sync` we previously set the lookup_status to SUCCESSFUL, which can be
avoided to have a more consolidated and clean behavior in `parse_container_json_evt`.

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
…tainer sync lookups

Co-authored-by: Roberto Scolaro <roberto.scolaro21@gmail.com>
Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@incertum incertum force-pushed the refactor/cri-sync-add-container-fasttrack branch 2 times, most recently from ba164f1 to a0ccfdc Compare February 26, 2024 18:58
@incertum (Contributor Author):

/unhold

@therealbobo, mind checking the last commit to ensure it's the exact same old code flow? It seems that we need to change the lookup status again to SUCCESSFUL for the further handling of the new container event (the non-cached instance), see #1595 (comment).

@incertum incertum force-pushed the refactor/cri-sync-add-container-fasttrack branch from ff4bc4a to 3f2dfaf Compare February 26, 2024 19:04
…ting state updates in lookup_sync are preserved

Signed-off-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
@incertum incertum force-pushed the refactor/cri-sync-add-container-fasttrack branch from 3f2dfaf to 9646c87 Compare February 26, 2024 19:07
@leogr (Member) left a comment:

As privately discussed, I'm ok with this change. Any other improvement may come in follow-up PRs.

@poiana poiana added the lgtm label Feb 28, 2024
@poiana (Contributor) commented Feb 28, 2024

LGTM label has been added.

Git tree hash: d878d485880b017d5b47b87311c4765b6147fc3a

@poiana (Contributor) commented Feb 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: incertum, leogr, LucaGuerra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [LucaGuerra,incertum,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 3aad411 into master Feb 28, 2024
41 checks passed
@poiana poiana deleted the refactor/cri-sync-add-container-fasttrack branch February 28, 2024 17:11
@incertum (Contributor Author):

Thanks and FYI I am tracking overall / general progress here: falcosecurity/falco#2708.

@incertum incertum changed the title refactor(cri): fast-track add containers to cache (synchronous lookup case) refactor(cri): fast-track add containers to cache (synchronous lookup case, Falco run w/ --disable-cri-async) Feb 28, 2024