Resolve falcosecurity/libs#932, use /proc/stat/ btime for boot time and /proc/1/cmdline for container start time #1003

Merged
poiana merged 3 commits into falcosecurity:master from timestamp_issue_proc_cmdline
Mar 28, 2023

Conversation

happy-dude
Contributor

@happy-dude happy-dude commented Mar 23, 2023

See #932 and #1003 (comment) for more context

Previous:

  • the change time (st_ctim) of /proc/1, as returned by stat(), was used as boot time
  • the change time (st_ctim) of /proc/<pid>/root/proc/1, as returned by stat(), was used as container start time

This PR:

  • the btime value in /proc/stat is used as boot time
  • the change time (st_ctim) of /proc/<pid>/root/proc/1/cmdline, as returned by stat(), is used as container start time

What type of PR is this?

/kind bug

Any specific area of the project related to this PR?

/area libscap

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

Use the btime from /proc/stat as boot time.

Use the change time of /proc/<pid>/root/proc/1/cmdline to derive container start time; this is similar to how Docker does it.
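To illustrate the container start time approach described above, here is a minimal sketch in C. The helper names `pidns_init_cmdline_path` and `path_ctime_ns` are hypothetical and not part of libscap; the sketch assumes a Linux procfs mounted at /proc and a libc that exposes `st_ctim`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: builds /proc/<pid>/root/proc/1/cmdline into buf.
 * Returns 0 on success, -1 if the path did not fit. */
static int pidns_init_cmdline_path(char *buf, size_t len, long pid)
{
    int n = snprintf(buf, len, "/proc/%ld/root/proc/1/cmdline", pid);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Hypothetical helper: returns the change time (ctime) of path in
 * nanoseconds since the Unix epoch, or 0 if stat() fails. */
static uint64_t path_ctime_ns(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;

    /* Widen tv_sec to 64 bits before multiplying. */
    return (uint64_t)st.st_ctim.tv_sec * UINT64_C(1000000000) +
           (uint64_t)st.st_ctim.tv_nsec;
}
```

A caller would build the path for a given pid and pass it to `path_ctime_ns`; a return value of 0 means the file could not be stat'ed (for example, the process already exited).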

Which issue(s) this PR fixes:

Fixes #932

Special notes for your reviewer:

cc @Andreagit97 and @FedeDP and @incertum

Does this PR introduce a user-facing change?:

fix: inaccurate timestamps in particular kernel configurations

@poiana
Contributor

poiana commented Mar 23, 2023

Welcome @happy-dude! It looks like this is your first PR to falcosecurity/libs 🎉

@poiana poiana added the size/S label Mar 23, 2023
Contributor

@gnosek gnosek left a comment

LGTM apart from one minor nit :)

userspace/libscap/scap.c (outdated, resolved)
@FedeDP
Contributor

FedeDP commented Mar 23, 2023

Thank you for this!
And thanks for testing it on your systems!
It LGTM aside from what Grzeg said!

@FedeDP
Contributor

FedeDP commented Mar 23, 2023

/cc @incertum too :)

@FedeDP
Contributor

FedeDP commented Mar 23, 2023

/milestone 0.11.0

@poiana poiana added this to the 0.11.0 milestone Mar 23, 2023
@happy-dude
Contributor Author

I'll create a build internally with this patch and see if it fixes the incorrect-timestamp issue on all systems Falco is deployed on and validate this further 👍

@incertum
Contributor

@happy-dude thanks a lot for this patch, who would have thought, never surprised, always amazed!

Perhaps we could patch the cases below as well in this PR?

diff --git a/userspace/libscap/linux/scap_procs.c b/userspace/libscap/linux/scap_procs.c
index 1b7d436e..3c872d61 100644
--- a/userspace/libscap/linux/scap_procs.c
+++ b/userspace/libscap/linux/scap_procs.c
@@ -501,11 +501,11 @@ int32_t scap_proc_fill_cgroups(char* error, int cgroup_version, struct scap_thre
 
 int32_t scap_proc_fill_pidns_start_ts(char* error, struct scap_threadinfo* tinfo, const char* procdirname)
 {
-       char filename[SCAP_MAX_PATH_SIZE];
+       char proc_cmdline_pidns[SCAP_MAX_PATH_SIZE];
        struct stat targetstat = {0};
 
-       snprintf(filename, sizeof(filename), "%sroot/proc/1", procdirname);
-       if(stat(filename, &targetstat) == 0)
+       snprintf(proc_cmdline_pidns, sizeof(proc_cmdline_pidns), "%sroot/proc/1/cmdline", procdirname);
+       if(stat(proc_cmdline_pidns, &targetstat) == 0)
        {
                tinfo->pidns_init_start_ts = targetstat.st_ctim.tv_sec * (uint64_t) 1000000000 + targetstat.st_ctim.tv_nsec;
                return SCAP_SUCCESS;
@@ -977,9 +977,11 @@ static int32_t scap_proc_add_from_proc(scap_t* handle, uint32_t tid, char* procd
                         dir_name, handle->m_lasterr);
        }
 
-       if(stat(dir_name, &dirstat) == 0)
+       char proc_cmdline[SCAP_MAX_PATH_SIZE];
+       snprintf(proc_cmdline, sizeof(proc_cmdline), "%scmdline", dir_name);
+       if(stat(proc_cmdline, &dirstat) == 0)
        {
-               tinfo->clone_ts = dirstat.st_ctim.tv_sec*1000000000 + dirstat.st_ctim.tv_nsec;
+               tinfo->clone_ts = dirstat.st_ctim.tv_sec * (uint64_t) 1000000000 + dirstat.st_ctim.tv_nsec;
        }

@gnosek and @FedeDP would you agree on adopting a consistent new approach?
The above also includes a bit of cleanup and extends Stanley's proposed naming convention.
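The `(uint64_t)` cast in the diff above is presumably there so that the seconds-to-nanoseconds multiplication is carried out in 64 bits; where `time_t` is only 32 bits wide, the uncast product would overflow. A small sketch of the conversion follows. The function name `timespec_to_ns` is hypothetical, and the value given to `SECOND_TO_NS` here is an assumption for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Assumed definition, for illustration only. */
#define SECOND_TO_NS UINT64_C(1000000000)

/* Hypothetical helper: converts a timespec to nanoseconds since the epoch. */
static uint64_t timespec_to_ns(const struct timespec *ts)
{
    /* Casting tv_sec first forces a 64-bit unsigned multiplication,
     * regardless of how wide time_t is on the target platform. */
    return (uint64_t)ts->tv_sec * SECOND_TO_NS + (uint64_t)ts->tv_nsec;
}
```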

@happy-dude
Contributor Author

I was about to note that a few test builds didn't fix the issue on my hosts 😅

BUT @incertum may have caught the missing pieces, so I'll create a build and see what it looks like from there!

@happy-dude
Contributor Author

An aside -- I'm not the best with git magic -- how do I give co-author credit to @incertum for her suggestions?

@happy-dude happy-dude changed the title Resolve falcosecurity/libs#932, use /proc/1/cmdline for boot/procfs creation time (WIP) Resolve falcosecurity/libs#932, use /proc/1/cmdline for boot/procfs creation time Mar 23, 2023
@happy-dude
Contributor Author

happy-dude commented Mar 23, 2023

Unfortunate news:

The change didn't fix the timestamp on my nodes; the /proc/1/cmdline change may be insufficient to produce the right timestamp. Confirmed on both an x86-64 host and an aarch64 host.

Changing this PR to (WIP) while we discuss further in #932

@incertum
Contributor

I was about to note that a few test builds didn't fix the issue on my hosts

sad ... yes let's investigate further in the ticket and try a few options

An aside -- I'm not the best with git magic -- how do I give co-author credit to @incertum for her suggestions?

git commit -s

This opens your editor; add the line below either above or below your Signed-off-by line. You could also append it via another git command, up to you. You can find the email address of any of us in one of that person's previous commits.

Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>

@poiana poiana added size/M and removed size/S labels Mar 24, 2023
userspace/libscap/scap.c (resolved)
userspace/libscap/scap.c (resolved)
@happy-dude happy-dude force-pushed the timestamp_issue_proc_cmdline branch 2 times, most recently from f9f1c7a to 6e3f981 Compare March 24, 2023 19:55
@poiana poiana added size/L and removed size/M labels Mar 24, 2023
Contributor

@incertum incertum left a comment

Thank you @happy-dude also for adding the amazing comments. Pulled it and checked it. Times make sense on my end.

Perhaps @gnosek and @FedeDP you would have some other style preferences when parsing /proc/stat? Let's see.

Stanley could you test it once again on all your test servers just to be sure all issues are resolved?

userspace/libscap/scap.c (outdated, resolved)
@happy-dude happy-dude changed the title (WIP) Resolve falcosecurity/libs#932, use /proc/1/cmdline for boot/procfs creation time Resolve falcosecurity/libs#932, use /proc/stat/ btime for boot time and /proc/1/cmdline for container start time Mar 24, 2023
Contributor

@gnosek gnosek left a comment

LGTM, except maybe the /proc/stat reading code (I left a comment, take it or leave it :))

-snprintf(filename, sizeof(filename), "%sroot/proc/1", procdirname);
-if(stat(filename, &targetstat) == 0)
+snprintf(proc_cmdline_pidns, sizeof(proc_cmdline_pidns), "%sroot/proc/1/cmdline", procdirname);
+if(stat(proc_cmdline_pidns, &targetstat) == 0)
Contributor

Just a note: with this implementation, the "container start time" for host processes will not be equal to the boot time but (presumably) to the time when the host init started.

I am completely fine with this (and it does make sense), just pointing this out since we spent the whole Friday discussing the subtle differences of various timestamps :)

Contributor Author

Thanks; added a comment block in the latest commit + rebase! Please let me know if I missed anything.

#1003 (comment)

Contributor

Agree with you @gnosek - but we can also polish this in a follow up PR.

Contributor

@gnosek hmmm, I'm having trouble getting if (tinfo->vpid != tinfo->pid) to ever evaluate to true, even when placing the check after those fields should have been parsed. The code also never reached else if(strstr(line, "NStgid:") == line) in scap_proc_fill_info_from_stats. So in summary, yes, it should be a follow-up PR; something in that parser seems wonky as well!

Contributor

@incertum can you show me the code?

Contributor

Just place that comparison if statement anywhere after vpid should have been populated, or even add print or debug statements within the scap_proc_fill_info_from_stats parser. It never evaluated to true for me, even though I was running containers with sleep processes. Is it working for you?
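For context on the vpid check discussed in this thread: per proc(5), the NStgid line of /proc/<pid>/status lists the thread group ID in each nested PID namespace, outermost first, so a containerized process normally shows more than one value while a host process shows only one. A rough sketch of extracting the innermost value follows; `parse_nstgid_line` is a hypothetical helper and is not the parser in scap_proc_fill_info_from_stats.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if line starts with "NStgid:", stores the last
 * (innermost-namespace) value in *vpid and returns 1; otherwise returns 0
 * and leaves *vpid untouched. */
static int parse_nstgid_line(const char *line, int64_t *vpid)
{
    static const char key[] = "NStgid:";
    const char *p;
    int found = 0;

    if (strncmp(line, key, sizeof(key) - 1) != 0)
        return 0;

    p = line + sizeof(key) - 1;
    for (;;) {
        char *end;
        long long v = strtoll(p, &end, 10);

        /* strtoll leaves end == p when no further number is present. */
        if (end == p)
            break;
        *vpid = (int64_t)v;
        found = 1;
        p = end;
    }
    return found;
}
```

With a line such as "NStgid:\t12345\t1\n" the helper yields 1, which differs from the host-side value 12345; with a single value the two are equal, so a `vpid != pid` comparison would be false for host processes.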

userspace/libscap/scap.c (resolved)
userspace/libscap/scap.c (resolved)
Member

@Andreagit97 Andreagit97 left a comment

LGTM!

userspace/libscap/linux/scap_procs.c (outdated, resolved)
@happy-dude
Contributor Author

happy-dude commented Mar 27, 2023

Just made a fresh commit + rebase!

/userspace/libscap/scap.c:

  • *boot_time = 0 at beginning of function
  • drop outer strstr, don't scan line twice

/userspace/libscap/linux/scap_procs.c:

  • added comment about container start time implementation
  • define and use SECOND_TO_NS instead of 1000000000 for rest of file
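The scap.c changes listed above (zeroing the output first and scanning each line only once) can be sketched roughly as follows. This is not the merged code: the function name `read_btime_ns`, the buffer size, and the value given to `SECOND_TO_NS` are assumptions for illustration, and the function takes an already opened stream so it can be exercised without a real /proc/stat.

```c
#define _POSIX_C_SOURCE 200809L
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed definition, for illustration only. */
#define SECOND_TO_NS UINT64_C(1000000000)

/* Hypothetical helper: scans a /proc/stat-style stream for the "btime" line.
 * On success stores the boot time in nanoseconds and returns 0;
 * otherwise leaves *boot_time at 0 and returns -1. */
static int read_btime_ns(FILE *f, uint64_t *boot_time)
{
    char line[512];
    uint64_t btime = 0;

    /* Zero the output up front so callers never see a stale value. */
    *boot_time = 0;

    while (fgets(line, sizeof(line), f) != NULL) {
        /* A single sscanf both matches the "btime" key and parses the value,
         * so no separate strstr pass over the line is needed. Over-long lines
         * (such as "intr") arrive in chunks that contain only digits and
         * spaces, so they cannot match. */
        if (sscanf(line, "btime %" SCNu64, &btime) == 1) {
            *boot_time = btime * SECOND_TO_NS;
            return 0;
        }
    }
    return -1;
}
```

A caller would typically open the host's /proc/stat with fopen(), pass the stream in, and fall back to another boot time source if -1 is returned.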

Request: I'll make a fresh build with this PR and evaluate the change on my systems; this should help me confirm if this fixes the boot time issue.

However, can I get an assist evaluating the container-start-time change and ensuring those timestamps are correct/accurate as expected?

@happy-dude happy-dude force-pushed the timestamp_issue_proc_cmdline branch 2 times, most recently from b8d8f2a to 99614d1 Compare March 27, 2023 14:34
…on time

See falcosecurity#932 for more context

Change occurrences of `/proc/1` to `/proc/1/cmdline` in
* userspace/libscap/linux/scap_procs.c
* userspace/libscap/scap.c

Previous:
```c
snprintf(proc_dir, sizeof(proc_dir), "%s/proc/1/", scap_get_host_root());
```

This PR:
```c
snprintf(proc_cmdline, sizeof(proc_cmdline), "%s/proc/1/cmdline", scap_get_host_root());
```

Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>
Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Stanley Chan <pocketgamer5000@gmail.com>
@happy-dude happy-dude force-pushed the timestamp_issue_proc_cmdline branch 2 times, most recently from 429a640 to c804f94 Compare March 27, 2023 16:12
happy-dude and others added 2 commits March 27, 2023 11:16
Get boot time from btime value in /proc/stat

ref: falcosecurity#932

/proc/uptime and btime in /proc/stat are fed by the same kernel sources.

Multiple ways to get boot time:
* btime in /proc/stat
* calculation via clock_gettime(CLOCK_REALTIME) - clock_gettime(CLOCK_BOOTTIME)
* calculation via time(NULL) - sysinfo().uptime

Maintainers preferred btime in /proc/stat because:
* value does not depend on calculation using current timestamp
* btime is "static" and doesn't change once set
* btime is available in kernels from 2008
* CLOCK_BOOTTIME is available in kernels from 2011 (2.6.39)

Scraping btime from /proc/stat is both the heaviest approach and the most likely to succeed

Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>
Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Stanley Chan <pocketgamer5000@gmail.com>
Co-authored-by: Grzegorz Nosek <grzegorz.nosek@sysdig.com>
Co-authored-by: Melissa Kilby <melissa.kilby.oss@gmail.com>
Signed-off-by: Stanley Chan <pocketgamer5000@gmail.com>
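For comparison, the two calculation-based alternatives named in the commit message above can be sketched as below. Both derive boot time from the current time at the moment of the call, which is the dependency the maintainers wanted to avoid. The function names and the `SECOND_TO_NS` value are hypothetical; the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/sysinfo.h>
#include <time.h>

/* Assumed definition, for illustration only. */
#define SECOND_TO_NS UINT64_C(1000000000)

/* Hypothetical helper: boot time as CLOCK_REALTIME minus CLOCK_BOOTTIME,
 * in nanoseconds since the epoch. Returns 0 if either clock is unavailable. */
static uint64_t boot_time_from_clocks(void)
{
    struct timespec rt, bt;

    if (clock_gettime(CLOCK_REALTIME, &rt) != 0 ||
        clock_gettime(CLOCK_BOOTTIME, &bt) != 0)
        return 0;

    return ((uint64_t)rt.tv_sec * SECOND_TO_NS + (uint64_t)rt.tv_nsec) -
           ((uint64_t)bt.tv_sec * SECOND_TO_NS + (uint64_t)bt.tv_nsec);
}

/* Hypothetical helper: boot time as time(NULL) minus sysinfo().uptime,
 * with one-second resolution. Returns 0 if sysinfo() fails. */
static uint64_t boot_time_from_sysinfo(void)
{
    struct sysinfo si;

    if (sysinfo(&si) != 0)
        return 0;

    return (uint64_t)(time(NULL) - si.uptime) * SECOND_TO_NS;
}
```

Because each result is a difference of two readings taken at call time, repeated calls can return slightly different values, whereas btime in /proc/stat does not change once set.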
Contributor

@incertum incertum left a comment

A follow up item for us, see https://github.com/falcosecurity/libs/pull/1003/files#r1149676712, but proposing to merge this first.

/approve

@poiana
Contributor

poiana commented Mar 27, 2023

LGTM label has been added.

Git tree hash: 8a531afd35e813542b8fe75035761e30f50e5d3f

@happy-dude
Contributor Author

Confirming that all my test hosts (bare-metal) are reporting the right timestamps after applying this PR as a patch 👍

Member

@Andreagit97 Andreagit97 left a comment

/approve

@poiana
Contributor

poiana commented Mar 28, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, Happy-Dude, incertum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Andreagit97,incertum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 63da804 into falcosecurity:master Mar 28, 2023
Successfully merging this pull request may close these issues.

[BUG] Timestamp incorrect in event logs, dated in the future by 7 days/1 week
6 participants