[Discussion] Enhanced abstraction of "file" concept in libsinsp filter/display classes for effective monitoring #1134
Comments
I think I'd avoid using I understand the appeal of just building out below the |
Here's a recent discussion on Slack where we pointed out that fd.* fields only make sense for syscalls on a fd: https://kubernetes.slack.com/archives/CMWH3EH32/p1685803022045059 (This isn't exactly the same, as the user was talking about spawned processes, but it's pretty close). |
No blockers. Few additional questions from my side to help find a solution that gives the best user experience if we wish to further discuss: Echoing Luca's question from the PR #1060 (comment):
Putting on my Data Science / Big Data hat, what would help from my perspective is to have one field that is always populated with the "file path" on disk I would want to write my detection against, so that I don't have to remember which field I have to care about for each unique system call: is it the source, the target, the path? This burden still does not appear to be eliminated with the proposed PR. In summary there are 2 open discussion points:
Curious about more opinions on UX? |
That's exactly the goal here--to have a set of consistent fields so they can be used for a variety of file based syscalls. |
If If we start with adding a What about
Appreciate your additional input, but I have some reservations regarding the benefits and scope in this particular context. Referencing the table I created above, see #1134 (comment)
What about overloading Thanks for working on this Mark! |
About syscall support:
If it helps I can schedule an in-person meeting or we can discuss it during one of the upcoming Falco meetings as well. Thank you for the detailed feedback! I want to make a change that is well thought out and that everyone can agree on as well. |
hmmm I understand your perspective, but I'm not convinced of Why not choose something like
path.name // at least now it's consistent with fd.name naming, we could now even consider extending it to proc.exepath
path.type // we can now index if file or dir or fd
path.nameraw // similar to fd.nameraw and yes it has tons of value for threat detection, anything related to path traversals it is useful for
path.source // need detailed docs, see comments below
path.target // need detailed docs, see comments below
It appears that in many existing rules you focus just on one e.g. check out rules If we do it, yeah we need to cover everything in a consistent way also
yeah here I also still have reservations, for example for symlink:
At least with the original definitions I can read up the manpages and for symlink case, because you could also think of the |
I don't mind changing file.* to path.* if that's the consensus but to me file.* makes more sense as a prefix as it refers to filesystem paths. I could think of other paths that wouldn't be applicable here. proc.exepath is one example. Maybe fspath for filesystem path?
I don't mind adding a field to distinguish between files and directories. But I don't think having fd as a type would make sense as a path for an open would rightly belong in both the file and fd categories. Also there will be many fds (network connections, shared memory regions) that would not be mapped to a filesystem path. If you want to distinguish between directories and files we could add a boolean file.isdir.
I can also add .nameraw, although I think it will be difficult to build rules based on it as depending on the syscall, in some cases you will have fully resolved paths and for other syscalls you will not.
I will add support for all the chmod/chown variants and quotactl (for now continuing to use file.*).
You're right about how symlink maps source/target compared to link, they are currently inconsistent. Here's the current mappings:
Here's the sysdig output for
So to keep things consistent and also vaguely aligned with the man pages, I'll change the link/linkat variants to map oldpath to And I will definitely add documentation for how these new fields will work. It will be a standalone page and referred to from https://falco.org/docs/reference/rules/supported-fields/. I'll comment again (here I guess) once I've made the above changes. |
There's a problem with supporting file.isdir (or in general differentiating files and directories) for rename/renameat/renameat2. These can be used for either files or directories and there isn't anything in the event that can be used to distinguish between files/directories. You can't just do a stat() when coming up with a value for the field either, as that wouldn't work for capture files where the original filesystem is not available. Given this, I think it's better to not include a file.isdir field (or determining the type of the path) at all, unless there are really compelling reasons for it. |
Adding Re the symlinks/hardlinks thanks for checking and yes staying closer to the manpages is probably better, although personally I am not sure why they chose these conventions in the symlink case ... Thanks for planning a detailed docs page! Understood re the edge cases around dir/file distinction (thanks also for checking), arguably the syscall tells you if it's a dir I suppose ... In summary, we could live with the following? @falcosecurity/libs-maintainers any objections or other suggestions?
@mstemm one more request since we are aligning more and have clarity on including dir, we shall also include |
Sounds good to me! I'll start making the changes. If we're going to support .nameraw I think we also want .sourceraw/.targetraw so I will support those as well. And yes I can support umount. Seems like I should support mount as well then. source will be the path to the device and target will be where the device is mounted. This matches the man page. |
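For readers following along, the mount mapping described in the comment above can be sketched roughly as below. This is only an illustration in Python, not the libsinsp implementation: the helper name `mount_path_fields` and the dictionary keys are hypothetical, and resolving the non-raw values lexically against the cwd is an assumption borrowed from how fd.name is described in the issue text. It also assumes the mount source is a filesystem path, as in the comment; sources that are not paths (for example `tmpfs`) are not handled.

```python
import posixpath


def mount_path_fields(dev, mount_dir, cwd):
    """Hypothetical helper: source/target values for a mount event.

    source is the device path and target is the mount point, as stated
    in the comment above. The *raw variants keep the arguments exactly
    as they were passed to the syscall.
    """
    def resolve(raw):
        # Assumption: relative paths are joined onto cwd, then "." and
        # ".." are collapsed lexically (no symlink resolution).
        return posixpath.normpath(posixpath.join(cwd, raw))

    return {
        "source": resolve(dev),
        "target": resolve(mount_dir),
        "sourceraw": dev,
        "targetraw": mount_dir,
    }


fields = mount_path_fields("/dev/sdb1", "../mnt/data", cwd="/home/user")
```

Keeping both the resolved and the raw value side by side is what would let a rule match on a stable absolute path while still seeing traversal sequences such as `..` in the raw argument.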
Thanks Mark! |
I finished all the changes above. fspath.name isn't working for fchmod/fchown as they work on fds and not paths. I should be able to fix that, but I'm going to make the change in a separate PR which I'm working on now. |
Great just left some comments in the PR, I think we can finish the review in the PR and unless someone has objections we can consider the discussion here resolved. For above plz add code comments indicating it's a todo to be fixed, ok from my end to do it in a new PR of course. |
Thanks to you both for the great discussion. I think both |
@jasondellaluce would we then go with |
I like |
Great let's go with below then, it appears to be the most consistent with current conventions in mind and would allow us to build
|
Closing and marking this discussion as concluded. We have reached alignment and the PR is merged as well. |
Motivation
fd.name: if the fd.type represents a file or directory, the fd.name field contains the full path. If the path is not already an absolute path, a custom traversal parser can be used to derive the absolute path based on the cwd and the given arg.
fd.nameraw was introduced to expose the arg as given to detect path traversal attacks like ../../../etc/passwd
fd.name.symlink_resolved or similar name (also applicable for executable path) is discussed in [FEATURE] Resolving executable path symlink on execve #1111
Syscalls in Linux often reference files or file names using custom arguments. This can make rule writing and data analysis cumbersome, as it requires remembering the relevant file paths args for a given syscall. Many users are also unfamiliar with the evt.arg.* fields and their meaning. Enhancing our documentation can address these issues by providing clear explanations and references to the concept of files in Linux. This will improve understanding and facilitate effective rule analysis.
At the same time, enhancing the user experience by overloading the fd.name field would be beneficial.
Feature
We can introduce more "overloading" instead of creating new field classes, as suggested in #1060. This approach promotes code reusability, simplifies system maintenance, and makes ETLs simpler, along with writing downstream rules.
Besides definitely embracing the reality that the concept of files and their attributes in Linux can be intricate ... 😅 same applies to the syscall args naming conventions in the Linux kernel ...
Reading below, it seems that keeping the fd. field class is the ideal choice for encapsulating file-related attributes.
Current State + proposal of "overloading" fd.name even more:
evt.arg.name: fd.name, a mix of file or directory fd.type -> also fd.nameraw is available, however double check for each syscall
evt.arg.path: fd.name
evt.arg.oldpath: fd.name.source and overload to fd.name
evt.arg.newpath: fd.name.target
evt.arg.olddir: fd.name.source
evt.arg.newdir: fd.name.target
evt.arg.linkpath: fd.name.target
evt.arg.target: fd.name.source and overload to fd.name (this is the file we create a symlink over)
evt.arg.quotafilepath: fd.name
evt.arg.pathname: fd.name
evt.arg.filename and evt.arg.fd as fchmod has no filename: fd.filename, also check on fd.name and fd.nameraw behavior here to stay consistent
fd.name is overloaded with reference indicating src->dst ip tuples
fd.name should be ?
fd.name should be ?
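The argument-to-field mapping proposed above can be sketched as a small lookup table. The Python below is an illustration only, not libsinsp code: `ARG_TO_FIELDS` and `overloaded_fields` are hypothetical names, the "first populated argument wins" rule is an assumption, and the evt.arg.filename / evt.arg.fd entry is left out because the proposal itself flags it as needing further checks. It reflects the fd.name overloading idea from this issue text; the comments in the thread later moved toward a separate field class (fspath.name is mentioned there).

```python
# Event argument name -> proposed field(s), transcribed from the
# proposal above. Names are listed without the "evt.arg." prefix.
ARG_TO_FIELDS = {
    "name": ["fd.name"],
    "path": ["fd.name"],
    "oldpath": ["fd.name.source", "fd.name"],
    "newpath": ["fd.name.target"],
    "olddir": ["fd.name.source"],
    "newdir": ["fd.name.target"],
    "linkpath": ["fd.name.target"],
    "target": ["fd.name.source", "fd.name"],
    "quotafilepath": ["fd.name"],
    "pathname": ["fd.name"],
}


def overloaded_fields(evt_args):
    """Hypothetical helper: map one event's args to the proposed fields."""
    out = {}
    for arg, value in evt_args.items():
        for field in ARG_TO_FIELDS.get(arg, []):
            # Assumption: if two args map to the same field, keep the first.
            out.setdefault(field, value)
    return out


# A rename-style event: oldpath fills the source and the overloaded
# fd.name, newpath fills the target.
rename = overloaded_fields({"oldpath": "/tmp/a", "newpath": "/tmp/b"})
```

With a table like this, a rule author would only need to remember fd.name (plus .source and .target for two-path syscalls) instead of the per-syscall evt.arg.* name, which is the usability point the issue is making.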