RFE: implement an audit container identifier #90

rgbriggs · 2018-06-01T15:46:51Z

Split this off from #32, leaving that issue for addressing namespace identifiers in audit records, should they be deemed necessary.

Implement an audit container identifier.

Add the ability to identify a task's assigned container using an audit container identifier. The registration process involves writing a u64 to file audit_containerid in the /proc filesystem under the PID of the target container task. This will result in a CONTAINER_ID record to log the event. Subsequent audit events that involve this task will have an auxiliary record CONTAINER to identify the container involved.
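
The registration step described above can be sketched from the orchestrator's side. This is only an illustration: the audit_containerid file exists only on a kernel carrying this patchset, the decimal text encoding is an assumption, and the helper names (contid_path, set_contid) and the proc_root parameter are hypothetical, added so the sketch can run against a fake directory tree.

```python
import os
import tempfile

U64_MAX = 2**64 - 1


def contid_path(pid: int, proc_root: str = "/proc") -> str:
    """Path of the (patchset-specific) audit_containerid file for a task."""
    return os.path.join(proc_root, str(pid), "audit_containerid")


def set_contid(pid: int, contid: int, proc_root: str = "/proc") -> None:
    """Write contid as decimal text (assumed encoding) to the task's proc file."""
    if not 0 <= contid <= U64_MAX:
        raise ValueError("contid must fit in an unsigned 64-bit integer")
    with open(contid_path(pid, proc_root), "w") as f:
        f.write(str(contid))


if __name__ == "__main__":
    # Exercise the helper against a fake proc tree so it runs on any kernel.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "1234"))
        set_contid(1234, 123456, proc_root=root)
        with open(contid_path(1234, root)) as f:
            print(f.read())
```

On a patched kernel the write would be made with the default proc_root and would presumably need the appropriate audit capability.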

Depends: linux-audit/audit-userspace#51
See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID

History:

Here's a patchset from David Howells that makes an attempt at a kernel container object that would have been useful for our use case:
https://lkml.org/lkml/2017/5/22/645
The LWN article reviewing it:
https://lwn.net/Articles/723561/
Posted Audit Kernel Container identifier proposal v1 upstream:
https://www.redhat.com/archives/linux-audit/2017-September/msg00082.html
https://lkml.org/lkml/2017/9/13/383
Posted RFC(v2): Audit Kernel Container IDs proposal
https://lkml.org/lkml/2017/10/12/354
"non-Cc:" fork https://lkml.org/lkml/2017/10/17/689
LWN coverage: https://lwn.net/Articles/740621/
Posted RFC(v3): Audit Kernel Container identifiers proposal
https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html
https://lkml.org/lkml/2018/1/9/347
Posted RFC v1 patchset upstream:
https://lkml.org/lkml/2018/3/1/813
https://www.redhat.com/archives/linux-audit/2018-March/msg00004.html
Posted RFC v1 userspace patch for auditctl containerid filter support:
https://www.redhat.com/archives/linux-audit/2018-March/msg00030.html
https://lkml.org/lkml/2018/3/5/82
Posted v2 patchset upstream:
https://www.redhat.com/archives/linux-audit/2018-March/msg00110.html
https://lkml.org/lkml/2018/3/16/191
Posted v2 userspace patchset upstream:
https://www.redhat.com/archives/linux-audit/2018-March/msg00124.html
https://lkml.org/lkml/2018/3/16/210

rgbriggs · 2018-06-06T17:11:27Z

Posted v3 kernel patchset upstream:
https://www.redhat.com/archives/linux-audit/2018-June/msg00048.html
https://lkml.org/lkml/2018/6/6/609

rgbriggs · 2018-07-31T20:16:39Z

Posted v4 kernel patchset upstream:
https://www.redhat.com/archives/linux-audit/2018-July/msg00178.html
https://lkml.org/lkml/2018/7/31/855

nhorman · 2018-12-21T16:25:01Z

I have a question regarding the container id assignment mechanism. Currently the patch in the v4 posted series loosely defines a container by virtue of the fact that a nonce is assigned to it from user space. This is nice for a few reasons, primarily because it excuses the kernel from having to have any definition of what a container is (ostensibly we consider a container to be a unique collection of namespaces, and possibly cgroups, but the kernel wants no knowledge of that). However, it seems this implementation of container assignment suffers from a few shortcomings:

1) There appears to be no mechanism that prevents a container from modifying its own id (presuming CAP_SYS_ADMIN is not removed from its capability set, which I think doesn't occur for trusted containers)
2) There appears to be no mechanism for preventing a container from changing namespaces once an id is set for it, meaning that the correlation between the id and whatever you want to define in userspace as being a 'container' is lost
3) There is no current mechanism that prevents multiple unique 'containers' from sharing an id

All of these problems are of course fixable in the current implementation, but for the sake of argument, I'd like to propose an alternate solution that may (I've not 100% thought it out yet) reduce the complexity of the code, make the semantics of the control of the container id from userspace more clear, and solve the above problems

To start, I'd like to define (from a userspace perspective strictly) what we see as a container:

A container is defined by user space as a process that has a specific collection of namespaces and cgroups AND does NOT have the ability to enter new namespaces and cgroups

I would propose that we implement the following in the kernel to enforce that policy:

1) Create a new capability, CAP_AUDIT_SETNS, which gates the ability to successfully call the setns system call. This capability is inherited by a process's children. If CAP_AUDIT_SETNS is re-granted to a process, its contid (see 2, below) is reset to AUDIT_CID_UNSET.
2) Create a new field in the per-task audit struct, contid (which exists in this patchset already). This field is assigned a nonce (likely using the ida_simple_get api) if the CAP_AUDIT_SETNS capability is dropped, which would also generate an audit log message (similar to the one generated in audit_set_contid).
3) If a process calls fork/clone with any of the CLONE_<namespace> flags set, the CAP_AUDIT_SETNS capability is restored to the child process, and its contid value is reset to AUDIT_CID_UNSET (allowing a process in a container to start a new container, but not to change its own 'container nature').
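
To make the three rules above concrete, here is a toy userspace model of them. It is a sketch of the proposal only, not kernel code: CAP_AUDIT_SETNS does not exist in any kernel, the Task class and its method names are hypothetical, a simple counter stands in for an in-kernel id allocator, and None stands in for AUDIT_CID_UNSET.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

AUDIT_CID_UNSET = None        # stand-in for the kernel's "unset" sentinel
_nonce = itertools.count(1)   # stand-in for an in-kernel id allocator


@dataclass
class Task:
    can_setns: bool = True                    # models holding CAP_AUDIT_SETNS
    contid: Optional[int] = AUDIT_CID_UNSET

    def drop_cap(self) -> None:
        """Rule 2: dropping the capability assigns a fresh nonce."""
        if self.can_setns:
            self.can_setns = False
            self.contid = next(_nonce)

    def regrant_cap(self) -> None:
        """Rule 1: re-granting the capability resets contid."""
        self.can_setns = True
        self.contid = AUDIT_CID_UNSET

    def setns(self) -> None:
        """Rule 1: setns is refused once the capability is gone."""
        if not self.can_setns:
            raise PermissionError("CAP_AUDIT_SETNS not held")

    def clone(self, new_namespace: bool = False) -> "Task":
        """Rule 3: CLONE_<namespace> children start unset; others inherit."""
        if new_namespace:
            return Task()
        return Task(self.can_setns, self.contid)
```

In this model a task that has dropped the capability keeps its contid for life, while a child created with a new namespace starts with the capability and no contid, so it can become the first task of a different container.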

I think this approach may have a few advantages:
a) It provides an established mechanism (the capabilities subsystem) to provide a gated point at which the above container definition becomes locked from the perspective of the initial process inside the container. While the parent may re-grant the privilege, it creates an environment whereby dropping the capability both locks the nature of what the container is (a unique set of namespaces and cgroups), and provides a unique id to the container for audit purposes

b) It removes the need to manage container ids from userspace, reducing the risk of errors in assigning duplicate identifiers, and reduces code size by removing the need to create a new proc file and validate the information passed through it.

c) It reduces complexity in the kernel code. The kernel can guarantee unique identifiers using the gating CAP_AUDIT_SETNS flag as the moment to generate the id.

This also (may) smooth upstream acceptance, because from a kernel standpoint, all we're doing is adding a capability to an established interface, and generating a nonce for the purposes of audit on the dropping of that capability. From user space it's nice because we can consider the dropping of the ability to enter a new namespace as the border between a process being outside of, and inside of, a container.

We may have already gone too far down the path of the existing implementation to consider a change of this magnitude, but I wanted to bring this up before we were locked into a design.

pcmoore · 2018-12-22T14:29:51Z

> There appears to be no mechanism that prevents a container from modifying its own id (presuming CAP_SYS_ADMIN is not removed from its capability set, which I think doesn't occur for trusted containers)

I'm replying strictly from memory here, so I might have some of the minor details wrong, but managing the audit container ID should require CAP_AUDIT_ADMIN. The idea being that only container orchestrator processes would be granted this capability, not the individual containers themselves.

It gets slightly more confusing if you want to allow nested container orchestrators, but you're already dealing with other problems if you are going this route.

> There appears to be no mechanism for preventing a container from changing namespaces once an id is set for it, meaning that the correlation between the id and whatever you want to define in userspace as being a 'container' is lost.

Once again, managing an audit container ID is gated by CAP_AUDIT_ADMIN so it is unlikely to be an issue. It is also worth noting that once the process spawns any children, or additional threads, you can't change the audit container ID.

> There is no current mechanism that prevents multiple unique 'containers' from sharing an id.

Not in the kernel, that is correct. Like many things, this is something that is left to the container orchestrator.

One of the design constraints, if not the most important design constraint, was to avoid defining "container" in the context of the kernel with the audit container ID work. We defer all logic for setting, and managing the audit container ID to the userspace container orchestrator. In this first round of patches the kernel's only role here is to report the audit container ID as part of the audit event stream, and ensure the audit container ID is inherited properly for newly created threads/processes.

Later the kernel will add some intelligence for routing audit records based on the audit container ID, and allow multiple audit daemons to capture specific audit event streams, but even that will be carefully done so as to not define "container" in the kernel.

nhorman · 2018-12-24T12:23:07Z

> I'm replying strictly from memory here, so I might have some of the minor details wrong, but managing the audit container ID should require CAP_AUDIT_ADMIN. The idea being that only container orchestrator processes would be granted this capability, not the individual containers themselves.

Right, I think it's CAP_AUDIT_CONTROL, but no matter, and you are absolutely right, the container id is unwritable by any process that doesn't have that capability. That said, I was less worried about a contained process changing its ID, and more worried about a contained process changing properties that an orchestrator might associate with it, i.e. a contained process may have its container id be fixed, but it could easily still call setns on itself, and enter the namespace of another process, breaking whatever association the orchestrator might have assumed would be established. My question was really an attempt to enforce that implied mapping by creating a capability control that could block entry to other processes' namespaces in such a way that the orchestrator could preserve its notion of what a container was, without imbuing the kernel with any knowledge of containers (and hopefully at the same time, creating a trigger mechanism that the kernel audit code could use to auto-generate said ids). If you think that's too much information in the kernel, that's fine, but I wanted to ask the question.

> It gets slightly more confusing if you want to allow nested container orchestrators, but you're already dealing with other problems if you are going this route.

Yeah, I'm not super worried about that (though bifurcating capability control between setns and unshare might be useful here, at least for the purposes of this conversation)

> Once again, managing an audit container ID is gated by CAP_AUDIT_ADMIN so it is unlikely to be an issue. It is also worth noting that once the process spawns any children, or additional threads, you can't change the audit container ID.

Yes, but it's not changing the audit container id that I'm asking about, it's changing the namespaces of a given set of processes with an immutable container id that I'm concerned with. Maybe it doesn't matter for the purposes of audit, but it seems like it should. As an example, Process A is spawned by an orchestrator, and assigned net namespace 1, and container id 10. If Process A then forks a child with CLONE_NEWNET set, creating process B, and it (process A) then calls setns(<net>, <process B>), we have a situation in which process A has entered a new net namespace, but kept its old container id, all without the orchestrator's knowledge. My question is, does the orchestrator care? I'm assuming here that the entire purpose of creating a container id is to have a simple handle to refer to a unique set of namespaces and cgroups that the orchestrator can track, and allowing this change breaks the implied mapping of that handle to those namespaces. If it doesn't matter, then please let me know, and I can drop this entirely, it just seems like it should.

rgbriggs · 2018-12-24T22:17:19Z

On 2018-12-21 16:25, Neil Horman wrote: ...

> It seems this implementation of container assignment suffers from a few shortcomings however: 1) There appears to be no mechanism that prevents a container from modifying its own id (presuming CAP_SYS_ADMIN is not removed from its capability set, which I think doesn't occur for trusted containers)

This was included in earlier versions, preventing a task from setting its own audit container ID (contid), but it was decided this was too restrictive, and it was desirable to allow a container orchestrator to set its own contid. There were also other restrictions earlier that prevented a child contid being set if its parent contid was different, or a flag that indicated that inheritance, but those were similarly removed as too restrictive, leaving that management up to the orchestrator.

> 2) There appears to be no mechanism for preventing a container from changing namespaces once an id is set for it, meaning that the correlation between the id and whatever you want to define in userspace as being a 'container' is lost

Since the container is an arbitrary collection of namespaces, cgroups and seccomp and there is no universally agreed-upon definition, it was decided that the actual namespace membership wasn't in fact relevant. Any of that process' children will inherit its parent contid and any namespaces created would automatically become part of that container. There is another issue open to track namespaces in audit. It was originally thought we wanted to track container activity by using a set of namespaces, but it became evident that this was complex, required too much network and disk bandwidth, and wasn't even reliable and complete. There is still value in tracking those namespaces, but it isn't going to solve the primary problem we are trying to solve. (see #32)

> 3) There is no current mechanism that prevents multiple unique 'containers' from sharing an id

This problem was also solved in a previous bit of code, but it was decided this was an orchestrator management issue.

> All of these problems are of course fixable in the current implementation, but for the sake of argument, I'd like to propose an alternate solution that may (I've not 100% thought it out yet) reduce the complexity of the code, make the semantics of the control of the container id from userspace more clear, and solve the above problems. To start, I'd like to define (from a userspace perspective strictly) what we see as a container: **A container is defined by user space as a process that has a specific collection of namespaces and cgroups AND does NOT have the ability to enter new namespaces and cgroups**

This second clause we had discussed and decided was too restrictive. Of course we want to restrict a process from moving itself to another container's namespace set, but this can already be done using namespace management tools. We saw no reason to restrict it from creating new namespaces and using them, and their children would all inherit their contid.

> I would propose that we implement the following in the kernel to enforce that policy: 1) Create a new capability, CAP_AUDIT_SETNS, which gates the ability to successfully call the setns system call. This capability is inherited by its children. If CAP_AUDIT_SETNS is re-granted to a process, its contid (see 2, below) is reset to AUDIT_CID_UNSET.

I had already suggested using a new capability to gate the ability to set the contid, but we had received an objection that creating a new capability was unnecessary since it could be covered with CAP_AUDIT_CONTROL. This suggestion would allow a process that was previously confined to a container to essentially break out of it, which defeats the purpose.

> 2) Create a new field in the audit struct per task, contid (which exists in this patchset already). This field is assigned a nonce (likely using the ida_simple_get api) if the CAP_AUDIT_SETNS capability is dropped, which would also generate an audit log message (similar to the one generated in audit_set_contid)

Would this field replace it, or are you suggesting adding a field of a slightly different name? An earlier proposal had used a kernel-assigned container serial number to ensure each new container had a unique ID, but this was rejected for several reasons: the orchestrator would need to read back the new ID to learn what it was, it would lose the ability to use IDs that made sense to it, and it would be unable to add a new process to an existing container.

> 3) If a process calls fork/clone with any of the CLONE_<namespace> flags set, the CAP_AUDIT_SETNS capability is restored to the child process, and its contid value is reset to AUDIT_CID_UNSET (allowing for a process in a container to start a new container, but not to change its own 'container nature')

I don't think we want to allow a contained process to break out of its container, even with a new capability. I'll have to reflect on this idea/approach to understand its goal and see if it solves a challenge we currently have...

> I think this approach may have a few advantages: a) It provides an established mechanism (the capabilities subsystem) to provide a gated point at which the above container definition becomes locked from the perspective of the initial process inside the container. While the parent may re-grant the privilege, it creates an environment whereby dropping the capability both locks the nature of what the container is (a unique set of namespaces and cgroups), and provides a unique id to the container for audit purposes

As indicated above, we think we don't want to prevent a task in a container from creating a new namespace that would inherit its creator's contid.

> b) It removes the need to manage container ids from userspace, reducing the risk of errors in assigning duplicate identifiers, and reduces code size by removing the need to create a new proc file and validate the information passed through it.

We had decided that we wanted to delegate that responsibility to userspace intentionally. How would you propose discovering the newly created contid if it were assigned from the kernel?

> c) It reduces complexity in the kernel code. The kernel can guarantee unique identifiers using the gating CAP_AUDIT_SETNS flag as the moment to generate the id. This also (may) smooth upstream acceptance, because from a kernel standpoint, all we're doing is adding a capability to an established interface, and generating a nonce for the purposes of audit on the dropping of that capability. From user space it's nice because we can consider the dropping of the ability to enter a new namespace as the border between a process being outside of, and inside of, a container.

Interesting... More reflection required...

> We may have already gone too far down the path of the existing implementation to consider a change of this magnitude, but I wanted to bring this up before we were locked into a design

I think we still have that flexibility.

rgbriggs · 2018-12-24T22:43:12Z

On 2018-12-24 04:23, Neil Horman wrote:

> Right, I think it's CAP_AUDIT_CONTROL, but no matter, and you are absolutely right, the container id is unwritable by any process that doesn't have that capability. That said, I was less worried about a contained process changing its ID, and more worried about a contained process changing properties that an orchestrator might associate with it, i.e. a contained process may have its container id be fixed, but it could easily still call setns on itself, and enter the namespace of another process, breaking whatever association the orchestrator might have assumed would be established. My question was really an attempt to enforce that implied mapping by creating a capability control that could block entry to other processes' namespaces in such a way that the orchestrator could preserve its notion of what a container was, without imbuing the kernel with any knowledge of containers (and hopefully at the same time, creating a trigger mechanism that the kernel audit code could use to auto-generate said ids). If you think that's too much information in the kernel, that's fine, but I wanted to ask the question.
>
> > It gets slightly more confusing if you want to allow nested container orchestrators, but you're already dealing with other problems if you are going this route.
>
> Yeah, I'm not super worried about that (though bifurcating capability control between setns and unshare might be useful here, at least for the purposes of this conversation)
>
> > Once again, managing an audit container ID is gated by CAP_AUDIT_ADMIN so it is unlikely to be an issue. It is also worth noting that once the process spawns any children, or additional threads, you can't change the audit container ID.
>
> Yes, but it's not changing the audit container id that I'm asking about, it's changing the namespaces of a given set of processes with an immutable container id that I'm concerned with. Maybe it doesn't matter for the purposes of audit, but it seems like it should. As an example, Process A is spawned by an orchestrator, and assigned net namespace 1, and container id 10. If Process A then forks a child with CLONE_NEWNET set, creating process B, and it (process A) then calls setns(<net>, <process B>), we have a situation in which process A has entered a new net namespace, but kept its old container id, all without the orchestrator's knowledge. My question is, does the orchestrator care? I'm assuming here that the entire purpose of creating a container id is to have a simple handle to refer to a unique set of namespaces and cgroups that the orchestrator can track, and allowing this change breaks the implied mapping of that handle to those namespaces. If it doesn't matter, then please let me know, and I can drop this entirely, it just seems like it should.

The orchestrator should not care about a process creating a new namespace, and if it does, it should remove the capability that allows it to do so (CAP_SYS_ADMIN). (That raises the question about creating a new capability for managing namespaces since the capability that currently gates that action is a bit overloaded.) I had previously thought this through and there was something else preventing a process from setting its own namespace to cross into another container's space, but I'm not remembering it now... It is certainly possible for multiple containers to share a namespace, which is addressed towards the end of the v4 patchset.

nhorman · 2018-12-26T01:10:28Z

@rgbriggs @pcmoore Forgive me for consolidating your above responses, but the conversation is getting lengthy and I'm having trouble keeping up with all the comments. To abbreviate: @rgbriggs, please feel free to reflect and comment on my proposed design changes as you see fit, but my goal with them was really twofold:

1) To understand what the functional goal was in this patch set from a userspace semantics standpoint (i.e. what does a container id mean to an orchestrator)
2) To suggest some improvements to the implementation of those semantics, should my assumptions about (1) be correct

If you think there are improvements to be made with my suggestions/thoughts, great. If not, that's also fine.

I think, based on what you have both said, this is my understanding of the user space semantics, as you see them:

a) A container id is a write once nonce, set by an orchestrator on an initial process in a container (for some arbitrary definition of the term container), and inherited by its children. Once set, it is immutable.

b) A container id is assigned to a process and its children, but has no fixed correlation to the same set of namespaces and cgroups. If an orchestrator wishes to make the set of processes with a given container id have a fixed set of namespaces and cgroups, it (the orchestrator) should drop the lead process's CAP_SYS_ADMIN capability prior to it forking any children

c) The uniqueness of a container is managed in userspace. It is the responsibility of an orchestrator to ensure that all containers in a system (for any definition of container it wishes to enforce) have a unique id, or that multiple containers sharing an id do so according to a sane policy.

Do you both agree with points (a),(b), and (c)? If not, please correct me. If you do agree, then the comments below become valid:

1) Regarding point (a), it makes sense to me, more or less. My goal with my alternate proposal was to take the generation of a container id out of the hands of userspace so as to ease the mechanics of generating said nonce (doing it in the kernel allows for uniqueness very easily, but requires embedding policy in the kernel to trigger its generation based on a set of events, which is tantamount to the kernel enforcing what a container is). I would like to point out that what you are describing with this nonce also sounds very similar to a session id to me (i.e. a process that calls setsid() to start a new session could be considered a container in the same way that your container id would denote it, and potentially be usable without any kernel changes). Just some food for thought.
2) Regarding point (b), I'm fine with that. Userspace can very easily drop CAP_SYS_ADMIN to prevent the unsharing of namespaces within a process tree. That said, while the fork system call gates the unsharing of namespaces with that capability, the unshare and setns system calls do not appear to, so there is, I think, some additional work required here to enforce this capability as it pertains to namespaces. As a philosophy, however, I'm definitely on board with the idea of using this capability to gate namespace creation and assignment.
3) Regarding point (c), this actually worries me a lot. While I understand the desire to manage container id assignment in user space, it relies on the assumption that there is a single orchestrator running in userspace at one time. Any single orchestrator is capable of ensuring each container receives a unique id, but the interface as designed makes no allowance for the parallel execution of two orchestrators. It would be simple to obfuscate the audit logs by simply having two copies of openshift running. Any sufficiently privileged process can write the container id of any process, and duplicate an existing container id, leaving that field in the audit log useless or worse, intentionally misleading. I think some rework is called for there.

rgbriggs · 2018-12-27T17:04:39Z

On 2018-12-25 17:10, Neil Horman wrote:

> a) A container id is a write once nonce, set by an orchestrator on an initial process in a container (for some arbitrary definition of the term container), and inherited by its children. Once set, it is immutable.

Correct. My understanding is that an orchestrator can inject commands into a container (usually for config) and so would need to run a process and "attach" it to an existing container. It is quite likely I've misunderstood and it is somehow communicating with an existing process in that container to get that information across.

> b) A container id is assigned to a process and its children, but has no fixed correlation to the same set of namespaces and cgroups. If an orchestrator wishes to make the set of processes with a given container id have a fixed set of namespaces and cgroups, it (the orchestrator) should drop the lead process's CAP_SYS_ADMIN capability prior to it forking any children

I believe so.

> c) The uniqueness of a container is managed in userspace. It is the responsibility of an orchestrator to ensure that all containers in a system (for any definition of container it wishes to enforce) have a unique id, or that multiple containers sharing an id do so according to a sane policy.

Yes.

> 1) Regarding point (a), it makes sense to me, more or less. My goal with my alternate proposal was to take the generation of a container id out of the hands of userspace so as to ease the mechanics of generating said nonce (doing it in the kernel allows for uniqueness very easily, but requires embedding policy in the kernel to trigger its generation based on a set of events, which is tantamount to the kernel enforcing what a container is). I would like to point out that what you are describing with this nonce also sounds very similar to a session id to me (i.e. a process that calls setsid() to start a new session could be considered a container in the same way that your container id would denote it, and potentially be usable without any kernel changes). Just some food for thought.

This embedding of the container definition policy enforcement in the kernel was the exact objection of Casey Schaufler. I will have to look at setsid more closely. I assumed you were talking about the audit sessionid which has been raised during the design proposals along with loginuid, but they aren't quite the same.

> 2) Regarding point (b), I'm fine with that. Userspace can very easily drop CAP_SYS_ADMIN to prevent the unsharing of namespaces within a process tree. That said, while the fork system call gates the unsharing of namespaces with that capability, the unshare and setns system calls do not appear to, so there is, I think, some additional work required here to enforce this capability as it pertains to namespaces. As a philosophy, however, I'm definitely on board with the idea of using this capability to gate namespace creation and assignment.

I believe they are all covered. clone(2) checks CAP_SYS_ADMIN if any CLONE_NEW* flags are present, setns(2) does in each of the ns->ops->install() calls, and unshare(2) checks in unshare_nsproxy_namespaces().
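
Since all three entry points are gated by CAP_SYS_ADMIN, an orchestrator could confirm that a contained task has really lost it by inspecting the task's effective capability mask. The following is a rough sketch, assuming the usual CapEff line format in /proc/<pid>/status; the capability numbers are the values from <linux/capability.h>, and the helper names are hypothetical.

```python
CAP_SYS_ADMIN = 21       # value from <linux/capability.h>
CAP_AUDIT_CONTROL = 30   # value from <linux/capability.h>


def effective_caps(status_text: str) -> int:
    """Return the effective capability mask from /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    """True if capability number `cap` is set in the bitmask."""
    return bool((mask >> cap) & 1)


if __name__ == "__main__":
    # Inspect the current process; this path exists on any Linux system.
    with open("/proc/self/status") as f:
        mask = effective_caps(f.read())
    print("CAP_SYS_ADMIN held:", has_cap(mask, CAP_SYS_ADMIN))
    print("CAP_AUDIT_CONTROL held:", has_cap(mask, CAP_AUDIT_CONTROL))
```

A task for which has_cap(mask, CAP_SYS_ADMIN) is false cannot create or enter namespaces through any of the three calls listed above.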

> 3) Regarding point (c), this actually worries me a lot. While I understand the desire to manage container id assignment in user space, it relies on the assumption that there is a single orchestrator running in userspace at one time. Any single orchestrator is capable of ensuring each container receives a unique id, but the interface as designed makes no allowance for the parallel execution of two orchestrators. It would be simple to obfuscate the audit logs by simply having two copies of openshift running. Any sufficiently privileged process can write the container id of any process, and duplicate an existing container id, leaving that field in the audit log useless or worse, intentionally misleading. I think some rework is called for there.

This would be an issue with parallel or nested orchestrators alike. Parallel orchestrators on one machine had not been considered. This was the reason for my preference of a serial contid generated in the kernel or pseudo-random UUID contid generated by the orchestrator that would be checked for uniqueness upon set. However, my understanding is that would prevent the orchestrator from injecting commands into a container it previously spawned. We had considered allowing an orchestrator to set the contid only of its own descendants.
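
The trade-off described here can be illustrated with a toy model of a uniqueness check performed at set time. This is not how any posted patchset behaves; the ContidTable class and its method are hypothetical, and a plain dictionary stands in for whatever table the kernel would keep.

```python
class ContidTable:
    """Toy stand-in for a kernel-side table of audit container IDs in use."""

    def __init__(self):
        self._owner = {}  # contid -> pid of the task that first claimed it

    def set_contid(self, pid: int, contid: int) -> None:
        """Reject any contid that is already in use, whoever asks."""
        if contid in self._owner:
            raise ValueError(f"contid {contid} already in use by pid {self._owner[contid]}")
        self._owner[contid] = pid


if __name__ == "__main__":
    table = ContidTable()
    table.set_contid(pid=100, contid=10)      # orchestrator X starts a container
    try:
        table.set_contid(pid=200, contid=10)  # orchestrator Y picks the same id
    except ValueError as err:
        print("rejected:", err)
```

The same check that stops a second orchestrator from reusing contid 10 also rejects the first orchestrator when it tries to attach a new process to its own container 10, which is the limitation noted above.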

nhorman · 2018-12-27T19:44:50Z

@rgbriggs Hey, thanks for the response. Answers to your thoughts:

> I will have to look at setsid more closely. I assumed you were talking about the audit sessionid which has been raised during the design proposals along with loginuid, but they aren't quite the same.

No, setsid() is the system call that assigns a unique session id to a process group leader in the namespace of the process. If called prior to entering any new namespaces, it is unique within the process namespace of the orchestrator, and as such, could be used as an audit container id that is guaranteed to be unique for the lifetime of the container. Using it might also be nice because it uses existing infrastructure to assign a unique id to a process group, which, it seems, based on your prior answers, is more or less what you are considering a container. As before, just some food for thought.
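
As a small illustration of the setsid() behaviour being described, the sketch below starts a child in a new session and reads back its session id. It only shows that the session id of a new session leader equals the leader's pid; it makes no claim about audit. The helper name spawn_session_leader is hypothetical, and the example assumes a POSIX system.

```python
import subprocess
import sys

# The child reports its own pid and the id of the session it belongs to.
CHILD = "import os; print(os.getpid(), os.getsid(0))"


def spawn_session_leader():
    """Run a child that calls setsid() before exec; return its (pid, sid)."""
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        start_new_session=True,   # the child calls setsid() before exec
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    pid, sid = map(int, out.split())
    return pid, sid


if __name__ == "__main__":
    pid, sid = spawn_session_leader()
    print(f"child pid={pid} session id={sid}")
```

Any further children of that process would report the same session id unless they call setsid() themselves.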

> I believe they are all covered. clone(2) checks CAP_SYS_ADMIN if any CLONE_NEW* flags are present, setns(2) does in each of the ns->ops->install() calls, and unshare(2) checks in unshare_nsproxy_namespaces()

Yep, you're right, I hadn't dug deeply enough, apologies.

This would be an issue with parallel or nested orchestrators alike. Parallel orchestrators on one machine had not been considered. This was the reason for my preference of a serial contid generated in the kernel or pseudo-random UUID contid generated by the orchestrator that would be checked for uniqueness upon set. However, my understanding is that would prevent the orchestrator from injecting commands into a container it previously spawned. We had considered allowing an orchestrator to set the contid only of its own descendants.

I'm not sure I follow the reasoning above. A serial or random UUID, I think, works just as well as my setsid() suggestion above (arguably better), especially if it allows for a uniqueness guarantee. Is the concern that if the kernel generates a random id, the orchestrator won't know what the id is, thereby preventing a mapping of the id in the audit log to the process set? If so, that's an easy fix: your write-only proc file can become a read-only proc file that exports the random value to the orchestrator. Or is there something more going on here?

rgbriggs · 2018-12-27T20:12:01Z

On 2018-12-27 11:44, Neil Horman wrote: Is the concern that if the kernel generates a random id, the orchestrator won't know what the id is, thereby preventing a mapping of the id in the audit log to the process set? If so, that's an easy fix: your write-only proc file can become a read-only proc file that exports the random value to the orchestrator.

The "audit: read container ID of a process" patch does that. It was added as a debug feature, but is being considered more seriously for inclusion now that CAP_AUDIT_CONTROL has been added to restrict its use and reduce abuse.

nhorman · 2018-12-27T20:54:44Z

I would strongly agree with that. Even if the kernel is not responsible for computing a unique id, it should be able to validate the uniqueness of an id to ensure the integrity of the audit log in the presence of multiple orchestrators. And allowing that container id to be read back is essential in the event that an orchestrator restarts with containers outstanding, so that the process->container id map can be rebuilt.

Implement the proc fs write to set the audit container identifier of a process, emitting an AUDIT_CONTAINER_OP record to document the event.
This is a write from the container orchestrator task to a proc entry of the form /proc/PID/audit_containerid where PID is the process ID of the newly created task that is to become the first task in a container, or an additional task added to a container.
The write expects up to a u64 value (unset: 18446744073709551615).
The writer must have capability CAP_AUDIT_CONTROL.
This will produce a record such as this:
  type=CONTAINER_ID msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 old-contid=18446744073709551615 contid=123456 pid=628 auid=root uid=root tty=ttyS0 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 comm=bash exe=/usr/bin/bash res=yes
The "op" field indicates an initial set. The "pid" to "ses" fields are the orchestrator while the "opid" field is the object's PID, the process being "contained". Old and new audit container identifier values are given in the "contid" fields, while res indicates its success.
It is not permitted to unset the audit container identifier. A child inherits its parent's audit container identifier.
See: linux-audit/audit-kernel#90
See: linux-audit/audit-userspace#51
See: linux-audit/audit-testsuite#64
See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
(am from https://patchwork.kernel.org/patch/10551315/)
BUG=chromium:918980
TEST=Build, boot and GCP internal testing.
Changed the return value of the default audit_get_contid as the kuid_t is a 32-bit value where the other version is a u64 failing compilation on 32-bit kernels.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Change-Id: Iee61e96d015715f1dde24f92c230f14410cb5a79
Reviewed-on: https://chromium-review.googlesource.com/1379655
Reviewed-by: Dmitry Torokhov <dtor@chromium.org>
Reviewed-by: Robert Kolchmeyer <rkolchmeyer@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
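For orientation, the write described in this commit message could be driven from an orchestrator roughly as follows. This is a minimal sketch, not code from the patchset: the helper name set_audit_contid and the proc_root parameter are hypothetical, the value is assumed to be written as decimal ASCII, and the file only exists on kernels that carry these patches (the writer also needs CAP_AUDIT_CONTROL there).

```python
import os

# The "unset" value quoted in the commit message: 18446744073709551615.
AUDIT_CID_UNSET = 2**64 - 1


def set_audit_contid(pid, contid, proc_root="/proc"):
    """Write contid to <proc_root>/<pid>/audit_containerid and return the path.

    proc_root is a hypothetical parameter so the helper can be tried
    against a scratch directory instead of the real /proc.
    """
    # Unsetting is not permitted, so refuse the unset value up front.
    if not 0 <= contid < AUDIT_CID_UNSET:
        raise ValueError("contid must be a u64 other than the unset value")
    path = os.path.join(proc_root, str(pid), "audit_containerid")
    with open(path, "w") as f:
        f.write(str(contid))  # assumption: decimal ASCII representation
    return path
```

Rejecting the unset value in the helper mirrors the rule that the identifier cannot be unset, so a caller gets a clear error before any write is attempted.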

rgbriggs · 2019-03-16T12:18:41Z

Posted v5:
kernel:
https://www.redhat.com/archives/linux-audit/2019-March/msg00025.html
https://lkml.org/lkml/2019/3/15/532
userspace:
https://www.redhat.com/archives/linux-audit/2019-March/msg00036.html
https://lkml.org/lkml/2019/3/15/544

rgbriggs · 2019-04-09T03:55:18Z

Post v6:
kernel:
https://www.redhat.com/archives/linux-audit/2019-April/msg00030.html
https://lkml.org/lkml/2019/4/8/1164

rgbriggs · 2019-04-09T19:33:18Z

post v6 userspace:
https://www.redhat.com/archives/linux-audit/2019-April/msg00062.html
https://lkml.org/lkml/2019/4/9/774

rgbriggs · 2019-04-10T21:42:28Z

Test case v1 PR: linux-audit/audit-testsuite#83

Implement the proc fs write to set the audit container identifier of a process, emitting an AUDIT_CONTAINER_OP record to document the event.
This is a write from the container orchestrator task to a proc entry of the form /proc/PID/audit_containerid where PID is the process ID of the newly created task that is to become the first task in a container, or an additional task added to a container.
The write expects up to a u64 value (unset: 18446744073709551615).
The writer must have capability CAP_AUDIT_CONTROL.
This will produce a record such as this:
  type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615
The "op" field indicates an initial set. The "opid" field is the object's PID, the process being "contained". New and old audit container identifier values are given in the "contid" fields.
It is not permitted to unset the audit container identifier. A child inherits its parent's audit container identifier.
Please see the github audit kernel issue for the main feature: linux-audit/audit-kernel#90
Please see the github audit userspace issue for supporting additions: linux-audit/audit-userspace#51
Please see the github audit testsuite issue for the test case: linux-audit/audit-testsuite#64
Please see the github audit wiki for the feature overview: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
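The CONTAINER_OP record shown in this commit message is a flat list of key=value fields, so a log consumer could pick it apart along these lines. This is only a sketch: the function names parse_container_op and is_initial_set are hypothetical and are not part of the audit userspace tools.

```python
AUDIT_CID_UNSET = 2**64 - 1  # 18446744073709551615


def parse_container_op(record):
    """Return the key=value fields of a CONTAINER_OP record as a dict."""
    # Tokens without "=" (such as the bare ":" separator) are skipped.
    fields = dict(tok.split("=", 1) for tok in record.split() if "=" in tok)
    if fields.get("type") != "CONTAINER_OP":
        raise ValueError("not a CONTAINER_OP record")
    for key in ("opid", "contid", "old-contid"):
        fields[key] = int(fields[key])
    return fields


def is_initial_set(fields):
    """True when the record documents the first assignment of a contid."""
    return fields["op"] == "set" and fields["old-contid"] == AUDIT_CID_UNSET
```

Applied to the sample record, this yields opid 2209 as the contained process and 123456 as its new identifier, with the old value equal to the unset marker.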

Create a new audit record AUDIT_CONTAINER_ID to document the audit container identifier of a process if it is present.
Called from audit_log_exit(), syscalls are covered.
A sample raw event:
  type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
  type=CWD msg=audit(1519924845.499:257): cwd="/root"
  type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
  type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
  type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
  type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
Please see the github audit kernel issue for the main feature: linux-audit/audit-kernel#90
Please see the github audit userspace issue for supporting additions: linux-audit/audit-userspace#51
Please see the github audit testsuite issue for the test case: linux-audit/audit-testsuite#64
Please see the github audit wiki for the feature overview: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
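Since the CONTAINER_ID record is an auxiliary record that shares the timestamp:serial pair of its SYSCALL record, a reader of raw logs has to group records by that pair to learn which container an event belongs to. A minimal sketch of that grouping follows; the function name contid_by_event is hypothetical, and the regular expression assumes the raw (uninterpreted) msg=audit(seconds.millis:serial) form shown in the sample event.

```python
import re
from collections import defaultdict

# Raw records carry msg=audit(<epoch seconds>.<millis>:<serial>).
EVENT_ID = re.compile(r"msg=audit\((\d+\.\d+:\d+)\)")
# Match "contid=" but not the tail of "old-contid=".
CONTID = re.compile(r"(?<![\w-])contid=(\d+)")


def contid_by_event(lines):
    """Map each event ID to the contid of its CONTAINER_ID record, or None."""
    events = defaultdict(list)
    for line in lines:
        match = EVENT_ID.search(line)
        if match:
            events[match.group(1)].append(line)
    result = {}
    for event_id, records in events.items():
        result[event_id] = None
        for rec in records:
            if rec.startswith("type=CONTAINER_ID "):
                found = CONTID.search(rec)
                if found:
                    result[event_id] = int(found.group(1))
    return result
```

Events without a CONTAINER_ID record map to None, which corresponds to a task that has no audit container identifier set.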

This defines the message number for the audit container identifier registration record should the kernel headers not be up to date, gives the record number a name for printing and allows the record to be interpreted since it is in the 1000 range like AUDIT_LOGIN. See: linux-audit#51 See: linux-audit/audit-kernel#90 See: linux-audit/audit-testsuite#64 See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

This defines the message number for the audit container identifier information record should the kernel headers not be up to date and gives the record number a name for printing. See: linux-audit#51 See: linux-audit/audit-kernel#90 See: linux-audit/audit-testsuite#64 See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

Add support to be able to set a capability to allow a task to set the audit container identifier of descendants. See: linux-audit#51 See: linux-audit/audit-kernel#90 See: linux-audit/audit-testsuite#64 See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID Add the audit_get_capcontid() and audit_set_capcontid() calls analogous to CAP_AUDIT_CONTROL for descendant user namespaces. Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

rgbriggs · 2020-06-27T15:58:35Z

Post v9 kernel:
https://www.redhat.com/archives/linux-audit/2020-June/msg00108.html
https://lkml.org/lkml/2020/6/27/205

Post v9 userspace:
https://www.redhat.com/archives/linux-audit/2020-June/msg00122.html

Implement the proc fs write to set the audit container identifier of a process, emitting an AUDIT_CONTAINER_OP record to document the event.
This is a write from the container orchestrator task to a proc entry of the form /proc/PID/audit_containerid where PID is the process ID of the newly created task that is to become the first task in a container, or an additional task added to a container.
The write expects up to a u64 value (unset: 18446744073709551615).
The writer must have capability CAP_AUDIT_CONTROL.
This will produce a record such as this:
  type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615
The "op" field indicates an initial set. The "opid" field is the object's PID, the process being "contained". New and old audit container identifier values are given in the "contid" fields.
It is not permitted to unset the audit container identifier. A child inherits its parent's audit container identifier.
Store the audit container identifier in a refcounted kernel object that is added to the master list of audit container identifiers. This will allow multiple container orchestrators/engines to work on the same machine without danger of inadvertently re-using an existing identifier. It will also allow an orchestrator to inject a process into an existing container by checking if the original container owner is the one injecting the task. A hash table list is used to optimize searches.
Please see the github audit kernel issue for the main feature: linux-audit/audit-kernel#90
Please see the github audit userspace issue for supporting additions: linux-audit/audit-userspace#51
Please see the github audit testsuite issue for the test case: linux-audit/audit-testsuite#64
Please see the github audit wiki for the feature overview: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
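The refcounted master list described in this commit message can be modelled in a few lines to show why it addresses the parallel-orchestrator concern raised earlier in the thread. This is a toy model under stated assumptions, not the kernel data structure: the class name ContidRegistry is hypothetical, a Python dict stands in for the hash table, the owner is identified by a plain integer, and raising PermissionError is an illustrative choice.

```python
class ContidRegistry:
    """Toy model of a master list of audit container identifiers."""

    def __init__(self):
        # contid -> {"owner": orchestrator id, "refs": number of tasks}
        self._entries = {}

    def assign(self, orchestrator, contid):
        """Attach one more task to contid on behalf of orchestrator."""
        entry = self._entries.get(contid)
        if entry is None:
            # First use of this identifier: the caller becomes its owner.
            self._entries[contid] = {"owner": orchestrator, "refs": 1}
        elif entry["owner"] == orchestrator:
            # The original owner may inject another task into its container.
            entry["refs"] += 1
        else:
            # A different orchestrator may not re-use a live identifier.
            raise PermissionError("contid already in use by another owner")

    def release(self, contid):
        """Drop one reference; return True when the last task has left."""
        entry = self._entries[contid]
        entry["refs"] -= 1
        if entry["refs"] == 0:
            del self._entries[contid]
            return True
        return False
```

In this model an identifier becomes available again only after its last reference is dropped, so two orchestrators cannot hold the same live identifier at once.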

Create a new audit record AUDIT_CONTAINER_ID to document the audit container identifier of a process if it is present.
Called from audit_log_exit(), syscalls are covered. Include target_cid references from ptrace and signal.
A sample raw event:
  type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
  type=CWD msg=audit(1519924845.499:257): cwd="/root"
  type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
  type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
  type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
  type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
Please see the github audit kernel issue for the main feature: linux-audit/audit-kernel#90
Please see the github audit userspace issue for supporting additions: linux-audit/audit-userspace#51
Please see the github audit testsuite issue for the test case: linux-audit/audit-testsuite#64
Please see the github audit wiki for the feature overview: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
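The PROCTITLE value in the sample event is the command line in hex, with NUL bytes separating the arguments. Decoding it shows what the traced task actually ran; the helper name decode_proctitle below is hypothetical.

```python
def decode_proctitle(hex_value):
    """Decode a hex-encoded PROCTITLE value into its list of arguments."""
    raw = bytes.fromhex(hex_value)
    # Arguments are separated by NUL bytes in the process title.
    return [arg.decode("utf-8", errors="replace") for arg in raw.split(b"\x00")]


sample = (
    "62617368002D6300736C65657020313B206563686F2074657374203E20"
    "2F746D702F746D70636F6E7461696E65726964"
)
print(decode_proctitle(sample))
# prints ['bash', '-c', 'sleep 1; echo test > /tmp/tmpcontainerid']
```

The decoded command matches the PATH records of the same event, which show /tmp/tmpcontainerid being created.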

The audit-related parameters in struct task_struct should ideally be collected together and accessed through a standard audit API and the audit structures made opaque to other kernel subsystems. Collect the existing loginuid, sessionid and audit_context together in a new opaque struct audit_task_info called "audit" in struct task_struct. Use kmem_cache to manage this pool of memory. Un-inline audit_free() to be able to always recover that memory. Please see the upstream github issues linux-audit/audit-kernel#81 linux-audit/audit-kernel#90 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>

Implement the proc fs write to set the audit container identifier of a process, emitting an AUDIT_CONTAINER_OP record to document the event.
This is a write from the container orchestrator task to a proc entry of the form /proc/PID/audit_containerid where PID is the process ID of the newly created task that is to become the first task in a container, or an additional task added to a container.
The write expects up to a u64 value (unset: 18446744073709551615).
The writer must have capability CAP_AUDIT_CONTROL.
This will produce a record such as this:
  time->Thu Nov 26 10:24:46 2020
  type=PROCTITLE msg=audit(1606404286.956:174546): proctitle=2F7573722F62696E2F7065726C002D7700636F6E7461696E657269642F74657374
  type=SYSCALL msg=audit(1606404286.956:174546): arch=c000003e syscall=1 success=yes exit=19 a0=6 a1=557446a6a650 a2=13 a3=8 items=0 ppid=6827 pid=8724 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="perl" exe="/usr/bin/perl" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
  type=CONTAINER_OP msg=audit(1606404286.956:174546): op=set opid=8771 contid=4112973747854606336 old-contid=-1
The "op" field indicates an initial set. The "opid" field is the object's PID, the process being "contained". New and old audit container identifier values are given in the "contid" fields.
It is not permitted to unset the audit container identifier. A child inherits its parent's audit container identifier.
Store the audit container identifier in a refcounted kernel object that is added to the master list of audit container identifiers. This will allow multiple container orchestrators/engines to work on the same machine without danger of inadvertently re-using an existing identifier. It will also allow an orchestrator to inject a process into an existing container by checking if the original container owner is the one injecting the task. A hash table list is used to optimize searches.
audit: log drop of contid on exit of last task
Since the life of each audit container identifier is being tracked, we can match the creation event with the destruction event. Log the destruction of the audit container identifier when the last process in that container exits.
Add support for reading the audit container identifier from the proc filesystem. This is a read from the proc entry of the form /proc/PID/audit_containerid where PID is the process ID of the task whose audit container identifier is sought. The read expects up to a u64 value (unset: 18446744073709551615). This read requires CAP_AUDIT_CONTROL.
Add an entry to Documentation/ABI for /proc/$pid/audit_containerid.
Please see the github audit kernel issue for the main feature: linux-audit/audit-kernel#90
Please see the github audit userspace issue for supporting additions: linux-audit/audit-userspace#51
Please see the github audit testsuite issue for the test case: linux-audit/audit-testsuite#64
Please see the github audit wiki for the feature overview: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
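The read side described here lets an orchestrator rebuild its process-to-container map after a restart. The sketch below is an assumption-laden illustration rather than code from the patchset: the names normalise_contid and get_audit_contid and the proc_root parameter are hypothetical, and because the text gives the unset value as 18446744073709551615 while the sample record prints old-contid=-1, the helper accepts either spelling.

```python
import os

AUDIT_CID_UNSET = 2**64 - 1  # 18446744073709551615, also written (u64)-1


def normalise_contid(text):
    """Convert a decimal contid string to its unsigned 64-bit value."""
    # "-1" and "18446744073709551615" both denote the unset identifier.
    return int(text) % 2**64


def get_audit_contid(pid, proc_root="/proc"):
    """Return the audit container identifier of pid, or None if unset.

    proc_root is a hypothetical parameter so the helper can be tried
    against a scratch directory; on a patched kernel the read would
    additionally require CAP_AUDIT_CONTROL.
    """
    path = os.path.join(proc_root, str(pid), "audit_containerid")
    with open(path) as f:
        contid = normalise_contid(f.read().strip())
    return None if contid == AUDIT_CID_UNSET else contid
```

Returning None for the unset value keeps callers from treating the all-ones marker as a real container.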

Create a new audit record AUDIT_CONTAINER_ID to document the audit container identifier of a process if it is present.
Called from audit_log_exit(), syscalls are covered. Include target_cid references from ptrace and signal.
A sample raw event:
  time->Thu Nov 26 10:24:40 2020
  type=PROCTITLE msg=audit(1606404280.226:174542): proctitle=2F7573722F62696E2F7065726C002D7700636F6E7461696E657269642F74657374
  type=PATH msg=audit(1606404280.226:174542): item=1 name="/tmp/audit-testsuite-dir-8riQ/testsuite-1606404267-WNldVJCr" inode=428 dev=00:1f mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
  type=PATH msg=audit(1606404280.226:174542): item=0 name="/tmp/audit-testsuite-dir-8riQ/" inode=427 dev=00:1f mode=040700 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
  type=CWD msg=audit(1606404280.226:174542): cwd="/root/rgb/git/audit-testsuite/tests"
  type=SYSCALL msg=audit(1606404280.226:174542): arch=c000003e syscall=257 success=yes exit=6 a0=ffffff9c a1=557446bd5f10 a2=80241 a3=1b6 items=2 ppid=8724 pid=8758 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="perl" exe="/usr/bin/perl" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="testsuite-1606404267-WNldVJCr" record=1
  type=CONTAINER_ID msg=audit(1606404280.226:174542): record=1 contid=527940429489930240
Please see the github audit kernel issue for the main feature: linux-audit/audit-kernel#90
Please see the github audit userspace issue for supporting additions: linux-audit/audit-userspace#51
Please see the github audit testsuite issue for the test case: linux-audit/audit-testsuite#64
Please see the github audit wiki for the feature overview: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>

rgbriggs · 2021-01-07T17:43:09Z

2020-12-21
post v10 kernel
https://www.redhat.com/archives/linux-audit/2020-December/msg00047.html
https://lkml.org/lkml/2020/12/21/338
post v10 user
https://www.redhat.com/archives/linux-audit/2020-December/msg00059.html
https://lkml.org/lkml/2020/12/21/361
The upstream kernel audit maintainer quickly pointed out that the ACKs on the first patch were questionable. I acknowledged that they were out of date, which triggered another version.

rgbriggs · 2021-01-12T16:06:15Z

post v11 kernel
https://www.redhat.com/archives/linux-audit/2021-January/msg00007.html
https://lkml.org/lkml/2021/1/12/818

The audit-related parameters in struct task_struct should ideally be collected together and accessed through a standard audit API and the audit structures made opaque to other kernel subsystems. Collect the existing loginuid, sessionid and audit_context together in a new opaque struct audit_task_info called "audit" in struct task_struct. Use kmem_cache to manage this pool of memory. Un-inline audit_free() to be able to always recover that memory. Please see the upstream github issues linux-audit/audit-kernel#81 linux-audit/audit-kernel#90 Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

Implement the proc fs write to set the audit container identifier of a process, emitting an AUDIT_CONTAINER_OP record to document the event.
This is a write from the container orchestrator task to a proc entry of the form /proc/PID/audit_containerid where PID is the process ID of the newly created task that is to become the first task in a container, or an additional task added to a container.
The write expects up to a u64 value (unset: 18446744073709551615).
The writer must have capability CAP_AUDIT_CONTROL.
This will produce an event such as this with the new CONTAINER_OP record:
  time->Thu Nov 26 10:24:27 2020
  type=PROCTITLE msg=audit(1606404267.551:174524): proctitle=2F7573722F62696E2F7065726C002D7700636F6E7461696E657269642F74657374
  type=SYSCALL msg=audit(1606404267.551:174524): arch=c000003e syscall=1 success=yes exit=20 a0=6 a1=557446aa9180 a2=14 a3=100 items=0 ppid=6827 pid=8724 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="perl" exe="/usr/bin/perl" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
  type=CONTAINER_OP msg=audit(1606404267.551:174524): op=set opid=8730 contid=4515122123205246976 old-contid=-1
The "op" field indicates an initial set. The "opid" field is the object's PID, the process being "contained". New and old audit container identifier values are given in the "contid" fields.
It is not permitted to unset the audit container identifier. A child inherits its parent's audit container identifier.
Store the audit container identifier in a refcounted kernel object that is added to the master list of audit container identifiers. This will allow multiple container orchestrators/engines to work on the same machine without danger of inadvertently re-using an existing identifier. It will also allow an orchestrator to inject a process into an existing container by checking if the original container owner is the one injecting the task. A hash table list is used to optimize searches.
Since the life of each audit container identifier is being tracked, we match the creation event with the destruction event. Log the drop of the audit container identifier when the last process in that container exits.
Add support for reading the audit container identifier from the proc filesystem. This is a read from the proc entry of the form /proc/PID/audit_containerid where PID is the process ID of the task whose audit container identifier is sought. The read expects up to a u64 value (unset: (u64)-1). This read requires CAP_AUDIT_CONTROL.
Add an entry to Documentation/ABI for /proc/$pid/audit_containerid.
Please see the github audit kernel issue for the main feature: linux-audit/audit-kernel#90
Please see the github audit userspace issue for supporting additions: linux-audit/audit-userspace#51
Please see the github audit testsuite issue for the test case: linux-audit/audit-testsuite#64
Please see the github audit wiki for the feature overview: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

Create a new audit record AUDIT_CONTAINER_ID to document the audit container identifier of a process if it is present.
Called from audit_log_exit(), syscalls are covered. Include target_cid references from ptrace and signal.
A sample raw event:
  time->Thu Nov 26 10:24:40 2020
  type=PROCTITLE msg=audit(1606404280.226:174542): proctitle=2F7573722F62696E2F7065726C002D7700636F6E7461696E657269642F74657374
  type=PATH msg=audit(1606404280.226:174542): item=1 name="/tmp/audit-testsuite-dir-8riQ/testsuite-1606404267-WNldVJCr" inode=428 dev=00:1f mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
  type=PATH msg=audit(1606404280.226:174542): item=0 name="/tmp/audit-testsuite-dir-8riQ/" inode=427 dev=00:1f mode=040700 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
  type=CWD msg=audit(1606404280.226:174542): cwd="/root/rgb/git/audit-testsuite/tests"
  type=SYSCALL msg=audit(1606404280.226:174542): arch=c000003e syscall=257 success=yes exit=6 a0=ffffff9c a1=557446bd5f10 a2=80241 a3=1b6 items=2 ppid=8724 pid=8758 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=1 comm="perl" exe="/usr/bin/perl" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="testsuite-1606404267-WNldVJCr" record=1
  type=CONTAINER_ID msg=audit(1606404280.226:174542): record=1 contid=527940429489930240
Please see the github audit kernel issue for the main feature: linux-audit/audit-kernel#90
Please see the github audit userspace issue for supporting additions: linux-audit/audit-userspace#51
Please see the github audit testsuite issue for the test case: linux-audit/audit-testsuite#64
Please see the github audit wiki for the feature overview: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>

bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut. It should be freed by kvfree not kfree. Or umount will trigger:
  [ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008
  [ 406.830676 ] #PF: supervisor read access in kernel mode
  [ 406.831643 ] #PF: error_code(0x0000) - not-present page
  [ 406.832487 ] PGD 0 P4D 0
  [ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI
  [ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90
  [ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
  [ 406.835796 ] RIP: 0010:kfree+0x62/0x140
  [ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6
  [ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286
  [ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4
  [ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000
  [ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001
  [ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80
  [ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000
  [ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000
  [ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0
  [ 406.841464 ] Call Trace:
  [ 406.841583 ] <TASK>
  [ 406.841682 ] ? __die+0x1f/0x70
  [ 406.841828 ] ? page_fault_oops+0x159/0x470
  [ 406.842014 ] ? fixup_exception+0x22/0x310
  [ 406.842198 ] ? exc_page_fault+0x1ed/0x200
  [ 406.842382 ] ? asm_exc_page_fault+0x22/0x30
  [ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs]
  [ 406.842842 ] ? kfree+0x62/0x140
  [ 406.842988 ] ? kfree+0x104/0x140
  [ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs]
  [ 406.843390 ] kobject_put+0xb7/0x170
  [ 406.843552 ] deactivate_locked_super+0x2f/0xa0
  [ 406.843756 ] cleanup_mnt+0xba/0x150
  [ 406.843917 ] task_work_run+0x59/0xa0
  [ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0
  [ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40
  [ 406.844510 ] do_syscall_64+0x4e/0xf0
  [ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76
  [ 406.844907 ] RIP: 0033:0x7f0a2664e4fb
Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

pcmoore assigned rgbriggs Jun 1, 2018

pcmoore added enhancement priority/medium labels Jun 1, 2018

This was referenced Jun 6, 2018

RFE: support audit container ID records linux-audit/audit-userspace#51

Closed

RFE: add namespace IDs to audit records #32

Open

pcmoore mentioned this issue Dec 11, 2018

RFE: group all audit task parameters together #81

Closed

rgbriggs mentioned this issue Mar 19, 2019

BUG: records from one event not grouped together linux-audit/audit-userspace#86

Closed

rgbriggs mentioned this issue Sep 8, 2020

auditd.service: use ConditionCapability besides ConditionKernelCommandLine linux-audit/audit-userspace#136

Closed

pandamasta mentioned this issue Nov 24, 2021

Audit root user in LXC container wazuh/wazuh#10945

Open

rc-andres mentioned this issue Nov 2, 2022

support kernels where the loginuid is inside an audit_task_info redcanaryco/redcanary-ebpf-sensor#63

Merged

AkihiroSuda mentioned this issue Jan 19, 2023

provide a portable mechanism for processes within container to obtain their image and container ids opencontainers/runtime-spec#1105

Open

pcmoore mentioned this issue Jul 6, 2023

RFE: improve filtering events by exe for containers #145

Open


rgbriggs commented Jun 1, 2018 (edited)

rgbriggs commented Jun 6, 2018

rgbriggs commented Jul 31, 2018

nhorman commented Dec 21, 2018

pcmoore commented Dec 22, 2018

nhorman commented Dec 24, 2018

rgbriggs commented Dec 24, 2018 via email

rgbriggs commented Dec 24, 2018 via email

nhorman commented Dec 26, 2018

rgbriggs commented Dec 27, 2018 via email

nhorman commented Dec 27, 2018

rgbriggs commented Dec 27, 2018 via email

nhorman commented Dec 27, 2018

rgbriggs commented Mar 16, 2019

rgbriggs commented Apr 9, 2019

rgbriggs commented Apr 9, 2019

rgbriggs commented Apr 10, 2019

rgbriggs commented Jun 27, 2020

rgbriggs commented Jan 7, 2021

rgbriggs commented Jan 12, 2021


rgbriggs commented Jun 1, 2018 (edited)