
[PodSecurity] Aggregate identical warnings for multiple pods in a namespace #103213

Closed · 2 tasks · Tracked by #103192
tallclair opened this issue Jun 25, 2021 · 7 comments · Fixed by #105889
Assignees
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
sig/auth: Categorizes an issue or PR as relevant to SIG Auth.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@tallclair
Member

tallclair commented Jun 25, 2021

  • Aggregate identical warnings for multiple pods in a namespace
  • Test coverage of aggregation
@k8s-ci-robot added the needs-sig and needs-triage labels Jun 25, 2021
@tallclair added the sig/auth label Jun 25, 2021
@k8s-ci-robot removed the needs-sig label Jun 25, 2021
@liggitt added the kind/feature and triage/accepted labels Jun 27, 2021
@k8s-ci-robot removed the needs-triage label Jun 27, 2021
@enj added this to Needs Triage in SIG Auth Old Jul 6, 2021
@njuptlzf
Contributor

njuptlzf commented Jul 8, 2021

I'll first test the current behavior before making changes.
/assign

@njuptlzf
Contributor

njuptlzf commented Jul 8, 2021

If a pod has multiple causes of errors, for example:

[
noruntimeclasspod: message, message2
runtimeclass1pod: message, message2
runtimeclass2pod: message, message2
runtimeclass3pod: message1, message2
runtimeclass4pod: message1, message2
runtimeclass5pod: message1
runtimeclass6pod: message2
]

I have two thoughts:

  1. Treat the pod's full set of error causes as a single key:
[(message, message2): [noruntimeclasspod, runtimeclass1pod, runtimeclass2pod],
(message1, message2): [runtimeclass3pod, runtimeclass4pod],
(message1): [runtimeclass5pod],
(message2): [runtimeclass6pod]]
  2. Use each individual error cause as a key and associate the related pods with it:
[(message): [noruntimeclasspod, runtimeclass1pod, runtimeclass2pod],
(message1): [runtimeclass3pod, runtimeclass4pod, runtimeclass5pod],
(message2): [noruntimeclasspod, runtimeclass1pod, runtimeclass2pod, runtimeclass3pod, runtimeclass4pod, runtimeclass6pod]]

With thought 1 the warning messages are repeated a lot; with thought 2 the pod names are repeated a lot.

Based on the requirements, I would lean toward thought 2.
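
For concreteness, here is a minimal Go sketch of thought 2, assuming the per-pod warnings have already been collected into a map. The pod names, messages, and the aggregateByMessage helper are illustrative only and are not taken from the actual PodSecurity admission code.

```go
package main

import (
	"fmt"
	"sort"
)

// aggregateByMessage sketches "thought 2": every warning message becomes a
// key, mapped to the names of all pods that triggered it.
func aggregateByMessage(podWarnings map[string][]string) map[string][]string {
	byMessage := map[string][]string{}
	for pod, messages := range podWarnings {
		for _, msg := range messages {
			byMessage[msg] = append(byMessage[msg], pod)
		}
	}
	for _, pods := range byMessage {
		sort.Strings(pods) // stable output
	}
	return byMessage
}

func main() {
	// Hypothetical input, mirroring part of the example above.
	podWarnings := map[string][]string{
		"noruntimeclasspod": {"message", "message2"},
		"runtimeclass3pod":  {"message1", "message2"},
		"runtimeclass5pod":  {"message1"},
	}
	for msg, pods := range aggregateByMessage(podWarnings) {
		fmt.Printf("(%s): %v\n", msg, pods)
	}
}
```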

@tallclair
Member Author

@njuptlzf I prefer the second approach you propose as well.

The only caveat: for a namespace with a small number of pods (one in the extreme case), this approach would actually end up being much more verbose than the un-aggregated case. One option would be to set a threshold for the number of pods with errors and only aggregate when that threshold is exceeded. However, having two different potential output formats could be annoying if you're trying to do any automation on the response. @liggitt WDYT?

@liggitt
Member

liggitt commented Jul 27, 2021

Thinking through the workflow of someone dealing with the warnings across multiple pods... to fix warnings, they still have to visit the individual pod or workload definitions and make appropriate changes. To make that easy, I still think the pod should be the primary unit of organization, not the individual messages.

The main reason I think we should aggregate pods with identical warnings is that it is common to have effectively identical pods created from the same workload controller. If I have a replicaset with 100 pods, I'd rather see:

myrs1-pod-hg2f32hg (and 99 other pods): <... all warnings about myrs1-pod-hg2f32hg ...>

Taking the example above:

[
noruntimeclasspod: message, message2
runtimeclass1pod: message, message2
runtimeclass2pod: message, message2
runtimeclass3pod: message1, message2
runtimeclass4pod: message1, message2
runtimeclass5pod: message1
runtimeclass6pod: message2
]

it's easier to figure out everything I need to do to fix specific pods/workloads if we aggregate to this format:

noruntimeclasspod (and 2 other pods): message, message2
runtimeclass3pod (and 1 other pod): message1, message2
runtimeclass5pod: message1
runtimeclass6pod: message2

I can then fix those four pods/workloads completely, then rerun the dry-run to check my work. If I modified root workload definitions that affected other elided pods, great! If some of the elided pods just happened to have identical warnings but were not controlled by the same root workload definitions, then they'll be surfaced by name when I recheck.
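
As a rough illustration of this format, here is a hedged Go sketch that groups pods sharing an identical warning set and reports one pod name plus a count of the elided ones. The aggregateByWarningSet helper and the exact output wording are assumptions for illustration, not the output format ultimately implemented in #105889.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// aggregateByWarningSet groups pods that share an identical set of warnings
// and reports one pod by name plus a count of the elided ones.
func aggregateByWarningSet(podWarnings map[string][]string) []string {
	groups := map[string][]string{} // sorted, joined warnings -> pod names
	for pod, messages := range podWarnings {
		msgs := append([]string(nil), messages...)
		sort.Strings(msgs)
		key := strings.Join(msgs, ", ")
		groups[key] = append(groups[key], pod)
	}

	var lines []string
	for warnings, pods := range groups {
		sort.Strings(pods)
		switch extra := len(pods) - 1; {
		case extra == 0:
			lines = append(lines, fmt.Sprintf("%s: %s", pods[0], warnings))
		case extra == 1:
			lines = append(lines, fmt.Sprintf("%s (and 1 other pod): %s", pods[0], warnings))
		default:
			lines = append(lines, fmt.Sprintf("%s (and %d other pods): %s", pods[0], extra, warnings))
		}
	}
	sort.Strings(lines)
	return lines
}

func main() {
	// Hypothetical input matching the example in this thread.
	podWarnings := map[string][]string{
		"noruntimeclasspod": {"message", "message2"},
		"runtimeclass1pod":  {"message", "message2"},
		"runtimeclass2pod":  {"message", "message2"},
		"runtimeclass3pod":  {"message1", "message2"},
		"runtimeclass4pod":  {"message1", "message2"},
		"runtimeclass5pod":  {"message1"},
		"runtimeclass6pod":  {"message2"},
	}
	for _, line := range aggregateByWarningSet(podWarnings) {
		fmt.Println(line)
	}
}
```

With that input the sketch prints the four aggregated lines shown above, one per distinct warning set.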

@liggitt moved this from Beta to In Progress in SIG-Auth: PodSecurity Jul 27, 2021
@njuptlzf
Contributor

njuptlzf commented Jul 28, 2021

Okay, let me revise the logic and the unit tests.

@tallclair
Member Author

@liggitt that makes sense, but we might want to key off the controller in addition to the error messages. E.g.

mydeploy-pod-hg2f32hg (and 99 others from apps/v1 ReplicaSet mydeploy-f2g3gc): message, message1
mydeploy2-pod-3f42hu (and 30 others from apps/v1 ReplicaSet mydeploy2-f2234u): message, message1

or is that getting too verbose?
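
As a hedged sketch of what that extended key might look like, the following Go type is hypothetical (it does not exist in the PodSecurity code); in practice the owner fields would come from the pod's controller ownerReference, and pods would be grouped in a map keyed by this struct so the elided-pod count stays scoped to a single workload.

```go
package podsecurity // hypothetical package, for illustration only

// groupKey is a hypothetical aggregation key that includes the controlling
// workload in addition to the warning set, so "(and N others)" counts are
// scoped to one controller rather than to the whole namespace.
type groupKey struct {
	ownerKind string // e.g. "apps/v1 ReplicaSet", from the controller ownerReference
	ownerName string // e.g. "mydeploy-f2g3gc"
	warnings  string // sorted, comma-joined warning messages
}
```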

@liggitt
Member

liggitt commented Sep 7, 2021

hmm... I don't feel super-strongly either way, but that wouldn't necessarily point you at the object you'd actually need to edit to fix the issue... in the case of a deployment, you'd want to edit the deployment, but the pod ownerRefs would be pointing at the intermediate replicaset.

@enj moved this from Needs Triage to In Progress in SIG Auth Old Sep 27, 2021
@liggitt moved this from In Progress to Done (1.23, Beta) in SIG-Auth: PodSecurity Oct 26, 2021
SIG Auth Old automation moved this from In Progress to Closed / Done Oct 26, 2021