kubelet: set terminationMessagePath perms to 0660 #108076

Open
wants to merge 4 commits into
base: master

@skrobul skrobul commented Feb 11, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, kubelet creates world-readable and world-writable empty files in /var/lib/kubelet/pods/{podUID}/containers/{containerName}/{containerId}. These are meant to be written by the process in the container when the container is terminated.

Originally, this file was created with mode 0644; then, despite security concerns, it was changed to 0666 in #31839. This was done to allow containers running as non-root to write termination messages. Later, in 2019, this was highlighted as a security vulnerability in the Kubernetes Security Audit Report, tracked in #81116.

This PR changes the termination log file mode to 0660, which is the best of both worlds: the file is no longer world-writable, yet the container user and its group can still write the termination message.

Which issue(s) this PR fixes:

Related (fixes only part) #81116

Does this PR introduce a user-facing change?

Change world-accessible permissions to owner-and-group-only read/write for files created by kubelet at '/var/lib/kubelet/pods/{podUID}/containers/{containerName}/{containerId}'.
The termination message file (by default `/dev/termination-log`) can now be written only by the container user and its group.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 11, 2022
@k8s-ci-robot
Contributor

Hi @skrobul. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 11, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 11, 2022
@ehashman ehashman added this to Triage in SIG Node PR Triage Feb 11, 2022
Member

@ehashman ehashman left a comment

/ok-to-test
/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 7, 2022
@ehashman ehashman moved this from Triage to Needs Reviewer in SIG Node PR Triage Apr 7, 2022
@skrobul
Author

skrobul commented Apr 8, 2022

/retest

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2022
@skrobul
Author

skrobul commented Jul 7, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2022
@k8s-triage-robot

/lifecycle stale

@bart0sh bart0sh moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Apr 28, 2023
@skrobul skrobul marked this pull request as draft April 28, 2023 10:28
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2023
@skrobul
Author

skrobul commented Apr 28, 2023

/retest

@skrobul skrobul marked this pull request as ready for review April 28, 2023 20:18
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 28, 2023
@skrobul
Author

skrobul commented Apr 28, 2023

/retest

@bart0sh bart0sh moved this from Waiting on Author to Needs Reviewer in SIG Node PR Triage Apr 30, 2023
@skrobul skrobul requested a review from rata July 25, 2023 20:15
Member

@rata rata left a comment

The code looks reasonable assuming this is what we want. And tests are of course missing :)

However, I don't see much activity here to know what sig-node thinks about this. I'd recommend you, again, to join sig-node to get their interest and also ask whether this should have a (very simple) KEP and feature gate or not. I think it probably should.

If @SergeyKanzhelev or someone from sig-node can have a look and answer here, without the need to join sig-node, maybe that is simpler for you @skrobul ? :)

@skrobul
Author

skrobul commented Jul 26, 2023

I'd recommend you, again, to join sig-node to have their interest

According to the docs, in order to become a member I need to show "multiple contributions" before being added to the organisation. This is my very first contribution, so I'm not exactly there yet.

@rata
Member

rata commented Jul 26, 2023

@skrobul wait, that is mixing several different things. First, that link is for sig-docs, which is the Special Interest Group (SIG) for documentation. I'm talking about sig-node.

Secondly, you don't need to join any github org to join the weekly meetings. You can find the sig-node info here: https://github.com/kubernetes/community/tree/master/sig-node. Anyone can join and discuss the topics they want.

To add topics to the agenda, you just need to subscribe to the sig-node mailing list, and then you add your topic to the next week's agenda (if there isn't one created, you can create one). Then you join at the appropriate date and time on Zoom and discuss the problem and your proposed solution with the community; they will tell you if a KEP is needed, a feature gate, etc. (my guess is that a very lightweight KEP will be needed. But hopefully it isn't, let's see!).

All of this should be explained there and/or the meeting agenda google docs link.

@skrobul
Author

skrobul commented Jul 26, 2023

Secondly, you don't need to join any github org to join the weekly meetings. You can find the sig-node info here: https://github.com/kubernetes/community/tree/master/sig-node. Anyone can join and discuss the topics they want.

ohh that makes more sense now, thank you! I will do some reading and join soon. Once again thank you for taking time to write that up, appreciate it.

@marquiz
Contributor

marquiz commented Oct 2, 2023

@skrobul are you still working on this? What was the outcome of the discussion in the SIG Node meeting (I think I missed that meeting myself)?

@SergeyKanzhelev
Member

@skrobul are you still working on this? What was the outcome of the discussion in the SIG Node meeting (I think I missed that meeting myself)?

Link: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#bookmark=id.8ats7bythcn4 There is a recording available.

I cannot check the recording now, but from what I recollect we discussed that there might be scenarios that this change can break. Something around an external tool writing a termination message about an OOM kill or similar.

I think the suggestion was that it will be a breaking change and we will need to run the KEP process to ensure we did our due diligence.

But my recollection may be faulty.

@marquiz
Contributor

marquiz commented Oct 3, 2023

Thanks @SergeyKanzhelev. I quickly checked the recording and you indeed suggested writing a KEP. I think @rata was inclined in the same direction. My feeling about this is similar: a KEP would allow exploring potential risks and alternative solutions in a well-defined manner, and a feature gate would prevent breaking existing users.

/cc @dchen1107

@skrobul
Author

skrobul commented Oct 3, 2023

@skrobul are you still working on this?

No, I have done everything I could at this stage

What was the outcome of the discussion in the SIG Node meeting (I think I missed that meeting myself)?

The outcome as far as I remember was that:

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2024
@skrobul
Author

skrobul commented Jan 22, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2024
} else if pod.Spec.SecurityContext != nil && pod.Spec.SecurityContext.RunAsUser != nil {
	containerUid = int(*pod.Spec.SecurityContext.RunAsUser)
} else {
	containerUid = 0
Contributor

Should we also consider SecurityContext.RunAsNonRoot here and below?

Author

As far as I understand, RunAsNonRoot does not influence the UID selection; it merely enables a validation that the user is non-root. Is that not the case?

Contributor

@bart0sh bart0sh Jan 31, 2024

That's true, but this change would set the UID to 0 (root) even if RunAsNonRoot is true. It may be OK, but it looked suspicious to me. What would happen if the container runs as non-root, but the containerLogPath owner is root?

Author

The container would not run at all if RunAsNonRoot was set to true but pod.Spec.SecurityContext.RunAsUser was left unset. In other words, as far as I understand, there is no way for the container to start as non-root without setting RunAsUser at either the container or the pod level. What am I missing?

Contributor

I'm not sure about it. I thought that RunAsNonRoot and RunAsUser are independent options. By default, the user UID and/or username is taken from the image.

@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 1, 2024
Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Status: Needs Reviewer
SIG Node PR Triage
Needs Reviewer
9 participants