Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-10615: debug node: Run under an unconfined SELinux context #842

Closed
wants to merge 1 commit into from

Conversation

travier
Copy link
Member

@travier travier commented Jun 8, 2021

Run the node debug pod under an unconfined SELinux context
(unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023) to avoid issues
with the spc_t context.

Fixes: #641
See also:

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 8, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 8, 2021

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@travier
Copy link
Member Author

travier commented Jun 8, 2021

Looks like this is effectively requesting an unconfined pod but does not result in one:

$ ./oc debug node/ip-10-0-140-177.us-east-2.compute.internal -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    debug.openshift.io/source-container: container-00
    debug.openshift.io/source-resource: /v1, Resource=nodes/ip-10-0-140-177.us-east-2.compute.internal
  creationTimestamp: null
  name: ip-10-0-140-177us-east-2computeinternal-debug
  namespace: default
spec:
  containers:
  - command:
    - /bin/sh
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac752e15c3b5a557a8c4e12d02279ab97f531c56381b92f75cbc0c68205f561e
    name: container-00
    resources: {}
    securityContext:
      privileged: true
      runAsUser: 0
      seLinuxOptions:
        role: unconfined_r
        type: unconfined_t
        user: unconfined_u
    stdin: true
    stdinOnce: true
    tty: true
    volumeMounts:
    - mountPath: /host
      name: host
  hostNetwork: true
  hostPID: true
  nodeName: ip-10-0-140-177.us-east-2.compute.internal
  restartPolicy: Never
  volumes:
  - hostPath:
      path: /
      type: Directory
    name: host
status: {}

Still running as spec_t and nothing stands out in the logs so far. Investigating.

@cgwalters
Copy link
Member

Thanks for working on this! One tricky aspect is we likely want to handle older cluster versions that have an older selinux-policy where this will fail.

@cgwalters
Copy link
Member

One alternative flow I'd like to enable is something like typing hostdo in the debug shell switches you into something that should feel absolutely indistinguishable from ssh. (For example, we do run through PAM, you become unconfined_t, you're actually in the host mount namespace, etc.) We could actually implement this just in the support-tools container without changing oc.

@travier
Copy link
Member Author

travier commented Jun 24, 2021

Another potential idea here is to make an option for oc debug node to only be a proxy from the local ssh client to the remote node ssh server and then let SSH, PAM, SSSD, etc. handle everything else, just like on a classic system.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2021
@travier
Copy link
Member Author

travier commented Sep 22, 2021

/remove-lifecycle stale
/lifecycle frozen

I would still be interesting to do that one. I need to figure out what's blocking it.

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 22, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 22, 2021

@travier: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/remove-lifecycle stale
/lifecycle frozen

I would still be interesting to do that one. I need to figure out what's blocking it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 21, 2021
@openshift-bot
Copy link
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 20, 2022
@openshift-bot
Copy link
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Feb 19, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 19, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@travier
Copy link
Member Author

travier commented Mar 29, 2023

@travier
Copy link
Member Author

travier commented Mar 29, 2023

/reopen
/remove-lifecycle rotten

@openshift-ci openshift-ci bot reopened this Mar 29, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 29, 2023

@travier: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 29, 2023
@travier travier marked this pull request as ready for review March 29, 2023 11:32
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 29, 2023
@openshift-ci openshift-ci bot requested review from soltysh and removed request for smarterclayton March 29, 2023 11:32
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 29, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: travier
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Run the node debug pod under an unconfined SELinux context
(unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023) to avoid issues
with the `spc_t` context.

Fixes: openshift#641
See also:
 - https://bugzilla.redhat.com/show_bug.cgi?id=1839065
 - https://bugzilla.redhat.com/show_bug.cgi?id=1896369
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 29, 2023

@travier: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-agnostic-ovn-cmd b057767 link true /test e2e-agnostic-ovn-cmd

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sdodson
Copy link
Member

sdodson commented Mar 29, 2023

/retitle OCPBUGS-10615: debug node: Run under an unconfined SELinux context

@openshift-ci openshift-ci bot changed the title debug node: Run under an unconfined SELinux context OCPBUGS-10615: debug node: Run under an unconfined SELinux context Mar 29, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 29, 2023
@openshift-ci-robot
Copy link

@travier: This pull request references Jira Issue OCPBUGS-10615, which is invalid:

  • expected the bug to target the "4.14.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Run the node debug pod under an unconfined SELinux context
(unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023) to avoid issues
with the spc_t context.

Fixes: #641
See also:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@travier
Copy link
Member Author

travier commented Mar 29, 2023

Note that I'm not sure about what the tests here exercise in functionality so I'm not sure this change is tested.

@travier
Copy link
Member Author

travier commented Mar 29, 2023

This does not work for me but I'm not sure how to debug this:

$ ./oc debug node/ip-10-0-139-210.ec2.internal
Temporary namespace openshift-debug-hmvzf is created for debugging node...
Starting pod/ip-10-0-139-210ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.139.210
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# id -Z
system_u:system_r:spc_t:s0
sh-5.1#

Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting unconfined_t is not the right choice here, the container already runs privileged so it should run as spc_t.

@travier
Copy link
Member Author

travier commented Mar 30, 2023

I think setting unconfined_t is not the right choice here, the container already runs privileged so it should run as spc_t.

Yes, it runs as spc_t right now which is not exactly the same as unconfined_t and is the source of our issues. See the bugs:

@saschagrunert
Copy link
Member

I think setting unconfined_t is not the right choice here, the container already runs privileged so it should run as spc_t.

Yes, it runs as spc_t right now which is not exactly the same as unconfined_t and is the source of our issues. See the bugs:

Thank you for the context!

@atiratree
Copy link
Member

atiratree commented Mar 30, 2023

It seems that the SELinuxOptions are effectively skipped by cri-o when we use HostPID, HostIPC, HostNetwork in our debug pod (openshift/origin#25488 (comment))

cri-o/cri-o#5501 and openshift/origin#25488 summarize the issue nicely and the cri-o PR has a discussion about possible solution which cri-o/cri-o#6579 is trying to implement.

This PR does not solve the issue as it is still reproducible with these changes. It seems this should be addressed on the cri-o side first.
/hold

If the cri-o issue gets addressed, the suggested context looks okay to me as unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 are expected values for a root user, but I think it would be better if cri-o would set such values automatically.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2023
@saschagrunert
Copy link
Member

@atiratree I opened a PR for discussion a possible solution in CRI-O: cri-o/cri-o#6780

@rhatdan
Copy link
Contributor

rhatdan commented Mar 31, 2023

Setting the labels should be allowed always. The namespaces should have no effect. I don't know why looking at a share network namespace is ever a problem, but the IPC and pid would potentially cause issues with a confined container process not being able to see or work with a spc_t or unconfined_t process.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 30, 2023
@openshift-bot
Copy link
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 30, 2023
@openshift-bot
Copy link
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Aug 30, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 30, 2023

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link

@travier: This pull request references Jira Issue OCPBUGS-10615. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

Run the node debug pod under an unconfined SELinux context
(unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023) to avoid issues
with the spc_t context.

Fixes: #641
See also:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFE] oc debug node: Use unconfined_t as SELinux context for debug container
8 participants