Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multus: add host checking to validation tool #14230

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

BlaineEXE
Copy link
Member

@BlaineEXE BlaineEXE commented May 16, 2024

In order to help users check that they have implemented the newly-added Multus host configuration prerequisites, add a check to the validation tool to verify connectivity.

Because users who are already running clusters with Multus enabled, add a flag that allows users to only check for host configuration prerequisites. This mode will not start the large number of clients that would normally be started because those clients could disrupt a running Rook cluster negatively.

Host checking pods require host network access. Many Kubernetes
distributions have pod security features enabled. In order to allow
non-Vanilla distros to run this tool, allow specifying a service account
that pods will run as, which can be configured by the admin to allow
test pods.

Manual validation tests:

  • flakiness check still works
  • expect to fail when NAD doesn't have route
  • cleanup cleans up host checkers
  • expect to fail when host doesn't have route
  • expect to succeed when NAD and host routing is correct
  • --host-check-only results in tool stopping after host check succeeds
  • --host-check-only exits with early success when public-net is unset
  • tool enters host check only mode when --host-check-only flag AND/OR config file config is set
  • test on openshift - works, but needs SA+role+binding to allow with SCCs
    • add example resources needed for openshift?
  • update documentation to mention using the validation tool
  • fix KinD-based CI tests

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@BlaineEXE BlaineEXE force-pushed the multus-validation-test-add-host-checking branch 5 times, most recently from 2b5e387 to 0355394 Compare May 20, 2024 22:49
@BlaineEXE BlaineEXE added this to In progress in v1.14 via automation May 28, 2024
@BlaineEXE BlaineEXE force-pushed the multus-validation-test-add-host-checking branch from f96dbc6 to ea13a6a Compare May 31, 2024 15:36
Copy link

mergify bot commented Jun 4, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @BlaineEXE please rebase it. https://rook.io/docs/rook/latest/Contributing/development-flow/#updating-your-fork

@BlaineEXE BlaineEXE force-pushed the multus-validation-test-add-host-checking branch 2 times, most recently from 639a122 to 8fc06e7 Compare June 7, 2024 22:39
@BlaineEXE
Copy link
Member Author

I was able to get this working on openshift, but I wasn't able to define my own custom SCC. The pod was perpetually saying that it wasn't allowed by any SCC, and the custom SCC was never in the list. @subhamkrai or @Madhu-1 do you remember if you saw this issue when testing other things and how you might've resolved that?

kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: multus-validation-test
# host checker daemonset runs on host network
allowHostNetwork: true
allowedCapabilities: ["NET_BIND_SERVICE"]
runAsUser:
  type: MustRunAsNonRoot
seLinuxContext:
  type: RunAsAny
# disallow everything noncritical
allowPrivilegedContainer: false
allowHostDirVolumePlugin: false
allowHostPID: false
allowHostIPC: false
allowHostPorts: false
readOnlyRootFilesystem: true
users:
  - system:serviceaccount:openshift-storage:multus-validation-test
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: multus-validation-test
  namespace: openshift-storage

In the end, I was able to get this working by instead specifying a Role and RoleBinding that allowed the SA to use an existing SCC (hostnetwork-v2). I think this solution is a safe go-forward strategy, but I'm still curious why my first attempt didn't work.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: multus-validation-test
  namespace: openshift-storage
rules:
  - apiGroups:
      - security.openshift.io
    resourceNames:
      - hostnetwork-v2
    resources:
      - securitycontextconstraints
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: multus-validation-test
  namespace: openshift-storage
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: multus-validation-test
subjects:
  - kind: ServiceAccount
    name: multus-validation-test
    namespace: openshift-storage

@BlaineEXE BlaineEXE force-pushed the multus-validation-test-add-host-checking branch 2 times, most recently from 4433dac to 9e899a5 Compare June 24, 2024 21:34
@BlaineEXE BlaineEXE marked this pull request as ready for review June 24, 2024 21:34
@BlaineEXE BlaineEXE force-pushed the multus-validation-test-add-host-checking branch from 9e899a5 to 1f79e2b Compare June 24, 2024 21:36
@BlaineEXE BlaineEXE force-pushed the multus-validation-test-add-host-checking branch 2 times, most recently from 622c6e3 to 12cc170 Compare July 2, 2024 16:33
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the usage and documentation I've worked on has this tool being used interactively, especially with the config file addition, and doubly so with the openshift manifests being needed. I can't keep trying to maintain the job definition on top of the tool, so I'm removing it. I altered the docs to refer to running the tool interactively.

Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small suggestions

the tool's `serviceAccountName` config option or `--service-account-name` CLI flag to instruct
the tool to run using a particular ServiceAccount in order to allow necessary permissions.
An example compatible with openshift is provided in the Rook repository at
[.deploy/examples/multus-validation-test-scc.yaml](https://github.com/rook/rook/blob/master/deploy/examples/multus-validation-test-scc.yaml)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[.deploy/examples/multus-validation-test-scc.yaml](https://github.com/rook/rook/blob/master/deploy/examples/multus-validation-test-scc.yaml)
[deploy/examples/multus-validation-test-scc.yaml](https://github.com/rook/rook/blob/master/deploy/examples/multus-validation-test-scc.yaml)

@@ -0,0 +1,38 @@
# ServiceAccount and RBAC to support running multus validation test on OpenShift
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more like rbac than scc. How about naming the file multus-validation-test-rbac.yaml?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought SCC made sense because the RBAC gives access to an SCC, and SCCs are only available on openshift, implying this is an openshift-focused manifest.

@@ -349,16 +370,13 @@ func (vt *ValidationTest) numClientsReady(ctx context.Context, expectedNumPods i
return numReady, nil
}

func (vt *ValidationTest) getClientPods(ctx context.Context, expectedNumPods int) (*core.PodList, error) {
func (vt *ValidationTest) getPodsWithLabel(ctx context.Context, label string) (*core.PodList, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an existing method k8sutil.PodsRunningWithLabel(). Would it make sense to call that method instead? Or maybe move this helper into k8sutil if it's different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k8sutil.PodsRunningWithLabel() only lists pods in Running state. This method needs to list pods regardless of state.

tests/scripts/multus/host-cfg-ds.yaml Show resolved Hide resolved
@BlaineEXE BlaineEXE force-pushed the multus-validation-test-add-host-checking branch 2 times, most recently from e508cdb to aa71688 Compare July 3, 2024 15:41
@BlaineEXE BlaineEXE requested a review from travisn July 3, 2024 15:41
In order to help users check that they have implemented the newly-added
Multus host configuration prerequisites, add a check to the validation
tool to verify connectivity.

Because users who are already running clusters with Multus enabled, add
a flag that allows users to only check for host configuration
prerequisites. This mode will not start the large number of clients that
would normally be started because those clients could disrupt a running
Rook cluster negatively.

Host checking pods require host network access. Many Kubernetes
distributions have pod security features enabled. In order to allow
non-Vanilla distros to run this tool, allow specifying a service account
that pods will run as, which can be configured by the admin to allow
test pods.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
@BlaineEXE BlaineEXE force-pushed the multus-validation-test-add-host-checking branch from aa71688 to b67e28a Compare July 3, 2024 17:00
Copy link
Member

@travisn travisn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one more question...

@@ -38,6 +38,8 @@ var (
var (
DefaultValidationNamespace = "rook-ceph"

DefaultServiceAccountName = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using the k8sutil.DefaultServiceAccount to set rook-ceph-default? Or does the default need to be blank?

Copy link
Member Author

@BlaineEXE BlaineEXE Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me think about this more and do some testing on minikube and OpenShift. This could be a good way to avoid having to use the multus-validation-test-openshift.yaml RBAC resources (or equivalent configuration on other security-constrained K8s distros), but I should verify to be sure.

Marking as draft while I look into this -- next week probably.

@BlaineEXE BlaineEXE marked this pull request as draft July 3, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
v1.14
In progress
Development

Successfully merging this pull request may close these issues.

None yet

2 participants