New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e manifests #69868

Merged
merged 5 commits into from Oct 30, 2018

Conversation

@pohly
Contributor

pohly commented Oct 16, 2018

/kind cleanup

What this PR does / why we need it:

For CSI testing we want to get away from having to replicate .yaml files in code. Now the CSI drivers and their RBAC rules get created from .yaml files.

Special notes for your reviewer:

My proposal is to review and potentially merge this PR first, then create the rbac.yaml in the individual repos at the URLs linked to here in the readmes.

Release note:

- The builtin system:csi-external-provisioner and system:csi-external-attacher cluster roles
  are deprecated and will not be updated for deployments of CSI sidecar container versions >= 0.4.
  Deployments with the current CSI sidecar containers have to provide their own RBAC
  definitions. The reason is that the rules depend on how the sidecar containers are used,
  which is defined by the deployment.

/sig storage
/sig testing
/cc @msau42

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 16, 2018

@msau42 I went ahead and also rewrote the GCE CSI test, which allowed me to remove a lot of code from csi_objects.go. Can you perhaps check that I didn't break that test? Or is it running in the Kubernetes CI?

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 16, 2018

Note that the csiTestDriver implementation for hostpath and GCE now use almost the same code. The real difference is in the set of .yaml files and in the names of the containers. It seems very feasible now to just have one struct which defines these settings and then fill in that struct from .yaml files, either shipped with the test suite or provided by a user.

Only createGCESecrets() may still need some special handling.

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 17, 2018

/retest

@davidz627

This comment has been minimized.

Collaborator

davidz627 commented Oct 17, 2018

/cc

@k8s-ci-robot k8s-ci-robot requested a review from davidz627 Oct 17, 2018

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 17, 2018

I found that the gcePD CSI driver test case does run in the CI... and that it currently hangs. I've added more debug output, so perhaps soon I'll know why. I got stuck trying to run it locally because it wasn't obvious how to set up the required service account secret.

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 17, 2018

Some help would be welcome. The gcePD test failed in https://prow.k8s.io/log?job=pull-kubernetes-e2e-gce&id=56220 again because of permission issues. I don't understand what the error messages are trying to tell me, the service account setup looks sane to me:

I1017 18:50:42.646] Oct 17 18:45:34.524: INFO: creating ServiceAccount:
I1017 18:50:42.646] {
I1017 18:50:42.646]   "kind": "ServiceAccount",
I1017 18:50:42.646]   "apiVersion": "v1",
I1017 18:50:42.646]   "metadata": {
I1017 18:50:42.646]     "name": "csi-driver-registrar",
I1017 18:50:42.646]     "namespace": "e2e-tests-csi-volumes-nh7pq",
I1017 18:50:42.646]     "creationTimestamp": null
I1017 18:50:42.646]   }
I1017 18:50:42.646] }
...
I1017 18:50:42.651] Oct 17 18:45:34.531: INFO: creating ClusterRole
I1017 18:50:42.651] :{
I1017 18:50:42.651]   "kind": "ClusterRole",
I1017 18:50:42.651]   "apiVersion": "rbac.authorization.k8s.io/v1",
I1017 18:50:42.651]   "metadata": {
I1017 18:50:42.651]     "name": "driver-registrar-runner-e2e-tests-csi-volumes-nh7pq",
I1017 18:50:42.651]     "creationTimestamp": null
I1017 18:50:42.652]   },
I1017 18:50:42.652]   "rules": [
I1017 18:50:42.652]     {
I1017 18:50:42.652]       "verbs": [
I1017 18:50:42.652]         "get",
I1017 18:50:42.652]         "list",
I1017 18:50:42.652]         "watch",
I1017 18:50:42.652]         "create",
I1017 18:50:42.652]         "update",
I1017 18:50:42.652]         "patch"
I1017 18:50:42.652]       ],
I1017 18:50:42.653]       "apiGroups": [
I1017 18:50:42.653]         ""
I1017 18:50:42.653]       ],
I1017 18:50:42.653]       "resources": [
I1017 18:50:42.653]         "events"
I1017 18:50:42.653]       ]
I1017 18:50:42.653]     },
I1017 18:50:42.653]     {
I1017 18:50:42.653]       "verbs": [
I1017 18:50:42.653]         "get",
I1017 18:50:42.654]         "update",
I1017 18:50:42.654]         "patch"
I1017 18:50:42.654]       ],
I1017 18:50:42.654]       "apiGroups": [
I1017 18:50:42.654]         ""
I1017 18:50:42.654]       ],
I1017 18:50:42.654]       "resources": [
I1017 18:50:42.654]         "nodes"
I1017 18:50:42.654]       ]
I1017 18:50:42.654]     }
I1017 18:50:42.654]   ]
I1017 18:50:42.655] }
...
I1017 18:50:42.657] Oct 17 18:45:34.539: INFO: creating ClusterRoleBinding:
I1017 18:50:42.657] {
I1017 18:50:42.657]   "kind": "ClusterRoleBinding",
I1017 18:50:42.657]   "apiVersion": "rbac.authorization.k8s.io/v1",
I1017 18:50:42.657]   "metadata": {
I1017 18:50:42.657]     "name": "csi-driver-registrar-role-e2e-tests-csi-volumes-nh7pq",
I1017 18:50:42.658]     "creationTimestamp": null
I1017 18:50:42.658]   },
I1017 18:50:42.658]   "subjects": [
I1017 18:50:42.658]     {
I1017 18:50:42.658]       "kind": "ServiceAccount",
I1017 18:50:42.658]       "name": "csi-driver-registrar",
I1017 18:50:42.658]       "namespace": "e2e-tests-csi-volumes-nh7pq"
I1017 18:50:42.658]     }
I1017 18:50:42.658]   ],
I1017 18:50:42.658]   "roleRef": {
I1017 18:50:42.659]     "apiGroup": "rbac.authorization.k8s.io",
I1017 18:50:42.659]     "kind": "ClusterRole",
I1017 18:50:42.659]     "name": "driver-registrar-runner-e2e-tests-csi-volumes-nh7pq"
I1017 18:50:42.659]   }
I1017 18:50:42.659] }
...
I1017 18:45:34.534] Oct 17 18:45:27.723: INFO: At 2018-10-17 18:40:25 +0000 UTC - event for csi-gce-node: {daemonset-controller } FailedCreate: Error creating: pods "csi-gce-node-" is forbidden: unable to validate against any pod security policy: []
I1017 18:45:34.535] Oct 17 18:45:27.723: INFO: At 2018-10-17 18:40:26 +0000 UTC - event for csi-gce-controller: {statefulset-controller } FailedCreate: create Pod csi-gce-controller-0 in StatefulSet csi-gce-controller failed error: pods "csi-gce-controller-0" is forbidden: unable to validate against any pod security policy: []

@liggitt @davidz627

@liggitt

This comment has been minimized.

Member

liggitt commented Oct 17, 2018

the current use of that manifest appears to require elevated PSP permissions, set up here:

utils.PrivilegedTestPSPClusterRoleBinding(cs, config.Namespace, false, /* teardown */
[]string{g.controllerServiceAccount.Name, g.nodeServiceAccount.Name})

are those permissions still getting set up in the namespace where the CSI driver is getting deployed?

@davidz627

This comment has been minimized.

Collaborator

davidz627 commented Oct 17, 2018

@pohly, the CI runs in an environment with PodSecurityPolicy on, you can emulate this in local testing with: ENABLE_POD_SECURITY_POLICY=true

Since the GCE PD Daemonset + Statefulset don't use the default test service account, you have to give the elevated test pod security policies to the service accounts linked to the GCE PD Deployments.

@liggitt has linked the correct line to set them up

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 17, 2018

Thanks guys, that should get me further. It's late here now and I am out tomorrow, so I'll look at that on Friday. I'll probably be able to figure out how utils.PrivilegedTestPSPClusterRoleBinding works, but if someone has the corresponding .yaml definitions that might save me some time.

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 18, 2018

So utils.PrivilegedTestPSPClusterRoleBinding is just another ClusterRoleBinding with a RoleRef that references a special pre-defined e2e-test-privileged-psp role (0698354).

@davidz627: Was there a reason for calling utils.PrivilegedTestPSPClusterRoleBinding instead of just adding the e2e-test-privileged-psp role to the g.controllerClusterRoles and g.nodeClusterRoles lists?
Which service accounts need this additional ClusterRoleBinding? The call to utils.PrivilegedTestPSPClusterRoleBinding was only added for the gcePD driver test. The hostPath driver test also used its own service account:

func (h *hostpathCSIDriver) createCSIDriver() {
By("deploying csi hostpath driver")
f := h.f
cs := f.ClientSet
config := h.config
h.serviceAccount = csiServiceAccount(cs, config, "hostpath", false)
csiClusterRoleBindings(cs, config, false, h.serviceAccount, h.combinedClusterRoleNames)
role := csiControllerRole(cs, config, false)
csiControllerRoleBinding(cs, config, false, role, h.serviceAccount)
csiHostPathPod(cs, config, false, f, h.serviceAccount)
}

The gcePD case is passing now after I adapted the gcePD RBAC .yaml file. But the hostPath test shows the same "unable to validate against any pod security policy" error, and I'm puzzled how this worked before.

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 18, 2018

One possible difference is that now the hostpath service account is not bound to any of the pre-defined system roles anymore. Before, it was bound to system:csi-external-provisioner and system:csi-external-attacher. Does that explain the regression?

Is there some other pre-define system role that the service account should be bound to? Binding it to e2e-test-privileged-psp doesn't look right, because that looks like something that is only present in the Kubernetes CI setup, right?

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 18, 2018

/retest

2 similar comments
@pohly

This comment has been minimized.

Contributor

pohly commented Oct 19, 2018

/retest

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 19, 2018

/retest

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 19, 2018

So this PR should now pass all tests because of commit "e2e: PSP role binding for hostpath" =
1f6e9da. I still don't understand why that RoleBinding wasn't necessary earlier. I'd be more comfortable proposing this PR for merging if someone could explain that.

I've taken out the log capturing (see PR #69986 for that) and cleaned up the code. Except for this mystery around the PSP RoleBinding this is ready for merging or at least some initial review. IMHO improving the code (see the TODO in create.go) can wait until later. If this gets merged with that TODO, I'll file an issue and assign it to me.

@pohly pohly changed the title from WIP: e2e manifests to e2e manifests Oct 19, 2018

pohly added some commits Oct 16, 2018

e2e: deploy from manifest files + enhance CSI volume output
Ensuring that CSI drivers get deployed for testing exactly as intended
was problematic because the original .yaml files had to be converted
into code. e2e/manifest helped a bit, but not enough:
- could not load all entities
- didn't handle loading .yaml files with multiple entities
- actually creating and deleting entities still had to be done in tests

The new framework utility code handles all of that, including the
tricky cleanup operation that tests got wrong (AfterEach does not get
called after test failures!).

In addition, it is ensuring that each test gets its own instance of the
entities.

The PSP role binding for hostpath is now necessary because we switch
from creating a pod directly to creation via the StatefulSet
controller, which runs with less privileges.

Without this, the hostpath test runs into these errors in the
kubernetes-e2e-gce job:

Oct 19 16:30:09.225: INFO: At 2018-10-19 16:25:07 +0000 UTC - event for csi-hostpath-attacher: {statefulset-controller } FailedCreate: create Pod csi-hostpath-attacher-0 in StatefulSet csi-hostpath-attacher failed error: pods "csi-hostpath-attacher-0" is forbidden: unable to validate against any pod security policy: []
Oct 19 16:30:09.225: INFO: At 2018-10-19 16:25:07 +0000 UTC - event for csi-hostpath-provisioner: {statefulset-controller } FailedCreate: create Pod csi-hostpath-provisioner-0 in StatefulSet csi-hostpath-provisioner failed error: pods "csi-hostpath-provisioner-0" is forbidden: unable to validate against any pod security policy: []
Oct 19 16:30:09.225: INFO: At 2018-10-19 16:25:07 +0000 UTC - event for csi-hostpathplugin: {daemonset-controller } FailedCreate: Error creating: pods "csi-hostpathplugin-" is forbidden: unable to validate against any pod security policy: []

The extra role binding is silently ignored on clusters which don't
have this particular role.
e2e: update bazel BUILD files
Generated via hack/update-bazel.sh.
@pohly

This comment has been minimized.

Contributor

pohly commented Oct 26, 2018

@msau let me come back to the offer of not renaming the gcePD driver. I had that working before the driver update to gcp-compute-persistent-disk-csi-driver:v0.1.0.alpha, but now I can't get it to work again and my possibilities to debug that are rather limited and slow.

Therefore the latest incarnation of this PR deploys the gcePD driver without renaming, as in current Kubernetes master. If that works, please merge. I'm not going to wait for clean test results myself anymore today, though.

I left in the version update - killing two birds with one stone, basically. Otherwise we would have two conflicting PRs.

/hold cancel

@@ -16,7 +16,7 @@ spec:
containers:
- name: csi-driver-registrar
imagePullPolicy: Always
image: quay.io/k8scsi/driver-registrar:v0.3.0
image: quay.io/k8scsi/driver-registrar:v0.4.1

This comment has been minimized.

@msau42

msau42 Oct 26, 2018

Member

Hm I wonder if the alpha pd driver is going to have issues with the latest sidecars.

Also, unfortunately we haven't published public images of the beta gce pd driver yet. So I think it's best to just leave PD driver image versions the way it was before, and we will update the versions when we're ready.

Also I think we need to add the [Serial] tag to pd so that tests won't be run it parallel.

This comment has been minimized.

@davidz627

davidz627 Oct 26, 2018

Collaborator

+1

This comment has been minimized.

@pohly

pohly Oct 27, 2018

Contributor

The test has passed, so it looks like the alpha driver is compatible with the latest sidecar drivers. Do you want to revert that part nonetheless?

Please lets add the [Serial] tag as part of PR #68025 /cc @mkimuram

It has been working so far and this PR doesn't make it worse. From a practical perspective,
I don't see a good way to add it to the current test because of the way how it loops over the different drivers, and that part will be changed anyway:

	for driverName, initCSIDriver := range csiTestDrivers {
		curDriverName := driverName
		curInitCSIDriver := initCSIDriver

		Context(fmt.Sprintf("CSI plugin test using CSI driver: %s", curDriverName), func() {

This comment has been minimized.

@msau42

msau42 Oct 27, 2018

Member

If its passing then it's fine to leave it. It's not a configuration that's officially been tested/supported by us, but I will update it later once we get the latest driver image published.

This comment has been minimized.

@pohly

pohly Oct 27, 2018

Contributor

@msau42 so is this PR ready for merging now?

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 27, 2018

/test pull-kubernetes-integration

1 similar comment
@pohly

This comment has been minimized.

Contributor

pohly commented Oct 27, 2018

/test pull-kubernetes-integration

image: quay.io/k8scsi/csi-attacher:v0.4.0
args:
- --v=5
- --csi-address=$(ADDRESS)

This comment has been minimized.

@msau42

msau42 Oct 27, 2018

Member

do you want to simplify the hostpath specs too?

This comment has been minimized.

@pohly

pohly Oct 27, 2018

Contributor

Yes, but let's do that in a separate PR and then also coordinate the update with the original .yaml file in kubernetes-csi/docs.

I've filed kubernetes-csi/docs#68 for this.

@@ -16,7 +16,7 @@ spec:
containers:
- name: csi-driver-registrar
imagePullPolicy: Always
image: quay.io/k8scsi/driver-registrar:v0.3.0
image: quay.io/k8scsi/driver-registrar:v0.4.1

This comment has been minimized.

@msau42

msau42 Oct 27, 2018

Member

If its passing then it's fine to leave it. It's not a configuration that's officially been tested/supported by us, but I will update it later once we get the latest driver image published.

@AishSundar

This comment has been minimized.

Contributor

AishSundar commented Oct 29, 2018

@pohly is this planned to be merged in 1.13 timeframe? if so please give this a priority and milestone label accordingly. @nikopen in case you need him to apply the milestone label

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 29, 2018

Yes, this is for 1.13.

/milestone v1.13
/priority important-soon

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Oct 29, 2018

@pohly: You must be a member of the kubernetes/kubernetes-milestone-maintainers github team to set the milestone.

In response to this:

Yes, this is for 1.13.

/milestone v1.13
/priority important-soon

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pohly

This comment has been minimized.

Contributor

pohly commented Oct 29, 2018

@nikopen can you please add the v1.13 milestone label to this PR?

@nikopen

This comment has been minimized.

Member

nikopen commented Oct 29, 2018

/milestone v1.13

@k8s-ci-robot k8s-ci-robot added this to the v1.13 milestone Oct 29, 2018

@msau42

This comment has been minimized.

Member

msau42 commented Oct 29, 2018

/lgtm

@jsafrane

This comment has been minimized.

Member

jsafrane commented Oct 29, 2018

/assign @saad-ali
for approval

@saad-ali

This comment has been minimized.

Member

saad-ali commented Oct 29, 2018

/approve

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Oct 29, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 2ecc71d into kubernetes:master Oct 30, 2018

18 checks passed

cla/linuxfoundation pohly authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
// - only the latest stable API version for each item is supported
func (f *Framework) PatchItems(items ...interface{}) error {
for _, item := range items {
Logf("patching original content of %T:\n%s", item, PrettyPrint(item))

This comment has been minimized.

@gnufied

gnufied Oct 30, 2018

Member

Printing all of the YAMLs is far too noisy - #70448

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment