
Remove requirement to run the Portworx volume driver on master node #45518

Merged
merged 5 commits into kubernetes:master from harsh-px:px-remote
May 25, 2017

Conversation

harsh-px
Contributor

@harsh-px commented May 9, 2017

What this PR does / why we need it:
This change removes the requirement to run the Portworx volume driver on the Kubernetes master node.

Special notes for your reviewer:
Before this pull request, in order to use a Portworx volume, users had to run the Portworx container on the master node. Since scheduling pods on the master node is not ideal (and impossible on GKE), this PR removes that requirement.

Portworx volume driver no longer has to run on the master.

@k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 9, 2017
@k8s-reviewable

This change is Reviewable

@k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 9, 2017
@k8s-ci-robot
Contributor

Hi @harsh-px. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and release-note-label-needed labels May 9, 2017
@harsh-px
Contributor Author

harsh-px commented May 9, 2017

/assign @jsafrane

@jsafrane
Member

jsafrane commented May 9, 2017

@k8s-bot ok to test

@k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 9, 2017

if driverClient != nil {
	util.portworxClient = driverClient
	break OUTER
Member

I really don't like this break, can't you just return here?
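For illustration, a minimal sketch of the early-return shape being suggested here; the package, types, and helper below are hypothetical stand-ins, not the PR's actual code:

```go
package portworx

import "errors"

// osdClient is a hypothetical stand-in for the real Portworx driver client type.
type osdClient struct{ host string }

// dialNode is a hypothetical helper that probes a single node for a usable client.
func dialNode(host string) (*osdClient, error) {
	// Probe host:9001 and validate the endpoint here; omitted in this sketch.
	return &osdClient{host: host}, nil
}

// firstClient returns as soon as one node yields a client, so no labeled
// "break OUTER" or assignment to a shared util.portworxClient is needed.
func firstClient(hosts []string) (*osdClient, error) {
	for _, h := range hosts {
		c, err := dialNode(h)
		if err != nil || c == nil {
			continue
		}
		return c, nil
	}
	return nil, errors.New("no node exposed a usable Portworx endpoint")
}
```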

Contributor Author

Good suggestion. Fixed in the latest incremental (4164374)

e = err
kubeClient := volumeHost.GetKubeClient()
if kubeClient != nil {
	nodes, err := kubeClient.CoreV1().Nodes().List(metav1.ListOptions{})
Member

Please add a label selector to ListOptions to save the label check below and transfer less data.
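For reference, a rough sketch of the label-selector filtering being suggested. The import paths are the modern client-go ones, the label name is only illustrative, and newer client-go versions also take a context.Context as the first argument to List:

```go
package portworx

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listLabeledNodes asks the API server to filter nodes by label, so the
// client neither checks labels itself nor transfers the full node list.
func listLabeledNodes(kubeClient kubernetes.Interface, label string) (*v1.NodeList, error) {
	return kubeClient.CoreV1().Nodes().List(metav1.ListOptions{
		LabelSelector: label, // e.g. "node-role.kubernetes.io/master"
	})
}
```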

Contributor Author

Removed the filtering logic in the latest incremental. Please see the reasoning in the comment below.

pxdDriverName       = "pxd"
pwxSockName         = "pwx"
pvcClaimLabel       = "pvc"
labelNodeRoleMaster = "node-role.kubernetes.io/master"
Member

This is a Portworx-specific label and thus should have a portworx prefix. In addition, masters and nodes can be separate machines, so /master is not a good name here... Perhaps portworx.kubernetes.io/driver? Feel free to suggest a better name.

Contributor Author

In the latest incremental (4164374), I decided to remove the filtering based on a specific label.

Previously the goal was to filter out master nodes, but that required labeling master nodes during deployment. The extra step wasn't gaining us enough, so I decided to remove the filtering altogether.

@jsafrane
Member

With the latest commit, you scan all nodes on every mount and unmount, which is IMO very inefficient. There can be thousands of nodes, and only a few of them may run something on port 9001, yet you try all of them. That's a no-go for me.

I don't know Portworx internals and I am confused by the terminology here. You are looking for a "master" node. What is this "master"? Is it the Kubernetes API server(s)? That's not listed among the nodes in most cases. Is it a set of nodes that runs some sort of Portworx server pods? If so, use a Service (with a well-known name in a well-known namespace) to get to it.

In addition, can I, as a malicious user, run a dummy Portworx service on port 9001 on a random node and steal your data or even credentials? I hope you at least validate the server's SSL certificates...

@harsh-px
Contributor Author

@jsafrane There is some confusion here. Let me try to clear it up.

  1. The code scans the nodes only on the first mount. After it finds a validated node, it remembers it and there are no subsequent scans (because of the if util.portworxClient == nil check).
  2. We are not looking for the master node. By master node, I mean the node running the k8s API server. The Portworx API server will not be deployed on the master/API-server node.
  3. All nodes in the k8s cluster will run the Portworx container, since we are deployed as a DaemonSet. So the scan loop referenced in point 1 will in most cases find Portworx running on the first node in the list.
  4. We don't store any user credentials or data as of today. The Portworx API server at 9001 has endpoints for volume operations.

Let me know if that clears up your concerns.

@saad-ali added this to the v1.6 milestone May 11, 2017
@jsafrane
Member

jsafrane commented May 12, 2017

> The code scans the nodes only on the first mount. After it finds a validated node, it remembers it and there are no subsequent scans (because of the if util.portworxClient == nil check).

No. Kubelet creates a new mounter for each mount, and the mounter gets an empty PortworxVolumeUtil: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/portworx/portworx.go#L92
So for each mount you download the list of all nodes. That's IMHO bad. Can you cache the client in portworxVolumePlugin instead?

And if you are going to cache the node, you should somehow recover when that node is deleted and create a new client pointing to a different node.
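For illustration, a rough sketch of the caching shape being suggested, assuming hypothetical field and helper names rather than the plugin's actual ones:

```go
package portworx

import "sync"

// osdClient and dialPortworx are hypothetical stand-ins for the real driver
// client and for whatever discovers and validates a Portworx endpoint.
type osdClient struct{}

func dialPortworx() (*osdClient, error) { return &osdClient{}, nil }

// portworxVolumePlugin caches the client at plugin scope, so per-mount
// PortworxVolumeUtil values don't each trigger their own node scan.
type portworxVolumePlugin struct {
	mu     sync.Mutex
	client *osdClient
}

func (p *portworxVolumePlugin) getClient() (*osdClient, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.client != nil {
		return p.client, nil
	}
	c, err := dialPortworx()
	if err != nil {
		// Leave p.client nil so the next mount retries; recovering from a
		// deleted cached node would also require invalidating p.client on errors.
		return nil, err
	}
	p.client = c
	return c, nil
}
```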

> All nodes in the k8s cluster will run the Portworx container, since we are deployed as a DaemonSet. So the scan loop referenced in point 1 will in most cases find Portworx running on the first node in the list.

Why do you need to list all nodes then? If the container is on all nodes, kubelet can easily try localhost:9001 (or hostname:9001) or get its own node's addresses; you don't need to list all of them.
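A minimal sketch of that localhost probe, assuming the default Portworx management port 9001 mentioned in this thread:

```go
package portworx

import (
	"net"
	"strconv"
	"time"
)

const osdMgmtPort = 9001 // default Portworx management port discussed above

// probeLocalPortworx checks whether the node kubelet itself runs on is serving
// the Portworx API, instead of listing and probing every node in the cluster.
func probeLocalPortworx() error {
	addr := net.JoinHostPort("127.0.0.1", strconv.Itoa(osdMgmtPort))
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}
```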

@thockin
Member

thockin commented May 12, 2017

Are you scanning every node in the cluster, looking for something that tastes like your management server? With no credentials? That any pod in the cluster can trivially fake?

This is a really bad idea, and really needs to be re-thought.

Why not publish a Service? It still has no credentials, but at least you're not just hunting.

@thockin assigned saad-ali and unassigned thockin May 12, 2017
@thockin
Member

thockin commented May 12, 2017

I'm out, assigning to saad

@harsh-px
Contributor Author

harsh-px commented May 18, 2017

@thockin and @jsafrane: Good inputs, and yes, agreed! We are working on a service-based design.

A quick question: what do you think about running a NodePort-type Service, with the Portworx volume plugin always sending requests to localhost:<nodePort>? The NodePort Service will take care of routing the request to one of the pods backing the Service.

Our primary use case here is GKE, where our pod won't be running on the master but the volume create request is sent to the master. So if we have a NodePort Service running, we can update our plugin driver running on the master to send requests to localhost:<nodePort>.

Alternatively, I can use a default ClusterIP Service and talk to the Service using its cluster IP.
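For illustration, a rough sketch of the ClusterIP variant; the Service name, namespace, and port here are assumptions, not the final design, and newer client-go versions also take a context.Context in Get:

```go
package portworx

import (
	"net"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// portworxServiceEndpoint resolves an assumed "portworx-service" Service in
// kube-system and returns host:port for its cluster IP, so the plugin talks
// to the Service instead of scanning nodes.
func portworxServiceEndpoint(kubeClient kubernetes.Interface) (string, error) {
	svc, err := kubeClient.CoreV1().Services("kube-system").Get("portworx-service", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return net.JoinHostPort(svc.Spec.ClusterIP, "9001"), nil
}
```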

@k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 22, 2017
@jsafrane
Member

Now it looks much better (and shorter). The only thing I don't like is some copied code, which IMO can easily be refactored into a function; the rest is OK.

@jsafrane
Member

/lgtm

@k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 25, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harsh-px, jsafrane

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. labels May 25, 2017
@jsafrane
Member

/release-note-none

@k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels May 25, 2017
@jsafrane removed the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label May 25, 2017
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-ci-robot
Contributor

@harsh-px: The following test(s) failed:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| Jenkins kops AWS e2e | 4164374 | link | @k8s-bot kops aws e2e test this |
| Jenkins verification | 4164374 | link | @k8s-bot verify test this |
| pull-kubernetes-unit | ad4f21f | link | @k8s-bot pull-kubernetes-unit test this |
| pull-kubernetes-e2e-gce-etcd3 | ad4f21f | link | @k8s-bot pull-kubernetes-e2e-gce-etcd3 test this |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 45518, 46127, 46146, 45932, 45003)

@k8s-github-robot merged commit b017a7a into kubernetes:master May 25, 2017
@enisoc added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels May 26, 2017
k8s-github-robot pushed a commit that referenced this pull request May 27, 2017
…18-upstream-release-1.6

Automatic merge from submit-queue

Automated cherry pick of #45518 upstream release 1.6

Cherrypick of #45518

**What this PR does / why we need it**:
This change removes the requirement to run the Portworx volume driver on the Kubernetes master node.

**Special notes for your reviewer**:
Before this pull request, in order to use a Portworx volume, users had to run the Portworx container on the master node. Since scheduling pods on the master node is not ideal (and impossible on GKE), this PR removes that requirement.
@k8s-cherrypick-bot

Commit found in the "release-1.6" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.

@harsh-px
Contributor Author

@saad-ali How do I ensure this makes it into the next 1.7 release?

For the 1.6 release, I created a cherry-pick pull request (#46528) that has been merged successfully, but I don't see a branch for the next 1.7 release.

@harsh-px deleted the px-remote branch May 30, 2017 18:36
@jsafrane
Member

@harsh-px, master is the 1.7 branch, so this PR will be in 1.7
