Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI Node info registration in kubelet #67684

Merged

Conversation

verult
Copy link
Contributor

@verult verult commented Aug 22, 2018

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #67683

Special notes for your reviewer:
Feature issue: kubernetes/enhancements#557
Design doc: kubernetes/community#2034

Missing pieces:

  • CSI client retry and exponential backoff logic.
  • CSINodeInfo object validation
  • e2e test with all the CSI machinery.

An RBAC rule is also added to support external-provisioner topology updates.

Release note:

Registers volume topology information reported by a node-level Container Storage Interface (CSI) driver. This enables Kubernetes support of CSI topology mechanisms.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Aug 22, 2018
@verult
Copy link
Contributor Author

verult commented Aug 22, 2018

/sig storage
/assign @liggitt @saad-ali @vladimirvivien @sbezverk @msau42


// updateNode repeatedly attempts to update the corresponding node object
// which is modified by applying the given update functions in order.
func (lm *nodeInfoManager) updateNode(updateFuncs []nodeUpdateFunc) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will rename all the receivers to nim.

}

if val, ok := existingDriverMap[csiDriverName]; ok {
if reflect.DeepEqual(val, topologyKeys) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will check set equality instead in the new version that uses CSINodeInfo.

@verult verult force-pushed the top-csi-driver-registration branch from c71f9cb to b2ff926 Compare August 22, 2018 22:17
@verult
Copy link
Contributor Author

verult commented Aug 24, 2018

Feel free to hold off on reviewing this PR for now - I'm almost done with wiring up CSINodeInfo and a good amount of logic will become obsolete

@verult verult force-pushed the top-csi-driver-registration branch from b2ff926 to ab5b04e Compare August 25, 2018 02:43
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. area/kubelet sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 25, 2018
@verult
Copy link
Contributor Author

verult commented Aug 25, 2018

Added integration with CSINodeInfo. Please only review the last commit - everything before it are rebased from #67803.

Missing functionality is updated in the top post, but adding those won't cause any major structural changes. The PR is ready for review now.

@liggitt @saad-ali @vladimirvivien @sbezverk @msau42 @gnufied

@verult verult force-pushed the top-csi-driver-registration branch from ab5b04e to 0128473 Compare August 25, 2018 03:06
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2018
@verult
Copy link
Contributor Author

verult commented Aug 27, 2018

@saad-ali and I discussed offline, and we are going to leave the CSI label prefix as is in Kubernetes instead of reversing it, for the time being.


var nodeKind = v1.SchemeGroupVersion.WithKind("Node")

// nodeInfoManager is struct of channels used for communication between the driver registration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs to be updated. I don't see any channels being used

return fmt.Errorf("error adding CSI driver node info: driverNodeID must not be empty")
}

if topology == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a plugin doesn't support topology will this be nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks for the catch!

}
}

func (nim *nodeInfoManager) AddNodeInfo(driverName string, driverNodeID string, topology *csipb.Topology) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here describing exactly all the objects/fields that are being modified/added?

// exponential backoff to avoid exhausting the apiserver.

nodeClient := nim.k8s.CoreV1().Nodes()
node, err := nodeClient.Get(string(nim.nodeName), metav1.GetOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there plans to use a shared informer? If so, we should clone the node object first before modifying it so it doesn't impact other uses that may be sharing the same pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I wasn't planning to. Driver registration is fairly infrequent so watching the object doesn't seem worth it


needUpdate := false
for _, update := range updateFuncs {
newNode, updated, err := update(node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works only if the multiple updateFuncs are not conflicting with each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make it more explicit in the comments for this function that because updateFuncs are called in order, each updateFunc should consider the effects of previous updateFuncs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each updateFunc should consider the effects of previous updateFuncs

what does that mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, if two update functions both update same field then last one will win, but caller of this function should try and ensure that - it does not happen in first place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well said! Will update the comment

driverInfo.NodeID = driverNodeID
driverInfo.TopologyKeys = topologyKeys
}
driverInfos = append(driverInfos, driverInfo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be adding all drivers in this call to just update a single driver? Do all the CSI drivers register again when we upgrade the node?

Should we be looking at the annotations at all? IIUC, this is supposed to be the replacement for the annotations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah CSI drivers should register again upon node upgrade, and this is here as a safeguard to ensure annotations and CSINodeInfo are consistent while we support both. There is always a chance that kubelet is upgraded while the previous Node object stays, and annotations and CSINodeInfo could be momentarily out of sync.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to keep the annotations and CSIDriverInfo in sync on the first driver's registration, if the other driver will register again later? In an upgrade scenario, then this means we're on the first boot after upgrade and we're processing the first driver registration. We're adding an entry for all the other drivers too even though they have not gone through registration yet. Could that cause other issues?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method should get whole list of registered CSI drivers so it creates/updates complete CSINodeInfo with all drivers. The result should be the same in the end, however it feels more robust. I don't line CSINodeInfo with empty topology for some drivers just because we translated it from an annotation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method gets called for a single driver registration though, and only on the first one since it's creating the object for the first time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @msau42 offline. The current implementation doesn't guarantee the node info is always up-to-date. In particular, with the current kubelet plugin registration design, there is no way to remove a non-existent driver unless it detects that a driver socket is removed (after unregistration is implemented). If a driver is removed while kubelet is down, the node information will remain. So whatever issue comes up when CSINodeInfo is kept in sync with annotation is unavoidable today.

But I'll remove the logic to keep the two sources in sync, because it complicates the implementation without adding too much benefit. One minor advantage of keeping them in sync is to avoid debugging confusion, but I think it'll be OK.

// all other CSI components will be able to get the actual socket of CSI drivers by its name.
csiDrivers.Lock()
defer csiDrivers.Unlock()
csiDrivers.driversMap[pluginName] = csiDriver{driverName: pluginName, driverEndpoint: endpoint}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the lock was being held across the entire function instead of just the map update. Could there be any issues if this map has an entry but we haven't finished registering the plugin yet? Should we update the map at the very end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both the previous and the current version, the lock is only held on write, all within this RegistrationCallback() function. The map has to be updated before the csi.NodeGetInfo() call unfortunately because it relies on the driver info to call out to gRPC.

However, when plugin unregistration is implemented in the future, depending on how it's implemented, there could be a race between calls of RegistrationCallback(). Will update to lock the entire function once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @msau42 offline. For now we'll keep the map locked as short as possible. When unregistration is implemented we'll reconsider again.

Will add comments explaining why it's OK to keep locking short.

…ing to CSINodeInfo; added unit tests for node updates; updated RBAC, NodeAuthorizer, NodeRestriction.
@saad-ali
Copy link
Member

saad-ali commented Sep 7, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 7, 2018
@@ -164,6 +164,10 @@ func NodeRules() []rbacv1.PolicyRule {
nodePolicyRules = append(nodePolicyRules, csiDriverRule)
}
}
if utilfeature.DefaultFeatureGate.Enabled(features.KubeletPluginsWatcher) {
csiNodeInfoRule := rbacv1helpers.NewRule("get", "create", "update", "patch", "delete").Groups("csi.storage.k8s.io").Resources("csinodeinfos").RuleOrDie()
Copy link
Member

@liggitt liggitt Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the unit test is failing because this feature flag is enabled by default now. Is this still the right flag that determines if the node needs these permissions, or is it fenced by the CSINodeInfo flag as well now?

If this is correct as is, the fixture for the unit test just needs regenerating (instructions are in the unit test failure message)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's one spot I missed. Thanks for the catch!

@liggitt
Copy link
Member

liggitt commented Sep 7, 2018

one comment on the gate being checked by the RBAC policy change and the unit test failure, authz and admission changes lgtm otherwise

it's worth noting that the removal of driver info from the node depends on observing the removal of the driver (rather than reconciling to current observed state), and doesn't handle errors in removing driver info by retrying. That doesn't necessarily block this PR, but seems important to resolve before this graduates.

@verult
Copy link
Contributor Author

verult commented Sep 7, 2018

To clarify, in this PR driver is only unregistered when there's an error adding the driver. Driver unregistration will be added in the next release. The rest of the CSI system should always assume node info is somewhat out of date, since kubelet can't provide the guarantee that updates are reflected immediately anyway, in the best case.

Does the same logic apply to driver addition as well?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 7, 2018
@verult
Copy link
Contributor Author

verult commented Sep 7, 2018

Updated RBAC test

@verult
Copy link
Contributor Author

verult commented Sep 7, 2018

Keeping a laundry list of considerations for CSI kubelet driver registration moving forward:
#67972

@liggitt
Copy link
Member

liggitt commented Sep 7, 2018

/approve

RBAC, node authz and admission changes lgtm

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 7, 2018
@verult
Copy link
Contributor Author

verult commented Sep 7, 2018

/retest

@verult
Copy link
Contributor Author

verult commented Sep 7, 2018

The GCE e2e failure is likely caused by this PR. Taking a look

@msau42
Copy link
Member

msau42 commented Sep 8, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, msau42, saad-ali, verult

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Sep 8, 2018

@verult: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized 7ab3a2fb33939d37d78475d20eb68f646c7eb115 link /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

@verult
Copy link
Contributor Author

verult commented Sep 12, 2018

Per #68519 (comment): installing CRD outside k8s core brings back issue #2:

What should be the AddNodeInfo() behavior if CRD doesn’t exist? Continue with registration? Fail and unregister?

I side with @msau42 that if the CSINodeInfo feature flag is enabled, one would expect topology to work, and continuing with registration by falling back to node annotation means topology breaks. So it should fail and unregister the driver. This is the current behavior so no actions required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSI topology support: node info population through kubelet driver registration mechanism