Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay CSI client initialization #74652

Merged
merged 1 commit into from
Mar 11, 2019
Merged

Conversation

cofyc
Copy link
Member

@cofyc cofyc commented Feb 27, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:

#72500 (comment)

Which issue(s) this PR fixes:

Fixes #72500

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Delay CSI client initialization to make reconstruction of CSI volume possible because clients may not be available on kubelet restart.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 27, 2019
@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 27, 2019
@cofyc cofyc changed the title Delay CSI client initialization WIP: Delay CSI client initialization Feb 27, 2019
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2019
@cofyc cofyc changed the title WIP: Delay CSI client initialization Delay CSI client initialization Feb 27, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2019
@cofyc
Copy link
Member Author

cofyc commented Feb 27, 2019

/priority important-soon
/sig storage
/assign @msau42 @jingxu97

Note that reconstructVolume will still fail because operationExecutor.CheckVolumeExistenceOperation checks on an invalid path, which can be fixed by this PR: #74653.

@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 27, 2019
@msau42
Copy link
Member

msau42 commented Feb 27, 2019

/retest

@msau42
Copy link
Member

msau42 commented Feb 27, 2019

/test pull-kubernetes-e2e-gce-csi-serial

@msau42
Copy link
Member

msau42 commented Feb 27, 2019

/test pull-kubernetes-e2e-gce

@msau42
Copy link
Member

msau42 commented Feb 28, 2019

/test pull-kubernetes-e2e-gce-csi-serial

@cofyc
Copy link
Member Author

cofyc commented Feb 28, 2019

This PR should work with #74653 to fix [sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io][Serial] [Testpattern: Dynamic PV (default fs)] subPath should unmount if pod is force deleted while kubelet is down [Disruptive][Slow] flakiness.

@cofyc
Copy link
Member Author

cofyc commented Feb 28, 2019

How about review and merge #74653 first? I guess the issue #74653 trying to resolve is obvious (that's why I separated them into two PRs).

@msau42
Copy link
Member

msau42 commented Feb 28, 2019

I noticed that the skip attach check is also called early. Would this have any problem during reconstruction? I guess during reconstruction we only call mount/unmount/map/unmap. And this attachable check will end up failing later during reconciliation.

@msau42
Copy link
Member

msau42 commented Mar 1, 2019

cc @jsafrane to think about the skip attach case when the driver/kubelet restarts and is the driver is not available by the time VolumesAreAttached/WaitForAttach is called.

@msau42
Copy link
Member

msau42 commented Mar 1, 2019

Maybe skip attach is ok because it doesn't call the csi driver, it gets the K8s api object.

@cofyc
Copy link
Member Author

cofyc commented Mar 1, 2019

At before, reconciler.reconstructVolume will fail in NewMounter and NewBlockVolumeMapper because driver plugin may not be registered by plugin watcher (so I decided to delay CSI client initialization in this PR) and in CheckVolumeExistenceOperation because it checks on an invalid path (#74653).

If it succeeds reconstructing CSI volume information from the host, CSI volumes will be put into the actual and desired state of the world.

After this, reconciliation main loop may still fail because CSI driver CRD is not synced or CSI driver client is not discovered by plugin watcher yet. But the difference is it will retry again now. When CSI driver CRD is synced and CSI client is discovered, the operation will succeed.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 8, 2019
@cofyc
Copy link
Member Author

cofyc commented Mar 8, 2019

/retest
I’m looking at failing tests.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 9, 2019
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 9, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cofyc, msau42, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2019
@msau42
Copy link
Member

msau42 commented Mar 11, 2019

/test pull-kubernetes-e2e-gce-csi-serial

@msau42
Copy link
Member

msau42 commented Mar 11, 2019

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 11, 2019
@k8s-ci-robot k8s-ci-robot merged commit 8477c48 into kubernetes:master Mar 11, 2019
if c.csiClient != nil {
return c.csiClient, nil
}
csi, err := newCsiDriverClient(c.driverName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can c.csiClient be assigned directly (we have the write lock) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean c.csiClient, err := newCsiDriverClient(c.driverName)?
Yes, it is equal, no much difference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is one difference, with the second style newCsiDriverClient(c.driverName) must return nil client on error, with the first style we do not need make such assumption.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Mar 14, 2019
k8s-ci-robot added a commit that referenced this pull request Mar 19, 2019
…upstream-release-1.13

Automated cherry pick of #74652: Delay CSI client initialization
@cofyc cofyc deleted the fix72500 branch May 4, 2019 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. kind/flake Categorizes issue or PR as related to a flaky test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSI plugin registration problem after kublet restarts
7 participants