Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce CSI log and event spam #71581

Merged
merged 1 commit into from
Dec 1, 2018

Conversation

saad-ali
Copy link
Member

@saad-ali saad-ali commented Nov 30, 2018

What type of PR is this?

Uncomment only one, leave it on its own line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
Ensure volume mount error checking is done inside the operation so that failures get handled with exponential backoff, etc.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #71569

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Reduce CSI log and event spam.

Ensure volume mount error checking is done inside the operation so that
failures get handled with exponential backoff, etc.
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 30, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 30, 2018
@liggitt
Copy link
Member

liggitt commented Nov 30, 2018

If this significantly reduces event spam, we might consider picking it for 1.13.1

@@ -82,7 +86,7 @@ func NewOperationGenerator(kubeClient clientset.Interface,
// OperationGenerator interface that extracts out the functions from operation_executor to make it dependency injectable
type OperationGenerator interface {
// Generates the MountVolume function needed to perform the mount of a volume plugin
GenerateMountVolumeFunc(waitForAttachTimeout time.Duration, volumeToMount VolumeToMount, actualStateOfWorldMounterUpdater ActualStateOfWorldMounterUpdater, isRemount bool) (volumetypes.GeneratedOperations, error)
GenerateMountVolumeFunc(waitForAttachTimeout time.Duration, volumeToMount VolumeToMount, actualStateOfWorldMounterUpdater ActualStateOfWorldMounterUpdater, isRemount bool) volumetypes.GeneratedOperations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for other reviewers: helpful to turn on "Hide whitespace changes" in github diff settings

affinityErr := checkNodeAffinity(og, volumeToMount, volumePlugin)
if affinityErr != nil {
eventErr, detailedErr := volumeToMount.GenerateError("MountVolume.NodeAffinity check failed", affinityErr)
og.recorder.Eventf(volumeToMount.Pod, v1.EventTypeWarning, kevents.FailedMountVolume, eventErr.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone log an event? Or did we decide it was just spammy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes these errors will still generate logs and events, but they are now be protected by the backoff mechanism so they're not created every 100 ms -- it will quickly back off to once every two minutes.

@justinsb
Copy link
Member

Query about whether this is going to be too quiet, but lgtm (happy to mark it as such, except that then I think it would merge, and we might not be ready?)

@@ -725,7 +725,7 @@ func (oe *operationExecutor) MountVolume(
if fsVolume {
// Filesystem volume case
// Mount/remount a volume when a volume is attached
generatedOperations, err = oe.operationGenerator.GenerateMountVolumeFunc(
generatedOperations = oe.operationGenerator.GenerateMountVolumeFunc(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you remove the return err because it is always nil? How about all the other GernerateXXXFunc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mount was the most problematic in terms of log/event spam because of the new CSI access pattern. I want to do the same for the others methods because the same issue technically exists with them.

I'm leaning towards getting this merged quick to fix the known issue, and then doing a follow up to make the other methods follow this pattern. But also happy to those as additional commits to this PR if folks want that.

@saad-ali
Copy link
Member Author

happy to mark it as such, except that then I think it would merge, and we might not be ready

I'm ok with merging this as is to fix the known log spam and then doing a follow up to update the other methods to follow this pattern.

@jingxu97
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 30, 2018
@saad-ali
Copy link
Member Author

/test pull-kubernetes-integration

@fejta-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

@k8s-ci-robot k8s-ci-robot merged commit 83b3baa into kubernetes:master Dec 1, 2018
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Dec 18, 2018
k8s-ci-robot added a commit that referenced this pull request Jan 5, 2019
…81-upstream-release-1.13

Automated Cherry Pick of #71581 to release-1.13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Operation executor doesn't seem to be throttling or backing off on retries
6 participants