Reduce CSI log and event spam #71581
Ensure volume mount error checking is done inside the operation so that failures get handled with exponential backoff, etc.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: saad-ali
If this significantly reduces event spam, we might consider picking it for 1.13.1.
@@ -82,7 +86,7 @@ func NewOperationGenerator(kubeClient clientset.Interface,
 // OperationGenerator interface that extracts out the functions from operation_executor to make it dependency injectable
 type OperationGenerator interface {
 	// Generates the MountVolume function needed to perform the mount of a volume plugin
-	GenerateMountVolumeFunc(waitForAttachTimeout time.Duration, volumeToMount VolumeToMount, actualStateOfWorldMounterUpdater ActualStateOfWorldMounterUpdater, isRemount bool) (volumetypes.GeneratedOperations, error)
+	GenerateMountVolumeFunc(waitForAttachTimeout time.Duration, volumeToMount VolumeToMount, actualStateOfWorldMounterUpdater ActualStateOfWorldMounterUpdater, isRemount bool) volumetypes.GeneratedOperations
Note for other reviewers: it helps to turn on "Hide whitespace changes" in the GitHub diff settings.
affinityErr := checkNodeAffinity(og, volumeToMount, volumePlugin)
if affinityErr != nil {
	eventErr, detailedErr := volumeToMount.GenerateError("MountVolume.NodeAffinity check failed", affinityErr)
	og.recorder.Eventf(volumeToMount.Pod, v1.EventTypeWarning, kevents.FailedMountVolume, eventErr.Error())
Does anyone log an event? Or did we decide it was just spammy?
Yes, these errors will still generate logs and events, but they are now protected by the backoff mechanism, so they're not created every 100 ms -- it will quickly back off to once every two minutes.
I do wonder whether this is going to be too quiet, but lgtm (happy to mark it as such, except that I think it would then merge, and we might not be ready?)
@@ -725,7 +725,7 @@ func (oe *operationExecutor) MountVolume(
 	if fsVolume {
 		// Filesystem volume case
 		// Mount/remount a volume when a volume is attached
-		generatedOperations, err = oe.operationGenerator.GenerateMountVolumeFunc(
+		generatedOperations = oe.operationGenerator.GenerateMountVolumeFunc(
Here you remove the returned err because it is always nil? How about all the other GenerateXXXFunc methods?
Mount was the most problematic in terms of log/event spam because of the new CSI access pattern. I want to do the same for the other methods because the same issue technically exists with them.
I'm leaning towards getting this merged quickly to fix the known issue, and then doing a follow-up to make the other methods follow this pattern. But I'm also happy to add those as additional commits to this PR if folks want that.
I'm ok with merging this as is to fix the known log spam and then doing a follow-up to update the other methods to follow this pattern.
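The pattern under discussion can be sketched roughly as follows. This is a simplified illustration, not the actual kubelet code: `generatedOperations`, `executor`, `generateBefore`, and `generateAfter` are hypothetical stand-ins, and the failure counter stands in for whatever backoff bookkeeping the real executor keeps. The point is that an error returned by the generator never reaches the executor, while an error returned from inside the operation does.

```go
package main

import (
	"errors"
	"fmt"
)

// generatedOperations is a hypothetical, simplified stand-in for
// volumetypes.GeneratedOperations.
type generatedOperations struct {
	OperationFunc func() error
}

var errAffinity = errors.New("MountVolume.NodeAffinity check failed")

// generateBefore mirrors the old shape: the check runs while the operation is
// being generated, so a failure is returned before any operation exists.
func generateBefore(checkFails bool) (generatedOperations, error) {
	if checkFails {
		return generatedOperations{}, errAffinity
	}
	return generatedOperations{OperationFunc: func() error { return nil }}, nil
}

// generateAfter mirrors the new shape: generation cannot fail, and the check
// runs inside the operation itself.
func generateAfter(checkFails bool) generatedOperations {
	return generatedOperations{OperationFunc: func() error {
		if checkFails {
			return errAffinity
		}
		return nil
	}}
}

// executor is a hypothetical runner; failures counts the errors it observes,
// which is where backoff bookkeeping would be updated.
type executor struct {
	failures int
}

func (e *executor) run(ops generatedOperations) error {
	err := ops.OperationFunc()
	if err != nil {
		e.failures++
	}
	return err
}

func main() {
	e := &executor{}

	// Old shape: the caller bails out early, so the executor never sees the failure.
	if _, err := generateBefore(true); err != nil {
		fmt.Println("old: generator error, executor failures =", e.failures)
	}

	// New shape: the failure happens inside the operation and is observed.
	err := e.run(generateAfter(true))
	fmt.Println("new:", err, "executor failures =", e.failures)
}
```

Applying the same change to the other generator methods would presumably follow the same shape: drop the error from the return value and move each pre-check into the operation function.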
/lgtm
/test pull-kubernetes-integration
/retest
…81-upstream-release-1.13 Automated Cherry Pick of #71581 to release-1.13
What this PR does / why we need it:
Ensure volume mount error checking is done inside the operation so that failures get handled with exponential backoff, etc.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #71569