Fix Incorrect Error Handling for Volumes in Optimizing State #1833

Merged

Conversation

torredil
Member

@torredil torredil commented Nov 8, 2023

Is this a bug fix or adding new feature?

Bug

What is this PR about? / Why do we need it?

This PR fixes a bug where an error is incorrectly returned if a volume is in an optimizing state.

According to the cloud package, a modification is considered done if the volume is in the optimizing state:

func volumeModificationDone(state string) bool {
  // OPTIMIZING is treated the same as COMPLETED.
  if state == ec2.VolumeModificationStateCompleted || state == ec2.VolumeModificationStateOptimizing {
    return true
  }
  return false
}

In the case where ResizeOrModifyDisk hasn't been previously called for a volume, we simply move on to check if the volume is in the desired state. If it is, ResizeOrModifyDisk succeeds even if the volume is still optimizing. However, if checkDesiredState returns an error or the sidecar times out, latestMod won't be nil the second time around and we error out if the volume is optimizing:

} else if state == ec2.VolumeModificationStateOptimizing {
  return true, 0, fmt.Errorf("volume %q in OPTIMIZING state, cannot currently modify", volumeID)
}
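
To make the inconsistency concrete, here is a minimal sketch of the two behaviours described above. It is not the driver's code: the constants are local stand-ins for the ec2.VolumeModificationState* values, and the modification type and validateBeforeFix function are hypothetical simplifications that only model the OPTIMIZING branch.

```go
package main

import "fmt"

// Local stand-ins for the ec2.VolumeModificationState* constants, so the
// sketch compiles without the AWS SDK.
const (
	stateModifying  = "modifying"
	stateOptimizing = "optimizing"
	stateCompleted  = "completed"
)

// modification is a hypothetical, trimmed-down record of the latest
// modification request for a volume.
type modification struct {
	State string
}

// modificationDone mirrors volumeModificationDone from the description:
// OPTIMIZING counts as done.
func modificationDone(state string) bool {
	return state == stateCompleted || state == stateOptimizing
}

// validateBeforeFix is a simplified guess at the pre-fix ordering: the
// OPTIMIZING guard runs before anything asks whether the volume already
// matches the request. latestMod is nil when no modification was recorded.
func validateBeforeFix(volumeID string, latestMod *modification) error {
	if latestMod != nil && latestMod.State == stateOptimizing {
		return fmt.Errorf("volume %q in OPTIMIZING state, cannot currently modify", volumeID)
	}
	return nil
}
```

With a nil latestMod (first call) the sketch returns no error. On a retry, the same volume in the same OPTIMIZING state yields an error, even though modificationDone reports that state as done.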

What testing is done?

  • make test
  • CI

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 8, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 8, 2023
Contributor

@ConnorJC3 ConnorJC3 left a comment

Not attempting to start a modification when a volume is in the OPTIMIZING state is intentional

@AndrewSirenko
Contributor

AndrewSirenko commented Nov 8, 2023

Not attempting to start a modification when a volume is in the OPTIMIZING state is intentional

Why are we validating this in our driver instead of relying on the EC2 API? Won't EC2 ModifyVolume error out on this case?

@torredil
Member Author

torredil commented Nov 8, 2023

Not attempting to start a modification when a volume is in the OPTIMIZING state is intentional

By my understanding, not starting a modification if the volume is already in the desired state would be a more appropriate check. Could you elaborate?

@ConnorJC3
Contributor

ConnorJC3 commented Nov 8, 2023

Why are we validating this in our driver instead of relying on the EC2 API? Won't EC2 ModifyVolume error out on this case?

Because it produces spurious failed API calls. A volume in the OPTIMIZING state cannot be modified, so attempting the API call is 100% guaranteed to fail. It puts additional, unnecessary load on the AWS API in a case that we can 100% prevent.

By my understanding, not starting a modification if the volume is already in the desired state would be a more appropriate check. Could you elaborate?

The appropriate fix here is that this check should not be running until we are sure we need to make a modification. Removing an entire block of checks is not the appropriate fix for this bug.

@AndrewSirenko
Contributor

AndrewSirenko commented Nov 8, 2023

The appropriate fix here is that this check should not be running until we are sure we need to make a modification. Removing an entire block of checks is not the appropriate fix for this bug.

If we really want to prevent an extra ModifyVolume API call, I think the most appropriate check would be something along the lines of:

if lastModification.State == OPTIMIZING
    if lastModification.NewValues == ModifyVolumeRequest.DesiredValues
           return modification success
    else
        return error("Cannot modify volume in optimizing state")
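
The pseudocode above could be rendered in Go roughly as follows. This is only a sketch: volumeValues, lastModification, checkOptimizing, and errOptimizing are hypothetical names, the state constant is a local stand-in for ec2.VolumeModificationStateOptimizing, and the set of compared fields is an assumption.

```go
package main

import "errors"

// Local stand-in for ec2.VolumeModificationStateOptimizing.
const stateOptimizing = "optimizing"

// volumeValues is a hypothetical bundle of the properties a modification
// can change. All fields are comparable, so == works on the struct.
type volumeValues struct {
	SizeGiB    int32
	IOPS       int32
	VolumeType string
}

// lastModification is a hypothetical view of the most recent modification.
type lastModification struct {
	State     string
	NewValues volumeValues
}

var errOptimizing = errors.New("cannot modify volume in optimizing state")

// checkOptimizing returns done=true when the volume is optimizing towards
// exactly the requested values, and an error when it is optimizing towards
// something else. For any other state it returns (false, nil) so the caller
// can continue with its normal flow.
func checkOptimizing(last lastModification, desired volumeValues) (done bool, err error) {
	if last.State != stateOptimizing {
		return false, nil
	}
	if last.NewValues == desired {
		return true, nil
	}
	return false, errOptimizing
}
```

A caller would treat done=true as success without calling ModifyVolume, and would surface errOptimizing without making the API call.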

@ConnorJC3
Contributor

ConnorJC3 commented Nov 8, 2023

If we really want to prevent an extra ModifyVolume API call, I think the most appropriate check would be something along the lines of:

Agree that would work, but I think the simplest fix given the existing code is just to move this if block to AFTER the needsVolumeModification check and leave everything else the same - that would prevent it from firing in the case where the sidecar times out, and only fire in the case where we are attempting to start a new modification that is needed.

else if state == ec2.VolumeModificationStateOptimizing {
  return true, 0, fmt.Errorf("volume %q in OPTIMIZING state, cannot currently modify", volumeID)
}
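
A minimal sketch of the reordering suggested here, assuming a size-only comparison in place of the real needsVolumeModification check (which may also compare other properties). validateAfterMove and its signature are hypothetical, and the state constant is a local stand-in for ec2.VolumeModificationStateOptimizing.

```go
package main

import "fmt"

// Local stand-in for ec2.VolumeModificationStateOptimizing.
const stateOptimizing = "optimizing"

// validateAfterMove is a hypothetical simplification of the reordered
// validation. It first decides whether a modification is needed at all and
// only then applies the OPTIMIZING guard.
func validateAfterMove(volumeID, latestState string, currentGiB, wantGiB int32) (needsModification bool, err error) {
	// Step 1: if the volume already satisfies the request, nothing to do.
	// A retry after a sidecar timeout ends here, whatever the state is.
	if currentGiB >= wantGiB {
		return false, nil
	}
	// Step 2: a new modification is needed, so refuse while OPTIMIZING
	// rather than issuing an API call that is expected to fail.
	if latestState == stateOptimizing {
		return true, fmt.Errorf("volume %q in OPTIMIZING state, cannot currently modify", volumeID)
	}
	return true, nil
}
```

In this ordering the error only fires when a genuinely new modification is requested on an optimizing volume.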

Signed-off-by: Eddie Torres <torredil@amazon.com>
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 9, 2023
@torredil torredil changed the title from "Do not return an error if volume is optimizing in validateModifyVolume" to "Fix Incorrect Error Handling for Volumes in Optimizing State" Nov 9, 2023
@ConnorJC3
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 9, 2023
@AndrewSirenko
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndrewSirenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 9, 2023
@k8s-ci-robot k8s-ci-robot merged commit 85f9d4b into kubernetes-sigs:master Nov 9, 2023
18 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
4 participants