New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry if response StatusCode 200 and ContentLength -1 #2298
Retry if response StatusCode 200 and ContentLength -1 #2298
Conversation
✅ Deploy Preview for kubernetes-sigs-cloud-provide-azure canceled.
|
pkg/azureclients/armclient/util.go
Outdated
return response, rerr | ||
} | ||
if regionalResponse.StatusCode == 200 && (regionalResponse.ContentLength == -1 || regionalResponse.ContentLength == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not response.StatusCode >= 200 && response.StatusCode < 300
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's replace regionalResponse.StatusCode == 200
with response.StatusCode >= 200 && response.StatusCode < 300
to keep align with above condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
pkg/azureclients/armclient/util.go
Outdated
return response, rerr | ||
} | ||
if regionalResponse.StatusCode == 200 && (regionalResponse.ContentLength == -1 || regionalResponse.ContentLength == 0) { | ||
contentLengthErrStr := fmt.Sprintf("Send.sendRegionalRequest failed to get response from regional host, error: StatusCode is 200 but ContentLength is %d. Ignoring the result.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about changing the error message simpler, e.g. fmt.Sprintf("empty response with ContentLength %d and StatusCode %d", regionalResponse.ContentLength, response.StatusCode)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
pkg/azureclients/armclient/util.go
Outdated
@@ -123,7 +123,8 @@ func DoHackRegionalRetryDecorator(c *Client) autorest.SendDecorator { | |||
// Hack: retry the regional ARM endpoint in case of ARM traffic split and arm resource group replication is too slow | |||
// Empty content and 2xx http status code are returned in this case. | |||
// Issue: https://github.com/kubernetes-sigs/cloud-provider-azure/issues/1296 | |||
emptyResp := (response.ContentLength == 0 || trimmed == "" || trimmed == "{}") && response.StatusCode >= 200 && response.StatusCode < 300 | |||
// Update: ContentLength == -1 is also considered because ARM can possibly return ContentLength -1 and StatusCode 200. | |||
emptyResp := (response.ContentLength == 0 || response.ContentLength == -1 || trimmed == "" || trimmed == "{}") && response.StatusCode >= 200 && response.StatusCode < 300 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add some unit tests for the new retry logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
bd2cfbb
to
1d505d2
Compare
/retest |
d7b0041
to
00b0352
Compare
/test pull-cloud-provider-azure-e2e-capz |
/retest |
00b0352
to
5d9ab56
Compare
5d9ab56
to
61983c0
Compare
ContentLength -1 is kind of different from ContentLength 0. This PR is dealing with a situation satisfying all 3 conditions: ContentLength -1, StatusCode 200 and empty response body. For ContentLength 0, its relation with empty response body is OR. |
/assign @feiskyer Please take another look. |
61983c0
to
38792b7
Compare
ARM will possibly return StatusCode 200 and ContentLength -1 which is a bug. However, cloudprovider can handle it better by retrying. Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>
38792b7
to
d2a175d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: feiskyer, lzhecheng The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherrypick release-1.24 |
/cherrypick release-1.23 |
/cherrypick release-1.1 |
/cherrypick release-1.25 |
@lzhecheng: #2298 failed to apply on top of branch "release-1.24":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@lzhecheng: #2298 failed to apply on top of branch "release-1.23":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@lzhecheng: #2298 failed to apply on top of branch "release-1.1":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@lzhecheng: new pull request created: #2315 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Signed-off-by: Zhecheng Li zhechengli@microsoft.com
What type of PR is this?
/kind bug
What this PR does / why we need it:
ARM will possibly return StatusCode 200 and ContentLength -1 which is a bug. However, cloudprovider can handle it better by retrying.
Which issue(s) this PR fixes:
Fixes #2296
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: