fix(retry): fix retries when using protobuf encoding #13316

Merged (2 commits) on Jul 1, 2024

@ashwanthgoli ashwanthgoli commented Jun 25, 2024

What this PR does / why we need it:
retry.go expects the Details field of the gRPC status to be populated and reads the HTTP response code from it. If the field is not set, it falls back to retrying.

With protobuf encoding, however, Details is not populated, so Loki retries 4xx responses. I do not think it is necessary to populate the field.

httpgrpc.HTTPResponseFromError extracts the Status from the error and additionally tries to decode the HTTP response from the Details field. All retry.go needs is the Code from the Status, which is set for both encoding formats. Replacing this call with status.FromError(), which only extracts the Status from the error, should fix the retry behaviour.

references:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@ashwanthgoli ashwanthgoli marked this pull request as ready for review June 25, 2024 14:56
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner June 25, 2024 14:56
@@ -89,8 +89,8 @@ func (r retry) Do(ctx context.Context, req Request) (Response, error) {
 }

 // Retry if we get a HTTP 500 or a non-HTTP error.
-httpResp, ok := httpgrpc.HTTPResponseFromError(err)
-if !ok || httpResp.Code/100 == 5 {
+status, ok := status.FromError(err)
@duricanikolic duricanikolic Jun 26, 2024

Could this approach work for you:

  if !isClientError(err) {
    ...
  }

where

func isClientError(err error) bool {
	// Bind the status code so the condition can test it.
	if code := grpcutil.ErrorToStatusCode(err); code/100 == 4 {
		return true
	}
	return false
}

and grpcutil comes from dskit?


@chaudum chaudum left a comment


lgtm

@chaudum chaudum added type/bug Somehing is not working as expected backport k209 labels Jul 1, 2024
@grafanabot
This PR must be merged before a backport PR will be created.

@ashwanthgoli ashwanthgoli merged commit a457c5d into main Jul 1, 2024
62 checks passed
@ashwanthgoli ashwanthgoli deleted the fix-retries branch July 1, 2024 07:40
grafanabot pushed a commit that referenced this pull request Jul 1, 2024
Labels
backport k209 size/M type/bug Somehing is not working as expected