Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow server to return HTTP 4xx errors #7045

Merged
merged 5 commits into from
Jan 4, 2024
Merged

Conversation

duricanikolic
Copy link
Contributor

@duricanikolic duricanikolic commented Jan 3, 2024

What this PR does

This PR initializes dskit's server.Server and httpgrpc.Server in such a way that they return errors with HTTP status code 4xx, and therefore allow the latter to appear as status_code label in the request duration metrics.

Which issue(s) this PR fixes or relates to

Part of #1830

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@duricanikolic duricanikolic self-assigned this Jan 3, 2024
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that this change will also force the httpgrpc server to return an error and ignore the httpgrpc response when the code is 4xx?

There is this place in the query-fronted which needs an http response (taken from http grpc) in order to propagate the error back to the user.

switch r.StatusCode {
case http.StatusServiceUnavailable:
return nil, apierror.New(apierror.TypeUnavailable, string(mustReadResponseBody(r)))
case http.StatusTooManyRequests:
return nil, apierror.New(apierror.TypeTooManyRequests, string(mustReadResponseBody(r)))
case http.StatusRequestEntityTooLarge:
return nil, apierror.New(apierror.TypeTooLargeEntry, string(mustReadResponseBody(r)))
default:
if r.StatusCode/100 == 5 {
return nil, apierror.New(apierror.TypeInternal, string(mustReadResponseBody(r)))
}
}

what's referred to as the r response in this snippet should be what the querier computes here

response, err := sp.handler.Handle(ctx, request)

If my understanding is correct, then we will convert all 4xx promQL errors into 4xx errors. At the same time no tests are failing, so maybe I'm missing something

@@ -400,7 +400,7 @@ func createMultitenantAlertmanager(cfg *MultitenantAlertmanagerConfig, fallbackC
return nil, errors.Wrap(err, "failed to initialize Alertmanager's ring")
}

am.grpcServer = server.NewServer(&handlerForGRPCServer{am: am})
am.grpcServer = server.NewServer(&handlerForGRPCServer{am: am}, []server.Option{server.WithReturn4XXErrors}...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it enough to do this?

Suggested change
am.grpcServer = server.NewServer(&handlerForGRPCServer{am: am}, []server.Option{server.WithReturn4XXErrors}...)
am.grpcServer = server.NewServer(&handlerForGRPCServer{am: am}, server.WithReturn4XXError)

same applies to the other invocations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I will change this. Thank you

CHANGELOG.md Outdated Show resolved Hide resolved
@dimitarvdimitrov
Copy link
Contributor

is this PR a duplicate of #6983?

@pracucci
Copy link
Collaborator

pracucci commented Jan 4, 2024

Do I understand correctly that this change will also force the httpgrpc server to return an error and ignore the httpgrpc response when the code is 4xx?

Yes, that's correct. I've done an analysis of possible side effects here.

There is this place in the query-fronted which needs an http response (taken from http grpc) in order to propagate the error back to the user.

Going to take a look now.

@pracucci
Copy link
Collaborator

pracucci commented Jan 4, 2024

what's referred to as the r response in this snippet should be what the querier computes here mimir/pkg/querier/worker/scheduler_processor.go Line 247 in 5d0316a
response, err := sp.handler.Handle(ctx, request)

@dimitarvdimitrov That's correct, but I live it's handled correctly there. The error (which starting from this PR we'll get for 4xx errors too) is converted back to an HTTP response. When using httpgrpc, the HTTP response body is stored in status.Details, so HTTPResponseFromError() coverts it back to an HTTP response:

response, err := sp.handler.Handle(ctx, request)
if err != nil {
var ok bool
response, ok = httpgrpc.HTTPResponseFromError(err)
if !ok {
response = &httpgrpc.HTTPResponse{
Code: http.StatusInternalServerError,
Body: []byte(err.Error()),
}
}
}

@duricanikolic
Copy link
Contributor Author

is this PR a duplicate of #6983?

It is, and #6983 will be closed.

@duricanikolic duricanikolic changed the title Allow reporting HTTP 4xx codes in status_code label of request duration metrics Allow server to return HTTP 4xx errors Jan 4, 2024
@duricanikolic duricanikolic marked this pull request as ready for review January 4, 2024 11:36
@duricanikolic duricanikolic requested review from grafanabot and a team as code owners January 4, 2024 11:36
@dimitarvdimitrov
Copy link
Contributor

Do I understand correctly that this change will also force the httpgrpc server to return an error and ignore the httpgrpc response when the code is 4xx?

Yes, that's correct. I've done an analysis of possible side effects here.

I didn't check the linked issue before. The analysis there makes sense to me

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pracucci pracucci self-requested a review January 4, 2024 13:12
@@ -183,7 +183,9 @@ func (q *RemoteQuerier) Read(ctx context.Context, query *prompb.Query) (*prompb.

resp, err := q.client.Handle(ctx, &req)
if err != nil {
level.Warn(log).Log("msg", "failed to perform remote read", "err", err, "qs", query)
if code := grpcutil.ErrorToStatusCode(err); code/100 == 5 {
Copy link
Collaborator

@pracucci pracucci Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do the opposite check, because we may get a code < 100 in case of a non HTTP error.

Suggested change
if code := grpcutil.ErrorToStatusCode(err); code/100 == 5 {
if code := grpcutil.ErrorToStatusCode(err); code/100 != 4 {

@@ -227,7 +229,9 @@ func (q *RemoteQuerier) query(ctx context.Context, query string, ts time.Time, l

resp, err := q.sendRequest(ctx, &req, logger)
if err != nil {
level.Warn(logger).Log("msg", "failed to remotely evaluate query expression", "err", err, "qs", query, "tm", ts)
if code := grpcutil.ErrorToStatusCode(err); code/100 == 5 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

error4xx = status.Error(http.StatusUnprocessableEntity, "this is a 4xx error")
error5xx = status.Error(http.StatusInternalServerError, "this is a 5xx error")
)
testCases := map[string]struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case returning a non httpgrpc error (e.g. errors.New("mocked error")). We expect the log.

Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo a couple of minor comments)

…on metrics

Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
@duricanikolic duricanikolic merged commit 9f2a79a into main Jan 4, 2024
28 checks passed
@duricanikolic duricanikolic deleted the yuri/allow-4xx-errors branch January 4, 2024 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants