services/horizon: map client disconnects as Status 499 instead of 503 Problem #4098

Merged

sreuland
Contributor

@sreuland sreuland commented Nov 24, 2021

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md files, etc.) affected by this change.
    Take a look in the docs folder for a given service, like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Client disconnects were being mapped by render/problem to ServiceUnavailable (503). This PR maps them to ClientDisconnected (499) instead.

Why

The old mapping caused horizon_http_requests_duration_seconds_count metric keys to be generated with a status=503 label, which was inaccurate. There was no server timeout on the request; on the contrary, the client had disconnected from the socket.

Closes #3710

Known limitations

A client socket disconnect is only detectable if the load balancer/proxy in use immediately closes upstream client connections, or if there is no LB and the client is connected directly to the server. Detection relies on the Go HTTP server noticing the socket close and, in turn, closing the Done channel on the HTTP request's context.

@sreuland sreuland self-assigned this Nov 24, 2021
@bartekn
Contributor

bartekn commented Nov 24, 2021

PR. Removed from "Horizon and SDKs" because there's a corresponding issue there (and this PR is linked).

@bartekn bartekn changed the title from "/services/horizon/internal/render: map client disconnects as Status 499 instead of 503 Problem" to "services/horizon: map client disconnects as Status 499 instead of 503 Problem" Nov 24, 2021
@sreuland
Contributor Author

Hello @bartekn, I updated the PR per your feedback; it should be ready for re-review. Thanks!

@sreuland sreuland requested a review from a team November 29, 2021 04:02
@@ -55,7 +55,7 @@ func init() {
 	problem.RegisterError(db2.ErrInvalidOrder, problem.BadRequest)
 	problem.RegisterError(sse.ErrRateLimited, hProblem.RateLimitExceeded)
 	problem.RegisterError(context.DeadlineExceeded, hProblem.Timeout)
-	problem.RegisterError(context.Canceled, hProblem.ServiceUnavailable)
+	problem.RegisterError(context.Canceled, hProblem.ClientDisconnected)
 	problem.RegisterError(db.ErrCancelled, hProblem.ServiceUnavailable)
Contributor

I think (we need to confirm this) that when the context is cancelled while a DB query is executing, the error is returned as db.ErrCancelled. Could you check this and, if that's correct, also update the problem returned for db.ErrCancelled in line 59? It's also worth testing whether the same error is returned in the case of a timeout. If it is, the change will be too complicated, so let's not add anything more to this PR.

Contributor Author

Ok, yes, it looks like db.ErrCancelled is overloaded, i.e. it could represent either an upstream cancel or a downstream timeout. I can make an adjustment in session.go to break it out in the same fashion as done here, with a check on ctx.Err(), and it will need a new test for cancel vs. timeout. I found this article pretty helpful in understanding the nuances of Postgres timeout vs. cancel contexts:
https://www.alexedwards.net/blog/how-to-manage-database-timeouts-and-cancellations-in-go

Contributor Author

Hello @bartekn, the distinction between DB connection timeout and cancel is now included in 9a808356bff4cab47

Contributor

@bartekn bartekn left a comment

LGTM!

@sreuland sreuland merged commit 4d04063 into stellar:master Nov 30, 2021
erika-sdf pushed a commit to erika-sdf/go that referenced this pull request Dec 3, 2021

@eliane345 eliane345 left a comment

Excuse me

@eliane345

Sorry...

Development

Successfully merging this pull request may close these issues.

Improve HTTP metrics to distinguish client cancelled requests in 503 responses