This repository has been archived by the owner on Apr 3, 2019. It is now read-only.

Design client backoff protocol #100

Closed
ckarlof opened this issue Jul 31, 2013 · 12 comments

@ckarlof
Contributor

ckarlof commented Jul 31, 2013

Handle misbehaving clients or periods of high load. A 2xx response with a backoff header, or a 503 with or without such a header, would be appropriate.

@ckarlof
Contributor Author

ckarlof commented Jul 31, 2013

@rfk
Contributor

rfk commented Jul 31, 2013

Based on an IRC conversation with warner, I strawman-propose that we do a blend of the sync2.0 style backoff and a proof-of-work scheme. We can start with a polite request for the client to back off:

200 OK
Backoff:  <time to back off, in seconds>

If things get really hairy, we send out a 503:

503 Service Unavailable
Retry-After:  <time to wait before re-trying, in seconds>
PoW-Required:  <proof-of-work protocol parameters>

The client at this point has two options. It can just wait and try again later, or it can do a hashcash-style proof-of-work thing and re-submit its request:

PUT /whatever HTTP/1.1
Host:  blah blah
PoW:  <proof-of-work hash>

The client is expected to submit a fresh proof-of-work with each new request, until the retry-after time has expired.
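To make the hashcash-style idea concrete, here is a minimal sketch of what computing and checking such a proof could look like. Everything specific in it is an assumption for illustration: the `challenge` string, the `difficulty_bits` parameter, the use of SHA-256, and the `challenge:nonce` encoding are hypothetical and are not taken from the KeyServerProtocol design.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def pow_digest(challenge: str, nonce: int) -> bytes:
    # Hypothetical encoding: "<challenge>:<nonce>" hashed with SHA-256.
    return hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()


def solve_pow(challenge: str, difficulty_bits: int) -> int:
    """Client side: search for a nonce whose hash has enough leading zero bits."""
    for nonce in count():
        if leading_zero_bits(pow_digest(challenge, nonce)) >= difficulty_bits:
            return nonce


def verify_pow(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    return leading_zero_bits(pow_digest(challenge, nonce)) >= difficulty_bits
```

The asymmetry is the point: the client performs on the order of 2^difficulty_bits hashes, while the server verifies with one. Requiring a fresh proof per request would mean the server issues a new challenge each time.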

@warner does this adequately capture the gist of our conversation? Thoughts?

@ckarlof
Contributor Author

ckarlof commented Jul 31, 2013

Details on PoW protocol here:
https://wiki.mozilla.org/Identity/AttachedServices/KeyServerProtocol#Proof-Of-Work


@ckarlof
Contributor Author

ckarlof commented Jul 31, 2013

Client side support for PoW needs to be baked in from the start.

@warner
Contributor

warner commented Aug 26, 2013

Yeah, that mostly matches what I remember.

One thing to clarify for the docs: the client's "options" (retry-after and PoW) aren't really equivalent. We can't distinguish one client from another, so there's no way for us to tell that a client has been politely/patiently waiting (and then accept their request without the PoW).

If the DoS attack has stopped by the time they retry (and we're no longer requiring PoWs), then the retry-after might happen to work. But the attack might last for hours or days. So only a really lazy client should just do retry-after without the proof-of-work, and they should be prepared to not connect for long periods of time.

How exactly would 503+Retry-After fit in? I guess if we're busy enough to emit 200+Backoff, and find that's not enough, the next stage is to start rejecting requests randomly, and 503+Retry-After tells them "it's ok, it's not your fault, please come back eventually". At that point, most good clients should already have been honoring the Backoff=x header from their last successful request. So either that delay is not sufficient, or there are clients who aren't honoring it (who might go away if we require PoW).

  • normal operations:
    • POST -> 200 OK
  • somewhat busy:
    • POST -> 200 OK, Backoff=x
    • wait x
    • POST -> 200 OK, Backoff=x
  • really busy:
    • POST -> 503, Retry-After=x (probabilistically)
    • wait x
    • POST -> 200 or 503 (probabilistically)
  • really really busy:
    • POST -> 503, PoW-Required=params (always)
    • compute PoW
    • POST (with PoW) -> 200 OK, Backoff=x
    • wait x
    • POST -> 503, PoW-Required=params
    • compute PoW
    • POST (with PoW) -> 200 OK, Backoff=x

(The time between the fetch of the PoW parameters and the submission of the completed PoW should be as short as possible)
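The four tiers above could be sketched as a single server-side decision function. This is only an illustration under stated assumptions: the load thresholds, the `reject_probability`, the default `backoff_seconds`, and the `pow_params` placeholder are all hypothetical, and attaching `Backoff` to the successful responses in the "really busy" tier is a guess rather than something the list specifies.

```python
import random


def choose_response(load, has_valid_pow=False, backoff_seconds=60,
                    reject_probability=0.5, pow_params="hypothetical-params",
                    rng=random.random):
    """Map a load estimate in [0, 1] to (status, headers).

    All thresholds below are made-up values for illustration.
    """
    backoff = {"Backoff": str(backoff_seconds)}
    if load < 0.5:
        # normal operations
        return 200, {}
    if load < 0.75:
        # somewhat busy: succeed, but politely ask the client to slow down
        return 200, dict(backoff)
    if load < 0.9:
        # really busy: reject a random fraction of requests
        if rng() < reject_probability:
            return 503, {"Retry-After": str(backoff_seconds)}
        return 200, dict(backoff)
    # really really busy: every request must carry a proof-of-work
    if has_valid_pow:
        return 200, dict(backoff)
    return 503, {"PoW-Required": pow_params}
```

Note that this version never sends `Retry-After` and `PoW-Required` together, matching the "never both" option raised below.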

So I guess I'm wondering if we should report 503+Retry-After, or 503+PoW-Required, but never both.

@rfk
Contributor

rfk commented Aug 26, 2013

Good points. One small nit: clients might arrive in the middle of a DoS and never have seen a Backoff header before being hit with a 503.

What I was going for with Retry-After was basically "we estimate it will be at least this long until we switch off PoW", which might let the client make a more intelligent choice between waiting versus doing the work. It's not a promise that your request will succeed if you wait that long - more a guideline than an actual rule.

Happy to make these two headers exclusive if it will simplify things for the client.
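A rough sketch of how a client might act on these headers, including the wait-versus-work choice when both `Retry-After` and `PoW-Required` are present. The function name, the action labels, and the `estimated_pow_seconds` parameter are hypothetical, and `Retry-After` is assumed to be a number of seconds (as in the proposal) rather than an HTTP date.

```python
def next_action(status, headers, estimated_pow_seconds=5.0):
    """Return (action, delay_seconds) for a response.

    Actions are illustrative labels: "proceed", "wait", "solve_pow", "fail".
    """
    if status == 200:
        backoff = headers.get("Backoff")
        if backoff is not None:
            # Request succeeded; delay the next one as asked.
            return "wait", float(backoff)
        return "proceed", 0.0
    if status == 503:
        retry_after = headers.get("Retry-After")
        pow_params = headers.get("PoW-Required")
        if pow_params is not None:
            # Retry-After is only a guideline for when PoW may be switched
            # off, so wait only if that is expected to be cheaper than working.
            if retry_after is not None and float(retry_after) <= estimated_pow_seconds:
                return "wait", float(retry_after)
            return "solve_pow", 0.0
        if retry_after is not None:
            return "wait", float(retry_after)
    return "fail", 0.0
```

If the two headers are made exclusive, the inner comparison goes away and the client simply does whichever one the server asked for. A client that arrives mid-DoS without ever having seen `Backoff` is handled by the same 503 branch.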

@rfk
Contributor

rfk commented Oct 25, 2013

/cc @telliott for perspective on proof-of-work idea

@telliott

I like the general idea of proof-of-work for clients hitting us too often, but 503 isn't really a good match, since it's a server-side-problem status code, and this is a client problem. 403 is probably the appropriate status here and is explicit that this is a client-fixable issue.

@rfk
Contributor

rfk commented Oct 25, 2013

RFC 6585 also defines a "429 Too Many Requests" status, which is appropriate here.

@dannycoates
Contributor

PoW might be useful for both kinds of load, but I'm not sure I like penalizing clients (computationally) in the 503 high-server-load case. It seems nicer to return a Retry-After header and trust the client to respect it.

For the 429 case where individual clients are too chatty I like PoW.
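The split between the two cases might look something like the sketch below. It assumes the server can attribute a request rate to an individual client, which may not hold in practice; the rate and load limits, the parameter names, and the `pow_params` placeholder are all hypothetical.

```python
def overload_response(client_rate, per_client_limit, server_load, load_limit,
                      retry_after_seconds=30, pow_params="hypothetical-params"):
    """Pick a response for the two overload cases discussed above.

    client_rate / per_client_limit: requests per second for this client
    (assumes per-client tracking exists).
    server_load / load_limit: overall load estimate and its threshold.
    """
    if client_rate > per_client_limit:
        # Chatty client: its own fault, so ask it to pay with proof-of-work.
        return 429, {"PoW-Required": pow_params}
    if server_load > load_limit:
        # Server-side problem: no computational penalty, just ask to retry.
        return 503, {"Retry-After": str(retry_after_seconds)}
    return 200, {}
```

Checking the per-client case first means a chatty client still gets a 429 during general overload, which keeps the 503 path free of proof-of-work.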

@ckarlof
Contributor Author

ckarlof commented Oct 28, 2013

We should distinguish between these two cases.

@ckarlof
Contributor Author

ckarlof commented Nov 19, 2013

Basic backoff design in #323.
