
different users may share the same connection and receive different ratelimit headers #69

Closed
ioquatix opened this issue Oct 26, 2019 · 10 comments


@ioquatix

We avoided mentioning connections as that's outside our intended semantics.
AFAIK, with a per-user ratelimit policy in HTTP/2, different users may share the
same connection and receive different ratelimit headers.

As someone who has implemented HTTP/1 and HTTP/2 connection pooling, it sounds like this is going to be almost impossible to implement correctly on the client side in a generic way. The logic I've generally been trying to implement (and which I guess most web browsers follow) is:

  • HTTP/1 has published recommended limits: 4 connections to a single host/endpoint.
  • HTTP/2 has published recommended limits: a single connection to a single host, following the negotiated concurrency limits.

I don't know many libraries that actually follow this in user code, though.
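
For illustration, a minimal sketch of enforcing those caps per origin (the pool class, protocol keys, and numbers here are my own assumptions, not from any particular library):

import threading

# Assumed caps per the recommendations above: ~4 connections for HTTP/1,
# a single multiplexed connection for HTTP/2.
MAX_CONNECTIONS = {"http/1.1": 4, "h2": 1}

class OriginPool:
    """Hypothetical per-origin pool that blocks once the cap is reached."""

    def __init__(self, protocol):
        self._slots = threading.Semaphore(MAX_CONNECTIONS[protocol])

    def acquire(self):
        self._slots.acquire()  # blocks when all connections are in use

    def release(self):
        self._slots.release()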

The next step is to consider the workflow, and forgive me if this is covered in the existing draft... but:

loop do
  response = internet.get("http://www.google.com/search?q=cats")

  if duration = response.rate_limited?
    sleep(duration)
    # try again
  else
    return response
  end
end

This is the kind of logic that makes the most sense to me.

@ioquatix
Author

What would probably make the most sense is to have a dedicated status code for: This request was okay, but you are in violation of the rate limit/quota. Therefore, you've not been given a valid response, e.g. 429.

Then, you just need a single header giving the time to wait until the next request. But how do you know if that applies to other paths? Well, I guess you have to try it. Otherwise, how do you know whether rate limiting applies to one or more URLs, users, or whatever other scope?
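
A minimal sketch of that workflow in Python, assuming the standard Retry-After header in its delta-seconds form (the attempt cap and fallback delay are illustrative assumptions):

import time
import requests

def get_with_retry(url, max_attempts=5):
    """Retry on 429, waiting the number of seconds the server advertises."""
    for _ in range(max_attempts):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        # Assume a delta-seconds header; fall back to 1s if it is missing.
        time.sleep(float(response.headers.get("Retry-After", 1)))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)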

@ioggstream
Owner

ioggstream commented Oct 28, 2019

this is going to be almost impossible to implement correctly on the client side in a generic way

As that depends on how the client shapes its traffic, I think there can't be a one-size-fits-all mechanism for that.
The client, knowing the way it connects to the server and the granularity of multiplexing, can use the values to distribute requests (eg. per user, ...).
An API sending requests on behalf of multiple users via a single connection can balance between them based on the ratelimit header values, issuing requests for users having remaining quota and throttling the others. @unleashed

IMHO we can't cover all the possible ways in which those headers are used,
but the spec enables clients to find the headers without doing something like:

header_map = {
    "x-ratelimit-limit": x_ratelimit_limit_parser,
    "x-ratelimit-limit-month": x_ratelimit_month_parser,
    "x-ratelimit-limit-day": x_ratelimit_day_parser,
    "x-ratelimit-today": x_ratelimit_day_today_parser,
}
for h, parser in header_map.items():
    if h in headers:
        h_value = parser(headers[h])
        ...
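
For contrast, a sketch of what a client could do once the names are standardized, assuming the draft's RateLimit-Limit / RateLimit-Remaining / RateLimit-Reset fields and a delta-seconds reset (an assumption on my part):

def parse_ratelimit(headers):
    """Read the standardized fields from a response-header mapping, if present."""
    if "RateLimit-Remaining" not in headers:
        return None  # service does not expose the standardized fields
    return {
        # RateLimit-Limit may carry additional quota policies after a comma;
        # keep only the first (expiring) limit here.
        "limit": int(headers["RateLimit-Limit"].split(",")[0]),
        "remaining": int(headers["RateLimit-Remaining"]),
        "reset": int(headers["RateLimit-Reset"]),  # assumed delta-seconds
    }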

how do you know whether rate limiting applies to one or more URLs, users, or whatever other scope?

I filed a new issue #70
Once we fix that for retry-after we'll do it in ratelimit-headers.
Can you provide some feedback on that?
Thanks for all your time!
R.

@unleashed
Collaborator

An API sending requests on behalf of multiple users via a single connection can balance between them based on the ratelimit header values, issuing requests for users having remaining quota and throttling the others.

Right. I understand this as a feature, not an issue of this draft.

@ioquatix so whenever you receive rate limiting information only via the specified headers (ie. no other out-of-band mechanism, other headers, or future extensions, if any), you can only assume it applies to the tightest scope that your specific request used.

Whether that means the rate limiting information applies to requests with the same path, user, client, or a combination thereof is deliberately not specified. For specific applications the details can be documented (ie. the contract of an API), and for generic clients a certain degree of inference should be possible by observing the values returned across requests. If such a generic client does not have enough information, then the right course of action is trying out the request to find out.
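
To illustrate that "tightest scope" assumption, a generic client might keep bookkeeping along these lines (all names here are hypothetical, not part of the draft):

import time

quota_state = {}  # (host, path, user) -> {"remaining": int, "reset_at": float}

def record(host, path, user, remaining, reset_seconds):
    """Store the last observed ratelimit values under the tightest scope used."""
    quota_state[(host, path, user)] = {
        "remaining": remaining,
        "reset_at": time.time() + reset_seconds,
    }

def should_defer(host, path, user):
    """True when the last observation says this scope is exhausted."""
    state = quota_state.get((host, path, user))
    return state is not None and state["remaining"] == 0 and time.time() < state["reset_at"]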

In particular I think providing a full solution with multiple scopes is quite difficult due to the many different use cases, not only from the POV of networking and infrastructure but also from business logic, or even from limits imposed transitively via the service being consumed (ie. limits stemming from other services that the one you are accessing is itself using). That can be a lot of information to help clients shape their service access patterns more precisely; I have seen rate limiting descriptions for just one service and one particular user in the order of hundreds of kilobytes (using XML, but still quite a feat).

Having this draft be conservative regarding scopes (and we should ensure the spec leaves the door open for them) gives us a baseline with which clients can use some information to cover basic and common use cases, while less generic clients (API-specific ones) can have even greater control over how they consume a service.

An idea for a future I-D covering scopes would be some simple method to provide either a limited set of well-known, well-defined scopes, or a URI for retrieving the different scopes to which a given service applies rate limits (the schema, if you will, possibly including some static limiting information).

@ioquatix
Author

If people have to implement it on a per-endpoint level, and it can't be implemented at the connection/client level, it's going to be almost impossible to use in practice. The burden of bespoke implementations of rate limiting per endpoint is simply too big for most use cases.

We already have general advice on how to handle concurrency and rate limiting at the connection level from the relevant RFCs, e.g. HTTP/1 should use no more than 4-8 connections per host, and HTTP/2 has various settings which affect concurrency and simultaneous requests.

The case where this proposal makes sense is where you have abstractions over APIs which then use standard headers to determine rate limiting behaviour and then expose that to the client of the API.

Here are the various levels at which this affects implementation:

async-http

We have a gem for general http/1 & http/2 clients and servers. We can certainly expose the rate limit headers in a structured way, leaving interpretation up to the user. I expect that in the vast majority of cases where this code is used directly, rate limiting which cannot be expressed per-connection will be ignored, because it's probably too much work for too little gain (just retry the request on status 429 after a short delay).

async-rest

We have a gem for general REST based APIs. Assuming some standard description of rate limiting, this could be then applied to any consumer.

async-slack

We have a gem for interacting with Slack. I don't actually know if Slack has rate limiting behaviour, but assuming it does, and we implemented it, anyone who uses that code would get it for free.

What I would suggest is that if you are aiming for this level, you should definitely optimise header size and compression over backwards compatibility. My experience with the various high level APIs is that they don't really have a huge amount of consistency. It won't matter what the headers are called because it's never exposed to the user. So you want to make those headers as good as possible w.r.t. minimising overheads.

@ioquatix
Author

ioquatix commented Oct 29, 2019

Also, I'd suggest you get feedback from people who actually have real scale, e.g. AWS, Google, etc. Because they may already have strategies in place. I can only give feedback based on my own implementation work.

@unleashed
Collaborator

I expect that in the vast majority of cases where this code is used directly, rate limiting which cannot be expressed per-connection will be ignored, because it's probably too much work for too little gain (just retry the request on status 429 after a short delay).

I think it would be helpful if you could use a specific example to illustrate the issues.

At this level, if you are receiving a 429 with these headers, you can assume that the delay is the value received in the corresponding header rather than any arbitrary delay chosen by the client. If you use a different connection to retry before the specified time elapses, I'd expect the header values for that retry to be coherent with the previous values, that is, not to take into account whether you are using one connection or another.

That said, you could still define your service's quota units in terms of units per connection (ie. effectively scoping the rate limits to the current connection), and the assumption above would be invalid. We can either make the spec stricter and point out that such limits must derive from properties of the service and the request per se, rather than from client and server networking details; or we can work on adding enough information for a generic HTTP client to obtain unambiguous rate limiting information in as many contexts as possible (not just in a specific scope such as HTTP connections), which is something that so far we have been trying to avoid.

@ioquatix
Author

Is the point of this RFC to define enough information to avoid having 429s?

Because if you can't avoid 429s, I don't know what the point of this RFC is. A 429 plus a single header saying "wait for X seconds" is sufficient for 99% of use cases and works everywhere; it's easy to implement and doesn't require context-specific information. The server knows the context in which the client is being rate limited. The only downside is you end up making N+1 requests, i.e. the final request's response tells you that you've been rate limited.

429 Too many requests
Rate-Limited: 10s

Beyond that, in order to introduce quotas, you need a far more elaborate model, one which you have acknowledged is beyond the scope of this document?

Bearing in mind there are a ton of reasons why you might want to rate limit: QoS, availability, pricing plans, etc.

So, I do see value in having this kind of discussion, but if it can't be implemented in a generic way, then naturally it won't be feasible to implement at the connection layer, which is where it really needs to be in order to get the most traction with the least overhead.

@ioggstream
Owner

ioggstream commented Oct 31, 2019

get feedback from people who actually have real scale, e.g. AWS, Google, etc

With this I-D we aim to broaden the audience, but the ratelimit landscape we first drafted includes GitHub, Amazon, Yelp and Twitter.

in order to introduce quotas you need a far more elaborate model

The main goal for now is to align the semantics for all the services which
already adopted ratelimit headers.

one which you have acknowledged is beyond the scope of this document?

Your insights are very useful for extending the model to more complex uses.
It's something that can be added in the future, and it's fine to plan and discuss it now
without hindering the short-term goal.

example

If your client manages different ratelimit keys (eg. per user) on a single connection,
you can implement it like the following:

for user, request_data in queue:
    if user.has_quota():  # checks stored headers: ratelimit-reset, retry-after, whatever
        response, headers = make_request(conn, request_data, user)
        user.update_quota(headers)
    else:
        continue  # throttle: skip this user until their quota resets

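For completeness, a hedged sketch of what such a per-user quota object might look like, assuming the draft's RateLimit-Remaining / RateLimit-Reset fields (the class and its methods are illustrative, not from the thread):

import time

class User:
    """Tracks the last ratelimit state observed for this user's quota key."""

    def __init__(self):
        self.remaining = None  # unknown until the first response arrives
        self.reset_at = 0.0    # epoch seconds when the current window resets

    def has_quota(self):
        # Optimistic when nothing is known yet, or when the window has reset.
        return self.remaining is None or self.remaining > 0 or time.time() >= self.reset_at

    def update_quota(self, headers):
        if "RateLimit-Remaining" in headers:
            self.remaining = int(headers["RateLimit-Remaining"])
            self.reset_at = time.time() + int(headers.get("RateLimit-Reset", 0))
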
@ioggstream
Owner

@ioquatix thanks for filing this issue: it was very important to properly scope the document.

The current draft describes the solution only at the semantic level, conforming to the latest httpbis semantics:

a server must not assume that two requests on the same connection are from the same user agent unless the connection is secured and specific to that agent

Can we close this issue?

@ioggstream
Owner

Closed as resolved. We are moving issues to the IETF GitHub.
