
rate limit by subnet and differentiate by userid #427

Closed
ccaputo opened this issue Feb 13, 2019 · 27 comments
Assignees
Labels
Time:Epic More than 10 hours work

Comments

@ccaputo
Contributor

ccaputo commented Feb 13, 2019

Keeping in mind the closed "implement rate limiting" issue (#311), we are seeing queries from people randomizing their source IP within a single subnet. The rate limiting system should therefore group IPv4 /24s and IPv6 /64s, so that queries from a single subnet are rate limited together with other hosts from the same subnet that share the same userid (or lack of one).

The latter is important because we don't want different logged-in users at a conference to all be rate limited together, whether they come from the same subnet or are NATed.
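Such a bucket key could be sketched as below (a hypothetical helper using Python's stdlib `ipaddress`, not actual PeeringDB code):

```python
import ipaddress
from typing import Optional

def throttle_key(remote_ip: str, userid: Optional[str]) -> str:
    """Collapse an address to its IPv4 /24 or IPv6 /64, then scope by userid.

    Anonymous clients in the same subnet share one bucket, while each
    logged-in user gets a separate bucket, so different users behind the
    same subnet or NAT are not rate limited together.
    """
    addr = ipaddress.ip_address(remote_ip)
    prefixlen = 24 if addr.version == 4 else 64
    subnet = ipaddress.ip_network(f"{remote_ip}/{prefixlen}", strict=False)
    return f"{subnet}|{userid or 'anon'}"
```

Two anonymous hosts in the same /24 map to the same key, while a logged-in user on that subnet gets a key of their own.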

@ccaputo
Contributor Author

ccaputo commented Feb 13, 2019

Grizz asked: "do we want to rate limit differently for auth vs non auth?"

I responded: "Always nice to have the flexibility, even if the initial setting is the same. :-) I'd keep them the same initially and tune from there."

@eloos

eloos commented Feb 13, 2019

I think it makes sense to rate-limit differently based on authed or not-authed.

@arnoldnipper arnoldnipper added this to the Decide milestone Nov 24, 2019
@ynbrthr ynbrthr self-assigned this Jun 2, 2020
@ynbrthr

ynbrthr commented Jun 2, 2020

@peeringdb/pc can we have consensus to implement and close that?

+1 from me

@arnoldnipper
Contributor

+1

@ynbrthr

ynbrthr commented Jun 3, 2020

@grizz , your question to @ccaputo at the time sounds like an implicit '+1' for that feature, can you confirm? :)

@ynbrthr

ynbrthr commented Jun 17, 2020

ok, someone else then? @peeringdb/pc

@shane-kerr

+1

I think it's fine to start with a global rate limit (not accounting for authentication status), but I am also happy if it gets implemented based on authentication status.

@ynbrthr ynbrthr modified the milestones: 1 Decide, 2 Consensus Reached Jun 17, 2020
@mcmanuss8
Contributor

+1 - I'd say we should aggregate by CIDR for unauthed users, and by API key (#266) for authed users.

Ideally we'd have a rate limit for authed users that's higher than for unauthed users, plus some configuration ops can tweak per API key if someone is hitting the default rate limit and needs it raised (or lowered!).

@grizz
Member

grizz commented Jun 27, 2020

Summary
Add separate rate limits for API calls from authed and unauthed users.

If unauthed, aggregate by /24 or /64 for IPv4 and IPv6 respectively.

If authed, have a global default, but allow overrides on a per-user or auth-key basis.
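A minimal sketch of that lookup (all names and numbers below are hypothetical illustrations, not PeeringDB's actual settings):

```python
# Hypothetical global defaults and per-user / per-auth-key overrides.
GLOBAL_AUTHED_QPM = 60
GLOBAL_UNAUTHED_QPM = 20   # shared per aggregated /24 (IPv4) or /64 (IPv6)
OVERRIDES = {
    "joe": 120,            # per-user override
    "api-key-abc": 240,    # per-auth-key override
}

def rate_limit_qpm(identity=None):
    """Queries-per-minute budget for a caller.

    identity is a userid or auth key for authenticated callers, or None for
    anonymous callers, whose budget is shared across their subnet bucket.
    """
    if identity is None:
        return GLOBAL_UNAUTHED_QPM
    return OVERRIDES.get(identity, GLOBAL_AUTHED_QPM)
```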

@grizz grizz added the Time:Epic More than 10 hours work label Jun 27, 2020
@leovegoda leovegoda added the api label Sep 9, 2020
@arnoldnipper arnoldnipper removed the api label Sep 24, 2020
@ccaputo
Contributor Author

ccaputo commented Oct 10, 2020

From my perspective on @peeringdb/oc , this is needed. We've got more and more clients performing automated queries and need ways to be able to balance those against human requests.

@ccaputo
Contributor Author

ccaputo commented Oct 10, 2020

@grizz , when you wrote:

Add separate rate limits for API calls from auth and unauthed users.

I just want to clarify that you aren't intending to limit this to HTTP requests with /api/ in the URL? I believe all query types should be able to be limited.

@arnoldnipper
Contributor

@ccaputo, I understand @grizz as proposing

  • a GUI rate-limit (regardless of being authed or unauthed)

  • an API rate-limit for authed users

  • an API rate-limit for unauthed users

@ccaputo
Contributor Author

ccaputo commented Oct 12, 2020

@grizz wrote:

Summary
Add separate rate limits for API calls from authed and unauthed users.

If unauthed, aggregate by /24 or /64 for IPv4 and IPv6 respectively.

If authed, have a global default, but allow overrides on a per-user or auth-key basis.

@arnoldnipper wrote:

@ccaputo, I understand @grizz as proposing

  • a GUI rate-limit (regardless of being authed or unauthed)
  • an API rate-limit for authed users
  • an API rate-limit for unauthed users

@grizz, I would like to see an in-database per-user rate limit knob for both API and GUI queries. Ex. user 'joe' could be limited to 10 qps to the GUI and 5 qps to /api/ URLs.

I think Ops/@peeringdb/oc needs to be able to change the global defaults, for authed and unauthed users, without needing to modify the internals of a container or the env on each instance, as apparently required by #311. I.e., the defaults should live in the database, with the parent process periodically (ex. once per minute) yet efficiently polling for changes to the config.

It would also be great if we could rate-limit particular unauth CIDR blocks, via entries in the database, broken down by API vs. GUI queries. Ex. a global unauth rate of 100 qps for API & GUI, 192.0.2/24 @ 20 qps for API and 30 qps for GUI, and 2001:DB8::/32 @ 5 qps for API and 10 qps for GUI. This would be in addition to applying the global unauth rate to aggregated /24s & /64s, also broken down by API vs. GUI query method.

Also, we need to know when rate limiting is actually happening, and based on which rule. It needs to be logged (periodically, after the first limiting of a specific type) either to the Django log or to the database.
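The per-CIDR override idea could be sketched like this (hypothetical table and function; the prefixes and numbers mirror the examples in this comment, and a real implementation would read these rows from the database):

```python
import ipaddress

# Hypothetical override rows, keyed by CIDR, split by API vs. GUI queries.
CIDR_LIMITS = {
    "192.0.2.0/24":  {"api": 20, "gui": 30},
    "2001:db8::/32": {"api": 5,  "gui": 10},
}
GLOBAL_UNAUTH_QPS = 100  # otherwise applied per aggregated /24 or /64

def unauth_limit(remote_ip, kind):
    """Longest-prefix match against the override table, else the global rate."""
    addr = ipaddress.ip_address(remote_ip)
    best = None
    for prefix, limits in CIDR_LIMITS.items():
        net = ipaddress.ip_network(prefix)
        # Mixed address families never match (ipaddress returns False).
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, limits[kind])
    return best[1] if best is not None else GLOBAL_UNAUTH_QPS
```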

@grizz
Member

grizz commented Oct 27, 2020

@ccaputo agree completely, that makes sense.

This will also need to have a separate endpoint for GUI API calls.

@arnoldnipper
Contributor

This will also need to have a separate endpoint for GUI API calls.

What are GUI API calls, @grizz?

@ccaputo
Contributor Author

ccaputo commented Oct 27, 2020

This will also need to have a separate endpoint for GUI API calls.

What are GUI API calls, @grizz?

/api/ calls performed in the service of GUI requests. Ex. "POST /api/netixlan" when adding one's network record at an IX.

@arnoldnipper
Contributor

/api/ calls performed in the service of GUI requests. Ex. "POST /api/netixlan" when adding one's network record at an IX.

Isn't then every GUI call a GUI API call?

@ccaputo
Contributor Author

ccaputo commented Oct 27, 2020

/api/ calls performed in the service of GUI requests. Ex. "POST /api/netixlan" when adding one's network record at an IX.

Isn't then every GUI call a GUI API call?

I believe @grizz is referring to /api/ usage as a result of certain web/GUI interactions. Most web/GUI usage does not utilize /api/ requests, but some does, such as those that result in {POST|PUT|DELETE} operations.

@grizz
Member

grizz commented Oct 27, 2020

Yes, @ccaputo is correct. The page will load without any API calls, but to interact with it, it's all API calls. GUI API calls would essentially be adding a QoS layer to user interaction.

@arnoldnipper That was coming from a thread on the ops mailing list, sorry for not putting a bit more description to go along with it.

@arnoldnipper
Contributor

The page will load without any API calls

Out of curiosity, @grizz. How is the data retrieved?

@grizz
Member

grizz commented Oct 27, 2020

From a database read

@danieldjewell

A couple of observations:

  1. While I understand the general idea/theory of rate limiting by CIDR, it has some major practical issues:

    1. Rate limiting by /24 in IPv4 is way overbroad and will unfairly/unjustly throttle/deny users who aren't even on the same network ... (and a /24 IPv4 doesn't come close to the same scope as a /64 in IPv6)
    2. Throttling based on a /64 in IPv6 makes more sense, in theory (because one could always use very-short lived temp addresses within that /64 network ...) However, given that many residential ISPs grant a /56 (mine does), that would give me ~256 subnets to use to avoid the throttle.
    3. None of the above takes into account NAT (IPv4 of course) or large single /64 subnet IPv6 shared networks (I mean, I would hope a big organization doesn't put everyone in a single /64, but....).
  2. It would appear that based on discussion here and in substantially rate limit unauthenticated /api/ queries to encourage authenticated queries #853 that the data itself is meant to be open? If so, why not make a simple static DB dump (perhaps once a day? Once an hour?) that the cli can download and use as a starting point for sync? (I've looked, but can't find any DB dumps...) It's likely that doing so would greatly reduce the number of calls to /api ... And would, at least, provide folks who are pounding the API to get the data (authenticated or not) an alternative (which would probably also be much faster and easier anyway)

  3. I am genuinely confused by the statement in substantially rate limit unauthenticated /api/ queries to encourage authenticated queries #853 that not having authenticated queries makes it difficult for Ops to find out who's behind the queries. Of course it's difficult! They're anonymous! That said, if logging the IP address isn't in the cards (and there are many privacy reasons why logging isn't a good idea), perhaps a more limited/aggregate type record where only the biggest anonymous users (offenders?) are logged/throttled? (I'd be surprised if the majority of API calls didn't come from a very small number of sources... It probably follows a Pareto-esque distribution -- i.e. the 80/20 rule.)

  4. FWIW, the mere existence of https://github.com/peeringdb/peeringdb-py (and especially its peeringdb sync command) appear to be at odds with the AUP? At a minimum, I think it's important for an anonymous user to be able to run the peeringdb sync command without running into throttling.

@ccaputo
Contributor Author

ccaputo commented Aug 17, 2022

First I should say: it has been a while since this issue has been touched, and it may no longer be needed. API rate limiting of duplicate requests and of query rates has been addressed in other issues. Non-API rate limiting may not be needed.

A couple of observations:

  1. While I understand the general idea/theory of rate limiting by CIDR, it has some major practical issues:

    1. Rate limiting by /24 in IPv4 is way overbroad and will unfairly/unjustly throttle/deny users who aren't even on the same network ... (and a /24 IPv4 doesn't come close to the same scope as a /64 in IPv6)
    2. Throttling based on a /64 in IPv6 makes more sense, in theory (because one could always use very-short lived temp addresses within that /64 network ...) However, given that many residential ISPs grant a /56 (mine does), that would give me ~256 subnets to use to avoid the throttle.
    3. None of the above takes into account NAT (IPv4 of course) or large single /64 subnet IPv6 shared networks (I mean, I would hope a big organization doesn't put everyone in a single /64, but....).

The workaround for a user impacted by throttling is to log in.

  2. It would appear that based on discussion here and in substantially rate limit unauthenticated /api/ queries to encourage authenticated queries #853 that the data itself is meant to be open? If so, why not make a simple static DB dump (perhaps once a day? Once an hour?) that the cli can download and use as a starting point for sync? (I've looked, but can't find any DB dumps...) It's likely that doing so would greatly reduce the number of calls to /api ... And would, at least, provide folks who are pounding the API to get the data (authenticated or not) an alternative (which would probably also be much faster and easier anyway)

While not a DB dump per se, peeringdb-py is very similar to this: it provides an incremental update mechanism with a rapid initial dump, consisting of a single HTTP GET per object type.

  3. I am genuinely confused by the statement in substantially rate limit unauthenticated /api/ queries to encourage authenticated queries #853 that not having authenticated queries makes it difficult for Ops to find out who's behind the queries. Of course it's difficult! They're anonymous! That said, if logging the IP address isn't in the cards (and there are many privacy reasons why logging isn't a good idea), perhaps a more limited/aggregate type record where only the biggest anonymous users (offenders?) are logged/throttled? (I'd be surprised if the majority of API calls didn't come from a very small number of sources... It probably follows a Pareto-esque distribution -- i.e. the 80/20 rule.)

IP addresses are routinely logged these days. I don't recall the specifics, but years ago we may not have been logging accurate source IPs due to the use of a load balancer front end, and a failure to parse the actual remote user IP. These days we do store the actual remote user IPs in access logs.

Broken scripts or scripts written by people who do not understand the costs, or fail to understand the tragedy of the commons, or fail to include delays, can come from a variety of sources. Many of the sources end up being in various clouds, using address space completely unrelated to the interested party or organization, from an RIR lookup standpoint.

  4. FWIW, the mere existence of https://github.com/peeringdb/peeringdb-py (and especially its peeringdb sync command) appear to be at odds with the AUP? At a minimum, I think it's important for an anonymous user to be able to run the peeringdb sync command without running into throttling.

Current throttle limits do not prevent an anonymous user from running peeringdb sync. That said, as a courtesy, I personally recommend users/orgs do authenticate.

The AUP supports the use of peeringdb-py for Internet operational purposes within an organization.
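The rapid-initial-dump pattern mentioned above, one bulk GET per object type, can be sketched as follows (a simplified illustration of the idea, not peeringdb-py's actual code; the object-type tags are a subset of those in the public API):

```python
import json
import urllib.request

# A subset of the object types exposed by the public API.
OBJECT_TYPES = ["org", "fac", "net", "ix", "ixlan", "ixpfx", "netixlan", "netfac", "poc"]

def dump_urls(base_url="https://www.peeringdb.com/api"):
    """One bulk URL per object type; a full snapshot is one GET each."""
    return [f"{base_url}/{tag}" for tag in OBJECT_TYPES]

def initial_dump(base_url="https://www.peeringdb.com/api"):
    """Seed a local copy with a single request per object type."""
    snapshot = {}
    for url, tag in zip(dump_urls(base_url), OBJECT_TYPES):
        with urllib.request.urlopen(url) as resp:
            snapshot[tag] = json.load(resp)["data"]
    return snapshot
```

Incremental updates then only need to ask for objects changed since the last sync, which is what keeps steady-state request counts low.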

@danieldjewell

danieldjewell commented Aug 17, 2022

Current throttle limits do not prevent an anonymous user from running peeringdb sync. That said, as a courtesy, I personally recommend users/orgs do authenticate.

I can personally confirm this to be 100% false as of yesterday (which is kinda what prompted me to even search GH Issues). I ran a peeringdb sync using the version of peeringdb-py that's current on PyPI and ran into HTTP 429 errors about halfway through.

EDIT: After a very brief glance at the server throttling code, I suspect (but cannot confirm) that this might have something to do with the Anon request size limit rather than the rate limit -- if that's enabled on the production server.

@ccaputo
Contributor Author

ccaputo commented Aug 17, 2022

Current throttle limits do not prevent an anonymous user from running peeringdb sync. That said, as a courtesy, I personally recommend users/orgs do authenticate.

I can personally confirm this to be 100% false as of yesterday (which is kinda what prompted me to even search GH Issues). I ran a peeringdb sync using the version of peeringdb-py that's current on PyPI and ran into HTTP 429 errors about halfway through.

EDIT: After a very brief glance at the server throttling code, I suspect (but cannot confirm) that this might have something to do with the Anon request size limit rather than the rate limit -- if that's enabled on the production server.

Sorry for the problems. I have analyzed the logs of anonymous peeringdb-py clients and have now raised the throttle setting from 10 requests per minute to 20 requests per minute. Hopefully that will be sufficient for peeringdb sync tasks. (Please don't sync more frequently than once per hour. If run from cron, use the RANDOM function in the docs to avoid a thundering herd.)

Please report back if this helps things.

After testing and hopefully confirming improvement, I highly recommend authenticating using an API key since there may be a future in which authentication is required in order to run peeringdb-py without problems.
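The thundering-herd advice above can be followed on the client side with a random delay before syncing; a hypothetical sketch (not part of peeringdb-py):

```python
import random
import time

def jittered_delay(window_seconds=300, rng=random.random):
    """Seconds to wait before running `peeringdb sync`.

    Spreading cron-driven clients across a random window keeps them from
    all hitting the server at the same instant each hour.
    """
    return rng() * window_seconds

# Example: time.sleep(jittered_delay()) before invoking the sync command.
```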

@ccaputo
Contributor Author

ccaputo commented Aug 18, 2022

Sorry for the problems. I have analyzed the logs of anonymous peeringdb-py clients and have now raised the throttle setting from 10 requests per minute to 20 requests per minute. Hopefully that will be sufficient for peeringdb sync tasks. (Please don't sync more frequently than once per hour. If run from cron, use the RANDOM function in the docs to avoid a thundering herd.)

Since the anonymous limit change from 10 qpm to 20 qpm some 21 hours ago, 429s from apparent peeringdb-py users have fallen to zero. I am going to announce this change on pdb-tech (https://lists.peeringdb.com/cgi-bin/mailman/listinfo/pdb-tech) and update https://docs.peeringdb.com/howtos/ accordingly.

@ccaputo
Contributor Author

ccaputo commented Feb 28, 2023

My perspective, given current throttling, the user education that has happened in the last year, and the associated increase in users working to query PeeringDB more efficiently and/or take advantage of tools such as:

https://docs.peeringdb.com/howto/api_keys/
https://docs.peeringdb.com/howto/peeringdb-py/
https://github.com/peeringdb/peeringdb-py

is that we may no longer need to drill down to the level of throttling proposed by this issue.

Keeping in mind @grizz's Summary should we need to revisit this topic:

Summary
Add separate rate limits for API calls from authed and unauthed users.

If unauthed, aggregate by /24 or /64 for IPv4 and IPv6 respectively.

If authed, have a global default, but allow overrides on a per-user or auth-key basis.

If there are no objections, I intend to close this issue in a week.

@ccaputo ccaputo closed this as completed Mar 8, 2023