
rate limit by subnet and differentiate by userid #427

Closed
ccaputo opened this issue Feb 13, 2019 · 27 comments
Assignees
Labels
Time:Epic More than 10 hours work

Comments

@ccaputo
Contributor

ccaputo commented Feb 13, 2019

Keeping in mind the closed "implement rate limiting" issue (#311), we are seeing queries from people randomizing their source IP within a single subnet. The rate limiting system should therefore group IPv4 /24s and IPv6 /64s, so that queries from a single subnet are rate limited together with other hosts from the same subnet that share the same userid (or lack of one).

The latter is important because we don't want different logged-in users at a conference to all be rate limited together, whether they come from the same subnet or are NATed.
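Such a bucket key could be sketched as below (a hypothetical helper using Python's stdlib `ipaddress`, not actual PeeringDB code):

```python
import ipaddress
from typing import Optional

def throttle_key(remote_ip: str, userid: Optional[str]) -> str:
    """Collapse an address to its IPv4 /24 or IPv6 /64, then scope by userid.

    Anonymous clients in the same subnet share one bucket, while each
    logged-in user gets a separate bucket, so different users behind the
    same subnet or NAT are not rate limited together.
    """
    addr = ipaddress.ip_address(remote_ip)
    prefixlen = 24 if addr.version == 4 else 64
    subnet = ipaddress.ip_network(f"{remote_ip}/{prefixlen}", strict=False)
    return f"{subnet}|{userid or 'anon'}"
```

Two anonymous hosts in the same /24 map to the same key, while a logged-in user on that subnet gets a key of their own.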

@ccaputo
Contributor Author

ccaputo commented Feb 13, 2019

Grizz asked: "do we want to rate limit differently for auth vs non auth?"

I responded: "Always nice to have the flexibility, even if the initial setting is the same. :-) I'd keep them the same initially and tune from there."

@eloos

eloos commented Feb 13, 2019

I think it makes sense to rate-limit differently based on authed or not-authed.

@arnoldnipper arnoldnipper added this to the Decide milestone Nov 24, 2019
@ynbrthr ynbrthr self-assigned this Jun 2, 2020
@ynbrthr

ynbrthr commented Jun 2, 2020

@peeringdb/pc can we have consensus to implement and close that?

+1 from me

@arnoldnipper
Contributor

+1

@ynbrthr

ynbrthr commented Jun 3, 2020

@grizz , your question to @ccaputo at the time sounds like an implicit '+1' for that feature, can you confirm? :)

@ynbrthr

ynbrthr commented Jun 17, 2020

ok, someone else then? @peeringdb/pc

@shane-kerr

+1

I think it's fine to start with a global rate limit (not accounting for authentication status), but I am also happy if it gets implemented based on authentication status.

@ynbrthr ynbrthr modified the milestones: 1 Decide, 2 Consensus Reached Jun 17, 2020
@mcmanuss8
Contributor

+1 - I'd say we should aggregate by CIDR for unauthed users, and by API key (#266) for authed users.

Ideally we'd have a rate limit for authed users that's higher than for unauthed users, plus some configuration ops can tweak per API key if someone is hitting the default rate limit and needs it raised (or lowered!).

@grizz
Member

grizz commented Jun 27, 2020

Summary
Add separate rate limits for API calls from authed and unauthed users.

If unauthed, aggregate by /24 or /64 for IPv4 and IPv6 respectively.

If authed, have a global default, but allow overrides on a per-user or auth-key basis.
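A minimal sketch of that lookup (all names and numbers below are hypothetical illustrations, not PeeringDB's actual settings):

```python
# Hypothetical global defaults and per-user / per-auth-key overrides.
GLOBAL_AUTHED_QPM = 60
GLOBAL_UNAUTHED_QPM = 20   # shared per aggregated /24 (IPv4) or /64 (IPv6)
OVERRIDES = {
    "joe": 120,            # per-user override
    "api-key-abc": 240,    # per-auth-key override
}

def rate_limit_qpm(identity=None):
    """Queries-per-minute budget for a caller.

    identity is a userid or auth key for authenticated callers, or None for
    anonymous callers, whose budget is shared across their subnet bucket.
    """
    if identity is None:
        return GLOBAL_UNAUTHED_QPM
    return OVERRIDES.get(identity, GLOBAL_AUTHED_QPM)
```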

@grizz grizz added the Time:Epic More than 10 hours work label Jun 27, 2020
@leovegoda leovegoda added the api label Sep 9, 2020
@arnoldnipper arnoldnipper removed the api label Sep 24, 2020
@ccaputo
Contributor Author

ccaputo commented Oct 10, 2020

From my perspective on @peeringdb/oc , this is needed. We've got more and more clients performing automated queries and need ways to be able to balance those against human requests.

@ccaputo
Contributor Author

ccaputo commented Oct 10, 2020

@grizz , when you wrote:

Add separate rate limits for API calls from auth and unauthed users.

I just want to clarify that you aren't intending to limit this to HTTP requests with /api/ in the URL? I believe all query types should be able to be limited.

@arnoldnipper
Contributor

@ccaputo, I understand @grizz as proposing

  • a GUI rate-limit (regardless of being authed or unauthed)

  • an API rate-limit for authed users

  • an API rate-limit for unauthed users

@ccaputo
Contributor Author

ccaputo commented Oct 12, 2020

@grizz wrote:

Summary
Add separate rate limits for API calls from authed and unauthed users.

If unauthed, aggregate by /24 or /64 for IPv4 and IPv6 respectively.

If authed, have a global default, but allow overrides on a per-user or auth-key basis.

@arnoldnipper wrote:

@ccaputo, I understand @grizz as proposing

  • a GUI rate-limit (regardless of being authed or unauthed)
  • an API rate-limit for authed users
  • an API rate-limit for unauthed users

@grizz, I would like to see an in-database per-user rate limit knob for both API and GUI queries. Ex. user 'joe' could be limited to 10 qps to the GUI and 5 qps to /api/ URLs.

I think Ops/@peeringdb/oc needs to be able to change the global defaults, for authed and unauthed users, without needing to modify the internals of a container or the env on each instance, as apparently required by #311. I.e., the defaults should live in the database, with the parent process periodically (ex. once per minute) yet efficiently polling for changes to the config.

It would also be great if we could rate-limit particular unauth CIDR blocks, via entries in the database, broken down by API vs. GUI queries. Ex. a global unauth rate of 100 qps for API & GUI, 192.0.2/24 @ 20 qps for API and 30 qps for GUI, and 2001:DB8::/32 @ 5 qps for API and 10 qps for GUI. This would be in addition to applying the global unauth rate to aggregated /24s & /64s, also broken down by API vs. GUI query method.

Also, we need to know when rate limiting is actually happening, and based on which rule. It needs to be logged (periodically, after the first limiting of a specific type) either to the Django log or to the database.
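The per-CIDR override idea could be sketched like this (hypothetical table and function; the prefixes and numbers mirror the examples in this comment, and a real implementation would read these rows from the database):

```python
import ipaddress

# Hypothetical override rows, keyed by CIDR, split by API vs. GUI queries.
CIDR_LIMITS = {
    "192.0.2.0/24":  {"api": 20, "gui": 30},
    "2001:db8::/32": {"api": 5,  "gui": 10},
}
GLOBAL_UNAUTH_QPS = 100  # otherwise applied per aggregated /24 or /64

def unauth_limit(remote_ip, kind):
    """Longest-prefix match against the override table, else the global rate."""
    addr = ipaddress.ip_address(remote_ip)
    best = None
    for prefix, limits in CIDR_LIMITS.items():
        net = ipaddress.ip_network(prefix)
        # Mixed address families never match (ipaddress returns False).
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, limits[kind])
    return best[1] if best is not None else GLOBAL_UNAUTH_QPS
```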

@grizz
Member

grizz commented Oct 27, 2020

@ccaputo agree completely, that makes sense.

This will also need to have a separate endpoint for GUI API calls.

@arnoldnipper
Contributor

This will also need to have a separate endpoint for GUI API calls.

What are GUI API calls, @grizz?

@ccaputo
Contributor Author

ccaputo commented Oct 27, 2020

This will also need to have a separate endpoint for GUI API calls.

What are GUI API calls, @grizz?

/api/ calls performed in the service of GUI requests. Ex. "POST /api/netixlan" when adding one's network record at an IX.

@arnoldnipper
Contributor

/api/ calls performed in the service of GUI requests. Ex. "POST /api/netixlan" when adding one's network record at an IX.

Isn't then every GUI call a GUI API call?

@ccaputo
Contributor Author

ccaputo commented Oct 27, 2020

/api/ calls performed in the service of GUI requests. Ex. "POST /api/netixlan" when adding one's network record at an IX.

Isn't then every GUI call a GUI API call?

I believe @grizz is referring to /api/ usage as a result of certain web/GUI interactions. Most web/GUI usage does not utilize /api/ requests, but some does, such as those that result in {POST|PUT|DELETE} operations.

@grizz
Member

grizz commented Oct 27, 2020

Yes, @ccaputo is correct. The page will load without any API calls, but to interact with it, it's all API calls. GUI API calls would essentially be adding a QoS layer to user interaction.

@arnoldnipper That was coming from a thread on the ops mailing list, sorry for not putting a bit more description to go along with it.

@arnoldnipper
Contributor

The page will load without any API calls

Out of curiosity, @grizz. How is the data retrieved?

@grizz
Member

grizz commented Oct 27, 2020

From a database read

@danieldjewell

A couple of observations:

  1. While I understand the general idea/theory of rate limiting by CIDR, it has some major practical issues:

    1. Rate limiting by /24 in IPv4 is way overbroad and will unfairly/unjustly throttle/deny users who aren't even on the same network ... (and a /24 IPv4 doesn't come close to the same scope as a /64 in IPv6)
    2. Throttling based on a /64 in IPv6 makes more sense, in theory (because one could always use very-short lived temp addresses within that /64 network ...) However, given that many residential ISPs grant a /56 (mine does), that would give me ~256 subnets to use to avoid the throttle.
    3. None of the above takes into account NAT (IPv4 of course) or large single /64 subnet IPv6 shared networks (I mean, I would hope a big organization doesn't put everyone in a single /64, but....).
  2. It would appear that based on discussion here and in substantially rate limit unauthenticated /api/ queries to encourage authenticated queries #853 that the data itself is meant to be open? If so, why not make a simple static DB dump (perhaps once a day? Once an hour?) that the cli can download and use as a starting point for sync? (I've looked, but can't find any DB dumps...) It's likely that doing so would greatly reduce the number of calls to /api ... And would, at least, provide folks who are pounding the API to get the data (authenticated or not) an alternative (which would probably also be much faster and easier anyway)

  3. I am genuinely confused by the statement in substantially rate limit unauthenticated /api/ queries to encourage authenticated queries #853 that not having authenticated queries makes it difficult for Ops to find out who's behind the queries. Of course it's difficult! They're anonymous! That said, if logging the IP address isn't in the cards (and there are many privacy reasons why logging isn't a good idea), perhaps a more limited/aggregate type record where only the biggest anonymous users (offenders?) are logged/throttled? (I'd be surprised if the majority of API calls didn't come from a very small number of sources... It probably follows a Pareto-esque distribution -- i.e. the 80/20 rule.)

  4. FWIW, the mere existence of https://github.com/peeringdb/peeringdb-py (and especially its peeringdb sync command) appear to be at odds with the AUP? At a minimum, I think it's important for an anonymous user to be able to run the peeringdb sync command without running into throttling.

@ccaputo
Contributor Author

ccaputo commented Aug 17, 2022

First I should say: it has been a while since this issue has been touched, and it may no longer be needed. API rate limiting of duplicate requests and of query rates has been addressed in other issues. Non-API rate limiting may not be needed.

A couple of observations:

  1. While I understand the general idea/theory of rate limiting by CIDR, it has some major practical issues:

    1. Rate limiting by /24 in IPv4 is way overbroad and will unfairly/unjustly throttle/deny users who aren't even on the same network ... (and a /24 IPv4 doesn't come close to the same scope as a /64 in IPv6)
    2. Throttling based on a /64 in IPv6 makes more sense, in theory (because one could always use very-short lived temp addresses within that /64 network ...) However, given that many residential ISPs grant a /56 (mine does), that would give me ~256 subnets to use to avoid the throttle.
    3. None of the above takes into account NAT (IPv4 of course) or large single /64 subnet IPv6 shared networks (I mean, I would hope a big organization doesn't put everyone in a single /64, but....).

The workaround for a user impacted by throttling is to log in.

  2. It would appear that based on discussion here and in substantially rate limit unauthenticated /api/ queries to encourage authenticated queries #853 that the data itself is meant to be open? If so, why not make a simple static DB dump (perhaps once a day? Once an hour?) that the cli can download and use as a starting point for sync? (I've looked, but can't find any DB dumps...) It's likely that doing so would greatly reduce the number of calls to /api ... And would, at least, provide folks who are pounding the API to get the data (authenticated or not) an alternative (which would probably also be much faster and easier anyway)

While not a DB dump per se, peeringdb-py is very similar to this: it provides an incremental update mechanism with a rapid initial dump, consisting of a single HTTP GET per object type.

  3. I am genuinely confused by the statement in substantially rate limit unauthenticated /api/ queries to encourage authenticated queries #853 that not having authenticated queries makes it difficult for Ops to find out who's behind the queries. Of course it's difficult! They're anonymous! That said, if logging the IP address isn't in the cards (and there are many privacy reasons why logging isn't a good idea), perhaps a more limited/aggregate type record where only the biggest anonymous users (offenders?) are logged/throttled? (I'd be surprised if the majority of API calls didn't come from a very small number of sources... It probably follows a Pareto-esque distribution -- i.e. the 80/20 rule.)

IP addresses are routinely logged these days. I don't recall the specifics, but years ago we may not have been logging accurate source IPs due to the use of a load balancer front end, and a failure to parse the actual remote user IP. These days we do store the actual remote user IPs in access logs.

Broken scripts or scripts written by people who do not understand the costs, or fail to understand the tragedy of the commons, or fail to include delays, can come from a variety of sources. Many of the sources end up being in various clouds, using address space completely unrelated to the interested party or organization, from an RIR lookup standpoint.

  4. FWIW, the mere existence of https://github.com/peeringdb/peeringdb-py (and especially its peeringdb sync command) appear to be at odds with the AUP? At a minimum, I think it's important for an anonymous user to be able to run the peeringdb sync command without running into throttling.

Current throttle limits do not prevent an anonymous user from running peeringdb sync. That said, as a courtesy, I personally recommend users/orgs do authenticate.

The AUP supports the use of peeringdb-py for Internet operational purposes within an organization.
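The rapid-initial-dump pattern mentioned above, one bulk GET per object type, can be sketched as follows (a simplified illustration of the idea, not peeringdb-py's actual code; the object-type tags are a subset of those in the public API):

```python
import json
import urllib.request

# A subset of the object types exposed by the public API.
OBJECT_TYPES = ["org", "fac", "net", "ix", "ixlan", "ixpfx", "netixlan", "netfac", "poc"]

def dump_urls(base_url="https://www.peeringdb.com/api"):
    """One bulk URL per object type; a full snapshot is one GET each."""
    return [f"{base_url}/{tag}" for tag in OBJECT_TYPES]

def initial_dump(base_url="https://www.peeringdb.com/api"):
    """Seed a local copy with a single request per object type."""
    snapshot = {}
    for url, tag in zip(dump_urls(base_url), OBJECT_TYPES):
        with urllib.request.urlopen(url) as resp:
            snapshot[tag] = json.load(resp)["data"]
    return snapshot
```

Incremental updates then only need to ask for objects changed since the last sync, which is what keeps steady-state request counts low.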

@danieldjewell

danieldjewell commented Aug 17, 2022

Current throttle limits do not prevent an anonymous user from running peeringdb sync. That said, as a courtesy, I personally recommend users/orgs do authenticate.

I can personally confirm this to be 100% false as of yesterday (which is kinda what prompted me to even search GH Issues). I ran a peeringdb sync using the version of peeringdb-py that's current on PyPI and ran into HTTP 429 errors about halfway through.

EDIT: After a very brief glance at the server throttling code, I suspect (but cannot confirm) that this might have something to do with the Anon request size limit rather than the rate limit -- if that's enabled on the production server.

@ccaputo
Contributor Author

ccaputo commented Aug 17, 2022

Current throttle limits do not prevent an anonymous user from running peeringdb sync. That said, as a courtesy, I personally recommend users/orgs do authenticate.

I can personally confirm this to be 100% false as of yesterday (which is kinda what prompted me to even search GH Issues). I ran a peeringdb sync using the version of peeringdb-py that's current on PyPI and ran into HTTP 429 errors about halfway through.

EDIT: After a very brief glance at the server throttling code, I suspect (but cannot confirm) that this might have something to do with the Anon request size limit rather than the rate limit -- if that's enabled on the production server.

Sorry for the problems. I have analyzed the logs of anonymous peeringdb-py clients and have now raised the throttle setting from 10 requests per minute to 20 requests per minute. Hopefully that will be sufficient for peeringdb sync tasks. (Please don't sync more frequently than once per hour. If run from cron, use the RANDOM function in the docs to avoid a thundering herd.)

Please report back if this helps things.

After testing and hopefully confirming improvement, I highly recommend authenticating using an API key since there may be a future in which authentication is required in order to run peeringdb-py without problems.
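The thundering-herd advice above can be followed on the client side with a random delay before syncing; a hypothetical sketch (not part of peeringdb-py):

```python
import random
import time

def jittered_delay(window_seconds=300, rng=random.random):
    """Seconds to wait before running `peeringdb sync`.

    Spreading cron-driven clients across a random window keeps them from
    all hitting the server at the same instant each hour.
    """
    return rng() * window_seconds

# Example: time.sleep(jittered_delay()) before invoking the sync command.
```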

@ccaputo
Contributor Author

ccaputo commented Aug 18, 2022

Sorry for the problems. I have analyzed the logs of anonymous peeringdb-py clients and have now raised the throttle setting from 10 requests per minute to 20 requests per minute. Hopefully that will be sufficient for peeringdb sync tasks. (Please don't sync more frequently than once per hour. If run from cron, use the RANDOM function in the docs to avoid a thundering herd.)

Since the anonymous limit change from 10 qpm to 20 qpm some 21 hours ago, 429s from apparent peeringdb-py users have fallen to zero. I am going to announce this change on pdb-tech (https://lists.peeringdb.com/cgi-bin/mailman/listinfo/pdb-tech) and update https://docs.peeringdb.com/howtos/ accordingly.

@ccaputo
Contributor Author

ccaputo commented Feb 28, 2023

My perspective, given current throttling, the user education that has happened in the last year, and the associated increase in users working to query PeeringDB more efficiently and/or take advantage of tools such as:

https://docs.peeringdb.com/howto/api_keys/
https://docs.peeringdb.com/howto/peeringdb-py/
https://github.com/peeringdb/peeringdb-py

is that we may no longer need to drill down to the level of throttling proposed by this issue.

Keeping in mind @grizz's Summary should we need to revisit this topic:

Summary
Add separate rate limits for API calls from authed and unauthed users.

If unauthed, aggregate by /24 or /64 for IPv4 and IPv6 respectively.

If authed, have a global default, but allow overrides on a per-user or auth-key basis.

If there are no objections, I intend to close this issue in a week.

@ccaputo ccaputo closed this as completed Mar 8, 2023