Consider not forcing www canonicalization for API requests, due to endless stream of errant API requests now happening #1139

ccaputo · 2022-03-30T16:42:58Z

#916 implemented www.peeringdb.com canonicalization.

While I understand there are benefits for browser-originated non-API requests, I am not sure there is a benefit to forcing this canonicalization on API requests, and yet a major downside has come up. We now have a continuous stream of API requests in which the client is ignoring the 301 redirect. These are burning CPU and log space, in addition to obscuring other issues when observing logs, since they outnumber everything else. A sample from one server instance, with first three octets of IPv4 addresses obscured:

119 - - [30/Mar/2022:16:39:55 +0000] "GET /api/netixlan?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ixfac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/poc?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ix?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/net?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
192 - - [30/Mar/2022:16:39:55 +0000] "GET /api/org?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/netixlan?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
243 - - [30/Mar/2022:16:39:55 +0000] "GET /api/netixlan?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.002
192 - - [30/Mar/2022:16:39:55 +0000] "GET /api/netfac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
243 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ixfac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.002
192 - - [30/Mar/2022:16:39:55 +0000] "GET /api/netixlan?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
119 - - [30/Mar/2022:16:39:55 +0000] "GET /api/org?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
243 - - [30/Mar/2022:16:39:55 +0000] "GET /api/net?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
243 - - [30/Mar/2022:16:39:55 +0000] "GET /api/poc?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
243 - - [30/Mar/2022:16:39:55 +0000] "GET /api/fac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
243 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ixpfx?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
243 - - [30/Mar/2022:16:39:55 +0000] "GET /api/org?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
119 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ixlan?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
119 - - [30/Mar/2022:16:39:55 +0000] "GET /api/net?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
119 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ix?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
119 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ixfac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ixfac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
243 - - [30/Mar/2022:16:39:55 +0000] "GET /api/netixlan?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ixpfx?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ix?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/poc?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/netfac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/fac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
147 - - [30/Mar/2022:16:39:55 +0000] "GET /api/ixlan?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
192 - - [30/Mar/2022:16:39:55 +0000] "GET /api/netfac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.002
192 - - [30/Mar/2022:16:39:55 +0000] "GET /api/fac?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001
192 - - [30/Mar/2022:16:39:55 +0000] "GET /api/net?depth=1 HTTP/1.1" 301 0 "-" "python-httpx/0.20.0" 0.001

Short of null routing or ACLing the requests, I don't know how to stop them or contact their operators (without substantial work), and so I fear we may have years of this happening.

We may want to consider not forcing this canonicalization of API requests, if it doesn't break anything to do so, and helps restore clients to functionality.

Current and upcoming API throttling tools (ex #1126) may not help this issue, depending on whether the canonicalization happens before or after API throttling. If nothing else, we may need to have canonicalization after API throttling.

arnoldnipper · 2022-03-30T23:59:26Z

@peeringdb/pc should be hotfixed IMO

ynbrthr · 2022-04-01T23:30:29Z

+1 in principle, I got bitten by that issue (curl not following the redirect) but I don't know if that'd break anything wrt auth indeed

ccaputo · 2022-04-04T14:13:43Z

> Current and upcoming API throttling tools (ex #1126) may not help this issue, depending on whether the canonicalization happens before or after API throttling. If nothing else, we may need to have canonicalization after API throttling.

@vegu has indicated off GitHub that changing the ordering of API throttling to be ahead of canonicalization is likely a non-trivial change. He also indicated that changing the canonicalization to only apply to non-API requests does not appear problematic from a coding standpoint.

I recommend we move forward with this change (don't canonicalize API requests) ASAP, unless there is an objection to be considered.
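To make the proposal concrete, here is a minimal sketch of "canonicalize everything except API requests". It is not PeeringDB's actual code: the function name, the `https` scheme, and the assumption that every API path starts with `/api/` (as in the log sample above) are mine.

```python
from typing import Optional

CANONICAL_HOST = "www.peeringdb.com"  # canonical name introduced by #916
API_PREFIX = "/api/"                  # assumption: API paths start with /api/


def canonical_redirect(host: str, path: str) -> Optional[str]:
    """Return a redirect target, or None to serve the request as-is.

    Hypothetical helper: only non-API requests on a non-canonical
    host are redirected; API requests are answered on either host.
    """
    if host.lower() == CANONICAL_HOST:
        return None  # already canonical, nothing to do
    if path.startswith(API_PREFIX):
        return None  # API request: skip canonicalization
    return f"https://{CANONICAL_HOST}{path}"
```

With this rule, `canonical_redirect("peeringdb.com", "/api/net?depth=1")` returns `None`, while a browser request for a non-API path on the bare domain still gets a redirect target.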

leovegoda · 2022-04-04T14:42:05Z

@peeringdb/pc @ccaputo is this something that needs a point release or should we schedule it for the next scheduled release?

grizz · 2022-04-04T15:47:08Z

To play devil's advocate, since those aren't working now and not having www. will break caching, wouldn't it make more sense to just drop those requests after the first redirect?

Eventually someone will notice and correct their script. :)

martinhannigan · 2022-04-04T16:07:30Z

I like this.

On Mon, Apr 4, 2022 at 11:47 AM Matt Griswold wrote:

> To play devil's advocate, since those aren't working now and not having www. will break caching, wouldn't it make more sense to just drop those requests after the first redirect? Eventually someone will notice and correct their script. :)

ccaputo · 2022-04-04T16:37:17Z

> To play devil's advocate, since those aren't working now and not having www. will break caching, wouldn't it make more sense to just drop those requests after the first redirect?
>
> Eventually someone will notice and correct their script. :)

@grizz: I'm intrigued. By drop do you mean no response whatsoever, just close TCP, or a 4xx response or ???

Would you maintain state in some manner, like for a given IP having already been told to redirect in the last 5 mins, if they ignore just drop?

> @peeringdb/pc @ccaputo is this something that needs a point release or should we schedule it for the next scheduled release?

@leovegoda: Not sure. Let's see where the above goes before figuring that out.

grizz · 2022-04-04T17:11:42Z

> @grizz: I'm intrigued. By drop do you mean no response whatsoever, just close TCP, or a 4xx response or ???

I was originally thinking just close TCP, but we could probably return a 4XX as well, as long as it's caught early in the pipeline (which it would be on the redirect handler).

> Would you maintain state in some manner, like for a given IP having already been told to redirect in the last 5 mins, if they ignore just drop?

Yes, that's what I was thinking. Give a redirect, any other requests from the same IP for X time period will just get ignored.
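A rough sketch of that state tracking, purely illustrative: the class name, the in-memory dict, and the 300-second default (taken from the "last 5 mins" suggestion above) are assumptions, and nothing here is tied to how PeeringDB or nginx would actually do it.

```python
import time
from typing import Callable, Dict


class RedirectOnceTracker:
    """Remember which client IPs were recently redirected (in memory only)."""

    def __init__(self, window_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the window can be tested
        self._redirected_at: Dict[str, float] = {}

    def decide(self, client_ip: str) -> str:
        """Return "redirect" for the first request in a window, else "drop"."""
        now = self._clock()
        last = self._redirected_at.get(client_ip)
        if last is not None and now - last < self.window_seconds:
            return "drop"  # already told to redirect; ignore the repeat
        self._redirected_at[client_ip] = now
        return "redirect"
```

Note that the dict grows without bound in this sketch; a real deployment would need expiry, and shared state if more than one server instance handles the redirects.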

ccaputo · 2022-04-04T17:19:52Z

> @grizz: I'm intrigued. By drop do you mean no response whatsoever, just close TCP, or a 4xx response or ???
>
> I was originally thinking just close TCP, but we could probably return a 4XX as well, as long as it's caught early in the pipeline (which it would be on the redirect handler).

TBD how nginx would handle the closed TCP connection, but something that slows the requests down, as that could, would be nice. I wonder if there is a hack possible (or already in existence) to signal nginx to be similarly harsh with the client.

> Would you maintain state in some manner, like for a given IP having already been told to redirect in the last 5 mins, if they ignore just drop?
>
> Yes, that's what I was thinking. Give a redirect, any other requests from the same IP for X time period will just get ignored.

I love the idea. Let's see if anyone else has thoughts on this.

netravnen · 2022-04-07T20:14:13Z

> Would you maintain state in some manner, like for a given IP having already been told to redirect in the last 5 mins, if they ignore just drop?
>
> Yes, that's what I was thinking. Give a redirect, any other requests from the same IP for X time period will just get ignored.
>
> I love the idea. Let's see if anyone else has thoughts on this.

I'm in favour of this approach!

If we do not enforce the change to www for all requests and "just" implement workarounds, we are no closer to the goal than when we set out to start the change, in my honest opinion.

ccaputo · 2022-04-16T17:46:23Z

@peeringdb/pc can we make sure @grizz's solution goes into the next release? The endless 301s continue unabated.

mcmanuss8 · 2022-04-18T17:53:53Z

Can we reach out to the users of these requests (if they're using user or org api keys) and ask them to switch?

ccaputo · 2022-04-18T18:42:37Z

> Can we reach out to the users of these requests (if they're using user or org api keys) and ask them to switch?

This isn't really practical.

We don't have auth-id logging enabled on production to know, but I highly doubt these are using api keys.

In the last log rotation day (ie. a full day, now gzipped) there were over 1,900 unique source IP addresses making API queries and being redirected. Of these, 195 different source IP addresses made the same query 100 times or more. Of that, 116 different source IP addresses made the same query 1,000 times or more.
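For anyone wanting to reproduce this kind of count, a sketch along these lines works against the access-log layout shown in the sample at the top of this issue. The regex and function names are my own, and it assumes the client address is the first field and the status code follows the quoted request line.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: ip - - [timestamp] "METHOD target PROTO" status ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_redirected(lines: Iterable[str]) -> Counter:
    """Count 301 responses per (client, request target) pair."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("status") == "301":
            counts[(m.group("ip"), m.group("target"))] += 1
    return counts


def clients_repeating(counts: Counter, threshold: int) -> int:
    """Distinct clients that sent some single query at least `threshold` times."""
    return len({ip for (ip, _target), n in counts.items() if n >= threshold})
```

Feeding a decompressed day of log lines to `count_redirected` and then calling `clients_repeating(counts, 100)` or `clients_repeating(counts, 1000)` gives figures comparable to the ones above.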

ccaputo · 2022-05-13T22:34:37Z

A quick check indicates about 30% of the HTTP requests to the API servers result in 301 redirects. It would be good to make progress on this.

ccaputo · 2022-05-17T15:11:25Z

I learned today that this problem triggered a bug in one user's code that resulted in PeeringDB getting far more requests from that user than previously. It would be good to make progress on this.

grizz · 2022-05-18T16:15:36Z

Actually, I think it would make more sense to do this in AWS or a thin responder that's not tied to or touching PeeringDB at all, with the added advantage that we don't need to wait for a release.

Since they're spamming and not following the redirects, instead of closing the connection, it would be nice to send a redirect and then tarpit the connection.
I think doing the redirect in AWS would just fix it, but it would still be spamming up to the Load Balancers. If we care about that, we can do a quick App that responds as I said above for all non www requests.

Thoughts?

ccaputo · 2022-05-18T17:46:14Z

Potentially interesting re AWS. Following up with you via email.

ccaputo · 2022-08-02T17:16:25Z

I think we need to revisit this. The forcing of canonical for /api/ requests continues to waste developer time (#1206). There is an apparent bug in the python requests module in which the followup query after a 301 redirect fails to include (or mangles?) the request headers necessary for authentication. Thus the PeeringDB server responds as if it is an unauthenticated request, while the developer thinks they are authenticating. Chaos ensues. Totally opaque.

mcmanuss8 · 2022-08-02T19:14:48Z

+1 @grizz what are our options here?

ccaputo · 2022-08-09T21:35:05Z

> I think we need to revisit this. The forcing of canonical for /api/ requests continues to waste developer time (#1206). There is an apparent bug in the python requests module in which the followup query after a 301 redirect fails to include (or mangles?) the request headers necessary for authentication. Thus the PeeringDB server responds as if it is an unauthenticated request, while the developer thinks they are authenticating. Chaos ensues. Totally opaque.

Turns out this isn't a bug in the python requests module (code here: https://github.com/psf/requests/blob/177dd90f/requests/sessions.py#L284-L296), but rather "by-design" to prevent leakage of credentials in the face of a malicious 301 redirect. Other software packages may perform this same safety credential dropping upon redirect.
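To illustrate why the bare-domain redirect trips this protection, here is a simplified stand-in for the rule. It is not the requests code itself (the linked lines also handle scheme and port changes); the function name is hypothetical and only the hostname comparison is modelled.

```python
from urllib.parse import urlparse


def drops_credentials(original_url: str, redirect_url: str) -> bool:
    """Simplified rule: drop Authorization when the redirect changes hostname.

    Illustration only; real clients apply additional scheme/port checks.
    """
    return urlparse(original_url).hostname != urlparse(redirect_url).hostname
```

Because `peeringdb.com` and `www.peeringdb.com` are different hostnames, `drops_credentials("https://peeringdb.com/api/net", "https://www.peeringdb.com/api/net")` is `True`, so the follow-up request goes out without the auth header.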

BTW, another PeeringDB client developer hit this problem today. It is opaque because they get an HTTP 200 and some useful data after the redirect, without realizing it is unauthenticated and thus possibly missing User-only fields.

dwcarder · 2022-08-09T22:14:03Z

So, I hit this issue today (see #1220 for a bit more color)

The symptom was that the API worked via the redirect, but the auth header was stripped off for my protection via python-requests as @ccaputo describes. This had the effect of the api appearing to work, but note this would now be an anonymous query subjecting my client to the lower rate-limit now in effect. I would have preferred either that there was no redirect for API access, or a hard 4xx error forcing me to fix my client code.
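Until the server side changes, a client can get the "hard error" behaviour on its own by refusing to follow redirects. A minimal sketch using only the Python standard library follows; the class and function names are mine, and this is one possible approach rather than a recommended PeeringDB client pattern.

```python
import urllib.error
import urllib.request


class RefuseRedirects(urllib.request.HTTPRedirectHandler):
    """Turn any HTTP redirect into an error instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Raising here stops urllib from silently re-issuing the request
        # (possibly without credentials) against the new URL.
        raise urllib.error.HTTPError(
            req.full_url, code, f"refusing redirect to {newurl}", headers, fp
        )


def build_strict_opener() -> urllib.request.OpenerDirector:
    """Opener whose open() raises HTTPError when the server redirects."""
    return urllib.request.build_opener(RefuseRedirects)
```

Calling `build_strict_opener().open(url)` against a URL that answers with a 301 then fails loudly, which points the developer at the non-canonical hostname instead of returning unauthenticated data.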

martinhannigan · 2022-10-11T09:20:39Z

What would keep it out?

On Sat, Apr 16, 2022 at 13:46 Chris Caputo wrote:

> @peeringdb/pc can we make sure @grizz's solution goes into the next release? The endless 301s continue unabated.

arnoldnipper assigned grizz Mar 30, 2022

arnoldnipper added this to the 1 Decide milestone Mar 31, 2022

ccaputo mentioned this issue Aug 2, 2022

HTTP Basic Authentication not working without "www" prefix in URL #1206

Closed

arnoldnipper modified the milestones: 1 Decide, 2 Consensus Reached, 3a Needs Implementation discussion Aug 9, 2022

netravnen mentioned this issue Jan 10, 2024

CORS Access-Control-Allow-Origin header missing in API responses #1034

Closed
