Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use raw_path instead of path in the router to be able to differentiate requests which include %2F #1828

Closed
wants to merge 3 commits into from

Conversation

ChiliJohnson
Copy link

I proposed this functionality in #1827 and added a bit of color in there, this was a quick-and-dirty change so I don't really have a good sense for any far-reaching implications of this change. But it was relatively-simple to implement so I thought I'd give it a shot.

Using raw_path instead of the pre-parsed path in the router would enable the it to distinguish between paths which include URL-encoded slashes, e.g. between my%2Froute/example and my/route/example

…inguish between paths when there are url-encoded slashes.
…distinguish between routes with path conversions when path parameters have URL-encoded slashes.
@ChiliJohnson ChiliJohnson changed the title Raw path routing Use raw_path instead of path in the router to be able to differentiate requests which include %2F Aug 29, 2022
@tomchristie
Copy link
Member

Compare and contrast with a more well established Python web framework.

What does Flask do?

@Kludex Kludex mentioned this pull request Feb 14, 2023
8 tasks
@Kludex
Copy link
Sponsor Member

Kludex commented Feb 16, 2023

@ChiliJohnson Are you still interested in this PR?

@ChiliJohnson
Copy link
Author

Yes I'm definitely still interested!

As far as Flask / Werkzeug / WSGI go, it seems that PEP 3333 implicitly requires that the path variable (used for routing in Flask/Werkzeug) be pre-unquoted by time they reach the app. So as a result, Flask also has this inability to distinguish between / and %2F at the app level.

Here are a few relevant discussions about this in WSGI land:

I guess the question here is do we want Starlette to match this behavior with the WSGI-based projects, or use this feature as a point of differentiation from them?

It seems like it could be nice to give app developers more control in how they route requests, but at the same time I can see benefit to Starlette acting more like the other players in this space

@adriangb
Copy link
Member

My vote would be to move forward with this change. I like the idea of giving developers more power.

One thing I’m not immediately clear on is what does this break? Can it brake any Starlette or FastAPI applications? Will this require changes in FastAPI itself?

@Kludex Kludex added this to the Version 1.x milestone Mar 10, 2023
@Kludex
Copy link
Sponsor Member

Kludex commented Mar 12, 2023

In any case, the decision other web framework made have influence in our decisions here...

Besides that, it doesn't look like there's many people interested in this over the years...

@ChiliJohnson
Copy link
Author

This would break web apps relying on the fact that /path/some/example and /path%2Fsome/example currently match a single "/path/{param:path}" route, which given that most URLs are probably constructed in a consistent way, likely means that there isn't a huge surface area for affected FastAPI / Starlette apps.

A complicating factor here is that there's the potential that an upstream web server or reverse proxy may be pre-unquoting the URL before we even know about it. In the case of Uvicorn + Starlette (including the changes from this PR) this doesn't happen, but I can see how it would be possible; and in that case the behavior of routing should remain exactly the same

@kefir-
Copy link

kefir- commented Aug 7, 2023

As far as Flask / Werkzeug / WSGI go, it seems that PEP 3333 implicitly requires that the path variable (used for routing in Flask/Werkzeug) be pre-unquoted by time they reach the app. So as a result, Flask also has this inability to distinguish between / and %2F at the app level.

Is PEP 3333 (implicitly) in conflict with this explicit description in RFC 3986 aka STD 66? I should think the explicit RFC/STD takes precedence here.

2.4. When to Encode or Decode

[snip]

When a URI is dereferenced, the components and subcomponents
significant to the scheme-specific dereferencing process (if any)
must be parsed and separated before the percent-encoded octets within
those components can be safely decoded, as otherwise the data may be
mistaken for component delimiters. The only exception is for
percent-encoded octets corresponding to characters in the unreserved
set, which can be decoded at any time. For example, the octet
corresponding to the tilde ("~") character is often encoded as "%7E"
by older URI processing implementations; the "%7E" can be replaced by
"~" without changing its interpretation.

@kefir-
Copy link

kefir- commented Aug 8, 2023

Interpreting %2F as / is also against W3C recommendations, as described in Example 2 at https://www.w3.org/Addressing/URL/4_URI_Recommentations.html

Screenshot:

a94ac64a0b108cb9

@ChiliJohnson
Copy link
Author

Thanks for the great research @kefir- ! From what you've posted it definitely seems like the current behavior of the router is in breach of these core web standards; I guess the question now is is there too much inertia coming from all the apps out there which maybe (intentionally or unintentionally) depend on this incorrect behavior? Maybe it makes sense to release this as part of a new major version?

@kefir-
Copy link

kefir- commented Aug 8, 2023

I'll add another source. That is RFC 3986 section 2.2, which states:

URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.

/ is a reserved character.

@fclairamb
Copy link

I have a concrete example of why this is a problem. I'm creating an HTTP proxy to simplify authentication around AWS's codeartifact and some URLs can contain / in their name.
This service hosts packages that can sometimes contain "/". A request to the package /npm/$repository/@angular%2Fforms/ is proxied to /npm/$repository/@angular/forms/ and becomes a 400.

So far the only way I found to work around this is to identify those queries and URL-encode the decoded URL string (which should have never been decoded in the first place).

Argument extracted from the path should indeed be URL-decoded but the Request.url property should not be decoded.

Please fix this.

@Kludex
Copy link
Sponsor Member

Kludex commented Dec 6, 2023

Related: django/asgiref#51

@Kludex
Copy link
Sponsor Member

Kludex commented Dec 7, 2023

@simonw If you have time, could you give me your opinion here? 🙏

@Kludex Kludex modified the milestones: Version 1.x, Version 1.0 Dec 7, 2023
@Kludex
Copy link
Sponsor Member

Kludex commented Dec 16, 2023

@Kludex Kludex closed this Dec 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants