This repository has been archived by the owner on Nov 6, 2022. It is now read-only.

Allow octets > 127 in path components. #37

Closed

pgriess commented May 11, 2011

  • This is non-spec behavior, but it appears that most HTTP servers
    implicitly support non-ASCII characters when parsing path components.
    Extend http-parser to allow this.
  • Fill out slots [128, 256) in normal_url_char[] with 1 so that these
    high octets are accepted in path components.
  • Add unit test for paths that include such non-ASCII characters.
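The table change described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `sketch_url_char`, `sketch_init_table`, and `sketch_path_ok` are hypothetical names, and the ASCII rule (printable characters except `?` and `#`) is a simplifying assumption standing in for the real contents of `normal_url_char[]`. Only the treatment of slots [128, 256) reflects what the description states.

```c
#include <stddef.h>

/* Illustrative table only; NOT the real normal_url_char[] from http-parser. */
static unsigned char sketch_url_char[256];

/* Fill the table: an assumed ASCII rule plus the change proposed in this PR. */
static void sketch_init_table(void) {
  int c;
  for (c = 0; c < 256; c++) {
    if (c >= 128) {
      /* Slots [128, 256): accept high octets in path components. */
      sketch_url_char[c] = 1;
    } else if (c > 0x20 && c < 0x7f && c != '?' && c != '#') {
      /* Assumption for this sketch: printable ASCII except delimiters. */
      sketch_url_char[c] = 1;
    } else {
      /* Control characters, space, DEL, '?' and '#' end or invalidate a path. */
      sketch_url_char[c] = 0;
    }
  }
}

/* Return 1 if every octet of the path is accepted by the table, else 0. */
static int sketch_path_ok(const char *path, size_t len) {
  size_t i;
  for (i = 0; i < len; i++) {
    /* Cast through unsigned char so octets > 127 index the table correctly. */
    if (!sketch_url_char[(unsigned char)path[i]]) return 0;
  }
  return 1;
}
```

After calling `sketch_init_table()`, a path such as `"/caf\xc3\xa9"` (UTF-8 for "/café", 6 octets) passes `sketch_path_ok`, while a path containing a raw space is still rejected. The cast to `unsigned char` matters: on platforms where `char` is signed, indexing with a high octet directly would produce a negative index.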

ry closed this in 50b9bec on May 11, 2011

ry commented May 11, 2011

LGTM. thanks!

mnot commented May 12, 2011

I don't have a big problem with doing this in http_parser, I guess, but I'd be concerned if node's client started sending requests with non-ASCII URLs; as you point out, some servers (including proxies) don't support them.

To me, making this kind of change is only making things better for people who are doing things wrong, and postponing (a bit) their realisation of that. I.e., they're going to have interop problems with servers / intermediaries that don't support this anyway, so they shouldn't be doing it -- what's the use case for supporting it?

pgriess commented May 21, 2011

Thanks for taking a look at this, Mark.

The use-case for this is that I have an http-parser-based proxy (not Node) that is rejecting requests with UTF-8 paths as malformed. I don't control the clients that are sending requests to this proxy, and it appears that this type of traffic is not as uncommon as I would have hoped.

I suppose http-parser could support both strict and lenient modes for those who wished to control this a bit more closely. Ry, would you be interested in that? In practice, I'd imagine that most (all?) people would run with strict mode disabled unless they controlled request generation.

mnot commented May 21, 2011

Sounds reasonable. One of the things I'd really like to see out of the parser is more fine-grained information, rather than all-or-nothing failures.

ry commented May 21, 2011

@pgriess there's already an HTTP_STRICT preprocessor symbol. I'd be for rejecting UTF-8 paths based on that.
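One way the strict/lenient split suggested above might look is sketched below. The symbol name `HTTP_STRICT` is taken from the comment as written; the exact macro name and its default in the http-parser source may differ, and defaulting it to 0 here is an assumption of this sketch. `sketch_is_url_char` is a hypothetical helper, and its ASCII rule is a simplification rather than the parser's actual table.

```c
/* HTTP_STRICT is the symbol named in the comment above. Defaulting it to 0
 * (lenient) is an assumption made only for this sketch. */
#ifndef HTTP_STRICT
#define HTTP_STRICT 0
#endif

/* Hypothetical helper: return 1 if octet c is allowed in a path, else 0. */
static int sketch_is_url_char(unsigned char c) {
  /* Assumed ASCII rule: printable characters except '?' and '#'. */
  int ascii_ok = c > 0x20 && c < 0x7f && c != '?' && c != '#';
#if HTTP_STRICT
  /* Strict build: octets > 127 are rejected along with other invalid bytes. */
  return ascii_ok;
#else
  /* Lenient build: additionally accept any octet with the high bit set. */
  return ascii_ok || (c & 0x80);
#endif
}
```

Because the choice is made at compile time, a build with `-DHTTP_STRICT=1` would reject UTF-8 paths with no runtime cost, while the default build of this sketch accepts them. A runtime flag would give the finer-grained control discussed earlier in the thread, at the price of an extra branch per octet.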
