Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update URI::Generic.build/build2 to use RFC3986_PARSER #105

Closed
wants to merge 1 commit into from

Conversation

gareth
Copy link

@gareth gareth commented May 28, 2024

[Description below cross-posted from https://bugs.ruby-lang.org/issues/19266]


In June 2014, uri/common was updated to introduce a RFC3986-compliant parser (URI::RFC3986_PARSER) as an alternative to the previous RFC2396 parser, and common methods like URI() were updated to use that new parser by default. The only methods in common not updated were URI.extract and URI.regexp which are marked as obsolete. (The old parser was kept in the DEFAULT_PARSER constant despite it not being the default for those methods, presumably for backward compatibility.)

However, similar methods called on URI::Generic were never updated to use this new parser. This means that methods like URI::Generic.build fail when given input that succeeds normally, and this also affects subclasses like URI::HTTP:

$ pry -r uri -r uri/common -r uri/generic

[1] pry(main)> URI::Generic.build(host: "underscore_host.example")
URI::InvalidComponentError: bad component(expected host component): underscore_host.example
from /Users/gareth/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/uri/generic.rb:591:in `check_host'

[2] pry(main)> URI::HTTP.build(host: "underscore_host.example")
URI::InvalidComponentError: bad component(expected host component): underscore_host.example
from /Users/gareth/.asdf/installs/ruby/3.1.3/lib/ruby/3.1.0/uri/generic.rb:591:in `check_host'

[3] pry(main)> URI("http://underscore_host.example")
=> #<URI::HTTP http://underscore_host.example>

URI::Generic.new allows a configurable parser positional argument to override the class' default parser, but other factory methods like .build don't allow this override.

Arguably this doesn't cause problems because at least in the case above, the URI can be built with the polymorphic constructor, but having the option to build URIs from explicit named parts is useful, and leaving the outdated functionality in the Generic class is ambiguous. It's possible that the whole Generic class and its subclasses aren't intended to be used directly how I'm intending here, but there's nothing I could see that suggested this is the case.

I'm not aware of the entire list of differences between RFC2396 and RFC3986. The relevant difference here is that in RFC2396 an individual segment of a host (domainlabels) could only be alphanum | alphanum *( alphanum | "-" ) alphanum, whereas RFC3986 allows hostnames to include any of ALPHA / DIGIT / "-" / "." / "_" / "~". It's possible that other differences might cause issues for developers, but since this has gone over 9 years without anyone else caring about this, this is definitely not especially urgent.

@gareth
Copy link
Author

gareth commented May 28, 2024

There are other issues in the Ruby bug tracker such as #19756 where the discussion immediately turns to whether underscores should be allowed in hostnames.

For me that's not the issue here. Ruby has supported underscores in URI hostnames since 2014, just not with the generic build methods, and I can't see a way that this specific difference is justified.

@jeremyevans
Copy link
Contributor

Failures are not your fault. I think this repo needs something like ruby/webrick@6f41212 and ruby/webrick@a27d7ed

In bb83f32, a new URI parser was introduced to support RFC3986, where
previously support was limited to RFC2396. The default parser used by
URI() was updated to point to the new parser.

However, the parser used by URI::Generic.build and build2 still points
at the old parser. Among other things this means that hostnames with
underscores can't correctly be parsed by this method.

This commit switches to explicitly use the newer parser for these
methods, and adds a previously failing test that now passes.
@hsbt
Copy link
Member

hsbt commented Jun 11, 2024

I fixed CI and rebased this PR.

Copy link
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend this be merged, but the final decision is with @nurse

@hsbt hsbt closed this in #107 Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants