Update URI::Generic.build/build2 to use RFC3986_PARSER #105
[Description below cross-posted from https://bugs.ruby-lang.org/issues/19266]
In June 2014, `uri/common` was updated to introduce an RFC3986-compliant parser (`URI::RFC3986_PARSER`) as an alternative to the previous RFC2396 parser, and common methods like `URI()` were updated to use that new parser by default. The only methods in `common` not updated were `URI.extract` and `URI.regexp`, which are marked as obsolete. (The old parser was kept in the `DEFAULT_PARSER` constant despite it not being the default for those methods, presumably for backward compatibility.)

However, similar methods called on `URI::Generic` were never updated to use this new parser. This means that methods like `URI::Generic.build` fail when given input that `URI()` parses successfully, and this also affects subclasses like `URI::HTTP`.

`URI::Generic.new` allows a configurable `parser` positional argument to override the class's default parser, but other factory methods like `.build` don't allow this override.

Arguably this doesn't cause problems because, at least in the case above, the URI can be built with the polymorphic constructor. Still, having the option to build URIs from explicit named parts is useful, and leaving the outdated behavior in the `Generic` class is confusing. It's possible that the whole `Generic` class and its subclasses aren't intended to be used directly the way I'm using them here, but I couldn't find anything that suggests this is the case.

I'm not aware of the entire list of differences between RFC2396 and RFC3986. The relevant difference here is that in RFC2396 an individual segment of a host (a `domainlabel`) could only be `alphanum | alphanum *( alphanum | "-" ) alphanum`, whereas RFC3986 allows hostnames to include any of `ALPHA / DIGIT / "-" / "." / "_" / "~"`. It's possible that other differences might cause issues for developers, but since this has gone over 9 years without anyone else caring about it, this is definitely not especially urgent.
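
The mismatch described above can be sketched as follows. This is a minimal illustration, not the original report's example: the host name `underscore_host.example` is a hypothetical value chosen because it contains an underscore, and whether `URI::HTTP.build` raises depends on which parser the installed `uri` version uses as the `Generic` default, so the sketch rescues the error rather than assuming either outcome.

```ruby
require "uri"

# Hypothetical host name containing an underscore (an assumption for
# illustration, not taken from the original report).
host = "underscore_host.example"

# The top-level URI() method parses with the RFC 3986 parser.
parsed = URI("http://#{host}")

# URI::HTTP.build validates each component with the class's default parser.
# Depending on the uri version, this may raise URI::InvalidComponentError.
built = begin
  URI::HTTP.build(host: host, path: "/")
rescue URI::InvalidComponentError => e
  e
end

# URI::Generic.new (inherited by URI::HTTP) takes the parser as its tenth
# positional argument; the final `true` turns on component checking.
explicit = URI::HTTP.new("http", nil, host, nil, nil, "/", nil, nil, nil,
                         URI::RFC3986_PARSER, true)

# Compare the two parsers' host patterns directly.
old_ok = URI::RFC2396_Parser.new.regexp[:HOST].match?(host)
new_ok = URI::RFC3986_PARSER.regexp[:HOST].match?(host)

build_result = built.is_a?(URI::Generic) ? "accepted" : "rejected (#{built.message})"

puts "URI():             #{parsed.host}"
puts "URI::HTTP.build:   #{build_result}"
puts "URI::HTTP.new:     #{explicit}"
puts "RFC2396 HOST match: #{old_ok}"
puts "RFC3986 HOST match: #{new_ok}"
```

If `build` reports a rejection while `URI()` and the explicit-parser constructor both succeed, the installed version still validates `build` input against the RFC2396 rules.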