Invalid URIs are not detected. #161
Comments
This is what I am wondering, too. I stumbled upon this because validates_url uses the addressable gem to check whether a URL is correct, and things like |
From my point of view such URLs should be invalid. According to RFC3986:
whereas
Hence, whitespace is not an allowed character at least for the host. |
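To make the reading above concrete, here is a minimal sketch of a check against the RFC 3986 `reg-name` production (`*( unreserved / pct-encoded / sub-delims )`). The constant `REG_NAME` and the helper `reg_name?` are hypothetical names for illustration and are not part of Addressable. The sketch covers only `reg-name`, not the `IP-literal` or `IPv4address` alternatives for the host.

```ruby
# Hypothetical helper, not part of Addressable.
# reg-name    = *( unreserved / pct-encoded / sub-delims )   (RFC 3986, section 3.2.2)
# unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
# sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
# pct-encoded = "%" HEXDIG HEXDIG
REG_NAME = /\A(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%\h\h)*\z/

# Returns true when every character of +host+ is allowed in a reg-name.
def reg_name?(host)
  REG_NAME.match?(host)
end

puts reg_name?("user_name.example.com") # underscore is an unreserved character
puts reg_name?("exa mple.com")          # a literal space is not in any allowed class
```

Under this grammar the underscore passes and the space does not, which matches the argument that whitespace in the host is invalid while `user_name.example.com` is not.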
Addressable, for the most part, takes a very liberal view towards validating URLs. It was originally written because the Ruby standard library's URI implementation took the opposite approach and raised exceptions for things that were either weird, but actually valid URIs or invalid, but close enough that just parsing the URI without complaint would have resulted in being able to do something useful instead of being forced to just handle the exception and fail.
For example, in older versions of Ruby, trying to parse 'http://user_name.example.com' would result in an invalid URI error. The '_' character is explicitly part of the
So this kind of thing tends to inform my philosophy on when to raise exceptions and when not to – namely that I raise an exception only when a URI is invalid according to RFC 3986 and I ignore what related specifications have to say on the matter. And even then I tend to err on the side of not raising exceptions to allow those following Postel's Law to do something useful.
In this case however, I completely agree on your reading of what should and shouldn't be allowed in the host component and I do think it would be reasonable to raise an exception for whitespace in the hostname, so long as the |
Just to add some additional context for this issue. Even if you decide that throwing an exception is not the correct approach, it would be nice if
``` ruby
uri = Addressable::URI.parse(" //google.com/foo/bar")
=> #<Addressable::URI:0x884d1e70 URI: //google.com/foo/bar>
irb(main):002:0> uri.relative?
=> true
irb(main):003:0> uri.host
=> nil
```
# vs.
``` ruby
uri = Addressable::URI.parse("//google.com/foo/bar")
=> #<Addressable::URI:0x884ba9dc URI://google.com/foo/bar>
irb(main):005:0> uri.relative?
=> true
irb(main):006:0> uri.host
=> "google.com"
```
Browsers seem to treat both situations the same (they ignore leading/trailing whitespace). So, in order to reliably model how a browser will act I need to perform a |
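One way to approximate the browser behaviour described above is to trim the input before parsing. The sketch below uses the standard library's `URI` so that it runs without extra gems. The helper name `parse_trimmed` is hypothetical, and `String#strip` is only an assumed stand-in for whatever trimming a browser actually performs. The same `strip` call could presumably be applied before `Addressable::URI.parse`.

```ruby
require "uri"

# Hypothetical helper: remove leading/trailing whitespace, then parse.
def parse_trimmed(input)
  URI.parse(input.strip)
end

uri = parse_trimmed(" //google.com/foo/bar ")
puts uri.host      # => google.com
puts uri.relative? # => true (no scheme present)
```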
Historically, I've advocated for developers using |
I definitely want an indication of error when the host part violates RFC3986 because I'm trying to provide feedback to the user that the URL they entered was invalid. Given that we have |
Even heuristic_parse fails in some cases
The "google." is not a valid host. I think the host validation should be changed here. I do a heuristic_parse and again do the validation with a regex as of now.
I think we can use this regex in host= method so that it gives an InvalidURIError |
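The regex referred to above did not survive in this thread, so the following is only an illustrative sketch of a stricter hostname check, not the commenter's actual pattern. It assumes DNS-style labels (letters, digits, and inner hyphens, 1 to 63 characters) and, as a deliberate assumption, requires at least two labels and rejects a trailing dot. The names `LABEL` and `plausible_hostname?` are hypothetical. Note that this is stricter than RFC 3986: it also rejects underscores, which the `reg-name` grammar allows.

```ruby
# Hypothetical check, stricter than RFC 3986 reg-name.
# One DNS-style label: alphanumeric at both ends, hyphens allowed inside, max 63 chars.
LABEL = /\A[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\z/

def plausible_hostname?(host)
  return false if host.nil? || host.empty? || host.length > 253

  # The -1 limit keeps the empty trailing label produced by "google.".
  labels = host.split(".", -1)
  labels.length >= 2 && labels.all? { |label| LABEL.match?(label) }
end

puts plausible_hostname?("google.com") # two well-formed labels
puts plausible_hostname?("google.")    # empty trailing label fails the label check
```

Whether a trailing dot should count as invalid is a design choice: in DNS it denotes the root, so a validator aimed at user feedback may want to reject it while a general-purpose parser may not.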
An even simpler case I ran into that fails with both |
@ovamsikrishna What are you expecting to have happen there? Browsers don't resolve |
I was comparing the behavior of URI.parse and Addressable::URI.parse and noticed that URI.parse will strip leading and trailing whitespace from the URI. Addressable::URI did not appear to strip any space and, upon further inspection, seems to allow any manner of invalid URI to be provided:
``` ruby
Addressable::URI.parse("\c$/BADCATZ ").to_s
Addressable::URI.parse(" http: ").to_s
Addressable::URI.parse("**4 \c\a\t ~").to_s
```
I am probably doing something wrong but URI.parse detects all of the examples above as invalid, which is what I would expect. Is this expected behavior from Addressable::URI.parse?
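For comparison, the standard library's strictness can be wrapped in a small predicate. This is a minimal sketch: `stdlib_accepts?` is a hypothetical helper name, and it reports only whether `URI.parse` raises `URI::InvalidURIError`, not why. The strings from the examples above could be passed to it in the same way.

```ruby
require "uri"

# Hypothetical helper: true when the stdlib parser accepts the input,
# false when it raises URI::InvalidURIError.
def stdlib_accepts?(input)
  URI.parse(input)
  true
rescue URI::InvalidURIError
  false
end

puts stdlib_accepts?("http://example.com/")  # well-formed absolute URI
puts stdlib_accepts?("http://exa mple.com/") # space inside the host
```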