Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URI Parse Error #376

Closed
michaeltelford opened this issue Jan 22, 2020 · 7 comments
Closed

URI Parse Error #376

michaeltelford opened this issue Jan 22, 2020 · 7 comments

Comments

@michaeltelford
Copy link

I'm running ruby 2.5.3 and addressable 2.7.0.

Consider the following:

[1] pry(main)> Addressable::URI.parse 'http://'
Addressable::URI::InvalidURIError: Absolute URI missing hierarchical segment: 'http://'
from /home/michael/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/addressable-2.7.0/lib/addressable/uri.rb:2449:in `validate'

Some might argue that it's OK to fail here because http:// is indeed an invalid URL. However, so is the following, which parses successfully:

[2] pry(main)> Addressable::URI.parse '!'
=> #<Addressable::URI:0x1316fe0 URI:!>

In the interest of making the code as robust as possible, is it not best to support the parsing of http:// (and https://)? My program extracts links from webpages before parsing them with Addressable::URI.parse. Since you get all sorts of crap on the web, it would be good to be able to support this scenario in Addressable rather than me having to implement a work-a-round.

@dentarg
Copy link
Collaborator

dentarg commented Jan 23, 2020

My program extracts links from webpages before parsing them with Addressable::URI.parse. Since you get all sorts of crap on the web, it would be good to be able to support this scenario in Addressable rather than me having to implement a work-a-round.

I understand where you are coming from. A sane? method has been discussed before to do this:

Feel free to check out https://github.com/twingly/twingly-url, I think it can handle your cases:

irb(main):004:0> Twingly::URL.parse("https://")
=> #<Twingly::URL::NullURL:0x00007fe6dc981258>
irb(main):005:0> Twingly::URL.parse("http://")
=> #<Twingly::URL::NullURL:0x00007fe6dca1b9c0>
irb(main):006:0> Twingly::URL.parse("!")
=> #<Twingly::URL::NullURL:0x00007fe6dca134a0>
irb(main):007:0> Twingly::URL.parse("!").valid?
=> false

(Note that some things (like the normalization) in twingly-url is purposefully built according to specific needs from Twingly (I work there))

@dentarg
Copy link
Collaborator

dentarg commented Jan 23, 2020

I will close this as a duplicate of #257

@dentarg dentarg closed this as completed Jan 23, 2020
@michaeltelford
Copy link
Author

michaeltelford commented Jan 25, 2020

Hi @dentarg,

Thanks for your comments. With respect, this issue isn't a duplicate of #257. I don't want an is_parsable? method etc. I want Addressable::URI.parse to successfully parse http:// and https:// strings; in the same way that parse works with the string ! (which is also invalid as far as URLs go).

The twingly-url gem looks cool but I'm already using addressable-uri and would prefer not to have to swap out dependencies and update my code base accordingly. Up until now, addressable-uri hasn't caused me any issues.

@dentarg
Copy link
Collaborator

dentarg commented Jan 25, 2020

Okay, I see.

This issue will stay closed because the behaviour you are seeing is intentional, see #216 (comment) for an explanation.

! works because it is considered relative, not absolute:

$ irb -r addressable/uri
irb(main):001:0> Addressable::URI.parse("!")
=> #<Addressable::URI:0x3fd5d486a774 URI:!>
irb(main):002:0> Addressable::URI.parse("!").relative?
=> true

@michaeltelford
Copy link
Author

Are there any other Strings (besides http:// and https://) that will cause an error upon parse? The fuller list I have the better work-a-round I can code in my application. Thanks.

@dentarg
Copy link
Collaborator

dentarg commented Jan 25, 2020

I don't know, but feel free to read the source code. :-)

@dentarg
Copy link
Collaborator

dentarg commented Jan 25, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants