Dealing with non-ascii URIs (IRIs) in redirects #196

Closed
jgr opened this issue Mar 30, 2015 · 4 comments · Fixed by #197

@jgr

jgr commented Mar 30, 2015

I'm running into problems when getting a URI that redirects to a URI containing non-ASCII characters (an IRI, as I understand it). A simple, repeatable case (using http.rb 0.7.3 and Ruby 2.2.0) is:

require 'http'

REDIRECTS_TO_UMLAUT_URL = 'http://www.amazon.co.uk/dp/B00C6Q3SEQ'
http_client = HTTP.follow(true)
http_client.get(REDIRECTS_TO_UMLAUT_URL)

Which results in this error:

/Users/jamesrucker/.rbenv/versions/2.2.0/lib/ruby/2.2.0/uri/generic.rb:1100:in `rescue in merge': URI must be ascii only "http://www.amazon.co.uk/Bianka-Minte-K\xC3\xB6nig-Emilia-Sch\xC3\xBCle/e/B001K1JKKY/digital/ref=ntt_mp3_rdr/276-0649186-0513969?_encoding=UTF8&sn=d" (URI::InvalidURIError)
    from /Users/jamesrucker/.rbenv/versions/2.2.0/lib/ruby/2.2.0/uri/generic.rb:1097:in `merge'
    from /Users/jamesrucker/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/http-0.7.3/lib/http/request.rb:87:in `redirect'
    from /Users/jamesrucker/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/http-0.7.3/lib/http/redirector.rb:47:in `follow'
    from /Users/jamesrucker/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/http-0.7.3/lib/http/redirector.rb:22:in `perform'
    from /Users/jamesrucker/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/http-0.7.3/lib/http/client.rb:35:in `request'
    from /Users/jamesrucker/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/http-0.7.3/lib/http/chainable.rb:16:in `get'
    from ./fetch_umlaut_url.rb:16:in `<main>'
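
Since the redirect above depends on a live server, the failure can also be reproduced offline. The following is a minimal sketch, assuming the error comes purely from stdlib `URI` resolving the `Location` value against the request URI (as the `merge` frame in the trace suggests). The helper name `redirect_error` is hypothetical, and the target path is a shortened form of the one in the trace.

```ruby
require "uri"

# Hypothetical helper: resolve a redirect target against the request URI.
# Returns the raised error, or nil when the target resolves cleanly.
def redirect_error(base, location)
  URI.parse(base).merge(location)
  nil
rescue URI::InvalidURIError => e
  e
end

base     = "http://www.amazon.co.uk/dp/B00C6Q3SEQ"
# Shortened, illustrative version of the redirect target from the trace.
location = "http://www.amazon.co.uk/Bianka-Minte-K\u00F6nig-Emilia-Sch\u00FCle/e/B001K1JKKY"

error = redirect_error(base, location)
puts error.class if error
```

An ASCII-only target such as `"/gp/help"` resolves without error in the same helper, which points at the non-ASCII characters rather than the redirect handling itself.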

From a bit of googling, there seem to be two possible solutions: call URI.encode before URI.parse (which isn't foolproof; it will encode hash fragments, for instance), or use the addressable gem. Neither seemed like a great option for a PR. Is there a more reasonable solution?
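
One way to picture the "encode before parse" idea without touching the fragment is to percent-encode only the non-ASCII characters. This is a rough sketch, not what `URI.encode` does (and `URI.encode` itself was removed in Ruby 3.0); the helper name `escape_non_ascii` and the example URL are assumptions made for illustration.

```ruby
require "uri"

# Hypothetical helper (not part of http.rb or the stdlib): percent-encode
# every non-ASCII character as its UTF-8 bytes, leaving ASCII untouched,
# so "#", "?" and "&" keep their meaning.
def escape_non_ascii(str)
  str.gsub(/[^[:ascii:]]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

raw = "http://example.com/K\u00F6nig-Sch\u00FCle?_encoding=UTF8#top"
uri = URI.parse(escape_non_ascii(raw))

puts uri.path      # /K%C3%B6nig-Sch%C3%BCle
puts uri.fragment  # top
```

Because only non-ASCII characters are rewritten, already-valid URIs pass through unchanged. The sketch does not attempt host (IDN) handling or validation of existing percent-escapes.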

@tarcieri
Member

Kinda sad this just doesn't work with stdlib URI.

I'm not opposed to using a gem if it has better handling of non-ASCII URIs.

@ixti
Member

ixti commented Mar 30, 2015

@tarcieri we can actually support Addressable when it's available. I mean, not depend on it directly, but provide different URI parser backends...
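
The optional-backend idea could look roughly like the sketch below. The module `URIBackend` and its methods are hypothetical and are not http.rb's API; the only library calls assumed are `Addressable::URI.parse` and stdlib `URI.parse`.

```ruby
require "uri"

begin
  require "addressable/uri" # optional; used only when the gem is installed
rescue LoadError
  # Addressable is not available; fall back to the stdlib parser.
end

# Hypothetical module; name and interface are illustrative only.
module URIBackend
  # Which parser is in use for this process.
  def self.backend
    defined?(Addressable::URI) ? :addressable : :stdlib
  end

  # Parse a string with whichever backend is available.
  def self.parse(string)
    if backend == :addressable
      Addressable::URI.parse(string)
    else
      URI.parse(string)
    end
  end
end

puts URIBackend.backend
puts URIBackend.parse("http://example.com/path").host
```

With the stdlib fallback, non-ASCII input would still raise `URI::InvalidURIError`, so a design like this only helps when the gem is actually present.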

@ixti
Member

ixti commented Mar 30, 2015

Some kind of Multi-URI :D

ixti added this to the v0.9 milestone Mar 30, 2015
ixti self-assigned this Mar 30, 2015
@ixti
Member

ixti commented Mar 30, 2015

Oh, just realized (after reading the mentioned SO thread) that there's actually a solution:

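require 'uri'

def normalize_uri(uri)
  # Already-parsed URIs pass through untouched.
  return uri if uri.is_a? URI

  uri = uri.to_s
  # Split off the fragment so that "#" is not percent-encoded.
  uri, *tail = uri.rpartition "#" if uri["#"]

  # Encode everything before the fragment, then re-append "#fragment".
  # Note: URI.encode is obsolete and was removed in Ruby 3.0.
  URI(URI.encode(uri) << Array(tail).join)
end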
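
The fragment handling in the helper above can be looked at in isolation. The sketch below only splits and rejoins the string and does no encoding, so it runs on any Ruby version; the name `split_fragment` is hypothetical.

```ruby
# Hypothetical helper isolating the fragment-splitting step.
# Returns [everything before the last "#", "#fragment" or ""].
def split_fragment(str)
  head, *tail = str.rpartition("#") if str.include?("#")
  [head || str, Array(tail).join]
end

head, tail = split_fragment("http://example.com/K\u00F6nig#top")
puts head # http://example.com/König
puts tail # #top

head, tail = split_fragment("http://example.com/plain")
puts tail.empty? # true
```

When there is no `#`, the conditional assignment leaves `tail` as `nil`, and `Array(nil).join` yields an empty string, which is why the original one-liner can append the tail unconditionally.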

ixti added a commit that referenced this issue Mar 31, 2015
ixti modified the milestones: v0.8, v0.9 Mar 31, 2015
ixti added a commit that referenced this issue Apr 1, 2015
ixti closed this as completed in #197 Apr 1, 2015
zanker pushed a commit to zanker/http.rb that referenced this issue May 8, 2015