Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

URLs can't contain utf-8 characters since 4.1.2 (was working in 4.1.1) #16104

Closed
codev opened this Issue Jul 9, 2014 · 12 comments

Comments

Projects
None yet
3 participants

codev commented Jul 9, 2014

I have a route that allows people to be looked up by name in a url. Since upgrading to 4.1.2 from 4.1.1 calls to this route result in a BadRequest.

To reproduce the problem do this:

rails new test_rails app
cd test_rails_app
rails generate controller person biography

Edit routes.rb to have only:

get 'person/*name' => 'person#biography', :format => false

Run the server and go to http://localhost:3000/person/Šašinková
You'll get a BadRequest, roll the rails version back to 4.1.1 and it goes away.

I've chased the problem down to this commit 5460591 by @pixeltrix which changed ActionDispatch::Journey::Router::Utils::UriEncoder::unescape_uri

That method now does a force_encoding to force the output encoding to be the same as the uri coming in but I don't think that makes sense unless I'm missing something. The incoming encoding is US ASCII and is % encoded but the outgoing uri needs to be UTF-8 encoded. If I change it to force the encoding to UTF-8 or just remove the force encoding it works.

I have this patch I'm applying in config/initializers/patch_unicode_urls.rb

# Fix support for unicode characters in URLs
module ActionDispatch
  module Journey
    class Router
      class Utils
        class UriEncoder
          def unescape_uri(uri)
            # Don't force the encoding on the way out - MJS 2014-07-09
            uri.gsub(ESCAPED) { [$&[1, 2].hex].pack('C') }#.force_encoding(uri.encoding)
          end
        end
      end
    end
  end
end
Member

seuros commented Jul 9, 2014

I can't reproduce this error with 4.1.4

Owner

pixeltrix commented Jul 9, 2014

@codev can you supply some more information like ruby version, gems etc and a backtrace. You've blamed the change I've made but the method you're patching wasn't changed in that commit which was only to do with escaping urls and not unescaping so I'm confused to what would have caused such a breakage.

@seuros what version of ruby did you try to reproduce the error on?

Member

seuros commented Jul 9, 2014

2.0.0-p481 and 2.1.2, do you want me to try with 1.9.3-p547 ?

Owner

pixeltrix commented Jul 9, 2014

@seuros that'd be helpful, thanks

Owner

pixeltrix commented Jul 9, 2014

@seuros given that there's no changes to Action Pack between 4.1.2 and 4.1.4 I think there maybe something else at work here.

Member

seuros commented Jul 9, 2014

@pixeltrix 1.9.3 is know to be crazy with unicode.

Started GET "/person/%C5%A0a%C5%A1inkov%C3%A1" for 127.0.0.1 at 2014-07-09 11:48:26 +0000

ActionController::BadRequest (ActionController::BadRequest):
  actionpack (4.1.4) lib/action_dispatch/routing/route_set.rb:39:in `block in call'
  actionpack (4.1.4) lib/action_dispatch/routing/route_set.rb:35:in `each'
  actionpack (4.1.4) lib/action_dispatch/routing/route_set.rb:35:in `call'
  actionpack (4.1.4) lib/action_dispatch/journey/router.rb:71:in `block in call'
  actionpack (4.1.4) lib/action_dispatch/journey/router.rb:59:in `each'
  actionpack (4.1.4) lib/action_dispatch/journey/router.rb:59:in `call'
  actionpack (4.1.4) lib/action_dispatch/routing/route_set.rb:678:in `call'
  rack (1.5.2) lib/rack/etag.rb:23:in `call'
  rack (1.5.2) lib/rack/conditionalget.rb:25:in `call'
  rack (1.5.2) lib/rack/head.rb:11:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/params_parser.rb:27:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/flash.rb:254:in `call'
  rack (1.5.2) lib/rack/session/abstract/id.rb:225:in `context'
  rack (1.5.2) lib/rack/session/abstract/id.rb:220:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/cookies.rb:560:in `call'
  activerecord (4.1.4) lib/active_record/query_cache.rb:36:in `call'
  activerecord (4.1.4) lib/active_record/connection_adapters/abstract/connection_pool.rb:621:in `call'
  activerecord (4.1.4) lib/active_record/migration.rb:380:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/callbacks.rb:29:in `block in call'
  activesupport (4.1.4) lib/active_support/callbacks.rb:82:in `run_callbacks'
  actionpack (4.1.4) lib/action_dispatch/middleware/callbacks.rb:27:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/reloader.rb:73:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/remote_ip.rb:76:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/debug_exceptions.rb:17:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/show_exceptions.rb:30:in `call'
  railties (4.1.4) lib/rails/rack/logger.rb:38:in `call_app'
  railties (4.1.4) lib/rails/rack/logger.rb:20:in `block in call'
  activesupport (4.1.4) lib/active_support/tagged_logging.rb:68:in `block in tagged'
  activesupport (4.1.4) lib/active_support/tagged_logging.rb:26:in `tagged'
  activesupport (4.1.4) lib/active_support/tagged_logging.rb:68:in `tagged'
  railties (4.1.4) lib/rails/rack/logger.rb:20:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/request_id.rb:21:in `call'
  rack (1.5.2) lib/rack/methodoverride.rb:21:in `call'
  rack (1.5.2) lib/rack/runtime.rb:17:in `call'
  activesupport (4.1.4) lib/active_support/cache/strategy/local_cache_middleware.rb:26:in `call'
  rack (1.5.2) lib/rack/lock.rb:17:in `call'
  actionpack (4.1.4) lib/action_dispatch/middleware/static.rb:64:in `call'
  rack (1.5.2) lib/rack/sendfile.rb:112:in `call'
  railties (4.1.4) lib/rails/engine.rb:514:in `call'
  railties (4.1.4) lib/rails/application.rb:144:in `call'
  rack (1.5.2) lib/rack/lock.rb:17:in `call'
  rack (1.5.2) lib/rack/content_length.rb:14:in `call'
  rack (1.5.2) lib/rack/handler/webrick.rb:60:in `service'
  /home/abdelkader/.rbenv/versions/1.9.3-p547/lib/ruby/1.9.1/webrick/httpserver.rb:138:in `service'
  /home/abdelkader/.rbenv/versions/1.9.3-p547/lib/ruby/1.9.1/webrick/httpserver.rb:94:in `run'
  /home/abdelkader/.rbenv/versions/1.9.3-p547/lib/ruby/1.9.1/webrick/server.rb:191:in `block in start_thread'


  Rendered /home/abdelkader/.rbenv/versions/1.9.3-p547/lib/ruby/gems/1.9.1/gems/actionpack-4.1.4/lib/action_dispatch/middleware/templates/rescues/_source.erb (0.7ms)
  Rendered /home/abdelkader/.rbenv/versions/1.9.3-p547/lib/ruby/gems/1.9.1/gems/actionpack-4.1.4/lib/action_dispatch/middleware/templates/rescues/_trace.html.erb (1.3ms)
  Rendered /home/abdelkader/.rbenv/versions/1.9.3-p547/lib/ruby/gems/1.9.1/gems/actionpack-4.1.4/lib/action_dispatch/middleware/templates/rescues/_request_and_response.html.erb (11.4ms)
  Rendered /home/abdelkader/.rbenv/versions/1.9.3-p547/lib/ruby/gems/1.9.1/gems/actionpack-4.1.4/lib/action_dispatch/middleware/templates/rescues/diagnostics.erb within rescues/layout (32.2ms)
Owner

pixeltrix commented Jul 9, 2014

I think this is a more likely source of the error: a617925

Trying to remember why I used force_encoding - I'll give it some thought.

codev commented Jul 9, 2014

@pixeltrix yes, sorry, pasted the wrong commit, I meant a617925 - also I meant 4.1.1 -> 4.1.4 - the change, and error, occurs going from 4.1.1 to 4.1.2, thanks for looking into it

@seuros ruby 1.9.3p545 on latest MacOSX in development and CentOS in production, your stack trace matches mine

@codev codev changed the title from URLs can't contain utf-8 characters since 4.1.2 to URLs can't contain utf-8 characters since 4.1.2 (was working in 4.1.1) Jul 9, 2014

Owner

pixeltrix commented Jul 9, 2014

Hmm, in 4.1.1 it called URI.parser.unescape so I basically ripped the implementation from there:

def unescape(str, escaped = @regexp[:ESCAPED])
  str.gsub(escaped) { [$&[1, 2].hex].pack('C') }.force_encoding(str.encoding)
end

However the implementation in Ruby 2.1.2 is exactly the same:

def unescape(str, escaped = @regexp[:ESCAPED])
  str.gsub(escaped) { [$&[1, 2].hex].pack('C') }.force_encoding(str.encoding)
end

The only difference is that in 2.0 the default encoding changed from ASCII-8BIT to UTF-8, so I assume that's why we see the error only in 1.9.3 but that doesn't explain why the URI.parser.unescape method doesn't blow up in 4.1.1 because the implementation is the same.

I'll investigate some more.

codev commented Jul 9, 2014

It's because URI::Parser::unescpae is monkey patched in active support to force the encoding to utf-8 if it isn't by default.

It's in https://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/uri.rb

Owner

pixeltrix commented Jul 9, 2014

Bloody monkey patches! 😄

Thanks for spotting this - I'll come up with a fix.

@karlentwistle karlentwistle pushed a commit to karlentwistle/rails that referenced this issue Jul 10, 2014

Karl Entwistle Force encoding of US-ASCII to UTF-8 in unescape_uri.
Because URI paths may contain non US-ASCII characters we need to force
the encoding of any unescaped URIs to UTF-8 if they are US-ASCII.
This essentially replicates the functionality of the monkey patch to
URI.parser.unescape in active_support/core_ext/uri.rb.

Fixes #16104.
beb9496

@karlentwistle karlentwistle pushed a commit to karlentwistle/rails that referenced this issue Jul 10, 2014

Karl Entwistle Force encoding of US-ASCII to UTF-8 in unescape_uri.
Because URI paths may contain non US-ASCII characters we need to force
the encoding of any unescaped URIs to UTF-8 if they are US-ASCII.
This essentially replicates the functionality of the monkey patch to
URI.parser.unescape in active_support/core_ext/uri.rb.

Fixes #16104.
8a29713

@pixeltrix pixeltrix closed this in #16123 Jul 10, 2014

codev commented Jul 10, 2014

Great thanks very much for fixing it so quickly!

@pixeltrix pixeltrix added a commit that referenced this issue Jul 10, 2014

@pixeltrix Karl Entwistle + pixeltrix Force encoding of US-ASCII to UTF-8 in unescape_uri.
Because URI paths may contain non US-ASCII characters we need to force
the encoding of any unescaped URIs to UTF-8 if they are US-ASCII.
This essentially replicates the functionality of the monkey patch to
URI.parser.unescape in active_support/core_ext/uri.rb.

Fixes #16104.

(cherry picked from commit 8a29713)
4c809df
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment