
Add a decoder that supports ECMA unicode uris

raggi committed on Nov 3, 2012 (commit decaa23a175d2fc65b4bc103e7fff0027e3eb21c, 1 parent: 1824547)
Showing 2 changed files with 18 additions and 2 deletions.
  1. +13 −2 lib/rack/utils.rb
  2. +5 −0 test/spec_utils.rb
@@ -39,15 +39,26 @@ def escape_path(s)
# target encoding of the string returned, and it defaults to UTF-8
if defined?(::Encoding)
def unescape(s, encoding = Encoding::UTF_8)
- URI.decode_www_form_component(s, encoding)
+ URI.decode_www_form_component(unescape_unicode(s), encoding)
end
else
def unescape(s, encoding = nil)
- URI.decode_www_form_component(s, encoding)
+ URI.decode_www_form_component(unescape_unicode(s), encoding)
end
end
module_function :unescape
+ # See:
+ # http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations
+ # Issue 337
+ # Issue 360
+ def unescape_unicode(s)
+ s.gsub(/((?:%u[0-9a-fA-F]{4})+)/n){
+ [$1.delete('%u')].pack('H*').unpack("n*").pack("U*")
+ }
+ end
+ module_function :unescape_unicode
+
DEFAULT_SEP = /[&;] */n
class << self
@@ -101,6 +101,11 @@ def kcodeu
should.equal "q1!2\"'w$5&7/z8)?\\"
end
+ should "unescape non-standard unicode uri escaping (e.g. ECMA-262)" do
+ Rack::Utils.unescape_unicode("%u3042").should.equal "あ"
+ Rack::Utils.unescape("%u3042").should.equal "あ"
+ end
+
should "parse query strings correctly" do
Rack::Utils.parse_query("foo=bar").
should.equal "foo" => "bar"

2 comments on commit decaa23

gioele (contributor) replied Nov 4, 2012

Please note that by allowing non-standard escape sequences, you are opening quite a big security hole. Right now, firewalls check for common unsafe sequences (e.g., ../, .., ;/) that are encoded or partially encoded in URLs. With this change, you are allowing a new set of exploitable encodings of those unsafe sequences.
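To illustrate the concern, here is a sketch assuming a simple signature-based filter sitting in front of the application. `BLOCKED_SIGNATURES`, `firewall_blocks?` and `app_decode` are hypothetical names, not drawn from any real firewall; `unescape_unicode` is copied from the diff. A path-traversal payload written with `%u` escapes slips past the signature check but still decodes to `../` inside the application.

```ruby
require 'uri'

# Hypothetical signatures a URL filter might look for: "../" written
# literally, fully %-encoded, or partially %-encoded. Illustrative only.
BLOCKED_SIGNATURES = ['../', '%2e%2e%2f', '..%2f', '%2e%2e/'].freeze

def firewall_blocks?(raw_path)
  lowered = raw_path.downcase
  BLOCKED_SIGNATURES.any? { |sig| lowered.include?(sig) }
end

# Copied from the commit: expands runs of %uXXXX escapes.
def unescape_unicode(s)
  s.gsub(/((?:%u[0-9a-fA-F]{4})+)/n) {
    [$1.delete('%u')].pack('H*').unpack('n*').pack('U*')
  }
end

# Mirrors the patched Rack::Utils.unescape: %u escapes first, then standard decoding.
def app_decode(raw_path)
  URI.decode_www_form_component(unescape_unicode(raw_path))
end

standard = '/files/%2e%2e%2fetc/passwd'
ecma     = '/files/%u002e%u002e%u002fetc/passwd'

p firewall_blocks?(standard) # => true
p firewall_blocks?(ecma)     # => false: no known signature matches
p app_decode(ecma)           # => "/files/../etc/passwd"
```

A filter that also expanded `%u` escapes before matching would catch this particular payload; the broader point is that every additional encoding the application accepts is one more form such filters have to know about.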

chneukirchen (owner) replied Nov 4, 2012

-1 from me, since this is not documented in any RFC and has actually been rejected by the W3C.

I agree that %-decoding should not crash in these cases.
We probably should not decode cookies.

Could we leave malformed %-encoded strings as is and push this decoding to the application?
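One possible reading of this suggestion, sketched under assumptions: decode only `+` and well-formed `%XX` escapes, and pass any other `%` sequence, `%uXXXX` included, through unchanged so that the application can decide what to do with it. `lenient_unescape` is a hypothetical name and not part of Rack's API; its `encoding` default mirrors the `unescape` signature in the diff. The sketch needs Ruby 2.0+ for `String#b`. For contrast, `URI.decode_www_form_component`, which `unescape` delegates to, raises `ArgumentError` on the same input.

```ruby
require 'uri'

# Hypothetical lenient decoder (not part of Rack): decodes '+' and well-formed
# %XX escapes as application/x-www-form-urlencoded does, but passes any other
# '%' sequence (including %uXXXX) through unchanged instead of raising.
def lenient_unescape(s, encoding = Encoding::UTF_8)
  decoded = s.b.gsub(/\+|%[0-9a-fA-F]{2}/) do |match|
    match == '+' ? ' ' : match[1, 2].hex.chr
  end
  decoded.force_encoding(encoding)
end

# The strict decoder rejects input it cannot parse.
begin
  URI.decode_www_form_component('%u3042')
rescue ArgumentError => e
  puts "strict: #{e.message}"
end

puts lenient_unescape('a+b%20c')   # prints a b c
puts lenient_unescape('%u3042')    # left as is: prints %u3042
puts lenient_unescape('%E3%81%82') # UTF-8 bytes for U+3042: prints あ
```

The trade-off is that a stray `%` no longer signals an error, so callers that need strict validation would have to check for leftover `%` sequences themselves.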
