URI.unescape "extension" fails with Unicode input #32183
eileencodes merged 1 commit into rails:master from kivikakk:uri-ext-fix
Conversation
I can't find that in 3-2-stable 😕 |
Ugh, don't tell me this is in one of our own patches to activesupport … investigating. |
|
Yeah, I agree it sounds like a sensible change.. that's why I tried to blame-chase why it'd gone away 😅 |
|
Haha, right? I'll let the latest commit fail CI so we can red–green this properly, then push the fix commit. |
Previously, URI.unescape could handle Unicode input (without any actual
escaped characters), or input with escaped characters (but no actual
Unicode characters) - not both.
URI.unescape("\xe3\x83\x90") # => "バ"
URI.unescape("%E3%83%90") # => "バ"
URI.unescape("\xe3\x83\x90%E3%83%90") # =>
# Encoding::CompatibilityError
We need to let `gsub` handle this for us, and then force back to the
original encoding of the input. The result String will be mangled if
the percent-encoded characters don't conform to the encoding of the
String itself, but that goes without saying.
Signed-off-by: Ashe Connor <ashe@kivikakk.ee>
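
For readers following along, here is a minimal sketch of the failure described in the commit message and one way around it. It does not call `URI.unescape` (which newer Rubies no longer ship); `naive_unescape` and `binary_unescape` are hypothetical names made up for this illustration, and the binary-copy approach is an assumption about how such a fix can work, not necessarily the exact code merged in this PR.

```ruby
# Pattern for one percent-encoded byte, e.g. "%E3".
ESCAPED = /%[a-fA-F\d]{2}/

# Hypothetical stand-in for the failing behaviour: raw bytes from pack('C')
# (ASCII-8BIT tagged) are substituted straight into the input string.
def naive_unescape(str)
  enc = str.encoding
  str.gsub(ESCAPED) { |m| [m[1, 2].hex].pack('C') }.force_encoding(enc)
end

# Hypothetical workaround (an assumption, not the PR's verbatim code):
# do the substitution on a binary-tagged copy, then restore the input's
# original encoding tag.
def binary_unescape(str)
  enc = str.encoding
  str.dup.force_encoding(Encoding::BINARY)
     .gsub(ESCAPED) { |m| [m[1, 2].hex].pack('C') }
     .force_encoding(enc)
end

mixed = "\u30D0%E3%83%90" # a literal "バ" followed by a percent-encoded "バ"

begin
  naive_unescape(mixed)
rescue Encoding::CompatibilityError => e
  puts "naive: #{e.class}"
end

puts binary_unescape(mixed)
```

As the commit message notes, the result is only meaningful if the percent-encoded bytes are valid in the input's encoding; otherwise the returned String is tagged with an encoding its bytes do not conform to.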
|
This is ready for review. There's two failing builds (one required), but it looks like it's a timeout (and I can't restart it). |
|
Someone reran the failing builds! Thanks! ❤️ |
|
Thanks @kivikakk ❤️ |
|
@tenderlove this is Rails 😉 |
|
Oh gosh you meant upstream to Ruby. 😳 |
|
I dug into it a bit — |
|
Patch opened upstream: https://bugs.ruby-lang.org/issues/14586 After discussion with @tenderlove, it appears the following is the case:
So the best course of action looks like:
Alternatively, we can just keep this PR in and hope Ruby might take the patch, but keeping the diff of core libraries between Ruby and Rails small seems like something to aim for. How does this sound? |
Do we call this monkey-patched method internally? If so, should we maybe be using CGI.unescape instead? |
Nope — this is only a convenience patch that our users might be relying on. |
|
This has been patched in Ruby! 🎉 So, I think we can keep this PR in. |
|
Good catch, I'll open a PR shortly. 👍 |
|
→ #32210 |

Summary
This is a fix for a bug @tenderlove and I have been stepping through together.
`URI.unescape` in Rails 4.2 throws an `Encoding::CompatibilityError` if the (UTF-8 tagged) argument contains actual Unicode characters.
This doesn't happen on 3.x; compare the monkey-patches: Turns out this was a patch made in our own application. Looks like we should be able to just pull this across.
3.x patched | 4.2
The issue is that `[$&[1, 2].hex].pack('C')` returns an ASCII-8BIT tagged string, which we then fail to gsub into `str`. This wasn't a problem in the 3.x patched variant, where the string was tagged as ASCII-8BIT anyway.
This PR opens by correcting the test; `parser.escape(str)` returns a US-ASCII (!) tagged string, so `parser.unescape` succeeds for much the same reason the 3.x patched variant succeeded. This corrects the test to resemble the actual use-case: passing UTF-8 tagged strings into `URI.unescape`.
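
The encoding tags involved can be inspected directly. The sketch below is only an illustration using core `String` and `Encoding` methods; the constant names are made up for the example. It shows why an ASCII-only escaped string (like the one the old test round-tripped) tolerates the raw byte, while a string containing real Unicode characters does not.

```ruby
# One raw byte, as produced inside the gsub block.
BYTE = [0xE3].pack('C')

# Only ASCII characters, so any ASCII-compatible tag is harmless here.
ESCAPED_ONLY = "%E3%83%90"

# A real "バ" followed by an escaped one: the actual use-case.
WITH_UNICODE = "\u30D0%E3%83%90"

# pack('C') yields a binary (ASCII-8BIT) tagged string.
puts BYTE.encoding == Encoding::ASCII_8BIT

# ASCII-only text can be combined with the raw byte; the result is binary.
puts Encoding.compatible?(ESCAPED_ONLY, BYTE) == Encoding::ASCII_8BIT

# Non-ASCII UTF-8 text and a non-ASCII binary byte have no common encoding,
# so Encoding.compatible? returns nil.
puts Encoding.compatible?(WITH_UNICODE, BYTE).nil?
```

All three lines should print `true`, which matches the behaviour described above: the substitution only breaks once the input carries non-ASCII characters of its own.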