New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASCII comes in two flavors that need to be UTF-8ified. #30119
Conversation
Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @schneems (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review. Please see the contribution instructions for more information. |
Forgot to include a link to the page at Discourse describing this: https://meta.discourse.org/t/automatic-encoding-of-parsed-url-params/67232 |
Hi @begrif, Thank you for the patch and congratulations on sending your first pull request to Rails ! 🎉 Looking at the Discourse tickets, it looks like 9220935 could fix this issue. Did you try to replicate this issue from a fresh check-out of the Rails repository ? The aforementioned commit is pretty recent and not part of any release. |
@robin850 No, I have not tried a non-released Rails. The description of the problem and the fix look like that patch has a similar goal. |
Could you give the steps to reproduce the error with a Discourse application please ? I installed it and tried to create a tag with the |
Sure. There are a few steps.
In the failing case, the page never loads due to excessive redirects. In the working case the page loads. |
@begrif : I managed to reproduce the issue as you described and managed to run Discourse on current master (just had to remove Also I was wrong in my first comment ; the aforementioned commit has been back-ported on the We can certainly close this issue but if I'm wrong, feel free to comment and we can reopen. Thank you very much ! |
Summary
Discourse found an issue where URL parameters were entering the code as ASCII-8BIT instead of UTF-8. Examining the source tree for URL unescape code, I found that almost everywhere does not specify an encoding, relying on the UTF-8 default in Ruby. Almost everywhere.
In ActionDispatch there is a
unescape_uri
that attempts to preserve the encoding if it is not US-ASCII. Changing that to UTF-8-ify ASCII-8BIT fixes the issue seen in Discourse.Other Information
It is noteworthy that the ASCII-8BIT encoding is used throughout as the encoding of choice before %-escaping a URL string. This encoding mirrors the widespread usage of UTF-8 in URLs getting encoded as a stream of octets instead of codepoints. Clearly ASCII-8BIT is intended to be convertible back to UTF-8.