Commit
* Specify accept-charset on all forms. All recent browsers, as well as IE5+, will use the encoding specified for form parameters * Unfortunately, IE5+ will not look at accept-charset unless at least one character in the form's values is not in the page's charset. Since the user can override the default charset (which Rails sets to UTF-8), we provide a hidden input containing a unicode character, forcing IE to look at the accept-charset. * Now that the vast majority of web input is UTF-8, we set the inbound parameters to UTF-8. This will eliminate many cases of incompatible encodings between ASCII-8BIT and UTF-8. * You can safely ignore params[:_snowman_] TODO: * Validate inbound text to confirm it is UTF-8 * Combine the whole_form implementations in form_helper_test and form_tag_helper_test
- Loading branch information
There are no files selected for viewing
32 comments
on commit 25215d7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...what's the motivation behind supporting IE5? Curious as I haven't seen IE5 showing up in any browser stats in years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IE5+ (emphasis in the +) ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I suggest documenting the snowman tag? I just saw this in a HTTP trace and panicked somewhat in case my server had been compromised. I just think sending a new variable with every form request is something that people should know about.
Thanks,
iHiD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah - was pretty surprised when I saw snowman in my log today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would actually help to do a (high profile) blog post about this ... I went searching for it when I started seeing snowmen and it was quite hard to track down.
It is the kind of thing that would freak people out, and, dare I say it, make people think of Rails less seriously. I love it, but on the surface, it feels like an immature Easter egg, rather than a cute hack for forcing unicode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why haven't we heard about this hackery before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug exists in IE5, IE6, IE7, and IE8. If the user switches the browser's encoding to Latin-1 (to understand why a user would decide to do something seemingly so crazy, check out this google search: http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=diamond+with+a+question+mark+in+it), any form submission will be sent in Latin-1.
This means that if a user searches for "Ché Guevara", it will come through incorrectly on the server-side. In Ruby 1.9, this will result in an encoding error when the text inevitably makes its way into the regular expression engine. In Ruby 1.8, it will result in broken results for the user.
By creating a parameter that can only be understood by IE as a unicode character, we are forcing IE to look at the accept-charset attribute, which then tells it to encode all of the characters as UTF-8, even ones that can be encoded in Latin-1.
Keep in mind that in Ruby 1.8, it is extremely trivial to get Latin-1 data into your UTF-8 database (since NOTHING in the entire stack checks that the bytes that the user sent at any point are valid UTF-8 characters). As a result, it's extremely common for Ruby applications (and PHP applications, etc. etc.) to exhibit this user-facing bug, and therefore extremely common for users to try to change the encoding as a palliative measure.
All that said, when I wrote this patch, I didn't realize that the name of the parameter would ever appear in a user-facing place (it does with forms that use the GET action, such as search forms). Since it does, we will rename this parameter to _e, and use a more innocuous-looking unicode character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh come on, why not just name it to _ie ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would IE user agent sniffing be a bad idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned this in twitter, but why not just set up a middleware that does params.delete(:_snowman_)
when it gets to rack? That way end-users will never see this. That, and documenting why you get it on your logs (or maybe removing it even before it hits the logs…) should be enough to keep everyone happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I love the silliness of _snowman. _ie is a wonderful choice as well, but _e is strikes me as enterprise-y and boring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@foca renaming is considered because if you send a form with GET, eg a search form, the query string has the snowman. We do not want end-users to have such a prominent strange parameter right there. The fact that it appears in the params hash is not an issue, as "_method" does.
It could be the case that _e is actually called the snowman parameter though :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote for changing snowman to lollipop because it will be more silly and even less corporate-y. That'll teach those suits!
Gotta love open source ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "Frosty" should be the new official Rails mascot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paulca Completely agree. Here's one option: I've set up a simple page with information about the Rails snowman people can easily search for. The repo is at http://github.com/Aupajo/rails-snowman-info and the website will be at http://railssnowman.info, once the DNS updates and the CNAME kicks in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aupajo great idea. But it's no longer snowman but _snowman - checkout this commit: http://github.com/rails/rails/commit/caab17611668ff18a3c8642b2d45b353be5d9691 (with no underscore on the end). I've send you patch via email for it. If you didn't get it, here it is: http://gist.github.com/493743
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not make sense just to add a config option to change the name of the parameter. There is always a small risk that whatever is chosen will conflict with an existing app, may be unacceptable in that particular organisation etc. By keeping a default that is well documented (nice work @Aupajo), new users will understand what is going on, but still have control to change it if necessary.
In terms of logs, I think that it should be filtered by default for new apps (in config.filter_parameters), as per :password.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iHiD : you shouldn't be using parameters starting with a _ in your forms anyway.
Moreover one of the principles of rails is : convention over configuration. It's just being applied here.
For the filter_parameters, that's intended for security reasons, to avoid having password in clear in the log file.
It doesn't seems applicable here as it's not a matter of security to have the _snowman parameter in the log. You can, however, always add it in your applications if you wish to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aupajo - I agree that _method
is probably more dangerous. However, saying that, "method" is an HTTP word and its value is going to be sensible (get/post/put etc). Seeing a ☃ in your URL could be seen as somewhat unprofessional. I like _snowman
as a default, but I think it needs a simple config option just in case. Nice work with the explanation page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote to keep _snowman as a neat little Rails easter egg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cannikin I would not refer to this as an Easter Egg, as those are typically deliberately obfuscated artifacts. The _snowman thing is quite obvious. What I'd really like to find out is why this bug in IE has been hanging around for so long. As a recovering Microsoft addict, this topic is especially interesting to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name it _unicode_shim but keep the choice of characters? Then it's somewhat self-explanatory, and which character it is matters less, so it might as well be an amusing one, like the snowman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For method=get, is the _snowman needed for IE at all? If not, then I think snowman is fine. Otherwise, changing to a less conspicuous looking Unicode character and parameter name sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, maybe using slang for a drug-dealer is a less than ideal for a covert parameter/character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 on _snowman. I propose _force_ie_unicode_support. That describes exactly what the parameter is for. Self-documenting is definitely better than cute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gavinhughes This is now _utf8
(17a6dfb).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should only apply to POST/PUT forms? It makes rather a mess of GET request URLs (eg, search results). Since GETs are theoretically idempotent, involving no important db writes, perhaps the problem here doesn't apply so much to them. Just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just out of curiosity, is this still used today?
Oh my god! He is here!!!