Permalink
Browse files

Fix several known web encoding issues:

* Specify accept-charset on all forms. All recent browsers,
  as well as IE5+, will use the encoding specified for form
  parameters
* Unfortunately, IE5+ will not look at accept-charset unless
  at least one character in the form's values is not in the
  page's charset. Since the user can override the default
  charset (which Rails sets to UTF-8), we provide a hidden
  input containing a unicode character, forcing IE to look
  at the accept-charset.
* Now that the vast majority of web input is UTF-8, we set
  the inbound parameters to UTF-8. This will eliminate many
  cases of incompatible encodings between ASCII-8BIT and
  UTF-8.
* You can safely ignore params[:_snowman_]

TODO:

* Validate inbound text to confirm it is UTF-8
* Combine the whole_form implementations in form_helper_test
  and form_tag_helper_test
  • Loading branch information...
1 parent 06b0d6e commit 25215d7285db10e2c04d903f251b791342e4dd6a @wycats wycats committed Jun 28, 2010
@@ -6,7 +6,11 @@ module Http
module Parameters
# Returns both GET and POST \parameters in a single hash.
def parameters
- @env["action_dispatch.request.parameters"] ||= request_parameters.merge(query_parameters).update(path_parameters).with_indifferent_access
+ @env["action_dispatch.request.parameters"] ||= begin
+ params = request_parameters.merge(query_parameters)
+ params.merge!(path_parameters)
+ encode_params(params).with_indifferent_access
+ end
end
alias :params :parameters
@@ -32,6 +36,31 @@ def path_parameters
end
private
+
+ # TODO: Validate that the characters are UTF-8. If they aren't,
+ # you'll get a weird error down the road, but our form handling
+ # should really prevent that from happening
+ def encode_params(params)
+ return params unless "ruby".encoding_aware?
+
+ if params.is_a?(String)
+ return params.force_encoding("UTF-8").encode!
+ elsif !params.is_a?(Hash)
+ return params
+ end
+
+ params.each do |k, v|
+ case v
+ when Hash
+ encode_params(v)
+ when Array
+ v.map! {|el| encode_params(el) }
+ else
+ encode_params(v)
+ end
+ end
+ end
+
# Convert nested Hash to HashWithIndifferentAccess
def normalize_parameters(value)
case value
@@ -530,22 +530,31 @@ def html_options_for_form(url_for_options, options, *parameters_for_url)
returning options.stringify_keys do |html_options|
html_options["enctype"] = "multipart/form-data" if html_options.delete("multipart")
html_options["action"] = url_for(url_for_options, *parameters_for_url)
+ html_options["accept-encoding"] = "UTF-8"
html_options["data-remote"] = true if html_options.delete("remote")
end
end
def extra_tags_for_form(html_options)
- case method = html_options.delete("method").to_s
+ snowman_tag = tag(:input, :type => "hidden",
+ :name => "_snowman_", :value => "☃")
@peanut
peanut Aug 25, 2010

Oh my god! He is here!!!

@Spaceghost
Spaceghost Sep 25, 2012

Oh snowman, how I miss you. 🤘

+
+ method = html_options.delete("method").to_s
+
+ method_tag = case method
when /^get$/i # must be case-insensitive, but can't use downcase as might be nil
html_options["method"] = "get"
''
when /^post$/i, "", nil
html_options["method"] = "post"
- protect_against_forgery? ? content_tag(:div, token_tag, :style => 'margin:0;padding:0;display:inline') : ''
+ token_tag
else
html_options["method"] = "post"
- content_tag(:div, tag(:input, :type => "hidden", :name => "_method", :value => method) + token_tag, :style => 'margin:0;padding:0;display:inline')
+ tag(:input, :type => "hidden", :name => "_method", :value => method) + token_tag
end
+
+ tags = snowman_tag << method_tag
+ content_tag(:div, tags, :style => 'margin:0;padding:0;display:inline')
end
def form_tag_html(html_options)
@@ -141,6 +141,29 @@ def assert_parses(expected, actual)
post "/parse", actual
assert_response :ok
assert_equal(expected, TestController.last_request_parameters)
+ assert_utf8(TestController.last_request_parameters)
+ end
+ end
+
+ def assert_utf8(object)
+ return unless "ruby".encoding_aware?
+
+ correct_encoding = Encoding.default_internal
+
+ unless object.is_a?(Hash)
+ assert_equal correct_encoding, object.encoding, "#{object.inspect} should have been UTF-8"
+ return
+ end
+
+ object.each do |k,v|
+ case v
+ when Hash
+ assert_utf8(v)
+ when Array
+ v.each {|el| assert_utf8(el) }
+ else
+ assert_utf8(v)
+ end
end
end
end
@@ -5,7 +5,7 @@ module ERBTest
class TagHelperTest < BlockTestCase
test "form_for works" do
output = render_content "form_for(:staticpage, :url => {:controller => 'blah', :action => 'update'})", ""
- assert_equal "<form action=\"/blah/update\" method=\"post\"></form>", output
+ assert_match %r{<form.*action="/blah/update".*method="post">.*</form>}, output
end
end
end
@@ -28,8 +28,8 @@ def maybe_deprecated
end
test "percent equals works with form tags" do
- expected_output = "<form action=\"foo\" method=\"post\">hello</form>"
- maybe_deprecated { assert_equal expected_output, render_content("form_tag('foo')", "<%= 'hello' %>") }
+ expected_output = %r{<form.*action="foo".*method="post">.*hello*</form>}
+ maybe_deprecated { assert_match expected_output, render_content("form_tag('foo')", "<%= 'hello' %>") }
end
test "percent equals works with fieldset tags" do
Oops, something went wrong.

32 comments on commit 25215d7

@grimen
grimen commented on 25215d7 Jun 28, 2010

Hmm...what's the motivation behind supporting IE5? Curious as I haven't seen IE5 showing up in any browser stats in years.

@josevalim
Member

IE5+ (emphasis in the +) ;)

@iHiD
iHiD commented on 25215d7 Jun 28, 2010

Can I suggest documenting the snowman tag? I just saw this in a HTTP trace and panicked somewhat in case my server had been compromised. I just think sending a new variable with every form request is something that people should know about.

Thanks,
iHiD

@grimen
grimen commented on 25215d7 Jun 28, 2010

Aha. :)

@maccman

Woah - was pretty surprised when I saw snowman in my log today.

@Aupajo
Aupajo commented on 25215d7 Jul 27, 2010

@paulca
paulca commented on 25215d7 Jul 27, 2010

It would actually help to do a (high profile) blog post about this ... I went searching for it when I started seeing snowmen and it was quite hard to track down.

It is the kind of thing that would freak people out, and, dare I say it, make people think of Rails less seriously. I love it, but on the surface, it feels like an immature Easter egg, rather than a cute hack for forcing unicode.

@mrichman

Why haven't we heard about this hackery before?

@wycats
Member
wycats commented on 25215d7 Jul 27, 2010

This bug exists in IE5, IE6, IE7, and IE8. If the user switches the browser's encoding to Latin-1 (to understand why a user would decide to do something seemingly so crazy, check out this google search: http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=diamond+with+a+question+mark+in+it), any form submission will be sent in Latin-1.

This means that if a user searches for "Ché Guevara", it will come through incorrectly on the server-side. In Ruby 1.9, this will result in an encoding error when the text inevitably makes its way into the regular expression engine. In Ruby 1.8, it will result in broken results for the user.

By creating a parameter that can only be understood by IE as a unicode character, we are forcing IE to look at the accept-charset attribute, which then tells it to encode all of the characters as UTF-8, even ones that can be encoded in Latin-1.

Keep in mind that in Ruby 1.8, it is extremely trivial to get Latin-1 data into your UTF-8 database (since NOTHING in the entire stack checks that the bytes that the user sent at any point are valid UTF-8 characters). As a result, it's extremely common for Ruby applications (and PHP applications, etc. etc.) to exhibit this user-facing bug, and therefore extremely common for users to try to change the encoding as a palliative measure.

All that said, when I wrote this patch, I didn't realize that the name of the parameter would ever appear in a user-facing place (it does with forms that use the GET action, such as search forms). Since it does, we will rename this parameter to _e, and use a more innocuous-looking unicode character.

@mrichman

Oh come on, why not just name it to _ie ;)

@maccman

Would IE user agent sniffing be a bad idea?

@foca
foca commented on 25215d7 Jul 27, 2010

I mentioned this in twitter, but why not just set up a middleware that does params.delete(:_snowman_) when it gets to rack? That way end-users will never see this. That, and documenting why you get it on your logs (or maybe removing it even before it hits the logs…) should be enough to keep everyone happy.

@ravinggenius

Personally I love the silliness of _snowman. _ie is a wonderful choice as well, but _e is strikes me as enterprise-y and boring.

@fxn
Member
fxn commented on 25215d7 Jul 28, 2010

@foca renaming is considered because if you send a form with GET, eg a search form, the query string has the snowman. We do not want end-users to have such a prominent strange parameter right there. The fact that it appears in the params hash is not an issue, as "_method" does.

It could be the case that _e is actually called the snowman parameter though :).

@aaronchi

I vote for changing snowman to lollipop because it will be more silly and even less corporate-y. That'll teach those suits!

Gotta love open source ;)

@norman
norman commented on 25215d7 Jul 28, 2010

I think "Frosty" should be the new official Rails mascot.

@Aupajo
Aupajo commented on 25215d7 Jul 28, 2010

@paulca Completely agree. Here's one option: I've set up a simple page with information about the Rails snowman people can easily search for. The repo is at http://github.com/Aupajo/rails-snowman-info and the website will be at http://railssnowman.info, once the DNS updates and the CNAME kicks in.

@lukaszx0
Member

@Aupajo great idea. But it's no longer snowman but _snowman - checkout this commit: http://github.com/rails/rails/commit/caab17611668ff18a3c8642b2d45b353be5d9691 (with no underscore on the end). I've send you patch via email for it. If you didn't get it, here it is: http://gist.github.com/493743

@iHiD
iHiD commented on 25215d7 Jul 28, 2010

Would it not make sense just to add a config option to change the name of the parameter. There is always a small risk that whatever is chosen will conflict with an existing app, may be unacceptable in that particular organisation etc. By keeping a default that is well documented (nice work @Aupajo), new users will understand what is going on, but still have control to change it if necessary.

In terms of logs, I think that it should be filtered by default for new apps (in config.filter_parameters), as per :password.

@Aupajo
Aupajo commented on 25215d7 Jul 28, 2010

@strzalek Thanks! I've applied your patch.

@ihid I think a company is much more likely to be tripped up by _method than _snowman. If one is acceptable, so should the other be.

@dmathieu

@ihid : you shouldn't be using parameters starting with a _ in your forms anyway.
Moreover one of the principles of rails is : convention over configuration. It's just being applied here.

For the filter_parameters, that's intended for security reasons, to avoid having password in clear in the log file.
It doesn't seems applicable here as it's not a matter of security to have the _snowman parameter in the log. You can, however, always add it in your applications if you wish to.

@iHiD
iHiD commented on 25215d7 Jul 28, 2010

@Aupajo - I agree that _method is probably more dangerous. However, saying that, "method" is an HTTP word and its value is going to be sensible (get/post/put etc). Seeing a ☃ in your URL could be seen as somewhat unprofessional. I like _snowman as a default, but I think it needs a simple config option just in case. Nice work with the explanation page.

@cannikin

I vote to keep _snowman as a neat little Rails easter egg.

@mrichman

@cannikin I would not refer to this as an Easter Egg, as those are typically deliberately obfuscated artifacts. The _snowman thing is quite obvious. What I'd really like to find out is why this bug in IE has been hanging around for so long. As a recovering Microsoft addict, this topic is especially interesting to me.

@knzconnor

name it _unicode_shim but keep the choice of characters? Then it's somewhat self-explanatory, and which character it is matters less, so it might as well be an amusing one, like the snowman

@nathany

For method=get, is the _snowman needed for IE at all? If not, then I think snowman is fine. Otherwise, changing to a less conspicuous looking Unicode character and parameter name sounds good.

@knzconnor

Now that I think about it, maybe using slang for a drug-dealer is a less than ideal for a covert parameter/character.

@gavinhughes

-1 on _snowman. I propose _force_ie_unicode_support. That describes exactly what the parameter is for. Self-documenting is definitely better than cute.

@Aupajo
Aupajo commented on 25215d7 Aug 17, 2010

@gavinhughes This is now _utf8 (rails/rails@17a6dfb).

@joseph
joseph commented on 25215d7 Dec 1, 2010

Perhaps this should only apply to POST/PUT forms? It makes rather a mess of GET request URLs (eg, search results). Since GETs are theoretically idempotent, involving no important db writes, perhaps the problem here doesn't apply so much to them. Just a thought.

@Ghostavio

just out of curiosity, is this still used today?

Please sign in to comment.