Permalink
Browse files

Fix several known web encoding issues:

* Specify accept-charset on all forms. All recent browsers,
  as well as IE5+, will use the encoding specified for form
  parameters
* Unfortunately, IE5+ will not look at accept-charset unless
  at least one character in the form's values is not in the
  page's charset. Since the user can override the default
  charset (which Rails sets to UTF-8), we provide a hidden
  input containing a unicode character, forcing IE to look
  at the accept-charset.
* Now that the vast majority of web input is UTF-8, we set
  the inbound parameters to UTF-8. This will eliminate many
  cases of incompatible encodings between ASCII-8BIT and
  UTF-8.
* You can safely ignore params[:_snowman_]

TODO:

* Validate inbound text to confirm it is UTF-8
* Combine the whole_form implementations in form_helper_test
  and form_tag_helper_test
  • Loading branch information...
wycats committed Jun 28, 2010
1 parent 06b0d6e commit 25215d7285db10e2c04d903f251b791342e4dd6a
@@ -6,7 +6,11 @@ module Http
module Parameters
# Returns both GET and POST \parameters in a single hash.
def parameters
@env["action_dispatch.request.parameters"] ||= request_parameters.merge(query_parameters).update(path_parameters).with_indifferent_access
@env["action_dispatch.request.parameters"] ||= begin
params = request_parameters.merge(query_parameters)
params.merge!(path_parameters)
encode_params(params).with_indifferent_access
end
end
alias :params :parameters
@@ -32,6 +36,31 @@ def path_parameters
end
private
# TODO: Validate that the characters are UTF-8. If they aren't,
# you'll get a weird error down the road, but our form handling
# should really prevent that from happening
def encode_params(params)
return params unless "ruby".encoding_aware?
if params.is_a?(String)
return params.force_encoding("UTF-8").encode!
elsif !params.is_a?(Hash)
return params
end
params.each do |k, v|
case v
when Hash
encode_params(v)
when Array
v.map! {|el| encode_params(el) }
else
encode_params(v)
end
end
end
# Convert nested Hash to HashWithIndifferentAccess
def normalize_parameters(value)
case value
@@ -530,22 +530,31 @@ def html_options_for_form(url_for_options, options, *parameters_for_url)
returning options.stringify_keys do |html_options|
html_options["enctype"] = "multipart/form-data" if html_options.delete("multipart")
html_options["action"] = url_for(url_for_options, *parameters_for_url)
html_options["accept-encoding"] = "UTF-8"
html_options["data-remote"] = true if html_options.delete("remote")
end
end
def extra_tags_for_form(html_options)
case method = html_options.delete("method").to_s
snowman_tag = tag(:input, :type => "hidden",
:name => "_snowman_", :value => "☃")

This comment has been minimized.

@defmthd

defmthd Aug 25, 2010

Oh my god! He is here!!!

This comment has been minimized.

@Spaceghost

Spaceghost Sep 25, 2012

Oh snowman, how I miss you. 🤘

method = html_options.delete("method").to_s
method_tag = case method
when /^get$/i # must be case-insensitive, but can't use downcase as might be nil
html_options["method"] = "get"
''
when /^post$/i, "", nil
html_options["method"] = "post"
protect_against_forgery? ? content_tag(:div, token_tag, :style => 'margin:0;padding:0;display:inline') : ''
token_tag
else
html_options["method"] = "post"
content_tag(:div, tag(:input, :type => "hidden", :name => "_method", :value => method) + token_tag, :style => 'margin:0;padding:0;display:inline')
tag(:input, :type => "hidden", :name => "_method", :value => method) + token_tag
end
tags = snowman_tag << method_tag
content_tag(:div, tags, :style => 'margin:0;padding:0;display:inline')
end
def form_tag_html(html_options)
@@ -141,6 +141,29 @@ def assert_parses(expected, actual)
post "/parse", actual
assert_response :ok
assert_equal(expected, TestController.last_request_parameters)
assert_utf8(TestController.last_request_parameters)
end
end
def assert_utf8(object)
return unless "ruby".encoding_aware?
correct_encoding = Encoding.default_internal
unless object.is_a?(Hash)
assert_equal correct_encoding, object.encoding, "#{object.inspect} should have been UTF-8"
return
end
object.each do |k,v|
case v
when Hash
assert_utf8(v)
when Array
v.each {|el| assert_utf8(el) }
else
assert_utf8(v)
end
end
end
end
@@ -5,7 +5,7 @@ module ERBTest
class TagHelperTest < BlockTestCase
test "form_for works" do
output = render_content "form_for(:staticpage, :url => {:controller => 'blah', :action => 'update'})", ""
assert_equal "<form action=\"/blah/update\" method=\"post\"></form>", output
assert_match %r{<form.*action="/blah/update".*method="post">.*</form>}, output
end
end
end
@@ -28,8 +28,8 @@ def maybe_deprecated
end
test "percent equals works with form tags" do
expected_output = "<form action=\"foo\" method=\"post\">hello</form>"
maybe_deprecated { assert_equal expected_output, render_content("form_tag('foo')", "<%= 'hello' %>") }
expected_output = %r{<form.*action="foo".*method="post">.*hello*</form>}
maybe_deprecated { assert_match expected_output, render_content("form_tag('foo')", "<%= 'hello' %>") }
end
test "percent equals works with fieldset tags" do
Oops, something went wrong.

32 comments on commit 25215d7

@grimen

This comment has been minimized.

grimen replied Jun 28, 2010

Hmm...what's the motivation behind supporting IE5? Curious as I haven't seen IE5 showing up in any browser stats in years.

@josevalim

This comment has been minimized.

Contributor

josevalim replied Jun 28, 2010

IE5+ (emphasis in the +) ;)

@iHiD

This comment has been minimized.

Contributor

iHiD replied Jun 28, 2010

Can I suggest documenting the snowman tag? I just saw this in a HTTP trace and panicked somewhat in case my server had been compromised. I just think sending a new variable with every form request is something that people should know about.

Thanks,
iHiD

@grimen

This comment has been minimized.

grimen replied Jun 28, 2010

Aha. :)

@maccman

This comment has been minimized.

Contributor

maccman replied Jun 28, 2010

Woah - was pretty surprised when I saw snowman in my log today.

@aaronchi

This comment has been minimized.

Contributor

aaronchi replied Jun 30, 2010

@Aupajo

This comment has been minimized.

Aupajo replied Jul 27, 2010

@paulca

This comment has been minimized.

Contributor

paulca replied Jul 27, 2010

It would actually help to do a (high profile) blog post about this ... I went searching for it when I started seeing snowmen and it was quite hard to track down.

It is the kind of thing that would freak people out, and, dare I say it, make people think of Rails less seriously. I love it, but on the surface, it feels like an immature Easter egg, rather than a cute hack for forcing unicode.

@mrichman

This comment has been minimized.

mrichman replied Jul 27, 2010

Why haven't we heard about this hackery before?

@wycats

This comment has been minimized.

Member

wycats replied Jul 27, 2010

This bug exists in IE5, IE6, IE7, and IE8. If the user switches the browser's encoding to Latin-1 (to understand why a user would decide to do something seemingly so crazy, check out this google search: http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=diamond+with+a+question+mark+in+it), any form submission will be sent in Latin-1.

This means that if a user searches for "Ché Guevara", it will come through incorrectly on the server-side. In Ruby 1.9, this will result in an encoding error when the text inevitably makes its way into the regular expression engine. In Ruby 1.8, it will result in broken results for the user.

By creating a parameter that can only be understood by IE as a unicode character, we are forcing IE to look at the accept-charset attribute, which then tells it to encode all of the characters as UTF-8, even ones that can be encoded in Latin-1.

Keep in mind that in Ruby 1.8, it is extremely trivial to get Latin-1 data into your UTF-8 database (since NOTHING in the entire stack checks that the bytes that the user sent at any point are valid UTF-8 characters). As a result, it's extremely common for Ruby applications (and PHP applications, etc. etc.) to exhibit this user-facing bug, and therefore extremely common for users to try to change the encoding as a palliative measure.

All that said, when I wrote this patch, I didn't realize that the name of the parameter would ever appear in a user-facing place (it does with forms that use the GET action, such as search forms). Since it does, we will rename this parameter to _e, and use a more innocuous-looking unicode character.

@mrichman

This comment has been minimized.

mrichman replied Jul 27, 2010

Oh come on, why not just name it to _ie ;)

@maccman

This comment has been minimized.

Contributor

maccman replied Jul 27, 2010

Would IE user agent sniffing be a bad idea?

@foca

This comment has been minimized.

Contributor

foca replied Jul 27, 2010

I mentioned this in twitter, but why not just set up a middleware that does params.delete(:_snowman_) when it gets to rack? That way end-users will never see this. That, and documenting why you get it on your logs (or maybe removing it even before it hits the logs…) should be enough to keep everyone happy.

@ravinggenius

This comment has been minimized.

ravinggenius replied Jul 27, 2010

Personally I love the silliness of _snowman. _ie is a wonderful choice as well, but _e is strikes me as enterprise-y and boring.

@fxn

This comment has been minimized.

Member

fxn replied Jul 28, 2010

@foca renaming is considered because if you send a form with GET, eg a search form, the query string has the snowman. We do not want end-users to have such a prominent strange parameter right there. The fact that it appears in the params hash is not an issue, as "_method" does.

It could be the case that _e is actually called the snowman parameter though :).

@aaronchi

This comment has been minimized.

Contributor

aaronchi replied Jul 28, 2010

I vote for changing snowman to lollipop because it will be more silly and even less corporate-y. That'll teach those suits!

Gotta love open source ;)

@norman

This comment has been minimized.

Contributor

norman replied Jul 28, 2010

I think "Frosty" should be the new official Rails mascot.

@Aupajo

This comment has been minimized.

Aupajo replied Jul 28, 2010

@paulca Completely agree. Here's one option: I've set up a simple page with information about the Rails snowman people can easily search for. The repo is at http://github.com/Aupajo/rails-snowman-info and the website will be at http://railssnowman.info, once the DNS updates and the CNAME kicks in.

@lukaszx0

This comment has been minimized.

Member

lukaszx0 replied Jul 28, 2010

@Aupajo great idea. But it's no longer snowman but _snowman - checkout this commit: http://github.com/rails/rails/commit/caab17611668ff18a3c8642b2d45b353be5d9691 (with no underscore on the end). I've send you patch via email for it. If you didn't get it, here it is: http://gist.github.com/493743

@iHiD

This comment has been minimized.

Contributor

iHiD replied Jul 28, 2010

Would it not make sense just to add a config option to change the name of the parameter. There is always a small risk that whatever is chosen will conflict with an existing app, may be unacceptable in that particular organisation etc. By keeping a default that is well documented (nice work @Aupajo), new users will understand what is going on, but still have control to change it if necessary.

In terms of logs, I think that it should be filtered by default for new apps (in config.filter_parameters), as per :password.

@Aupajo

This comment has been minimized.

Aupajo replied Jul 28, 2010

@strzalek Thanks! I've applied your patch.

@iHiD I think a company is much more likely to be tripped up by _method than _snowman. If one is acceptable, so should the other be.

@dmathieu

This comment has been minimized.

Contributor

dmathieu replied Jul 28, 2010

@iHiD : you shouldn't be using parameters starting with a _ in your forms anyway.
Moreover one of the principles of rails is : convention over configuration. It's just being applied here.

For the filter_parameters, that's intended for security reasons, to avoid having password in clear in the log file.
It doesn't seems applicable here as it's not a matter of security to have the _snowman parameter in the log. You can, however, always add it in your applications if you wish to.

@iHiD

This comment has been minimized.

Contributor

iHiD replied Jul 28, 2010

@Aupajo - I agree that _method is probably more dangerous. However, saying that, "method" is an HTTP word and its value is going to be sensible (get/post/put etc). Seeing a ☃ in your URL could be seen as somewhat unprofessional. I like _snowman as a default, but I think it needs a simple config option just in case. Nice work with the explanation page.

@cannikin

This comment has been minimized.

cannikin replied Jul 28, 2010

I vote to keep _snowman as a neat little Rails easter egg.

@mrichman

This comment has been minimized.

mrichman replied Jul 28, 2010

@cannikin I would not refer to this as an Easter Egg, as those are typically deliberately obfuscated artifacts. The _snowman thing is quite obvious. What I'd really like to find out is why this bug in IE has been hanging around for so long. As a recovering Microsoft addict, this topic is especially interesting to me.

@knzconnor

This comment has been minimized.

Contributor

knzconnor replied Jul 28, 2010

name it _unicode_shim but keep the choice of characters? Then it's somewhat self-explanatory, and which character it is matters less, so it might as well be an amusing one, like the snowman

@nathany

This comment has been minimized.

Contributor

nathany replied Jul 30, 2010

For method=get, is the _snowman needed for IE at all? If not, then I think snowman is fine. Otherwise, changing to a less conspicuous looking Unicode character and parameter name sounds good.

@knzconnor

This comment has been minimized.

Contributor

knzconnor replied Jul 30, 2010

Now that I think about it, maybe using slang for a drug-dealer is a less than ideal for a covert parameter/character.

@gavinhughes

This comment has been minimized.

gavinhughes replied Aug 17, 2010

-1 on _snowman. I propose _force_ie_unicode_support. That describes exactly what the parameter is for. Self-documenting is definitely better than cute.

@Aupajo

This comment has been minimized.

Aupajo replied Aug 17, 2010

@gavinhughes This is now _utf8 (17a6dfb).

@joseph

This comment has been minimized.

joseph replied Dec 1, 2010

Perhaps this should only apply to POST/PUT forms? It makes rather a mess of GET request URLs (eg, search results). Since GETs are theoretically idempotent, involving no important db writes, perhaps the problem here doesn't apply so much to them. Just a thought.

@Ghostavio

This comment has been minimized.

Ghostavio replied Oct 4, 2016

just out of curiosity, is this still used today?

Please sign in to comment.