Skip to content

Commit

Permalink
Fix several known web encoding issues:
Browse files Browse the repository at this point in the history
* Specify accept-charset on all forms. All recent browsers,
  as well as IE5+, will use the encoding specified for form
  parameters
* Unfortunately, IE5+ will not look at accept-charset unless
  at least one character in the form's values is not in the
  page's charset. Since the user can override the default
  charset (which Rails sets to UTF-8), we provide a hidden
  input containing a unicode character, forcing IE to look
  at the accept-charset.
* Now that the vast majority of web input is UTF-8, we set
  the inbound parameters to UTF-8. This will eliminate many
  cases of incompatible encodings between ASCII-8BIT and
  UTF-8.
* You can safely ignore params[:_snowman_]

TODO:

* Validate inbound text to confirm it is UTF-8
* Combine the whole_form implementations in form_helper_test
  and form_tag_helper_test
  • Loading branch information
wycats committed Jun 28, 2010
1 parent 06b0d6e commit 25215d7
Show file tree
Hide file tree
Showing 7 changed files with 318 additions and 198 deletions.
31 changes: 30 additions & 1 deletion actionpack/lib/action_dispatch/http/parameters.rb
Expand Up @@ -6,7 +6,11 @@ module Http
module Parameters
# Returns both GET and POST \parameters in a single hash.
def parameters
@env["action_dispatch.request.parameters"] ||= request_parameters.merge(query_parameters).update(path_parameters).with_indifferent_access
@env["action_dispatch.request.parameters"] ||= begin
params = request_parameters.merge(query_parameters)
params.merge!(path_parameters)
encode_params(params).with_indifferent_access
end
end
alias :params :parameters

Expand All @@ -32,6 +36,31 @@ def path_parameters
end

private

# TODO: Validate that the characters are UTF-8. If they aren't,
# you'll get a weird error down the road, but our form handling
# should really prevent that from happening
def encode_params(params)
return params unless "ruby".encoding_aware?

if params.is_a?(String)
return params.force_encoding("UTF-8").encode!
elsif !params.is_a?(Hash)
return params
end

params.each do |k, v|
case v
when Hash
encode_params(v)
when Array
v.map! {|el| encode_params(el) }
else
encode_params(v)
end
end
end

# Convert nested Hash to HashWithIndifferentAccess
def normalize_parameters(value)
case value
Expand Down
15 changes: 12 additions & 3 deletions actionpack/lib/action_view/helpers/form_tag_helper.rb
Expand Up @@ -530,22 +530,31 @@ def html_options_for_form(url_for_options, options, *parameters_for_url)
returning options.stringify_keys do |html_options|
html_options["enctype"] = "multipart/form-data" if html_options.delete("multipart")
html_options["action"] = url_for(url_for_options, *parameters_for_url)
html_options["accept-encoding"] = "UTF-8"
html_options["data-remote"] = true if html_options.delete("remote")
end
end

def extra_tags_for_form(html_options)
case method = html_options.delete("method").to_s
snowman_tag = tag(:input, :type => "hidden",
:name => "_snowman_", :value => "☃")

This comment has been minimized.

Copy link
@defmthd

defmthd Aug 25, 2010

Oh my god! He is here!!!

This comment has been minimized.

Copy link
@Spaceghost

Spaceghost Sep 25, 2012

Oh snowman, how I miss you. 🤘


method = html_options.delete("method").to_s

method_tag = case method
when /^get$/i # must be case-insensitive, but can't use downcase as might be nil
html_options["method"] = "get"
''
when /^post$/i, "", nil
html_options["method"] = "post"
protect_against_forgery? ? content_tag(:div, token_tag, :style => 'margin:0;padding:0;display:inline') : ''
token_tag
else
html_options["method"] = "post"
content_tag(:div, tag(:input, :type => "hidden", :name => "_method", :value => method) + token_tag, :style => 'margin:0;padding:0;display:inline')
tag(:input, :type => "hidden", :name => "_method", :value => method) + token_tag
end

tags = snowman_tag << method_tag
content_tag(:div, tags, :style => 'margin:0;padding:0;display:inline')
end

def form_tag_html(html_options)
Expand Down
Expand Up @@ -141,6 +141,29 @@ def assert_parses(expected, actual)
post "/parse", actual
assert_response :ok
assert_equal(expected, TestController.last_request_parameters)
assert_utf8(TestController.last_request_parameters)
end
end

def assert_utf8(object)
return unless "ruby".encoding_aware?

correct_encoding = Encoding.default_internal

unless object.is_a?(Hash)
assert_equal correct_encoding, object.encoding, "#{object.inspect} should have been UTF-8"
return
end

object.each do |k,v|
case v
when Hash
assert_utf8(v)
when Array
v.each {|el| assert_utf8(el) }
else
assert_utf8(v)
end
end
end
end
2 changes: 1 addition & 1 deletion actionpack/test/template/erb/form_for_test.rb
Expand Up @@ -5,7 +5,7 @@ module ERBTest
class TagHelperTest < BlockTestCase
test "form_for works" do
output = render_content "form_for(:staticpage, :url => {:controller => 'blah', :action => 'update'})", ""
assert_equal "<form action=\"/blah/update\" method=\"post\"></form>", output
assert_match %r{<form.*action="/blah/update".*method="post">.*</form>}, output
end
end
end
4 changes: 2 additions & 2 deletions actionpack/test/template/erb/tag_helper_test.rb
Expand Up @@ -28,8 +28,8 @@ def maybe_deprecated
end

test "percent equals works with form tags" do
expected_output = "<form action=\"foo\" method=\"post\">hello</form>"
maybe_deprecated { assert_equal expected_output, render_content("form_tag('foo')", "<%= 'hello' %>") }
expected_output = %r{<form.*action="foo".*method="post">.*hello*</form>}
maybe_deprecated { assert_match expected_output, render_content("form_tag('foo')", "<%= 'hello' %>") }
end

test "percent equals works with fieldset tags" do
Expand Down

32 comments on commit 25215d7

@grimen
Copy link

@grimen grimen commented on 25215d7 Jun 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...what's the motivation behind supporting IE5? Curious as I haven't seen IE5 showing up in any browser stats in years.

@josevalim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE5+ (emphasis in the +) ;)

@iHiD
Copy link
Contributor

@iHiD iHiD commented on 25215d7 Jun 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I suggest documenting the snowman tag? I just saw this in a HTTP trace and panicked somewhat in case my server had been compromised. I just think sending a new variable with every form request is something that people should know about.

Thanks,
iHiD

@grimen
Copy link

@grimen grimen commented on 25215d7 Jun 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. :)

@maccman
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah - was pretty surprised when I saw snowman in my log today.

@aaronchi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aupajo
Copy link
Contributor

@Aupajo Aupajo commented on 25215d7 Jul 27, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulca
Copy link
Contributor

@paulca paulca commented on 25215d7 Jul 27, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would actually help to do a (high profile) blog post about this ... I went searching for it when I started seeing snowmen and it was quite hard to track down.

It is the kind of thing that would freak people out, and, dare I say it, make people think of Rails less seriously. I love it, but on the surface, it feels like an immature Easter egg, rather than a cute hack for forcing unicode.

@mrichman
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why haven't we heard about this hackery before?

@wycats
Copy link
Member Author

@wycats wycats commented on 25215d7 Jul 27, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bug exists in IE5, IE6, IE7, and IE8. If the user switches the browser's encoding to Latin-1 (to understand why a user would decide to do something seemingly so crazy, check out this google search: http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=diamond+with+a+question+mark+in+it), any form submission will be sent in Latin-1.

This means that if a user searches for "Ché Guevara", it will come through incorrectly on the server-side. In Ruby 1.9, this will result in an encoding error when the text inevitably makes its way into the regular expression engine. In Ruby 1.8, it will result in broken results for the user.

By creating a parameter that can only be understood by IE as a unicode character, we are forcing IE to look at the accept-charset attribute, which then tells it to encode all of the characters as UTF-8, even ones that can be encoded in Latin-1.

Keep in mind that in Ruby 1.8, it is extremely trivial to get Latin-1 data into your UTF-8 database (since NOTHING in the entire stack checks that the bytes that the user sent at any point are valid UTF-8 characters). As a result, it's extremely common for Ruby applications (and PHP applications, etc. etc.) to exhibit this user-facing bug, and therefore extremely common for users to try to change the encoding as a palliative measure.

All that said, when I wrote this patch, I didn't realize that the name of the parameter would ever appear in a user-facing place (it does with forms that use the GET action, such as search forms). Since it does, we will rename this parameter to _e, and use a more innocuous-looking unicode character.

@mrichman
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh come on, why not just name it to _ie ;)

@maccman
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would IE user agent sniffing be a bad idea?

@foca
Copy link
Contributor

@foca foca commented on 25215d7 Jul 27, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this in twitter, but why not just set up a middleware that does params.delete(:_snowman_) when it gets to rack? That way end-users will never see this. That, and documenting why you get it on your logs (or maybe removing it even before it hits the logs…) should be enough to keep everyone happy.

@ravinggenius
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I love the silliness of _snowman. _ie is a wonderful choice as well, but _e is strikes me as enterprise-y and boring.

@fxn
Copy link
Member

@fxn fxn commented on 25215d7 Jul 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@foca renaming is considered because if you send a form with GET, eg a search form, the query string has the snowman. We do not want end-users to have such a prominent strange parameter right there. The fact that it appears in the params hash is not an issue, as "_method" does.

It could be the case that _e is actually called the snowman parameter though :).

@aaronchi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for changing snowman to lollipop because it will be more silly and even less corporate-y. That'll teach those suits!

Gotta love open source ;)

@norman
Copy link
Contributor

@norman norman commented on 25215d7 Jul 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "Frosty" should be the new official Rails mascot.

@Aupajo
Copy link
Contributor

@Aupajo Aupajo commented on 25215d7 Jul 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulca Completely agree. Here's one option: I've set up a simple page with information about the Rails snowman people can easily search for. The repo is at http://github.com/Aupajo/rails-snowman-info and the website will be at http://railssnowman.info, once the DNS updates and the CNAME kicks in.

@lukaszx0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aupajo great idea. But it's no longer snowman but _snowman - checkout this commit: http://github.com/rails/rails/commit/caab17611668ff18a3c8642b2d45b353be5d9691 (with no underscore on the end). I've send you patch via email for it. If you didn't get it, here it is: http://gist.github.com/493743

@iHiD
Copy link
Contributor

@iHiD iHiD commented on 25215d7 Jul 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not make sense just to add a config option to change the name of the parameter. There is always a small risk that whatever is chosen will conflict with an existing app, may be unacceptable in that particular organisation etc. By keeping a default that is well documented (nice work @Aupajo), new users will understand what is going on, but still have control to change it if necessary.

In terms of logs, I think that it should be filtered by default for new apps (in config.filter_parameters), as per :password.

@Aupajo
Copy link
Contributor

@Aupajo Aupajo commented on 25215d7 Jul 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@strzalek Thanks! I've applied your patch.

@iHiD I think a company is much more likely to be tripped up by _method than _snowman. If one is acceptable, so should the other be.

@dmathieu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iHiD : you shouldn't be using parameters starting with a _ in your forms anyway.
Moreover one of the principles of rails is : convention over configuration. It's just being applied here.

For the filter_parameters, that's intended for security reasons, to avoid having password in clear in the log file.
It doesn't seems applicable here as it's not a matter of security to have the _snowman parameter in the log. You can, however, always add it in your applications if you wish to.

@iHiD
Copy link
Contributor

@iHiD iHiD commented on 25215d7 Jul 28, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aupajo - I agree that _method is probably more dangerous. However, saying that, "method" is an HTTP word and its value is going to be sensible (get/post/put etc). Seeing a ☃ in your URL could be seen as somewhat unprofessional. I like _snowman as a default, but I think it needs a simple config option just in case. Nice work with the explanation page.

@cannikin
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote to keep _snowman as a neat little Rails easter egg.

@mrichman
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cannikin I would not refer to this as an Easter Egg, as those are typically deliberately obfuscated artifacts. The _snowman thing is quite obvious. What I'd really like to find out is why this bug in IE has been hanging around for so long. As a recovering Microsoft addict, this topic is especially interesting to me.

@knzconnor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name it _unicode_shim but keep the choice of characters? Then it's somewhat self-explanatory, and which character it is matters less, so it might as well be an amusing one, like the snowman

@nathany
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For method=get, is the _snowman needed for IE at all? If not, then I think snowman is fine. Otherwise, changing to a less conspicuous looking Unicode character and parameter name sounds good.

@knzconnor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, maybe using slang for a drug-dealer is a less than ideal for a covert parameter/character.

@gavinhughes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 on _snowman. I propose _force_ie_unicode_support. That describes exactly what the parameter is for. Self-documenting is definitely better than cute.

@Aupajo
Copy link
Contributor

@Aupajo Aupajo commented on 25215d7 Aug 17, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gavinhughes This is now _utf8 (17a6dfb).

@joseph
Copy link

@joseph joseph commented on 25215d7 Dec 1, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should only apply to POST/PUT forms? It makes rather a mess of GET request URLs (eg, search results). Since GETs are theoretically idempotent, involving no important db writes, perhaps the problem here doesn't apply so much to them. Just a thought.

@Ghostavio
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just out of curiosity, is this still used today?

Please sign in to comment.