Permalink
Browse files

updated escape_html to use hex entities and to escape forward slash

Small merge work required.

Signed-off-by: raggi <jftucker@gmail.com>
  • Loading branch information...
1 parent b155fc5 commit 37ca2539c423f29337486cf0469dd49a019cfea3 @thinkerbot thinkerbot committed with raggi Jun 13, 2010
Showing with 14 additions and 2 deletions.
  1. +2 −1 lib/rack/utils.rb
  2. +1 −1 test/spec_showexceptions.rb
  3. +11 −0 test/spec_utils.rb
View
@@ -132,8 +132,9 @@ def build_nested_query(value, prefix = nil)
"&" => "&amp;",
"<" => "&lt;",
">" => "&gt;",
- "'" => "&#39;",
+ "'" => "&#x27;",
'"' => "&quot;",
+ "/" => "&#x2F;"
}
ESCAPE_HTML_PATTERN = Regexp.union(*ESCAPE_HTML.keys)
@@ -61,6 +61,6 @@
res.body.should.include "RuntimeError"
res.body.should.include "It was never supposed to work"
- res.body.should.include __FILE__
+ res.body.should.include Rack::Utils.escape_html(__FILE__)
end
end
View
@@ -175,6 +175,17 @@
message.should.equal "value must be a Hash"
end
+ should "should escape html entities [&><'\"/]" do
+ Rack::Utils.escape_html("foo").should.equal "foo"
+ Rack::Utils.escape_html("f&o").should.equal "f&amp;o"
+ Rack::Utils.escape_html("f<o").should.equal "f&lt;o"
+ Rack::Utils.escape_html("f>o").should.equal "f&gt;o"
+ Rack::Utils.escape_html("f'o").should.equal "f&#x27;o"
+ Rack::Utils.escape_html('f"o').should.equal "f&quot;o"
+ Rack::Utils.escape_html("f/o").should.equal "f&#x2F;o"
+ Rack::Utils.escape_html("<foo></foo>").should.equal "&lt;foo&gt;&lt;&#x2F;foo&gt;"
+ end
+
should "figure out which encodings are acceptable" do
helper = lambda do |a, b|
request = Rack::Request.new(Rack::MockRequest.env_for("", "HTTP_ACCEPT_ENCODING" => a))

13 comments on commit 37ca253

@richmeyers
Contributor

Escaping of slash seems to be responsible for breaking integrity. Why was it added?

@raggi
Member
raggi commented on 37ca253 Oct 20, 2011

because not doing so allows for evil

@richmeyers
Contributor

Just to clarify, I was looking for some kind of a reasonable explanation why a project billed as a "standard interface" between ruby frameworks and web servers decides to break compatibility. 4+ years since inception, 2+ years and 3 versions since 1.0 rack is still breaking applications that use it (which by now I imagine are all applications written in ruby).

@thinkerbot
Contributor

Perhaps you are looking for the original issue. In short these escapes are recommended by OWASP for preventing XSS.

@raggi
Member
raggi commented on 37ca253 Oct 22, 2011

@richmeyers sorry, but i will break (minor) compatibility to keep people safe. It's easy for apps to fix this, too.

Would you rather be left vulnerable?

@richmeyers
Contributor

@raggi I would like to see the following:

  1. Clear separation between security fixes and ordinary changes.
  2. Clear separation between changes that break backwards compatibility and ones that do not.
  3. If compatibility is broken, at least an acknowledgment of the fact and ideally what needs to be done to keep client code working.

This commit combines a compatibility-breaking security fix, as you say, with a change of spelling for the escaping of the single quote, which is neither and is not needed at all. There already existed an issue explaining the reasoning behind the change when the change was made, yet the issue was not mentioned in the commit.

Regarding "easy to fix", I cannot agree with you. This is the line in question: https://github.com/integrity/integrity/blob/e767171e8a358812e7340e6c290647421c6b815c/views/new.haml#L14 and in rack 1.3 the value displayed is something like "&#x27path&#x27to&#x27repo" instead of "/path/to/repo". As someone who wrote neither rack nor integrity I have no idea how to fix this. I consider myself fortunate to have even located the root cause of the problem as quickly as I have (i.e., in less than a day).

If this is indeed easy to fix, then you should have taken 10 minutes to explain what the fix is instead of making myself and countless other developers, I'm sure, spend hours figuring this out by reading source (which is far from being helpful, see above).

Lastly, with regard to this particular change, I am not convinced that a security issue existed. Neither the owasp page nor the ticket provide an example of how an unescaped slash can cause a security problem. "forward slash is included as it helps end an HTML entity" - well, letters are part of html entities, should they be escaped now too?

@thinkerbot
Contributor

@richmeyers One solution is to not upgrade. If a previous rack worked, then you have the option to stay with it especially if you don't think that this security fix is necessary for your app.

As someone who has also not written integrity I don't know how to fix your problem. If you were escaping a url for display to the user in a textual context then there should not be a breakage as the browser renders escaped entities as their unescaped characters. For example, from the command line:

echo "&lt;foo&gt;&lt;&#x2F;foo&gt;" > example.html
open example.html 

If a test is breaking then adjust the test to look for the escaped entities.

Regarding your last statement, I personally don't know if a security issue exists either but that's because I've never studied the issue. OWASP has and so I defer to their recommendations. See here. It was enough to assure me that I do not know what I need to know about XSS. And to your question ... 'helps end' and 'part of' are pretty different roles to play wrt HTML entities so no, letters quite likely do not need to be escaped.

@richmeyers
Contributor

@thinkerbot Not upgrading is not a solution. It is a workaround which works until something requires a newer version, and it addresses none of my concerns.

The article you linked to was quite informative. Why then is rack not escaping ` and potentially most (all?) of ascii punctuation? I noticed that the article you linked to, unlike owasp page, does not recommend escaping /, thus it does not explain how unescaped / is a security problem.

With the change made in this commit documentation of escape_html no longer matches its behavior. See https://github.com/rack/rack/blob/37ca2539c423f29337486cf0469dd49a019cfea3/lib/rack/utils.rb#L141. The change made is in fact a clear violation of the claimed purpose of the method, which was to escape a (well known and clearly defined) set of characters. Now it seems escape_html escapes anything it feels like escaping. If rack's escape_html is trying to escape text for a particular purpose, per http://wonko.com/post/html-escaping, what purpose is that? And is that purpose subject to change at any time with no warning or explanation?

@thinkerbot
Contributor

@richmeyers quite right. I recommend you make a pull request to fix the documentation.

If you still have concerns then that is great! It sounds like you have the drive to figure out why or why not / is a problem. If indeed it turns out not a problem then you have a contribution to make to OWASP and later to rack.

I for one think the changes are consistent with the purpose of escape_html, which as I understand it is to escape untrusted data before insertion into an HTML context (see rule 1 on the OWASP page). If you read further into the OWASP XSS page you will find the more extensive escaping is recommended for a different use case; ie escaping untrusted data before insertion into HTML attributes (rule 2). If the intent of html_escape is to sanitize untrusted data put into attributes, then yes it probably should escape more characters like [space] % * + , - / ; < = > ^ and |.

Evidently there is some confusion about the intent of escape_html. Again I suggest you take action. Talk to people, determine the common use case of escape_html, and propose patches as you find appropriate. Security is very difficult. I'm sure the community would welcome another expert.

@richmeyers
Contributor

@thinkerbot I almost had to double check if I wrote the changes in question and then forgot all about that, but no, it was not me. It was in fact you. Thanks for volunteering me to fix this mess, but no thanks.

It appears you don't even know what these changes are meant to accomplish. The recommendation to escape slashes is innocuous enough when applied to newly written code, however when an established method is altered to now escape slashes it creates breakage. Fix the documentation? As I already mentioned I do not understand what "escape_html" is meant to do now, with this change applied, or in the future with this sort of approach to modifying published interfaces. I have no intention of guessing how you think escape_html should behave today and I certainly have no desire to alter the documentation from "escapes ampersands, brackets and quotes" to "escapes an undefined and subject to change set of characters".

My suggestion is to revert this commit, putting escape_html back in sync with its documented purpose which is to escape the 5 characters it previously escaped. Then, if you want to escape other characters, especially without defining a sensible policy for what is escaped and what is not escaped, write a separate method for doing so. However, seeing as how there is no html interface between web servers and web frameworks that I'm aware of, it is my opinion that such a method has no place in rack.

I tried to avoid making the following list, but seeing how this discussion is degenerating into absurdity it seems more fitting than not:

  • Just because someone posted something on the internet does not make it true.
  • Just because someone put a "security" label on something does not make it a security issue.
  • If someone did something and you do the same thing in a different context, you might get different results.
  • The fact that your changes were merged does not make them any more correct or appropriate.
@raggi
Member
raggi commented on 37ca253 Oct 25, 2011

Why is integrity using escape_html to escape uris? it should be using uri escaping.

@tenderlove
Member

why so serious

Please sign in to comment.