Quote attributes containing weird whitespace or '<' #11

Closed
gsnedders opened this issue Apr 9, 2013 · 1 comment

@gsnedders
Member

commented Apr 9, 2013

http://code.google.com/p/html5lib/issues/detail?id=93

Reported by zcorpan, Feb 27, 2009

This is similar to issue 92 except there's an old Opera bug where certain
characters are treated as whitespace.

http://www.opera.com/support/kb/view/900/

The characters are

U+0009, U+000A, U+000B, U+000C, U+000D, U+0020, U+002F, U+00A0, U+1680,
U+180E, U+180F, U+2000, U+2001, U+2002, U+2003, U+2004, U+2005, U+2006,
U+2007, U+2008, U+2009, U+200A, U+2028, U+2029, U+202F, U+205F and U+3000

html5lib should probably quote attribute values that contain any of these.

Also, given that Gecko and WebKit start a new tag for <foo bar=baz<quux>
you should probably also quote attribute values that contain "<".
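
The check suggested here could be sketched in Python 3 roughly as follows. This is only an illustration of the idea, not html5lib's implementation; the names `OPERA_WHITESPACE` and `needs_quoting` are hypothetical, and the character set is copied from the list above.

```python
# Characters the old Opera bug treats as whitespace (from the list above).
OPERA_WHITESPACE = frozenset(
    "\u0009\u000a\u000b\u000c\u000d\u0020\u002f\u00a0\u1680\u180e\u180f"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)


def needs_quoting(value):
    """Hypothetical helper: True if `value` contains one of the Opera
    whitespace characters or "<", i.e. the cases raised in this report."""
    return any(ch in OPERA_WHITESPACE or ch == "<" for ch in value)
```

With this sketch, `needs_quoting("baz<quux")` and `needs_quoting("a/b")` are true while `needs_quoting("baz")` is false. It only covers the characters raised in this report, so a real serializer would still need its existing checks (for example for empty values and quote characters).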

Apr 27, 2009 excors

Also see http://software.hixie.ch/utilities/js/live-dom-viewer/saved/95

In addition to the values mentioned in the spec, the following seem to require
quoting:

Safari 3.0: U+0000 to U+0020 inclusive
Konqueror 4.1: U+0000 to U+0020 inclusive
Safari 3.1: U+000B
Opera 9.6: U+000B
IE6, IE8: U+000B, U+0060
Firefox 2/3: (Not U+0008 despite what that test script says; those characters just
get stripped, it seems)

Apr 27, 2009 zcorpan

(U+000B is not a valid character in HTML5, though I don't know if the serializer
tries to keep the character data valid.)

Sep 4, 2009 Simetrical

The spec should be updated to ban these too, then, right? They're not interoperably
supported. I doubt anyone will cry about not being able to use sub-0x20 characters in
unquoted attribute values, anyway. :) U+0060 is `, which doesn't seem like a big issue
either. Should this be brought up on the mailing list?

Sep 5, 2009 geoffers

IMO yes, just someone needs to get around to it. :)

Sep 6, 2009 zcorpan

I did, and Hixie rejected it saying that it's an issue that will go away over time.
Feel free to bring it up again (citing that sites which implement the spec using a
serializer will expose themselves to security problems with legacy browsers).

Sep 7, 2009 Simetrical

I posted this a couple of days ago:

http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-September/022711.html

Oct 28, 2009 geoffers

Accepted, though we still need to decide how much to quote.

Oct 30, 2009 geoffers

I don't think we need to try and get the spec to quote anything else.

This should presumably be a legacy_quote option or some such.

@gsnedders

Member Author

commented Jun 23, 2013

The full list:

\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20\x2f\x60\xa0\u1680\u180e\u180f\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000
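
As a sanity check, this list appears to be exactly the union of the Opera whitespace list and the per-browser findings reported earlier in the thread. A minimal Python 3 sketch, assuming hypothetical names (`FULL_LIST`, `LEGACY_UNSAFE`) that are not taken from html5lib:

```python
import re

# Opera's "treated as whitespace" characters, as reported by zcorpan.
opera = set(
    "\t\n\x0b\x0c\r\x20\x2f\xa0\u1680\u180e\u180f"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)
# excors's findings: U+0000 to U+0020 inclusive, plus U+0060.
others = {chr(cp) for cp in range(0x00, 0x21)} | {"\x60"}

# The full list, copied from the comment above.
FULL_LIST = (
    "\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f"
    "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
    "\x20\x2f\x60\xa0\u1680\u180e\u180f"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)

# The full list matches the two reports combined.
assert set(FULL_LIST) == opera | others

# One possible way to test a value: a character class built from the list.
# re.escape keeps every character literal inside the brackets.
LEGACY_UNSAFE = re.compile("[" + re.escape(FULL_LIST) + "]")
```

Here `LEGACY_UNSAFE.search(value)` returns a match for any value containing one of the listed characters. Note that "<" is not part of this list, so a check for it would have to come from elsewhere.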

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue Jul 19, 2013

Fix html5lib#11, html5lib#12: quote attributes that need escaping in legacy browsers

These are mostly out of the market now, so this isn't massively
needed any more; nevertheless, avoiding XSS as much as possible is
inevitably desirable.

This alters the API so that quote_attr_values is now a ternary
setting, choosing between legacy-safe behaviour, spec behaviour, and
always quoting.
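
A ternary setting of this kind might look something like the Python 3 sketch below. The value names `"legacy"`, `"spec"` and `"always"`, the helper `should_quote`, and both character classes are assumptions made for illustration; they are inferred from the description above rather than copied from the commit.

```python
import re

# Assumed "spec" set: ASCII whitespace plus " ' = < > and `.
_SPEC_RE = re.compile("[\t\n\x0c\r \"'=<>`]")

# Assumed "legacy" set: the spec set plus the full list from this issue.
_LEGACY_RE = re.compile(
    "[\x00-\x20\"'=<>`/\xa0\u1680\u180e\u180f"
    "\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]"
)


def should_quote(value, quote_attr_values="legacy"):
    """Hypothetical helper: decide whether an attribute value gets quoted."""
    if quote_attr_values == "always" or value == "":
        # An empty value can never be written unquoted.
        return True
    if quote_attr_values == "spec":
        return _SPEC_RE.search(value) is not None
    if quote_attr_values == "legacy":
        return _LEGACY_RE.search(value) is not None
    raise ValueError("quote_attr_values must be 'legacy', 'spec' or 'always'")
```

Under these assumptions, a value such as `"a/b"` or `"a\x0bb"` is left unquoted in `"spec"` mode but quoted in `"legacy"` mode, which is the difference this issue is about.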

@ghost ghost assigned gsnedders Aug 13, 2013

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue Sep 19, 2013

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue Sep 19, 2013

@gsnedders gsnedders modified the milestones: 0.9999, 0.99999 Apr 29, 2015

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 7, 2016

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 7, 2016

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 7, 2016

gsnedders added a commit that referenced this issue May 9, 2016

Fix #11, #12: quote attributes that need escaping in legacy browsers
These are mostly out of the market now, so this isn't massively
needed any more; nevertheless, avoiding XSS as much as possible is
inevitably desirable.

This alters the API so that quote_attr_values is now a ternary
setting, choosing between legacy-safe behaviour, spec behaviour, and
always quoting.

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 11, 2016

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 11, 2016

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 11, 2016

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 11, 2016

gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 11, 2016

@gsnedders gsnedders closed this in 9b8d8eb May 17, 2016

gsnedders added a commit that referenced this issue May 17, 2016

Merge pull request #95 from gsnedders/escape-characters-serializer
Fix #11 by escaping enough to be safe in legacy browsers; r=nobody!