Two proposed changes to blaze-html (xhtml support, don't escape ') #48

Open
wants to merge 2 commits into
from

Conversation

Projects
None yet
3 participants
Contributor

jgm commented Dec 18, 2011

I wrote Jasper an email about these issues earlier. You can consider these concrete proposals. Perhaps both are controversial. The two changes are completely independent:

(1) Don't escape ' (justification in commit).

(2) Added modules for XHtml1.Strict, Transitional, and FrameSet. These use the same tags and attributes as the corresponding Html modules. The only differences are that leaf nodes are self-closed, and a different doctype is used.

These changes would allow me to use blaze-html for pandoc's output.

John MacFarlane added some commits Dec 18, 2011

Added XHtml1.{Strict,Transitional,FrameSet} modules.
* Modified Util/GenerateHtmlCombinators to generate these automatically.

* HtmlVariant now includes a selfClosing field; if True, leaf tags will be
 self-closed.

* Modified cabal file to include the new modules in exported modules.
Don't escape single quote (') as '.
Argument:  The only context in which ' needs to be escaped is inside
a single-quoted attribute value.  Since blaze-html uses double
quotes around attribute values, it should never need to escape '.
Collaborator

meiersi commented Dec 18, 2011

Hi John,

making blaze-html usable for pandoc is important to me. However, I'm not sure about dropping the escaping of singlequotes. Is it disallowed in XHTML? I'd rather be on the safe side, than weaken the security guarantees given by HTML code generated blaze-html. We used the following information about cross-site scripting as a template for choosing the escaping.

best regards,
Simon

PS: If we change the escaping, then I'll also have to adapt blaze-builder accordingly. That's no problem. It's more a note that I have to do it :-)

Contributor

jgm commented Dec 18, 2011

Simon,

I didn't have to make any changes to blaze-builder to change the
escaping: I just made a couple changes in the renderers:
jgm/blaze-html@e42848d

It's true that putting unescaped user input inside a single-quoted
attribute could get you in trouble, if the user supplied string
contained a ' character:

for example, this would be bad in ruby:

"bad"

because if the user passed in "dog's", you'd get:

<a href='dog's'>bad

and who knows how a browser might interpret that.

However, in blaze-html, you always use double quotes around attributes. So I
fail to see how this could be a problem for you. The ONLY time ' needs to be
escaped is when it's inside a single-quoted attribute. The following
is just fine:

bad

That said, the single quote escaping is not as big an issue for me
as the XHTML combinators, which I really need for pandoc. It's more
of an aesthetic issue. If you choose not to change blaze-html, I'll
probably work around it by using my own string-escape function in
pandoc. I'd rather not do that, but I could live with it if needed.

Best,
John

+++ Simon Meier [Dec 18 11 12:22 ]:

Hi John,

making blaze-html usable for pandoc is important to me. However, I'm not sure about dropping the escaping of singlequotes. Is it disallowed in XHTML? I'd rather be on the safe side, than weaken the security guarantees given by HTML code generated blaze-html. We used the following information about cross-site scripting as a template for choosing the escaping.

best regards,
Simon

PS: If we change the escaping, then I'll also have to adapt blaze-builder accordingly. That's no problem. It's more a note that I have to do it :-)


Reply to this email directly or view it on GitHub:
#48 (comment)

Owner

jaspervdj commented Dec 18, 2011

I think John's point is valid, but I'd rather be 100% sure. Perhaps we should throw a mail to the Haskell web-devel list to see if they can think of any security concerns?

Contributor

jgm commented Dec 18, 2011

+++ Jasper Van der Jeugt [Dec 18 11 13:41 ]:

I think John's point is valid, but I'd rather be 100% sure. Perhaps we should throw a mail to the Haskell web-devel list to see if they can think of any security concerns?

That's a good idea.

Owner

jaspervdj commented Dec 20, 2011

@jgm, I've cherry-picked the XHTML patch. This has already been put on Hackage as 0.4.3.0.

I'm planning on integrating the other patch as well, as there seem to be no security concerns. However, there is a small problem: some of the escaping code (more specifically, for the UTF-8 renderer) is indeed located in the blaze-builder library (the Blaze.ByteString.Builder.Html.Utf8 module). First off, I would like the output of the renderers to be consistent as much as possible, so this code would need to change as well.

Now, it would seem like a good idea to move this module from blaze-builder to blaze-html, if possible, since this is more relevant to the latter. However, in that case, we also need to check whether or not there are any other packages depending upon this module. @meiersi, what do you think? Cheers!

Collaborator

meiersi commented Dec 20, 2011

@jaspervdj why don't you just use your own modified copy of that function in the new version of blaze-html? This way users of blaze-builder get the existing behavior and you have the freedom to change the escaping. This would result in the least surprises, I think.

In the near future, the second patch to bytestring will also be completed and accepted. This patch will provide a safe means to define such escaping functions directly. The blaze-builder package will become obsolete and replaced with a stub reexporting the bytestring builder functions.

Contributor

jgm commented Dec 28, 2011

+++ Jasper Van der Jeugt [Dec 20 11 01:08 ]:

@jgm, I've cherry-picked the XHTML patch. This has already been put on Hackage as 0.4.3.0.

Thank you! Let me know when the escaping patch is integrated, and
I can simplify my code a bit.

Contributor

jgm commented Sep 28, 2012

I see that ' is still being escaped. The issue has come up again (http://github.com/jgm/pandoc/issues/629), so I thought I'd check to see whether anything was resolved (this issue is still open, so I guess not). Here is a link to Jasper's email to the haskell web-dev list and the one response he replied: http://www.haskell.org/pipermail/web-devel/2011/002282.html. It seems to me that there are good reasons to stop escaping ' outside of attributes, and no good reasons to continue...

Owner

jaspervdj commented Sep 28, 2012

I admit I had forgotten this issue. I'll fix it next week (leaving on a small trip tomorrow).

Contributor

jgm commented Sep 28, 2012

Great! No urgency on this end; I'm working around it for now.

+++ Jasper Van der Jeugt [Sep 28 12 07:51 ]:

I admit I had forgotten this issue. I'll fix it next week (leaving on a
small trip tomorrow).

--
Reply to this email directly or [1]view it on GitHub.
[J6T91GIPIyhU-8ti4GCGP7AlC2fiocPKodp06RQqyLyJVKPwspQRCLkbEJ5_dJrC.gif]

References

  1. #48 (comment)
Collaborator

meiersi commented Oct 1, 2012

BTW: I'm working on porting blaze-markup over to the new bytestring builder. I suggest we address this issue after this port, as it allows to also handle the fast UTF-8 path properly; i.e., using primitive encodings to implement the attribute and the standard HTML escaping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment