Minor discrepancy with fromHtmlEscapedString #1

Closed
lpsmith opened this issue Feb 21, 2015 · 4 comments

@lpsmith
Owner

lpsmith commented Feb 21, 2015

See jaspervdj/blaze-markup#15

fromHtmlEscapedString in 0.4 strips out '\DEL' as well as ASCII control characters, whereas the same function from 0.3 leaves them in.

e.g. with blaze-builder-0.3.3.2

ghci> toLazyByteString (fromHtmlEscapedString "\DEL<foobar>\STX")
"\DEL&lt;foobar&gt;\STX"

with blaze-builder-0.4.0.0

ghci> toLazyByteString (fromHtmlEscapedString "\DEL<foobar>\STX")
"&lt;foobar&gt;"

Possible fixes would be to 1) release a revision of the 0.3 series that also strips out control characters, 2) adjust 0.4 so that it doesn't, or 3) leave it as-is.

It certainly is my intention that 0.4 be semantically identical to 0.3, but I am curious what the correct behavior should be here with regard to HTML escaping, and how much trouble this discrepancy will actually cause. Control characters aren't kosher in HTML, but I don't know if they can actually lead to security vulnerabilities. Similarly, this discrepancy does cause problems with blaze-markup's test suite, but I don't know if it'll cause any trouble in "real" applications.
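
To make the two behaviours concrete, here is a minimal pure-String sketch of the difference reported above. It is not the library's implementation: the names `htmlEscapeKeep` and `htmlEscapeStrip` are hypothetical, the escape table beyond `<` and `>` is an assumption, and so is the choice to keep tab, newline, and carriage return in the stripping variant.

```haskell
-- Hypothetical model of the reported 0.3 behaviour: escape the HTML
-- metacharacters and pass every other character through, including
-- control characters. (Escapes other than < and > are an assumption.)
escapeKeep :: Char -> String
escapeKeep c = case c of
  '<'  -> "&lt;"
  '>'  -> "&gt;"
  '&'  -> "&amp;"
  '"'  -> "&quot;"
  '\'' -> "&#39;"
  _    -> [c]

-- Hypothetical model of the reported 0.4 behaviour: same escapes, but
-- '\DEL' and ASCII control characters are dropped. Keeping tab, newline
-- and carriage return is an assumption of this sketch.
escapeStrip :: Char -> String
escapeStrip c
  | c == '\DEL'                     = ""
  | c < ' ' && c `notElem` "\t\n\r" = ""
  | otherwise                       = escapeKeep c

htmlEscapeKeep, htmlEscapeStrip :: String -> String
htmlEscapeKeep  = concatMap escapeKeep
htmlEscapeStrip = concatMap escapeStrip
```

Applied to the input from the ghci sessions above, `htmlEscapeKeep "\DEL<foobar>\STX"` keeps both control characters while `htmlEscapeStrip` on the same input yields only `&lt;foobar&gt;`.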

@lpsmith
Owner Author

lpsmith commented Feb 21, 2015

Ok, out of the 200 reverse dependencies of blaze-builder on Hackage, there appear to be 3 that depend directly on fromHtmlEscapedString:

blaze-markup-0.6.3.0/src/Text/Blaze/Renderer/Utf8.hs
hackage-server-0.5.0/Distribution/Server/Framework/Templating.hs
lucid-2.9.1/src/Lucid/Base.hs

(This was done by scraping the package names out of the above link with grep and sed, running cabal unpack on each of the results, then running grep -rl fromHtmlEscaped on the directory of unpacked packages.)
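
The final search step can also be sketched in Haskell for anyone who wants to repeat it without grep. This is only an approximation of `grep -rl` under stated assumptions: the helper names `listFilesRecursive` and `filesMentioning` are hypothetical, the packages are assumed to be unpacked already, and the traversal follows symlinked directories without guarding against cycles.

```haskell
import qualified Data.ByteString.Char8 as B
import Control.Monad (filterM, forM)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- Recursively collect every non-directory entry below a directory.
listFilesRecursive :: FilePath -> IO [FilePath]
listFilesRecursive dir = do
  entries <- listDirectory dir
  fmap concat . forM entries $ \name -> do
    let path = dir </> name
    isDir <- doesDirectoryExist path
    if isDir then listFilesRecursive path else pure [path]

-- Paths of files whose contents contain the needle, roughly `grep -rl`.
filesMentioning :: B.ByteString -> FilePath -> IO [FilePath]
filesMentioning needle root = do
  files <- listFilesRecursive root
  filterM (fmap (needle `B.isInfixOf`) . B.readFile) files
```

For example, `filesMentioning (B.pack "fromHtmlEscaped") "unpacked"` would list the matching source files, where `unpacked` stands for whatever directory the packages were unpacked into.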

So if any maintainers of these packages would like to weigh in on this issue, I'm all ears. @dcoutts @jaspervdj @chrisdone

@lpsmith
Owner Author

lpsmith commented Feb 21, 2015

Of course, this change also has the potential to affect the output of each one of these packages, and thus any downstream dependencies of these packages as well.

@chrisdone

I'm okay with this staying as-is. Given that it's an HTML5 parse error, even browsers would be legitimate in stripping it out, so relying on that for real applications is living on borrowed time anyway.

@jaspervdj
Collaborator

I'm also fine with stripping out the control characters.

@lpsmith lpsmith closed this as completed Apr 18, 2016
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jan 1, 2020
* 0.4.1.0
- Gain compatibility with the Semigroup/Monoid proposal
- Add Word8 HTML escaping builders
- Speed up `fromHtmlEscapedText` and `fromHtmlEscapedLazyText`

* 0.4.0.2
- Fixed warnings on GHC 7.10, courtesy of Mikhail Glushenkov.

* 0.4.0.1
- Tightened the version constraints on the bytestring package for GHC
  7.8

* 0.4.0.0
- This is now a compatibility shim for the new bytestring builder.
  Most of the old internal modules are gone.  See this blog post for
  more information:

  <http://blog.melding-monads.com/2015/02/12/announcing-blaze-builder-0-4/>

- The 'Blaze.ByteString.Builder.Html.Utf8.fromHtmlEscaped*' functions
  now strip out any ASCII control characters present in their inputs.
  See <lpsmith/blaze-builder#1> for more
  information.
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jan 14, 2020