Incorrectly handles safeHTML in RSS <title> tags #1740

Closed
antifuchs opened this Issue Jan 1, 2016 · 3 comments

@antifuchs

I've been trying to build an Atom feed template for Hugo (based on the one I was using in Octopress), and ran into a problem: any <![CDATA[ ]]> section I put in a <title> tag gets replaced with &lt;![CDATA[ ]]>, even if I use the safeHTML filter.

The workaround for me is to assign an atomworkaround xml namespace and use that in title tags, but that reduces compatibility with bad feed readers, which I'd really like to preserve.

This happens only with <title> tags, as far as I can tell.

An example

Here's an example - put this in an empty hugo installation's layouts/rss.xml file and run hugo to get a broken public/index.xml file:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:atomworkaround="http://www.w3.org/2005/Atom">
  <title type="html">{{ `<![CDATA[ ` | safeHTML }} This gets replaced but shouldn't! ]]></title>
  <atomworkaround:title type="html">{{ `<![CDATA[ ` | safeHTML }} This works! ]]></atomworkaround:title>
</feed>

The output you get will be:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:atomworkaround="http://www.w3.org/2005/Atom">
  <title type="html">&lt;![CDATA[  This gets replaced but shouldn't! ]]></title>
  <atomworkaround:title type="html"><![CDATA[  This works! ]]></atomworkaround:title>
</feed>

Note the &lt; on line 3 of the output.

Correctly working counter-example: <content>

This bug is not triggered with non-title tags. Both elements in the following example are generated correctly and the xml file has no syntax error:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:atomworkaround="http://www.w3.org/2005/Atom">
  <content type="html">{{ `<![CDATA[ ` | safeHTML }} This works! ]]></content>
  <atomworkaround:content type="html">{{ `<![CDATA[ ` | safeHTML }} This works! ]]></atomworkaround:content>
</feed>

My environment

I run hugo on OS X, from homebrew, version: Hugo Static Site Generator v0.15 BuildDate: 2015-11-26T07:29:07+01:00.

@moorereason
Collaborator

If this requires more discussion, please move it to the discussion forums. I don't think this is a bug in Hugo. This has more to do with the upstream Go templating engine.

For the title, just embed the XML.

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:atomworkaround="http://www.w3.org/2005/Atom">
  <title type="html"><![CDATA[This gets replaced but shouldn't! ]]></title>
  <atomworkaround:title type="html">{{ `<![CDATA[ ` | safeHTML }} This works! ]]></atomworkaround:title>
</feed>

@antifuchs

I'm not sure what you're seeing, but if I embed the CDATA element directly (without | safeHTML), it replaces the left angle bracket with a &lt; even outside of <title> tags. If I modify your example:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:atomworkaround="http://www.w3.org/2005/Atom">
  <title type="html"><![CDATA[This gets replaced but shouldn't! ]]></title>
  <atomworkaround:title type="html"><![CDATA[ This doesn't work either! ]]></atomworkaround:title>
  <content type="html"><![CDATA[ And neither does this ]]></content>
</feed>

I still get this:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:atomworkaround="http://www.w3.org/2005/Atom">
  <title type="html">&lt;![CDATA[This gets replaced but shouldn't! ]]></title>
  <atomworkaround:title type="html">&lt;![CDATA[ This doesn't work either! ]]></atomworkaround:title>
  <content type="html">&lt;![CDATA[ And neither does this ]]></content>
</feed>

... and that still leaves me with no way to construct a feed that actually works with the feed readers I care about.

I understand that golang's html/template is at fault here (just wrote a test program to convince myself the HTML escaper overeagerly escapes things in <title> tags - it does), but my point is that maybe Hugo should not be using HTML escape rules when generating more general XML - especially with the safeHTML filter.

@antifuchs

Got a work-around, so closing:

By pulling the start of the title tag into the unescaped portion, you can trick html/template into not escaping the start of the CDATA element:

{{ `<title type="html"><![CDATA[` | safeHTML }}{{ with .Title }}{{.}} on {{ end }}{{ .Site.Title }}]]></title>

It's pretty gross, but effective.
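
The same trick can be reproduced with plain html/template. This is a sketch under assumptions: safeHTML is a local stand-in for Hugo's filter, and the page and site structs are hypothetical substitutes for Hugo's page data, carrying only the fields the template reads.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// site and page are hypothetical stand-ins for Hugo's template data.
type site struct{ Title string }

type page struct {
	Title string
	Site  site
}

// The opening <title> tag and the CDATA opener are emitted together as
// trusted HTML, so the escaper never enters its <title> context.
const workaround = `{{ "<title type=\"html\"><![CDATA[" | safeHTML }}` +
	`{{ with .Title }}{{ . }} on {{ end }}{{ .Site.Title }}]]></title>`

// render executes the workaround template with the given data.
func render(data page) (string, error) {
	t, err := template.New("title").Funcs(template.FuncMap{
		// Local stand-in for Hugo's safeHTML (assumption for this sketch).
		"safeHTML": func(s string) template.HTML { return template.HTML(s) },
	}).Parse(workaround)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := render(page{Title: "Hello", Site: site{Title: "My Site"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // <title type="html"><![CDATA[Hello on My Site]]></title>
}
```

Note that {{ . }} and {{ .Site.Title }} are still HTML-escaped as ordinary text, so a title containing "<" or "&" will appear in entity form inside the CDATA section.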

antifuchs closed this Jan 2, 2016