"&" sometimes recognized as markup within a CDATA section rather than character data #48

Open
ScottG489 opened this issue Aug 6, 2021 · 0 comments


I'm assuming this is the code that powers the backend of https://validator.w3.org/feed/ and possibly https://www.rssboard.org/rss-validator/. However, the bug occurs only on the former site.

It seems that within certain elements containing a CDATA section, an ampersand (&) followed by a non-space character causes the validator to report the following recommendation:

Invalid HTML: Named entity expected. Got none.

The report also includes a reference to this help doc.

The conditions for this appear to be very specific: it does not reproduce for CDATA sections in every element. Here is a minimal example that reproduces the potential bug:

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Foo Bar</title>
    <link>https://example.com</link>
    <description><![CDATA[foo &bar]]></description>
    <atom:link href="https://example.com" rel="self" type="application/rss+xml"/>
    <item>
      <description><![CDATA[foo &bar]]></description>
      <guid>http://example.com/123</guid>
    </item>
  </channel>
</rss>

The recommendation is reported on line 8, the <description> nested within <item>, and not on line 5, the <description> nested within <channel>.

I tried a CDATA section inside a <title> nested within <item>, and the issue did not reproduce. It also did not reproduce inside a <description> nested within <channel>. There may be other contexts that trigger it, but so far I have only been able to reproduce it with a CDATA section inside a <description> nested within an <item>.

This suggests that, in this specific context, the validator treats the "&" inside the CDATA section as markup rather than as character data. However, the section on CDATA sections in the XML 1.0 specification states that:

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.
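To illustrate that rule, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It is only an illustration with a conforming XML parser, not necessarily the parser or code path the validator uses, and the FEED, ESCAPED, and descriptions names are hypothetical. It parses a trimmed version of the feed above and shows that both <description> elements yield the plain character data "foo &bar", identical to what the escaped "&amp;" form produces. In other words, at the XML level the ampersand is ordinary text in both contexts.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the reproduction feed: only the two <description> elements matter here.
FEED = """<rss version="2.0">
  <channel>
    <description><![CDATA[foo &bar]]></description>
    <item>
      <description><![CDATA[foo &bar]]></description>
    </item>
  </channel>
</rss>"""

# Hypothetical equivalent feed: the ampersand is escaped as &amp; instead of wrapped in CDATA.
ESCAPED = FEED.replace("<![CDATA[foo &bar]]>", "foo &amp;bar")


def descriptions(xml_text):
    """Return the character data of every <description> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("description")]


cdata_texts = descriptions(FEED)
escaped_texts = descriptions(ESCAPED)

print(cdata_texts)                   # ['foo &bar', 'foo &bar']
print(cdata_texts == escaped_texts)  # True
```

Both the <channel> and the <item> description come back as the same string, so the parser draws no distinction between the two contexts.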

Looking forward to hearing your thoughts.
