Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't escape entities inside CDATA #193

Merged
merged 1 commit into from
Oct 26, 2022
Merged

Conversation

akirk
Copy link
Contributor

@akirk akirk commented Oct 24, 2022

Laravel feeds of posts that contain quotes in titles when parsed by standard-compliant XML parsers, contain " instead of just a ". The reason is that a CDATA section is being used in combination with Blade's htmlspecialchars handling.

https://www.w3.org/TR/REC-xml/#dt-cdsection

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using " < " and " & ". CDATA sections cannot nest.

One possible solution would be to just remove the CDATA sections and let htmlspecialchars do its thing or to create a special output function for CDATA content. I chose the latter approach.

For unit tests, I have just added some "dangerous" markup to the "feedItemTitle" of the DummyItems but I am not sure this is the best approach. Please advise.

@akirk
Copy link
Contributor Author

akirk commented Oct 24, 2022

I just realized this was already addressed in the stale and auto-closed #77, this is still a problem.

@freekmurze
Copy link
Member

Thank you for this. We're not getting any reports from current users that this is currently broken.

Could you give an example where the current implementation renders the wrong results?

@akirk
Copy link
Contributor Author

akirk commented Oct 25, 2022

For example, https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Ffreek.dev%2Ffeed tells me

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.
line 27, column 77: title should not contain HTML unless declared in the type attribute: ' (4 occurrences) [help]

or https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fbenjamincrozat.com%2Ffeed

line 10, column 77: title should not contain HTML unless declared in the type attribute: ' (13 occurrences) [help]

Or when parsing a feed with an XML parser and getting the entry title with xpath:

curl -s https://freek.dev/feed | xpath -e '//feed/entry[2]/title/text()'
Found 1 nodes in stdin:
-- NODE --
What&amp;#039;s new in PHP 8.3? Take a sneak peek.

Whereas when I fix the CDATA, I get this

xpath -e '//feed/entry[2]/title/text()' freek.rss
Found 1 nodes in freek.rss:
-- NODE --
What's new in PHP 8.3? Take a sneak peek.

For those feeds the Firefox Extension RSS Preview also shows the HTML entities in the titles:
Screenshot 2022-10-25 at 11 08 09

@freekmurze freekmurze merged commit d239ea7 into spatie:main Oct 26, 2022
@freekmurze
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants