2008-04-02-184.md

---
date: 2008-04-02 21:19:22 UTC
layout: post
slug: "184"
title: "CDATA in xml.. bad idea?"
tags: [cdata, xml, wordpress]
location: Filemobile, Mowatt Ave, Toronto, ON, CA
geo: [43.635695, -79.424994]
---

While working on a simple feed parser, I came across some WordPress feeds.

I noticed that WordPress feeds make heavy use of CDATA sections to embed content. I have always thought this is a bad idea if you cannot control what ends up in the XML feed. (Example here.)

Some googling to check that I wasn't just kicking up dust brought me to an xml.com article titled 'Escaped Markup Considered Harmful', which seems to agree with my standpoint, for the following reason:

> Escaping markup, particularly with CDATA sections, just doesn't work. There are other things that might be wrong that would make the documents not well formed. There are Unicode characters that are forbidden, there are encoding issues for the characters that are allowed, and there are sequences of characters that must be avoided. (e.g., "]]>"). Not to mention the fact that CDATA sections don't nest.
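To make the quoted failure modes concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The helper name `is_well_formed` and the sample documents are made up for illustration; they are not taken from any real WordPress feed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Markup inside CDATA is fine as long as the content behaves itself.
plain = "<item><![CDATA[<b>hello</b> & goodbye]]></item>"

# The first "]]>" closes the section early; the rest is parsed as normal
# XML text, where the bare "&" is an error.
terminator = "<item><![CDATA[a ]]> b & c]]></item>"

# U+0007 (BEL) is not a legal XML 1.0 character, inside CDATA or not.
control_char = "<item><![CDATA[bell: \x07]]></item>"

results = {
    "plain": is_well_formed(plain),
    "terminator": is_well_formed(terminator),
    "control_char": is_well_formed(control_char),
}
print(results)
```

Only the first document should be accepted. The other two are rejected even though the offending content sits inside a CDATA section.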

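CDATA can't be used to just dump in any kind of content that wouldn't work in normal XML text. You're still obligated to make your data valid Unicode, using only the characters XML permits. In fact, CDATA is the more restrictive option: normal text lets you escape troublesome characters with entities, but inside a CDATA section there is no escape mechanism at all, so the ]]> sequence can never appear within one.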
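As a rough comparison of the two approaches, the sketch below wraps the same payload three ways and parses the result with Python's standard library. The function names and the payload are hypothetical. The `as_split_cdata` variant relies on the common workaround of ending the section in the middle of each `]]>` and opening a new one, which only confirms the point: a single CDATA section cannot hold the sequence.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def as_escaped_text(content: str) -> str:
    """Entity-escape &, < and > so the content is ordinary XML text."""
    return escape(content)


def as_naive_cdata(content: str) -> str:
    """Wrap the content in one CDATA section without inspecting it."""
    return "<![CDATA[" + content + "]]>"


def as_split_cdata(content: str) -> str:
    """Wrap in CDATA, splitting every "]]>" across two adjacent sections."""
    return "<![CDATA[" + content.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def round_trip(wrap, content: str):
    """Parse the wrapped content; return the recovered text or None on error."""
    try:
        return ET.fromstring("<description>" + wrap(content) + "</description>").text
    except ET.ParseError:
        return None


# Hypothetical payload that happens to contain "]]>".
payload = "<p>if (a[b[0]]>c) { x &= 1; }</p>"

escaped_result = round_trip(as_escaped_text, payload)
naive_result = round_trip(as_naive_cdata, payload)
split_result = round_trip(as_split_cdata, payload)
print(escaped_result == payload, naive_result, split_result == payload)
```

Plain escaping and the split variant should both recover the payload unchanged, while the naive wrapper produces a document the parser rejects. Neither working variant helps with characters that XML forbids outright; those still have to be removed or replaced before the feed is written.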