Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for ]]> in CDATA in minidom #64913

Closed
arturcz mannequin opened this issue Feb 21, 2014 · 10 comments
Closed

Allow for ]]> in CDATA in minidom #64913

arturcz mannequin opened this issue Feb 21, 2014 · 10 comments
Labels
topic-XML type-bug An unexpected behavior, bug, or error

Comments

@arturcz
Copy link
Mannequin

arturcz mannequin commented Feb 21, 2014

BPO 20714
Nosy @loewis, @merwok, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2014-02-23.21:59:56.376>
created_at = <Date 2014-02-21.02:04:09.505>
labels = ['expert-XML', 'type-bug']
title = 'Allow for ]]> in CDATA in minidom'
updated_at = <Date 2014-02-24.09:27:37.735>
user = 'https://bugs.python.org/arturcz'

bugs.python.org fields:

activity = <Date 2014-02-24.09:27:37.735>
actor = 'peter.otten'
assignee = 'none'
closed = True
closed_date = <Date 2014-02-23.21:59:56.376>
closer = 'loewis'
components = ['XML']
creation = <Date 2014-02-21.02:04:09.505>
creator = 'arturcz'
dependencies = []
files = []
hgrepos = ['217']
issue_num = 20714
keywords = ['patch']
message_count = 10.0
messages = ['211791', '211821', '211842', '212000', '212015', '212025', '212027', '212032', '212033', '212078']
nosy_count = 5.0
nosy_names = ['loewis', 'peter.otten', 'eric.araujo', 'serhiy.storchaka', 'arturcz']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'test needed'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20714'
versions = ['Python 3.5']

@arturcz
Copy link
Mannequin Author

arturcz mannequin commented Feb 21, 2014

Current support for ]]> inside CDATA is to raise an Exception. However, it could be solved by dividing the ]]> to two strings:

  • ]]
  • >
    and each one is a separate CDATA inside elemement. So, to put ]]> inside CDATA one can write:
    <element>
    <![CDATA[]]]]><![CDATA[>]]>
    </element>

@arturcz arturcz mannequin added topic-XML type-bug An unexpected behavior, bug, or error labels Feb 21, 2014
@merwok
Copy link
Member

merwok commented Feb 21, 2014

Does the XML spec allow that?

@merwok merwok changed the title Please allow for ]]> in CDATA in minidom.py Allow for ]]> in CDATA in minidom Feb 21, 2014
@arturcz
Copy link
Mannequin Author

arturcz mannequin commented Feb 21, 2014

Eric, I'm not sure what exactly your concern is, but I'll try to address two issues I can see.

First: both strings <![CDATA[]]]]> and <![CDATA[>]]> are a correct and valid examples of CDATA usage as per specification[1].

Second: is it allowed to have two occurences of CDATA inside one element? The same specification says only that ‟CDATA sections may occur anywhere character data may occur”. There is nothing said if multiple occurrences are allowed or disallowed.
Wikipedia suggests in [2] that it is OK, giving the same example of embedding ]]> inside CDATA. There is no hints in Talk page that this solution doesn't work for someone.
In other example [3] there is explicitly stated that: ‟the [...] application shouldn't care about the difference between abc and <![CDATA[abc]]> and <![CDATA[a]]><![CDATA[bc]]>”.

Last but not least: using following schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"\>
<xs:element name="foo"/>
</xs:schema>

following XML file:

<?xml version="1.0" ?>
<foo>
<![CDATA[]]]]><![CDATA[>]]>
</foo>

validates correctly with xmllint:
$ xmllint -noout --schema schema.xsd t.xml
t.xml validates

I hope it dissolves your concerns.

PS. I noticed I missed one ] in provided patch. There should be four of them in second parameter of replace.

[1] http://www.w3.org/TR/REC-xml/#sec-cdata-sect
[2] http://en.wikipedia.org/wiki/CDATA#Nesting
[3] http://oxygenxml.com/archives/xsl-list/200502/msg00787.html

@arturcz
Copy link
Mannequin Author

arturcz mannequin commented Feb 23, 2014

Proper patch with tests available in remote hg repo attached to this comment.

@loewis
Copy link
Mannequin

loewis mannequin commented Feb 23, 2014

Artur: Please provide the follwing information (in this bug report, or in any other bug report you create in the future)

  1. this is what I did
  2. this is what happened
  3. this is what should have happened instead
    In this case, a Python script that ought to work but currently fails would be appreciated.

I fail to see how "two CDATA sections in one element" are related to "support for ]]> inside CDATA". In your example, there is no ]]> inside CDATA, just two subsequent CDATA sections.

@arturcz
Copy link
Mannequin Author

arturcz mannequin commented Feb 23, 2014

Martin, the exact information you need are:

  1. this is what I did:
#!/usr/bin/env python
import unittest
import xmlrunner

class Foo(unittest.TestCase):
    def testFoo(self):
        self.assertTrue(False, ']]>')

unittest.main(testRunner=xmlrunner.XMLTestRunner(output='test-reports'))
  1. this is what happened:
    arturcz@szczaw:/tmp$ ./cdata.py

Running tests...
----------------------------------------------------------------------
F
======================================================================
FAIL [0.000s]: testFoo (main.Foo)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "./cdata.py", line 7, in testFoo
    self.assertTrue(False, ']]>')
AssertionError: ]]>

Ran 1 test in 0.001s

FAILED (failures=1)

Generating XML reports...
Traceback (most recent call last):
  File "./cdata.py", line 9, in <module>
    unittest.main(testRunner=xmlrunner.XMLTestRunner(output='test-reports'))
  File "/usr/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/lib/python2.7/unittest/main.py", line 232, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/lib/python2.7/dist-packages/xmlrunner/__init__.py", line 415, in run
    result.generate_reports(self)
  File "/usr/lib/python2.7/dist-packages/xmlrunner/__init__.py", line 312, in generate_reports
    xml_content = doc.toprettyxml(indent='\t')
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 58, in toprettyxml
    self.writexml(writer, "", indent, newl, encoding)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1749, in writexml
    node.writexml(writer, indent, addindent, newl)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 814, in writexml
    node.writexml(writer, indent+addindent, addindent, newl)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 814, in writexml
    node.writexml(writer, indent+addindent, addindent, newl)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 814, in writexml
    node.writexml(writer, indent+addindent, addindent, newl)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1150, in writexml
    raise ValueError("']]>' not allowed in a CDATA section")
ValueError: ']]>' not allowed in a CDATA section

and empty directory test-reports has been created.

  1. this is what should have happened instead:
    arturcz@szczaw:/tmp$ ./cdata.py

Running tests...
----------------------------------------------------------------------
F
======================================================================
FAIL [0.000s]: testFoo (main.Foo)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "./cdata.py", line 7, in testFoo
    self.assertTrue(False, ']]>')
AssertionError: ]]>

Ran 1 test in 0.001s

FAILED (failures=1)

Generating XML reports...

and file test-reports/TEST-Foo-${timestamp}.xml is created with following content:
<?xml version="1.0" ?>
<testsuite errors="0" failures="1" name="Foo-20140223203423" tests="1" time="0.000">
        <testcase classname="Foo" name="testFoo" time="0.000">
                <failure message="]]&gt;" type="AssertionError">
<![CDATA[Traceback (most recent call last):
  File "./cdata.py", line 7, in testFoo
    self.assertTrue(False, ']]]]><![CDATA[>')
AssertionError: ]]]]><![CDATA[>
]]>             </failure>
        </testcase>
        <system-out>
<![CDATA[]]>    </system-out>
        <system-err>
<![CDATA[]]>    </system-err>
</testsuite>

however, on the level of minidom.py module, there is an exact test provided in attached repository.

PS. I removed the patch by purpose - it's wrong and someone could be misleaded by it. The correct solution I propose is in the attached repository.

@loewis
Copy link
Mannequin

loewis mannequin commented Feb 23, 2014

I fail to see the bug in Python. minidom is behaving correctly. The error is in xmlrunner, which does

  error_info = str(test_result.get_error_info())
  failureText = xml_document.createCDATASection(error_info)

This is incorrect - it would have to check that error_info does not contain ]]> (since CDATA sections must not contain ]]>). If it finds ]]> in the error_info, it would have to create two CDATA sections, e.g. one up to and including ]], and the second one starting at > (repeated if there is more than one occurrence of ]]> in error_info.

Alternatively, it should just create a text node, since writing a text node will properly escape all special characters in error_info.

@arturcz
Copy link
Mannequin Author

arturcz mannequin commented Feb 23, 2014

Martin,
I partially agree with you.
After rethinking the issue I agree that changing the behavior of createCDATASection maybe is not a good idea.
On the other hand anyone in need of adding ]]> into CDATA must write a few lines of code, which will be essentially the same for each application. This part of code could be embedded into minidom module and it could be provided to each needing party. Please consider extending the interface of minidom module with method creating CDATA sections regardless of the provided data, splitting it into two CDATA if necessary.

If you disagree, feel free to close this issue.

@loewis
Copy link
Mannequin

loewis mannequin commented Feb 23, 2014

I don't think your proposal can be implemented. createXYZ, in DOM, only creates a single node, which is not yet attached at all into a tree. So if you would want a convenience function to create multiple CDATA section, the application would then still have to loop over them to insert them individually into the tree.

Personally, I would not have my application create multiple CDATA sections. Instead, I would use

if ']]>' in error_info:
    failureText = xml_document.createTextNode(error_info)
else:
    failureText = xml_document.createCDATASection(error_info)

instead. So I doubt that all application authors would really agree on a single solution to this problem.

Closing this as "won't fix".

@loewis loewis mannequin closed this as completed Feb 23, 2014
@peterotten
Copy link
Mannequin

peterotten mannequin commented Feb 24, 2014

Perhaps a look at the competition is still in order: Java silently breaks such an invalid CDATA in two, as suggested.

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.htm says
"""
No lexical check is done on the content of a CDATA section and it is therefore possible to have the character sequence "]]>" in the content, which is illegal in a CDATA section per section 2.7 of [XML 1.0]. The presence of this character sequence must generate a fatal error during serialization or the cdata section must be splitted before the serialization (see also the parameter "split-cdata-sections" in the DOMConfiguration interface).
"""

The change Artur suggested would be covered by this.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-XML type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

1 participant