Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

toxml generates output that is not well formed #45731

Closed
drtomc mannequin opened this issue Nov 5, 2007 · 21 comments
Closed

toxml generates output that is not well formed #45731

drtomc mannequin opened this issue Nov 5, 2007 · 21 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@drtomc
Copy link
Mannequin

drtomc mannequin commented Nov 5, 2007

BPO 1390
Nosy @loewis
Files
  • bug.py
  • minidom_comment.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/loewis'
    closed_at = <Date 2008-05-23.15:19:08.001>
    created_at = <Date 2007-11-05.00:58:05.520>
    labels = ['type-bug', 'library']
    title = 'toxml generates output that is not well formed'
    updated_at = <Date 2008-05-23.15:19:07.960>
    user = 'https://bugs.python.org/drtomc'

    bugs.python.org fields:

    activity = <Date 2008-05-23.15:19:07.960>
    actor = 'loewis'
    assignee = 'loewis'
    closed = True
    closed_date = <Date 2008-05-23.15:19:08.001>
    closer = 'loewis'
    components = ['Library (Lib)']
    creation = <Date 2007-11-05.00:58:05.520>
    creator = 'drtomc'
    dependencies = []
    files = ['8692', '9418']
    hgrepos = []
    issue_num = 1390
    keywords = ['patch']
    message_count = 21.0
    messages = ['57115', '57118', '57119', '57120', '57140', '57143', '57144', '57184', '57187', '57188', '57189', '57190', '57191', '57192', '57195', '57278', '62329', '62331', '64104', '64106', '67247']
    nosy_count = 5.0
    nosy_names = ['loewis', 'jafo', '_doublep', 'drtomc', 'vdupras']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1390'
    versions = ['Python 2.5']

    @drtomc drtomc mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Nov 5, 2007
    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Nov 5, 2007

    The attached script yields a non-well-formed xml document.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 5, 2007

    That's not a bug in Python, but in your script. You should not pass such
    a string to createComment.

    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Nov 5, 2007

    Either it is a bug in the DOM implementation, which should reject
    comments containing -->, a bug in toxml() which should refuse to
    serialize unserializable documents, or it is a bug in the documentation.

    cheers,
    Tom

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 5, 2007

    It's not a bug in the DOM implementation, as createCommment does not
    specify an exception in this case. It may be a bug in the W3 DOM
    specification; please report that to the W3 consortium.

    It's not a bug in toxml, which should always serialize the DOM tree if
    possible.

    As for a bug in the documentation: can you propose a change to the
    documentation that would make you happy?

    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Nov 5, 2007

    Hi Martin,

    You write:
    It's not a bug in toxml, which should always serialize the
    DOM tree if possible.

    Right! In this case it is *not* possible. The generated serialization is
    not a well formed XML document.

    Having just consulted the DOM technical reports, I see that
    createComment is specified as not generating any exceptions, so although
    it would be quite a sensible place to check that one was not creating an
    insane document, probably one should not do the check there. I think
    you're right that this *is* a bug in DOM, and I will report it there.

    Having said that, I still think that toxml should throw an exception. In
    general, if toxml succeeds, I would expect to be able to parse the result.

    I can propose a doco change, but I think such would only be a partial
    solution. Something like the following addition to the description for
    createComment

    Note that comments containing the string C{-->} may make the document
    unserializable.

    cheers,
    Tom

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 5, 2007

    I'm not willing to change minidom unless there is precedence of how to
    deal with this case. So can you find DOM implementations in other
    languages that meet the DOM spec an still reject your code?

    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Nov 5, 2007

    Hi Martin,

    toxml() is not part of the DOM, so it could be changed to throw an
    exception.

    However, I suggest doing nothing, for the moment - I've posted to the
    dom mailing list at w3, so I'll see what wisdom we get from its members.

    cheers,
    Tom

    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Nov 6, 2007

    The W3 guys had some information that helps.

    The DOM3 Core specification contains the following

    No lexical check is done on the content of a comment and it is
    therefore possible to have the character sequence "--"
    (double-hyphen) in the content, which is illegal in a comment
    per section 2.5 of [XML 1.0]. The presence of this character
    sequence must generate a fatal error during serialization.

    This suggest that toxml is does not comply with DOM3 at any rate.

    cheers,
    Tom

    @doublep
    Copy link
    Mannequin

    doublep mannequin commented Nov 7, 2007

    Looks like bad design on W3 part: postponing an error happening, though
    it wouldn't be difficult to check right in createComment(). But I guess
    minidom should still be changed to conform to standard.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 7, 2007

    Would anybody want to provide a patch, then?

    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Nov 7, 2007

    FWIW, the DOM guys considered mandating a check in createComment, but
    decided that the performance penalty was too great. I'm not sure I
    agree with them, but there you have it.

    Here are links to my query about the issue:
    http://lists.w3.org/Archives/Public/www-dom/2007OctDec/0017.html
    http://lists.w3.org/Archives/Public/www-dom/2007OctDec/0018.html

    @doublep
    Copy link
    Mannequin

    doublep mannequin commented Nov 7, 2007

    Well, it seems that allows createComment() in minidom to raise something
    implementation/language specific. I personally would prefer this (e.g.
    a ValueError) instead of raising on serialization step, as I prefer
    early error checks, when these checks obviously relate to some place in
    the code. In any way, I think that either createComment() should raise
    or toxml(), but generating non-well formed output is a bug.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 7, 2007

    It may the intention that these functions may raise exceptions in other
    cases as well - but can you also support that possibility from the text
    of the DOM spec itself?

    Adding an exception there may break existing applications, which already
    have except clauses to deal with all "possible" exceptions; now they
    break if additional exceptions become possible.

    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Nov 7, 2007

    I think the specification is reasonably clear: createComment may not
    throw an exception. The serializer must throw an exception. (Personally,
    I think they have it round the wrong way - every time you write a
    serializer you have to write code to do the check; if it was in
    createComment, you'd only have to do it once. Never mind!)

    The problem of compatibility is, as always, a nasty one: whether or not
    to potentially break code that previously worked.

    In this case, I think modifying toxml (and the other serializing
    functions in the same library) to throw an exception is pretty unlikely
    to break existing code. The *only* way to trigger the error is if you
    call createComment with bad text. Moreover, the programs which
    "succeeded" before which now fail were almost certainly producing wrong
    output before, which if it did not break downstream processing, would at
    least produce strange bits of extra character data.

    If the library is changed to throw an exception, at least it will alert
    the author/maintainer to the problem.

    I would estimate the expected number of programs to be broken by such a
    change to be about 0. :-)

    This is certainly not the first time in the history of software
    development the break or not to break issue has come up. Is there a
    precedent in the python libraries for how to deal with this kind of
    issue? Can we add a quickAndBuggy = True default parameter to toxml,
    then in a couple of releases make it mandatory, then in a couple of
    further releases remove it and the old behaviour?

    cheers,
    Tom

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 7, 2007

    The standard procedure for an incompatible change would be to add such a
    parameter to 2.6, and then change the default behavior in 2.7 (or rather
    3.1). Of course, people will not notice the change in 2.6, and then be
    surprised as much about the default change in 2.7, assuming this problem
    arises in some application (and this issue is proof that the problem
    does arise in real life - unless drtomc brought up an artificial problem).

    @doublep
    Copy link
    Mannequin

    doublep mannequin commented Nov 8, 2007

    I think unexpected exception in toxml() is not worse than producing
    unreadable output. With createComment() it is different, since you can
    e.g. create a temporary DOM tree only to discard it later and never
    serialize.

    @vdupras
    Copy link
    Mannequin

    vdupras mannequin commented Feb 12, 2008

    I wanted to start contributing to python for quite a while, so here's my very
    first try (cleaning out old patchless open tickets).

    So, whatever is the final decision on this, here's a patch.

    CDATASection.writexml() already raises ValueError when finding invalid data,
    so it seems consistent to me to extend the behavior to Comment.writexml()

    note: I can't add the "patch" keyword myself?

    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Feb 12, 2008

    On Feb 13, 2008 6:27 AM, Virgil Dupras <report@bugs.python.org> wrote:

    CDATASection.writexml() already raises ValueError when finding invalid data,
    so it seems consistent to me to extend the behavior to Comment.writexml()

    That looks fine to me.

    @jafo
    Copy link
    Mannequin

    jafo mannequin commented Mar 19, 2008

    Martin: What do you think of this patch?

    @jafo jafo mannequin assigned loewis Mar 19, 2008
    @drtomc
    Copy link
    Mannequin Author

    drtomc mannequin commented Mar 19, 2008

    On Thu, Mar 20, 2008 at 8:26 AM, Sean Reifschneider
    <report@bugs.python.org> wrote:

    Sean Reifschneider <jafo@tummy.com> added the comment:

    Martin: What do you think of this patch?

    Looks fine.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 23, 2008

    Thanks for the patch. Committed as r63563.

    Because of the new exception, I won't backport the change to 2.5.

    @loewis loewis mannequin closed this as completed May 23, 2008
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    0 participants