Apostrophe is not replaced with &apos; in ElementTree.tostring (also in Element.write) #72086

Closed
fruch mannequin opened this issue Aug 30, 2016 · 3 comments
Labels
topic-XML type-bug An unexpected behavior, bug, or error

Comments

@fruch
Mannequin

fruch mannequin commented Aug 30, 2016

BPO 27899
Nosy @scoder, @serhiy-storchaka
Superseder
  • bpo-2647: XML munges apos entity in tag content
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2016-08-30.18:21:18.624>
    created_at = <Date 2016-08-30.17:18:25.805>
    labels = ['expert-XML', 'type-bug']
    title = 'Apostrophe is not replace with &apos; ElementTree.tostring (also in Element.write)'
    updated_at = <Date 2016-09-10.05:13:24.830>
    user = 'https://bugs.python.org/fruch'

    bugs.python.org fields:

    activity = <Date 2016-09-10.05:13:24.830>
    actor = 'scoder'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-08-30.18:21:18.624>
    closer = 'serhiy.storchaka'
    components = ['XML']
    creation = <Date 2016-08-30.17:18:25.805>
    creator = 'fruch'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 27899
    keywords = []
    message_count = 3.0
    messages = ['273937', '273941', '275569']
    nosy_count = 3.0
    nosy_names = ['scoder', 'fruch', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '2647'
    type = 'behavior'
    url = 'https://bugs.python.org/issue27899'
    versions = ['Python 2.7', 'Python 3.4']

    @fruch
    Mannequin Author

    fruch mannequin commented Aug 30, 2016

    On both Python 2.7 and Python 3.4:
    >>> from xml.etree import cElementTree as ET
    >>> text = '<end>its &gt; &lt; &amp; &apos;</end>'
    >>> root = ET.fromstring(text.encode('utf-8'))
    >>> ET.tostring(root, method="xml")
    <end>its &gt; &lt; &amp; '</end>

    I would have expected it to return the same as the input, to be compliant XML 1.0.

    I would understand why it would return something different for HTML, see:
    http://stackoverflow.com/questions/2083754/why-shouldnt-apos-be-used-to-escape-single-quotes

    As a workaround I had to patch ElementTree:

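    # _escape_cdata is redefined below, so only the error helper is imported
    from xml.etree.ElementTree import _raise_serialization_error
    from mock import patch  # on Python 3: from unittest.mock import patch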
    
    def _escape_cdata(text):
        # escape character data
        try:
            # it's worth avoiding do-nothing calls for strings that are
            # shorter than 500 character, or so.  assume that's, by far,
            # the most common case in most applications.
            if "&" in text:
                text = text.replace("&", "&amp;")
            if "<" in text:
                text = text.replace("<", "&lt;")
            if ">" in text:
                text = text.replace(">", "&gt;")
            if "'" in text:
                text = text.replace("'", "&apos;")
            return text
        except (TypeError, AttributeError):
            _raise_serialization_error(text)
    
    from xml.etree import cElementTree as ET

    text = '<end>its &gt; &lt; &amp; &apos;</end>'
    root = ET.fromstring(text.encode('utf-8'))

    with patch('xml.etree.ElementTree._escape_cdata', new=_escape_cdata):
        s = ET.tostring(root, encoding='unicode', method="xml")
    print(s)
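
If the goal is only to escape a string by hand, rather than to change what `ET.tostring` emits, a rough sketch with the public `xml.sax.saxutils.escape` helper avoids patching private ElementTree functions. The name `escape_text` is hypothetical and introduced only for this illustration.

```python
from xml.sax.saxutils import escape

def escape_text(text):
    # Hypothetical helper: escape() handles &, < and >;
    # the extra mapping additionally turns the apostrophe into &apos;.
    return escape(text, {"'": "&apos;"})

print(escape_text("its > < & '"))
```

This does not affect ElementTree's own serializer; it only applies to strings that are written out manually.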

    @fruch fruch mannequin added topic-XML type-bug An unexpected behavior, bug, or error labels Aug 30, 2016
    @fruch
    Mannequin Author

    fruch mannequin commented Aug 30, 2016

    I've now found http://bugs.python.org/issue2647, and it seems this was classified as not a bug.

    Maybe the documentation should say so? Or there could be another way to actually decide how those characters are output.

    @scoder
    Contributor

    scoder commented Sep 10, 2016

    Definitely not a bug since this isn't required by the XML spec. As said in bpo-2647, you shouldn't rely on exact lexical characteristics of an XML byte stream, unless you request canonical serialisation (C14N).
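
A minimal sketch of what this means in practice, assuming Python 3: both spellings parse to the same character data, so comparing parsed content instead of serialized bytes is the more robust check. The helper name `same_text` is made up for this example.

```python
import xml.etree.ElementTree as ET

def same_text(xml_a, xml_b):
    # Hypothetical helper: compare root tag and text of two documents after parsing.
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    return a.tag == b.tag and a.text == b.text

with_entity = "<end>its &gt; &lt; &amp; &apos;</end>"
with_literal = "<end>its &gt; &lt; &amp; '</end>"
print(same_text(with_entity, with_literal))

# The serialized output also parses back to the same text.
root = ET.fromstring(with_entity)
round_trip = ET.tostring(root, encoding="unicode")
print(ET.fromstring(round_trip).text == root.text)
```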

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022