Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minidom does not have to escape quote inside text segments #81555

Closed
mitar mannequin opened this issue Jun 22, 2019 · 12 comments · Fixed by #107947
Closed

Minidom does not have to escape quote inside text segments #81555

mitar mannequin opened this issue Jun 22, 2019 · 12 comments · Fixed by #107947
Labels
topic-XML type-feature A feature request or enhancement

Comments

@mitar
Copy link
Mannequin

mitar mannequin commented Jun 22, 2019

BPO 37374
Nosy @scoder, @mitar, @serhiy-storchaka, @viplezer
PRs
  • bpo-37374: Do not escape quotes in minidom inside text segments #14312
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-06-22.20:34:20.473>
    labels = ['expert-XML', 'type-feature']
    title = 'Minidom does not have to escape quote inside text segments'
    updated_at = <Date 2021-03-10.10:53:53.152>
    user = 'https://github.com/mitar'

    bugs.python.org fields:

    activity = <Date 2021-03-10.10:53:53.152>
    actor = 'viplezer'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['XML']
    creation = <Date 2019-06-22.20:34:20.473>
    creator = 'mitar'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37374
    keywords = ['patch']
    message_count = 8.0
    messages = ['346296', '346310', '346341', '346350', '346892', '346894', '388423', '388424']
    nosy_count = 5.0
    nosy_names = ['scoder', 'mitar', 'serhiy.storchaka', 'dhilst', 'viplezer']
    pr_nums = ['14312']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue37374'
    versions = ['Python 3.6']

    @mitar
    Copy link
    Mannequin Author

    mitar mannequin commented Jun 22, 2019

    I am using Minidom to pretty-print XML. But currently if there is a quote inside a text segment it escapes it to &quot;

    To my understanding this is unnecessary if all other symbols are escaped. This escaping makes it really ugly and defeats the purpose of me using Minidom for pretty-printing XML.

    @mitar mitar mannequin added topic-XML type-feature A feature request or enhancement labels Jun 22, 2019
    @dhilst
    Copy link
    Mannequin

    dhilst mannequin commented Jun 23, 2019

    Wouldn't be better to support this as a parameter? Escaping is pretty useful in HTML contexts

    @mitar
    Copy link
    Mannequin Author

    mitar mannequin commented Jun 23, 2019

    FYI, this is exactly how ElementTree.tostring does it. So this would make ElementTree.tostring behave the same as minidom.

    @dhilst: I do not think a parameter is needed here. This is completely compatible with HTML. It is just that currently an additional unnecessary escaping (for both XML in HTML context) is done.

    @dhilst
    Copy link
    Mannequin

    dhilst mannequin commented Jun 24, 2019

    This changes behavior in an irreversible way. A parameter would make it reversible.

    @mitar
    Copy link
    Mannequin Author

    mitar mannequin commented Jun 29, 2019

    Sure, but is old behavior useful in any use case? Every bugfix changes old behavior in an irreversible way.

    So in which use case you want the old behavior? Can you elaborate here?

    @dhilst
    Copy link
    Mannequin

    dhilst mannequin commented Jun 29, 2019

    Not really, I don't have a use case here. I'm just warning that this would break user code that relies on old behavior.

    Anyway is possible to add new behavior without changing the old one. A parameter would make this possible, for example.

    This is only my opinion, let's other argue too

    @viplezer
    Copy link
    Mannequin

    viplezer mannequin commented Mar 10, 2021

    Almost 2 years later I only registered to agree with Daniel. This is extremely annoying.

    Use case: I am generating XMLs for Apigee, where my conditions contain quotes and there's no need to escape them.

    @viplezer
    Copy link
    Mannequin

    viplezer mannequin commented Mar 10, 2021

    Sorry, to agree with @mitar

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @Swarf
    Copy link

    Swarf commented Aug 12, 2023

    Any chance we can take action on this? I agree with the concern. XML standard says that escaping quotes in content data is optional. minidom is forcing a change to perfectly valid XML.

    @scoder
    Copy link
    Contributor

    scoder commented Aug 13, 2023

    I read the spec, and this case is actually not explicitly specified. The spec says that &quot; may be used in order to support quotes in attribute values, but it does not forbid it otherwise. So, this behaviour is not wrong.

    Also, the fact that other libraries behave differently does not mean that minidom must. It's been around for more than 20 years, and the current behaviour is probably just as old and settled.

    According to the comments above, there seem to be use cases where this can be an annoyance, apparently, but even there, the result is not wrong, just a bit less readable than it could be.

    While only C14N makes guarantees w.r.t. the exact byte serialisation (so neither minidom not ElementTree need to), any change here will break someone's code out there, so there is a downside to changing the output. A deliberate change without need means deliberate breakage without need.

    To me, there is no definite reason to change the current behaviour.

    As much as I second the preference for clean XML output, I'd rather like to avoid changing the output without a compelling reason.

    @scoder scoder closed this as completed Aug 13, 2023
    @Swarf
    Copy link

    Swarf commented Aug 13, 2023

    The spec also doesn't say that a serializer shouldn't turn all. the Zs into Ys, but I'm pretty sure we would all consider that a bug. If user data would be valid XML, why modify it regardless of user intent?

    If the answer is simply "go use a different library which doesn't have an opinion on this", then I will. Just disappointed that minidom isn't usable at present for cases where a user cares about quotes in text.

    @scoder
    Copy link
    Contributor

    scoder commented Aug 15, 2023

    Reopening since #107947 gives us a reason to change the output, so we can solve this issue along the way.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-XML type-feature A feature request or enhancement
    Projects
    None yet
    Development

    Successfully merging a pull request may close this issue.

    2 participants