
Add option to validate ElementTree during serialization #149468

@serhiy-storchaka

Description


Feature or enhancement

Proposal:

An ElementTree can be serialized to XML or HTML. Special characters such as & and < are escaped in text and attribute values. However, there is no way to escape arbitrary characters in element and attribute names, comments, processing instructions, or the content of HTML elements such as <script>. In addition, some characters cannot be represented in XML or HTML at all; the null character is one example.
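The sketch below illustrates the current behaviour using only the existing `xml.etree.ElementTree` API, with no `validate` option. It shows which parts of a tree are escaped and which are written verbatim. The names, comment text and PI text are arbitrary examples chosen for illustration.

```python
import xml.etree.ElementTree as ET


def show(elem, method="xml"):
    """Serialize an element, comment or PI to a str (no XML declaration)."""
    return ET.tostring(elem, encoding="unicode", method=method)


# Text and attribute values ARE escaped by the serializer.
safe = ET.Element("root", attr='a<b & "c"')
safe.text = "x < y & z"
escaped = show(safe)

# Element names, comments and processing instructions are written verbatim.
bad_name = show(ET.Element("bad name"))
comment = show(ET.Comment("a -- b"))
pi = show(ET.ProcessingInstruction("pi", "?> oops"))

# In HTML output, the text of <script> (and <style>) is not escaped either.
script = ET.Element("script")
script.text = "</script><b>injected</b>"
html = show(script, method="html")

# The null character is written as is, but an XML parser rejects it.
nul = ET.Element("root")
nul.text = "a\x00b"
nul_xml = show(nul)
try:
    ET.fromstring(nul_xml)
    parse_failed = False
except ET.ParseError:
    parse_failed = True

for result in (escaped, bad_name, comment, pi, html, repr(nul_xml), parse_failed):
    print(result)
```

Only `escaped` contains entity references. The other outputs are either not well-formed or change meaning when parsed back.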

This is rarely a problem in practice, because the structure of an element tree is typically hard-coded and its variable parts have reasonable values. But if the tree is built from arbitrary user data, elements can be injected. For example, a comment with the text '--><tag>...</tag><!--' closes the comment early, so <tag>...</tag> appears in the output as a real element.
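Here is a minimal reproduction of the comment injection, again using only the existing API. `user_comment` stands in for hypothetical untrusted input. Parsing the serialized output back shows that `<tag>` has become an ordinary child element.

```python
import xml.etree.ElementTree as ET

# Stand-in for untrusted input that the application stores as a comment.
user_comment = "--><tag>...</tag><!--"

root = ET.Element("root")
root.append(ET.Comment(user_comment))
output = ET.tostring(root, encoding="unicode")
print(output)

# Parsing the output back: comments are dropped by the default parser,
# but <tag> survives as a real child element of <root>.
reparsed = ET.fromstring(output)
print([child.tag for child in reparsed])
```

The serialized output is `<root><!----><tag>...</tag><!----></root>`, which consists of two empty comments with an injected element between them.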

Ensuring that the element tree is valid is considered the user's responsibility, but we can help. The proposed PR adds a validate option to the serialization functions; when it is true, validation is enabled. It is a serialization option because the rules differ for XML and HTML.
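Until such an option is available, a caller could check a tree before serializing it. The sketch below is hypothetical. `check_tree` and its helpers are not part of ElementTree and are not the PR's implementation, whose exact rules may differ. Its name rule is also an assumption and is simpler than the XML `Name` production. The sketch rejects the following:

- characters outside the XML 1.0 `Char` range
- invalid names
- comments that contain `--` or end in `-`
- processing instructions that contain `?>`
- for HTML only, `<script>` or `<style>` text that would close the element early

```python
import re
import xml.etree.ElementTree as ET

# Characters outside the XML 1.0 "Char" production (this includes "\x00").
_INVALID_CHAR = re.compile(r"[^\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]")
# Simplified name rule (an assumption, not the full XML "Name" production):
# a letter or "_", followed by letters, digits, "_", ".", ":" or "-".
_NAME = re.compile(r"[^\W\d][\w.:-]*")


def _check_chars(s):
    if s and _INVALID_CHAR.search(s):
        raise ValueError(f"character not allowed in XML: {s!r}")


def _check_name(name):
    if name.startswith("{"):  # ElementTree's "{namespace-uri}local" form
        name = name.partition("}")[2]
    if not _NAME.fullmatch(name):
        raise ValueError(f"invalid name: {name!r}")


def check_tree(root, method="xml"):
    """Hypothetical pre-serialization check; raises ValueError on the first problem."""
    for elem in root.iter():  # also yields comments and processing instructions
        _check_chars(elem.text)
        _check_chars(elem.tail)
        text = elem.text or ""
        if elem.tag is ET.Comment:
            if "--" in text or text.endswith("-"):
                raise ValueError(f"invalid comment: {text!r}")
        elif elem.tag is ET.ProcessingInstruction:
            if "?>" in text:
                raise ValueError(f"invalid processing instruction: {text!r}")
        else:
            _check_name(elem.tag)
            for key, value in elem.items():
                _check_name(key)
                _check_chars(value)
            ltag = elem.tag.lower()
            if method == "html" and ltag in ("script", "style"):
                # The HTML serializer writes this text raw, so it must not end the element.
                if f"</{ltag}" in text.lower():
                    raise ValueError(f"unsafe <{elem.tag}> content: {text!r}")


root = ET.Element("root")
root.append(ET.Comment("--><tag>...</tag><!--"))
try:
    check_tree(root)
except ValueError as exc:
    print("rejected:", exc)
```

The checker takes the output method as an argument because a tree can be safe to serialize as XML but unsafe as HTML. For example, `<script>` text is escaped in XML output but written raw in HTML output. This mirrors the issue's reasoning for making validation a serialization option.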

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

Metadata


Assignees

No one assigned

Labels

stdlib (Standard Library Python modules in the Lib/ directory), topic-XML, type-feature (A feature request or enhancement)
