Add utility functions to test for legal XML characters and names

# Feature or enhancement

It is a well-known fact that some characters (such as `<` or `&`) must be escaped in XML and HTML, and that attribute values must be quoted. It is a less well-known fact (unless you have specifically looked for it) that not all characters can be included in XML, even if escaped. For example, XML cannot contain the null character.
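
A minimal sketch of the problem, using only existing stdlib calls (`xml.sax.saxutils.escape()` and `xml.etree.ElementTree.fromstring()`). The helper name `parses()` is hypothetical and exists only for this illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(document):
    """Return True if the expat-based parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# escape() only rewrites '&', '<' and '>'; a control character passes through.
text = "bell: \x07"
escaped = escape(text)
raw_control_ok = parses(f"<root>{escaped}</root>")

# Writing the character as a numeric reference does not help either.
null_reference_ok = parses("<root>&#0;</root>")

# Ordinary escaped markup characters are accepted.
ampersand_ok = parses(f"<root>{escape('a & b')}</root>")
```

Here both `raw_control_ok` and `null_reference_ok` should come out false, while `ampersand_ok` is true: escaping alone cannot make such characters legal.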

There are also restrictions on the names of elements and attributes. They cannot contain `<`, `>`, `/`, `!`, `?`, spaces and many other characters. But unlike Python identifiers, they can contain `-`, `:`, `.`, etc. The list of valid and invalid characters is pretty long. When user input is used for element or attribute names without validation, this can even lead to an XML injection vulnerability ([CVE-2025-9375](https://nvd.nist.gov/vuln/detail/CVE-2025-9375)).
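
To illustrate the injection risk, here is a small sketch of what can happen when an attribute name comes from untrusted input. The `<user>` element and the variable names are made up for the example, and it is not meant to reproduce the linked CVE:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical untrusted input: the "name" smuggles in a whole extra attribute.
attr_name = 'role="admin" note'
attr_value = "hi"

# The value is quoted correctly, but the name is inserted without validation.
document = f"<user {attr_name}={quoteattr(attr_value)}/>"

# The result is well-formed XML with an attribute the author never intended.
attributes = ET.fromstring(document).attrib
```

After parsing, `attributes` contains both `role` and `note`, although the code was written to emit a single attribute.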

So I think it would be useful to provide standard functions to validate XML characters and names in the stdlib. `xml.sax.saxutils` looks like an appropriate place, since it already has the `escape()` and `quoteattr()` utilities.

We can also provide functions to "sanitize" XML characters, similar to `sanitize_xml()` in `Lib/test/libregrtest/utils.py`, but more general. This is similar to the old issue #63014.
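
As a rough sketch of what a general sanitizer could look like (this is not the code from `Lib/test/libregrtest/utils.py`; the name `sanitize_xml10_text()` and its `replacement` parameter are assumptions made for illustration), one option is to replace every character outside the XML 1.0 `Char` production:

```python
import re

# Everything NOT matched by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML10_RE = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_xml10_text(text, replacement="\N{REPLACEMENT CHARACTER}"):
    """Replace each character that XML 1.0 forbids with `replacement`.

    Hypothetical helper; the result still has to be escaped as usual.
    """
    return _ILLEGAL_XML10_RE.sub(replacement, text)


cleaned = sanitize_xml10_text("a\x00b\x07c\ufffe")
untouched = sanitize_xml10_text("tab\tnewline\n")
```

Whether illegal characters should be replaced, dropped, or rendered as a visible escape is a design choice that a real API would probably need to expose.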

Now, the problem is that there are two versions of the XML standard: 1.0 and 1.1. The former is much more popular. They have different definitions of legal characters. There are also restricted characters in XML 1.1 which cannot be used in "well-formed" documents and parsed entities. There is also a version-dependent set of characters whose use is legal but discouraged. Should we have several functions, or several parameters to specify the XML version and other options?

https://www.w3.org/TR/xml/#charsets
https://www.w3.org/TR/xml11/#charsets
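
One possible shape for such an API, sketched with a single function and keyword parameters. The name `is_xml_char()` and the parameters `version` and `allow_restricted` are assumptions for illustration, not a proposal for the final signature; the ranges follow the `Char` and `RestrictedChar` productions linked above:

```python
# Ranges shared by the Char production of both versions.
_COMMON = ((0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF))
# RestrictedChar production of XML 1.1.
_RESTRICTED_11 = ((0x1, 0x8), (0xB, 0xC), (0xE, 0x1F), (0x7F, 0x84), (0x86, 0x9F))


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_xml_char(ch, version="1.0", allow_restricted=True):
    """Check a single character against the Char production (hypothetical API)."""
    code_point = ord(ch)
    if version == "1.0":
        return code_point in (0x9, 0xA, 0xD) or _in_ranges(code_point, _COMMON)
    if version == "1.1":
        # XML 1.1 Char additionally covers #x1-#x1F.
        if not (0x1 <= code_point <= 0x1F or _in_ranges(code_point, _COMMON)):
            return False
        return allow_restricted or not _in_ranges(code_point, _RESTRICTED_11)
    raise ValueError(f"unsupported XML version: {version!r}")
```

For example, `is_xml_char("\x07")` is false for 1.0 but true for 1.1 unless `allow_restricted=False` is passed, and the null character is rejected in both versions.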

Fortunately, the syntax for names is the same in XML 1.0 and 1.1.

https://www.w3.org/TR/xml/#NT-Name
https://www.w3.org/TR/xml11/#NT-Name
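
Because the `Name` production is shared, a single check could serve both versions. A minimal sketch based on the `NameStartChar` and `NameChar` productions from the links above; the function name `is_xml_name()` is hypothetical:

```python
import re

# NameStartChar, written as the inside of a regex character class.
_NAME_START = (
    r":A-Z_a-z\xC0-\xD6\xD8-\xF6\xF8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    r"\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds '-', '.', digits, #xB7 and two combining/connector ranges.
_NAME_CHAR = _NAME_START + r"\-.0-9\xB7\u0300-\u036F\u203F-\u2040"

_NAME_RE = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")


def is_xml_name(name):
    """Return True if `name` matches the XML Name production (sketch)."""
    return _NAME_RE.fullmatch(name) is not None
```

With this sketch, `is_xml_name("a-b.c:d")` is true, while names such as `"-a"`, `"1x"`, `"a b"`, `"a><b"` and the empty string are rejected. Namespace-aware code would likely also want a variant that disallows `:` (an NCName check), which is outside this sketch.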

We can also add a similar set of functions for HTML.


### Linked PRs
* gh-139768

Add utility functions to test for legal XML characters and names #139489
