Processing non-ascii tags and attributes in HTMLParser #141756

@serhiy-storchaka

Bug report

html.parser.HTMLParser converts the names of tags and attributes to lower case with str.lower(). But the HTML5 specification only prescribes converting ASCII upper alpha characters (A-Z) to lower case.

Some non-ASCII characters are converted by str.lower() to ASCII lowercase characters, alone or followed by a combining mark (e.g. "K" (U+212A, KELVIN SIGN) -> "k", "İ" (U+0130) -> "i" followed by U+0307). Names containing such characters are parsed differently by HTMLParser than by other parsers or browsers.
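A minimal sketch of the difference, for illustration only. The helper `ascii_lower` and the class `TagCollector` are hypothetical names introduced here and are not part of `html.parser`. The tag name reported by the parser depends on the Python version, so no output is shown for it.

```python
from html.parser import HTMLParser

# Hypothetical helper: lowercase only ASCII A-Z, as HTML5 prescribes.
_ASCII_LOWER = {ord(c): ord(c) + 0x20 for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"}

def ascii_lower(name: str) -> str:
    return name.translate(_ASCII_LOWER)

KELVIN = "\u212a"        # KELVIN SIGN, renders like "K"
name = "LIN" + KELVIN    # looks like "LINK" but is a different string

print(name.lower() == "link")             # True: str.lower() maps U+212A to "k"
print(ascii_lower(name) == "lin\u212a")   # True: the non-ASCII character is kept

# Hypothetical subclass that records the start tag names the parser reports.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed("<lin" + KELVIN + ">")
collector.close()
print([ascii(tag) for tag in collector.tags])  # version-dependent
```

With str.lower(), the tag above becomes indistinguishable from `<link>`, which is why the mismatch with other parsers matters.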

Labels

3.13: bugs and security fixes
3.14: bugs and security fixes
3.15: new features, bugs and security fixes
stdlib: Standard Library Python modules in the Lib/ directory
type-bug: An unexpected behavior, bug, or error