Open
Labels
- 3.13: bugs and security fixes
- 3.14: bugs and security fixes
- 3.15: new features, bugs and security fixes
- stdlib: Standard Library Python modules in the Lib/ directory
- type-bug: An unexpected behavior, bug, or error
Description
Bug report
html.parser.HTMLParser converts the names of tags and attributes to lower case. However, the HTML5 specification prescribes converting only ASCII upper alpha characters (A-Z) to lower case:
- https://html.spec.whatwg.org/multipage/parsing.html#tag-name-state
- https://html.spec.whatwg.org/multipage/parsing.html#attribute-name-state
Some non-ASCII characters are converted to ASCII lowercase characters (e.g. "ß" -> "ss", "K" (U+212A) -> "k", "ſ" -> "s"). Names containing such characters will therefore be parsed differently by HTMLParser than by other parsers or browsers that follow the specification.
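The difference between Unicode-aware case conversion and the ASCII-only lowercasing described in the specification can be illustrated with a minimal sketch. The helper `ascii_lower` is hypothetical and is not part of `html.parser`; it only shows one possible way to restrict the conversion to A-Z. The report does not say which string operation each code path in HTMLParser uses, so the `str.lower()` and `str.casefold()` calls below only show how the listed mappings can arise with Python's built-in string methods.

```python
# Hypothetical helper (not part of html.parser): lowercase only ASCII A-Z,
# leaving every other character untouched.
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "abcdefghijklmnopqrstuvwxyz",
)


def ascii_lower(name: str) -> str:
    """Return name with ASCII upper alpha characters mapped to lower case."""
    return name.translate(_ASCII_LOWER)


kelvin = "\u212a"  # KELVIN SIGN, visually similar to "K"

# Unicode-aware lowercasing maps the Kelvin sign to ASCII "k".
print(kelvin.lower() == "k")        # True
# Case folding produces the other mappings mentioned in the report.
print("ß".casefold() == "ss")       # True
print("ſ".casefold() == "s")        # True

# ASCII-only lowercasing leaves these characters unchanged.
print(ascii_lower(kelvin) == kelvin)  # True
print(ascii_lower("ß") == "ß")        # True
print(ascii_lower("DIV"))             # div
```

With ASCII-only lowercasing, a name containing U+212A stays distinct from the same name spelled with an ASCII "k", which matches the behavior the specification describes.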