Skip to content

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Jul 4, 2025

  • Whitespaces no longer accepted between </ and the tag name. E.g. </ script> does not end the script section.

  • Vertical tabulation (\v) and non-ASCII whitespaces no longer recognized as whitespaces. The only whitespaces are \t\n\r\f .

  • Null character (U+0000) no longer ends the tag name.

  • Attributes and slashes after the tag name in end tags are now ignored, instead of terminating after the first > in quoted attribute value. E.g. </script/foo=">"/>.

  • Multiple slashes and whitespaces between the last attribute and closing > are now ignored in both start and end tags. E.g. <a foo=bar/ //>.

  • Multiple = between attribute name and value are no longer collapsed. E.g. <a foo==bar> produces attribute "foo" with value "=bar".

  • Whitespaces between the = separator and attribute name or value are no longer ignored. E.g. <a foo =bar> produces two attributes "foo" and "=bar", both with value None; <a foo= bar> produces two attributes: "foo" with value "" and "bar" with value None.

  • Fix data loss after unclosed script or style tag (htmlparser unclosed script tag causes data loss #86155).

Also backport test.support.subTests() (gh-135120).


(cherry picked from commit 0243f97)

…according to the HTML5 standard (pythonGH-135930)

* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.

* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
  as whitespaces. The only whitespaces are `\t\n\r\f `.

* Null character (U+0000) no longer ends the tag name.

* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.

* Multiple slashes and whitespaces between the last attribute and closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.

* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".

* Whitespaces between the `=` separator and attribute name or value are no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.

* Fix data loss after unclosed script or style tag (pythongh-86155).

Also backport test.support.subTests() (pythongh-135120).

---------
(cherry picked from commit 0243f97)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
@ambv ambv merged commit c555f88 into python:3.12 Jul 4, 2025
35 checks passed
@miss-islington-app
Copy link

Thanks @serhiy-storchaka for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10, 3.11.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 4, 2025
…according to the HTML5 standard (pythonGH-135930) (pythonGH-136268)

* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.

* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
  as whitespaces. The only whitespaces are `\t\n\r\f `.

* Null character (U+0000) no longer ends the tag name.

* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.

* Multiple slashes and whitespaces between the last attribute and closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.

* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".

* Whitespaces between the `=` separator and attribute name or value are no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.

* Fix data loss after unclosed script or style tag (pythongh-86155).

Also backport test.support.subTests() (pythongh-135120).

---------
(cherry picked from commit 0243f97)
(cherry picked from commit c555f88)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 4, 2025
…according to the HTML5 standard (pythonGH-135930) (pythonGH-136268)

* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.

* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
  as whitespaces. The only whitespaces are `\t\n\r\f `.

* Null character (U+0000) no longer ends the tag name.

* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.

* Multiple slashes and whitespaces between the last attribute and closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.

* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".

* Whitespaces between the `=` separator and attribute name or value are no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.

* Fix data loss after unclosed script or style tag (pythongh-86155).

Also backport test.support.subTests() (pythongh-135120).

---------
(cherry picked from commit 0243f97)
(cherry picked from commit c555f88)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
@bedevere-app
Copy link

bedevere-app bot commented Jul 4, 2025

GH-136291 is a backport of this pull request to the 3.11 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.11 only security fixes label Jul 4, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 4, 2025
…according to the HTML5 standard (pythonGH-135930) (pythonGH-136268)

* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.

* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
  as whitespaces. The only whitespaces are `\t\n\r\f `.

* Null character (U+0000) no longer ends the tag name.

* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.

* Multiple slashes and whitespaces between the last attribute and closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.

* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".

* Whitespaces between the `=` separator and attribute name or value are no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.

* Fix data loss after unclosed script or style tag (pythongh-86155).

Also backport test.support.subTests() (pythongh-135120).

---------
(cherry picked from commit 0243f97)
(cherry picked from commit c555f88)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
@bedevere-app
Copy link

bedevere-app bot commented Jul 4, 2025

GH-136292 is a backport of this pull request to the 3.10 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.10 only security fixes label Jul 4, 2025
@bedevere-app
Copy link

bedevere-app bot commented Jul 4, 2025

GH-136293 is a backport of this pull request to the 3.9 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.9 only security fixes label Jul 4, 2025
ambv pushed a commit that referenced this pull request Jul 12, 2025
…ing to the HTML5 standard (GH-135930) (GH-136268) (#136291)

* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.

* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
  as whitespaces. The only whitespaces are `\t\n\r\f `.

* Null character (U+0000) no longer ends the tag name.

* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.

* Multiple slashes and whitespaces between the last attribute and closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.

* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".

* Whitespaces between the `=` separator and attribute name or value are no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.

* Fix data loss after unclosed script or style tag (gh-86155).

Also backport test.support.subTests() (gh-135120).

---------
(cherry picked from commit 0243f97)
(cherry picked from commit c555f88)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
ambv pushed a commit that referenced this pull request Jul 12, 2025
…ing to the HTML5 standard (GH-135930) (GH-136268) (#136292)

* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.

* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
  as whitespaces. The only whitespaces are `\t\n\r\f `.

* Null character (U+0000) no longer ends the tag name.

* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.

* Multiple slashes and whitespaces between the last attribute and closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.

* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".

* Whitespaces between the `=` separator and attribute name or value are no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.

* Fix data loss after unclosed script or style tag (gh-86155).

Also backport test.support.subTests() (gh-135120).

---------
(cherry picked from commit 0243f97)
(cherry picked from commit c555f88)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
ambv pushed a commit that referenced this pull request Jul 12, 2025
…ng to the HTML5 standard (GH-135930) (GH-136268) (#136293)

* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.

* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
  as whitespaces. The only whitespaces are `\t\n\r\f `.

* Null character (U+0000) no longer ends the tag name.

* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.

* Multiple slashes and whitespaces between the last attribute and closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.

* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".

* Whitespaces between the `=` separator and attribute name or value are no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.

* Fix data loss after unclosed script or style tag (gh-86155).

Also backport test.support.subTests() (gh-135120).

---------
(cherry picked from commit 0243f97)
(cherry picked from commit c555f88)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
gentoo-bot pushed a commit to gentoo/cpython that referenced this pull request Jul 30, 2025
…ccording to the HTML5 standard (pythonGH-135930) (pythonGH-136268) (python#136293)

* Whitespaces no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.

* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
  as whitespaces. The only whitespaces are `\t\n\r\f `.

* Null character (U+0000) no longer ends the tag name.

* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating after the first `>` in quoted attribute value.
  E.g. `</script/foo=">"/>`.

* Multiple slashes and whitespaces between the last attribute and closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.

* Multiple `=` between attribute name and value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".

* Whitespaces between the `=` separator and attribute name or value are no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.

* Fix data loss after unclosed script or style tag (pythongh-86155).

Also backport test.support.subTests() (pythongh-135120).

---------
(cherry picked from commit 0243f97)
(cherry picked from commit c555f88)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Waylan Limberg <waylan.limberg@icloud.com>
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-security A security issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants