Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test.test_xml_etree*.XMLPullParserTest.test_simple_xml fails with (system) expat 2.6.0 #115133

Closed
mgorny opened this issue Feb 7, 2024 · 12 comments
Labels
topic-XML type-bug An unexpected behavior, bug, or error

Comments

@mgorny
Copy link
Contributor

mgorny commented Feb 7, 2024

Bug report

Bug description:

Expat 2.6.0 was released yesterday, with CVE fixes. After upgrading the system library and building CPython --with-system-expat, I'm getting the following test failures:

======================================================================
FAIL: test_simple_xml (test.test_xml_etree.XMLPullParserTest.test_simple_xml) (chunk_size=1)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mgorny/git/cpython/Lib/test/test_xml_etree.py", line 1495, in test_simple_xml
    self.assert_event_tags(parser, [('end', 'element')])
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/git/cpython/Lib/test/test_xml_etree.py", line 1480, in assert_event_tags
    self.assertEqual([(action, elem.tag) for action, elem in events],
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     expected)
                     ^^^^^^^^^
AssertionError: Lists differ: [] != [('end', 'element')]

Second list contains 1 additional elements.
First extra element 0:
('end', 'element')

- []
+ [('end', 'element')]

======================================================================
FAIL: test_simple_xml (test.test_xml_etree.XMLPullParserTest.test_simple_xml) (chunk_size=5)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mgorny/git/cpython/Lib/test/test_xml_etree.py", line 1498, in test_simple_xml
    self.assert_event_tags(parser, [
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
        ('end', 'element'),
        ^^^^^^^^^^^^^^^^^^^
        ('end', 'empty-element'),
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        ])
        ^^
  File "/home/mgorny/git/cpython/Lib/test/test_xml_etree.py", line 1480, in assert_event_tags
    self.assertEqual([(action, elem.tag) for action, elem in events],
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     expected)
                     ^^^^^^^^^
AssertionError: Lists differ: [('end', 'element')] != [('end', 'element'), ('end', 'empty-element')]

Second list contains 1 additional elements.
First extra element 1:
('end', 'empty-element')

- [('end', 'element')]
+ [('end', 'element'), ('end', 'empty-element')]

----------------------------------------------------------------------
======================================================================
FAIL: test_simple_xml (test.test_xml_etree_c.XMLPullParserTest.test_simple_xml) (chunk_size=1)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mgorny/git/cpython/Lib/test/test_xml_etree.py", line 1495, in test_simple_xml
    self.assert_event_tags(parser, [('end', 'element')])
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgorny/git/cpython/Lib/test/test_xml_etree.py", line 1480, in assert_event_tags
    self.assertEqual([(action, elem.tag) for action, elem in events],
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     expected)
                     ^^^^^^^^^
AssertionError: Lists differ: [] != [('end', 'element')]

Second list contains 1 additional elements.
First extra element 0:
('end', 'element')

- []
+ [('end', 'element')]

======================================================================
FAIL: test_simple_xml (test.test_xml_etree_c.XMLPullParserTest.test_simple_xml) (chunk_size=5)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mgorny/git/cpython/Lib/test/test_xml_etree.py", line 1498, in test_simple_xml
    self.assert_event_tags(parser, [
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
        ('end', 'element'),
        ^^^^^^^^^^^^^^^^^^^
        ('end', 'empty-element'),
        ^^^^^^^^^^^^^^^^^^^^^^^^^
        ])
        ^^
  File "/home/mgorny/git/cpython/Lib/test/test_xml_etree.py", line 1480, in assert_event_tags
    self.assertEqual([(action, elem.tag) for action, elem in events],
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     expected)
                     ^^^^^^^^^
AssertionError: Lists differ: [('end', 'element')] != [('end', 'element'), ('end', 'empty-element')]

Second list contains 1 additional elements.
First extra element 1:
('end', 'empty-element')

- [('end', 'element')]
+ [('end', 'element'), ('end', 'empty-element')]

----------------------------------------------------------------------

I have reproduced with 3.11.8, 3.12.8 and main as of 2afc718, both using Gentoo ebuild and raw git repository. I've tested the latter like this:

./configure -C --with-system-expat
make -j12
./python -u -W default -bb -E -m test -vv test_xml_etree{,_c}

CC @hartwork

CPython versions tested on:

3.11, 3.12, CPython main branch

Operating systems tested on:

Linux

Linked PRs

@mgorny mgorny added the type-bug An unexpected behavior, bug, or error label Feb 7, 2024
@hardfalcon
Copy link

hardfalcon commented Feb 7, 2024

I've run into the exact same issue when trying to build/package Python 3.11.8 in a clean chroot on Archlinux with testing repos enabled (expat 2.6.0 is currently only available from the core-testing repo). The error did not occur anymore when I forced expat 2.5.0 (but I'd obviously prefer to avoid that because expat 2.6.0 is a security update).

@serhiy-storchaka
Copy link
Member

Note that tests are passed for chunk_size=None. I.e. it works as previously when feed the parser a whole line, but fails when split it on 1- or 5-character chunks.

I wonder whether it was an intentional change (and what was the reason) or a bug in expat.

@hardfalcon
Copy link

hardfalcon commented Feb 7, 2024

@serhiy-storchaka: I think this might be the result of the fix for CVE-2023-52425:
https://github.com/libexpat/libexpat/blob/R_2_6_0/expat/Changes#L7C22-L7C27

Not sure if it's intentional, though.

@hartwork
Copy link
Contributor

hartwork commented Feb 7, 2024

Just a quick note that libexpat upstream is aware by now. I'll get back to you for more soon-ish.

@hartwork
Copy link
Contributor

hartwork commented Feb 7, 2024

I have created candidate pull request #115138 now for review. It fixes the test suite in CPython for Expat >=2.6.0.

@serhiy-storchaka
Copy link
Member

I would make the Expat buffering more context depending. For example, when the parser waits for the end of the long attribute, " in new input should set enough=1.

@Snild-Sony
Copy link

Snild-Sony commented Feb 7, 2024

I wonder whether it was an intentional change (and what was the reason) or a bug in expat.

Expat previously took quadratic time when parsing tokens that required multiple buffer fills (a.k.a. "feeds" in the Python code). Expat 2.6.0 introduces a mitigation for this that does exponential backoff when repeatedly failing to parse the same token due to not having enough data.

This is why it doesn't affect the non-chunked case. It will also not affect the vast majority of usecases, since tokens do not usually require multiple feeds.

While I don't believe Expat ever guaranteed immediate events, I recognize that it is a change in Expat's behavior in practice. However, I expect that the vast majority of apps will not depend on getting immediate feedback (as they won't know what their input is) -- and this was the least intrusive way I could think of to fix this DoS.

when the parser waits for the end of the long attribute, " in new input should set enough=1.

@hartwork and I have discussed something along those lines, but concluded that it would be unnecessarily complex, and possibly difficult to get 100% right.

For example, if we're looking for the end of <hello , we want >. But we can't just force a reparse every time a > shows up in the input: attributex=">>>>>>>>>>> would trivially allow a bad XML file to bypass the mitigation. That means that we need to look at/remember more context (possibly from previous buffer fills), which AFAICT would require a whole lot of new infrastructure in Expat.


If this behavior is absolutely unacceptable, it is possible (but not recommended) to disable the mitigation using XML_SetReparseDeferralEnabled(). It is even possible to change mid-document, e.g. if you want to try to flush what's in Expat's buffer, but still mostly benefit from the new attack mitigation.

@serhiy-storchaka
Copy link
Member

Does it need to restart parsing from <hello rather of at most ">>>? If it is so, then you have no other way with the current design. Ideally, Expact could be rewritten as a state machine that does not need reparsing, but it may be too much work.

Could you at least only block reparsing for large enough data? I mean that there is a difference between reparsing 100 bytes and 1000000 bytes.

Reducing the factor from 2 to 1.5 or 1.8 could help in some use cases. For example, if the stream consists of massages of approximately equal size N, and we feed the parser by chunks of approximately N bytes, then there is a large chance to block three messages in row (the first chunk is slightly smaller than the first message, and the second chunk is slightly smaller than the first chunk), but if the factor is less than 2, most likely only one message be blocked.

Or maybe combine both approaches: have_now >= A * had_before - B.

@Snild-Sony
Copy link

Does it need to restart parsing from <hello rather of at most ">>>?

Yes, that's what it does.

Could you at least only block reparsing for large enough data?

It's possible, but won't that just make the behavior harder to grasp for a user of the library?

Also, since the cost depends on the feed size, it's hard to set a definite threshold. Even a 1000-byte token, fed into Expat 1 byte at a time, will incur an amplification of 500x.

For example, if the stream consists of massages of approximately equal size N ...

Yes, a smaller factor could help that case. But I'm not sure it's a good idea to try to hide this behavior from apps -- for those that don't really need the behavior, it's unnecessary, and for those that do need it, it's not a 100% guarantee.

Do note that the whole first chunk would need to consist of a single token (essentially one big tag or comment) for this example to trigger the deferral logic, since consuming any bytes will reset the heuristic.

gentoo-bot pushed a commit to gentoo/gentoo that referenced this issue Feb 7, 2024
Followup to 93480b9.

Bug: python/cpython#115133
Signed-off-by: Sam James <sam@gentoo.org>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Feb 8, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
@serhiy-storchaka
Copy link
Member

#115164 keeps testing with small chunks, so we can test that our code does not introduce additional buffering, but makes some failures tolerable with Expat 2.6.0.

Am I correct that the test with chunk_size=8 is passed with Expat 2.6.0?

lazka pushed a commit to msys2-contrib/cpython-mingw that referenced this issue Feb 10, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
lazka pushed a commit to msys2-contrib/cpython-mingw that referenced this issue Feb 10, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
serhiy-storchaka added a commit that referenced this issue Feb 11, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 11, 2024
…GH-115164)

Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 11, 2024
…GH-115164)

Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@serhiy-storchaka
Copy link
Member

Thank you all for discussion.

serhiy-storchaka added a commit that referenced this issue Feb 11, 2024
…5164) (GH-115288)

Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Feb 11, 2024
…5164) (GH-115289)

Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv pushed a commit that referenced this issue Feb 21, 2024
…5535)

Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
naveen521kk pushed a commit to naveen521kk/cpython that referenced this issue Feb 21, 2024
…ythonGH-115164) (pythonGH-115288)

Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
hroncok pushed a commit to fedora-python/cpython that referenced this issue Feb 28, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to fedora-python/cpython that referenced this issue Feb 28, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to fedora-python/cpython that referenced this issue Feb 28, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to fedora-python/cpython that referenced this issue Feb 28, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
stratakis pushed a commit to fedora-python/cpython that referenced this issue Feb 28, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
stratakis pushed a commit to fedora-python/cpython that referenced this issue Feb 28, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
scoder pushed a commit to lxml/lxml that referenced this issue Mar 2, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

Heavily inspired by python/cpython@4a08e7b

We cannot use a @fails_with_expat_2_6_0 decorator
because the test passes in ETreePullTestCase.

See python/cpython#115133
See GHSA-gh68-jm46-84rf
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
hroncok pushed a commit to fedora-python/cpython that referenced this issue Mar 7, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 11, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 11, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 20, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 20, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 20, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 20, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
stratakis pushed a commit to stratakis/cpython that referenced this issue Mar 25, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
hroncok pushed a commit to fedora-python/cpython that referenced this issue Mar 26, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
scoder pushed a commit to lxml/lxml that referenced this issue Mar 28, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

Heavily inspired by python/cpython@4a08e7b

We cannot use a @fails_with_expat_2_6_0 decorator
because the test passes in ETreePullTestCase.

See python/cpython#115133
See GHSA-gh68-jm46-84rf
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue Apr 2, 2024
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.

(cherry picked from commit 4a08e7b)
mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue Apr 11, 2024
Fix etree XMLPullParser tests for Expat >=2.6.0 with reparse deferral

Fixes: gh#python#115133
From-PR: gh#python/cpython!115138
Patch: expat-260-test_xml_etree-reparse-deferral.patch
mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue Apr 25, 2024
Fix etree XMLPullParser tests for Expat >=2.6.0 with reparse deferral

Fixes: gh#python#115133
From-PR: gh#python/cpython!115138
Patch: expat-260-test_xml_etree-reparse-deferral.patch
mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue Apr 25, 2024
Fix etree XMLPullParser tests for Expat >=2.6.0 with reparse deferral

Fixes: gh#python#115133
From-PR: gh#python/cpython!115138
Patch: expat-260-test_xml_etree-reparse-deferral.patch
mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue May 10, 2024
Fix etree XMLPullParser tests for Expat >=2.6.0 with reparse deferral

Combined with gh#python/cpython!31453
bpo-46811: Make test suite support Expat >=2.4.5 (pythonGH-31453)

Curly brackets were never allowed in namespace URIs
according to RFC 3986, and so-called namespace-validating
XML parsers have the right to reject them a invalid URIs.

libexpat >=2.4.5 has become strcter in that regard due to
related security issues; with ET.XML instantiating a
namespace-aware parser under the hood, this test has no
future in CPython.

References:
- https://datatracker.ietf.org/doc/html/rfc3968
- https://www.w3.org/TR/xml-names/

Also, test_minidom.py: Support Expat >=2.4.5
(cherry picked from commit 2cae938)

Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Fixes: gh#python#115133
From-PR: gh#python/cpython!115138
Patch: CVE-2023-52425-libexpat-2.6.0-backport-15.6.patch
mcepl pushed a commit to openSUSE-Python/cpython that referenced this issue May 10, 2024
Combined with gh#python/cpython!31453
bpo-46811: Make test suite support Expat >=2.4.5 (pythonGH-31453)

Curly brackets were never allowed in namespace URIs
according to RFC 3986, and so-called namespace-validating
XML parsers have the right to reject them a invalid URIs.

libexpat >=2.4.5 has become strcter in that regard due to
related security issues; with ET.XML instantiating a
namespace-aware parser under the hood, this test has no
future in CPython.

References:
- https://datatracker.ietf.org/doc/html/rfc3968
- https://www.w3.org/TR/xml-names/

Also, test_minidom.py: Support Expat >=2.4.5
(cherry picked from commit 2cae938)

Co-authored-by: Sebastian Pipping <sebastian@pipping.org>
Fixes: gh#python#115133
From-PR: gh#python/cpython!115138
Patch: CVE-2023-52425-libexpat-2.6.0-backport.patch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-XML type-bug An unexpected behavior, bug, or error
Projects
None yet
6 participants