Skip to content

Commit

Permalink
[3.8] gh-115398: Expose Expat >=2.6.0 reparse deferral API (CVE-2023-…
Browse files Browse the repository at this point in the history
…52425) (GH-115623) (GH-116275)

Allow controlling Expat >=2.6.0 reparse deferral (CVE-2023-52425) by adding five new methods:

- `xml.etree.ElementTree.XMLParser.flush`
- `xml.etree.ElementTree.XMLPullParser.flush`
- `xml.parsers.expat.xmlparser.GetReparseDeferralEnabled`
- `xml.parsers.expat.xmlparser.SetReparseDeferralEnabled`
- `xml.sax.expatreader.ExpatParser.flush`

Based on the "flush" idea from #115138 (comment) .

Includes code suggested-by: Snild Dolkow <snild@sony.com>
and by core dev Serhiy Storchaka.

Co-authored-by: Gregory P. Smith <greg@krypto.org>
  • Loading branch information
hartwork and gpshead committed Mar 6, 2024
1 parent 4d58a1d commit 854f645
Show file tree
Hide file tree
Showing 14 changed files with 435 additions and 20 deletions.
36 changes: 36 additions & 0 deletions Doc/library/pyexpat.rst
Expand Up @@ -196,6 +196,42 @@ XMLParser Objects
:exc:`ExpatError` to be raised with the :attr:`code` attribute set to
``errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]``.

.. method:: xmlparser.SetReparseDeferralEnabled(enabled)

.. warning::

Calling ``SetReparseDeferralEnabled(False)`` has security implications,
as detailed below; please make sure to understand these consequences
prior to using the ``SetReparseDeferralEnabled`` method.

Expat 2.6.0 introduced a security mechanism called "reparse deferral"
where instead of causing denial of service through quadratic runtime
from reparsing large tokens, reparsing of unfinished tokens is now delayed
by default until a sufficient amount of input is reached.
Due to this delay, registered handlers may — depending of the sizing of
input chunks pushed to Expat — no longer be called right after pushing new
input to the parser. Where immediate feedback and taking over responsiblity
of protecting against denial of service from large tokens are both wanted,
calling ``SetReparseDeferralEnabled(False)`` disables reparse deferral
for the current Expat parser instance, temporarily or altogether.
Calling ``SetReparseDeferralEnabled(True)`` allows re-enabling reparse
deferral.

Note that :meth:`SetReparseDeferralEnabled` has been backported to some
prior releases of CPython as a security fix. Check for availability of
:meth:`SetReparseDeferralEnabled` using :func:`hasattr` if used in code
running across a variety of Python versions.

.. versionadded:: 3.8.19

.. method:: xmlparser.GetReparseDeferralEnabled()

Returns whether reparse deferral is currently enabled for the given
Expat parser instance.

.. versionadded:: 3.8.19


:class:`xmlparser` objects have the following attributes:


Expand Down
39 changes: 39 additions & 0 deletions Doc/library/xml.etree.elementtree.rst
Expand Up @@ -163,6 +163,11 @@ data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

Where *immediate* feedback through events is wanted, calling method
:meth:`XMLPullParser.flush` can help reduce delay;
please make sure to study the related security notes.


Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down Expand Up @@ -1352,6 +1357,24 @@ XMLParser Objects

Feeds data to the parser. *data* is encoded data.


.. method:: flush()

Triggers parsing of any previously fed unparsed data, which can be
used to ensure more immediate feedback, in particular with Expat >=2.6.0.
The implementation of :meth:`flush` temporarily disables reparse deferral
with Expat (if currently enabled) and triggers a reparse.
Disabling reparse deferral has security consequences; please see
:meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.

Note that :meth:`flush` has been backported to some prior releases of
CPython as a security fix. Check for availability of :meth:`flush`
using :func:`hasattr` if used in code running across a variety of Python
versions.

.. versionadded:: 3.8.19


:meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
for each opening tag, its ``end(tag)`` method for each closing tag, and data
is processed by method ``data(data)``. For further supported callback
Expand Down Expand Up @@ -1413,6 +1436,22 @@ XMLPullParser Objects

Feed the given bytes data to the parser.

.. method:: flush()

Triggers parsing of any previously fed unparsed data, which can be
used to ensure more immediate feedback, in particular with Expat >=2.6.0.
The implementation of :meth:`flush` temporarily disables reparse deferral
with Expat (if currently enabled) and triggers a reparse.
Disabling reparse deferral has security consequences; please see
:meth:`xml.parsers.expat.xmlparser.SetReparseDeferralEnabled` for details.

Note that :meth:`flush` has been backported to some prior releases of
CPython as a security fix. Check for availability of :meth:`flush`
using :func:`hasattr` if used in code running across a variety of Python
versions.

.. versionadded:: 3.8.19

.. method:: close()

Signal the parser that the data stream is terminated. Unlike
Expand Down
4 changes: 3 additions & 1 deletion Include/pyexpat.h
Expand Up @@ -48,8 +48,10 @@ struct PyExpat_CAPI
enum XML_Status (*SetEncoding)(XML_Parser parser, const XML_Char *encoding);
int (*DefaultUnknownEncodingHandler)(
void *encodingHandlerData, const XML_Char *name, XML_Encoding *info);
/* might be none for expat < 2.1.0 */
/* might be NULL for expat < 2.1.0 */
int (*SetHashSalt)(XML_Parser parser, unsigned long hash_salt);
/* might be NULL for expat < 2.6.0 */
XML_Bool (*SetReparseDeferralEnabled)(XML_Parser parser, XML_Bool enabled);
/* always add new stuff to the end! */
};

54 changes: 54 additions & 0 deletions Lib/test/test_pyexpat.py
Expand Up @@ -729,5 +729,59 @@ def resolve_entity(context, base, system_id, public_id):
self.assertEqual(handler_call_args, [("bar", "baz")])


class ReparseDeferralTest(unittest.TestCase):
def test_getter_setter_round_trip(self):
parser = expat.ParserCreate()
enabled = (expat.version_info >= (2, 6, 0))

self.assertIs(parser.GetReparseDeferralEnabled(), enabled)
parser.SetReparseDeferralEnabled(False)
self.assertIs(parser.GetReparseDeferralEnabled(), False)
parser.SetReparseDeferralEnabled(True)
self.assertIs(parser.GetReparseDeferralEnabled(), enabled)

def test_reparse_deferral_enabled(self):
if expat.version_info < (2, 6, 0):
self.skipTest(f'Expat {expat.version_info} does not '
'support reparse deferral')

started = []

def start_element(name, _):
started.append(name)

parser = expat.ParserCreate()
parser.StartElementHandler = start_element
self.assertTrue(parser.GetReparseDeferralEnabled())

for chunk in (b'<doc', b'/>'):
parser.Parse(chunk, False)

# The key test: Have handlers already fired? Expecting: no.
self.assertEqual(started, [])

parser.Parse(b'', True)

self.assertEqual(started, ['doc'])

def test_reparse_deferral_disabled(self):
started = []

def start_element(name, _):
started.append(name)

parser = expat.ParserCreate()
parser.StartElementHandler = start_element
if expat.version_info >= (2, 6, 0):
parser.SetReparseDeferralEnabled(False)
self.assertFalse(parser.GetReparseDeferralEnabled())

for chunk in (b'<doc', b'/>'):
parser.Parse(chunk, False)

# The key test: Have handlers already fired? Expecting: yes.
self.assertEqual(started, ['doc'])


if __name__ == "__main__":
unittest.main()
51 changes: 51 additions & 0 deletions Lib/test/test_sax.py
Expand Up @@ -18,6 +18,7 @@
from io import BytesIO, StringIO
import codecs
import os.path
import pyexpat
import shutil
from urllib.error import URLError
from test import support
Expand Down Expand Up @@ -1206,6 +1207,56 @@ def test_expat_incremental_reset(self):

self.assertEqual(result.getvalue(), start + b"<doc>text</doc>")

def test_flush_reparse_deferral_enabled(self):
if pyexpat.version_info < (2, 6, 0):
self.skipTest(f'Expat {pyexpat.version_info} does not support reparse deferral')

result = BytesIO()
xmlgen = XMLGenerator(result)
parser = create_parser()
parser.setContentHandler(xmlgen)

for chunk in ("<doc", ">"):
parser.feed(chunk)

self.assertEqual(result.getvalue(), start) # i.e. no elements started
self.assertTrue(parser._parser.GetReparseDeferralEnabled())

parser.flush()

self.assertTrue(parser._parser.GetReparseDeferralEnabled())
self.assertEqual(result.getvalue(), start + b"<doc>")

parser.feed("</doc>")
parser.close()

self.assertEqual(result.getvalue(), start + b"<doc></doc>")

def test_flush_reparse_deferral_disabled(self):
result = BytesIO()
xmlgen = XMLGenerator(result)
parser = create_parser()
parser.setContentHandler(xmlgen)

for chunk in ("<doc", ">"):
parser.feed(chunk)

if pyexpat.version_info >= (2, 6, 0):
parser._parser.SetReparseDeferralEnabled(False)

self.assertEqual(result.getvalue(), start) # i.e. no elements started
self.assertFalse(parser._parser.GetReparseDeferralEnabled())

parser.flush()

self.assertFalse(parser._parser.GetReparseDeferralEnabled())
self.assertEqual(result.getvalue(), start + b"<doc>")

parser.feed("</doc>")
parser.close()

self.assertEqual(result.getvalue(), start + b"<doc></doc>")

# ===== Locator support

def test_expat_locator_noinfo(self):
Expand Down
80 changes: 63 additions & 17 deletions Lib/test/test_xml_etree.py
Expand Up @@ -105,11 +105,6 @@
"""


fails_with_expat_2_6_0 = (unittest.expectedFailure
if pyexpat.version_info >= (2, 6, 0) else
lambda test: test)


def checkwarnings(*filters, quiet=False):
def decorator(test):
def newtest(*args, **kwargs):
Expand Down Expand Up @@ -1250,12 +1245,14 @@ def test_tree_write_attribute_order(self):

class XMLPullParserTest(unittest.TestCase):

def _feed(self, parser, data, chunk_size=None):
def _feed(self, parser, data, chunk_size=None, flush=False):
if chunk_size is None:
parser.feed(data)
else:
for i in range(0, len(data), chunk_size):
parser.feed(data[i:i+chunk_size])
if flush:
parser.flush()

def assert_events(self, parser, expected, max_events=None):
self.assertEqual(
Expand All @@ -1273,34 +1270,32 @@ def assert_event_tags(self, parser, expected, max_events=None):
self.assertEqual([(action, elem.tag) for action, elem in events],
expected)

def test_simple_xml(self, chunk_size=None):
def test_simple_xml(self, chunk_size=None, flush=False):
parser = ET.XMLPullParser()
self.assert_event_tags(parser, [])
self._feed(parser, "<!-- comment -->\n", chunk_size)
self._feed(parser, "<!-- comment -->\n", chunk_size, flush)
self.assert_event_tags(parser, [])
self._feed(parser,
"<root>\n <element key='value'>text</element",
chunk_size)
chunk_size, flush)
self.assert_event_tags(parser, [])
self._feed(parser, ">\n", chunk_size)
self._feed(parser, ">\n", chunk_size, flush)
self.assert_event_tags(parser, [('end', 'element')])
self._feed(parser, "<element>text</element>tail\n", chunk_size)
self._feed(parser, "<empty-element/>\n", chunk_size)
self._feed(parser, "<element>text</element>tail\n", chunk_size, flush)
self._feed(parser, "<empty-element/>\n", chunk_size, flush)
self.assert_event_tags(parser, [
('end', 'element'),
('end', 'empty-element'),
])
self._feed(parser, "</root>\n", chunk_size)
self._feed(parser, "</root>\n", chunk_size, flush)
self.assert_event_tags(parser, [('end', 'root')])
self.assertIsNone(parser.close())

@fails_with_expat_2_6_0
def test_simple_xml_chunk_1(self):
self.test_simple_xml(chunk_size=1)
self.test_simple_xml(chunk_size=1, flush=True)

@fails_with_expat_2_6_0
def test_simple_xml_chunk_5(self):
self.test_simple_xml(chunk_size=5)
self.test_simple_xml(chunk_size=5, flush=True)

def test_simple_xml_chunk_22(self):
self.test_simple_xml(chunk_size=22)
Expand Down Expand Up @@ -1499,6 +1494,57 @@ def test_unknown_event(self):
with self.assertRaises(ValueError):
ET.XMLPullParser(events=('start', 'end', 'bogus'))

def test_flush_reparse_deferral_enabled(self):
if pyexpat.version_info < (2, 6, 0):
self.skipTest(f'Expat {pyexpat.version_info} does not '
'support reparse deferral')

parser = ET.XMLPullParser(events=('start', 'end'))

for chunk in ("<doc", ">"):
parser.feed(chunk)

self.assert_event_tags(parser, []) # i.e. no elements started
if ET is pyET:
self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())

parser.flush()

self.assert_event_tags(parser, [('start', 'doc')])
if ET is pyET:
self.assertTrue(parser._parser._parser.GetReparseDeferralEnabled())

parser.feed("</doc>")
parser.close()

self.assert_event_tags(parser, [('end', 'doc')])

def test_flush_reparse_deferral_disabled(self):
parser = ET.XMLPullParser(events=('start', 'end'))

for chunk in ("<doc", ">"):
parser.feed(chunk)

if pyexpat.version_info >= (2, 6, 0):
if not ET is pyET:
self.skipTest(f'XMLParser.(Get|Set)ReparseDeferralEnabled '
'methods not available in C')
parser._parser._parser.SetReparseDeferralEnabled(False)

self.assert_event_tags(parser, []) # i.e. no elements started
if ET is pyET:
self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())

parser.flush()

self.assert_event_tags(parser, [('start', 'doc')])
if ET is pyET:
self.assertFalse(parser._parser._parser.GetReparseDeferralEnabled())

parser.feed("</doc>")
parser.close()

self.assert_event_tags(parser, [('end', 'doc')])

#
# xinclude tests (samples from appendix C of the xinclude specification)
Expand Down
14 changes: 14 additions & 0 deletions Lib/xml/etree/ElementTree.py
Expand Up @@ -1303,6 +1303,11 @@ def read_events(self):
else:
yield event

def flush(self):
if self._parser is None:
raise ValueError("flush() called after end of stream")
self._parser.flush()


def XML(text, parser=None):
"""Parse XML document from string constant.
Expand Down Expand Up @@ -1711,6 +1716,15 @@ def close(self):
del self.parser, self._parser
del self.target, self._target

def flush(self):
was_enabled = self.parser.GetReparseDeferralEnabled()
try:
self.parser.SetReparseDeferralEnabled(False)
self.parser.Parse(b"", False)
except self._error as v:
self._raiseerror(v)
finally:
self.parser.SetReparseDeferralEnabled(was_enabled)

# --------------------------------------------------------------------
# C14N 2.0
Expand Down

0 comments on commit 854f645

Please sign in to comment.