Heap-use-after-free issue in pyexpat related to .ExternalEntityParserCreate #139400

@hartwork

Bug report

Bug description:

When CPython is configured with --with-address-sanitizer, the test below crashes every version tested (3.9 through 3.15):

# Copyright (c) 2025 Sebastian Pipping <sebastian@pipping.org>
# Licensed under Zero-Clause BSD ("0BSD")

import pyexpat as expat
import unittest


class ParentParserLifetimeTest(unittest.TestCase):
    """
    Subparsers make use of their parent XML_Parser inside of Expat.
    As a result, parent parsers need to outlive subparsers.
    Regression test for issue 139400
    """
    def test_parent_parser_outlives_its_subparsers(self):
        parser = expat.ParserCreate()
        # Keep the subparser referenced so that it is still alive when the parent is deleted
        subparser = parser.ExternalEntityParserCreate(None)

        # Now try to cause garbage collection of the parent parser
        # while it's still being referenced by a related subparser
        del parser


if __name__ == '__main__':
    unittest.main()
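The crash is only reported reliably on an AddressSanitizer build; on a regular build the use-after-free is undefined behavior and may go unnoticed. As a minimal sketch, assuming a POSIX build where sysconfig records the configure arguments (on Windows `CONFIG_ARGS` is typically unset, so the check returns False), the hypothetical helper `built_with_asan` tells whether the running interpreter was configured with --with-address-sanitizer before the reproducer is run.

```python
import sys
import sysconfig


def built_with_asan() -> bool:
    """Hypothetical helper: was this interpreter configured with ASan?

    It inspects the configure arguments recorded by sysconfig, which may be
    None on some platforms (e.g. Windows), in which case it returns False.
    """
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return "--with-address-sanitizer" in config_args


if __name__ == "__main__":
    if built_with_asan():
        print("ASan build: the reproducer should abort on an unfixed CPython")
    else:
        print("Not an ASan build: the use-after-free may go unnoticed", file=sys.stderr)
```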

The finding was first documented in a comment on #139367.

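For 3.13, the AddressSanitizer crash details are as follows (note that the test names in this log, UseAfterFreeCrashDemoTest.test_use_after_free__crash, differ from those in the test above):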
# ./python ..test_file_with_test_class_above_added_dot_py.. -v
test_use_after_free__crash (__main__.UseAfterFreeCrashDemoTest.test_use_after_free__crash) ... =================================================================
==16187==ERROR: AddressSanitizer: heap-use-after-free on address 0x7cda1ad7e038 at pc 0x7f4a1af8a94f bp 0x7ffc61f63720 sp 0x7ffc61f63710
READ of size 8 at 0x7cda1ad7e038 thread T0
    #0 0x7f4a1af8a94e in getRootParserOf Modules/expat/xmlparse.c:8660
    #1 0x7f4a1af8a94e in expat_free Modules/expat/xmlparse.c:913
    #2 0x7f4a1af8a94e in expat_free Modules/expat/xmlparse.c:906
    #3 0x7f4a1af8a94e in PyExpat_XML_ParserFree Modules/expat/xmlparse.c:1997
    #4 0x7f4a1af70286 in xmlparse_dealloc Modules/pyexpat.c:1266
    #5 0x558a55f60555 in Py_DECREF Include/object.h:949
    #6 0x558a55f60555 in Py_XDECREF Include/object.h:1042
    #7 0x558a55f60555 in _PyFrame_ClearLocals Python/frame.c:104
    #8 0x558a55f60555 in _PyFrame_ClearExceptCode Python/frame.c:129
    #9 0x558a55e9df91 in clear_thread_frame Python/ceval.c:1682
    #10 0x558a55e9df91 in _PyEval_FrameClearAndPop Python/ceval.c:1709
    #11 0x558a55ec6b44 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:5222
    #12 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #13 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #14 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #15 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #16 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #17 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #18 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #19 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #20 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #21 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #22 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #23 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #24 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #25 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #26 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #27 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #28 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #29 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #30 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #31 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #32 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #33 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #34 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #35 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #36 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #37 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #38 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #39 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #40 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #41 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #42 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #43 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #44 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #45 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #46 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #47 0x558a55cfb652 in slot_tp_init Objects/typeobject.c:9816
    #48 0x558a55cd8107 in type_call Objects/typeobject.c:1997
    #49 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #50 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #51 0x558a55ed8b3e in _PyEval_EvalFrame Include/internal/pycore_ceval.h:119
    #52 0x558a55ed8b3e in _PyEval_Vector Python/ceval.c:1820
    #53 0x558a55ed8b3e in PyEval_EvalCode Python/ceval.c:604
    #54 0x558a5600eb0e in run_eval_code_obj Python/pythonrun.c:1381
    #55 0x558a5600eb0e in run_eval_code_obj Python/pythonrun.c:1348
    #56 0x558a5600f127 in run_mod Python/pythonrun.c:1489
    #57 0x558a560138d0 in pyrun_file Python/pythonrun.c:1295
    #58 0x558a560138d0 in _PyRun_SimpleFileObject Python/pythonrun.c:517
    #59 0x558a5601421c in _PyRun_AnyFileObject Python/pythonrun.c:77
    #60 0x558a560833ec in pymain_run_file_obj Modules/main.c:410
    #61 0x558a560833ec in pymain_run_file Modules/main.c:429
    #62 0x558a560833ec in pymain_run_python Modules/main.c:696
    #63 0x558a56085156 in Py_RunMain Modules/main.c:775
    #64 0x558a56085156 in pymain_main Modules/main.c:805
    #65 0x558a56085156 in Py_BytesMain Modules/main.c:829
    #66 0x7f4a1b9a733f  (/lib64/libc.so.6+0x2733f)
    #67 0x7f4a1b9a73f8 in __libc_start_main (/lib64/libc.so.6+0x273f8)
    #68 0x558a55a23754 in _start ([..]/cpython/python+0x19c754)

0x7cda1ad7e038 is located 952 bytes inside of 1096-byte region [0x7cda1ad7dc80,0x7cda1ad7e0c8)
freed by thread T0 here:
    #0 0x7f4a1bd6b9eb  (/usr/lib/gcc/x86_64-pc-linux-gnu/15/libasan.so.8+0x11f9eb)
    #1 0x7f4a1af87e12 in expat_free Modules/expat/xmlparse.c:934
    #2 0x7f4a1af87e12 in expat_free Modules/expat/xmlparse.c:906
    #3 0x7f4a1af87e12 in PyExpat_XML_ParserFree Modules/expat/xmlparse.c:2011
    #4 0x7f4a1af70286 in xmlparse_dealloc Modules/pyexpat.c:1266
    #5 0x558a55f60555 in Py_DECREF Include/object.h:949
    #6 0x558a55f60555 in Py_XDECREF Include/object.h:1042
    #7 0x558a55f60555 in _PyFrame_ClearLocals Python/frame.c:104
    #8 0x558a55f60555 in _PyFrame_ClearExceptCode Python/frame.c:129
    #9 0x558a55e9df91 in clear_thread_frame Python/ceval.c:1682
    #10 0x558a55e9df91 in _PyEval_FrameClearAndPop Python/ceval.c:1709
    #11 0x558a55ec6b44 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:5222
    #12 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #13 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #14 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #15 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #16 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #17 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #18 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #19 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #20 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #21 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #22 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #23 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #24 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #25 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #26 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #27 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #28 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #29 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #30 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #31 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #32 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #33 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #34 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #35 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #36 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #37 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #38 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #39 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #40 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #41 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #42 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #43 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #44 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #45 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #46 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #47 0x558a55cfb652 in slot_tp_init Objects/typeobject.c:9816
    #48 0x558a55cd8107 in type_call Objects/typeobject.c:1997

previously allocated by thread T0 here:
    #0 0x7f4a1bd6ceab in malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/15/libasan.so.8+0x120eab)
    #1 0x7f4a1af8b227 in parserCreate Modules/expat/xmlparse.c:1364
    #2 0x7f4a1af6f0d1 in newxmlparseobject Modules/pyexpat.c:1211
    #3 0x7f4a1af6f0d1 in pyexpat_ParserCreate_impl Modules/pyexpat.c:1609
    #4 0x7f4a1af6f0d1 in pyexpat_ParserCreate Modules/clinic/pyexpat.c.h:511
    #5 0x558a55c4ed39 in cfunction_vectorcall_FASTCALL_KEYWORDS Objects/methodobject.c:440
    #6 0x558a55b48813 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #7 0x558a55b48813 in PyObject_Vectorcall Objects/call.c:327
    #8 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #9 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #10 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #11 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #12 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #13 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #14 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #15 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #16 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #17 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #18 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #19 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #20 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #21 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #22 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #23 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #24 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #25 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #26 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #27 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #28 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #29 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #30 0x558a55ebf7de in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1850
    #31 0x558a55b523d3 in _PyObject_VectorcallTstate Include/internal/pycore_call.h:168
    #32 0x558a55b523d3 in method_vectorcall Objects/classobject.c:93
    #33 0x558a55b4d5f6 in _PyVectorcall_Call Objects/call.c:273
    #34 0x558a55b4d5f6 in _PyObject_Call Objects/call.c:348
    #35 0x558a55b4d5f6 in PyObject_Call Objects/call.c:373
    #36 0x558a55eb1e8a in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1362
    #37 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #38 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #39 0x558a55ce88e6 in slot_tp_call Objects/typeobject.c:9570
    #40 0x558a55b470d1 in _PyObject_MakeTpCall Objects/call.c:242
    #41 0x558a55eaa121 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:813
    #42 0x558a55b4e383 in _PyObject_VectorcallDictTstate Objects/call.c:135
    #43 0x558a55b4e383 in _PyObject_Call_Prepend Objects/call.c:504
    #44 0x558a55cfb652 in slot_tp_init Objects/typeobject.c:9816
    #45 0x558a55cd8107 in type_call Objects/typeobject.c:1997

SUMMARY: AddressSanitizer: heap-use-after-free Modules/expat/xmlparse.c:8660 in getRootParserOf
Shadow bytes around the buggy address:
  0x7cda1ad7dd80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7de00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7de80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7df00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7df80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x7cda1ad7e000: fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd
  0x7cda1ad7e080: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
  0x7cda1ad7e100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7cda1ad7e180: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7e200: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x7cda1ad7e280: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==16187==ABORTING
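As a worked check of the report, the snippet below recomputes ASan's location details from the addresses it printed: the 8-byte READ lands 952 bytes into a 1096-byte region that the "previously allocated" trace attributes to parserCreate called from pyexpat_ParserCreate, i.e. the parent parser. It assumes, as the row labels' closeness to the faulting address suggests, that this ASan build labels shadow rows with application addresses, 16 shadow bytes (128 application bytes) per row.

```python
# Addresses copied verbatim from the AddressSanitizer report above.
BAD_ADDR = 0x7CDA1AD7E038      # target of the 8-byte READ in getRootParserOf
REGION_START = 0x7CDA1AD7DC80  # start of the freed heap region
REGION_END = 0x7CDA1AD7E0C8    # one past the end of the freed heap region

GRANULE = 8              # "one shadow byte represents 8 application bytes"
ROW_SPAN = 16 * GRANULE  # each printed shadow row holds 16 shadow bytes

offset = BAD_ADDR - REGION_START  # bytes into the freed region
size = REGION_END - REGION_START  # size of the freed region

# Row marked "=>" and the column of the bracketed [fd] shadow byte within it.
row_start = BAD_ADDR - BAD_ADDR % ROW_SPAN
column = (BAD_ADDR - row_start) // GRANULE

# Freed ("fd") shadow bytes in the following row, before the "fa" redzone.
next_row = row_start + ROW_SPAN
freed_in_next_row = (REGION_END - next_row) // GRANULE

print(offset, size, hex(row_start), column, freed_in_next_row)
# 952 1096 0x7cda1ad7e000 7 9
```

Each value matches the report: the region description, the "=>" row, the bracketed shadow byte at index 7, and the nine fd bytes before the redzone in the 0x7cda1ad7e080 row.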

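My understanding is that there is a bug in the graph of object relations: the subparser does not keep its parent parser alive, even though Expat uses the parent XML_Parser from within the subparser. The parent can therefore be freed first, and freeing the subparser afterwards reads the parent's already-freed memory (getRootParserOf in the trace above, on a region originally allocated via pyexpat_ParserCreate).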
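The lifetime rule from the test's docstring ("parent parsers need to outlive subparsers") can be modeled in pure Python. This is a sketch, not pyexpat code and not the actual fix: `Parser`, `BorrowingSubparser`, `OwningSubparser` and `demo` are hypothetical names, a weak reference stands in for a non-owning C pointer from subparser to parent (Python yields None where C would read freed memory), and the destruction order shown relies on CPython's reference counting.

```python
import weakref


class Parser:
    """Hypothetical stand-in for a parent parser (not pyexpat)."""


class BorrowingSubparser:
    """Hypothetical child that only borrows its parent, like a raw C pointer."""

    def __init__(self, parent):
        self._parent = weakref.ref(parent)  # does not keep the parent alive

    def root(self):
        # Returns None once the parent is gone; C code would read freed memory.
        return self._parent()


class OwningSubparser:
    """Hypothetical child that holds a strong reference to its parent."""

    def __init__(self, parent):
        self._parent = parent  # the parent lives at least as long as the child

    def root(self):
        return self._parent


def demo(subparser_cls):
    """Return (parent reachable after `del parent`, destruction order)."""
    destroyed = []
    parent = Parser()
    child = subparser_cls(parent)
    weakref.finalize(parent, destroyed.append, "parent")
    weakref.finalize(child, destroyed.append, "child")
    del parent  # mirrors `del parser` in the regression test
    parent_reachable = child.root() is not None
    del child
    return parent_reachable, destroyed


print(demo(BorrowingSubparser))  # (False, ['parent', 'child'])
print(demo(OwningSubparser))     # (True, ['child', 'parent'])
```

With a borrowed reference the parent is destroyed while the child still exists, which is the situation the test provokes; with a strong child-to-parent reference the parent is only released after the child.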

CC @picnixz

CPython versions tested on:

3.15, 3.14, 3.13, 3.12, 3.11, 3.10, 3.9

Operating systems tested on:

Other, Windows, macOS, Linux

Metadata

Labels

3.10 (only security fixes), 3.11 (only security fixes), 3.12 (only security fixes), 3.9 (only security fixes), extension-modules (C modules in the Modules dir), release-blocker, topic-XML, type-bug (An unexpected behavior, bug, or error), type-security (A security issue)

Projects

Status

Todo

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
