[marshal] diffoscope crash when loading PYC files: memory corruption #121112
Output with
It looks like a buffer overflow in a C extension used by diffoscope, rather than a bug in Python. I suggest opening a bug report in diffoscope instead.
Thanks for taking a look. I had considered that possibility, but so far I've had no luck catching the bad write using ASAN or libgmalloc. I think the only non-stdlib extension in use should be libarchive via ctypes. I will report to diffoscope as well and see if they can offer any clues.
I opened an issue upstream for diffoscope (https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/388). The segfault is reproducible on Linux as well when running the diffoscope Docker container; there are multiple .pyc files in For example, the files:
Thus it's not related to archive operations, as it can be reproduced when comparing the .pyc files from the archives directly.
I'm guessing this has the same cause as #118990 (https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/371).
Thanks, I guess this is a duplicate one way or another then.
I've now read through the diffoscope issues, and a significant part of the problem was that diffoscope was running However, it still seems pretty bad that passing invalid data to these functions can cause out-of-bounds memory writes, particularly when the data isn't malicious but was just compiled with a (slightly) older Python version.
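Whether a PYC was written by a different Python version can be read from its header before anything is unmarshalled. The sketch below decodes the 16-byte header layout used since Python 3.7 (PEP 552): a 4-byte magic number, a 4-byte flags field, then either the source mtime and size or an 8-byte source hash. The helper name `describe_pyc_header` is hypothetical; the only interpreter facility it relies on is `importlib.util.MAGIC_NUMBER`.

```python
import importlib.util
import struct


def describe_pyc_header(data: bytes) -> dict:
    """Decode a PEP 552 PYC header (hypothetical helper, for illustration)."""
    if len(data) < 16:
        raise ValueError("data too short for a PYC header")
    magic = data[:4]
    (flags,) = struct.unpack("<I", data[4:8])
    info = {
        "magic": magic,
        # The first two magic bytes are a little-endian number that changes
        # whenever the bytecode format changes; the last two are b"\r\n".
        "magic_number": int.from_bytes(magic[:2], "little"),
        "matches_running_python": magic == importlib.util.MAGIC_NUMBER,
        "hash_based": bool(flags & 0b01),
    }
    if info["hash_based"]:
        # Bit 1 (check_source) is only meaningful for hash-based PYCs.
        info["check_source"] = bool(flags & 0b10)
        info["source_hash"] = data[8:16]
    else:
        # Timestamp-based PYC: source mtime and size, 32-bit little-endian.
        info["source_mtime"], info["source_size"] = struct.unpack("<II", data[8:16])
    return info
```

For a PYC produced by the interpreter doing the reading, `matches_running_python` is `True`; a file from another release shows a different `magic_number`.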
Yeah, as I said in #118990 (comment):
IMO diffoscope should have its own code to decode PYC files, rather than using the Python marshal module. We only support loading PYC files of the same Python version.
I suggest closing the issue since, IMO, it's a bug in a third-party project, not in Python.
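A caller that cannot avoid `marshal` can at least enforce the same-version rule before unmarshalling. The sketch below (the helper name `load_same_version_pyc` is hypothetical) compares the PYC magic number with `importlib.util.MAGIC_NUMBER` and refuses anything else. It only screens out version mismatches: the `marshal` documentation states that the module is not intended to be secure against erroneous or maliciously constructed data, so a corrupted PYC that happens to carry the right magic number is still not safe to load.

```python
import importlib.util
import marshal
import types

# magic (4) + flags (4) + mtime/size or source hash (8), per PEP 552 (3.7+)
PYC_HEADER_SIZE = 16


def load_same_version_pyc(data: bytes) -> types.CodeType:
    """Unmarshal a PYC's code object only if this interpreter wrote it."""
    if len(data) < PYC_HEADER_SIZE:
        raise ValueError("data too short for a PYC header")
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError(
            f"PYC magic {data[:4]!r} does not match this interpreter's "
            f"{importlib.util.MAGIC_NUMBER!r}; refusing to unmarshal"
        )
    code = marshal.loads(data[PYC_HEADER_SIZE:])
    if not isinstance(code, types.CodeType):
        raise ValueError("PYC payload is not a code object")
    return code
```

A PYC from another Python version is rejected with `ValueError` before `marshal.loads` ever sees its payload.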
I don't think anyone disagrees that diffoscope was misusing these functions, and it has already stopped doing so. But I also don't think it's unreasonable to ask for better behaviour than silent memory corruption when someone does try to load an unsupported PYC.
It would help to have a reproducer which doesn't use diffoscope.
#118990 has one that should work for this issue too (if you replace the embedded base64-encoded PYC with the one from this issue, or just load it from a file instead).
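A file-based reproducer along those lines, without diffoscope, might look like the sketch below: it reads a PYC, skips the 16-byte header, and hands the rest to `marshal.loads` with no version check, which is the misuse discussed in this thread. The helper name `load_pyc_unchecked` is hypothetical, and it has not been confirmed here whether unmarshalling alone triggers the corruption or whether diffoscope's later use of the resulting objects is also needed.

```python
import marshal

# magic + flags + mtime/size or hash (Python 3.7+, PEP 552)
PYC_HEADER_SIZE = 16


def load_pyc_unchecked(path):
    """Unmarshal a PYC's payload WITHOUT checking its magic number.

    Deliberately unsafe: this mirrors feeding a PYC written by another
    Python version straight to marshal.
    """
    with open(path, "rb") as f:
        data = f.read()
    return marshal.loads(data[PYC_HEADER_SIZE:])
```

To try it, call `load_pyc_unchecked()` under the interpreter that crashed (3.12 here) on one of the `cpython-311` PYC files from the test case.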
Please explain how to get the PYC file for this issue. The reproducer from issue gh-118990 fails on Python 3.14 on
From the test case .zip posted in the very first post of this issue?
Testcase.zip contains two archives; which one should be used?
Good question. Helpfully, we have this list of files for which there is a confirmed segfault on Linux as well: #121112 (comment) AFAICT the actual differences -- at least for e.g. But perhaps @jmroot can confirm a specific file to try?
I'm afraid I don't have a reduced test case prepared that doesn't use diffoscope and only uses a single pyc file. I'm not sure when I will have time to prepare one. This is the commit removing the code that was triggering the issue in diffoscope, if that helps: https://salsa.debian.org/reproducible-builds/diffoscope/-/commit/e75871b07e09cfd778181d905f540a15bd71e63a
A small test case to reproduce: download SmallTestCase.zip (the archive fetched by the Dockerfile below), which includes all the files mentioned in #121112 (comment), and then parse the files with the script below (using diffoscope version <= 274):

```python
from diffoscope.comparators.python import parse_pyc


def parse_pyc_file(filename):
    # Fully consume diffoscope's parse_pyc() output for one file and
    # return how many items it produced; the file is closed afterwards.
    with open(filename, "rb") as f:
        return len(list(parse_pyc(f)))


codeA1 = parse_pyc_file("A1/guess.cpython-311.opt-1.pyc")
codeB1 = parse_pyc_file("B1/guess.cpython-311.opt-1.pyc")
codeA2 = parse_pyc_file("A1/guess.cpython-311.pyc")
codeB2 = parse_pyc_file("B1/guess.cpython-311.pyc")
codeA3 = parse_pyc_file("A1/phabreview.cpython-311.opt-1.pyc")
codeB3 = parse_pyc_file("B1/phabreview.cpython-311.opt-1.pyc")
codeA4 = parse_pyc_file("A1/phabreview.cpython-311.pyc")
codeB4 = parse_pyc_file("B1/phabreview.cpython-311.pyc")
codeA5 = parse_pyc_file("A1/repowidget.cpython-311.opt-1.pyc")
codeB5 = parse_pyc_file("B1/repowidget.cpython-311.opt-1.pyc")
codeA6 = parse_pyc_file("A1/repowidget.cpython-311.pyc")
codeB6 = parse_pyc_file("B1/repowidget.cpython-311.pyc")
codeA7 = parse_pyc_file("A1/sync.cpython-311.pyc")
```

For example, run the code above in a container built from this Dockerfile (which ensures diffoscope 274 is used):

```dockerfile
FROM python:3.12.6
RUN set -ex; \
    pip3 install --force-reinstall diffoscope==274; \
    wget https://github.com/user-attachments/files/17083613/SmallTestCase.zip; \
    unzip SmallTestCase.zip;
```

Doing so will result in:
Crash report
What happened?
Running diffoscope version 267 under Python 3.12.4, both installed via MacPorts on Intel macOS 14.5, segfaults when given two specific files as input.
The macOS crash report from the optimized build:
Using a debug build (`--with-pydebug --with-assertions --with-address-sanitizer --with-undefined-behavior-sanitizer`) gives this additional output:

The crash report when using the debug build:
The two .tbz2 files are attached (in a zip file to appease GitHub).
Testcase.zip
CPython versions tested on:
3.12
Operating systems tested on:
macOS
Output from running 'python -VV' on the command line:
Python 3.12.4 (main, Jun 8 2024, 03:35:50) [Clang 15.0.0 (clang-1500.3.9.4)]