-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
archive: clean up archive reader and writer classes #7518
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
rokm
force-pushed
the
cleanup-archive-writer-reader
branch
2 times, most recently
from
March 25, 2023 18:51
34d4c82
to
4481ba6
Compare
Remove the common `ArchiveReader` class. Because the different archive structures used by PyInstaller have very little in common in terms of format, having separate implementations actually simplifies them. The refactoring also simplifies the lazy file access in `ZlibArchiveReader`, which now stores just the archive file path and opens the file whenever access is needed. We do not need to track the previous position in file, because each access performs seek before reading. This allows us to get rid of associated `FilePos` and `ArchiveFile` classes and their home-grown thread-local storage for file position pointers.
The test was testing the multi-threaded read behavior of the ArchiveFile, which was used by archive readers to dynamically open and close the archive file when reading from it. The ArchiveFile wrapper is gone now (it is easier and less error-prone to just open the file directly whenever needed), so remove the test as well.
Move the helper for recursive listing of PKG contents from `PyInstaller.utisl.cliutils.archive_viewer` to `PyInstaller.archive.readers` and simplify the implementation.
Remove the common `ArchiveWriter` class. Because the different archive structures used by PyInstaller have very little in common in terms of format, having separate implementations actually simplifies them.
Add a `raw` flag that allows us to read the module data without unmarshaling it (e.g., for extraction in archive viewer program).
Rewrite the `archive_viewer` utility to take advantage of the clean up in archive reader classes. When opening embedded archive, do not extract it to temporary file. Rename the dubious `--log` command line option to `--list` (for non-interactive content listing). Display entries line-by-line using `print` and string formatting instead of dumping the TOC dictionary with `pprint.pprint`.
rokm
force-pushed
the
cleanup-archive-writer-reader
branch
from
March 25, 2023 19:03
4481ba6
to
7befd43
Compare
bwoodsend
approved these changes
Mar 26, 2023
rokm
added a commit
to rokm/pyinstaller
that referenced
this pull request
Apr 14, 2023
The archive writers clean-up in pyinstaller#7518 introduced a subtle bug that causes incorrect padding of TOC entries that require no padding; such entries receive padding of the previously-processed entry. This in turn breaks 16-byte alignment of the TOC entries, causing bus error when running onefile executable on platforms with stricter alignment requirements, for example armhf (32-bit Raspian OS).
rokm
added a commit
to rokm/pyinstaller
that referenced
this pull request
Apr 14, 2023
The archive writers clean-up in pyinstaller#7518 introduced a subtle bug that causes incorrect padding of TOC entries that require no padding; such entries receive padding of the previously-processed entry. This in turn breaks 16-byte alignment of the TOC entries, causing bus error when running onefile executable on platforms with stricter alignment requirements, for example armhf (32-bit Raspian OS).
rokm
added a commit
to rokm/pyinstaller
that referenced
this pull request
Apr 14, 2023
The archive writers clean-up in pyinstaller#7518 introduced a subtle bug that causes incorrect padding of TOC entries that require no padding; such entries receive padding of the previously-processed entry. This in turn breaks 16-byte alignment of the TOC entries, causing bus error when running executables on platforms with stricter data alignment requirements, for example armhf (32-bit Raspbian OS).
rokm
added a commit
that referenced
this pull request
Apr 14, 2023
The archive writers clean-up in #7518 introduced a subtle bug that causes incorrect padding of TOC entries that require no padding; such entries receive padding of the previously-processed entry. This in turn breaks 16-byte alignment of the TOC entries, causing bus error when running executables on platforms with stricter data alignment requirements, for example armhf (32-bit Raspbian OS).
JonathonReinhart
added a commit
to JonathonReinhart/staticx
that referenced
this pull request
Apr 16, 2023
fix: update to support new pyinstaller 5.10 syntax Before pyinstaller/pyinstaller#7518: - The `CArchiveReader.toc` attribute referred to a `CTOCReader` object. - `CTOCReader.data` was a `list` of `(dpos, dlen, ulen, flag, typcd, name)` tuples. - `CTOCReader.find()` was used to find an entry for a given name. - `CArchiveReader.extract()` accepted either: - An index into the TOC (this is what `staticx` used to use: `n`) - OR an entry name After pyinstaller/pyinstaller#7518: - The `CTOCReader` class was removed. - The `CArchiveReader.toc` attribute is now a `dict`: `name` => `(entry_offset, data_length, uncompressed_length, compression_flag, typecode)` (the same tuple but without the final `name` element) - `CArchiveReader.extract()` accepts only an entry name
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Clean up archive reader and writer classes by removing the "common"
ArchiveReader
andArchiveWriter
classes. The archive data formats that we use actually have little in common, so trying to shoehorn them onto common/abstract classes actually does more harm than good and leads to questionable design choices.One example is the complicated attempt at lazy file access (to prevent file locking on Windows), which needs a home-grown thread-local storage due to trying to track the last position in the files (which is redundant anyway, because every read access needs to seek to the data blob's position). This all originates from base
ArchiveReader
trying to keep a reference to the archive file, whereasZlibArchiveReader
needs to open/close the file on each archive access.Similarly, the base class implementations imply that we have a data structure whose signature is
PYL\0
while there is no such thing (anymore?).