Harden LEB128 decoding against malformed Mach-O input #29
Merged
Add an explicit end-of-buffer bound to readUnsignedLeb128/readSignedLeb128 so truncated or crafted dyld info / function-starts streams can no longer read past the mapped file. Thread the bound through every caller, replace release-stripped asserts on segment indices with runtime checks, and bound the bind-opcode symbol-name scan with memchr instead of strlen. Also extend crash-regression with truncated copies of the sample binaries so the LEB128 and trie-walk bounds checks are exercised against real LC_DYLD_INFO / LC_FUNCTION_STARTS payloads, and add the missing noexcept on the non-Apple NodeException::what() override so the parser builds under modern GCC/Clang. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
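A minimal sketch of the bounded-decoding idea, assuming the caller passes an `end` pointer one past the last readable byte of the mapped file. The names `readUleb128Bounded`, `readSleb128Bounded` and `readCStringBounded` and their `std::optional` return types are hypothetical. They illustrate the technique and are not the PR's actual readUnsignedLeb128/readSignedLeb128 signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string_view>

// Hypothetical bounded ULEB128 decoder over [p, end). On success, advances p
// past the encoding. Returns std::nullopt (p unchanged) if the stream ends
// before the terminating byte or the value does not fit in 64 bits.
std::optional<uint64_t> readUleb128Bounded(const uint8_t*& p, const uint8_t* end) {
    uint64_t result = 0;
    unsigned shift = 0;
    for (const uint8_t* cur = p; cur < end;) {
        const uint8_t byte = *cur++;
        const uint64_t slice = byte & 0x7f;
        if (shift >= 64 || (shift == 63 && slice > 1)) return std::nullopt; // overflow
        result |= slice << shift;
        shift += 7;
        if ((byte & 0x80) == 0) { p = cur; return result; }
    }
    return std::nullopt; // truncated: hit end before the final byte
}

// Hypothetical bounded SLEB128 decoder; same contract as above.
std::optional<int64_t> readSleb128Bounded(const uint8_t*& p, const uint8_t* end) {
    uint64_t result = 0; // accumulate unsigned so shifts are well-defined
    unsigned shift = 0;
    for (const uint8_t* cur = p; cur < end;) {
        const uint8_t byte = *cur++;
        if (shift >= 64) return std::nullopt; // longer than ten bytes
        result |= static_cast<uint64_t>(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) {
            if (shift < 64 && (byte & 0x40)) result |= ~uint64_t{0} << shift; // sign-extend
            p = cur;
            return static_cast<int64_t>(result);
        }
    }
    return std::nullopt;
}

// Hypothetical bounded replacement for strlen on a bind-opcode symbol name:
// memchr never looks past end, so an unterminated name is rejected.
std::optional<std::string_view> readCStringBounded(const uint8_t*& p, const uint8_t* end) {
    if (p >= end) return std::nullopt;
    const void* nul = std::memchr(p, '\0', static_cast<size_t>(end - p));
    if (nul == nullptr) return std::nullopt;
    const auto* z = static_cast<const uint8_t*>(nul);
    std::string_view name(reinterpret_cast<const char*>(p), static_cast<size_t>(z - p));
    p = z + 1; // skip the terminator
    return name;
}
```

Leaving `p` untouched on failure lets a caller abandon the opcode stream cleanly instead of resuming from a half-consumed position.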
Bounds-check the NodeData copy against the mapped file before reading, so a struct that straddles EOF can no longer be memcpy'd out of range. Read the load-command header through an aligned copy, require cmdsize to be a multiple of the pointer size, and reject fat slices whose offset is not 8-byte aligned. Together these stop the parser from dereferencing Mach-O headers and load commands at misaligned addresses on crafted input (undefined behaviour that UBSan flags). https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
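A sketch of the bounds-checked copy and load-command validation described above, assuming a flat `buf`/`size` view of the mapped file. `copyStructAt`, `readLoadCommand` and `fatSliceOffsetOk` are hypothetical names. `LoadCommandHeader` only mirrors the two-field layout of `struct load_command`, and the check that cmdsize stays inside the buffer is an extra assumption beyond what the commit message states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical helper: copy a trivially copyable T out of buf at offset, but
// only if all sizeof(T) bytes lie inside the buffer. memcpy into a local is
// always aligned, so no misaligned pointer is ever dereferenced.
template <typename T>
std::optional<T> copyStructAt(const uint8_t* buf, size_t size, size_t offset) {
    if (offset > size || size - offset < sizeof(T)) return std::nullopt; // straddles EOF
    T out{};
    std::memcpy(&out, buf + offset, sizeof(T));
    return out;
}

// Mirrors the layout of struct load_command (cmd, cmdsize).
struct LoadCommandHeader {
    uint32_t cmd;
    uint32_t cmdsize;
};

// Reject a load command whose cmdsize is smaller than its header, is not a
// multiple of the pointer size (8 for 64-bit, 4 for 32-bit images), or
// (assumption) runs past the end of the buffer.
std::optional<LoadCommandHeader> readLoadCommand(const uint8_t* buf, size_t size,
                                                 size_t offset, bool is64) {
    auto lc = copyStructAt<LoadCommandHeader>(buf, size, offset);
    if (!lc) return std::nullopt;
    const uint32_t align = is64 ? 8 : 4;
    if (lc->cmdsize < sizeof(LoadCommandHeader) || lc->cmdsize % align != 0) return std::nullopt;
    if (lc->cmdsize > size - offset) return std::nullopt; // offset <= size already checked
    return lc;
}

// Fat slices must start on an 8-byte boundary.
bool fatSliceOffsetOk(uint64_t offset) { return offset % 8 == 0; }
```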
FatHeaderViewNode built its table with the no-argument CreateTableView(), leaving TableViewData::GetRAW unset, so the AddRow(field,...) template invoked an empty std::function, which threw std::bad_function_call and aborted. This crashed both the Fat Header node in the GUI and any --cli dump of a fat binary. Give the fat header table a real GetRAW so row offsets are correct, and make the AddRow template tolerate an unset callback as a safety net. Extend the CLI smoke test to analyze the fat sample so the regression is covered. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
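A sketch of the safety net, assuming a simplified table type. `TableSketch` and its members are hypothetical stand-ins for TableViewData, not its real definition. The point is only that testing the `std::function` before calling it avoids `std::bad_function_call`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for TableViewData.
struct TableSketch {
    // Maps a pointer into the mapped file to its file offset.
    std::function<uint64_t(const void*)> GetRAW;

    struct Row {
        uint64_t offset;
        std::string name;
    };
    std::vector<Row> rows;

    template <typename Field>
    void AddRow(const Field& field, const std::string& name) {
        // Invoking an empty std::function throws std::bad_function_call,
        // so fall back to offset 0 when no callback was installed.
        const uint64_t off = GetRAW ? GetRAW(&field) : 0;
        rows.push_back({off, name});
    }
};
```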
Opening a large binary or dyld shared cache previously blocked the UI thread for the whole parse because LayoutController::initModel parsed and built the tree synchronously inside LayoutDockWidget::openFile. Split the controller into a thread-safe parse() (libmoex only, no Qt objects) and a GUI-thread buildModel(), and run parse() through QtConcurrent with a QFutureWatcher. The tree is cleared while parsing and rebuilt on completion; re-entrant opens are ignored until the in-flight parse finishes so the worker never races a deleted controller. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
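The parse/build split can be modelled in standard C++ as below. This swaps in `std::async` and a polled `std::future` for QtConcurrent::run and QFutureWatcher, so it illustrates the control flow under that substitution rather than reproducing the PR's Qt code. `LayoutSketch`, `ParsedFile`, `parse` and `poll` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <vector>

// Hypothetical result of the worker-side parse.
struct ParsedFile {
    std::string path;
    std::vector<std::string> nodes;
};

// Worker-thread half: touches only parser state, never GUI objects.
ParsedFile parse(std::string path) {
    return ParsedFile{path, {"Mach Header", "Load Commands"}};
}

class LayoutSketch {
public:
    // Returns false and does nothing while a parse is still in flight.
    bool openFile(const std::string& path) {
        if (pending_.valid()) return false; // re-entrant open ignored
        tree_.clear();                      // tree stays empty while parsing
        pending_ = std::async(std::launch::async, parse, path);
        return true;
    }

    // GUI-thread half, standing in for the watcher's "finished" handler.
    bool poll() {
        if (!pending_.valid() ||
            pending_.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
            return false;
        ParsedFile result = pending_.get(); // get() leaves the future invalid
        tree_ = result.nodes;               // buildModel() equivalent
        return true;
    }

    const std::vector<std::string>& tree() const { return tree_; }

private:
    std::future<ParsedFile> pending_;
    std::vector<std::string> tree_;
};
```

The future stays valid until `get()` runs on the GUI side, so a second `openFile` is rejected for exactly as long as a result is outstanding, mirroring the rule that re-entrant opens are ignored until the in-flight parse finishes.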
The two dialog sources included their generated uic headers with a lowercased name (ui_aboutdialog.h) that only resolves on a case-insensitive filesystem, so AutoUic failed on Linux. Match the .ui filename case so the project configures and builds out of the box with stock qt6-base-dev. Make the CLI smoke test fall back to the plain build/MachOExplorer binary when there is no macOS .app bundle, and add a build_linux.sh helper plus Linux build notes. The full regression suite now passes on Linux. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
The Code Signature node previously showed only the raw __LINKEDIT blob. Parse the embedded-signature SuperBlob (big-endian, byte-wise so unaligned offsets stay safe and every access is bounds-checked) and list each sub-blob with its slot type, magic, offset and length. When an entitlements blob is present, decode and display the plist XML line by line with file offsets, capped so a large blob cannot flood the table. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
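A sketch of byte-wise big-endian parsing of the embedded-signature SuperBlob, assuming the blob has already been located in __LINKEDIT and is passed as `buf`/`size`. The layout (magic 0xfade0cc0, length, count, then `count` type/offset index entries, each sub-blob starting with its own magic and length) follows Apple's code-signing format. `readBE32`, `SubBlob` and `parseSuperBlob` are hypothetical names, and the entitlements plist decoding step is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Read a big-endian uint32 byte by byte: no alignment assumption, explicit bound.
std::optional<uint32_t> readBE32(const uint8_t* buf, size_t size, size_t off) {
    if (off > size || size - off < 4) return std::nullopt;
    return (uint32_t(buf[off]) << 24) | (uint32_t(buf[off + 1]) << 16) |
           (uint32_t(buf[off + 2]) << 8) | uint32_t(buf[off + 3]);
}

struct SubBlob {
    uint32_t type, magic, offset, length;
};

constexpr uint32_t kEmbeddedSignatureMagic = 0xfade0cc0; // CSMAGIC_EMBEDDED_SIGNATURE

// Returns std::nullopt for a bad SuperBlob header; otherwise lists sub-blobs
// until the index table or a sub-blob falls outside the buffer.
std::optional<std::vector<SubBlob>> parseSuperBlob(const uint8_t* buf, size_t size) {
    const auto magic = readBE32(buf, size, 0);
    const auto count = readBE32(buf, size, 8);
    if (!magic || !count || *magic != kEmbeddedSignatureMagic) return std::nullopt;
    std::vector<SubBlob> out;
    for (uint32_t i = 0; i < *count; ++i) {
        const size_t idx = 12 + size_t(i) * 8; // BlobIndex { type, offset }
        const auto type = readBE32(buf, size, idx);
        const auto off = readBE32(buf, size, idx + 4);
        if (!type || !off) break; // index table truncated
        const auto bmagic = readBE32(buf, size, *off);
        const auto blen = readBE32(buf, size, size_t(*off) + 4);
        if (!bmagic || !blen || *blen > size - *off) break; // sub-blob out of range
        out.push_back({*type, *bmagic, *off, *blen});
    }
    return out;
}
```

Stopping at the first out-of-range entry keeps a crafted `count` from driving the loop over memory that is not part of the blob.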
Selecting a node with a large table (symbols, dyld cache images, ObjC metadata) ran ViewNode::Init() synchronously on the GUI thread, briefly freezing the UI. Build uninitialized nodes through QtConcurrent and display them when ready; already-parsed nodes still render instantly. Builds are single-flight and the most recent selection always wins, so rapid clicking never races or shows a stale table. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
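One way to get single-flight builds where the latest selection wins, sketched under the assumption that both selection and completion are handled on the GUI thread (as a QFutureWatcher's finished signal would be). `NodeBuildScheduler` is hypothetical and only tracks node names; starting the actual QtConcurrent build is left to the caller.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical single-flight scheduler; every method runs on the GUI thread.
class NodeBuildScheduler {
public:
    // User selected `node`. Returns the node to start building now, or
    // std::nullopt if a build is already running; the request then replaces
    // any older queued request.
    std::optional<std::string> select(const std::string& node) {
        wanted_ = node;
        if (running_) return std::nullopt;
        running_ = node;
        return running_;
    }

    struct Done {
        bool show;                       // display the finished build?
        std::optional<std::string> next; // queued build to start, if any
    };

    // The running build finished: show it only if it is still the latest
    // selection; otherwise start the newest queued selection instead.
    Done finished() {
        Done d{running_ == wanted_, std::nullopt};
        running_.reset();
        if (!d.show) {
            running_ = wanted_;
            d.next = running_;
        }
        return d;
    }

private:
    std::optional<std::string> running_; // build in flight
    std::optional<std::string> wanted_;  // most recent selection
};
```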
There was no way to search across the whole structure tree; only individual tables could be filtered. Add a search field above the layout tree backed by a recursive QSortFilterProxyModel: typing keeps matching nodes and their ancestors and expands the tree to reveal them, clearing restores the default expansion. Selection handling maps proxy indices back to the source model so node activation keeps working. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
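A plain-tree model of the ancestor-keeping filter. In Qt this behaviour is presumably enabled through `QSortFilterProxyModel::setRecursiveFilteringEnabled(true)`; the sketch below, with hypothetical `TreeNode` and `filterTree`, reproduces the same rule without Qt and uses a case-sensitive substring match for simplicity.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node holding only a display name.
struct TreeNode {
    std::string name;
    std::vector<TreeNode> children;
};

// Writes a pruned copy of `in` into `out` and returns true if the node itself
// matches or any descendant matches, so ancestors of every hit stay visible.
// Non-matching children of a matching node are dropped.
bool filterTree(const TreeNode& in, const std::string& query, TreeNode& out) {
    out.name = in.name;
    out.children.clear();
    for (const auto& child : in.children) {
        TreeNode kept;
        if (filterTree(child, query, kept)) out.children.push_back(std::move(kept));
    }
    return in.name.find(query) != std::string::npos || !out.children.empty();
}
```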
The layout search only matched node display names. Add a custom filter proxy that also searches the cell contents of any node whose view data has already been built, so once a table (symbols, strings, ...) has been opened its rows become searchable from the layout search box. Nodes that have not been parsed are still matched by name, so lazy loading is preserved and no node is forced to parse just to search. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
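The extended match rule can be sketched as a single predicate. In a Qt proxy it would sit inside a `filterAcceptsRow` override; here `SearchableNode` and `nodeMatches` are hypothetical, and an empty `cells` optional stands for a node whose view data has not been built yet.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical node record; `cells` (rows x columns) is only present once
// the node's view data has been built.
struct SearchableNode {
    std::string name;
    std::optional<std::vector<std::vector<std::string>>> cells;
};

// Matches on the display name, and on cell text only when the view data
// already exists, so searching never forces a node to parse.
bool nodeMatches(const SearchableNode& node, const std::string& query) {
    if (node.name.find(query) != std::string::npos) return true;
    if (!node.cells) return false; // not parsed yet: name-only match
    for (const auto& row : *node.cells)
        for (const auto& cell : row)
            if (cell.find(query) != std::string::npos) return true;
    return false;
}
```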
The disassembly-based xref scanner did not compile against modern Capstone: cs_regs_access requires uint16_t register arrays (cs_regs is uint16_t[64], not uint8_t), and ReadPointerAtVm takes a MachHeader* but was passed the MachHeaderPtr shared_ptr. Use the correct register array type and pass the raw pointer so a Capstone-enabled build succeeds and the xref report resolves call/jump targets. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
Build the project (Qt6 + Capstone) and run the regression suite on Ubuntu for every push to master and every pull request, so the Linux build and the parser hardening / crash regressions stay green. https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok