fix: clamp pread() size to avoid EINVAL on macOS for large reads #1053
veloman-yunkan merged 1 commit into openzim:main
@jasontitus Thank you for your bug report and for the fix! @benoit74 @kelson42 @rgaudin I think that the reported fact rings a bell. During our discussions of
Under Linux, the second point is circumvented by assuming
I remember that some time ago we had to introduce an exception to catch mmap failures (and fall back to memory) because those were more common than expected. I also know mmap is not available on Windows, and I believe we don't use the equivalent Windows API. If I understand correctly, under those circumstances, opening the Archive for that maps ZIM file would consume 1 GB of memory… that would not be acceptable IMO.
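A minimal sketch of the mmap-with-fallback pattern mentioned in this comment, assuming POSIX mmap() and pread() (so it does not cover Windows). RegionView is a hypothetical class, not libzim's implementation. When mmap() fails, the whole region is copied into a heap buffer, which is where the memory cost comes from.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not libzim code): a read-only view of `size` bytes at
// `offset` in `fd`, backed by mmap() when possible and by a heap copy otherwise.
class RegionView {
 public:
  RegionView(int fd, off_t offset, size_t size) : size_(size) {
    // mmap() needs a page-aligned file offset, so map from the enclosing page.
    const off_t page = static_cast<off_t>(sysconf(_SC_PAGESIZE));
    const off_t aligned = offset - (offset % page);
    delta_ = static_cast<size_t>(offset - aligned);
    mapLen_ = size + delta_;
    void* p = mmap(nullptr, mapLen_, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (p != MAP_FAILED) {
      map_ = static_cast<char*>(p);
      return;
    }
    // Fallback: copy the region into memory. For a large table this costs
    // `size` bytes of RAM for as long as the view is alive.
    copy_.resize(size);
    size_t done = 0;
    while (done < size) {
      // Cap each request at 1 GiB so macOS does not reject it with EINVAL.
      const size_t request = std::min(size - done, size_t{1} << 30);
      const ssize_t n = pread(fd, copy_.data() + done, request,
                              offset + static_cast<off_t>(done));
      if (n < 0) {
        if (errno == EINTR) continue;  // interrupted: retry
        throw std::runtime_error("pread failed");
      }
      if (n == 0) throw std::runtime_error("unexpected end of file");
      done += static_cast<size_t>(n);
    }
  }
  ~RegionView() {
    if (map_ != nullptr) munmap(map_, mapLen_);
  }
  RegionView(const RegionView&) = delete;
  RegionView& operator=(const RegionView&) = delete;

  const char* data() const { return map_ != nullptr ? map_ + delta_ : copy_.data(); }
  size_t size() const { return size_; }
  bool isMapped() const { return map_ != nullptr; }

 private:
  char* map_ = nullptr;
  size_t mapLen_ = 0;
  size_t delta_ = 0;
  size_t size_;
  std::vector<char> copy_;
};
```

Either way, callers see the same bytes through data(); only the backing storage, and therefore the memory footprint, differs.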
Correction: opening that ZIM file on a system where

I don't have much to say besides "it looks like we have a problem".
Codecov Report
❌ Patch coverage is 50.00%, below the target coverage of 90.00%, so the patch status check failed. You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #1053      +/-   ##
==========================================
- Coverage   56.26%   56.23%   -0.04%
==========================================
  Files         101      101
  Lines        5014     5015       +1
  Branches     2186     2188       +2
==========================================
- Hits         2821     2820       -1
  Misses        737      737
- Partials     1456     1458       +2
@jasontitus Thank you very much for reporting this bug and fixing it straight away. I would like to get this merged and released ASAP. Then we would do the same at the Kiwix level. Can you please give feedback to the code reviewer so we can move forward with merging?
@veloman-yunkan Can you please complete the PR so we can merge and make a release of libzim? |
Force-pushed from 7055769 to 26697f6.
@kelson42 Done
Oops, not yet. The macOS compiler doesn't like my version. I will fix it in a moment.
On macOS, pread() returns EINVAL when the requested size exceeds INT32_MAX (~2.1 GB). This causes ZIM files with more than ~268 million entries to fail to open, since the URL pointer table (entry_count * 8 bytes) then exceeds INT32_MAX and is read in a single pread() call. The existing read loop already handles partial (short) reads, but never encounters them because the oversized request fails outright before any data is read.

Fix: clamp each pread() call to 1 GB. The existing loop naturally handles the remaining data in subsequent iterations.

Tested with a 381-million-entry, 117 GB ZIM file (world OpenStreetMap with vector tiles, terrain, and search index) on macOS 26.3 / Apple Silicon.
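The clamped read loop described in the commit message can be sketched as a stand-alone function. This is a minimal sketch, not libzim's actual FD::readAt(): the name readAtClamped, the kMaxPreadChunk constant, and reading "1 GB" as 1 GiB are assumptions. The maxChunk parameter exists only so the loop can be exercised with a tiny chunk size.

```cpp
#include <unistd.h>
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <stdexcept>

// Assumed per-call cap of 1 GiB; the PR says "1 GB" and the exact constant
// used in libzim may differ.
constexpr size_t kMaxPreadChunk = size_t{1} << 30;

// Hypothetical stand-alone version of a readAt() loop (not libzim's actual
// FD::readAt()). Reads exactly `size` bytes at `offset` into `dest`.
void readAtClamped(int fd, char* dest, size_t size, off_t offset,
                   size_t maxChunk = kMaxPreadChunk) {
  size_t done = 0;
  while (done < size) {
    // Never ask for more than maxChunk bytes in one call, so macOS does not
    // reject the whole request with EINVAL.
    const size_t request = std::min(size - done, maxChunk);
    const ssize_t n = pread(fd, dest + done, request,
                            offset + static_cast<off_t>(done));
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted: retry the same chunk
      throw std::runtime_error("pread failed");
    }
    if (n == 0) throw std::runtime_error("unexpected end of file");
    // A short read just advances `done`; the next iteration asks for the rest.
    done += static_cast<size_t>(n);
  }
}
```

Because every request is at most maxChunk bytes, a 3 GB read becomes a few ordinary calls, and a short read simply shortens one iteration instead of failing the whole read.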
Force-pushed from 26697f6 to 659d1c7.
This PR touched a piece of code that contained an old, pre-existing bug; the change made that bug manifest deterministically. It is now fixed in #1066.
Problem
ZIM files with more than ~268 million entries fail to open on macOS with:
On macOS, pread() returns EINVAL when the requested size exceeds INT32_MAX (~2.1 GB). The URL pointer table is entry_count * 8 bytes; at 381 million entries this is 3.05 GB, triggering the failure.
Fix
Clamp each pread() call to 1 GB. The existing read loop in FD::readAt() already handles partial reads and iterates until the full request is satisfied, so the remaining data is read in subsequent iterations.
Testing
Tested on macOS 26.3 (Apple Silicon) with a 381-million-entry, 117 GB ZIM file (world OpenStreetMap build with vector tiles, terrain-RGB tiles, and a full-text search index). Before the fix, zim::Archive() throws immediately. After the fix, the archive opens and all entries are accessible.

The Kiwix macOS app (3.13.0) reports "cannot be opened" for any ZIM file exceeding this threshold; this is the same root cause.