Bit7z v4.1.0 Beta #328
rikyoz announced in Announcements
Main Features and Improvements
Nested archives support: the new `BitNestedArchiveReader` class lets you open and inspect archives that are embedded as items inside another archive, without extracting them to disk first. (#92)

Native file I/O: file streams have been rewritten to use low-level Win32 APIs on Windows and POSIX I/O on Unix, replacing `std::fstream`. This dramatically improves the performance of both extraction and compression operations. (#319, #325)

Due to the choice of using `std::fstream` for cross-platform support, v4.0 introduced a severe performance regression.

For a quick comparison, I did some non-exhaustive benchmarks. The result is the following boxplot, which showcases the scale of that regression and how v4.1 resolves it. In this test, v4.0 extraction takes roughly six times as long as v3, while v4.1 brings it back down into the same range as v3:

[Figure: boxplot of extraction times for v3, v4.0, and v4.1]

Zooming in on just v3 and v4.1 (the v4.0 box is omitted here), a small performance gap remains:

[Figure: boxplot of extraction times for v3 and v4.1 only]

Part of this residual gap is likely due to v4.1 performing additional security validation that v3 did not, so I anticipate a modest, permanent cost relative to v3 in exchange for safer behavior. I'll continue working to reduce, and where possible eliminate, the non-essential portion of this gap in the stable release.
Benchmark methodology
Each version was benchmarked by repeatedly extracting the same archive to the same destination, discarding warmup iterations. Boxes show the median (line) and interquartile range; the cross marks the mean; whiskers follow the Tukey 1.5×IQR convention. Measurements were collected on a single machine; absolute timings are environment-dependent and intended for relative comparison between versions, not as absolute throughput figures. The v4.0 regression and its resolution in v4.1 are large effects that are robust to measurement noise; the finer v3-vs-v4.1 gap is smaller and more sensitive to measurement conditions, so it should be read as indicative rather than exact.
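To make the boxplot conventions above concrete, here is a minimal sketch of how such summary statistics could be computed from a series of timings. The names `BoxStats`, `quantile`, and `boxStats` are hypothetical and not part of bit7z, and the quartiles use linear interpolation between order statistics, which is only one of several common conventions; the tool actually used to produce the plots is not stated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary of one benchmark series (illustrative only).
struct BoxStats {
    double q1;
    double median;
    double q3;
    double mean;
    double lowerWhisker;
    double upperWhisker;
};

// Quantile of already-sorted data, using linear interpolation
// between neighbouring order statistics (an assumed convention).
inline double quantile( const std::vector<double>& sorted, double p ) {
    const double pos = p * static_cast<double>( sorted.size() - 1 );
    const std::size_t lo = static_cast<std::size_t>( pos );
    const std::size_t hi = std::min( lo + 1, sorted.size() - 1 );
    const double frac = pos - static_cast<double>( lo );
    return sorted[ lo ] + frac * ( sorted[ hi ] - sorted[ lo ] );
}

inline BoxStats boxStats( std::vector<double> samples ) {
    assert( !samples.empty() );
    std::sort( samples.begin(), samples.end() );

    BoxStats stats{};
    stats.q1 = quantile( samples, 0.25 );
    stats.median = quantile( samples, 0.50 );
    stats.q3 = quantile( samples, 0.75 );
    stats.mean = std::accumulate( samples.begin(), samples.end(), 0.0 )
                 / static_cast<double>( samples.size() );

    // Tukey convention: whiskers reach the most extreme samples
    // that still lie within 1.5 x IQR of the box.
    const double iqr = stats.q3 - stats.q1;
    const double lowFence = stats.q1 - 1.5 * iqr;
    const double highFence = stats.q3 + 1.5 * iqr;
    stats.lowerWhisker = *std::find_if( samples.begin(), samples.end(),
                                        [ & ]( double v ) { return v >= lowFence; } );
    stats.upperWhisker = *std::find_if( samples.rbegin(), samples.rend(),
                                        [ & ]( double v ) { return v <= highFence; } );
    return stats;
}
```

A single slow outlier pulls the mean upwards but leaves the median and the whiskers unchanged, which is why the plots mark the mean separately from the median line.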
New extraction callbacks: `RenameCallback` (dynamically rename or skip items during extraction), `RawDataCallback` (stream raw bytes directly to user code without touching the filesystem), and `BufferCallback` (route each extracted file to a separate, independently chosen buffer) unlock extraction patterns that were previously impossible.

Example

```cpp
BitFileExtractor extractor{ lib, BitFormat::Zip };

// RenameCallback: skip .tmp files, extract everything else under a subfolder
extractor.extract( "archive.zip", "output/", []( uint32_t, const tstring& path ) -> tstring {
    if ( path.find( ".tmp" ) != tstring::npos ) {
        return {}; // empty string = skip this item
    }
    return "backup/" + path;
} );

// RawDataCallback: receive a file's raw bytes without writing to disk
std::vector<byte_t> rawData;
extractor.extractTo( "archive.zip", [&]( const byte_t* data, std::size_t size ) -> bool {
    rawData.insert( rawData.end(), data, data + size );
    return true; // return false to abort
}, /* index = */ 0 );

// BufferCallback: extract all files, each into its own buffer
std::map<tstring, buffer_t> files;
extractor.extract( "archive.zip", [&]( uint32_t, const tstring& path ) -> buffer_t& {
    return files[ path ];
} );
```

Multi-volume LRU file-handle cache: when reading or creating large multi-volume archives, bit7z now keeps only the most recently used volume file handles open, preventing file descriptor exhaustion on archives with many volumes (#150, #161).
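As an illustration of the idea behind the multi-volume handle cache described above, here is a minimal, generic LRU sketch. It is not bit7z's actual implementation: the class name `VolumeHandleCache`, the injected `Opener` function, and the use of `std::shared_ptr` for handle lifetime are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache of open volume handles (illustrative only).
template <typename Handle>
class VolumeHandleCache {
public:
    // Called on a cache miss to (re)open the requested volume.
    using Opener = std::function<std::shared_ptr<Handle>( std::size_t volumeIndex )>;

    VolumeHandleCache( std::size_t capacity, Opener opener )
        : mCapacity{ capacity }, mOpener{ std::move( opener ) } {
        assert( capacity > 0 );
    }

    std::shared_ptr<Handle> get( std::size_t volumeIndex ) {
        auto found = mIndex.find( volumeIndex );
        if ( found != mIndex.end() ) {
            // Hit: move the entry to the front, marking it most recently used.
            mEntries.splice( mEntries.begin(), mEntries, found->second );
            return found->second->second;
        }
        if ( mEntries.size() == mCapacity ) {
            // Full: evict the least recently used entry (the back of the list).
            // The handle is released once no other owner still holds it.
            mIndex.erase( mEntries.back().first );
            mEntries.pop_back();
        }
        mEntries.emplace_front( volumeIndex, mOpener( volumeIndex ) );
        mIndex[ volumeIndex ] = mEntries.begin();
        return mEntries.front().second;
    }

    // Number of handles currently kept open by the cache.
    std::size_t openCount() const { return mEntries.size(); }

private:
    using Entry = std::pair<std::size_t, std::shared_ptr<Handle>>;

    std::size_t mCapacity;
    Opener mOpener;
    std::list<Entry> mEntries; // front = most recently used
    std::unordered_map<std::size_t, typename std::list<Entry>::iterator> mIndex;
};
```

In a real reader, `Handle` would presumably wrap a file descriptor or file handle whose destructor closes it, so that the number of simultaneously open volumes never exceeds the chosen capacity regardless of how many volumes the archive has.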
Timestamp control: two complementary additions give full control over timestamps when creating archives. `BitAbstractArchiveCreator` gains `setStoreLastWriteTime()`, `setStoreCreationTime()`, and `setStoreLastAccessTime()` to control which timestamp types are stored globally (format support varies: 7z has the most complete support; TAR does not support creation/last-access timestamps). Additionally, `BitOutputArchive::addFile()` now returns a `BitInputItem&`, allowing per-item timestamp overrides via `setCreationTime()`, `setLastWriteTime()`, and `setLastAccessTime()`. (#184)
Deferred library loading: `Bit7zLibraryLoader` allows constructing a loader without immediately loading the 7-zip DLL/SO, then loading (and unloading) it at any later point. Useful for plugin systems and applications that need to control when the native library is brought in.
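The general pattern behind deferred loading can be sketched as follows. This is not the `Bit7zLibraryLoader` API, whose exact methods are not shown in this announcement: the class `DeferredLibrary`, its `load()`/`unload()`/`isLoaded()` members, and the injected open/close functions are hypothetical and only illustrate the separation between constructing a loader and actually loading the library.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical deferred loader (illustrative; NOT the Bit7zLibraryLoader API).
class DeferredLibrary {
public:
    using OpenFn = std::function<void*( const std::string& path )>;
    using CloseFn = std::function<void( void* handle )>;

    // Construction only records how to load the library; nothing is loaded yet.
    DeferredLibrary( std::string path, OpenFn open, CloseFn close )
        : mPath{ std::move( path ) }, mOpen{ std::move( open ) }, mClose{ std::move( close ) } {
        assert( mOpen && mClose );
    }

    DeferredLibrary( const DeferredLibrary& ) = delete;
    DeferredLibrary& operator=( const DeferredLibrary& ) = delete;

    ~DeferredLibrary() { unload(); }

    bool isLoaded() const noexcept { return mHandle != nullptr; }

    // Loads the library on demand; calling it again while loaded is a no-op.
    void load() {
        if ( isLoaded() ) {
            return;
        }
        mHandle = mOpen( mPath );
        if ( mHandle == nullptr ) {
            throw std::runtime_error( "cannot load library: " + mPath );
        }
    }

    // Releases the library; it can be loaded again later.
    void unload() {
        if ( !isLoaded() ) {
            return;
        }
        mClose( mHandle );
        mHandle = nullptr;
    }

private:
    std::string mPath;
    OpenFn mOpen;
    CloseFn mClose;
    void* mHandle = nullptr;
};
```

On Windows the open/close functions could wrap `LoadLibrary`/`FreeLibrary`, and on Unix `dlopen`/`dlclose`; injecting them keeps the sketch platform-neutral and easy to test.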
New Features
- `Bit7zLibrary::useLargePages()`: replaces the deprecated `setLargePageMode()`.
- `BitArchiveItem::rawPath()` / `nativeName()`: access item paths and names in their raw/native string forms.
- `BitArchiveReader`: nested archive constructors: open nested archives directly from a parent `BitInputArchive`.
- `BitArchiveReader`: `ArchiveStartOffset` constructors: open archives embedded in the middle of a file (e.g., self-extracting archives).
- `BitArchiveReader::archiveProperties()`: access archive-level format properties.
- `BitArchiveReader::itemsMatching()`: get items matching a wildcard pattern.
- `BitError::NoMatchingFile`: new error code for filter/regex extraction finding no matching item.
- `BitExtractor::extractFolder()`: extract a specific folder from an archive.
- `BitExtractor::extract()` with `RenameCallback`: rename items on the fly during extraction.
- `BitFileCompressor::compress(vector<pair<path, alias>>)`: compress files using explicit in-archive path aliases. (#313)
- `BitIndicesView`: lightweight, non-owning span of item indices; implicitly constructible from a single index, vector, array, or initializer list.
- `BitInputArchive::extractFolderTo()`: extract a single folder from an archive.
- `sevenzip_string` type alias and `to_native_string()` conversion functions.

Improvements

- `BitItemsVector` now stores items by value instead of via `std::unique_ptr`, eliminating one heap allocation per file when indexing items for compression. The improvement scales with the number of files being compressed.
- `OpenError` category for richer archive-opening failure messages.
- `Win32Category` error category for correct `std::error_code` handling.
- `to_tstring()` is zero-copy when already a `tstring`. (#276)
- …, `password()`, and `itemProperties()` return `const` references; file path retrieval in extract callbacks is skipped when no `FileCallback` is set; regex extraction accepts pre-compiled `tregex` objects.

Bug Fixes

- … `BIT7Z_USE_NATIVE_STRING=ON`.

Deprecated

- `Bit7zLibrary::setLargePageMode()` -> use `useLargePages()`.

Planned Before Stable Release
Note
This release includes all improvements, patches and fixes introduced in the v4.0.x series up to and including v4.0.12.
Full Changelog: v4.0.12...v4.1.0-beta
This discussion was created from the release Bit7z v4.1.0 Beta.