Introduction
This repository contains the specification for the SOZip (Seek-Optimized Zip) profile to the ZIP file format.
What is SOZip ?
A Seek-Optimized ZIP file (SOZip) is a ZIP file that contains one or several Deflate-compressed files that are organized and annotated such that a SOZip-aware reader can perform very fast random access (seek) within a compressed file.
SOZip makes it possible to access large compressed files directly from a .zip file without prior decompression. It is not a new file format, but a profile of the existing ZIP format, done in a fully backward compatible way. ZIP readers that are non-SOZip aware can read a SOZip-enabled file normally and ignore the extended features that support efficient seek capability.
Software implementations
: C/C++ open source geospatial library. Try out its sozip branch, pending future inclusion in its master branch.
- Python sozipfile module: drop-in replacement
for standard
zipfile
module, creating SOZip-enabled files.
See Annex A: Software implementations for more details.
Examples of SOZip files
Examples of SOZip-enabled files can be found in the sozip-examples repository.
Other ZIP related specification
This GitHub organization also hosts the KeyValuePairs extra-field specification, to be able to encode arbitrary key-value pairs of metadata associated with a file within a ZIP. For example to store the Content-Type of a file.
Benchmarking
Done with GDAL sozip branch, on a laptop running a Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (6 cores / 12 virtual CPUs).
- ZIP generation:
Timing | Action |
---|---|
6.1 s | Multithreaded (12 vCPUs) generation of 489 MB SOZip-enabled file from a 1.6 GB uncompresssed GeoPackage file withsozip nz-building-outlines.gpkg.zip nz-building-outlines.gpkg |
36 s | Single threaded compression of same file to 480 MB with regular zip utility withzip nz-building-outlines-regular.gpkg.zip nz-building-outlines.gpkg |
- Bulk reading: Multithreaded extraction (4 vCPUs) of 3.2 million features with Arrow Array interface
Timing | Action |
---|---|
1.2 s | from SOZip-compressed GeoPackage file withbench_ogr_batch nz-building-outlines.gpkg.zip |
0.7 s | from uncompressed GeoPackage file withbench_ogr_batch nz-building-outlines.gpkg |
- Subsetting: Extraction of 66,377 features with a spatial filter
Timing | Action |
---|---|
1.2 s | from SOZip-compressed GeoPackage file withogr2ogr out.gpkg nz-building-outlines.gpkg.zip -spat 1740000 5910000 1750000 5920000 |
1.1 s | from uncompressed GeoPackage file withogr2ogr out.gpkg nz-building-outlines.gpkg -spat 1740000 5910000 1750000 5920000 |
- Extraction of one feature from its identifier:
Timing | Action |
---|---|
45 ms | from SOZip-compressed GeoPackage file withogr2ogr out.gpkg nz-building-outlines.gpkg.zip -fid 1000000 |
44 ms | from uncompressed GeoPackage file withogr2ogr out.gpkg nz-building-outlines.gpkg -fid 1000000 |
How to contribute ?
We welcome contributions to this specification as issues, pull requests or discussions.
If you use SOZip or plan to use it for your data delivery, or consider doing a SOZip implementation, etc., let us know!
Social media
Credits
The SOZip specification and its GDAL implementation have been developed by Spatialys, with support from Safe Software