The default way of distributing sourmash indexes is through a `tar.gz` that needs to be uncompressed before use (it contains the `sbt.json` index metadata and the actual node data inside a hidden directory). This is annoying because you need at least double the storage space: enough to hold the original `tar.gz` file plus the uncompressed data that will actually be used.
A better approach would be to take the same `tar.gz`, read the `sbt.json` file from it, and access the node data without uncompressing the archive to disk (the data still needs to be uncompressed into memory). This can be done with the `tarfile` module, and we already have an example of something very close to this in the `TarStorage` class.
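As a rough illustration of the idea, here is a minimal sketch using only the standard library `tarfile` module. The archive layout and the names `demo.sbt.json` and `.sbt.demo/node0` are hypothetical, chosen only to mirror the "metadata plus hidden directory" structure described above, and the helper functions are not part of sourmash.

```python
import io
import json
import tarfile


def build_demo_archive(path):
    """Write a tiny tar.gz with hypothetical metadata and one node file."""
    members = {
        "demo.sbt.json": json.dumps({"nodes": {"0": "node0"}}).encode(),
        ".sbt.demo/node0": b"\x00\x01\x02\x03",
    }
    with tarfile.open(path, "w:gz") as tar:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def read_member(path, name):
    """Return the bytes of one archive member without extracting to disk."""
    with tarfile.open(path, "r:gz") as tar:
        # extractfile() returns a file-like object backed by the archive;
        # the data is decompressed into memory only.
        with tar.extractfile(name) as fileobj:
            return fileobj.read()
```

With this, `json.loads(read_member(path, "demo.sbt.json"))` would give the index metadata and `read_member(path, ".sbt.demo/node0")` the raw node data. Note that a gzip-compressed tar has to be decompressed sequentially to reach a given member, so each lookup may read through a large part of the archive.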
The only difference between this solution and `TarStorage` is where the metadata lives: in the same archive as the node data, or in two separate files (one for metadata, the other for node data).
@scanon and @psdehal suggested using the zipfile module because ZIPs have an index and support random access without needing to scan the full file.
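For comparison, a minimal sketch of the ZIP variant using the standard library `zipfile` module. As before, the member names and helper functions are hypothetical and only illustrate the access pattern; this is not the sourmash storage format.

```python
import json
import zipfile


def build_demo_zip(path):
    """Write a tiny ZIP with hypothetical metadata and one node file."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("demo.sbt.json", json.dumps({"nodes": {"0": "node0"}}))
        zf.writestr(".sbt.demo/node0", b"\x00\x01\x02\x03")


def load_index_and_node(path, index_name, node_name):
    """Read the metadata and a single node directly from the ZIP file."""
    with zipfile.ZipFile(path) as zf:
        # ZipFile reads the central directory on open, so each read()
        # can seek straight to the requested member.
        metadata = json.loads(zf.read(index_name))
        node = zf.read(node_name)
    return metadata, node
```

Because each member is compressed independently and located through the central directory, fetching one node should not require decompressing the members stored before it.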