Support all-in-one SBT indexes #490

Closed
luizirber opened this issue Jun 8, 2018 · 0 comments · Fixed by #648

@luizirber luizirber commented Jun 8, 2018

The default way of distributing sourmash indexes is as a tar.gz that needs to be uncompressed before it can be used (it contains the sbt.json index metadata and the actual node data inside a hidden dir). This is annoying because you need at least double the storage space: enough to hold the original tar.gz file and also the uncompressed data that will actually be used.

A better approach would be to take the same tar.gz, read the sbt.json file from it, and access the actual data without uncompressing it on disk (it still needs to be uncompressed into memory). This can be done using the tarfile module, and we already have examples of something very close to this in the TarStorage class.
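
As a rough illustration of the idea, here is a minimal sketch using only the standard-library tarfile module. It is not the TarStorage implementation; the member names (`demo.sbt.json`, `.sbt.demo/node0`), the shape of the JSON metadata, and the helper functions are all assumptions made up for the demo. It builds a small tar.gz in memory, then reads the metadata and one node straight from the archive without extracting anything to disk.

```python
import io
import json
import tarfile


def build_demo_targz():
    """Build a tiny tar.gz in memory. Member names and layout are hypothetical."""
    members = {
        "demo.sbt.json": json.dumps({"nodes": {"0": ".sbt.demo/node0"}}).encode(),
        ".sbt.demo/node0": b"\x00\x01\x02\x03",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar_member(archive_bytes, name):
    """Return one member's bytes; data is decompressed into memory only."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        # extractfile() accepts a member name and returns a file-like object
        return tar.extractfile(name).read()


targz_bytes = build_demo_targz()

# Metadata and node data come from the same archive, nothing is written to disk.
tar_metadata = json.loads(read_tar_member(targz_bytes, "demo.sbt.json"))
tar_node = read_tar_member(targz_bytes, tar_metadata["nodes"]["0"])
```

Note that a gzip-compressed tar has no index, so locating a member means decompressing the stream from the start until that member is reached.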

The only difference between this solution and TarStorage is where the metadata lives: in the same archive, or in two separate files (one for metadata, the other for node data).

@scanon and @psdehal suggested using the zipfile module because ZIPs have an index and support random access without needing to scan the full file.
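
A minimal sketch of the zipfile variant, again with made-up member names (`demo.sbt.json`, `.sbt.demo/node0`) and a hypothetical metadata layout. ZIP archives keep their index (the central directory) at the end of the file, so the zipfile module can seek to a single member and decompress only that one.

```python
import io
import json
import zipfile


def build_demo_zip():
    """Build a tiny ZIP in memory. Member names and layout are hypothetical."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("demo.sbt.json", json.dumps({"nodes": {"0": ".sbt.demo/node0"}}))
        zf.writestr(".sbt.demo/node0", b"\x00\x01\x02\x03")
    return buf.getvalue()


zip_bytes = build_demo_zip()

with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    # namelist() is answered from the central directory, without reading member data
    zip_names = zf.namelist()
    # read() seeks to the requested member and decompresses only that member
    zip_metadata = json.loads(zf.read("demo.sbt.json"))
    zip_node = zf.read(zip_metadata["nodes"]["0"])
```

In a real index the archive would be opened from a path on disk rather than from an in-memory buffer; the access pattern is the same.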
