Add metadata search index for more responsive FUSE #182

Open: dnnr wants to merge 1 commit into master

@dnnr dnnr commented Jan 22, 2015

This adds the construction of a metadata index during archive creation,
which can be used to narrow down the location of particular entries
within the items list. The FUSE mount uses this index to fetch only
those chunks that are relevant to the specific operation instead of
fetching the metadata of the entire archive.
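
To illustrate the general idea, here is a minimal sketch of such an index. It is not the code from this patch: the layout (a list of chunks, each holding item paths in archive order) and the names `build_dir_index` and `chunks_for_dir` are assumptions made up for the example.

```python
import posixpath


def build_dir_index(chunks):
    """Map each parent directory to the range of chunks holding its entries.

    `chunks` is a hypothetical stand-in for the archive's items list:
    a list of chunks, each a list of item paths in archive order.
    """
    index = {}
    for chunk_no, paths in enumerate(chunks):
        for path in paths:
            parent = posixpath.dirname(path)
            # Keep the first chunk seen for this directory, extend the last.
            first, _ = index.get(parent, (chunk_no, chunk_no))
            index[parent] = (first, chunk_no)
    return index


def chunks_for_dir(index, directory):
    """Return the chunk numbers that must be fetched to list `directory`."""
    first, last = index[directory]
    return list(range(first, last + 1))
```

With an index like this, listing one directory only touches the chunks in its range, provided that the entries of a directory are stored next to each other in the items list.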

As a result, FUSE mounts of large archives are considerably more
responsive. More importantly, performance is no longer inversely
proportional to the archive size. Any bulk operations that require
the full metadata tree anyway (such as running "find" on the entire
archive) are not negatively impacted.

For this to work, the filesystem traversal order had to be changed from
depth-first to breadth-first, which introduces the new metadata version
number 2. Any otherwise unrelated parts of the code and tests that
relied on the previous behavior are adjusted accordingly.
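
For reference, a breadth-first traversal can be sketched as follows. This is a simplified, standalone example rather than the traversal code in attic; the function name `walk_breadth_first` and the choice to sort entries by name are assumptions.

```python
import os
from collections import deque


def walk_breadth_first(root):
    """Yield paths relative to `root`, one directory's entries at a time.

    All entries of a directory are yielded together before any of its
    subdirectories are descended into.
    """
    queue = deque([""])  # relative directories still to be listed
    while queue:
        rel_dir = queue.popleft()
        with os.scandir(os.path.join(root, rel_dir)) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            rel_path = os.path.join(rel_dir, entry.name)
            yield rel_path
            # Do not follow symlinks, so link loops cannot cause endless walks.
            if entry.is_dir(follow_symlinks=False):
                queue.append(rel_path)
```

The relevant property is that the children of any one directory end up adjacent in the output, which a depth-first walk does not guarantee once subdirectories are interleaved with files.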

dnnr commented Jan 22, 2015

This could be improved even further by adding a read cache to attic. As far as I could tell, the Cache class is currently used for write access only, right? As of yet, my patch still fetches chunks repeatedly if (and only if) multiple archives are loaded that have intersecting entries in their metadata['items'] list, which isn't unlikely for real datasets.

I wasn't sure though if this should be just slapped into the Cache class and used in do_mount, so I left that out for now.
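
A rough sketch of what such a read cache could look like, independent of the existing Cache class. Everything here is hypothetical: the class name `ChunkReadCache`, the `fetch` callable standing in for a repository read, and the LRU eviction policy are assumptions for illustration, not attic's API.

```python
from collections import OrderedDict


class ChunkReadCache:
    """Bounded LRU cache in front of a chunk-fetching callable.

    `fetch` is a hypothetical callable taking a chunk id and returning
    the chunk's bytes; it stands in for the actual repository access.
    """

    def __init__(self, fetch, capacity=1024):
        self._fetch = fetch
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, chunk_id):
        if chunk_id in self._entries:
            # Mark as most recently used and serve from memory.
            self._entries.move_to_end(chunk_id)
            return self._entries[chunk_id]
        data = self._fetch(chunk_id)
        self._entries[chunk_id] = data
        if len(self._entries) > self._capacity:
            # Evict the least recently used chunk.
            self._entries.popitem(last=False)
        return data
```

Sharing one such cache between all loaded archives would avoid fetching a chunk again when several archives reference it, at the cost of holding up to `capacity` chunks in memory.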
