Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

question: can MultiIndex wrap other indices? #1425

Closed
ctb opened this issue Mar 31, 2021 · 8 comments
Closed

question: can MultiIndex wrap other indices? #1425

ctb opened this issue Mar 31, 2021 · 8 comments

Comments

@ctb
Copy link
Contributor

ctb commented Mar 31, 2021

The changes in #1420 remove the short-lived filter function, which should make it possible for MultiIndex to wrap not just LinearIndex objects but also other MultiIndex objects as well as LCA and SBT databases - basically, any Index subclass.

This is a placeholder issue to remind me/us of this possibility and to look into seeing if it's true.

relevant to -

@ctb
Copy link
Contributor Author

ctb commented May 8, 2021

Note that MultiIndex now uses load_file_as_index underneath, so this should be supported. But we might want to put some thought into testing it with some edge cases etc.

@ctb
Copy link
Contributor Author

ctb commented Jun 15, 2021

OK, was looking into this b/c of manifests, and it turns out life is more complex :)

  • MultiIndex.load_from_path uses LinearIndex.load(...) underneath, so it only loads JSON .sig files currently
  • MultiIndex.load_from_pathlist uses load_file_as_index underneath, so it loads index files as well

@ctb
Copy link
Contributor Author

ctb commented Jun 19, 2021

In #1590 I'm proposing to add LazyMultiIndex which dramatically simplifies this functionality by using manifests to support completely lazy selection on/loading of signatures.

@ctb
Copy link
Contributor Author

ctb commented Jun 26, 2021

(we should really come up with a naming scheme that makes it clear which classes are lazy, and/or hold everything in memory vs on disk, and support or require manifests)

@ctb
Copy link
Contributor Author

ctb commented Mar 12, 2022

in various places (including most recently #1837) we are continuing to expand this functionality and make it generic, so it's clear this works - load_from_pathlist uses it and is now pretty well tested. This is probably something we can close although the MultiIndex class itself could be refactored a bit and documented more clearly 🤔

I think some key questions are being asked over in #1878 about whether we should be trying to load files with load_file_as_index (all sourmash-compatible files) instead of LinearIndex.load (which just loads JSON).

@ctb
Copy link
Contributor Author

ctb commented Mar 12, 2022

...which just comes back around to the question of "should we be using lazy loading" that's asked above, actually. Because we probably don't want to try to keep large databases in memory...

@ctb
Copy link
Contributor Author

ctb commented Mar 25, 2022

Now that I'm working towards merging #1891, this is becoming more relevant.

In addition to the above comment:

  • MultiIndex.load_from_path uses LinearIndex.load(...) underneath, so it only loads JSON .sig files currently

  • MultiIndex.load_from_pathlist uses load_file_as_index underneath, so it loads index files as well

MultiIndex.load_from_directory also only loads JSON files currently. I think this is appropriate, FWIW. Pathlists or manifests are good because you have to explicitly list out the files you want to include!

But we may want to change MultiIndex so it doesn't create new manifests when they already exist (see call to CollectionManifest.create_manifest in MultiIndex.load).

Also consider overlapping functionality with StandaloneManifestIndex in #1891.

@ctb
Copy link
Contributor Author

ctb commented Mar 26, 2022

After thoroughly reading and grokking MultiIndex and MultiIndex test cases, this is properly tested, and my conclusions are that:

(a) yes, MultiIndex can wrap other indices;
(b) but it should not be used to do so, in general, because it will load all the signatures in those indices into memory.

I will close this with a new issue to drag MultiIndex into the modern day world of manifests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant