[MRG] Force unload data from SBT searches by default (and fix ZipStorage deallocation along the way) #1513
I do not think it should be exposed.
Ready for review and merge @ctb
Codecov Report
@@ Coverage Diff @@
## latest #1513 +/- ##
==========================================
+ Coverage 90.26% 95.28% +5.01%
==========================================
Files 126 99 -27
Lines 21099 17417 -3682
Branches 1585 1591 +6
==========================================
- Hits 19045 16595 -2450
+ Misses 1827 593 -1234
- Partials 227 229 +2
tests/test_sbt.py
Outdated
    to_search = load_one_signature(utils.get_test_data(utils.SIG_FILES[0]))
    search_obj = make_jaccard_search_query(threshold=0.1)

    tree = SBT.load(str(testsbt), leaf_loader=SigLeaf.load)
    old_result = {str(s.signature) for s in tree.find(search_obj, to_search)}
    tree.save(str(newsbt))

    assert newsbt.exists()

    new_tree = SBT.load(str(newsbt), leaf_loader=SigLeaf.load)
    assert isinstance(new_tree.storage, ZipStorage)
    assert new_tree.storage.list_sbts() == ['new.sbt.json']

    to_search = load_one_signature(utils.get_test_data(utils.SIG_FILES[0]))
    new_result = {str(s.signature) for s in new_tree.find(search_obj, to_search)}

    print("*" * 60)
    print("{}:".format(to_search))
    search_obj = make_jaccard_search_query(threshold=0.1)
    old_result = {str(s.signature) for s in tree.find(search_obj, to_search)}
    new_result = {str(s.signature) for s in new_tree.find(search_obj, to_search)}
OK, this is fun: I reordered the test because the call to tree.save(str(newsbt)) closes the original zipfile used as storage in testsbt. Why did it trigger only when defaulting to unload_data=True? Because with unload_data=False the data was still in memory, even with the storage zipfile closed.
The call to .close() inside .save() should probably be something like .flush() instead, which might fix the warnings about closed files too...
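The failure mode described above can be reproduced with nothing but the standard library. The sketch below is only an analogy, not sourmash code: a plain zipfile.ZipFile stands in for the storage, the name leaf.sig is made up, and the cached variable plays the role of data kept in memory when unload_data=False.

```python
import io
import zipfile

# Build a small in-memory archive standing in for the SBT's zip storage.
# (The entry name "leaf.sig" is hypothetical.)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("leaf.sig", b"signature-bytes")

storage = zipfile.ZipFile(buf, "r")

# Analogue of unload_data=False: the payload is read once and kept in memory.
cached = storage.read("leaf.sig")

# Analogue of what save() did to the original storage: it closed the archive.
storage.close()

# Data that was already in memory is unaffected by the close.
still_in_memory = cached == b"signature-bytes"

# Analogue of unload_data=True: the payload has to be read again from the
# archive, which fails because the archive is closed.
try:
    storage.read("leaf.sig")
    reread_failed = False
except ValueError:
    reread_failed = True
```

With the data cached, the closed archive goes unnoticed, which is why the problem only surfaced once unloading became the default.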
...did the scope creep on this one a bit? 😆
Yes, but I'll argue it is for good reasons. This fixes a bunch of So, instead of reordering the test, I changed the But there are some warnings that can be fixed, so... I'll ping again for another review =]
So, this one is ready for review and merge @ctb @bluegenes... But there is a new error during doc building that I think was triggered by some newly released version of the docs dependencies. Sigh.
On Tue, May 11, 2021 at 05:04:16PM -0700, Luiz Irber wrote:
> So, this one is ready for review and merge @ctb @bluegenes...
might get to it tomorrow.
> But there is a new error during doc building that I think was triggered by some newly released version of the docs dependencies. Sigh.
ok, post as issue I guess?
Fixing in #1516
Missing bit from #1370, re: #1370 (comment)
While changing the SBT code, this triggered deeper bugs in ZipStorage, so I fixed them too. I changed the .save() method to only .flush() the storage, but not .close() it. Because the .flush() call uses a temp file, I'm also avoiding deleting it (because it becomes the new ZipStorage), and also dealing with some interesting cases during deallocation (flushing/closing the underlying Zip file properly).
TODO:
should this be exposed back to the command line? I don't think it actually makes sense for current use cases, so unload_data in SBT.find becomes available only at the Python API level