Make viirs-compact datasets compatible with dask distributed #1546
Conversation
LGTM. I think the most basic way of testing would be to do:
```python
from dask.distributed import Client

client = Client()
# normal reader/Scene loading
scn['I01'].compute()
```
This should run it in a "local cluster", and I assume this fails with the current master branch; otherwise this isn't much of a test. There may be issues with creating a Client in testing. I know I had some issues with this when working on the MultiScene. We might be able to steal some of dask's pytest "fixtures", but since those aren't public API we probably want to avoid that as much as possible.
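To make the suggestion concrete, here is a minimal, hypothetical stand-in for the Scene-based check, using a toy dask array instead of a real reader (the array and its values are illustrative, not from this PR; `processes=False` is one way around Client-in-tests startup problems):

```python
import dask.array as da

try:
    from dask.distributed import Client
except ImportError:  # dask.distributed is an optional extra
    Client = None

if Client is not None:
    # Client() with no arguments starts a LocalCluster; processes=False keeps
    # the workers in-process, which avoids most test-suite startup issues.
    with Client(processes=False):
        arr = da.arange(10, chunks=5)
        # compute() now goes through the distributed scheduler, so every task
        # (including the reader's callables in the real Scene case) must be
        # serializable.
        assert arr.sum().compute() == 45
```

In the real test, the toy array would be replaced by `scn['I01']` after normal reader/Scene loading.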
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1546      +/-   ##
==========================================
- Coverage   92.09%   90.98%    -1.11%
==========================================
  Files         251      251
  Lines       36711    36698       -13
==========================================
- Hits        33808    33390      -418
- Misses       2903     3308      +405
```
LGTM. Just a couple of comments.
I added a simple distributed test. Seems to run fine, fails on master.
```python
def test_distributed(self):
    """Check that distributed computations work."""
    from dask.distributed import Client
    self.client = Client()
```
Would it be better to have a separate class that always creates the client in `setUp` rather than assigning to `self.client` here?
When I end up having more tests, I will split the TestCase, yes. For now it's fine as it is, I think.
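For reference, a split-out TestCase along those lines could look like this sketch (the class name and the toy computation are assumptions, not code from this PR):

```python
import unittest

try:
    from dask.distributed import Client
except ImportError:  # optional dependency
    Client = None


@unittest.skipIf(Client is None, "dask.distributed not available")
class TestCompactVIIRSDistributed(unittest.TestCase):
    """Hypothetical split-out TestCase that owns the client lifecycle."""

    def setUp(self):
        """Start an in-process local cluster before each test."""
        self.client = Client(processes=False)

    def tearDown(self):
        """Close the client so workers don't leak between tests."""
        self.client.close()

    def test_distributed(self):
        """Check that distributed computations work."""
        import dask.array as da
        # Stand-in computation; the real test would compute a loaded dataset.
        total = da.ones((4, 4), chunks=2).sum().compute()
        self.assertEqual(total, 16)
```

Putting the `Client` in `setUp`/`tearDown` guarantees cleanup even when a test fails, which plain attribute assignment inside a test does not.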
This PR makes viirs-compact datasets compatible with dask distributed.

The problem was coming from the `map_blocks` calls using a method of the filehandler as a callable, thus embedding all the filehandler attributes, including an open `h5py.File` instance.

Feedback on how to test this would be very much appreciated!
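To illustrate the underlying issue: handing a bound method to `map_blocks` means dask must pickle that method to ship tasks to distributed workers, and pickling a bound method drags the whole instance along. A minimal sketch (the `FileHandler` class and method names here are made-up stand-ins for the real satpy filehandler, with a plain open file standing in for the open `h5py.File`):

```python
import os
import pickle


class FileHandler:
    """Made-up stand-in for a satpy filehandler."""

    def __init__(self):
        # An open file object is not picklable, just like an open h5py.File.
        self.file = open(os.devnull)

    def read_band(self, block):
        return block


handler = FileHandler()

# Shipping the bound method to a worker requires pickling it, which pulls in
# the whole handler (and its open file) and fails:
try:
    pickle.dumps(handler.read_band)
except (TypeError, pickle.PicklingError) as err:
    print("bound method not picklable:", err)


# A module-level function taking only picklable arguments (for example a
# filename, with the file reopened inside each worker) avoids the problem,
# presumably along the lines of what this PR does.
def read_band(block, filename):
    # hypothetical: reopen the HDF5 file here instead of capturing it
    return block
```

This also explains why the failure only shows up under distributed: the default threaded scheduler never serializes the task graph, so the embedded `h5py.File` goes unnoticed.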