New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dataset writing so computations are shared between tasks #216
Conversation
Oops I forgot this needs my trollimage updates too. Will make a PR tomorrow. |
satpy/writers/simple_image.py
Outdated
@@ -23,6 +23,7 @@ | |||
|
|||
import logging | |||
|
|||
import dask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 'dask' imported but unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mraspaud Bug in stickler? This import is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it being used in this file ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha I was looking at the wrong file. This was from a previous commit that stickler didn't comment on I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just nagging...
@@ -325,17 +327,44 @@ def get_filename(self, **kwargs): | |||
"No filename pattern or specific filename provided") | |||
return self.filename_parser.compose(kwargs) | |||
|
|||
def save_datasets(self, datasets, **kwargs): | |||
def save_datasets(self, datasets, compute=True, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring should be updated to exlain compute
, and how that affects the return value.
for ds in datasets: | ||
self.save_dataset(ds, **kwargs) | ||
res = self.save_dataset(ds, compute=False, **kwargs) | ||
if isinstance(res, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see self.save_dataset
returning a tuple, even longer down the call chain, am I missing something ?
satpy/writers/__init__.py
Outdated
|
||
def save_dataset(self, dataset, filename=None, fill_value=None, **kwargs): | ||
def save_dataset(self, dataset, filename=None, fill_value=None, | ||
compute=True, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring should mention compute
satpy/writers/__init__.py
Outdated
@@ -360,15 +389,18 @@ def __init__(self, | |||
self.enhancer = Enhancer(ppp_config_dir=self.ppp_config_dir, | |||
enhancement_config_file=enhancement_config) | |||
|
|||
def save_dataset(self, dataset, filename=None, fill_value=None, overlay=None, decorate=None, **kwargs): | |||
def save_dataset(self, dataset, filename=None, fill_value=None, | |||
overlay=None, decorate=None, compute=True, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring should mention compute
and the return value
dst_ds = None | ||
def save_image(self, img, filename=None, floating_point=False, | ||
compute=True, **kwargs): | ||
"""Save the image to the given *filename* in geotiff_ format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring should mention compute
and the return value
satpy/scene.py
Outdated
@@ -766,7 +766,7 @@ def load_writer_config(self, config_files, **kwargs): | |||
return writer | |||
|
|||
def save_dataset(self, dataset_id, filename=None, writer=None, | |||
overlay=None, **kwargs): | |||
overlay=None, compute=True, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring should mention compute
and what the function returns.
XArray defaults to using netcdf4-python engine if possible. It falls back to scipy. PRs are pending for ways to specify the engine from global xarray options.
This PR adds a
compute
keyword argument to all save_dataset, save_datasets, save_image methods in the Scene and Writers. It defaults to True in all of these methods so individual use cases have not changed. It is also handled by the higher level methods so that the user doesn't see a difference except in performance.Handling writes in the way this PR implements will allow all dask objects to be computed at the same and is how it is implemented in the higher level methods. This means that if tasks share computations (input datasets) they will be shared and shouldn't have to be recalculated.
Additionally this will allow for more flexibility in the future. If a user calls
scn.save_datasets(..., compute=False)
they will get a daskDelayed
object back which they can then use with other delayed objects.If compute is True or a writer does not support returning a delayed object (and they don't complain about the compute keyword being passed) then functions will return whatever the result of the delayed function call was. Typically this is just None. This should also allow future flexibility if a writer should be returning something (maybe filenames of the files created?).
WIP: Docstrings and tests to make sure this functionality completely works.
git diff origin/develop **/*py | flake8 --diff