Add pass through of xr compute, persist and chunk to Scene #1017
Conversation
Nice start. I just thought of something: in order to match the xarray/dask interfaces, these should probably return a new Scene object. Thoughts?
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #1017      +/-   ##
==========================================
- Coverage   87.36%   87.33%   -0.04%
==========================================
  Files         183      183
  Lines       28161    28194      +33
==========================================
+ Hits        24603    24623      +20
- Misses       3558     3571      +13
==========================================
```
Hadn't thought about that; it's a good point. Yes, I think if xarray and dask return new objects, satpy should too. Maybe we can make that the default but add a parameter like "inplace"?
Looks like stickler isn't so happy with your indentation. I'm not sure I like the inplace kwarg; does it really provide you anything? Pandas, xarray, and dask are all not inplace anymore. Although I understand the
Actually, after adding the scn.copy() part I was not so convinced myself, since a Scene is returned in any case, so from my side we can remove the "inplace" parameter.
What is "it" in that question? I was thinking of doing a
I meant the apply function with "it". That would at least save some code. But sure, we can also copy the loop into every method. I will change that later or tomorrow morning.
Great job. Looks nice and clean. Do you think you could add some tests?
Otherwise, I wonder if you could link to the xarray method in the docstrings by writing `xarray.DataArray.chunk` (with single backticks around it) when referencing the methods. I think sphinx (the intersphinx extension) should pick up on this when rendering the docs. If it doesn't, then maybe you could add a second line to the docstring like:
See :meth:`xarray.DataArray.chunk` for more details.
Lastly, method docstrings should end in a period. Do you think you could add those? We should probably make stickler start checking docstrings.
Codecov Report

```diff
@@            Coverage Diff             @@
##             main    #1017      +/-   ##
==========================================
+ Coverage   93.39%   93.47%   +0.07%
==========================================
  Files         273      275       +2
  Lines       40612    40772     +160
==========================================
+ Hits        37929    38111     +182
+ Misses       2683     2661      -22
==========================================
```
I rendered the sphinx docs locally for your changes and noticed there needs to be a blank line between the "subject" and the rest of the docstring, otherwise they get rendered as one line in the HTML. I also tweaked the verb tense to make flake8-docstring happy. This then led to flake8-docstring complaining that we were using the name of the method inside the method's docstring. I told flake8 to ignore that for now. @mraspaud thoughts? I'll see if I can cleanly ignore the check just for those methods instead of globally. Edit: Got it!
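The blank-line issue can be seen in a minimal example (a generic module-level function, not the actual satpy docstring):

```python
def compute_all(**kwargs):
    """Call `compute` on all Scene data arrays.

    Without the blank line after the one-line summary above, Sphinx
    renders the summary and this body as a single line in the HTML.
    See :meth:`xarray.DataArray.compute` for details on the arguments.
    """
    return kwargs
```

The `:meth:` role is what lets intersphinx turn the reference into a link to the xarray docs.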
Force-pushed 2d895fd to ae4680f.
satpy/scene.py (outdated)

```python
for k in new_scn.datasets.keys():
    new_scn[k] = new_scn[k].chunk(**kwargs)
return new_scn
```
W293 blank line contains whitespace
```python
lines_sparse = np.array(list(range(1, nlines, 20)) + [nlines])
times_sparse = mjd_1970 + lines_sparse / 24 / 3600
acq_time_s = ['LINE:={}\rTIME:={:.6f}\r'.format(l, t)
              for l, t in zip(lines_sparse, times_sparse)]
```
E741 ambiguous variable name 'l'
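The fix for E741 is just a more descriptive loop variable; a sketch of the flagged snippet using plain lists (`nlines` and `mjd_1970` are placeholder values for illustration):

```python
nlines = 100
mjd_1970 = 40587.0  # placeholder epoch value for this sketch

lines_sparse = list(range(1, nlines, 20)) + [nlines]
times_sparse = [mjd_1970 + line / 24 / 3600 for line in lines_sparse]
# Renaming `l` to `line` (and `t` to `time`) silences E741, which flags
# `l` because it is easily confused with `1` and `I`.
acq_time_s = ['LINE:={}\rTIME:={:.6f}\r'.format(line, time)
              for line, time in zip(lines_sparse, times_sparse)]
```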
Force-pushed fe1a326 to b87eaef.
Nice work, very useful!
satpy/scene.py (outdated)

```python
"""
new_scn = self.copy()
for k in new_scn._datasets.keys():
    new_scn[k] = new_scn[k].compute(**kwargs)
```
It would be nice if these methods could compute (same for persist) all the DataArrays at the same time. As is, this will likely recompute shared dependencies of the DataArrays.
I don't think I exactly understand what you mean by "at the same time".
Normally for dask arrays you want to call `res1, res2, res3 = dask.array.compute(array1, array2, array3)` so that all dependency calculations for generating those three arrays are only computed once. I'm not sure how that can be done with xarray. You could try passing the DataArrays to `da.compute` and see if that works.
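The point about shared dependencies can be illustrated without dask (a toy sketch; `expensive` stands in for a shared chunk of the task graph, such as reading a file):

```python
calls = {"expensive": 0}


def expensive():
    # Stands in for a shared dask graph dependency.
    calls["expensive"] += 1
    return 10


def result1():
    return expensive() + 1


def result2():
    return expensive() * 2


# Computing the results separately re-runs the shared work:
r1, r2 = result1(), result2()
assert calls["expensive"] == 2

# A combined compute evaluates the shared work once and reuses it,
# which is what dask.compute(array1, array2) does for dask graphs.
shared = expensive()
r1, r2 = shared + 1, shared * 2
assert calls["expensive"] == 3
```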
Ah, I see. Yes, indeed that would be useful. I will check if that is possible with xarray or, as you suggested, with `da.compute`.
@djhoese I changed the code to use `dask.compute` / `dask.persist` now.
I'm not sure what codefactor's problem is. If you merged with
There were so many changes since the original PR that it was easier to redo the changes on the current main. That's why I force-pushed.
Two small suggestions, but otherwise looks good.
satpy/scene.py (outdated)

```diff
@@ -1143,6 +1143,45 @@ def save_datasets(self, writer=None, filename=None, datasets=None, compute=True,
                              **kwargs)
         return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
 
+    def compute(self, **kwargs):
+        """Call `compute` on all Scene datasets.
```
Should we say "data arrays" here instead of datasets to avoid the future confusion when Scene is more dependent on xarray Dataset objects?
Yes definitely. I think this is better since this otherwise adds to confusion.
Co-authored-by: David Hoese <david.hoese@ssec.wisc.edu>
Looks like the jobs got hung up; I've restarted them.
LGTM. @mraspaud or others, have any comments?
LGTM
Adds the xarray interfaces `compute`, `persist`, and `chunk` to Scene. If, for example, `scn.compute()` is called, the Scene iterates over all of its Datasets and calls `compute` on each one (each Dataset being an `xarray.DataArray`).

- Passes `flake8 satpy`
- Add your name to `AUTHORS.md` if not there already