Add pass through of xr compute, persist and chunk to Scene #1017

Merged
djhoese merged 6 commits into pytroll:main from BENR0:xarray_interfaces on May 3, 2022

BENR0 (Collaborator) commented Dec 11, 2019

Adds the xarray interfaces compute, persist and chunk to Scene.

If, for example, scn.compute() is called, it iterates over all datasets in the Scene and calls compute on each one (each dataset being an xarray.DataArray).

  • Closes Add compute method to Scene #1015
  • Tests added and test suite added to parent suite
  • Tests passed
  • Passes flake8 satpy
  • Fully documented
  • Add your name to AUTHORS.md if not there already

djhoese (Member) commented Dec 11, 2019

Nice start. I just thought of something: in order to match the xarray/dask interfaces, these should probably return a new Scene object. Thoughts?

coveralls commented Dec 11, 2019

Coverage Status

Coverage decreased (-0.03%) to 87.334% when pulling b4226b9 on BENR0:xarray_interfaces into fb61664 on pytroll:master.

codecov bot commented Dec 11, 2019

Codecov Report

Merging #1017 into master will decrease coverage by 0.03%.
The diff coverage is 20%.

@@            Coverage Diff             @@
##           master    #1017      +/-   ##
==========================================
- Coverage   87.36%   87.33%   -0.04%     
==========================================
  Files         183      183              
  Lines       28161    28194      +33     
==========================================
+ Hits        24603    24623      +20     
- Misses       3558     3571      +13
Impacted Files Coverage Δ
satpy/scene.py 88.69% <20%> (-1.67%) ⬇️
satpy/writers/cf_writer.py 90.67% <0%> (-0.42%) ⬇️
satpy/tests/writer_tests/test_cf.py 98.49% <0%> (+0.03%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fb61664...8281c57.

BENR0 (Collaborator, author) commented Dec 12, 2019

I hadn't thought about that; it's a good point. Yes, I think if xarray and dask return new objects, satpy should too. Maybe we can make that the default but add a parameter like "inplace"?

djhoese (Member) commented Dec 12, 2019

Looks like stickler isn't so happy with your indentation. I'm not sure I like the inplace kwarg; does it really provide you anything? Pandas, xarray, and dask all no longer operate in place. Although I understand the _apply decorator, I'm worried it adds a little too much indirection. I'm willing to be convinced, but I'm just worried at first glance.

BENR0 (Collaborator, author) commented Dec 12, 2019

Actually, after adding the scn.copy() part I wasn't so convinced myself, since a Scene is returned either way, so from my side we can remove the "inplace" parameter.
It's kind of funny: I wasn't sure about the decorator either, because it saves very little repetitive code. So I guess you are thinking of moving it into the scene and just call it in each return?

djhoese (Member) commented Dec 12, 2019

> So I guess you are thinking of moving it into the scene and just call it in each return?

What is "it" in that question? I was thinking that a new_scn = scn.copy() followed by the for loop with the compute/persist/chunk call in each method would be good enough.
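A minimal sketch of the copy-then-loop approach described above. To keep it runnable with the standard library alone, it uses a hypothetical dict-backed `ToyScene` and a hypothetical `LazyArray` stand-in instead of satpy's `Scene` and xarray's `DataArray`. Each method copies the container, replaces every array with the result of the matching array method, and returns the copy, so the original Scene is left untouched.

```python
class LazyArray:
    """Hypothetical stand-in for a dask-backed xarray.DataArray."""

    def __init__(self, func, chunks=None, computed=False):
        self._func = func          # deferred computation
        self.chunks = chunks       # toy chunking metadata only
        self.computed = computed

    def compute(self, **kwargs):
        # Evaluate now and return a new, already-computed array.
        value = self._func()
        return LazyArray(lambda: value, self.chunks, computed=True)

    def persist(self, **kwargs):
        # In this toy, persisting is the same as computing.
        return self.compute(**kwargs)

    def chunk(self, chunks=None, **kwargs):
        # Return a new lazy array with different chunk metadata.
        return LazyArray(self._func, chunks, self.computed)


class ToyScene:
    """Hypothetical dict-backed stand-in for satpy.Scene."""

    def __init__(self, arrays=None):
        self._datasets = dict(arrays or {})

    def copy(self):
        # New mapping; the array objects themselves are shared.
        return ToyScene(self._datasets)

    def __getitem__(self, key):
        return self._datasets[key]

    def __setitem__(self, key, value):
        self._datasets[key] = value

    def compute(self, **kwargs):
        new_scn = self.copy()
        for k in list(new_scn._datasets):
            new_scn[k] = new_scn[k].compute(**kwargs)
        return new_scn

    def persist(self, **kwargs):
        new_scn = self.copy()
        for k in list(new_scn._datasets):
            new_scn[k] = new_scn[k].persist(**kwargs)
        return new_scn

    def chunk(self, **kwargs):
        new_scn = self.copy()
        for k in list(new_scn._datasets):
            new_scn[k] = new_scn[k].chunk(**kwargs)
        return new_scn
```

The three methods differ only in the array method they call, which is exactly the repetition a shared helper would remove.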

BENR0 (Collaborator, author) commented Dec 12, 2019

By "it" I meant the apply function. That would at least save some code. But sure, we can also copy the loop into every method. I'll change that later today or tomorrow morning then.
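For comparison, the shared-helper alternative being discussed could look roughly like this. `_apply`, `TaggedArray`, and `HelperScene` are hypothetical names written for illustration only; the discussion above leaned toward an explicit loop in each method instead. The helper dispatches on the method name with `getattr`, which removes the repeated loop at the cost of one extra level of indirection.

```python
class TaggedArray:
    """Hypothetical array stand-in that records which methods produced it."""

    def __init__(self, data, history=()):
        self.data = data
        self.history = tuple(history)

    def _derive(self, name):
        # Every method returns a new object, like xarray/dask do.
        return TaggedArray(self.data, self.history + (name,))

    def compute(self, **kwargs):
        return self._derive("compute")

    def persist(self, **kwargs):
        return self._derive("persist")

    def chunk(self, chunks=None, **kwargs):
        return self._derive("chunk")


class HelperScene:
    """Hypothetical Scene-like container using one shared helper."""

    def __init__(self, arrays=None):
        self._datasets = dict(arrays or {})

    def copy(self):
        return HelperScene(self._datasets)

    def _apply(self, method_name, **kwargs):
        # Copy the container, then call the named method on every array.
        new_scn = self.copy()
        for key, arr in list(new_scn._datasets.items()):
            new_scn._datasets[key] = getattr(arr, method_name)(**kwargs)
        return new_scn

    def compute(self, **kwargs):
        return self._apply("compute", **kwargs)

    def persist(self, **kwargs):
        return self._apply("persist", **kwargs)

    def chunk(self, chunks=None, **kwargs):
        return self._apply("chunk", chunks=chunks, **kwargs)
```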

djhoese (Member) left a comment

Great job. Looks nice and clean. Do you think you could add some tests?

Otherwise, I wonder if you could link to the xarray methods in the docstrings by writing `xarray.DataArray.chunk` when referencing them (with single backticks around them). I think Sphinx (via the intersphinx extension) should pick up on this when rendering the docs. If it doesn't, maybe you could add a second line to the docstring like:

See :meth:`xarray.DataArray.chunk` for more details.

Lastly, method docstrings should end in a period. Do you think you could add those? We should probably make stickler start checking docstrings.

codecov bot commented Dec 13, 2019

Codecov Report

Merging #1017 (d3bdfdc) into main (b587681) will increase coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1017      +/-   ##
==========================================
+ Coverage   93.39%   93.47%   +0.07%     
==========================================
  Files         273      275       +2     
  Lines       40612    40772     +160     
==========================================
+ Hits        37929    38111     +182     
+ Misses       2683     2661      -22     
Flag Coverage Δ
behaviourtests 4.84% <8.10%> (+<0.01%) ⬆️
unittests 94.01% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
satpy/scene.py 93.02% <100.00%> (+0.20%) ⬆️
satpy/tests/test_scene.py 99.45% <100.00%> (+0.01%) ⬆️
satpy/resample.py 79.34% <0.00%> (-0.69%) ⬇️
satpy/readers/seviri_l1b_native.py 85.39% <0.00%> (-0.25%) ⬇️
satpy/modifiers/geometry.py 87.30% <0.00%> (-0.20%) ⬇️
satpy/readers/ahi_hsd.py 97.25% <0.00%> (-0.05%) ⬇️
satpy/readers/fci_l1c_nc.py 97.93% <0.00%> (-0.05%) ⬇️
satpy/readers/utils.py 91.79% <0.00%> (ø)
satpy/composites/viirs.py 86.40% <0.00%> (ø)
... and 11 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b587681...d3bdfdc.

djhoese (Member) commented Dec 13, 2019

I rendered the Sphinx docs locally for your changes and noticed there needs to be a blank line between the "subject" line and the rest of the docstring; otherwise they get rendered as one line in the HTML. I also tweaked the verb tense to make flake8-docstrings happy. This then led to flake8-docstrings complaining that we were using the name of the method inside the method's docstring. I told flake8 to ignore that for now. @mraspaud thoughts?

I'll see if I can cleanly ignore the check just for those methods instead of globally.

Edit: Got it!
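To illustrate the docstring conventions discussed above (an imperative summary line ending in a period, a blank line after it, and a `:meth:` cross-reference), a method docstring could be laid out like this. `ExampleScene` is a hypothetical class, and whether the cross-reference actually resolves depends on the project's intersphinx configuration.

```python
class ExampleScene:
    """Hypothetical class used only to show the docstring layout."""

    def chunk(self, **kwargs):
        """Call `chunk` on all Scene data arrays.

        See :meth:`xarray.DataArray.chunk` for more details.
        """
        # Body omitted; only the docstring layout matters here.
        raise NotImplementedError
```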

mraspaud added the enhancement label (code enhancements, features, improvements) Jan 21, 2020
satpy/tests/test_scene.py Outdated (7 resolved review threads)
satpy/scene.py Outdated
for k in new_scn.datasets.keys():
    new_scn[k] = new_scn[k].chunk(**kwargs)
return new_scn

Contributor

W293 blank line contains whitespace

lines_sparse = np.array(list(range(1, nlines, 20)) + [nlines])
times_sparse = mjd_1970 + lines_sparse / 24 / 3600
acq_time_s = ['LINE:={}\rTIME:={:.6f}\r'.format(l, t)
              for l, t in zip(lines_sparse, times_sparse)]

Contributor

E741 ambiguous variable name 'l'
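One way to address E741 is simply to rename the loop variables. The sketch below uses plain Python lists instead of the NumPy array in the snippet above, and the values of `nlines` and `mjd_1970` are hypothetical placeholders chosen only so the code runs.

```python
nlines = 100        # hypothetical number of scan lines
mjd_1970 = 40587    # hypothetical placeholder epoch offset (days)

# Every 20th line plus the last one, as in the original snippet.
lines_sparse = list(range(1, nlines, 20)) + [nlines]
# Same arithmetic as the original: line value / 24 / 3600 plus the offset.
times_sparse = [mjd_1970 + line / 24 / 3600 for line in lines_sparse]

# Descriptive names instead of `l` avoid flake8's E741 warning.
acq_time_s = ['LINE:={}\rTIME:={:.6f}\r'.format(line, acq_time)
              for line, acq_time in zip(lines_sparse, times_sparse)]
```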

ghost commented Dec 31, 2020

DeepCode's analysis on #fe1a32 found:

  • ⚠️ 1 warning, ℹ️ 33 minor issues. 👇
  • ✔️ 25 issues were fixed.

Top issues:

  • Defining only __eq__ but not __ne__ will result in a Python2 error if objects are compared with inequality.
  • Unused CRS imported from pyproj.
  • Statement seems to have no effect.

sfinkens (Member) left a comment

Nice work, very useful!

satpy/scene.py Outdated
"""
new_scn = self.copy()
for k in new_scn._datasets.keys():
    new_scn[k] = new_scn[k].compute(**kwargs)

Member

It would be nice if these methods could compute (and likewise persist) all the DataArrays at the same time. As is, this will likely recompute shared dependencies of the DataArrays.

Collaborator Author

I don't think I understand exactly what you mean by "at the same time".

Member

Normally for dask arrays you want to call res1, res2, res3 = dask.array.compute(array1, array2, array3) so that all dependency calculations for generating those three arrays are only computed once. I'm not sure how that can be done with xarray. You could try passing the DataArrays to da.compute and see if that works.
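A small, self-contained demonstration of the point above. It uses `dask.delayed` rather than dask arrays so the shared dependency is easy to count; `load` and `calls` are hypothetical names for illustration. Dask-backed xarray DataArrays implement dask's collection interface, so they can likewise be passed to `dask.compute` together.

```python
import dask

calls = []  # records how often the shared input is evaluated


@dask.delayed
def load():
    # Stand-in for an expensive shared dependency (e.g. reading a file).
    calls.append("load")
    return 10


base = load()
plus_one = base + 1   # two results that both depend on `base`
doubled = base * 2

# Computing each result separately re-runs the shared dependency every time.
separate = (plus_one.compute(scheduler="sync"),
            doubled.compute(scheduler="sync"))
n_separate = len(calls)

calls.clear()
# Computing them together merges the graphs, so `load` runs only once.
together = dask.compute(plus_one, doubled, scheduler="sync")
n_together = len(calls)
```

The synchronous scheduler is only used here to keep the example deterministic.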

BENR0 (Collaborator, author) commented Dec 2, 2021

Ah, I see. Yes, that would indeed be useful. I will check whether that is possible with xarray or, as you suggested, with da.compute.

@djhoese I changed the code to use dask.compute / dask.persist now.
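A rough sketch of what Scene-level `compute`/`persist` methods built on `dask.compute`/`dask.persist` could look like. `DelayedScene` is a hypothetical dict-backed container holding `dask.delayed` objects, not satpy's `Scene` with xarray DataArrays, and it is not the code that was merged. The idea is the same: pass every array to a single call, then store the results under the original keys of a copied container.

```python
import dask


class DelayedScene:
    """Hypothetical Scene-like container of dask collections."""

    def __init__(self, arrays=None):
        self._datasets = dict(arrays or {})

    def copy(self):
        return DelayedScene(self._datasets)

    def __getitem__(self, key):
        return self._datasets[key]

    def _replace_all(self, results):
        # Return a copy whose values are `results`, in insertion (key) order.
        new_scn = self.copy()
        for key, value in zip(list(new_scn._datasets), results):
            new_scn._datasets[key] = value
        return new_scn

    def compute(self, **kwargs):
        # One dask.compute call, so shared dependencies are evaluated once.
        results = dask.compute(*self._datasets.values(), **kwargs)
        return self._replace_all(results)

    def persist(self, **kwargs):
        # dask.persist also evaluates everything in one pass, but returns
        # dask collections whose graphs point at the in-memory results.
        results = dask.persist(*self._datasets.values(), **kwargs)
        return self._replace_all(results)
```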

djhoese (Member) commented Dec 2, 2021

I'm not sure what CodeFactor's problem is. If you merged with main, this shouldn't be an issue (i.e. the asserts in the test directory).

BENR0 (Collaborator, author) commented Dec 2, 2021

> I'm not sure what CodeFactor's problem is. If you merged with main, this shouldn't be an issue (i.e. the asserts in the test directory).

There were so many changes since the original PR that it was easier to redo the changes on the current main. That's why I force-pushed.

djhoese (Member) left a comment

Two small suggestions, but otherwise looks good.

satpy/scene.py Outdated
@@ -1143,6 +1143,45 @@ def save_datasets(self, writer=None, filename=None, datasets=None, compute=True,
**kwargs)
        return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)

    def compute(self, **kwargs):
        """Call `compute` on all Scene datasets.

Member

Should we say "data arrays" here instead of "datasets" to avoid future confusion when Scene depends more on xarray Dataset objects?

Collaborator Author

Yes, definitely. I think this is better, since otherwise it adds to the confusion.

satpy/scene.py Outdated (resolved review thread)

djhoese (Member) commented Dec 7, 2021

Looks like the jobs got hung up, I've restarted them.

djhoese (Member) left a comment

LGTM. @mraspaud or others, have any comments?

mraspaud (Member) left a comment

LGTM

djhoese added this to In progress in PCW Spring 2022 via automation May 3, 2022
djhoese merged commit e5a71d5 into pytroll:main May 3, 2022
PCW Spring 2022 automation moved this from In progress to Done May 3, 2022
BENR0 deleted the xarray_interfaces branch August 23, 2022 11:01
Labels
enhancement code enhancements, features, improvements
Projects
Development

Successfully merging this pull request may close these issues.

Add compute method to Scene
6 participants