
Make combine_arrays understand non-numpy arrays #1216

Merged
merged 9 commits on May 28, 2020

Conversation

gerritholl
Collaborator

@gerritholl gerritholl commented May 25, 2020

Make combine_arrays understand non-numpy arrays in attributes. I believe this should fix #1215.

- Added a regression test triggering the ValueError raised when trying to read an FCI composite (see pytroll#1215).
- In combine_metadata, replace an ndarray instance check by a check for the array interface (attribute __array__).
- Added another regression test to combine_metadata that better simulates the situation with the FCI reader. The ancillary_variables attribute is actually List[xarray.DataArray], so this needs to be handled as well.
- In combine_metadata, cover lists of arrays such as occur when trying to read an FCI composite.
- The list-of-arrays check was catching too much, such as [()]. Only catch lists of arrays when they are non-empty.
- PEP 8 fix; I never know where it wants those pesky list generators indented!
@gerritholl gerritholl marked this pull request as ready for review May 25, 2020 15:16
@coveralls

coveralls commented May 25, 2020

Coverage Status

Coverage increased (+0.01%) to 89.777% when pulling f307d98 on gerritholl:combine-metadata-array-interface into 836c657 on pytroll:master.

@codecov

codecov bot commented May 25, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@836c657).
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##             master    #1216   +/-   ##
=========================================
  Coverage          ?   89.77%           
=========================================
  Files             ?      202           
  Lines             ?    30031           
  Branches          ?        0           
=========================================
  Hits              ?    26961           
  Misses            ?     3070           
  Partials          ?        0           
Impacted Files Coverage Δ
satpy/dataset.py 93.75% <100.00%> (ø)
satpy/tests/test_dataset.py 100.00% <100.00%> (ø)
satpy/tests/reader_tests/test_viirs_compact.py 95.83% <0.00%> (ø)
satpy/tests/enhancement_tests/test_enhancements.py 100.00% <0.00%> (ø)
satpy/tests/test_writers.py 98.55% <0.00%> (ø)
satpy/tests/reader_tests/test_omps_edr.py 98.98% <0.00%> (ø)
satpy/readers/goes_imager_nc.py 65.72% <0.00%> (ø)
satpy/composites/crefl_utils.py 84.52% <0.00%> (ø)
satpy/readers/generic_image.py 93.33% <0.00%> (ø)
satpy/utils.py 70.90% <0.00%> (ø)
... and 194 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@gerritholl
Collaborator Author

gerritholl commented May 25, 2020

From the combine_metadata docstring:

If any keys are not equal or do not exist in all provided dictionaries
then they are not included in the returned dictionary.

  1. Is that correct? The implementation compares values as well as keys! Should this be "if any values are not equal"?
  2. If yes, how could this be implemented in a dask-friendly way? The value of the ancillary_variables attribute is 124 MB per channel for the highest-resolution FCI channels. How can we decide whether or not to include a key if this decision depends on the result of a postponed evaluation?

(I'd like to resolve these questions before I do more stylistic work, such as solving the issues codebeat is complaining about)

@gerritholl
Collaborator Author

One option would be, for arrays, to compare object identity instead of values, or perhaps to use numpy.shares_memory.
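The two candidates behave differently on views; a small numpy illustration (not satpy code):

```python
import numpy as np

a = np.arange(10)
b = a[2:5]    # a view: shares memory with a, but a different object
c = a.copy()  # equal values, separate memory

# Object identity is cheap and never touches the data,
# but only matches the very same Python object.
assert a is a and a is not b

# shares_memory also matches views/subsets of an array,
# which may be broader than intended here.
assert np.shares_memory(a, b)
assert not np.shares_memory(a, c)
```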

@djhoese
Member

djhoese commented May 26, 2020

is that correct? The implementation is comparing values as well as keys! Should this be "if any values are not equal"?

I think this was supposed to mean "if the values of any keys are not equal".

Regardless, would it just be safer/smarter to special-case ancillary_variables like we do "time" attributes? Or what about special-casing just DataArray objects specifically and using their .name property for comparison? Or shape, dtype, and name? Would shares_memory work for dask arrays? That function also returns True if one input is a subset of the other, right? That might not be what we want.

@gerritholl
Collaborator Author

I don't know what we want, because I don't have a good idea of the intention of this function, how it is used in practice, or what might break in case of a false positive or false negative equals matching. I'm fine with any solution that doesn't involve computing the arrays at this stage.

@djhoese
Member

djhoese commented May 26, 2020

I think the original use cases were things that are now included as ancillary_variables or as coordinates on a DataArray. For example, VIIRS has a moon illumination fraction that could be provided as a per-scan array. I think this is now requested/loaded as a separate dataset by the few composites that use it. Another example is pressure levels for the NUCAPS reader. I think these are still added to .attrs as an attribute. I'm not sure an identity check would work in this case, since the attribute might be recreated for every dataset loaded by the reader.

Off the top of my head, I think ancillary_variables is the only case where a reader developer should need to add delayed "things" (dask arrays) outside of the normal DataArray .coords interface. So a special case for ancillary_variables would be OK with me, but a blanket DataArray specific check would be fine too. In either case I think we can assume DataArrays and use the name and shape of the arrays as a basic equality check.

@gerritholl
Collaborator Author

@djhoese @mraspaud Do you have a preferred alternative?

@mraspaud
Member

I would use `is` as a comparator for DataArrays.

In combine_metadata, compare arrays with object identity rather than by
value, avoiding expensive computation.  Refactor the combine_metadata
function with three small helpers to reduce code complexity.  Expand
unit tests for combine_metadata.
PEP 8 fixes, forgot to remove #breakpoint() from test code.
@gerritholl
Collaborator Author

The test failures appear unrelated to this PR.

Member

@djhoese djhoese left a comment


I was hoping this new functionality would only apply to DataArrays and not numpy or dask arrays, but this is probably better overall.

Thanks!

Member

@mraspaud mraspaud left a comment


LGTM

Development

Successfully merging this pull request may close these issues.

FCI reader fails to load composites due to metadata issues
4 participants