Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ancillary variable confusion after resampling #2336

Merged
merged 4 commits into from Feb 2, 2023

Conversation

gerritholl
Copy link
Collaborator

@gerritholl gerritholl commented Dec 22, 2022

Fix ancillary variable confusion after resampling.

Previously, replace_anc() was only looking at _satpy_id_keys in the variable attributes. When this was not set, it would use None for constructing the DataIDs used for comparison. When we pass None as ID keys to DataID, the DataID becomes empty: DataID(), such that a comparison always evaluates to True. As a consequence, instead of replacing the "correct" of N ancillary variables, it would replace the first one N times and the other ones not at all.

In this PR, I am taking the default_id_keys_config if no _satpy_id_keys are set in the variable attributes.

I'm not sure if _satpy_id_keys is supposed to be always set (possibly to default_id_keys_config, or if it is expected that it is not always set.

Added a unit test reproducing the failure described in GH2329.
In replacing ancillary variables, use default ID keys config if no id
keys are defined as metadata in the datasets in question.
@gerritholl gerritholl marked this pull request as ready for review December 22, 2022 11:04
@codecov
Copy link

codecov bot commented Dec 22, 2022

Codecov Report

Merging #2336 (ad3ff76) into main (bffa2fb) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2336   +/-   ##
=======================================
  Coverage   94.67%   94.67%           
=======================================
  Files         328      328           
  Lines       48544    48554   +10     
=======================================
+ Hits        45960    45970   +10     
  Misses       2584     2584           
Flag Coverage Δ
behaviourtests 4.43% <8.33%> (-0.01%) ⬇️
unittests 95.30% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
satpy/dataset/anc_vars.py 90.00% <100.00%> (ø)
satpy/tests/test_scene.py 99.50% <100.00%> (+<0.01%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@coveralls
Copy link

coveralls commented Dec 22, 2022

Coverage Status

Coverage: 95.264% (+0.001%) from 95.263% when pulling ad3ff76 on gerritholl:bugfix-resample-ancillary into bffa2fb on pytroll:main.

@pnuu
Copy link
Member

pnuu commented Dec 22, 2022

The unstable test failures seem to be pointing here: zarr-developers/zarr-python#1304 (merged yesterday).

satpy/tests/test_scene.py Outdated Show resolved Hide resolved
Copy link
Contributor

@adybbroe adybbroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I am not mastering this part of Satpy, so only left a comment on improving readability a bit. And hope someone else can comment on the functionality.

@adybbroe
Copy link
Contributor

adybbroe commented Feb 1, 2023

And I suppose you have noticed the comments from CodeScene. The test_scene.py module is overly large, and would be nice if we could split it in smaller parts eventually.

Small rewrite in test_resample_multi_ancillary to make the code easier
to read.
@gerritholl
Copy link
Collaborator Author

And I suppose you have noticed the comments from CodeScene. The test_scene.py module is overly large, and would be nice if we could split it in smaller parts eventually.

I agree. More than 2000 LOC and 118 test methods/functions. Not sure what the best way would be to split this up, but it needs to be done. But maybe a dedicated PR on its own?

@djhoese
Copy link
Member

djhoese commented Feb 1, 2023

But maybe a dedicated PR on its own?

Yes and I think we could do something similar for the composites/__init__.py module too (with convenience imports in the init module to make sure YAML configs don't break).

@gerritholl gerritholl added this to the v0.40.0 milestone Feb 1, 2023
Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had one inline comment about your test. If that can't be done then I'll consider this review as an approval. I do think @mraspaud needs to review this before merge as he was the one who added the ancillary walking logic and the DataID.

Comment on lines +1799 to +1803
def test_resample_multi_ancillary(self):
"""Test that multiple ancillary variables are retained after resampling.

This test corresponds to GH#2329
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other tests for resampling with ancillary variables? Or maybe just the data walker? I'm wondering if an existing test could be updated with the checks you include at the bottom of this test that would serve the same purpose as this test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is test_resample_ancillary, which tests the case with a single ancillary variable. The bugs #2329 and #2330 occur only if there is more than one ancillary variable. I don't understand the data walker, but I don't think the tests involving it do anything with ancillary variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the data walker, but I don't think the tests involving it do anything with ancillary variables.

Well that's unfortunate then because that was the whole point of that function/generator iirc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be any test specifically for dataset_walker. The only mention in the test suite is in a mock:

$ find . -name '*.py' -exec grep dataset_walker {} +
./test_scene.py:                mock.patch('satpy.dataset.dataset_walker') as ds_walker:

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. FWIW I think dataarrays loaded with satpy will always have _satpy_id_keys set, but it's good to have a fallback

@mraspaud mraspaud merged commit ad5fb8d into pytroll:main Feb 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ancillary variables does not get resampled resampling replaces ancillary variable
6 participants