Adapt reader mviri_l1b_fiduceo_nc #2802

Open · wants to merge 10 commits into base: main
Conversation

bkremmli (Author) commented:

This PR fixes the mviri_l1b_fiduceo_nc reader when used with a newer xarray version (2024.3.0). With the original reader, a ValueError is raised because the times cannot be decoded. The file is now opened without decoding the times, which are decoded at a later stage: fill values in the time variable are recognized and replaced with NaT, and the time is decoded using the offset stored in its attributes.

Also, opening the dataset with chunks is deactivated, because the input files contain a variable whose dimensions share the same name, which xarray cannot process at the moment.

  • [x] Tests added
  • [ ] Fully documented --> this change should not be visible to users
  • [x] Add your name to AUTHORS.md if not there already
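
The time decoding described above can be sketched roughly as follows. This is a minimal, hypothetical example, not the reader's actual code: the fill value and offset are made up, and only the pattern (fill value to NaT, then offset to datetime64) is taken from the PR description.

```python
import numpy as np
import xarray as xr

def decode_time(time):
    """Replace fill values with NaT, then apply the offset from the attributes.

    Assumes raw float seconds with "_FillValue" and "add_offset" attributes,
    as discussed in this PR; the values below are made up for illustration.
    """
    fill = time.attrs["_FillValue"]
    offset = time.attrs["add_offset"]
    return xr.where(
        time == fill,
        np.datetime64("NaT"),
        (time + offset).astype("datetime64[s]").astype("datetime64[ns]"),
    )

raw = xr.DataArray(
    np.array([0.0, 1.0, -1.0]),
    attrs={"_FillValue": -1.0, "add_offset": 100.0},
)
decoded = decode_time(raw)
```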

@mraspaud (Member) left a comment:

Thanks for adding this feature! I'm no expert here, so probably @sfinkens needs to have a look, but I wrote some comments already :)

Review thread on satpy/readers/mviri_l1b_fiduceo_nc.py (outdated, resolved)
Comment on lines 567 to 570
# chunks={"x": CHUNK_SIZE,
# "y": CHUNK_SIZE,
# "x_ir_wv": CHUNK_SIZE,
# "y_ir_wv": CHUNK_SIZE},
Member:

disabling chunking is risky... why is this necessary?

Author:

Thanks for the first comments!
Concerning the chunks, it throws a ValueError at the moment: "ValueError: This function cannot handle duplicate dimensions, but dimensions {'srf_size'} appear more than once on this object's dims: ('srf_size', 'srf_size')"

It could be related to this post: pydata/xarray#8579

Any other suggestions than disabling it are very welcome :)

Member:

In that discussion they propose a workaround:

ds.variables["covariance..."].dims = ("srf_size_1", "srf_size_2")
ds.chunk(mychunks)

I tried that here: sfinkens@82eb8cd. Let me know what you think!

I also updated the tests to trigger that situation:

  1. Add a new variable with identical dimension names
  2. Write a fake file to disk
  3. Read it back in
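
The workaround above can be illustrated at the xarray Variable level. This is a sketch under the assumption that the duplicated dimension is srf_size (as in the quoted error message) and that the affected variable is a 3x3 covariance matrix; the real reader operates on the full dataset.

```python
import numpy as np
import xarray as xr

# A variable with duplicated dimension names, as found in the FIDUCEO files.
# dask/xarray cannot chunk such a variable ("This function cannot handle
# duplicate dimensions"); xarray allows constructing it, possibly with a warning.
cov = xr.Variable(("srf_size", "srf_size"), np.zeros((3, 3)))

# Workaround: rename the dimensions in place so they are unique again.
cov.dims = ("srf_size_1", "srf_size_2")
```

After the rename, chunking the containing dataset no longer trips over the duplicate dimension.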

Member:

There's still a bit of work to be done, because the tests now fail due to some extra attributes and coordinates.

Author:

Thank you for your feedback! As already discussed, I will include your proposed changes. Concerning the tests, I am currently working on them.

Author:

@sfinkens I submitted some changes and the tests are mostly OK now. The test test_reassign_coords(self) still fails; as I am not familiar with the Mock object library, it would be great if you could have a look at it. Thank you!

Member:

Ok, I'll check after lunch!

Author:

sfinkens and I did some pair programming: calling time.astype("datetime64[s]").astype("datetime64[ns]") does not work properly because a float input is needed. We propose to open the dataset with decode_cf=False and afterwards decode time and the other variables separately, to properly take care of time fill values and offsets.

sfinkens added a test for the Interpolator.

The test test_reassign_coords() is still failing because the new functionality in DatasetWrapper() is not yet considered (mocking should not matter here). Maybe we should call the assign_coords() method directly instead of DatasetWrapper().

Also, a test for DatasetWrapper should be included to test the separate decoding.

Review threads on satpy/tests/reader_tests/test_mviri_l1b_fiduceo_nc.py and satpy/readers/mviri_l1b_fiduceo_nc.py (outdated, resolved)
@mraspaud mraspaud requested a review from sfinkens May 17, 2024 10:09
bkremmli and others added 2 commits May 17, 2024 10:18
@sfinkens (Member) left a comment:

Nice work @bkremmli! Thanks for updating the tests so that the problem is triggered.

@sfinkens (Member):

Also I noticed that space pixels now have some finite values (instead of NaN) because of decode_cf=False. You can use decode_cf=True together with time.encoding["add_offset"] and time.encoding["_FillValue"], see sfinkens@7045a87.
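
One way to combine CF decoding with manual time handling, along the lines of the snippet later in this review, can be sketched as follows. Variable names and values here are made up for illustration; the actual reader code is in the linked commit. The raw time is set aside, xr.decode_cf handles everything else (so space pixels become NaN again), and time is then decoded from its own attributes.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "count": ("y", np.array([10, 20, 255], dtype="uint8"), {"_FillValue": 255}),
        "time": ("y", np.array([0.0, 1.0, -1.0]), {"_FillValue": -1.0, "add_offset": 100.0}),
    }
)

# Keep the raw (undecoded) time aside and decode everything else with CF rules.
raw_time = ds["time"]
ds = ds.drop_vars(["time"])
ds = xr.decode_cf(ds)  # masks the fill value in "count" with NaN

# Decode time manually: fill value -> NaT, offset -> datetime64.
fill = raw_time.attrs["_FillValue"]
offset = raw_time.attrs["add_offset"]
ds["time"] = xr.where(
    raw_time == fill,
    np.datetime64("NaT"),
    (raw_time + offset).astype("datetime64[s]").astype("datetime64[ns]"),
)
```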

codecov bot commented May 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.93%. Comparing base (f33c3e4) to head (5beedea).
Report is 23 commits behind head on main.

Current head 5beedea differs from pull request most recent head 59880ce

Please upload reports for the commit 59880ce to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2802      +/-   ##
==========================================
- Coverage   95.95%   95.93%   -0.02%     
==========================================
  Files         379      379              
  Lines       53888    53899      +11     
==========================================
  Hits        51708    51708              
- Misses       2180     2191      +11     
Flag Coverage Δ
behaviourtests 4.08% <0.00%> (-0.01%) ⬇️
unittests 96.03% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

@sfinkens (Member) left a comment:

Nice work, almost there!

I think in the tests you need to write the dataset to disk and read it back in. Otherwise the dask error with duplicate dimensions is not triggered.

Review threads on satpy/readers/mviri_l1b_fiduceo_nc.py (outdated, resolved)

Comment on lines +461 to +464
raw_time = nc["time_ir_wv"]
self.nc = self.nc.drop_vars(["time_ir_wv"])
self.nc = xr.decode_cf(self.nc)
self.nc["time_ir_wv"] = raw_time
Member:

Can you please extract that into a separate method, for example DatasetWrapper._decode_cf()?

Member:

BTW, I just noticed that in get_time there's a note:

Variable is sometimes named "time" and sometimes "time_ir_wv"

I guess this would have to be handled here, too.

Member:

Maybe just renaming to a common time variable name is sufficient.
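
That renaming could look roughly like this. This is a sketch only; the helper name normalize_time_name is hypothetical, and the actual handling in the reader may differ.

```python
import numpy as np
import xarray as xr

def normalize_time_name(ds):
    """Rename "time_ir_wv" to the common name "time" if present.

    Hypothetical helper: the variable is sometimes named "time" and
    sometimes "time_ir_wv", as noted in get_time.
    """
    if "time_ir_wv" in ds:
        ds = ds.rename({"time_ir_wv": "time"})
    return ds

ds = xr.Dataset({"time_ir_wv": ("y", np.arange(3.0))})
ds = normalize_time_name(ds)
```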

Comment on lines +474 to +480
(chunk_size, chunk_size) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
chunks = {
"x": chunk_size,
"y": chunk_size,
"x_ir_wv": chunk_size,
"y_ir_wv": chunk_size
}
Member:

I'd prefer separate x/y chunksizes

Suggested change
(chunk_size, chunk_size) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
chunks = {
"x": chunk_size,
"y": chunk_size,
"x_ir_wv": chunk_size,
"y_ir_wv": chunk_size
}
(chunk_size_y, chunk_size_x) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
chunks = {
"x": chunk_size_x,
"y": chunk_size_y,
"x_ir_wv": chunk_size_x,
"y_ir_wv": chunk_size_y
}

Comment on lines +544 to +545
time = xr.where(time == time.attrs["_FillValue"], np.datetime64("NaT"),
(time + time.attrs["add_offset"]).astype("datetime64[s]").astype("datetime64[ns]"))
Member:

Is it possible to do this once after decoding?

@@ -403,6 +416,7 @@ def test_get_dataset(self, file_handler, name, calibration, resolution,
xr.testing.assert_allclose(ds, expected)
assert ds.dtype == expected.dtype
assert ds.attrs == expected.attrs
assert True
Member:

Debugging leftover?

Author:

True - sorry for that, will be fixed shortly.

@@ -28,6 +28,7 @@
import xarray as xr
from pyproj import CRS
from pyresample.geometry import AreaDefinition
from pytest_lazyfixture import lazy_fixture
Member:

This should fix the tests

Suggested change
from pytest_lazyfixture import lazy_fixture
from pytest_lazy_fixtures import lf as lazy_fixture

3 participants