Adapt reader mviri_l1b_fiduceo_nc #2802

Open · wants to merge 10 commits into base: main
Conversation

bkremmli (Author) commented:

This PR fixes the mviri_l1b_fiduceo_nc reader when used with a newer xarray version (2024.3.0). With the original reader, a ValueError is raised because the times cannot be decoded. The file is now opened without decoding the times, which are decoded at a later stage: fill values in the time variable are recognized and replaced with NaT, and the time is decoded using the offset stored in its attributes.

Also, opening the dataset with chunks is deactivated, because the input files contain a variable whose dimensions share the same name, which xarray cannot process at the moment.

  • [x] Tests added
  • [ ] Fully documented --> this change should not be visible to users
  • [x] Add your name to AUTHORS.md if not there already
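
The time decoding described above can be sketched roughly as follows. This is a minimal, hypothetical example, not the reader's actual code: the fill value and offset are made up, and only the pattern (fill value to NaT, then offset to datetime64) is taken from the PR description.

```python
import numpy as np
import xarray as xr

def decode_time(time):
    """Replace fill values with NaT, then apply the offset from the attributes.

    Assumes raw float seconds with "_FillValue" and "add_offset" attributes,
    as discussed in this PR; the values below are made up for illustration.
    """
    fill = time.attrs["_FillValue"]
    offset = time.attrs["add_offset"]
    return xr.where(
        time == fill,
        np.datetime64("NaT"),
        (time + offset).astype("datetime64[s]").astype("datetime64[ns]"),
    )

raw = xr.DataArray(
    np.array([0.0, 1.0, -1.0]),
    attrs={"_FillValue": -1.0, "add_offset": 100.0},
)
decoded = decode_time(raw)
```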

@mraspaud (Member) left a comment:

Thanks for adding this feature! I'm no expert here, so probably @sfinkens needs to have a look, but I wrote some comments already :)

Review thread on satpy/readers/mviri_l1b_fiduceo_nc.py (outdated, resolved)
Comment on lines 567 to 570
# chunks={"x": CHUNK_SIZE,
# "y": CHUNK_SIZE,
# "x_ir_wv": CHUNK_SIZE,
# "y_ir_wv": CHUNK_SIZE},
Member:

disabling chunking is risky... why is this necessary?

Author:

Thanks for the first comments!
Concerning the chunks, it throws a ValueError at the moment: "ValueError: This function cannot handle duplicate dimensions, but dimensions {'srf_size'} appear more than once on this object's dims: ('srf_size', 'srf_size')"

It could be related to this post: pydata/xarray#8579

Any other suggestions than disabling it are very welcome :)

Member:

In that discussion they propose a workaround:

ds.variables["covariance..."].dims = ("srf_size_1", "srf_size_2")
ds.chunk(mychunks)

I tried that here: sfinkens@82eb8cd. Let me know what you think!

I also updated the tests to trigger that situation:

  1. Add a new variable with identical dimension names
  2. Write a fake file to disk
  3. Read it back in
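
The workaround above can be illustrated at the xarray Variable level. This is a sketch under the assumption that the duplicated dimension is srf_size (as in the quoted error message) and that the affected variable is a 3x3 covariance matrix; the real reader operates on the full dataset.

```python
import numpy as np
import xarray as xr

# A variable with duplicated dimension names, as found in the FIDUCEO files.
# dask/xarray cannot chunk such a variable ("This function cannot handle
# duplicate dimensions"); xarray allows constructing it, possibly with a warning.
cov = xr.Variable(("srf_size", "srf_size"), np.zeros((3, 3)))

# Workaround: rename the dimensions in place so they are unique again.
cov.dims = ("srf_size_1", "srf_size_2")
```

After the rename, chunking the containing dataset no longer trips over the duplicate dimension.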

Member:

There's still a bit of work to be done, because the tests now fail due to some extra attributes and coordinates.

Author:

Thank you for your feedback! As already discussed, I will include your proposed changes. Concerning the tests, I am currently working on them.

Author:

@sfinkens I submitted some changes and the tests are mostly OK now. The test test_reassign_coords(self) still fails; as I am not familiar with the Mock object library, it would be great if you could have a look at it. Thank you!

Member:

Ok, I'll check after lunch!

Author:

sfinkens and I did some pair programming: calling time.astype("datetime64[s]").astype("datetime64[ns]") does not work properly because a float input is needed. We propose to open the dataset with decode_cf=False and afterwards decode time and the other variables separately, to properly take care of time fill values and offsets.

sfinkens added a test for the Interpolator.

The test test_reassign_coords() is still failing because the new functionality in DatasetWrapper() is not yet considered (mocking should not matter here). Maybe we should call the assign_coords() method directly instead of DatasetWrapper().

Also, a test for DatasetWrapper should be included to test the separate decoding.

Review threads on satpy/tests/reader_tests/test_mviri_l1b_fiduceo_nc.py and satpy/readers/mviri_l1b_fiduceo_nc.py (outdated, resolved)
@mraspaud mraspaud requested a review from sfinkens May 17, 2024 10:09
bkremmli and others added 2 commits May 17, 2024 10:18
@sfinkens (Member) left a comment:

Nice work @bkremmli! Thanks for updating the tests so that the problem is triggered.

@sfinkens (Member):

Also I noticed that space pixels now have some finite values (instead of NaN) because of decode_cf=False. You can use decode_cf=True together with time.encoding["add_offset"] and time.encoding["_FillValue"], see sfinkens@7045a87.
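
One way to combine CF decoding with manual time handling, along the lines of the snippet later in this review, can be sketched as follows. Variable names and values here are made up for illustration; the actual reader code is in the linked commit. The raw time is set aside, xr.decode_cf handles everything else (so space pixels become NaN again), and time is then decoded from its own attributes.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "count": ("y", np.array([10, 20, 255], dtype="uint8"), {"_FillValue": 255}),
        "time": ("y", np.array([0.0, 1.0, -1.0]), {"_FillValue": -1.0, "add_offset": 100.0}),
    }
)

# Keep the raw (undecoded) time aside and decode everything else with CF rules.
raw_time = ds["time"]
ds = ds.drop_vars(["time"])
ds = xr.decode_cf(ds)  # masks the fill value in "count" with NaN

# Decode time manually: fill value -> NaT, offset -> datetime64.
fill = raw_time.attrs["_FillValue"]
offset = raw_time.attrs["add_offset"]
ds["time"] = xr.where(
    raw_time == fill,
    np.datetime64("NaT"),
    (raw_time + offset).astype("datetime64[s]").astype("datetime64[ns]"),
)
```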

codecov bot commented May 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.93%. Comparing base (f33c3e4) to head (5beedea).
Report is 23 commits behind head on main.

Current head 5beedea differs from pull request most recent head 59880ce

Please upload reports for the commit 59880ce to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2802      +/-   ##
==========================================
- Coverage   95.95%   95.93%   -0.02%     
==========================================
  Files         379      379              
  Lines       53888    53899      +11     
==========================================
  Hits        51708    51708              
- Misses       2180     2191      +11     
Flag Coverage Δ
behaviourtests 4.08% <0.00%> (-0.01%) ⬇️
unittests 96.03% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

@sfinkens (Member) left a comment:

Nice work, almost there!

I think in the tests you need to write the dataset to disk and read it back in. Otherwise the dask error with duplicate dimensions is not triggered.

Review threads on satpy/readers/mviri_l1b_fiduceo_nc.py (outdated, resolved)

Comment on lines +461 to +464
raw_time = nc["time_ir_wv"]
self.nc = self.nc.drop_vars(["time_ir_wv"])
self.nc = xr.decode_cf(self.nc)
self.nc["time_ir_wv"] = raw_time
Member:

Can you please extract that into a separate method, for example DatasetWrapper._decode_cf()?

Member:

BTW, I just noticed that in get_time there's a note:

Variable is sometimes named "time" and sometimes "time_ir_wv"

I guess this would have to be handled here, too.

Member:

Maybe just renaming to a common time variable name is sufficient.
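
That renaming could look roughly like this. This is a sketch only; the helper name normalize_time_name is hypothetical, and the actual handling in the reader may differ.

```python
import numpy as np
import xarray as xr

def normalize_time_name(ds):
    """Rename "time_ir_wv" to the common name "time" if present.

    Hypothetical helper: the variable is sometimes named "time" and
    sometimes "time_ir_wv", as noted in get_time.
    """
    if "time_ir_wv" in ds:
        ds = ds.rename({"time_ir_wv": "time"})
    return ds

ds = xr.Dataset({"time_ir_wv": ("y", np.arange(3.0))})
ds = normalize_time_name(ds)
```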

Comment on lines +474 to +480
(chunk_size, chunk_size) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
chunks = {
"x": chunk_size,
"y": chunk_size,
"x_ir_wv": chunk_size,
"y_ir_wv": chunk_size
}
Member:

I'd prefer separate x/y chunksizes

Suggested change
(chunk_size, chunk_size) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
chunks = {
"x": chunk_size,
"y": chunk_size,
"x_ir_wv": chunk_size,
"y_ir_wv": chunk_size
}
(chunk_size_y, chunk_size_x) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
chunks = {
"x": chunk_size_x,
"y": chunk_size_y,
"x_ir_wv": chunk_size_x,
"y_ir_wv": chunk_size_y
}

Comment on lines +544 to +545
time = xr.where(time == time.attrs["_FillValue"], np.datetime64("NaT"),
(time + time.attrs["add_offset"]).astype("datetime64[s]").astype("datetime64[ns]"))
Member:

Is it possible to do this once after decoding?

@@ -403,6 +416,7 @@ def test_get_dataset(self, file_handler, name, calibration, resolution,
xr.testing.assert_allclose(ds, expected)
assert ds.dtype == expected.dtype
assert ds.attrs == expected.attrs
assert True
Member:

Debugging leftover?

Author:

True - sorry for that, will be fixed shortly.

@@ -28,6 +28,7 @@
import xarray as xr
from pyproj import CRS
from pyresample.geometry import AreaDefinition
from pytest_lazyfixture import lazy_fixture
Member:

This should fix the tests

Suggested change
from pytest_lazyfixture import lazy_fixture
from pytest_lazy_fixtures import lf as lazy_fixture

3 participants