Adapt reader mviri_l1b_fiduceo_nc #2802
base: main
Conversation
Thanks for adding this feature! I'm no expert here, so probably @sfinkens needs to have a look, but I wrote some comments already :)
```python
# chunks={"x": CHUNK_SIZE,
#         "y": CHUNK_SIZE,
#         "x_ir_wv": CHUNK_SIZE,
#         "y_ir_wv": CHUNK_SIZE},
```
disabling chunking is risky... why is this necessary?
Thanks for the first comments!
Concerning the chunks, it throws a ValueError at the moment: "ValueError: This function cannot handle duplicate dimensions, but dimensions {'srf_size'} appear more than once on this object's dims: ('srf_size', 'srf_size')"
It could be related to this post: pydata/xarray#8579
Any other suggestions than disabling it are very welcome :)
In that discussion they propose a workaround:

```python
ds.variables["covariance..."].dims = ("srf_size_1", "srf_size_2")
ds.chunk(mychunks)
```
I tried that here: sfinkens@82eb8cd Let me know what you think!
I also updated the tests to trigger that situation:
- Add a new variable with identical dimension names
- Write a fake file to disk
- Read it back in
There's still a bit of work to be done, because the tests now fail due to some extra attributes and coordinates.
Thank you for your feedback! As already discussed, I will include your proposed changes. Concerning the tests, I am currently working on them.
@sfinkens I submitted some changes and the tests are mostly ok now. The test test_reassign_coords(self) still fails - as I am not familiar with the Mock Object Library, it would be great if you could have a look at it - thank you!
Ok, I'll check after lunch!
sfinkens and I did some pair programming: calling time.astype("datetime64[s]").astype("datetime64[ns]") does not work properly because a float input is needed. We propose to open the dataset with decode_cf=False and afterwards decode time and the other variables separately, to properly take care of time fill values and offsets.
sfinkens added a test for the Interpolator.
The test test_reassign_coords() is still failing because the new functionality in DatasetWrapper() is not yet considered (mocking should not be of importance here). Maybe we should call the assign_coords() method directly instead of DatasetWrapper().
Also, a test for DatasetWrapper should be included to test the separate encoding.
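The separate time decoding discussed above can be sketched as follows. This is a minimal, self-contained illustration, not the reader's actual code: the fill value, offset, and sample data are invented, only the fill-to-NaT and float-seconds-to-datetime64 pattern is taken from the conversation.

```python
import numpy as np
import xarray as xr

# Raw time as float seconds with a fill value, as it would look when the
# dataset is opened with decode_cf=False. Values/attrs are illustrative only.
time = xr.DataArray(
    np.array([100.0, 200.0, -32768.0]),
    dims="y",
    attrs={"_FillValue": -32768.0, "add_offset": 1.6e9},
)

# Replace fill values with NaT; for valid pixels apply the offset and
# convert float seconds -> datetime64[s] -> datetime64[ns].
decoded = xr.where(
    time == time.attrs["_FillValue"],
    np.datetime64("NaT"),
    (time + time.attrs["add_offset"]).astype("datetime64[s]").astype("datetime64[ns]"),
)
```

Decoding via an intermediate `datetime64[s]` cast avoids the float-input problem mentioned above, since the addition happens on plain floats before any datetime conversion.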
Co-authored-by: Martin Raspaud <martin.raspaud@smhi.se>
Nice work @bkremmli! Thanks for updating the tests, so that the problem is triggered.
```python
# chunks={"x": CHUNK_SIZE,
#         "y": CHUNK_SIZE,
#         "x_ir_wv": CHUNK_SIZE,
#         "y_ir_wv": CHUNK_SIZE},
```
There's still a bit of work to be done, because the tests now fail due to some extra attributes and coordinates. Also I noticed that space pixels now have some finite values (instead of NaN), because …
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main    #2802      +/-   ##
==========================================
- Coverage   95.95%   95.93%   -0.02%
==========================================
  Files         379      379
  Lines       53888    53899      +11
==========================================
  Hits        51708    51708
- Misses       2180     2191      +11
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
…ead_error Conflicts: satpy/readers/mviri_l1b_fiduceo_nc.py
Nice work, almost there!
I think in the tests you need to write the dataset to disk and read it back in. Otherwise the dask error with duplicate dimensions is not triggered.
```python
raw_time = nc["time_ir_wv"]
self.nc = self.nc.drop_vars(["time_ir_wv"])
self.nc = xr.decode_cf(self.nc)
self.nc["time_ir_wv"] = raw_time
```
Can you please extract that into a separate method, for example DatasetWrapper._decode_cf()?
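Such an extraction could look roughly like this. This is a sketch using a standalone function rather than the actual DatasetWrapper method; the helper name and the sample dataset are made up for illustration.

```python
import numpy as np
import xarray as xr

def _decode_cf_except_time(nc, time_var="time_ir_wv"):
    """Apply CF decoding while leaving the raw time variable untouched.

    Sketch only: in the reader this would be a method on DatasetWrapper,
    and the time variable would be decoded separately afterwards.
    """
    raw_time = nc[time_var]
    nc = nc.drop_vars([time_var])
    nc = xr.decode_cf(nc)
    nc[time_var] = raw_time
    return nc

# Fabricated example dataset standing in for the opened FIDUCEO file.
ds = xr.Dataset(
    {
        "time_ir_wv": ("y", np.array([1.0, 2.0])),
        "counts": ("y", np.array([10.0, 20.0])),
    }
)
decoded = _decode_cf_except_time(ds)
```

Dropping the variable before `xr.decode_cf` and re-attaching it afterwards keeps xarray from trying (and failing) to decode the raw time values.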
BTW, I just noticed that in get_time there's a note: "Variable is sometimes named 'time' and sometimes 'time_ir_wv'". I guess this would have to be handled here, too.
Maybe just renaming to a common time variable name is sufficient
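A minimal sketch of that renaming (the helper name and sample dataset are invented; the two variable names come from the note quoted above):

```python
import numpy as np
import xarray as xr

def normalize_time_name(ds):
    # The variable is sometimes named "time" and sometimes "time_ir_wv";
    # rename to one canonical name so downstream code handles a single case.
    if "time_ir_wv" in ds:
        ds = ds.rename({"time_ir_wv": "time"})
    return ds

ds = xr.Dataset({"time_ir_wv": ("y", np.array([0.0, 1.0]))})
ds = normalize_time_name(ds)
```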
```python
(chunk_size, chunk_size) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
chunks = {
    "x": chunk_size,
    "y": chunk_size,
    "x_ir_wv": chunk_size,
    "y_ir_wv": chunk_size
}
```
I'd prefer separate x/y chunksizes
Suggested change:

```diff
-(chunk_size, chunk_size) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
-chunks = {
-    "x": chunk_size,
-    "y": chunk_size,
-    "x_ir_wv": chunk_size,
-    "y_ir_wv": chunk_size
-}
+(chunk_size_y, chunk_size_x) = nc.variables["quality_pixel_bitmask"].encoding["chunksizes"]
+chunks = {
+    "x": chunk_size_x,
+    "y": chunk_size_y,
+    "x_ir_wv": chunk_size_x,
+    "y_ir_wv": chunk_size_y
+}
```
```python
time = xr.where(time == time.attrs["_FillValue"], np.datetime64("NaT"),
                (time + time.attrs["add_offset"]).astype("datetime64[s]").astype("datetime64[ns]"))
```
Is it possible to do this once after decoding?
```diff
@@ -403,6 +416,7 @@ def test_get_dataset(self, file_handler, name, calibration, resolution,
         xr.testing.assert_allclose(ds, expected)
         assert ds.dtype == expected.dtype
         assert ds.attrs == expected.attrs
+        assert True
```
Debugging leftover?
True - sorry for that, will be fixed shortly.
```diff
@@ -28,6 +28,7 @@
 import xarray as xr
 from pyproj import CRS
 from pyresample.geometry import AreaDefinition
+from pytest_lazyfixture import lazy_fixture
```
This should fix the tests
Suggested change:

```diff
-from pytest_lazyfixture import lazy_fixture
+from pytest_lazy_fixtures import lf as lazy_fixture
```
This PR fixes the mviri_l1b_fiduceo_nc reader when used with a new xarray version (2024.3.0). With the original reader, a ValueError about not being able to decode the times is thrown. The file is now opened without decoding the time, which is decoded at a later stage: fill values in the time variable are replaced with NaT, and time is decoded using the offset values included in the attributes.
Also, opening the dataset using chunks is deactivated, because the input file contains dimensions of the same name, which cannot be processed by xarray at the moment.
AUTHORS.md (if not there already)