-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep FCI data as 32-bit floats #2637
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2637 +/- ##
=======================================
Coverage 95.20% 95.20%
=======================================
Files 354 354
Lines 51370 51370
=======================================
Hits 48908 48908
Misses 2462 2462
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
data[meas_path + "/radiance_to_bt_conversion_coefficient_wavenumber"] = FakeH5Variable( | ||
da.array(955.0, dtype=np.float32)) | ||
data[meas_path + "/radiance_to_bt_conversion_coefficient_a"] = FakeH5Variable(da.array(1.0, dtype=np.float32)) | ||
data[meas_path + "/radiance_to_bt_conversion_coefficient_b"] = FakeH5Variable(da.array(0.4, dtype=np.float32)) | ||
data[meas_path + "/radiance_to_bt_conversion_constant_c1"] = FakeH5Variable( | ||
da.array(1e11 * 2 * h * c ** 2, dtype=np.float32)) | ||
data[meas_path + "/radiance_to_bt_conversion_constant_c2"] = FakeH5Variable( | ||
da.array(1e2 * h * c / k, dtype=np.float32)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason why you had to fix the type of the coefficients here? Now that the coefficients type casting is hardcoded inside the reader, I would assume that what we want to test now is rather that the final output is float32
, no matter the type of the inputs that come from the files? In other words, if one would remove the type casting inside the reader, the tests would still pass?
(Commenting here, but it applies to most changes below too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was initially trying to mimic the dtypes of the real data files, so converted these coefficients to float32
. Because the FakeH5Variable
doesn't mimic exactly the behaviour of the netCDF4
(or h5netcdf
) libraries, the coeffs are collected by a different branch of the code. I wrote the description in a haste earlier today, but examples there are exactly for this case.
The dtypes in the FakeH5Variable
can act as a reminder of the actual data types for the brave developer who starts to convert the FCI tests to use a stub file instead of these classes.
Pull Request Test Coverage Report for Build 6863852302
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
👍 This test takes 8.43 GB with main, 5.80 GB with this branch: from satpy import Scene
from glob import glob
fci_files = glob("/media/nas/x21308/mtg/commissioning-data/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*-FD--CHK-BODY---NC4E_C_EUMT_*_IDPFI_IV_*_*_N__C_0073_*.nc")
sc = Scene(filenames={"fci_l1c_nc": fci_files})
sc.load(["vis_06", "wv_73", "ir_105"])
sc.save_datasets() Do we have an equivalent test in our load testing? |
I don't think we benchmark fci yet. PRs are welcome :) |
We do benchmark FCI:) and we have a "peakmem_load_resample_save" and some more similar ones https://mraspaud.github.io/satpy/#fci_benchmarks.FCI.peakmem_load_resample_save |
perfect! |
Our FCI benchmark is with simulated data and only FDHSI. Once measured data are no longer restricted, we can update it to use measured data and include both HRFI and FDHSI. Test script that uses FDHSI+HRFI and that includes resampling. Satpy main: 9.7 GB This branch: 9.0 GB A modest improvement, but probably resampling still turns data into double. Code: import hdf5plugin
from satpy import Scene
from glob import glob
fci_files = glob("/media/nas/x23352/MTG/FCI/2023/11/15/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*-FD--CHK-BODY--DIS-NC4E_C_EUMT_20231115*_IDPFI_OPE_20231115*_20231115*_N_JLS_C_0054_*.nc")
sc = Scene(filenames={"fci_l1c_nc": fci_files})
sc.load(["vis_06", "wv_73", "ir_105"])
ls = sc.resample("eurol")
ls.save_datasets() |
I think it's not resampling, but stretching in |
The radiance, reflectance and brightness temperature data arrays returned by the
fci_l1c_nc
reader are 64-bit floats, which is way more than necessary. This PR changes them to 32-bit floats.The changes required again showed that we should be writing real NetCDF4 files instead of using the mock classes. Some of the changes would have been simpler that way. As an example, values of
data[meas_path + '/radiance_to_bt_conversion_coefficient_wavenumber']
from the files is read withval = v[:]
but for the mock test class withval = v.__array__().item()
.