Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep FCI data as 32-bit floats #2637

Merged
merged 1 commit into from
Nov 16, 2023
Merged

Conversation

pnuu
Copy link
Member

@pnuu pnuu commented Nov 14, 2023

The radiance, reflectance and brightness temperature data arrays returned by the fci_l1c_nc reader are 64-bit floats, which is way more than necessary. This PR changes them to 32-bit floats.

The changes required again showed that we should be writing real NetCDF4 files instead of using the mock classes. Some of the changes would have been simpler that way. As an example, values of data[meas_path + '/radiance_to_bt_conversion_coefficient_wavenumber'] from the files is read with val = v[:] but for the mock test class with val = v.__array__().item().

  • Tests adjusted

@pnuu pnuu added the enhancement code enhancements, features, improvements label Nov 14, 2023
@pnuu pnuu self-assigned this Nov 14, 2023
Copy link

codecov bot commented Nov 14, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6fb1d1e) 95.20% compared to head (1930845) 95.20%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2637   +/-   ##
=======================================
  Coverage   95.20%   95.20%           
=======================================
  Files         354      354           
  Lines       51370    51370           
=======================================
  Hits        48908    48908           
  Misses       2462     2462           
Flag Coverage Δ
behaviourtests 4.24% <0.00%> (ø)
unittests 95.83% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +106 to +113
data[meas_path + "/radiance_to_bt_conversion_coefficient_wavenumber"] = FakeH5Variable(
da.array(955.0, dtype=np.float32))
data[meas_path + "/radiance_to_bt_conversion_coefficient_a"] = FakeH5Variable(da.array(1.0, dtype=np.float32))
data[meas_path + "/radiance_to_bt_conversion_coefficient_b"] = FakeH5Variable(da.array(0.4, dtype=np.float32))
data[meas_path + "/radiance_to_bt_conversion_constant_c1"] = FakeH5Variable(
da.array(1e11 * 2 * h * c ** 2, dtype=np.float32))
data[meas_path + "/radiance_to_bt_conversion_constant_c2"] = FakeH5Variable(
da.array(1e2 * h * c / k, dtype=np.float32))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason why you had to fix the type of the coefficients here? Now that the coefficients type casting is hardcoded inside the reader, I would assume that what we want to test now is rather that the final output is float32, no matter the type of the inputs that come from the files? In other words, if one would remove the type casting inside the reader, the tests would still pass?
(Commenting here, but it applies to most changes below too)

Copy link
Member Author

@pnuu pnuu Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially trying to mimic the dtypes of the real data files, so converted these coefficients to float32. Because the FakeH5Variable doesn't mimic exactly the behaviour of the netCDF4 (or h5netcdf) libraries, the coeffs are collected by a different branch of the code. I wrote the description in a haste earlier today, but examples there are exactly for this case.

The dtypes in the FakeH5Variable can act as a reminder of the actual data types for the brave developer who starts to convert the FCI tests to use a stub file instead of these classes.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 6863852302

  • 26 of 26 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 95.783%

Totals Coverage Status
Change from base Build 6847297703: 0.0%
Covered Lines: 49034
Relevant Lines: 51193

💛 - Coveralls

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@gerritholl
Copy link
Collaborator

👍

This test takes 8.43 GB with main, 5.80 GB with this branch:

from satpy import Scene
from glob import glob
fci_files = glob("/media/nas/x21308/mtg/commissioning-data/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*-FD--CHK-BODY---NC4E_C_EUMT_*_IDPFI_IV_*_*_N__C_0073_*.nc")
sc = Scene(filenames={"fci_l1c_nc": fci_files})
sc.load(["vis_06", "wv_73", "ir_105"])
sc.save_datasets()

Do we have an equivalent test in our load testing?

@mraspaud
Copy link
Member

Do we have an equivalent test in our load testing?

I don't think we benchmark fci yet. PRs are welcome :)

@ameraner
Copy link
Member

We do benchmark FCI:) and we have a "peakmem_load_resample_save" and some more similar ones https://mraspaud.github.io/satpy/#fci_benchmarks.FCI.peakmem_load_resample_save

@mraspaud
Copy link
Member

We do benchmark FCI:) and we have a "peakmem_load_resample_save" and some more similar ones mraspaud.github.io/satpy/#fci_benchmarks.FCI.peakmem_load_resample_save

perfect!

@gerritholl
Copy link
Collaborator

gerritholl commented Nov 15, 2023

Our FCI benchmark is with simulated data and only FDHSI. Once measured data are no longer restricted, we can update it to use measured data and include both HRFI and FDHSI.

Test script that uses FDHSI+HRFI and that includes resampling.

Satpy main: 9.7 GB

This branch: 9.0 GB

A modest improvement, but probably resampling still turns data into double.

Code:

import hdf5plugin
from satpy import Scene
from glob import glob
fci_files = glob("/media/nas/x23352/MTG/FCI/2023/11/15/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*-FD--CHK-BODY--DIS-NC4E_C_EUMT_20231115*_IDPFI_OPE_20231115*_20231115*_N_JLS_C_0054_*.nc")
sc = Scene(filenames={"fci_l1c_nc": fci_files})
sc.load(["vis_06", "wv_73", "ir_105"])
ls = sc.resample("eurol")
ls.save_datasets()

@pnuu
Copy link
Member Author

pnuu commented Nov 15, 2023

I think it's not resampling, but stretching in trollimage. The scale_factor and offset applied here are float64 because min/max stretch are given as plain floats.

@mraspaud mraspaud merged commit a0b8a7b into pytroll:main Nov 16, 2023
19 checks passed
@pnuu pnuu deleted the fci-32-bit-datasets branch November 16, 2023 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement code enhancements, features, improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants