Keep FCI data as 32-bit floats #2637

pnuu · 2023-11-14T12:51:15Z

The radiance, reflectance and brightness temperature data arrays returned by the fci_l1c_nc reader are 64-bit floats, which is way more than necessary. This PR changes them to 32-bit floats.

The changes required again showed that we should be writing real NetCDF4 files instead of using the mock classes. Some of the changes would have been simpler that way. As an example, values of data[meas_path + '/radiance_to_bt_conversion_coefficient_wavenumber'] from the files is read with val = v[:] but for the mock test class with val = v.__array__().item().

Tests adjusted

…bit floats

codecov · 2023-11-14T13:03:58Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6fb1d1e) 95.20% compared to head (1930845) 95.20%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2637   +/-   ##
=======================================
  Coverage   95.20%   95.20%           
=======================================
  Files         354      354           
  Lines       51370    51370           
=======================================
  Hits        48908    48908           
  Misses       2462     2462

Flag	Coverage Δ
behaviourtests	`4.24% <0.00%> (ø)`
unittests	`95.83% <100.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ameraner · 2023-11-14T13:33:13Z

satpy/tests/reader_tests/test_fci_l1c_nc.py

+    data[meas_path + "/radiance_to_bt_conversion_coefficient_wavenumber"] = FakeH5Variable(
+        da.array(955.0, dtype=np.float32))
+    data[meas_path + "/radiance_to_bt_conversion_coefficient_a"] = FakeH5Variable(da.array(1.0, dtype=np.float32))
+    data[meas_path + "/radiance_to_bt_conversion_coefficient_b"] = FakeH5Variable(da.array(0.4, dtype=np.float32))
+    data[meas_path + "/radiance_to_bt_conversion_constant_c1"] = FakeH5Variable(
+        da.array(1e11 * 2 * h * c ** 2, dtype=np.float32))
+    data[meas_path + "/radiance_to_bt_conversion_constant_c2"] = FakeH5Variable(
+        da.array(1e2 * h * c / k, dtype=np.float32))


Is there a specific reason why you had to fix the type of the coefficients here? Now that the coefficients type casting is hardcoded inside the reader, I would assume that what we want to test now is rather that the final output is float32, no matter the type of the inputs that come from the files? In other words, if one would remove the type casting inside the reader, the tests would still pass?
(Commenting here, but it applies to most changes below too)

I was initially trying to mimic the dtypes of the real data files, so converted these coefficients to float32. Because the FakeH5Variable doesn't mimic exactly the behaviour of the netCDF4 (or h5netcdf) libraries, the coeffs are collected by a different branch of the code. I wrote the description in a haste earlier today, but examples there are exactly for this case.

The dtypes in the FakeH5Variable can act as a reminder of the actual data types for the brave developer who starts to convert the FCI tests to use a stub file instead of these classes.

coveralls · 2023-11-15T08:21:49Z

Pull Request Test Coverage Report for Build 6863852302

26 of 26 (100.0%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 95.783%

Totals
Change from base Build 6847297703:	0.0%
Covered Lines:	49034
Relevant Lines:	51193

💛 - Coveralls

mraspaud

LGTM!

gerritholl · 2023-11-15T10:54:02Z

👍

This test takes 8.43 GB with main, 5.80 GB with this branch:

from satpy import Scene
from glob import glob
fci_files = glob("/media/nas/x21308/mtg/commissioning-data/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*-FD--CHK-BODY---NC4E_C_EUMT_*_IDPFI_IV_*_*_N__C_0073_*.nc")
sc = Scene(filenames={"fci_l1c_nc": fci_files})
sc.load(["vis_06", "wv_73", "ir_105"])
sc.save_datasets()

Do we have an equivalent test in our load testing?

mraspaud · 2023-11-15T10:56:06Z

Do we have an equivalent test in our load testing?

I don't think we benchmark fci yet. PRs are welcome :)

ameraner · 2023-11-15T11:07:14Z

We do benchmark FCI:) and we have a "peakmem_load_resample_save" and some more similar ones https://mraspaud.github.io/satpy/#fci_benchmarks.FCI.peakmem_load_resample_save

mraspaud · 2023-11-15T12:01:58Z

We do benchmark FCI:) and we have a "peakmem_load_resample_save" and some more similar ones mraspaud.github.io/satpy/#fci_benchmarks.FCI.peakmem_load_resample_save

perfect!

gerritholl · 2023-11-15T12:24:21Z

Our FCI benchmark is with simulated data and only FDHSI. Once measured data are no longer restricted, we can update it to use measured data and include both HRFI and FDHSI.

Test script that uses FDHSI+HRFI and that includes resampling.

Satpy main: 9.7 GB

This branch: 9.0 GB

A modest improvement, but probably resampling still turns data into double.

Code:

import hdf5plugin
from satpy import Scene
from glob import glob
fci_files = glob("/media/nas/x23352/MTG/FCI/2023/11/15/W_XX-EUMETSAT-Darmstadt,IMG+SAT,MTI1+FCI-1C-RRAD-*-FD--CHK-BODY--DIS-NC4E_C_EUMT_20231115*_IDPFI_OPE_20231115*_20231115*_N_JLS_C_0054_*.nc")
sc = Scene(filenames={"fci_l1c_nc": fci_files})
sc.load(["vis_06", "wv_73", "ir_105"])
ls = sc.resample("eurol")
ls.save_datasets()

pnuu · 2023-11-15T12:36:41Z

I think it's not resampling, but stretching in trollimage. The scale_factor and offset applied here are float64 because min/max stretch are given as plain floats.

Keep all FCI radiance, reflectance and brightness temperatures as 32-…

1930845

…bit floats

pnuu added the enhancement code enhancements, features, improvements label Nov 14, 2023

pnuu self-assigned this Nov 14, 2023

pnuu requested review from gerritholl, djhoese and mraspaud as code owners November 14, 2023 12:51

ameraner reviewed Nov 14, 2023

View reviewed changes

mraspaud approved these changes Nov 15, 2023

View reviewed changes

mraspaud merged commit a0b8a7b into pytroll:main Nov 16, 2023
19 checks passed

pnuu deleted the fci-32-bit-datasets branch November 16, 2023 14:12

pnuu mentioned this pull request Nov 23, 2023

Preservation of input data dtype in processing FCI data #2647

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keep FCI data as 32-bit floats #2637

Keep FCI data as 32-bit floats #2637

pnuu commented Nov 14, 2023

codecov bot commented Nov 14, 2023 •

edited

Loading

ameraner Nov 14, 2023

pnuu Nov 14, 2023 •

edited

Loading

coveralls commented Nov 15, 2023

mraspaud left a comment

gerritholl commented Nov 15, 2023

mraspaud commented Nov 15, 2023

ameraner commented Nov 15, 2023

mraspaud commented Nov 15, 2023

gerritholl commented Nov 15, 2023 •

edited

Loading

pnuu commented Nov 15, 2023

Keep FCI data as 32-bit floats #2637

Keep FCI data as 32-bit floats #2637

Conversation

pnuu commented Nov 14, 2023

codecov bot commented Nov 14, 2023 • edited Loading

Codecov Report

ameraner Nov 14, 2023

Choose a reason for hiding this comment

pnuu Nov 14, 2023 • edited Loading

Choose a reason for hiding this comment

coveralls commented Nov 15, 2023

Pull Request Test Coverage Report for Build 6863852302

💛 - Coveralls

mraspaud left a comment

Choose a reason for hiding this comment

gerritholl commented Nov 15, 2023

mraspaud commented Nov 15, 2023

ameraner commented Nov 15, 2023

mraspaud commented Nov 15, 2023

gerritholl commented Nov 15, 2023 • edited Loading

pnuu commented Nov 15, 2023

codecov bot commented Nov 14, 2023 •

edited

Loading

pnuu Nov 14, 2023 •

edited

Loading

gerritholl commented Nov 15, 2023 •

edited

Loading