For group_files with multiple readers, allow user to configure behaviour if some groups have zero files for some readers #1742

Closed
gerritholl opened this issue Jun 29, 2021 · 3 comments · Fixed by #1743
Labels
component:multiscene component:scene enhancement code enhancements, features, improvements

Comments

@gerritholl
Collaborator

Feature Request

Is your feature request related to a problem? Please describe.

I'm creating a MultiScene from ABI and GLM. When the time coverage of the two is not equal, or one or more files are missing from one but not the other, the result of group_files may include one or more groups in which only one of the readers has any files assigned. When I call group_files not directly but via MultiScene.from_files, this can lead to hard-to-debug errors that surface only when loading a dataset and then accessing the scenes.

For example:

import satpy
from satpy.utils import debug_on; debug_on()
filenames = [
    "OR_ABI-L1b-RadF-M6C14_G16_s19000010000000_e19000010005000_c20403662359590.nc",
    "OR_ABI-L1b-RadF-M6C14_G16_s19000010010000_e19000010015000_c20403662359590.nc",
    "OR_ABI-L1b-RadF-M6C14_G16_s19000010020000_e19000010025000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010000000_e19000010001000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010001000_e19000010002000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010002000_e19000010003000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010003000_e19000010004000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010004000_e19000010005000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010005000_e19000010006000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010006000_e19000010007000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010007000_e19000010008000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010008000_e19000010009000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010009000_e19000010010000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010010000_e19000010011000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010011000_e19000010012000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010012000_e19000010013000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010013000_e19000010014000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010014000_e19000010015000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010015000_e19000010016000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010016000_e19000010017000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010017000_e19000010018000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010018000_e19000010019000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010019000_e19000010020000_c20403662359590.nc",
    "OR_GLM-L2-GLMF-M3_G16_s19000010020000_e19000010021000_c20403662359590.nc"]
ms = satpy.MultiScene.from_files(
    filenames,
    reader=["abi_l1b", "glm_l2"],
    group_keys=["start_time"],
    time_threshold=35)
ms.load(["C14_yellow_lightning"])
ms.scenes

currently results in

[DEBUG: 2021-06-29 12:27:11 : satpy.readers.yaml_reader] Reading ('/home/gholl/checkouts/satpy/satpy/etc/readers/abi_l1b.yaml', '/media/nas/o16091/00_MITARBEITER/HOLL/Arbeit/checkouts-perforce/dev_Accso_EBP/config/readers/abi_l1b.yaml')
[DEBUG: 2021-06-29 12:27:11 : satpy.readers.yaml_reader] Reading ('/home/gholl/checkouts/satpy/satpy/etc/readers/glm_l2.yaml',)
[DEBUG: 2021-06-29 12:27:11 : satpy.multiscene] Forcing iteration of generator-like object of Scenes
[DEBUG: 2021-06-29 12:27:11 : satpy.readers.yaml_reader] Reading ('/home/gholl/checkouts/satpy/satpy/etc/readers/abi_l1b.yaml', '/media/nas/o16091/00_MITARBEITER/HOLL/Arbeit/checkouts-perforce/dev_Accso_EBP/config/readers/abi_l1b.yaml')
[DEBUG: 2021-06-29 12:27:11 : satpy.readers.yaml_reader] Assigning to abi_l1b: ['OR_ABI-L1b-RadF-M6C14_G16_s19000010000000_e19000010005000_c20403662359590.nc']
[DEBUG: 2021-06-29 12:27:12 : satpy.readers.yaml_reader] Reading ('/home/gholl/checkouts/satpy/satpy/etc/readers/glm_l2.yaml',)
[DEBUG: 2021-06-29 12:27:12 : satpy.readers.yaml_reader] Assigning to glm_l2: ['OR_GLM-L2-GLMF-M3_G16_s19000010000000_e19000010001000_c20403662359590.nc']
[DEBUG: 2021-06-29 12:27:12 : satpy.composites.config_loader] Looking for composites config file abi.yaml
[DEBUG: 2021-06-29 12:27:12 : satpy.composites.config_loader] Looking for composites config file visir.yaml
[DEBUG: 2021-06-29 12:27:12 : satpy.composites.config_loader] Looking for composites config file glm.yaml
[DEBUG: 2021-06-29 12:27:12 : satpy.readers.glm_l2] Reading in get_dataset flash_extent_density.
[DEBUG: 2021-06-29 12:27:12 : satpy.readers.abi_l1b] Reading in get_dataset C14.
[DEBUG: 2021-06-29 12:27:12 : satpy.readers.abi_l1b] Calibrating to brightness temperatures
/data/gholl/miniconda3/envs/py39/lib/python3.9/site-packages/pyproj/crs/crs.py:1216: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
  return self._crs.to_proj4(version=version)
[DEBUG: 2021-06-29 12:27:12 : satpy.writers] Enhancement configuration options: [{'name': 'btemp_threshold', 'method': <function btemp_threshold at 0x7fab658ad940>, 'kwargs': {'threshold': 242.0, 'min_in': 163.0, 'max_in': 330.0}}]
[DEBUG: 2021-06-29 12:27:12 : satpy.scene] Unloading dataset: DataID(name='flash_extent_density', resolution=2000, modifiers=())
[DEBUG: 2021-06-29 12:27:12 : satpy.scene] Unloading dataset: DataID(name='C14', wavelength=WavelengthRange(min=10.8, central=11.2, max=11.6, unit='µm'), resolution=2000, calibration=<calibration.brightness_temperature>, modifiers=())
[DEBUG: 2021-06-29 12:27:12 : satpy.scene] Unloading dataset: DataID(name='highlight_C14', resolution=2000)
[DEBUG: 2021-06-29 12:27:12 : satpy.readers.yaml_reader] Reading ('/home/gholl/checkouts/satpy/satpy/etc/readers/abi_l1b.yaml', '/media/nas/o16091/00_MITARBEITER/HOLL/Arbeit/checkouts-perforce/dev_Accso_EBP/config/readers/abi_l1b.yaml')
[DEBUG: 2021-06-29 12:27:12 : satpy.readers.yaml_reader] Reading ('/home/gholl/checkouts/satpy/satpy/etc/readers/glm_l2.yaml',)
[DEBUG: 2021-06-29 12:27:12 : satpy.readers.yaml_reader] Assigning to glm_l2: ['OR_GLM-L2-GLMF-M3_G16_s19000010001000_e19000010002000_c20403662359590.nc']
[DEBUG: 2021-06-29 12:27:12 : satpy.composites.config_loader] Looking for composites config file glm.yaml
[DEBUG: 2021-06-29 12:27:12 : satpy.composites.config_loader] Looking for composites config file visir.yaml
Traceback (most recent call last):
  File "/home/gholl/checkouts/satpy/satpy/scene.py", line 1162, in _update_dependency_tree
    self._dependency_tree.populate_with_keys(needed_datasets, query)
  File "/home/gholl/checkouts/satpy/satpy/dependency_tree.py", line 241, in populate_with_keys
    raise MissingDependencies(unknown_datasets, "Unknown datasets:")
satpy.node.MissingDependencies: Unknown datasets: {DataQuery(name='highlight_C14'): {DataQuery(name='highlight_C14')}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gholl/checkouts/protocode/mwe/group-files-desire.py", line 34, in <module>
    ms.scenes
  File "/home/gholl/checkouts/satpy/satpy/multiscene.py", line 221, in scenes
    self._scenes = list(self._scenes)
  File "/home/gholl/checkouts/satpy/satpy/multiscene.py", line 123, in __iter__
    scn = next(self._self_iter)
  File "/home/gholl/checkouts/satpy/satpy/multiscene.py", line 113, in _create_cached_iter
    for scn in self._scene_gen:
  File "/home/gholl/checkouts/satpy/satpy/multiscene.py", line 263, in _call_scene_func
    new_scn = getattr(scn, func_name)(*args, **kwargs)
  File "/home/gholl/checkouts/satpy/satpy/scene.py", line 1153, in load
    self._update_dependency_tree(needed_datasets, query)
  File "/home/gholl/checkouts/satpy/satpy/scene.py", line 1164, in _update_dependency_tree
    raise KeyError(str(err))
KeyError: "Unknown datasets: {DataQuery(name='highlight_C14'): {DataQuery(name='highlight_C14')}}"

It took me a while to understand why this was happening.

Describe the solution you'd like

I would like group_files and MultiScene.from_files to have configurable behaviour for the case where one or more groups do not have data for all readers. The current behaviour, "ignore", would be the default. Other options could be: issue a warning, raise an exception, or skip the group (thus skipping some files).
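As a rough sketch of how these options could behave (not Satpy's implementation), the check could be applied to the grouping result as shown here. It assumes that group_files returns a list of dictionaries mapping reader names to lists of filenames; the function name handle_incomplete_groups and the on_missing parameter are hypothetical, chosen only for illustration.

```python
import warnings


def handle_incomplete_groups(groups, readers, on_missing="ignore"):
    """Apply a policy to groups that lack files for one or more readers.

    Hypothetical helper, not part of Satpy.  ``groups`` is assumed to be a
    list of dicts mapping reader name to a list of filenames.
    """
    valid = ("ignore", "warn", "raise", "skip")
    if on_missing not in valid:
        raise ValueError(f"on_missing must be one of {valid}, got {on_missing!r}")

    kept = []
    for index, group in enumerate(groups):
        # A reader counts as missing if its key is absent or its list is empty.
        missing = [reader for reader in readers if not group.get(reader)]
        if missing and on_missing != "ignore":
            message = (
                f"Group {index} has no files for reader(s): {', '.join(missing)}"
            )
            if on_missing == "raise":
                raise ValueError(message)
            if on_missing == "warn":
                warnings.warn(message)
            elif on_missing == "skip":
                continue  # drop this group, and thereby its files
        kept.append(group)
    return kept
```

With on_missing="ignore" the groups pass through unchanged, which matches the current behaviour; with "skip", only groups that have at least one file for every reader remain.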

Describe any changes to existing user workflow

None; the current behaviour would remain the default.

Additional context

Alternatively, I could go through the data files myself first and check their time coverage. Considering that in my actual code I'm reading ABI directly from an S3 bucket, that would be quite involved, and a solution within Satpy is preferable.
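For illustration only, a lighter-weight approximation of that manual check could work on the filenames alone, without opening any files. The sketch below assumes the start time can be trusted as encoded in the `_s<YYYYJJJHHMMSS><tenths>_` part of the filenames in the example above; the helper names start_time and unmatched are hypothetical, and the 35-second default merely mirrors the time_threshold used in the example.

```python
import re
from datetime import datetime, timedelta

# Start-time stamp as it appears in the example filenames:
# "_s" + year (4) + day of year (3) + HHMMSS (6) + tenths of a second (1) + "_"
_START = re.compile(r"_s(\d{13})\d_")


def start_time(filename):
    """Parse the start time from a filename (tenths of a second are dropped)."""
    match = _START.search(filename)
    if match is None:
        raise ValueError(f"No start time found in {filename!r}")
    return datetime.strptime(match.group(1), "%Y%j%H%M%S")


def unmatched(files_a, files_b, threshold_s=35):
    """Return the files in files_a with no file in files_b starting within threshold_s."""
    limit = timedelta(seconds=threshold_s)
    times_b = [start_time(name) for name in files_b]
    return [
        name
        for name in files_a
        if not any(abs(start_time(name) - other) <= limit for other in times_b)
    ]
```

Calling unmatched(glm_files, abi_files) would list the GLM files that have no ABI file starting within the threshold. This compares start times only, so it is not guaranteed to reproduce the grouping that group_files performs.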

@gerritholl gerritholl changed the title from "Allow group_files with multiple readers to require each group has at least one file matched to each reader" to "For group_files with multiple readers, allow user to configure behaviour if some groups have zero files for some readers" Jun 29, 2021
@djhoese djhoese added component:multiscene component:scene enhancement code enhancements, features, improvements labels Jun 29, 2021
@djhoese
Member

djhoese commented Jun 29, 2021

Makes sense. What are you thinking, a keyword argument on group_files? And maybe the default is different for MultiScene.from_files? 🤔 maybe they should both be the same behavior by default.

@gerritholl
Collaborator Author

If we want to be backward compatible, both should default to the current behaviour. However, considering that (1) Satpy is <1.0, (2) MultiScene is explicitly noted as experimental, (3) probably nobody is using this function in production, (4) probably almost nobody is using this function with multiple readers at all, considering I added this functionality in #1269, and (5) good programmers will notice backward-incompatible behaviour through breaking unit tests (ha ha), we can probably tolerate backward-incompatible behaviour in this case.

@djhoese
Member

djhoese commented Jun 29, 2021

Agreed. I think we can also assume that people who are using group_files and MultiScene.from_files with multiple readers (the 1-3 people doing that right now) probably want this behavior anyway.
