
Caching doesn't work with scn.crop #2483

Closed
simonrp84 opened this issue May 16, 2023 · 4 comments · Fixed by #2485

Comments

@simonrp84
Member

simonrp84 commented May 16, 2023

Describe the bug
If I enable caching of lats + lons + sensor angles and then crop a GOES scene, I get an error rather than a successfully resampled image.

To Reproduce

from glob import glob

import satpy
from satpy import Scene

# Enable on-disk caching of lon/lats and sensor angles
satpy.config.set({'cache_dir': "d:/sat_data/cache_dir/",
                  'cache_sensor_angles': True,
                  'cache_lonlats': True})

scn = Scene(glob('/some_goes/data_dir/*.nc'), reader='abi_l1b')
scn.load(['true_color_reproduction', 'C13'])

scn2 = scn.crop(xy_bbox=(463500, 4002219, 2445846, 5020000))
scn3 = scn2.resample(scn2['C13'].attrs['area'], resampler='native')
scn3.save_datasets()

Expected behavior
A resampled image to be produced.

Actual results
The following error message is generated:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 12
     10 scn.load(comps, generate=False)
     11 scn = scn.crop(xy_bbox = bbox)
---> 12 scn2 = scn.resample(scn['C13'].attrs['area'], resampler='native')
     13 scn2.save_datasets(base_dir=outdir, fill_value=0)#, writer='simple_image')
     14 break

File ~\PycharmProjects\satpy\satpy\scene.py:966, in Scene.resample(self, destination, datasets, generate, unload, resampler, reduce_data, **resample_kwargs)
    963 # regenerate anything from the wishlist that needs it (combining
    964 # multiple resolutions, etc.)
    965 if generate:
--> 966     new_scn.generate_possible_composites(unload)
    968 return new_scn

File ~\PycharmProjects\satpy\satpy\scene.py:1430, in Scene.generate_possible_composites(self, unload)
   1423 def generate_possible_composites(self, unload):
   1424     """See which composites can be generated and generate them.
   1425 
   1426     Args:
   1427         unload (bool): if the dependencies of the composites
   1428                        should be unloaded after successful generation.
   1429     """
-> 1430     keepables = self._generate_composites_from_loaded_datasets()
   1432     if self.missing_datasets:
   1433         self._remove_failed_datasets(keepables)

File ~\PycharmProjects\satpy\satpy\scene.py:1449, in Scene._generate_composites_from_loaded_datasets(self)
   1446 trunk_nodes = self._dependency_tree.trunk(limit_nodes_to=self.missing_datasets,
   1447                                           limit_children_to=self._datasets.keys())
   1448 needed_comp_nodes = set(self._filter_loaded_datasets_from_trunk_nodes(trunk_nodes))
-> 1449 return self._generate_composites_nodes_from_loaded_datasets(needed_comp_nodes)

File ~\PycharmProjects\satpy\satpy\scene.py:1455, in Scene._generate_composites_nodes_from_loaded_datasets(self, compositor_nodes)
   1453 keepables = set()
   1454 for node in compositor_nodes:
-> 1455     self._generate_composite(node, keepables)
   1456 return keepables

File ~\PycharmProjects\satpy\satpy\scene.py:1479, in Scene._generate_composite(self, comp_node, keepables)
   1477 try:
   1478     delayed_prereq = False
-> 1479     prereq_datasets = self._get_prereq_datasets(
   1480         comp_node.name,
   1481         prereqs,
   1482         keepables,
   1483     )
   1484 except DelayedGeneration:
   1485     # if we are missing a required dependency that could be generated
   1486     # later then we need to wait to return until after we've also
   1487     # processed the optional dependencies
   1488     delayed_prereq = True

File ~\PycharmProjects\satpy\satpy\scene.py:1561, in Scene._get_prereq_datasets(self, comp_id, prereq_nodes, keepables, skip)
   1558 prereq_id = prereq_node.name
   1559 if prereq_id not in self._datasets and prereq_id not in keepables \
   1560         and isinstance(prereq_node, CompositorNode):
-> 1561     self._generate_composite(prereq_node, keepables)
   1563 # composite generation may have updated the DataID
   1564 prereq_id = prereq_node.name

File ~\PycharmProjects\satpy\satpy\scene.py:1513, in Scene._generate_composite(self, comp_node, keepables)
   1510     return
   1512 try:
-> 1513     composite = compositor(prereq_datasets,
   1514                            optional_datasets=optional_datasets,
   1515                            **comp_node.name.to_dict())
   1516     cid = DataID.new_id_from_dataarray(composite)
   1517     self._datasets[cid] = composite

File ~\PycharmProjects\satpy\satpy\modifiers\atmosphere.py:80, in PSPRayleighReflectance.__call__(self, projectables, optional_datasets, **info)
     78 if len(projectables) != 6:
     79     vis, red = self.match_data_arrays(projectables)
---> 80     sata, satz, suna, sunz = get_angles(vis)
     81 else:
     82     vis, red, sata, satz, suna, sunz = self.match_data_arrays(projectables)

File ~\PycharmProjects\satpy\satpy\modifiers\angles.py:366, in get_angles(data_arr)
    343 def get_angles(data_arr: xr.DataArray) -> tuple[xr.DataArray, xr.DataArray, xr.DataArray, xr.DataArray]:
    344     """Get sun and satellite azimuth and zenith angles.
    345 
    346     Note that this function can benefit from the ``satpy.config`` parameters
   (...)
    364 
    365     """
--> 366     sata, satz = _get_sensor_angles(data_arr)
    367     suna, sunz = _get_sun_angles(data_arr)
    368     return sata, satz, suna, sunz

File ~\PycharmProjects\satpy\satpy\modifiers\angles.py:453, in _get_sensor_angles(data_arr)
    450 area_def = data_arr.attrs["area"]
    451 chunks = _geo_chunks_from_data_arr(data_arr)
--> 453 sata, satz = _get_sensor_angles_from_sat_pos(sat_lon, sat_lat, sat_alt,
    454                                              data_arr.attrs["start_time"],
    455                                              area_def, chunks)
    456 sata = _geo_dask_to_data_array(sata)
    457 satz = _geo_dask_to_data_array(satz)

File ~\PycharmProjects\satpy\satpy\modifiers\angles.py:153, in ZarrCacheHelper.__call__(self, cache_dir, *args)
    151     if should_cache and not zarr_paths:
    152         self._warn_if_irregular_input_chunks(args, args_to_use)
--> 153         self._cache_results(res, zarr_format)
    154 # if we did any caching, let's load from the zarr files
    155 if should_cache:
    156     # re-calculate the cached paths

File ~\PycharmProjects\satpy\satpy\modifiers\angles.py:202, in ZarrCacheHelper._cache_results(self, res, zarr_format)
    200     # See https://github.com/dask/dask/issues/8380
    201     with dask.config.set({"optimization.fuse.active": False}):
--> 202         new_sub_res = sub_res.to_zarr(zarr_path, compute=False)
    203     new_res.append(new_sub_res)
    204 # actually compute the storage to zarr

File ~\miniconda3\Lib\site-packages\dask\array\core.py:2949, in Array.to_zarr(self, *args, **kwargs)
   2938 def to_zarr(self, *args, **kwargs):
   2939     """Save array to the zarr storage format
   2940 
   2941     See https://zarr.readthedocs.io/ for details about the format.
   (...)
   2947     dask.array.to_zarr : equivalent function
   2948     """
-> 2949     return to_zarr(self, *args, **kwargs)

File ~\miniconda3\Lib\site-packages\dask\array\core.py:3703, in to_zarr(arr, url, component, storage_options, overwrite, region, compute, return_stored, **kwargs)
   3700     raise ValueError("Cannot use `region` keyword when url is not a `zarr.Array`.")
   3702 if not _check_regular_chunks(arr.chunks):
-> 3703     raise ValueError(
   3704         "Attempt to save array to zarr with irregular "
   3705         "chunking, please call `arr.rechunk(...)` first."
   3706     )
   3708 storage_options = storage_options or {}
   3710 if storage_options:

ValueError: Attempt to save array to zarr with irregular chunking, please call `arr.rechunk(...)` first.


Environment Info:

  • OS: Windows 11 + Ubuntu 20.04
  • Satpy Version: Latest from GitHub
  • PyResample Version: Latest from GitHub
@djhoese
Member

djhoese commented May 16, 2023

Can you print out the chunks for each DataArray after cropping? Something like:

for x in scn2.values():
    print(x.chunks)

This is supposed to be handled already as documented here:

Note that the zarr format requires regular chunking of data. That is,
chunks must be all the same size per dimension except for the last chunk.
To work around this limitation, this class will determine a good regular
chunking based on the existing chunking scheme, rechunk the input
arguments, and then rechunk the results before returning them to the user.
This rechunking is only done if caching is enabled.

So this suggests that zarr or dask is now doing a stricter check than before, and/or that my check in angles.py is not catching all the same cases.
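
As a rough illustration of the normalization that docstring describes (a conceptual sketch with a made-up helper name, not the actual code in angles.py), one regular chunking can be derived from an irregular one by picking a single block size per dimension and rechunking to it:

import dask.array as da

def to_regular_chunks(arr: da.Array) -> da.Array:
    # Hypothetical helper: use the largest chunk along each dimension as the
    # target block size, so every block except possibly the last is the same size.
    target = tuple(max(c) for c in arr.chunks)
    return arr.rechunk(target)

# Chunking similar to what cropping can produce (see the printout further down)
cropped = da.zeros((2032, 3960), chunks=((2032,), (516, 3444)))
print(to_regular_chunks(cropped).chunks)  # ((2032,), (3444, 516))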

@djhoese
Member

djhoese commented May 16, 2023

For my own future reference: I bet it is this case that I'm not catching:

https://github.com/dask/dask/blame/11f2b83ee034aea7837691f9a0d4ede6309d6194/dask/array/core.py#L3756-L3757

Basically, when you only have two chunks in one dimension and the last chunk is bigger than the first chunk.
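
For illustration, dask's internal _check_regular_chunks helper (the function to_zarr calls, so not public API and subject to change) rejects exactly that pattern while accepting the reverse order:

from dask.array.core import _check_regular_chunks

# Two chunks along one axis, with the last chunk larger than the first: rejected
print(_check_regular_chunks(((2032,), (516, 3444))))   # False

# Same chunk sizes with the smaller chunk last: accepted as "regular"
print(_check_regular_chunks(((2032,), (3444, 516))))   # True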

@simonrp84
Member Author

Looks like your assumption is correct. This is the printout you requested:

((1016,), (1980,))
((1016,), (1980,))
((2032,), (516, 3444))
((2032,), (516, 3444))
((2032,), (516, 3444))
((1016,), (1980,))
((1016,), (1980,))
((508,), (990,))

@djhoese
Member

djhoese commented May 16, 2023

OK, I'll need to make a test case for this and update the condition in the code. It shouldn't be a hard fix or test, but I'm not sure I'll have time today or even this week.
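
Until that fix lands, a possible stopgap (untested here, just reversing the config options from the reproduction script) is to turn the caching back off before resampling the cropped scene, since the failing zarr write only happens on the caching code path:

import satpy

# Stopgap sketch: skip the angle/lonlat caching for the cropped scene so
# ZarrCacheHelper never tries to write the irregularly chunked arrays to zarr.
satpy.config.set({'cache_sensor_angles': False, 'cache_lonlats': False})

scn3 = scn2.resample(scn2['C13'].attrs['area'], resampler='native')
scn3.save_datasets()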
