Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with numpy 1.24.0 #194

Closed
julianblue opened this issue Dec 25, 2022 · 9 comments · Fixed by #195
Closed

Issues with numpy 1.24.0 #194

julianblue opened this issue Dec 25, 2022 · 9 comments · Fixed by #195

Comments

@julianblue
Copy link

julianblue commented Dec 25, 2022

Hi @gjoseph92 ,
I am getting an error when trying to stack data when the latest numpy version 1.24.0 is installed. All works fine with 1.23.5.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[48], line 1
----> 1 stack = stackstac.stack(itms.get_all_items(), resolution=10)

File ~/anaconda3/envs/ag-ml/lib/python3.8/site-packages/stackstac/stack.py:311, in stack(items, assets, epsg, resolution, bounds, bounds_latlon, snap_bounds, resampling, chunksize, dtype, fill_value, rescale, sortby_date, xy_coords, properties, band_coords, gdal_env, errors_as_nodata, reader)
    287 asset_table, spec, asset_ids, plain_items = prepare_items(
    288     plain_items,
    289     assets=assets,
   (...)
    294     snap_bounds=snap_bounds,
    295 )
    296 arr = items_to_dask(
    297     asset_table,
    298     spec,
   (...)
    306     errors_as_nodata=errors_as_nodata,
    307 )
    309 return xr.DataArray(
    310     arr,
--> 311     *to_coords(
    312         plain_items,
    313         asset_ids,
    314         spec,
    315         xy_coords=xy_coords,
    316         properties=properties,
    317         band_coords=band_coords,
    318     ),
    319     attrs=to_attrs(spec),
    320     name="stackstac-" + dask.base.tokenize(arr),
    321 )

File ~/anaconda3/envs/ag-ml/lib/python3.8/site-packages/stackstac/prepare.py:500, in to_coords(items, asset_ids, spec, xy_coords, properties, band_coords)
    496     except KeyError:
    497         pass
    499 coords.update(
--> 500     accumulate_metadata.metadata_to_coords(
    501         flattened_metadata_by_asset,
    502         "band",
    503         skip_fields={"href"},
    504         # skip_fields={"href", "title", "description", "type", "roles"},
    505     )
    506 )
    507 if any(eo_by_asset):
    508     coords.update(
    509         accumulate_metadata.metadata_to_coords(
    510             eo_by_asset,
   (...)
    513         )
    514     )

File ~/anaconda3/envs/ag-ml/lib/python3.8/site-packages/stackstac/accumulate_metadata.py:29, in metadata_to_coords(items, dim_name, fields, skip_fields)
     23 def metadata_to_coords(
     24     items: Iterable[Mapping[str, object]],
     25     dim_name: str,
     26     fields: Union[str, Sequence[str], Literal[True]] = True,
     27     skip_fields: Container[str] = (),
     28 ) -> Dict[str, xr.Variable]:
---> 29     return dict_to_coords(
     30         accumulate_metadata(
     31             items,
     32             fields=[fields] if isinstance(fields, str) else fields,
     33             skip_fields=skip_fields,
     34         ),
     35         dim_name,
     36     )

File ~/anaconda3/envs/ag-ml/lib/python3.8/site-packages/stackstac/accumulate_metadata.py:168, in dict_to_coords(metadata, dim_name)
    163     except TypeError:
    164         # if it's not set-able, just give up
    165         break
    167 props_arr = np.squeeze(
--> 168     np.array(
    169         props,
    170         # Avoid DeprecationWarning creating ragged arrays when elements are lists/tuples of different lengths
    171         dtype="object"
    172         if (
    173             isinstance(props, _ourlist)
    174             and len(set(len(x) for x in props if isinstance(x, (list, tuple))))
    175             > 1
    176         )
    177         else None,
    178     )
    179 )
    181 if (
    182     props_arr.ndim > 1
    183     or props_arr.ndim == 1
   (...)
    187     # our "bands", "y", and "x" dimensions, and xarray won't let us use unrelated
    188     # dimensions. so just skip it for now.
    189     continue

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (17,) + inhomogeneous part.
@gjoseph92
Copy link
Owner

Thanks for reporting @julianblue. This maybe seems like numpy/numpy#22004:

Ragged array creation will now always raise a ValueError unless dtype=object is passed. This includes very deeply nested sequences.
https://numpy.org/doc/stable/release/1.24.0-notes.html

However I'm confused, since we do already pass dtype='object' (note quotes; I hope removing them doesn't fix this, but I guess it's worth trying). I added that to avoid a DeprecationWarning from NumPy, so I'm a little surprised this has suddenly broken with no warning.

I won't be able to work on this for a few weeks. For now, pinning NumPy is a fine workaround; ideally we'd make a release doing that. If anyone wants to take a look at this sooner, please post what you find here.

@TomAugspurger
Copy link
Contributor

Do you have a reproducible example @julianblue? I can't reproduce using stackstac main and this simple example:

In [17]: import stackstac, pystac_client

In [18]: client = pystac_client.Client.open("http://planetarycomputer.microsoft.com/api/stac/v1/")

In [19]: ic = client.search(collections="sentinel-2-l2a", max_items=2).item_collection()

In [20]: stackstac.stack(ic, assets=["B02", "B04"])
Out[20]:
<xarray.DataArray 'stackstac-eb5fcb5108bcbbdc7f4efca13682f66b' (time: 2,
                                                                band: 2,
                                                                y: 30978,
                                                                x: 20982)>
dask.array<fetch_raster_window, shape=(2, 2, 30978, 20982), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/44)
  * time                                     (time) datetime64[ns] 2022-12-24...
    id                                       (time) <U54 'S2B_MSIL2A_20221224...
  * band                                     (band) <U3 'B02' 'B04'
  * x                                        (x) float64 5e+05 ... 7.098e+05
  * y                                        (y) float64 2e+06 ... 1.69e+06
    s2:not_vegetated_percentage              (time) float64 0.3278 0.0
    ...                                       ...
    gsd                                      float64 10.0
    proj:shape                               object {10980}
    common_name                              (band) <U4 'blue' 'red'
    center_wavelength                        (band) float64 0.49 0.665
    full_width_half_max                      (band) float64 0.098 0.038
    epsg                                     int64 32719
Attributes:
    spec:        RasterSpec(epsg=32719, bounds=(499980.0, 1690240.0, 709800.0...
    crs:         epsg:32719
    transform:   | 10.00, 0.00, 499980.00|\n| 0.00,-10.00, 2000020.00|\n| 0.0...
    resolution:  10.0

@julianblue
Copy link
Author

julianblue commented Dec 26, 2022

@TomAugspurger I did not specify the assets to pull all available bands. Specifying the assets does NOT cause the error to be raised @gjoseph92 .

stack = stackstac.stack(itm_coll, resolution=10)

@julianblue
Copy link
Author

julianblue commented Dec 26, 2022

@TomAugspurger , @gjoseph92 ,

the code fails in the metadata conversion for proj:transform (not present in the preview band)

 [[10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [60.0, 0.0, 399960.0, 0.0, -60.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [60.0, 0.0, 399960.0, 0.0, -60.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [20.0, 0.0, 399960.0, 0.0, -20.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], [10.0, 0.0, 399960.0, 0.0, -10.0, 5700000.0], None]

The none in the list should ideally not exist for raster band data and should be a transform like the other bands?

This causes

props_arr = np.squeeze(
                np.array(
                    props,
                    # Avoid DeprecationWarning creating ragged arrays when elements are lists/tuples of different lengths
                    dtype="object"
                    if (
                        isinstance(props, _ourlist)
                        and len(set(len(x) for x in props if isinstance(x, (list, tuple))))
                        > 1
                    )
                    else None,
                )
            )

to set dtype=None and raises the error. Shouldn't the dtype in that case be set to "object" as well instead of none?

@gjoseph92
Copy link
Owner

@julianblue can you give #195 a try? I don't have a computer available, so just did this on my phone and can't test it out. I feel like something along these lines might work (not sure if this is exactly the thing to do, though).

@julianblue
Copy link
Author

Hi @gjoseph92 ,
#195 fixed the issue for me.

@hrodmn
Copy link

hrodmn commented Mar 21, 2023

I hit this issue today on stackstac==0.4.3. It is happening when I use the query extension to filter out Landsat 7 items from the landsat-c2-l2 collection in the Planetary Computer STAC. Other operations that use the query arg (e.g. filtering on eo:cloud_cover) do not cause the failure.

Here is a reprex:

import planetary_computer
import pystac_client
import stackstac

COLLECTION = "landsat-c2-l2"
BBOX = (-91.67481115470213, 47.86318699029263, -91.52471085746416, 47.93118792159696)
DATE_RANGE = ["2022-07-01", "2022-07-31"]
EPSG = 5070
RESOLUTION = 30
ASSETS = ["red", "green", "blue", "qa_pixel"]
BOUNDS_5070 = (326000, 2771000, 337000, 2778000)

planetary_computer_catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# include all items
stac_items = planetary_computer_catalog.search(
    collections=[COLLECTION],
    bbox=BBOX,
    datetime=DATE_RANGE,
).item_collection()

print(f"there are {len(stac_items)} items if we do not apply any filters")

# works
landsat_stack = stackstac.stack(
    items=stac_items,
    assets=ASSETS,
    epsg=EPSG,
    resolution=RESOLUTION,
    bounds=BOUNDS_5070,
    xy_coords="center",
)
print(landsat_stack)

# filter out cloudy items
stac_items_less_cloudy = planetary_computer_catalog.search(
    collections=[COLLECTION],
    bbox=BBOX,
    datetime=DATE_RANGE,
    query={"eo:cloud_cover": {"lte": 50}},
).item_collection()

print(f"there are {len(stac_items_less_cloudy)} items if we exclude cloudy images")


# works
landsat_stack_filt = stackstac.stack(
    items=stac_items_less_cloudy,
    assets=ASSETS,
    epsg=EPSG,
    resolution=RESOLUTION,
    bounds=BOUNDS_5070,
    xy_coords="center",
)
print(landsat_stack_filt)

# exclude Landsat 7
stac_items_no_landsat_7 = planetary_computer_catalog.search(
    collections=[COLLECTION],
    bbox=BBOX,
    datetime=DATE_RANGE,
    query={"platform": {"neq": "landsat-7"}},
).item_collection()

print(f"there are {len(stac_items_no_landsat_7)} items if we exclude Landsat 7")


# fails with inhomogenous error
landsat_stack_filt = stackstac.stack(
    items=stac_items_no_landsat_7,
    assets=ASSETS,
    epsg=EPSG,
    resolution=RESOLUTION,
    bounds=BOUNDS_5070,
    xy_coords="center",
)
print(landsat_stack_filt)

Output:

there are 10 items if we include Landsat 7
<xarray.DataArray 'stackstac-5060345c8d58f7556b298868d9999b98' (time: 10,
                                                                band: 4,
                                                                y: 234, x: 368)>
dask.array<fetch_raster_window, shape=(10, 4, 234, 368), dtype=float64, chunksize=(1, 1, 234, 368), chunktype=numpy.ndarray>
Coordinates: (12/27)
  * time                         (time) datetime64[ns] 2022-07-01T15:28:21.11...
    id                           (time) <U31 'LE07_L2SP_027027_20220701_02_T1...
  * band                         (band) <U8 'red' 'green' 'blue' 'qa_pixel'
  * x                            (x) float64 3.26e+05 3.26e+05 ... 3.37e+05
  * y                            (y) float64 2.778e+06 2.778e+06 ... 2.771e+06
    platform                     (time) <U9 'landsat-7' ... 'landsat-9'
    ...                           ...
    view:off_nadir               int64 0
    landsat:cloud_cover_land     (time) float64 14.0 69.06 40.75 ... 100.0 97.19
    gsd                          int64 30
    raster:bands                 (band) object {'scale': 2.75e-05, 'nodata': ...
    title                        (band) <U29 'Red Band' ... 'Pixel Quality As...
    epsg                         int64 5070
Attributes:
    spec:        RasterSpec(epsg=5070, bounds=(325980, 2770980, 337020, 27780...
    crs:         epsg:5070
    transform:   | 30.00, 0.00, 325980.00|\n| 0.00,-30.00, 2778000.00|\n| 0.0...
    resolution:  30
there are 6 items if we exclude cloudy images
<xarray.DataArray 'stackstac-b859f39f5ea1c12c247ef26512f01dbb' (time: 6,
                                                                band: 4,
                                                                y: 234, x: 368)>
dask.array<fetch_raster_window, shape=(6, 4, 234, 368), dtype=float64, chunksize=(1, 1, 234, 368), chunktype=numpy.ndarray>
Coordinates: (12/27)
  * time                         (time) datetime64[ns] 2022-07-01T15:28:21.11...
    id                           (time) <U31 'LE07_L2SP_027027_20220701_02_T1...
  * band                         (band) <U8 'red' 'green' 'blue' 'qa_pixel'
  * x                            (x) float64 3.26e+05 3.26e+05 ... 3.37e+05
  * y                            (y) float64 2.778e+06 2.778e+06 ... 2.771e+06
    platform                     (time) <U9 'landsat-7' ... 'landsat-9'
    ...                           ...
    view:off_nadir               int64 0
    landsat:cloud_cover_land     (time) float64 14.0 40.75 15.43 14.0 2.41 3.36
    gsd                          int64 30
    raster:bands                 (band) object {'scale': 2.75e-05, 'nodata': ...
    title                        (band) <U29 'Red Band' ... 'Pixel Quality As...
    epsg                         int64 5070
Attributes:
    spec:        RasterSpec(epsg=5070, bounds=(325980, 2770980, 337020, 27780...
    crs:         epsg:5070
    transform:   | 30.00, 0.00, 325980.00|\n| 0.00,-30.00, 2778000.00|\n| 0.0...
    resolution:  30
there are 8 items if we exclude Landsat 7
Traceback (most recent call last):
  File "/home/henry/workspace/hrodmn.dev/posts/hls/stackstac_inhomogeneous.py", line 72, in <module>
    landsat_stack_filt = stackstac.stack(
  File "/home/henry/workspace/hrodmn.dev/env/lib/python3.10/site-packages/stackstac/stack.py", line 311, in stack
    *to_coords(
  File "/home/henry/workspace/hrodmn.dev/env/lib/python3.10/site-packages/stackstac/prepare.py", line 500, in to_coords
    accumulate_metadata.metadata_to_coords(
  File "/home/henry/workspace/hrodmn.dev/env/lib/python3.10/site-packages/stackstac/accumulate_metadata.py", line 29, in metadata_to_coords
    return dict_to_coords(
  File "/home/henry/workspace/hrodmn.dev/env/lib/python3.10/site-packages/stackstac/accumulate_metadata.py", line 168, in dict_to_coords
    np.array(
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.

@gjoseph92
Copy link
Owner

FYI @julianblue @hrodmn, I just released a version with this fixed (0.4.4). Sorry for the long wait.

@hrodmn
Copy link

hrodmn commented Jun 23, 2023

Thanks @gjoseph92!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants