Fickle API connection to S2 catalog that errors with RuntimeError: not recognized as a supported file format. #192

Open
FlorisCalkoen opened this issue Feb 23, 2023 · 5 comments

FlorisCalkoen commented Feb 23, 2023

Since yesterday (2023-02-22) around 21:00 CET, loading data from the S2 SR catalog has been very unstable.
A query like the one below often fails with RuntimeError: not recognized as a supported file format. (see details). I usually run these on a Dask distributed client.

import planetary_computer
import pystac_client
import stackstac
import rasterio
import rioxarray
import xarray as xr


catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
bbox = [151.2945334307, -33.7448472377, 151.3229999588, -33.6917695125]

roi = {
    "type": "Polygon",
    "coordinates": [
        [
            [151.2945334307, -33.7448472377],
            [151.3229999588, -33.7448472377],
            [151.3229999588, -33.6917695125],
            [151.2945334307, -33.6917695125],
            [151.2945334307, -33.7448472377],
        ]
    ],
}

search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=roi,
    datetime="2022-12-31/2023-02-01",
    query={"eo:cloud_cover": {"lt": 50}},
)
items = search.item_collection()

da = stackstac.stack(
    items,
    assets=["B02", "B03", "B04", "B08", "B11", "SCL"],
    bounds_latlon=bbox,
    resampling=rasterio.enums.Resampling.bilinear,
).compute()
2023-02-23 09:07:52,233 - distributed.worker - WARNING - Compute Failed
Key:       ('asset_table_to_reader_and_window-fetch_raster_window-33cfbe764bf9c7870b6cd517607e768e', 4, 1, 0, 0)
Function:  execute_task
args:      ((subgraph_callable-36480e7c-187a-45a2-8d8c-8768940161f4, (subgraph_callable-0424adc3-32ef-449d-90af-2fcd67fbfe0b, array([[('https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2023/01/25/S2B_MSIL2A_20230125T235229_N0400_R130_T56HLH_20230127T052848.SAFE/GRANULE/L2A_T56HLH_A030758_20230125T235228/IMG_DATA/R10m/T56HLH_20230125T235229_B03_10m.tif?st=2023-02-22T07%3A25%3A33Z&se=2023-02-24T07%3A25%3A33Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T06%3A42%3A59Z&ske=2023-03-02T06%3A42%3A59Z&sks=b&skv=2021-06-08&sig=A4QAd0NTlYyf0NKVTb3Ple4ASgU43Etg4IXELFQtJvY%3D', [ 300000., 6190240.,  409800., 6300040.])]],
      dtype=[('url', 'O'), ('bounds', '<f8', (4,))]), RasterSpec(epsg=32756, bounds=(341920.0, 6264820.0, 344670.0, 6270760.0), resolutions_xy=(10.0, 10.0)), <Resampling.bilinear: 1>, dtype('float64'), nan, True, None, (<class 'tuple'>, [RasterioIOError('HTTP response code: 404')]), <class 'stac
kwargs:    {}
Exception: 'RuntimeError(\'Error opening \\\'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2023/01/25/S2B_MSIL2A_20230125T235229_N0400_R130_T56HLH_20230127T052848.SAFE/GRANULE/L2A_T56HLH_A030758_20230125T235228/IMG_DATA/R10m/T56HLH_20230125T235229_B03_10m.tif?st=2023-02-22T07%3A25%3A33Z&se=2023-02-24T07%3A25%3A33Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T06%3A42%3A59Z&ske=2023-03-02T06%3A42%3A59Z&sks=b&skv=2021-06-08&sig=A4QAd0NTlYyf0NKVTb3Ple4ASgU43Etg4IXELFQtJvY%3D\\\': RasterioIOError("\\\'/vsicurl/https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2023/01/25/S2B_MSIL2A_20230125T235229_N0400_R130_T56HLH_20230127T052848.SAFE/GRANULE/L2A_T56HLH_A030758_20230125T235228/IMG_DATA/R10m/T56HLH_20230125T235229_B03_10m.tif?st=2023-02-22T07%3A25%3A33Z&se=2023-02-24T07%3A25%3A33Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T06%3A42%3A59Z&ske=2023-03-02T06%3A42%3A59Z&sks=b&skv=2021-06-08&sig=A4QAd0NTlYyf0NKVTb3Ple4ASgU43Etg4IXELFQtJvY%3D\\\' not recognized as a supported file format.")\')'

---------------------------------------------------------------------------
RasterioIOError                           Traceback (most recent call last)
File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/rio_reader.py:326, in _open()
    325 try:
--> 326     ds = SelfCleaningDatasetReader(
    327         self.url, sharing=False
    328     )
    329 except Exception as e:

File rasterio/_base.pyx:309, in rasterio._base.DatasetBase.__init__()

RasterioIOError: '/vsicurl/https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2023/01/25/S2B_MSIL2A_20230125T235229_N0400_R130_T56HLH_20230127T052848.SAFE/GRANULE/L2A_T56HLH_A030758_20230125T235228/IMG_DATA/R10m/T56HLH_20230125T235229_B03_10m.tif?st=2023-02-22T07%3A25%3A33Z&se=2023-02-24T07%3A25%3A33Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T06%3A42%3A59Z&ske=2023-03-02T06%3A42%3A59Z&sks=b&skv=2021-06-08&sig=A4QAd0NTlYyf0NKVTb3Ple4ASgU43Etg4IXELFQtJvY%3D' not recognized as a supported file format.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[65], line 33
     20 search = catalog.search(
     21     collections=["sentinel-2-l2a"],
     22     intersects=roi,
     23     datetime="2022-12-31/2023-02-01",
     24     query={"eo:cloud_cover": {"lt": 50}},
     25 )
     26 items = search.item_collection()
     28 da = stackstac.stack(
     29     items,
     30     assets=["B02", "B03", "B04", "B08", "B11", "SCL"],
     31     bounds_latlon=bbox,
     32     resampling=rasterio.enums.Resampling.bilinear,
---> 33 ).compute()

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/xarray/core/dataarray.py:1089, in DataArray.compute(self, **kwargs)
   1070 """Manually trigger loading of this array's data from disk or a
   1071 remote source into memory and return a new array. The original is
   1072 left unaltered.
   (...)
   1086 dask.compute
   1087 """
   1088 new = self.copy(deep=False)
-> 1089 return new.load(**kwargs)

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/xarray/core/dataarray.py:1063, in DataArray.load(self, **kwargs)
   1045 def load(self: T_DataArray, **kwargs) -> T_DataArray:
   1046     """Manually trigger loading of this array's data from disk or a
   1047     remote source into memory and return this array.
   1048 
   (...)
   1061     dask.compute
   1062     """
-> 1063     ds = self._to_temp_dataset().load(**kwargs)
   1064     new = self._from_temp_dataset(ds)
   1065     self._variable = new._variable

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/xarray/core/dataset.py:746, in Dataset.load(self, **kwargs)
    743 import dask.array as da
    745 # evaluate all the dask arrays simultaneously
--> 746 evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    748 for k, data in zip(lazy_data, evaluated_data):
    749     self.variables[k].data = data

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/dask/base.py:599, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    596     keys.append(x.__dask_keys__())
    597     postcomputes.append(x.__dask_postcompute__())
--> 599 results = schedule(dsk, keys, **kwargs)
    600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/client.py:3137, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   3135         should_rejoin = False
   3136 try:
-> 3137     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   3138 finally:
   3139     for f in futures.values():

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/client.py:2306, in Client.gather(self, futures, errors, direct, asynchronous)
   2304 else:
   2305     local_worker = None
-> 2306 return self.sync(
   2307     self._gather,
   2308     futures,
   2309     errors=errors,
   2310     direct=direct,
   2311     local_worker=local_worker,
   2312     asynchronous=asynchronous,
   2313 )

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/utils.py:338, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    336     return future
    337 else:
--> 338     return sync(
    339         self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    340     )

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/utils.py:405, in sync(loop, func, callback_timeout, *args, **kwargs)
    403 if error:
    404     typ, exc, tb = error
--> 405     raise exc.with_traceback(tb)
    406 else:
    407     return result

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/utils.py:378, in sync.<locals>.f()
    376         future = asyncio.wait_for(future, callback_timeout)
    377     future = asyncio.ensure_future(future)
--> 378     result = yield future
    379 except Exception:
    380     error = sys.exc_info()

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/tornado/gen.py:769, in Runner.run(self)
    766 exc_info = None
    768 try:
--> 769     value = future.result()
    770 except Exception:
    771     exc_info = sys.exc_info()

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/client.py:2169, in Client._gather(self, futures, errors, direct, local_worker)
   2167         exc = CancelledError(key)
   2168     else:
-> 2169         raise exception.with_traceback(traceback)
   2170     raise exc
   2171 if errors == "skip":

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/dask/optimization.py:990, in __call__()
    988 if not len(args) == len(self.inkeys):
    989     raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 990 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/dask/core.py:149, in get()
    147 for key in toposort(dsk):
    148     task = dsk[key]
--> 149     result = _execute_task(task, cache)
    150     cache[key] = result
    151 result = _execute_task(out, cache)

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/dask/core.py:119, in _execute_task()
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/to_dask.py:185, in fetch_raster_window()
    178 # Only read if the window we're fetching actually overlaps with the asset
    179 if windows.intersect(current_window, asset_window):
    180     # NOTE: when there are multiple assets, we _could_ parallelize these reads with our own threadpool.
    181     # However, that would probably increase memory usage, since the internal, thread-local GDAL datasets
    182     # would end up copied to even more threads.
    183 
    184     # TODO when the Reader won't be rescaling, support passing `output` to avoid the copy?
--> 185     data = reader.read(current_window)
    187     if all_empty:
    188         # Turn `output` from a broadcast-trick array to a real array, so it's writeable
    189         if (
    190             np.isnan(data)
    191             if np.isnan(fill_value)
    192             else np.equal(data, fill_value)
    193         ).all():
    194             # Unless the data we just read is all empty anyway

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/rio_reader.py:385, in read()
    384 def read(self, window: Window, **kwargs) -> np.ndarray:
--> 385     reader = self.dataset
    386     try:
    387         result = reader.read(
    388             window=window,
    389             masked=True,
   (...)
    392             **kwargs,
    393         )

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/rio_reader.py:381, in dataset()
    379 with self._dataset_lock:
    380     if self._dataset is None:
--> 381         self._dataset = self._open()
    382     return self._dataset

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/rio_reader.py:337, in _open()
    332             warnings.warn(msg)
    333             return NodataReader(
    334                 dtype=self.dtype, fill_value=self.fill_value
    335             )
--> 337         raise RuntimeError(msg) from e
    338 if ds.count != 1:
    339     ds.close()

RuntimeError: Error opening 'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2023/01/25/S2B_MSIL2A_20230125T235229_N0400_R130_T56HLH_20230127T052848.SAFE/GRANULE/L2A_T56HLH_A030758_20230125T235228/IMG_DATA/R10m/T56HLH_20230125T235229_B03_10m.tif?st=2023-02-22T07%3A25%3A33Z&se=2023-02-24T07%3A25%3A33Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T06%3A42%3A59Z&ske=2023-03-02T06%3A42%3A59Z&sks=b&skv=2021-06-08&sig=A4QAd0NTlYyf0NKVTb3Ple4ASgU43Etg4IXELFQtJvY%3D': RasterioIOError("'/vsicurl/https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2023/01/25/S2B_MSIL2A_20230125T235229_N0400_R130_T56HLH_20230127T052848.SAFE/GRANULE/L2A_T56HLH_A030758_20230125T235228/IMG_DATA/R10m/T56HLH_20230125T235229_B03_10m.tif?st=2023-02-22T07%3A25%3A33Z&se=2023-02-24T07%3A25%3A33Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T06%3A42%3A59Z&ske=2023-03-02T06%3A42%3A59Z&sks=b&skv=2021-06-08&sig=A4QAd0NTlYyf0NKVTb3Ple4ASgU43Etg4IXELFQtJvY%3D' not recognized as a supported file format.")

2023-02-23 09:07:52,385 - distributed.worker - WARNING - Compute Failed
Key:       ('asset_table_to_reader_and_window-fetch_raster_window-33cfbe764bf9c7870b6cd517607e768e', 2, 4, 0, 0)
Function:  execute_task
args:      ((subgraph_callable-36480e7c-187a-45a2-8d8c-8768940161f4, (subgraph_callable-0424adc3-32ef-449d-90af-2fcd67fbfe0b, array([[('https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2023/01/15/S2B_MSIL2A_20230115T235229_N0400_R130_T56HLH_20230116T091738.SAFE/GRANULE/L2A_T56HLH_A030615_20230115T235227/IMG_DATA/R20m/T56HLH_20230115T235229_B11_20m.tif?st=2023-02-22T07%3A25%3A33Z&se=2023-02-24T07%3A25%3A33Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T06%3A42%3A59Z&ske=2023-03-02T06%3A42%3A59Z&sks=b&skv=2021-06-08&sig=A4QAd0NTlYyf0NKVTb3Ple4ASgU43Etg4IXELFQtJvY%3D', [ 300000., 6190240.,  409800., 6300040.])]],
      dtype=[('url', 'O'), ('bounds', '<f8', (4,))]), RasterSpec(epsg=32756, bounds=(341920.0, 6264820.0, 344670.0, 6270760.0), resolutions_xy=(10.0, 10.0)), <Resampling.bilinear: 1>, dtype('float64'), nan, True, None, (<class 'tuple'>, [RasterioIOError('HTTP response code: 404')]), <class 'stac
kwargs:    {}
Exception: 'RuntimeError("Error reading Window(col_off=0, row_off=0, width=275, height=594) from \'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2023/01/15/S2B_MSIL2A_20230115T235229_N0400_R130_T56HLH_20230116T091738.SAFE/GRANULE/L2A_T56HLH_A030615_20230115T235227/IMG_DATA/R20m/T56HLH_20230115T235229_B11_20m.tif?st=2023-02-22T07%3A25%3A33Z&se=2023-02-24T07%3A25%3A33Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T06%3A42%3A59Z&ske=2023-03-02T06%3A42%3A59Z&sks=b&skv=2021-06-08&sig=A4QAd0NTlYyf0NKVTb3Ple4ASgU43Etg4IXELFQtJvY%3D\': RasterioIOError(\'Read or write failed. IReadBlock failed at X offset 0, Y offset 0: IReadBlock failed at X offset 4, Y offset 2: TIFFReadEncodedTile() failed.\')")'


FlorisCalkoen changed the issue title on Feb 23, 2023.

FlorisCalkoen commented Feb 23, 2023

I'm trying to find a reproducible example, but I haven't been able to track it down fully yet. Here is another kind of traceback I got from a snippet similar to the one above:

2023-02-23 11:35:15,800 - distributed.worker - WARNING - Compute Failed
Key:       ('asset_table_to_reader_and_window-fetch_raster_window-e1a79444dab8d9e2e7bc05b6293d1778', 0, 3, 0, 0)
Function:  execute_task
args:      ((subgraph_callable-35fd6af9-538f-4445-bc46-422dd5de4ead, (subgraph_callable-c44a256d-f930-43a0-86a9-5c78b0353ac1, array([[('https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B08_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D', [ 300000., 6190240.,  409800., 6300040.])]],
      dtype=[('url', 'O'), ('bounds', '<f8', (4,))]), RasterSpec(epsg=32756, bounds=(341928.23765310494, 6264827.991079295, 344662.6210193617, 6270757.102609981), resolutions_xy=(10.0, 10.0)), <Resampling.bilinear: 1>, dtype('float64'), nan, False, None, (<class 'tuple'>, [RasterioIOError('H
kwargs:    {}
Exception: 'RuntimeError("Error opening \'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B08_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D\': RasterioIOError(\'503: Recv failure: Connection reset by peer\')")'

2023-02-23 11:35:15,801 - distributed.worker - WARNING - Compute Failed
Key:       ('asset_table_to_reader_and_window-fetch_raster_window-e1a79444dab8d9e2e7bc05b6293d1778', 0, 0, 0, 0)
Function:  execute_task
args:      ((subgraph_callable-35fd6af9-538f-4445-bc46-422dd5de4ead, (subgraph_callable-c44a256d-f930-43a0-86a9-5c78b0353ac1, array([[('https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B02_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D', [ 300000., 6190240.,  409800., 6300040.])]],
      dtype=[('url', 'O'), ('bounds', '<f8', (4,))]), RasterSpec(epsg=32756, bounds=(341928.23765310494, 6264827.991079295, 344662.6210193617, 6270757.102609981), resolutions_xy=(10.0, 10.0)), <Resampling.bilinear: 1>, dtype('float64'), nan, False, None, (<class 'tuple'>, [RasterioIOError('H
kwargs:    {}
Exception: 'RuntimeError(\'Error opening \\\'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B02_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D\\\': RasterioIOError("\\\'/vsicurl/https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B02_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D\\\' not recognized as a supported file format.")\')'

2023-02-23 11:35:15,811 - distributed.worker - WARNING - Compute Failed
Key:       ('asset_table_to_reader_and_window-fetch_raster_window-e1a79444dab8d9e2e7bc05b6293d1778', 0, 1, 0, 0)
Function:  execute_task
args:      ((subgraph_callable-35fd6af9-538f-4445-bc46-422dd5de4ead, (subgraph_callable-c44a256d-f930-43a0-86a9-5c78b0353ac1, array([[('https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B03_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D', [ 300000., 6190240.,  409800., 6300040.])]],
      dtype=[('url', 'O'), ('bounds', '<f8', (4,))]), RasterSpec(epsg=32756, bounds=(341928.23765310494, 6264827.991079295, 344662.6210193617, 6270757.102609981), resolutions_xy=(10.0, 10.0)), <Resampling.bilinear: 1>, dtype('float64'), nan, False, None, (<class 'tuple'>, [RasterioIOError('H
kwargs:    {}
Exception: 'RuntimeError(\'Error opening \\\'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B03_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D\\\': RasterioIOError("\\\'/vsicurl/https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B03_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D\\\' not recognized as a supported file format.")\')'

2023-02-23 11:35:15,858 - distributed.worker - WARNING - Compute Failed
Key:       ('asset_table_to_reader_and_window-fetch_raster_window-ce1b7bca382e9de462603fa6630a18a9', 0, 5, 0, 0)
Function:  execute_task
args:      ((subgraph_callable-724a8816-772a-4cd1-9dc5-2c737b6b9592, (subgraph_callable-5613dc44-6692-46dd-b3a3-711328c5dc8c, array([[('https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/11/20/S2B_MSIL2A_20221120T000219_N0400_R030_T56HLH_20221120T090830.SAFE/GRANULE/L2A_T56HLH_A029800_20221120T000221/IMG_DATA/R20m/T56HLH_20221120T000219_SCL_20m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D', [ 300000., 6190240.,  409800., 6300040.])]],
      dtype=[('url', 'O'), ('bounds', '<f8', (4,))]), RasterSpec(epsg=32756, bounds=(341928.23765310494, 6264827.991079295, 344662.6210193617, 6270757.102609981), resolutions_xy=(10.0, 10.0)), <Resampling.bilinear: 1>, dtype('float64'), nan, False, None, (<class 'tuple'>, [RasterioIOError('H
kwargs:    {}
Exception: 'RuntimeError(\'Error opening \\\'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/11/20/S2B_MSIL2A_20221120T000219_N0400_R030_T56HLH_20221120T090830.SAFE/GRANULE/L2A_T56HLH_A029800_20221120T000221/IMG_DATA/R20m/T56HLH_20221120T000219_SCL_20m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D\\\': RasterioIOError("\\\'/vsicurl/https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/11/20/S2B_MSIL2A_20221120T000219_N0400_R030_T56HLH_20221120T090830.SAFE/GRANULE/L2A_T56HLH_A029800_20221120T000221/IMG_DATA/R20m/T56HLH_20221120T000219_SCL_20m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D\\\' not recognized as a supported file format.")\')'

---------------------------------------------------------------------------
RasterioIOError                           Traceback (most recent call last)
File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/rio_reader.py:326, in _open()
    325 try:
--> 326     ds = SelfCleaningDatasetReader(
    327         self.url, sharing=False
    328     )
    329 except Exception as e:

File rasterio/_base.pyx:309, in rasterio._base.DatasetBase.__init__()

RasterioIOError: 503: Recv failure: Connection reset by peer

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[67], line 1
----> 1 da = dss.compute()

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/xarray/core/dataset.py:912, in Dataset.compute(self, **kwargs)
    893 """Manually trigger loading and/or computation of this dataset's data
    894 from disk or a remote source into memory and return a new dataset.
    895 Unlike load, the original dataset is left unaltered.
   (...)
    909 dask.compute
    910 """
    911 new = self.copy(deep=False)
--> 912 return new.load(**kwargs)

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/xarray/core/dataset.py:746, in Dataset.load(self, **kwargs)
    743 import dask.array as da
    745 # evaluate all the dask arrays simultaneously
--> 746 evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    748 for k, data in zip(lazy_data, evaluated_data):
    749     self.variables[k].data = data

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/dask/base.py:599, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    596     keys.append(x.__dask_keys__())
    597     postcomputes.append(x.__dask_postcompute__())
--> 599 results = schedule(dsk, keys, **kwargs)
    600 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/client.py:3137, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   3135         should_rejoin = False
   3136 try:
-> 3137     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   3138 finally:
   3139     for f in futures.values():

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/client.py:2306, in Client.gather(self, futures, errors, direct, asynchronous)
   2304 else:
   2305     local_worker = None
-> 2306 return self.sync(
   2307     self._gather,
   2308     futures,
   2309     errors=errors,
   2310     direct=direct,
   2311     local_worker=local_worker,
   2312     asynchronous=asynchronous,
   2313 )

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/utils.py:338, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    336     return future
    337 else:
--> 338     return sync(
    339         self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    340     )

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/utils.py:405, in sync(loop, func, callback_timeout, *args, **kwargs)
    403 if error:
    404     typ, exc, tb = error
--> 405     raise exc.with_traceback(tb)
    406 else:
    407     return result

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/utils.py:378, in sync.<locals>.f()
    376         future = asyncio.wait_for(future, callback_timeout)
    377     future = asyncio.ensure_future(future)
--> 378     result = yield future
    379 except Exception:
    380     error = sys.exc_info()

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/tornado/gen.py:769, in Runner.run(self)
    766 exc_info = None
    768 try:
--> 769     value = future.result()
    770 except Exception:
    771     exc_info = sys.exc_info()

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/distributed/client.py:2169, in Client._gather(self, futures, errors, direct, local_worker)
   2167         exc = CancelledError(key)
   2168     else:
-> 2169         raise exception.with_traceback(traceback)
   2170     raise exc
   2171 if errors == "skip":

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/dask/optimization.py:990, in __call__()
    988 if not len(args) == len(self.inkeys):
    989     raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 990 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/dask/core.py:149, in get()
    147 for key in toposort(dsk):
    148     task = dsk[key]
--> 149     result = _execute_task(task, cache)
    150     cache[key] = result
    151 result = _execute_task(out, cache)

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/dask/core.py:119, in _execute_task()
    115     func, args = arg[0], arg[1:]
    116     # Note: Don't assign the subtask results to a variable. numpy detects
    117     # temporaries by their reference count and can execute certain
    118     # operations in-place.
--> 119     return func(*(_execute_task(a, cache) for a in args))
    120 elif not ishashable(arg):
    121     return arg

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/to_dask.py:185, in fetch_raster_window()
    178 # Only read if the window we're fetching actually overlaps with the asset
    179 if windows.intersect(current_window, asset_window):
    180     # NOTE: when there are multiple assets, we _could_ parallelize these reads with our own threadpool.
    181     # However, that would probably increase memory usage, since the internal, thread-local GDAL datasets
    182     # would end up copied to even more threads.
    183 
    184     # TODO when the Reader won't be rescaling, support passing `output` to avoid the copy?
--> 185     data = reader.read(current_window)
    187     if all_empty:
    188         # Turn `output` from a broadcast-trick array to a real array, so it's writeable
    189         if (
    190             np.isnan(data)
    191             if np.isnan(fill_value)
    192             else np.equal(data, fill_value)
    193         ).all():
    194             # Unless the data we just read is all empty anyway

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/rio_reader.py:385, in read()
    384 def read(self, window: Window, **kwargs) -> np.ndarray:
--> 385     reader = self.dataset
    386     try:
    387         result = reader.read(
    388             window=window,
    389             masked=True,
   (...)
    392             **kwargs,
    393         )

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/rio_reader.py:381, in dataset()
    379 with self._dataset_lock:
    380     if self._dataset is None:
--> 381         self._dataset = self._open()
    382     return self._dataset

File ~/mambaforge/envs/coastal/lib/python3.10/site-packages/stackstac/rio_reader.py:337, in _open()
    332             warnings.warn(msg)
    333             return NodataReader(
    334                 dtype=self.dtype, fill_value=self.fill_value
    335             )
--> 337         raise RuntimeError(msg) from e
    338 if ds.count != 1:
    339     ds.close()

RuntimeError: Error opening 'https://sentinel2l2a01.blob.core.windows.net/sentinel2-l2/56/H/LH/2022/08/22/S2B_MSIL2A_20220822T000219_N0400_R030_T56HLH_20220822T162835.SAFE/GRANULE/L2A_T56HLH_A028513_20220822T000222/IMG_DATA/R10m/T56HLH_20220822T000219_B08_10m.tif?st=2023-02-22T10%3A13%3A55Z&se=2023-02-24T10%3A13%3A55Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-02-23T07%3A47%3A01Z&ske=2023-03-02T07%3A47%3A01Z&sks=b&skv=2021-06-08&sig=SH8dBq%2BwzHlm1LDwMrh4B0s7GRVysIyGVk%2B19y/hRRk%3D': RasterioIOError('503: Recv failure: Connection reset by peer')

@FlorisCalkoen
Author

FlorisCalkoen commented Feb 23, 2023

Just following up: the same stackstac.stack(...) call sometimes fails and sometimes succeeds when I rerun the same cell, so it seems to be an issue on the server side.

@TomAugspurger
Contributor

TomAugspurger commented Feb 23, 2023

xref gjoseph92/stackstac#18 and #11 (comment) where this came up before.

I think the Storage Account is responding with a 503 error. GDAL / stackstac should retry on these. Ideally you wouldn't need to worry about it, but for now you might need to configure the retries manually: maybe by setting gdal_env (https://stackstac.readthedocs.io/en/latest/api/main/stackstac.stack.html#stackstac.stack.params.gdal_env), and maybe by setting environment variables like GDAL_HTTP_MAX_RETRY and GDAL_HTTP_RETRY_DELAY (on the workers too, if you're using Dask). I thought those environment variables were supposed to be set on the Hub, but perhaps not.

When I have a chance, I'll look into setting those environment variables on the Hub by default. In the meantime, or if you aren't using the Hub, you'll need to set them yourself.
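
For reference, here is a minimal sketch of the gdal_env route mentioned above. It assumes that stackstac exposes DEFAULT_GDAL_ENV with an updated(always=...) method, as recent releases appear to; check this against your installed version. The helper names (gdal_retry_options, stack_with_retries) and the retry count and delay are illustrative and do not come from this thread.

```python
def gdal_retry_options(max_retry: int = 3, retry_delay_s: int = 5) -> dict:
    """Build GDAL HTTP retry options (values are illustrative defaults)."""
    return {
        "GDAL_HTTP_MAX_RETRY": str(max_retry),        # number of retries on HTTP errors such as 503
        "GDAL_HTTP_RETRY_DELAY": str(retry_delay_s),  # seconds to wait between retries
    }


def stack_with_retries(items, **stack_kwargs):
    """Hypothetical wrapper: call stackstac.stack with retry options layered on the defaults."""
    # Imported lazily so this sketch can be loaded without stackstac installed.
    import stackstac

    # Assumption: DEFAULT_GDAL_ENV.updated(always=...) returns a new environment
    # whose options apply both when opening and when reading datasets.
    env = stackstac.DEFAULT_GDAL_ENV.updated(always=gdal_retry_options())
    return stackstac.stack(items, gdal_env=env, **stack_kwargs)
```

With the query from the original post, this would be called as stack_with_retries(items, assets=[...], bounds_latlon=bbox). Because the options travel with the stackstac task graph, they should not need to be set separately on each Dask worker.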

@TomAugspurger transferred this issue from microsoft/PlanetaryComputerDataCatalog on Feb 23, 2023
@FlorisCalkoen
Author

FlorisCalkoen commented Feb 23, 2023

@TomAugspurger, thank you for the suggestion. Just for completeness, the errors were raised both on the Hub and in my local environment.

Do you know what may have caused this change in behaviour between my previous use of MPC and current use? Could it be a daily/weekly/monthly quota issue? The requests I make are not necessarily larger, but they may be slightly more frequent, so the total volume has increased.

On the Hub, the following GDAL variables are set:

['GDAL_HTTP_MERGE_CONSECUTIVE_RANGES', 'GDAL_DRIVER_PATH', 'GDAL_DISABLE_READDIR_ON_OPEN', 'GDAL_DATA']

I have now set os.environ["GDAL_HTTP_MAX_RETRY"] = "3" and have seen no errors so far, at least not locally. When using this on the Hub, I'll make sure to set this environment variable on the workers as well.
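
A minimal sketch of that setup, for anyone landing here: it sets the variables in the local process and, given an existing dask.distributed Client, runs the same function on every connected worker via Client.run. The GDAL_HTTP_RETRY_DELAY value and the names GDAL_RETRY_ENV, set_gdal_retry_env, and set_on_workers are illustrative assumptions, not something the Hub provides.

```python
import os

# Illustrative values: 3 retries as used above; the 5-second delay is an assumption.
GDAL_RETRY_ENV = {
    "GDAL_HTTP_MAX_RETRY": "3",
    "GDAL_HTTP_RETRY_DELAY": "5",
}


def set_gdal_retry_env() -> dict:
    """Set the GDAL retry variables in the current process and return what was set."""
    os.environ.update(GDAL_RETRY_ENV)
    return {name: os.environ[name] for name in GDAL_RETRY_ENV}


# Local process (e.g. the notebook kernel).
set_gdal_retry_env()


def set_on_workers(client):
    """Run the same function on each worker connected to a dask.distributed Client."""
    # Client.run executes the function on all current workers and returns
    # a dict mapping worker address to the function's return value.
    return client.run(set_gdal_retry_env)
```

Note that Client.run only reaches workers that are connected at the time of the call, so workers added later (for example by an adaptive cluster) would need the call repeated.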

@TomAugspurger
Contributor

Do you know what may have caused this change in behaviour between my previous use of MPC and current use? Could it be a daily/weekly/monthly quota issue?

The errors you're seeing come from the storage account itself, which has a global limit shared between all users. It just so happened that you requested data when the storage account was near its limit (serving requests to you and other users), so you got some 503 errors.

Most of the time the storage account isn't anywhere near its limit, so you don't need to worry about the retries. But it's safer to have them in place just in case.
