
GDAL Error Reported in logs, but rasterio method never returns #3093

Open
sgillies opened this issue May 17, 2024 Discussed in #3028 · 1 comment

@sgillies
Member

Discussed in #3028

Originally posted by ryanherring February 13, 2024
I was hoping to provide a more complete reproduction, but the issue triggers so rarely in a remote environment that I have not been able to reproduce it on my laptop. I figured it was still worth asking how I might gather more information. The problem is that this block of code never returns (it hangs indefinitely):

```python
import rasterio

# The path is elided in the original report.
with rasterio.open("...") as src:
    data = src.read(masked=True)
```

A log message is printed shortly after entering this block. After that, no further log messages appear, and the process neither terminates nor raises a Python exception.

INFO - GDAL signalled an error: err_no=2, msg='/io/gdal-3.6.4/gcore/gdalnodatamaskband.cpp, 258: cannot allocate 103862880 bytes'

The raster file I'm trying to open has shape (6, 5478, 4740) and dtype float32. The read also happens inside a loop over other rasters of similar size; this one is roughly the 8th to be read when the issue occurs.
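As a consistency check (all values are taken from this report), the sizes in the two error messages follow directly from the raster's shape and dtype. A float32 value is 4 bytes:

```python
# Shape reported in the issue: 6 bands of 5478 rows x 4740 columns, float32.
bands, rows, cols = 6, 5478, 4740
FLOAT32_BYTES = 4

per_band = rows * cols * FLOAT32_BYTES  # bytes for one band of pixels
full = bands * per_band                 # bytes for the whole (6, 5478, 4740) array

print(per_band)                 # 103862880
print(full)                     # 623177280
print(round(full / 2**20, 1))   # 594.3 (MiB)
```

The 103862880 bytes that GDAL failed to allocate in gdalnodatamaskband.cpp is exactly one band of float32 pixels, while the 594 MiB in the numpy error is the full six-band array.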

I'm running this code over a very large number of rasters in a remote, Docker-based environment on AWS. If I give the container a small amount of memory (e.g. 4Gi), the read fails quickly with this exception:

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 594. MiB for an array with shape (6, 5478, 4740) and data type float32

When I run the same code in a 16Gi container, I hit the issue described above: the error message is logged and the task hangs until it is manually killed.

To be clear, I understand that 16Gi is not enough to read all of the data into memory; with more memory, the read succeeds. The problem is that the only way I can tell more memory is needed is by noticing a hung task and intervening manually. (Our team has code that automatically retries a task with more memory if it fails with an out-of-memory exception, but because the task hangs instead of failing, that logic never kicks in.)
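One possible way to turn the hang into an ordinary failure, sketched here as an assumption rather than anything proposed in this thread, is to run each read in a child process with a time limit. A Python-level timeout inside the same process may not interrupt a call that is blocked in GDAL's C code, but `subprocess.run(..., timeout=...)` kills the child when the limit expires. `TaskHung`, `run_worker`, and the worker command are hypothetical names; the retry logic would need to treat `TaskHung` the way it already treats out-of-memory failures.

```python
import subprocess
import sys


class TaskHung(Exception):
    """Hypothetical exception: the worker exceeded its time budget."""


def run_worker(argv, timeout_s):
    """Run argv in a child process and return its stdout.

    A hang becomes TaskHung and a non-zero exit becomes
    CalledProcessError, so both surface as ordinary exceptions.
    """
    try:
        result = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # child is killed if this expires
            check=True,         # non-zero exit -> CalledProcessError
        )
    except subprocess.TimeoutExpired as exc:
        raise TaskHung(f"{argv!r} exceeded {timeout_s}s") from exc
    return result.stdout


if __name__ == "__main__":
    # Hypothetical stand-in for a worker script that would open one raster
    # with rasterio and call src.read(masked=True); here it just sleeps.
    hanging = [sys.executable, "-c", "import time; time.sleep(60)"]
    try:
        run_worker(hanging, timeout_s=1)
    except TaskHung as err:
        print("retry with more memory:", err)
```

A separate process is used rather than a thread because Python offers no way to kill a hung thread, whereas a child process can always be terminated.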

If anyone has any thoughts on how to dig into this further, please let me know!

sgillies added this to the 1.4.0 milestone May 17, 2024
sgillies self-assigned this May 17, 2024
@sgillies
Member Author

We found a bug in GDAL. I plan to patch GDAL when building the next 1.4 wheels, based on OSGeo/gdal#9926.
