Originally posted by ryanherring February 13, 2024
I was hoping to be able to provide a more comprehensive way to reproduce the issue I'm seeing, but it's triggering so rarely in a remote environment that I have not been able to reproduce it on my laptop. I figured it was worth asking and seeing how I may be able to gather more info. The issue I'm seeing is that this block of code never returns (hangs indefinitely):
A log message is printed shortly after entering this block; after that, no further log messages appear, the process never terminates, and no Python exception is raised. The only clue is this GDAL log line:
INFO - GDAL signalled an error: err_no=2, msg='/io/gdal-3.6.4/gcore/gdalnodatamaskband.cpp, 258: cannot allocate 103862880 bytes'
The raster file I'm trying to open has shape (6, 5478, 4740) and dtype float32. The read also takes place in a loop over other rasters of similar size; this one is roughly the 8th read when the issue occurs.
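For reference, both byte counts in the error messages line up exactly with this shape and dtype, which suggests (though I can't be certain from the log alone) that the failed GDAL allocation is a single band's worth of float32 pixels for the nodata mask, while the numpy allocation below is the full six-band array:

```python
# Sanity check: the allocation sizes in the two errors match the
# raster's shape (6, 5478, 4740) and float32 dtype (4 bytes/pixel).
bands, height, width = 6, 5478, 4740
itemsize = 4  # float32

one_band = height * width * itemsize
all_bands = bands * one_band

# GDAL: "cannot allocate 103862880 bytes" -- exactly one band of pixels
print(one_band)                        # 103862880
# numpy: "Unable to allocate 594. MiB" -- the full 6-band array
print(round(all_bands / 2**20, 1))     # 594.3
```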
I'm running this code over a very large number of rasters in a remote environment that uses Docker and runs in AWS. If I specify a small amount of memory (e.g. 4Gi) for the container, the call to read the data fails quickly with this exception:
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 594. MiB for an array with shape (6, 5478, 4740) and data type float32
When I run the same code with a container size of 16Gi, I instead hit the issue described above: the GDAL error message is logged and the task hangs until manually killed.
I want to make it clear that I understand that the size of 16Gi is insufficient to read all of the data into memory. If I set the memory higher, it succeeds. The issue is that I am only able to determine that the increase in memory is necessary by observing a hanging task and manually intervening. (Our team has code that automatically retries tasks with more memory if a task failed with an out of memory exception, but since the task is hanging, that logic is unable to kick in.)
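As a stopgap until the root cause is found, one way to make the hang visible to that retry logic is to run the read step in a child process with a deadline, so a stuck read surfaces as an exception instead of hanging forever. A minimal sketch using the standard library (`run_with_deadline`, the sleep snippet, and the timeout value are illustrative placeholders, not from the original code):

```python
import subprocess
import sys

def run_with_deadline(code, timeout):
    """Run a Python snippet in a child process, killing it at the deadline.

    subprocess.run() terminates the child on timeout and raises
    TimeoutExpired, turning a silent hang into an exception that a
    retry-with-more-memory loop can catch.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        timeout=timeout,
        capture_output=True,
        text=True,
        check=True,
    )

# Stand-in for the hanging read: in the real pipeline this would invoke
# the raster-reading script rather than a sleep.
try:
    run_with_deadline("import time; time.sleep(60)", timeout=1)
except subprocess.TimeoutExpired:
    print("read exceeded deadline; retry with more memory")
```

This doesn't fix the underlying hang, but it converts it into a failure mode the existing automatic-retry code can already handle.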
If anyone has any thoughts on how to dig into this further, please let me know!
Discussed in #3028