Read errors when reading COGs from S3 with many threads #1828
Comments
@Kirill888 there was a thread recently on gdal-dev where @rouault suspects that there is a problem in the GDAL block cache or GTiff driver: https://lists.osgeo.org/pipermail/gdal-dev/2019-October/051016.html. There are also reports that seem related, like this one: OSGeo/gdal#1244.
@sgillies thanks, I'll read through those. From a quick glance it does look like this is the same issue.
The fact that larger reads are more likely to cause failures is compatible with the cache hypothesis. I have tried disabling the cache and still saw errors, but as discussed in that issue, one cannot disable the cache fully; it is still being used.
For the record, this recent thread has nothing to do with /vsicurl /vsis3 issues. It is a concurrency issue with writes to datasets and can be reproduced with only local files.
Proposed fix for the /vsis3/ issue in OSGeo/gdal#2012
Thanks @rouault! @Kirill888 I'm going to try your example with a patched GDAL 3.0 now.
@Kirill888 using your notebook, I didn't see any failures with my patched GDAL.
I'm going to patch GDAL 2.4.3 in the wheels we upload to PyPI. See rasterio/rasterio-wheels#30.
Expected behavior and actual behavior.
I'm reading a bunch of Cloud Optimized GeoTIFF images from a public S3 bucket with the aws_unsigned=True option. I'm using many threads to speed things up. As I increase the number of threads "high enough", I start seeing errors of this kind:
I suspect the problem is in the GDAL code, to be honest. The error seems to be triggered more often when reading the entire raster at once. I have not been able to trigger the error when reading data in stripes in the test example linked below, but I have seen it happen elsewhere even when reading only a part of the file.
I believe this is the same issue as reported in #1686.
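For readers unfamiliar with the "reading in stripes" variant mentioned above, here is a minimal sketch of what it could look like. This is not the code from the linked gist; `row_stripes`, `read_in_stripes`, and the default `stripe_rows=512` are hypothetical names and values chosen for illustration. It assumes rasterio and numpy are installed and that the file has at least one band.

```python
def row_stripes(height, stripe_rows):
    """Yield (row_off, nrows) pairs that cover rows 0..height-1 without overlap."""
    if stripe_rows <= 0:
        raise ValueError("stripe_rows must be positive")
    for row_off in range(0, height, stripe_rows):
        # The last stripe may be shorter than stripe_rows.
        yield row_off, min(stripe_rows, height - row_off)


def read_in_stripes(url, stripe_rows=512):
    """Read band 1 of one raster stripe by stripe and stack the pieces.

    Hypothetical helper: imports are local so that row_stripes can be
    used without rasterio installed.
    """
    import numpy as np
    import rasterio
    from rasterio.windows import Window

    # aws_unsigned=True matches the unsigned public-bucket access described above.
    with rasterio.Env(aws_unsigned=True):
        with rasterio.open(url) as src:
            parts = [
                src.read(1, window=Window(0, row_off, src.width, nrows))
                for row_off, nrows in row_stripes(src.height, stripe_rows)
            ]
    return np.concatenate(parts, axis=0)
```

Each stripe is a separate, smaller read request, as opposed to a single `src.read(1)` of the entire raster.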
Steps to reproduce the problem.
see here:
https://gist.github.com/Kirill888/55148f21e0dcc2cf3d88e9e6abd349f7
With this example I have only observed errors at read time, but in "production" I have seen failures during open as well. I'm using a public bucket for convenience of reporting, but I have observed the same problems with signed S3 requests.
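For context, the access pattern described here can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the gist's actual code: `read_cog`, `read_many`, the `reader` parameter, and the default of 32 threads are all hypothetical. It assumes rasterio is installed and collects failures per URL instead of stopping at the first one, so that both open-time and read-time errors are recorded.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_cog(url):
    """Open one COG and read its first band in full (assumes rasterio is installed)."""
    import rasterio

    # rasterio's Env is entered inside the worker so that each thread
    # gets its own GDAL configuration.
    with rasterio.Env(aws_unsigned=True):
        with rasterio.open(url) as src:
            return src.read(1)


def read_many(urls, nthreads=32, reader=read_cog):
    """Run reader over urls in a thread pool; return (ok, failed) dicts keyed by URL."""
    ok, failed = {}, {}
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        futures = {pool.submit(reader, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                ok[url] = future.result()
            except Exception as exc:  # keep going and record the failure
                failed[url] = exc
    return ok, failed
```

Calling `read_many(urls, nthreads=n)` for increasing `n` and counting the entries in `failed` gives a rough measure of how the failure rate changes with the thread count.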
Operating system
Ubuntu 18.04, running in AWS us-west-2, m5.4xlarge.
Rasterio version and provenance
1.1.0 installed in binary mode from PyPI