Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow reading .tiff extensions from storage #75

Merged
merged 2 commits into from
Sep 27, 2021

Conversation

JamesOConnor
Copy link
Contributor

@JamesOConnor JamesOConnor commented Sep 27, 2021

Hey @gjoseph92, thanks for all the hard work on the project - we're looking at using it at at Satellite Vu - this is just a small modification of the gdal environment to allow .tiff extensions to be read.

Copy link
Owner

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! As you can see from the comment I'm not actually familiar with this setting. If you are, can you explain what it does? Maybe we want to add other things here too.

@JamesOConnor
Copy link
Contributor Author

JamesOConnor commented Sep 27, 2021

I'm similarly unfamiliar with the details I'm afraid - I only know it needs updating (which I did for the sentinel 2 jp2's back in the day). There's a note in the titiler docs about it https://developmentseed.org/titiler/advanced/performance_tuning/ which sums up the extent of my knowledge!

@kylebarron might be able to help us with why we wouldn't just put loads of extensions in here, is there a performance cost?

@geospatial-jeff
Copy link

geospatial-jeff commented Sep 27, 2021

When GDAL opens a file with /vsicurl/ it will send HEAD requests to look for the file being opened and any potential sidecars (like external overviews). You'll notice it can send quite a lot of HEAD requests:

docker run --env CPL_CURL_VERBOSE=YES osgeo/gdal:latest gdalinfo /vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif

If you add the CPL_VSIL_CURL_ALLOWED_EXTENSIONS="tif" environment variable it will send significantly less HEAD requests, because GDAL will only try to open .tif files while skipping sidecars.

docker run --env CPL_CURL_VERBOSE=YES --env CPL_VSIL_CURL_ALLOWED_EXTENSIONS="tif"  osgeo/gdal:latest gdalinfo /vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif

While limiting the allowed extensions makes it significantly faster to open files by reducing the number of HEAD requests, if the extension of the file you are opening isn't included in this list, GDAL won't open it:

docker run --env CPL_CURL_VERBOSE=YES --env CPL_VSIL_CURL_ALLOWED_EXTENSIONS="tiff"  osgeo/gdal:latest gdalinfo /vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif

ERROR 4: `/vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif' does not exist in the file system, and is not recognized as a supported dataset name.
gdalinfo failed - unable to open '/vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif'.

@JamesOConnor
Copy link
Contributor Author

Thanks for this @geospatial-jeff, hadn't seen CPL_CURL_VERBOSE for profiling before, that's very handy. I guess in this context tif,tiff makes sense, I don't suspect there are any other extensions which will be required for this library as it stands.

@gjoseph92
Copy link
Owner

@geospatial-jeff thanks for the explanation and the reminder! Yes, I suspected this was the setting to eliminate HEADs for sidecars, but couldn't remember which of the many settings it was. In that case, I'm going to

  1. Update the comment
  2. Add jp2 (there are probably other reasonable extensions as well, but that's the first that comes to mind)

@gjoseph92
Copy link
Owner

@geospatial-jeff I now reminded myself that we're also setting GDAL_DISABLE_READDIR_ON_OPEN="EMPTY_DIR" while opening datasets. This achieves the same thing as you described for CPL_VSIL_CURL_ALLOWED_EXTENSIONS="tif", by disabling searching for sidecar files.

open=dict(
GDAL_DISABLE_READDIR_ON_OPEN="EMPTY_DIR",
# ^ stop GDAL from requesting `.aux` and `.msk` files from the bucket (speeds up `open` time a lot)

Comparing the requests with none of these options set (thanks for the docker command btw, very helpful for testing all this out):

gabe ~ » docker run --env CPL_CURL_VERBOSE=YES osgeo/gdal:latest gdalinfo /vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif 2>&1 | grep -E "HEAD|GET"
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif HTTP/1.1
> GET /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif HTTP/1.1
> GET /naip/v002/fl/2019/fl_60cm_2019/28080/ HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif.aux.xml HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.aux HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.AUX HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif.aux HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif.AUX HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif.ovr HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif.OVR HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif.msk HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif.MSK HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.XML HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.xml HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.IMD HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.imd HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.RPB HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.rpb HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.PVL HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.pvl HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215_rpc.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215_RPC.TXT HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215_metadata.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215_METADATA.TXT HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215_MTL.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215_MTL.TXT HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/METADATA.DIM HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/metadata.dim HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215_metadata.xml HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215_METADATA.XML HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.TXT HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.RPC HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.rpc HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.pass HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.PASS HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/summary.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/SUMMARY.TXT HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/HDR060_se_17_060_20191215.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/HDR060_se_17_060_20191215.TXT HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/HDR808060_se_17_060_20191215.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/HDR808060_se_17_060_20191215.TXT HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/RPC060_se_17_060_20191215.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/RPC060_se_17_060_20191215.TXT HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/RPC808060_se_17_060_20191215.txt HTTP/1.1
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/RPC808060_se_17_060_20191215.TXT HTTP/1.1

vs ALLOWED_EXTENSIONS:

gabe ~ » docker run --env CPL_CURL_VERBOSE=YES --env CPL_VSIL_CURL_ALLOWED_EXTENSIONS="tif" osgeo/gdal:latest gdalinfo /vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif 2>&1 | grep -E "HEAD|GET"
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif HTTP/1.1
> GET /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif HTTP/1.1
> GET /naip/v002/fl/2019/fl_60cm_2019/28080/ HTTP/1.1

vs DISABLE_READDIR_ON_OPEN

gabe ~ » docker run --env CPL_CURL_VERBOSE=YES --env GDAL_DISABLE_READDIR_ON_OPEN="EMPTY_DIR" osgeo/gdal:latest gdalinfo /vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif 2>&1 | grep -E "HEAD|GET"
> HEAD /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif HTTP/1.1
> GET /naip/v002/fl/2019/fl_60cm_2019/28080/m_2808060_se_17_060_20191215.tif HTTP/1.1

As we can see, DISABLE_READDIR_ON_OPEN also eliminates all the HEADs, plus eliminates an extra GET on the parent directory.

So I'm thinking (especially given my original comment in the code) we can just eliminate ALLOWED_EXTENSIONS entirely. I don't see what benefit we'd be getting from it.

AFAICT, this isn't doing anything for us since we have `GDAL_DISABLE_READDIR_ON_OPEN="EMPTY_DIR"`, so no sense in blocklisting other file extension.
@gjoseph92 gjoseph92 merged commit e5c7e0f into gjoseph92:main Sep 27, 2021
@gjoseph92 gjoseph92 added the needs-future-test Add a test for this in the future, once tests exist (#26) label Dec 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-future-test Add a test for this in the future, once tests exist (#26)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants