Python file opener VSI plugin #2898
Resolves #2888 maybe
The missing piece was keeping a reference to opened files.
Very interesting. From what I'm understanding it definitely seems interesting. Right now, is this an additional interface or is this PR completely replacing the old interface(s)? I honestly haven't been using this functionality even though I was the one who originally added it. It was more a stepping stone for fsspec users, but I didn't need it directly for my usual work. |
Instead, we'll make a way for rasterio.open to accept them as the "fp" positional argument
@djhoese this would be an additional interface. I think FilePath will need to be deprecated and removed eventually. Your work won't be lost, though, it will live on in the new opener argument. |
@groutr @vincentsarago @snowman2 anything else you'd like to see before this gets merged? |
Eliminates the fsspec OpenFile special case
Nice work @sgillies 👍
Co-authored-by: Alan D. Snow <alansnow21@gmail.com>
* Rough draft of a new VSI plugin
  Resolves #2888 maybe
* Getting closer
* Working implementation!
  The missing piece was keeping a reference to opened files.
* Add more tests
* Add more documentation about opener
* Improvements based on feedback from @rouault
* Update docstrings
* Update the VSI topic in docs
* Don't accept fsspec OpenFile instances as opener
  Instead, we'll make a way for rasterio.open to accept them as the "fp" positional argument
* Store contexts of vsipyopener file objects
  Eliminates the fsspec OpenFile special case
* Strip file mode for the zipfile case
* Update rasterio/_vsiopener.pyx
  Co-authored-by: Alan D. Snow <alansnow21@gmail.com>

---------

Co-authored-by: Sean Gillies <seangillies@Seans-MacBook-Air.local>
Co-authored-by: Alan D. Snow <alansnow21@gmail.com>
👋 I've been playing with this recently: https://github.com/vincentsarago/vsifile
Maybe I'm doing something absolutely wrong (and surely duplicating what fsspec is trying to solve), but I wanted to see if we could create a super simple file handler with a simple persistent cache (at least for the file header). Sadly, right now it's pretty inefficient because GDAL won't merge requests or use its internal cache (this is really weird to me), as I'm trying to explain in vincentsarago/vsifile#1. No real action needed, it's mostly an FYI.
FYI: I think the opener needs to implement https://github.com/OSGeo/gdal/blob/4d94400adc0a47fc5c9091adb8a73a5aadeb5d4a/port/cpl_vsil_plugin.cpp#L80-L97 to enable SMART multi range request :-) |
@vincentsarago Doesn't that assume that the python file-like object is multi-reader friendly? And thread safe? I don't think that is true for most file-like objects. |
Maybe, I'm not saying that all file-like object should support this
Yeah that's definitely something that should be tested (I usually work in single thread) |
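For anyone following along, here is a minimal sketch of the concern about thread safety. It assumes nothing about rasterio or GDAL internals: `LockedFile` and `read_at` are hypothetical names, and the only point is that a seek followed by a read on a shared Python file object has to be serialized, because another thread could otherwise move the file position between the two calls.

```python
import io
import threading


class LockedFile:
    """Hypothetical wrapper that serializes seek+read pairs on one file object."""

    def __init__(self, fileobj):
        self._f = fileobj
        self._lock = threading.Lock()

    def read_at(self, offset, size):
        # Hold the lock across both calls so the position cannot change
        # between seek() and read().
        with self._lock:
            self._f.seek(offset)
            return self._f.read(size)

    def close(self):
        with self._lock:
            self._f.close()


def demo():
    """Read 32 disjoint 8-byte ranges from 32 threads and check each one."""
    data = bytes(range(256)) * 16
    shared = LockedFile(io.BytesIO(data))
    results = {}

    def worker(i):
        results[i] = shared.read_at(i * 8, 8)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(32)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    shared.close()
    return all(results[i] == data[i * 8:i * 8 + 8] for i in range(32))
```

A lock like this makes a single object safe to share, at the cost of serializing all reads. It does not make the object "multi-reader friendly" in the sense of parallel I/O.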
Cython is not my forte, so if someone is interested in helping, it would be really appreciated. I'm just trying to add the

# in https://github.com/rasterio/rasterio/blob/75f770a309c24c97c859b7b20985e10c00a4c514/rasterio/_vsiopener.pyx#L57-L58
callbacks_struct.read_multi_range = <VSIFilesystemPluginReadMultiRangeCallback>pyopener_read_multi_range

# in https://github.com/rasterio/rasterio/blob/75f770a309c24c97c859b7b20985e10c00a4c514/rasterio/_vsiopener.pyx#L150
cdef int pyopener_read_multi_range(void *pFile,
                                   int nRanges,
                                   char *ppData,
                                   vsi_l_offset[:] panOffsets,
                                   size_t[:] panSizes) except -1 with gil:
    """Read several ranges of bytes from file.

    Reads nRanges objects of panSizes[i] bytes from the indicated file at the offset panOffsets[i] into the buffer ppData[i].
    Ranges must be sorted in ascending start offset, and must not overlap each other.
    """
    cdef object file_obj = <object>pFile
    file_obj.read_multi_range(nRanges, ppData, panOffsets, panSizes)
    return 0

but the definition of I guess we'll also need to make it optional in some way! We can start a proper PR if people are interested. Note: the
ReadMultiRange() is only going to be called from a single thread at a time on a single VSILFILE* object |
# memcpy comes from the C standard library and has to be cimported
from libc.string cimport memcpy

cdef int pyopener_read_multi_range(void *pFile,
                                   int nRanges,
                                   void **ppData,
                                   vsi_l_offset *panOffsets,
                                   size_t *panSizes) except -1 with gil:
    """Read several ranges of bytes from file.

    Reads nRanges objects of panSizes[i] bytes from the indicated file at the offset panOffsets[i] into the buffer ppData[i].
    Ranges must be sorted in ascending start offset, and must not overlap each other.

    ref: https://gdal.org/api/cpl.html#_CPPv419VSIFReadMultiRangeLiPPvPK12vsi_l_offsetPK6size_tP8VSILFILE

    Parameters
    ----------
    pFile : file handle
        File handle opened with VSIFOpenL().
    nRanges : int
        Number of ranges to read.
    ppData : array of buffers
        Array of nRanges buffers into which the data should be read
        (ppData[i] must be at least panSizes[i] bytes).
    panOffsets : array of int
        Array of nRanges offsets at which the data should be read.
    panSizes : array of int
        Array of nRanges sizes of objects to read (in bytes).

    int VSIPluginHandle::ReadMultiRange(int nRanges, void **ppData, vsi_l_offset *panOffsets, size_t *panSizes)
    {
        return poFS->ReadMultiRange(cbData, nRanges, ppData, panOffsets, panSizes);
    }
    """
    cdef object file_obj = <object>pFile
    # Convert panOffsets and panSizes to Python lists
    cdef list offsets = [int(panOffsets[i]) for i in range(nRanges)]
    cdef list sizes = [int(panSizes[i]) for i in range(nRanges)]
    # Call the Python method with the converted arguments
    cdef list python_data = file_obj.read_multi_range(nRanges, offsets, sizes)
    # Copy each returned bytes object into the buffer GDAL provided for that range
    for i in range(nRanges):
        memcpy(ppData[i], <void*><char*>python_data[i], len(python_data[i]))
    return 0

Took me only 2 days 😅 (there might be a lot of issues with this code, but at least it works 😄)
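For context, the callback above expects the Python file object to provide a `read_multi_range(nRanges, offsets, sizes)` method that returns one bytes object per range. A minimal sketch of what that Python side could look like follows. It assumes a plain seekable file object, and the `MultiRangeReader` class name is hypothetical, not part of rasterio or vsifile. A real implementation would presumably merge nearby ranges into fewer requests; this one just seeks and reads each range in turn.

```python
import io


class MultiRangeReader:
    """Hypothetical file-like wrapper that adds read_multi_range()."""

    def __init__(self, fileobj):
        self._f = fileobj

    # The four methods an opener's file-like object is expected to have.
    def read(self, size=-1):
        return self._f.read(size)

    def seek(self, offset, whence=io.SEEK_SET):
        return self._f.seek(offset, whence)

    def tell(self):
        return self._f.tell()

    def close(self):
        self._f.close()

    def read_multi_range(self, nranges, offsets, sizes):
        """Return a list of nranges bytes objects, one per (offset, size) pair."""
        if not (len(offsets) == len(sizes) == nranges):
            raise ValueError("offsets and sizes must both have nranges items")
        chunks = []
        for offset, size in zip(offsets, sizes):
            self._f.seek(offset)
            # read() may return fewer than size bytes near the end of the
            # file, but never more.
            chunks.append(self._f.read(size))
        return chunks
```

Because the callback copies `len(python_data[i])` bytes into a buffer of `panSizes[i]` bytes, an implementation must never return more bytes than were requested for a range.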
Note: There are 2 important variables I'm not sure if Rasterio should have defaults for those or try to use the defaults from GDAL, but having those set in the
@sgillies happy to open an issue instead of commenting here! Quick question: is the I'm getting this kind of error sometimes 👇
At least for the initial design, it was not thread safe. We didn't have a lot of test/user cases for the initial proof of concept and the initial implementation has changed a couple times since then. Please create new issues for what you've been discussing here so the individual topics don't get lost and please mention the relevant people. |
* Support writing to Python files
  This depends on PR #2898. Resolves #2906.
* Update change log
* Fix PR reference
* Use CPLError to convey opener failure out of the callback
* Call CPLError with 4 arguments
* Register opener for filename and mode
  Also make sure that deregistration happens on cleanup
* Remove with from test
* Update opener documentation

---------

Co-authored-by: Sean Gillies <seangillies@Seans-MacBook-Air.local>
Resolves #2888 definitely.

This PR adds a new opener keyword argument to rasterio.open(), making an analogy to the opener kwarg of Python's builtin open().

The opener keyword argument must be a callable which takes a string as its one positional argument and returns a read-only Python file-like object with read, seek, tell, and close methods. The opener is called by GDAL's virtual filesystem machinery and thus its file-like object serves GDAL's format drivers.

Example usage:
When the opener needs extra arguments, one can use functools.partial.

Applications of this include:
Currently we're testing with io.open, zipfile.ZipFile.open, and fsspec.open. I expect this to extend to gzip and fs_s3fs openers in the same way.
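
As a rough illustration of the opener contract described above, the sketch below uses only the standard library. `read_header` is a hypothetical stand-in for the consumer (in rasterio that role is played by GDAL's virtual filesystem machinery). It calls the opener with a single string and then uses only read, seek, tell, and close on the result. The file names and contents are made up for the demo.

```python
import functools
import io
import os
import tempfile
import zipfile


def read_header(path, opener, nbytes=4):
    """Hypothetical consumer: call opener(path), then use only
    read/seek/tell/close on the returned file-like object."""
    f = opener(path)
    try:
        f.seek(0)
        header = f.read(nbytes)
        position = f.tell()
    finally:
        f.close()
    return header, position


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "data.bin")
        with open(plain, "wb") as dst:
            dst.write(b"II*\x00rest-of-file")

        archive = os.path.join(tmp, "data.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.write(plain, arcname="data.bin")

        # io.open defaults to text mode, so bind the extra argument with
        # functools.partial to get a one-argument binary opener.
        binary_open = functools.partial(io.open, mode="rb")
        from_disk = read_header(plain, binary_open)

        # A bound ZipFile.open already takes the member name as its one
        # positional argument.
        with zipfile.ZipFile(archive) as zf:
            from_zip = read_header("data.bin", zf.open)

    return from_disk, from_zip
```

With rasterio installed, the same kind of callable is what this PR lets you pass as the opener keyword argument of rasterio.open().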