Accessing datasets located in buffers using MemoryFile and ZipMemoryFile #977
@sgillies - quick question on one part of this document (which is very useful by the way).
Can you help me understand the technical reasons as to why this is the case? Would it be possible in rasterio to pass file-like objects that are not immediately read into memory? Obviously some streaming access is possible for other buffer types (local file, s3, etc.).
Yes, you would think that you can pass any type compatible with that MemoryFile interface, where methods to return sets of bytes from the target were implemented to read on demand, not hold everything literally in memory.
@jhamman GDAL requires a
Rasterio has different ways to access datasets located on disk or at network addresses and datasets located in memory buffers. This document explains the former once again and then introduces the latter for the first time.
Accessing datasets on your filesystem
To access datasets on disk, give a filesystem path to rasterio.open(). Equivalently, use a file:// URL.
Accessing datasets in a zip archive
To access a dataset located in a local zip file, pass a zip:// URL (Apache VFS style) to rasterio.open().
Accessing network datasets
Datasets at http://, https://, or s3:// (AWS CLI style) network locations can be accessed by passing these locators to rasterio.open(). See #942 for details.
The difference from GDAL
If you're a GDAL user, you may be used to passing strings like /vsizip/foo.zip to call for zip file handling and strings like /vsicurl/https://example.com/foo.tif to call for HTTP protocol handling. Rasterio registers handlers by URL schemes instead. Rasterio uses GDAL's special strings internally, but they are not part of the Rasterio API.
Accessing datasets in memory buffers
Rasterio can access datasets located in the buffers of Python objects without writing the buffers to disk. To see how, open any GeoTIFF file and read its bytes into a variable named data.
The buffer of data's value contains that GeoTIFF. To make it available to Rasterio (and GDAL), give data to a MemoryFile and then open the dataset using MemoryFile.open().
As there is only one dataset per MemoryFile, MemoryFile.open() needs no filename or path argument. In many cases the usage can be condensed to the following.
MemoryFile is like Python's BytesIO class but has an additional special feature: the bytes buffer is mapped to a virtual file for use by GDAL. The virtual file is deleted when the MemoryFile closes.
You can also pass a file-like object opened in binary mode to MemoryFile(). This is for convenience only: the bytes of the file are read immediately into a bytes object.
Note that the profile and band data of that dataset have been captured for use in other examples below.
Performance notes
Recognize the above as a more memory-intensive way of getting the same results as the very first example in this document. Generally speaking, raster data formats are optimized for random access, and GDAL format drivers need datasets to be written entirely onto disk or into memory and mapped to a virtual file. Using MemoryFile to hold a large GeoTIFF doesn't require a hard disk (which is good for serverless applications) but loads the entire GeoTIFF into RAM.
Writing to MemoryFile
A MemoryFile can also be written. You can create a GeoTIFF (for example) in memory and then stream its bytes elsewhere without writing to disk. In this case you must bind the MemoryFile to a name so it can be referenced later.
Writing band data to the opened dataset modifies the virtual file and consequently the MemoryFile buffer.
Be kind: rewind
Note well: after dataset closes, the memfile position is left at its end. Call memfile.seek(0) before reading its bytes.
Zip files in a buffer
The ZipMemoryFile class is mostly the same, but is for use with a buffer that contains a zip archive. Its open() method takes the path of a dataset inside the archive, much as zipfile.ZipFile.open() takes the name of an archive member.
Writing in-memory zip files
Writing to a ZipMemoryFile is not currently supported, but you can build an in-memory zip archive by using Python's zipfile library and Rasterio's MemoryFile together.
Final notes on convenience features
By popular request, rasterio.open() can also take a file object opened in binary mode ('rb' or 'wb') as its first argument.
A MemoryFile is created internally to hold the bytes read from the input file object. This is therefore not the best way to read or write datasets already on disk and addressable by name.
As is the case for every printed profile, the output is the following.
Rasterio has different ways to access datasets located on disk or at network addresses and datasets located in memory buffers. The features are acquired from GDAL, but the abstractions are different, more Pythonic.