Merge branch 'master' into cinit
Sean Gillies committed Nov 24, 2020
2 parents fac30a0 + ea8220b commit 104e25d
Showing 70 changed files with 2,317 additions and 428 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -65,6 +65,7 @@ target/
.idea/
venv/
venv2/
.vscode/

# rasterio
gdal-config.txt
50 changes: 46 additions & 4 deletions CHANGES.txt
@@ -1,11 +1,53 @@
Changes
=======

1.2.0 (TBD)
-----------
1.2dev
------

- epsg_treats_as_latlong() and epsg_treats_as_northingeasting() functions have
been added to rasterio.crs (#1943).
1.1.8 (2020-10-20)
------------------

- Multipolygons passed to rasterize are flattened to avoid holes in output
(#2014).
- If the certifi package can be imported, its certificate store location will
be passed to GDAL during import of rasterio._env unless CURL_CA_BUNDLE is
already set (#2009).

1.1.7 (2020-09-29)
------------------

- Add missing methods needed to determine whether GDAL treats a CRS as lat/long
or northing/easting (#1943).
- Wrap calls to GDALChecksumImage so that errors set by GDAL are propagated to
Python as a RasterioIOError.
- Raise RasterioDeprecationWarning when a dataset opened in modes other than
'r' is given to the WarpedVRT constructor.
- Base RasterioDeprecationWarning on FutureWarning, following the
recommendation of PEP 565.
- Fix a segmentation fault that occurs when a WarpedVRT closes after the
dataset it references has been previously closed (#2001).
- Add resampling option to merge and rio-merge (#1996).

1.1.6 (2020-09-14)
------------------

- Remove background layer from boundless VRT (#1982). It's not needed since
fixes in GDAL after 3.1.3. Wheels on PyPI for rasterio 1.1.6 will patch GDAL
2.4.4 to fix those GDAL issues.
- Clean up VSI files left by MemoryFileBase, resolving #1953.
- Do not pass empty coordinate arrays to warp._transform to avoid crashes with
some versions of GDAL as reported in #1952. Instead, directly return empty
output arrays.
- Properly convert block size `--co` option values to int in rio-clip and
rio-warp to prevent exceptions reported in #1989.
- Fail gracefully when rio-convert lacks an input file (#1985).
- Allow merge.merge() to open one dataset at a time (#1831).
- Optimize CRS.__eq__() for CRS described by EPSG codes.
- Fix bug in ParsedPath.is_remote() reported in #1967.
- The reproject() method accepts objects that provide `__array__` in addition
to instances of numpy.ndarray (#1957, #1959).
- Custom labels may be used with show_hist() by giving the `label` keyword
argument a sequence of label strings, one per band.

1.1.5 (2020-06-02)
------------------
4 changes: 3 additions & 1 deletion CONTRIBUTING.rst
@@ -97,7 +97,7 @@ Cython language is a superset of Python. Cython files end with ``.pyx`` and

Rasterio supports Python 2 and Python 3 in the same code base, which is
aided by an internal compatibility module named ``compat.py``. It functions
-similarly to the more widely known `six <https://pythonhosted.org/six/>`__ but
+similarly to the more widely known `six <https://six.readthedocs.io/>`__ but
we only use a small portion of the features so it eliminates a dependency.

We strongly prefer code adhering to `PEP8
@@ -217,6 +217,8 @@ package layout.

To run the entire suite and the code coverage report:

Note: rasterio must be installed in editable mode in order to run tests.

.. code-block:: console
$ py.test --cov rasterio --cov-report term-missing
28 changes: 20 additions & 8 deletions README.rst
@@ -4,8 +4,8 @@ Rasterio

Rasterio reads and writes geospatial raster data.

-.. image:: https://travis-ci.org/mapbox/rasterio.png?branch=master
-   :target: https://travis-ci.org/mapbox/rasterio
+.. image:: https://travis-ci.com/mapbox/rasterio.png?branch=master
+   :target: https://travis-ci.com/mapbox/rasterio

.. image:: https://coveralls.io/repos/github/mapbox/rasterio/badge.svg?branch=master
:target: https://coveralls.io/github/mapbox/rasterio?branch=master
@@ -21,9 +21,8 @@ channels.

**GDAL Compatibility:**

-* Rasterio ~= 1.1.0 requires GDAL >= 1.11, < 3.1
-* Rasterio ~= 1.0.25 requires GDAL >= 1.11, < 3.1
-* Rasterio ~= 1.0.0, < 1.0.25 requires GDAL >= 1.11, < 3.0
+* Rasterio ~= 1.2.0 requires GDAL >= 3.0
+* Rasterio ~= 1.1.0 requires GDAL >= 1.11, < 3.3

Read the documentation for more details: https://rasterio.readthedocs.io/.

@@ -300,12 +299,25 @@ cannot rely on gdal-config, which is only present on UNIX systems, to discover
the locations of header files and libraries that rasterio needs to compile its
C extensions. On Windows, these paths need to be provided by the user. You
will need to find the include files and the library files for gdal and use
-setup.py as follows.
+setup.py as follows. You will also need to specify the installed gdal version
+through the GDAL_VERSION environment variable.

.. code-block:: console
-$ python setup.py build_ext -I<path to gdal include files> -lgdal_i -L<path to gdal library>
-$ python setup.py install
+$ python setup.py build_ext -I<path to gdal include files> -lgdal_i -L<path to gdal library> install
With pip

.. code-block:: console
$ pip install --no-use-pep517 --global-option -I<path to gdal include files> -lgdal_i -L<path to gdal library> .
Note: :code:`--no-use-pep517` is required as pip currently hasn't implemented a
way for optional arguments to be passed to the build backend when using PEP 517.
See `here <https://github.com/pypa/pip/issues/5771>`__ for more details.

Alternatively, environment variables (e.g. INCLUDE and LINK) used by the MSVC
compiler can be used to point to include directories and library files.

We have had success compiling code using the same version of Microsoft's
Visual Studio used to compile the targeted version of Python (more info on
18 changes: 15 additions & 3 deletions docs/installation.rst
@@ -23,7 +23,7 @@ Binary wheels with the GDAL, GEOS, and PROJ4 libraries included are available
for OS X versions 10.7+ starting with Rasterio version 0.17. To install,
run ``pip install rasterio``. These binary wheels are preferred by newer
versions of pip. If you don't want these wheels and want to install from
-a source distribution, run ``pip install rasterio --no-use-wheel`` instead.
+a source distribution, run ``pip install rasterio --no-binary`` instead.

The included GDAL library is fairly minimal, providing only the format drivers
that ship with GDAL and are enabled by default. To get access to more formats,
@@ -109,8 +109,20 @@ setup.py as follows.

.. code-block:: console
-$ python setup.py build_ext -I<path to gdal include files> -lgdal_i -L<path to gdal library>
-$ python setup.py install
+$ python setup.py build_ext -I<path to gdal include files> -lgdal_i -L<path to gdal library> install
With pip

.. code-block:: console
$ pip install --no-use-pep517 --global-option -I<path to gdal include files> -lgdal_i -L<path to gdal library> .
Note: :code:`--no-use-pep517` is required as pip currently hasn't implemented a
way for optional arguments to be passed to the build backend when using PEP 517.
See `here <https://github.com/pypa/pip/issues/5771>`__ for more details.

Alternatively, environment variables (e.g. INCLUDE and LINK) used by the MSVC
compiler can be used to point to include directories and library files.

We have had success compiling code using the same version of Microsoft's
Visual Studio used to compile the targeted version of Python (more info on
119 changes: 76 additions & 43 deletions docs/topics/concurrency.rst
@@ -7,9 +7,25 @@ function, which means that Python threads can read and write concurrently.

The Numpy library also often releases the GIL, e.g., in applying
universal functions to arrays, and this makes it possible to distribute
-processing of an array across cores of a processor. The Cython function
-below, included in Rasterio's ``_example`` module, simulates such
-a GIL-releasing raster processing function.
+processing of an array across cores of a processor.

This means that it is possible to parallelize tasks that need to be performed
for a set of windows/pixels in the raster. Reading, writing, and processing can
always be done concurrently, but how much of a speedup can be obtained depends
on the hardware and on where the bottlenecks are. If the processing function
releases the GIL, multiple threads processing simultaneously can lead to
further speedups.
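
As a rough illustration of splitting a raster into windows and mapping a function over them with a thread pool, here is a minimal sketch that uses only the standard library. The helpers ``block_windows``, ``count_pixels`` and ``process_windows`` are hypothetical names made up for this sketch, not part of Rasterio, and the per-window work is a trivial stand-in for a real GIL-releasing function.

```python
import concurrent.futures


def block_windows(width, height, blocksize=128):
    """Yield (col_off, row_off, win_width, win_height) tiles covering a raster.

    Hypothetical helper for illustration; edge tiles are clipped to the
    raster bounds.
    """
    for row_off in range(0, height, blocksize):
        for col_off in range(0, width, blocksize):
            yield (
                col_off,
                row_off,
                min(blocksize, width - col_off),
                min(blocksize, height - row_off),
            )


def count_pixels(window):
    """Stand-in for real per-window processing: return the window's pixel count."""
    _, _, win_width, win_height = window
    return win_width * win_height


def process_windows(width, height, num_workers=4):
    """Map count_pixels over all windows using a pool of threads."""
    windows = list(block_windows(width, height))
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        # executor.map() yields results in the same order as the inputs.
        results = list(executor.map(count_pixels, windows))
    return windows, results
```

Calling ``process_windows(791, 718)`` tiles a raster of 791 columns by 718 rows into 128-pixel blocks; the per-window results can then be combined in the main thread.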

.. note::
    If you wish to do multiprocessing that is not trivially parallelizable
    across very large images that do not fit in memory, or if you wish to
    do multiprocessing across multiple machines, you might want to have a
    look at `dask <https://dask.org/>`__ and in particular this
    `example <https://examples.dask.org/applications/satellite-imagery-geotiff.html>`__.

The Cython function below, included in Rasterio's ``_example`` module,
simulates a GIL-releasing CPU-intensive raster processing function. You can
also easily create GIL-releasing functions by using `numba <https://numba.pydata.org/>`__.

.. code-block:: python
@@ -43,7 +59,10 @@ a GIL-releasing raster processing function.
output_view[~i, j, k] = <unsigned char>val
return output
-Here is the program in examples/thread_pool_executor.py.
+Here is the program in examples/thread_pool_executor.py. It is set up in such
+a way that at most one thread is reading and at most one thread is writing at
+the same time. Processing is not protected by a lock and can be done by
+multiple threads simultaneously.

.. code-block:: python
@@ -58,6 +77,8 @@ Here is the program in examples/thread_pool_executor.py.
"""
import concurrent.futures
import multiprocessing
import threading
import rasterio
from rasterio._example import compute
@@ -70,66 +91,78 @@ Here is the program in examples/thread_pool_executor.py.
reversed.
"""
with rasterio.Env():
with rasterio.open(infile) as src:
with rasterio.open(infile) as src:
# Create a destination dataset based on source params. The
# destination will be tiled, and we'll process the tiles
# concurrently.
profile = src.profile
profile.update(blockxsize=128, blockysize=128, tiled=True)
# Create a destination dataset based on source params. The
# destination will be tiled, and we'll process the tiles
# concurrently.
profile = src.profile
profile.update(blockxsize=128, blockysize=128, tiled=True)
with rasterio.open(outfile, "w", **src.profile) as dst:
windows = [window for ij, window in dst.block_windows()]
with rasterio.open(outfile, "w", **profile) as dst:
# We cannot write to the same file from multiple threads
# without causing race conditions. To safely read/write
# from multiple threads, we use a lock to protect the
# DatasetReader/Writer
read_lock = threading.Lock()
write_lock = threading.Lock()
# Materialize a list of destination block windows
# that we will use in several statements below.
windows = [window for ij, window in dst.block_windows()]
def process(window):
with read_lock:
src_array = src.read(window=window)
# This generator comprehension gives us raster data
# arrays for each window. Later we will zip a mapping
# of it with the windows list to get (window, result)
# pairs.
data_gen = (src.read(window=window) for window in windows)
# The computation can be performed concurrently
result = compute(src_array)
with concurrent.futures.ThreadPoolExecutor(
max_workers=num_workers
) as executor:
with write_lock:
dst.write(result, window=window)
# We map the compute() function over the raster
# data generator, zip the resulting iterator with
# the windows list, and as pairs come back we
# write data to the destination dataset.
for window, result in zip(
windows, executor.map(compute, data_gen)
):
dst.write(result, window=window)
# We map the process() function over the list of
# windows.
with concurrent.futures.ThreadPoolExecutor(
max_workers=num_workers
) as executor:
executor.map(process, windows)
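
One detail of this pattern is worth keeping in mind: ``executor.map()`` returns a lazy iterator, and an exception raised inside a worker is only re-raised when the corresponding result is retrieved. The following is a minimal, standard-library-only sketch of the same read-lock/compute/write-lock structure, not the code from examples/thread_pool_executor.py. The dict-based ``source`` and ``destination`` and the ``run`` function are hypothetical stand-ins for the datasets and the program's main function.

```python
import concurrent.futures
import threading


def run(source, num_workers=4):
    """Process every entry of `source` with locked reads and writes.

    `source` is a dict mapping a window key to a list of ints; it is a
    hypothetical stand-in for a dataset opened for reading.
    """
    destination = {}
    read_lock = threading.Lock()
    write_lock = threading.Lock()

    def process(key):
        # At most one thread reads at a time.
        with read_lock:
            data = list(source[key])

        # The computation itself is not protected by a lock.
        result = [255 - value for value in data]

        # At most one thread writes at a time.
        with write_lock:
            destination[key] = result
        return key

    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
        # Consuming the iterator re-raises any exception from a worker.
        finished = list(executor.map(process, source))

    return destination, finished
```

If the results of ``executor.map()`` are never iterated, the ``with`` block still waits for all tasks to finish, but a failure in one of them passes silently.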
The code above simulates a CPU-intensive calculation that runs faster when
spread over multiple cores using the ``ThreadPoolExecutor`` from Python 3's
-``concurrent.futures`` module. Compared to the case of one concurrent job
+``concurrent.futures`` module. Compared to the case of one concurrent job
(``-j 1``),

.. code-block:: console
$ time python examples/thread_pool_executor.py tests/data/RGB.byte.tif /tmp/test.tif -j 1
-real 0m3.555s
-user 0m3.422s
-sys 0m0.095s
+real 0m4.277s
+user 0m4.356s
+sys 0m0.184s
-we get an almost 3x speed up with four concurrent jobs.
+we get over 3x speed up with four concurrent jobs.

.. code-block:: console
$ time python examples/thread_pool_executor.py tests/data/RGB.byte.tif /tmp/test.tif -j 4
-real 0m1.247s
-user 0m3.505s
-sys 0m0.088s
+real 0m1.251s
+user 0m4.402s
+sys 0m0.168s
.. note::
    If the function that you'd like to map over raster windows doesn't release the
    GIL, you unfortunately cannot simply replace ``ThreadPoolExecutor`` with
    ``ProcessPoolExecutor``, because a DatasetReader/Writer cannot be shared by
    multiple processes. Either each process needs to open the file separately,
    or you can do all the reading and writing from the main thread, as shown in
    this next example. The latter is much less memory efficient, however.

.. code-block:: python
arrays = [src.read(window=window) for window in windows]
-If the function that you'd like to map over raster windows doesn't release
-the GIL, you can replace ``ThreadPoolExecutor`` with ``ProcessPoolExecutor``
-and get the same results with similar performance.
with concurrent.futures.ProcessPoolExecutor(
max_workers=num_workers
) as executor:
futures = executor.map(compute, arrays)
for window, result in zip(windows, futures):
dst.write(result, window=window)
29 changes: 29 additions & 0 deletions docs/topics/georeferencing.rst
@@ -65,3 +65,32 @@ a pixel's image coordinates are ``x, y`` and its world coordinates are

The ``Affine`` class has some useful properties and methods
described at https://github.com/sgillies/affine.

Some datasets may not have an affine transformation matrix, but are still georeferenced.

Ground Control Points
----------------------

A ground control point (GCP) is the mapping of a dataset's row and column coordinates to a
single world x, y, and optionally z coordinate. Typically a dataset will have multiple
GCPs distributed across the image. Rasterio can calculate an affine transformation matrix
from a collection of GCPs using the ``rasterio.transform.from_gcps`` function.
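
To make the relationship between GCPs and an affine transformation concrete, here is a minimal pure-Python sketch that solves for the six affine coefficients from exactly three non-collinear GCPs. It is an illustration under simplifying assumptions, not how ``rasterio.transform.from_gcps`` is implemented: the function names and the ``(row, col, x, y)`` tuple layout are hypothetical, and a real dataset with more than three GCPs would call for a least-squares fit.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def _solve3(m, v):
    """Solve the 3x3 linear system m @ p = v using Cramer's rule."""
    det = _det3(m)
    if det == 0:
        raise ValueError("control points are collinear")
    solution = []
    for k in range(3):
        # Replace column k of m with the right-hand side v.
        mk = [list(row) for row in m]
        for r in range(3):
            mk[r][k] = v[r]
        solution.append(_det3(mk) / det)
    return solution


def affine_from_three_gcps(gcps):
    """Return (a, b, c, d, e, f) such that
    x = a * col + b * row + c and y = d * col + e * row + f.

    `gcps` is a sequence of three (row, col, x, y) tuples (hypothetical layout).
    """
    m = [(col, row, 1.0) for row, col, _, _ in gcps]
    a, b, c = _solve3(m, [x for _, _, x, _ in gcps])
    d, e, f = _solve3(m, [y for _, _, _, y in gcps])
    return (a, b, c, d, e, f)


def pixel_to_world(coeffs, row, col):
    """Apply the affine coefficients to a (row, col) pixel coordinate."""
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)
```

The coefficient order follows the ``Affine(a, b, c, d, e, f)`` convention, so the result of this sketch could be compared against a transform obtained from the library.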

Rational Polynomial Coefficients
---------------------------------

A dataset may also be georeferenced with a set of rational polynomial coefficients (RPCs)
which can be used to compute pixel coordinates from x, y, and z coordinates. The RPCs are
an application of the Rigorous Projection Model which uses four sets of 20 term cubic polynomials
and several normalizing parameters to establish a relationship between image and world coordinates.
RPCs are defined with image coordinates in pixel units and world coordinates in decimal
degrees of longitude and latitude and height above the WGS84 ellipsoid (EPSG:4326).

RPCs are usually provided by the dataset provider and are only well behaved over the
extent of the image. Additionally, accurate height values are required for the best
results. Datasets with low terrain variation may use an average height over the extent of
the image, while datasets with higher terrain variation should use a digital elevation
model to sample height values. The coordinate transformation from world to pixel
coordinates is exact while the reverse is not, and must be computed iteratively. For more
details on coordinate transformations using RPCs see
https://gdal.org/api/gdal_alg.html#_CPPv424GDALCreateRPCTransformerP11GDALRPCInfoidPPc
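
To illustrate what "computed iteratively" can mean, the sketch below inverts a forward world-to-pixel function with a Newton iteration and a finite-difference Jacobian. This is a generic numerical technique shown under stated assumptions, not a description of GDAL's RPC transformer: ``toy_forward`` is a made-up, mildly nonlinear model standing in for real RPCs, and ``invert`` with its parameters is a hypothetical helper.

```python
def toy_forward(x, y):
    """Made-up world -> pixel model (col, row); stands in for a real RPC evaluation."""
    col = 1000.0 + 500.0 * x + 20.0 * x * y
    row = 800.0 - 400.0 * y + 15.0 * x * x
    return col, row


def invert(forward, target, guess=(0.0, 0.0), tol=1e-9, max_iter=50, h=1e-6):
    """Find world (x, y) such that forward(x, y) is within `tol` of `target`."""
    x, y = guess
    for _ in range(max_iter):
        u, v = forward(x, y)
        du, dv = target[0] - u, target[1] - v
        if abs(du) < tol and abs(dv) < tol:
            return x, y

        # Finite-difference estimate of the Jacobian of `forward`.
        ux, vx = forward(x + h, y)
        uy, vy = forward(x, y + h)
        j11, j12 = (ux - u) / h, (uy - u) / h
        j21, j22 = (vx - v) / h, (vy - v) / h
        det = j11 * j22 - j12 * j21
        if det == 0:
            raise ValueError("singular Jacobian")

        # Newton step: solve J @ (dx, dy) = (du, dv).
        x += (j22 * du - j12 * dv) / det
        y += (-j21 * du + j11 * dv) / det
    raise RuntimeError("inverse transform did not converge")
```

The forward direction is a single function evaluation, while the reverse direction needs repeated forward evaluations until the pixel residual is small enough, which is why it is only approximate and more expensive.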
2 changes: 1 addition & 1 deletion docs/topics/masks.rst
@@ -18,7 +18,7 @@ Consider Rasterio's RGB.byte.tif test dataset. It has 718 rows and 791
columns of pixels. Each pixel has 3 8-bit (uint8) channels or bands. It has a
trapezoid of image data within a rectangular background of 0,0,0 value pixels.

-.. image:: https://www.dropbox.com/s/sg7qejccih5m4ah/RGB.byte.jpg?dl=1
+.. image:: ../img/RGB.byte.jpg

Metadata in the dataset declares that values of 0 will be interpreted as
invalid data or *nodata* pixels. In, e.g., merging the image with adjacent
2 changes: 1 addition & 1 deletion docs/topics/memory-files.rst
@@ -79,7 +79,7 @@ Reading MemoryFiles

Like ``BytesIO``, ``MemoryFile`` implements the Python file protocol and
provides ``read()``, ``seek()``, and ``tell()`` methods. Instances are thus suitable
-as arguments for methods like `requests.post() <http://docs.python-requests.org/en/master/api/#requests.post>`__.
+as arguments for methods like `requests.post() <https://requests.readthedocs.io/en/latest/api/#requests.post>`__.

.. code-block:: python