The goal of gdal-summarize.py
is to summarize raster data across
layers. There are two common use cases for this tool. The first is
calculating a cell-wise summary across the bands of a raster file
(e.g. a GeoTIFF). For example, given a multi-band input GeoTIFF file
input.tif
, to calculate the cell-wise mean of the first three bands
use:
gdal-summarize.py input.tif --bands 1 2 3 --outfile output.tif
Alternatively, to compute the cell-wise mean across multiple GeoTIFF
files (input1.tif
, input2.tif
, and input3.tif
) use:
gdal-summarize.py input1.tif input2.tif input3.tif --outfile output.tif
If these input files have multiple bands, the default behavior is to
summarize them across the first band of each file; however, the
--bands
argument can override this behavior:
# summarize across the second band of each file
gdal-summarize.py input1.tif input2.tif input3.tif --bands 2 --outfile output.tif
# summarize across band 1 of input1.tif and band 2 of input2.tif
gdal-summarize.py input1.tif input2.tif --bands 1 2 --outfile output.tif
For usage examples of gdal-summarize.py
and a comparison with R, look
at this
document
on the GitHub repo. The code for the excellent
gdal_calc.py
served as the
starting point for gdal-summarize.py
.
The default behavior is to perform a cell-wise mean; however, other
summary functions are available via the --function
argument:
mean
: cell-wise mean across layers.median
: cell-wise median across layers.max
: cell-wise max across layers.sum
: cell-wise sum across layers.meannz
: cell-wise mean across layers after removing zeros.count
: count the number layers with non-negative value for each cell.richness
: count the number of layers with positive values for each cell.
In all cases, these functions remove missing values (i.e. NoData or NA)
prior to calculating the summary statistic. For example, the mean of the
numners, 1, 2, and NoData is 1.5. This is similar to the way
na.rm = TRUE
works in R. I’d be happy to add additional summary
functions, just open an
issue or
submit a pull
request.
This repository contains two example datasets for testing
gdal-summarize.py
. The small example, consists of two GeoTIFF
files (data/small1.tif
and data/small2.tif
) each of which has two
bands with dimensions 3x3. The files are filled with count data
(i.e. non-negative integers) and contain both zeros and missing values.
These files are meant to be small enough that the correct results can be
manually calculated to help understand how the functions are working.
The data in the first file looks like this:
The large example, consists of two GeoTIFF files (data/large1.tif
and data/large2.tif
) each of which has 9 bands with dimensions
250x250. The files are filled with simulated species occupancy
(i.e. real numbers between 0 and 1) and contain both zeros and missing
values. These files were generated using the simulate_species()
function from the prioritizr
R package. The
data in the first file looks like this:
gdal-summarize.py
uses the GDAL Python 3 bindings and therefore
requires GDAL be installed on your system. On Mac
OS, this can be accomplished using homebrew with:
brew install gdal
Use gdal-summarize.py -h
to get help and usage guidelines for this
tool:
usage: gdal-summarize.py [-h] --outfile OUTFILE [--bands BANDS [BANDS ...]] [--function {mean,median,max,sum,meannz,count,richness}]
[--block_size BLOCK_SIZE BLOCK_SIZE] [--nrows NROWS] [--quiet] [--overwrite]
files [files ...]
Summarize a set of rasters layers.
positional arguments:
files input raster file(s)
optional arguments:
-h, --help show this help message and exit
--outfile OUTFILE, -o OUTFILE
output raster
--bands BANDS [BANDS ...], -b BANDS [BANDS ...]
bands to summarize. single file: bands in this file to
summarize (default all bands); multiple files: bands
in corresponding files to summarize (default = 1)
--function {mean,median,max,sum,meannz,count,richness}, -f {mean,median,max,sum,meannz,count,richness}
summarization function (default = 'mean')
--block_size BLOCK_SIZE BLOCK_SIZE, -s BLOCK_SIZE BLOCK_SIZE
x and y dimensions of blocks to process (default based
on input)
--nrows NROWS, -n NROWS
number of rows to process in a single block
(block_size ignored if provided)
--overwrite, -w overwrite existing file
--creation-option CREATION_OPTIONS, --co CREATION_OPTIONS
passes one or more creation options to the output
format driver multiple
--quiet, -q supress messages