Skip to content

Summarization of rasters across bands in Python with GDAL

License

Notifications You must be signed in to change notification settings

mstrimas/gdal-summarize

Repository files navigation

gdal-summarize

The goal of gdal-summarize.py is to summarize raster data across layers. There are two common use cases for this tool. The first is calculating a cell-wise summary across the bands of a raster file (e.g. a GeoTIFF). For example, given a multi-band input GeoTIFF file input.tif, to calculate the cell-wise mean of the first three bands use:

gdal-summarize.py input.tif --bands 1 2 3 --outfile output.tif

Alternatively, to compute the cell-wise mean across multiple GeoTIFF files (input1.tif, input2.tif, and input3.tif) use:

gdal-summarize.py input1.tif input2.tif input3.tif --outfile output.tif

If these input files have multiple bands, the default behavior is to summarize them across the first band of each file; however, the --bands argument can override this behavior:

# summarize across the second band of each file
gdal-summarize.py input1.tif input2.tif input3.tif --bands 2 --outfile output.tif
# summarize across band 1 of input1.tif and band 2 of input2.tif
gdal-summarize.py input1.tif input2.tif --bands 1 2 --outfile output.tif

For usage examples of gdal-summarize.py and a comparison with R, look at this document on the GitHub repo. The code for the excellent gdal_calc.py served as the starting point for gdal-summarize.py.

Summary Functions

The default behavior is to perform a cell-wise mean; however, other summary functions are available via the --function argument:

  • mean: cell-wise mean across layers.
  • median: cell-wise median across layers.
  • max: cell-wise max across layers.
  • sum: cell-wise sum across layers.
  • meannz: cell-wise mean across layers after removing zeros.
  • count: count the number layers with non-negative value for each cell.
  • richness: count the number of layers with positive values for each cell.

In all cases, these functions remove missing values (i.e. NoData or NA) prior to calculating the summary statistic. For example, the mean of the numners, 1, 2, and NoData is 1.5. This is similar to the way na.rm = TRUE works in R. I’d be happy to add additional summary functions, just open an issue or submit a pull request.

Example dataset

This repository contains two example datasets for testing gdal-summarize.py. The small example, consists of two GeoTIFF files (data/small1.tif and data/small2.tif) each of which has two bands with dimensions 3x3. The files are filled with count data (i.e. non-negative integers) and contain both zeros and missing values. These files are meant to be small enough that the correct results can be manually calculated to help understand how the functions are working. The data in the first file looks like this:

The large example, consists of two GeoTIFF files (data/large1.tif and data/large2.tif) each of which has 9 bands with dimensions 250x250. The files are filled with simulated species occupancy (i.e. real numbers between 0 and 1) and contain both zeros and missing values. These files were generated using the simulate_species() function from the prioritizr R package. The data in the first file looks like this:

Dependencies

gdal-summarize.py uses the GDAL Python 3 bindings and therefore requires GDAL be installed on your system. On Mac OS, this can be accomplished using homebrew with:

brew install gdal

Getting Help

Use gdal-summarize.py -h to get help and usage guidelines for this tool:

usage: gdal-summarize.py [-h] --outfile OUTFILE [--bands BANDS [BANDS ...]] [--function {mean,median,max,sum,meannz,count,richness}]
                         [--block_size BLOCK_SIZE BLOCK_SIZE] [--nrows NROWS] [--quiet] [--overwrite]
                         files [files ...]

Summarize a set of rasters layers.

positional arguments:
  files                 input raster file(s)

optional arguments:
  -h, --help            show this help message and exit
  --outfile OUTFILE, -o OUTFILE
                        output raster
  --bands BANDS [BANDS ...], -b BANDS [BANDS ...]
                        bands to summarize. single file: bands in this file to
                        summarize (default all bands); multiple files: bands
                        in corresponding files to summarize (default = 1)
  --function {mean,median,max,sum,meannz,count,richness}, -f {mean,median,max,sum,meannz,count,richness}
                        summarization function (default = 'mean')
  --block_size BLOCK_SIZE BLOCK_SIZE, -s BLOCK_SIZE BLOCK_SIZE
                        x and y dimensions of blocks to process (default based
                        on input)
  --nrows NROWS, -n NROWS
                        number of rows to process in a single block
                        (block_size ignored if provided)
  --overwrite, -w       overwrite existing file
  --creation-option CREATION_OPTIONS, --co CREATION_OPTIONS
                        passes one or more creation options to the output
                        format driver multiple
  --quiet, -q           supress messages

About

Summarization of rasters across bands in Python with GDAL

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published