# Aggregation (spatial downsampling)

### Categorical aggregation

The core aggregation code is written in Cython, in raster_utilities.aggregation.spatial.core.categorical.pyx. 

A helper class raster_utilities.aggregation.spatial.SpatialAggregator is provided to manage calling the Cython code.

This notebook demonstrates using the helper class to aggregate a series of categorical-type raster files, for example to convert the MODIS 30 arcsecond (~1km) grids into 2.5 arcminute (~5km) grids.

Categorical rasters can be aggregated to produce, for each of the input category values, one class-proportion (fraction) grid and one like-adjacency grid, plus a single majority grid. This assumes (in fact requires) that the input grids have a fairly small number of unique values and are in unsigned 8-bit integer format (the code was written for the MCD12Q1 landcover data).
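
To make the three statistics concrete, here is a pure-Python sketch that computes them for a single coarse cell from its block of fine cells. It is not the Cython implementation. The like-adjacency definition used here (like 4-neighbour pairs divided by all 4-neighbour pairs involving the category, counted inside the block) and the majority tie-break (smallest value wins) are assumptions and may differ from what the core code does.

```python
from collections import Counter

def block_stats(block, categories, nodata=None):
    """Fraction, like adjacency and majority for one block of fine cells.

    block is a list of equal-length rows; cells equal to nodata are ignored.
    """
    valid = [v for row in block for v in row if v != nodata]
    counts = Counter(valid)
    n = len(valid)
    fractions = {c: (counts[c] / n if n else None) for c in categories}

    # Count 4-neighbour pairs (right and down neighbours) inside the block.
    like = Counter()    # pairs where both cells have the same value
    unlike = Counter()  # pairs where the two cells differ, counted for each value
    rows, cols = len(block), len(block[0])
    for i in range(rows):
        for j in range(cols):
            a = block[i][j]
            if a == nodata:
                continue
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni >= rows or nj >= cols:
                    continue
                b = block[ni][nj]
                if b == nodata:
                    continue
                if a == b:
                    like[a] += 1
                else:
                    unlike[a] += 1
                    unlike[b] += 1

    adjacency = {}
    for c in categories:
        total = like[c] + unlike[c]
        adjacency[c] = like[c] / total if total else None

    # Majority: most frequent valid value; ties go to the smallest value.
    majority = min(counts, key=lambda c: (-counts[c], c)) if counts else None
    return fractions, adjacency, majority
```

With this definition a category scattered as isolated pixels scores close to 0, while a single compact patch scores close to 1.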

The code has been written to handle input rasters of theoretically unlimited size: they are read in tiles to build up the coarser (smaller) output grids. Memory use is therefore determined by the size of the output grids and by the number of categories, i.e. the number of output files that are created.
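
The tiling strategy can be sketched in pure Python for the fraction statistic alone: the input is consumed in strips of rows, and only output-sized count grids are held in memory. `read_strip` is a hypothetical callback standing in for a windowed raster read, and an integer aggregation factor is assumed; the real code works on tiles in Cython and also handles nodata, which this sketch does not.

```python
def aggregate_fractions_tiled(read_strip, in_rows, in_cols, factor,
                              categories, strip_rows):
    """Build per-category fraction grids from strips of input rows.

    read_strip(start, stop) must return input rows start..stop-1 as lists.
    Only the output-sized count grids are kept in memory.
    """
    out_rows, out_cols = in_rows // factor, in_cols // factor
    counts = {c: [[0] * out_cols for _ in range(out_rows)] for c in categories}
    # Trailing rows/columns that do not fill a whole block are ignored.
    used_rows, used_cols = out_rows * factor, out_cols * factor
    for start in range(0, used_rows, strip_rows):
        stop = min(start + strip_rows, used_rows)
        for r, row in enumerate(read_strip(start, stop), start):
            out_row = r // factor
            for col in range(used_cols):
                grid = counts.get(row[col])
                if grid is not None:  # value is one of the expected categories
                    grid[out_row][col // factor] += 1
    cells_per_block = factor * factor
    return {c: [[n / cells_per_block for n in row] for row in grid]
            for c, grid in counts.items()}
```

Memory here grows with the number of output cells times the number of categories, independent of the input size, which is the property described above.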

In [2]:
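# The helper class (imported from the SpatialAggregator module of the same name,
# so that it can be called directly as SpatialAggregator(...) below)
from raster_utilities.aggregation.spatial.SpatialAggregator import SpatialAggregator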

In [5]:
# Enumerations providing the acceptable values for the aggregation parameters,
# so you don't have to remember the strings
from raster_utilities.aggregation.aggregation_values import *

In [2]:
import glob

### Run a categorical aggregation across a series of files in a folder

### Code not currently functional

In [3]:
# The files to be aggregated should be provided as a list of filepaths. 
# (Just make a single-item list for one file)
inCatFiles = glob.glob(r'\\map-fs1.ndph.ox.ac.uk\map_data\mastergrids\MODIS_Global\MCD12Q1_Annual_Landcover\500m_Raw\A20*.tif')

# Also provide the output folder
outDir = r'C:\temp\testagg_categorical_output'

Specify the output nodata value. It doesn't have to match the input nodata value (NDV): the incoming NDV is read from each input file, so make sure it is set properly in those files!

In [4]:
ndvOut = -9999
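
The distinction between the two nodata values can be illustrated with a small sketch. It assumes, without confirmation from the library, that fine cells equal to the input NDV are excluded from a coarse cell's statistics and that a coarse cell with no valid input at all receives the output NDV. `fraction_or_nodata` is a hypothetical helper, not part of raster_utilities.

```python
def fraction_or_nodata(values, category, ndv_in, ndv_out):
    """Fraction of valid fine cells equal to `category`.

    Cells equal to the input nodata value are left out of the denominator;
    if no valid cells remain, the output nodata value is returned instead.
    """
    valid = [v for v in values if v != ndv_in]
    if not valid:
        return ndv_out
    return sum(1 for v in valid if v == category) / len(valid)
```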

Specify the aggregation statistics to create. This must be a list of items from the CategoricalAggregationStats enumeration, or their string representations.
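
Accepting either enumeration members or their string forms usually comes down to a small normalisation step. The sketch below uses a hypothetical stand-in enum, `CategoricalStat`, whose member names and values are invented for illustration (only FRACTIONS and ALL appear in this notebook); it is not the library's CategoricalAggregationStats, and it assumes the string form is either the member name or its value.

```python
from enum import Enum

class CategoricalStat(Enum):
    """Hypothetical stand-in for CategoricalAggregationStats."""
    FRACTIONS = "fractions"
    LIKE_ADJACENCIES = "like_adjacencies"
    MAJORITY = "majority"

ALL_STATS = list(CategoricalStat)

def normalise_stats(stats):
    """Convert a mix of enum members and strings into enum members."""
    result = []
    for s in stats:
        if isinstance(s, CategoricalStat):
            result.append(s)
            continue
        key = str(s).strip()
        try:
            # First try the member name, case-insensitively ...
            result.append(CategoricalStat[key.upper()])
        except KeyError:
            # ... then the member value; raises ValueError if neither matches.
            result.append(CategoricalStat(key.lower()))
    return result
```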

In [4]:
#e.g.
# stats = [CategoricalAggregationStats.FRACTIONS]

# or, to compute all of them, use this convenience value:
stats = CategoricalAggregationStats.ALL
aggArgs = {"categories":range(0,2),
          "resolution":"5km"}

Finally, configure the aggregation. The final parameter of the SpatialAggregator constructor is a dictionary that configures how the aggregation will run:

* This must have a key "categories" the value of which is a list of 8-bit integer values, giving the values that are expected to occur in the categorical raster input files. For fractional or like-adjacency outputs, one output file will be generated from each input file for each of these values.

* This must have a key that is a member of the AggregationTypes enumeration, i.e. AggregationTypes.RESOLUTION, AggregationTypes.FACTOR or AggregationTypes.SIZE. This key determines which of three ways is used to specify the resolution of the output files, and its value should be as follows:
    * AggregationTypes.RESOLUTION: a float value, or one of the strings "1km", "5km" or "10km"
    * AggregationTypes.FACTOR: an int value (e.g. 5 to go from 1km rasters to 5km rasters)
    * AggregationTypes.SIZE: a 2-tuple specifying the (height, width) of the output rasters

* A key "resolution_name" may be provided, giving the name of the output resolution to be used as the fifth token of the 6-token output filenames (e.g. "5km").

* A key "mem_limit_gb" may be provided to limit memory use (the default is 30 GB). Note that the limit is only approximately enforced, so set it conservatively!
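
The size rules above, and the statement that memory depends on output size and category count, can be checked with some quick arithmetic. This is a sketch under stated assumptions: the plain strings "resolution", "factor" and "size" stand in for the AggregationTypes members, the input extent is assumed to be in degrees, "10km" is assumed to mean 5 arcminutes, and the 4 bytes per cell and grid count in the memory estimate are guesses, not figures taken from the Cython code.

```python
# Hypothetical mapping of the named resolutions to degrees. "1km" and "5km"
# follow the 30 arcsecond / 2.5 arcminute correspondence given above;
# "10km" as 5 arcminutes is an assumption.
NAMED_RESOLUTIONS_DEG = {"1km": 30 / 3600, "5km": 2.5 / 60, "10km": 5 / 60}

def output_shape(in_shape, extent_deg, agg_type, value):
    """Return the (rows, cols) of the output grid.

    in_shape: (rows, cols) of the input grid.
    extent_deg: (height, width) of the input extent in degrees.
    agg_type: "resolution", "factor" or "size" (stand-ins for AggregationTypes).
    """
    rows, cols = in_shape
    if agg_type == "resolution":
        res = NAMED_RESOLUTIONS_DEG[value] if isinstance(value, str) else float(value)
        height, width = extent_deg
        return round(height / res), round(width / res)
    if agg_type == "factor":
        return rows // value, cols // value
    if agg_type == "size":
        out_rows, out_cols = value
        return out_rows, out_cols
    raise ValueError(f"unknown aggregation type: {agg_type!r}")

def estimate_memory_gb(out_shape, n_categories, bytes_per_cell=4):
    """Rough memory for one fraction and one like-adjacency grid per category
    plus one majority grid, each using bytes_per_cell (4 = float32, assumed)."""
    out_rows, out_cols = out_shape
    n_grids = 2 * n_categories + 1
    return out_rows * out_cols * bytes_per_cell * n_grids / 1024 ** 3
```

For a global 30 arcsecond input (21600 x 43200 cells), both a factor of 5 and the "5km" resolution give a 4320 x 8640 output grid under these assumptions.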




In [5]:
aggArgs = {"categories": [0, 255],
           AggregationTypes.RESOLUTION: "1km",
           "resolution_name": "1km"}

Now just instantiate and run the aggregation:

In [6]:
agg = SpatialAggregator(inCatFiles, outDir, ndvOut, stats, aggArgs)

In [None]:
agg.RunAggregation()