# Aggregation (spatial downsampling)

### Categorical aggregation

The core aggregation code is written in Cython, in raster_utilities.aggregation.spatial.core.categorical.pyx. 

A helper class raster_utilities.aggregation.spatial.SpatialAggregator is provided to manage calling the Cython code.

This notebook demonstrates using the helper class to aggregate a series of categorical-type raster files, for example to convert the MODIS 30 arcsecond (~1km) grids into 2.5 arcminute (~5km) grids.

Categorical rasters can be aggregated to produce one class-proportion grid and one like-adjacency grid for each of the input values, plus a single majority grid. This requires that the input grids have a fairly small number of unique values and are in unsigned 8-bit integer format (the code was written for the MCD12Q1 BRDF landcover data).
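
To make the class-proportion and majority outputs concrete, here is a minimal pure-Python sketch of the idea. It is not the Cython implementation: the function name `aggregate_categorical`, the integer `factor`, and the nodata value of 255 are assumptions made for illustration, and like-adjacency is left out.

```python
from collections import Counter


def aggregate_categorical(grid, factor, categories, nodata=255):
    """Aggregate a 2-D list of class codes by an integer factor (illustrative only).

    Returns (fractions, majority). fractions maps each category to a coarse
    grid of class proportions; majority is a coarse grid holding the most
    common class. Coarse cells with no valid input pixels are left as None.
    """
    out_rows = len(grid) // factor
    out_cols = len(grid[0]) // factor
    fractions = {cat: [[None] * out_cols for _ in range(out_rows)]
                 for cat in categories}
    majority = [[None] * out_cols for _ in range(out_rows)]

    for row in range(out_rows):
        for col in range(out_cols):
            # Collect the valid fine pixels that fall inside this coarse cell
            block = [
                grid[y][x]
                for y in range(row * factor, (row + 1) * factor)
                for x in range(col * factor, (col + 1) * factor)
                if grid[y][x] != nodata
            ]
            if not block:
                continue
            counts = Counter(block)
            for cat in categories:
                fractions[cat][row][col] = counts[cat] / len(block)
            # Ties go to whichever class was encountered first
            majority[row][col] = counts.most_common(1)[0][0]
    return fractions, majority


fine = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [0, 0, 255, 255],
    [0, 1, 255, 255],
]
fractions, majority = aggregate_categorical(fine, factor=2, categories=[0, 1, 2])
# majority is [[1, 2], [0, None]]; fractions[1][0][0] is 0.75
```

Note that one fraction grid is produced per category, which is why the number of categories drives the number of output files.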

The code has been written to read input rasters of theoretically unlimited size. Inputs are read in tiles to build up the coarser (smaller) output grids, so memory use is determined by the size of the output grids and by the number of categories, i.e. the number of output files that are created.
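
As a rough guide to how output size and category count combine, the following back-of-envelope sketch estimates the memory needed to hold all output grids at once. The data types are assumptions (4-byte floats for the per-category grids, 1 byte for the majority grid) and the function name is hypothetical; the real figures depend on the library's internals.

```python
def estimate_output_bytes(height, width, n_categories,
                          per_category_grids=2, float_bytes=4, majority_bytes=1):
    """Estimate bytes needed for all output grids (assumed dtypes, see above)."""
    # Each category gets a fraction grid and a like-adjacency grid,
    # plus one shared majority grid
    per_cell = n_categories * per_category_grids * float_bytes + majority_bytes
    return height * width * per_cell


# A global 2.5 arcminute grid has 24 cells per degree
rows, cols = 180 * 24, 360 * 24
total = estimate_output_bytes(rows, cols, n_categories=18)
print(f"{total / 1024 ** 3:.1f} GiB")
```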

In [1]:
# The helper class
from raster_utilities.aggregation.spatial.SpatialAggregator import SpatialAggregator

In [2]:
# Enumerations to provide acceptable values for the aggregation parameters,
# avoid having to remember strings
from raster_utilities.aggregation.aggregation_values import *
from raster_utilities.utils.logger import LogLevels

In [3]:
import glob

### Run a categorical aggregation across a series of files in a folder


In [4]:
# The files to be aggregated should be provided as a list of filepaths. 
# (Just make a single-item list for one file)
inCatFiles = glob.glob(r'C:/Temp/dataprep/IGBP/500m/IGBP_Landcover.*.Annual.Data.500m.Data.tif')

# Also provide the output folder
outDir = r'C:\Temp\dataprep\IGBP\5km_nopar'
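
Because `glob.glob` returns an empty list when nothing matches and does not guarantee any ordering, it can be worth wrapping the lookup in a small check before starting a long run. The helper below is a hypothetical convenience, not part of `raster_utilities`; the demo uses a temporary folder with made-up file names.

```python
import glob
import os
import tempfile


def find_inputs(pattern):
    """Return the sorted list of files matching pattern, or fail loudly."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match {pattern!r}")
    return paths


# Demonstration with a temporary folder and made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for year in (2002, 2001):
        open(os.path.join(tmp, f"Landcover.{year}.tif"), "w").close()
    found = [os.path.basename(p)
             for p in find_inputs(os.path.join(tmp, "Landcover.*.tif"))]
```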

Specify the output nodata value. It doesn't have to be the same as the input value: the incoming NDV is read from the files, so make sure it is set properly there.

In [5]:
ndvOut = -9999

Specify the aggregation statistics to create. This must be a list of items from the CategoricalAggregationStats enumeration, or their string representations.

In [9]:
#e.g.
# stats = [CategoricalAggregationStats.FRACTIONS]

# or to do all of them use this convenience value:
stats = CategoricalAggregationStats.ALL.value
#aggArgs = {"categories":range(0,3),
#          "resolution":100.0}

Finally configure the aggregation. The final parameter for the SpatialAggregator constructor should be a dictionary that configures how the aggregation will run. 

* This must have a key "`categories`". The value of this key is either:
    * A list of 8-bit integer values, giving the values that are expected to occur in the categorical raster input files. 
    * A dictionary with keys that are 8-bit integer values and values that are labels for these values, for the output filenames.
    * For fractional or like-adjacency outputs, one output file will be generated from each input file for each of these values.

* There should be a key '`aggregation_type`' that is a member of the AggregationTypes enumeration, i.e. `AggregationTypes.RESOLUTION`, `AggregationTypes.FACTOR`, or `AggregationTypes.SIZE`. 
* There should be a key '`aggregation_specifier`' that determines the output cell size in a manner dependent on the value of `aggregation_type` as follows:
    * `aggregation_type==AggregationTypes.RESOLUTION`: (Float value, or string "1km", "5km" or "10km")
    * `aggregation_type==AggregationTypes.FACTOR`: Int value (e.g. 5 to go from 1km rasters to 5km rasters)
    * `aggregation_type==AggregationTypes.SIZE`: 2-tuple of positive ints specifying the (height,width) of the output rasters
* A key "`resolution_name`" may be provided, which provides the "friendly name" for the output resolution to be used as the fifth token of the 6-token output filenames (e.g. "5km")
* A key "`mem_limit_gb`" may be provided, to limit the memory use (if not provided, 30GB will be the default). Note that it's not very accurate so be conservative!
* A key "`assume_correct_input`" may be provided; if this is `False` (the default if not provided) then the input data will be snapped and aligned to a mastergrid template first, before calculating the properties of the output raster.
* A key "`sanitise_resolution`" may be provided; if this is `True` (the default if not provided) then the output resolution (whether provided numerically or calculated) will be "sanitised" to a mastergrid resolution, i.e. a value that divides cleanly into 1.0. For example, 0.0083334 would become 0.008333333333333 (1/120).
* A key "`snap_alignment`" may be provided; if this is `SnapTypes.NEAREST` (the default if not provided) or `SnapTypes.TOWARDS_ORIGIN` then the origin point of the output will be positioned precisely at the top left corner of a cell in a global grid of the requested resolution. Because this potentially moves the extreme (bottom right) point of the output towards the origin such that it is inside the bottom right of the input data, an extra cell will be added if necessary to the output extent to accommodate the full input extent.
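
The resolution sanitising and origin snapping described above can be sketched as follows. This is one plausible reading, not the library's code: the function names are hypothetical, the grid is assumed to be in degrees with its global origin at (-180, 90), and the snapping shown corresponds to the towards-origin case.

```python
import math


def cells_per_degree(res):
    """Sanitise a cell size in degrees to 1/n and return the integer n."""
    return max(1, round(1.0 / res))


def snapped_window(x_min, y_min, x_max, y_max, n):
    """Return (col_off, row_off, n_cols, n_rows) on a global grid of n cells per degree.

    The top-left corner is snapped towards the global origin (-180, 90), and
    the far edges are rounded outwards so the whole input extent is covered.
    """
    col_off = math.floor((x_min + 180.0) * n)
    row_off = math.floor((90.0 - y_max) * n)
    col_end = math.ceil((x_max + 180.0) * n)
    row_end = math.ceil((90.0 - y_min) * n)
    return col_off, row_off, col_end - col_off, row_end - row_off


n = cells_per_degree(0.0083334)      # 120, i.e. a resolution of 1/120 degree
window = snapped_window(-10.003, 10.0, 0.0, 20.002, n)
# The input spans just over 1200 cells each way, so 1201 cells are needed
```

Working in whole cells per degree rather than a floating-point cell size keeps the offsets exact, which is the practical reason for preferring resolutions that divide cleanly into 1.0.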


Now just instantiate and run the aggregation:

In [1]:
# The values that the input raster has, with names: (here the MODIS IGBP landcover values)
ibgpCats = {
    0:'Unclassified',
    1:'Evergreen_Needleleaf_Forest',
    2:'Evergreen_Broadleaf_Forest',
    3:'Deciduous_Needleleaf_Forest',
    4:'Deciduous_Broadleaf_Forest',
    5:'Mixed_Forest',
    6:'Closed_Shrublands',
    7:'Open_Shrublands',
    8:'Woody_Savannas',
    9:'Savannas',
    10:'Grasslands',
    11:'Permanent_Wetlands',
    12:'Croplands',
    13:'Urban_And_Built_Up',
    14:'Cropland_Natural_Vegetation_Mosaic',
    15:'Snow_And_Ice',
    16:'Barren_Or_Sparsely_Vegetated',
    17:'Water'
}
# or just numbers (here the values from the GUF data)
sampleBinaryCats = [0, 255]

In [7]:
args = {"categories": ibgpCats, #sampleBinaryCats
        'aggregation_type' : AggregationTypes.RESOLUTION,
        'aggregation_specifier' : "5km",
        'resolution_name' : '5km',
        'sanitise_resolution' : True,
        'snap_alignment' : SnapTypes.NEAREST,
        'assume_correct_input' : True,
        'mem_limit_gb' : 20
        }


In [10]:

agg = SpatialAggregator(inCatFiles, outDir, ndvOut, stats, args, loggingLevel=LogLevels.DEBUG)

In [None]:
agg.RunAggregation()