# Temporal aggregation

This is a demonstration of how to run temporal aggregations using the Cython library code. The same approach works for any aggregation of multiple files into one; it does not have to be temporal.

The core code is written in Cython (`raster_utilities/aggregation/temporal/core/temporal.pyx`). A helper class, in `raster_utilities/aggregation/temporal/temporal_aggregation_runner.py`, is provided to assist with loading the data and passing it to the core function.

This notebook demonstrates how to use the helper code.

First, import the aggregation helper class:

In [1]:
from raster_utilities.aggregation.temporal.temporal_aggregation_runner import TemporalAggregator


## 1. Define the aggregations

The "temporal" aggregation is controlled by a dictionary. Each key is the name of a required output aggregation (a year, a calendar month, etc.) and each value is a list of the files belonging to that period. A given input file can appear in more than one output aggregation (e.g. you can have keys for real months and for synoptic months).

Here we show a couple of ways to build that dictionary.
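
As a minimal illustration of that structure, the dictionary might look like the sketch below. The key names and file names here are hypothetical, chosen only to show that one input file can be listed under several output keys.

```python
# Hypothetical keys and file names, for illustration only.
example_key = {
    "2001.01": ["A2001001_Mean.tif", "A2001009_Mean.tif"],
    "2001.Annual": ["A2001001_Mean.tif", "A2001009_Mean.tif", "A2001033_Mean.tif"],
    "Synoptic.01": ["A2001001_Mean.tif", "A2001009_Mean.tif", "A2002001_Mean.tif"],
}

# The same input file may contribute to more than one output.
shared = set(example_key["2001.01"]) & set(example_key["Synoptic.01"])
print(sorted(shared))
```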

In [2]:
import glob

In [4]:
import os
from collections import defaultdict

#### MODIS 8-daily images to dynamic monthly, dynamic annual, and synoptic monthly outputs:

In [3]:
inFilePattern = r'J:\MOD11A2_Gapfilled_Output\LST_Night\5km\8-Daily\*_Mean.tif'
#inFilePattern = r'C:\Temp\dataprep\brazil\1km_daily\evi\*_Data.tif'
inFiles = glob.glob(inFilePattern)

Build the dictionary by extracting the date from each filename. This will need changing to suit the filename pattern in use and the type of outputs required (annual, monthly, synoptic monthly).

We can use a `defaultdict` rather than a plain `dict`, which simplifies the loop a bit.

The dictionary keys are used to create the output filenames, so you might want to alter the strings slightly to make them more informative.
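
The parsing step can be sketched on its own. The file names below are hypothetical; the slicing assumes, as the function in the next cell does, that the name starts with one leading character followed by a four-digit year and a three-digit day of year.

```python
import os
from collections import defaultdict

# Hypothetical file name: one leading character, then YYYY, then day of year (DDD).
fn = "data/A2005033_LST_Night_Mean.tif"

date_str = os.path.basename(fn).split("_")[0]  # first underscore-separated field
year = date_str[1:5]                           # characters 2-5: the year
day_of_year = int(date_str[5:8])               # characters 6-8: the day of year

# With a defaultdict(list), a missing key starts out as an empty list,
# so there is no need to check whether the key exists before appending.
groups = defaultdict(list)
groups[year + ".Annual"].append(fn)
groups[year + ".Annual"].append("data/A2005041_LST_Night_Mean.tif")
print(dict(groups))
```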

In [5]:
def buildMODISKeyFromDailies(tag, fileList, doMonthly = True, doAnnual = True, doSynoptic = True, doOverall = False):
    # Start day-of-year of each 8-day composite -> month number it is assigned to
    daymonths = {1:1, 9:1, 17:1, 25:1, 33:2, 41:2, 49:2, 57:2, 65:3, 73:3, 81:3, 89:3, 97:4, 
             105:4, 113:4, 121:4, 129:5, 137:5, 145:5, 153:6, 161:6, 169:6, 177:6, 185:7, 
             193:7, 201:7, 209:7, 217:8, 225:8, 233:8, 241:8, 249:9, 257:9, 265:9, 273:9, 
             281:10, 289:10, 297:10, 305:10, 313:11, 321:11, 329:11, 337:12, 345:12, 353:12, 
             361:12}
    processingKey = defaultdict(list)
    # Optional prefix for every output key; always ends with "." when a tag is given
    if tag is None:
        prefix = ""
    elif tag.endswith("."):
        prefix = tag
    else:
        prefix = tag + "."
    for fn in fileList:
        # The first underscore-separated field holds the date:
        # one leading character, then YYYY, then day of year (DDD)
        parts = os.path.basename(fn).split('_')
        dateStr = parts[0]
        yr = dateStr[1:5]
        daynum = int(dateStr[5:8])
        monthStr = str(daymonths[daynum]).zfill(2)
        if doMonthly:
            # one output per calendar month of each year, e.g. "2005.02"
            processingKey[prefix + yr + "." + monthStr].append(fn)
        if doAnnual:
            # one output per year, e.g. "2005.Annual"
            processingKey[prefix + yr + ".Annual"].append(fn)
        if doSynoptic:
            # one output per month across all years, e.g. "Synoptic.02"
            processingKey[prefix + "Synoptic." + monthStr].append(fn)
            if doOverall:
                # a single output covering every input file
                processingKey[prefix + "Synoptic.Overall"].append(fn)
    return processingKey
            

In [6]:
# build a dictionary keyed by synoptic month only (monthly and annual outputs are switched off)
tag = "LST_Night"
fileKey = buildMODISKeyFromDailies(tag, inFiles, False, False, True)
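
Before starting a long run it can be worth checking how many files landed under each key. A minimal sketch, using a small hypothetical dictionary as a stand-in for `fileKey`:

```python
from collections import defaultdict

# Hypothetical stand-in for fileKey; in the notebook, use fileKey itself.
demo_key = defaultdict(list)
demo_key["LST_Night.Synoptic.01"] += ["A2001001_Mean.tif", "A2002001_Mean.tif"]
demo_key["LST_Night.Synoptic.02"] += ["A2001033_Mean.tif"]

# Number of input files per output key, in sorted key order.
counts = {key: len(files) for key, files in sorted(demo_key.items())}
for key, n in counts.items():
    print(key, n)
```

A key with far fewer files than its neighbours usually points to missing inputs or a filename that did not match the expected pattern.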

#### CHIRPS monthlies to dynamic annual outputs:

In [6]:
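def buildBasicKey(fileList):
    processingKey = defaultdict(list)
    # iterate over the argument, not the global inFiles
    for fn in fileList:
        # the year is taken from the second dot-separated field of the filename
        parts = os.path.basename(fn).split('.')
        yr = parts[1]
        outkey = "CHIRPS."+yr
        processingKey[outkey].append(fn)
    return processingKey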

## 2. Other setup

We also need to specify the output folder, the output nodata value, and whether to create a synoptic (overall) output as well. The synoptic output doubles memory use, so only enable it if you need it.

In [7]:
outDir = r"G:\modis\mod11a2_v5\Night_Synoptic_From_5KDaily"
outNDV = -9999
doSynoptic = True

Finally, we need to specify which statistics to calculate; what is appropriate depends on the data. For rainfall, for example, we just want a sum.
The statistics must be given as a list of values from the `TemporalAggregationStats` class. You can also use `TemporalAggregationStats.ALL`.

In [8]:
from raster_utilities.aggregation.aggregation_values import TemporalAggregationStats

In [9]:
# use the string value of the enums e.g.
stats = [ 'mean', 'SD', 'count']
# or enum objects e.g.
# stats = [TemporalAggregationStats.MEAN, TemporalAggregationStats.RANGE]
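
To make the meaning of `mean`, `SD` and `count` concrete, the sketch below computes them for a single pixel across a stack of values, skipping nodata. It is a conceptual illustration in plain Python, not the Cython implementation, and the function name `pixel_stats` is hypothetical. Whether the library reports a population or a sample SD is not stated here; the sketch assumes the population form.

```python
import math

NDV = -9999  # same nodata value as outNDV in this notebook

def pixel_stats(values, ndv=NDV):
    """Return (count, mean, sd) of the valid values for a single pixel."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for v in values:
        if v == ndv:
            continue  # nodata does not contribute to any statistic
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    if count == 0:
        return 0, ndv, ndv  # no valid data: mean and SD are nodata
    return count, mean, math.sqrt(m2 / count)  # population SD (assumption)

count, mean, sd = pixel_stats([10.0, NDV, 14.0, 12.0])
print(count, mean, sd)
```

The running update means each input value only needs to be visited once, which matters when the inputs are read file by file.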

## 3. Running 

Now we just need to instantiate the class and run the aggregation.
The runner should automatically split the processing into tiles if the files are too large to fit into memory. It currently estimates this on the assumption that it can use ~40GB RAM, so you might need to tweak it directly. Intermediate processing tiles are not automatically deleted at present.
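
As a rough illustration of the kind of estimate involved, the sketch below works out how many horizontal bands a raster would need to be split into to stay within a memory budget. This is not the runner's actual logic: the function name, the 4-byte value size and the number of arrays held in memory are all assumptions made for the example.

```python
import math

def estimate_row_tiles(width, height, n_arrays, bytes_per_value=4, budget_gb=40):
    """Number of full-width bands needed so that n_arrays arrays fit in the budget."""
    budget_bytes = budget_gb * 1024 ** 3
    # Memory needed to hold one raster row in every array at once.
    bytes_per_row = width * n_arrays * bytes_per_value
    # Largest whole number of rows that fits in the budget (at least one).
    rows_per_tile = max(1, budget_bytes // bytes_per_row)
    return math.ceil(height / rows_per_tile)

# Hypothetical example: a 43200 x 21600 grid with 20 arrays held in memory.
n_tiles = estimate_row_tiles(width=43200, height=21600, n_arrays=20)
print(n_tiles)
```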

In [10]:
agg = TemporalAggregator(fileKey, outDir, outNDV, stats, doSynoptic)

In [None]:
agg.RunAggregation()

Current status as of 27/2:
Night ones: data are correct except synoptic count is doubled
Day ones: all synoptic have been double counted so SDs need adjusting or repeating