# Temporal aggregation

This notebook demonstrates how to run temporal aggregations using the Cython library code. (The same code can aggregate any set of multiple files into one; the grouping does not have to be temporal.)

The core code is written in Cython (`raster_utilities/aggregation/temporal/core/temporal.pyx`), and a helper class `raster_utilities/aggregation/temporal/temporal_aggregation_runner.py` is provided to load the data and pass it to the core function.

This notebook demonstrates how to use the helper class, by building the input arguments that it needs and then calling its main run method.

Import the aggregation helper class:

In [71]:
from raster_utilities.aggregation.temporal.TemporalAggregator import TemporalAggregator


In [72]:
import os
from collections import defaultdict
import glob

## 1. Define the aggregations

The "temporal" aggregation is controlled by a dictionary in which the keys represent the required output aggregation periods (years, calendar months, etc., or just a single key for a quick summary of "everything") and the values are lists of the files belonging to each period.

A given input file can appear in more than one output aggregation (e.g. you can have keys for real months and for synoptic months). Each key is processed separately, except when synoptic output is requested: in that case every file that appears in any of the dictionary's values contributes to the synoptic output, but only once.
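The aggregator's own handling of this is not shown in this notebook, but the dictionary structure and the "each file only once" rule for synoptic output can be sketched as follows. The filenames and the helper `unique_input_files` are hypothetical, for illustration only.

```python
# Hypothetical fileKey: a dynamic-month key and an annual key that share two files
fileKey = {
    "EVI.2001.01": ["EVI.2001.001.tif", "EVI.2001.009.tif"],
    "EVI.2001.Annual": ["EVI.2001.001.tif", "EVI.2001.009.tif", "EVI.2001.033.tif"],
}


def unique_input_files(file_key):
    """Return every file listed under any key, each exactly once, in first-seen order."""
    seen = {}
    for files in file_key.values():
        for fn in files:
            seen.setdefault(fn, None)  # dict keys keep insertion order (Python 3.7+)
    return list(seen)


print(unique_input_files(fileKey))
```

The two shared files are listed under both keys, so each dynamic output uses them, but a synoptic pass over the whole dictionary should count them once.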

Here we show a few ways to build that dictionary.

#### Example 1: MODIS 8-daily images to dynamic monthly, dynamic annual, and synoptic monthly outputs:

This cell builds the dictionary by extracting the date from the MODIS 8-daily filenames, creating keys for dynamic monthly, dynamic annual, and synoptic monthly outputs so that all three are produced in one pass. Note: we will ask the aggregator to compute the overall synoptic output on the fly (the `doSynoptic` setting in section 2), rather than adding a key containing all files to the dictionary (`doOverall=True`). There is no need to do both!

The function can be adapted to suit other filename patterns and other combinations of outputs (annual, monthly, synoptic monthly).

Using a `defaultdict` rather than a plain `dict` simplifies the loop a bit, because a missing key is created as an empty list on first use.
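The difference can be seen side by side in a minimal sketch; the filenames are hypothetical and the year is taken from the second dot-separated token, as in the functions below.

```python
from collections import defaultdict

files = ["a.2001.001.tif", "a.2001.009.tif", "a.2002.001.tif"]

# Plain dict: each new key must be initialised before appending
plain = {}
for fn in files:
    key = fn.split(".")[1]
    if key not in plain:
        plain[key] = []
    plain[key].append(fn)

# defaultdict(list): a missing key is created as an empty list on first access
auto = defaultdict(list)
for fn in files:
    auto[fn.split(".")[1]].append(fn)

print(dict(auto) == plain)  # True
```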

The string keys of the dictionary will be used to create the output filenames; we pass in a string "tag" which will be used in conjunction with the date to generate these keys. You might want to alter the strings slightly to make them more informative. 

This version of the function parses filenames in the newer 6-token format, in which the year is the second dot-separated token and the three-digit day of year is the third.

In [74]:
def buildMODISKeyFromMGDailies(tag, fileList, doMonthly=True, doAnnual=True, doSynoptic=True, doOverall=False):
    # mapping of day of year (the start day of each 8-day composite) to calendar month
    daymonths = {1:1, 9:1, 17:1, 25:1, 33:2, 41:2, 49:2, 57:2, 65:3, 73:3, 81:3, 89:3, 97:4, 
             105:4, 113:4, 121:5, 129:5, 137:5, 145:5, 153:6, 161:6, 169:6, 177:6, 185:7, 
             193:7, 201:7, 209:7, 217:8, 225:8, 233:8, 241:8, 249:9, 257:9, 265:9, 273:9, 
             281:10, 289:10, 297:10, 305:11, 313:11, 321:11, 329:11, 337:12, 345:12, 353:12, 
             361:12}
             
    processingKey = defaultdict(list)
    if tag is not None:
        if not tag.endswith("."):
            tag = tag + "."
    for fn in fileList:
        parts = os.path.basename(fn).split('.')
        yr = parts[1]
        daynum = parts[2]
        assert len(daynum) == 3
        monthStr = str(daymonths[int(daynum)]).zfill(2)
        if doMonthly:
            yrMonth = yr + "." + monthStr
            if tag is not None:
                outkey = tag + yrMonth
            else:
                outkey = yrMonth
            processingKey[outkey].append(fn)
        if doAnnual:
            if tag is not None:
                outkey = tag + str(yr) + ".Annual"
            else:
                outkey = str(yr) + ".Annual"
            processingKey[outkey].append(fn)
        if doSynoptic:
            if tag is not None:
                outKey = tag + "Synoptic." + monthStr
                outKeyOverall = tag + "Synoptic.Overall"
            else:
                outKey = "Synoptic." + monthStr
                outKeyOverall = "Synoptic.Overall"
            processingKey[outKey].append(fn)
            if doOverall:
                processingKey[outKeyOverall].append(fn)
    return processingKey
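The hard-coded `daymonths` table assigns each 8-day composite to the calendar month of its start day in a non-leap year. As a sketch, the same table can be derived with `datetime`; `build_day_to_month` and its `ref_year` parameter are hypothetical, and a non-leap reference year (2001) is assumed so that the result matches the table above.

```python
from datetime import date, timedelta


def build_day_to_month(ref_year=2001, step=8):
    """Map the start day of each 8-day composite (1, 9, ..., 361) to its calendar month.

    ref_year is an assumption: a non-leap year reproduces the hard-coded table.
    """
    jan1 = date(ref_year, 1, 1)
    return {d: (jan1 + timedelta(days=d - 1)).month for d in range(1, 366, step)}


daymonths = build_day_to_month()
print(daymonths[121], daymonths[305])  # 5 11
```

Passing a leap year such as 2000 instead puts days 121 and 305 into April and October, so the choice of reference year matters at month boundaries.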

This version parses filenames in the original format for the MODIS 8-daily grids (e.g. `A2000049_xxx.tif`, where `2000` is the year and `049` is the day of year).
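The function below extracts the year and day by slicing fixed character positions, which silently gives wrong results if a filename does not follow the pattern. A stricter alternative is to match the name with a regular expression; the pattern and the helper `parse_original_modis_name` are assumptions based only on the example name above.

```python
import os
import re

# Assumed layout: one leading letter, 4-digit year, 3-digit day of year, then "_"
ORIGINAL_NAME = re.compile(r"^[A-Za-z](\d{4})(\d{3})_")


def parse_original_modis_name(path):
    """Return (year string, day-of-year int) from a name like 'A2000049_xxx.tif'."""
    match = ORIGINAL_NAME.match(os.path.basename(path))
    if match is None:
        raise ValueError("Unexpected filename format: " + path)
    return match.group(1), int(match.group(2))


print(parse_original_modis_name("/data/EVI/A2000049_example.tif"))  # ('2000', 49)
```

`os.path.basename` follows the conventions of the platform it runs on, so Windows-style paths such as those used in this notebook are only split correctly on Windows.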


In [21]:
def buildMODISKeyFromDailies(tag, fileList, doMonthly=True, doAnnual=True, doSynoptic=True, doOverall=False):
    # mapping of day of year (the start day of each 8-day composite) to calendar month
    daymonths = {1:1, 9:1, 17:1, 25:1, 33:2, 41:2, 49:2, 57:2, 65:3, 73:3, 81:3, 89:3, 97:4, 
             105:4, 113:4, 121:5, 129:5, 137:5, 145:5, 153:6, 161:6, 169:6, 177:6, 185:7, 
             193:7, 201:7, 209:7, 217:8, 225:8, 233:8, 241:8, 249:9, 257:9, 265:9, 273:9, 
             281:10, 289:10, 297:10, 305:11, 313:11, 321:11, 329:11, 337:12, 345:12, 353:12, 
             361:12}
             
    processingKey = defaultdict(list)
    if tag is not None:
        if not tag.endswith("."):
            tag = tag + "."
    for fn in fileList:
        parts = os.path.basename(fn).split('_')
        dateStr = parts[0]
        yr = dateStr[1:5]
        daynum = int(dateStr[5:8])
        monthStr = str(daymonths[daynum]).zfill(2)
        if doMonthly:
            yrMonth = yr + "." + monthStr
            if tag is not None:
                outkey = tag + yrMonth
            else:
                outkey = yrMonth
            processingKey[outkey].append(fn)
        if doAnnual:
            if tag is not None:
                outkey = tag + str(yr) + ".Annual"
            else:
                outkey = str(yr) + ".Annual"
            processingKey[outkey].append(fn)
        if doSynoptic:
            if tag is not None:
                outKey = tag + "Synoptic." + monthStr
                outKeyOverall = tag + "Synoptic.Overall"
            else:
                outKey = "Synoptic." + monthStr
                outKeyOverall = "Synoptic.Overall"
            processingKey[outKey].append(fn)
            if doOverall:
                processingKey[outKeyOverall].append(fn)
    return processingKey

# build a dictionary of synoptic monthly keys from the original-format 8-daily files
inFilePattern1 = r'F:\EVI\*.tif'
inFiles = glob.glob(inFilePattern1)
tag = "EVI_Unfilled_V6"
fileKey = buildMODISKeyFromDailies(tag, inFiles, doMonthly=False, doAnnual=False, doSynoptic=True, doOverall=False)


In [75]:
# build a dictionary with dynamic monthly, dynamic annual and synoptic monthly keys
inFilePattern1 = r'C:\Temp\dataprep\E8\EVI_Out\EVI_v6.*.Data.tif'
inFiles = glob.glob(inFilePattern1)
tag = "EVI_Filled_v6"
fileKey = buildMODISKeyFromMGDailies(tag, inFiles, doMonthly=True, doAnnual=True, doSynoptic=True, doOverall=False)
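Before launching a long run, it can help to check which keys were created and how many files each one holds; a synoptic month with far fewer files than expected may point to a filename-parsing problem. A minimal sketch, where `summarise_file_key` and the example dictionary are hypothetical (in the notebook you would pass the `fileKey` built above):

```python
def summarise_file_key(file_key):
    """Return (key, number of files) pairs sorted by key."""
    return sorted((key, len(files)) for key, files in file_key.items())


example = {
    "EVI_Filled_v6.2001.Annual": ["a.tif", "b.tif", "c.tif"],
    "EVI_Filled_v6.2001.01": ["a.tif"],
}
for key, n in summarise_file_key(example):
    print(key, n)
```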


#### Example 2: balanced means
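This cell builds a dictionary with a single key, to create a "balanced" mean from pre-existing synoptic monthly mean files (created using the cells above and subsequently renamed to the 6-token syntax). A balanced mean is the mean of the synoptic monthly means, so each calendar month carries equal weight however many images it contained. We don't pass in a tag; instead we extract the existing one from the filenames.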
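A small worked example shows why the balanced mean can differ from a mean taken over all images at once. The pixel values are hypothetical, and only two months are used to keep the arithmetic short.

```python
# Hypothetical per-image values for one pixel in two calendar months
monthly_values = {
    "01": [9.0, 10.0, 11.0, 10.0],  # four images, mean 10.0
    "02": [3.0, 5.0],               # two images, mean 4.0
}

# Balanced: average the monthly means, so each month has equal weight
monthly_means = {m: sum(v) / len(v) for m, v in monthly_values.items()}
balanced_mean = sum(monthly_means.values()) / len(monthly_means)

# Pooled: average all images directly, so months with more images dominate
all_values = [x for v in monthly_values.values() for x in v]
pooled_mean = sum(all_values) / len(all_values)

print(balanced_mean, pooled_mean)  # 7.0 8.0
```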

In [35]:
def buildSynopticBalancedMeanKey(fileList):
    # expected basename tokens: <tag>.Synoptic.<MM>.<stat>...
    tag = None
    stat = None
    files = []
    for fn in fileList:
        parts = os.path.basename(fn).split(".")
        thistag, synoptictag, monthtag, stattag = parts[0], parts[1], parts[2], parts[3]
        if tag is None:
            tag = thistag
        elif thistag != tag:
            raise ValueError("Mixed tags: {} and {}".format(tag, thistag))
        if synoptictag != "Synoptic":
            raise ValueError("Not a synoptic file: " + fn)
        try:
            int(monthtag)
        except ValueError:
            continue  # skip the ".Overall" file
        if stat is None:
            stat = stattag
        elif stattag != stat:
            raise ValueError("Mixed stats: {} and {}".format(stat, stattag))
        files.append(fn)
    if not files:
        raise ValueError("No synoptic monthly files found")
    outname = tag + ".Synoptic.Overall.Balanced-" + stat
    return {outname: files}

inFilePattern = r'C:\Temp\dataprep\EVI\EVI_Unfilled_Synoptic\EVI*.Synoptic.*.mean.*.tif'
inFiles = glob.glob(inFilePattern)
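# keep only files whose month token is two characters long (i.e. drop the "Overall" file)
inFiles = [f for f in inFiles if len(os.path.basename(f).split('.')[2]) == 2]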
fileKey = buildSynopticBalancedMeanKey(inFiles)

#### Example 3: CHIRPS monthlies to dynamic annual outputs:

In [None]:
def buildBasicKey(fileList):
    # expects the year as the second dot-separated token of each filename
    processingKey = defaultdict(list)
    for fn in fileList:
        parts = os.path.basename(fn).split('.')
        yr = parts[1]
        outkey = "CHIRPS." + yr
        processingKey[outkey].append(fn)
    return processingKey
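`buildBasicKey` hard-codes both the `CHIRPS` prefix and the position of the year token. A hypothetical generalisation, `build_year_key`, takes both as parameters; the example filenames are made up and assume the year is the second dot-separated token.

```python
import os
from collections import defaultdict


def build_year_key(fileList, prefix, year_token=1):
    """Group files by the year found in dot-separated token `year_token` of each basename."""
    processingKey = defaultdict(list)
    for fn in fileList:
        yr = os.path.basename(fn).split(".")[year_token]
        processingKey[prefix + "." + yr].append(fn)
    return processingKey


key = build_year_key(["CHIRPS.1981.01.tif", "CHIRPS.1981.02.tif", "CHIRPS.1982.01.tif"], "CHIRPS")
print(dict(key))
```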

#### Example 4: a one-off output
For a one-off job, you can make a single output by defining the files against a one-key dictionary.

In [None]:
files = glob.glob(r'J:\Temp_Suitability\5k\Pf\monthly_pf\*.2002.*.tif')
fileKey = {"test-2002": files}
fileKey
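`glob.glob` returns an empty list when nothing matches, so a mistyped pattern would silently produce a key with no files, and it returns matches in arbitrary order. A small hypothetical wrapper, `one_off_key`, sorts the matches and fails early instead:

```python
import glob


def one_off_key(name, pattern):
    """Build a single-key fileKey, failing loudly if the pattern matches nothing."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError("No files match " + pattern)
    return {name: files}
```

For example, `one_off_key("test-2002", r'J:\Temp_Suitability\5k\Pf\monthly_pf\*.2002.*.tif')` would reproduce the cell above with a check added.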

## 2. Other setup

We also need to specify the output folder, the output nodata value, and whether to create an overall synoptic output as well. This doubles memory use, so only enable it if you need it.

In [77]:
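# outDir = r"G:\modis\mcd43b4_v5\TCW_Synoptic_From_5KDaily"
# outDir = r'J:\Temp_Suitability\5k'
outDir = r"C:\Temp\dataprep\E8\EVI_Out_Summaries"  # folder for the output rasters
outNDV = -9999     # nodata value written to the outputs
doSynoptic = True  # also create the overall synoptic output (doubles memory use)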
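This notebook does not say whether `TemporalAggregator` creates `outDir` itself. If it does not, a hypothetical helper such as `ensure_output_dir` can create the folder (and any missing parents) beforehand; calling it on a folder that already exists is harmless.

```python
import os


def ensure_output_dir(path):
    """Create the output folder and any missing parents; do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path
```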

Finally, we need to specify which statistics to calculate; what's appropriate depends on the data (for rainfall, for example, we just want a sum).
The values must be given as a list of `TemporalAggregationStats` members or their string values. You can also use `TemporalAggregationStats.ALL`.

In [78]:
from raster_utilities.aggregation.aggregation_values import TemporalAggregationStats

In [80]:
# specify the stats using the string values of the enum members, e.g.
stats = ['mean', 'count', 'SD', 'max', 'min']

# or using the enum members themselves, e.g.
# stats = [TemporalAggregationStats.MEAN, TemporalAggregationStats.RANGE]
# stats = [TemporalAggregationStats.MEAN, TemporalAggregationStats.MAX, TemporalAggregationStats.MIN, TemporalAggregationStats.SD]
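Accepting either strings or enum members is easy to support with a standard Python `Enum`, because `EnumClass("value")` looks a member up by its value. The sketch below uses a hypothetical stand-in, `ExampleStats`, rather than the real `TemporalAggregationStats`, whose exact members and values are assumptions here.

```python
from enum import Enum


class ExampleStats(Enum):
    # Hypothetical stand-in for TemporalAggregationStats; names and values are assumptions
    MEAN = "mean"
    COUNT = "count"
    SD = "SD"
    MAX = "max"
    MIN = "min"
    SUM = "sum"


def normalise_stats(stats):
    """Accept enum members or their string values; return a list of enum members."""
    return [s if isinstance(s, ExampleStats) else ExampleStats(s) for s in stats]


print(normalise_stats(["mean", ExampleStats.MAX, "SD"]))
```

An unknown string such as `"median"` raises `ValueError`, which catches typos before any processing starts.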

## 3. Running 

Now we just need to instantiate the class and run the aggregation.
The runner should automatically split the processing into tiles if the files are too large to fit into memory. However, it currently estimates this by assuming it can use about 40 GB of RAM, so you might need to adjust that value in the code directly. Intermediate processing tiles are not deleted automatically at present.
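The runner's actual formula is not shown in this notebook. As a rough, hypothetical illustration of why large rasters need tiling: if each requested statistic needs one 8-byte (float64) array the size of the raster, and a synoptic output doubles that, the number of tiles needed under a 40 GiB budget can be estimated with a helper such as `estimate_tiles` (both the per-statistic cost and the helper are assumptions).

```python
import math


def estimate_tiles(width, height, n_arrays, bytes_per_value=8, budget_gib=40):
    """Rough number of tiles needed so that n_arrays full-size arrays fit in budget_gib."""
    total_bytes = width * height * n_arrays * bytes_per_value
    budget_bytes = budget_gib * 1024 ** 3
    return max(1, math.ceil(total_bytes / budget_bytes))


# A 43200 x 21600 grid with 5 stats, without and with a synoptic output (doubled arrays)
print(estimate_tiles(43200, 21600, 5), estimate_tiles(43200, 21600, 10))  # 1 2
```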

In [82]:
agg = TemporalAggregator(fileKey, outDir, outNDV, stats, doSynoptic)

In [None]:
agg.RunAggregation()