## Introduction

This notebook calculates "synoptic" (i.e. across-time) summary statistic rasters (mean, standard deviation and count) from a set of time-series rasters.

It outputs a grid for each calendar month and an overall grid, for each statistic.

It was primarily written to calculate summaries from MODIS 8-daily files where the timestamp of each file is embedded in the filename as a year and julian day. The code can be modified easily enough to work with timestamps in other formats.

The heavy lifting is done by a Cython module, which must be available and compiled before running this notebook.

The advantages of using this code over just doing it in ArcMap are that it should be quicker, calculations are multithreaded, and a numerically-robust method is used for calculating the statistics.
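
The notebook does not say which numerically robust method the Cython module uses. Purely as an illustration, one widely used approach is Welford's online algorithm, sketched here for the time series of a single pixel. The class name `RunningStats` and its methods are hypothetical and are not the API of `MonthlyStatCalculator`; the choice of a population (rather than sample) standard deviation is also an assumption.

```python
import math

class RunningStats(object):
    """Hypothetical running count / mean / sd for one pixel (Welford's algorithm)."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def add(self, value, ndv):
        if value == ndv:
            return      # skip no-data values
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def sd(self):
        # population standard deviation; None when there is no valid data
        if self.count == 0:
            return None
        return math.sqrt(self.m2 / self.count)

# one pixel observed four times, with one no-data value
pixel = RunningStats()
for value in [1.0, -9999.0, 2.0, 3.0]:
    pixel.add(value, -9999.0)
```

Because the mean and the sum of squared deviations are updated incrementally, this kind of method never forms the large sums of squares that make the naive "sum and sum-of-squares" formula lose precision.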

In [98]:
import glob
import numpy as np
from osgeo import gdal
import os
from collections import defaultdict

In [99]:
from General_Raster_Funcs.RasterTiling import getTiles
from General_Raster_Funcs.TiffManagement import *

The calculation code is implemented in an external class written in Cython. It processes data in parallel and tracks statistics for a given month (or other subset) as well as overall values. You will need to compile and install it first by running the following command in the MODIS_Raster_Funcs folder:

`python setup.py build_ext --inplace`

This assumes that you have Cython installed and also a suitable C compiler, which can be a slight mission on Windows.

In [100]:
from MODIS_Raster_Funcs.SynopticData import MonthlyStatCalculator

## Set up file locations - edit then run each cell

e.g.

In [8]:
inBaseDir = r'F:\MCD43B4_Gapfilled_Output\EVI\Output_Final_30k_2030pc_FixedMean'
genericFilePattern = r"{0}\*_{1}.tif"
tag = "Filled_Data"
what = "EVI"

or

In [101]:
#inBaseDir = r'E:\Temp\tsmodel\5km\runs\-180W-180E-60N--60S-1024px-martens2\mosaic'
inBaseDir = r'E:\Temp\tsmodel\1km\ts_1k_global_martens2\global_mosaic'
genericFilePattern = r"{0}\*_{1}.tif"
#tag = "5km_global_martens2"
tag = "1km_global_martens2"
what = "TS_New_1km_Martens2"

In [102]:
tileDir = r"C:\temp\test\tiles_1k"
outDir = r"C:\temp\test\merged_1k"

In [103]:
# Specify the height of each tile - depends on available memory.
# The algorithm works in slices that are this height and full-width,
# so it needs around idealSlice * fullWidth * 80 bytes of RAM.
# Thus with global 1k images (43200px wide), a slice of 7168 (high) needs 
# around 25GB RAM. Choose based on your PC's RAM.

# The rasters have tilesize 256 (or a multiple thereof) so pick a size
# that is a multiple of this where possible for most efficient access
# (not a big deal, though)
idealSlice = 7200

# alter to suit the images
fullWidth = 43200
fullHeight = 14400

# alter output no data value to whatever you want: not required to be the same as the inputs
outNdv = -9999
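
The memory rule of thumb in the comments above can be checked with a few lines of arithmetic. This is only a sketch: the function name `estimate_slice_ram_gb` is hypothetical, and the figure of 80 bytes per pixel is taken directly from the comment rather than measured.

```python
def estimate_slice_ram_gb(slice_height, full_width, bytes_per_pixel=80):
    """Rough RAM estimate in GB (10**9 bytes) for one full-width slice."""
    return slice_height * full_width * bytes_per_pixel / 1e9

# the example from the comment, and the value actually assigned to idealSlice
ram_7168 = estimate_slice_ram_gb(7168, 43200)
ram_7200 = estimate_slice_ram_gb(7200, 43200)
print(round(ram_7168, 1))
print(round(ram_7200, 1))
```

Both slice heights come out at a little under 25 GB for a 43200px-wide image, so halving `idealSlice` roughly halves the memory needed at the cost of more slices.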


### Run the following cells unaltered to configure the remaining inputs

The code will iterate over a dictionary called monthDays where the key is a month number and the value is a list of the julian day numbers (in the case of MODIS data or other data numbered by Julian day) or other day-identifiers (specifically monthnum-daynum) for which there is data in that month. 

#### For MODIS specifically:

For example {1:[1,9,17,25],...,12:[337,345,353,361]} for MODIS data with filenames like A2009165.tif


For aggregating the MODIS 8-daily files, the filenames are coded with year and julian day. We need to map the julian day to the month:

In [25]:
# build a dictionary mapping day of year to month of year, only required for the 
# day numbers that the 8-daily MODIS data occurs on
# generate this in excel with =CONCATENATE(DAYNUM,":",MONTH(DAYNUM),", ")
daymonths = {1:1, 9:1, 17:1, 25:1, 33:2, 41:2, 49:2, 57:2, 65:3, 73:3, 81:3, 89:3, 97:4, 
             105:4, 113:4, 121:4, 129:5, 137:5, 145:5, 153:6, 161:6, 169:6, 177:6, 185:7, 
             193:7, 201:7, 209:7, 217:8, 225:8, 233:8, 241:8, 249:9, 257:9, 265:9, 273:9, 
             281:10, 289:10, 297:10, 305:10, 313:11, 321:11, 329:11, 337:12, 345:12, 353:12, 
             361:12}
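
As an alternative to generating the mapping in Excel, the same dictionary can be derived with the standard `datetime` module. This is a sketch only; the helper name `day_of_year_to_month` is hypothetical. The hard-coded values above match a leap-year calendar (day 121 falls in April and day 305 in October), whereas in a non-leap year those two days are 1 May and 1 November, so the sketch makes the reference year explicit.

```python
from datetime import date, timedelta

def day_of_year_to_month(day_numbers, reference_year):
    """Map each day-of-year number to its calendar month in reference_year."""
    start = date(reference_year, 1, 1)
    return {d: (start + timedelta(days=d - 1)).month for d in day_numbers}

modis_days = range(1, 366, 8)          # 1, 9, 17, ..., 361
leap = day_of_year_to_month(modis_days, 2000)       # a leap year
non_leap = day_of_year_to_month(modis_days, 2001)   # a non-leap year

# the 8-daily day numbers whose month depends on the choice of year
differing = sorted(d for d in modis_days if leap[d] != non_leap[d])
print(differing)
```

Only two of the 46 day numbers change month between the two calendars, so the choice matters little for the monthly summaries, but it is worth knowing which convention is in use.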

    

Build a dictionary to map a julian day-of-year to all the filenames available (across years) for that julian day.

For the MODIS data we do this based on month and then day, i.e. the summary code iterates over monthDays (built in the cell below from daymonths) and then reads the day files for each day in each month:

In [9]:
# build a list of MODIS files available for each day-of-year, based on the 
# year / julian day that's encoded in the filenames such as "A2015009_LST_Day.tif"
years = defaultdict(int)
days = defaultdict(int)
dayfiles = defaultdict(list)

# swap to build list of days for each month
monthDays = defaultdict(list)
for d,m in daymonths.items():
    monthDays[m].append(d)
    
allFilesDict = defaultdict(list)
#for fn in glob.glob(inFilePattern):
for fn in glob.glob(genericFilePattern.format(inBaseDir, tag)):
    datestr = os.path.basename(fn).split('_')[0][1:]
    yr = int(datestr[:4])
    years[yr] +=1
    day = int(datestr[4:])
    days[day] +=1
    month = daymonths[day]
    dayfiles[day].append(fn)
    allFilesDict[str(yr)].append(fn)  
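
To make the filename parsing in the cell above concrete, here is the same slicing applied to the example name from its comment. The function name `parse_modis_name` is hypothetical; the logic simply repeats the `split` and slice steps used in the loop.

```python
import os

def parse_modis_name(path):
    """Return (year, day_of_year) from a name like 'A2015009_LST_Day.tif'."""
    # first underscore-delimited token, minus the leading 'A'
    datestr = os.path.basename(path).split('_')[0][1:]
    return int(datestr[:4]), int(datestr[4:])

year, day = parse_modis_name("A2015009_LST_Day.tif")
print(year, day)
```

A filename that does not follow this pattern will raise a `ValueError` at the `int` conversion, which is a useful early warning that the glob pattern has matched unexpected files.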

#### For other data: build similar data structures

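For example {1:['1-1','1-15'],...12:['12-1','12-15']} for data with two files per month. Note that the cell below reads the date from the first underscore-delimited part of the filename, so it expects names that begin with YYYYMMDD (a name like TSI_20091215.tif would need the parsing adjusted), and as written it only registers day 1 of each month.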


In [104]:
# build a list of files available for each month-day, based on the date that
# starts each filename; the parsing below assumes it is in YYYYMMDD form
years = defaultdict(int)
days = defaultdict(int)
dayfiles = defaultdict(list)

# build the list of day-identifiers ('month-day') for each month; only day 1 of 
# each month is registered here, so widen the inner range for denser data
monthDays = defaultdict(list)
for m in range(1,13):
    for d in range(1,2):
        monthDays[m].append(str(m)+'-'+str(d))
    
allFilesDict = defaultdict(list)
for fn in glob.glob(genericFilePattern.format(inBaseDir, tag)):
    datestr = os.path.basename(fn).split('_')[0]
    yr = int(datestr[:4])
    years[yr] +=1
    #day = int(datestr[4:])
    month = int(datestr[4:6])
    day = int(datestr[6:])
    monthday = str(month)+"-"+str(day)
    days[monthday] +=1
    #month = daymonths[day]
    dayfiles[monthday].append(fn)
    allFilesDict[str(yr)].append(fn)

### Configure how the processing will run

In [106]:
globalGT = None
globalProj = None
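# statistics to mosaic in the merge step further down: the second assignment 
# overrides the first, so comment it out to merge Count and SD as well as Mean
stats = ['Count', 'Mean', 'SD']
stats = ['Mean']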
# work out the tiles we'll work in. We'll work with full-width slices for 
# now. 
slices = sorted(list(set([s[1] for s in getTiles(fullWidth, fullHeight, idealSlice)])))
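
The `getTiles` function is not shown in this notebook; the cell above keeps element `[1]` of each tile it returns, and the later cells unpack each slice as `(top, bottom)`. As a rough illustration of what such a list of full-width slices might look like, here is a hypothetical stand-in, `full_width_slices`. The real function may divide the height differently, for example into equal-sized slices.

```python
def full_width_slices(full_height, ideal_slice):
    """Hypothetical helper: (top, bottom) pixel rows of consecutive slices."""
    tops = range(0, full_height, ideal_slice)
    return [(top, min(top + ideal_slice, full_height)) for top in tops]

# the values configured above divide the image into two equal slices
example_slices = full_width_slices(14400, 7200)
# a height that does not divide evenly leaves a short final slice
uneven_slices = full_width_slices(14400, 7168)
print(example_slices)
print(uneven_slices)
```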

Write a function called fnGetter that returns the output file name for a given descriptor (what), timespan (when), summary type (stat) and location (where: global, africa, top coordinate of slice, etc.)

In [95]:
fnGetter = lambda what, when, stat, where:(
    "_".join([str(what), str(when), str(stat), str(where)]) + ".tif")

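For illustration, the same naming rule written as a named function, with the file names it produces for a monthly tile and for the wildcard used later when mosaicking. The argument values are examples only.

```python
def fnGetter(what, when, stat, where):
    # same naming rule as the lambda above, written as a named function
    return "_".join([str(what), str(when), str(stat), str(where)]) + ".tif"

# a January mean tile whose top row is 7200
tile_name = fnGetter("EVI", "M" + str(1).zfill(2), "Mean", 7200)
# passing "*" as the location gives a wildcard matching every slice
wildcard = fnGetter("EVI", "Overall", "SD", "*")
print(tile_name)
print(wildcard)
```

Keeping the slice's top coordinate as the last part of the name is what lets the merge step collect all slices of one output with a single wildcard.
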
#### Run the next two cells to define the functions that actually read the source data and call the summarisation code

In [89]:
def synopticSliceRunner(top, bottom, width, outputNDV):
    # summarise the rows top..bottom of every input file, saving per-month and
    # overall Count / Mean / SD tiles for this slice into tileDir
    assert (isinstance(bottom,int) and isinstance(top,int)
        and bottom > top)
    
    if not monthDays or not dayfiles or not fnGetter:
        print("Notebook globals monthDays, dayfiles, and fnGetter must be defined first")
        return False
    sliceHeight = bottom - top
    statsCalculator = MonthlyStatCalculator(sliceHeight, width, outputNDV)
    sliceGT = None
    sliceProj = None
    # (key in the calculator results, label used in the output filename)
    statLabels = [('count', 'Count'), ('mean', 'Mean'), ('sd', 'SD')]
    print(str((top,bottom)))
    for month, days in monthDays.items():
        # for each calendar day of this synoptic month 
        print("\tMonth "+str(month))
        for day in days:
            # for each file on this calendar day (i.e. one per year)
            print("\t\tDay "+str(day))
            for dayfile in dayfiles[day]:
                # read this slice of the file
                data, myGT, myProj, thisNdv = ReadAOI_PixelLims(dayfile, None, (top, bottom))
                if sliceGT is None:
                    sliceGT = myGT
                    sliceProj = myProj
                else:
                    # all inputs must share the same georeferencing
                    assert sliceGT == myGT
                    assert sliceProj == myProj
                # add the data to the running calculator
                statsCalculator.addFile(data,  thisNdv)
        # get and save the results for this synoptic month, using the outputNDV
        # argument rather than the notebook-level outNdv
        monthResults = statsCalculator.emitMonth()
        for key, label in statLabels:
            SaveLZWTiff(monthResults[key], outputNDV, sliceGT, sliceProj, tileDir,
                       fnGetter(what, "M" + str(month).zfill(2), label, top))
    
    # get and save the overall synoptic result
    overallResults = statsCalculator.emitTotal()
    for key, label in statLabels:
        SaveLZWTiff(overallResults[key], outputNDV, sliceGT, sliceProj, tileDir,
            fnGetter(what, "Overall", label, top))
    statsCalculator = None
    
    return True
        

In [46]:
def temporalSliceRunner(top, bottom, width, outputNDV, filesDict):
    # summarise the rows top..bottom of the files in filesDict, saving a mean tile
    # for each key (e.g. each year) and an overall mean tile into tileDir
    assert (isinstance(bottom,int) and isinstance(top,int)
        and bottom > top)
    sliceHeight = bottom - top
    statsCalculator = MonthlyStatCalculator(sliceHeight, width, outputNDV)
    sliceGT = None
    sliceProj = None
    print(str((top,bottom)))
    for timeKey, timeFiles in filesDict.items():
        print(timeKey)
        for timeFile in timeFiles:
            data, myGT, myProj, thisNdv = ReadAOI_PixelLims(timeFile, None, (top, bottom))
            if sliceGT is None:
                sliceGT = myGT
                sliceProj = myProj
            else:
                # all inputs must share the same georeferencing
                assert sliceGT == myGT
                assert sliceProj == myProj
            statsCalculator.addFile(data, thisNdv)
        # get and save the mean for this period (emitMonth is reused for any subset)
        periodResults = statsCalculator.emitMonth()
        SaveLZWTiff(periodResults['mean'], outputNDV, sliceGT, sliceProj, tileDir,
                   fnGetter(what, str(timeKey), "Mean", top))
    # save the overall mean (not the mean of the last period)
    overallResults = statsCalculator.emitTotal()
    SaveLZWTiff(overallResults['mean'], outputNDV, sliceGT, sliceProj, tileDir,
               fnGetter(what, "Overall", "Mean", top))
    return True

## Run this cell to calculate the results and save to tiled tiffs

In [None]:
for t,b in slices:
    synopticSliceRunner(t, b, fullWidth, outNdv)
    #temporalSliceRunner(t,b,fullWidth,outNdv,allFilesDict)
        

In [None]:
[str(f)+":"+str(len(allFilesDict[f])) for f in sorted(allFilesDict.keys())]

## Run this cell to merge the tiles to global outputs and build pyramids

In [None]:
import subprocess
vrtBuilder = "gdalbuildvrt {0} {1}"
transBuilder = "gdal_translate -of GTiff -co COMPRESS=LZW "+\
    "-co PREDICTOR=2 -co TILED=YES -co SPARSE_OK=TRUE -co BIGTIFF=YES "+\
    "--config GDAL_CACHEMAX 8000 {0} {1}"
ovBuilder = "gdaladdo -ro --config COMPRESS_OVERVIEW LZW --config USE_RRD NO " +\
        "--config TILED YES {0} 2 4 8 16 32 64 128 256 --config GDAL_CACHEMAX 8000"
statBuilder = "gdalinfo -stats {0} >nul"    

vrts = []
tifs = []
if not os.path.isdir(outDir):
    os.makedirs(outDir)
# For each statistic and each month (+ overall), build a vrt file to mosaic all the slices for 
# that image together
for stat in stats:
    for month in sorted(monthDays.keys()):
        # get the filenames of all the slices for this month
        tiffWildCard = fnGetter(what, 'M'+str(month).zfill(2), stat, "*")
        sliceTiffs = os.path.join(tileDir, tiffWildCard)
        vrtName = what + "_Month_" + str(month).zfill(2) + "_" + stat + ".vrt"
        vrtFile = os.path.join(outDir, vrtName)
        vrtCommand = vrtBuilder.format(vrtFile, 
                                      sliceTiffs)
        print(vrtCommand)
        vrts.append(vrtFile)
        subprocess.call(vrtCommand)
    # and the same for the overall (all months) slices of this statistic
    tiffWildCard = fnGetter(what, "Overall", stat, "*")
    sliceTiffs = os.path.join(tileDir, tiffWildCard)
    vrtName = what+"_Overall_" + stat + ".vrt"
    vrtFile = os.path.join(outDir, vrtName)
    vrtCommand = vrtBuilder.format(vrtFile, 
                                      sliceTiffs)
    print(vrtCommand)
    vrts.append(vrtFile)
    subprocess.call(vrtCommand)
# Translate each of the vrts into a tiff
for vrt in vrts:
    # swap only the file extension, leaving any 'vrt' elsewhere in the path alone
    tif = os.path.splitext(vrt)[0] + '.tif'
    transCommand = transBuilder.format(vrt, tif)
    print(transCommand)
    tifs.append(tif)
    subprocess.call(transCommand)
# Build overviews and statistics on all of the output tiffs
for tif in tifs:
    ovCommand = ovBuilder.format(tif)
    statCommand = statBuilder.format(tif)
    print(ovCommand)
    subprocess.call(ovCommand)
    print(statCommand)
    # shell=True so that the ">nul" redirection is handled by the (Windows) shell
    subprocess.call(statCommand, shell=True)



## Run the same code to generate "balanced means"

The balanced mean is the mean of the monthly means, as compared to the overall mean we calculated above which is the mean of all the individual (8-)daily values.

Some areas in the world are less likely to have data (due to clouds) at certain times of year. The overall synoptic mean is therefore skewed towards the values experienced during periods when there are fewer clouds. (We might have 50 values recorded in a July and only 5 values recorded in a December: the overall mean will be dominated by July values). 

By taking the mean of the monthly means, we effectively increase the weight given to days from "rarer" periods, because each month is treated equally (so the 5 \* December readings are collectively contributing as much to the balanced mean as the 50 \* July readings)
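
A small worked example of the difference, using the July and December counts from the text. The reading values (30.0 and 10.0) are made up purely for illustration, and only two months are used to keep the arithmetic easy to follow.

```python
july = [30.0] * 50       # 50 valid readings in July (hypothetical values)
december = [10.0] * 5    # only 5 valid readings in December

# overall mean: every individual reading counts equally
all_values = july + december
overall_mean = sum(all_values) / len(all_values)

# balanced mean: every month counts equally
monthly_means = [sum(m) / len(m) for m in (july, december)]
balanced_mean = sum(monthly_means) / len(monthly_means)

print(round(overall_mean, 2))
print(balanced_mean)
```

The overall mean sits close to the July value because July supplies ten times as many readings, while the balanced mean falls midway between the two monthly means.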

We use the same Cython library as before, just in a slightly different way.

### Generate the "balanced" mean as the mean of the 12 monthly bands

In [None]:
# assuming we've run the code above and the output files from before are in the list called tifs

# pick out the merged monthly mean tiffs, which are named like <what>_Month_01_Mean.tif
fileListMonths = [t for t in tifs 
                  if '_Month_' in os.path.basename(t) and t.endswith('_Mean.tif')]
assert len(fileListMonths) == 12
for (top, bottom) in slices:
    sliceHeight = bottom - top
    # use the notebook-level width and no-data value configured earlier
    statsCalculator = MonthlyStatCalculator(sliceHeight, fullWidth, outNdv)
    sliceGT = None
    sliceProj = None
    print(str((top, bottom)))
    for monthfile in fileListMonths:
        data, myGT, myProj, thisNdv = ReadAOI_PixelLims(monthfile, None, (top,bottom))
        if sliceGT is None:
            sliceGT = myGT
            sliceProj = myProj
        else:
            assert sliceGT == myGT
            assert sliceProj == myProj
        # calculate the mean of the months: only the overall result is wanted, so 
        # emitMonth is never called. addFile is called with (data, ndv) as in the 
        # slice runner functions above
        statsCalculator.addFile(data, thisNdv)
    balancedRes = statsCalculator.emitTotal()
    SaveLZWTiff(balancedRes['count'], outNdv, sliceGT, sliceProj, outDir,
               fnGetter(what, "Count_Of_Months", "", top))
    SaveLZWTiff(balancedRes['mean'], outNdv, sliceGT, sliceProj, outDir,
               fnGetter(what, "Mean_Of_Months", "", top))
    SaveLZWTiff(balancedRes['sd'], outNdv, sliceGT, sliceProj, outDir,
               fnGetter(what, "SD_Of_Months", "", top))
    statsCalculator = None


