# Extracting mastergrid-aligned covariates from Earth Engine

This means covariates at exactly 30 arcsecond ("1km") or 2.5 arcminute ("5km") resolution, or a clean multiple or fraction of these, and either with global extent or with pixels aligned to those of a global grid (one with its origin at -180, 90).
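As a rough illustration of what "aligned" means here, the sketch below checks whether an extent's top-left corner sits a whole number of pixels from the global origin, and computes the grid shape for an extent. The helper names `is_aligned` and `grid_shape` are made up for this example and are not part of the notebook; exact fractions are used to avoid floating-point surprises.

```python
from fractions import Fraction

# Global mastergrid origin (top-left corner), as stated above.
ORIGIN_X, ORIGIN_Y = -180, 90

def is_aligned(xmin, ymax, res):
    """True if (xmin, ymax) lies a whole number of pixels from the global origin."""
    dx = (Fraction(xmin) - ORIGIN_X) / res
    dy = (ORIGIN_Y - Fraction(ymax)) / res
    return dx.denominator == 1 and dy.denominator == 1

def grid_shape(xmin, ymin, xmax, ymax, res):
    """Return (columns, rows) of a grid covering the extent at resolution res."""
    cols = (Fraction(xmax) - Fraction(xmin)) / res
    rows = (Fraction(ymax) - Fraction(ymin)) / res
    return (int(cols), int(rows))

res_1km = Fraction(1, 120)  # 30 arcseconds, in degrees
res_5km = Fraction(1, 24)   # 2.5 arcminutes, in degrees

print(grid_shape(-180, -90, 180, 90, res_1km))       # shape of a global "1km" grid
print(is_aligned(-18, Fraction("37.5"), res_5km))    # an Africa-style extent at "5km"
```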

## First install and authenticate the Earth Engine Python API

This needs to be done each time when running in Colab, as it's a new runtime each time. Don't run these cells if running in a local notebook server.

Ensure that you authenticate using a Google account which has access to Earth Engine (this is most likely your personal account rather than your MAP G-Suite account).

In [4]:
!pip install earthengine-api

Collecting earthengine-api
  Downloading https://files.pythonhosted.org/packages/df/5a/11b0ccdee986474f2b8ec6d2730a6ce883eeb37f4e5fc16311964b0e916a/earthengine-api-0.1.180.tar.gz (135kB)
Building wheels for collected packages: earthengine-api
  Building wheel for earthengine-api (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/57/01/bd/c8c309e42c1d463e475c78fc3393b749e5a643bbb39f367a0b
Successfully built earthengine-api
Installing collected packages: earthengine-api
Successfully installed earthengine-api-0.1.180


In [5]:
!earthengine authenticate

Running command using Cloud API.  Set --no-use_cloud_api to go back to using the API
Opening the following address in a web browser:

    https://accounts.google.com/o/oauth2/auth?client_id=517222506229-vsmmajv00ul0bs7p89v5m89qs8eb9359.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fearthengine+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdevstorage.full_control&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code

Please authorize access to your Earth Engine account, and paste the generated code below. If the web browser does not start, please manually browse the URL above.

Please enter authorization code: 4/YAFYly9Ks8fDm0SneAgN3Bq3HY6G28oahQC4mTbAR0lzt3SuaMzXmHU

Successfully saved authorization token.


## Import the EE API and check the install 

Check that everything works correctly by referencing an image from Earth Engine

In [1]:
# Import the Earth Engine Python Package
import ee

# Initialize the Earth Engine object, using the authentication credentials.
ee.Initialize()

# Print the information for an image asset.
image = ee.Image('srtm90_v4')
print(image.getInfo())

{'type': 'Image', 'bands': [{'id': 'elevation', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': -32768, 'max': 32767}, 'dimensions': [432000, 144000], 'crs': 'EPSG:4326', 'crs_transform': [0.000833333333333, 0.0, -180.0, 0.0, -0.000833333333333, 60.0]}], 'version': 1494271934303000, 'id': 'srtm90_v4', 'properties': {'system:time_start': 950227200000, 'system:time_end': 951177600000, 'system:asset_size': 18827626666}}


## Define some globals for our setup in MAP


In [342]:
# Resolutions for our lat/lon "mastergrids", mapping what they get informally 
# referred to as to their actual resolution in degrees. The keys are used in 
# the mastergrid filename syntax
mgResolutions = {
    "30m"  : 1.0/3600.0,
    "100m" : 1.0/1200.0,
    "500m" : 1.0/240.0,
    "1km"  : 1.0/120.0,
    "5km"  : 1.0/24.0,
    "10km" : 1.0/12.0,
    "Unchanged" : None
}

# The main extents that we use / will want to export at. Include any dataset-specific 
# ones here for datasets that aren't global, in particular those covering less than +/- 90deg latitude
mgRegions = {
    "nearlyGlobal": ee.Geometry.Rectangle(**{
        'coords'   : [-180, -60, 180, 85],
        'geodesic' : False,
        'proj'     : 'EPSG:4326'
    }),
    "global": ee.Geometry.Rectangle(**{
        'coords'   : [-180, -90, 180, 90],
        'geodesic' : False,
        'proj'     : 'EPSG:4326'
    }),
    "viirs-extent": ee.Geometry.Rectangle(**{
        'coords'   : [-180, -65, 180, 75],
        'geodesic' : False,
        'proj'     : 'EPSG:4326'
    }),
    "viirs-extent-7560": ee.Geometry.Rectangle(**{
        'coords'   : [-180, -60, 180, 75],
        'geodesic' : False,
        'proj'     : 'EPSG:4326'
    }),
    "mapAfrica" : ee.Geometry.Rectangle(**{
        'coords'   : [-18, -35, 52, 37.5],
        'geodesic' : False,
        'proj'     : 'EPSG:4326'
    })
}

# The different timesteps at which we store temporal aggregations of dynamic 
# datasets. These strings will just be used to build filenames
mgTimesteps = ["Monthly", "Annual", "Synoptic_Overall", "Synoptic"]

# The different continuous aggregation types that are available; matches the keys 
# returned by the spatialSummaries_Continuous function and will be used directly 
# in filename generation. 'Data' means no aggregation
continuousAggregationStats = ['min', 'max', 'mean', 'SD', 'Data']
categoricalAggregationStats = ['majority', 'fractions', 'Data'] # 'like'

# The cloud storage bucket to which we should export. The user account used to authenticate 
# to Earth Engine must have at least write-access to this bucket, so set up your own one 
# if necessary.
EXPORT_BUCKET = 'map-ee-outputs'


# The maximum dimensions for an exported file before breaking it into tiles. 
# They need to be a multiple of 256 (the default for shardSize). The commented-out pair below 
# is based on a global "500m" (15 arcsecond) grid, which is 86400*43200, rounded up to such a 
# multiple with a bit of leeway. The active pair is the equivalent for a global "1km" 
# (30 arcsecond) grid, which is 43200*21600.
# Keen to avoid tiling when possible to reduce faff (e.g. would have to regenerate pyramids locally) 
# but beyond a certain size it becomes unreliable. Global 30 arcsecond grids seem to mostly 
# work but global 15 arcsecond ones only occasionally do.

#MAX_DIMS = [89856,44800]    

MAX_DIMS = [43264,21760]
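
The arithmetic behind these numbers can be checked with a short sketch. It assumes only what the comments above state (dimensions must be multiples of the default shardSize of 256); `round_up_to_shard` is a hypothetical helper for this example.

```python
import math

SHARD_SIZE = 256  # default shardSize, per the comment above

def round_up_to_shard(n_pixels, shard=SHARD_SIZE):
    """Smallest multiple of shard that is >= n_pixels."""
    return math.ceil(n_pixels / shard) * shard

# Global grid sizes (columns, rows) at 30 and 15 arcseconds
global_30as = (360 * 120, 180 * 120)   # 43200 x 21600
global_15as = (360 * 240, 180 * 240)   # 86400 x 43200

min_dims_30as = [round_up_to_shard(n) for n in global_30as]
min_dims_15as = [round_up_to_shard(n) for n in global_15as]

print(min_dims_30as)  # smallest valid fileDimensions for a global 30 arcsecond grid
print(min_dims_15as)  # smallest valid fileDimensions for a global 15 arcsecond grid
```

The 30 arcsecond result equals the active MAX_DIMS, while the commented-out pair is somewhat larger than the 15 arcsecond minimum, which is the "leeway" referred to above.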

## Define EE processing functions

### Spatial aggregation of continuous-type data

For such data we might want to perform spatial aggregation based on min / max / mean / SD.

Note the hardcoded nCells: we want to force Earth Engine to use all data pixels in producing an 
output pixel. This value tells it the max number of inputs per output to look for; 20 * 20 plus a 
little leeway allows for up to a 20-fold reduction in resolution (e.g. "500m" to "10km"). It would be 
unlikely that we'd want to aggregate say 30m data to 1km or coarser.

In [36]:
def spatialSummaries_Continuous(img):
    # Because of lazy evaluation we specify the means by which an image _will_ be 
    # reduced when actually required at a lower resolution, but we don't specify 
    # any particular resolution at this point. So the function simply returns them 
    # all, and they will only be made flesh if and when exported.
    minRed = ee.Reducer.min()
    maxRed = ee.Reducer.max()
    meanRed = ee.Reducer.mean()
    sdRed = ee.Reducer.sampleStdDev()
    
    nCells = 20 * 20 * 1.05
    return {
        "min"  : img.reduceResolution(minRed, False, nCells),
        "max"  : img.reduceResolution(maxRed, False, nCells),
        "mean" : img.reduceResolution(meanRed, False, nCells),
        "SD"   : img.reduceResolution(sdRed, False, nCells),
        "Data" : img
    }
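
To see which aggregations the nCells cap accommodates, here is an idealised sketch that counts input pixels per output pixel for pairs of mastergrid resolutions. It assumes the source and target grids nest exactly, which real data in other projections will not; `inputs_per_output` and `fits_in_cap` are illustrative names only.

```python
from fractions import Fraction

N_CELLS = 420  # 20 * 20 * 1.05, the cap passed to reduceResolution above

# Mastergrid resolutions in degrees, mirroring mgResolutions
resolutions = {
    "30m":  Fraction(1, 3600),
    "100m": Fraction(1, 1200),
    "500m": Fraction(1, 240),
    "1km":  Fraction(1, 120),
    "5km":  Fraction(1, 24),
    "10km": Fraction(1, 12),
}

def inputs_per_output(src, dst):
    """Number of src pixels falling in one dst pixel, for exactly nested grids."""
    ratio = resolutions[dst] / resolutions[src]
    return ratio * ratio

def fits_in_cap(src, dst):
    return inputs_per_output(src, dst) <= N_CELLS

for src, dst in [("500m", "5km"), ("500m", "10km"), ("30m", "1km")]:
    print(src, "->", dst, int(inputs_per_output(src, dst)), fits_in_cap(src, dst))
```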
    
    

### Spatial aggregation of categorical-type data

For such data in MAP we produce:
* A single grid with the modal input value at each output pixel
* A grid for each discrete input value, giving at each output pixel the percentage of input pixels that had that value
* A grid for each discrete input value, giving at each output pixel the "like-adjacency" of the input pixels that had that value. This is a measure of how many of the 8 neighbours of each pixel have the same value as the pixel itself. We calculate this for each source pixel, and then take the mean value across the output pixels. We have not yet implemented this in Earth Engine and will probably instead do this using the entropy() function if needed, but for now I don't think these grids ever really get used.

Note that as well as nCells, defined as before, we also hardcode a maximum number of categories that the input data can have (and thus the number of output percentage grids there will be)

In [205]:
def spatialSummaries_Categorical(img, categoryMapping = None):#categoryValues=None, categoryNames=None):
    '''Produces a majority (mode) value aggregation as well as one fractional-cover aggregation 
    for each specified pixel value.
    img = the ee.Image to aggregate; should be categorical data for results to be meaningful
    categoryMapping = optional, a list of 2-tuples which are (value, categoryName).
    If categoryMapping is provided then output will contain a "fractions" item, the value of 
    which is another dictionary, mapping 
        categoryName:(ee.Image giving the fractional cover of that value).
    Output will always contain a "majority" item, and a "Data" item which is the unmodified image 
    (and will give nearest-neighbour aggregation on export).'''
    
    nCells = 20 * 20 * 1.05
    modeRed = img.reduceResolution(ee.Reducer.mode(maxRaw=100), False, nCells)
    outObj = {
        "majority" : modeRed,
        "Data": img
    }
    if categoryMapping is not None:
        if len(categoryMapping) < 2 or len(categoryMapping)>100:
            raise ValueError("If categories are provided for proportional outputs, there must be 2<=n<=100 categories")
        # assuming we have a list of [(k,v),(k2,v2)] then this will give [(k,k2),(v,v2)] and immediately unpack it 
        categoryValues, categoryNames = list(zip(*categoryMapping))
        binaryImg = ee.Image(categoryValues)
        filteredClassProps = (img.eq(binaryImg).rename(categoryNames)
                           .reduceResolution(ee.Reducer.mean(), False, nCells))
        fractions = {}
        for categoryName in categoryNames:
            fractions[categoryName] = filteredClassProps.select(categoryName)
        outObj["fractions"] = fractions
    return outObj
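
The fractional-cover trick used above (compare the image against each category value to get a 0/1 mask, then take the mean) can be illustrated in plain Python on a single block of input pixels. This is only a sketch of the idea, not of Earth Engine's behaviour; the category names are invented, and tie-breaking for the mode may differ from `ee.Reducer.mode`.

```python
from collections import Counter

def summarise_block(values, category_mapping):
    """Return (majority value, {categoryName: fraction}) for one output pixel's inputs."""
    majority = Counter(values).most_common(1)[0][0]
    fractions = {}
    for value, name in category_mapping:
        # mean of a 0/1 "equals this value" mask is the fractional cover
        mask = [1 if v == value else 0 for v in values]
        fractions[name] = sum(mask) / len(mask)
    return majority, fractions

# Hypothetical block of 8 input pixels and a hypothetical category mapping
block = [1, 1, 2, 3, 1, 2, 1, 1]
category_mapping = [(1, "forest"), (2, "water"), (3, "urban")]

majority, fractions = summarise_block(block, category_mapping)
print(majority, fractions)
```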

To export cleanly-aligned grids we will use the crs and crsTransform parameters to the EE export function. The crsTransform carries the same information as the "geotransform" reported/used by GDAL (though the elements are in a different order); it needs the location of the origin (top left corner) of the grid. Getting the origin programmatically as client-side numbers from an EE geometry is a bit of a faff; we could just code them manually, but here we go.

In [5]:
# seems to be a ridiculous amount of effort just to get the flippin origin points numerically but whatev
# The problem is that we don't know how many levels of nesting there will be in coords
# depending on whether it's multipart or not so we need to recursively flatten the list
def getOriginXY(eeGeom):
    from collections.abc import Iterable  # the collections.Iterable alias was removed in Python 3.10
    def flatten(items):
        for x in items:
            if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
                for sub_x in flatten(x):
                    yield sub_x
            else:
                yield x
    rawCoords = eeGeom.getInfo()['coordinates']
    allCoords = list(flatten(rawCoords))
    xCoords = allCoords[::2]
    yCoords = allCoords[1::2]
    minX = min(xCoords)
    maxY = max(yCoords)
    return (minX, maxY)
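
The same logic can be tried without Earth Engine on a GeoJSON-style coordinate list, together with the transform that the export functions build from the origin. This is a sketch under the assumption that the transform elements are ordered as in the `affine` lists used in this notebook (x scale, x shear, x translation, y shear, y scale, y translation); `origin_from_coords`, `crs_transform` and `pixel_to_lonlat` are hypothetical helpers.

```python
def flatten(items):
    """Recursively flatten nested lists/tuples of numbers."""
    for x in items:
        if isinstance(x, (list, tuple)):
            yield from flatten(x)
        else:
            yield x

def origin_from_coords(raw_coords):
    """Top-left corner (min x, max y) of a nested list of [x, y] pairs."""
    flat = list(flatten(raw_coords))
    return (min(flat[::2]), max(flat[1::2]))

def crs_transform(res, origin):
    """Transform for a north-up grid with square pixels of size res."""
    return [res, 0.0, origin[0], 0.0, -res, origin[1]]

def pixel_to_lonlat(transform, col, row):
    """Coordinates of the top-left corner of pixel (col, row)."""
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)

# Polygon ring like the one a rectangle geometry reports for [-180, -60, 180, 75]
ring = [[[-180, -60], [180, -60], [180, 75], [-180, 75], [-180, -60]]]
origin = origin_from_coords(ring)
transform = crs_transform(1.0 / 24.0, origin)
print(origin, pixel_to_lonlat(transform, 24, 24))
```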

Define a function to create the necessary parameters to create one export Task for each image in a given collection, giving the exported images a name appropriate for the MAP mastergrid filename syntax.

In [246]:
def getFilenameTimeTags(img, mgTimeStep):
    # build the year/month parts of the filename
    if mgTimeStep.startswith("Synoptic"):
        imgYearTag = "Synoptic"
    else:
        imgYearTag = ee.Date(img.get('system:time_start')).get('year').getInfo()
    if mgTimeStep == "Annual":
        imgMonthTag = "Annual"
    elif mgTimeStep == "Synoptic_Overall":
        imgMonthTag = "Overall"
    else:
        # this assumes that synoptic-monthly data have a proper date set i.e. 
        # with an arbitrary year
        imgMonthTag = str(ee.Date(img.get('system:time_start')).get('month').getInfo()).zfill(2)
    return(imgYearTag, imgMonthTag)
 

In [335]:
def getCollExportParams_Raw(collection, mgTimeStepName, timeSummaryName, 
                             resolutionName, variableName, regionGeom):
    '''
    collection: an ee.ImageCollection, which must already be aggregated to monthly or annual or whatever
    mgTimeStepName: a string representing the timestep as it should be shown in the filenames, e.g. "Monthly"
    timeSummaryName: a string representing how the images of this timestep were created, e.g. if the images 
        are monthly and are the mean of daily data then "mean", or if they are monthly from monthly data 
        then "Data"
    resolutionName: a string representing the mastergrid-named resolution as it should be shown in the filenames 
        corresponding to a specific decimal degrees resolution
    variableName: for the first part of the filenames. If None, then image property "varname" will be used.
    regionGeom: an ee.Geometry(.Rectangle, preferably)
    '''
    fnTemplate = "{0!s}.{1!s}.{2!s}.{3!s}.{4!s}.{5!s}"
    if mgTimeStepName not in mgTimesteps:
        raise ValueError("specified timestep string isn't valid")
    #if resolutionName not in mgResolutions:
    #    raise ValueError("specified resolution string isn't valid")
    
    exportOrigin = getOriginXY(regionGeom)
    #exportRes = mgResolutions[mgResolutionName] # None if mgResolutionName=="Unchanged"
    
    nImages = collection.size().getInfo()
    collectionList = collection.toList(nImages) # necessary to allow client-side iterating over them
    exportImages = {}
        
    for i in range(nImages):
        img = ee.Image(collectionList.get(i))
        imgYearTag, imgMonthTag = getFilenameTimeTags(img, mgTimeStepName)
      
        # export at the image's own nominal scale, but reprojected (if not already) to EPSG:4326 
        # Any resampling needed to do this will be done by the EE default method of nearest-neighbour.
        # This doesn't necessarily give us exactly what we want for our mastergrid alignments.
        # No spatial aggregation, just one image out for each image in.
        imgTrans = img.projection().getInfo()['transform']
        existingResX = imgTrans[0]
        existingResY = imgTrans[4]
        img = img.reproject(ee.Projection('EPSG:4326'), None, scale=existingResX)
        affine = str([existingResX, 0.0, exportOrigin[0], 0, existingResY, exportOrigin[1]])
        if variableName is None:
            varNameOut = img.get('varname').getInfo()
        else:
            varNameOut = variableName
            
        fileName = fnTemplate.format(varNameOut, imgYearTag, imgMonthTag, 
                                     timeSummaryName, resolutionName, "Data")
        taskDesc = fileName.replace('.', '_')

        exportParams = {
            'image':img,
            'fileNamePrefix': fileName,
            'description':taskDesc,
            'bucket':EXPORT_BUCKET,
            'crs':'EPSG:4326',
            'crsTransform':affine,
            'region':regionGeom.getInfo()['coordinates'],
            'maxPixels':4e9,
            'formatOptions':{
                'cloudOptimized':True  # ensures internal tiling and creates pyramids
            },
            # MAX_DIMS is a multiple of 256 (the default for shardSize) that is at least as big 
            # as the largest global grid we want in one piece (see where it is defined). This 
            # ensures that images up to that size will be exported as a single file, so no need 
            # to faff with mosaicing and re-pyramiding the downloads
            'fileDimensions':MAX_DIMS
        }
        if fileName in exportImages:
            raise ValueError("{} would be a duplicate filename, check collection is what you intended"
                             .format(fileName))
        exportImages[fileName] = exportParams

    return exportImages
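
For reference, the filename and task description are built purely by string formatting, so the naming can be checked without touching Earth Engine. The sketch below reuses the template from the function above; `build_names` is a hypothetical wrapper for illustration.

```python
FN_TEMPLATE = "{0!s}.{1!s}.{2!s}.{3!s}.{4!s}.{5!s}"

def build_names(var_name, year_tag, month_tag, time_summary, res_name, spatial_summary):
    """Return (fileNamePrefix, task description) following the mastergrid syntax."""
    file_name = FN_TEMPLATE.format(var_name, year_tag, month_tag,
                                   time_summary, res_name, spatial_summary)
    # Task descriptions use underscores in place of the dots, as in the function above
    task_desc = file_name.replace('.', '_')
    return file_name, task_desc

# Month tags are zero-padded in the same way as getFilenameTimeTags
file_name, task_desc = build_names("VIIRS-SLC", 2017, str(3).zfill(2), "Data", "500m", "Data")
print(file_name)
print(task_desc)
```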



In [336]:
import warnings

def getCollExportParams_CatAgg(collection, mgTimeStepName, timeSummaryName,
                              mgResolutionName, variableName, regionGeom,
                              aggregations=["Data"],
                               categoryMapping=None):
    fnTemplate = "{0!s}.{1!s}.{2!s}.{3!s}.{4!s}.{5!s}"
    if mgTimeStepName not in mgTimesteps:
        raise ValueError("specified timestep string isn't valid")
    if mgResolutionName not in mgResolutions:
        raise ValueError("specified resolution string isn't valid")
    
    exportOrigin = getOriginXY(regionGeom)
    exportRes = mgResolutions[mgResolutionName] # None if mgResolutionName=="Unchanged"
    
    nImages = collection.size().getInfo()
    collectionList = collection.toList(nImages) # necessary to allow client-side iterating over them
    exportImages = {}
    for i in range(nImages):
        img = ee.Image(collectionList.get(i))
        imgYearTag, imgMonthTag = getFilenameTimeTags(img, mgTimeStepName)
      
        # we are applying some level of spatial aggregation, this might mean we are 
        # deliberately aggregating e.g. to 5k outputs, but this also applies even 
        # if we are exporting a true 1km image (e.g. MODIS) to mastergrid "1km".
        # In this latter case we don't want to create and export min/max/mean/sd aggregations,
        # just a standard nearest-neighbour.
        affine = str([exportRes, 0.0, exportOrigin[0], 0, -exportRes, exportOrigin[1]])
        aggregatedImages = spatialSummaries_Categorical(img, categoryMapping)
        for spatialSummary, aggImg in aggregatedImages.items():
            if spatialSummary not in aggregations:
                continue
            if variableName is None:
                varNameOut = aggImg.get('varname').getInfo()
            else:
                varNameOut = variableName
            
            if spatialSummary == "Data":
                # User has requested a no-aggregation type summary. Check whether the requested resolution 
                # for the export is actually close to the native image resolution. If it is, then the spatial 
                # summary indicator in the filename will be "Data". If it's not, then warn the user as this might 
                # be an error, and record the spatial summary as "NN" (nearest neighbour), which is what it will 
                # default to.
                existingProj = img.projection().getInfo()
                if existingProj['crs'] != "EPSG:4326":
                    llProj =  ee.Projection("EPSG:4326").atScale(img.projection().nominalScale()).getInfo()
                    reProjWarn = "(when reprojected to EPSG:4326) "
                else:
                    llProj = existingProj
                    reProjWarn = ""
                # llProj is a client-side dict in both branches
                imgTrans = llProj['transform']
                existingResX = imgTrans[0]
                factor = max([exportRes, existingResX]) / min([exportRes, existingResX])
                if factor > 1.25 and existingResX != 1.0:
                    warnings.warn(("The requested mastergrid resolution of {} differs from "+
                                  "the image native resolution {}of {} by quite a lot, but no "+
                                  "spatial summary was requested (aggregations=['Data']). " +
                                  "Therefore the outputs will be produced by nearest neighbour " +
                                   "resampling.").format(
                                      exportRes, reProjWarn, existingResX))
                    spatialSummary="NN"
            if spatialSummary == "majority":
                spatialSummary = "majority-class"
            if spatialSummary == "fractions":
                for className, aggSubImg in aggImg.items():
                    varNameOut = className
                    fileName = fnTemplate.format(varNameOut, imgYearTag, imgMonthTag,
                                                timeSummaryName, mgResolutionName, spatialSummary)
                    taskDesc = fileName.replace('.', '_')
                    exportParams = {
                        'image':aggSubImg,
                        'fileNamePrefix': fileName,
                        'description':taskDesc,
                        'bucket':EXPORT_BUCKET,
                        'crs':'EPSG:4326',
                        'crsTransform':affine,
                        'region':regionGeom.getInfo()['coordinates'],
                        'maxPixels':4e9,
                        'formatOptions':{
                            'cloudOptimized':True # ensures internal tiling and creates pyramids
                        },
                        'fileDimensions':MAX_DIMS
                    }
                    if fileName in exportImages:
                        raise ValueError("{} would be a duplicate filename, check collection is what you intended"
                                         .format(fileName))
                    exportImages[fileName] = exportParams
            else:    
                fileName = fnTemplate.format(varNameOut, imgYearTag, imgMonthTag, 
                                             timeSummaryName, mgResolutionName, spatialSummary)
                taskDesc = fileName.replace('.', '_')
                exportParams = {
                    'image':aggImg,
                    'fileNamePrefix': fileName,
                    'description':taskDesc,
                    'bucket':EXPORT_BUCKET,
                    'crs':'EPSG:4326',
                    'crsTransform':affine,
                    'region':regionGeom.getInfo()['coordinates'],
                    'maxPixels':4e9,
                    'formatOptions':{
                        'cloudOptimized':True # ensures internal tiling and creates pyramids
                    },
                    'fileDimensions':MAX_DIMS
                }
                if fileName in exportImages:
                    raise ValueError("{} would be a duplicate filename, check collection is what you intended"
                                     .format(fileName))
                exportImages[fileName] = exportParams
    return exportImages

In [354]:
import warnings

def getCollExportParams_ContAgg(collection, mgTimeStepName, timeSummaryName, 
                                mgResolutionName, variableName, regionGeom,
                                aggregations=["Data"]):
    '''
    collection: an ee.ImageCollection, which must already be aggregated to monthly or annual or whatever
    mgTimeStepName: a string representing the timestep as it should be shown in the filenames, e.g. "Monthly"
    timeSummaryName: a string representing how the images of this timestep were created, e.g. if the images 
        are monthly and are the mean of daily data then "mean", or if they are monthly from monthly data 
        then "Data"
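    mgResolutionName: a string representing the mastergrid-named resolution as it should be shown in the filenames, 
        corresponding to a specific decimal degrees resolution; must be a key of mgResolutions
    variableName: for the first part of the filenames. If None, then image property "varname" will be used.
    regionGeom: an ee.Geometry(.Rectangle, preferably)
    aggregations: a list of the spatial summaries to export, each being one of the keys returned by 
        spatialSummaries_Continuous (see continuousAggregationStats)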
    '''
    fnTemplate = "{0!s}.{1!s}.{2!s}.{3!s}.{4!s}.{5!s}"
    if mgTimeStepName not in mgTimesteps:
        raise ValueError("specified timestep string isn't valid")
    if mgResolutionName not in mgResolutions:
        raise ValueError("specified resolution string isn't valid")
    
    exportOrigin = getOriginXY(regionGeom)
    exportRes = mgResolutions[mgResolutionName] # None if mgResolutionName=="Unchanged"
    
    nImages = collection.size().getInfo()
    collectionList = collection.toList(nImages) # necessary to allow client-side iterating over them
    exportImages = {}
        
    for i in range(nImages):
        img = ee.Image(collectionList.get(i))
        imgYearTag, imgMonthTag = getFilenameTimeTags(img, mgTimeStepName)
      
        # we are applying some level of spatial aggregation, this might mean we are 
        # deliberately aggregating e.g. to 5k outputs, but this also applies even 
        # if we are exporting a true 1km image (e.g. MODIS) to mastergrid "1km".
        # In this latter case we don't want to create and export min/max/mean/sd aggregations,
        # just a standard nearest-neighbour.
        affine = str([exportRes, 0.0, exportOrigin[0], 0, -exportRes, exportOrigin[1]])
        aggregatedImages = spatialSummaries_Continuous(img)
        for spatialSummary, aggImg in aggregatedImages.items():
            if spatialSummary not in aggregations:
                continue
            if variableName is None:
                varNameOut = aggImg.get('varname').getInfo()
            else:
                varNameOut = variableName
            if spatialSummary == "Data":
                # User has requested a no-aggregation type summary. Check whether the requested resolution 
                # for the export is actually close to the native image resolution. If it is, then the spatial 
                # summary indicator in the filename will be "Data". If it's not, then warn the user as this might 
                # be an error, and record the spatial summary as "NN" (nearest neighbour), which is what it will 
                # default to.
                existingProj = img.projection().getInfo()
                if existingProj['crs'] != "EPSG:4326":
                    llProj =  ee.Projection("EPSG:4326").atScale(img.projection().nominalScale()).getInfo()
                    reProjWarn = "(when reprojected to EPSG:4326) "
                else:
                    llProj = existingProj
                    reProjWarn = ""
                imgTrans = llProj['transform']
                existingResX = imgTrans[0]
                factor = max([exportRes, existingResX]) / min([exportRes, existingResX])
                if factor > 1.25 and existingResX != 1.0:
                    # when we create a composite from an imageCollection e.g. temporal mean or std,
                    # it seems to return a default projection of EPSG:4326, scale 1.0, even if the 
                    # inputs were e.g. sinusoidal (MODIS), so this check breaks. 
                    warnings.warn(("The requested mastergrid resolution of {} differs from "+
                                  "the image native resolution {}of {} by quite a lot, but no "+
                                  "spatial summary was requested (aggregations=['Data']). " +
                                  "Therefore the outputs will be produced by nearest neighbour " +
                                   "resampling.").format(
                                      exportRes, reProjWarn, existingResX))
                    spatialSummary="NN"
            fileName = fnTemplate.format(varNameOut, imgYearTag, imgMonthTag, 
                                         timeSummaryName, mgResolutionName, spatialSummary)
            taskDesc = fileName.replace('.', '_')
            exportParams = {
                'image':aggImg,
                'fileNamePrefix': fileName,
                'description':taskDesc,
                'bucket':EXPORT_BUCKET,
                'crs':'EPSG:4326',
                'crsTransform':affine,
                'region':regionGeom.getInfo()['coordinates'],
                'maxPixels':4e9,
                'formatOptions':{
                    'cloudOptimized':True # ensures internal tiling and creates pyramids
                },
                'fileDimensions':MAX_DIMS
            }
            if fileName in exportImages:
                raise ValueError("{} would be a duplicate filename, check collection is what you intended"
                                 .format(fileName))
            exportImages[fileName] = exportParams
    return exportImages
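
The "Data" versus "NN" decision above comes down to a ratio test, which can be isolated as a small pure-Python function. This is a sketch of that rule only; `spatial_summary_label` is a made-up name and the native resolution in the first example is an illustrative value.

```python
def spatial_summary_label(export_res, native_res, threshold=1.25):
    """Label for a no-aggregation export: "Data" if resolutions are close, else "NN"."""
    factor = max(export_res, native_res) / min(export_res, native_res)
    # A native resolution of exactly 1.0 is treated as "unknown" (composites report
    # a default projection at scale 1.0), so the check is skipped in that case
    if factor > threshold and native_res != 1.0:
        return "NN"
    return "Data"

print(spatial_summary_label(1.0 / 240.0, 0.0041666667))  # near-native "500m" export
print(spatial_summary_label(1.0 / 24.0, 1.0 / 240.0))    # "5km" from 500m data
print(spatial_summary_label(1.0 / 24.0, 1.0))            # composite with default projection
```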


In [361]:
def createTasks(taskParamsDict):
    taskList = []
    for _, params in taskParamsDict.items():
        task = ee.batch.Export.image.toCloudStorage(**params)
        taskList.append(task)
    return taskList

def performTasks(taskList):
    for t in taskList:
        t.start()
    
def reportTasks(taskList):
    stati = [] # yeah no i've no idea
    for t in taskList:
        status = t.status()
        out = status['description'] + ": " +status['state'] 
        if 'start_timestamp_ms' in status:
            timeSecs = (status['update_timestamp_ms'] - status['start_timestamp_ms']) / 1000
            out += " ({} seconds)".format(timeSecs)
        stati.append(out)
    return stati

def cancelTasks(taskList):
    for t in taskList:
        status = t.status()
        if status['state'] == 'RUNNING' or status['state'] == 'READY':
            print("Cancelling {}".format(status['description']))
            t.cancel()
        

# Usage

First create the collection in the way you want to export, with images that have a single band. This could be the images as-is, or some kind of temporal summary such as annual images. 

Then call the appropriate getCollExportParams_* function to get a dictionary of export parameters, keyed by output filename, with one entry for each image to be exported.

Then create tasks from each set of parameters using createTasks().

Then start those tasks with performTasks() and check on their progress with reportTasks().


## VIIRS

* Continuous dataset, monthly, extent 75N-65S
* Actually we're going to take the opportunity to change this to 75N-60S to slightly reduce storage
* Native resolution equivalent to 500m, but not precisely equal to the mastergrid 500m. So use the getCollExportParams_ContAgg function, which lets us specify the required MG resolution as opposed to outputting the exact native resolution, but specify "Data" for the aggregations to just use NN resampling
* For aggregated data, we only store this at 5km aggregated resolution, not 1km or 10km, and we will want mean, min, max, SD.

We're just exporting the monthly files as-is so all we need to do is select the band and filter to the dates we want. So first create the collection for export.

In [171]:
coll_viirs = (ee.ImageCollection("NOAA/VIIRS/DNB/MONTHLY_V1/VCMSLCFG")
    .map(lambda i: (i.select('avg_rad')))
    .filter(ee.Filter.date(ee.Date.fromYMD(2017,3,1),
                           ee.Date.fromYMD(2017,9,30))))
exportColl = coll_viirs

### 500m

In [None]:
exportRegion = mgRegions['viirs-extent-7560']
exportRes = mgResolutions['500m']
exportItems = getCollExportParams_ContAgg(collection=exportColl, mgTimeStepName="Monthly", timeSummaryName="Data",
                                       mgResolutionName="500m", variableName="VIIRS-SLC", regionGeom=exportRegion,
                                       aggregations=["Data"])
exportItems

In [173]:
viirs500m_tasks = createTasks(exportItems)
# just a bit of faffing if the exports fail and we have to re-run
# errortasks = [t.status()['description'] for t in viirs500m_tasks if t.status()['state']=="FAILED"]
# viirs500m_tasks = [t for t in viirs500m_tasks_new if str(t).split(' ')[2] in errortasks]

In [174]:
# note that the default str of the task seems to show UNSUBMITTED even when it's running
viirs500m_tasks

[<Task EXPORT_IMAGE: VIIRS-SLC_2017_03_Data_500m_Data (UNSUBMITTED)>,
 <Task EXPORT_IMAGE: VIIRS-SLC_2017_04_Data_500m_Data (UNSUBMITTED)>,
 <Task EXPORT_IMAGE: VIIRS-SLC_2017_05_Data_500m_Data (UNSUBMITTED)>,
 <Task EXPORT_IMAGE: VIIRS-SLC_2017_06_Data_500m_Data (UNSUBMITTED)>,
 <Task EXPORT_IMAGE: VIIRS-SLC_2017_07_Data_500m_Data (UNSUBMITTED)>,
 <Task EXPORT_IMAGE: VIIRS-SLC_2017_08_Data_500m_Data (UNSUBMITTED)>,
 <Task EXPORT_IMAGE: VIIRS-SLC_2017_09_Data_500m_Data (UNSUBMITTED)>]

In [175]:
performTasks(viirs500m_tasks)

In [181]:
# so instead:
reportTasks(viirs500m_tasks)

['VIIRS-SLC_2017_03_Data_500m_Data: FAILED',
 'VIIRS-SLC_2017_04_Data_500m_Data: FAILED',
 'VIIRS-SLC_2017_05_Data_500m_Data: FAILED',
 'VIIRS-SLC_2017_06_Data_500m_Data: COMPLETED',
 'VIIRS-SLC_2017_07_Data_500m_Data: FAILED',
 'VIIRS-SLC_2017_08_Data_500m_Data: FAILED',
 'VIIRS-SLC_2017_09_Data_500m_Data: FAILED']
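
When many tasks are involved it can help to tally the states and pull out the failures for a re-run. The sketch below works on the strings returned by reportTasks() (so it needs no Earth Engine connection); `count_states` and `failed_descriptions` are hypothetical helpers that assume the "description: STATE" format produced above.

```python
from collections import Counter

def parse_report_line(line):
    """Split 'description: STATE' or 'description: STATE (N seconds)' into its parts."""
    description, rest = line.split(": ", 1)
    state = rest.split(" ", 1)[0]
    return description, state

def count_states(report_lines):
    return Counter(parse_report_line(line)[1] for line in report_lines)

def failed_descriptions(report_lines):
    return [d for d, s in map(parse_report_line, report_lines) if s == "FAILED"]

report = [
    'VIIRS-SLC_2017_03_Data_500m_Data: FAILED',
    'VIIRS-SLC_2017_04_Data_500m_Data: FAILED',
    'VIIRS-SLC_2017_05_Data_500m_Data: FAILED',
    'VIIRS-SLC_2017_06_Data_500m_Data: COMPLETED',
    'VIIRS-SLC_2017_07_Data_500m_Data: FAILED',
    'VIIRS-SLC_2017_08_Data_500m_Data: FAILED',
    'VIIRS-SLC_2017_09_Data_500m_Data: FAILED',
]
print(count_states(report))
print(failed_descriptions(report))
```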

In [86]:
#cancelTasks(viirs500m_tasks_new)

Cancelling VIIRS-SLC_2017_10_Data_500m_Data
Cancelling VIIRS-SLC_2017_11_Data_500m_Data
Cancelling VIIRS-SLC_2018_03_Data_500m_Data
Cancelling VIIRS-SLC_2018_04_Data_500m_Data
Cancelling VIIRS-SLC_2018_07_Data_500m_Data
Cancelling VIIRS-SLC_2018_08_Data_500m_Data
Cancelling VIIRS-SLC_2018_09_Data_500m_Data
Cancelling VIIRS-SLC_2018_10_Data_500m_Data
Cancelling VIIRS-SLC_2018_11_Data_500m_Data
Cancelling VIIRS-SLC_2018_12_Data_500m_Data


### 5km

In [None]:
exportColl = coll_viirs
exportRegion = mgRegions['viirs-extent-7560']
exportOrigin = getOriginXY(exportRegion)
exportRes = mgResolutions['5km']
exportAgg = None

#collection, mgTimeStep, timeSummaryName, mgResolutionName, 
 #                              varName, regionGeom, aggregations=["Unchanged"])
exportItems = getCollectionExportParams(collection=exportColl, mgTimeStep="Monthly", timeSummaryName="Data",
                                       mgResolutionName="5km", varName="VIIRS-SLC", regionGeom=exportRegion,
                                       aggregations=["min", "max", "mean", "SD"])
exportItems

In [None]:
viirs5km_tasks = createTasks(exportItems)
viirs5km_tasks

In [13]:
performTasks(viirs5km_tasks)

In [6]:
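def downloadImages(taskList, localFolder):
    """Move each task's exported files from the export bucket to localFolder."""
    for t in taskList:
        # State check is disabled for now; restore it to skip unfinished tasks:
        # if t.status()['state'] == 'COMPLETED':
        if True:
            prefix = t.config['outputPrefix']
            gsUrl = "gs://{}/{}*".format(EXPORT_BUCKET, prefix)
            # note that "mv" removes the files from the bucket once transferred
            gsCmd = "gsutil -m mv {} {}".format(gsUrl, localFolder)
            print(gsCmd)
            !{gsCmd}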

In [47]:
downloadImages(tmpTasks, r'C:\Temp\dataprep\VIIRS\5km')

KeyError: 'outputPrefix'
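
The `KeyError` shows that `t.config` has no top-level `outputPrefix` key in this runtime. The actual layout of `t.config` is not shown here, so the first step would be to print it for one task. As a hedged sketch, the hypothetical helper below (`find_key` is not an Earth Engine function) searches a nested dict for the first key matching a list of candidates. The candidate names and the sample config are assumptions made up for illustration.

```python
def find_key(config, candidates):
    """Depth-first search of a nested dict for the first matching key."""
    for key, value in config.items():
        if key in candidates:
            return value
        if isinstance(value, dict):
            found = find_key(value, candidates)
            if found is not None:
                return found
    return None


# Made-up config layout, only to exercise the helper.
sample_config = {
    'description': 'VIIRS-SLC_2017_03_Data_5km_mean',
    'fileExportOptions': {
        'gcsDestination': {
            'bucket': 'my-bucket',
            'filenamePrefix': 'VIIRS/5km/VIIRS-SLC_2017_03',
        }
    },
}
prefix = find_key(sample_config, ('outputPrefix', 'filenamePrefix'))
print(prefix)
```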

## Synoptic MODIS

A bit more pre-processing here to create temporal aggregations.

Here we set the variable name (which forms the first part of each output filename) directly on each image,
so that one call to getCollExportParams_ContAgg can handle several variables, rather than needing one call
per variable name.

In [445]:
def mean_for_synoptic_month(coll, month):
    cMonthColl = coll.filter(ee.Filter.calendarRange(month, month, 'month'))
    return summariseColl(cMonthColl)
    
def mean_of_synoptic_monthly_means(coll):
    monthly_means = []
    for month in range(1, 13):
        monthly_means.append(mean_for_synoptic_month(coll, month))
    return ee.ImageCollection(monthly_means).mean()

def summariseColl(coll):
    mean = coll.reduce(ee.Reducer.mean())
    sd = coll.reduce(ee.Reducer.sampleStdDev())
    iMin = coll.reduce(ee.Reducer.min())
    iMax = coll.reduce(ee.Reducer.max())
    allIm = ee.Image.cat(mean, sd, iMin, iMax)
    return allIm

def monthlySummary(coll):
    firstDate = ee.Date(coll.aggregate_min('system:time_start'))
    lastDate = ee.Date(coll.aggregate_max('system:time_start'))
    startMonth = ee.Date.fromYMD(firstDate.get('year'), firstDate.get('month'), 1)
    endMonth = ee.Date.fromYMD(lastDate.get('year'), lastDate.get('month'), 1)
    # difference is based on "average" month so this won't necessarily be an int
    nSteps = endMonth.difference(startMonth, 'month').round()
    print(nSteps.getInfo())
    dates = ee.List.sequence(0, nSteps, 1).map(lambda n: startMonth.advance(n, 'month'))
    print(dates.getInfo())
    monthlyIms = dates.map(lambda d: summariseColl(
        coll.filter(ee.Filter.date(ee.DateRange(d, ee.Date(d).advance(1,'month')))))
                          .set('system:time_start', ee.Date(d).millis())
                          .set('system:time_end', ee.Date(d).advance(1, 'month').millis()))
    return monthlyIms
    
def annual_means(coll):
    firstDate = ee.Date(coll.aggregate_min('system:time_start'))
    lastDate = ee.Date(coll.aggregate_max('system:time_start'))
    startYear = ee.Date.fromYMD(firstDate.get('year'), 1, 1)
    endYear = ee.Date.fromYMD(lastDate.get('year'), 1, 1)
    nSteps = endYear.difference(startYear, 'year').round()
    dates = ee.List.sequence(0, nSteps, 1).map(lambda n: startYear.advance(n, 'year'))
    annualIms = dates.map(lambda d: summariseColl(
        coll.filter(ee.Filter.date(ee.DateRange(d, ee.Date(d).advance(1, 'year')))))
                          .set('system:time_start', ee.Date(d).millis())
                          .set('system:time_end', ee.Date(d).advance(1, 'year').millis()))
    return annualIms
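
`mean_of_synoptic_monthly_means` averages twelve per-calendar-month summaries rather than all images at once, so a calendar month with more observations does not dominate the result. The difference can be seen in a small pure-Python illustration (not Earth Engine code, and the numbers are invented):

```python
from statistics import mean

# Invented observations per calendar month: January is over-represented.
obs_by_month = {
    1: [10.0, 10.0, 10.0, 10.0],
    2: [20.0],
    3: [30.0],
}

# Plain mean over every observation, weighted towards January.
all_obs = [v for values in obs_by_month.values() for v in values]
overall_mean = mean(all_obs)

# "Balanced" mean: average each month first, then average the monthly means.
balanced_mean = mean(mean(values) for values in obs_by_month.values())

print(overall_mean, balanced_mean)
```

Note that in the notebook each monthly summary comes from `summariseColl`, so it carries mean, SD, min and max bands, and the final `.mean()` averages each of those bands across the twelve months.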


In [355]:
mod11a2 = ee.ImageCollection('MODIS/006/MOD11A2')
mod11a2 = mod11a2.filter(ee.Filter.calendarRange(2000, 2017, 'year'))
lst_day_c_coll = mod11a2.map(lambda i: i.select('LST_Day_1km')
                             .multiply(0.02).subtract(273.15)
                             .set('system:time_start', i.get('system:time_start')))
lst_night_c_coll = mod11a2.map(lambda i: i.select('LST_Night_1km')
                               .multiply(0.02).subtract(273.15)
                               .set('system:time_start', i.get('system:time_start')))

In [462]:
summariseColl(lst_day_c_coll).getInfo()

{'type': 'Image',
 'bands': [{'id': 'LST_Day_1km_mean',
   'data_type': {'type': 'PixelType',
    'precision': 'double',
    'min': -273.15,
    'max': 1037.5500000000002},
   'crs': 'EPSG:4326',
   'crs_transform': [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]},
  {'id': 'LST_Day_1km_stdDev',
   'data_type': {'type': 'PixelType', 'precision': 'double'},
   'crs': 'EPSG:4326',
   'crs_transform': [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]},
  {'id': 'LST_Day_1km_min',
   'data_type': {'type': 'PixelType',
    'precision': 'double',
    'min': -273.15,
    'max': 1037.5500000000002},
   'crs': 'EPSG:4326',
   'crs_transform': [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]},
  {'id': 'LST_Day_1km_max',
   'data_type': {'type': 'PixelType',
    'precision': 'double',
    'min': -273.15,
    'max': 1037.5500000000002},
   'crs': 'EPSG:4326',
   'crs_transform': [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]}]}
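
The `min` and `max` reported for these bands are the nominal range of the data type, not observed temperatures. They follow from the conversion applied when building `lst_day_c_coll`, assuming the raw `LST_Day_1km` band is an unsigned 16-bit integer (taken from the product's documented data type, not shown in this notebook). A quick check in plain Python:

```python
SCALE = 0.02            # raw units to Kelvin, as used in multiply(0.02)
KELVIN_OFFSET = 273.15  # Kelvin to Celsius, as used in subtract(273.15)


def raw_to_celsius(raw):
    """Apply the same scaling as the notebook to a raw integer value."""
    return raw * SCALE - KELVIN_OFFSET


# Smallest and largest values an unsigned 16-bit integer can hold.
print(raw_to_celsius(0))
print(raw_to_celsius(65535))
# A raw value of 15000 corresponds to 300 K.
print(raw_to_celsius(15000))
```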

In [355]:
sourceBandName = "LST_Day_1km"
lst_day_synoptic = summariseColl(lst_day_c_coll)
lst_day_mean = lst_day_synoptic.select(sourceBandName+"_mean").set('varname', 'AFR_LST_Day_v6')
lst_day_sd = lst_day_synoptic.select(sourceBandName+"_stdDev").set('varname', 'AFR_LST_Day_v6')
lst_day_min = lst_day_synoptic.select(sourceBandName+"_min").set('varname', 'AFR_LST_Day_v6')
lst_day_max = lst_day_synoptic.select(sourceBandName+"_max").set('varname', 'AFR_LST_Day_v6')
lst_day_balanced_mean = mean_of_synoptic_monthly_means(lst_day_c_coll).set('varname', 'AFR_LST_Day_v6')
lst_night_mean = lst_night_c_coll.mean().set('varname', 'AFR_LST_Night_v6')
lst_night_sd = lst_night_c_coll.reduce(ee.Reducer.sampleStdDev()).set('varname', 'AFR_LST_Night_v6')
lst_night_balanced_mean = mean_of_synoptic_monthly_means(lst_night_c_coll).set('varname', 'AFR_LST_Night_v6')

sdColl = ee.ImageCollection.fromImages([lst_day_sd, lst_night_sd])
meanColl = ee.ImageCollection.fromImages([lst_day_mean, lst_night_mean])
balancedMeanColl = ee.ImageCollection.fromImages([lst_day_balanced_mean, lst_night_balanced_mean])

In [356]:
exportRegion = mgRegions['mapAfrica']

exportItemsSD = getCollExportParams_ContAgg(collection=sdColl, mgTimeStepName="Synoptic_Overall", timeSummaryName="SD",
                                         mgResolutionName="1km", variableName=None, regionGeom=exportRegion,
                                         aggregations=["Data"])
exportItemsMean = getCollExportParams_ContAgg(collection=meanColl, mgTimeStepName="Synoptic_Overall", timeSummaryName="mean",
                                         mgResolutionName="1km", variableName=None, regionGeom=exportRegion,
                                         aggregations=["Data"])
exportItemsBalancedMean = getCollExportParams_ContAgg(collection=balancedMeanColl, mgTimeStepName="Synoptic_Overall", timeSummaryName="Balanced-mean",
                                         mgResolutionName="1km", variableName=None, regionGeom=exportRegion,
                                         aggregations=["Data"])

# this syntax only works in python 3.5+
exportItems = {**exportItemsSD, **exportItemsMean, **exportItemsBalancedMean}
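
The `{**a, **b}` merge builds a new dict, and where the same key appears more than once the right-most dict wins. That only matters here if two of the parameter sets produce the same key. A toy example with invented keys and values:

```python
sd_items = {'LST_Day_SD': 'params-1', 'shared': 'from-sd'}
mean_items = {'LST_Day_mean': 'params-2', 'shared': 'from-mean'}

# Unpacking merge: later dicts override earlier ones on duplicate keys.
merged = {**sd_items, **mean_items}
print(merged)

# Equivalent for Python versions that lack the unpacking syntax.
merged_old = dict(sd_items)
merged_old.update(mean_items)
```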

In [327]:
t=createTasks(exportItems)

In [328]:
performTasks(t)

In [367]:
reportTasks(t)

['LST_Day_v6_Synoptic_Overall_SD_1km_Data: COMPLETED (7642.124 seconds)',
 'LST_Night_v6_Synoptic_Overall_SD_1km_Data: COMPLETED (8075.524 seconds)',
 'LST_Day_v6_Synoptic_Overall_mean_1km_Data: RUNNING (8250.14 seconds)',
 'LST_Night_v6_Synoptic_Overall_mean_1km_Data: COMPLETED (2561.55 seconds)',
 'LST_Day_v6_Synoptic_Overall_Balanced-mean_1km_Data: COMPLETED (2191.036 seconds)',
 'LST_Night_v6_Synoptic_Overall_Balanced-mean_1km_Data: COMPLETED (2353.463 seconds)']

## MCD12Q1 Landcover (IGBP)

This is categorical data. We want to export the original images as-is, then for the spatial aggregations we will export, from each input image, a single majority-class image and one fraction-cover image for each input class (value). 
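
As a rough illustration of those two aggregations for a single coarse output cell, the sketch below takes the fine-resolution class values falling inside the cell and returns the majority class and the fraction covered by each class. This is plain Python with invented pixel values; how `getCollExportParams_CatAgg` does this in Earth Engine is not shown in this section, and ties and masked pixels are ignored here.

```python
from collections import Counter


def majority_and_fractions(block, class_values):
    """Summarise one block of categorical pixels.

    Returns (majority_class, {class_value: fraction_of_block}).
    """
    counts = Counter(block)
    n = len(block)
    # One fraction per requested class, zero if the class is absent.
    fractions = {c: counts.get(c, 0) / n for c in class_values}
    majority = counts.most_common(1)[0][0]
    return majority, fractions


# Invented class values for the ten fine pixels inside one coarse cell.
block = [12, 12, 10, 12, 17, 10, 12, 12, 10, 12]
majority, fractions = majority_and_fractions(block, class_values=[10, 12, 17])
print(majority, fractions)
```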

First, the original collection contains multiple different landcover classification schemes; we are only interested in the IGBP one.


In [125]:
coll_mcd12q1 = ee.ImageCollection("MODIS/006/MCD12Q1")
coll_igbp = coll_mcd12q1.map(lambda i: i.select('LC_Type1'))

In [200]:
#collection, mgTimeStep, timeSummaryName, mgResolutionName, 
#                               varName, regionGeom, aggregations=["Unchanged"]):
    
igbp_500m_items = getCollExportParams_CatAgg(collection=coll_igbp, 
                                            mgTimeStepName="Annual", timeSummaryName="Data", 
                                            mgResolutionName="500m", variableName="IGBP_Landcover",
                                            regionGeom=mgRegions['global'],
                                            aggregations=['Data'],
                                            categoryMapping=None)

In [None]:
igbp_500m_items

In [97]:
igbpTasks = createTasks(igbp_500m_items)

In [100]:
performTasks(igbpTasks)

In [None]:
reportTasks(igbpTasks)

### 5km

For producing the class fractions, we need to know what values to look for, and what names (if different) these values should have. Get this from the metadata on a single image:

In [150]:
test_landcover = coll_igbp.first()
catVals = test_landcover.get('LC_Type1_class_values').getInfo()
catNames = test_landcover.get('LC_Type1_class_names').getInfo()
catNamesClean = [n.split(':')[0].replace(' ', '_').replace('/','_') for n in catNames]
catRefs = ["IGBP_Landcover_Class"+str(cat).zfill(2) for cat in catVals]
fileNames = [c + "_" + n for c,n in zip(catRefs, catNamesClean)]
values_with_names = list(zip(catVals, fileNames))


In [223]:
igbp_5km_items = getCollExportParams_CatAgg(collection=coll_igbp,
                                           mgTimeStepName="Annual", timeSummaryName="Data",
                                           mgResolutionName="5km", variableName="IGBP_Landcover",
                                           regionGeom=mgRegions['global'],
                                           aggregations=['majority', 'fractions'],
                                           categoryMapping=values_with_names)

In [219]:
values_with_names

[(1, 'IGBP_Landcover_Class01_Evergreen_Needleleaf_Forests'),
 (2, 'IGBP_Landcover_Class02_Evergreen_Broadleaf_Forests'),
 (3, 'IGBP_Landcover_Class03_Deciduous_Needleleaf_Forests'),
 (4, 'IGBP_Landcover_Class04_Deciduous_Broadleaf_Forests'),
 (5, 'IGBP_Landcover_Class05_Mixed_Forests'),
 (6, 'IGBP_Landcover_Class06_Closed_Shrublands'),
 (7, 'IGBP_Landcover_Class07_Open_Shrublands'),
 (8, 'IGBP_Landcover_Class08_Woody_Savannas'),
 (9, 'IGBP_Landcover_Class09_Savannas'),
 (10, 'IGBP_Landcover_Class10_Grasslands'),
 (11, 'IGBP_Landcover_Class11_Permanent_Wetlands'),
 (12, 'IGBP_Landcover_Class12_Croplands'),
 (13, 'IGBP_Landcover_Class13_Urban_and_Built-up_Lands'),
 (14, 'IGBP_Landcover_Class14_Cropland_Natural_Vegetation_Mosaics'),
 (15, 'IGBP_Landcover_Class15_Permanent_Snow_and_Ice'),
 (16, 'IGBP_Landcover_Class16_Barren'),
 (17, 'IGBP_Landcover_Class17_Water_Bodies')]

In [None]:
igbp_5km_items.keys()

In [225]:
igbp5kmTasks = createTasks(igbp_5km_items)

In [231]:
performTasks(igbp5kmTasks)

In [None]:
reportTasks(igbp5kmTasks)

In [None]:
cancelTasks(igbp5kmTasks)

True