# Coastline / Limits data matcher

## Purpose

The code in this notebook assists with matching a raster grid to a pre-existing coastline or mask. It will clip the data back to the coastline, fill the data "out" to the coastline, or both. 

The filling part of the process is akin to the ArcGIS operation called "Nibble": cells that are no-data, but which are flagged by a separate mask grid to indicate that they ought to have data, are given a value taken from the nearest cell which has data. This code can optionally take that value from a single nearby cell, or from an average of several.
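Here is a brute-force sketch of the Nibble-style fill described above, for the single-nearest-cell case. It is not the Cython matchToCoastline. The function name nibble_fill is hypothetical, and so are three of its choices: the square search window, squared Euclidean distance, and the rule that freshly filled cells are never reused as sources.

```python
import numpy as np

def nibble_fill(data, mask, ndv, search_radius):
    """Hypothetical illustration, not the notebook's Cython code.

    Cells that are nodata in `data` but flagged 1 in `mask` take the value of the
    nearest cell that ORIGINALLY had data, searched for within a square window of
    +/- search_radius pixels. `data` is modified in place; the returned grid is 1
    wherever no source cell was found."""
    rows, cols = data.shape
    # computed once, so cells filled during this call are never used as sources
    has_data = data != ndv
    errors = np.zeros(data.shape, dtype=np.uint8)
    for r, c in np.argwhere(~has_data & (mask == 1)):
        # clamp the search window to the grid edges
        r0, r1 = max(r - search_radius, 0), min(r + search_radius + 1, rows)
        c0, c1 = max(c - search_radius, 0), min(c + search_radius + 1, cols)
        # full-grid coordinates of the candidate source cells in the window
        cand = np.argwhere(has_data[r0:r1, c0:c1]) + [r0, c0]
        if len(cand) == 0:
            errors[r, c] = 1  # nothing within the radius: flag the failure
            continue
        dist2 = ((cand - [r, c]) ** 2).sum(axis=1)  # squared Euclidean distance
        nr, nc = cand[np.argmin(dist2)]
        data[r, c] = data[nr, nc]  # source cells are never modified, so this is safe
    return errors

data = np.array([[5, 5, -1, -1],
                 [5, 7, -1, -1],
                 [-1, -1, -1, -1]])
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 1, 0],
                 [0, 0, 0, 0]])
errors = nibble_fill(data, mask, ndv=-1, search_radius=1)
print(data.tolist())  # [[5, 5, 5, -1], [5, 7, 7, -1], [-1, -1, -1, -1]]

row = np.array([[3, -1, -1]])
row_errors = nibble_fill(row, np.ones_like(row), ndv=-1, search_radius=1)
print(row_errors.tolist())  # [[0, 0, 1]]
```

In the second call, the last cell has no original data within one pixel, so it is flagged rather than filled from its newly filled neighbour. Whether matchToCoastline behaves the same way is not stated in this notebook.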

You can run it in two steps, e.g. clipping to the outline of one dataset such as a coastline, and then filling to the outline of another dataset such as a Pf limits mask.

Note that any cells that are "clipped" are discarded. If their values need transferring elsewhere (i.e. to a pixel within the coastline), see the notebook **PopulationReallocator**.
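The clip step amounts to setting to nodata every cell that has data but lies outside the mask. The sketch below is a minimal numpy version of that. The name clip_to_mask is hypothetical, and the convention that the mask uses 1 for "keep" is an assumption.

```python
import numpy as np

def clip_to_mask(data, mask, ndv):
    """Hypothetical sketch: set to nodata every cell that has data but lies outside
    the mask (mask != 1). Modifies data in place; returns how many cells were discarded."""
    outside = (data != ndv) & (mask != 1)
    data[outside] = ndv
    return int(outside.sum())

data = np.array([[4, 4, 9],
                 [4, 2, 9]])
coast = np.array([[1, 1, 0],
                  [1, 1, 0]])
n_clipped = clip_to_mask(data, coast, ndv=-9999)
print(n_clipped)       # 2
print(data.tolist())   # [[4, 4, -9999], [4, 2, -9999]]
```

The two values of 9 in the sea column are simply lost. Nothing moves them onto land, which is why reallocation needs a separate tool.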

The input grids (coastline, extra mask if used, and data) must all have exactly the same resolution and alignment (i.e. cells must overlay one another precisely). This needs sorting out with ArcMap / gdal_edit first! 
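Before running, you can check resolution and alignment from the GDAL geotransforms, which are tuples of (x origin, pixel width, x rotation, y origin, y rotation, pixel height). The sketch below is a hypothetical helper, not part of raster_utilities, and its tolerances are arbitrary assumptions. It treats two grids as aligned when their pixel sizes agree closely and their origins differ by a whole number of pixels.

```python
def check_alignment(gt_a, gt_b, res_tol=1e-9, px_tol=1e-6):
    """Hypothetical helper: True if two geotransforms describe grids whose cells
    overlay exactly (same pixel size, origins a whole number of pixels apart)."""
    # pixel sizes are compared with a relative tolerance, as values such as
    # 1/24 degree are not exactly representable as floats
    for i in (1, 5):
        if abs(gt_a[i] - gt_b[i]) > res_tol * abs(gt_b[i]):
            return False
    # offset of a's origin from b's origin, measured in pixels of b
    off_x = (gt_a[0] - gt_b[0]) / gt_b[1]
    off_y = (gt_a[3] - gt_b[3]) / gt_b[5]
    return abs(off_x - round(off_x)) < px_tol and abs(off_y - round(off_y)) < px_tol

res = 1.0 / 24
gt_global = (-180.0, res, 0.0, 90.0, 0.0, -res)
gt_africa = (-18.0, res, 0.0, 38.0, 0.0, -res)
gt_half_shift = (-18.0 + res / 2, res, 0.0, 38.0, 0.0, -res)
gt_coarse = (-180.0, 2 * res, 0.0, 90.0, 0.0, -2 * res)
print(check_alignment(gt_africa, gt_global))      # True
print(check_alignment(gt_half_shift, gt_global))  # False: offset by half a pixel
print(check_alignment(gt_coarse, gt_global))      # False: different resolution
```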

However, the grids don't need to have the same extent - e.g. the global coastline can be used with an Africa data raster. See the code for the calculation of the necessary pixel offsets in this scenario.
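The offset calculation divides the difference between the two grids' origins by the pixel size. The result gives the column and row at which the smaller grid starts inside the larger one, and these values can then be passed as the xoff and yoff of GDAL's Band.ReadAsArray(xoff, yoff, win_xsize, win_ysize).

The sketch below assumes a 1/24-degree global grid and a hypothetical Africa grid with its top-left corner at 18°W, 38°N. The helper name window_offsets is also hypothetical. It uses round() before int(), so that a floating-point quotient such as 3887.9999999 is not truncated to 3887. The notebook code itself uses plain int().

```python
def window_offsets(gt_in, gt_big):
    """Hypothetical helper: (column, row) of the input grid's top-left pixel
    within a larger grid of the same resolution and alignment."""
    off_w = int(round((gt_in[0] - gt_big[0]) / gt_big[1]))
    off_n = int(round((gt_in[3] - gt_big[3]) / gt_big[5]))
    return off_w, off_n

res = 1.0 / 24  # assumed resolution of the "5k" grids
gt_coast = (-180.0, res, 0.0, 90.0, 0.0, -res)   # global coastline
gt_africa = (-18.0, res, 0.0, 38.0, 0.0, -res)   # hypothetical Africa data grid
print(window_offsets(gt_africa, gt_coast))       # (3888, 1248)
```

The values follow from 162 degrees × 24 pixels per degree = 3888 columns, and 52 degrees × 24 = 1248 rows.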

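The basic matchToCoastline function requires integer input - it is intended for filling of categorical data. Float32 rasters are handled by matchToCoastline_Float, which the main loop below selects according to the raster's data type. For clipping of floating point data the code in the **PopulationReallocator** notebook can also be used, with the reallocation function turned off.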
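The choice between the integer and float matchers can be made from the raster's numpy type, as the main loop does. The sketch below is a hypothetical helper. Note one difference: it accepts any numpy floating type, whereas the notebook loop accepts only float32 (and casts float data to float32 before calling matchToCoastline_Float).

```python
import numpy as np

def fill_type_for(np_type):
    """Hypothetical helper: 'int' means use matchToCoastline,
    'float' means use matchToCoastline_Float."""
    if np.issubdtype(np_type, np.integer):
        return "int"
    if np.issubdtype(np_type, np.floating):
        return "float"
    raise TypeError("Raster must be of integer or float type, not %s" % np.dtype(np_type))

print(fill_type_for(np.uint8))    # int
print(fill_type_for(np.float32))  # float
```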

In [1]:
from osgeo import gdal, gdal_array
import numpy as np
import glob
import os

### Load the cython function from external file

In [5]:
# if not already installed then build at the command line in the Cython_Raster_Funcs directory using 
#  python setup.py build_ext --inplace
# OR
# import pyximport
# pyximport.install()
from raster_utilities.template_matching import matchToCoastline, matchToCoastline_Float


In [7]:
from raster_utilities.io.tiff_management import SaveLZWTiff

## Basic usage

### Clip one or more files, with or without also filling

#### Configure files here

In [13]:
coastlineFile = r'G:\Supporting\CoastGlobal_5k.tif'
#coastlineFile = r'\\map-fs1.ndph.ox.ac.uk\map_data\mastergrids\Global_Masks\Land_Sea_Masks\CoastGlobal.tiff'

In [14]:
# get the files to process 
#inFilePattern = r'C:\Temp\dataprep\gbd2016_polygons\from_map_db\rds_*_adm_2.tif'
#inFiles = glob.glob(inFilePattern)
inFiles = glob.glob(r'\\map-fs1.ndph.ox.ac.uk\map_data\Madagascar_PfPR\Covariate_data\Monthly_variables\TSI_New\*.tif')

# or for one file only
#inFiles = [r'C:\Temp\dataprep\gbd2016_polygons\from_map_db\rds_gaul_adm_1_with_placeholder_in_gaps_only.tif']
#inFiles = [r'C:\Temp\testagg\mos\Global_Urban_Footprint_5km_UrbanLikeAdjacency_unmatched.tif']
#inFiles = [r'C:\Temp\dataprep\ihme_2016\GBD2016_analysis_polygons_with_pop_level_priority.tif']

# set the output folder
outDir = r'C:\Temp\dataprep\ihme_2016'

In [16]:
# Open the coastline data (matching resolution)
landDS = gdal.Open(coastlineFile)
bLand = landDS.GetRasterBand(1)
ndvLand = bLand.GetNoDataValue()
gtLand = landDS.GetGeoTransform()

#### Configure the fill parameters

Specify whether to clip, fill, or both:

In [17]:
applyClip = 1
applyFill = 1

Set the useNearestNValues variable to 1 to use the nearest cell value, like Nibble. Set to a number greater than 1 to use the average of several nearby data cells. (Not relevant if applyFill == 0)

In [18]:
useNearestNValues = 10
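The sketch below shows how a single fill value could be derived from the candidate cells found around one nodata cell, for useNearestNValues = 1 versus a larger value. It is hypothetical: the name nearest_n_value, the rounding of integer means, and the rule of averaging all candidates when fewer than N are found are assumptions. The notebook does not document how matchToCoastline handles these cases.

```python
def nearest_n_value(candidates, n, is_int=True):
    """Hypothetical sketch. candidates: (squared_distance, value) pairs for the data
    cells found within the search radius of one cell to be filled. Returns the
    nearest value when n == 1, otherwise the mean of the n nearest (rounded for
    integer data), or None when nothing was found (the error-flag case)."""
    if not candidates:
        return None
    # sorted() is stable, so equally distant cells keep their input order
    nearest = sorted(candidates, key=lambda pair: pair[0])[:n]
    values = [value for _, value in nearest]
    mean = float(sum(values)) / len(values)
    return int(round(mean)) if is_int else mean

cands = [(4, 30), (1, 10), (9, 100), (2, 20)]
print(nearest_n_value(cands, 1))    # 10, like Nibble
print(nearest_n_value(cands, 3))    # 20, the mean of 10, 20 and 30
print(nearest_n_value(cands, 10))   # 40: only four candidates, so all are averaged
print(nearest_n_value([], 10))      # None
```

For categorical codes, the rounded mean of several classes need not be one of those classes.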

In either case the fill-source pixel(s) must be found within a radius of this many pixels, otherwise no fill value will be found and the error flag will be set at the location in question. Setting this too high will cause the processing to be slower. (Not relevant if applyFill==0)

In [19]:
searchPixelRadius = 20
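To see why a large radius slows processing, count the pixels that may have to be examined for each cell being filled. The sketch below assumes a square search window of side 2r + 1; the actual window shape used by the Cython code is not stated here. Under that assumption, doubling the radius roughly quadruples the work per cell.

```python
def window_cells(radius):
    # pixels in a square window of +/- radius around the cell being filled
    # (assumed window shape)
    return (2 * radius + 1) ** 2

for r in (5, 10, 20, 40):
    print((r, window_cells(r)))  # (5, 121) (10, 441) (20, 1681) (40, 6561)
```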

### Run the code

In [None]:
for inFileName in inFiles:
    print inFileName

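    # output names are based on the input file name, but written into outDir
    # (SaveLZWTiff takes the folder and the file name as separate arguments)
    outDataFile = os.path.splitext(os.path.basename(inFileName))[0] + ".MG_Clipped.tif"
    outFailFile = os.path.splitext(os.path.basename(inFileName))[0] + ".MG_Clip_Errors.tif"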
    inDS = gdal.Open(inFileName)
    bData = inDS.GetRasterBand(1)
    ndvIn = bData.GetNoDataValue()
    if ndvIn is not None:
        print "Nodata incoming is "+str(ndvIn)
    gtIn = inDS.GetGeoTransform()
    projIn = inDS.GetProjection() 
    
    dTypeIn = bData.DataType
    incomingNPtype = gdal_array.GDALTypeCodeToNumericTypeCode(dTypeIn)
    fillType = None
    if issubclass(incomingNPtype, np.integer):
        fillType = "int"
    elif issubclass(incomingNPtype, np.float32):
        fillType = "float"
    else:
        raise TypeError("Raster must be of integer or float32 type!")
        
    # Ensure the resolutions match (not actually checking the alignment)
    # (or satisfy yourself that they're close, as sometimes the irrational 
    # number resolutions don't return equal)
    #  assert gtIn[1] == gtLand[1]
    # assert gtIn[5] == gtLand[5]
    
    # the input dataset is not necessarily global; where does it sit in the global coastline image?
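    # (round rather than truncate, so a quotient such as 3887.9999999 gives 3888, not 3887)
    landOffsetW = int(round((gtIn[0]-gtLand[0]) / gtLand[1]))
    landOffsetN = int(round((gtIn[3]-gtLand[3]) / gtLand[5]))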
    #print (landOffsetN, landOffsetW)
    
    # read the whole data file and upcast to long
    inData = bData.ReadAsArray()
    if fillType == "int":
        inData = inData.astype(long)
    else:
        inData = inData.astype(np.float32)
    inDS = None
    
    # read the relevant parts of the land file
    inLand = bLand.ReadAsArray(landOffsetW, landOffsetN, inData.shape[1], inData.shape[0])
    
    # Dataset-specific munging: 
    # some files have been created badly: they have one nodata value of e.g. -3.4e38, 
    # but ALSO contain pixels at -9999 which should be taken as nodata (probably some dumb
    # R or IDL code which always writes nodata as -9999 regardless of what the file is set to)
    # Replace those
    if ndvIn != -9999:
        if ((inData==-9999).any()):
            print "Image contains -9999 but this isn't recorded as nodata: recording it as such"
            inData[inData==-9999] = ndvIn
    
    # Alternatively or also, set ALL nodata pixels to -9999 
    # as a maxvalue +ve nodata value doesn't work reliably e.g. arcmap might 
    # report -1xxxIND for the stats
    # It doesn't even work reliably here (it's to do with the precision of the tiff format, 
    # most likely), or sometimes the large negative values get read as -inf (which fails equality 
    # test with everything) 
    # So just set all values beyond a reasonable limit to be nodata
    if np.abs(inData).max() > 10e7:
        print "Image appears to have extreme (nodata) values - resetting them to -9999"
        inData[np.abs(inData) > 10e7] = -9999
        ndvIn = -9999
    
    if fillType == "int":
        # Clip and / or fill (according to the options chosen above)
        errors = matchToCoastline(inData, inLand, _NDV=ndvIn, 
                                  applyClip=applyClip, applyFill=applyFill, 
                                  useNearestNPixels=useNearestNValues, 
                                  searchPixelRadius=searchPixelRadius) # fill source must lie within this many pixels
    else:
        errors = matchToCoastline_Float(inData, inLand, _NDV=ndvIn,
                                        applyClip = applyClip, applyFill = applyFill,
                                        useNearestNPixels=useNearestNValues,
                                        searchPixelRadius=searchPixelRadius)
    # write the data (the inData array was modified in-place)
    SaveLZWTiff(inData.astype(incomingNPtype), ndvIn, gtIn, projIn, outDir, outDataFile)
    
    # write the failures grid (no nodata)
    SaveLZWTiff(np.asarray(errors), None, gtIn, projIn, outDir, outFailFile)
#landDS = None

## Advanced use - match separately to a limits surface and a coastline

In this example we wanted to ensure that we have data within the entirety of the Pv limits surface (stable or unstable) but none in the sea. 

However we don't want data on _all_ land, so we run as a two-stage process: first we spread to the limits layer, to ensure that is fully covered, then separately clip to the coastline.

(Alternatively we could clip the limits layer to the coastline, then use that for a clip-and-fill on the data).
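The two approaches can be compared with boolean mask algebra on a toy grid, assuming every fill succeeds. The two-stage version keeps data wherever (limits OR existing data) AND land. The alternative keeps data only where limits AND land, so existing data on land outside the limits is also removed.

```python
import numpy as np

# toy 2 x 3 grids: True = inside the limits / on land / already has data
limits = np.array([[1, 1, 0], [0, 1, 1]], dtype=bool)
land = np.array([[1, 0, 1], [1, 1, 1]], dtype=bool)
has_data = np.array([[0, 0, 1], [1, 0, 0]], dtype=bool)

# two-stage: fill to the limits (no clip), then clip (no fill) to the coastline
two_stage = (limits | has_data) & land
# alternative: clip the limits to the coastline, then clip-and-fill to that mask
alternative = limits & land

print(two_stage.astype(int).tolist())    # [[1, 0, 1], [1, 1, 1]]
print(alternative.astype(int).tolist())  # [[1, 0, 0], [0, 1, 1]]
```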

This workflow assumes that the limits layer has at least as large an extent as the data.
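That assumption can be checked before reading anything, from each grid's geotransform and size (RasterXSize and RasterYSize in GDAL). The helper below is a hypothetical sketch, and it assumes the grids already share resolution and alignment.

```python
def extent_contains(gt_outer, size_outer, gt_inner, size_inner):
    """Hypothetical helper. size_* = (n_cols, n_rows). True if the inner grid's
    extent lies entirely within the outer grid's extent."""
    off_w = int(round((gt_inner[0] - gt_outer[0]) / gt_outer[1]))
    off_n = int(round((gt_inner[3] - gt_outer[3]) / gt_outer[5]))
    return (off_w >= 0 and off_n >= 0 and
            off_w + size_inner[0] <= size_outer[0] and
            off_n + size_inner[1] <= size_outer[1])

gt_lims = (-20.0, 0.5, 0.0, 40.0, 0.0, -0.5)   # toy limits grid, 140 x 160 pixels
gt_data = (-10.0, 0.5, 0.0, 30.0, 0.0, -0.5)   # toy data grid, 100 x 100 pixels
gt_wide = (-25.0, 0.5, 0.0, 30.0, 0.0, -0.5)   # starts further west than the limits
print(extent_contains(gt_lims, (140, 160), gt_data, (100, 100)))  # True
print(extent_contains(gt_lims, (140, 160), gt_wide, (100, 100)))  # False
```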

#### The coastal data should be global and the limits (fill-to) mask should be at least as large as the data file

In [None]:
# Set up the data locations
inDir = r'C:\Temp\PV_Alignment'
inFN = 'PvPR_Aug2015_2_Clip1_Copy.tif'
inLimsFN = 'PvPR_2010_5k.flt' # 'pvlims5f_5k.tif'

outFN = 'PvPR_Aug2015_2_Clip1_Copy_FilledToPvPRLims.tif'
outFailFN = 'PvPR_Aug2015_2_Clip1_Copy_PvPRLimsFillFailures.tif'

inFile = os.path.join(inDir, inFN)
inLimsFile = os.path.join(inDir, inLimsFN)

outDataFile = os.path.join(inDir, outFN)
outFailFile = os.path.join(inDir, outFailFN)

In [None]:
# Open the coastline data
landDS = gdal.Open(coastlineFile)
bLand = landDS.GetRasterBand(1)
ndvLand = bLand.GetNoDataValue()
gtLand = landDS.GetGeoTransform()

In [None]:
# Open the limits data
limsDS = gdal.Open(inLimsFile)
bLims = limsDS.GetRasterBand(1)
ndvMask = bLims.GetNoDataValue()
gtLims = limsDS.GetGeoTransform()

In [None]:
# Read the whole extent of the data
inDS = gdal.Open(inFile)
bData = inDS.GetRasterBand(1)
ndvIn = bData.GetNoDataValue()
gtIn = inDS.GetGeoTransform()
projIn = inDS.GetProjection()

In [None]:
# Ensure the resolutions match (not actually checking the alignment)
assert gtIn[1] == gtLand[1]
assert gtIn[5] == gtLand[5]

assert gtIn[1] == gtLims[1]
assert gtIn[5] == gtLims[5]

#### Work out which bits of the mask files we need

In [None]:
# the input dataset is not necessarily global; where does it sit in the global coastline image?
landOffsetW = int((gtIn[0]-gtLand[0]) / gtLand[1])
landOffsetN = int((gtIn[3]-gtLand[3]) / gtLand[5])
landOffsetN, landOffsetW

In [None]:
# the input dataset is not necessarily global; where does it sit in the maybe-global clipping image?
maskOffsetW = int((gtIn[0]-gtLims[0]) / gtLims[1])
maskOffsetN = int((gtIn[3]-gtLims[3]) / gtLims[5])
maskOffsetN, maskOffsetW

#### Read the data

In [None]:
# read all the data file
inData = bData.ReadAsArray().astype(long)
# read the relevant parts of the land and mask files
inLand = bLand.ReadAsArray(landOffsetW, landOffsetN, inData.shape[1], inData.shape[0])
inMask = bLims.ReadAsArray(maskOffsetW, maskOffsetN, inData.shape[1], inData.shape[0])

In [None]:
# prepare the limits data (specific to this dataset) - reclass the pv limits values into a 0-1 mask for data / nodata
# NB keeping "0" as "data"
#inMask[inMask!=3] = 10
#inMask[inMask==3] = 0
#inMask[inMask==10] = 1
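# anything that is not nodata in the limits becomes 1 ("data"), nodata becomes 0;
# a single comparison avoids relabelling everything if ndvMask happens to be 1
inMask = (inMask != ndvMask).astype(np.byte)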

#### Do the fill and clip

In [None]:
# run the spreading to generate values for all pixels of the limits surface
# but do not clip to it, because the new Pv data covers more areas (e.g. algeria)
# than the limits - we don't want to delete that data.
fillErrors = matchToCoastline(inData, inMask, _NDV=ndvIn, 
                              applyClip=0, 
                              applyFill=1
                             )

In [None]:
# now clip the data to the coastline, but do not fill (as we don't want to fill into all land)
clipErrors = matchToCoastline(inData, inLand, _NDV=ndvIn, 
                              applyClip=1, 
                              applyFill=0)

#### Save out the data

In [None]:
# write the data
SaveLZWTiff(inData, ndvIn, gtIn, projIn, inDir, outFN)

# write the failures grid (no nodata)
SaveLZWTiff(np.asarray(fillErrors), None, gtIn, projIn, inDir, outFailFN)

## Advanced Usage 2 - match multiple files of different sizes

Here we have multiple input / data files that must be matched to a single limits layer. We want data in the cells of the limits layer and *only* those cells, i.e. we want to both fill to and clip to the limits layer.

(The limits layer was in fact created by clipping to the coastline using the first workflow above, so we are then effectively clipping to the coastline at the same time).

The limits layer is not global, and some of the data files may cover a larger extent than the limits, so each time we need to read the limits into the appropriate part of an array the same size as the data file.
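The index arithmetic for this can be worked out separately from GDAL. The offsets of the data within the limits may be negative, which happens when the data extends further north or west. The overlapping block therefore starts at a different position in each array, and its size is bounded by whichever grid ends first.

The sketch below uses a hypothetical helper name and small numpy arrays standing in for the rasters. With GDAL, the limits block would be read with bLims.ReadAsArray(lims_col, lims_row, n_cols, n_rows); note that column offset and width come first in that call.

```python
import numpy as np

def overlap_window(off_n, off_w, data_shape, lims_shape):
    """Hypothetical helper. off_n / off_w: position of the data's top-left pixel in
    the limits grid (negative if the data extends further north / west).
    Returns (data_row, data_col, lims_row, lims_col, n_rows, n_cols)."""
    data_row, lims_row = (-off_n, 0) if off_n < 0 else (0, off_n)
    data_col, lims_col = (-off_w, 0) if off_w < 0 else (0, off_w)
    # the block stops at whichever grid's S / E edge comes first
    n_rows = min(data_shape[0] - data_row, lims_shape[0] - lims_row)
    n_cols = min(data_shape[1] - data_col, lims_shape[1] - lims_col)
    return data_row, data_col, lims_row, lims_col, max(n_rows, 0), max(n_cols, 0)

ndv = -1
lims = np.arange(20).reshape(4, 5)   # stand-in for the limits raster
mask = np.full((5, 5), ndv)          # same shape as the data, pre-filled with nodata
# data starts 1 row north of the limits, and 2 columns east of the limits' W edge
win = overlap_window(-1, 2, mask.shape, lims.shape)
dr, dc, lr, lc, nr, nc = win
mask[dr:dr + nr, dc:dc + nc] = lims[lr:lr + nr, lc:lc + nc]
print(win)              # (1, 0, 0, 2, 4, 3)
print(mask.tolist()[1]) # [2, 3, 4, -1, -1]
```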

In [None]:
inFilePattern = r'C:\Users\zool1301\Documents\Dial-A-Map\PWG-20160324-FairlyUrgentButNotTooOnerous\*.tif'
inFiles = glob.glob(inFilePattern)
outDir = r'C:\Users\zool1301\Documents\Dial-A-Map\PWG-20160324-FairlyUrgentButNotTooOnerous\Processed'

In [None]:
# the ITN file will be used as our limits layer: find it and remove it from the list
# of data files, so that it is not processed itself
limsIndex = inFiles.index(r'C:\Users\zool1301\Documents\Dial-A-Map\PWG-20160324-FairlyUrgentButNotTooOnerous\2015.ITN.use.yearavg.adj.stable.tif')

In [None]:
limsFile = inFiles.pop(limsIndex)

In [None]:
# Open the limits data
limsDS = gdal.Open(limsFile)
bLims = limsDS.GetRasterBand(1)
ndvMask = bLims.GetNoDataValue()
gtLims = limsDS.GetGeoTransform()

for inFile in inFiles:
    # Read the whole extent of the data each time
    print "__________________________________________"
    print inFile
    inDS = gdal.Open(inFile)
    bData = inDS.GetRasterBand(1)
    ndvIn = bData.GetNoDataValue()
    gtIn = inDS.GetGeoTransform()
    projIn = inDS.GetProjection()

    # Ensure the resolutions match (not actually checking the alignment)
    assert gtIn[1] == gtLims[1]
    assert gtIn[5] == gtLims[5]
    # read all the data file
    inData = bData.ReadAsArray().astype(np.float32)
    inDataOrig = bData.ReadAsArray()
    
    # Create an array for the limits that is the same size as the data, then read the appropriate 
    # part of the limits into the appropriate part of this array
    inMask = np.empty(shape=inData.shape)
    inMask[:] = ndvMask
    
    # how many pixels from the W edge of the limits does the W edge of the data sit?
    # This will be negative if the data goes further west than the limits
    maskOffsetW = int((gtIn[0] - gtLims[0]) / gtLims[1])
    # how many pixels from the N edge of the limits does the N edge of the data sit?
    # This will be negative if the data goes further north than the limits
    maskOffsetN = int((gtIn[3] - gtLims[3]) / gtLims[5])
    print "Mask offsets are "+str((maskOffsetN, maskOffsetW))
    
    # find the top left corner (in the data array) where the limits start, and clamp
    # the read offsets into the limits to be non-negative
    if maskOffsetN < 0:
        # the data goes further north than the limits: its first -maskOffsetN rows
        # have no corresponding limits pixels
        maskInDataOffsetN = -maskOffsetN
        maskOffsetN = 0
    else:
        maskInDataOffsetN = 0
    
    if maskOffsetW < 0:
        # likewise for data that goes further west than the limits
        maskInDataOffsetW = -maskOffsetW
        maskOffsetW = 0
    else:
        maskInDataOffsetW = 0
    
    dataYSize, dataXSize = inData.shape
    maskYSize = limsDS.RasterYSize
    maskXSize = limsDS.RasterXSize
    
    # find how large a block of the limits we can read: it must fit both in what is
    # left of the data array after its offset and in what is left of the limits
    # after the read offset (i.e. it stops at whichever E / S edge comes first)
    maskReadXSize = min(dataXSize - maskInDataOffsetW, maskXSize - maskOffsetW)
    maskReadYSize = min(dataYSize - maskInDataOffsetN, maskYSize - maskOffsetN)
        
    if (maskInDataOffsetN > 0 or maskInDataOffsetW > 0):
        print inFile + " has greater extent than limits, reading mask as offset!"
    
    # read the relevant part of the mask file into the relevant part of the pre-prepared 
    # mask array (leaving any other part as nodata)
    inMask[maskInDataOffsetN:maskInDataOffsetN + maskReadYSize,
           maskInDataOffsetW:maskInDataOffsetW + maskReadXSize] = bLims.ReadAsArray(
    maskOffsetW, maskOffsetN, maskReadXSize, maskReadYSize)
    
    # specific task for this mask image: the mask image wasn't actually a mask; 
    # we just want to use anywhere that it wasn't nodata as our mask, so reclass it
    # a single comparison avoids relabelling everything if ndvMask happens to be 1
    inMask = (inMask != ndvMask).astype(np.byte)
    
    # Specific task for these data images: 
    # We have been asked to set all locations within the limits, where the data image 
    # is nodata, to zero, rather than filling based on nearest neighbours.
    # Also some images have nodata as -inf which doesn't play nicely with equality test,
    # so look for anything with an enormous negative value.
    inData[np.logical_and(np.logical_or(inData==ndvIn, inData<-1e100),
                          inMask == 1)] = 0
    
    # Spread and clip to the mask dataset only (the fill should not result in any 
    # cells being filled, due to the above step: we could turn it off)
    # inData was read as float32, so use the float version of the matcher
    errors = matchToCoastline_Float(inData, inMask, _NDV=ndvIn, applyClip=1, applyFill=1)
    
    outDataFN = os.path.basename(inFile).replace(".tif", ".MG_Matched.LimsClipped.tif")
    # write the data into outDir - the geotransform is unchanged as we worked with
    # the data extent, not the mask extent
    SaveLZWTiff(inData, ndvIn, gtIn, projIn, outDir, outDataFN)