# Raster data manipulation with GDAL

In this tutorial, we will walk through examples of manipulating raster datasets with GDAL. We will focus on these common operations:

1. Format conversion
2. Cropping 
3. Oversampling/regridding
4. Multilooking

Note that we will not discuss coordinate transformation or projection conversion in this tutorial. We will discuss those topics in a separate tutorial.

## Before we start

This notebook assumes you have run the data download steps from the previous notebook '01_IntroToRasterData'. <b>If you have not yet done so, most of this notebook will not work!</b> So make sure that you have completed the data downloads from that notebook before proceeding.

In [None]:
###The usual python imports for the notebook
%matplotlib inline
from osgeo import gdal
import matplotlib.pyplot as plt
import numpy as np

gdal.UseExceptions()

#Utility function to load data
def loadData(infile, band=1):
    ds = gdal.Open(infile, gdal.GA_ReadOnly)
    #Data array
    data = ds.GetRasterBand(band).ReadAsArray()
    #Map extent
    trans = ds.GetGeoTransform()
    xsize = ds.RasterXSize
    ysize = ds.RasterYSize
    extent = [trans[0], trans[0] + xsize * trans[1],
            trans[3] + ysize*trans[5], trans[3]]
    
    ds = None
    return data, extent
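
The `extent` returned by `loadData` follows matplotlib's `[left, right, bottom, top]` ordering and is computed from the six-element geotransform. Here is a minimal sketch of the same arithmetic, assuming a north-up raster with no rotation terms. The function name and the geotransform values are hypothetical and are not taken from the tutorial data.

```python
def extent_from_geotransform(trans, xsize, ysize):
    """Return [left, right, bottom, top] for a north-up raster."""
    left = trans[0]                        # x of the top-left corner
    right = trans[0] + xsize * trans[1]    # move xsize pixels to the east
    top = trans[3]                         # y of the top-left corner
    bottom = trans[3] + ysize * trans[5]   # trans[5] is negative for north-up rasters
    return [left, right, bottom, top]

# Hypothetical 100 x 50 raster with 0.5 unit pixels and origin at (10, 40)
example_trans = (10.0, 0.5, 0.0, 40.0, 0.0, -0.5)
example_extent = extent_from_geotransform(example_trans, 100, 50)
print(example_extent)
```

Passing this list to `plt.imshow(..., extent=...)` labels the axes in map coordinates instead of pixel indices.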

## gdal_translate

We will use the **gdal_translate** utility, either as an executable or programmatically, for all of the data manipulation operations listed above. For the list of options supported by **gdal_translate**, see

http://www.gdal.org/gdal_translate.html

```bash
gdal_translate [--help-general]
       [-ot {Byte/Int16/UInt16/UInt32/Int32/Float32/Float64/
             CInt16/CInt32/CFloat32/CFloat64}] [-strict]
       [-of format] [-b band]* [-mask band] [-expand {gray|rgb|rgba}]
       [-outsize xsize[%]|0 ysize[%]|0] [-tr xres yres]
       [-r {nearest,bilinear,cubic,cubicspline,lanczos,average,mode}]
       [-unscale] [-scale[_bn] [src_min src_max [dst_min dst_max]]]* [-exponent[_bn] exp_val]*
       [-srcwin xoff yoff xsize ysize] [-epo] [-eco]
       [-projwin ulx uly lrx lry] [-projwin_srs srs_def]
       [-a_srs srs_def] [-a_ullr ulx uly lrx lry] [-a_nodata value]
       [-a_scale value] [-a_offset value]
       [-gcp pixel line easting northing [elevation]]*
       [-colorinterp{_bn} {red|green|blue|alpha|gray|undefined}]
       [-colorinterp {red|green|blue|alpha|gray|undefined},...]
       [-mo "META-TAG=VALUE"]* [-q] [-sds]
       [-co "NAME=VALUE"]* [-stats] [-norat]
       [-oo NAME=VALUE]*
       src_dataset dst_dataset
```
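
When the executable is driven from Python rather than typed into a shell, it can help to assemble the argument list in one place. The sketch below is one possible way to do that. The helper name `build_translate_cmd` and the file names are hypothetical, and only flags that appear in the usage text above are used. The actual call to the executable is left commented out so the block runs without GDAL installed.

```python
import shlex

def build_translate_cmd(src, dst, fmt="GTiff", band=None, nodata=None, creation_opts=()):
    """Assemble a gdal_translate argument list (hypothetical helper)."""
    cmd = ["gdal_translate", "-of", fmt]
    if band is not None:
        cmd += ["-b", str(band)]            # select a single band
    if nodata is not None:
        cmd += ["-a_nodata", str(nodata)]   # assign a NoData value
    for opt in creation_opts:
        cmd += ["-co", opt]                 # format-specific creation options
    cmd += [src, dst]
    return cmd

cmd = build_translate_cmd("input.vrt", "output.tif", band=2, nodata=0,
                          creation_opts=["COMPRESS=DEFLATE"])
print(shlex.join(cmd))

# To actually run it (requires the gdal_translate executable on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```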

## 1. Format conversion

Converting raster data from one format to another is a common requirement to work with legacy code. If the software was built with GDAL support, this step would be optional. But we all have software written to work with flat binary files, single band geotiffs or GMT grd files. Here are some examples of using **gdal_translate** to convert formats.

### a. Converting single band images

In [None]:
#Convert DEM from GMT format to geotiff
!gdal_translate -of GTiff 18.grd 18.tif

In [None]:
#Load and compare data from both formats
gmt, gmtext = loadData('18.grd')
tif, tifext = loadData('18.tif')
plt.figure('GMT vs TIFF')
plt.subplot(1,2,1)
plt.imshow(gmt, clim=[-7,2690], extent=gmtext, cmap='gray')
plt.subplot(1,2,2)
plt.imshow(tif, clim=[-7,2690], extent=tifext, cmap='gray')
plt.show()
gmt = None
tif = None

<br>
<div class="alert alert-info">
<b>Note :</b>

Running gdalinfo on the converted file also reports a **tif.aux.xml** file. GDAL supports translating the basic raster information from one format to another, but each format has its own custom method of storing metadata, which may not be compatible with other formats. In such cases, GDAL creates an aux.xml file and dumps the metadata into it.

</div>

### b. Converting a single band and assigning NoDataValue

In [None]:
#Extract coherence layer
!gdal_translate -of GTiff -b 2 -a_nodata 0 stripmap/interferogram/topophase.cor.geo.vrt stripmap/interferogram/coherence.geo.tif

In [None]:
#Look at the output
!gdalinfo stripmap/interferogram/coherence.geo.tif

### c. Creation options - format specific features

Each data format has its own special features. For example, many formats include support for data compression to save disk space. When translating data to these formats, some of these options can be enabled by adding **-co** options. These options are specific to formats and can be discovered via their documentation pages. For example:

* GeoTiff: http://www.gdal.org/frmt_gtiff.html
* netCDF: http://www.gdal.org/frmt_netcdf.html

In [None]:
#Create a compressed coherence file stored as 16-bit floating point
!gdal_translate -of GTiff -b 2 -a_nodata 0 stripmap/interferogram/topophase.cor.geo.vrt stripmap/interferogram/coherence_compressed.geo.tif -co "NBITS=16" -co "COMPRESS=DEFLATE" 

In [None]:
#Look at the file sizes
!ls -ltr stripmap/interferogram/*.tif

In [None]:
#Look at the output
!gdalinfo stripmap/interferogram/coherence_compressed.geo.tif
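
Storing coherence with `NBITS=16` trades precision for disk space. To our understanding, for a Float32 band this asks the GeoTIFF driver for half-precision floating point samples. As a rough illustration of what that costs, the sketch below uses NumPy's `float16` as a stand-in for half-precision storage and measures the rounding error on synthetic values in [0, 1]. It does not read the tutorial files.

```python
import numpy as np

# Synthetic coherence-like values covering [0, 1]
coh = np.linspace(0.0, 1.0, 10001, dtype=np.float32)

# Round-trip through half precision, as a stand-in for 16-bit float storage
coh_half = coh.astype(np.float16).astype(np.float32)

max_err = float(np.max(np.abs(coh_half - coh)))
print("largest rounding error:", max_err)
```

For values bounded by 1, the error stays well below typical coherence estimation noise, which is why half precision is a reasonable choice for this layer.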

### d. Flat binary files

GDAL supports a fairly large number of raw binary data formats - ENVI, SRTM, ROI_PAC, ISCE etc. A lot of these drivers are user contributed and are built on top of basic raw data handling mechanisms in GDAL. ENVI is widely used in industry and is possibly the most tested of the raw data formats. If you absolutely need to use raw binary formats for your intermediate products - we recommend that you use the ENVI format.

Support for other raw formats, especially ROI_PAC and ISCE, is limited and can get out of sync with the software generating them.

## 2. Cropping

Data cropping is a very common operation. GDAL allows us to crop data using map coordinates as well as line/pixel locations. It even allows map locations in a coordinate system other than the one used by the source raster.

<br>
<div class="alert alert-info">
<b>Note :</b>

You can use VRT files to save disk space. You don't have to create cropped raster images unless absolutely needed.

</div>

### a. Cropping using map coordinates

In [None]:
#Cropping in same coordinate system as source raster
# ulx uly lrx lry
!gdal_translate -of VRT -b 2 -projwin -154.95 19.6 -154.75 19.4 stripmap/interferogram/topophase.cor.geo.vrt stripmap/coherence_crop.vrt

In [None]:
#Compare original and cropped data
orig, origext = loadData('stripmap/interferogram/topophase.cor.geo.vrt', band=2)
crop, cropext = loadData('stripmap/coherence_crop.vrt')

plt.figure('Source vs Crop')
plt.subplot(1,2,1)
plt.imshow(orig, clim=[0., 1.], extent=origext, cmap='gray')
plt.subplot(1,2,2)
plt.imshow(crop, clim=[0., 1.], extent=cropext, cmap='gray')
plt.show()

orig = None
crop= None
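
Behind the scenes, a `-projwin` window in the raster's own coordinate system is mapped to a pixel window through the geotransform. The sketch below shows that arithmetic for a north-up raster. The function name and the geotransform are hypothetical, and the rounding is simplified, so it may differ from **gdal_translate** by a pixel at the edges.

```python
def projwin_to_srcwin(trans, ulx, uly, lrx, lry):
    """Convert map-coordinate corners to (xoff, yoff, xsize, ysize)."""
    xoff = round((ulx - trans[0]) / trans[1])
    yoff = round((uly - trans[3]) / trans[5])   # trans[5] < 0 for north-up rasters
    xsize = round((lrx - ulx) / trans[1])
    ysize = round((lry - uly) / trans[5])
    return xoff, yoff, xsize, ysize

# Hypothetical raster: origin at (-156, 20), 0.001 degree pixels
hypothetical_trans = (-156.0, 0.001, 0.0, 20.0, 0.0, -0.001)
srcwin = projwin_to_srcwin(hypothetical_trans, -154.95, 19.6, -154.75, 19.4)
print(srcwin)
```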

### b. Cropping using map coordinates but different coordinate system

We will use a DEM of Greenland in Polar Stereographic projection (EPSG:3413) for this example. This is accomplished using the **-projwin_srs** argument to **gdal_translate**

In [None]:
#Look at the input data
!gdalinfo Greenland1km.nc

In [None]:
#Crop DEM using lat/lon coordinates
# ulx uly lrx lry
!gdal_translate -of VRT -projwin -55 71 -50 69 -projwin_srs EPSG:4326 NETCDF:"Greenland1km.nc":topg Greenland_crop.vrt

In [None]:
#Compare original and cropped data
orig, origext = loadData('NETCDF:"Greenland1km.nc":topg')
crop, cropext = loadData('Greenland_crop.vrt')

plt.figure('Source vs Crop 2')
plt.subplot(1,2,1)
plt.imshow(orig, clim=[0., 3000.], extent=origext)
plt.subplot(1,2,2)
plt.imshow(crop, clim=[0., 3000.], extent=cropext)
plt.show()

orig = None
crop= None

### c. Cropping using line/pixel locations

One can also crop images using the line and pixel locations. This is particularly useful when working with rasters that are not geocoded. In our case, cropping radar geometry products can be accomplished using this approach. One should use the **-srcwin** argument to specify the region of interest.

In [None]:
#-srcwin xoff yoff xsize ysize
!gdal_translate -of VRT -srcwin 1400 1700 1200 1200 N33W119_wgs84.tif SRTM_crop.vrt

In [None]:
#Compare original and cropped data
orig, origext = loadData('N33W119_wgs84.tif')
crop, cropext = loadData('SRTM_crop.vrt')

plt.figure('Source vs Crop 3')
plt.subplot(1,2,1)
plt.imshow(orig, clim=[0., 1000.], extent=origext)
plt.subplot(1,2,2)
plt.imshow(crop, clim=[0., 1000.], extent=cropext)
plt.show()

orig = None
crop= None

Remember the raster layout from the tutorial on **Introduction to Raster Data**. **xoff, yoff** refer to the top-left corner of the first pixel of the region of interest. **xsize, ysize** refer to the number of pixels and lines in the region of interest.
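
Because **xoff, yoff** point at the top-left corner of the first pixel, the geotransform of a cropped raster is the source geotransform with its origin shifted by that many pixels and lines. A minimal sketch of this relationship follows. The function name and the geotransform values are hypothetical, chosen to resemble a one arc-second tile.

```python
def crop_geotransform(trans, xoff, yoff):
    """Geotransform of a window that starts at pixel xoff, line yoff."""
    x0 = trans[0] + xoff * trans[1] + yoff * trans[2]
    y0 = trans[3] + xoff * trans[4] + yoff * trans[5]
    # Pixel spacing and rotation terms are unchanged by cropping
    return (x0, trans[1], trans[2], y0, trans[4], trans[5])

step = 1.0 / 3600.0   # one arc-second in degrees
hypothetical_trans = (-119.0, step, 0.0, 34.0, 0.0, -step)
cropped = crop_geotransform(hypothetical_trans, 1400, 1700)
print(cropped)
```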

## 3. Oversampling / regridding

Oversampling or regridding is another common data manipulation operation. We often need to regrid data from different sources to use them within the same tools. A typical example is the oversampling of a DEM. GDAL provides efficient support for certain types of interpolators to oversample or regrid the data. By default, the following interpolators are available:

1. Nearest neighbor
2. Bilinear
3. Cubic
4. Cubic spline
5. Lanczos (Truncated sinc)
6. Average
7. Mode

The output format can again be VRT if needed. This is recommended if the oversampled / regridded data is an intermediate product that will not be used again and again. When the VRT output format is used, resampling/regridding occurs on the fly when the dataset is read in.

### a. Regridding with sample spacing

One can directly provide sample spacing for regridding the data using the **-tr** argument.

In [None]:
#Oversample greenland dem to 500m
!gdal_translate -of GTiff -tr 500 500 -r cubicspline NETCDF:"Greenland1km.nc":topg Greenland_500m.tif

In [None]:
#Check the pixel spacing if you want!
!gdalinfo Greenland_500m.tif
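
The output dimensions reported by gdalinfo can be anticipated from the source size and the requested spacing: the map extent stays the same, so the pixel count scales by the ratio of old to new spacing. The sketch below shows that arithmetic. The function name, raster size and geotransform are hypothetical, and the rounding used by **gdal_translate** may differ slightly.

```python
def size_for_resolution(xsize, ysize, trans, xres, yres):
    """Approximate output size (pixels, lines) for a target spacing."""
    out_x = int(xsize * abs(trans[1]) / xres + 0.5)
    out_y = int(ysize * abs(trans[5]) / yres + 0.5)
    return out_x, out_y

# Hypothetical 1 km grid of 1500 x 2800 pixels
hypothetical_trans = (-800000.0, 1000.0, 0.0, -600000.0, 0.0, -1000.0)
new_size = size_for_resolution(1500, 2800, hypothetical_trans, 500.0, 500.0)
print(new_size)
```

Halving the spacing doubles each dimension, so the output holds four times as many samples as the input.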

In [None]:
#Compare original and resampled data
orig, origext = loadData('NETCDF:"Greenland1km.nc":topg')
grid, gridext = loadData('Greenland_500m.tif')

plt.figure('Greenland')
plt.subplot(1,2,1)
plt.imshow(orig, clim=[0., 3000.], extent=origext)
plt.subplot(1,2,2)
plt.imshow(grid, clim=[0., 3000.], extent=gridext)
plt.show()

orig = None
grid = None

### b. Regridding with output size

One can also directly specify the expected output size and **gdal_translate** will automatically compute the corresponding pixel spacing.

In [None]:
#Greenland DEM downsampled to 60, 110
!gdal_translate -of GTiff -outsize 60 110 -r nearest NETCDF:"Greenland1km.nc":topg Greenland_subsample.tif

In [None]:
#Check the pixel spacing if you want!
!gdalinfo Greenland_subsample.tif
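
The pixel spacing that results from a fixed output size follows from keeping the map extent constant. A minimal sketch under that assumption is shown below. The function name, raster size and geotransform are hypothetical.

```python
def spacing_for_outsize(xsize, ysize, trans, out_x, out_y):
    """Pixel spacing after resizing to out_x by out_y over the same extent."""
    new_dx = trans[1] * xsize / out_x
    new_dy = trans[5] * ysize / out_y   # stays negative for north-up rasters
    return new_dx, new_dy

# Hypothetical 1 km grid of 1500 x 2800 pixels
hypothetical_trans = (-800000.0, 1000.0, 0.0, -600000.0, 0.0, -1000.0)
dx, dy = spacing_for_outsize(1500, 2800, hypothetical_trans, 60, 110)
print(dx, dy)
```

Note that an arbitrary output size generally gives different spacing in x and y. As far as we know, passing `0` for one of the two **-outsize** values lets GDAL derive it from the other while preserving the aspect ratio.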

In [None]:
#Compare original and resampled data
orig, origext = loadData('NETCDF:"Greenland1km.nc":topg')
grid, gridext = loadData('Greenland_subsample.tif')

plt.figure('Greenland 2')
plt.subplot(1,2,1)
plt.imshow(orig, clim=[0., 3000.], extent=origext)
plt.subplot(1,2,2)
plt.imshow(grid, clim=[0., 3000.], extent=gridext)
plt.show()

orig = None
grid = None

### c. Regridding with relative size

Alternately, **-outsize** can also accept relative size in percentages. 

In [None]:
!gdal_translate -of GTiff -outsize 10% 10% NETCDF:"Greenland1km.nc":topg Greenland_10perc.tif

In [None]:
#Check the pixel spacing if you want!
!gdalinfo Greenland_10perc.tif

In [None]:
#Compare original and resampled data
orig, origext = loadData('NETCDF:"Greenland1km.nc":topg')
grid, gridext = loadData('Greenland_10perc.tif')

plt.figure('Greenland 3')
plt.subplot(1,2,1)
plt.imshow(orig, clim=[0., 3000.], extent=origext)
plt.subplot(1,2,2)
plt.imshow(grid, clim=[0., 3000.], extent=gridext)
plt.show()

orig = None
grid = None

## 4. Multilooking

Multilooking is a very common operation in SAR / InSAR processing. In most common cases, data is reduced using a simple box car filter. This can be easily accomplished by manipulating the **-outsize** and **-srcwin** parameters. Here, we will set up a simple python function to multilook the data using GDAL.

In [None]:
def multiLook(infile, outfile, fmt='GTiff', xlooks=None, ylooks=None, noData=None, method='average'):
    '''
    infile - Input file to multilook
    outfile - Output multilooked file
    fmt - Output format
    xlooks - Number of looks in x/range direction
    ylooks - Number of looks in y/azimuth direction
    '''
    ds = gdal.Open(infile, gdal.GA_ReadOnly)

    #Input file dimensions
    xSize = ds.RasterXSize
    ySize = ds.RasterYSize

    #Output file dimensions
    outXSize = xSize//xlooks
    outYSize = ySize//ylooks

    ##Set up options for translation
    gdalTranslateOpts = gdal.TranslateOptions(format=fmt, 
                                              width=outXSize, height=outYSize,
                                             srcWin=[0,0,outXSize*xlooks, outYSize*ylooks],
                                             noData=noData, resampleAlg=method)

    #Call gdal_translate
    gdal.Translate(outfile, ds, options=gdalTranslateOpts)       
    ds = None

We will try using this function on some real valued datasets.

### a. Real valued data

In [None]:
#Multilook the coherence file and then visualize the output
multiLook('stripmap/interferogram/topophase.cor.vrt', 'stripmap/coherence_looks.tif', 
          xlooks=5, ylooks=5, noData=0)

orig, origext = loadData('stripmap/interferogram/topophase.cor.vrt', band=2)
grid, gridext = loadData('stripmap/coherence_looks.tif', band=2)

plt.figure('Multilook')
plt.subplot(1,2,1)
plt.imshow(orig, clim=[0., 1.], extent=origext, cmap='gray')
plt.subplot(1,2,2)
plt.imshow(grid, clim=[0., 1.], extent=gridext, cmap='gray')
plt.show()

orig = None
grid = None
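
As a cross-check on what box-car multilooking computes, the same reduction can be sketched in NumPy by trimming the array to a multiple of the look counts and averaging each block. This is an independent illustration on a small synthetic array, not the GDAL implementation. It ignores NoData handling, and the function name is hypothetical.

```python
import numpy as np

def boxcar_multilook(data, xlooks, ylooks):
    """Average non-overlapping ylooks x xlooks blocks of a 2D array."""
    out_y = data.shape[0] // ylooks
    out_x = data.shape[1] // xlooks
    # Drop trailing lines/pixels that do not fill a complete block
    trimmed = data[:out_y * ylooks, :out_x * xlooks]
    blocks = trimmed.reshape(out_y, ylooks, out_x, xlooks)
    return blocks.mean(axis=(1, 3))

demo = np.arange(24, dtype=np.float32).reshape(4, 6)
looked = boxcar_multilook(demo, xlooks=3, ylooks=2)
print(looked)
```

The trimming step plays the same role as the `srcWin` argument in `multiLook`: both restrict the input to a whole number of look windows.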

### b. Complex valued data

<br>
<div class="alert alert-info">
<b>Note :</b>
<ul>
<li> GDAL has been highly optimized for use with real valued datasets. Some features, like the average filter, are still being implemented for complex data. We will use a workaround by treating the real/imag parts as separate real valued channels. </li>

<li> GDAL support for NoDataValue for complex datasets is still evolving. Currently, this is only implemented in the C++ api. Hopefully, this will be exposed to the users in the near future. </li>

<li> Complex data support will hopefully be implemented in the next 6 months.</li>
</ul>
</div>

In [None]:
###This is a temporary fix. 
###Expect GDAL to support averaging and recognize nodata for complex data in near future
###We include this example to demonstrate the use of inmemory / temporary files

def multiLookCpx(infile, outfile, fmt='GTiff', xlooks=None, ylooks=None, noData=None, method='average'):
    '''
    infile - Input file to multilook
    outfile - Output multilooked file
    fmt - Output format
    xlooks - Number of looks in x/range direction
    ylooks - Number of looks in y/azimuth direction
    
    
    input cpx file
        |
    2 band real virtual
        |
    2 band real multilooked virtual
        |
    1 band complex virtual
        |
    output cpx file
        
    '''
    sourcexml = '''    <SimpleSource>
      <SourceFilename>{0}</SourceFilename>
      <SourceBand>1</SourceBand>
    </SimpleSource>'''.format(infile)
    
    ds = gdal.Open(infile, gdal.GA_ReadOnly)

    #Input file dimensions
    xSize = ds.RasterXSize
    ySize = ds.RasterYSize

    #Output file dimensions
    outXSize = xSize//xlooks
    outYSize = ySize//ylooks

    #Temporary filenames
    inmemfile = '/vsimem/cpxlooks.2band.vrt'
    inmemfile2 = '/vsimem/cpxlooks.multilooks.2band.vrt'
    inmemfile3 = '/vsimem/cpxlooks.combine.vrt'
    
    ##This is where we convert it to real bands and multilook
    #Create driver
    driver = gdal.GetDriverByName('VRT')
    rivrtds = driver.Create(inmemfile,xSize, ySize, 0)
    
    #Create realband
    options = ['subClass=VRTDerivedRasterBand',
               'PixelFunctionType=real',
               'SourceTransferType=CFloat32']
    rivrtds.AddBand(gdal.GDT_Float32, options)
    rivrtds.GetRasterBand(1).SetMetadata({'source_0' : sourcexml}, 'vrt_sources')
    
    #Create imagband
    options = ['subClass=VRTDerivedRasterBand',
               'PixelFunctionType=imag',
               'SourceTransferType=CFloat32']
    rivrtds.AddBand(gdal.GDT_Float32, options)
    rivrtds.GetRasterBand(2).SetMetadata({'source_0' : sourcexml}, 'vrt_sources')
    
    ##Add projection information
    rivrtds.SetProjection(ds.GetProjection())
    ds = None
    

    ##Set up options for translation
    gdalTranslateOpts = gdal.TranslateOptions(format='VRT', 
                                              width=outXSize, height=outYSize,
                                             srcWin=[0,0,outXSize*xlooks, outYSize*ylooks],
                                             noData=noData, resampleAlg=method)

    #Apply multilooking on real and imag channels
    mlvrtds = gdal.Translate(inmemfile2, rivrtds, options=gdalTranslateOpts)
    rivrtds = None
    mlvrtds = None
        
    #Write from memory to VRT using pixel functions
    mlvrtds = gdal.OpenShared(inmemfile2)
    cpxvrtds = driver.Create(inmemfile3, outXSize, outYSize, 0)
    cpxvrtds.SetProjection(mlvrtds.GetProjection())
    cpxvrtds.SetGeoTransform(mlvrtds.GetGeoTransform())


    options = ['subClass=VRTDerivedRasterBand',
               'pixelFunctionType=complex',
               'sourceTransferType=CFloat32']
    xmltmpl = '''    <SimpleSource>
      <SourceFilename>{0}</SourceFilename>
      <SourceBand>{1}</SourceBand>
    </SimpleSource>'''
    
    md = {'source_0': xmltmpl.format(inmemfile2, 1),
          'source_1': xmltmpl.format(inmemfile2, 2)}

    cpxvrtds.AddBand(gdal.GDT_CFloat32, options)
    cpxvrtds.GetRasterBand(1).SetMetadata(md, 'vrt_sources')
    mlvrtds = None
        
        
    ###Now create copy to format needed
    driver = gdal.GetDriverByName(fmt)
    outds = driver.CreateCopy(outfile, cpxvrtds)
    cpxvrtds = None
    
    outds = None
    gdal.Unlink(inmemfile)
    gdal.Unlink(inmemfile2)
    gdal.Unlink(inmemfile3)
            

In [None]:
import numpy as np
import os

#Multilook the flattened interferogram and then visualize the output
multiLookCpx('stripmap/interferogram/topophase.flat.vrt', 'stripmap/flattened_3x.tif', 
          xlooks=3, ylooks=3, noData='0')

orig, origext = loadData('stripmap/interferogram/topophase.flat.vrt')
grid, gridext = loadData('stripmap/flattened_3x.tif')

plt.figure('Multilook 2')
plt.subplot(1,2,1)
plt.imshow(np.angle(orig), clim=[-np.pi, np.pi], extent=origext, cmap='hsv')
plt.subplot(1,2,2)
plt.imshow(np.angle(grid), clim=[-np.pi, np.pi], extent=gridext, cmap='hsv')
plt.show()

orig = None
grid = None
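
The workaround of splitting the data into real and imaginary channels relies on averaging being a linear operation: averaging the two parts separately and recombining them gives the same result as averaging the complex samples directly. The NumPy sketch below checks this on a small synthetic interferogram. The helper name and the data are hypothetical, and GDAL is not involved.

```python
import numpy as np

def boxcar(data, xlooks, ylooks):
    """Average non-overlapping ylooks x xlooks blocks of a 2D array."""
    out_y = data.shape[0] // ylooks
    out_x = data.shape[1] // xlooks
    trimmed = data[:out_y * ylooks, :out_x * xlooks]
    return trimmed.reshape(out_y, ylooks, out_x, xlooks).mean(axis=(1, 3))

# Synthetic unit-amplitude interferogram with random phase
rng = np.random.default_rng(0)
phase = rng.uniform(-np.pi, np.pi, size=(6, 6))
ifg = np.exp(1j * phase).astype(np.complex64)

direct = boxcar(ifg, 3, 3)                                        # complex average
via_parts = boxcar(ifg.real, 3, 3) + 1j * boxcar(ifg.imag, 3, 3)  # two real averages

print(np.allclose(direct, via_parts))
```

The multilooked phase is then taken with `np.angle` of the averaged complex values, as in the plot above, rather than by averaging the wrapped phase itself.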

## Other features to keep an eye on

1. **gdaldem** is a utility that allows one to apply color palettes to raster images. It is very fast, can use custom color palettes, and is compatible with GMT's cpt files. (http://www.gdal.org/gdaldem.html)

2. **gdal_rasterize** allows users to rasterize shapefiles / vector formats. (http://www.gdal.org/gdal_rasterize.html)

3. **gdal_edit.py** allows users to edit raster metadata on the command line. (http://www.gdal.org/gdal_edit.html)