<center>
    <img src='./img/nsidc_logo.png'/>
</center>


# **Download, Crop, Resample and Plot Multiple GeoTIFFs**


## **1. Tutorial Overview**

This tutorial guides us through programmatically accessing and downloading NSIDC DAAC data to our local computer. We then crop and resample one GeoTIFF based on the extent and pixel size of another GeoTIFF, and plot one on top of the other.

We use two data sets from the NASA [MEaSUREs](https://nsidc.org/data/measures) (Making Earth System data records for Use in Research Environments) program as an example:

* [MEaSUREs Greenland Ice Mapping Project (GrIMP) Digital Elevation Model from GeoEye and WorldView Imagery, Version 2](https://nsidc.org/data/nsidc-0715/versions/2)
* [MEaSUREs Greenland Ice Velocity: Selected Glacier Site Velocity Maps from InSAR, Version 4](https://nsidc.org/data/nsidc-0481/versions/4)

### **Credits**

Jennifer Roebuck contributed to this tutorial.

For questions regarding the notebook, or to report problems, please create a new issue in the [NSIDC-Data-Tutorials repo](https://github.com/nsidc/NSIDC-Data-Tutorials/issues).

### **Objectives** 

1. Use the `earthaccess` library for authentication and to programmatically apply spatial and temporal filters to an NSIDC DAAC data set and download the matching files. 
2. Use the `gdal` and `osr` modules from the `osgeo` package to crop and resample one GeoTIFF based on the extent and pixel size of another GeoTIFF.
3. Use `rasterio` and `matplotlib` libraries to overlay one GeoTIFF on top of another.


### **Prerequisites**

To run this tutorial we will need an Earthdata Login for authentication and downloading the data. It is completely free. If we don't have one, we can register for one [here](https://urs.earthdata.nasa.gov/). We recommend using a .netrc file for storing our Earthdata Login username and password; instructions for setting one up can be found in Step 1 of this [Programmatic Data Access Guide](https://nsidc.org/data/user-resources/help-center/programmatic-data-access-guide#anchor-0). If we don't want to set one up, we will be prompted for our username and password during the tutorial.

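As a rough illustration of what a .netrc entry looks like, the sketch below writes a throwaway file to a temporary directory and reads it back with Python's standard `netrc` module. The machine name is the Earthdata Login host linked above; the username, password, and the `example.netrc` filename are placeholders chosen for this example, and the linked guide remains the reference for setting up the real file.

```python
import netrc
import os
import tempfile

# Placeholder credentials for illustration only; the real values belong in
# our own ~/.netrc and should never be shared or committed.
EXAMPLE_ENTRY = (
    "machine urs.earthdata.nasa.gov login my_username password my_password\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    netrc_path = os.path.join(tmpdir, "example.netrc")
    with open(netrc_path, "w") as f:
        f.write(EXAMPLE_ENTRY)

    # Parse the file and look up the entry for the Earthdata Login host.
    login, _account, password = netrc.netrc(netrc_path).authenticators(
        "urs.earthdata.nasa.gov"
    )

print("Found login:", login)
```
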
A basic understanding of Python may also be helpful for this tutorial.

### **Example of end product**

At the end of this tutorial, we will have produced a figure similar to the one below, which overlays velocity data on top of a digital elevation model:

<center>
    <img src='./img/example_geotiff_plot.png'/>
</center>
    

### **Time requirement**

This tutorial will take approximately 30 minutes to complete. 


## **2. Tutorial Steps**

### **Import libraries and classes**

We will use the following libraries:
1. `earthaccess` to authenticate, search and download NSIDC DAAC data 
2. `os` to list all the files we have downloaded 
3. `osgeo.gdal` and `osgeo.osr` to read the projection information and to crop and resample one of the GeoTIFFs 
4. `rasterio`, `affine`, and `numpy` to read the GeoTIFFs and set up a grid for plotting the data. 
5. `matplotlib` for plotting the data. 

In [None]:
import earthaccess
import os
from osgeo import gdal, osr
import rasterio
import matplotlib.pyplot as plt
import numpy as np
from affine import Affine

### **Authentication**

We need to set up our authentication using our Earthdata Login credentials. If we have a .netrc we can just run the cell below and it will automatically authenticate. If we don't have a .netrc we will be prompted for our Earthdata Login username and password. 

In [None]:
auth = earthaccess.login()

### **Search for data using spatial and temporal filters**
This tutorial assumes we already know which data sets we would like to download and that they contain data in GeoTIFF format. Each data set at NSIDC has a data set ID associated with it. We will look at two data sets focused on Greenland, a Digital Elevation Model (DEM) and velocity at glacier sites, and will use the `earthaccess` library with the following filters to search for granules within these data sets:

* `short_name` - the data set ID, e.g., NSIDC-0715 or NSIDC-0481. It can be found in the data set title on the data set landing page.
* `version` - the data set version number, also included in the data set title.
* `cloud_hosted` - NSIDC is in the process of migrating data sets to the cloud. The data sets we are interested in are currently still archived on-prem, so we will set this to False.
* `bounding_box` - sets a spatial filter by specifying longitude and latitude limits in the following order: W, S, E, and N.
* `temporal` - sets a temporal filter by specifying a start and end date in the format YYYY-MM-DD.
* `count` - sets the maximum number of granules that will be returned in the search.

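Because the W, S, E, N ordering and the date format are easy to get wrong, a small sanity check can help before running a search. The helper below, `check_search_filters`, is a hypothetical function written for this illustration and is not part of `earthaccess`. It only checks that the values are plausible and does not contact any service.

```python
from datetime import datetime


def check_search_filters(bounding_box, temporal):
    """Return True if the filters look like (W, S, E, N) and (start, end) dates."""
    west, south, east, north = bounding_box
    # Longitudes must lie in [-180, 180]. West may exceed east for a box that
    # crosses the antimeridian, so their relative order is not checked here.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        return False
    # Latitudes must lie in [-90, 90], with south no greater than north.
    if not (-90 <= south <= north <= 90):
        return False
    # strptime raises ValueError if a string cannot be parsed as year-month-day.
    try:
        start, end = (datetime.strptime(d, "%Y-%m-%d") for d in temporal)
    except ValueError:
        return False
    return start <= end


print(check_search_filters((-33.45, 68.29, -31.41, 69.26),
                           ("2015-12-01", "2015-12-31")))
```
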
In [None]:
#Search for DEM files
results_dem = earthaccess.search_data(
    short_name='NSIDC-0715',
    version='2',
    cloud_hosted=False,
    bounding_box=(-33.45,68.29,-31.41,69.26),
    temporal=('2015-12-01','2015-12-31'),
    count=100
)

#Search for velocity data 
results_vel = earthaccess.search_data(
    short_name='NSIDC-0481',
    version='4',
    cloud_hosted=False,
    bounding_box=(-33.45,68.29,-31.41,69.26),
    temporal=('2017-01-01','2017-12-31'),
    count=100
)

### **Download the data**
Now that we have found granules that meet our search criteria, we can download them to an 'outputs' folder using `earthaccess`. Note that each granule in these particular data sets contains multiple files, so even though 1 granule was found for the DEM data set, 6 files will be downloaded.

In [None]:
#Set up an outputs folder to download the data to
path = os.path.join(os.getcwd(), 'outputs')
#Create the folder only if it does not already exist
os.makedirs(path, exist_ok=True)

#Download the DEM granules 
dem_files = earthaccess.download(results_dem, path)

#Download the velocity granules
vel_files = earthaccess.download(results_vel, path)


### **Check the files that have been downloaded**
We will list all the DEM and velocity files that were downloaded, as this is needed for the next steps. 

In [None]:
dir_list = os.listdir(path)

print('Files in ', path)

for x in dir_list:
    if x.endswith('.tif'):
        print(x)

### **Select and read in the DEM file and velocity file**

Based on the list of filenames above, we will select the files that we wish to plot and input them into the cell below. We will use the 'browse' file of the DEM tile, as that provides the best continuous surface for visual display. For the velocity we will plot the velocity magnitude, which is denoted by 'vv' in the filename, and we will plot the velocity covering the time period 07 August to 18 August 2017. 

We will be cropping the DEM file to the extent of the velocity file, so we will also set a filename for the cropped DEM file. 

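The dates and the variable code can also be pulled out of a velocity filename programmatically, which may help when choosing among many files. This is only a sketch: `parse_velocity_filename` is a hypothetical helper, and the field positions are an assumption inferred from the single example filename used in this tutorial, not a documented naming convention.

```python
from datetime import datetime


def parse_velocity_filename(filename):
    """Split a velocity filename into its date range and variable code."""
    stem = filename.rsplit(".tif", 1)[0]
    fields = stem.split("_")
    # Assumed layout (inferred from one example filename): fields 2 and 3 are
    # the start and end dates as DDMonYY, and field 5 is the variable code.
    start = datetime.strptime(fields[2], "%d%b%y").date()
    end = datetime.strptime(fields[3], "%d%b%y").date()
    return {"start": start, "end": end, "variable": fields[5]}


info = parse_velocity_filename(
    "TSX_E68.80N_07Aug17_18Aug17_19-41-22_vv_v04.0.tif"
)
print(info["start"], info["end"], info["variable"])
```
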
In [None]:
vel_fp = str(path + '/TSX_E68.80N_07Aug17_18Aug17_19-41-22_vv_v04.0.tif')

dem_fp = str(path + '/tile_4_2_30m_browse_v02.0.tif')

dem_crop = str(path + '/dem_crop_100.tif')

### **Crop and resample DEM file based on velocity file extent and pixel size**

We will use `gdal` to read the velocity file and get the extent and pixel size, and we will use `osr` to get the projection information. We will then use this information to crop and downsample the DEM file.

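The extent calculation relies on the layout of a GDAL geotransform, a six-element tuple in which elements 0 and 3 give the upper-left corner and elements 1 and 5 give the pixel width and (usually negative) pixel height. The sketch below works through that arithmetic with made-up numbers, assuming a north-up raster with no rotation; `bounds_from_geotransform` is a hypothetical helper and does not require GDAL.

```python
def bounds_from_geotransform(geotransform, n_cols, n_rows):
    """Return (minx, miny, maxx, maxy) for a north-up raster."""
    minx = geotransform[0]          # x of the upper-left corner
    maxy = geotransform[3]          # y of the upper-left corner
    pixel_size_x = geotransform[1]  # pixel width (positive)
    pixel_size_y = geotransform[5]  # pixel height (negative for north-up)
    maxx = minx + pixel_size_x * n_cols
    miny = maxy + pixel_size_y * n_rows
    return minx, miny, maxx, maxy


# Made-up values: a 100 m grid with 300 columns and 200 rows.
example_gt = (500000.0, 100.0, 0.0, -2500000.0, 0.0, -100.0)
example_bounds = bounds_from_geotransform(example_gt, 300, 200)
print(example_bounds)
```
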
In [None]:
vel_raster = gdal.Open(vel_fp)
geoTransform = vel_raster.GetGeoTransform()
proj=osr.SpatialReference(wkt=vel_raster.GetProjection())
epsg = 'EPSG:' + proj.GetAttrValue('AUTHORITY',1)

pixelSizeX = geoTransform[1]
pixelSizeY = geoTransform[5]
minx = geoTransform[0]
maxy = geoTransform[3]
maxx = minx + pixelSizeX * vel_raster.RasterXSize
miny = maxy + pixelSizeY * vel_raster.RasterYSize

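# Warp options: crop to the velocity extent and resample to its pixel size.
# The resolution is passed as positive values, hence abs() on the Y pixel size.
kwargs = {
    'format': 'GTiff',
    'outputBounds': [minx, miny, maxx, maxy],
    'outputBoundsSRS': epsg,
    'xRes': pixelSizeX,
    'yRes': abs(pixelSizeY),
}
ds = gdal.Warp(dem_crop, dem_fp, **kwargs)
# Setting the dataset to None closes it and flushes the output to disk
ds = None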

### **Set up the grids to plot the DEM and velocity data**
To plot the cropped and downsampled DEM with the velocity data, we need to set up a grid. 

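The grid is simply the set of pixel-centre coordinates implied by the raster's affine transform: the upper-left corner, offset by half a pixel, then stepped by the pixel size. The sketch below reproduces that arithmetic in plain Python with made-up numbers; `pixel_centres` is a hypothetical helper, and the tutorial cell itself uses `numpy` arrays for the same calculation.

```python
def pixel_centres(minx, maxy, dx, dy, n_cols, n_rows):
    """Return pixel-centre x and y coordinates, both in ascending order.

    dx and dy are positive pixel sizes; (minx, maxy) is the upper-left corner.
    """
    xs = [minx + dx / 2 + dx * i for i in range(n_cols)]
    # Count rows from the bottom up so y increases, matching an image that
    # has been flipped upside down for plotting with origin='lower'.
    ys = [maxy - dy / 2 - dy * j for j in range(n_rows - 1, -1, -1)]
    return xs, ys


# Made-up values: 10-unit pixels, 3 columns, 4 rows, upper-left corner (0, 40).
xs, ys = pixel_centres(0.0, 40.0, 10.0, 10.0, 3, 4)
print(xs)
print(ys)
```
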
In [None]:
# read in subsetted and resampled DEM
dem_src = rasterio.open(dem_crop)

# print out metadata information
for k in dem_src.meta:
    print(k,dem_src.meta[k])

# Retrieve the affine transformation
if isinstance(dem_src.transform, Affine):
     transform = dem_src.transform
else:
     transform = dem_src.affine

N = dem_src.width
M = dem_src.height
dx = transform.a
dy = transform.e
minx = transform.c
maxy = transform.f

# Read the image data, flip upside down if necessary
dem_crop_in = dem_src.read(1)
if dy < 0:
    dy = -dy
    dem_crop_in = np.flip(dem_crop_in, 0)

#Uncomment the line below if you wish to see the min/max DEM values
#print('Data minimum, maximum = ', np.amin(dem_crop_in), np.amax(dem_crop_in))

# Generate X and Y grid locations
xdata = minx + dx/2 + dx*np.arange(N)
ydata = maxy - dy/2 - dy*np.arange(M-1,-1,-1)

dem_extent = [xdata[0], xdata[-1], ydata[0], ydata[-1]]

# Read in the velocity data
vel_data = rasterio.open(vel_fp)

for k in vel_data.meta:
    print(k,vel_data.meta[k])

# Retrieve the affine transformation
if isinstance(vel_data.transform, Affine):
     transform = vel_data.transform
else:
     transform = vel_data.affine

N2 = vel_data.width
M2 = vel_data.height
dx2 = transform.a
dy2 = transform.e
minx2 = transform.c
maxy2 = transform.f

# Read the image data, flip upside down if necessary
vel_in = vel_data.read(1)
if dy2 < 0:
    dy2 = -dy2
    vel_in = np.flip(vel_in, 0)

#Uncomment the line below if you wish to see the min/max velocity values
#print('Data minimum, maximum = ', np.amin(vel_in), np.amax(vel_in))

# Generate X and Y grid locations
xdata2 = minx2 + dx2/2 + dx2*np.arange(N2)
ydata2 = maxy2 - dy2/2 - dy2*np.arange(M2-1,-1,-1)

vel_extent = [xdata2[0], xdata2[-1], ydata2[0], ydata2[-1]]

#Need to mask the no data values in the velocity data
vel_masked = np.ma.masked_where(vel_in == -1.0, vel_in, copy=True)

### **Plot the DEM and velocity data**
Now we can plot the DEM with the velocity data on top. We will set the transparency of the velocity layer so we can see the DEM underneath. There is also an option to save the figure as a .png file; we can uncomment the last line if we want to save the image.

In [None]:
%matplotlib inline
plt.figure(figsize=(8,8))
fig = plt.imshow(dem_crop_in, extent=dem_extent, origin='lower', cmap='gray')
fig2 = plt.imshow(vel_masked, extent=vel_extent, origin='lower', cmap='terrain', alpha=0.8)
plt.title('Velocity and DEM')
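# Axis values come straight from the raster transform, i.e. in the
# projection's native units (assumed here to be meters)
plt.xlabel('X (m)')
plt.ylabel('Y (m)')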
cb = plt.colorbar(fig2, shrink=0.5)
cb.set_label('Velocity Magnitude (m/yr)')

#Option to save the figure
#plt.savefig("velocity.png", dpi=300, bbox_inches='tight', pad_inches=0.5)

## **3. Learning Outcomes**

* Search and download NSIDC DAAC data using `earthaccess`
* Crop and resample GeoTIFF using `gdal`
* Overlay one GeoTIFF on another in a plot using `matplotlib`

## **4. Additional Resources**

* Further details on the `earthaccess` library can be found [here](https://github.com/nsidc/earthaccess)
* Further details on data available from NSIDC can be found [here](https://nsidc.org/data)