# Downloading and processing elevation data for a study site

Many problems which we might attempt to tackle with GIS (Geographic Information Systems) require elevation data for the area of study. This might be to inform a terrain analysis to work out which parts of a valley are visible from the top of a hill to decide the best place to put an antenna, or to model how water will flow over the landscape when planning for flooding scenarios or attempting to limit soil erosion. It could also feed into an ecological model which considers how different groups of plants are adapted to life at different elevations.

The datasets which contain these data are called Digital Elevation Models (DEMs). These take the form of geographical raster data -- georeferenced grids of numbers specifying the elevation at different points on the earth's surface.

- [SRTM](http://doi.org/10.5067/MEaSUREs/SRTM/SRTMGL1.003) (Shuttle Radar Topography Mission) was performed by NASA in 2000.
- The data were obtained using radar signals to build up a 3D image of most of the Earth's land surface.
- They are now available at a resolution of approximately 30 meters; that is, each number in the returned dataset represents the elevation of a roughly $30 \times 30$ m square of the Earth's surface, in a manner analogous to how a pixel in a digital photograph might represent an area of 2 cm${}^2$ on the surface of the subject's face.
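
To get a feel for what "approximately 30 meters" means, the sketch below estimates the ground footprint of a one arc-second grid cell (the spacing of the SRTMGL1 product) at a few latitudes. It assumes a spherical Earth with a mean radius of 6371 km, and the function name `arcsec_cell_size_m` is made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; spherical approximation (assumption)


def arcsec_cell_size_m(lat_deg, arcsec=1.0):
    """Approximate (east-west, north-south) size in metres of a grid cell."""
    metres_per_arcsec = 2 * math.pi * EARTH_RADIUS_M / (360 * 3600)
    north_south = arcsec * metres_per_arcsec
    # meridians converge towards the poles, so cells get narrower east-west
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


for lat in (0, 40, 60):
    ew, ns = arcsec_cell_size_m(lat)
    print(f'lat {lat:>2} deg: {ew:5.1f} m east-west x {ns:5.1f} m north-south')
```

Under these assumptions the cells are only close to square near the equator, which is one reason the data are reprojected into a local, metre-based coordinate reference system later in the notebook.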

In this notebook we'll first define some functions to help us deal with the geometry of selecting the data around a point of interest.

We'll then use the Python module [`elevation`](https://pypi.org/project/elevation/#description) to do most of the hard work in selecting which data tile(s) to download, stitching them together for us if need be.

In [None]:
from pathlib import Path
import logging
import os
import subprocess
from typing import Iterable, List, Union, NamedTuple, Tuple

import requests

import unidecode

import numpy as np
import pandas as pd

import geopandas as gpd
import georasters as gr
import elevation
import nlmpy
from pyproj import Proj, transform

import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
%matplotlib inline

## Setting up the geometry of the problem
The objective is a method which goes through each of my study sites and downloads SRTM 30 m data whose bounding box encompasses a circle, with a specified radius, around each study site.

Parameters:

- `tgt_crs`: the CRS in which the bounding circle will be contained
- `r`: the radius of the circle in the units of the target CRS

### Size of DEM to download

- Total area of the circle surrounding the pollen extraction point (for use with LRA) is 30 km${}^2$.
- Decided in meeting with James on 15/5/18 that I should simulate an area of perhaps 150 km${}^2$ to provide a buffer and assist with accounting for boundary conditions. Fires and seed sources will come from outside.
- State that one of my big assumptions is that the area whose vegetation proportions are known from the LRA is representative of the larger area around it. 
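
As a rough check on these numbers, the arithmetic below works out the radius of a 30 km${}^2$ circle and the area of a square bounding box whose half-width is $a(1 + \beta)$, the construction used later in this notebook. The value $\beta = 1$ is an assumption taken from the default argument of the functions defined below.

```python
import math

exp_area_m2 = 30e6   # 30 km^2 experimental zone, as stated above
beta = 1.0           # buffer equal to the radius (assumed default value)

a = math.sqrt(exp_area_m2 / math.pi)   # radius of the experimental zone
half_side = a * (1 + beta)             # distance from P to each box edge
side = 2 * half_side
box_area_km2 = side ** 2 / 1e6

print(f'radius a       = {a:,.0f} m')
print(f'box side       = {side:,.0f} m')
print(f'simulated area = {box_area_km2:.1f} km^2')
```

With these inputs the box comes out close to the 150 km${}^2$ figure mentioned above, which suggests a buffer of about one radius is in the right range.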

<img src="img/study-site-loc-schematic.png" width=350>

**Study site location schematic**

Geometric construction of the problem of finding the bounding box around study site locations needed to provide the required raster layers for my simulations. Point $P$ is the location from which the sediment core used to derive pollen time series for the site was extracted according to the EPD. $a$ is the radius of the circle from which it is assumed pollen has contributed to the sediment core -- the *experimental zone*. $\beta$ is a buffer parameter which controls the area around the experimental zone which will also be included in the simulation to help account for edge effects. Points $A$ and $B$ are, respectively, the points of minimum and maximum latitude and longitude defining the bounding box around the study site.

In [None]:
def experimental_zone_radius(area):
    """Given a required area, return the radius of the experimental zone.
    
    Args:
        area (int, float): The experimental zone's area.
        
    Returns:
        float: The experimental zone's radius.
    """
    return np.sqrt(area/np.pi)

In [None]:
def study_site_bbox(p_coords, a, beta=1.0):
    """Get the coordinates specifying the bounding box around a study site.
    
    Args:
        p_coords (tuple, list): (x, y) coordinates of the point which the 
            bounding box should surround.
        a (float): Radius of the experimental zone around the study site.
        beta (Optional[float]): Parameter controlling the size of the buffer
            around the experimental zone. Defaults to 1.0, meaning the buffer
            will be the same as the radius, a.

    Returns:
        list: The bounding box coordinates in the form 
            [minx, miny, maxx, maxy]
    """
    if len(p_coords) != 2:
        raise ValueError('Bounding box must surround a point specified in '
                         'exactly 2 dimensions')

    # unpack directly: the length check above guarantees exactly two values
    point_x, point_y = p_coords

    delta = a*(1 + beta)
    minx = point_x - delta
    miny = point_y - delta
    maxx = point_x + delta
    maxy = point_y + delta
    
    return [minx, miny, maxx, maxy]    

## Handling Coordinate Reference Systems

- SRTM data are in WGS84 geographical coordinates (latitude and longitude, possibly familiar from using Google Maps), so the units are angles (degrees).
- Usually, when working with a specific study site, we'll want to work in a local coordinate reference system whose units are a measure of distance, such as meters.
- In the UK, for example, we'd use the [British National Grid](https://en.wikipedia.org/wiki/Ordnance_Survey_National_Grid) reference system.

- First define functions which, given a tuple specifying a coordinate, can convert to or from WGS84


In [None]:
def from_wgs84(coord, tgt_epsg_no):
    """Convert coordinates from WGS84 to specified target crs.
    
    Args:
        coord (tuple, list): (x, y) coordinate pair in WGS84.
        tgt_epsg_no (int): EPSG number for target crs
        
    Returns:
        tuple: (x, y) coordinate pair in target crs      
    """
    src = Proj('epsg:4326')
    tgt = Proj('epsg:' + str(tgt_epsg_no))
    
    return transform(src, tgt, coord[0], coord[1], always_xy=True)

In [None]:
def to_wgs84(coord, src_epsg_no):
    """Convert coordinates from specified source crs to WGS84.
    
    Args:
        coord (tuple, list): (x, y) coordinate pair in source crs.
        src_epsg_no (int): EPSG number for source crs
        
    Returns:
        tuple: (x, y) coordinate pair in WGS84.    
    """
    src = Proj('epsg:' + str(src_epsg_no))
    tgt = Proj('epsg:4326')
    return transform(src, tgt, coord[0], coord[1], always_xy=True) 
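
Newer pyproj releases deprecate the module-level `transform` function used above in favour of the `Transformer` class. The sketch below shows what the same two conversions might look like with that API; the `_sketch` function names are hypothetical, and EPSG:27700 (British National Grid) with a point in central London is used purely as an illustration.

```python
from pyproj import Transformer


def from_wgs84_sketch(coord, tgt_epsg_no):
    """Convert an (x, y) = (lon, lat) pair from WGS84 to the target CRS."""
    transformer = Transformer.from_crs(
        'EPSG:4326', f'EPSG:{tgt_epsg_no}', always_xy=True
    )
    return transformer.transform(coord[0], coord[1])


def to_wgs84_sketch(coord, src_epsg_no):
    """Convert an (x, y) pair from the source CRS to WGS84 (lon, lat)."""
    transformer = Transformer.from_crs(
        f'EPSG:{src_epsg_no}', 'EPSG:4326', always_xy=True
    )
    return transformer.transform(coord[0], coord[1])


# illustrative point: central London, given as (lon, lat)
easting, northing = from_wgs84_sketch((-0.1276, 51.5072), 27700)
print(easting, northing)
print(to_wgs84_sketch((easting, northing), 27700))
```

Passing `always_xy=True` keeps the (longitude, latitude) ordering that the rest of this notebook assumes, regardless of the axis order declared by the CRS.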

- Now define a function which will use the `to_wgs84` function defined above to convert all the corners of a bounding box in a different coordinate reference system to WGS84

In [None]:
def bbox_to_wgs84(bbox_coords, src_epsg_no):
    """Convert coordinates specifying bounding box to WGS84 coords.
    
    Args:
        bbox_coords (list, tuple): the coordinates specifying the bounding box
            in the source crs, in the form [minx, miny, maxx, maxy]
        src_epsg_no (int): EPSG number for source crs
    
    Returns:
        list: coordinates of bounding box in WGS84, in the form
            [minx, miny, maxx, maxy]            
    """
    if len(bbox_coords) != 4:
        raise ValueError('Bounding box must be specified by exactly 4 '\
                         'coords in the form [minx, miny, maxx, maxy]')
    min_vertex = (bbox_coords[0], bbox_coords[1]) 
    max_vertex = (bbox_coords[2], bbox_coords[3]) 
    
    min_vertex_wgs84 = to_wgs84(min_vertex, src_epsg_no)
    max_vertex_wgs84 = to_wgs84(max_vertex, src_epsg_no)
    
    return [min_vertex_wgs84[0], min_vertex_wgs84[1],
            max_vertex_wgs84[0], max_vertex_wgs84[1]]   

## Download data in WGS84

In [None]:
def get_wgs84_elev_data(fname, native_crs_epsg_no, p_coords, a, beta=1.0):
    """Retrieve SRTM 30 m data for study site.
    
    1. Convert WGS84 coordinates for specified point to target native crs.
    2. Work out coords of bounding box in native crs. Include extra buffer so 
       that returned data, when converted to target crs from wgs84, can be 
       trimmed to have straight edges.
    3. Convert buffered bounding box coords to WGS84.
    4. Get DEM data using `elevation`    
    
    Args:
        fname (str): Filename for retrieved tif file.
        native_crs_epsg_no (int): The EPSG number for the CRS in which buffer
            distances (e.g. a, see below) will be measured.
        p_coords (tuple, list): (x, y) coordinates of the point which the 
            bounding box should surround, in WGS84 coordinates.
        a (float): Radius of the experimental zone around the study site in 
            native spatial units (e.g. meters).
        beta (Optional[float]): Parameter controlling the size of the buffer
            around the experimental zone. Defaults to 1.0, meaning the buffer
            will be the same as the radius, a.        

    Returns: None
    """
    # convert WGS84 lat/lon coordinates to native x, y coordinates
    native_coords = from_wgs84(p_coords, native_crs_epsg_no)
    native_bbox = study_site_bbox(native_coords, a, beta=beta)
    wgs84_bbox = bbox_to_wgs84(native_bbox, native_crs_epsg_no)
    
    # retrieve and clip the SRTM1 30m DEM data for wgs84 bounding box
    # NOTE: elevation.clip fails if output name does not end in .tif
    # as of version 1.0.6. Also it's necessary to give the full file
    # path to output. Otherwise file will end up in elevation's cache
    # silently.
    base_name = os.path.splitext(os.path.basename(fname))[0]
    tif_name = os.path.join(os.getcwd(), f'{base_name}_tmp.tif')
    elevation.clip(bounds=wgs84_bbox, output=tif_name)
    os.rename(tif_name, fname)
    
    # clean up stale temporary files and fix the cache in the event of a 
    # server error
    elevation.clean()

## Convert retrieved data to target crs
Now use GDAL's `gdalwarp` to reproject the retrieved tif into the target CRS, and trim it to the bounding box which will be used for simulations.

In [None]:
def warp_elev_data(in_fname, out_fname, native_crs_epsg_no, p_coords, 
                   a, beta=1.0):
    """Reproject and trim WGS84 data for study site.
    
    For a local .tif file containing data for a study site located at 
    p_coords (where p_coords is in WGS84 coordinates), reproject that data
    into the crs specified by native_crs_epsg_no. Then trim it to the bounding
    box specified by a and beta (see docstring for `study_site_bbox`).
    
    Args:
        in_fname (str): Filename for retrieved tif file.
        out_fname (str): Filename for converted tif file.
        native_crs_epsg_no (int): The EPSG number for the CRS in which buffer
            distances (e.g. a, see below) will be measured, and which the data 
            will be transformed to.
        p_coords (tuple, list): (x, y) coordinates of the point which the 
            bounding box should surround, in WGS84 coordinates.
        a (float): Radius of the experimental zone around the study site in 
            native spatial units (e.g. meters).
        beta (Optional[float]): Parameter controlling the size of the buffer
            around the experimental zone. Defaults to 1.0, meaning the buffer
            will be the same as the radius, a.        

    Returns: None
    """
    # convert WGS84 lat/lon coordinates to native x, y coordinates
    native_coords = from_wgs84(p_coords, native_crs_epsg_no)
    native_bbox = study_site_bbox(native_coords, a, beta=beta)
    
    # specify parameters to be passed to gdalwarp in an external process:
    param = ['gdalwarp', in_fname, out_fname, '-overwrite',
             '-s_srs', 'EPSG:4326', 
             '-t_srs', 'EPSG:' + str(native_crs_epsg_no),
             '-te', str(native_bbox[0]), str(native_bbox[1]), 
             str(native_bbox[2]), str(native_bbox[3])]
    
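    # pass the argument list directly (no shell) so that file names
    # containing spaces or shell metacharacters are handled safely
    subprocess.check_call(param)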
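
To make the `gdalwarp` invocation easier to inspect, the sketch below builds the same kind of argument list in a small helper and prints the equivalent shell command without running it. The helper name `build_gdalwarp_args`, the file names, and the extent values are all hypothetical; only the `-overwrite`, `-s_srs`, `-t_srs` and `-te` options are taken from the cell above.

```python
import shlex


def build_gdalwarp_args(in_fname, out_fname, tgt_epsg_no, bbox):
    """Return the gdalwarp argument list for reprojecting and trimming."""
    minx, miny, maxx, maxy = bbox
    return [
        'gdalwarp', '-overwrite',
        '-s_srs', 'EPSG:4326',
        '-t_srs', f'EPSG:{tgt_epsg_no}',
        # -te takes the output extent as xmin ymin xmax ymax
        '-te', str(minx), str(miny), str(maxx), str(maxy),
        str(in_fname), str(out_fname),
    ]


# made-up inputs; note the space in the input file name
args = build_gdalwarp_args(
    'dem wgs84.tif', 'dem.tif', 2062, [0.0, 0.0, 1000.0, 1000.0]
)
print(shlex.join(args))
# subprocess.check_call(args) would run the command if GDAL is installed
```

Keeping the arguments as a list until the last moment means each file name reaches `gdalwarp` as a single argument, which a space-joined string passed through a shell does not guarantee.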

## Automate process of downloading and transforming CRS
This is the end goal of our development so far: a single function which takes the lat/lon coordinates of our study site, creates a bounding box around it accounting for its experimental radius and a buffer zone, and then downloads, trims and converts the data to our target coordinate reference system.

In [None]:
def get_elev_data(fname, tgt_crs_epsg_no, p_coords, a, beta=1.0):
    """Retrieve and process SRTM 30 m data.
    
    Data is centered on the point p_coords, specified in WGS84 coordinates,
    but returned data is in the crs specified by tgt_crs_epsg_no.
    
    Args:
        fname (str): Filename for resulting tif file.
        tgt_crs_epsg_no (int): The EPSG number for the CRS in which the data
            will be returned.
        p_coords (tuple, list): (x, y) coordinates of the point which the 
            bounding box should surround, in WGS84 coordinates.
        a (float): Radius of the experimental zone around the study site in 
            native spatial units (e.g. meters).
        beta (Optional[float]): Parameter controlling the size of the buffer
            around the experimental zone. Defaults to 1.0, meaning the buffer
            will be the same as the radius, a.        

    Returns: None
    """
    fname = os.path.join(os.getcwd(), fname)
    tmp_fname = fname + '.tmp'
    # add 50% to the buffer to guard against missing data when converting to
    # the target crs
    get_wgs84_elev_data(tmp_fname, tgt_crs_epsg_no, p_coords, a, beta=beta*1.5)
    warp_elev_data(tmp_fname, fname, tgt_crs_epsg_no, p_coords, a, beta=beta)
    os.remove(tmp_fname)
    print('Finished processing ' + fname)

## Get data for all study sites

In [None]:
pwd = Path.cwd().name  # name of the current directory, independent of OS path separator
OUTPUT_DIR = Path('../outputs') if pwd == 'dem-derived' else Path('outputs')

In [None]:
sites = (
    pd.read_csv(OUTPUT_DIR / 'site_location_info.csv')
    .assign(sitecode=lambda df: (
        df['sitename']
        .apply(unidecode.unidecode)
        .str.replace(' ', '_')
        .str.lower()
    ))
    .set_index('sitecode')
)

sites
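
The `sitecode` column above turns each site name into an ASCII, lower-case, underscore-separated string that is safe to use as a directory name. The sketch below illustrates the idea using only the standard library: `unicodedata` stands in for `unidecode` here (it simply drops characters it cannot decompose, whereas `unidecode` transliterates more of them), and the site names are invented for the example.

```python
import unicodedata


def make_sitecode(sitename):
    """Return an ASCII, lower-case, underscore-separated code for a name."""
    ascii_name = (
        unicodedata.normalize('NFKD', sitename)  # split accents from letters
        .encode('ascii', 'ignore')               # drop the accent marks
        .decode('ascii')
    )
    return ascii_name.replace(' ', '_').lower()


# made-up site names, for illustration only
for name in ['Laguna Ñandú', 'San Example', 'Étang Test']:
    print(f'{name!r} -> {make_sitecode(name)!r}')
```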

In [None]:
exp_radius = experimental_zone_radius(area=30000000) # 30 km^2
print('Downloading data for study sites with experimental zone radius: '
      + str(exp_radius)+ ' m')

for site_code, row in sites.iterrows():
    site_dir = OUTPUT_DIR.resolve() / site_code
    site_dir.mkdir(exist_ok=True)
    get_elev_data(
        site_dir / 'dem.tif',
        2062,
        (row['londd'], row['latdd']), 
        exp_radius,
    )

print('Done.')

### Make plot of all study sites' DEMs

In [None]:
class DEMData(NamedTuple):
    sitename: str
    dem: gr.georasters.GeoRaster
    
dem_data = [
    DEMData(row['sitename'], gr.from_file(str(OUTPUT_DIR / site_code / 'dem.tif')))
    for site_code, row in sites.iterrows()
]
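
The plotting cell that follows calls `get_min_max_raster_values`, which is not defined anywhere in this notebook. A minimal sketch of what such a helper might look like is given here, assuming it is meant to return the smallest and largest elevation found across all sites and that each raster is a NumPy array (optionally masked). The stand-in objects at the bottom exist only so the example runs without any DEM files.

```python
from types import SimpleNamespace

import numpy as np


def get_min_max_raster_values(dem_data):
    """Return (min, max) elevation across every raster in dem_data.

    Assumes each item has a `.dem.raster` attribute holding a NumPy
    (optionally masked) array, as the DEMData tuples above do.
    """
    mins = [np.ma.min(item.dem.raster) for item in dem_data]
    maxs = [np.ma.max(item.dem.raster) for item in dem_data]
    return float(min(mins)), float(max(maxs))


# stand-ins for DEMData items: one plain array, one with a masked no-data cell
fake_dem_data = [
    SimpleNamespace(dem=SimpleNamespace(
        raster=np.array([[10.0, 20.0], [30.0, 40.0]]))),
    SimpleNamespace(dem=SimpleNamespace(
        raster=np.ma.masked_values([[5.0, -9999.0], [25.0, 60.0]], -9999.0))),
]
lowest, highest = get_min_max_raster_values(fake_dem_data)
print(lowest, highest)
```

Values such as these could be passed to `matshow` as `vmin` and `vmax` if a shared colour scale across the subplots is wanted.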

In [None]:
min_val, max_val = get_min_max_raster_values(dem_data)
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12,9))
for site_no, site in enumerate(dem_data):
    ax = axes[site_no // 3][site_no % 3]
    im = ax.matshow(site.dem.raster)
    ax.set_title(sites.iloc[site_no]['sitename'], fontsize=16)
    ax.set_axis_off()
    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", size="5%", pad=0.1)
    cbar = fig.colorbar(im, ax=ax, cax=cax)
    if (site_no + 1) % 3 == 0:
        cbar.ax.set_ylabel('Elevation [m]', rotation=270, fontsize=12)
        cbar.ax.get_yaxis().labelpad = 18

plt.tight_layout()
try:
    fig.savefig("img/ssite-DEM-overview.pdf", bbox_inches='tight')
except FileNotFoundError:
    logging.warning('Could not write plot of DEMs') 