# Grid Data Extraction

We will extract the DEMs for each grid cell in the filtered grid shapefile.

1. Open the ArcticDEM Strip Shapefile
2. Open the filtered grid shapefile
3. Filter the Strip shapefile to only include DEMs which intersect the grids (this reduces the Strip count to 378)
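
Step 3 is an intersection filter between strip footprints and grid cells. The sketch below illustrates the idea with plain bounding boxes instead of real polygons. The names `boxes_intersect`, `filter_strips`, and the sample coordinates are hypothetical and only stand in for the shapefile geometries.

```python
# Hypothetical stand-ins: footprints as (minx, miny, maxx, maxy) bounding boxes.
def boxes_intersect(a, b):
    """Return True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_strips(strips, grids):
    """Keep only the strips whose box intersects at least one grid box."""
    return {
        name: box
        for name, box in strips.items()
        if any(boxes_intersect(box, grid_box) for grid_box in grids.values())
    }


# Made-up coordinates for illustration only
strips = {
    'strip_a': (0, 0, 10, 10),
    'strip_b': (50, 50, 60, 60),
    'strip_c': (8, 8, 20, 20),
}
grids = {1: (9, 9, 10, 10), 2: (15, 15, 16, 16)}

kept = filter_strips(strips, grids)
print(sorted(kept))
```

With real footprints, a bounding-box test like this is only a coarse pre-filter, since two polygons can have overlapping boxes without intersecting.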

## Imports

In [4]:
!pip install pandarallel --user

Collecting pandarallel
  Downloading https://files.pythonhosted.org/packages/5b/27/bfa87ad2c5c202834683db647084fe6908e5e6f54c67270fc3544beabffa/pandarallel-1.4.5.tar.gz
Building wheels for collected packages: pandarallel
  Building wheel for pandarallel (setup.py) ... done
  Created wheel for pandarallel: filename=pandarallel-1.4.5-cp36-none-any.whl size=16025 sha256=e5047dc8bc1788804f267159b9a7df9c6a33a613c0f7d4a29b2786eac87ae657
  Stored in directory: /home/jovyan/.cache/pip/wheels/fa/70/d9/6a27d7fdddb6a7c10af68fffaf6f0a96846c750a840280f7bc
Successfully built pandarallel
Installing collected packages: pandarallel
Successfully installed pandarallel-1.4.5


In [2]:
# GIS
import geopandas as gpd
import rasterio as rio
from rasterio.mask import mask
import json

# Multiprocessing 
from functools import partial
from multiprocessing import Pool
from pandarallel import pandarallel

# General Use
import os
import pandas as pd
import numpy as np
import glob

# Fallback for files that fail to open remotely
import tarfile
import gzip
import urllib.request  # urlretrieve lives in the urllib.request submodule

## Create a new index

We will create an index of the DEM strips using this new filtered shapefile.

In [7]:
if os.path.exists('../../data/rasters.pkl'):
    index = pd.read_pickle('../../data/rasters.pkl')
    print('Read index from file')
else:
    print('Creating new index from shapefile.')
    index = gpd.read_file('../../data/Filtered_ArcticDEM_Strip_Index_Rel7/ArcticDEM_Strip_Index_Rel7.shp')
    index = index.set_index('name', drop=True)
    index = index[['acquisitio', 'fileurl', 'dx', 'dy', 'dz', 'geometry', 'sensor1']]
    index.to_pickle('../../data/rasters.pkl')

Read index from file
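
The cell above follows a load-or-build caching pattern: read the pickle if it exists, otherwise build the index and save it. A minimal standard-library sketch of the same pattern is shown here. `load_or_build` and `build_index` are hypothetical names, and the dictionary is a stand-in for the real GeoDataFrame.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return (value, source): read the cache if present, else build and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f), 'cache'
    value = build()
    with open(cache_path, 'wb') as f:
        pickle.dump(value, f)
    return value, 'built'


def build_index():
    # Hypothetical stand-in for reading and trimming the shapefile
    return {'strip_a': {'dx': 0.188, 'dy': -0.399}}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'rasters.pkl')
    first, first_source = load_or_build(path, build_index)    # cache is created
    second, second_source = load_or_build(path, build_index)  # cache is reused
    print(first_source, second_source)
```

One consequence of this pattern is that the cache is never refreshed when the build step changes; deleting the pickle forces a rebuild. That may be why the preview of the cached index in this notebook does not show the `sensor1` column that the build branch selects.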


In [8]:
index.head()

| name | acquisitio | fileurl | dx | dy | dz | geometry |
|---|---|---|---|---|---|---|
| SETSM_GE01_20120812_10504100007CE100_1050410000778700_seg1_2m_v3.0 | 2012-08-12 | http://data.pgc.umn.edu/elev/dem/setsm/ArcticD... | 0.188 | -0.399 | -0.867 | POLYGON ((-2087418.000 669438.000, -2087130.00... |
| SETSM_GE01_20120812_10504100007A7300_1050410000751500_seg1_2m_v3.0 | 2012-08-12 | http://data.pgc.umn.edu/elev/dem/setsm/ArcticD... | -0.98 | -0.374 | 1.334 | POLYGON ((-2086408.000 672442.000, -2086400.00... |
| SETSM_GE01_20120813_1050410000870B00_1050410000847E00_seg1_2m_v3.0 | 2012-08-13 | http://data.pgc.umn.edu/elev/dem/setsm/ArcticD... | -1.759 | -2.802 | 0.454 | POLYGON ((-2087612.000 670136.000, -2087604.00... |
| SETSM_WV01_20130411_1020010022450500_1020010021AB8500_seg1_2m_v3.0 | 2013-04-11 | http://data.pgc.umn.edu/elev/dem/setsm/ArcticD... | -0.496 | -0.16 | -0.306 | POLYGON ((-2168618.000 787874.000, -2168594.00... |
| SETSM_WV01_20140708_102001002F474A00_102001003026E300_seg1_2m_v3.0 | 2014-07-08 | http://data.pgc.umn.edu/elev/dem/setsm/ArcticD... | -0.14 | 0.114 | -1.551 | POLYGON ((-2162648.000 786644.000, -2161808.00... |


## Load Grid Shapefile

In [9]:
grids = gpd.read_file('../../data/shapefiles/grid_shapefile/1km_filtered/filtered.shp')
grids['id'] = grids['id'].astype(int)
grids = grids.set_index('id', drop=True)

## Extract Each Grid's Data

We will iterate over the rows of the Strip index, find every grid that each strip intersects, and extract the data for those grids.

#### Function to Find Raster-Grid Intersections

In [10]:
def find_grid_intersections(raster):
    '''
    Given a row (Strip raster) of the index, this function returns a list of the grids it intersects.
    '''
    
    intersection = []
    for _, grid in grids.iterrows():
        if grid['geometry'].intersects(raster['geometry']):
            intersection.append(grid)
    return intersection

#### Function to Download DEM to Temp Storage, Open, and Delete (Backup Option)

This function will only run if rasterio fails to open the DEM remotely.

In [1]:
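def open_dem(raster):
    '''
    Downloads the strip archive to temporary storage, extracts the DEM, and opens it.
    The downloaded archive and the extracted file are deleted afterwards.
    This is used if opening with RasterIO fails.
    '''
    # With filename=None, urlretrieve writes to a temporary file and returns its path
    archive_path = urllib.request.urlretrieve(raster['fileurl'], filename=None)[0]
    dem_name = raster.name + '_dem.tif'

    # Extract only the DEM into the working directory, then drop the archive
    with tarfile.open(archive_path) as tar:
        tar.extract(dem_name)
    os.remove(archive_path)

    src = rio.open('./' + dem_name)
    # On POSIX systems the open dataset stays readable after the file is unlinked
    os.remove('./' + dem_name)
    return src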
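
The fallback relies on pulling one named member out of a `.tar.gz` archive. The following standard-library sketch shows that step in isolation, reading the member's bytes directly rather than extracting to disk. The archive, member name, and payload are made up for the example, and `read_member` is a hypothetical helper.

```python
import io
import os
import tarfile
import tempfile


def read_member(archive_path, member_name):
    """Return the bytes of one member of a tar archive without extracting to disk."""
    with tarfile.open(archive_path) as tar:  # compression is auto-detected
        with tar.extractfile(member_name) as f:
            return f.read()


with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, 'strip.tar.gz')
    payload = b'fake raster bytes'  # placeholder, not a real GeoTIFF

    # Build a small gzip-compressed archive with a single member
    with tarfile.open(archive_path, 'w:gz') as tar:
        info = tarfile.TarInfo(name='strip_dem.tif')
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    data = read_member(archive_path, 'strip_dem.tif')
    print(len(data))
```

Bytes read this way could in principle be handed to an in-memory dataset such as `rasterio.io.MemoryFile`, which would avoid writing and deleting the extracted file; that variation is not what the notebook does.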

#### Function to Create Grid Rasters

In [2]:
def mask_grids(raster, overwrite=False):
    '''
    Given a row (Strip raster) of the index, writes one GeoTIFF per intersecting grid.
    Grids that contain only nodata are written with a _NODATA suffix.
    Note: `overwrite` is accepted but not currently used.
    '''
    rio_url = 'tar+' + raster['fileurl'] + '!' + raster.name + '_dem.tif'
    try:
        src = rio.open(rio_url)
    except rio.errors.RasterioIOError:  # sometimes the file fails to open remotely
        src = open_dem(raster)

    for grid in find_grid_intersections(raster):
        out_dir = './data/grids_01_30_2020/' + str(grid.name) + '/'
        os.makedirs(out_dir, exist_ok=True)

        # Convert the grid polygon to a GeoJSON-like dict for rasterio
        geo = gpd.GeoDataFrame({'geometry': grid['geometry']}, index=[0], crs=src.crs)
        geo = [json.loads(geo.to_json())['features'][0]['geometry']]

        # Crop the strip to the grid and drop the band axis
        out_img, out_transform = mask(src, shapes=geo, crop=True)
        out_img = np.squeeze(out_img)

        out_meta = src.meta.copy()
        out_meta.update({'driver': 'GTiff',
                         'height': 501,
                         'width': 501,
                         'transform': out_transform,
                         'crs': src.crs
                        })

        msk = np.ma.masked_equal(out_img, src.nodata)
        if np.all(msk.mask):  # If all of the values are True (nodata)
            outfile = out_dir + raster.name + '_NODATA.tif'
        else:
            outfile = out_dir + raster.name + '_dem.tif'

        with rio.open(outfile, 'w', **out_meta) as dst:
            dst.write(out_img, 1)

    src.close()
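
The output name depends on whether the cropped tile contains any valid data. The nodata check can be tried on small arrays as sketched below. `output_suffix` is a hypothetical helper and `NODATA = -9999.0` is an assumed value; in the notebook the value comes from `src.nodata`.

```python
import numpy as np


def output_suffix(tile, nodata):
    """Return '_NODATA.tif' if every cell equals nodata, else '_dem.tif'."""
    masked = np.ma.masked_equal(tile, nodata)
    # getmaskarray always returns a full boolean array, even when nothing is masked
    all_missing = bool(np.all(np.ma.getmaskarray(masked)))
    return '_NODATA.tif' if all_missing else '_dem.tif'


NODATA = -9999.0  # assumed nodata value for illustration

empty_tile = np.full((3, 3), NODATA)  # every cell is nodata
partial_tile = empty_tile.copy()
partial_tile[1, 1] = 12.5             # one valid elevation
full_tile = np.ones((3, 3))           # no nodata at all

for tile in (empty_tile, partial_tile, full_tile):
    print(output_suffix(tile, NODATA))
```

A tile is marked `_NODATA` only when every cell is missing, so a grid with a single valid elevation is still written as `_dem.tif`.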

## Check Rasters That RasterIO Failed to Open

Some of the rasters could not be opened by RasterIO. Each failure is marked by a RASTER_NAME + '.txt' file in the data directory.

#### Sample Error Message
RasterioIOError: '/vsitar/vsicurl/http://data.pgc.umn.edu/elev/dem/setsm/ArcticDEM/geocell/v3.0/2m/n69w156/SETSM_WV01_20130411_1020010022450500_1020010021AB8500_seg1_2m_v3.0.tar.gz/SETSM_WV01_20130411_1020010022450500_1020010021AB8500_seg1_2m_v3.0_dem.tif' not recognized as a supported file format.
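
The path in this message corresponds to the `tar+<url>!<member>` string built in `mask_grids`. As a reading aid, the sketch below rebuilds both strings for the strip named in the error. `build_rio_url` and `to_vsi_path` are hypothetical helpers that only mirror the pattern visible in the message; they are not rasterio's own path parser.

```python
def build_rio_url(fileurl, strip_name):
    """Build the 'tar+<url>!<member>' string used by mask_grids."""
    return 'tar+' + fileurl + '!' + strip_name + '_dem.tif'


def to_vsi_path(rio_url):
    """Mirror the /vsitar/vsicurl/ path seen in the error message (illustrative only)."""
    scheme, rest = rio_url.split('+', 1)      # 'tar', 'http://...!member'
    archive_url, member = rest.split('!', 1)  # archive URL, member inside the archive
    return '/vsi' + scheme + '/vsicurl/' + archive_url + '/' + member.lstrip('/')


# Strip name and URL taken from the sample error message
name = 'SETSM_WV01_20130411_1020010022450500_1020010021AB8500_seg1_2m_v3.0'
fileurl = ('http://data.pgc.umn.edu/elev/dem/setsm/ArcticDEM/geocell/v3.0/2m/n69w156/'
           + name + '.tar.gz')

rio_url = build_rio_url(fileurl, name)
vsi_path = to_vsi_path(rio_url)
print(rio_url)
print(vsi_path)
```

Reading the message this way shows that the request reached the expected archive and member, so the failure is in reading the remote file rather than in how the URL was assembled.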

In [24]:
failed = glob.glob('../../data/grids/*.txt')
failed = [x[x.rfind('/')+1:x.rfind('.')] for x in failed]
print(len(failed))
fail_index = index.loc[failed].reset_index()


38


In [27]:
fail_index['sensor'] = fail_index['name'].apply(lambda x: x.split('_')[1])

In [28]:
fail_index['sensor'].value_counts()

WV01    23
WV02     9
W1W2     2
W1W1     2
W2W2     1
WV03     1
Name: sensor, dtype: int64
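
The two steps above (recovering strip names from the marker files, then counting sensors) can be reproduced without pandas. In this sketch the paths are illustrative: they reuse strip names from the index preview and are not the actual list of failures. `strip_name` and `sensor_of` are hypothetical helpers.

```python
import os
from collections import Counter


def strip_name(path):
    """Drop the directory and the final extension from a marker file path."""
    return os.path.splitext(os.path.basename(path))[0]


def sensor_of(name):
    """Return the sensor code, the second underscore-separated field of the name."""
    return name.split('_')[1]


# Illustrative marker files, not the real failure list
paths = [
    '../../data/grids/SETSM_GE01_20120812_10504100007CE100_1050410000778700_seg1_2m_v3.0.txt',
    '../../data/grids/SETSM_WV01_20130411_1020010022450500_1020010021AB8500_seg1_2m_v3.0.txt',
    '../../data/grids/SETSM_WV01_20140708_102001002F474A00_102001003026E300_seg1_2m_v3.0.txt',
]

names = [strip_name(p) for p in paths]
counts = Counter(sensor_of(n) for n in names)
print(counts.most_common())
```

`os.path.splitext` removes only the final `.txt`, so the `v3.0` inside each strip name is preserved.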