# Automated Landsat 8 image bulk download and preprocessing

_Last modified 2022-05-06._

This script is run to download Landsat images over the glaciers available through the AWS s3 bucket and process the images prior to the Wavelet Transform Modulus Maxima (WTMM) segmentation analysis that produces the calving front delineations.

First, set up your AWS profile following these steps:

    aws configure --profile name_of_profile

The workflow is streamlined to analyze images for hundreds of glaciers, specifically, the marine-terminating glaciers along the periphery of Greenland. Sections of code that may need to be modified are indicated as below:

    ##########################################################################################

    Code that must be modified.

    ##########################################################################################

Keep a record of the csv file names generated throughout the preprocessing as many of them will be used later for analysis.

__Terminal command line software requirements:__

 - Follow instructions at https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html to get required __aws__ commands onto your command terminal.
 - Uses GDAL command line functions. Make sure __gdal__ is installed properly.
 - Uses image magick command line functions. Download instructions available at https://imagemagick.org/script/download.php.
 

__Outline of steps:__
    1. Set-up: import packages, set paths, and enter glaciers IDs
    2. Find all the Landsat footprints that overlap the glaciers
    3. Download Landsat metadata (*MTL.txt) files from AWS for all overlapping scenes
    4. Calculate cloud % over terminus box using Landsat quality band
    5. Create buffer zone around terminus boxes and rasterize terminus boxes
    6. Download non-cloudy Landsat images from AWS
    7. Calculate weighted average glacier flow direction using velocity data
    8. Rotate all images by flow direction
    9. Crop all images to the same size

# 1) Set-up: import packages, set paths, and enter glaciers IDs

In [73]:
import numpy as np
import pandas as pd
import scipy
import math
import subprocess
import os
import shutil
import datetime
import cv2
from PIL import Image
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import glob

# geospatial packages
import fiona
import geopandas as gpd
from shapely.geometry import Polygon, Point, LineString
import shapely
from matplotlib.pyplot import imshow
import rasterio as rio

# Enable fiona KML file reading driver
gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'

# import necessary functions from automated-glacier-terminus.py
from automated_terminus_functions import distance, resize_pngs

In [33]:
# change display width if desired
from IPython.display import display, HTML

display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

### AWS configuration:

In [36]:
! aws s3 ls --profile terminusmapping

In [102]:
# AWS settings
from rasterio.session import AWSSession
import pickle
import boto3
import boto3.session

# cred = boto3.Session().get_credentials()
# ACCESS_KEY = cred.access_key
# SECRET_KEY = cred.secret_key
# SESSION_TOKEN = cred.token  ## optional
ACCESS_KEY = 'AKIA4CNBTBJH2TTAGWEW'
SECRET_KEY = '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'

s3client = boto3.client('s3', 
                        aws_access_key_id = ACCESS_KEY, 
                        aws_secret_access_key = SECRET_KEY, 
#                         aws_session_token = SESSION_TOKEN
                       )

# response = s3client.get_object(Bucket='name_of_your_bucket', Key='path/to_your/file.pkl')
# body = response['Body'].read()
# data = pickle.loads(body)
 
######################################################################################
# path to the collection on AWS usgs-landsat s3 bucket:
collectionpath = 'collection02/level-1/standard/' # collection 2 level 1 data being used
######################################################################################

### Define paths, satellites, geographic projections:

In [138]:
######################################################################################
# ADJUST THESE VARIABLES:
basepath = '/home/jukes/Documents/Sample_glaciers/' # folder containing the terminus box and RGI glacier outline shapefile(s)
downloadpath ='/media/jukes/jukes1/LS8aws/' # folder to eventually contain downloaded Landsat images'

sats = ['L7','L8'] # enter names of landsat satellites to download images from (e.g., Landsat 7 and Landsat 8)
L8_yrs = np.arange(2013,2022).astype(str) # set target years for L8: 2013-2021
L7_yrs = np.arange(1999,2003).astype(str) # set target years for L7: 1999-2002 (some path rows don't exist in 2003 and are throwing errors)
L8_bands = [8] # panchromatic band for L8
L7_bands = [8] # panchromatic band for L7

repopath = '/home/jukes/automated-glacier-terminus/' # path to this repository
os.chdir(repopath) # change directories to this repo

source_srs = '3413' # EPSG code for the current projection of the glacier shapefiles (3413 = Greenland polar stereo)

csvext = '_updated_test_Box008.csv' # enter a file suffix for the CSV files produced that describes the analysis (e.g., glacier or group of glaciers)

RGIpath = '/media/jukes/jukes1/RGI_shps/' # path to folder with all RGI glacier outline shapefiles
boxespath = '/media/jukes/jukes1/Boxes_individual/' # path to folder with all the glacier terminus box shapefiles
######################################################################################

### Enter in the glacier BoxIDs:

The Greenland peripheral glacier terminus boxes were referenced using their 3 digit BoxID: Box###.
For other glaciers, replace this code with a list of IDs corresponding to the glaciers and corresponding shapefiles (e.g. BoxHelheim.shp). 

In [39]:
######################################################################################
BoxIDs = []
boxes = list(map(str, np.arange(8, 9, 1))) #1, 642, 1
for BoxID in boxes: # convert integers to 3-digit strings with leading zeros
    BoxID = BoxID.zfill(3)
    BoxIDs.append(BoxID)
print(BoxIDs) # show the final BoxIDs
######################################################################################

['008']


### Create new folders corresponding to these glaciers:

In [168]:
# create new BoxID folders 
for BoxID in BoxIDs:
    # create folder to hold glacier shapefiles
    shapefilepath = basepath+'Box'+BoxID+'/' # path to that folder
    if os.path.exists(shapefilepath):
#         shutil.rmtree(shapefilepath) # remove the old folder
        print("Path exists already for Box", BoxID)
    else:
        os.mkdir(basepath+'Box'+BoxID)
            
    # create folder to hold glacier images (inside downloadpath)
    if os.path.exists(downloadpath+'Box'+BoxID):
        print("Path exists already in LS8aws for Box", BoxID)
    else:
        os.mkdir(downloadpath+'Box'+BoxID)
    
    # Now place terminus box shapefile and RGI glacier outline shapefile into the
    # boxespath folder. Done automatically below for the Greenland peripheral glaciers:
    ######################################################################################
    ID = int(BoxID) # make into an integer in order to grab the .shp files
    
    # if the terminus box shapefile is not in this folder, then move it
    if not os.path.exists(shapefilepath+'Box'+BoxID+'.shp'):
        for filename in os.listdir(boxespath):
            if filename.startswith('BoxID_'+str(ID)):
                shutil.copyfile(boxespath+filename, basepath+'Box'+BoxID+'/Box'+BoxID+filename[-4:])
                print("Box"+BoxID+filename[-4:], "moved")
    else:
        print("Box"+BoxID+'.shp', "already in folder")

    if not os.path.exists(shapefilepath+'RGI_Box'+BoxID+'.shp'): # if the RGI shapfile is not in this folder
        # move RGI glacier outline into the new folder
        for filename in os.listdir(RGIpath):
            if filename.startswith('BoxID_'+str(ID)):
                shutil.copyfile(RGIpath+filename, basepath+'Box'+BoxID+'/RGI_Box'+BoxID+filename[-4:])
                print("RGI_Box"+BoxID+filename[-4:], "moved")
    else:
        print("RGI_Box"+BoxID+'.shp', "already in folder")
    ######################################################################################

Path exists already for Box 008
Path exists already in LS8aws for Box 008
Box008.shp already in folder
RGI_Box008.shx moved
RGI_Box008.shp moved
RGI_Box008.dbf moved
RGI_Box008.shx moved
RGI_Box008.prj moved
RGI_Box008.qpj moved
RGI_Box008.qpj moved
RGI_Box008.shx moved
RGI_Box008.prj moved
RGI_Box008.shx moved
RGI_Box008.prj moved
RGI_Box008.prj moved
RGI_Box008.qpj moved
RGI_Box008.dbf moved
RGI_Box008.shp moved
RGI_Box008.shx moved
RGI_Box008.dbf moved
RGI_Box008.shp moved
RGI_Box008.prj moved
RGI_Box008.dbf moved
RGI_Box008.shp moved
RGI_Box008.dbf moved
RGI_Box008.qpj moved
RGI_Box008.prj moved
RGI_Box008.shx moved
RGI_Box008.prj moved
RGI_Box008.shp moved
RGI_Box008.shx moved
RGI_Box008.shx moved
RGI_Box008.shx moved
RGI_Box008.shp moved
RGI_Box008.prj moved
RGI_Box008.qpj moved
RGI_Box008.shp moved
RGI_Box008.dbf moved
RGI_Box008.prj moved
RGI_Box008.prj moved
RGI_Box008.qpj moved
RGI_Box008.qpj moved
RGI_Box008.shx moved
RGI_Box008.dbf moved
RGI_Box008.shp moved
RGI_Box008.dbf 

# 2) Find all the Landsat footprints that overlap the glaciers

This step requires the WRS-2_bound_world_0.kml file containing the footprints of all the Landsat scene boundaries available through the USGS (https://www.usgs.gov/land-resources/nli/landsat/landsat-shapefiles-and-kml-files). Place this file in your base directory (basepath). 

To check if they overlap the glacier terminus box shapefiles, the box shapefiles must be in WGS84 coordinates (ESPG: 4326). If they are not yet, we use the following GDAL command to reproject them into WGS84:

        ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:NEW_EPSG_NUMBER -s_srs EPSG:OLD_EPSG_NUMBER output.shp input.shp

In [61]:
# Reproject terminus box shapefiles to WGS84 if in a different projection
for BoxID in BoxIDs:
    boxespath = basepath+"Box"+BoxID+"/Box"+BoxID # access the BoxID folders created 
    # construct the gdal command
    rp = "ogr2ogr -f 'ESRI Shapefile' -t_srs EPSG:4326 -s_srs EPSG:"+source_srs+" "+boxespath+"_WGS.shp "+boxespath+".shp"
    print("Command:", rp) # check command
    subprocess.run(rp, shell=True, check=True) # run the command on terminal
    
    # if an error is produced, check the error output on the terminal window that runs this notebook

Command: ogr2ogr -f 'ESRI Shapefile' -t_srs EPSG:4326 -s_srs EPSG:3413 /home/jukes/Documents/Sample_glaciers/Box008/Box008_WGS.shp /home/jukes/Documents/Sample_glaciers/Box008/Box008.shp


In [65]:
# Grab the WGS84 coordinates of the boxes
box_points = {} # dictionary of points
for BoxID in BoxIDs:
    boxpath = basepath+"Box"+BoxID+"/Box"+BoxID # path to the reprojected terminus box
    termbox = fiona.open(boxpath+'_WGS.shp') # open reprojected terminus box
    box = termbox.next(); box_coords=box['geometry']['coordinates'][0] # grab coords
    points = [] # to hold the box vertices
    
    # read coordinates and convert to a shapely object
    for coord_pair in box_coords: 
        lat = coord_pair[0]; lon = coord_pair[1]        
        point = shapely.geometry.Point(lat, lon) # create shapely point 
        points.append(point) # append to points list
        
    box_points.update({BoxID: points}) # update dictionary
    print("Box"+BoxID+" coordinates recorded.") # keep track of progress

Box008 coordinates recorded.


  


In [68]:
######################################################################################
# open the kml file with the Landsat path, row footprints:
WRS = fiona.open(basepath+'WRS-2_bound_world_0.kml', driver='KML') # check the path to the world bounds file
######################################################################################

In [69]:
paths = []; rows = []; boxes = [] # create lists to hold the paths and rows and BoxIDs

#loop through all Landsat scenes (path, row footprints)
for feature in WRS:
    # create shapely polygons from the Landsat footprints
    coordinates = feature['geometry']['coordinates'][0]
    coords = [xy[0:2] for xy in coordinates]
    pathrow_poly = Polygon(coords)
    
    # grab the path and row name from the WRS kml file:
    pathrowname = feature['properties']['Name']  
    path = pathrowname.split('_')[0]; row = pathrowname.split('_')[1]
#     print(path, row)
    
    # for each feature, loop through each of the vertices stored in the dictionary
    for BoxID in box_points:  
        box_points_in = 0 # counter for number of box_points in the pathrow_geom:
        points = box_points.get(BoxID) # grab the points corresponding to the ID
        for i in range(0, len(points)):
            point = points[i]
            if point.within(pathrow_poly): # if the pathrow shape contains the point
                box_points_in = box_points_in+1 # append the counter
        if box_points_in == 5: # if all box vertices are inside the footprint, save the path, row, BoxID
            paths.append('%03d' % int(path))
            rows.append('%03d' % int(row))
            boxes.append(BoxID)

# Store in dataframe
boxes_pr_df = pd.DataFrame(list(zip(boxes, paths, rows)), columns=['BoxID','Path', 'Row'])
boxes_pr_df = boxes_pr_df.sort_values(by='BoxID')
boxes_pr_df # display

Unnamed: 0,BoxID,Path,Row
0,8,31,6
1,8,30,6
2,8,29,6
3,8,28,6
4,8,27,6
5,8,32,5
6,8,31,5


In [78]:
# save to file
PR_FILENAME = 'LS_pathrows'+csvext # save name with common extension
boxes_pr_df.to_csv(path_or_buf = basepath+PR_FILENAME, sep=',') # write to csv

# 3) Download metadata files from AWS s3 for overlapping Landsat scenes
The old syntax for grabbing an individual Landsat 8 metadata file from the Collection 1 data is as follows:

    aws --no-sign-request s3 cp s3://landsat-pds/c1/L8/path/row/LC08_XXXX_pathrow_yyyyMMdd_01_T1/LC08_XXXX_pathrow_yyyyMMdd_01_T1_MTL.txt /path_to/output/
     
The new syntax for listing the Collection 2 Landsat image files AWS s3 bucket is as follows:

    aws s3 ls --request-payer requester s3://usgs-landsat/collection02/level-2/standard/oli-tirs/yyyy/path/row/LC08_LS2R_pathrow_yyyyMMdd_yyyyMMdd_02_T1/ 
    
If you use the old syntax for grabbing a metadata file, you will encounter an AccessDenied return. However, s3 is a valid argument command for the aws CLI. Writing this combined with ls (lists the files within a folder) and --request-payer requester (indicates the requester will be charged for data download) will allow you to access files within the USGS Landsat image bucket.

We can use the paths and rows in the dataframe to access the full Landsat 8 scene list and the corresponding metdata files. Read https://docs.opendata.aws/landsat-pds/readme.html to learn more.

The old way to download the metadatafiles into Path_Row folders was by using:

    aws --no-sign-request s3 cp s3://landsat-pds/c1/L8/031/005/ Output/path/LS8aws/Path031_Row005/ --recursive --exclude "*" --include "*MTL.txt" 
    
The updated way to download the metadatafiles into Path_Row folders is by using:
    
    aws s3api get-object --bucket usgs-landsat --key collection02/level-2/standard/oli-tirs/yyyy/path/row/LC08_L2SP_pathrow_yyyyMMdd_yyyyMMdd_02_T1/LC08_L2SP_pathrow_yyyyMMdd_yyyyMMdd_02_T1_MTL.txt  --request-payer requester LC08_L2SP_pathrow_yyyyMMdd_yyyyMMdd_02_T1_MTL.txt

__NOTE: Including the --request-payer requester as part of this line indicates that the referenced user will be charged for data download.__

In [79]:
# Read in csv file from Step 2
boxes_pr_df = pd.read_csv(basepath+PR_FILENAME, dtype=str)
boxes_pr_df = boxes_pr_df.set_index('BoxID'); boxes_pr_df

Unnamed: 0_level_0,Unnamed: 0,Path,Row
BoxID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
8,0,31,6
8,1,30,6
8,2,29,6
8,3,28,6
8,4,27,6
8,5,32,5
8,6,31,5


In [123]:
# Loop through the dataframe containing overlapping path, row info:
for index, row in boxes_pr_df.iterrows():
    p = row['Path']; r = row['Row']; folder_name = 'Path'+p+'_Row'+r+'_c1' # folder name
    bp_out = downloadpath+folder_name+'/' # output path for the downloaded files
    print("Downloaded metadata files are stored in:",bp_out)
    
    # create Path_Row folders if they don't exist already
    if os.path.exists(bp_out):
        print(folder_name, " EXISTS ALREADY. SKIP.")
    else:
        os.mkdir(bp_out)
        print(folder_name+" directory made")
    
    for sat in sats: # for each satellite
        if sat == 'L8':
            collectionfolder = 'oli-tirs/'; years = L8_yrs; prefix='LC08' # set folder, years, file prefix
        elif sat == 'L7':
            collectionfolder = 'etm/'; years = L7_yrs; prefix='LE07' # set folder, years, file prefix
        
        # loop through years
        for year in years:
            # grab list of images for the year
            find_imgs = 'aws s3 ls --request-payer requester s3://usgs-landsat/'+collectionpath+collectionfolder
            find_imgs += year+'/'
            find_imgs += p+'/'+r+'/'
            result = subprocess.check_output(find_imgs,shell=True)
            results = result.split() # split string
            
            imagenames = []
            for line in results: # loop through strings
                line = str(line)
                if prefix in line and 'T1' in line: # find just the Tier-1 image names with the correct prefixes
                    imgname = line[2:-2]; imagenames.append(imgname)

            # download the metadata (MTL.txt) file
            for imgname in imagenames:
                command = 'aws s3api get-object --bucket usgs-landsat --key '+collectionpath+collectionfolder
                command += year+'/'
                command += p+'/'+r+'/'
                command += imgname+'/'+imgname+'_MTL.txt'
                command += ' --request-payer requester '
                command += bp_out+imgname+'_MTL.txt'

                # # Old way to obtain files from AWS
                # command = 'aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "*MTL.txt"'

                print('Command:',command); print()
                subprocess.run(command,shell=True,check=True)

Downloaded metadata files are stored in: /media/jukes/jukes1/LS8aws/Path031_Row006_c1/
Path031_Row006_c1  EXISTS ALREADY. SKIP.
Command: aws s3api get-object --bucket usgs-landsat --key collection02/level-1/standard/etm/1999/031/006/LE07_L1TP_031006_19990829_20200918_02_T1/LE07_L1TP_031006_19990829_20200918_02_T1_MTL.txt --request-payer requester /media/jukes/jukes1/LS8aws/Path031_Row006_c1/LE07_L1TP_031006_19990829_20200918_02_T1_MTL.txt

Command: aws s3api get-object --bucket usgs-landsat --key collection02/level-1/standard/etm/2002/031/006/LE07_L1TP_031006_20020330_20200916_02_T1/LE07_L1TP_031006_20020330_20200916_02_T1_MTL.txt --request-payer requester /media/jukes/jukes1/LS8aws/Path031_Row006_c1/LE07_L1TP_031006_20020330_20200916_02_T1_MTL.txt

Command: aws s3api get-object --bucket usgs-landsat --key collection02/level-1/standard/oli-tirs/2013/031/006/LC08_L1TP_031006_20130523_20200913_02_T1/LC08_L1TP_031006_20130523_20200913_02_T1_MTL.txt --request-payer requester /media/jukes/j

KeyboardInterrupt: 

# 4) Calculate cloud % over terminus box using Landsat quality band


### Reproject terminus boxes into UTM projections to match Landsat files

If the terminus box shapefiles were not originally in UTM projection, will need to reproject them into UTM to match the Landsat projection. The code automatically finds the UTM zones from the metadata files and fills in the following syntax to reproject:
    
    ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:326zone -s_srs EPSG:3413 output.shp input.shp
    
If they are already in UTM projection, skip this step and rename the files to end with "\_UTM\_##.shp" where ## corresponds to the zone number (e.g., "\_UTM\_07.shp", "\_UTM\_21.shp").

In [124]:
zones = {} # initialize dictionary to hold UTM zone for each Landsat scene path row
zone_list = [] # list of zones

# Loop through all scenes:
for index, row in boxes_pr_df.iterrows():
    BoxID = str(index)
    p = row['Path']; r = row['Row']; folder_name = 'Path'+p+'_Row'+r+'_c1' # Landsat path and row from dataframe
    pr_folderpath = downloadpath+folder_name+'/' # path to the downloaded metadata files
    pathtoshp = basepath+"Box"+BoxID+"/Box"+BoxID # path to the terminus box shapefiles (all projections)
    
    if len(os.listdir(pr_folderpath)) > 0: # if there are files in the folder
        # grab UTM Zone from the first metadata file
        mtl_scene = glob.glob(pr_folderpath+'*_MTL.txt')[0]
        mtl = open(mtl_scene, 'r')
        
        # loop through lines in the metadata file to find the UTM ZONE
        for line in mtl:  
            variable = line.split("=")[0] # grab the variable name
            if ("UTM_ZONE" in variable):
                zone = '%02d' % int(line.split("=")[1][1:-1]) # grab the 2-digit zone number
                zones.update({folder_name: zone}); zone_list.append(zone) # add to zone lists
                break
                
        # reproject shapefile(s) into UTM
        zone = zones[folder_name]
        rp_shp = 'ogr2ogr -f "ESRI Shapefile" '+pathtoshp+'_UTM_'+zone+'.shp '+pathtoshp+'_WGS.shp -t_srs EPSG:326'+zone+' -s_srs EPSG:4326'
        subprocess.run(rp_shp, shell=True,check=True)
        
    else: # if no files in folder, zone = nan, must fill in manually
        zone_list.append(np.nan)
        
boxes_pr_df['Zone'] = zone_list # add to the path row dataframe
boxes_pr_df.head()

Unnamed: 0_level_0,Unnamed: 0,Path,Row,Zone
BoxID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
8,0,31,6,19
8,1,30,6,19
8,2,29,6,19
8,3,28,6,19
8,4,27,6,20


In [16]:
# overwrite path row csv file with UTM zone information, see above for variable PR_FILENAME
boxes_pr_df.to_csv(path_or_buf = basepath+PR_FILENAME, sep=',')

### Download subset of the quality band file for each scene

Use gdalwarp commands and the __vsi3__ link to download subset of the quality band we will use to determine cloud cover over the terminus:

    gdalwarp -cutline path_to_shp.shp -crop_to_cutline /vsi3/usgs-landsat/collection02/level-1/standard/oli-tirs/yyyy/path/row/scene/scene_QA_PIXEL.TIF path_to_subset_QA_PIXEL.TIF


In [125]:
# Loop through all scenes:
for index, row in boxes_pr_df.iterrows():
    p = row['Path']; r = row['Row']; zone = row['Zone'] # grab path, row, zone
    BoxID = str(index)
    folder_name = 'Path'+p+'_Row'+r+'_c1'
    pr_folderpath = downloadpath+folder_name+'/' # path to the downloaded metadata files
    pathtoshp = basepath+"Box"+BoxID+"/Box"+BoxID # path to the terminus box shapefiles (all projections)
    pathtoshp_rp = pathtoshp+'_UTM_'+zone # path to the UTM projected box shapefile

    files = os.listdir(pr_folderpath) # grab the names of the Landsat scenes
    
    # for all files in the path row folders
    for file in files:
        scene = file[:-19] # slice the filename to grab the scene name
        print(scene)
        
        if 'L1TP' in scene and 'T1' in scene: # L1TP scenes
            scene_year = scene[17:21] # grab the year from the scene name
            
            if scene.startswith('LC08'):
                collectionfolder='oli-tirs/'
            elif scene.startswith('LE07'):
                collectionfolder='etm/'
                
            # set path to the QA pixel Landsat files
            pathtoQAPIXEL='/vsis3/usgs-landsat/'+collectionpath+collectionfolder
            pathtoQAPIXEL+=year+'/'
            pathtoQAPIXEL+=p+'/'+r+'/'
            pathtoQAPIXEL+=scene+'/'+scene+"_QA_PIXEL.TIF"
            
            # set path to the subset QA pixel files inside the path row folders
            subsetout = pr_folderpath+scene+'_QA_PIXEL_Box'+BoxID+'.TIF' 

            # construct download command
            QAPIXEL_dwnld_cmd='gdalwarp -overwrite -cutline '+pathtoshp_rp+'.shp -crop_to_cutline '+pathtoQAPIXEL+' '+subsetout
            QAPIXEL_dwnld_cmd+=' --config AWS_REQUEST_PAYER requester --config AWS_REGION us-west-2 '
            print("Command:", QAPIXEL_dwnld_cmd); print() # check command syntax before downloading

            subprocess.run(QAPIXEL_dwnld_cmd, shell=True, check=True)


LC08_L1TP_031006_20150310_20200909_02_T1
Command: gdalwarp -overwrite -cutline /home/jukes/Documents/Sample_glaciers/Box008/Box008_UTM_19.shp -crop_to_cutline /vsis3/usgs-landsat/collection02/level-1/standard/oli-tirs/2015/031/006/LC08_L1TP_031006_20150310_20200909_02_T1/LC08_L1TP_031006_20150310_20200909_02_T1_QA_PIXEL.TIF /media/jukes/jukes1/LS8aws/Path031_Row006_c1/LC08_L1TP_031006_20150310_20200909_02_T1_QA_PIXEL_Box008.TIF --config AWS_REQUEST_PAYER requester --config AWS_REGION us-west-2 



CalledProcessError: Command 'gdalwarp -overwrite -cutline /home/jukes/Documents/Sample_glaciers/Box008/Box008_UTM_19.shp -crop_to_cutline /vsis3/usgs-landsat/collection02/level-1/standard/oli-tirs/2015/031/006/LC08_L1TP_031006_20150310_20200909_02_T1/LC08_L1TP_031006_20150310_20200909_02_T1_QA_PIXEL.TIF /media/jukes/jukes1/LS8aws/Path031_Row006_c1/LC08_L1TP_031006_20150310_20200909_02_T1_QA_PIXEL_Box008.TIF --config AWS_REQUEST_PAYER requester --config AWS_REGION us-west-2 ' returned non-zero exit status 2.

# 5) Create buffer zone around terminus boxes and rasterize terminus boxes

In [127]:
buffers = []; mindimensions = [] # store the buffer distances and minimum dimensions

# Calculate a buffer distance around the terminus box:
for BoxID in BoxIDs:
    for file in os.listdir(basepath+'Box'+BoxID+'/'):
        if 'UTM' in file and '.shp' in file and "Box" in file: # identify UTM projected box
            boxpath = basepath+"Box"+BoxID+"/"+file  
            termbox = fiona.open(boxpath)
            
    # grab the box coordinates:
    box = termbox.next(); box_geom= box.get('geometry'); box_coords = box_geom.get('coordinates')[0]
    points = []
    for coord_pair in box_coords:
        lat = coord_pair[0]; lon = coord_pair[1]; points.append([lat, lon])
            
    # Calculate distance between coord 1 and 2 and between 2 and 3
    coord1 = points[0]; coord2 = points[1]; coord3 = points[2]   
    dist1 = distance(coord1[0], coord1[1], coord2[0], coord2[1]);
    dist2 = distance(coord2[0], coord2[1], coord3[0], coord3[1]) 
    buff_dist = int(np.max([dist1, dist2])) # pick the longer one as the buffer distance
    mindim = int(np.min([dist1, dist2]))/15.0 # calculate the minimum dimension in pixels
    
    # record:
    buffers.append(buff_dist)
    mindimensions.append(int(mindim))

# store as dataframe:
buff_df = pd.DataFrame(list(zip(BoxIDs, buffers, mindimensions)), columns=['BoxID', 'Buff_dist_m', 'min_dim_px'])
buff_df

  # This is added back by InteractiveShellApp.init_path()


Unnamed: 0,BoxID,Buff_dist_m,min_dim_px
0,8,9438,298


In [130]:
# write to csv, change name as desired
BOX_FILENAME = 'Buffdist'+csvext
buff_df.to_csv(basepath+BOX_FILENAME) 

### Create a buffer zone shapefile using GDAL command **ogr2ogr** with the following syntax:

    ogr2ogr Buffer###.shp path_to_terminusbox###.shp  -dialect sqlite -sql "SELECT ST_Buffer(geometry, buffer_distance) AS geometry,*FROM 'Box###'" -f "ESRI Shapefile"

In [132]:
# loop through the buffer distance dataframe:
for index, row in buff_df.iterrows():
    BoxID = row['BoxID']
    buff_dist = str(row['Buff_dist_m'])
    
    terminusbox_path = basepath+"Box"+BoxID+"/Box"+BoxID+".shp" # path to box shapefile
    outputbuffer_path = basepath+"Box"+BoxID+"/Buffer"+BoxID+".shp" # path and name of new buffer file
    
    # Set buffer command
    buffer_cmd = 'ogr2ogr '+outputbuffer_path+" "+terminusbox_path+' -dialect sqlite -sql "SELECT ST_Buffer(geometry, '+buff_dist+") AS geometry,*FROM 'Box"+BoxID+"'"+'" -f "ESRI Shapefile"'
    print("Command:", buffer_cmd)
    
    subprocess.run(buffer_cmd, shell=True, check=True) # run on terminal

Command: ogr2ogr /home/jukes/Documents/Sample_glaciers/Box008/Buffer008.shp /home/jukes/Documents/Sample_glaciers/Box008/Box008.shp -dialect sqlite -sql "SELECT ST_Buffer(geometry, 9438) AS geometry,*FROM 'Box008'" -f "ESRI Shapefile"


### Rasterize terminus boxes (to be used as a mask during the WTMM filering) using the GDAL **gdal_rasterize** command, subset to the buffer zone using the GDAL **gdalwarp** command, and reprojected:

1) Rasterize

        gdal_rasterize -burn 1.0 -tr x_resolution y_resolution -a_nodata 0.0 path_to_terminusbox.shp path_to_terminusbox_raster.TIF
    
2) Subset

        gdalwarp -cutline path_to_Buffer###.shp -crop_to_cutline path_to_terminusbox_raster.TIF path_to_subset_raster_cut.TIF
    
3) Reproject to UTM

In [136]:
for index, row in buff_df.iterrows():
    BoxID = row['BoxID']
    terminusbox_path = basepath+"Box"+BoxID+"/Box"+BoxID # path to box
    buffer_path = basepath+"Box"+BoxID+"/Buffer"+BoxID # path to buffer
    zones = boxes_pr_df.loc[BoxID, 'Zone'] # grab zone matching BoxID from other dataframe
    
    terminusraster_path = basepath+"Box"+BoxID+"/Box"+BoxID+".TIF" # path to rasterized box
    cutraster_path = basepath+"Box"+BoxID+"/Box"+BoxID+"_raster_cut.TIF" # name for cropped file
    
    # Set commands
    rasterize_cmd = 'gdal_rasterize -burn 1.0 -tr 15.0 15.0 -a_nodata 0.0 '+terminusbox_path+'.shp '+terminusraster_path
    subsetbuffer_cmd = 'gdalwarp -overwrite -cutline '+buffer_path+'.shp -crop_to_cutline '+terminusraster_path+' '+cutraster_path
    subprocess.run(rasterize_cmd, shell=True,check=True) # rasterize with command terminal
    subprocess.run(subsetbuffer_cmd, shell=True,check=True) # subset to buffer with command terminal
    
    # Reprojection needs to happen for each zone
    for zone in zones:
        rp_shp = 'ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:326'+zone+' -s_srs EPSG:'+source_srs+' '+buffer_path+"_UTM_"+zone+".shp "+buffer_path+'.shp'
        subprocess.run(rp_shp, shell=True, check=True) # reproject

    print("Box"+BoxID) # check progress

Box008


# 6) Download non-cloudy Landsat images from AWS

To remove cloudy images, we will find the number of pixels in our terminus box that correspond to a cumulative pixel value of > 4096 in the QA_PIXEL band. If the fraction of cloudy pixels with values is above the threshold, we won't download the image. 

Additionally, we remove images that are primarily black (fill value of 0 or 1 in QA_PIXEL band). This ensures that the scenes that cut off halfway across the glacier are not included in further analysis. The fill percent threshold may need to be adjusted.

In [140]:
######################################################################################
# These are the recommended values. Adjust thresholds here:
QAPIXEL_thresh = 22280.0 # QA pixel value threshold to be considered cloud
cpercent_thresh = 50.0 # maximum cloud cover % in terminus box
fpercent_thresh = 60.0 # maximum fill % in terminus box
######################################################################################

In [141]:
# Download images that pass these thresholds:
for index, row in boxes_pr_df.iterrows():
    # grab paths
    p = row['Path']; zone = row['Zone']; r = row['Row']; BoxID = index; 
    folder_name = 'Path'+p+'_Row'+r+'_c1'
    pr_folderpath = downloadpath+folder_name+'/'
    bp_out = downloadpath+'Box'+BoxID+'/' # folder name for downloaded images
    if os.path.exists(bp_out): # create folder if it does not exist
        print("Box"+BoxID, " exists already. Skip creation of directory.")
    else:
        os.mkdir(bp_out)
        print("Box"+BoxID+" directory made.")
    
    # path to the shapefile covering the region that will be downloaded
    pathtobuffer = basepath+'Box'+BoxID+'/Buffer'+BoxID+'_UTM_'+zone+'.shp'  # buffer around box - recommended
#     pathtobox = basepath+'Box'+BoxID+'/Box'+BoxID+'_UTM_'+zone+'.shp' # just the box
    
    for scene in os.listdir(pr_folderpath):
        if scene.endswith(".TIF") and 'T1' in scene: # For Tier-1 images
            scene = scene[:-19]
            year = scene[17:21] # grab acquisition year
            
            if scene.startswith("LC08"): # Landsat 8
                collectionfolder = 'oli-tirs/'; bands = L8_bands
            elif scene.startswith("LC07"): # Landsat 7
                collectionfolder = 'etm/'; bands = L7_bands
 
            QApixelpath = pr_folderpath+scene+'_QAPIXEL_Box'+BoxID+'.TIF' # path to QA_PIXEL file
            subsetQApixel = mpimg.imread(QApixelpath) # read in as numpy array
            
            # calculate percentages of cloud and fill bixels
            totalpixels = subsetQApixel.shape[0]*subsetQApixel.shape[1] # total number of pixels
            cloudQApixel = subsetQApixel[subsetQApixel > QAPIXEL_thresh] # cloudy pixels (value > QAPIXEL_thresh)
            fillQApixel = subsetQApixel[subsetQApixel < 2.0] # fill pixels (value = 0 or 1)
            cloudpixels = len(cloudQApixel); fillpixels = len(fillQApixel) # count the cloudy and fill pixels
            cloudpercent = int(float(cloudpixels)/float(totalpixels)*100) # calculate percent cloudy
            fillpercent = int(float(fillpixels)/float(totalpixels)*100) # calculate percent fill
            print(scene, 'Cloud % ', cloudpercent, 'Fill %', fillpercent) # check values
            
            # evaluate thresholds
            if cloudpercent <= cpercent_thresh and fillpercent <= fpercent_thresh:
                # download the bands for that scene into your scene folders:
                for band in bands:
                        band = str(band) # string format
                        
                        # input path to your bands in AWS:
                        pathin = '/vsis3/usgs-landsat/'+collectionpath+collectionfolder+year+'/'+p+"/"+r+"/"+scene+"/"+scene+"_B"+band+".TIF"
                        
                        outfilename = scene[0:-19]+"_B"+band+'_Buffer'+BoxID+'.TIF' # output file name
                        pathout = downloadpath+'Box'+BoxID+'/'+outfilename # full output file path

                        # construct download command
                        download_cmd = 'gdalwarp -overwrite -cutline '+pathtobuffer+' -crop_to_cutline '+pathin+' '+pathout+ ' --config AWS_REQUEST_PAYER requester --config AWS_REGION us-west-2 '
                        print(download_cmd) # check

                        # Once checked, uncomment the following to commence download:                   
                        subprocess.run(download_cmd, shell=True, check=True)
                        

Box008  exists already. Skip creation of directory.
LC08_L1TP_031006_20150310_20200909_02_T1 Cloud %  37 Fill % 46
gdalwarp -overwrite -cutline /home/jukes/Documents/Sample_glaciers/Box008/Buffer008_UTM_19.shp -crop_to_cutline /vsis3/usgs-landsat/collection02/level-1/standard/oli-tirs/2015/031/006/LC08_L1TP_031006_20150310_20200909_02_T1/LC08_L1TP_031006_20150310_20200909_02_T1_B8.TIF /media/jukes/jukes1/LS8aws/Box008/LC08_L1TP_031006_2015_B8_Buffer008.TIF --config AWS_REQUEST_PAYER requester --config AWS_REGION us-west-2 


CalledProcessError: Command 'gdalwarp -overwrite -cutline /home/jukes/Documents/Sample_glaciers/Box008/Buffer008_UTM_19.shp -crop_to_cutline /vsis3/usgs-landsat/collection02/level-1/standard/oli-tirs/2015/031/006/LC08_L1TP_031006_20150310_20200909_02_T1/LC08_L1TP_031006_20150310_20200909_02_T1_B8.TIF /media/jukes/jukes1/LS8aws/Box008/LC08_L1TP_031006_2015_B8_Buffer008.TIF --config AWS_REQUEST_PAYER requester --config AWS_REGION us-west-2 ' returned non-zero exit status 2.

### Reproject the downloaded files from UTM into your desired projection

In [157]:
######################################################################################
desired_proj = '3413' # EPSG code for desired projection
suffix = '_PS' # suffix for reprojected images - something that indicates the projection
######################################################################################

for BoxID in list(set(boxes_pr_df.index)):
    bp_out = downloadpath+'Box'+BoxID+'/' # path to downloaded files
    
    # create output reprojected folder if does not exist
    if os.path.exists(bp_out+'reprojected/'):
        print("Box"+BoxID, "Reprojected folder exists already.")
    else:
        os.mkdir(bp_out+'reprojected/')
        print("Box"+BoxID+" Reprojected directory made")
                      
    downloadedimages = os.listdir(bp_out) # all downloaded images
    for image in downloadedimages:
        if image.endswith('.TIF'):
            imagename = image[:-4] # remove suffix
            print(image)

            rp_PS = "gdalwarp -t_srs EPSG:"+desired_proj+' '+bp_out+image+" "+bp_out+'reprojected/'+imagename+suffix+".TIF"
            subprocess.run(rp_PS, shell=True,check=True)   


Box008 Reprojected folder exists already.


### Automatically grab the image acquisition dates from the metadata files

In [149]:
datetimes = [] # list of scene datetimes
scenes_dated = [] # list of scenes

# Loop through the dataframe with your path row combinations:
for index, row in boxes_pr_df.iterrows():
    p = row['Path']; r = row['Row']; BoxID = index; 
    folder_name = 'Path'+p+'_Row'+r+'_c1'; print(folder_name)
    
    # Output folder paths"
    folderpath = downloadpath+folder_name+'/'
    bp_out = downloadpath+'Box'+BoxID+'/reprojected/'
      
    scenecount = 0 # keep track of the number of scenes:
    
    downloaded_scenes = os.listdir(bp_out)
    for scene in downloaded_scenes:
        if scene.endswith('.TIF'):
            scenename = scene[:-20] # MAY NEED TO ADJUST DEPENDING ON SUFFIX - CHANGE THIS TO start and end
            print(scenename)
            if scenename in os.listdir(folderpath):
                scenefiles = os.listdir(folderpath+scenename+'/')
                for file in scenefiles:
                    if ("MTL.txt" in file): # open metadata file
                        mdata = open(folderpath+scenename+"/"+scenename+"_MTL.txt", "r")
                        for line in mdata:
                            variable = line.split("=")[0]
                            if ("DATE_ACQUIRED" in variable):
                                date = line.split("=")[1][1:-1] # find acquisition date
                        dates = datetime.datetime.strptime(date, '%Y-%m-%d') # save as datetime object
                        datetimes.append(dates); scenes_dated.append(scenename) # store in lists
                scenecount = scenecount+1

# Store in a dataframe
datetime_df = pd.DataFrame(list(zip(scenes_dated, datetimes)), columns=['Scene', 'datetime'])
datetime_df = datetime_df.sort_values(by='datetime', ascending=True); datetime_df = datetime_df.drop_duplicates()
datetime_df

Path031_Row006_c1
Path030_Row006_c1
Path029_Row006_c1
Path028_Row006_c1
Path027_Row006_c1
Path032_Row005_c1
Path031_Row005_c1


Unnamed: 0,Scene,datetime


In [174]:
# write dates to csv
DATES_FILENAME = 'imgdates'+csvext 
datetime_df.to_csv(path_or_buf = basepath+DATES_FILENAME, sep=',') 

### Second pass at cloud filtering using range in pixel values

Removes cloudy images that slipped through QA filtering. Skip if unnecessary.

In [None]:
# # convert all files in reprojected folder to png from TIF
# for BoxID in BoxIDs:
#     print(BoxID)
#     command = 'cd '+downloadpath+'Box'+BoxID+'/reprojected/; '+'mogrify -format png *_PS.TIF'
#     subprocess.call(command, shell=True)

In [None]:
# ######################################################################################
# suffix = '_PS' # reprojection suffix
# ######################################################################################

# for BoxID in BoxIDs:
#     imagepath = downloadpath+'Box'+BoxID+'/reprojected/'
#     for img in os.listdir(imagepath):
#         if img.endswith('Buffer'+BoxID+suffix+'.png'):
#             image = cv2.imread(imagepath+img,-1) # read in image
#             imageplt = mpimg.imread(imagepath+img)
#             image_nofill = imageplt[imageplt > 0] # don't consider the fill points
#             img_std = np.std(image_nofill) # st. dev in values
#             if len(image_nofill.shape) > 1:
#                 img_range = np.max(image_nofill) - np.min(image_nofill)
#                 img_med = np.median(image_nofill)

#                 if img_std < 0.04 and img_med > 0.15: # adjust these threholds
#                     os.remove(imagepath+img) # remove png mimage
#                     os.remove(imagepath+img[:-4]+'.TIF') # remove pgm image     
#                     # show the image
#     #                 imgplt_trim = plt.imshow(cv2.cvtColor(imageplt, cv2.COLOR_BGR2RGB))
#     #                 plt.show()

### Grab fraction of total images available that were excluded due to clouds and fill

If you are interested in knowing how many images were filtered out using the cloud and fill thresholds, run the following cells. Otherwise, skip.

In [163]:
# # read in path, row csv file if not already loaded
# boxes_pr_df = pd.read_csv(basepath+PR_FILENAME, dtype=str)
# boxes_pr_df = boxes_pr_df.set_index('BoxID'); BoxIDs = list(set(boxes_pr_df.index))
# print(BoxIDs); boxes_pr_df.head()

In [161]:
# im_tots = []; downloaded = []; fractions = []

# for BoxID in BoxIDs:
#     pathrows_BoxID = boxes_pr_df[boxes_pr_df.index == BoxID].copy() # grab path rows for that BoxID
    
#     im_tot = 0 # count number of total scenes available
#     for idx, rw in pathrows_BoxID.iterrows():
#         p = rw['Path']; r = rw['Row']
#         ims_pr = len(os.listdir(downloadpath+'Path'+p+'_Row'+r+'_c1')) # grab number of scenes in that pathrow
#         im_tot = im_tot + ims_pr
    
#     counter = 0
#     if im_tot == 0: # if no images
#         download_frac = np.NaN
#     else:
#         # count the files that passed thresholds and got downloaded
#         for file in os.listdir(downloadpath+'Box'+BoxID+'/reprojected/'):
#             if file.endswith('.png') and 'B8' in file: # panchromatic band (B8)
#                 counter = counter + 1
            
#         download_frac = int(counter/im_tot*100) # calculate fraction downloaded
#     im_tots.append(im_tot); downloaded.append(counter); fractions.append(download_frac) # store values

# # store in dataframe
# downloaded_df = pd.DataFrame(list(zip(BoxIDs, im_tots, downloaded, fractions)), columns = ['BoxID', 'Total_ims', 'Downloaded', '%'])
# downloaded_df 

In [162]:
# # write percent downloaded to csv
# DOWNLOADED_FILENAME = 'Images_downloaded'+csvext
# downloaded_df.to_csv(basepath+DOWNLOADED_FILENAME, sep=',') # write to csv

# 7) Calculate weighted average glacier flow direction using velocity data

The following code processes ice velocity (vx, vy) rasters to determine each glacier of interest's weighted average flow direction. These files should be placed in the base directory (basepath). The rasters are subset using the terminus box shapefile or the Randolph Glacier Inventory outlines using a GDAL command (**gdalwarp**) with the following syntax:

    gdalwarp -cutline path_to_terminusbox.shp -crop_to_cutline path_to_input_velocity.TIF path_to_output_velocity.TIF

In [170]:
######################################################################################
# Change to your velocity input file names and paths
vx_name = 'greenland_vel_mosaic250_vx_v1.tif' # MEaSUREs product
vy_name = 'greenland_vel_mosaic250_vy_v1.tif' # MEaSUREs product
vpath = basepath # path to folder containing velocity files
no_data_val = -2000000000.0 # no data value for the velocity maps
vt = 'day' # time unit for velocity (e.g., day for m/day, year for m/year)
######################################################################################

for BoxID in BoxIDs:
    terminus_path = basepath+"Box"+BoxID+"/RGI_Box"+BoxID+".shp"  # path to RGI shapefile

    if not os.path.exists(terminus_path): # if the RGI shapefile does not exist
        terminus_path = basepath+"Box"+BoxID+"/Box"+BoxID+".shp"  # set the path to the box shapefile instead   
    
    print(terminus_path)
        
    # output paths for the cropped velocity data
    vx_out = terminus_path[:-4]+'_'+vx_name
    vy_out = terminus_path[:-4]+'_'+vy_name
    
    # input paths:
    vx_in = basepath+vx_name
    vy_in = basepath+vy_name
    
    # subset x and y velocity files
    v_subset1 = 'gdalwarp -cutline '+terminus_path+' -crop_to_cutline '+vx_in+" "+vx_out
    v_subset2 = 'gdalwarp -cutline '+terminus_path+' -crop_to_cutline '+vy_in+" "+vy_out
    subprocess.run(v_subset1, shell=True, check=True)
    subprocess.run(v_subset2, shell=True, check=True)
    
    print("Box"+BoxID+' done.')

/home/jukes/Documents/Sample_glaciers/Box008/RGI_Box008.shp


CalledProcessError: Command 'gdalwarp -cutline /home/jukes/Documents/Sample_glaciers/Box008/RGI_Box008.shp -crop_to_cutline /home/jukes/Documents/Sample_glaciers/greenland_vel_mosaic250_vx_v1.tif /home/jukes/Documents/Sample_glaciers/Box008/RGI_Box008_greenland_vel_mosaic250_vx_v1.tif' returned non-zero exit status 1.

Next, these subset velocity rasters are opened using the **rasterio** package and read into arrays. They are filtered for anomalous values and the velocity magnitudes are converted into weights. Then the **numpy.average()** function is used to calculated the weighted average flow directions where the flow directions of the pixels where the highest velocities are found are weighted more. 

The resulting average flow direction will be representative of the glacier's main flow. These directions will be used to rotate the images of the glaciers so that their flow is due right.

__For slow-moving glaciers with uncertain velocities from feature tracking based velocity datasets, use manual determination of velocities. Here, we use the manual delineations of the Greenland peripheral glaciers in 2000 and 2015 to approximate the flow direction.__

In [38]:
##################################################################################################################
# ONLY APPLIES TO GREENLAND PERIPHERAL GLACIERS
badvelocities = ['301', '289', '283', '265', '241', '223', '285', '181', '097', '091', '067','083',
                 '221', '173', '113', '101', '089', '082', '100', '112', '118', '130', '160', '196', 
                 '208', '226', '256', '262', '280', '298', '322', '072', '074', '080', '082', '084',
                '102', '114', '134', '132', '144', '159', '188', '189', '198', '207', '212', '222',
                '224', '234', '242', '243', '249', '254', '258', '264', '267', '272', '273', '278', 
                '282', '284', '288', '297', '305', '306', '307', '315', '318', '321', '324', '327',
                '330', '331', '338', '341', '344', '354', '356', '357', '358', '359', '362', '363',
                 '364', '369', '370', '371', '372', '373', '374', '376', '377', '379', '380', '381', 
                 '382', '383', '384', '385', '386', '387', '388', '389', '390', '391', '392', '393',
                 '394', '395', '396', '397', '398', '399', '400', '401' ,'404', '405', '406', '407',
                 '408', '409', '410', '414', '415', '416', '417', '418', '419', '420', '421', '422',
                 '427', '430', '431', '434', '436', '438', '440']
print(badvelocities)
##################################################################################################################

['301', '289', '283', '265', '241', '223', '285', '181', '097', '091', '067', '083', '221', '173', '113', '101', '089', '082', '100', '112', '118', '130', '160', '196', '208', '226', '256', '262', '280', '298', '322', '072', '074', '080', '082', '084', '102', '114', '134', '132', '144', '159', '188', '189', '198', '207', '212', '222', '224', '234', '242', '243', '249', '254', '258', '264', '267', '272', '273', '278', '282', '284', '288', '297', '305', '306', '307', '315', '318', '321', '324', '327', '330', '331', '338', '341', '344', '354', '356', '357', '358', '359', '362', '363', '364', '369', '370', '371', '372', '373', '374', '376', '377', '379', '380', '381', '382', '383', '384', '385', '386', '387', '388', '389', '390', '391', '392', '393', '394', '395', '396', '397', '398', '399', '400', '401', '404', '405', '406', '407', '408', '409', '410', '414', '415', '416', '417', '418', '419', '420', '421', '422', '427', '430', '431', '434', '436', '438', '440']


In [172]:
boxes = []; avg_rot = []; max_mag = []; num_cells = [] # to hold the boxIDs, rotation angle, max. glacier speed, and number of pixels

for BoxID in BoxIDs:
    rot_angles = []; max_magnitudes = [] # store angles and speeds from all pixels
    
    # determine if RGI outline was used to subset velocities
    rgi_exists = 0
    for file in os.listdir(basepath+"Box"+BoxID):
        if file.startswith('RGI'):
            rgi_exists = 1
            
    if rgi_exists == 1: # if yes, open those files    
        vx = rio.open(basepath+"Box"+BoxID+"/RGI_Box"+BoxID+'_'+vx_name, "r") 
        vy = rio.open(basepath+"Box"+BoxID+"/RGI_Box"+BoxID+'_'+vy_name, "r") 
    else: # if not, they were subset using the boxes. Open those files
        vx = rio.open(basepath+"Box"+BoxID+"/Box"+BoxID+'_'+vx_name, "r") 
        vy = rio.open(basepath+"Box"+BoxID+"/Box"+BoxID+'_'+vy_name, "r") 
    vx_array = vx.read(); vy_array = vy.read() # read as numpy array
    
    # remove no data values
    vx_masked = vx_array[vx_array != no_data_val]
    vy_masked = vy_array[vy_array != no_data_val]
    
    # calculate flow direction
    direction = np.arctan2(vy_masked, vx_masked)*180/np.pi 
    # transform so any negative angles are placed on 0 to 360 scale:
    if len(direction[direction < 0]) > 0:
        direction[direction < 0] = 360.0+direction[direction < 0]
    
    # calculate speed (flow magnitude)
    magnitude = np.sqrt((vx_masked*vx_masked) + (vy_masked*vy_masked)) 
    
    ncells = len(direction) # number of pixels
    if ncells > 0:
        # Determine if there are a large number of direction pixels with values > 200.0
        # If so, it's probably pointing East
        dir_range = direction.max() - direction.min()
        if dir_range > 200.0 and len(direction[direction > 200]): # if large range and values above 200
            direction[direction > 180] = direction[direction > 180] - 360.0 # transform those values on a negative scale
            # calculate weights (0 - 1) from magnitudes
            mag_range = magnitude.max() - magnitude.min()
            stretch = 1/mag_range; weights = stretch*(magnitude - magnitude.min()) # weights for averaging
            avg_dir = np.average(direction, weights=weights) # calculate average flow direction
            if avg_dir < 0: # if negative:
                avg_dir = avg_dir + 360.0 # transform back to 0 to 360 scale
        else:
            mag_range = magnitude.max() - magnitude.min(); stretch = 1/mag_range
            weights = stretch*(magnitude - magnitude.min())
            avg_dir = np.average(direction, weights=weights)
                
        if vt == 'day':
            max_magnitude = magnitude.max()
        elif vt == 'year':
            yr_day_conv = 0.00273973 # conversion to m/d from m/a
            max_magnitude = magnitude.max()*yr_day_conv  
    else: # no velocity pixels remaining once cropped
        avg_dir = np.NaN ; max_magnitude = np.NaN # no velocities to calculate this with
    
    ##################################################################################################################
    # ONLY APPLIES TO GREENLAND PERIPHERAL GLACIERS, COMMENT OUT FOR OTHER APPLICATIONS
    path2000_2015 = '/phoebekinzelman/research/greenland/2000_2015/'
    if BoxID in badvelocities:
        # grab the 2000 and 2015 delineation centroids:
        shp2000 = fiona.open(path2000_2015+'GreenlandPeriph_term2000_'+BoxID+'.shp'); feat2000= shp2000.next()
        lineshp2000 = LineString(feat2000['geometry']['coordinates'])
        cent2000 = np.array(lineshp2000.centroid)

        shp2015 = fiona.open(path2000_2015+'GreenlandPeriph_term2015_'+BoxID+'.shp'); feat2015= shp2015.next()
        lineshp2015 = LineString(feat2015['geometry']['coordinates'])
        cent2015 = np.array(lineshp2015.centroid)

        # grab displacements and use to calculate flow direction in degrees
        y = cent2000[1] - cent2015[1]
        x = cent2000[0] - cent2015[0]
        avg_dir = np.arctan2(y,x)*180/np.pi
        if avg_dir < 0:
            avg_dir = 360.0+avg_dir
         
        # if max_magnitude cannot be calculated from the velocity raster (pixels == 0)
        if ncells == 0:
            # use displacements and time to approximate speed in m/d
            yrs = 15.0
            max_magnitude = np.sqrt((y*y)+(x*x))/yrs*yr_day_conv
                
        ncells = np.NaN
    ##################################################################################################################
    
    # Append values to lists:
    avg_rot.append(avg_dir); max_mag.append(max_magnitude); boxes.append(BoxID); num_cells.append(ncells)  

# store the flow direction (rotation angle), maximum magnitude
velocities_df = pd.DataFrame(list(zip(boxes,avg_rot, max_mag, num_cells)), columns=['BoxID','Flow_dir', 'Max_speed', 'Pixels'])
velocities_df = velocities_df.sort_values(by='BoxID')
velocities_df # display

RasterioIOError: /home/jukes/Documents/Sample_glaciers/Box008/RGI_Box008_greenland_vel_mosaic250_vx_v1.tif: No such file or directory

In [40]:
# write velocity data to CSV
VEL_FILENAME = 'Glacier_vel'+csvext 
velocities_df.to_csv(path_or_buf = basepath+VEL_FILENAME, sep=',')

# 8) Rotate all images by flow direction

In [110]:
# Read in the glacier velocity file as velocities_df if not already loaded:
velocities_df = pd.read_csv(basepath+VEL_FILENAME, sep=',', dtype=str, usecols=[1,2,3,4])
velocities_df = velocities_df.set_index('BoxID')

In [111]:
# make directory for rotated images in BoxID folders if it doesn't already exist
for index, row in velocities_df.iterrows():
    BoxID = index
    if os.path.exists(downloadpath+"Box"+BoxID+'/rotated_c1/'):
        print("Already exists.")
    else:
        os.mkdir(downloadpath+"Box"+BoxID+'/rotated_c1/')
        print("Folder made for Box"+BoxID)

Already exists.


In [112]:
# move rasterized terminus box into reprojected folder, since it will also need to be rotated:
for index, row in velocities_df.iterrows():
    BoxID = index
    boxfile = 'Box'+BoxID+'_raster_cut.TIF'
    boxrasterpath = basepath+'Box'+BoxID+'/'+boxfile
    newpath = downloadpath+'Box'+BoxID+'/reprojected/'+boxfile
    shutil.copyfile(boxrasterpath, newpath)

In [126]:
# convert all images to png for rotation:
for index, row in velocities_df.iterrows():
    BoxID = index
    command = 'cd '+downloadpath+'Box'+BoxID+'/reprojected/; '+'magick mogrify -format png *.TIF'
    subprocess.run(command, shell=True,check=True)

cd /Users/phoebekinzelman/research/greenland/LS8aws/Box008/reprojected/; magick mogrify -format png *.TIF
madeit




In [127]:
# rotate
for index, row in velocities_df.iterrows():
    BoxID = index; print("Box"+BoxID) # keep track of progress
    for file in os.listdir(downloadpath+"Box"+BoxID+'/reprojected/'):
        if file.endswith('.png'):
            print(file)
            img  = Image.open(downloadpath+"Box"+BoxID+'/reprojected/'+file)
            rotated = img.rotate(-float(row['Flow_dir']))
            rotated.save(downloadpath+"Box"+BoxID+'/rotated_c1/R_'+file)

008
LC08_L1TP_031005_2021_B8_Buffer008_PS.png
LC08_L1TP_029006_2015_B8_Buffer008_PS.png
LC08_L1TP_031005_2020_B8_Buffer008_PS.png
LC08_L1TP_029006_2014_B8_Buffer008_PS.png
LC08_L1TP_031006_2019_B8_Buffer008_PS.png
LC08_L1TP_027006_2018_B8_Buffer008_PS.png
LC08_L1TP_029006_2017_B8_Buffer008_PS.png
LC08_L1TP_029006_2016_B8_Buffer008_PS.png
LC08_L1TP_027006_2019_B8_Buffer008_PS.png
LC08_L1TP_031006_2018_B8_Buffer008_PS.png
LC08_L1TP_032005_2013_B8_Buffer008_PS.png
LC08_L1TP_030006_2017_B8_Buffer008_PS.png
LC08_L1TP_028006_2019_B8_Buffer008_PS.png
LC08_L1TP_032005_2015_B8_Buffer008_PS.png
LC08_L1TP_028006_2018_B8_Buffer008_PS.png
LC08_L1TP_030006_2016_B8_Buffer008_PS.png
LC08_L1TP_032005_2014_B8_Buffer008_PS.png
LC08_L1TP_032005_2017_B8_Buffer008_PS.png
LC08_L1TP_030006_2015_B8_Buffer008_PS.png
LC08_L1TP_032005_2016_B8_Buffer008_PS.png
LC08_L1TP_030006_2014_B8_Buffer008_PS.png
LC08_L1TP_029006_2013_B8_Buffer008_PS.png
LC08_L1TP_030006_2019_B8_Buffer008_PS.png
LC08_L1TP_028006_2017_B8_Buffe

# 9) Crop all images to the same size

All input images will need to be the same size for the automated terminus detection analysis. The function resize_pngs resizes all png files within a folder to the minimum image dimensions found. The function crops around the edges, centering the image.

In [128]:
for BoxID in BoxIDs:
    resizepath = downloadpath+"Box"+BoxID+'/rotated_c1/' # path to rotated images
    resize_pngs(resizepath) # crop

In [None]:
# convert all final files to pgm
for index, row in velocities_df.iterrows():
    BoxID = index
    command = 'cd '+downloadpath+'Box'+BoxID+'/rotated_c1/; '+'mogrify -depth 16 -format pgm *.png'
    subprocess.call(command, shell=True)

In [None]:
# # rename the rasterized terminus box files if necessary
# for BoxID in BoxIDs:
#     files = os.listdir(downloadpath+'Box'+BoxID+'/rotated_c1/')
#     for file in files:
#         if file.startswith('R_Box'+BoxID+'_cut'):
#             rpath = downloadpath+'Box'+BoxID+'/rotated_c1/'
#             os.rename(rpath+file, rpath+'R_Box'+BoxID+'_raster_cut'+file[-4:])