# Code to download Landsat 8 (LS8) images for Greenland peripheral glaciers using Amazon Web Services (AWS)

### Jukes Liu

The following code automatically downloads the Landsat 8 scenes hosted on Amazon Web Services whose cloud cover is below a chosen threshold (in %). The scenes covering each glacier are identified by the glacier's predetermined Landsat path and row, which are stored in a .csv file. Each scene is then filtered for cloud cover using its metadata (MTL) file.

## 1) Set up:

#### Install the AWS CLI using pip or pip3

The AWS Command Line Interface (CLI) must be installed so that `aws` commands can be run from your shell. Follow the instructions at https://docs.aws.amazon.com/cli/latest/userguide/install-linux-al2017.html to install it.
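Before running the download cells, it can help to confirm that the `aws` executable is actually on the shell's `PATH`. The sketch below uses only the standard library's `shutil.which`; the function name `aws_cli_available` is hypothetical and not part of the original notebook.

```python
import shutil


def aws_cli_available():
    """Return True if an `aws` executable can be found on the current PATH."""
    return shutil.which("aws") is not None


print("aws CLI found:", aws_cli_available())
```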

#### Import packages

In [1]:
import pandas as pd
import numpy as np
import os
import subprocess

#### Read in LS path and row for each peripheral glacier by BoxID into a DataFrame

The LS Path and Row information for each peripheral glacier is stored in a .csv file. 

Note: Many glaciers lie within the same Landsat scene, so some path/row combinations are repeated. The download code therefore skips any path/row combination whose folder already exists in the output directory.
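An alternative to relying on the folder-exists check is to deduplicate the path/row combinations up front with pandas. This is a sketch on a small, hypothetical DataFrame in the same layout as `LS_pathrows.csv` (BoxID index, string Path and Row columns); the zero-padded values are assumed to mirror the printed `002`/`031`/`005` seen later.

```python
import pandas as pd

# Hypothetical sample in the same layout as LS_pathrows.csv (BoxID, Path, Row)
sample = pd.DataFrame(
    {"BoxID": ["001", "002", "004"], "Path": ["034", "031", "031"], "Row": ["005", "005", "005"]}
).set_index("BoxID")

# One row per unique Path/Row combination (the first glacier of each is kept)
unique_pathrows = sample.drop_duplicates(subset=["Path", "Row"])
print(unique_pathrows)

# Which glaciers share each Path/Row scene footprint
boxes_per_scene = sample.reset_index().groupby(["Path", "Row"])["BoxID"].apply(list)
print(boxes_per_scene.to_dict())
```

Looping over `unique_pathrows` instead of every glacier would download each scene folder once, while `boxes_per_scene` keeps track of which BoxIDs it serves.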

In [2]:
#set basepath
basepath = '/home/jukes/Documents/Sample_glaciers/'
#basepath = '/home/automated-glacier-terminus/'
outputpath = '/media/jukes/jukes1/'

#read the path row csv file into a dataframe
pathrows_df = pd.read_csv(basepath+'LS_pathrows.csv', sep=',', usecols=[0, 1, 2], dtype=str, nrows=10)
pathrows_df = pathrows_df.set_index('BoxID')
pathrows_df.head()

| BoxID | Path | Row |
|-------|------|-----|
| 1     | 34   | 5   |
| 2     | 31   | 5   |
| 4     | 31   | 5   |
| 33    | 8    | 14  |
| 120   | 232  | 17  |


In [3]:
#check the df dimensions
pathrows_df.shape

(10, 2)

#### Create output directory: LS8aws

In [4]:
#create LS8aws folder
if os.path.exists(outputpath+'LS8aws'):
    print("Path exists already")
else:
    os.mkdir(outputpath+'LS8aws')
    print("LS8aws directory made")

Path exists already
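The same check-then-create step can be written in one call with `os.makedirs(..., exist_ok=True)`, which does nothing if the folder already exists (and also creates any missing parent folders). This sketch writes into a temporary folder; `demo_outputpath` is a hypothetical stand-in so the notebook's real `outputpath` is not touched.

```python
import os
import tempfile

# Hypothetical stand-in for outputpath, so the sketch writes to a temporary folder
demo_outputpath = tempfile.mkdtemp()
ls8_dir = os.path.join(demo_outputpath, "LS8aws")

# exist_ok=True: no error if LS8aws already exists, so the cell can be re-run safely
os.makedirs(ls8_dir, exist_ok=True)
os.makedirs(ls8_dir, exist_ok=True)
print(os.path.isdir(ls8_dir))
```

The trade-off is that this version no longer reports whether the folder was new; keep the `if`/`else` form if that message matters.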


## 2) Download B8 (panchromatic band) and MTL.txt (metadata) files for all available images over the path/row of the glaciers

The Landsat 8 scenes stored on AWS can be accessed through the `landsat-pds` S3 bucket using the path and row. Each band and the metadata file are stored as separate files, so they can be downloaded individually.

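We are interested in the panchromatic band (B8.TIF) and the metadata file (MTL.txt), which is used to filter for cloud cover. The download commands use the following syntax, where `path`, `row`, and `year` are placeholders and `001` is the three-digit day of year of the scene: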


    aws --no-sign-request s3 cp s3://landsat-pds/L8/path/row/LC8pathrowyear001LGN00/LC8pathrowyear001LGN00_MTL.txt /path_to/output/

    aws --no-sign-request s3 cp s3://landsat-pds/L8/path/row/LC8pathrowyear001LGN00/LC8pathrowyear001LGN00_B8.TIF /path_to/output/

Access https://docs.opendata.aws/landsat-pds/readme.html to learn more.
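Because the scene folder names encode path, row, year, and day of year, the two URLs can be built from a scene ID alone. This is a sketch assuming fixed-width fields (3-digit path and row, 4-digit year, 3-digit day of year) as in IDs such as `LC80310052014146LGN00` listed in the output below; `parse_scene_id` and `scene_urls` are hypothetical helper names.

```python
import datetime as dt

S3_BASE = "s3://landsat-pds/L8/"


def parse_scene_id(scene_id):
    """Split a scene ID such as LC80310052014146LGN00 into its fixed-width fields."""
    path, row = scene_id[3:6], scene_id[6:9]
    year, doy = int(scene_id[9:13]), int(scene_id[13:16])
    # Day of year -> calendar date (day 1 is January 1)
    date = dt.date(year, 1, 1) + dt.timedelta(days=doy - 1)
    return {"path": path, "row": row, "date": date, "suffix": scene_id[16:]}


def scene_urls(scene_id):
    """Build the MTL.txt and B8.TIF URLs following the syntax shown above."""
    info = parse_scene_id(scene_id)
    folder = S3_BASE + info["path"] + "/" + info["row"] + "/" + scene_id + "/"
    return folder + scene_id + "_MTL.txt", folder + scene_id + "_B8.TIF"


info = parse_scene_id("LC80310052014146LGN00")
print(info["path"], info["row"], info["date"])
mtl_url, b8_url = scene_urls("LC80310052014146LGN00")
print(mtl_url)
print(b8_url)
```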

### 2A) Option 1: For one BoxID (one glacier) at a time

In [20]:
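#choose a glacier: Box002 (the second row of the DataFrame)
boxid = pathrows_df.index[1]
#.iloc selects by position; plain [1] on a string index relies on deprecated fallback behavior
path = pathrows_df['Path'].iloc[1]
row = pathrows_df['Row'].iloc[1]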
print('BoxID ', boxid, 'path', path, 'row', row)

#set path row folder name
folder_name = 'Path'+path+'_Row'+row
print(folder_name)

#set input path
bp_in = 's3://landsat-pds/L8/'
totalp_in = bp_in+path+'/'+row+'/'
print(totalp_in)

#set output path
bp_out = outputpath+'LS8aws/'+folder_name+'/'
print(bp_out)

BoxID  002 path 031 row 005
Path031_Row005
s3://landsat-pds/L8/031/005/
/media/jukes/jukes1/LS8aws/Path031_Row005/


#### Create the Path_Row folder

In [270]:
#create the Path_Row folder if it does not exist yet
if os.path.exists(bp_out):
    print(folder_name, " EXISTS ALREADY. SKIP.")
else:
    os.mkdir(bp_out)
    print(folder_name+" directory made")

Path031_Row005 directory made


#### Download all the metadata text files by running aws commands through subprocess

Use the following syntax:

    aws --no-sign-request s3 cp s3://landsat-pds/L8/031/005/ Output/path/LS8aws/Path031_Row005/ --recursive --exclude "*" --include "*.txt" 

In [18]:
#Check command syntax:
command = 'aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "*.txt"'
print(command)

aws --no-sign-request s3 cp s3://landsat-pds/L8/031/005/ /media/jukes/jukes1/LS8aws/Path031_Row005/ --recursive --exclude "*" --include "*.txt"


In [19]:
#call the command line that downloads the metadata files using aws
subprocess.call(command, shell=True)

0
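The cells above build the command as one string and run it with `shell=True`. A sketch of an alternative: pass the same flags as an argument list, so the `*` wildcards reach the AWS CLI without shell quoting, and run it with `subprocess.run(args, check=True)`, which raises an error on a non-zero exit code instead of returning it. The helper name `aws_cp_args` and the `/tmp/...` destination are hypothetical; `shlex.join` needs Python 3.8 or later.

```python
import shlex


def aws_cp_args(src, dst, include_pattern):
    """Argument list for a recursive, unsigned `aws s3 cp` that keeps only one pattern."""
    # The AWS CLI applies filters in order: exclude everything, then re-include the pattern
    return ["aws", "--no-sign-request", "s3", "cp", src, dst,
            "--recursive", "--exclude", "*", "--include", include_pattern]


args = aws_cp_args("s3://landsat-pds/L8/031/005/", "/tmp/LS8aws/Path031_Row005/", "*.txt")
# Print a shell-safe version of the command for checking; to run it: subprocess.run(args, check=True)
print(shlex.join(args))
```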

#### Filter for cloud cover

If the land cloud cover in the metadata file does not exceed the threshold, download the B8 file; otherwise, delete the image folder. Not all metadata files contain the land cloud cover (CLOUD_COVER_LAND); some contain only the overall cloud cover (CLOUD_COVER). If the land cloud cover is not found, compare the overall cloud cover against its own threshold to decide whether to download the image. Use the following metadata attributes:

    GROUP = IMAGE_ATTRIBUTES
        CLOUD_COVER = 23.58
        CLOUD_COVER_LAND = 20.41
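The metadata lines follow a simple `KEY = VALUE` layout, so one way to read them is to parse the whole file into a dictionary and look attributes up by exact name. Exact matching avoids `CLOUD_COVER` accidentally matching `CLOUD_COVER_LAND`. This is a sketch; `parse_mtl` is a hypothetical helper, and repeated keys (such as `GROUP`) keep only their last value.

```python
def parse_mtl(text):
    """Map each `KEY = VALUE` line of MTL text to {KEY: VALUE} (values kept as strings)."""
    attrs = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            attrs[key.strip()] = value.strip().strip('"')
    return attrs


# The attribute block shown above
sample_mtl = """GROUP = IMAGE_ATTRIBUTES
    CLOUD_COVER = 23.58
    CLOUD_COVER_LAND = 20.41"""

attrs = parse_mtl(sample_mtl)
print(float(attrs["CLOUD_COVER"]), float(attrs["CLOUD_COVER_LAND"]))
```

For a real file, `parse_mtl(open(mtl_path).read())` would work the same way, with `mtl_path` being the path to an `_MTL.txt` file.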

In [22]:
#set cloud cover % thresholds
ccland_thresh = 30.0
cc_thresh = 50.0
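The keep-or-delete rule can be isolated as a small function and tested without downloading anything. This sketch assumes the metadata has already been read into a dict of attribute name to value string; `keep_scene` is a hypothetical name, its defaults copy the thresholds above, and the example inputs are illustrative values, not real scenes.

```python
def keep_scene(attrs, land_thresh=30.0, overall_thresh=50.0):
    """Return (keep, cloud_cover_used, attribute_name) for one scene.

    CLOUD_COVER_LAND is used when present; otherwise CLOUD_COVER is compared
    with its own, looser threshold.
    """
    if "CLOUD_COVER_LAND" in attrs:
        value, thresh, name = float(attrs["CLOUD_COVER_LAND"]), land_thresh, "CLOUD_COVER_LAND"
    else:
        value, thresh, name = float(attrs["CLOUD_COVER"]), overall_thresh, "CLOUD_COVER"
    # Same rule as the loop: remove when cloud cover > threshold, keep otherwise
    return value <= thresh, value, name


print(keep_scene({"CLOUD_COVER": "23.58", "CLOUD_COVER_LAND": "20.41"}))
print(keep_scene({"CLOUD_COVER": "45.0"}))
print(keep_scene({"CLOUD_COVER": "60.0", "CLOUD_COVER_LAND": "58.87"}))
```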

In [23]:
import shutil

#loop through all the image folders in the path_row folder:
for image in os.listdir(bp_out):
    if image.startswith("LC"):
        #list the name of the image folder
        print(image)

        #S3 folder for this image and its local folder
        image_in = totalp_in+image+'/'
        image_out = bp_out+image+'/'
        #download only this image's B8 file (not every B8 file in the path/row)
        b8_command = 'aws --no-sign-request s3 cp '+image_in+' '+image_out+' --recursive --exclude "*" --include "*B8.TIF"'

        #read the metadata file within that folder (the with-block closes it)
        with open(image_out+image+"_MTL.txt", "r") as mdata:
            lines = mdata.readlines()

        #find the land and overall cloud cover; exact key matching keeps
        #CLOUD_COVER from also matching CLOUD_COVER_LAND
        ccl, cc = None, None
        for line in lines:
            variable = line.split("=")[0].strip()
            if variable == "CLOUD_COVER_LAND":
                #built-in float: np.float was removed in NumPy 1.24
                ccl = float(line.split("=")[1])
            elif variable == "CLOUD_COVER":
                cc = float(line.split("=")[1])
        ccl_detected = ccl is not None

        #if there is land cloud cover, filter on it:
        if ccl_detected:
            #if the ccl is greater than the threshold, delete the image folder
            if ccl > ccland_thresh:
                shutil.rmtree(image_out)
                print(ccl, ' > ', ccland_thresh, ", ", image, "removed")
            #otherwise, download the B8 file
            else:
                subprocess.call(b8_command, shell=True)
                print(image, "B8 downloaded -ccl ")

        #Was the ccl detected?
        print("CCL detected = ", ccl_detected)

        #if not, use the overall cloud cover with its own threshold:
        if not ccl_detected:
            print("CCL not detected, use CC.")
            if cc is None:
                print(image, "has no CLOUD_COVER either; left in place")
            #if the cc is greater than the threshold, delete the image folder
            elif cc > cc_thresh:
                shutil.rmtree(image_out)
                print(cc, ' > ', cc_thresh, ", ", image, "removed")
            #otherwise, download the B8 file
            else:
                subprocess.call(b8_command, shell=True)
                print(image, "B8 downloaded -cc")

LC80310052014146LGN00
LC80310052014146LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80310052017074LGN00
LC80310052017074LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80310052016168LGN00
LC80310052016168LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80310052016232LGN00
83.21  >  30.0 ,  LC80310052016232LGN00 removed
CCL detected =  True
LC80310052016120LGN00
58.87  >  30.0 ,  LC80310052016120LGN00 removed
CCL detected =  True
LC80310052016184LGN00
51.23  >  30.0 ,  LC80310052016184LGN00 removed
CCL detected =  True
LC80310052016280LGN00
LC80310052016280LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80310052015277LGN00
LC80310052015277LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80310052015149LGN00
LC80310052015149LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80310052014098LGN00
LC80310052014098LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80310052015165LGN00
LC80310052015165LGN00 B8 downloaded -ccl 
CCL detected =  True
LC80310052013271LGN00
LC80310052013271LGN00 B8 

### 2B) Option 2: For all glaciers, loop through the DataFrame and perform all the Option 1 steps

In [5]:
import shutil

#SET cloud cover thresholds for filtering
ccland_thresh = 30.0
cc_thresh = 50.0

#SET dry_run to False to delete cloudy image folders and download B8 files.
#With dry_run = True the loop only prints its decisions (as when those calls were commented out).
dry_run = True

#LOOP through each of the glaciers in the DataFrame and download for each path and row
for i in range(len(pathrows_df)):
    #SET path and row variables to the LS path and row of the box (positional lookup)
    path = pathrows_df['Path'].iloc[i]
    row = pathrows_df['Row'].iloc[i]

    #1) SET the path row folder name and the input and output paths
    folder_name = 'Path'+path+'_Row'+row
    print(folder_name)
    bp_in = 's3://landsat-pds/L8/'
    totalp_in = bp_in+path+'/'+row+'/'
    #write into the LS8aws folder created above under outputpath
    bp_out = outputpath+'LS8aws/'+folder_name+'/'

    #IF the folder exists, it's already been downloaded, do not attempt download.
    if os.path.exists(bp_out):
        print(folder_name, " EXISTS ALREADY. SKIP.")
        continue

    #2) OTHERWISE, create the folder and download into it
    os.mkdir(bp_out)
    print(folder_name+" directory made")

    #3) DOWNLOAD metadata files into the new path-row folder
    command = 'aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "*.txt"'
    subprocess.call(command, shell=True)

    #4) LOOP through the image folders and filter each on cloud cover
    for image in os.listdir(bp_out):
        if not image.startswith("LC"):
            continue
        print(image)
        image_in = totalp_in+image+'/'
        image_out = bp_out+image+'/'

        #read the land (CLOUD_COVER_LAND) and overall (CLOUD_COVER) cloud cover from the metadata
        with open(image_out+image+"_MTL.txt", "r") as mdata:
            lines = mdata.readlines()
        ccl, cc = None, None
        for line in lines:
            variable = line.split("=")[0].strip()
            if variable == "CLOUD_COVER_LAND":
                ccl = float(line.split("=")[1])
            elif variable == "CLOUD_COVER":
                cc = float(line.split("=")[1])
        print("CCL detected = ", ccl is not None)

        #use the land cloud cover if present, otherwise the overall cloud cover
        if ccl is not None:
            value, thresh, tag = ccl, ccland_thresh, "-ccl"
        elif cc is not None:
            print("CCL not detected, use CC.")
            value, thresh, tag = cc, cc_thresh, "-cc"
        else:
            print(image, "has no cloud cover in its metadata; left in place")
            continue

        #if the cloud cover is greater than the threshold, delete the image folder
        if value > thresh:
            if not dry_run:
                shutil.rmtree(image_out)
            print(value, ' > ', thresh, ", ", image, "removed")
        #otherwise, download the B8 file for this image only
        else:
            if not dry_run:
                subprocess.call('aws --no-sign-request s3 cp '+image_in+' '+image_out+' --recursive --exclude "*" --include "*B8.TIF"', shell=True)
            print(image, "B8 downloaded", tag)

KeyError: 'BoxID'

## For download using Google instead, follow these instructions:

Use gsutil: https://krstn.eu/landsat-batch-download-from-google/

To access a scene for Path 124, Row 053, use this syntax:

    gsutil cp -n gs://earthengine-public/landsat/L8/124/053/LC81240532013107LGN01.tar.bz /landsat/
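For many scenes, the `gsutil` URL can be built from the scene ID in the same way as the S3 paths. This sketch copies the bucket layout from the example command above and assumes it still holds; Google's public Landsat storage may have been reorganized since. `gs_scene_url` is a hypothetical helper name.

```python
def gs_scene_url(scene_id):
    """Build a gs:// URL in the layout of the example above (assumed, not verified)."""
    path, row = scene_id[3:6], scene_id[6:9]
    return "gs://earthengine-public/landsat/L8/" + path + "/" + row + "/" + scene_id + ".tar.bz"


# -n: do not overwrite a file that already exists at the destination
cmd = ["gsutil", "cp", "-n", gs_scene_url("LC81240532013107LGN01"), "/landsat/"]
print(" ".join(cmd))
```

The list `cmd` can be passed to `subprocess.run(cmd, check=True)` once `gsutil` is installed.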
