# How to bulk download Landsat-8 (LS8) images using Amazon Web Services (AWS)

This tutorial shows how to bulk download Landsat-8 (LS8) images stored on the Amazon Web Services (AWS) cloud (S3 bucket) over a region of interest. Each band and the metadata file for each LS8 scene can be accessed separately. Read more about the Landsat-8 data availability on the AWS cloud here: https://registry.opendata.aws/landsat-8/. You will need to install the AWS command line interface (CLI) and have a shapefile covering your region of interest. Also make sure the Geospatial Data Abstraction Library (GDAL) is installed so you can run commands like __ogr2ogr__ and __gdalwarp__ from your terminal.

_by Jukes Liu. Last modified 10-04-2019._

## 1) Set-up
#### Install AWS command line interface using pip or pip3

This download workflow requires the Amazon Web Services command line interface (AWS CLI) to be installed and callable from your terminal. Follow the instructions at https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html to make the `aws` command available in your shell.
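Before going further, it can help to confirm from Python that the command line tools this tutorial relies on are visible on your PATH. The following is a minimal sketch using only the standard library; the helper name `cli_available` is made up for illustration and is not part of the AWS CLI or GDAL.

```python
import shutil

def cli_available(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None

# Tools used later in this tutorial
for tool in ("aws", "ogr2ogr", "gdalwarp"):
    status = "found" if cli_available(tool) else "NOT found"
    print(tool, status)
```

If a tool is reported as not found even though you installed it, the Jupyter kernel may be running with a different PATH than your login shell.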

#### Import packages:

In [14]:
import pandas as pd
import numpy as np
import os
import subprocess
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy.ma as ma

#geospatial packages
import fiona
import geopandas as gpd
from shapely.geometry import Polygon, Point
import shapely

# Enable fiona KML file reading driver
# (set through fiona directly, since fiona is already imported above)
fiona.drvsupport.supported_drivers['KML'] = 'rw'

If some of these packages cannot be imported, either the modules are not installed on your machine or they are installed in a location different from the one Jupyter Notebook uses. Either way, uncomment and run the following cell to install the package that raises the error, then try the imports again:

In [15]:
# import sys
# !{sys.executable} -m pip install fiona

Replace fiona with whichever package you want to install.
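To see in one pass which packages are missing, rather than hitting import errors one at a time, something like the sketch below may be useful. It relies only on the standard library's `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names in `names` that cannot be located."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Top-level packages imported at the start of this tutorial
required = ["pandas", "numpy", "matplotlib", "fiona", "geopandas", "shapely"]
print("Missing packages:", missing_packages(required))
```

Any name printed here can then be installed with the `pip` cell above.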

#### Set working paths and create a new folder for your LS8 downloads:

In [16]:
#Set a basepath for your input files (shapefiles, WRS world bound kml file, etc.) here:
basepath = '/home/jukes/Documents/Sample_glaciers/'

#Set an output path for your downloaded images here:
outputpath = '/media/jukes/jukes1/'

#create a new folder to hold your new downloads in your output path:
newfoldername = 'LS8aws'

if os.path.exists(outputpath+newfoldername):
    print("Path exists already")
else:
    os.mkdir(outputpath+newfoldername)
    print("LS8aws directory made")

Path exists already
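An alternative to the exists-then-create check is `os.makedirs` with `exist_ok=True`, which also creates any missing parent folders. The sketch below wraps it in a hypothetical helper, `ensure_dir`, and runs it in a temporary directory that stands in for your real output path.

```python
import os
import tempfile

def ensure_dir(path):
    """Create `path` (and any missing parents); return True if it was newly created."""
    existed = os.path.isdir(path)
    os.makedirs(path, exist_ok=True)
    return not existed

# Demonstration in a throwaway directory (a stand-in for outputpath)
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "LS8aws")
    print(ensure_dir(target))  # first call creates the folder
    print(ensure_dir(target))  # second call finds it already in place
```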


## 2) Find all LS Path Row combinations over the shapefile

#### Set the path to your shapefile over the region of interest:

In [30]:
shpname_no_ext = "Box001"
pathtoshp = basepath+"Box001/"+shpname_no_ext

If your shapefile is not in WGS84 geographic coordinates (EPSG:4326), reproject it into WGS84 using the GDAL command __ogr2ogr__:

    ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:NEW_EPSG_NUMBER -s_srs EPSG:OLD_EPSG_NUMBER output.shp input.shp

In [31]:
rp_command = 'ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:4326 -s_srs EPSG:3413 '+pathtoshp+'_WRS.shp '+pathtoshp+'.shp'
#print the command to check syntax:
print(rp_command)
#Then uncomment the following line and run it:
# subprocess.call(rp_command, shell=True)

ogr2ogr -f "ESRI Shapefile" -t_srs EPSG:4326 -s_srs EPSG:3413 /home/jukes/Documents/Sample_glaciers/Box001/Box001_WRS.shp /home/jukes/Documents/Sample_glaciers/Box001/Box001.shp
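The same command can also be assembled as a list of arguments, which avoids shell quoting problems when paths contain spaces. This is only a sketch: the function name `build_reproject_command` is hypothetical, and the file names in the example are placeholders for your own paths.

```python
import subprocess

def build_reproject_command(src_shp, dst_shp, src_epsg, dst_epsg):
    """Build an ogr2ogr argument list that reprojects src_shp into dst_shp."""
    return [
        "ogr2ogr", "-f", "ESRI Shapefile",
        "-t_srs", f"EPSG:{dst_epsg}",   # target projection
        "-s_srs", f"EPSG:{src_epsg}",   # source projection
        dst_shp, src_shp,               # ogr2ogr takes the output file first
    ]

cmd = build_reproject_command("Box001.shp", "Box001_WRS.shp", 3413, 4326)
print(cmd)
# When the command looks right, uncomment the next line to run it:
# subprocess.run(cmd, check=True)
```

Passing a list means no `shell=True` is needed, and `check=True` raises an error if ogr2ogr exits with a failure code.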


#### Now, we read in the shapefile as a fiona vector feature and grab its vertex coordinates:

In [45]:
#new path to the reprojected shapefile and open it using fiona
WRS84_shp = pathtoshp+'_WRS.shp'
shp = fiona.open(WRS84_shp)

#grab the first feature (shp.next() is deprecated in newer fiona versions)
shp_feature = next(iter(shp))
shp_geom= shp_feature.get('geometry')
shp_coords = shp_geom.get('coordinates')[0]
print("Coordinates:", shp_coords)
#for a polygon with n vertices, there should be n+1 coordinate pairs with the 1st and last being the same

#Grab the vertex points and turn them into shapely geometries stored in a list
points = []

#loops through all of the coordinate pairs from above:
for coord_pair in shp_coords:
    #coordinates are stored in (x, y) order, i.e. (longitude, latitude)
    lon = coord_pair[0]
    lat = coord_pair[1]
    
    #create shapely points and append to points list
    point = shapely.geometry.Point(lon, lat)
    points.append(point)

Coordinates: [(-69.95445523105522, 77.23057991382787), (-69.95269043891739, 77.24074644306974), (-69.92754635068265, 77.24053201366061), (-69.92933129107398, 77.23036566058502), (-69.95445523105522, 77.23057991382787)]
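The printed ring lists each vertex as (longitude, latitude) and repeats the first vertex at the end. As a quick sanity check, the sketch below verifies that a ring is closed and returns its unique vertices. The helper name `ring_vertices` is hypothetical, and the coordinates are rounded from the output above purely for illustration.

```python
def ring_vertices(coords):
    """Return the unique vertices of a closed ring as (lon, lat) tuples."""
    # A closed ring needs at least a triangle plus the repeated first vertex
    if len(coords) < 4 or tuple(coords[0]) != tuple(coords[-1]):
        raise ValueError("expected a closed ring (first vertex repeated last)")
    # Drop the repeated closing vertex and any z value
    return [tuple(c[:2]) for c in coords[:-1]]

# Rounded copy of the ring printed above
ring = [(-69.954, 77.231), (-69.953, 77.241), (-69.928, 77.241),
        (-69.929, 77.230), (-69.954, 77.231)]
for lon, lat in ring_vertices(ring):
    print(f"lon={lon}, lat={lat}")
```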


#### Read in the LS8 Path Row footprint .kml file and find which Path Row combinations overlay your shapefile!

In [51]:
#open the kml file with the pathrow bounds as WRS
WRS = fiona.open(basepath+'WRS-2_bound_world.kml', driver='KML')

The following code loops through all the Path Row (features) combinations and stores those that contain ALL of your shapefile vertices:

In [60]:
#create lists to hold the paths and rows
paths = []
rows = []

#loop through all features in the WRS .kml:
for feature in WRS:
    #create shapely polygons with the path row bounds
    coordinates = feature['geometry']['coordinates'][0]
    coords = [xy[0:2] for xy in coordinates]
    pathrow_poly = Polygon(coords)
    
    #grab the path and row from the metadata
    pathrowname = feature['properties']['Name']  
    path = pathrowname.split('_')[0]
    row = pathrowname.split('_')[1]
    
    #create a counter for the number of shapefile vertices found in the path row footprint
    points_in = 0
    
    #for each path row, loop through each of the box_points
    for point in points:
        #if the pathrow shape contains the point
        if point.within(pathrow_poly):
            #increment the counter
            points_in = points_in+1
        
    #If the number of vertices found in the Path Row footprint is equal to the total number (all of them)
    if points_in == len(points):
        #append the path and row to the lists (3 digit formatting)
        paths.append('%03d' % int(path))
        rows.append('%03d' % int(row))

#Store in a dataframe:
pr_df = pd.DataFrame(list(zip(paths, rows)), columns=['Path', 'Row'])
pr_df

Unnamed: 0,Path,Row
0,35,5
1,34,5
2,33,5
3,32,5
4,31,5
5,37,4
6,36,4
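Path and row identifiers need to be three digits long, zero-padded, before they are used in folder names or S3 prefixes later on. A small helper along the lines below can do the splitting and padding in one place. It assumes, as the loop above does, that each name has the form `path_row`; the function name `split_pathrow` is hypothetical.

```python
def split_pathrow(name):
    """Split a 'path_row' name and zero-pad each part to three digits."""
    path, row = name.split("_")[:2]
    return f"{int(path):03d}", f"{int(row):03d}"

print(split_pathrow("35_5"))  # ('035', '005')
```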


#### Optional: write the path row combinations found to a .csv file

In [63]:
# #write the data frame to csv file
# pr_df.to_csv(path_or_buf = basepath+'LS_pathrows_'+shpname_no_ext+'.csv', sep=',')

## 3) Download metadata (MTL.txt) files for all available images over the shapefile region

It's good practice to keep the metadata file alongside any image you use, so let's download the metadata files for all available images over your region. This is the first step that uses the AWS command line interface. To copy a single metadata file, the syntax is:

     aws --no-sign-request s3 cp s3://landsat-pds/L8/path/row/LC8pathrowyear001LGN00/LC8pathrowyear001LGN00_MTL.txt /path_to/output/

Access https://docs.opendata.aws/landsat-pds/readme.html to learn more.
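To make the pattern in that syntax concrete, the sketch below assembles the S3 location of one metadata file from a path, a row, and a scene ID. The function name `mtl_s3_url` is hypothetical, and the scene ID in the example simply reuses the placeholder from the syntax above rather than a real scene.

```python
def mtl_s3_url(path, row, scene_id, bucket_prefix="s3://landsat-pds/L8"):
    """Build the S3 location of a scene's MTL.txt metadata file."""
    # Path and row are zero-padded to three digits in the bucket layout
    path, row = f"{int(path):03d}", f"{int(row):03d}"
    return f"{bucket_prefix}/{path}/{row}/{scene_id}/{scene_id}_MTL.txt"

# Placeholder scene ID, not a real scene
print(mtl_s3_url(31, 5, "LC8pathrowyear001LGN00"))
```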

#### To keep things organized, let's create folders corresponding to the Path_Row IDs and download the MTL.txt files into them.

#### Download the metadata files into these path row folders using the following syntax:

    aws --no-sign-request s3 cp s3://landsat-pds/L8/031/005/ Output/path/LS8aws/Path031_Row005/ --recursive --exclude "*" --include "*.txt" 
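As a variation on building this command by string concatenation, it can be assembled as an argument list and passed to `subprocess.run` without a shell. The sketch below uses only the flags shown in the syntax above; the function name `build_mtl_download_command` and the output root in the example are assumptions for illustration.

```python
import subprocess

def build_mtl_download_command(path, row, out_root):
    """Build the aws CLI argument list that copies all .txt files for one path/row."""
    s3_prefix = f"s3://landsat-pds/L8/{path}/{row}/"
    out_dir = f"{out_root.rstrip('/')}/Path{path}_Row{row}/"
    return ["aws", "--no-sign-request", "s3", "cp", s3_prefix, out_dir,
            "--recursive", "--exclude", "*", "--include", "*.txt"]

cmd = build_mtl_download_command("031", "005", "Output/path/LS8aws")
print(" ".join(cmd))
# When the command looks right, uncomment the next line to run it:
# subprocess.run(cmd, check=True)
```

Because the arguments are passed as a list, the `*` patterns reach the aws CLI unchanged and do not need the quotes that the shell version requires.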

In [70]:
#Loop through the dataframe with your path row combinations:
for index, pr in pr_df.iterrows():
    #grab the path row names and set the folder name
    #(the loop variable is named pr so that it is not overwritten by row below)
    path = pr['Path']
    row = pr['Row']
    folder_name = 'Path'+path+'_Row'+row

    #set basepath to access the image on the amazon cloud
    bp_in = 's3://landsat-pds/L8/'
    totalp_in = bp_in+path+'/'+row+'/'
    print(totalp_in)

    #set output path for the downloaded files:
    bp_out = outputpath+newfoldername+'/'+folder_name+'/'
    print(bp_out)
    
    #create Path_Row folders if they don't exist already
    if os.path.exists(bp_out):
        print(folder_name, " EXISTS ALREADY. SKIP.")
    else:
        os.mkdir(bp_out)
        print(folder_name+" directory made")
        
    #Check download command syntax:
    command = 'aws --no-sign-request s3 cp '+totalp_in+' '+bp_out+' --recursive --exclude "*" --include "*.txt"'
    print(command)
    
    #When you've checked everything, uncomment the following to run:
#     #call the command line that downloads the metadata files using aws
#     subprocess.call(command, shell=True)

Path035_Row005  EXISTS ALREADY. SKIP.
aws --no-sign-request s3 cp s3://landsat-pds/L8/035/005/ /media/jukes/jukes1/LS8aws/Path035_Row005/ --recursive --exclude "*" --include "*.txt"
Path034_Row005  EXISTS ALREADY. SKIP.
aws --no-sign-request s3 cp s3://landsat-pds/L8/034/005/ /media/jukes/jukes1/LS8aws/Path034_Row005/ --recursive --exclude "*" --include "*.txt"
Path033_Row005  EXISTS ALREADY. SKIP.
aws --no-sign-request s3 cp s3://landsat-pds/L8/033/005/ /media/jukes/jukes1/LS8aws/Path033_Row005/ --recursive --exclude "*" --include "*.txt"
Path032_Row005  EXISTS ALREADY. SKIP.
aws --no-sign-request s3 cp s3://landsat-pds/L8/032/005/ /media/jukes/jukes1/LS8aws/Path032_Row005/ --recursive --exclude "*" --include "*.txt"
Path031_Row005  EXISTS ALREADY. SKIP.
aws --no-sign-request s3 cp s3://landsat-pds/L8/031/005/ /media/jukes/jukes1/LS8aws/Path031_Row005/ --recursive --exclude "*" --include "*.txt"
Path037_Row004  EXISTS ALREADY. SKIP.
aws --no-sign-request s3 cp s3://landsat-pds/L8/037/