# Week 5: Accessing satellite data from Google Earth Engine with Python

Individual learning outcomes: At the end of this week, all students should be able to access Sentinel-2 image composites from Google Earth Engine via the Python API, set up and submit a data query, download the data to Google Drive and Colab, and create a movie from a time series.

# Get a user account for Google Earth Engine

Before we begin, make sure to register for an account.

To register, follow this link to the Google Earth Engine sign-up page: https://signup.earthengine.google.com/#!/

In previous weeks, we manually uploaded a Sentinel-2 image to our Google Drive directory.

Today, we want to access Sentinel-2 imagery from Google Earth Engine (GEE) and search for available images over an area of interest of our choice. GEE allows users to submit a processing request. This is different from just accessing data: the user can request image composites that are aggregated from several individual acquisitions on different dates, and can define the area for the download.
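
To make the idea of a temporal composite concrete, here is a minimal sketch using NumPy rather than GEE. The pixel values are made up for illustration, and `np.nan` stands in for pixels that were masked out on a given date. It only illustrates the principle of a per-pixel median over time, not how GEE computes it internally.

```python
import numpy as np

# Hypothetical stack of three acquisitions of the same 2x2 pixel area.
# np.nan marks pixels that were masked out (for example by cloud) on that date.
stack = np.array([
    [[0.10, 0.12], [0.30, np.nan]],   # date 1
    [[0.11, np.nan], [0.90, 0.21]],   # date 2 (0.90 could be an undetected cloud)
    [[0.12, 0.14], [0.32, 0.23]],     # date 3
])

# The median along the time axis ignores the masked values
# and is not pulled upwards by the single bright outlier
composite = np.nanmedian(stack, axis=0)
print(composite)
```

The median is a common choice for composites because a single unusually bright or dark observation does not dominate the result, as it would with the mean.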

# Accessing Sentinel-2 images

Workflow for this practical:
* Define an area of interest based on an ESRI shapefile
* Define a time window for our data search
* Set a maximum acceptable cloud cover for our search
* Use Google Earth Engine to make temporal composites of available images for selected spectral bands
* Download them to your Google Drive
* Reproject (warp) the images to the projection of the shapefile
* Plot maps of the images
* Make a movie for our area of interest


Connect to our Google Drive from Colab.

In [None]:
# Load the Drive helper and mount your Google Drive as a drive in the virtual machine
from google.colab import drive
drive.mount('/content/drive')

Import required libraries

In [None]:
# install some libraries that are not on Colab by default
!pip install rasterio
!pip install geopandas
!pip install rasterstats
!pip install earthengine-api
!pip install requests
!pip install sentinelsat

# import libraries
import geopandas as gpd
import rasterio
from rasterio import plot
from rasterio.plot import show_hist
import matplotlib.pyplot as plt
import numpy as np
from osgeo import gdal, ogr
import json
import os
from os import listdir
from os.path import isfile, isdir, join
import math
from pprint import pprint
import shutil
import sys
import zipfile
import requests
import io
import webbrowser
import ee

# make sure that this path points to the location of the pygge module on your Google Drive
libdir = '/content/drive/MyDrive/practicals21-22' # this is where pygge.py needs to be saved
if libdir not in sys.path:
    sys.path.append(libdir)

# import the pygge module
import pygge

%matplotlib inline

# Set up some directory paths on Google Drive
Modify these string variables to match your data directory structure if need be.

BEFORE YOU RUN THIS CELL, EDIT THE VARIABLE wd BELOW TO POINT TO YOUR DIRECTORY ON GOOGLE DRIVE

IMPORTANT: You must upload a shapefile of your area of interest to your Google Drive before running the next cell. Set the variable 'shapefile' below to point to this file. You can draw a polygon and save it as a shapefile on http://www.geojson.io.

In [None]:
# Connect to Google Earth Engine API
# This provides a link to a web page where you sign in with your Google account and receive a code.
# Paste the code into the prompt that appears in the output of this cell.
!earthengine authenticate

ee.Initialize()

In [None]:
# set up your directories for the satellite data
# Note that we do all the downloading and data analysis on the temporary drive
#    on Colab. We will copy the output directory to our Google Drive at the end.
#    Colab has more disk space (about 40 GB free space) than Google Drive (15 GB).
#    However, the data on the Colab disk space are NOT kept when you log out.

# path to your Google Drive
# EDIT THIS LINE (/content/drive/MyDrive is the top directory on Google Drive):
wd = "/content/drive/MyDrive/practicals21-22"
print("Connected to data directory: " + wd)

# path to your temporary drive on the Colab Virtual Machine
cd = "/content/work"

# directory for downloading the Sentinel-2 composites
# Note that we are using the 'join' function imported from the os library here
# It is an easy way of merging strings into a directory structure.
# It is clever and chooses the / or \ depending on whether you are on Windows or Linux.
downloaddir = join(cd, 'download') # where we save the downloaded images

# CAREFUL: This code removes the named directories and everything inside them to free up space
# Note: shutil provides a lot of useful functions for file and directory management
try:
  shutil.rmtree(downloaddir)
except FileNotFoundError:
  print(downloaddir + " not found.")

# create the new directories, unless they already exist
os.makedirs(cd, exist_ok=True)
os.makedirs(downloaddir, exist_ok=True)

print("Connected to Colab temporary data directory: " + cd)

print("\nList of contents of " + wd)
for f in sorted(os.listdir(wd)):
  print(f)

# Define our search parameters

You can modify some of the parameters and upload your own shapefile.

In [None]:
# EDIT THE SEARCH OPTIONS BELOW

# YOU CAN PLACE A DIFFERENT SHAPEFILE ONTO YOUR GOOGLE DRIVE BUT MAKE SURE THAT
#    THE VARIABLE shapefile POINTS TO THE CORRECT FILE:
shapefile = join(wd, 'oakham', 'Polygons_small.shp') # ESRI Shapefile of the study area

# Define a date range for our search
datefrom = '2019-03-01' # start date for imagery search
dateto   = '2019-04-30' # end date for imagery search
time_range = [datefrom, dateto] # format as a list

# Define which cloud cover we accept in the images
clouds = 10 # maximum acceptable cloud cover in %

# Authenticate to the Google Earth Engine API.

We already authenticated in the cell above that runs `earthengine authenticate`. API stands for 'application programming interface'. An API defines interactions between multiple software intermediaries, in this case between our Jupyter Notebook and Google Earth Engine. It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow etc. (text modified after Wikipedia)

Get some information about our shapefile.

In [None]:
# Get the shapefile layer's extent, CRS and EPSG code
extent, outSpatialRef, epsg = pygge.get_shp_extent(shapefile)
print("Extent of the area of interest (shapefile):\n", extent)
print(type(extent))
print("\nCoordinate referencing system (CRS) of the shapefile:\n", outSpatialRef)
print('EPSG code: ', epsg)

Get the extent of the shapefile into a format that Google Earth Engine understands.

Look at the printed outputs of the type conversions. The code will make more sense then.

In [None]:
# GEE needs a special format for defining an area of interest.
# It has to be a GeoJSON-style polygon: the coordinates are first defined in a
# list and then converted using ee.Geometry.Polygon.
extent_list = list(extent)
print(extent_list)
print(type(extent_list))
# make list elements in the form of coordinate pairs (x, y)
# and close the polygon by repeating the starting node at the end
area_list = [(extent[0], extent[2]),
             (extent[1], extent[2]),
             (extent[1], extent[3]),
             (extent[0], extent[3]),
             (extent[0], extent[2])]
print(area_list)
print(type(area_list))

search_area = ee.Geometry.Polygon(area_list)
print(search_area)
print(type(search_area))
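
The same conversion can be wrapped in a small reusable function. This is a sketch only: the function name `extent_to_ring` is made up for illustration, and it assumes the extent is ordered as (xmin, xmax, ymin, ymax), which is what the indices used in the cell above imply.

```python
def extent_to_ring(extent):
    """Convert (xmin, xmax, ymin, ymax) into a closed ring of (x, y) pairs."""
    xmin, xmax, ymin, ymax = extent
    # start at the lower-left corner, go round the rectangle and return to the start
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]

# Hypothetical extent in longitude/latitude (degrees)
ring = extent_to_ring((-0.75, -0.60, 52.63, 52.68))
print(ring)
```

The first and last coordinate pairs are identical, which is what closes the polygon.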

Now we can access the Sentinel-2 collection on Google Earth Engine and run our search. This will return a URL (web link) from which we can download the data.

In [None]:
# Obtain download links for image composites from an image collection on Google Earth Engine
# All products available are detailed on this page https://developers.google.com/earth-engine/datasets/.

# Name of the Sentinel 2 image collection
s2collection = 'COPERNICUS/S2'

# get the median composite of Sentinel-2 images in the time range
s2median = pygge.obtain_image_sentinel(s2collection, time_range, search_area, clouds)

# to save disk space, we may want to download only certain bands
# band names for download, a list of strings
# only download the blue (B2), green (B3), red (B4) and near-infrared (B8) bands
bands = ['B2', 'B3', 'B4', 'B8']
print(bands)

# spatial resolution of the downloaded data
resolution = 20 # in units of metres

# Get a download URL for the composite in GeoTIFF format, using the get_url(name, image, scale, region) method
# 'region' is obtained from the search area, but its format has to be adjusted using the get_region(geom) method
search_region = pygge.get_region(search_area)
s2url = pygge.get_url('s2', s2median.select(bands), resolution, search_region, filePerBand=False)
print(s2url)
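
The pygge helper functions hide the actual Earth Engine calls, and their source code is not shown in this notebook. As a rough sketch of what such a query might look like when written directly with the `ee` library, the function below filters a collection by date, area and cloud cover, takes the median and asks for a download URL. The function name `median_composite_url` and the file name `'s2'` are assumptions for illustration, and this is not necessarily how pygge implements it. It needs an authenticated and initialised Earth Engine session to run.

```python
def median_composite_url(collection, time_range, region, max_cloud, bands, scale):
    """Return a download URL for a median composite (a sketch, not the pygge code)."""
    # imported inside the function so that the sketch can be defined on its own;
    # calling it requires that ee.Initialize() has already been run
    import ee

    images = (ee.ImageCollection(collection)
              .filterDate(time_range[0], time_range[1])   # the end date is exclusive
              .filterBounds(region)                        # images that intersect the area
              .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', max_cloud)))

    # per-pixel median over all remaining images, restricted to the chosen bands
    composite = images.median().select(bands)

    # ask Earth Engine for a link to a zip archive containing one multi-band GeoTIFF
    return composite.getDownloadURL({
        'name': 's2',
        'scale': scale,
        'region': region,
        'filePerBand': False,
    })
```

With the variables defined in this notebook, a call could look like `median_composite_url(s2collection, time_range, search_area, clouds, bands, resolution)`.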

# Download the data

The next cell downloads the image composite as a zip file and unzips it.

In [None]:
# change directory to download directory
os.chdir(downloaddir)

# request the file from the download URL
f = requests.get(s2url, stream=True)

# the response is not always a zip archive, so retry a few times
# (the limit of 5 attempts is an arbitrary choice)
tries = 1
while not zipfile.is_zipfile(io.BytesIO(f.content)) and tries < 5:
    f = requests.get(s2url, stream=True)
    tries += 1

# unzip the archive into the current (download) directory
if zipfile.is_zipfile(io.BytesIO(f.content)):
    z = zipfile.ZipFile(io.BytesIO(f.content))
    z.extractall()
else:
    print("Download failed: the response was not a zip file.")
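
The zip check in the cell above can be tried out without any download. The sketch below builds a tiny zip archive in memory to stand in for the body of a response, and compares it with bytes that are not a zip archive. The file name `example.txt` and the error text are made up for illustration.

```python
import io
import zipfile

# Build a small zip archive in memory to stand in for a downloaded response body
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode='w') as z:
    z.writestr('example.txt', 'hello')
zip_bytes = buffer.getvalue()

# Bytes that are not a zip archive, for example an error message
other_bytes = b'{"error": "request failed"}'

# is_zipfile accepts a file-like object, so we wrap the bytes in io.BytesIO
is_zip_good = zipfile.is_zipfile(io.BytesIO(zip_bytes))
is_zip_bad = zipfile.is_zipfile(io.BytesIO(other_bytes))
print(is_zip_good, is_zip_bad)

# List the archive contents without writing anything to disk
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
    names = z.namelist()
print(names)
```

Wrapping the bytes in `io.BytesIO` is what lets `zipfile` treat data held in memory like a file on disk.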

# Explore the data directory structure of our downloaded files


In [None]:
# where we stored the downloaded Sentinel-2 images
os.chdir(downloaddir)
print("contents of ", downloaddir, ":")
!ls -l

You should see the downloaded file.

Remember that we have saved the downloaded images to a temporary directory that will be deleted when we close the virtual machine. If you want to save your images to your local computer, proceed as follows.

Go to the Files panel (folder icon) on the left hand side of Google Colab.

Find the download directory and locate a Sentinel-2 image file.

Right-click on it and select 'download' to save it.

# Show the image as a true colour composite

A true colour composite is a visualisation where the red, green and blue channels of the sensors are shown in the same colour on screen. Let's visualise our data composite in this way.

First, let's see what tiff files are in our directory.


In [None]:
# get list of all tiff files in the directory
allfiles = [f for f in sorted(listdir(downloaddir)) if isfile(join(downloaddir, f)) and f.endswith('.tif')]
print(allfiles)

# select the file for visualisation
thisfile = allfiles[0]
print(thisfile)

Make some maps.

In [None]:
# create a figure with 3 subplots stacked vertically (3 rows, 1 column)
fig, (ax1, ax2, ax3) = plt.subplots(3,1, figsize=(10,16))
fig.patch.set_facecolor('white')

# the downloaded file is float32 data format
# for plotting, we need uint8 data format

# plot the image with full extent
pygge.easy_plot(thisfile, ax=ax1, bands=[3,2,1], percentiles=[0,99])

# zoom in to an area of interest
pygge.easy_plot(thisfile, ax=ax2, bands=[3,2,1], percentiles=[0,99], xlim=[-0.75, -0.70], ylim=[52.66, 52.68])

# zoom in elsewhere
pygge.easy_plot(thisfile, ax=ax3, bands=[3,2,1], percentiles=[0,99], xlim=[-0.70, -0.60], ylim=[52.63, 52.68])
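
The `percentiles` argument above suggests a percentile contrast stretch, although the pygge source code is not shown here. The following is a minimal sketch of such a stretch with NumPy, using random numbers in place of real reflectance values. The function name `stretch_to_uint8` is made up for illustration and it may differ from what pygge does internally.

```python
import numpy as np

def stretch_to_uint8(band, percentiles=(0, 99)):
    """Linearly rescale a band between two percentiles to the range 0-255."""
    low, high = np.percentile(band, percentiles)
    if high <= low:
        # a constant band cannot be stretched, so return zeros
        return np.zeros(band.shape, dtype=np.uint8)
    scaled = (band - low) / (high - low)
    # values outside the percentile range are clipped to black or white
    scaled = np.clip(scaled, 0.0, 1.0)
    return (scaled * 255).astype(np.uint8)

# Hypothetical float32 band filled with random values
rng = np.random.default_rng(0)
band = rng.random((50, 50)).astype(np.float32) * 3000
band8 = stretch_to_uint8(band, percentiles=(0, 99))
print(band8.dtype, band8.min(), band8.max())
```

Lowering the upper percentile brightens the image, because more of the brightest pixels are clipped to white.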

# Warp the downloaded image composite into another map projection

The coordinate reference system (CRS) of the downloaded image composite is not in the UK national map projection. We will hence reproject it.

In [None]:
# print the EPSG code of our shapefile into which we want to reproject the image composite
print("Reprojecting image to EPSG projection ", epsg)

# make a file name for our new file
warpfile = thisfile.split(sep='.')[0] + '_warped.tif'
print("We are in this directory: ")
!pwd
print("Input file: ", thisfile)
print("Output file: ", warpfile)

# call the easy_warp function
tmp = pygge.easy_warp(thisfile, warpfile, epsg)
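
The source code of `pygge.easy_warp` is not shown in this notebook. As a hedged sketch of how a reprojection like this could be written with the rasterio library, the function below follows the general pattern from the rasterio documentation. The function name `reproject_raster` and the choice of bilinear resampling are assumptions for illustration.

```python
def reproject_raster(src_path, dst_path, dst_crs):
    """Reproject a raster file to dst_crs, for example 'EPSG:27700' (a sketch)."""
    # imported inside the function so that the sketch can be defined
    # before rasterio has been installed
    import rasterio
    from rasterio.warp import calculate_default_transform, reproject, Resampling

    with rasterio.open(src_path) as src:
        # work out the pixel grid of the output raster in the new projection
        transform, width, height = calculate_default_transform(
            src.crs, dst_crs, src.width, src.height, *src.bounds)

        # copy the metadata of the input file and update what changes
        profile = src.profile.copy()
        profile.update(crs=dst_crs, transform=transform, width=width, height=height)

        with rasterio.open(dst_path, 'w', **profile) as dst:
            # reproject each band in turn (rasterio band numbers start at 1)
            for i in range(1, src.count + 1):
                reproject(
                    source=rasterio.band(src, i),
                    destination=rasterio.band(dst, i),
                    src_transform=src.transform,
                    src_crs=src.crs,
                    dst_transform=transform,
                    dst_crs=dst_crs,
                    resampling=Resampling.bilinear)
```

Assuming that `epsg` holds a numeric EPSG code, a call could look like `reproject_raster(thisfile, 'example_warped.tif', 'EPSG:' + str(epsg))`.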

# Plot the shapefile on top of the raster

Suppose we want to see the locations of our polygons on top of our image composite. We can do that with the Geopandas library.

In [None]:
# create a figure with subplots
fig, ax = plt.subplots(3,1, figsize=(10,16))
fig.patch.set_facecolor('white')

# plot the image with full extent
pygge.easy_plot(warpfile, ax=ax[0], percentiles=[0,98], bands=[3,2,1],
                shapefile=shapefile, fillcolor="none", linecolor="yellow", 
                title="Rutland Water")

# zoom in to an area of interest
pygge.easy_plot(warpfile, ax=ax[1], percentiles=[0,98], bands=[3,2,1],
                xlim=[-0.75, -0.70], ylim=[52.66, 52.68],
                shapefile=shapefile, fillcolor="none", linecolor="black", 
                title="Zoom window 1")

# zoom in elsewhere
pygge.easy_plot(warpfile, ax=ax[2], percentiles=[0,98], bands=[3,2,1],
                xlim=[-0.70, -0.60], ylim=[52.63, 52.68],
                shapefile=shapefile, fillcolor="none", linecolor="red", 
                title="Zoom window 2")

# Make a movie from several Sentinel-2 image composites

To analyse several images, we can simply repeat the API query and download temporal composites. These are made automatically by Google Earth Engine. In our case, we want to calculate the median reflectance of all pixel values that are cloud-free, aggregated by month.

For this task, we copy and paste the code from above into a single cell (below), and iterate over the different months for our searches. The for loop does the job for us.

We will use the imageio library to make a movie from the results.

In [None]:
# Obtain monthly image composites

# change directory to download directory
os.chdir(downloaddir)

# make a list of lists with all date ranges for our new searches
months = [['2020-01-01', '2020-01-31'],
          ['2020-02-01', '2020-02-29'],
          ['2020-03-01', '2020-03-31'],
          ['2020-04-01', '2020-04-30'],
          ['2020-05-01', '2020-05-31'],
          ['2020-06-01', '2020-06-30'],
          ['2020-07-01', '2020-07-31'],
          ['2020-08-01', '2020-08-31'],
          ['2020-09-01', '2020-09-30'],
          ['2020-10-01', '2020-10-31'],
          ['2020-11-01', '2020-11-30'],
          ['2020-12-01', '2020-12-31']]

# set cloud cover threshold
clouds = 30

# band names for download, a list of strings
# only download R,G,B bands
bands = ['B4', 'B3', 'B2']

# spatial resolution of the downloaded data
resolution = 20 # in units of metres

# begin the file names with this ID
file_id = 's2_month'

# iterate over the months, counting from 1
for month, time_range in enumerate(months, start=1):
  print(time_range)

  # do the search on Google Earth Engine
  s2median = pygge.obtain_image_sentinel(s2collection, time_range, search_area, clouds)

  # get the band names of the image composite that was returned by our search
  band_names = s2median.bandNames().getInfo()

  # check whether the search returned any imagery
  if len(band_names) == 0:
    print("Search returned no results.")
    continue

  # print all band names
  print(band_names)

  # get the download URL, the file name is the ID followed by the month number
  s2url = pygge.get_url(file_id + str(month).zfill(3), s2median.select(bands), resolution, search_region, filePerBand=False)
  print(s2url)

  # request the file from the download URL
  f = requests.get(s2url, stream=True)

  # the response is not always a zip archive, so retry a few times
  # (the limit of 5 attempts is an arbitrary choice)
  tries = 1
  while not zipfile.is_zipfile(io.BytesIO(f.content)) and tries < 5:
    f = requests.get(s2url, stream=True)
    tries += 1

  # unzip the archive into the download directory
  if zipfile.is_zipfile(io.BytesIO(f.content)):
    z = zipfile.ZipFile(io.BytesIO(f.content))
    z.extractall()
  else:
    print("Download failed for " + time_range[0] + ": the response was not a zip file.")

# after downloading all image composites, get a list of all files we want to warp

allfiles = [f for f in listdir(downloaddir) if isfile(join(downloaddir, f))]
files_for_warp = [s for s in allfiles if file_id in s]

print("Files with file ID ", file_id, " for warping:")
pprint(sorted(files_for_warp))
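
The list of monthly date ranges in the cell above was typed by hand. As a small sketch, the same list can be generated with the `calendar` module from the Python standard library, which also takes care of leap years. The function name `monthly_ranges` is made up for illustration.

```python
import calendar

def monthly_ranges(year):
    """Return [first_day, last_day] date strings for each month of a year."""
    ranges = []
    for month in range(1, 13):
        # monthrange returns (weekday of the first day, number of days in the month)
        last_day = calendar.monthrange(year, month)[1]
        ranges.append([f"{year}-{month:02d}-01", f"{year}-{month:02d}-{last_day:02d}"])
    return ranges

months_2020 = monthly_ranges(2020)
print(months_2020[1])
```

Changing the year is then a one-line edit, for example `months = monthly_ranges(2021)`.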


Get all images into the same projection as the shapefile using easy_warp.

In [None]:
# create an empty list for the newly created uint8 data file names
uint8files = []

# now warp them all
for f in sorted(files_for_warp):
  # make a file name for our new files
  warpfile = f.split('.')[0]+'_warped.tif'
  uint8file = f.split('.')[0]+'_warped_uint8.tif'
  # call the easy_warp function
  print("Warping raster file " + f)
  pygge.easy_warp(f, warpfile, epsg)
  # convert to uint8 data type
  print("Converting raster file " + warpfile + " to 8-bit unsigned integer data type.")
  pygge.convert_to_dtype(warpfile, uint8file, np.uint8, percentiles=[0,98])
  uint8files.append(uint8file)
  # create thumbnails for quality checking
  fig, ax = plt.subplots(1,2, figsize=(5,2.5))
  fig.patch.set_facecolor('white')
  pygge.easy_plot(warpfile, ax=ax[0], percentiles=[0,99], title=warpfile)
  pygge.easy_plot(uint8file, ax=ax[1], percentiles=[0,99], title=uint8file)
  
# after downloading and warping all image composites, get a list of all warped tiff files in the directory
allfiles = [f for f in listdir(downloaddir) if isfile(join(downloaddir, f))]
warpfiles = [s for s in allfiles if "_warped.tif" in s]

print("Files after warping:")
pprint(sorted(warpfiles))

print("Files after conversion to uint8 data type:")
pprint(sorted(uint8files))

Now it is time to make the actual movie.

In [None]:
import imageio

# create an empty list where we will collect all raster images
images = []

# iterate over all uint8 files
for f in sorted(uint8files):
  images.append(imageio.imread(f)) # read the next image and append it

# set the display time of each frame in seconds
# (some newer imageio versions may interpret this value in milliseconds, so check your version)
framerate = { 'duration': 2 }

# save the movie
imageio.mimsave(join(downloaddir, "movie.gif"), images, **framerate)

Now download the file movie.gif from Colab using the folder icon on the left hand side. Locate the file in the download directory (/content/work/download), right-click and select 'download'.

Save it to your local hard drive and open it in a web browser or image viewer to play the animation.

# Formative assignment for this week

Write a code cell that downloads a stack of Sentinel-2 composites over a different study area of your choice and makes a movie. Try to make the image enhancement as good as you can by trying different percentile values.
