# Week 8: Creating an automated Sentinel-2 processing chain

Individual learning outcomes: At the end of this week, all students should be able to calculate the Normalized Difference Vegetation Index (NDVI) from the Sentinel-2 spectral bands and extract statistics of pixel values within polygons of a shapefile from within Python.

We will use the Sentinel-2 images that we downloaded and processed in last week's practical. Make sure they are in the `download` folder inside your practicals directory on Google Drive (the `datadir` path set below).

Connect to our Google Drive from Colab.

In [None]:
# Load the Drive helper and mount your Google Drive as a drive in the virtual machine
from google.colab import drive
drive.mount('/content/drive')

Install and import the required libraries

In [None]:
# install the extra libraries we need, then import all required libraries
!pip install rasterio
!pip install sentinelsat
!pip install geopandas
!pip install rasterstats

from collections import OrderedDict
import csv 
import geopandas as gpd
import json
import matplotlib.pyplot as plt
import math
from math import floor, ceil
import numpy as np
from osgeo import gdal, ogr
import os
from os import listdir
from os.path import isfile, isdir, join
import pickle
from pprint import pprint
from pyproj import Proj
import rasterio
from rasterio import plot
from rasterio.plot import show_hist
from rasterio.windows import Window
from rasterstats import zonal_stats
from sentinelsat.sentinel import SentinelAPI, read_geojson, geojson_to_wkt, geojson
import shutil
import sys
import zipfile

# make sure that this path points to the location of the pygge module on your Google Drive
libdir = '/content/drive/MyDrive/practicals21-22' # this is where pygge.py needs to be saved
if libdir not in sys.path:
    sys.path.append(libdir)

# import the pygge module
import pygge

%matplotlib inline

# Processing Sentinel-2 images

The workflow for this practical is:
* Calculate the Normalised Difference Vegetation Index (NDVI) from the 10 m bands of the images we downloaded last time
* Save the NDVI images in GeoTIFF format
* Reproject (warp) the NDVI images to the map projection of our shapefile
* Clip the NDVI images to the extent of the shapefile
* Visualise the NDVI images
* Extract zonal statistics of NDVI for the polygons in our shapefile
* Save the statistics as a csv file for use in Excel
* Plot the statistics to explore them

# Set up the directories for the satellite data

In [None]:
# path to your Google Drive
wd = "/content/drive/MyDrive/practicals21-22" 
print("Connected to data directory: " + wd)

# path to the directory where we saved the downloaded Sentinel-2 images last time
datadir = join(wd,"download")
print("Looking for image data in: " + datadir)

# path to your temporary drive on the Colab Virtual Machine
cd = "/content/work"

# path to the shapefile
shapefile = join(wd, 'oakham', 'Polygons_small.shp') # ESRI Shapefile of the study area

# directory for downloading the Sentinel-2 granules
downloaddir = join(cd, 'download') # where we save the downloaded images
quickdir = join(cd, 'quicklooks')  # where we save the quicklooks
outdir = join(cd, 'out')           # where we save any other outputs

# CAREFUL: This code removes the named directories and everything inside them to free up space
for d in [downloaddir, quickdir, outdir]:
  try:
    shutil.rmtree(d)
  except FileNotFoundError:
    print(d + " not found.")

# create the new directories, unless they already exist
os.makedirs(cd, exist_ok=True)
os.makedirs(downloaddir, exist_ok=True)
os.makedirs(quickdir, exist_ok=True)
os.makedirs(outdir, exist_ok=True)

print("Connected to Colab temporary data directory: " + cd)

print("\nList of contents of " + wd)
for f in sorted(os.listdir(wd)):
  print(f)

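Get the extent, coordinate reference system (CRS) and EPSG code of the shapefile. We will use the EPSG code to reproject the images and the extent to clip them to our area of interest.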

In [None]:
# Get the shapefile layer's extent, CRS and EPSG code
extent, SpatialRef, epsg = pygge.get_shp_extent(shapefile)
print("Extent of the area of interest (shapefile):\n", extent)
print("\nSpatial referencing information (CRS) of the shapefile:\n", SpatialRef)
print("EPSG code: ", epsg)
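The extent is the bounding box around all polygons in the shapefile. OGR's `Layer.GetExtent()` returns it in the order (xmin, xmax, ymin, ymax). We assume that `pygge.get_shp_extent` uses the same order, but it is worth checking against the printed `extent`. The plain-Python sketch below uses made-up coordinates and a hypothetical `polygons_extent` helper to show what is being calculated:

```python
def polygons_extent(polygons):
    """Bounding box of a list of polygons, in OGR order (xmin, xmax, ymin, ymax).

    Each polygon is a list of (x, y) vertex tuples: a hypothetical, simplified
    stand-in for the geometries stored in a shapefile.
    """
    xs = [x for poly in polygons for x, _ in poly]
    ys = [y for poly in polygons for _, y in poly]
    return (min(xs), max(xs), min(ys), max(ys))

# two hypothetical field polygons in projected coordinates (metres)
fields = [
    [(100.0, 200.0), (180.0, 200.0), (180.0, 260.0), (100.0, 260.0)],
    [(150.0, 120.0), (240.0, 120.0), (240.0, 190.0)],
]
print(polygons_extent(fields))  # (100.0, 240.0, 120.0, 260.0)
```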

# Explore the data directory structure of our downloaded files


In [None]:
# where we stored the downloaded files
os.chdir(datadir)
print("contents of ", datadir, ":")
!ls -l

# copy the data from Google Drive (permanent storage) to Colab (temporary storage but more disk space)
#pygge.copy_and_overwrite(datadir, downloaddir, delete_target_dir=True)

# look at copied files on the Colab drive
#os.chdir(downloaddir)
#print("contents of ", downloaddir, ":")
#!ls -l

# Calculate NDVI

Before we calculate the Normalised Difference Vegetation Index, we need to find the band files. They are located in the R10m (10 m resolution) subdirectories of each downloaded Sentinel-2 image directory.

In [None]:
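# make lists of all Red (B04) and NIR (B08) 10 m band file names
# sorting both lists ensures that the n-th Red file and the n-th NIR file come from the same image
files_red = sorted(pygge.get_filenames(datadir, filepattern="B04", dirpattern="R10m"))
files_nir = sorted(pygge.get_filenames(datadir, filepattern="B08", dirpattern="R10m"))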

# print the list of band files
# the output looks neater if we print each element of the list of strings in a new line
print("List of all Granule IDs:")
s2ids = [] # make an empty list to get all Sentinel 2 IDs
for i in files_red:
  s2id = i.split("/")[-1].split("_B04")[0]
  print(s2id)
  s2ids.append(s2id) # remember the id

print("List of all Red band image files:")
for i in files_red:
  print(i)

print("List of all NIR band image files:")
for i in files_nir:
  print(i)
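The NDVI loop further down pairs the Red and NIR files by their position in the two lists. That only works if both lists contain the same scenes in the same order. A more defensive option is to pair the files by scene ID. Here is a minimal sketch: `pair_bands` and `scene_id` are hypothetical helpers, and the file names are made up but follow the Sentinel-2 L2A naming pattern used above.

```python
import os

def scene_id(path, band):
    """Scene ID = file name up to '_B04' or '_B08', as in the s2id loop above."""
    return os.path.basename(path).split("_" + band)[0]

def pair_bands(red_files, nir_files):
    """Pair Red (B04) and NIR (B08) band files that share the same scene ID."""
    nir_by_id = {scene_id(f, "B08"): f for f in nir_files}
    pairs = {}
    for f in red_files:
        sid = scene_id(f, "B04")
        if sid in nir_by_id:  # only keep scenes where both bands were found
            pairs[sid] = (f, nir_by_id[sid])
        else:
            print("No NIR band found for " + sid)
    return pairs

# hypothetical file names, deliberately listed in a different order
red = ["/data/A/R10m/T30UXD_20210815T110621_B04_10m.jp2",
       "/data/B/R10m/T30UXD_20210705T110621_B04_10m.jp2"]
nir = ["/data/B/R10m/T30UXD_20210705T110621_B08_10m.jp2",
       "/data/A/R10m/T30UXD_20210815T110621_B08_10m.jp2"]

for sid, (r, n) in sorted(pair_bands(red, nir).items()):
    print(sid, os.path.basename(r), os.path.basename(n))
```

With this helper, the NDVI loop could iterate over `pair_bands(files_red, files_nir).items()` instead of `range(len(files_red))`, and use each dictionary key as the scene ID in the output file name.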

# Calculate the NDVI from the Red and NIR bands

The Normalized Difference Vegetation Index (NDVI) is an indicator of the proportion and condition of green vegetation. It is calculated from the near-infrared (NIR, Sentinel-2 band B08) and red (band B04) reflectances as NDVI = (NIR - Red) / (NIR + Red), so its values lie between -1 and +1. Surfaces with some vegetation generally have positive NDVI values, surfaces without vegetation have values near zero, and water and clouds usually have negative values. The closer the value is to +1, the denser and more developed the green vegetation cover. Sparser vegetation has positive but lower values.
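To see what the formula does pixel by pixel, here is a minimal NumPy sketch. It is not necessarily how `pygge.easy_ndvi` is written. The function name `ndvi` and the tiny test arrays are hypothetical. The arrays are stored as unsigned 16-bit integers, like the Sentinel-2 band files. We convert them to floating point first, because subtracting unsigned integers would wrap around instead of giving negative numbers.

```python
import numpy as np

def ndvi(red, nir, nodata=0.0):
    """NDVI = (NIR - Red) / (NIR + Red), computed pixel by pixel.

    Pixels where NIR + Red == 0 (e.g. empty areas at the image edge) get `nodata`.
    """
    # convert to float first: subtracting unsigned 16-bit integers would wrap around
    red = red.astype(np.float32)
    nir = nir.astype(np.float32)
    total = nir + red
    out = np.full(red.shape, nodata, dtype=np.float32)
    # only divide where the denominator is non-zero; elsewhere `out` keeps `nodata`
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# tiny hypothetical 2 x 2 images of digital numbers (uint16, like the band files)
red_dn = np.array([[500, 1000], [0, 300]], dtype=np.uint16)
nir_dn = np.array([[3500, 1000], [0, 100]], dtype=np.uint16)
print(ndvi(red_dn, nir_dn))  # NDVI values 0.75, 0.0, 0.0 (nodata) and -0.5
```

Using 0 as the nodata value matches the `nodata=0` used for the zonal statistics later. However, it also means that a genuine NDVI of exactly 0 cannot be told apart from missing data. A value such as -9999 or NaN avoids that ambiguity.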

In [None]:
# make an empty list to store the file names of our new NDVI files
ndvifiles = []

# iterate over all Sentinel-2 images and calculate NDVI
for x in range(len(files_red)):
  file_red = files_red[x]
  file_nir = files_nir[x]
  # create an output file name
  ndvifile = join(quickdir, s2ids[x] + "_ndvi.tif")
  # remember the file name we created
  ndvifiles.append(ndvifile)
  # calculate the NDVI raster
  pygge.easy_ndvi(file_red, file_nir, ndvifile)

print("List of all NDVI image files:")
for i in ndvifiles:
  print(i)

# Warp the NDVI images to the same projection as the shapefile.

If the shapefile outlines our area of interest, the map user probably wants the output maps in the same map projection as the shapefile. In addition, Sentinel-2 images from neighbouring orbits can be in different UTM zones, which means they cannot easily be mosaicked. They first have to be brought into a common map projection.
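Why do neighbouring images end up in different projections? UTM zones are 6 degrees of longitude wide, numbered from 1 at 180°W, and each zone has its own EPSG code (326xx in the northern hemisphere, 327xx in the southern). This short sketch computes the standard zone for a point. It ignores the special zones around Norway and Svalbard. The helper `utm_epsg` and the two points either side of the Greenwich meridian are hypothetical.

```python
import math

def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing a point (standard 6-degree zones only)."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(zone, 60)  # longitude 180 belongs to zone 60
    return (32600 if lat >= 0 else 32700) + zone

print(utm_epsg(-0.73, 52.67))  # 32630, i.e. UTM zone 30N
print(utm_epsg(0.50, 52.67))   # 32631, i.e. UTM zone 31N
```

Two images only about 1.2 degrees of longitude apart can therefore be delivered in different UTM zones. Warping both to the shapefile's EPSG code puts them on a common grid.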

In [None]:
print("Reprojecting all NDVI images to the following projection:")
print(SpatialRef)
print("with EPSG code ", epsg)

warpfiles = [] # make an empty list of warped NDVI file names

# iterate over all NDVI files and warp the image
for x in range(len(ndvifiles)):
  ndvifile = ndvifiles[x]

  # make a directory path and file name for the warped output file
  warpfile = join(quickdir, s2ids[x] + "_ndvi_warped.tif")
  warpfiles.append(warpfile) # add it to our list

  # call the easy_warp function from pygge
  pygge.easy_warp(ndvifile, warpfile, epsg)

for f in warpfiles:
  print(f)

# Clip the NDVI files to the shapefile extent

For mapping applications, it is often sufficient to map only the area of interest rather than the entire image. Clipping also makes the output files much smaller and quicker to plot.
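Clipping to an extent boils down to converting map coordinates into pixel rows and columns using the image's geotransform. The sketch below shows that arithmetic. It assumes the extent is in OGR order (xmin, xmax, ymin, ymax) and a GDAL-style geotransform for a north-up image. The helper `extent_to_window` and the numbers are hypothetical, and we do not know whether `pygge.easy_clip` works this way internally.

```python
from math import floor, ceil

def extent_to_window(extent, geotransform):
    """Convert a map extent to a pixel window (col_off, row_off, width, height).

    extent:       (xmin, xmax, ymin, ymax), the order returned by OGR's GetExtent()
    geotransform: GDAL-style (x_origin, pixel_width, 0, y_origin, 0, -pixel_height)
                  for a north-up image without rotation
    """
    xmin, xmax, ymin, ymax = extent
    x0, px_w, _, y0, _, px_h = geotransform  # px_h is negative for north-up images
    col_off = floor((xmin - x0) / px_w)
    row_off = floor((ymax - y0) / px_h)      # top edge of the extent -> first row
    col_end = ceil((xmax - x0) / px_w)
    row_end = ceil((ymin - y0) / px_h)       # bottom edge of the extent -> last row
    return col_off, row_off, col_end - col_off, row_end - row_off

# hypothetical 10 m image whose top-left corner is at (600000, 5900040)
gt = (600000.0, 10.0, 0.0, 5900040.0, 0.0, -10.0)
print(extent_to_window((600105.0, 600500.0, 5899000.0, 5899555.0), gt))  # (10, 48, 40, 56)
```

The result has the same order as rasterio's `Window(col_off, row_off, width, height)`, which the notebook imports. Note that this sketch does not clamp the window to the image bounds, so an extent that reaches beyond the image would give negative or oversized values.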

In [None]:
# make an empty list where we will remember all clipped file names
clipfiles = []

nfiles = len(warpfiles)

# arrange our subplots, assuming a 16:9 screen ratio
cols = min(nfiles, 4) # maximum of 4 plots in one row
rows = math.ceil(nfiles / cols) # round up to nearest integer

# create a figure with subplots
fig, ax = plt.subplots(rows, cols, figsize=(21,7))
fig.patch.set_facecolor('white')
# flatten the axes into a 1D array, so that axes[i] works for one plot, one row or several rows
axes = np.atleast_1d(ax).ravel()

# iterate over all warped raster files
for i, warpfile in enumerate(warpfiles):
  print(warpfile)

  # make the filename of the new clipped image file
  clipfile = os.path.splitext(warpfile)[0] + "_clip.tif"
  clipfiles.append(clipfile) # remember the clipped file name in our list
  print("Clipped file: ", clipfile)

  # clip it to the shapefile extent
  pygge.easy_clip(warpfile, clipfile, extent)

# make maps
for i, clipfile in enumerate(clipfiles):
  pygge.easy_plot(clipfile, axes[i], bands=[1], cmap="Greens", percentiles=[0,100],
                  shapefile=shapefile, linecolor="yellow",
                  title = os.path.basename(clipfile).split(".")[0], fontsize=8)

# hide any unused subplots in the last row
for a in axes[len(clipfiles):]:
  a.axis("off")

os.chdir(quickdir)
!ls -l

# Now let's extract some statistics on NDVI for the small polygons in our shapefile

We will use the zonal statistics function from the Rasterstats library for this purpose, as it is easy to use.

https://pythonhosted.org/rasterstats/manual.html

It is implemented in the pygge library function easy_zonal_stats. This function calls the rasterstats function for each raster file in a list of input files and saves the statistics in one large table, with the scene ID of each satellite image in a column. We will save the table in .csv format, so it can be read into Excel.
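Conceptually, zonal statistics means: work out which pixels fall inside each polygon, then summarise their values. rasterstats handles the first step for us. The NumPy sketch below skips that step by assuming the polygons have already been rasterised into a grid of polygon numbers (0 = outside all polygons). The helper `zonal_mean` and both small arrays are hypothetical. Pixels equal to `nodata` are left out, as `nodata=0` does in the next cell.

```python
import numpy as np

def zonal_mean(values, zones, nodata=0.0):
    """Mean of `values` for each zone ID in `zones`, ignoring nodata pixels.

    values: 2D array, e.g. a clipped NDVI image
    zones:  2D integer array of the same shape; 0 = outside all polygons,
            1, 2, ... = polygon number (a hypothetical, already rasterised shapefile)
    """
    result = {}
    for zone_id in np.unique(zones):
        if zone_id == 0:
            continue  # pixels outside all polygons
        mask = (zones == zone_id) & (values != nodata)
        result[int(zone_id)] = float(values[mask].mean()) if mask.any() else None
    return result

ndvi_img = np.array([[0.8, 0.6, 0.1],
                     [0.7, 0.0, 0.2],
                     [-0.3, -0.1, 0.3]])
zones = np.array([[1, 1, 2],
                  [1, 1, 2],
                  [3, 3, 2]])
print(zonal_mean(ndvi_img, zones))  # polygon 1 ignores the 0.0 pixel
```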

In Python we can also save entire objects to a file in their original form. The pickle library allows us to do that.
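The difference between the two formats is easy to see with a small example. CSV stores everything as text, so numbers come back as strings. Pickle restores the original Python objects with their types. This sketch uses only the standard library and a hypothetical row of statistics:

```python
import csv
import io
import pickle

# a hypothetical row of zonal statistics
stats = [{"scene_id": "T30UXD_20210815T110621", "poly": 1, "mean": 0.71, "count": 152}]

# CSV: write to an in-memory text file and read it back
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["scene_id", "poly", "mean", "count"])
writer.writeheader()
writer.writerows(stats)
buf.seek(0)
from_csv = list(csv.DictReader(buf))

# pickle: serialise to bytes and restore the original objects
from_pickle = pickle.loads(pickle.dumps(stats))

print(type(from_csv[0]["mean"]), type(from_pickle[0]["mean"]))  # str versus float
```

CSV remains the better choice for sharing data with Excel or other programs. Pickle is handy for reloading results in Python later. Only unpickle files you trust, because loading a pickle can execute code.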

In [None]:
# make the name of the statistics output file
statsfile = join(outdir, "zonalstats.csv")

# call the pygge function for processing zonal statistics for multiple raster files
# nodata=0: pixels with a value of exactly 0 are treated as "no data" and left out of the statistics
zonalstats = pygge.easy_zonal_stats(clipfiles, shapefile, statsfile, nodata=0)

print("\nSaved statistics file: " + statsfile)
# open the file, read and print its contents (all lines); the with statement closes it again
with open(statsfile, "r") as f:
  pprint(f.read().splitlines())

# make the filename of the new pickle file for the stats object
pklfile = os.path.splitext(statsfile)[0] + ".pkl"

# write object to file
with open(pklfile, "wb") as f:
  pickle.dump(zonalstats, f)
print("\nPickled zonal statistics as pandas dataframe in file: " + pklfile + "\n")

# Plotting data as graphs
Before we finish, let's explore our results by plotting them as graphs.

In [None]:
# read the dataframe object from file (not necessary, for demonstration only)
with open(pklfile, 'rb') as f:
  df = pickle.load(f)
print("\nPlotting statistics from file: ", pklfile)
print(df)

# get a list of unique scene IDs from the dataframe
scene_ids = sorted(df['scene_id'].unique())

print("Available file types for the graphics files:")
pprint(plt.gcf().canvas.get_supported_filetypes_grouped())
plt.close() # close the empty figure created by plt.gcf()

# set the plot style once for all plots
plt.style.use('ggplot')

# iterate over all scene IDs in the pandas dataframe
for x in scene_ids:
  # extract the rows for that scene ID from the full dataframe
  b = df.loc[df['scene_id'] == x]

  # iterate over all columns
  for key, values in b.items():
    if key != "scene_id":
      # make a bar chart of the data for each column in the dataframe
      fig = plt.figure(figsize=(5,5))
      plt.bar(range(1, 1+len(values)), values, color='green')
      plt.xlabel("Poly No.")
      plt.ylabel(key)
      plt.title(x)
      # save the plot to a jpg file; the scene ID in the name stops scenes overwriting each other
      filename = f"{os.path.splitext(pklfile)[0]}_{x}_{key}.jpg"
      print(filename)
      fig.savefig(filename, format="jpg")
      # show the plot on screen
      plt.show()
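Producing one figure per scene and statistic quickly gets crowded. An alternative is to put all scenes side by side in one grouped bar chart. In this pandas sketch, the table `df_demo` and its values are hypothetical, and so is the `poly` column. If your table has no polygon column, you could add one with `df.groupby('scene_id').cumcount() + 1`, assuming the rows of each scene are in polygon order.

```python
import pandas as pd
import matplotlib.pyplot as plt

# hypothetical zonal statistics table: one row per polygon and scene
df_demo = pd.DataFrame({
    "scene_id": ["T30UXD_20210705T110621"] * 3 + ["T30UXD_20210815T110621"] * 3,
    "poly":     [1, 2, 3, 1, 2, 3],
    "mean":     [0.82, 0.35, -0.10, 0.64, 0.30, -0.12],
})

# reshape: one row per polygon, one column per scene
table = df_demo.pivot(index="poly", columns="scene_id", values="mean")
print(table)

# grouped bar chart: all scenes side by side for each polygon
ax = table.plot(kind="bar", figsize=(7, 4), rot=0)
ax.set_xlabel("Poly No.")
ax.set_ylabel("mean NDVI")
ax.legend(title="scene", fontsize=8)
plt.tight_layout()
plt.show()
```

This layout makes it easier to compare how NDVI changes between dates for the same polygon.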

# Formative assignment of the week

Your assignment for this week is to create a shapefile for this test area with 5-10 polygons, in which each polygon represents a different land cover type. Then write a code cell below that plots the mean NDVI of the different land cover types for your downloaded Sentinel-2 images. Interpret the results.