# Week 10: Classifying multitemporal image stacks

Individual learning outcomes: After this week, all students should be able to stack multi-temporal Sentinel-2 images into a raster stack (e.g. spring and autumn images), train a random forest model with the stack of images and run the classification on that stack.

All we need to do is modify the IPython notebook from last week, in which we classified a single Sentinel-2 image with a random forest classifier. We trained the model with a raster layer that contains class numbers as pixel values.

This week, we use the same training raster. We will, however, train the model on a raster that contains reflectance bands from several Sentinel-2 images acquired on different dates, and then classify that stack of images into a single map.

First, we need to connect to our Google Drive from Colab.

In [None]:
# Load the Drive helper and mount your Google Drive as a drive in the virtual machine
from google.colab import drive
drive.mount('/content/drive')

Now let us import all necessary libraries.

In [None]:
# Adapted from: http://remote-sensing.eu/image-classification-with-python/

#import required libraries
!pip install rasterio
!pip install sentinelsat
!pip install geopandas
!pip install rasterstats
import csv 
import geopandas as gpd
import math
from math import floor, ceil
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from osgeo import ogr  # recent GDAL versions no longer provide the top-level 'import ogr'
import os
from os import listdir
from os.path import isfile, isdir, join
from osgeo import gdal
import pickle
from pyproj import Proj
from pprint import pprint
import rasterio
from rasterio.windows import Window
from rasterio import features, plot
from rasterio.plot import show_hist, reshape_as_raster, reshape_as_image
from rasterstats import zonal_stats
import skimage.io as io
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, GradientBoostingClassifier, ExtraTreesClassifier
import joblib  # sklearn.externals.joblib was removed from recent scikit-learn versions
import shutil
import sys
%matplotlib inline

Make sure that all your files are in the right places before running the next cell.

Edit the directory paths if they do not match your own directory structure.

In [None]:
# set up your working directory with the satellite data
wd = '/content/drive/MyDrive/practicals20-21/rf'

'''
modified from last week:
* s2path now points to a directory of several images (.SAFE directories)
  rather than a single Sentinel-2 image (we called this 'downloaddir' previously)
'''
# path to your download directory that contains several full Sentinel-2 images (granules)
s2path = '/content/drive/MyDrive/practicals20-21/download/'

# path to our shapefile to get the extent of the clipped image area
shapefile = join('/content/drive/MyDrive/practicals20-21', 'oakham', 'Polygons_small.shp') # ESRI Shapefile of the study area

# names of bands to be included in the merged GeoTiff file
bandnames = ["B02", "B03", "B04", "B08"]

'''
modified from last week:
* We call the merged raster file S2_stack.tif now because it contains several bands
  from several images acquired at different dates
'''
# path to the new merged file with the selected bands in GeoTiff format
s2merged = '/content/drive/MyDrive/practicals20-21/rf/S2_stack.tif'

# path to your corresponding pixel samples (training data converted to a geotiff raster file)
# pixel values are the class numbers
#samples = join(wd, "training_raster.tif") 

'''
modified from last week:
* We use a different output file name to avoid overwriting the file from last week
'''
# make a filename for the classified map
outfile = join(wd, "LandCoverMap_multitemp.tif")

# define the name of the shapefile containing the training polygons
trainshapefile = '/content/drive/MyDrive/practicals20-21/rf/training_areas.shp'

# define the name of the output raster file that will contain the class numbers of our
#   training areas as pixel values
trainraster = '/content/drive/MyDrive/practicals20-21/rf/training_areas.tif'



We define the same helper function as last week to convert coordinates for the clipping.

In [None]:
# define a helper function that converts latitude, longitude coordinates into pixel locations
def longlat2window(lon, lat, dataset):
    """
    Args:
        lon (tuple): Tuple of min and max lon
        lat (tuple): Tuple of min and max lat
        dataset: Rasterio dataset

    Returns:
        rasterio.windows.Window
    """
    p = Proj(dataset.crs)
    t = dataset.transform
    xmin, ymin = p(lon[0], lat[0])
    xmax, ymax = p(lon[1], lat[1])
    col_min, row_min = ~t * (xmin, ymin)
    col_max, row_max = ~t * (xmax, ymax)
    return Window.from_slices(rows=(floor(row_max), ceil(row_min)),
                              cols=(floor(col_min), ceil(col_max)))
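
The expression `~t * (x, y)` in the helper applies the inverse of the raster's affine transform. For a north-up raster this reduces to simple arithmetic, which the sketch below spells out in plain Python. The function `world_to_pixel`, the raster origin, the pixel size and the area of interest are all made-up values for illustration and are not taken from the practical's data.

```python
from math import floor, ceil

def world_to_pixel(x, y, x_origin, y_origin, pixel_size):
    """Convert map coordinates to fractional (col, row) for a north-up raster."""
    col = (x - x_origin) / pixel_size
    row = (y_origin - y) / pixel_size  # rows count downwards from the top edge
    return col, row

# hypothetical raster: top-left corner at (600000, 5900000), 10 m pixels
x0, y0, size = 600000.0, 5900000.0, 10.0

# hypothetical area of interest in map coordinates
xmin, xmax = 600105.0, 600255.0
ymin, ymax = 5899605.0, 5899895.0

# same naming as in the helper: row_min is computed from ymin, row_max from ymax
col_min, row_min = world_to_pixel(xmin, ymin, x0, y0, size)
col_max, row_max = world_to_pixel(xmax, ymax, x0, y0, size)

# ymax gives the smaller row number, so it forms the start of the row slice
rows = (floor(row_max), ceil(row_min))
cols = (floor(col_min), ceil(col_max))
print("rows:", rows, "cols:", cols)
```

This is why the helper passes `floor(row_max)` first and `ceil(row_min)` second: the northern edge of the area has the smaller row index.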


# Merge all bands into a single file
The Sentinel-2 image bands are all in separate .jp2 files. First, we need to find the file names of all image bands we want to include in the classification and merge the band rasters into a single GeoTiff file.
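
As a small illustration of the file name convention used in the next cell, the sketch below splits band file names on the underscore and keeps only the wanted bands. The helper `band_of` and the example file list are assumptions made for this illustration; the cell below uses a substring test instead of an exact match on the band component.

```python
def band_of(filename):
    """Return the band component of a Sentinel-2 band file name."""
    # "T30UXD_20190919T110721_B04_10m.jp2"
    #   -> ["T30UXD", "20190919T110721", "B04", "10m.jp2"]
    return filename.split("_")[2]

# example file names following the naming pattern described in this practical
files_10m = [
    "T30UXD_20190919T110721_TCI_10m.jp2",
    "T30UXD_20190919T110721_B08_10m.jp2",
    "T30UXD_20190919T110721_B02_10m.jp2",
    "T30UXD_20190919T110721_B04_10m.jp2",
    "T30UXD_20190919T110721_B03_10m.jp2",
]

wanted = ["B02", "B03", "B04", "B08"]

# sort first so that the selected bands always come out in the same order
selected = [f for f in sorted(files_10m) if band_of(f) in wanted]
for f in selected:
    print(band_of(f), f)
```

Matching on the exact component avoids accidental hits if a band name happens to appear elsewhere in a file name.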

In [None]:
'''
Changed from last week:
* Use all four 10 m resolution bands from all Sentinel-2 images in the download directory
* Use s2path pointing to the download directory (like two weeks ago)
'''

# how many .SAFE directories are in the download directory?
# get the list of all directories in the download directory
dirlist = [d for d in listdir(s2path) if isdir(join(s2path, d))]

# make an empty list of all Sentinel-2 granule IDs we have downloaded
s2IDs = [] 

# iterate over all Sentinel-2 .SAFE image directories
for d in range(len(dirlist)):
  # the directory names have the following structure, for example:
  # S2A_MSIL2A_20190919T110721_N0213_R137_T30UXD_20190919T140654.SAFE
  # the first part of the directory name is the granule ID
  # so we split off the ".SAFE" as follows:
  sceneID = dirlist[d].split(".")[0] 
  s2IDs.append(sceneID) #append the unique identifier to the list

print(len(s2IDs), " Sentinel-2 images found.")
print("List of all Granule IDs:")
pprint(s2IDs)

# make an empty list of all directory paths pointing to the selected 10 m resolution band files
s2dirs = [] 

# make an empty list of all band file names for all images we want to merge into one raster for the classification
files_selected = []

# iterate over all Sentinel-2 image directories again
for d in range(len(dirlist)):
  # the granule IDs were already collected in the loop above, so here we only
  # need to build the paths to the band files

  # find the GRANULE, then L2A_*, then IMG_DATA, then R10m directory
  thisdir = join(s2path, dirlist[d], "GRANULE")

  # find the full name of the L2A_* subdirectory (contains the scene ID)
  subdirlist = [s for s in listdir(thisdir) if isdir(join(thisdir, s))]
  for y in range(len(subdirlist)):
    if subdirlist[y].split("_")[0] == "L2A":
      thisdir = join(thisdir, subdirlist[y])

  # add IMG_DATA/R10m to the path; this is where the 10 m resolution band files are found
  s2dir = join(thisdir, "IMG_DATA", "R10m")
  s2dirs.append(s2dir) # add it to our list

  # get all image band file names
  files_10m = [f for f in listdir(s2dir) if isfile(join(s2dir, f))]
  files_10m = sorted(files_10m)

  print("All bands in the image directories:")
  pprint(files_10m)
  print("\n")

  # We split the filename into components based on the underscore _
  # e.g. "T30UXD_20190919T110721_B04_10m.jp2"
  # becomes ["T30UXD", "20190919T110721", "B04", "10m.jp2"]
  # so the component indexed 2 contains the band number
  for b in bandnames:
    '''
    changed from last week:
    * below, we join the directory path to the 10 m resolution imagery for the Sentinel-2
      granule we are just iterating over to the names of the 10 m band files,
      so we can later find them again
    '''
    files_selected.append([join(s2dir, files_10m[index]) for index, content in enumerate(files_10m) if b in content][0])

print("List of all Sentinel-2 directories:")
for i in s2dirs:
  print(i)
print("\nList of all selected band image files for merging into one raster file:")
for i in files_selected:
  print(i)

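# open one of the band files to get metadata
# (files_selected already holds full paths, so no join with s2dir is needed)
f = rasterio.open(files_selected[0])
dt = f.dtypes[0]  # data type of the first band, without reading the pixel data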

'''
changed from last week:
* below, we now have to iterate over all band files from several images acquired on different dates
  and write them to subsequent output bands in the merged file, e.g. if 4 bands are 
  included from each image, then the first band of the second image would become output band 5.
  That means our output raster file needs 'bands * images' output bands.
  We substitute len(bandnames) with len(files_selected) to create enough output bands.
'''

# open the new file with the merged band data for writing
# ('GTiff' is the canonical spelling of the GeoTIFF driver name)
s2merged_file = rasterio.open(s2merged, 'w', driver='GTiff', width=f.width, 
                              height=f.height, count=len(files_selected), 
                              crs=f.crs, transform=f.transform, dtype=dt)

# close the file
f.close()

'''
changed from last week:
* files_selected this time includes the full directory path, so we do not need to
  merge the directory name with the file name anymore
'''

# now iterate over all band files we want to include in the merged file
for index, bandfile in enumerate(files_selected):
  # the with statement closes each band file automatically
  with rasterio.open(bandfile, 'r') as thisfile:
    s2merged_file.write(thisfile.read(1), index+1)

# close the output file
s2merged_file.close()
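
The band layout of the merged stack follows directly from the order of `files_selected`: all selected bands of the first image come first, then those of the second image, and so on. The two helper functions below are a hypothetical sketch of that arithmetic and are not part of the notebook. Note that the image order follows the directory listing, which is not guaranteed to be chronological.

```python
bandnames = ["B02", "B03", "B04", "B08"]

def stack_band_number(image_index, bandname, bandnames):
    """1-based band number in the stack for a 0-based image index and a band name."""
    return image_index * len(bandnames) + bandnames.index(bandname) + 1

def describe_stack_band(band_number, bandnames):
    """Inverse lookup: (0-based image index, band name) for a 1-based band number."""
    image_index, position = divmod(band_number - 1, len(bandnames))
    return image_index, bandnames[position]

# the first band (B02) of the second image becomes output band 5
print(stack_band_number(1, "B02", bandnames))

# output band 12 is the last band (B08) of the third image
print(describe_stack_band(12, bandnames))
```

The inverse lookup is handy later on, when the feature importance plot reports band positions in the stack rather than band names and dates.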

# Warp the merged raster file
Because we want to display the results together with our shapefile, we need to warp the raster data, as we did before.

In [None]:
# warp it to the same projection as the shapefile

# get the shapefile extent
driver = ogr.GetDriverByName("ESRI Shapefile")
ds = driver.Open(shapefile, 0)
lyr = ds.GetLayer()
extent = lyr.GetExtent()
print("Extent of the area of interest (shapefile):\n", extent)

# get projection information of the shapefile
outSpatialRef = lyr.GetSpatialRef().ExportToWkt()
ds = None # close file

print("Reprojecting image to the following projection:")
print(outSpatialRef)

# make a file name for our new file
warpfile = s2merged.split(sep='.')[0] + '_warped.tif'

# check whether the warp file already exists and skip if it does
if not os.path.exists(warpfile):
  print("Creating warped file:" + warpfile)
  # call the GDAL Warp command
  ds = gdal.Warp(warpfile, s2merged, dstSRS=outSpatialRef)
  if ds is None:
    print("Warping failed: gdal.Warp returned None for " + s2merged)
  ds = None #remember to close and save the output file
else:
  print("warped file already exists")

# Clip the image file to our area of interest
Here, we do not want to classify the entire Sentinel-2 granule (image). We only want to classify our area of interest defined by the shapefile extent.

Hence we have to clip our merged, warped image to that extent. This will also speed up processing time.

In [None]:
# clip the merged, warped file

# make the filename of the new clipped image file
clipfile = warpfile.split(".")[0] + "_clip.tif"
print("Producing clipped image file: " + clipfile)

# clip it with rasterio to the shapefile extent
# rasterio offers an option called 'window' to load a subset of a raster file

# open the source file
with rasterio.open(warpfile, 'r') as src:
  
  # convert the shapefile extent to a rasterio window object
  window = longlat2window((extent[0], extent[1]), (extent[2], extent[3]), src)
  print("Window coordinates: ", window)
  
  # read all bands but only for the window extent
  arr = src.read(window=window, out_shape=(src.count, window.height, window.width))
  print("Window array size: ", arr.shape)

  # get the data type
  dt = arr.dtype

  # open the destination file
  # copy metadata from source file
  # BUT we must change the geotransform to the window with the update below
  # https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html
  kwargs = src.meta.copy()
  kwargs.update({'height': window.height,
                  'width': window.width,
                  'transform': rasterio.windows.transform(window, src.transform),
                  'driver': 'GTiff', 
                  'count': src.count,
                  'crs': src.crs,
                  'dtype': dt
                  })

  # the with statements close the destination and the source file automatically
  with rasterio.open(clipfile, 'w', **kwargs) as dst:
    dst.write(arr)


# Visualise our image
At this point, we may want to check whether all processing steps have worked. We need to look at our image to see whether anything went wrong.

We define our helper function to plot a true colour image on screen, as we did before.

In [None]:
# We need our old helper function to convert an image to uint8 data type for plotting
def tci(afile, ax=None, bands=[3,2,1], percentiles=[0,100], xlim=None, ylim=None): 
  # tci stands for true colour image
  # afile is a handle to an image file opened with RasterIO.Open()
  # ax is the axes handle to plot the map on
  # bands is the order of image bands in the source file to become RGB channels
  # percentiles = list of percentiles for trimming the histogram
  #    [0,100] stands for min, max
  # xlim =[xmin, xmax] is the map extent to be shown in x direction
  # ylim =[ymin, ymax] is the map extent to be shown in y direction
  
  # we define a function within this function:
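  def scale_to_uint8(x, percentiles=[0,100]):
    # scale array x to 0-255 and convert to uint8
    # x = input array
    # percentiles = list of percentiles for trimming the histogram
    #    [0,100] stands for min, max
    x = np.float32(x)
    amin = np.percentile(x, percentiles[0])
    amax = np.percentile(x, percentiles[1])
    anewmin = 0.0
    anewmax = 255.0
    if amax == amin:
      # constant band: avoid a division by zero
      return(np.zeros(x.shape, dtype=np.uint8))
    xscaled = (x - amin) * ((anewmax - anewmin) / (amax - amin)) + anewmin
    # clip values outside the percentile range, otherwise they do not fit into uint8
    xscaled = np.clip(xscaled, anewmin, anewmax)
    return(xscaled.astype(np.uint8))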

  # save the uint8 image as a temporary Geotiff file
  tmpfile = rasterio.open('tmp_rgb_imagefile_ cjdlsbYFEOGFHEWBVUW.tiff',
                            'w',driver='Gtiff', width=afile.width, height=afile.height,
                            count=3, crs=afile.crs, transform=afile.transform, 
                            dtype=np.uint8)

  # mask out extreme values for each band
  for b in range(3):
    # read band data
    a = afile.read(bands[b])
    a_uint8 = scale_to_uint8(a, percentiles) 
    # write the output into the new file as band b+1
    tmpfile.write(a_uint8, b+1)

  # close the file
  tmpfile.close()

  # try plotting the image
  imgfile = rasterio.open(r'tmp_rgb_imagefile_ cjdlsbYFEOGFHEWBVUW.tiff', count=3)

  if xlim is None:
    xlim=[afile.bounds.left, afile.bounds.right]
    # afile.bounds returns a BoundingBox(left, bottom, right, top) object,
    #    from which we need to get the corner coordinates like so

  if ylim is None:
    ylim=[afile.bounds.bottom, afile.bounds.top]

  # create a new figure if no axes handle was passed in
  if ax is None:
    fig, ax = plt.subplots()

  # zoom in to an area of interest by setting the axes limits of our map
  ax.set_xlim(xlim)
  ax.set_ylim(ylim)
  plot.show(imgfile, ax=ax)

  # close the temporary file
  imgfile.close()

  # and remove the temporary file when we do not need it anymore
  os.remove('tmp_rgb_imagefile_ cjdlsbYFEOGFHEWBVUW.tiff')

  return()
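
A percentile stretch like the one in `scale_to_uint8` maps every value above the upper percentile to more than 255, and such values do not fit into the uint8 range. The sketch below shows a stretch with explicit clipping on a tiny made-up array. The function name `stretch_to_uint8` and the sample values are assumptions for illustration only.

```python
import numpy as np

def stretch_to_uint8(x, lower=0, upper=98):
    """Linear percentile stretch to 0-255 with clipping (illustrative helper)."""
    x = np.asarray(x, dtype=np.float64)
    lo, hi = np.percentile(x, [lower, upper])
    if hi == lo:
        # constant input: nothing to stretch
        return np.zeros(x.shape, dtype=np.uint8)
    scaled = (x - lo) * (255.0 / (hi - lo))
    # values below lo or above hi are clipped to the ends of the uint8 range
    return np.clip(scaled, 0, 255).astype(np.uint8)

# made-up reflectance values with one very bright outlier
band = np.array([[200, 400, 600],
                 [800, 1000, 9000]])

out = stretch_to_uint8(band, lower=0, upper=80)
print(out)
```

With clipping, the bright outlier simply saturates at 255 instead of distorting the rest of the image.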

In [None]:
# create a figure with subplots
fig, ax = plt.subplots(figsize=(10,10))
fig.patch.set_facecolor('white')

'''
changed from last week:
* We use the tci plotting function here to check whether our images taken on 
  different dates are coregistered well, i.e. the pixels are in the same place.
  Hence, we select three bands that correspond to different acquisition dates
  but have the same wavelength.
'''

# plot it and display the blue band acquired on the first, second and third acquisition date
#    as red, green and blue on screen
with rasterio.open(clipfile, "r") as img:
  tci(img, ax=ax, bands = [1, 1+len(bandnames), 1+2*len(bandnames)], percentiles=[0,98])
  # set a title for the subplot
  mytitle = clipfile
  ax.set_title(mytitle, fontsize=8)

You will notice that the colour scheme looks really odd and a bit psychedelic. This is because we display bands of the same wavelength but acquired on a different date as red, green and blue channels on screen. Check out the last block of code to see how this is done.

# Training a random forest model with QGIS and SciKit-Learn

We will do the same as last week, but this time our model will be trained on many more input bands from different acquisition dates.

LandCover:

1 = Water

2 = Residential

3 = Industrial

4 = Pasture

5 = Crops

6 = Bare soil

7 = Forest


# Read in the shapefile with the training polygons
First, we need to read in the vector layer that contains our training areas, convert it to the coordinate reference system of our raster file and create a new training raster file with 'burned-in' pixel values showing class numbers. The following function does that.

In [None]:
def read_training_shapefile(training_shapefilename, inraster_filename, outraster_filename):
  '''
  This function reads in a shapefile with training polygons and produces a raster file
     that aligns with an input rasterfile (same corner coordinates, resolution, coordinate 
     reference system and geotransform). Each pixel value in the output raster will
     indicate the class number of the training shapefile based on the attribute column 
     named 'Class' with a capital C.

  Credit: https://gis.stackexchange.com/questions/151339/rasterize-a-shapefile-with-geopandas-or-fiona-python  

  Parameters:
  training_shapefilename = string pointing to the input training shapefile in ESRI format
  inraster_filename = string pointing to the input raster file that we want to align the output raster to
  outraster_filename = string pointing to the output raster file

  '''

  # Open the shapefile with GeoPANDAS
  shp = gpd.read_file(training_shapefilename)

  # Open the input raster file with RasterIO  
  inraster = rasterio.open(inraster_filename, 'r')
  # Reproject the geometries from the shapefile to the CRS of the raster
  shp = shp.to_crs(inraster.crs)

  # copy and update the metadata from the input raster for the output
  meta = inraster.meta.copy()
  meta.update(dtype=np.uint8)
  meta.update(count=1)

  # Now burn the features into the raster and write it
  with rasterio.open(outraster_filename, 'w', **meta) as outraster:
    # create the output array as a Numpy array filled with zeros and of the same
    #    shape as the input raster
    raster_shape = inraster.read(1).shape
    out_arr = np.zeros(raster_shape, dtype=np.uint8)
    
    # this is where we create a generator of geom, value pairs to use in rasterizing
    shapes = ((geom, value) for geom, value in zip(shp.geometry, shp.Class))
    burned = features.rasterize(shapes=shapes, fill=0, out=out_arr, 
                                transform=outraster.transform, all_touched=False)

    '''
    # uncomment this part if you want detailed printed output

    print("Shapefile attribute table and polygon geometries:")
    pprint(shp)

    print("shapes is a generator object:")
    print(shapes)

    # this is how we can access its contents (for convenience we print the class first):
    for geom, value in zip(shp.geometry, shp.Class):
      print(value, geom)

    print("Rasterised layer:")    
    print(burned)
    print("The output raster has values from ", np.min(burned), " to ", np.max(burned))
    '''
    
    outraster.write(burned.astype(np.uint8), 1)
  # the with statement has closed the output raster; close the input raster explicitly
  inraster.close()

Now run the function.

In [None]:
print(trainshapefile)
print(clipfile)
print(trainraster)

read_training_shapefile(trainshapefile, clipfile, trainraster)
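
After rasterising, it is worth checking that every class actually received some training pixels. With `all_touched=False`, only pixels whose centre lies inside a polygon are burned in, so very small polygons can end up without any pixels. The sketch below counts pixels per class on a made-up array; to run it on the real data you would read band 1 of the training raster with rasterio instead.

```python
import numpy as np

# made-up 4 x 5 training raster: 0 = no training data, 1..7 = class numbers
train = np.array([[0, 1, 1, 0, 0],
                  [0, 1, 1, 4, 4],
                  [7, 7, 0, 4, 4],
                  [7, 7, 0, 0, 5]], dtype=np.uint8)

# count the pixels of each class, ignoring the background value 0
classes, counts = np.unique(train[train > 0], return_counts=True)
pixels_per_class = dict(zip(classes.tolist(), counts.tolist()))
print("pixels per class:", pixels_per_class)

# compare against the seven land cover classes defined in this practical
expected = {1, 2, 3, 4, 5, 6, 7}
missing = sorted(expected - set(pixels_per_class))
print("classes without training pixels:", missing)
```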

# Build the random forest model
Now we can define a function to read in the training raster file, build the random forest classification model, print some model diagnostics and save the model as a pickle file for future use. Each pixel value in the training raster represents the land cover class of that pixel. 

In [None]:
# declare a new function
def training(raster, samples, modelfile, ntrees = 101):
    '''
    raster = filename and path to the raster file to be classified (in tiff format)
    samples = filename and path to the raster file with the training samples as pixel values (in tiff format)
    modelfile = filename and path to a pickle file to save the trained model in
    ntrees = number of trees in the random forest
    '''
    # read in clipped Sentinel-2A raster from geotiff (unsigned 16-bit integer format)
    img_ds = io.imread(raster)
    # convert to 16bit numpy array 
    img = np.array(img_ds, dtype='int16')

    # do the same with your training sample pixels 
    roi_ds = io.imread(samples)   
    roi = np.array(roi_ds, dtype='int8')  
    
    # read in your labels
    labels = np.unique(roi[roi > 0]) 
    print('The training data include {n} classes: {classes}'.format(n=labels.size, classes=labels))

    # compose your X,Y data (dataset - training data)     
    # 0 = missing value
    X = img[roi > 0, :] 
    Y = roi[roi > 0]     

    # assign class weights (class 1 has the weight 1, etc.)
    weights = {1:1, 2:2, 3:2, 4:1, 5:2, 6:2, 7:2}

    # build your Random Forest Classifier 
    # for more information: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

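    # note: max_features = 'sqrt' is what the older 'auto' setting meant for classifiers;
    #   'auto' is no longer accepted by recent scikit-learn versions
    rf = RandomForestClassifier(class_weight = weights, n_estimators = ntrees, criterion = 'gini', max_depth = 4, 
                                min_samples_split = 2, min_samples_leaf = 1, max_features = 'sqrt', 
                                bootstrap = True, oob_score = True, n_jobs = 1, random_state = None, verbose = True)  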

    # alternatively you may try out a Gradient Boosting Classifier,
    # which may need less RAM and can cope with weaker training data
    # note: the OOB error section further down relies on oob_score_ and would need adapting
    """ 
    rf = GradientBoostingClassifier(n_estimators = ntrees, min_samples_leaf = 1, min_samples_split = 4, max_depth = 4,    
                                    max_features = 'sqrt', learning_rate = 0.8, subsample = 1, random_state = None,         
                                    warm_start = True)
    """

    # now fit the model to your training data
    rf = rf.fit(X,Y)

    # export your Random Forest / Gradient Boosting Model     
    joblib.dump(rf, modelfile)
    
    # calculate feature importances
    importances = rf.feature_importances_
    std = np.std([tree.feature_importances_ for tree in rf.estimators_], axis=0)
    indices = np.argsort(importances)[::-1]

    # Print the feature ranking
    print("Feature ranking:")
    for f in range(X.shape[1]):
        print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

    # Plot the feature importances of the forest
    plt.figure()
    plt.title("Feature importances")
    plt.bar(range(X.shape[1]), importances[indices], color="r", yerr=std[indices], align="center")
    plt.xticks(range(X.shape[1]), indices)
    plt.xlim([-1, X.shape[1]])
    plt.show()
    
    # Out-of-bag error rate as a function of number of trees:
    oob_error = [] # define an empty list with pairs of values
    
    # Range of `n_estimators` values to explore.
    mintrees = 50 # this needs to be a sensible minimum number to get reliable OOB error estimates
    maxtrees = max(mintrees, ntrees) # go all the way to the highest number of trees
    nsteps = 10 # number of steps to calculate OOB error rate for (saves time)
    
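    # work out error rate for each number of trees in the random forest
    # the step must be at least 1, otherwise range() raises an error for small ntrees
    # note: refitting changes rf, so the returned model is the last one fitted in this loop
    step = max(1, round((maxtrees - mintrees)/nsteps))
    for i in range(mintrees, maxtrees + 1, step): # start, end, step
        rf.set_params(n_estimators=i)
        rf.fit(X, Y)
        oob_error.append((i, 1 - rf.oob_score_))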

    # Plot OOB error rate vs. number of trees
    xs, ys = zip(*oob_error)
    plt.plot(xs, ys)
    # plt.xlim(0, maxtrees)
    plt.xlabel("n_estimators")
    plt.ylabel("OOB error rate")
    # plt.legend(loc="upper right")
    plt.show()

    return(rf) # returns the random forest model object
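
The lines `X = img[roi > 0, :]` and `Y = roi[roi > 0]` in the function above use a boolean mask to pull out only the labelled pixels. The sketch below shows the same indexing on small made-up arrays. It assumes, as the training function does, that the image array is ordered (rows, columns, bands).

```python
import numpy as np

rows, cols, nbands = 3, 4, 8  # e.g. two dates with four bands each

# made-up image stack in (rows, columns, bands) order
img = np.arange(rows * cols * nbands).reshape(rows, cols, nbands)

# made-up training raster: 0 = unlabelled, other values = class numbers
roi = np.zeros((rows, cols), dtype=np.int8)
roi[0, 1] = 1
roi[1, 1] = 1
roi[2, 3] = 5

# the 2D mask selects pixels; the trailing ':' keeps all bands of each pixel
X = img[roi > 0, :]   # shape (labelled pixels, bands)
Y = roi[roi > 0]      # shape (labelled pixels,)

print("X.shape =", X.shape)
print("Y =", Y)
```

Each row of `X` holds the band values of one labelled pixel across all acquisition dates, and the matching entry in `Y` is its class number.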

# Train the random forest model
Now let us execute the function we have just defined. This will read in our merged, warped and clipped image and our training raster. From those two datasets, the random forest algorithm will define a collection of decision trees that best represent land cover as a function of spectral band information contained in the satellite image. 

We save that model to a pickle file for further use.

In [None]:
# the name of our model file we want to save
modelfile = join(wd, "model.pkl")  # join adds the path separator that wd+"model.pkl" would miss

print(clipfile)
print(trainraster)
print(modelfile)

# call the training function
model = training(clipfile, trainraster, ntrees=61, modelfile=modelfile)

So far, we have fitted the random forest classification model, assessed which bands of the image stack contribute most to the classification, and looked at how the number of decision trees in the random forest influences the OOB error rate. The OOB curve shows whether the number of trees selected was too low: if the error still decreases a lot as more trees are added, the forest would benefit from more trees.
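
One simple way to judge the OOB curve is to check how much the error changed over the last step. The helper below is a hypothetical sketch of such a check. The function name, the tolerance and the sample curve are made up, and the training function above only plots the (n_estimators, OOB error) pairs, so it would need to return them for this to be applied to real results.

```python
def oob_has_levelled_off(oob_error, tolerance=0.005):
    """True if the last step changed the OOB error by less than `tolerance`."""
    if len(oob_error) < 2:
        return False
    (_, previous), (_, last) = oob_error[-2], oob_error[-1]
    return abs(previous - last) < tolerance

# made-up (n_estimators, OOB error) pairs in the format built by the training function
curve = [(50, 0.120), (60, 0.105), (70, 0.101), (80, 0.100)]

print(oob_has_levelled_off(curve))      # small change over the last step
print(oob_has_levelled_off(curve[:2]))  # error still dropping noticeably
```

The tolerance is an arbitrary choice; a visual inspection of the plot remains the more informative check.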

The next step is to classify the Sentinel-2 image. Following the same approach as above, we define a function to do the classification, then we execute it.

# Classify the image
We first define a classification function, then run it on the clipped image stack and see what output we get.

# Define our random forest classification function
As with the training of our model, we want to define a function that applies that model to a new image raster, classifies it and saves the land cover map to an output file.



In [None]:
def classification(raster, modelfile, outfile):
    '''
    raster = filename and path to the raster file to be classified (in tiff uint16 format)
    modelfile = filename and path to the pickle file with the trained random forest model
    outfile = filename and path to the output file with the classified map in uint8 format
    '''

    # Read Data    
    src = rasterio.open(raster, 'r')   
    img = src.read()

    print("img.shape = ", img.shape)

    # get number of bands
    n = img.shape[0]
    print(n, " Bands")

    # load your random forest model from the pickle file
    clf = joblib.load(modelfile)    

    # to work with SciKitLearn, we have to reshape the raster as an image
    # this will change the shape from (bands, rows, columns) to (rows, columns, bands)
    img = reshape_as_image(img)

    # next, we have to reshape the image again into (rows * columns, bands)
    # because that is what SciKitLearn asks for
    new_shape = (img.shape[0] * img.shape[1], img.shape[2]) 

    print("img[:, :, :n].shape = ", img[:, :, :n].shape)
    print("new_shape = ", new_shape)

    img_as_array = img[:, :, :n].reshape(new_shape)   

    print("img_as_array.shape = ", img_as_array.shape)

    # classify it
    class_prediction = clf.predict(img_as_array) 

    # and reshape the flattened array back to its original dimensions
    
    print("class_prediction.shape = ", class_prediction.shape)
    print("img[:, :, 0].shape = ", img[:, :, 0].shape)

    class_prediction = np.uint8(class_prediction.reshape(img[:, :, 0].shape))

    print(class_prediction.dtype)
    
    # save the image as a uint8 Geotiff file
    tmpfile = rasterio.open(outfile, 'w', driver='GTiff', 
                            width=src.width, height=src.height,
                            count=1, crs=src.crs, transform=src.transform, 
                            dtype=np.uint8)

    tmpfile.write(class_prediction, 1)

    # close the output file and the input raster
    tmpfile.close()
    src.close()
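
The reshaping steps in the classification function can be followed on a small made-up array. The sketch below uses `np.moveaxis` in place of `reshape_as_image` and a stand-in for `clf.predict`, so it illustrates only the array bookkeeping, not the classifier itself.

```python
import numpy as np

bands, rows, cols = 8, 3, 4

# made-up raster in rasterio order: (bands, rows, columns)
stack = np.arange(bands * rows * cols).reshape(bands, rows, cols)

# move the band axis to the end: (rows, columns, bands)
image = np.moveaxis(stack, 0, -1)

# flatten to one row per pixel: (rows * columns, bands)
table = image.reshape(rows * cols, bands)

# stand-in for clf.predict(table): one made-up class number per pixel
prediction = np.arange(rows * cols) % 7 + 1

# reshape the flat predictions back to the map dimensions
class_map = prediction.reshape(rows, cols).astype(np.uint8)

print("table.shape =", table.shape)
print("class_map.shape =", class_map.shape)
```

Row `r * cols + c` of the table holds all band values of the pixel at row `r` and column `c`, which is why the flat predictions can be reshaped straight back into a map.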

Now we can run the classification.

In [None]:
# call our classification function
classification(clipfile, modelfile, outfile)

# Visualise the classified image
We need to check the results to make sure it worked. Let's take a look.

What we do differently here is that we visualise a classified raster dataset. This means we can define our own class labels (that match our training data labels) and associate them with a colour scheme.


In [None]:
# inspired by https://www.earthdatascience.org/courses/use-data-open-source-python/intro-raster-data-python/raster-data-processing/classify-plot-raster-data-in-python/

# Create a list of labels to use for your legend
class_labels = ["Water", 
                "Residential", 
                "Industrial", 
                "Pasture", 
                "Crops", 
                "Bare soil", 
                "Forest"]

# Create a colormap from a list of colours
# see this chart for information on available colours:
# https://matplotlib.org/2.0.0/examples/color/named_colors.html 
colours = ['mediumblue', 
          'firebrick', 
          'red', 
          'yellowgreen', 
          'gold', 
          'saddlebrown', 
          'darkolivegreen']

cmap = matplotlib.colors.ListedColormap(colours)

# calculate the bin boundaries between class values, e.g. 0.5-1.5 for class 1
class_bins = [i+0.5 for i in range(len(class_labels)+1)]

# Generate a colourmap index based on discrete intervals
norm = matplotlib.colors.BoundaryNorm(class_bins, len(colours))

# Plot our classified land cover raster
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(10, 5))
fig.patch.set_facecolor('white')

imgfile = rasterio.open(outfile, 'r')
plot.show(imgfile, ax=ax1, cmap=cmap, norm=norm)
# set a title for the plot
mytitle = "My own land cover map"
ax1.set_title(mytitle, fontsize=12)
imgfile.close()

# We will use the Geopandas library for plotting the shapefile
shp = gpd.read_file(trainshapefile)
shp.plot(ax=ax1, facecolor="none", edgecolor="black")
shp.plot(ax=ax2, facecolor="none", edgecolor="black")
# set a title for the subplot
mytitle = trainshapefile
ax2.set_title(mytitle, fontsize=8)
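
The bin boundaries in `class_bins` place each integer class value in the middle of its own interval. The plain-Python sketch below shows the idea with `bisect`. It is an illustration of the interval logic only, not matplotlib's implementation, and the helper `label_for` is hypothetical.

```python
from bisect import bisect_right

class_labels = ["Water", "Residential", "Industrial", "Pasture",
                "Crops", "Bare soil", "Forest"]

# boundaries 0.5, 1.5, ..., 7.5: class 1 falls into the interval 0.5-1.5, etc.
class_bins = [i + 0.5 for i in range(len(class_labels) + 1)]

def label_for(value):
    """Return the class label whose interval contains `value`."""
    k = bisect_right(class_bins, value) - 1
    if 0 <= k < len(class_labels):
        return class_labels[k]
    return "outside all bins"

for v in [0, 1, 4, 7, 8]:
    print(v, "->", label_for(v))
```

Pixel values outside the range of classes, such as 0, do not fall into any interval and therefore get no class colour.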


We have produced a new land cover map from multi-temporal imagery. We could now evaluate which map is more accurate using accuracy assessment techniques.
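
As a starting point for such an accuracy assessment, the sketch below builds a confusion matrix and the overall accuracy from two short lists of class numbers. The reference and predicted values are made up for illustration. In a real assessment, the reference labels should come from validation data that were not used for training.

```python
import numpy as np

def confusion_matrix(reference, predicted, n_classes):
    """Rows = reference class, columns = predicted class (classes numbered 1..n)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for r, p in zip(reference, predicted):
        cm[r - 1, p - 1] += 1
    return cm

# made-up class numbers at eight validation pixels
reference = [1, 1, 4, 4, 5, 5, 7, 7]
predicted = [1, 1, 4, 5, 5, 5, 7, 4]

cm = confusion_matrix(reference, predicted, n_classes=7)

# overall accuracy = correctly classified pixels / all validation pixels
overall_accuracy = np.trace(cm) / cm.sum()

print(cm)
print("Overall accuracy:", overall_accuracy)
```

Running the same calculation for last week's single-date map and this week's multi-temporal map, with the same validation pixels, would allow a like-for-like comparison.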

The classification methodology and the workflow in these practicals can be a basis for your own satellite image processing application.