# Maximum Entropy classification

In this practical, we will read in a Sentinel-2 image in its original data format, as obtained from the ESA Copernicus Sentinel Data Hub. We will extract four bands of interest and combine them into a single GeoTIFF file (Sentinel-2 data are delivered as separate JPEG2000 files, one per band).
We will then train the Maximum Entropy algorithm.
Finally, we will classify the Sentinel-2 image.

In [2]:
# import packages
# Note: GDAL needs to be version 2.1.3
# from Anaconda terminal type:
#    conda install -c conda-forge gdal

from osgeo import gdal
import matplotlib.pyplot as plt
import numpy as np
import os
from os import listdir
from os.path import isfile, join
from osgeo import ogr
import shutil
import skimage
from skimage import io
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.exceptions import ConvergenceWarning
#from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, GradientBoostingClassifier, ExtraTreesClassifier
# joblib is imported directly; sklearn.externals.joblib was removed in newer scikit-learn releases
import joblib
import subprocess
import sys
import warnings
gdal.UseExceptions()

Everything installed. Now onto the processing.
The next block of code reads in the Sentinel-2 L2A (Level 2A) image obtained from the Copernicus Sentinel Data Hub.
Sentinel images can be obtained for free from this web site: https://scihub.copernicus.eu/dhus/#/home


In [3]:
# in this directory are the Sentinel-2, 10 m resolution band files
# N.B. os.path.join uses the correct '\' for Windows OS or '/' for LINUX
datadir = join(os.sep, 'home', 'heiko', 'sf_GY7709_Satellite_Data_Analysis_in_Python', 'practicals', \
               'S2A_MSIL2A_20180507T110621_N0207_R137_T30UXD_20180507T131836.SAFE', 'GRANULE', \
               'L2A_T30UXD_A015006_20180507T110835', 'IMG_DATA', 'R10m')
# e.g. /home/heiko/sf_GY7709_Satellite_Data_Analysis_in_Python/practicals/S2A_MSIL2A_20180507T110621_N0207_R137_T30UXD_20180507T131836.SAFE/GRANULE/L2A_T30UXD_A015006_20180507T110835/IMG_DATA/R10m

# directory for output image files from MaxEnt
outdir = join(os.sep, 'home', 'heiko', 'sf_GY7709_Satellite_Data_Analysis_in_Python', 'practicals')

print('Files in directory ' + datadir)
allfiles = [f for f in listdir(datadir) if isfile(join(datadir, f))]
for f in allfiles:
    print(f)

Files in directory /home/heiko/sf_GY7709_Satellite_Data_Analysis_in_Python/practicals/S2A_MSIL2A_20180507T110621_N0207_R137_T30UXD_20180507T131836.SAFE/GRANULE/L2A_T30UXD_A015006_20180507T110835/IMG_DATA/R10m
T30UXD_20180507T110621_AOT_10m.jp2
T30UXD_20180507T110621_B02_10m.jp2
T30UXD_20180507T110621_B03_10m.jp2
T30UXD_20180507T110621_B04_10m.jp2
T30UXD_20180507T110621_B08_10m.jp2
T30UXD_20180507T110621_TCI_10m.jp2
T30UXD_20180507T110621_WVP_10m.jp2


Above, we have listed all files in the directory where the 10 m resolution bands of Sentinel-2 are located.
Now, let's read in the bands we want.
Sentinel-2 bands 2, 3, 4 and 8 are in positions 1, 2, 3 and 4 of the file list above (remember that the first index in Python is 0).
These bands are blue, green, red and NIR.

Note: For this to work, GDAL requires the JP2OpenJPEG driver.
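
The band selection by list position described above depends on the order in which the files happen to be listed, and `os.listdir` does not guarantee any particular order. As a minimal sketch, assuming the file names follow the pattern shown in the listing, the bands could instead be picked by name. The function `select_bands` and the regular expression are illustrative and not part of the original practical.

```python
import re

def select_bands(filenames, wanted=("B02", "B03", "B04", "B08")):
    """Return the file names of the wanted bands, in the order given in `wanted`."""
    by_band = {}
    for name in filenames:
        # assumed naming pattern: <tile>_<timestamp>_<band>_<resolution>m.jp2
        match = re.search(r"_(B\d{2}|B8A)_\d+m\.jp2$", name)
        if match:
            by_band[match.group(1)] = name
    missing = [b for b in wanted if b not in by_band]
    if missing:
        raise FileNotFoundError("Missing band files: " + ", ".join(missing))
    return [by_band[b] for b in wanted]

# file names copied from the directory listing above, deliberately shuffled
example_files = [
    "T30UXD_20180507T110621_WVP_10m.jp2",
    "T30UXD_20180507T110621_B08_10m.jp2",
    "T30UXD_20180507T110621_B03_10m.jp2",
    "T30UXD_20180507T110621_TCI_10m.jp2",
    "T30UXD_20180507T110621_B02_10m.jp2",
    "T30UXD_20180507T110621_AOT_10m.jp2",
    "T30UXD_20180507T110621_B04_10m.jp2",
]
selected = select_bands(example_files)
for name in selected:
    print(name)
```

With this approach the result is always in the order blue, green, red, NIR, regardless of how the directory listing is ordered.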

In [4]:
# make a band selection
bands = allfiles[1:5]
# this gives a list of the following four file names (relative to datadir):
'''
    ["T30UXD_20180507T110621_B02_10m.jp2",
     "T30UXD_20180507T110621_B03_10m.jp2",
     "T30UXD_20180507T110621_B04_10m.jp2",
     "T30UXD_20180507T110621_B08_10m.jp2"]
'''

# build a GDAL command that stacks the selected files into one 10 m resolution VRT file
# N.B. outdir + '16Bit.vrt' is plain string concatenation, so the file is named
# 'practicals16Bit.vrt' and sits next to the practicals directory (see the output below)
vrtfile = outdir + '16Bit.vrt'
cmd = ['gdalbuildvrt', '-resolution', 'user', '-tr', '10', '10', '-separate', vrtfile]
for band in bands:
    cmd.append(join(datadir, band))

if not os.path.exists(vrtfile): # skip if the output file already exists
    print('\n')
    print(' '.join(cmd))
    print('\n')
    subprocess.run(cmd) # execute the command in the command line
else:
    print(vrtfile,' already exists.\n')
    
# now build a command to translate the four-band VRT file into one GeoTIFF file with 4 bands
tiffile = outdir + '16Bit.tif'
cmd = ['gdal_translate', '-of', 'GTiff', vrtfile, tiffile]

if not os.path.exists(tiffile): # skip if the output file already exists
    print(' '.join(cmd))
    print('\n')
    subprocess.run(cmd) # execute it
else:
    print(tiffile,' already exists.\n')

/home/heiko/sf_GY7709_Satellite_Data_Analysis_in_Python/practicals16Bit.vrt  already exists.

/home/heiko/sf_GY7709_Satellite_Data_Analysis_in_Python/practicals16Bit.tif  already exists.



The Sentinel-2 data preparation is completed at this stage. We have the four bands we want in one Geotiff file.

# Your portfolio task

Open QGIS or ArcGIS and read in the tiff file. Take a look at the bands and make a true colour composite.
Take a screenshot of the overview image, i.e. the whole extent. Add that to your portfolio.

Now zoom in to the full resolution, take another screenshot of an area you find interesting and add it to your portfolio. 

Describe in text form (300 words) what you see. You can use arrows to annotate your images.

The next step is to train the Maximum Entropy machine learning model. Remember that a Maximum Entropy classifier is equivalent to multinomial Logistic Regression.
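
To make the link to Logistic Regression more concrete, the sketch below shows how a multinomial logistic regression model turns the band values of one pixel into class probabilities: each class has a linear score, and the softmax function converts the scores into probabilities that sum to 1. The pixel values, weights, intercepts and the three class names are made up for illustration and are not taken from the trained model.

```python
import math

def softmax(scores):
    """Turn per-class linear scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(pixel, weights, intercepts):
    """Class probabilities for one pixel: softmax of (weights . pixel + intercept)."""
    scores = [
        sum(w * x for w, x in zip(class_weights, pixel)) + intercept
        for class_weights, intercept in zip(weights, intercepts)
    ]
    return softmax(scores)

# hypothetical pixel: blue, green, red, NIR reflectance
pixel = [0.05, 0.08, 0.06, 0.40]
# hypothetical model with three classes (e.g. water, vegetation, built-up)
weights = [
    [0.0, 0.0, 0.0, -10.0],
    [0.0, 0.0, -5.0, 10.0],
    [5.0, 5.0, 5.0, 0.0],
]
intercepts = [1.0, -1.0, 0.0]

probs = predict_proba(pixel, weights, intercepts)
predicted_class = probs.index(max(probs)) + 1  # class numbers start at 1
print(probs)
print("Predicted class:", predicted_class)
```

Training the model means finding the weights and intercepts that make the observed training labels as probable as possible, which is what the solver does in the code that follows.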

We will use the same training data as for the Random Forest practical.

The code below is modified after the example by Arthur Mensch: https://scikit-learn.org/stable/auto_examples/linear_model/plot_sparse_logistic_regression_20newsgroups.html#sphx-glr-auto-examples-linear-model-plot-sparse-logistic-regression-20newsgroups-py


In [8]:
# ignore less important warnings in SciKit-Learn
warnings.filterwarnings("ignore", category=ConvergenceWarning, module="sklearn")

# We use the SAGA solver
solver = 'saga'

# Number of samples for a reduced training set
# N.B. n_samples is defined here but not used in the code below
n_samples = 3000

# We train the MaxEnt algorithm on the clipped Sentinel-2 image. There are two reasons for this:
# 1. It is faster to process.
# 2. We already have the training data and clipped Sentinel-2 image from the Random Forest exercise.

# set up your directories with the satellite training data
rootdir = join(os.sep, "home", "heiko", "sf_GY7709_Satellite_Data_Analysis_in_Python", "practicals")
# path to your training data
path_pix = rootdir
# path to your model
path_model = rootdir
# path to your classification results
path_class = rootdir

# path to your Sentinel-2 clipped TIFF file from the Random Forest exercise
raster = join(rootdir, "s2a_leicester_clipped.tif")
# path to your corresponding pixel samples (training data converted to a geotiff raster file)
# pixel values are the class numbers
samples = join(path_pix, "training_raster.tif")

# read in clipped Sentinel-2A raster from geotiff (unsigned 16-bit integer format)
# this was created in QGIS from the original Sentinel-2 10m bands (R,G,B,NIR)
img_ds = io.imread(raster)
# convert to a signed 16-bit numpy array
# N.B. int16 only holds values up to 32767; larger unsigned 16-bit values would not survive the conversion
img = np.array(img_ds, dtype='int16')

# do the same with your training sample pixels 
roi_ds = io.imread(samples)   
roi = np.array(roi_ds, dtype='int8')  
    
# read in your labels
labels = np.unique(roi[roi > 0]) 
n_classes = labels.size
print('The training data include {n} classes: {classes}'.format(n=labels.size, classes=labels))

# compose your X,Y data (dataset - training data)     
X = img[roi > 0, :] 
Y = roi[roi > 0]     

# print out the number of lines (rows), number of pixels per line (columns) and number of bands of the image
print("Dimensions of the clipped Sentinel-2 image:")
print(img.shape)

The training data include 7 classes: [1 2 3 4 5 6 7]
Dimensions of the clipped Sentinel-2 image:
(3118, 4665, 4)
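
The indexing `img[roi > 0, :]` used above relies on NumPy boolean masks: the two-dimensional mask `roi > 0` selects the labelled pixels, and the result is a table with one row per labelled pixel and one column per band. A small sketch with a made-up 2 x 3 image and training raster illustrates this; the arrays here are toy data, not the practical's files.

```python
import numpy as np

# hypothetical image: 2 lines, 3 pixels per line, 4 bands
img = np.arange(2 * 3 * 4, dtype="int16").reshape(2, 3, 4)

# hypothetical training raster: 0 = unlabelled, 1 and 2 = class numbers
roi = np.array([[0, 1, 0],
                [2, 0, 1]], dtype="int8")

mask = roi > 0      # True where a training label exists
X = img[mask, :]    # band values of the labelled pixels, shape (n_labelled, n_bands)
Y = roi[mask]       # class number of each labelled pixel, in the same order

print(X.shape)
print(Y)
```

Because `X` and `Y` are built from the same mask, row `i` of `X` always belongs to label `Y[i]`.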


Now we have read in the data to train the model.
Let's read in the full Sentinel-2 image in its original extent from the Geotiff file.

In [9]:
# Read in the full Sentinel-2 data from the Geotiff we have created above
s2img = io.imread(tiffile) # returns an ndarray with all bands for all pixels

# print out the number of lines (rows), number of pixels per line (columns) and number of bands of the image
print("Dimensions of the full Sentinel-2 image:")
print(s2img.shape)

Dimensions of the full Sentinel-2 image:
(10980, 10980, 4)
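
An image of this size has 10980 x 10980 = 120,560,400 pixels, which is why the classification step later in this practical processes it in chunks. That code divides the pixel count by 10 using integer division, which works here because the count is divisible by 10. As a sketch of a more general approach, the hypothetical helper `chunk_bounds` below computes start and stop indices that cover every pixel even when the count does not divide evenly.

```python
def chunk_bounds(n_pixels, n_chunks):
    """Yield (start, stop) index pairs that together cover range(n_pixels)."""
    base, extra = divmod(n_pixels, n_chunks)
    start = 0
    for i in range(n_chunks):
        # the first `extra` chunks take one additional pixel each
        stop = start + base + (1 if i < extra else 0)
        yield start, stop
        start = stop

n_pixels = 10980 * 10980
bounds = list(chunk_bounds(n_pixels, 10))
for start, stop in bounds:
    print("from pixel", start, "to", stop - 1)
```

Each pair can then be used as a slice, for example `lr.predict(img_as_array[start:stop])`, using the variable names from the classification cell.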


Next step: train the MaxEnt model.

In [10]:
# Model training
# Split the training data into 75% for training and 25% held back for testing the classification model
x_train, x_test, y_train, y_test = train_test_split(X, Y, random_state=42, stratify=Y, test_size=0.25)
train_samples, n_features = x_train.shape
print('Sentinel-2, train_samples=%i, n_features=%i, n_classes=%i' % (train_samples, n_features, n_classes))

# A small number of iterations leads to faster run time
max_iter = 10
print('MaxEnt, solver=%s. Number of iterations: %s' % (solver, max_iter))
# run the logistic regression, this is equivalent to maximum entropy modelling
lr = LogisticRegression(solver=solver, multi_class='multinomial', C=1, penalty='l1', fit_intercept=True, max_iter=max_iter, random_state=42)
# fit model to training data
lr.fit(x_train, y_train)

# export your MaxEnt model to the model directory
model = join(path_model, "model_maxent.pkl")
joblib.dump(lr, model)

# predict classes for each pixel of the testing data
y_pred = lr.predict(x_test)
accuracy = np.sum(y_pred == y_test) / y_test.shape[0]
print('Test accuracy for MaxEnt model: %.4f%%' % (accuracy * 100.))

Sentinel-2, train_samples=96402, n_features=4, n_classes=7
MaxEnt, solver=saga. Number of iterations: 10
Test accuracy for MaxEnt model: 92.3355%


The output above gives the overall accuracy of the MaxEnt model for the testing data. Remember, we have held back a proportion of the training data (25% of all pixels) for testing the accuracy.
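
A single overall accuracy can hide classes that are classified poorly, especially when some classes have far fewer test pixels than others. As a minimal sketch, the fraction of correctly classified test pixels can be computed per class (the per-class recall, often called producer's accuracy in remote sensing). The labels below are toy values chosen for illustration, not results from this practical.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Return {class: fraction of that class's pixels that were predicted correctly}."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / totals[c] for c in sorted(totals)}

# toy example: class 2 is rare and half of its pixels are misclassified
y_true = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
y_pred = [1, 1, 1, 1, 1, 2, 3, 3, 3, 3]

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_class = per_class_accuracy(y_true, y_pred)
print("Overall accuracy:", overall)
print("Per-class accuracy:", by_class)
```

In the practical, the same function could be called with `list(y_test)` and `list(y_pred)`.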

In [12]:
# Apply the MaxEnt model to the whole Sentinel-2 image
# Classification of the image array and saving the result as an image
# first, string out the image array into a long and thin shape for processing:
# one row per pixel, one column per band
new_shape = (s2img.shape[0] * s2img.shape[1], s2img.shape[2])
# the slice :5 keeps at most the first five bands; this image has four, so all are kept
img_as_array = s2img[:, :, :5].reshape(new_shape)

# to save memory, process the Sentinel-2 scene in 10 chunks
print("Processing 10 chunks of a Sentinel-2 image")
for chunk in range(0,10):
    print("chunk ", chunk, " from pixel ", (chunk * int(new_shape[0] / 10)), " to ", ((chunk + 1) * int(new_shape[0] / 10)) - 1)
    chunk_prediction = lr.predict(img_as_array[(chunk * int(new_shape[0] / 10)):((chunk + 1) * int(new_shape[0] / 10)), :int(new_shape[1])])
    if chunk == 0:
        class_prediction = list(chunk_prediction)
    else:
        class_prediction.extend(list(chunk_prediction))

# convert list object to numpy array and bring back from a long and thin shape to an image shape
class_prediction = np.array(class_prediction)
class_prediction = class_prediction.reshape(s2img[:, :, 0].shape)  

# add the geotransform (origin and pixel size) and the projection to georeference the classified map
# Open the Sentinel-2 Geotiff file and read in both; they are the same for the classified map
print("Opening file: {}".format(tiffile))
dataset = gdal.Open(tiffile, gdal.GA_ReadOnly)
if not dataset:
    print("Error. File not found.")
print("File driver: {}/{}".format(dataset.GetDriver().ShortName, dataset.GetDriver().LongName))
print("Raster size: {} x {} x {}".format(dataset.RasterXSize, dataset.RasterYSize, dataset.RasterCount))
wkt_projection = dataset.GetProjection()
print("Projection: {}".format(wkt_projection))
geotransform = dataset.GetGeoTransform()
if geotransform:
    print("Origin = ({}, {})".format(geotransform[0], geotransform[3]))
    print("Pixel Size = ({}, {})".format(geotransform[1], geotransform[5]))

'''
Contents of the GeoTransform:
GeoTransform[0] /* top left x coordinate */
GeoTransform[1] /* West to East pixel resolution in x direction */
GeoTransform[2] /* 0 */
GeoTransform[3] /* top left y coordinate */
GeoTransform[4] /* 0 */
GeoTransform[5] /* North to South pixel resolution in y direction (negative value) */
'''

# Create the destination data file
classfilename = join(path_class, "landcover_maxent.tif")
print("Creating output classification file: {}".format(classfilename))
# make the class file the same width and length as the Sentinel-2 image, but with only one band and as Byte values
# N.B. Create() expects the number of columns (x size) first, then the number of lines (y size)
classfile = gdal.GetDriverByName('GTiff').Create(classfilename, s2img.shape[1], s2img.shape[0], 1, gdal.GDT_Byte)
classfile.SetGeoTransform((geotransform[0], geotransform[1], 0, geotransform[3], 0, geotransform[5]))
classfile.SetProjection(wkt_projection)

print(class_prediction.shape)

# now export your classification map to a file
classfile.GetRasterBand(1).WriteArray(class_prediction)
classfile.FlushCache()  # Write to disk.

# All done, close the data file to free up memory
classfile = None


Processing 10 chunks of a Sentinel-2 image
chunk  0  from pixel  0  to  12056039
chunk  1  from pixel  12056040  to  24112079
chunk  2  from pixel  24112080  to  36168119
chunk  3  from pixel  36168120  to  48224159
chunk  4  from pixel  48224160  to  60280199
chunk  5  from pixel  60280200  to  72336239
chunk  6  from pixel  72336240  to  84392279
chunk  7  from pixel  84392280  to  96448319
chunk  8  from pixel  96448320  to  108504359
chunk  9  from pixel  108504360  to  120560399
Opening file: /home/heiko/sf_GY7709_Satellite_Data_Analysis_in_Python/practicals16Bit.tif
File driver: GTiff/GeoTIFF
Raster size: 10980 x 10980 x 4
Projection: PROJCS["WGS 84 / UTM zone 30N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin
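
The six GeoTransform values listed in the code above define how pixel positions map to coordinates in the projected coordinate system. As a sketch of that relationship, the function below converts a column and line index into the map coordinates of that pixel's top-left corner. The function name and the origin values in the example are made up for illustration; only the 10 m pixel size and the image dimensions come from this practical.

```python
def pixel_to_map(geotransform, col, row):
    """Map coordinates of the top-left corner of the pixel at (col, row)."""
    x = geotransform[0] + col * geotransform[1] + row * geotransform[2]
    y = geotransform[3] + col * geotransform[4] + row * geotransform[5]
    return x, y

# hypothetical geotransform of a north-up raster with 10 m pixels
# (origin x, pixel width, 0, origin y, 0, negative pixel height)
gt = (600000.0, 10.0, 0.0, 5900040.0, 0.0, -10.0)

top_left = pixel_to_map(gt, 0, 0)
bottom_right = pixel_to_map(gt, 10980, 10980)
print("Top left corner:", top_left)
print("Bottom right corner:", bottom_right)
```

The y coordinate decreases with increasing line number because the pixel height is negative: line 0 is the northern edge of the image.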

# Your portfolio task
Open the classified map file in QGIS or ArcGIS and include an overview with a colour legend of the classes in your portfolio.

Choose two areas of interest. Zoom in to full resolution. Add the zoom maps to the portfolio.

Write about 300 words about what you see in your areas of interest. Reflect on whether the MaxEnt classification gives accurate and representative results.