<a href="https://colab.research.google.com/github/kavyajeetbora/end_to_end_gee_with_python/blob/master/end_to_end_earth_engine/Module_03_Supervised_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 3: Supervised Classification

## Introduction to Machine Learning and Supervised Classification

Supervised classification is arguably the most important classical machine learning technique in remote sensing. Applications range from generating Land Use/Land Cover maps to change detection. Google Earth Engine is uniquely suited to supervised classification at scale. Its interactive nature allows supervised classification workflows to be developed iteratively, combining many different datasets in the model. This module covers the basic supervised classification workflow, accuracy assessment, hyperparameter tuning and change detection.


01. Basic Supervised Classification
02. Accuracy Assessment
03. Improving the Classification
04. Exporting Classification Results
05. Calculating Area

In [1]:
import ee
import geemap

ee.Authenticate()
ee.Initialize(project='kavyajeetbora-ee')

## Basic Supervised Classification

- We will learn how to do a basic land cover classification using training samples collected in the Code Editor from the high-resolution basemap imagery provided by Google Maps.
- This method requires no prior training data and is quite effective for generating high-quality classification samples anywhere in the world.
- The goal is to classify each source pixel into one of the following classes: urban, bare, water or vegetation.
- Using the drawing tools in the Code Editor, you create 4 new feature collections with points representing pixels of each class.
- Each feature collection has a property called `landcover` with a value of 0, 1, 2 or 3, indicating whether the feature collection represents urban, bare, water or vegetation respectively.
- We then train a Random Forest classifier on this training set to build a model and apply it to all the pixels of the image to create a 4-class image.
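
To make the idea of "many trees voting on a class" concrete, here is a toy, pure-Python sketch of majority voting. It is not how the SMILE library trains or stores its trees: the three "trees" are hand-written rules with invented thresholds, and the names `tree_a`, `tree_b`, `tree_c` and `forest_predict` are hypothetical.

```python
from collections import Counter

# Class codes as described in the list above.
CLASS_NAMES = {0: "urban", 1: "bare", 2: "water", 3: "vegetation"}

# Hypothetical stand-ins for trained trees: each maps a pixel's band
# values to a class code. The thresholds are invented for illustration.
def tree_a(pixel):
    return 2 if pixel["B8"] < 500 else 3

def tree_b(pixel):
    return 3 if pixel["B8"] > pixel["B4"] * 1.5 else 0

def tree_c(pixel):
    return 2 if pixel["B8"] < 800 else 3

def forest_predict(pixel, trees):
    """Return the class code that receives the most votes."""
    votes = Counter(tree(pixel) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [tree_a, tree_b, tree_c]
pixel = {"B4": 400, "B8": 3000}  # made-up reflectance values
label = forest_predict(pixel, trees)
print(label, CLASS_NAMES[label])
```

A real random forest learns its rules from the training samples; only the final voting step is shown here.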

**Fun fact**: The classifiers in the Earth Engine API have names starting with smile, such as `ee.Classifier.smileRandomForest()`. The smile part refers to the [Statistical Machine Intelligence and Learning Engine (SMILE)](https://haifengl.github.io/index.html) Java library, which Google Earth Engine uses to implement these algorithms.

<img src="https://courses.spatialthoughts.com/images/end_to_end_gee/classified.png" height=300/>

### Load the geometry of the area of interest

In [18]:
bangalore = ee.FeatureCollection('users/ujavalgandhi/public/bangalore_boundary')
geometry = bangalore.geometry()
bangalore.size()

### Load the sentinel image

The band values will be used as the input properties for the classifier

In [19]:
## Get the sentinel image collection
s2 = ee.ImageCollection('COPERNICUS/S2_HARMONIZED')

## Filter the image collection
filtered = s2.filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE",30))\
.filter(ee.Filter.date('2023-01-01','2024-01-01'))\
.filter(ee.Filter.bounds(geometry))

filtered.size()

Since this is an image collection, convert it to a single composite image using `median()`, which takes the per-pixel median across all images in the collection.

In [20]:
## create a composite and clip it as per geometry
median = filtered.median().clip(geometry)
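
As a rough illustration of what a median composite does per pixel, the sketch below takes the median across a small stack of made-up 2x2 tiles in plain Python. The values and the helper name `median_composite` are assumptions for illustration; Earth Engine performs the equivalent reduction on its servers.

```python
import statistics

# Three made-up observations of the same 2x2 red-band tile on different dates.
# The second date has a bright, cloud-contaminated pixel in the top-left corner.
stack = [
    [[400, 420], [390, 410]],
    [[9000, 430], [395, 405]],
    [[410, 415], [400, 400]],
]

def median_composite(images):
    """Per-pixel median across a list of equally sized 2D grids."""
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [statistics.median(img[r][c] for img in images) for c in range(cols)]
        for r in range(rows)
    ]

toy_composite = median_composite(stack)
print(toy_composite)
```

The cloudy value of 9000 does not survive into the result, which is why a median is a common choice for building a cleaner composite from many scenes.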

### Load the GCPs
Now load the GCPs (ground control points), each labelled with its class: urban, water, bare or vegetation.

In [21]:
## Load the training data with labels
## Each feature is a point geometry with its class value stored in the 'landcover' property
urban = ee.FeatureCollection('users/ujavalgandhi/e2e/urban_gcps')
water = ee.FeatureCollection('users/ujavalgandhi/e2e/water_gcps')
bare = ee.FeatureCollection('users/ujavalgandhi/e2e/bare_gcps')
vegetation = ee.FeatureCollection('users/ujavalgandhi/e2e/vegetation_gcps')

## Now merge the data into one single table
gcps = urban.merge(water).merge(bare).merge(vegetation)
## Viewing the first gcp
gcps.getInfo()['features'][0]

{'type': 'Feature',
 'geometry': {'type': 'Point',
  'coordinates': [77.65618319730623, 12.954774480921643]},
 'id': '1_1_1_00000000000000000000',
 'properties': {'landcover': 0}}

### Prepare the training data

In [22]:
## Prepare the training data
## we need to merge the pixel values from sentinel image with the class values
## The band values from sentinel will be the input values
## and gcps landcover property will be the output class
training = median.sampleRegions(
    collection = gcps,
    properties = ['landcover'],
    scale=100
)

Here all the band values are the input characteristics of a pixel, and the `landcover` value is the class label that says whether it is urban, bare, water or vegetation.
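
Conceptually, sampling an image at labelled points produces one table row per point: the band values of the pixel under the point plus the label. The pure-Python sketch below mimics this on a toy 2x2 "image"; the grid values, the row/column point format and the helper `sample_points` are all assumptions for illustration (real GCPs are longitude/latitude points and the sampling runs in Earth Engine).

```python
# Toy "image": band name -> 2D grid of pixel values (made-up numbers).
image = {
    "B4": [[2500, 300], [350, 400]],
    "B8": [[2400, 200], [3000, 3100]],
}

# Toy labelled points as (row, col, landcover).
points = [(0, 0, 0), (0, 1, 2), (1, 1, 3)]

def sample_points(image, points):
    """Attach the band values of the pixel under each point to its label."""
    rows = []
    for r, c, landcover in points:
        row = {band: grid[r][c] for band, grid in image.items()}
        row["landcover"] = landcover
        rows.append(row)
    return rows

training_rows = sample_points(image, points)
for row in training_rows:
    print(row)
```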

### Plot the training data on a map

In [23]:
## Plot on map
vizParams = {
    'min': 0,
    'max': 3000,
    'bands': ['B4', 'B3', "B2"]
}

Map = geemap.Map()
Map.addLayer(median, vizParams, name='Sentinel')
Map.addLayer(gcps, {'color':'red'})
Map.centerObject(geometry, zoom=12)

Map


### Train a classifier

We will use a random forest model for training

In [24]:
classifier = ee.Classifier.smileRandomForest(50)
classifier

Train the model

In [25]:
classifier = classifier.train(
    features= training,
    classProperty = 'landcover',
    inputProperties = median.bandNames()
)

classifier

In [26]:
classified = median.classify(classifier)

In [27]:
palette = ['#e41a1c','#377eb8','#4daf4a','#984ea3']
visParams = {
    'min': 0,
    'max':3,
    'palette': palette
}

Map = geemap.Map()
Map.addLayer(classified, visParams, name="prediction")
Map.centerObject(geometry, zoom=12)
Map


Filter the water area

In [28]:
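water = classified.eq(2)
## The mask is binary (0 = not water, 1 = water), so it needs its own two-colour palette
waterVis = {'min': 0, 'max': 1, 'palette': ['white', 'blue']}
Map = geemap.Map()
Map.addLayer(water, waterVis, name="water")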
Map.centerObject(geometry, zoom=12)
Map


Reference
1. [Module 4 - 01 Basic Supervised Classification - GEE for Water Resources Management](https://youtu.be/Karfbita0Qo?si=M_UpqyqY-mgrGQ3Y)

2. [Module 4 - 03 Accuracy Assessment - GEE for Water Resources Management](https://youtu.be/erwxur0HMao?si=STjsgdeVvJZ7-RzW)

## Accuracy Assessment

It is important to get a quantitative estimate of the accuracy of the classification. A common strategy is to divide the labelled samples into two random fractions: one used for training the model and the other for validating its predictions. Once a classifier is trained, it can be used to classify the entire image, and the classified values can then be compared with the labels in the validation fraction. In this notebook the comparison is done with `ee.FeatureCollection.errorMatrix()`, which returns a confusion matrix for the validation samples. (`ee.Classifier.confusionMatrix()` also returns a confusion matrix, but it is computed on the training data, so it reflects resubstitution accuracy rather than accuracy on held-out samples.)

Classification results are evaluated based on the following metrics

- Overall Accuracy: How many samples were classified correctly.
- Producer’s Accuracy: How well did the classification predict each class.
- Consumer’s Accuracy (Reliability): How reliable is the prediction in each class.
- Kappa Coefficient: How well the classification performed as compared to random assignment.
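
The four metrics listed above can all be derived from a confusion matrix. The following is a minimal pure-Python sketch on a made-up 3-class matrix, assuming rows hold the reference (actual) class and columns hold the predicted class, and that every class occurs at least once in both. The function name `accuracy_metrics` and the numbers are illustrative only.

```python
# Made-up confusion matrix: rows = reference class, columns = predicted class.
matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]

def accuracy_metrics(m):
    """Compute overall, producer's and consumer's accuracy plus kappa."""
    k = len(m)
    n = sum(sum(row) for row in m)
    diag = [m[i][i] for i in range(k)]
    row_totals = [sum(m[i]) for i in range(k)]
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]

    overall = sum(diag) / n
    # Producer's accuracy: correct / number of reference samples of the class.
    producers = [diag[i] / row_totals[i] for i in range(k)]
    # Consumer's accuracy: correct / number of samples predicted as the class.
    consumers = [diag[j] / col_totals[j] for j in range(k)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall": overall,
        "producers": producers,
        "consumers": consumers,
        "kappa": kappa,
    }

metrics = accuracy_metrics(matrix)
for name, value in metrics.items():
    print(name, value)
```

In Earth Engine itself, the `ee.ConfusionMatrix` object exposes `accuracy()`, `producersAccuracy()`, `consumersAccuracy()` and `kappa()`, so the arithmetic does not need to be done by hand.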

<img src='https://courses.spatialthoughts.com/images/end_to_end_gee/accuracy_assessment.png' height=300/>

In [31]:
## Load the sentinel image collection
s2 = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')

## Load the HydroSHEDS dataset
basin = ee.FeatureCollection('WWF/HydroSHEDS/v1/Basins/hybas_7')

## Load the training dataset with labels
gcp = ee.FeatureCollection('users/ujavalgandhi/e2e/arkavathy_gcps')

gcp.size()

In [34]:
arkavathy = basin.filter(ee.Filter.eq('HYBAS_ID', 4071139640))
geometry = arkavathy.geometry()

In [39]:
filtered

| Name | Description |
| --- | --- |
| B1 | Aerosols |
| B2 | Blue |
| B3 | Green |
| B4 | Red |
| B5 | Red Edge 1 |
| B6 | Red Edge 2 |
| B7 | Red Edge 3 |
| B8 | NIR |
| B8A | Red Edge 4 |
| B9 | Water vapor |

| Name | Type | Description |
| --- | --- | --- |
| AOT_RETRIEVAL_ACCURACY | DOUBLE | Accuracy of Aerosol Optical thickness model |
| CLOUDY_PIXEL_PERCENTAGE | DOUBLE | Granule-specific cloudy pixel percentage taken from the original metadata |
| CLOUD_COVERAGE_ASSESSMENT | DOUBLE | Cloudy pixel percentage for the whole archive that contains this granule. Taken from the original metadata |
| CLOUDY_SHADOW_PERCENTAGE | DOUBLE | Percentage of pixels classified as cloud shadow |
| DARK_FEATURES_PERCENTAGE | DOUBLE | Percentage of pixels classified as dark features or shadows |
| DATASTRIP_ID | STRING | Unique identifier of the datastrip Product Data Item (PDI) |
| DATATAKE_IDENTIFIER | STRING | Uniquely identifies a given Datatake. The ID contains the Sentinel-2 satellite, start date and time, absolute orbit number, and processing baseline. |
| DATATAKE_TYPE | STRING | MSI operation mode |
| DEGRADED_MSI_DATA_PERCENTAGE | DOUBLE | Percentage of degraded MSI and ancillary data |
| FORMAT_CORRECTNESS | STRING | Synthesis of the On-Line Quality Control (OLQC) checks performed at granule (Product_Syntax) and datastrip (Product Syntax and DS_Consistency) levels |


In [41]:
rgbVis = {
    'min': 0,
    'max': 3000,
    'bands': ['B4', 'B3', 'B2']
}

filtered = s2.filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 30))\
.filter(ee.Filter.date('2023-01-01','2024-01-01'))\
.filter(ee.Filter.bounds(geometry)).select('B.*')

composite = filtered.median()

Map = geemap.Map()
Map.addLayer(composite.clip(geometry), rgbVis, 'image')
Map.centerObject(geometry, zoom=10)
Map


Preparing the data for training

In [54]:
## Create a random column which will assign a float number to each feature with uniform distribution
gcps = gcp.randomColumn()
train_test_split = 0.8
trainGCP = gcps.filter(ee.Filter.lt('random', train_test_split))
testGCP = gcps.filter(ee.Filter.gte('random',train_test_split))

## size() counts features on the server, so only a single number is downloaded
print('Training data size', trainGCP.size().getInfo())
print('Test data size', testGCP.size().getInfo())

Training data size 359
Test data size 88
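
The split above works by giving every feature a uniform random number and comparing it with a threshold. The same logic can be sketched in plain Python; the helper `random_split` and the integer stand-ins for features are assumptions for illustration.

```python
import random

def random_split(features, fraction, seed=0):
    """Split features by comparing a per-feature uniform random number to `fraction`."""
    rng = random.Random(seed)
    with_random = [(feature, rng.random()) for feature in features]
    train = [feature for feature, r in with_random if r < fraction]
    test = [feature for feature, r in with_random if r >= fraction]
    return train, test

# Stand-ins for the 447 GCPs counted above (359 + 88).
features = list(range(447))
train, test = random_split(features, 0.8)
print(len(train), len(test))
```

Because each feature is assigned independently, the resulting sizes are only approximately 80% and 20%, which is why the notebook output is 359 and 88 rather than an exact 80/20 split.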


While overlaying the GCP points on the image, you may get an error like this:

```python
EEException: Output of image computation is too large (12 bands for 936000 pixels = 85.7 MiB > 80.0 MiB).
If this is a reduction, try specifying a larger 'tileScale' parameter.
```

The message says the requested computation would produce more data than Earth Engine allows in a single result: 12 bands for 936,000 pixels comes to 85.7 MiB, which exceeds the 80.0 MiB limit.

As the message suggests, specifying a larger `tileScale` can help. `tileScale` is a scaling factor that reduces the size of the tiles used during the computation.

A larger value splits the work into smaller tiles, which lowers the memory needed per tile and can avoid the error.
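
The 85.7 MiB figure in the error message can be reproduced with simple arithmetic. The sketch below assumes each value occupies 8 bytes (a 64-bit float); that byte size is an assumption that happens to match the reported number, not something stated in the error message.

```python
bands = 12
pixels = 936_000
bytes_per_value = 8  # assumption: each value is held as a 64-bit float
limit_mib = 80.0     # limit quoted in the error message

size_mib = bands * pixels * bytes_per_value / 2**20
print(f"{size_mib:.1f} MiB")
print("exceeds limit:", size_mib > limit_mib)
```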

In [69]:
## Now overlay the GCP points on the image to get the training inputs
## At this stage the points carry only the class labels
training = composite.sampleRegions(
    collection = trainGCP,
    properties = ['landcover'],
    scale=10,
    tileScale=16
)

In [70]:
## Print out a sample training input
training.getInfo()['features'][0]

{'type': 'Feature',
 'geometry': None,
 'id': '000000000000000000af_0',
 'properties': {'B1': 1329,
  'B11': 4034,
  'B12': 4388,
  'B2': 2024,
  'B3': 2344,
  'B4': 2518,
  'B5': 2547,
  'B6': 2545,
  'B7': 2511,
  'B8': 2435,
  'B8A': 2497,
  'B9': 2503,
  'landcover': 0}}

Train the classifier model

In [71]:
classifier = ee.Classifier.smileRandomForest(50).train(
    features = training, ## The feature collection with band values and labels
    classProperty = 'landcover',
    inputProperties = composite.bandNames()
)

Classify the given composite

In [74]:
## After training, now classify all the pixels from the composite and visualize
classified = composite.classify(classifier)

palette = ['#e41a1c','#377eb8','#4daf4a','#984ea3']
visParams = {
    'min': 0,
    'max':3,
    'palette': palette
}

Map = geemap.Map()
Map.addLayer(classified.clip(geometry), visParams, 'classified')
Map.centerObject(geometry, zoom=10)
Map


### Evaluate the model

In [78]:
test = classified.sampleRegions(
    collection = testGCP,
    properties = ['landcover'],
    scale = 10,
    tileScale = 16
)
## Print the first test value
test.getInfo()['features'][0]

{'type': 'Feature',
 'geometry': None,
 'id': '000000000000000000b0_0',
 'properties': {'classification': 0, 'landcover': 0}}

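When an image is classified, the result contains a band called `classification` that holds the predicted class. Sampling it at the test points therefore yields both the actual (`landcover`) and predicted (`classification`) values for each point.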

In [79]:
test.errorMatrix('landcover', 'classification')
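
An error matrix is a tally of (actual, predicted) pairs. The pure-Python sketch below builds one from made-up pairs, assuming rows index the actual class and columns index the predicted class; the helper name `error_matrix` and the sample values are illustrative and are not the notebook's results.

```python
# Made-up (actual, predicted) pairs, standing in for the
# 'landcover' and 'classification' properties of the test samples.
samples = [(0, 0), (0, 0), (1, 1), (1, 3), (2, 2), (3, 3), (3, 1), (2, 2)]

def error_matrix(pairs, n_classes):
    """Rows index the actual class, columns index the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in pairs:
        matrix[actual][predicted] += 1
    return matrix

m = error_matrix(samples, 4)
for row in m:
    print(row)

# Overall accuracy is the share of samples on the diagonal.
overall = sum(m[i][i] for i in range(4)) / len(samples)
print("overall accuracy:", overall)
```

Off-diagonal cells show which classes are confused with each other; in this toy data, classes 1 and 3 are mistaken for one another once each.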