## Description
Supervised classification is arguably one of the most important classical machine learning techniques in remote sensing. 
Applications range from generating Land Use/Land Cover maps to change detection. 
In this session, you will learn how to split ground control points (GCPs) into training and validation sets, apply a machine learning classifier in a basic supervised classification workflow, and assess the accuracy of the result.

## Aims of the practical session
* Load images for the region of interest
* Collect training samples
* Split samples into training/validation data
* Sample the imagery at the training points to build the training dataset
* Train and apply a classifier
* Assess accuracy for the training/validation data

## Getting started

### Load packages

Import GEE packages that are needed for the analysis.

In [1]:
import ee
import geemap
# ee.Authenticate()

### Connect to Google Earth Engine (GEE)

Connect to GEE to access its computing tools and datasets.
You may be asked to sign in with your Google account for authorization.

In [2]:
Map = geemap.Map()
# Map.add_basemap('HYBRID')
Map


### Adding Region of Interest (ROI)

Create the ROI that we want to work on, then add it to the GEE map and display it.
The ROI can be created either by drawing it manually on the map or by importing a shapefile from your computer. 

In [40]:
#Map.user_rois.getInfo()

In [3]:
geometry = ee.Geometry.Polygon([[
    [149.08169361455955, -35.32478551096885],
    [149.1481265674404, -35.325065623240356],
    [149.14829822881737, -35.27911424131675],
    [149.08289524419823, -35.27855369756653]
]])

Map.addLayer(geometry, {}, 'Canberra ROI')
Map.centerObject(geometry);

### Training data
Training data (or a training dataset) is the initial data used to train machine learning models. Import your pre-selected training dataset from your system.

In [4]:
#### load training data
trainingS_path = 'C:/Users/Abolfazl/Desktop/code/Google Earth Engine/training_data.shp'
training_data = geemap.shp_to_ee(trainingS_path)
Map.addLayer(training_data, {}, 'training_data')
Map.centerObject(training_data)


### Image collection
An ImageCollection is a stack or sequence of images. An ImageCollection can be loaded by pasting an Earth Engine asset ID into the ImageCollection constructor. You can find ImageCollection IDs in the <a href="https://developers.google.com/earth-engine/datasets">data catalog</a>. 

We will:
* Load Landsat-8 images for the analysis
* Filter the collection by location and date range
* Reduce the images to a single median composite
* Clip based on the geometry
* Display it on Geemap

In [5]:
landsat = (
    ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
    .filterBounds(geometry)
    .filterDate('2020-09-01','2020-09-30')
#     .filter(ee.Filter.lessThan('CLOUD_COVER',10))
    .median()
    .clip(geometry)
)

vis_params = {'min': 0, 'max': 3000, 'bands': ['B5', 'B4', 'B3']}

# Map.centerObject(point, 8)
Map.addLayer(landsat, vis_params, "Landsat-8")
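
To illustrate what `.median()` does in the cell above, here is a minimal sketch in plain Python that needs no Earth Engine connection. It assumes three hypothetical acquisitions of the same 2x2 pixel window in a single band, where the value 9000 stands in for a cloud-contaminated pixel. The function name `median_composite` and all pixel values are made up for illustration.

```python
from statistics import median

# Three hypothetical acquisitions of the same 2x2 pixel window (one band).
# The value 9000 stands in for a cloud-contaminated pixel.
scenes = [
    [[400, 420], [380, 410]],
    [[410, 9000], [390, 405]],
    [[405, 415], [385, 400]],
]


def median_composite(scenes):
    """Return the per-pixel median across a list of equally sized 2D grids."""
    rows, cols = len(scenes[0]), len(scenes[0][0])
    return [
        [median(scene[r][c] for scene in scenes) for c in range(cols)]
        for r in range(rows)
    ]


composite = median_composite(scenes)
print(composite)
```

Because the median is taken pixel by pixel, the single bright outlier does not carry over into the composite. This is one reason a median reducer is a common choice when a month of scenes is collapsed into one image.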

In [6]:
# print(landsat.getInfo())

### Sample imagery at training points to create training datasets
Now that we have created the points and labels, we need to sample the Landsat 8 imagery using image.sampleRegions(). This command will extract the reflectance in the designated bands for each of the points you have created. 

We will then:
* Select the bands for training
* Sample the input imagery to get a FeatureCollection of training data

### Split the samples into training/test sets
The goal is to randomly split the collected samples into training data and validation data. The training set is used to train the model, and the validation (test) set is used to check how well it performs on points it has not seen.

In [7]:
# Select the bands to use in the classification
bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B10']

In [14]:
# This property of the table stores the land cover labels.
label = 'landcover'
# Add a column of uniform random numbers in [0, 1) and use it to split
# the GCPs into training and validation sets.
gcp = training_data.randomColumn()

# A commonly recommended ratio is 70% training, 30% validation,
# so points with a random value of 0.7 or more are kept for validation.
trainingGcp = gcp.filter(ee.Filter.lt('random', 0.7))
validationGcp = gcp.filter(ee.Filter.gte('random', 0.7))
Map.addLayer(validationGcp, {}, 'validation GCPs')
# Overlay the points on the image to get the training data.
composite = landsat.select(bands)
training = composite.sampleRegions(
    **{
  'collection': trainingGcp,
  'properties': [label],
  'scale': 30}
)
print(training.size().getInfo())

241
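
The split logic can also be tried out without Earth Engine. The sketch below is a plain-Python imitation of the random-column approach, not the Earth Engine implementation: each sample gets a uniform random number in [0, 1) and is routed by comparing it with the training fraction. The function name `split_samples`, the seed, and the sample IDs are assumptions for illustration. The count of 349 is simply the sum of the training and validation counts printed in this notebook.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=0):
    """Tag each sample with a random number and split on train_fraction."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    training, validation = [], []
    for sample in samples:
        if rng.random() < train_fraction:
            training.append(sample)
        else:
            validation.append(sample)
    return training, validation


# Hypothetical sample IDs standing in for the ground control points.
samples = list(range(349))
train, valid = split_samples(samples)
print(len(train), len(valid))
```

Note that this kind of split is only approximately 70/30: each point is assigned independently, so the exact counts depend on the random numbers drawn.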


### Classification method
The <a href="https://developers.google.com/earth-engine/guides/classification">Classifier</a> package handles supervised classification by ML algorithms running in Earth Engine. In this part we will:
* Instantiate a supervised classifier
* Set its parameters if necessary
* Train the classifier using the training data
* Classify an image or feature collection
* Display the classified map

In [16]:
# Train a support vector machine (SVM) classifier on the training samples
classifier = ee.Classifier.libsvm().train(**{
  'features' : training,
  'classProperty' : 'landcover',
  'inputProperties' : bands
})

classified = landsat.select(bands).classify(classifier)

# Display the classes with random colors.
Map.addLayer(classified.randomVisualizer(), {}, 'classified')

### Accuracy assessment
Use the validation samples to assess the accuracy of the classifier. A confusion matrix (<a href="http://www.sciencedirect.com/science/article/pii/S0034425797000837">Stehman 1997</a>) is used to calculate the overall accuracy (OA) and the Kappa coefficient. Note that the classifier's confusionMatrix() method computes a 2D confusion matrix based on its training data (i.e., resubstitution error), whereas errorMatrix(), used below on the sampled validation points, compares the reference labels with the predicted classes. Axis 0 (rows) of the matrix corresponds to the input classes (i.e., reference data), and axis 1 (columns) to the output classes (i.e., classification data). The rows and columns start at class 0 and increase sequentially up to the maximum class value.

In [21]:
# # # Accuracy Assessment
test = classified.sampleRegions(
    **{
  'collection': validationGcp,
  'properties': [label],
  'scale': 30}
)
print(test.size().getInfo())

108


In [22]:
# # # confusion matrix
test_accuracy = test.errorMatrix('landcover', 'classification')
test_accuracy.getInfo()

[[17, 6, 9, 0], [2, 22, 0, 0], [3, 3, 36, 0], [0, 0, 0, 10]]
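
To show how the matrix is read, the following plain-Python sketch derives per-class accuracies from the values printed above. It assumes the layout described in this section (rows are reference classes, columns are predicted classes). Producer's accuracy is the diagonal count divided by the row total, and user's accuracy is the diagonal count divided by the column total. The variable names are illustrative only.

```python
# Confusion matrix copied from the output above
# (rows: reference classes, columns: predicted classes).
matrix = [[17, 6, 9, 0], [2, 22, 0, 0], [3, 3, 36, 0], [0, 0, 0, 10]]

n_classes = len(matrix)
row_totals = [sum(row) for row in matrix]  # reference points per class
col_totals = [
    sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)
]  # predicted points per class

# Producer's accuracy: share of reference points of a class labeled correctly.
producers = [matrix[k][k] / row_totals[k] for k in range(n_classes)]
# User's accuracy: share of points predicted as a class that truly belong to it.
users = [matrix[k][k] / col_totals[k] for k in range(n_classes)]

for k in range(n_classes):
    print(f"class {k}: producer's = {producers[k]:.3f}, user's = {users[k]:.3f}")
```

In this matrix, class 0 has the weakest producer's accuracy (17 of its 32 reference points are correct), with most of its errors going to class 2. The Earth Engine confusion matrix object also provides `producersAccuracy()` and `consumersAccuracy()` if you prefer to compute these on the server side.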

In [23]:
# # accuracy
test_accuracy.accuracy().getInfo()

0.7870370370370371

In [24]:
# # kappa
test_accuracy.kappa().getInfo()

0.6980306345733043
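
As a cross-check, the overall accuracy and Kappa coefficient can be recomputed by hand from the confusion matrix printed earlier. This is a minimal plain-Python sketch using the standard definitions: OA is the sum of the diagonal divided by the total number of points, and Kappa compares OA with the agreement expected by chance from the row and column totals. The variable names are illustrative.

```python
# Confusion matrix copied from the notebook output
# (rows: reference classes, columns: predicted classes).
matrix = [[17, 6, 9, 0], [2, 22, 0, 0], [3, 3, 36, 0], [0, 0, 0, 10]]

n_classes = len(matrix)
total = sum(sum(row) for row in matrix)
correct = sum(matrix[k][k] for k in range(n_classes))

# Overall accuracy: fraction of validation points classified correctly.
overall_accuracy = correct / total

# Chance agreement: sum over classes of (row total * column total) / total^2.
row_totals = [sum(row) for row in matrix]
col_totals = [
    sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)
]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

# Kappa: agreement beyond chance, scaled by the maximum possible.
kappa = (overall_accuracy - expected) / (1 - expected)

print(overall_accuracy, kappa)
```

The results should agree with the values returned by `accuracy()` and `kappa()` above, about 0.787 and 0.698.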

### Download confusion matrix
Use the following code to download the calculated confusion matrix and save it as a CSV file on your computer.

In [25]:
# # # Download confusion matrix
import csv
import os

out_dir = os.path.join(os.path.expanduser('~'), 'Downloads')
testing_csv = os.path.join(out_dir, 'test_accuracy.csv')

with open(testing_csv, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(test_accuracy.getInfo())
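
The CSV written above contains only numbers. If you would like the file to be easier to read, one option is to add a header row and a label column. The sketch below is a plain-Python illustration that writes to an in-memory string rather than a file. The class names and the function name `matrix_to_csv` are hypothetical placeholders, since the notebook does not state what classes 0 to 3 represent.

```python
import csv
import io

# Confusion matrix copied from the notebook output
# (rows: reference classes, columns: predicted classes).
matrix = [[17, 6, 9, 0], [2, 22, 0, 0], [3, 3, 36, 0], [0, 0, 0, 10]]

# Hypothetical class names; replace them with your own land cover labels.
class_names = ["class_0", "class_1", "class_2", "class_3"]


def matrix_to_csv(matrix, class_names):
    """Return the matrix as CSV text with a header row and a label column."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["reference/predicted"] + class_names)
    for name, row in zip(class_names, matrix):
        writer.writerow([name] + row)
    return buffer.getvalue()


csv_text = matrix_to_csv(matrix, class_names)
print(csv_text)
```

To save the labeled table to disk, the same `writer.writerow` calls can be used inside the `with open(...)` block from the cell above.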

### Export the classified result
Export the classified image directly to your computer as a GeoTIFF.

In [26]:
# # # Export the result
import os

out_dir = os.path.join(os.path.expanduser('~'), 'Downloads')
out_file = os.path.join(out_dir, 'landcover.tif')

In [43]:
geemap.ee_export_image(classified, filename=out_file, scale=30)

Generating URL ...
Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/676e2d1df6b0df0e1da296724ce39eab-cfc6ed8f554a9a85445db6da65dc7175:getPixels
Please wait ...
Data downloaded to C:\Users\Abolfazl\Downloads\landcover.tif


<span style='background:yellow'><span style="font-size:16.0pt">Exercise</span></span>

### Exercise 1 - Split GCPs with a different ratio 
Try different training/test ratios for the samples and check how the classification results and the accuracy of the model on the test samples vary.

#### Then try to answer the following questions:
* What ratio did you choose?
* Did the accuracy of the model change?
* What is your recommended ratio for splitting the samples?

### Exercise 2 - Load another dataset 
Try to load another dataset (e.g., Sentinel-2) using ee.ImageCollection and apply the same or a different ML technique to classify the image. You can browse other datasets in the <a href="https://developers.google.com/earth-engine/datasets">data catalog</a>. 

#### Then try to answer the following questions:
* What are the differences between both datasets (e.g., in terms of bands, resolution, etc.)?
* For which dataset does the model achieve better results?
* Is the accuracy of the model higher/lower for Sentinel-2, and why?

## References

* Wu, Q., (2020). geemap: A Python package for interactive mapping with Google Earth Engine. The Journal of Open Source Software, 5(51), 2305. https://doi.org/10.21105/joss.02305

* "Earth Observation: Data, Processing and Applications" book. Available through Wattle, or http://www.crcsi.com.au/earth-observation-series.

## Additional information

**License:** The code in this notebook was initially created by the team at [Digital Earth Australia](https://github.com/GeoscienceAustralia/dea-notebooks), and has been modified by Abolfazl Abdollahi. The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 

**Contact:** If you need assistance, please post a question on the ENGN3903 Wattle site. 

**Last modified:** June 2022

### Exercise answers

<a name="ex1answer">Answer to Exercise 1</a>

In [30]:
# # This property of the table stores the land cover labels.
# label = 'landcover'
# # # Add a random column and split the GCPs into training and validation set
# gcp = training_data.randomColumn()

# # # Here we try a different ratio:
# # # 60% training, 40% validation
# trainingGcp = gcp.filter(ee.Filter.lt('random', 0.6))
# validationGcp = gcp.filter(ee.Filter.gte('random', 0.6))
# Map.addLayer(validationGcp)
# # # Overlay the point on the image to get training data.
# composite = landsat.select(bands)
# training = composite.sampleRegions(
#     **{
#   'collection': trainingGcp,
#   'properties': [label],
#   'scale': 30}
# )
# print(training.size().getInfo())

<a name="ex2answer">Answer to Exercise 2</a>

In [38]:
# #Import Imagery
# sentinel2 = ee.ImageCollection('COPERNICUS/S2') \
#   .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10)) \
#   .filterDate('2020-09-01','2020-09-30') \
#   .filterBounds(geometry) \
#   .select('B.*')
# #
# print(sentinel2.size().getInfo());

In [39]:
# # # # Visualize Image
# composite = sentinel2.median().clip(geometry)
# vis_params = {'min': 0, 'max': 3000, 'bands': ['B4', 'B3', 'B2']}

# Map.addLayer(composite, vis_params, 'Sent-2 Image' );
# Map