<a href="https://colab.research.google.com/github/kevinworthington/geospatial_colab/blob/main/google_earth_engine_supervised_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


The following notebook uses an area that already has land cover classification values and uses those values to train a model. The model is then used to classify another area that has no classification information.

This works best when we use the same satellite imagery and time period that were used to derive the land cover values.

We'll be using the National Land Cover Database (NLCD) and Landsat data for this exercise.

This notebook is adapted from: https://githubtocolab.com/gee-community/geemap/blob/master/docs/notebooks/32_supervised_classification.ipynb



## Supervised classification algorithms available in Earth Engine

Source: https://developers.google.com/earth-engine/classification

The `Classifier` package handles supervised classification by traditional ML algorithms running in Earth Engine. These classifiers include CART, RandomForest, NaiveBayes and SVM. The general workflow for classification is:

1. Collect training data. Assemble features which have a property that stores the known class label and properties storing numeric values for the predictors.
2. Instantiate a classifier. Set its parameters if necessary.
3. Train the classifier using the training data.
4. Classify an image or feature collection.
5. Estimate classification error with independent validation data.

The training data is a `FeatureCollection` with a property storing the class label and properties storing predictor variables. Class labels should be consecutive integers starting from 0. If necessary, use `remap()` to convert class values to consecutive integers. The predictors should be numeric.
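NLCD class values (11, 21, 22, and so on) are not consecutive. The sketch below shows one way to build the parallel `from`/`to` lists that `ee.Image.remap()` takes, assuming you want labels 0 to n-1. `consecutive_label_lists` is a hypothetical helper, not part of Earth Engine or geemap. Predictions made on remapped labels would need mapping back (label `i` corresponds to `from_values[i]`) before they are compared with NLCD.

```python
def consecutive_label_lists(class_values):
    """Return parallel (from, to) lists mapping each distinct class value to 0..n-1.

    Hypothetical helper for illustration; not part of Earth Engine or geemap.
    """
    from_values = sorted(set(class_values))
    to_values = list(range(len(from_values)))
    return from_values, to_values


# Example input: an illustrative subset of NLCD class values.
from_values, to_values = consecutive_label_lists([11, 21, 22, 23, 41, 42, 52])
print(from_values)  # [11, 21, 22, 23, 41, 42, 52]
print(to_values)    # [0, 1, 2, 3, 4, 5, 6]

# With Earth Engine initialized, the lists could be applied like this (not run here):
# labels = nlcd.remap(from_values, to_values).rename("landcover")
```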

In [33]:
import ee
import geemap

In [34]:
# Trigger the authentication flow.
ee.Authenticate()

# Initialize the library.
ee.Initialize(project='ee-csucentroidtest')

In [35]:
# Create an interactive map
region_map = geemap.Map()

# display the map
region_map


In [36]:
# In the map above, draw a rectangle within the USA with characteristics similar to the area you'd like to classify elsewhere.

# E.g., if we want to classify an area in Uzbekistan, we might start with an area within Colorado that has similar terrain.


In [37]:
# Store the region as a variable for later use.
region = region_map.user_roi
if not region:
  # Alternatively, specify a region programmatically
  region = ee.Geometry.Rectangle([-106.413574, 38.591114,-102.106934, 40.971604])

region

### Add data to the map

In [38]:
# We'll load Landsat 8 data within our region for 2016, the same year as the NLCD data.

# Band reference: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2#bands
image = (
    ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
    .filterBounds(region)
    .filterDate("2016-01-01", "2016-12-31")  # end date is exclusive, so Dec 31 itself is not included
    .sort("CLOUD_COVER")
    .select("SR_B[1-7]")
).reduce(ee.Reducer.median())

# Determine how to visualize the map
vis_params = {"bands": ["SR_B5_median", "SR_B4_median", "SR_B3_median"]}

# Center the map on the "region" and set the zoom level
region_map.centerObject(region, 8)

# Add the data to the map
region_map.addLayer(image, vis_params, "Landsat-8")
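The `SR_B*` bands in this collection are stored as scaled integers. The Earth Engine catalog entry for `LANDSAT/LC08/C02/T1_L2` documents a scale factor of 0.0000275 and an offset of -0.2 for converting them to surface reflectance. Because `vis_params` above sets no `min`/`max`, the default display stretch may not suit these values. The minimal sketch below converts between the two scales in plain Python, assuming a 0.0 to 0.3 reflectance stretch for display (a common choice, not a requirement).

```python
# Collection 2 Level-2 surface reflectance scaling, as documented in the
# Earth Engine catalog entry for LANDSAT/LC08/C02/T1_L2.
SCALE = 0.0000275
OFFSET = -0.2


def dn_to_reflectance(dn):
    """Convert a stored SR_B* value (DN) to surface reflectance."""
    return dn * SCALE + OFFSET


def reflectance_to_dn(reflectance):
    """Convert a surface reflectance value back to the stored DN scale."""
    return (reflectance - OFFSET) / SCALE


# Assumed display stretch: reflectance 0.0 to 0.3, expressed in raw DNs.
vis_min = reflectance_to_dn(0.0)  # about 7273
vis_max = reflectance_to_dn(0.3)  # about 18182
print(round(vis_min), round(vis_max))

# These could be added to the visualization parameters, e.g.
# vis_params = {"bands": [...], "min": round(vis_min), "max": round(vis_max)}

# SR_B1_median value from the training sample printed later in this notebook.
print(round(dn_to_reflectance(9365), 4))  # about 0.0575
```

Because the conversion is linear, training CART on the raw DNs (as this notebook does) should produce equivalent splits to training on reflectance; the rescaling mainly matters for display and for comparing values with other sources.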

### Make training dataset

There are several ways you can create a region for generating the training dataset.

- Draw a shape (e.g., a rectangle) on the map, then use `region = region_map.user_roi`
- Define a geometry, such as `region = ee.Geometry.Rectangle([-122.6003, 37.4831, -121.8036, 37.8288])`
- Create a buffer zone around a point, such as `region = ee.Geometry.Point([-122.4439, 37.7538]).buffer(10000)`
- If you don't pass a region to `sample()`, it uses the image footprint by default
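`ee.Geometry.Rectangle` takes coordinates in `[xMin, yMin, xMax, yMax]` order, i.e. west, south, east, north in degrees, and swapping them is an easy mistake. The hypothetical helpers below check that order and estimate the area of the latitude/longitude box on a sphere, using A = R² · Δλ · (sin φ_north − sin φ_south), which gives a rough sense of how many 30 m pixels `sample()` draws from. This is a back-of-the-envelope sketch: calling `region.area()` in Earth Engine is the authoritative check, and boxes that cross the antimeridian are not handled.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def check_rectangle(coords):
    """Check a [west, south, east, north] list in degrees (hypothetical helper)."""
    west, south, east, north = coords
    if not (-180 <= west < east <= 180):
        raise ValueError("expected west < east, both within [-180, 180]")
    if not (-90 <= south < north <= 90):
        raise ValueError("expected south < north, both within [-90, 90]")
    return west, south, east, north


def rectangle_area_km2(coords):
    """Approximate area of a box bounded by meridians and parallels, in km^2."""
    west, south, east, north = check_rectangle(coords)
    d_lon = math.radians(east - west)
    return EARTH_RADIUS_KM ** 2 * d_lon * (
        math.sin(math.radians(north)) - math.sin(math.radians(south))
    )


# The fallback training region used in this notebook.
area = rectangle_area_km2([-106.413574, 38.591114, -102.106934, 40.971604])
print(f"{area:,.0f} km^2, roughly {area * 1e6 / 900:,.0f} pixels at 30 m")
```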

In this example, we are going to use the [USGS National Land Cover Database (NLCD)](https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD) to create a labeled dataset for training.


![](https://i.imgur.com/7QoRXxu.png)

In [39]:
nlcd = ee.Image("USGS/NLCD/NLCD2016").select("landcover").clip(region)
region_map.addLayer(nlcd, {}, "NLCD")

In [40]:
# Make the training dataset.
points = nlcd.sample(
    **{
        "region": region,
        "scale": 30,
        "numPixels": 5000,
        "seed": 0,
        "geometries": True,  # Set this to False to ignore geometries
    }
)

region_map.addLayer(points, {}, "training", False)

In [41]:
print(points.size().getInfo())

5000


In [42]:
print(points.first().getInfo())

{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-104.71615773568249, 40.4318244867619]}, 'id': '0', 'properties': {'landcover': 21}}
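Each sampled point is a GeoJSON-like feature with the NLCD class stored under `landcover`. To check how balanced the 5,000 training points are, Earth Engine can count them server-side with `points.aggregate_histogram("landcover").getInfo()`. The plain-Python sketch below does the same kind of count on feature dictionaries like the one printed above (for example, the list returned by `points.getInfo()["features"]`). `feature_row` and `label_counts` are hypothetical helpers.

```python
from collections import Counter


def feature_row(feature, label="landcover"):
    """Return (longitude, latitude, label) for a sampled point feature."""
    lon, lat = feature["geometry"]["coordinates"]
    return lon, lat, feature["properties"][label]


def label_counts(features, label="landcover"):
    """Count class labels in a list of GeoJSON-like feature dicts."""
    return Counter(f["properties"][label] for f in features)


# The first sampled point, copied from the output above.
first = {'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [-104.71615773568249, 40.4318244867619]}, 'id': '0', 'properties': {'landcover': 21}}
print(feature_row(first))     # (-104.71615773568249, 40.4318244867619, 21)
print(label_counts([first]))  # Counter({21: 1})
```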


### Train the classifier

In [43]:
# Use these bands for prediction.
bands = ["SR_B1_median", "SR_B2_median", "SR_B3_median", "SR_B4_median", "SR_B5_median", "SR_B6_median", "SR_B7_median"]


# This property of the table stores the land cover labels.
label = "landcover"

# Sample the image bands at each training point to get the training data.
training = image.select(bands).sampleRegions(
    **{"collection": points, "properties": [label], "scale": 30}
)

# Train a CART classifier with default parameters.
trained = ee.Classifier.smileCart().train(training, label, bands)

In [44]:
print(training.first().getInfo())

{'type': 'Feature', 'geometry': None, 'id': '0_0', 'properties': {'SR_B1_median': 9365, 'SR_B2_median': 9866, 'SR_B3_median': 11289, 'SR_B4_median': 11842, 'SR_B5_median': 16599, 'SR_B6_median': 14699, 'SR_B7_median': 13384, 'landcover': 21}}
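The notebook trains the classifier but skips step 5 of the workflow, estimating error with independent validation data. In Earth Engine, one common pattern is to split `training` with `training.randomColumn()`, train on one part, run `.classify(trained)` on the other, and call `.errorMatrix(label, "classification")` on the result; its `.accuracy()` gives overall accuracy. `trained.confusionMatrix()` only reports accuracy on the training data itself and is therefore optimistic. The plain-Python sketch below shows what such an error matrix and overall accuracy contain, using toy labels rather than results from this notebook. Any accuracy measured this way applies to the Colorado training region, not to the area classified later.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy labels for illustration only.
actual = [21, 21, 41, 52, 52, 52]
predicted = [21, 41, 41, 52, 52, 21]
m = confusion_matrix(actual, predicted, classes=[21, 41, 52])
print(m)                              # [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
print(round(overall_accuracy(m), 3))  # 0.667
```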


### Classify another area

In [45]:
# Create an interactive map
classify_map = geemap.Map()

# display the map
classify_map


In [46]:
# In the map above, create a rectangle in Uzbekistan


In [47]:
# Store the region as a variable for later use.
classify_bounds = classify_map.user_roi

# Alternatively, specify a region programmatically
if not classify_bounds:
  classify_bounds = ee.Geometry.Rectangle([68.079529, 40.765982, 69.383911, 40.824549])

classify_bounds

In [48]:
# Load the Landsat data within our classify bounds
image = (
    ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
    .filterBounds(classify_bounds)
    .filterDate("2016-01-01", "2016-12-31")  # end date is exclusive, so Dec 31 itself is not included
    .sort("CLOUD_COVER")
    .select("SR_B[1-7]")
).reduce(ee.Reducer.median())

# Determine how to visualize the map
vis_params = {"bands": ["SR_B5_median", "SR_B4_median", "SR_B3_median"]}

# Add the data to the map
classify_map.addLayer(image, vis_params, "Landsat-8")

In [49]:
# Classify the image with the same bands used for training.
result = image.select(bands).classify(trained)

# Display the classes with random colors.
classify_map.addLayer(result.randomVisualizer(), {}, "classified")

### Render categorical map

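To render a categorical map, we can set two image properties, `<band>_class_values` and `<band>_class_palette`. The classified image's band is named `classification`, so the properties are `classification_class_values` and `classification_class_palette`. We can reuse the NLCD values and palette so that the two maps are easy to compare.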

In [61]:
class_values = nlcd.get("landcover_class_values").getInfo()
class_values

[11,
 12,
 21,
 22,
 23,
 24,
 31,
 41,
 42,
 43,
 51,
 52,
 71,
 72,
 73,
 74,
 81,
 82,
 90,
 95]

In [51]:
class_palette = nlcd.get("landcover_class_palette").getInfo()
class_palette

['476ba1',
 'd1defa',
 'decaca',
 'd99482',
 'ee0000',
 'ab0000',
 'b3aea3',
 '68ab63',
 '1c6330',
 'b5ca8f',
 'a68c30',
 'ccba7d',
 'e3e3c2',
 'caca78',
 '99c247',
 '78ae94',
 'dcd93d',
 'ab7028',
 'bad9eb',
 '70a3ba']
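The two lists printed above are parallel: the i-th palette entry colors the i-th class value. The sketch below pairs them into a label-to-color dictionary, for example for a custom legend that lists only the classes present in the result. `build_legend` is a hypothetical helper; geemap's `add_legend` accepts a `legend_dict` argument in the versions we are aware of, but check your installed version.

```python
# NLCD class values and palette, copied from the two outputs above.
nlcd_values = [11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 51, 52, 71, 72, 73, 74, 81, 82, 90, 95]
nlcd_palette = [
    "476ba1", "d1defa", "decaca", "d99482", "ee0000",
    "ab0000", "b3aea3", "68ab63", "1c6330", "b5ca8f",
    "a68c30", "ccba7d", "e3e3c2", "caca78", "99c247",
    "78ae94", "dcd93d", "ab7028", "bad9eb", "70a3ba",
]


def build_legend(values, palette):
    """Map each class value (as a string label) to a '#rrggbb' color."""
    if len(values) != len(palette):
        raise ValueError("values and palette must have the same length")
    return {str(v): "#" + color for v, color in zip(values, palette)}


legend_dict = build_legend(nlcd_values, nlcd_palette)
print(legend_dict["11"], legend_dict["95"])  # #476ba1 #70a3ba

# With geemap, a custom legend could then be added (not run here):
# classify_map.add_legend(title="Land cover", legend_dict=legend_dict)
```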

In [52]:
landcover = result.set("classification_class_values", class_values)
landcover = landcover.set("classification_class_palette", class_palette)

In [53]:
classify_map.addLayer(landcover, {}, "Land cover")

classify_map.add_legend(builtin_legend="NLCD")

In [54]:
# Classify multiple date ranges

# Note: filterDate's end date is exclusive, so "2017-12-31" and "2018-12-31" themselves are not included.
date_ranges = [["2017-01-01", "2017-12-31"], ["2018-01-01", "2018-12-31"]]

def classify_multiple_date_ranges(date_range, title):
  # Load the Landsat data within our classify bounds
  temp_image = (
      ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
      .filterBounds(classify_bounds)
      .filterDate(date_range[0], date_range[1])
      .sort("CLOUD_COVER")
      .select("SR_B[1-7]")
  ).reduce(ee.Reducer.median())

  temp_result = temp_image.select(bands).classify(trained)
  # Attach the NLCD class values and palette so the layer is drawn with NLCD colors.
  temp_result = temp_result.set("classification_class_values", class_values)
  temp_result = temp_result.set("classification_class_palette", class_palette)
  classify_map.addLayer(temp_result, {}, title)

  # Export to Google Drive (disabled).
  # Note: scale is the output resolution in meters per pixel.
  # This export did not work as written. A likely cause is that a median
  # composite is unbounded, so passing region=classify_bounds may be required.
  # geemap.ee_export_image_to_drive(
  #   temp_result, description=title, folder="export", region=classify_bounds, scale=900
  # )

for i, date_range in enumerate(date_ranges):
  classify_multiple_date_ranges(date_range, "classified_" + str(i))
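`ImageCollection.filterDate` treats its end date as exclusive, so a range such as `["2017-01-01", "2017-12-31"]` leaves out December 31. A hypothetical helper for generating whole calendar years, with each end date set to January 1 of the following year, might look like this:

```python
def yearly_date_ranges(first_year, last_year):
    """Return [start, end] pairs covering whole calendar years.

    Each end date is January 1 of the following year because
    filterDate treats the end date as exclusive.
    """
    return [
        [f"{year}-01-01", f"{year + 1}-01-01"]
        for year in range(first_year, last_year + 1)
    ]


print(yearly_date_ranges(2017, 2018))
# [['2017-01-01', '2018-01-01'], ['2018-01-01', '2019-01-01']]
```

The output could be assigned to `date_ranges`. Landsat 8 imagery is only available from 2013 onward, so earlier years would return no images.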


In [55]:
# Get the land cover statistics for the 2016 classified image (`result`)

import pandas as pd

# Pixel counts per class, as a dictionary keyed by class value
class_counts = result.reduceRegion(
    reducer=ee.Reducer.frequencyHistogram(),
    geometry=classify_bounds,
    scale=30  # Adjust scale as needed
).get('classification').getInfo()

# Convert the dictionary to a DataFrame
df = pd.DataFrame.from_dict(class_counts, orient='index', columns=['Area (pixels)'])

# Approximate area in square meters, assuming each pixel covers 30 m x 30 m
df['Area (sq meters)'] = df['Area (pixels)'] * 30 * 30

In [56]:
df

| Class | Area (pixels) | Area (sq meters) |
|---|---|---|
| 11 | 1867.729412 | 1680956.0 |
| 21 | 67507.160784 | 60756440.0 |
| 22 | 135348.113725 | 121813300.0 |
| 23 | 34359.85098 | 30923870.0 |
| 24 | 801.709804 | 721538.8 |
| 31 | 15.0 | 13500.0 |
| 41 | 44248.505882 | 39823660.0 |
| 42 | 73020.094118 | 65718080.0 |
| 43 | 89.0 | 80100.0 |
| 52 | 221915.196078 | 199723700.0 |
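The `frequencyHistogram` result is a dictionary keyed by class value as a string (for example `'52'`), and the counts can be fractional because Earth Engine weights pixels, for example those only partly inside `classify_bounds`. The sketch below, a hypothetical `histogram_to_areas` helper, converts such counts to square kilometres and to each class's share of the total. It keeps the notebook's assumption that every pixel covers 30 m x 30 m; the true ground area of each pixel depends on the projection Earth Engine uses for the reduction, so this is only approximate, and summing `ee.Image.pixelArea()` per class in Earth Engine would account for per-pixel area instead.

```python
def histogram_to_areas(counts, pixel_size_m=30.0):
    """Turn a {class: pixel_count} histogram into (class, area_km2, percent) rows.

    Assumes every pixel covers pixel_size_m x pixel_size_m meters.
    """
    total = sum(counts.values())
    rows = []
    for cls in sorted(counts, key=int):  # keys are strings such as "52"
        pixels = counts[cls]
        area_km2 = pixels * pixel_size_m ** 2 / 1e6
        rows.append((int(cls), round(area_km2, 2), round(100 * pixels / total, 1)))
    return rows


# Two rows of the histogram above, used as example input.
rows = histogram_to_areas({"11": 1867.729412, "52": 221915.196078})
for row in rows:
    print(row)
# (11, 1.68, 0.8)
# (52, 199.72, 99.2)
```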


In [71]:
# Put it all together to generate an exportable DataFrame
# Note: filterDate's end date is exclusive, so "2017-12-31" and "2018-12-31" themselves are not included.
date_ranges = [["2017-01-01", "2017-12-31"], ["2018-01-01", "2018-12-31"]]
dataframes = []


def classify_multiple_date_ranges(date_range, title):
  # Load the Landsat data within our classify bounds
  temp_image = (
      ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
      .filterBounds(classify_bounds)
      .filterDate(date_range[0], date_range[1])
      .sort("CLOUD_COVER")
      .select("SR_B[1-7]")
  ).reduce(ee.Reducer.median())

  temp_result = temp_image.select(bands).classify(trained)

  # Pixel counts per class for this date range
  class_counts = temp_result.reduceRegion(
      reducer=ee.Reducer.frequencyHistogram(),
      geometry=classify_bounds,
      scale=30  # Adjust scale as needed
  ).get('classification').getInfo()

  # Convert the dictionary to a DataFrame
  df = pd.DataFrame.from_dict(class_counts, orient='index', columns=['Area (pixels) ' + title])

  # Approximate area in square meters, assuming each pixel covers 30 m x 30 m
  df['Area (sq meters) ' + title] = df['Area (pixels) ' + title] * 30 * 30
  dataframes.append(df)


for i, date_range in enumerate(date_ranges):
  classify_multiple_date_ranges(date_range, "classified_" + str(i))

# Join on class value; a class missing from one date range shows up as NaN
all_dataframes = pd.concat(dataframes, axis=1)

In [72]:
all_dataframes

| Class | Area (pixels) classified_0 | Area (sq meters) classified_0 | Area (pixels) classified_1 | Area (sq meters) classified_1 |
|---|---|---|---|---|
| 11 | 362.247059 | 326022.4 | 367.12549 | 330412.9 |
| 21 | 66407.501961 | 59766750.0 | 68080.164706 | 61272150.0 |
| 22 | 136211.545098 | 122590400.0 | 124745.058824 | 112270600.0 |
| 23 | 34346.631373 | 30911970.0 | 48419.419608 | 43577480.0 |
| 24 | 602.619608 | 542357.6 | 1377.545098 | 1239791.0 |
| 31 | 9.0 | 8100.0 | 46.0 | 41400.0 |
| 41 | 42312.043137 | 38080840.0 | 33599.905882 | 30239920.0 |
| 42 | 90349.078431 | 81314170.0 | 67898.866667 | 61108980.0 |
| 43 | 187.745098 | 168970.6 | 62.0 | 55800.0 |
| 52 | 163754.721569 | 147379200.0 | 184712.666667 | 166241400.0 |
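To see how each class changed between the two date ranges, the per-class difference can be taken. The sketch below uses a hypothetical `area_change` helper on pixel-count dictionaries like those returned by `frequencyHistogram`, with two rows of the table above as example input. Differences between two independently classified years mix real land cover change with classification error, so they are best read as indicative.

```python
def area_change(before, after, pixel_size_m=30.0):
    """Per-class change in area (square meters) between two pixel-count dicts.

    Classes missing from one dict are treated as having zero pixels there.
    """
    classes = sorted(set(before) | set(after), key=int)
    return {
        cls: (after.get(cls, 0.0) - before.get(cls, 0.0)) * pixel_size_m ** 2
        for cls in classes
    }


# Pixel counts for two classes, taken from the table above.
classified_0 = {"22": 136211.545098, "52": 163754.721569}
classified_1 = {"22": 124745.058824, "52": 184712.666667}
changes = area_change(classified_0, classified_1)
for cls, change in changes.items():
    print(cls, f"{change:+,.0f} m^2")
# 22 -10,319,838 m^2
# 52 +18,862,151 m^2

# With the DataFrame built above, a similar column could be computed as:
# all_dataframes["Change (sq meters)"] = (
#     all_dataframes["Area (sq meters) classified_1"]
#     - all_dataframes["Area (sq meters) classified_0"]
# )
```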


In [78]:
# Export the data to Google Drive
from google.colab import drive
drive.mount('/content/gdrive')

all_dataframes.to_csv('/content/gdrive/My Drive/output.csv')
# Then unmount the drive
drive.flush_and_unmount()

Mounted at /content/gdrive
