<a href="https://colab.research.google.com/github/kerner-lab/gee_tutorials/blob/main/Crop_Map_Argentina.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial: Cropland mapping with Google Earth Engine
Google Earth Engine (GEE) is a cloud-based platform for interacting with and analyzing petabytes of satellite and other Earth data sets. GEE can be used with the JavaScript browser-based code editor (https://code.earthengine.google.com) or the Python API. The main benefit of GEE is that it lets you access huge remote sensing data sets and perform analysis entirely on Google's infrastructure, without having to download files or install libraries on your own computer... for free!

This tutorial shows how to use the GEE Python API to create a cropland classification map. We will:
1.   define a region of interest (ROI)
1.   load training and validation labels
1.   load satellite data sets and make a cloud-free composite
1.   train a random forest classifier
1.   apply the trained classifier to generate a cropland map
1.   generate performance metrics for the training and validation subsets
1.   export the classified map to Google Drive

To run this Colab notebook, you will need a Google Earth Engine account (https://signup.earthengine.google.com/#!/).

You will also need a Google Drive account to **save a copy of this notebook** so that you can save your changes. Before you get started, click File > Save a Copy in Drive, then rename the file using a name of your choice (e.g., Hannah-Crop-Map-Argentina).


Acknowledgment: Some of this tutorial was adapted from the [Rapid Classification of Croplands](https://developers.google.com/earth-engine/tutorials/community/classify-maizeland-ng) tutorial in the Google Earth Engine documentation.

## Set up your environment

The Google Earth Engine API is already installed by default in the Colab environment. We need to authenticate to use our GEE account.

In [None]:
!earthengine authenticate

Now we can import the Earth Engine API and initialize it.

In [None]:
import ee
ee.Initialize()

Next we'll install `geemap`, a Python library that provides useful functions for the GEE Python API (https://github.com/giswqs/geemap).

NOTE: You may get an error in this step that says, "You must restart the runtime in order to use newly installed versions." This is a known issue - you will need to click the "Restart Runtime" button and re-run the steps from the beginning (i.e., you will need to authenticate twice).

In [None]:
!pip install geemap

Now that we've installed geemap, we can import the library to use it in our notebook.

In [None]:
import geemap

## Displaying the map

GEE enables you to visualize your data and outputs on a map using `ipyleaflet`, `folium`, or other Python libraries for map visualization. The `geemap` library provides useful wrapper functions for displaying the map in just a couple of lines.

In [None]:
# Instantiate a new map
Map = geemap.Map() 

# We can set the map options, e.g., to show the satellite basemap
Map.setOptions('SATELLITE') 

# Display the Map object
Map 

## Define a region of interest

We will define Buenos Aires province as our region of interest.

In [None]:
# Import admin1 boundaries feature collection
admin1 = ee.FeatureCollection("FAO/GAUL/2015/level1")

# Apply filter where admin0 name is Argentina and add it to the map
argentina = admin1.filter(ee.Filter.eq('ADM0_NAME', 'Argentina'))
Map.addLayer(argentina, {}, 'Argentina province boundaries')

# Select Buenos Aires province from the Argentina subset
buenosaires = argentina.filter(ee.Filter.eq('ADM1_NAME', 'Buenos Aires'))
Map.addLayer(buenosaires, {}, 'Buenos Aires province boundaries')

Map.centerObject(argentina, 6)

Map

In [None]:
# Create a new map object
Map = geemap.Map() 
Map.setOptions('SATELLITE') 

# Convert the Buenos Aires boundary feature collection to a line for map display
border = ee.Image().byte().paint(featureCollection=buenosaires, color=1, width=3)

# Display the map
Map.centerObject(buenosaires, 6)
Map.addLayer(border, {}, 'Buenos Aires province border')
Map

## Import labeled data

We are going to use a labeled dataset of nearly 1000 points created by NASA Harvest, available at `users/hkerner/buenos-aires-crop-labels`. These points were randomly sampled from a rectangular bounding box containing Buenos Aires province and labeled by two labelers in Collect Earth Online.

Each point has an attribute called `subset` which indicates whether the point belongs to the training (70% of points) or validation (30% of points) subset and a `class` attribute which provides a label of crop or non-crop.
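
The `subset` and `class` attributes behave like columns in a table. As a rough illustration that runs without Earth Engine, the sketch below filters a handful of made-up points in the same spirit as `ee.Filter.eq`. The records, coordinates, and the `filter_eq` helper are hypothetical and are not part of the real dataset or the GEE API.

```python
from collections import Counter

# Hypothetical stand-ins for labeled points; the real points live in Earth Engine.
points = [
    {"lon": -60.1, "lat": -36.2, "subset": "train", "class": "Crop"},
    {"lon": -58.9, "lat": -37.5, "subset": "train", "class": "Non-crop"},
    {"lon": -61.4, "lat": -35.0, "subset": "train", "class": "Non-crop"},
    {"lon": -59.7, "lat": -34.8, "subset": "train", "class": "Crop"},
    {"lon": -57.2, "lat": -38.1, "subset": "train", "class": "Non-crop"},
    {"lon": -62.3, "lat": -38.9, "subset": "train", "class": "Non-crop"},
    {"lon": -56.9, "lat": -36.6, "subset": "train", "class": "Non-crop"},
    {"lon": -60.8, "lat": -37.1, "subset": "val", "class": "Crop"},
    {"lon": -58.0, "lat": -35.9, "subset": "val", "class": "Non-crop"},
    {"lon": -61.9, "lat": -39.4, "subset": "val", "class": "Non-crop"},
]


def filter_eq(records, name, value):
    """Keep records whose property `name` equals `value` (mimics ee.Filter.eq)."""
    return [r for r in records if r.get(name) == value]


train = filter_eq(points, "subset", "train")
val = filter_eq(points, "subset", "val")

# Tally the class balance within the training subset.
class_counts = Counter(r["class"] for r in train)
print(len(train), len(val), dict(class_counts))
```

In this toy table, 7 of the 10 points are in the training subset, mirroring the 70/30 split described above.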

First, let's visualize the points on the map colored by training (purple) and validation (yellow) subsets.

In [None]:
samples = ee.FeatureCollection('users/hkerner/buenos-aires-crop-labels')

training_pts = samples.filter(ee.Filter.eq('subset', 'train'))
val_pts = samples.filter(ee.Filter.eq('subset', 'val'))

Map.addLayer(training_pts, {'color': 'purple'}, 'Training points')
Map.addLayer(val_pts, {'color': 'yellow'}, 'Validation points')

Map

We can also visualize the points according to their label. We'll color crop points in green and non-crop points in red.

Notice that there are many more non-crop points than crop points and that all of the points in the ocean are labeled non-crop (of course!). You can zoom in on the map to see the satellite data corresponding to each label. See if you agree with the labels assigned to some of the points!

In [None]:
crop_pts = samples.filter(ee.Filter.eq('class', 'Crop'))
noncrop_pts = samples.filter(ee.Filter.eq('class', 'Non-crop'))

Map.addLayer(crop_pts, {'color': 'green'}, 'Crop points')
Map.addLayer(noncrop_pts, {'color': 'red'}, 'Non-crop points')

Map

## Load the satellite data

We will create a cloud-free composite of Sentinel-2 images to use as the input for our classifier. 

Sentinel-2 is a frequently used Earth observation satellite mission with 10 m spatial resolution and a 5-day revisit frequency. You can read more about the dataset in the [GEE catalog](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED?hl=en).

The code below defines and applies functions for creating a cloud-free composite. It is based on this demo in the [GEE documentation](https://developers.google.com/earth-engine/tutorials/community/sentinel-2-s2cloudless).

In [None]:
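# Define parameter settings for cloud masking
START_DATE = ee.Date('2020-04-01')
END_DATE = ee.Date('2021-03-31')  # filterDate treats the end date as exclusive
CLOUD_FILTER = 60       # Maximum image cloud cover (%) allowed in the collection
CLD_PRB_THRESH = 50     # Cloud probability (%) above which a pixel is considered cloud
NIR_DRK_THRESH = 0.15   # NIR reflectance below which a pixel is a potential cloud shadow
CLD_PRJ_DIST = 1        # Maximum distance (km) to search for shadows from cloud edges
BUFFER = 50             # Distance (m) to dilate the edges of cloud-identified objects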

In [None]:
def get_s2_sr_cld_col(aoi, start_date, end_date):
    # Import and filter S2 SR.
    s2_sr_col = (ee.ImageCollection('COPERNICUS/S2_SR')
        .filterBounds(aoi)
        .filterDate(start_date, end_date)
        .filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', CLOUD_FILTER)))

    # Import and filter s2cloudless.
    s2_cloudless_col = (ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
        .filterBounds(aoi)
        .filterDate(start_date, end_date))

    # Join the filtered s2cloudless collection to the SR collection by the 'system:index' property.
    return ee.ImageCollection(ee.Join.saveFirst('s2cloudless').apply(**{
        'primary': s2_sr_col,
        'secondary': s2_cloudless_col,
        'condition': ee.Filter.equals(**{
            'leftField': 'system:index',
            'rightField': 'system:index'
        })
    }))

In [None]:
s2_sr_cld_col_eval = get_s2_sr_cld_col(buenosaires, START_DATE, END_DATE)

In [None]:
def add_cloud_bands(img):
    # Get s2cloudless image, subset the probability band.
    cld_prb = ee.Image(img.get('s2cloudless')).select('probability')

    # Condition s2cloudless by the probability threshold value.
    is_cloud = cld_prb.gt(CLD_PRB_THRESH).rename('clouds')

    # Add the cloud probability layer and cloud mask as image bands.
    return img.addBands(ee.Image([cld_prb, is_cloud]))

In [None]:
def add_shadow_bands(img):
    # Identify water pixels from the SCL band.
    not_water = img.select('SCL').neq(6)

    # Identify dark NIR pixels that are not water (potential cloud shadow pixels).
    SR_BAND_SCALE = 1e4
    dark_pixels = img.select('B8').lt(NIR_DRK_THRESH*SR_BAND_SCALE).multiply(not_water).rename('dark_pixels')

    # Determine the direction to project cloud shadow from clouds (assumes UTM projection).
    shadow_azimuth = ee.Number(90).subtract(ee.Number(img.get('MEAN_SOLAR_AZIMUTH_ANGLE')))

    # Project shadows from clouds for the distance specified by the CLD_PRJ_DIST input.
    cld_proj = (img.select('clouds').directionalDistanceTransform(shadow_azimuth, CLD_PRJ_DIST*10)
        .reproject(**{'crs': img.select(0).projection(), 'scale': 100})
        .select('distance')
        .mask()
        .rename('cloud_transform'))

    # Identify the intersection of dark pixels with cloud shadow projection.
    shadows = cld_proj.multiply(dark_pixels).rename('shadows')

    # Add dark pixels, cloud projection, and identified shadows as image bands.
    return img.addBands(ee.Image([dark_pixels, cld_proj, shadows]))

In [None]:
def add_cld_shdw_mask(img):
    # Add cloud component bands.
    img_cloud = add_cloud_bands(img)

    # Add cloud shadow component bands.
    img_cloud_shadow = add_shadow_bands(img_cloud)

    # Combine cloud and shadow mask, set cloud and shadow as value 1, else 0.
    is_cld_shdw = img_cloud_shadow.select('clouds').add(img_cloud_shadow.select('shadows')).gt(0)

    # Remove small cloud-shadow patches and dilate remaining pixels by BUFFER input.
    # 20 m scale is for speed, and assumes clouds don't require 10 m precision.
    is_cld_shdw = (is_cld_shdw.focalMin(2).focalMax(BUFFER*2/20)
        .reproject(**{'crs': img.select([0]).projection(), 'scale': 20})
        .rename('cloudmask'))

    # Add the final cloud-shadow mask to the image.
    return img_cloud_shadow.addBands(is_cld_shdw)
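
The `focalMin(2).focalMax(BUFFER*2/20)` step in `add_cld_shdw_mask` is an erosion followed by a dilation. As a simplified illustration, the sketch below applies the same idea to a single row of a made-up mask in plain Python. The `focal` helper and the mask values are hypothetical, and a 1-D window stands in for Earth Engine's 2-D kernels.

```python
def focal(values, radius, reducer):
    """Apply `reducer` (min or max) over a sliding window of +/- `radius` cells."""
    return [
        reducer(values[max(0, i - radius): i + radius + 1])
        for i in range(len(values))
    ]


# One row of a hypothetical cloud/shadow mask (1 = cloud or shadow).
# It holds an isolated 1-cell speck and a 7-cell patch.
mask = [0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Erosion (like focalMin(2)): a cell stays 1 only if its whole window is 1,
# so the speck disappears and the patch shrinks by 2 cells per side.
eroded = focal(mask, 2, min)

# Dilation (like focalMax(5), since BUFFER*2/20 = 5): any 1 in the window
# sets the cell to 1, so the surviving patch grows by 5 cells per side.
dilated = focal(eroded, 5, max)

print(eroded)
print(dilated)
```

In this row the isolated speck is removed, while the 7-cell patch ends up 3 cells wider on each side (5 cells of dilation minus 2 cells of erosion).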

In [None]:
s2_sr_cld_col = get_s2_sr_cld_col(buenosaires, START_DATE, END_DATE)

In [None]:
def apply_cld_shdw_mask(img):
    # Subset the cloudmask band and invert it so clouds/shadow are 0, else 1.
    not_cld_shdw = img.select('cloudmask').Not()

    # Subset reflectance bands and update their masks, return the result.
    return img.select('B.*').updateMask(not_cld_shdw)

In [None]:
s2_sr_median = (s2_sr_cld_col.map(add_cld_shdw_mask)
                             .map(apply_cld_shdw_mask)
                             .median())

The result of this preprocessing is a cloud-free Sentinel-2 mosaic: for each pixel, the median of all unmasked observations acquired over the one-year period from April 2020 through March 2021.
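
To see why masking before taking the median matters, here is a minimal sketch for a single pixel in plain Python. The reflectance values, cloud flags, and the `masked_median` helper are made up for illustration; Earth Engine performs the equivalent reduction per pixel and per band on its servers.

```python
from statistics import median


def masked_median(observations, is_cloudy):
    """Median of the observations that are not flagged as cloud or shadow."""
    clear = [v for v, cloudy in zip(observations, is_cloudy) if not cloudy]
    # A pixel with no clear observation stays masked (None here).
    return median(clear) if clear else None


# Hypothetical NIR values (scaled by 1e4) for one pixel over six dates.
nir = [2800, 9500, 9200, 3100, 3000, 8800]
# Flags such as the 'cloudmask' band would provide; bright values are clouds.
cloudy = [False, True, True, False, False, True]

naive = median(nir)                     # clouds pull the median upward
composite = masked_median(nir, cloudy)  # uses only the three clear dates

print(naive, composite)
```

Half of the observations in this toy series are cloudy, so the unmasked median lands between a clear and a cloudy value, while the masked median stays within the range of the clear-sky observations.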

In [None]:
Map.addLayer(border, {}, 'Buenos Aires province border')

# Add layers to the map.
Map.addLayer(s2_sr_median,
                {'bands': ['B11', 'B8', 'B3'], 'min': 225, 'max': 4000, 'gamma': 1.1},
                'S2 cloud-free mosaic')

Map

## Random Forest Classifier

Now that we have our input data (Sentinel-2 mosaic) and our labels (crop and non-crop labels) with training and validation subsets, we can train the Random Forest classifier using the training data points. Then we will use the trained model to predict all pixels in the Buenos Aires ROI as either crop or non-crop. Finally, we will assess the performance metrics for the training and validation subsets.

In [None]:
# Specify and select bands that will be used in the classification.
bands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B9', 'B11', 'B12']

imageCl = s2_sr_median.select(bands)

# Overlay the training points on the imagery to get a training sample; include
# the label property used for training ('binaryclas') in the sample feature collection.
training = imageCl.sampleRegions(
                     collection=training_pts,
                     properties=['binaryclas'],
                     scale=30,
                     tileScale=8
                   ).filter(ee.Filter.notNull(['B1']))  # Remove null pixels.

In [None]:
# Train a random forest classifier with 10 trees and otherwise default parameters.
trainedRf = ee.Classifier.smileRandomForest(numberOfTrees=10).train(features=training,
                                                                    classProperty='binaryclas',
                                                                    inputProperties=bands)
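
For intuition about what an ensemble of trees does, the sketch below trains a much simplified stand-in in plain Python: ten one-split "stumps", each fit on a bootstrap sample and a randomly chosen band, combined by majority vote. This is not how `smileRandomForest` is implemented (real random forests grow full trees and sample features at every split). The reflectance values and all function names here are hypothetical.

```python
import random
from collections import Counter


def fit_stump(samples, labels, rng):
    """Fit a one-split 'tree' on one randomly chosen band."""
    band = rng.randrange(len(samples[0]))
    best = None  # (errors, threshold, label_above)
    for threshold in sorted({s[band] for s in samples}):
        for label_above in (0, 1):
            errors = sum(
                (label_above if s[band] > threshold else 1 - label_above) != y
                for s, y in zip(samples, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return band, best[1], best[2]


def predict_stump(stump, sample):
    band, threshold, label_above = stump
    return label_above if sample[band] > threshold else 1 - label_above


def fit_forest(samples, labels, n_trees=10, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(samples) indices with replacement. Redraw if only
        # one class was drawn (a guard that only matters for tiny toy data).
        while True:
            idx = [rng.randrange(len(samples)) for _ in samples]
            if len({labels[i] for i in idx}) > 1:
                break
        forest.append(fit_stump([samples[i] for i in idx],
                                [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, sample):
    """Majority vote over the individual stump predictions."""
    votes = Counter(predict_stump(stump, sample) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical (red, NIR) reflectances: label 1 = crop, 0 = non-crop.
X = [(0.05, 0.45), (0.06, 0.50), (0.04, 0.40), (0.07, 0.48),
     (0.20, 0.22), (0.25, 0.25), (0.22, 0.20), (0.18, 0.24)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(X, y, n_trees=10)
crop_like = predict_forest(forest, (0.03, 0.55))
bare_like = predict_forest(forest, (0.30, 0.15))
print(crop_like, bare_like)
```

Because the toy classes are cleanly separable in either band, every stump agrees on these two test pixels. With real data the individual trees disagree, and the vote averages out their errors.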

In [None]:
# Apply the trained model to the entire Buenos Aires region
classifiedRf = imageCl.select(bands).classify(trainedRf)

In [None]:
# Clip the classified map to the Buenos Aires province boundary
trainedRf_clipped = classifiedRf.clip(buenosaires.geometry())

In [None]:
# Add the classified crop/non-crop map to the map.
classVis = {'min': 0, 'max': 1, 'palette': ['484848', '00ff00']}
Map.addLayer(trainedRf_clipped, classVis, 'Classes (RF)')

Map

In [None]:
# Compute the confusion matrix for the training data (resubstitution accuracy)
trainAccuracyRf = trainedRf.confusionMatrix()

# Print model accuracy results.
print('##### TRAINING ACCURACY #####')
print('RF: overall accuracy:', trainAccuracyRf.accuracy().getInfo())
print('RF: user accuracy:', trainAccuracyRf.consumersAccuracy().getInfo())
print('RF: producer accuracy:', trainAccuracyRf.producersAccuracy().getInfo())

In [None]:
# Print the error matrix for the training set
print('RF: error matrix:', trainAccuracyRf.getInfo())

In [None]:
# Extract band pixel values for validation points.
validation = imageCl.sampleRegions(
                       collection=val_pts,
                       properties=['binaryclas'],
                       scale=10,
                       tileScale=8
                     ).filter(ee.Filter.notNull(['B1']))  # Remove null pixels.

In [None]:
# Classify the validation data.
validatedRf = validation.classify(trainedRf)

In [None]:
# Calculate the validation error matrix and accuracy by comparing the reference
# labels ('binaryclas') with the predicted labels ('classification'). Unlike the
# training (resubstitution) accuracy, these points were not used to fit the model.

validationAccuracyRf = validatedRf.errorMatrix('binaryclas', 'classification')

# Print validation accuracy results.
print('##### VALIDATION ACCURACY #####')
print('RF: overall accuracy: ', validationAccuracyRf.accuracy().getInfo())
print('RF: user accuracy:', validationAccuracyRf.consumersAccuracy().getInfo())
print('RF: producer accuracy:', validationAccuracyRf.producersAccuracy().getInfo())

print('RF: error matrix: ', validationAccuracyRf.getInfo())
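
To make the three metrics concrete, the sketch below computes them by hand from a small error matrix in plain Python. The counts are made up, and the sketch assumes rows hold the reference (actual) classes and columns the predicted classes, in the order [non-crop, crop]. Producer's accuracy is the fraction of each reference class that was mapped correctly, and user's (consumer's) accuracy is the fraction of each predicted class that is correct.

```python
def accuracy_metrics(matrix):
    """Overall, producer's, and user's accuracy from a square error matrix.

    Assumes matrix[i][j] counts samples of reference class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(row) for row in matrix]  # reference totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # predicted totals
    return {
        "overall": correct / total,
        "producers": [matrix[i][i] / row_totals[i] for i in range(n)],
        "users": [matrix[j][j] / col_totals[j] for j in range(n)],
    }


# Hypothetical counts: rows = reference [non-crop, crop], columns = predicted.
error_matrix = [
    [180, 20],  # 200 reference non-crop points, 20 mistaken for crop
    [30, 70],   # 100 reference crop points, 30 missed
]

metrics = accuracy_metrics(error_matrix)
print(metrics)
```

In this example the overall accuracy (250 of 300 correct) looks healthy, but the producer's accuracy for crop is only 70/100, which is why the per-class metrics are worth checking when one class dominates the sample.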

## Export the classified map

We can export the classified map to Google Drive so that we can use it for downstream analysis (e.g., area estimation, yield prediction, or crop condition monitoring).

In [None]:
# Export the classified map (RF) to Google Drive; alter the command to export to
# other endpoints. EPSG:32721 is UTM zone 21S, which covers the eastern part of
# Buenos Aires province (the western part falls in zone 20S, EPSG:32720).
mytask = ee.batch.Export.image.toDrive(
    image=trainedRf_clipped.byte(),
    description='cropland-argentina-rf-20220906',
    scale=10,
    region=buenosaires.geometry(),
    maxPixels=1e13,
    crs='EPSG:32721')

mytask.start()
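
As a quick sanity check on the `scale` and `maxPixels` settings, the sketch below estimates how many 10 m pixels the province contains. The area figure (about 307,571 km²) is an approximate value assumed here for illustration, and the `estimate_pixels` helper is hypothetical. The exported raster may also cover the region's bounding rectangle, so treat the result as a rough lower bound.

```python
AREA_KM2 = 307_571       # assumed approximate area of Buenos Aires province
SCALE_M = 10             # export pixel size in meters
MAX_PIXELS = int(1e13)   # the maxPixels value passed to the export


def estimate_pixels(area_km2, scale_m):
    """Number of scale_m x scale_m pixels needed to cover area_km2."""
    area_m2 = area_km2 * 1_000_000
    return area_m2 // (scale_m * scale_m)


pixels = estimate_pixels(AREA_KM2, SCALE_M)
within_limit = pixels < MAX_PIXELS

print(f"{pixels:,} pixels; within maxPixels: {within_limit}")
```

At roughly three billion pixels, the export is well under the `maxPixels` limit, but a single-byte raster of that size is still on the order of gigabytes before compression, so expect the task to take a while.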