<img src="https://avatars.githubusercontent.com/u/74911464?s=200&v=4"
     alt="OpenEO Platform logo"
     style="float: left; margin-right: 10px;" />
# OpenEO Platform - UC9
Dynamic large area land cover mapping

In [2]:
import openeo
import geopandas as gpd
import pandas as pd
from openeo_classification.landuse_classification import *
from datetime import date
import ipywidgets as widgets
import datetime
import json

## Objectives and approach

In this notebook we study land cover (LC) mapping. Land cover mapping has been done since the onset of remote sensing, and LC products have been identified as a fundamental variable for studying the functional and morphological changes occurring in the Earth's ecosystems and environment. They therefore play an important role in studying climate change and carbon circulation (Congalton et al., 2014; Feddema et al., 2005; Sellers et al., 1997). In addition, land cover mapping provides valuable information for policy development and for a wide range of applications within the natural and life sciences, making it one of the most widely studied applications within remote sensing (Yu et al., 2014; Tucker et al., 1985; Running, 2008; Yang et al., 2013).

With this variety in application fields comes a variety of user needs. Depending on the use case, there may be large differences in the target labels desired, the target year(s) requested, the output resolution needed, the feature set used, the stratification strategy employed, and more. The goal of this use case is to show that OpenEO as a platform can deal with this variability. We do so by creating a user-friendly interface in which the user can set a variety of parameters that tailor the pipeline (from reference set and L2A + GRD input, to model, to inference) to the user's needs.

## Methodology

#### Reference data 
The reference dataset used in this project is the Land Use/Cover Area frame Survey (LUCAS) Copernicus dataset. LUCAS is an evenly spaced in-situ land use and land cover ground survey that covers the entire European Union. The Copernicus module extends up to 51 m in the four cardinal directions from each survey point to delineate a polygon around it. The final product contains about 60,000 polygons, from which points can subsequently be sampled (d'Andrimont et al., 2021). The user can specify how many points to sample from these polygons to train their model. In addition, the user can upload extra target data to improve performance.
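
As a rough illustration of sampling points from a reference polygon, the sketch below draws random points inside a polygon by rejection sampling. It is not the code behind getReferenceSet: the function names, the diamond-shaped example polygon and the fixed seed are assumptions made for this example.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(ring, n_points, seed=0):
    """Draw n_points uniformly inside the polygon by rejection sampling."""
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    points = []
    while len(points) < n_points:
        # Propose a point in the bounding box and keep it only if it is inside.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical polygon reaching 51 m in each cardinal direction from the survey point.
diamond = [(0, 51), (51, 0), (0, -51), (-51, 0)]
samples = sample_points(diamond, n_points=5)
```

Here n_points plays the role of the "number of times to point sample each reference polygon" setting in the interface.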

#### Input data
The service runs on features constructed from GRD sigma0 and L2A data. These data are accessed through OpenEO Platform from Terrascope and SentinelHub. The extracted data span 01-01-2018 to 31-12-2018, corresponding to the collection year of the LUCAS dataset (2018) on which the model was trained. Data from other years can be extracted for prediction, provided that the user uploads their own reference set.

From S2: calculation of 7 indices (NDVI, NDMI, NDGI, ANIR, NDRE1, NDRE2, NDRE5) and keeping 2 bands (B06, B12)
From S1: VV, VH and VV/VH
For all of these, 10 features: p25, p50, p75, sd and 6 t-steps, with flexible range

#### Preprocessing
The L2A data are masked using the Sen2Cor scene classification (SCL), with a buffering approach developed at VITO and made available as a process called mask_scl_dilation.
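
To give an idea of what a buffered scene-classification mask does, here is a minimal pure-Python sketch. It is a simplified stand-in, not the actual mask_scl_dilation process: the set of SCL classes treated as invalid, the square buffer and the radius parameter are assumptions for illustration.

```python
# Assumption: SCL classes masked in this example
# (3 = cloud shadow, 8 and 9 = cloud medium/high probability, 10 = thin cirrus).
BAD_SCL = {3, 8, 9, 10}


def dilated_mask(scl, radius=1):
    """Flag every pixel within `radius` pixels of an invalid SCL pixel."""
    rows, cols = len(scl), len(scl[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if scl[r][c] in BAD_SCL:
                # Grow the invalid pixel into a square buffer, clipped at the edges.
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        mask[rr][cc] = True
    return mask


def apply_mask(band, mask):
    """Replace masked pixels by None (the NA value in this sketch)."""
    return [
        [None if flagged else value for value, flagged in zip(band_row, mask_row)]
        for band_row, mask_row in zip(band, mask)
    ]


# 4 x 4 scene of vegetation pixels (class 4) with a single cloud pixel (class 9).
scl = [[4] * 4 for _ in range(4)]
scl[1][1] = 9
band = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = dilated_mask(scl, radius=1)
masked_band = apply_mask(band, mask)
```

The buffer removes pixels next to detected clouds as well, since cloud edges and shadows are often missed by the per-pixel classification.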

#### Feature engineering
From the L2A data, 7 indices are calculated: NDVI, NDMI, NDGI, NDRE1, NDRE2, NDRE5 and ANIR. After calculating the indices, all bands except B06 and B12 are dropped. The outputs are rescaled to the range 0 to 30000 for computational efficiency. The indices are aggregated temporally with a step size of 10 days and an overlap of 10 days by taking the median. The output is then interpolated linearly to obtain a time series: an interpolated value is calculated for every NA value, except for leading and trailing NAs.
From the Sentinel-1 GRD collection, backscatter is calculated. The ratio of VV to VH backscatter is calculated and rescaled to 0-30000. The time series is then aggregated temporally with a step size of 12 and an overlap of 6 by taking the mean. The interpolation step is repeated here, even though this set should not contain any missing values. Next, the S1 cube is resampled spatially.
Next, the two datacubes are merged and 10 features are calculated for each band. These 10 features are the standard deviation, the 25th, 50th and 75th percentiles, and 6 equidistant t-steps. This procedure yields a total of 120 features (12 bands x 10 features).
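
The interpolation and feature steps described above can be sketched for a single band and a single pixel as follows. This is a plain-Python illustration, not the code in openeo_classification: the function names, the use of None for NA, the "inclusive" percentile method and the sample standard deviation are all assumptions.

```python
import statistics


def interpolate_interior(series):
    """Linearly fill None gaps that have a valid value on both sides.

    Leading and trailing None values are left untouched.
    """
    out = list(series)
    valid = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = out[left] + frac * (out[right] - out[left])
    return out


def band_features(series, n_steps=6):
    """Return [p25, p50, p75, sd, t1..t6] for one band time series."""
    values = [v for v in series if v is not None]
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    sd = statistics.stdev(values)
    # Pick n_steps equally spaced positions along the time axis.
    last = len(series) - 1
    idx = [round(k * last / (n_steps - 1)) for k in range(n_steps)]
    t_steps = [series[i] for i in idx]
    return [p25, p50, p75, sd] + t_steps


filled = interpolate_interior([None, 0.0, None, None, 3.0, None])
features = band_features([float(i) for i in range(11)])
n_total = 12 * len(features)  # 12 bands x 10 features
```

With 12 bands (7 indices, B06, B12, VV, VH and VV/VH), repeating band_features per band gives the 120 features mentioned in the text.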

#### Model
Where models previously had to be trained outside of openEO, we can now train Random Forest models in openEO itself. Hyperparameter tuning can be performed using a custom hyperparameter set, and models can be constructed using either feature fusion or decision fusion: either combine all S1 and S2 features and train one model, or train one model on the S1 features and one on the S2 features and combine the models later. Random Forest was chosen because it is a fairly basic algorithm that trains rather quickly. Models can be trained using a stratification layer uploaded by the user, resulting in one model per stratification class. After training, the model(s) are validated and then used for prediction.
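
The text does not specify how the S1 and S2 models are combined under decision fusion. One common option, shown here purely as an assumption, is to average the class probabilities of the two models and pick the most probable class. The function name and the probability values are hypothetical.

```python
def decision_fusion(prob_s1, prob_s2):
    """Average the class probabilities of two models and pick the best class.

    prob_s1 and prob_s2 map class labels to probabilities for one pixel.
    """
    classes = sorted(set(prob_s1) | set(prob_s2))
    fused = {
        label: (prob_s1.get(label, 0.0) + prob_s2.get(label, 0.0)) / 2
        for label in classes
    }
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical per-pixel probabilities from the S1 model and the S2 model.
s1_probabilities = {"A00": 0.6, "B00": 0.4}
s2_probabilities = {"A00": 0.2, "B00": 0.8}
label, fused = decision_fusion(s1_probabilities, s2_probabilities)
```

With feature fusion this step is not needed, because a single model sees the S1 and S2 features together.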

## Implementation
First, create an area of interest for which you want to do this classification, and optionally a stratification layer if you would like to make use of stratification. Tune the other parameters to your preference.

In [11]:
### TODO: Also a drawing functionality for the user (for AOI, stratification layer) ?

In [3]:
train_test_split, algorithm, nrtrees, mtry, fusion_technique, aoi, strat_layer, start_date, end_date, nr_targets, nr_spp = getStartingWidgets()

Box(children=(Label(value='Train / test split:'), FloatSlider(value=0.75, max=1.0, step=0.05)))

Dropdown(description='Model:', disabled=True, options=('Random Forest',), value='Random Forest')

Box(children=(Label(value='Hyperparameters RF model:'), IntText(value=1000, description='Nr trees:'), IntText(…

Box(children=(Label(value='S1 / S2 fusion:'), RadioButtons(options=('Feature fusion', 'Decision fusion'), valu…

FileUpload(value={}, accept='.geojson,.shp', description='Upload AOI', layout=Layout(width='20em'))

FileUpload(value={}, accept='.geojson,.shp', description='Upload stratification', layout=Layout(width='20em'))

DatePicker(value=datetime.date(2018, 1, 1), description='Start date')

DatePicker(value=datetime.date(2018, 12, 31), description='End date')

Box(children=(Label(value='Select the amount of target classes:'), IntSlider(value=10, max=37, min=2)))

Box(children=(Label(value='Select the amount of times you want to point sample each reference polygon:'), IntS…

In [4]:
target_classes = getTargetClasses(nr_targets)

SelectMultiple(description='Target class', options=('A00: Artificial land', 'A10: Roofed built-up areas', 'A20…

SelectMultiple(description='Target class', options=('A00: Artificial land', 'A10: Roofed built-up areas', 'A20…

SelectMultiple(description='Target class', options=('A00: Artificial land', 'A10: Roofed built-up areas', 'A20…

SelectMultiple(description='Target class', options=('A00: Artificial land', 'A10: Roofed built-up areas', 'A20…

In [5]:
y = getReferenceSet(aoi, nr_spp, target_classes)

Loading in the LUCAS Copernicus dataset...
Finished loading data.
Extracting points and converting target labels...
Finished extracting points and converting target labels


In [6]:
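def getStrata():
    # Without an uploaded stratification layer, the AOI itself is the only stratum.
    if len(strat_layer.value) == 0:
        strata = gpd.GeoDataFrame.from_features(json.loads(aoi.data[0]))
    else:
        # Otherwise every feature in the uploaded layer becomes a stratum.
        strata = gpd.GeoDataFrame.from_features(json.loads(strat_layer.data[0]))
    return strata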

strata = getStrata()

In [None]:
### TODO: Maybe add a check that the AOI and the stratification layers at least overlap?
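
One coarse way to implement the check mentioned in the TODO above is a bounding-box test. The sketch below is pure Python with hypothetical coordinates and function names; in the notebook itself, a geometry-level test such as the geopandas intersects method on the strata would likely be more precise.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes share at least a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def strata_overlapping_aoi(aoi_ring, strata_rings):
    """Indices of the strata whose bounding box overlaps that of the AOI."""
    aoi_box = bounding_box(aoi_ring)
    return [
        index
        for index, ring in enumerate(strata_rings)
        if boxes_overlap(aoi_box, bounding_box(ring))
    ]


# Hypothetical AOI and two strata (longitude, latitude); only the first overlaps.
aoi_ring = [(4.0, 50.0), (5.0, 50.0), (5.0, 51.0), (4.0, 51.0)]
strata_rings = [
    [(4.5, 50.5), (6.0, 50.5), (6.0, 52.0), (4.5, 52.0)],
    [(10.0, 40.0), (11.0, 40.0), (11.0, 41.0), (10.0, 41.0)],
]
overlapping = strata_overlapping_aoi(aoi_ring, strata_rings)
```

An empty result would mean that no stratum can contribute reference points inside the AOI, which is worth reporting before any training job is submitted.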

Now we will start to fit our model. If you are using stratification, you will construct a model for every stratum you have specified. If you are using decision fusion, you will additionally have separate models for the Sentinel-1 and the Sentinel-2 features. If you are using feature fusion, the S1 and S2 features are merged and you train one model per stratum.

In [9]:
def fitRandomForestModel(feature_raster, y):
    # Build the feature cube ("s1", "s2" or "both") for the selected date range.
    features, feature_list = load_lc_features(feature_raster, y, start_date.value, end_date.value)
    X = features.aggregate_spatial(json.loads(y.to_json()), reducer="mean")
    # Fit the Random Forest with the hyperparameters chosen in the widgets.
    ml_model = X.fit_class_random_forest(
        target=json.loads(y.to_json()),
        training=train_test_split.value,
        num_trees=nrtrees.value,
        mtry=mtry.value,
    )
    # Save the model through a batch job and return that job's id.
    model = ml_model.save_ml_model()
    training_job = model.create_job()
    training_job.start_and_wait()
    return training_job.job_id

In [10]:
jobids = {}
for index, stratum in enumerate(strata["geometry"]):
    y_final = gpd.clip(y, stratum)
    if fusion_technique.value == "Decision fusion":
        jobids["s1_stratum"+str(index)] = fitRandomForestModel("s1", y_final)
        jobids["s2_stratum"+str(index)] = fitRandomForestModel("s2", y_final)
    else:
        jobids["both_stratum"+str(index)] = fitRandomForestModel("both", y_final)

Authenticated using refresh token.
Authenticated using refresh token.
0:00:00 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': send 'start'
0:01:06 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': queued (progress N/A)
0:01:11 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': queued (progress N/A)
0:01:18 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': queued (progress N/A)
0:01:26 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': queued (progress N/A)
0:01:37 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': queued (progress N/A)
0:01:49 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': queued (progress N/A)
0:02:06 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': queued (progress N/A)
0:02:25 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': queued (progress N/A)
0:02:50 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': running (progress N/A)
0:03:24 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': running (progress N/A)
0:04:01 Job 'd458856e-b315-4d13-a369-a2c86f3adaf2': running (progress N/A)
0:04:49 Job 'd458856e-b315-4d13-a369-a2c86f3adaf

After training the respective models, we can do inference. In general, you would first run inference on a test set, so that you can calculate accuracy metrics such as the overall accuracy and the F-score, and/or create a confusion matrix.
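
The metrics named above can be computed from the true and predicted labels of a test set. The following is a minimal pure-Python sketch with hypothetical labels; it is independent of openEO and only illustrates the definitions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def overall_accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def f1_scores(matrix, labels):
    """Per-class F1 score: 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp
        fn = sum(matrix[i]) - tp
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


# Hypothetical test-set labels for three classes.
labels = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

matrix = confusion_matrix(y_true, y_pred, labels)
accuracy = overall_accuracy(matrix)
f1 = f1_scores(matrix, labels)
```

Per-class F1 scores are useful next to the overall accuracy because land cover classes are often imbalanced, so a high overall accuracy can hide poorly predicted minority classes.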

In [14]:
cube = features.filter_spatial(gpd.GeoDataFrame.from_features(json.loads(list(aoi.value.values())[0]["content"]))["geometry"][0])
predicted = cube.predict_random_forest(
    model="f18fe77e-8767-4132-9977-43e13ae33999",
    dimension="bands"
)

nieuwe_job = predicted.execute_batch(format="GTiff")

0:00:00 Job '490c49ec-6cbd-4664-bf79-dc924106cc1a': send 'start'


OpenEoApiError: [500] unknown: Did not find value which can be converted into java.lang.String

## References
d'Andrimont, R., et al. (2021). LUCAS Copernicus 2018: Earth-observation-relevant in situ data on land cover and use throughout the European Union. Earth System Science Data, 13(3), 1119-1133.

Congalton, R. G., Gu, J., Yadav, K., Thenkabail, P., & Ozdogan, M. (2014). Global land cover mapping: A review and uncertainty analysis. Remote Sensing, 6(12), 12070-12093.

Feddema, J. J., Oleson, K. W., Bonan, G. B., Mearns, L. O., Buja, L. E., Meehl, G. A., & Washington, W. M. (2005). The importance of land-cover change in simulating future climates. Science, 310(5754), 1674-1678.

Running, S. W. (2008). Ecosystem disturbance, carbon, and climate. Science, 321(5889), 652-653.

Sellers, P. J., Dickinson, R. E., Randall, D. A., Betts, A. K., Hall, F. G., Berry, J. A., ... & Henderson-Sellers, A. (1997). Modeling the exchanges of energy, water, and carbon between continents and the atmosphere. Science, 275(5299), 502-509.

Tucker, C. J., Townshend, J. R., & Goff, T. E. (1985). African land-cover classification using satellite data. Science, 227(4685), 369-375.

Yang, J., Gong, P., Fu, R., Zhang, M., Chen, J., Liang, S., ... & Dickinson, R. (2013). The role of satellite remote sensing in climate change studies. Nature climate change, 3(10), 875-883.

Yu, L., Liang, L., Wang, J., Zhao, Y., Cheng, Q., Hu, L., ... & Gong, P. (2014). Meta-discoveries from a synthesis of satellite-based land-cover mapping research. International Journal of Remote Sensing, 35(13), 4573-4588.