### CBERS DATA CUBE: A POWERFUL TECHNOLOGY FOR MAPPING AND MONITORING BRAZILIAN BIOMES (Replicability test)

**Abstract**

<div style="text-align: justify">
Currently, the overwhelming amount of Earth Observation data demands new solutions for processing and storage. To reduce the time spent searching, downloading and pre-processing data, the remote sensing community is coming to an agreement on the minimum set of corrections satellite images must undergo in order to serve the broadest range of applications. Satellite imagery meeting such criteria (which usually include atmospheric, radiometric and topographic corrections) is generically called Analysis Ready Data (ARD). Furthermore, ARD is being assembled into multidimensional data cubes, minimising pre-processing tasks and allowing scientists and users in general to focus on analysis. A particular instance of this is the Brazil Data Cube (BDC) project, which is processing remote sensing images of medium spatial resolution into ARD datasets and assembling them as multidimensional cubes of the Brazilian territory. For example, BDC users are released from performing tasks such as image co-registration and aerosol interference correction. This work presents a BDC proof of concept by analysing a BDC data cube, made with images from the fourth China-Brazil Earth Resources Satellite (CBERS-4), of one of the largest biodiversity hotspots in the world, the Cerrado biome. It also shows how to map and monitor land use and land cover using the CBERS data cube. We demonstrate that the CBERS data cube is effective in resolving land use and land cover issues to meet local and national needs related to landscape dynamics, including deforestation, carbon emissions, and public policies.
</div>    

**DOI**: [10.5194/isprs-annals-V-3-2020-533-2020](https://doi.org/10.5194/isprs-annals-V-3-2020-533-2020)

### Cerrado Biome classification using CBERS datacubes

This document presents the steps used to generate the classification map shown in the paper. As described in the article, the classification was done using the [SITS R package](https://github.com/e-sensing/sits).


In [None]:
#
# Seed for reproducible results
#
set.seed(777)

**Parameters**

In [None]:
#
# Computational resources
#
memsize <- 2
multicores <- 2

#
# classification definitions
#
num_trees <- 1000

#
# post-processing definition
#
smoothing <- "bayesian"

#
# shapefile
#
shp_filename <- ""
shp_directory <- ""
shp_class_attribute <- "label"

#
# bricks configurations
#
bands <- ""
timeline <- ""

bricks_dir <- ""
bricks_names <- ""

#
# output
#
output_dir <- ""

**Processing input**

In [1]:
#
# samples
#
input_samples_shapefile <- paste(shp_directory, shp_filename, sep = "/")

#
# extract bands
#
bands <- unlist(strsplit(bands, ","))

#
# extract timeline
#
timeline <- unlist(strsplit(timeline, ","))

#
# extract brick names
#
bricks_names <- unlist(strsplit(bricks_names, ","))
bricks <- unlist(lapply(X = bricks_names, FUN = function(x) {
    paste(bricks_dir, x, sep = "/")
}))
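
As a small illustration of the parsing step above, the sketch below applies the same `strsplit` approach to made-up parameter values. The band names, directory, and file names are hypothetical placeholders, not the configuration used in the article, and the `example_*` variable names are chosen so they do not overwrite the notebook's own parameters.

```r
# Hypothetical parameter values (placeholders, not the article's configuration)
example_bands_param  <- "ndvi,evi"
example_dir          <- "/data/bricks"
example_names_param  <- "ndvi_brick.tif,evi_brick.tif"

# Split the comma-separated strings into character vectors
example_bands <- unlist(strsplit(example_bands_param, ","))
example_names <- unlist(strsplit(example_names_param, ","))

# Build one full path per brick file
example_bricks <- file.path(example_dir, example_names)

print(example_bands)
print(example_bricks)
```

Each comma-separated parameter becomes a character vector, and every brick name is turned into a full file path.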

**Output directory**

In [None]:
dir.create(paste(output_dir, "out", sep = "/"), recursive = TRUE)

**Libraries**

In [None]:
library(sits)
library(rgdal)

**Generating datacube using RasterBricks**

The classification was done using RasterBricks. In general, a RasterBrick represents a multi-layer raster file in which each layer holds one instant of time for a given place and spectral band. Ten RasterBricks are used here, one for each spectral band.
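
To make the brick layout concrete, the sketch below mimics a single brick with a plain R array whose third dimension is time. It is a toy stand-in written in base R only; it does not use the `raster` package or real CBERS-4 data, and the dimensions, dates, and values are invented for illustration.

```r
# Toy stand-in for one brick (one spectral band): rows x columns x time steps
n_rows       <- 3
n_cols       <- 4
toy_timeline <- as.Date("2020-01-01") + seq(0, by = 16, length.out = 5)

# Fill the array with consecutive integers so the values are easy to follow
toy_brick <- array(seq_len(n_rows * n_cols * length(toy_timeline)),
                   dim = c(n_rows, n_cols, length(toy_timeline)))

# Time series of one pixel (row 2, column 3): one value per time step
pixel_series <- toy_brick[2, 3, ]
print(data.frame(date = toy_timeline, value = pixel_series))
```

Reading along the third dimension at a fixed row and column yields that pixel's time series for the band, which is the kind of data the classifier is trained on.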


In [None]:
brick_cube <- sits_cube(
                   type      = "BRICK",
                   name      = "Picoli-etal_CUBE",
                   satellite = "CBERS-4",
                   sensor    = "AWFI",
                   timeline  = timeline,
                   bands     = bands,
                   files     = bricks)

**Classification**

The classification can now be performed. As presented in the article, it uses the Random Forest algorithm.


Load the samples used to train the Random Forest algorithm:

In [None]:
#
# extract time-series
#
samples <- sits_get_data(brick_cube, file = input_samples_shapefile, shp_attr = shp_class_attribute)

#
# show extracted time-series
#
head(samples$time_series[[1]], 4)

**K-Fold training**

Classifying an entire data cube can take a while, so a K-Fold Cross-Validation is performed first to check the results. The model is trained on different partitions of the data set, and the overall accuracy is the average of the accuracies obtained across those partitions.

> The configuration below uses five folds (the `K` in K-Fold), so in each round 80% of the data is used for training and 20% for testing.

> This is done to compare the results of the new cube with those of the old one used in the original article.
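
The fold arithmetic can be sketched in base R. This only illustrates how five folds lead to an 80/20 split in each round; it is not a description of how `sits_kfold_validate` works internally, and the sample count of 100 is an arbitrary assumption.

```r
n_samples <- 100   # arbitrary number of samples, for illustration only
k         <- 5     # number of folds

# Assign each sample to one of k folds (round-robin here; real
# implementations typically shuffle the assignment first)
fold_id <- rep(seq_len(k), length.out = n_samples)

# For each fold, count the samples used for training and held out for testing
split_sizes <- t(sapply(seq_len(k), function(fold) {
    c(fold  = fold,
      train = sum(fold_id != fold),
      test  = sum(fold_id == fold))
}))
print(split_sizes)
```

Every sample is used for testing exactly once and for training in the remaining four rounds.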


In [None]:
sits_kfold_validate(samples, 
                     folds     = 5,
                     ml_method = sits_rfor(num_trees = num_trees)) %>%
                     sits_conf_matrix()

**Train a Random Forest model**

In [None]:
rfor_model <- sits_train(data      = samples,
                         ml_method = sits_rfor(num_trees = num_trees))

**Classify the datacube**

> This is a time-consuming process


In [None]:
probs <- sits_classify(data       = brick_cube,
                       ml_model   = rfor_model,
                       memsize    = memsize,
                       multicores = multicores,
                       output_dir = output_dir)

**Generate classification label map**

In [None]:
#
# smoothing using a 5x5 window (sits default in v0.10.0)
#
probs_smoothed <- sits_smooth(probs, 
                              type       = smoothing, 
                              output_dir = output_dir)

In [None]:
#
# generate labels from the smoothed probabilities computed above
#
labels <- sits_label_classification(cube       = probs_smoothed,
                                    output_dir = output_dir)

**Save the results**

In [None]:
# file.path() inserts the path separator that paste0() would omit
saveRDS(probs, file = file.path(output_dir, "probs_cube.rds"))
save(probs,    file = file.path(output_dir, "probs_cube.Rdata"))

In [None]:
# file.path() inserts the path separator that paste0() would omit
saveRDS(labels, file = file.path(output_dir, "labels_cube.rds"))
save(labels,    file = file.path(output_dir, "labels_cube.Rdata"))