# Land Cover Mapping using Digital Earth Africa Sandbox

## Background

Earth observation (EO) uses remote sensing to gather and interpret information about the Earth’s state. Remote sensing techniques can produce continuous and categorical maps of the properties of the Earth’s surface. Land cover mapping is one such application: it records the physical cover of the Earth’s surface. Land cover can be classified into broad categories such as water, crops, or built area, or into more specific categories such as forest, grassland, shrubland, perennial cropland, annual cropland, wetland, water body and urban settlement.

Land cover maps provide insight into environmental change, which helps inform researchers and decision makers in the public, private and non-profit sectors.

In the past, accessing detailed and up-to-date land cover data has been difficult. Classification of satellite images using supervised machine learning (ML) techniques is now common in the remote sensing literature. Machine learning offers an effective and relatively efficient means of identifying complex land cover classes. However, sensibly implementing machine learning classifiers is not always straightforward owing to the training data requirements, the computational requirements, and the challenge of sorting through a proliferating number of software libraries. Add to this the complexity of handling large volumes of satellite data and the task can become unwieldy at best.

## About the Sandbox

The Digital Earth Africa (DE Africa) Analysis Sandbox is a cloud-based computational platform that operates through a JupyterLab environment. It provides users with access to data and analysis tools, democratising access to remote-sensing data to allow for ad-hoc report generation and rapid development of new algorithms.

### Open Data Cube 

DE Africa is based on the [Open Data Cube (ODC)](https://www.opendatacube.org/) infrastructure. The ODC is an open-source solution for accessing, managing, and analysing large quantities of Earth observation (EO) data, in particular time-series satellite imagery. The ODC consists of: 

* Multi-dimensional (space, time, data type) time-series spatial data
* Freely accessible analysis-ready-data (ARD)
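
As a hedged illustration of how the space and time dimensions are addressed, the sketch below builds a query for a small area and period. `build_query` and `load_cube` are hypothetical helper names, and the product name `s2_l2a`, the measurement names, the output CRS and the resolution are assumptions that should be checked against the products available in your Sandbox. `load_cube` is only defined here, not called, because it needs the `datacube` library and an indexed database.

```python
def build_query(lon, lat, buffer_deg, time_range):
    """Build the spatial and temporal part of an ODC load query.

    Hypothetical helper: the box is centred on (lon, lat) and extends
    `buffer_deg` degrees in each direction.
    """
    return {
        "x": (lon - buffer_deg, lon + buffer_deg),
        "y": (lat - buffer_deg, lat + buffer_deg),
        "time": time_range,
    }


def load_cube(query):
    """Load a small data cube; runs only where `datacube` is installed."""
    import datacube  # imported lazily so the rest of the sketch runs anywhere

    dc = datacube.Datacube(app="land_cover_intro")
    return dc.load(
        product="s2_l2a",                              # assumed product name
        measurements=["red", "green", "blue", "nir"],  # assumed band names
        output_crs="EPSG:6933",                        # assumed projection
        resolution=(-10, 10),                          # assumed 10 m pixels
        **query,
    )


# Illustrative location and period, not taken from the workshop data.
query = build_query(lon=36.75, lat=-1.25, buffer_deg=0.25,
                    time_range=("2021-01", "2021-03"))
print(query)
```

On the Sandbox, `ds = load_cube(query)` would be expected to return an `xarray.Dataset` with `time`, `y` and `x` dimensions and one variable per measurement.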

## The workflow
This notebook series introduces the context of this workshop, explains how to get started with DE Africa, and guides you through the steps of using machine learning and satellite images for land cover mapping in the DE Africa Sandbox.
There are five primary notebooks in this series (along with an optional sixth notebook), each representing a critical step in the land cover ML workflow.

1. `0_Extract_Training_Features.ipynb`: Explores how to extract training data (feature layers) from the ODC using geometries within a shapefile (or GeoJSON). The goal of this notebook is to familiarise you with the `collect_training_data` function so you can extract the appropriate data for your use case.
2. `1_Filter_Training_Data.ipynb`: Explores how to filter the training data (feature layers) extracted from the ODC using geometries within a shapefile (or GeoJSON). The notebook trains and applies k-means clustering, then filters minor clusters out of the training data.
3. `2_Evaluate_Fit_Classifier.ipynb`: Using the training data prepared in the previous notebooks, this notebook first evaluates the accuracy of a Random Forest model (using k-fold cross-validation), performs a hyperparameter optimization, and then fits a model on the training data. It also helps you identify the most important features.
4. `3_Land_Cover_Classification.ipynb`: This is where we load in satellite data and classify it using the model created in the previous notebook. The notebook initially asks you to provide a number of small test locations so we can visually check how well the model classifies real data.
5. `4_Post_Processing.ipynb`: This notebook implements morphological processing and rule-based reclassification using information from external layers.
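
To make the filtering step in the second notebook more concrete, here is a minimal sketch of removing minor clusters, assuming cluster labels have already been assigned to each training sample (for example by k-means). The function name `filter_minor_clusters`, the `min_fraction` threshold and the toy data are hypothetical; the notebook's own implementation may differ.

```python
from collections import Counter


def filter_minor_clusters(samples, cluster_labels, min_fraction=0.05):
    """Keep only samples whose cluster holds at least `min_fraction` of all samples."""
    if len(samples) != len(cluster_labels):
        raise ValueError("samples and cluster_labels must have the same length")

    counts = Counter(cluster_labels)  # number of samples in each cluster
    total = len(cluster_labels)
    major = {c for c, n in counts.items() if n / total >= min_fraction}

    return [s for s, c in zip(samples, cluster_labels) if c in major]


# Toy feature vectors for one land cover class; the last sample is an outlier.
samples = [[0.10, 0.20], [0.12, 0.19], [0.11, 0.21], [0.13, 0.22], [0.90, 0.95]]
cluster_labels = [0, 0, 0, 0, 1]  # assumed output of a clustering step

filtered = filter_minor_clusters(samples, cluster_labels, min_fraction=0.25)
print(len(samples), "->", len(filtered))
```

The idea is that samples falling into very small clusters are likely to be mislabelled or unrepresentative, so dropping them can give the classifier cleaner training data. The threshold is a judgment call and is worth tuning per class.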

## Important notes
* For this workshop, we use prepared datasets in the '/Data' folder for demonstration. If you wish to begin running your own classification workflow, the first step is to replace the required datasets with your own in the folder and replace the corresponding file paths in the relevant notebooks.

* There are many different ML models for land cover classification problems. In this workshop we use the Random Forest classifier, which was applied in our project work. It's advisable to research different methods for evaluating and training a model to determine which approach is best for you.

* These notebooks rely on [dask](https://dask.org/) (and [dask-ml](https://ml.dask.org/)) to manage memory and distribute the computations across multiple cores. However, the notebooks are set up for running on a single machine. For example, if your machine has 2 cores and 16 GB of RAM (these are the specs on the default Sandbox), then you'll only be able to load and classify data up to that 16 GB limit (and parallelization will be limited to 2 cores). Access to larger machines is required to scale analyses to very large areas. It's unlikely you'll be able to use these notebooks to classify satellite data at the country-level scale using laptop-sized machines. To better understand how we use dask, have a look at the [dask notebook](../../Beginners_guide/08_Parallel_processing_with_dask.ipynb).
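
To get a feel for the memory limit described above, the following back-of-the-envelope sketch estimates the size of a dense data cube. The function name `estimate_gib` is hypothetical, and the band count, number of time steps and 4-byte (float32) values are illustrative assumptions rather than properties of any particular product.

```python
def estimate_gib(width_px, height_px, n_bands, n_times, bytes_per_value=4):
    """Approximate in-memory size, in GiB, of a dense (time, band, y, x) cube."""
    n_values = width_px * height_px * n_bands * n_times
    return n_values * bytes_per_value / 1024**3


# A 100 km x 100 km area at 10 m resolution is 10,000 x 10,000 pixels.
tile_gib = estimate_gib(10_000, 10_000, n_bands=10, n_times=12)

# A 20 km x 20 km test area is 2,000 x 2,000 pixels.
test_area_gib = estimate_gib(2_000, 2_000, n_bands=10, n_times=12)

RAM_GIB = 16  # default Sandbox memory, as stated above
for name, size in [("100 km tile", tile_gib), ("20 km test area", test_area_gib)]:
    verdict = "fits" if size < RAM_GIB else "does not fit"
    print(f"{name}: {size:.1f} GiB, {verdict} in {RAM_GIB} GiB of RAM")
```

Under these assumptions the larger area needs roughly 45 GiB and the small test area under 2 GiB. Real workflows also need headroom for intermediate results, so the practical limit is lower than the raw RAM figure.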


## Helpful Resources
* There are many online courses that can help you understand the fundamentals of machine learning with Python, e.g. [edX](https://www.edx.org/course/machine-learning-with-python-a-practical-introduct) and [Coursera](https://www.coursera.org/learn/machine-learning-with-python).
* The [Scikit-learn](https://scikit-learn.org/stable/supervised_learning.html) documentation provides information on the available models and their parameters.
* This [review article](https://www.tandfonline.com/doi/full/10.1080/01431161.2018.1433343) provides a nice overview of machine learning in the context of remote sensing.
* The [deafrica-sandbox-notebooks](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks) repository hosts a series of Jupyter notebooks, Python scripts and workflows for analysing Digital Earth Africa (DE Africa) satellite data and derived products.
___


## Getting Started

To begin working through the workflow, click on the links to the notebooks:

1. [Extract training features](0_Extract_Training_Features.ipynb)
2. [Filter training data](1_Filter_Training_Data.ipynb)
3. [Evaluate, optimize and fit classifier](2_Evaluate_Fit_Classifier.ipynb)
4. [Land cover classification](3_Land_Cover_Classification.ipynb)
5. [Post-processing](4_Post_Processing.ipynb)

***

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).