<img style="float: left; margin:0px 15px 15px 0px; width:120px" src="https://www.orfeo-toolbox.org/wp-content/uploads/2016/03/logo-orfeo-toolbox.png">

# OTB Guided Tour - Virtual Workshop
## Emmanuelle SARRAZIN, Yannick TANGUY and David YOUSSEFI (CNES, French Space Agency)

<br>

Orfeo ToolBox (OTB) is an open-source library for remote sensing images processing. It has been initiated and funded by CNES to promote the use and the exploitation of the satellites images. Orfeo ToolBox aims at enabling large images state-of-the-art processing even on limited resources laptops, and is shipped with a set of extensible ready-to-use tools for classical remote sensing tasks, as well as a fully integrated, end-users oriented software called Monteverdi ; OTB is also accessible via QGIS processing module.

This tutorial will present the ORFEO Toolbox and showcase available applications for processing and manipulating satellite imagery.

<b> Press <span style="color:black;background:yellow">SHIFT+ENTER</span> to execute the notebook interactively cell by cell </b></div>



## Mount Google Drive and OTB Installation (optional step)

The following cells are needed to run this notebook on Google Colab. First it mounts a google drive folder, then it downloads OTB binaries and install it in the virtual environment. Then, it compiles Python bindings so we can later run a "import otpApplication" command.

*if you run this notebook from your own computer, you can jump to cell 5 ("Sentinel 2 Dataset")*

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/')
# *******************************************************************************************************
# 
# At this step, google will ask you to login and authorize access to your google drive from this notebook
#
# *******************************************************************************************************

import sys

# TODO : fix folder path
FOLDER = 'gdrive/My Drive/material/01-otb-guided-tour/'
sys.path.append(FOLDER)

# This will download Orfeo ToolBox
!wget https://www.orfeo-toolbox.org/packages/OTB-7.2.0-Linux64.run
!apt-get install file

# This configures OTB (source environment and compile Python bindings)
!chmod +x OTB-7.2.0-Linux64.run && ./OTB-7.2.0-Linux64.run && cd OTB-7.2.0-Linux64 && ctest -S share/otb/swig/build_wrapping.cmake -VV

In [None]:
# *******************************************************************************************************
# Configure OTB environment variables
# *******************************************************************************************************
import os, sys
os.environ["CMAKE_PREFIX_PATH"] = "/content/OTB-7.2.0-Linux64"
os.environ["OTB_APPLICATION_PATH"] = "/content/OTB-7.2.0-Linux64/lib/otb/applications"
os.environ["PATH"] = "/content/OTB-7.2.0-Linux64/bin" + os.pathsep + os.environ["PATH"]
sys.path.insert(0, "/content/OTB-7.2.0-Linux64/lib/python")
os.environ["LC_NUMERIC"] = "C"
os.environ["GDAL_DATA"] = "content/OTB-7.2.0-Linux64/share/gdal"
os.environ["PROJ_LIB"] = "/content/OTB-7.2.0-Linux64/share/proj"
os.environ["GDAL_DRIVER_PATH"] = "disable"
os.environ["OTB_MAX_RAM_HINT"] = "1000"

In [None]:
# Installation of third-parties libraries
!pip install rasterio

## Display the Sentinel 2 Dataset

### Configure Dataset location and Output directory

* fix links to the dataset
* different dates avalaible
* display images : ipyleaflets doen't work on google colab -> use rasterio

In [None]:
# Execution from COLAB -> use a folder on google drive
# Data directory
DATA_DIR = "/gdrive/MyDrive/OTB_training/data"

# Output directory
OUTPUT_DIR = "/gdrive/MyDrive/OTB_training/output"


In [1]:
# Local execution
DATA_DIR = "/home/yannick/Dev/dataset_OTB_virtual_workshop/"
# Local execution
OUTPUT_DIR = "/home/yannick/tmp"


In [2]:
import os
from glob import glob

In [3]:
image = DATA_DIR+"/xt_20180701_RVB_NIR.tif"

In [4]:
import display_api
import rasterio
import folium

In [6]:
im = rasterio.open(DATA_DIR+"/xt_SENTINEL2B_20180621_RGB_NIR.tif")

In [7]:
display_api.rasters_on_map_with_folium([im],OUTPUT_DIR,["label"])

## 1) How to compute vegetation index with OTB

Here we create an application with otbApplication.Registry.CreateApplication("BandMath")

BandMath takes a list of images as input, so we have to give a Python list with "il" parameter : [image], or [image1, image2, .., imageN] and the main parameter is the mathematical expression "exp".

Here, we compute NDVI :
NDVI=(𝑋𝑛𝑖𝑟−𝑋𝑟𝑒𝑑)(𝑋𝑛𝑖𝑟+𝑋𝑟𝑒𝑑)

The corresponding bands for NIR and Red are respectively the 4th and the 1st bands (b4, b1) of the first image (im1)

In [8]:
import otbApplication as otb

In [9]:
def compute_ndvi(im, ndvi):
    app = otb.Registry.CreateApplication("BandMath")
    app.SetParameterStringList("il",[im])
    app.SetParameterString("out", ndvi)
    app.SetParameterString("exp", "(im1b4-im1b1)/(im1b4+im1b1)")
    exit_code = app.ExecuteAndWriteOutput()

In [10]:
image1 = DATA_DIR+"/xt_SENTINEL2B_20180621_RGB_NIR.tif"
image2 = DATA_DIR+"/xt_SENTINEL2B_20180701_RGB_NIR.tif"
image3 = DATA_DIR+"/xt_SENTINEL2B_20180711_RGB_NIR.tif"

In [11]:
ndvi1 = OUTPUT_DIR+"/ndvi1.tif"
ndvi2 = OUTPUT_DIR+"/ndvi2.tif"
ndvi3 = OUTPUT_DIR+"/ndvi3.tif"

In [15]:
# Choose an image an compute the NDVI ()
# compute_ndvi(image1, ndvi1)
compute_ndvi(...., ....)

2021-02-09 17:49:44 (INFO) BandMath: Default RAM limit for OTB is 256 MB
2021-02-09 17:49:44 (INFO) BandMath: GDAL maximum cache size is 391 MB
2021-02-09 17:49:44 (INFO) BandMath: OTB will use at most 8 threads
2021-02-09 17:49:44 (INFO) BandMath: Image #1 has 4 components

2021-02-09 17:49:44 (INFO): Estimated memory for full processing: 248.552MB (avail.: 256 MB), optimal image partitioning: 1 blocks
2021-02-09 17:49:44 (INFO): File /home/yannick/tmp/ndvi1.tif will be written in 1 blocks of 2410x2080 pixels
Writing /home/yannick/tmp/ndvi1.tif...: 100% [**************************************************] (0s)
2021-02-09 17:49:45 (INFO) BandMath: Default RAM limit for OTB is 256 MB
2021-02-09 17:49:45 (INFO) BandMath: GDAL maximum cache size is 391 MB
2021-02-09 17:49:45 (INFO) BandMath: OTB will use at most 8 threads
2021-02-09 17:49:45 (INFO) BandMath: Image #1 has 4 components

2021-02-09 17:49:45 (INFO): Estimated memory for full processing: 248.552MB (avail.: 256 MB), optimal ima

In [16]:
# display the result
display_api.rasters_on_map_with_folium([rasterio.open(ndvi2)],OUTPUT_DIR,["image", "ndvi"])

## 2) Compute Normalized Difference Water Index from Green and Near InfraRed bands

NDWI2 is computed from green and nir bands (defined by McFeeters, 1996):

NDWI2=(𝑋𝑔𝑟𝑒𝑒𝑛−𝑋𝑛𝑖𝑟)(𝑋𝑔𝑟𝑒𝑒𝑛+𝑋𝑛𝑖𝑟)

For this second variant of the NDWI, a threshold can also be found in https://www.mdpi.com/2072-4292/5/7/3544/htm (McFeeters, 2013):

* < 0.3 - Non-water
* >= 0.3 - Water


In [None]:
def compute_ndwi(im, ndwi):
    app = otb.Registry.CreateApplication("BandMath")
    app.SetParameterStringList("il",[im])
    app.SetParameterString("out", ndwi)
    app.SetParameterString("exp", "your expression")
    exit_code = app.ExecuteAndWriteOutput()

In [17]:
def compute_ndwi(im, ndwi):
    app = otb.Registry.CreateApplication("BandMath")
    app.SetParameterStringList("il",[im])
    app.SetParameterString("out", ndwi)
    app.SetParameterString("exp", "(im1b2-im1b4)/(im1b2+im1b4)")
    exit_code = app.ExecuteAndWriteOutput()

In [24]:
# Compute and display NDWI on your images : 
ndwi1 = OUTPUT_DIR+"/ndwi1.tif"
ndwi2 = OUTPUT_DIR+"/ndwi2.tif"
ndwi3 = OUTPUT_DIR+"/ndwi3.tif"

compute_ndwi(image1,ndwi1)
compute_ndwi(image2,ndwi2)
compute_ndwi(image3,ndwi3)

display_api.rasters_on_map_with_folium([rasterio.open(ndwi1)],OUTPUT_DIR,["ndwi"])

2021-02-09 17:57:51 (INFO) BandMath: Default RAM limit for OTB is 256 MB
2021-02-09 17:57:51 (INFO) BandMath: GDAL maximum cache size is 391 MB
2021-02-09 17:57:51 (INFO) BandMath: OTB will use at most 8 threads
2021-02-09 17:57:51 (INFO) BandMath: Image #1 has 4 components

2021-02-09 17:57:51 (INFO): Estimated memory for full processing: 248.552MB (avail.: 256 MB), optimal image partitioning: 1 blocks
2021-02-09 17:57:51 (INFO): File /home/yannick/tmp/ndwi1.tif will be written in 1 blocks of 2410x2080 pixels
Writing /home/yannick/tmp/ndwi1.tif...: 100% [**************************************************] (0s)
2021-02-09 17:57:51 (INFO) BandMath: Default RAM limit for OTB is 256 MB
2021-02-09 17:57:51 (INFO) BandMath: GDAL maximum cache size is 391 MB
2021-02-09 17:57:51 (INFO) BandMath: OTB will use at most 8 threads
2021-02-09 17:57:51 (INFO) BandMath: Image #1 has 4 components

2021-02-09 17:57:51 (INFO): Estimated memory for full processing: 248.552MB (avail.: 256 MB), optimal ima

## 3) Compute Watermask

The aim of this second exercise is to combine NDWI2 values to create a water mask.

As we have seen, the NDWI2 images are different depending on the dates, mainly because tides level is different and there maybe some clouds that hide some regions of the image.

We have to find a function that can combine the information from the different NDWI images to create a watermask

### Create a simple watermask (NDWI threshold)

OTB BandMath can use these formula :

    binary operators:
        ‘+’ addition, ‘-‘ subtraction, ‘*’ multiplication, ‘/’ division
        ‘^’ raise x to the power of y
        ‘<’ less than, ‘>’ greater than, ‘<=’ less or equal, ‘>=’ greater or equal
        ‘==’ equal, ‘!=’ not equal
        ‘||’ logical or, ‘&&’ logical and
    functions: exp(), log(), sin(), cos(), min(), max(), ...
    if-then-else : "(<expression> ? <value if true> : <value if false>)"

https://www.orfeo-toolbox.org/CookBook/Applications/app_BandMath.html

Try to write an expression to create a basic watermask :

*if ndwi < 0.3 then return 0 else return 1*

In [None]:
def threshold_ndwi(ndwi, mask):
    # Fill the threshold_ndwi function
    pass

mask1 = OUTPUT_DIR+"/mask1.tif"
threshold_ndwi(ndwi1, mask1)

In [19]:
def threshold_ndwi(ndwi, mask):
    app = otb.Registry.CreateApplication("BandMath")
    app.SetParameterStringList("il",[ndwi])
    app.SetParameterString("out", mask)
    app.SetParameterString("exp", "(im1b1 < 0.3 ? 0 : 1) ")
    exit_code = app.ExecuteAndWriteOutput()

In [20]:
mask1 = OUTPUT_DIR+"/mask1.tif"
threshold_ndwi(ndwi1, mask1)

2021-02-09 17:55:45 (INFO) BandMath: Default RAM limit for OTB is 256 MB
2021-02-09 17:55:45 (INFO) BandMath: GDAL maximum cache size is 391 MB
2021-02-09 17:55:45 (INFO) BandMath: OTB will use at most 8 threads
2021-02-09 17:55:45 (INFO) BandMath: Image #1 has 1 components

2021-02-09 17:55:45 (INFO): Estimated memory for full processing: 133.818MB (avail.: 256 MB), optimal image partitioning: 1 blocks
2021-02-09 17:55:45 (INFO): File /home/yannick/tmp/mask1.tif will be written in 1 blocks of 2410x2080 pixels
Writing /home/yannick/tmp/mask1.tif...: 100% [**************************************************] (0s)


In [21]:
display_api.rasters_on_map_with_folium([rasterio.open(mask1)],OUTPUT_DIR,["mask1"])

### Create a watermask with the different NDWI images

We now want to use the three dates to obtain the better mask, that will better identify the water presence. To do so, we shall identify the largest areas (high tides) in the different watermasks.

Tips : OTB BandMath can take as input a list of images (im1, im2, ...) and produce a single result

In [None]:
def create_water_mask(ndwi1, ndwi2, ndwi3, mask):
    pass


In [26]:
def create_water_mask(ndwi1, ndwi2, ndwi3, mask):
    app = otb.Registry.CreateApplication("BandMath")
    app.SetParameterStringList("il",[ndwi1, ndwi2, ndwi3])
    app.SetParameterString("out", mask)
    app.SetParameterString("exp", "(max(im1b1, im2b1, im2b1) < 0.3 ? 0 : 1) ")
    exit_code = app.ExecuteAndWriteOutput()

In [27]:
final_mask = OUTPUT_DIR+"/mask.tif"
create_water_mask(ndwi1, ndwi2, ndwi3, final_mask)

2021-02-09 17:58:23 (INFO) BandMath: Default RAM limit for OTB is 256 MB
2021-02-09 17:58:23 (INFO) BandMath: GDAL maximum cache size is 391 MB
2021-02-09 17:58:23 (INFO) BandMath: OTB will use at most 8 threads
2021-02-09 17:58:23 (INFO) BandMath: Image #1 has 1 components

2021-02-09 17:58:23 (INFO) BandMath: Image #2 has 1 components

2021-02-09 17:58:23 (INFO) BandMath: Image #3 has 1 components

2021-02-09 17:58:23 (INFO): Estimated memory for full processing: 210.307MB (avail.: 256 MB), optimal image partitioning: 1 blocks
2021-02-09 17:58:23 (INFO): File /home/yannick/tmp/mask.tif will be written in 1 blocks of 2410x2080 pixels
Writing /home/yannick/tmp/mask.tif...: 100% [**************************************************] (0s)


In [28]:
display_api.rasters_on_map_with_folium([rasterio.open(final_mask)],OUTPUT_DIR,["mask"])

## 4) Polygonize watermask and filter features to count islands

In this step, we are going to polygonize our binary masks : we will obtain a lot of polygons ! Some of these features have to be filtered (main land, ocean) in order to count the islands in Morbihan gulf.

In [37]:
mask_final = rasterio.open(final_mask)

                           

In [39]:
mask_final.

CRS.from_epsg(32630)

In [57]:
import numpy as np

import rasterio
from rasterio import features
from rasterio import warp

from shapely.geometry import Polygon

# Output directory
OUTPUT_DIR = "output"

# Morbihan gulf
morbihan = {'type': 'FeatureCollection', 'features': [{'type': 'Feature', 'properties': {}, 'geometry': {'type': 'Polygon', 'coordinates': [[[-2.953968, 47.603544], [-2.958085, 47.589653], [-2.929956, 47.563713], [-2.883303, 47.561397], [-2.86478, 47.556763], [-2.840081, 47.547958], [-2.826638, 47.552744], [-2.808114, 47.55761], [-2.774497, 47.5495], [-2.749113, 47.546951], [-2.730932, 47.564329], [-2.728188, 47.5861], [-2.740537, 47.595825], [-2.747055, 47.605317], [-2.780672, 47.609252], [-2.786846, 47.613649], [-2.802348, 47.61882], [-2.831848, 47.616969], [-2.857233, 47.620209], [-2.862721, 47.609563], [-2.865466, 47.600303], [-2.880559, 47.59984], [-2.891536, 47.597988], [-2.896339, 47.582243], [-2.938875, 47.589653], [-2.953968, 47.603544]]]}}]}
morbihan_as_polygon = Polygon(morbihan['features'][0]['geometry']['coordinates'][0])

# Convert Watermask to geojson collection
with rasterio.open(final_mask) as src:
    image = src.read(1).astype(np.uint8)
try:
    transform = src.affine
# depend on rasterio version
except AttributeError:
    transform = src.transform

results = ({'type':'Feature', 'properties': {}, 'geometry': s} for i, (s, __) in enumerate(features.shapes(image, mask=image, transform=transform)))      

# Filter geojson
collection = {'type': 'FeatureCollection', 'features': list()}
for res in results:
    # area in m^2
    item_area = Polygon(res['geometry']['coordinates'][0]).area
    
    print("> "+str(item_area))
    
    # convert geom to EPSG:4326 (WGS84)
    geom_for_geojson = warp.transform_geom(src.crs, 'EPSG:4326', res['geometry'])   
    #print(geom_for_geojson)
    #print("*****\n", morbihan_as_polygon)
    #break
    island_as_polygon = Polygon(geom_for_geojson['coordinates'][0])
    
    # Filter the smallest areas and the biggest (main land) and
    # Crop the "watermask" with envelope shape morbihan (~ Morbihan gulf)
    #
     
    # island_as_polygon.intersects(morbihan_as_polygon):
    if item_area < 5000000.0 and item_area > 10000.0: 
        feature = dict(res)
        feature['geometry'] = geom_for_geojson
        collection['features'].append(feature)

> 500.0
> 100.0
> 100.0
> 200.0
> 200.0
> 100.0
> 100.0
> 100.0
> 200.0
> 100.0
> 500.0
> 22200.0
> 800.0
> 500.0
> 100.0
> 100.0
> 100.0
> 200.0
> 100.0
> 100.0
> 66200.0
> 100.0
> 200.0
> 100.0
> 100.0
> 100.0
> 300.0
> 100.0
> 100.0
> 400.0
> 100.0
> 400.0
> 700.0
> 100.0
> 200.0
> 200.0
> 1000.0
> 1100.0
> 100.0
> 1800.0
> 100.0
> 100.0
> 200.0
> 100.0
> 1400.0
> 100.0
> 200.0
> 1400.0
> 1700.0
> 2400.0
> 100.0
> 300.0
> 100.0
> 300.0
> 800.0
> 200.0
> 100.0
> 200.0
> 100.0
> 200.0
> 700.0
> 800.0
> 100.0
> 200.0
> 100.0
> 300.0
> 300.0
> 100.0
> 100.0
> 300.0
> 100.0
> 100.0
> 200.0
> 500.0
> 200.0
> 6000.0
> 100.0
> 400.0
> 300.0
> 100.0
> 200.0
> 200.0
> 100.0
> 100.0
> 400.0
> 800.0
> 300.0
> 100.0
> 1000.0
> 100.0
> 100.0
> 10900.0
> 1300.0
> 300.0
> 100.0
> 800.0
> 500.0
> 1100.0
> 300.0
> 200.0
> 100.0
> 100.0
> 100.0
> 200.0
> 10400.0
> 300.0
> 100.0
> 12900.0
> 700.0
> 200.0
> 100.0
> 400.0
> 100.0
> 100.0
> 100.0
> 300.0
> 100.0
> 300.0
> 200.0
> 100.0
> 200.0
> 200.0
> 1

> 100.0
> 100.0
> 200.0
> 1000.0
> 100.0
> 100.0
> 500.0
> 100.0
> 100.0
> 100.0
> 100.0
> 100.0
> 200.0
> 200.0
> 100.0
> 100.0
> 700.0
> 100.0
> 30400.0
> 100.0
> 100.0
> 200.0
> 300.0
> 100.0
> 600.0
> 300.0
> 100.0
> 200.0
> 200.0
> 200.0
> 400.0
> 200.0
> 200.0
> 1500.0
> 100.0
> 100.0
> 100.0
> 100.0
> 200.0
> 100.0
> 100.0
> 100.0
> 200.0
> 100.0
> 100.0
> 100.0
> 100.0
> 100.0
> 100.0
> 200.0
> 100.0
> 2300.0
> 100.0
> 200.0
> 100.0
> 100.0
> 100.0
> 200.0
> 4500.0
> 400.0
> 100.0
> 100.0
> 100.0
> 2900.0
> 200.0
> 100.0
> 100.0
> 100.0
> 700.0
> 200.0
> 1500.0
> 800.0
> 200.0
> 800.0
> 100.0
> 300.0
> 300.0
> 200.0
> 200.0
> 300.0
> 100.0
> 100.0
> 100.0
> 1800.0
> 100.0
> 100.0
> 300.0
> 600.0
> 100.0
> 200.0
> 200.0
> 700.0
> 1900.0
> 100.0
> 100.0
> 100.0
> 100.0
> 200.0
> 100.0
> 300.0
> 100.0
> 100.0
> 100.0
> 100.0
> 200.0
> 400.0
> 100.0
> 100.0
> 100.0
> 100.0
> 46100.0
> 100.0
> 100.0
> 100.0
> 200.0
> 100.0
> 100.0
> 100.0
> 100.0
> 500.0
> 100.0
> 100.0
> 100.0
> 10

In [58]:
results

<generator object <genexpr> at 0x7f3b7889c360>

### Visualize the islands (and count them :-)

In [59]:
import rasterio
from glob import glob
import display_api
import json

# Data directory
DATA_DIR = "data"

DATE = "20180711"

print ("Nb islands: {}".format(len(collection['features'])))
raster = rasterio.open(glob(os.path.join(DATA_DIR, "*{}*.tif".format(DATE)))[0])
m, dc = display_api.rasters_on_map([raster], OUTPUT_DIR, [DATE], geojson_data=collection)
m

Nb islands: 27


IndexError: list index out of range

## Extra steps

We could optimize the previous code by using OTB pipeline : instead of computing 3 watermaks, and then combining them, we could simplify the processing chain and compute directly the final watermask. This will save I/O and thus save computation time (especially if the chain is complex or use a lot of images).

Since OTB 5.8, it is possible to connect an output image parameter from one application to the input image parameter of the next parameter. This results in the wiring of the internal ITK/OTB pipelines together, permitting image streaming between the applications. Consequently, this removes the need of writing temporary images and improves performance. Only the last application of the processing chain is responsible for writing the final result images.

<b> Please rewrite the  <span style="color:black;background:yellow"> code bellow </span> in order to only write the watermask file </b>

**Tips:** Only call Execute() to setup the pipeline, not ExecuteAndWriteOutput() which would run it and write the output image and also use these functions to connect OTB applications :
- ```GetParameterOutputImage``` : get a pointer to an image object [instead of reading from file]
- ```AddImageToParameterInputImageList``` : add an image to an InputImageList parameter as an pointer to an image object pointer [instead of reading from file] (```SetParameterInputImage``` for an InputImageList)