<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#1-Introduction" data-toc-modified-id="1-Introduction-1">1 Introduction</a></span></li><li><span><a href="#2-Data-access" data-toc-modified-id="2-Data-access-2">2 Data access</a></span></li><li><span><a href="#3-Train-the-classification-algorithms" data-toc-modified-id="3-Train-the-classification-algorithms-3">3 Train the classification algorithms</a></span></li><li><span><a href="#3-Saves-the-algorithm" data-toc-modified-id="3-Saves-the-algorithm-4">4 Save the algorithm</a></span></li></ul></div>

# 1 Introduction
<hr>

A classification algorithm is trained for Sentinel-2 MSI in two steps. First, the Optical Water Types (OWTs) are classified by their spectral shape using the normalized remote sensing reflectance (Rrs) bands (normB2 to normB6); at this stage OWTs 6, 7, and 8 are merged into a single class, OWT 678. Second, the samples in OWT 678 are separated into OWTs 6, 7, and 8 by their Rrs intensity in band B3. Both steps use a Support Vector Machine Classifier (SVMC). Novelty detection is not used in this notebook because the entire dataset consists of known OWTs; it is only useful when the algorithm is applied to new data, such as satellite images.
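The two steps can be chained at prediction time as sketched below. This is a minimal illustration, not the notebook's own prediction code: the synthetic spectra, the B3 values (arbitrary units), the class `OWT 1`, and the helper `predict_owt` are all hypothetical. It assumes the shape classifier labels the merged class `'OWT 678'`, as in the data access step, and `probability=True` is omitted because `predict` does not need it.

```python
import numpy as np
from sklearn import svm

# Hypothetical synthetic data (NOT the in situ database): two fixed 5-band
# spectral shapes standing in for normB2..normB6, 30 noisy samples each.
rng = np.random.default_rng(0)
pattern_a = np.array([0.9, 0.7, 0.3, 0.2, 0.1])
pattern_b = np.array([0.2, 0.5, 0.8, 0.6, 0.3])
shape_x = np.vstack([pattern_a + 0.02 * rng.standard_normal((30, 5)),
                     pattern_b + 0.02 * rng.standard_normal((30, 5))])
shape_y = np.array(['OWT 1'] * 30 + ['OWT 678'] * 30)

# Hypothetical B3 intensities (arbitrary units) for the 30 'OWT 678' samples:
# three brightness levels mapped to OWTs 6, 7, and 8.
b3_678 = (np.repeat([1.0, 2.0, 3.0], 10) + 0.05 * rng.standard_normal(30)).reshape(-1, 1)
y_678 = np.repeat(['OWT 6', 'OWT 7', 'OWT 8'], 10)

# Step 1 (shape) and step 2 (intensity) classifiers with the notebook's SVC settings
shape_clf = svm.SVC(C=1, kernel='rbf', decision_function_shape='ovo',
                    class_weight='balanced').fit(shape_x, shape_y)
intensity_clf = svm.SVC(C=1, kernel='rbf', decision_function_shape='ovo',
                        class_weight='balanced').fit(b3_678, y_678)


def predict_owt(shape_model, intensity_model, x_norm, x_b3):
    """Hypothetical helper: classify by spectral shape, then re-classify
    the samples labeled 'OWT 678' by their B3 intensity."""
    labels = shape_model.predict(x_norm).astype(object)
    is_678 = labels == 'OWT 678'
    if is_678.any():
        labels[is_678] = intensity_model.predict(x_b3[is_678])
    return labels


# Usage: one sample with shape A and two with shape B at different B3 intensities
x_new = np.vstack([pattern_a, pattern_b, pattern_b])
b3_new = np.array([[2.0], [1.0], [3.0]])
labels = predict_owt(shape_clf, intensity_clf, x_new, b3_new)
print(labels)
```

The B3 value is ignored for samples whose shape is not OWT 678, which is why the intensity classifier only needs to be trained on the merged-class subset.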
<hr>
This notebook calibrates the SVMC algorithms using the entire dataset; the calibrated algorithms are then applied to all satellite images.
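Applying a per-sample classifier to an image usually means flattening the pixel grid into a table of samples, predicting, and reshaping back. The sketch below assumes the image is a NumPy array of shape `(rows, cols, bands)` whose bands are in the same order as the training features, and that masked pixels (e.g., land or clouds) are stored as NaN; the helper `classify_image`, the 2-band training data, and the labels `'A'`/`'B'` are hypothetical.

```python
import numpy as np
from sklearn import svm


def classify_image(model, image):
    """Hypothetical helper: apply a fitted classifier to every pixel of a
    (rows, cols, bands) array; pixels with any NaN band are left as None."""
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands)          # one row per pixel
    valid = ~np.isnan(pixels).any(axis=1)      # pixels with all bands present
    labels = np.full(rows * cols, None, dtype=object)
    if valid.any():
        labels[valid] = model.predict(pixels[valid])
    return labels.reshape(rows, cols)


# Hypothetical 2-band training data and classifier, for illustration only
rng = np.random.default_rng(1)
train_x = np.vstack([rng.normal(0.2, 0.02, (20, 2)),
                     rng.normal(0.8, 0.02, (20, 2))])
train_y = np.array(['A'] * 20 + ['B'] * 20)
model = svm.SVC(kernel='rbf', class_weight='balanced').fit(train_x, train_y)

# Hypothetical 2 x 2 image with 2 bands; the lower-left pixel is masked (NaN)
image = np.array([[[0.2, 0.2], [0.8, 0.8]],
                  [[np.nan, 0.2], [0.8, 0.8]]])
owt_map = classify_image(model, image)
print(owt_map)
```

Skipping NaN pixels before calling `predict` matters because scikit-learn's SVC raises an error on NaN input.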

<img src="00_Database/00_Figures/01_Algorithm_flowchart.jpg" style="width:70%">

# 2 Data access

```python
# libraries used
import pandas as pd

# MSI input: in situ database
insitu_db = pd.read_csv('00_Database/01_Tables/insitu_db.csv', index_col=0)

# define the input data
msi_norm = insitu_db[['normB2', 'normB3', 'normB4', 'normB5', 'normB6']]  # normalized Rrs (spectral shape)
msi_b3 = insitu_db[['B3']]                                                # Rrs intensity in band B3
owts = insitu_db[['OWTs']]

# merge OWTs 6, 7, and 8 into a single class for the shape classification
# (DataFrame.replace matches whole cell values; applymap is deprecated since pandas 2.1)
owts_shape = owts.replace(['OWT 6', 'OWT 7', 'OWT 8'], 'OWT 678')
```

# 3 Train the classification algorithms

```python
# libraries used
from sklearn import svm

# create the arrays for the SVMC algorithm (shape classification)
msi_shape_x = msi_norm.values
msi_shape_y = owts_shape['OWTs'].to_list()

# create the arrays for the SVMC algorithm (intensity classification):
# select the samples of the merged class to further classify them into OWTs 6, 7, and 8
train_678 = owts_shape[owts_shape['OWTs'] == 'OWT 678'].index
msi_678_x = msi_b3.loc[train_678, :].values
msi_678_y = owts.loc[train_678, 'OWTs'].to_list()

# train the shape classification algorithm
msi_svmc_shape = svm.SVC(C=1,
                         kernel='rbf',
                         decision_function_shape='ovo',
                         class_weight='balanced',
                         probability=True).fit(msi_shape_x, msi_shape_y)

# train the intensity classification algorithm
msi_svmc_678 = svm.SVC(C=1,
                       kernel='rbf',
                       decision_function_shape='ovo',
                       class_weight='balanced',
                       probability=True).fit(msi_678_x, msi_678_y)

# keep both algorithms together: [shape classification, intensity classification]
msi_svmc_alg = [msi_svmc_shape, msi_svmc_678]
```
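Both classifiers use `class_weight='balanced'`, which in scikit-learn's SVC rescales the penalty `C` of each class by a weight inversely proportional to its frequency, `n_samples / (n_classes * count_of_class)`, so rare OWTs are not outweighed by common ones. The sketch below checks that formula against `compute_class_weight` on a hypothetical, imbalanced label vector; the labels and counts are illustrative, not the notebook's data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical, imbalanced label vector (not the notebook's data)
y = np.array(['OWT 1'] * 6 + ['OWT 2'] * 3 + ['OWT 678'] * 1)
classes = np.unique(y)

# scikit-learn's 'balanced' weights (the same rule SVC applies to scale C per class)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y)

# The same weights from the formula n_samples / (n_classes * count of each class)
counts = np.array([np.sum(y == c) for c in classes])
manual = len(y) / (len(classes) * counts)

print(dict(zip(classes.tolist(), np.round(weights, 3).tolist())))
# → {'OWT 1': 0.556, 'OWT 2': 1.111, 'OWT 678': 3.333}
```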


# 4 Save the algorithm<a id="3-Saves-the-algorithm"></a>

```python
# libraries used
import pickle

# save the algorithms in a single file holding a list of two algorithms
# (shape classification and intensity classification)
with open('00_Database/02_Algorithms/msi_svmc_owts_alldataset.obj', 'wb') as file_pi:
    pickle.dump(msi_svmc_alg, file_pi)
```
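To apply the saved algorithms to satellite images, the list can be unpickled and unpacked back into its two classifiers. The sketch below round-trips hypothetical stand-in models through an in-memory buffer instead of the `.obj` file, and shows the equivalent file-based load as a comment. Note that `pickle.load` should only be used on trusted files, and scikit-learn's model-persistence guidance recommends loading with the same scikit-learn version used for training.

```python
import io
import pickle

import numpy as np
from sklearn import svm

# Hypothetical stand-ins for [msi_svmc_shape, msi_svmc_678]
x = np.array([[0.0], [0.1], [0.9], [1.0]])
y = np.array(['A', 'A', 'B', 'B'])
saved = [svm.SVC(kernel='rbf').fit(x, y), svm.SVC(kernel='linear').fit(x, y)]

# Round trip through an in-memory buffer instead of the .obj file
buffer = io.BytesIO()
pickle.dump(saved, buffer)
buffer.seek(0)
shape_model, intensity_model = pickle.load(buffer)

# With the real file, the equivalent load would be:
# with open('00_Database/02_Algorithms/msi_svmc_owts_alldataset.obj', 'rb') as f:
#     msi_svmc_shape, msi_svmc_678 = pickle.load(f)

print(shape_model.predict(x), intensity_model.predict(x))
```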