<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#1-Introduction" data-toc-modified-id="1-Introduction-1">1 Introduction</a></span></li><li><span><a href="#2-Data-access" data-toc-modified-id="2-Data-access-2">2 Data access</a></span></li><li><span><a href="#3-Train-the-classification-algorithms" data-toc-modified-id="3-Train-the-classification-algorithms-3">3 Train the classification algorithms</a></span></li><li><span><a href="#4-Selects-the-algorithm-for-Sentinel-2-MSI" data-toc-modified-id="4-Selects-the-algorithm-for-Sentinel-2-MSI-4">4 Selects the algorithm for Sentinel-2 MSI</a></span></li><li><span><a href="#5-Exhibits-the-statistical-results-of-the-selected-algorithm" data-toc-modified-id="5-Exhibits-the-statistical-results-of-the-selected-algorithm-5">5 Exhibits the statistical results of the selected algorithm</a></span></li><li><span><a href="#6-Saves-the-algorithm" data-toc-modified-id="6-Saves-the-algorithm-6">6 Saves the algorithm</a></span></li></ul></div>

# 1 Introduction
<hr>

This notebook trains a two-step classification algorithm for Sentinel-2 MSI. First, the Optical Water Types (OWTs) are classified by spectral shape using the normalized remote sensing reflectance (Rrs) bands (e.g., normB2); at this stage OWTs 6, 7, and 8 are treated as a single merged class. Second, samples assigned to the merged class are separated into OWT 6, 7, or 8 by Rrs intensity using band B3. Both steps use a Support Vector Machine Classifier (SVMC). Novelty detection is not used in this notebook because the entire dataset consists of known OWTs; it is only useful when the algorithm is applied to new datasets, such as satellite images.

<img src="00_Database/00_Figures/01_Algorithm_flowchart.jpg" style="width:70%">
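
As a minimal sketch of the two-step scheme described above, the snippet below chains a shape classifier and an intensity classifier on a tiny synthetic dataset. The function name `classify_two_step`, its `merged_label` argument, and all numeric values are assumptions made for illustration; they are not taken from the in situ database.

```python
import numpy as np
from sklearn import svm


def classify_two_step(shape_clf, intensity_clf, norm_bands, b3, merged_label='OWT 678'):
    """Classify by spectral shape, then split the merged class by B3 intensity."""
    labels = shape_clf.predict(norm_bands).astype(object)
    merged = labels == merged_label
    if merged.any():
        labels[merged] = intensity_clf.predict(b3[merged])
    return labels


# synthetic training data (hypothetical values): two shape features and one B3 column
norm_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                       [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],
                       [0.9, 1.0], [1.0, 0.9], [1.1, 1.1]])
shape_labels = ['OWT 1'] * 3 + ['OWT 678'] * 6
b3_train = np.array([[0.010], [0.012], [0.050], [0.052], [0.100], [0.102]])
intensity_labels = ['OWT 6', 'OWT 6', 'OWT 7', 'OWT 7', 'OWT 8', 'OWT 8']

# one classifier per step, with settings similar to those used in this notebook
shape_clf = svm.SVC(C=1, kernel='rbf', class_weight='balanced').fit(norm_train, shape_labels)
intensity_clf = svm.SVC(C=1, kernel='rbf', class_weight='balanced').fit(b3_train, intensity_labels)

# three unseen samples: one with a distinct shape, two falling in the merged class
norm_test = np.array([[0.05, 0.05], [1.0, 1.0], [1.05, 1.0]])
b3_test = np.array([[0.030], [0.011], [0.101]])
predicted = classify_two_step(shape_clf, intensity_clf, norm_test, b3_test)
print(predicted)
```

Only samples that the first step assigns to the merged class reach the second classifier, so a shape misclassification cannot be corrected by the intensity step.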

# 2 Data access

In [1]:
# library used
import pandas as pd

# MSI input
insitu_db = pd.read_csv('00_Database/01_Tables/insitu_db.csv', index_col=0)

# define the input data
msi_norm = insitu_db[['normB2', 'normB3', 'normB4', 'normB5','normB6']]
msi_b3 = insitu_db[['B3']]
owts = insitu_db[['OWTs']]
# merges OWTs 6, 7, and 8 into a single class ('OWT 678') for the shape classification
owts_shape = owts.applymap(lambda x: x.replace('OWT 6', 'change').
              replace('OWT 7', 'change').
              replace('OWT 8', 'change').
              replace('change', 'OWT 678'))
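
The label merging above can also be written with `DataFrame.replace`, which may be preferable on recent pandas releases where `DataFrame.applymap` is deprecated (2.1 and later). This is a sketch on a small hypothetical table, not the in situ database; `owts_example` and `merged` are illustrative names.

```python
import pandas as pd

# hypothetical labels standing in for the OWTs column of insitu_db
owts_example = pd.DataFrame({'OWTs': ['OWT 7', 'OWT 3', 'OWT 6', 'OWT 8', 'OWT 1']})

# map OWTs 6, 7, and 8 onto a single merged label; other labels are left unchanged
merged = owts_example.replace({'OWT 6': 'OWT 678',
                               'OWT 7': 'OWT 678',
                               'OWT 8': 'OWT 678'})
print(merged['OWTs'].tolist())
```

Because the dictionary matches whole cell values rather than substrings, no intermediate placeholder label is needed.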

# 3 Train the classification algorithms

In [None]:
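# exploratory grid search (cell not executed); `subset` is not defined in this notebook
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# predictor columns and target column
x = ['Zsd','N','P', 'P/N','Depth','oxbow','ria','protected','open']
y = 'Chl-a'

# hyperparameter grid of the SVM classifier (every value must be a list)
parameters = {'C': [1, 2], 'kernel': ['rbf']}
svc = svm.SVC()
clf = GridSearchCV(svc, parameters, n_jobs=-1, scoring='explained_variance', cv=2)
clf.fit(subset.loc[:, x].values, subset.loc[:, y].values)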

In [2]:
owts

Unnamed: 0,OWTs
0,OWT 7
1,OWT 3
2,OWT 3
3,OWT 3
4,OWT 2
...,...
387,OWT 5
388,OWT 5
389,OWT 5
390,OWT 5


In [2]:
# library used
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import copy as cp

# dictionaries for saving the accuracy results
msi_ba_dic = {}
msi_precision_dic = {}
msi_recall_dic = {}
msi_cm = {}

# dictionary for saving the algorithms
msi_svmc_alg = {}

# dictionary for saving the test labels
test_dic = {}
# iterations for randomly splitting train and test datasets
for x in range(1, 10000):
    
    # splits the ids into a train (70%) and a test (30%) datasets, weighting the relative proportion of OWTs
    train, test = train_test_split(list(owts.index),
                                   test_size=0.3,
                                   stratify=owts['OWTs'])

    # separates the specific ids for further classifying OWTs into 6, 7, and 8
    train_678 = owts_shape[owts_shape['OWTs'] == 'OWT 678'].index.intersection(train)

    # defines the true values of the test dataset
    true = owts.loc[test,:]

    # creates the arrays for the SVMC algorithm (shape classification)
    msi_shape_x = msi_norm.loc[train,:].values
    msi_shape_y = list(owts_shape.loc[train,'OWTs'])

    # create the arrays for the SVMC algorithm (intensity classifications)
    msi_678_x = msi_b3.loc[train_678,:].values
    msi_678_y = list(owts.loc[train_678,'OWTs'])

    # train the shape classification algorithm
    msi_svmc_shape = svm.SVC(C=1,
                             kernel='rbf',
                             decision_function_shape='ovo',
                             class_weight='balanced',
                             probability=True).fit(msi_shape_x,
                                                   msi_shape_y)
    


    # train the intensity classification algorithm
    msi_svmc_678 = svm.SVC(C=1,
                           kernel='rbf',
                           decision_function_shape='ovo',
                           class_weight='balanced',
                           probability=True).fit(msi_678_x,
                                                 msi_678_y)

    # predict the OWTs in the test dataset (shape)
    msi_svmc_predicted = msi_svmc_shape.predict(msi_norm.loc[test,:].values)
    msi_svmc_predicted = pd.DataFrame(msi_svmc_predicted,
                                      index=test,
                                      columns=['msi_predicted'])

    # predict the OWTs in the test dataset (OWTs 6, 7, and 8)
    msi_owt678_predicted_id = msi_svmc_predicted[msi_svmc_predicted['msi_predicted'] == 'OWT 678'].index
    msi_svmc_predicted_678 = msi_svmc_678.predict(msi_b3.loc[msi_owt678_predicted_id,:].values)
    msi_svmc_predicted_678 = pd.DataFrame(msi_svmc_predicted_678,
                                          index=msi_owt678_predicted_id,
                                          columns=['msi_predicted'])

    # concatenate all predicted OWTs
    msi_svmc_predicted.update(msi_svmc_predicted_678)
    
    # creates an evaluation dataset, concatenate predicted OWTs and the true OWTs
    msi_evaluation = msi_svmc_predicted.join(true)
    
    # computes balanced accuracy, precision, and recall
    msi_ba = balanced_accuracy_score(msi_evaluation['OWTs'],
                                     msi_evaluation['msi_predicted'])

    msi_report = pd.DataFrame(classification_report(msi_evaluation['OWTs'],
                                                    msi_evaluation['msi_predicted'],
                                                    output_dict=True))
    
    msi_confusion_matrix = confusion_matrix(msi_evaluation['OWTs'],
                                            msi_evaluation['msi_predicted'])
    
    
    # saves the statistical results
    msi_ba_dic[x] = msi_ba
    msi_precision_dic[x] = msi_report.loc['precision',:]
    msi_recall_dic[x] = msi_report.loc['recall',:]
    msi_cm[x] = msi_confusion_matrix    
    
    # saves the algorithms
    msi_svmc_alg[x] = cp.copy([msi_svmc_shape, msi_svmc_678])
    
    # saves the test label
    test_dic[x] = test

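
The `stratify` argument used in the loop above keeps the class proportions the same in the train and test subsets. The sketch below shows this on hypothetical labels; `random_state=0` is an assumption added here for reproducibility (the loop above does not fix a seed, so each iteration draws a different split).

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# hypothetical labels with a 2:1 class ratio
labels = ['OWT 1'] * 20 + ['OWT 2'] * 10
ids = list(range(len(labels)))

# 70/30 split that preserves the 2:1 ratio in both subsets
train_ids, test_ids = train_test_split(ids,
                                       test_size=0.3,
                                       stratify=labels,
                                       random_state=0)

train_counts = Counter(labels[i] for i in train_ids)
test_counts = Counter(labels[i] for i in test_ids)
print(train_counts, test_counts)
```

Stratification matters here because some OWTs have few samples; an unstratified split could leave a class nearly absent from the test set.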


# 4 Selects the algorithm for Sentinel-2 MSI
<hr>
Nearly ten thousand algorithms (9,999, one per iteration of the loop above) were calibrated for the Sentinel-2 MSI sensor, each with a different random split into training and test datasets. The algorithm with the median balanced accuracy is selected.

In [3]:
# creates a table with the accuracies of all algorithms
msi_ba_table = pd.DataFrame(msi_ba_dic.values(), index=msi_ba_dic.keys(), columns=['Accuracy'])

# sort accuracy values
msi_ba_table = msi_ba_table.sort_values(by='Accuracy')

# retrieves the algorithm id with the median accuracy (middle row of the sorted table)
middle_row = len(msi_ba_table) // 2
msi_alg_number = msi_ba_table.iloc[middle_row,:].name
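
The median selection can be illustrated without pandas. The sketch below uses a hypothetical dictionary of five accuracies; `median_key` is an illustrative helper, not part of the notebook. For an odd number of entries, the integer division `n // 2` indexes the exact middle of the sorted list, whereas Python's `round` rounds halves to the nearest even integer, so `round(n / 2)` can land one position off.

```python
def median_key(scores):
    """Return the key whose score sits in the middle of the sorted scores."""
    ranked = sorted(scores, key=scores.get)
    return ranked[len(ranked) // 2]


# hypothetical balanced accuracies keyed by iteration number
scores = {1: 0.91, 2: 0.95, 3: 0.89, 4: 0.94, 5: 0.93}
selected = median_key(scores)
print(selected, scores[selected])

# rounding halves to even: the two expressions disagree for n = 9999
n = 9999
print(n // 2, round(n / 2))
```

With an even number of entries, `n // 2` picks the upper of the two middle rows rather than averaging them, which is appropriate here because a single trained algorithm must be chosen.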

# 5 Exhibits the statistical results of the selected algorithm

In [4]:
# library used
from IPython import display as ICD

# print the accuracy values
print('Sentinel-2 MSI: SVMC balanced accuracy')
ICD.display(round(msi_ba_table.loc[msi_alg_number, 'Accuracy'], 2))

Sentinel-2 MSI: SVMC balanced accuracy


0.94

In [5]:
# select the precision of all OWTs
precision = msi_precision_dic[msi_alg_number].iloc[0:8]

# select the recall of all OWTs
recall = msi_recall_dic[msi_alg_number].iloc[0:8]


In [6]:
print('Precision')
ICD.display(precision.round(2))
print('\n')
print('Recall')
ICD.display(recall.round(2))


Precision


OWT 1    0.92
OWT 2    1.00
OWT 3    0.82
OWT 4    1.00
OWT 5    1.00
OWT 6    1.00
OWT 7    1.00
OWT 8    1.00
Name: precision, dtype: float64



Recall


OWT 1    1.00
OWT 2    0.96
OWT 3    1.00
OWT 4    0.82
OWT 5    1.00
OWT 6    0.75
OWT 7    1.00
OWT 8    1.00
Name: recall, dtype: float64

In [7]:
print('Sentinel-2 MSI confusion matrix')
ICD.display(pd.DataFrame(msi_cm[msi_alg_number],
           index=['OWT 1', 'OWT 2',  'OWT 3','OWT 4', 'OWT 5', 'OWT 6', 'OWT 7', 'OWT 8'],
           columns=['OWT 1', 'OWT 2',  'OWT 3','OWT 4', 'OWT 5', 'OWT 6', 'OWT 7', 'OWT 8']))


Sentinel-2 MSI confusion matrix


True OWT \ Predicted OWT,OWT 1,OWT 2,OWT 3,OWT 4,OWT 5,OWT 6,OWT 7,OWT 8
OWT 1,24,0,0,0,0,0,0,0
OWT 2,0,22,1,0,0,0,0,0
OWT 3,0,0,9,0,0,0,0,0
OWT 4,2,0,0,9,0,0,0,0
OWT 5,0,0,0,0,11,0,0,0
OWT 6,0,0,1,0,0,3,0,0
OWT 7,0,0,0,0,0,0,30,0
OWT 8,0,0,0,0,0,0,0,4
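
The precision, recall, and balanced accuracy reported above can be recomputed directly from this confusion matrix, where rows are the true OWTs and columns are the predicted OWTs (the scikit-learn convention). The snippet is a small worked check using the values in the table; the variable names are illustrative.

```python
import numpy as np

# confusion matrix copied from the table above (rows: true, columns: predicted)
cm = np.array([[24,  0, 0, 0,  0, 0,  0, 0],
               [ 0, 22, 1, 0,  0, 0,  0, 0],
               [ 0,  0, 9, 0,  0, 0,  0, 0],
               [ 2,  0, 0, 9,  0, 0,  0, 0],
               [ 0,  0, 0, 0, 11, 0,  0, 0],
               [ 0,  0, 1, 0,  0, 3,  0, 0],
               [ 0,  0, 0, 0,  0, 0, 30, 0],
               [ 0,  0, 0, 0,  0, 0,  0, 4]])

correct = np.diag(cm)

# recall: correct predictions divided by the number of true samples per class (row sums)
recall = correct / cm.sum(axis=1)

# precision: correct predictions divided by the number of predictions per class (column sums)
precision = correct / cm.sum(axis=0)

# balanced accuracy: unweighted mean of the per-class recalls
balanced_accuracy = recall.mean()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(round(balanced_accuracy, 2))
```

For example, OWT 4 has 9 of its 11 true samples classified correctly (recall 0.82), while 24 of the 26 samples predicted as OWT 1 are correct (precision 0.92).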


# 6 Saves the algorithm

In [8]:
# library used
import pickle

# saves the selected algorithms to a file containing a list of two algorithms (shape classification and intensity classification)
with open('00_Database/02_Algorithms/msi_svmc_owts.obj', 'wb') as file_pi:
    pickle.dump(msi_svmc_alg[msi_alg_number], file_pi)

In [9]:
# saves the test ids
pd.DataFrame(test_dic[msi_alg_number], 
             columns = ['id']).to_csv('00_Database/02_Algorithms/test_ids.csv')
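
To reuse the saved algorithms, the pickled list can be loaded back and unpacked into its two classifiers. The sketch below demonstrates the round trip with tiny stand-in classifiers written to a temporary file; the data, the file name `svmc_owts_example.obj`, and the variable names are assumptions for illustration. Pickled scikit-learn models are generally only reliable when loaded with the same scikit-learn version that created them, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import svm

# tiny stand-in classifiers (hypothetical data), stored as a [shape, intensity] list
norm_x = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
b3_x = np.array([[1.0], [1.1], [5.0], [5.1]])
shape_clf = svm.SVC(C=1, kernel='rbf').fit(norm_x, ['OWT 1', 'OWT 1', 'OWT 678', 'OWT 678'])
intensity_clf = svm.SVC(C=1, kernel='rbf').fit(b3_x, ['OWT 6', 'OWT 6', 'OWT 7', 'OWT 7'])

# write the pair to a temporary file, then read it back
path = os.path.join(tempfile.mkdtemp(), 'svmc_owts_example.obj')
with open(path, 'wb') as file_out:
    pickle.dump([shape_clf, intensity_clf], file_out)
with open(path, 'rb') as file_in:
    loaded_shape, loaded_intensity = pickle.load(file_in)

# the loaded classifiers should reproduce the predictions of the originals
same_shape = (loaded_shape.predict(norm_x) == shape_clf.predict(norm_x)).all()
same_intensity = (loaded_intensity.predict(b3_x) == intensity_clf.predict(b3_x)).all()
print(same_shape, same_intensity)
```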