# Introduction:
One of the common classifiers used for image processing is the Support Vector Machine (SVM). The goal of this kernel is to provide a basic guide for beginners applying this powerful tool to the C-CORE Iceberg Classifier Challenge. We use the [**sklearn SVM class**](http://scikit-learn.org/stable/modules/svm.html) for Python. This is a collaborative work with [Mehdi](https://www.kaggle.com/mnoori).

The code is organized as follows:
* Data Preparation
* Fitting and Tuning the parameters
* Conclusion and Visualizations

# 1- Data Preparation:

In [None]:
# Importing packages:
import numpy as np
import pandas as pd

from sklearn.preprocessing import MaxAbsScaler
from sklearn.metrics import accuracy_score,f1_score,log_loss,roc_auc_score

from sklearn.model_selection import train_test_split,ShuffleSplit,cross_val_score,GridSearchCV

from sklearn import svm
from sklearn.linear_model import LogisticRegression,SGDClassifier

from os.path import join as opj
from matplotlib import pyplot as plt

from matplotlib.colors import Normalize

# Read the training data set json file into a pandas dataframe
train=pd.read_json('../input/train.json')

# Let's take a look at the first 5 rows of the dataset
train.head(5)

The data frame has 3 main features:
* band_1: flattened 75*75 radar image for the HH polarization (transmitted and received horizontally). Please refer to [Problem Background](https://www.kaggle.com/c/statoil-iceberg-classifier-challenge#Background).
* band_2: flattened 75*75 radar image for the HV polarization (transmitted horizontally, received vertically). Please refer to [Problem Background](https://www.kaggle.com/c/statoil-iceberg-classifier-challenge#Background).
* inc_angle: incidence angle.

There is also a binary response vector called 'is_iceberg'.

First of all, for this kernel we are going to get rid of the 'na's in the inc_angle column. Please note that 'na' in this column is a string, so it cannot be detected using [numpy.isnan](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.isnan.html). We therefore replace the 'na's with numpy.nan first, and then drop those entries. Alternatively, the missing angle values can be imputed by finding similar entries using the other available features of the data (e.g. band_1 and band_2). You can refer to our work which provides a beginner's guide to using the [Fancy Impute Package](https://www.kaggle.com/mnoori/fancy-imputing-the-missing-inc-angles-beginner) for this feature.

In [None]:
# Replace the 'na' strings with numpy.nan
# (assigning the result back avoids relying on an in-place edit of a column view)
train['inc_angle'] = train['inc_angle'].replace('na', np.nan)

# Drop the rows that have a NaN value for inc_angle
train = train.dropna(subset=['inc_angle'])
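
To see why the string check matters, here is a small sketch on a toy frame. The `toy` DataFrame and its values are made up for illustration, and `pd.to_numeric(..., errors='coerce')` is shown as one alternative way to turn non-numeric markers into NaN in a single step; it is not the approach used in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mimicking a column that mixes floats and 'na' strings
toy = pd.DataFrame({'inc_angle': [43.92, 'na', 38.16, 'na', 45.29]})

# np.isnan cannot handle an object array containing strings: it raises TypeError
try:
    np.isnan(toy['inc_angle'].values)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# Coerce anything non-numeric (here the 'na' strings) to NaN, then drop those rows
toy['inc_angle'] = pd.to_numeric(toy['inc_angle'], errors='coerce')
toy_clean = toy.dropna(subset=['inc_angle']).reset_index(drop=True)

print("np.isnan failed on the raw column:", isnan_failed)
print(toy_clean)
```

After the coercion the column has a float dtype, so `isnull()` and `np.isnan` both work on it.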

The SVM implementation only accepts numerical input as a numpy array, so we build a numpy matrix that includes all features. As shown in the code, we first create a separate matrix for each feature, which lets us build different combinations of input variables for the model. For a large dataset we can also pass a SciPy sparse matrix to the model, such as a [Compressed Sparse Row matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html).

In [None]:
X_HH_train=np.array([np.array(band).astype(np.float32) for band in train.band_1])
X_HV_train=np.array([np.array(band).astype(np.float32) for band in train.band_2])
X_angle_train=train.inc_angle.values.astype(np.float32).reshape(-1,1)
y_train=train.is_iceberg.values.astype(np.float32)
X_train=np.concatenate((X_HH_train,X_HV_train,X_angle_train), axis=1)
# Now we have 75*75 numerical features for band_1, 75*75 numerical features for band_2, and 1 feature for angle
X_train.shape
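
As a rough illustration of the sparse option mentioned above, the sketch below fits an `SVC` on a SciPy CSR matrix. The data is synthetic and mostly zeros (an assumption made only for this example), and the `gamma` and `C` values are arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Hypothetical mostly-zero feature matrix: 40 samples, 50 features
X_dense = rng.rand(40, 50)
X_dense[X_dense < 0.9] = 0.0
y_toy = np.array([0, 1] * 20)

# Store only the non-zero entries in Compressed Sparse Row format
X_sparse = csr_matrix(X_dense)
print(X_sparse.nnz, "non-zero values out of", X_dense.size)

# SVC.fit and SVC.predict accept the CSR matrix directly
clf_sparse = SVC(kernel='rbf', gamma=0.1, C=1.0).fit(X_sparse, y_toy)
pred_sparse = clf_sparse.predict(X_sparse)
print(pred_sparse.shape)
```

A sparse format only saves memory when most entries are zero, so it is worth checking the fraction of non-zero values before converting.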

SVM algorithms are not scale invariant, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1.

For this problem, standardizing to unit variance is difficult because the variances of the features are very small and close to zero. So we use [MaxAbsScaler](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler), which divides each variable by its maximum absolute value so that it falls in the range [-1,+1]. The data is not shifted, so zero stays at zero. The fitted scaler also stores the scaling learned on the train set, which can later be applied to the test set.

In [None]:
scaler = MaxAbsScaler()
X_train_maxabs = scaler.fit_transform(X_train)
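
The following minimal sketch shows how a fitted `MaxAbsScaler` can be reused on unseen data. The arrays `X_tr` and `X_te` are made-up toy values, not the iceberg data.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical toy train and test matrices with two features
X_tr = np.array([[-20.0, 2.0],
                 [-10.0, 4.0],
                 [-5.0, -8.0]])
X_te = np.array([[-40.0, 4.0]])

scaler_demo = MaxAbsScaler()
# fit_transform learns the maximum absolute value of each column (20 and 8 here)
X_tr_scaled = scaler_demo.fit_transform(X_tr)
# transform reuses the scale learned on the train set; nothing is re-fitted
X_te_scaled = scaler_demo.transform(X_te)

print(scaler_demo.max_abs_)
print(X_tr_scaled)
print(X_te_scaled)
```

Note that test values can fall outside [-1,+1] when they exceed the maximum seen during training, as the first test feature does here (-40 / 20 = -2).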

# 2- Fitting and Tuning the parameters
We used Radial Basis Function (rbf) as the SVM kernel. 

We also set the **probability** parameter to **False** in order to get faster results. Please note that SVM cannot directly calculate the probability of each class; when **probability** is **True**, sklearn runs an additional internal cross-validation on the training data and calibrates the probabilities using Platt scaling: logistic regression on the SVM’s scores. For the final submission, which needs the probability of each class, the **probability** parameter should be set to **True**.
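
The difference between the two settings can be sketched on synthetic data. The dataset from `make_classification` and the names `fast` and `calibrated` are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical synthetic binary classification problem
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

# probability=False: faster fit, only uncalibrated decision scores are available
fast = SVC(kernel='rbf', probability=False).fit(X_demo, y_demo)
decision_scores = fast.decision_function(X_demo[:5])

# probability=True: slower fit, but predict_proba returns calibrated class probabilities
calibrated = SVC(kernel='rbf', probability=True, random_state=0).fit(X_demo, y_demo)
proba = calibrated.predict_proba(X_demo[:5])

print(decision_scores)
print(proba)
```

The decision scores are enough for ranking metrics such as ROC AUC, which is why the faster setting can be used during tuning; a log-loss based submission needs the probabilities.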

The gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. The gamma parameter can be seen as the inverse of the radius of influence of samples selected by the model as support vectors. The C parameter trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors.
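
To make the effect of gamma concrete, the sketch below evaluates the RBF kernel, exp(-gamma * ||a - b||^2), for one fixed pair of points and a few gamma values. The points and the gamma values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two hypothetical points whose squared Euclidean distance is 2
a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])

similarities = {}
for gamma in [0.001, 0.1, 10.0]:
    # rbf_kernel computes exp(-gamma * ||a - b||^2)
    similarities[gamma] = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print("gamma =", gamma, "kernel value =", similarities[gamma])
```

With a low gamma the two points still look very similar to the kernel (influence reaches far), while with a high gamma the kernel value drops towards zero (influence stays close).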
 

In [None]:
# Create the SVM instance using Radial Basis Function (rbf) kernel
clf = svm.SVC(kernel='rbf',probability=False)
# Set the range of hyper-parameters we want to use to tune our SVM classifier
C_range = [0.1,1,10,50,100]
gamma_range = [0.00001,0.0001,0.001,0.01,0.1]
param_grid_SVM = dict(gamma=gamma_range, C=C_range)
# Set up the grid search using 3-fold cross-validation and 'ROC Area Under the Curve' as the cross-validation score.
grid = GridSearchCV(clf, param_grid=param_grid_SVM, cv=3,scoring='roc_auc')
grid.fit(X_train_maxabs, y_train)
print("The best parameters are %s with a score of %0.2f" % (grid.best_params_, grid.best_score_))

# 3- Conclusion and Visualizations
You can see the trade-offs among different values of C and gamma, and how they affect the validation score of the model.
[visualization credit](http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html#sphx-glr-auto-examples-svm-plot-rbf-parameters-py)

In [None]:
class MidpointNormalize(Normalize):
    """Colormap normalization that maps `midpoint` to the middle of the colormap."""

    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        Normalize.__init__(self, vmin, vmax, clip)

    def __call__(self, value, clip=None):
        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return np.ma.masked_array(np.interp(value, x, y))

    
plt.figure(figsize=(8, 6))
plt.subplots_adjust(left=.2, right=0.95, bottom=0.15, top=0.95)
scores = grid.cv_results_['mean_test_score'].reshape(len(C_range),len(gamma_range))
plt.imshow(scores, interpolation='nearest', cmap=plt.cm.hot,norm=MidpointNormalize(vmin=0.5, midpoint=0.95))
plt.xlabel('gamma')
plt.ylabel('C')
plt.colorbar()
plt.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
plt.yticks(np.arange(len(C_range)), C_range)
plt.title('Validation ROC_AUC score')
plt.show()
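
The heatmap relies on reshaping `mean_test_score` into a (C, gamma) grid. As a cross-check, the sketch below builds the same grid as a labelled table with pandas. It runs on synthetic data with a smaller, made-up parameter grid, so the scores are not those of the iceberg model.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical synthetic data and a small illustrative grid
X_demo, y_demo = make_classification(n_samples=120, n_features=8, random_state=0)
C_demo = [0.1, 1, 10, 100]
gamma_demo = [0.01, 0.1, 1.0]

grid_demo = GridSearchCV(SVC(kernel='rbf'),
                         param_grid=dict(gamma=gamma_demo, C=C_demo),
                         cv=3, scoring='roc_auc')
grid_demo.fit(X_demo, y_demo)

# One row per parameter combination; cast the parameter columns to float for sorting
results = pd.DataFrame(grid_demo.cv_results_)
results['param_C'] = results['param_C'].astype(float)
results['param_gamma'] = results['param_gamma'].astype(float)

# Rows are C values, columns are gamma values
score_table = results.pivot(index='param_C', columns='param_gamma',
                            values='mean_test_score')
print(score_table.round(3))

# The labelled table should agree with the plain reshape used for the heatmap
reshaped = grid_demo.cv_results_['mean_test_score'].reshape(len(C_demo), len(gamma_demo))
matches_reshape = np.allclose(score_table.values, reshaped)
print(matches_reshape)
```

The reshape works because GridSearchCV iterates over the parameters in sorted name order, so `C` varies slowest and `gamma` fastest.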