# <center> **Kernel Methods For Machine Learning - Data Challenge: Image Classification**

### **Abstract**

Image classification is a classical task in machine learning that aims to understand an image as a whole and assign it a specific label. In this data challenge, we use kernel methods to perform multi-class image classification.

### **Introduction**

In this challenge, we are given a dataset of $5000$ labeled training images and $2000$ test images, to be classified into $10$ classes. Each image is $32\times32$ pixels with $3$ color channels.

In this challenge, we tried not to use external machine learning libraries and implemented the major algorithms from scratch, with the exception of SciPy's **optimize.minimize** function, which we use with the **SLSQP** method.

### Imports

In [None]:
!pip install numpy scipy pandas scikit-image matplotlib tqdm

In [None]:
import sys
import pickle as pkl
from time import time

import numpy as np
import pandas as pd
import skimage

from scipy import optimize
from scipy.linalg import cho_factor, cho_solve

import matplotlib.pyplot as plt

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
class utils:

    @staticmethod
    def log_process(title, cursor, finish_cursor, start_time=None):
        """Print an in-place progress line; add an estimated remaining time when start_time is given."""
        percentage = float(cursor + 1) / finish_cursor
        if start_time is not None:
            elapsed = time() - start_time
            time_to_finish = elapsed / percentage - elapsed
            mn, sc = divmod(int(time_to_finish), 60)
            sys.stdout.write("\r%s - %.2f%% ----- Estimated time: %d min %d sec -----" % (title, 100 * percentage, mn, sc))
        else:
            sys.stdout.write("\r%s - %.2f%%" % (title, 100 * percentage))
        sys.stdout.flush()

# SIFT Implementation: Feature Extraction

**Scale-Invariant Feature Transform (SIFT)** is an algorithm that extracts distinctive invariant features from images that can be used for object recognition and to perform reliable matching between different views of an object or scene.

As described in **[D. G. Lowe. “Distinctive image features from scale-invariant keypoints.” In: International journal of computer vision, 60(2):91–110. 2004]**, the SIFT approach consists of the following steps:

- *Scale-space extrema detection*: We compute the scale space of the image, defined as the convolution of the input image with Gaussians of increasing scale, which gives a pyramid of Gaussian-blurred images. We then take the differences of successive blurred images. Candidate keypoints are the local extrema of this difference-of-Gaussians (DoG) pyramid.

- *Keypoint localization*: To localize the keypoints more accurately, SIFT refines them to sub-pixel precision using a local quadratic model, namely the second-order Taylor expansion of the DoG function (see Section 3.2 of **[Ives Rey Otero and Mauricio Delbracio. “Anatomy of the SIFT Method.” In: Image Processing On Line, 2014. https://doi.org/10.5201/ipol.2014.82]**). Unstable keypoints are then discarded: those with low contrast and those located on edges.

- *Orientation assignment*: This step has three parts. First, an orientation histogram is accumulated from the gradients of the pixels in the neighborhood of each keypoint. Second, the histogram is smoothed by applying a circular convolution with the three-tap box filter $[1, 1, 1]/3$ six times. Finally, one or more reference orientations are extracted from the smoothed histogram, and the orientation of the highest peak is assigned to the keypoint.

- *Keypoint descriptor*: The SIFT descriptor encodes the local spatial distribution of gradient orientations in a neighborhood of the keypoint.
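The detection, localization and orientation-smoothing steps above can be sketched in NumPy/SciPy as follows. This is a minimal illustration under assumptions that are not in the original: a single octave without downsampling, the commonly used values $\sigma_0 = 1.6$ and $k = 2^{1/3}$, an illustrative contrast threshold, no edge-response test, and hypothetical helper names. The `SIFT` class below is a dense-SIFT variant that computes descriptors on a fixed grid and does not run these detection steps.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter


def dog_pyramid(image, sigma0=1.6, n_levels=5, k=2 ** (1 / 3)):
    """One octave: blur at sigma0 * k**s (s = 0..n_levels-1) and subtract successive levels."""
    image = np.asarray(image, dtype=float)
    blurred = np.stack([gaussian_filter(image, sigma0 * k**s) for s in range(n_levels)])
    return blurred[1:] - blurred[:-1]  # shape (n_levels - 1, H, W)


def dog_extrema(dog, contrast_thresh=0.01):
    """(scale, y, x) of samples that are the max or min of their 3x3x3 neighbourhood."""
    is_max = dog == maximum_filter(dog, size=3)
    is_min = dog == minimum_filter(dog, size=3)
    candidates = (is_max | is_min) & (np.abs(dog) > contrast_thresh)
    # Drop the first/last DoG level and the image border (incomplete neighbourhoods)
    candidates[[0, -1]] = False
    candidates[:, [0, -1], :] = False
    candidates[:, :, [0, -1]] = False
    return np.argwhere(candidates)


def refine_offset(dog, s, y, x):
    """Offset -H^{-1} g to the extremum of the 2nd-order Taylor model at sample (s, y, x)."""
    d = dog
    g = 0.5 * np.array([d[s + 1, y, x] - d[s - 1, y, x],
                        d[s, y + 1, x] - d[s, y - 1, x],
                        d[s, y, x + 1] - d[s, y, x - 1]])
    c = d[s, y, x]
    h_ss = d[s + 1, y, x] + d[s - 1, y, x] - 2 * c
    h_yy = d[s, y + 1, x] + d[s, y - 1, x] - 2 * c
    h_xx = d[s, y, x + 1] + d[s, y, x - 1] - 2 * c
    h_sy = 0.25 * (d[s + 1, y + 1, x] - d[s + 1, y - 1, x] - d[s - 1, y + 1, x] + d[s - 1, y - 1, x])
    h_sx = 0.25 * (d[s + 1, y, x + 1] - d[s + 1, y, x - 1] - d[s - 1, y, x + 1] + d[s - 1, y, x - 1])
    h_yx = 0.25 * (d[s, y + 1, x + 1] - d[s, y + 1, x - 1] - d[s, y - 1, x + 1] + d[s, y - 1, x - 1])
    hessian = np.array([[h_ss, h_sy, h_sx], [h_sy, h_yy, h_yx], [h_sx, h_yx, h_xx]])
    return -np.linalg.solve(hessian, g)  # (ds, dy, dx)


def smooth_orientation_histogram(hist, n_passes=6):
    """Apply the circular box filter [1, 1, 1] / 3 n_passes times."""
    h = np.asarray(hist, dtype=float)
    for _ in range(n_passes):
        h = (np.roll(h, 1) + h + np.roll(h, -1)) / 3.0
    return h


dog = dog_pyramid(np.random.default_rng(0).random((32, 32)))
keypoints = dog_extrema(dog)  # candidate (scale, y, x) triples
hist = np.zeros(36)
hist[10] = 1.0
reference_bin = int(np.argmax(smooth_orientation_histogram(hist)))  # 10
```

Because central differences are exact on quadratics, `refine_offset` recovers the true extremum of a sampled quadratic exactly, which is a convenient sanity check.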



In [None]:
params = { ' gridSpacing': 6,
           'patchSize': 31,
           'sift_thres': .3,
           'gaussian_thres': .7,
           'sigma_edge': .4,
           'num_angles': 12,
           'num_bins': 5,
           'alpha': 9.0 }
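With these parameters, the dense grid built in `SIFT.get_process_image` contains a single patch per $32\times32$ image, which is why `get_X` keeps only the first descriptor (of length $5 \cdot 5 \cdot 12 = 300$). The hypothetical helper `dense_grid` below repeats the same offset and range arithmetic to show this:

```python
def dense_grid(H, W, grid_spacing, patch_size):
    """Top-left corners of the patches, using the same arithmetic as SIFT.get_process_image."""
    offset_h = ((H - patch_size) % grid_spacing) // 2
    offset_w = ((W - patch_size) % grid_spacing) // 2
    rows = list(range(offset_h, H - patch_size + 1, grid_spacing))
    cols = list(range(offset_w, W - patch_size + 1, grid_spacing))
    return rows, cols


rows, cols = dense_grid(32, 32, grid_spacing=6, patch_size=31)
descriptor_length = 5 * 5 * 12  # num_bins**2 * num_angles
print(rows, cols, len(rows) * len(cols), descriptor_length)  # [0] [0] 1 300

# The class defaults (gridSpacing=8, patchSize=16) would give a 3 x 3 grid instead
print(dense_grid(32, 32, grid_spacing=8, patch_size=16))  # ([0, 8, 16], [0, 8, 16])
```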

In [None]:
class SIFT:

  '''
  Dense SIFT descriptor extractor.

  This implementation is based on Yangqing Jia's dense SIFT code
  (jiayq@eecs.berkeley.edu), itself a port of Svetlana Lazebnik's
  Matlab implementation (http://www.cs.unc.edu/~lazebnik/):
  https://github.com/Yangqing/dsift-python/blob/master/dsift.py
  '''

  def __init__(self, gridSpacing = 8, patchSize = 16, gaussian_thres = 1.0,
                 sigma_edge = 0.8, sift_thres = 0.2, num_angles = 12, num_bins = 5, alpha = 9.0):
      self.num_angles = num_angles
      self.num_bins = num_bins
      self.alpha = alpha
      self.angle_list = np.array(range(num_angles))*2.0*np.pi/num_angles
      self.gs = gridSpacing
      self.ps = patchSize
      self.gaussian_thres = gaussian_thres
      self.sigma =  sigma_edge
      self.sift_thres = sift_thres
      self.weights = self._get_weights(num_bins)


  def get_process_image(self, image):
    '''
        Processes a single image and returns the values and locations
        of its dense SIFT features.
        image: an M*N image given as a 2D numpy array. A color image
            (M*N*3) is automatically converted to grayscale by
            averaging its channels.

        Return values:
        features: the feature array, each row is a feature
        positions: the positions of the features (top-left corners
            of the patches), normalized to [0, 1] by the image height
            and width; each column is one position
    '''

    image = image.astype(np.double)
    if image.ndim == 3:
      # Convert to gray
      image = np.mean(image, axis=2)

    # compute the grids
    H, W = image.shape
    gS = self.gs
    pS = self.ps
    remH = np.mod(H-pS, gS)
    remW = np.mod(W-pS, gS)
    offsetH = remH//2
    offsetW = remW//2
    #print(offsetH, H-pS+1, gS, offsetW, W-pS+1, gS)
    gridH, gridW = np.meshgrid(range(offsetH, H-pS+1, gS), range(offsetW, W-pS+1, gS))
    gridH = gridH.flatten()
    gridW = gridW.flatten()
    features = self._calculate_sift_grid(image, gridH, gridW)
    features = self._normalize_sift(features)
    positions = np.vstack((gridH / np.double(H), gridW / np.double(W)))
    return features, positions

  def get_X(self, data):
    out = []
    start = time()
    finish = len(data)
    for idx, dt in enumerate(data):
        utils.log_process('SIFT', idx, finish_cursor=finish, start_time = start)
        out.append(self.get_process_image(np.mean(np.double(dt), axis=2))[0][0])
    return np.array(out)



  def _get_weights(self, num_bins):
    # compute the weight contribution map
    sample_res = self.ps / np.double(num_bins)
    sample_p = np.array(range(self.ps))
    sample_ph, sample_pw = np.meshgrid(sample_p,sample_p)
    sample_ph.resize(sample_ph.size)
    sample_pw.resize(sample_pw.size)
    bincenter = np.array(range(1,num_bins*2,2)) / 2.0 / num_bins * self.ps - 0.5
    bincenter_h, bincenter_w = np.meshgrid(bincenter,bincenter)
    bincenter_h.resize((bincenter_h.size,1))
    bincenter_w.resize((bincenter_w.size,1))
    dist_ph = abs(sample_ph - bincenter_h)
    dist_pw = abs(sample_pw - bincenter_w)
    weights_h = dist_ph / sample_res
    weights_w = dist_pw / sample_res
    weights_h = (1-weights_h) * (weights_h <= 1)
    weights_w = (1-weights_w) * (weights_w <= 1)
    # weights is the contribution of each pixel to the corresponding bin center
    return weights_h * weights_w

  def _calculate_sift_grid(self, image, gridH, gridW):
    H, W = image.shape
    Npatches = gridH.size
    features = np.zeros((Npatches, self.num_bins * self.num_bins * self.num_angles))
    gaussian_height, gaussian_width = self._get_gauss_filter(self.sigma)
    IH = self._convolution2D(image, gaussian_height)
    IW = self._convolution2D(image, gaussian_width)
    Imag = np.sqrt(IH**2 + IW**2)
    Itheta = np.arctan2(IH,IW)
    Iorient = np.zeros((self.num_angles, H, W))
    for i in range(self.num_angles):
      Iorient[i] = Imag * np.maximum(np.cos(Itheta - self.angle_list[i])**self.alpha, 0)
    for i in range(Npatches):
        currFeature = np.zeros((self.num_angles, self.num_bins**2))
        for j in range(self.num_angles):
            currFeature[j] = np.dot(self.weights, Iorient[j,gridH[i]:gridH[i]+self.ps, gridW[i]:gridW[i]+self.ps].flatten())
        features[i] = currFeature.flatten()
    return features

  def _normalize_sift(self, features):

    '''
        This function does sift feature normalization
        following David Lowe's definition (normalize length ->
        thresholding at 0.2 -> renormalize length)
    '''

    siftlen = np.sqrt(np.sum(features**2, axis=1))
    hcontrast = (siftlen >= self.gaussian_thres)
    siftlen[siftlen < self.gaussian_thres] = self.gaussian_thres
    # normalize with contrast thresholding
    features /= siftlen.reshape((siftlen.size, 1))
    # suppress large gradients
    features[features>self.sift_thres] = self.sift_thres
    # renormalize high-contrast ones
    features[hcontrast] /= np.sqrt(np.sum(features[hcontrast]**2, axis=1)).\
        reshape((features[hcontrast].shape[0], 1))
    return features


  def _get_gauss_filter(self, sigma):
    '''
          generating a derivative of Gauss filter on both the X and Y
          direction.
    '''


    gaussian_filter_amp = np.int64(2*np.ceil(sigma))
    gaussian_filter = np.array(range(-gaussian_filter_amp, gaussian_filter_amp+1))**2
    gaussian_filter = gaussian_filter[:, np.newaxis] + gaussian_filter
    gaussian_filter = np.exp(- gaussian_filter / (2.0 * sigma**2))
    gaussian_filter /= np.sum(gaussian_filter)
    gaussian_height, gaussian_width = np.gradient(gaussian_filter)
    gaussian_height *= 2.0/np.sum(np.abs(gaussian_height))
    gaussian_width  *= 2.0/np.sum(np.abs(gaussian_width))
    return gaussian_height, gaussian_width

  def _convolution2D(self, image, kernel):
    imRows, imCols = image.shape
    kRows, kCols = kernel.shape

    y = np.zeros((imRows,imCols))

    kcenterX = kCols//2
    kcenterY = kRows//2

    for i in range(imRows):
        for j in range(imCols):
          for m in range(kRows):
            mm = kRows - 1 - m
            for n in range(kCols):
              nn = kCols - 1 - n

              ii = i + (m - kcenterY)
              jj = j + (n - kcenterX)

              if ii >= 0 and ii < imRows and jj >= 0 and jj < imCols :
                y[i][j] += image[ii][jj] * kernel[mm][nn]

    return y
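`_convolution2D` loops over every pixel and every kernel entry in pure Python, which is slow when repeated over thousands of images. For odd-sized kernels (the derivative-of-Gaussian filters built by `_get_gauss_filter` have size $4\lceil\sigma\rceil + 1$), it computes the same zero-padded, same-size convolution as `scipy.signal.convolve2d`. A sketch that checks this on random data, where `naive_convolution2d` is a standalone copy of the method's loop:

```python
import numpy as np
from scipy.signal import convolve2d


def naive_convolution2d(image, kernel):
    """Same loops as SIFT._convolution2D: zero padding, output has the input's shape."""
    im_rows, im_cols = image.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((im_rows, im_cols))
    cy, cx = k_rows // 2, k_cols // 2
    for i in range(im_rows):
        for j in range(im_cols):
            for m in range(k_rows):
                for n in range(k_cols):
                    ii, jj = i + m - cy, j + n - cx
                    if 0 <= ii < im_rows and 0 <= jj < im_cols:
                        # flipped kernel index: true convolution, not correlation
                        out[i, j] += image[ii, jj] * kernel[k_rows - 1 - m, k_cols - 1 - n]
    return out


rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = rng.random((5, 5))  # odd size, like the derivative-of-Gaussian filters
fast = convolve2d(image, kernel, mode="same", boundary="fill", fillvalue=0)
print(np.allclose(naive_convolution2d(image, kernel), fast))  # True
```

Swapping the loop for `convolve2d(image, kernel, mode='same')` should leave the descriptors unchanged up to floating-point round-off, at the cost of relying on `scipy.signal` (SciPy is already a dependency of the notebook).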


# Data processing

In [None]:
def reshape_data(X):
  # Each row stores 1024 R values, then 1024 G values, then 1024 B values (row-major pixel order)
  R, G, B = np.hsplit(X, 3)
  data = np.array([np.dstack((R[i], G[i], B[i])).reshape(32, 32, 3) for i in range(len(X))])
  return data
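Assuming the layout `reshape_data` expects (each row holds the 1024 red values, then the 1024 green values, then the 1024 blue values, each plane in row-major order), the same conversion can be written as a single reshape and transpose. A sketch with a hypothetical `reshape_data_vectorized`:

```python
import numpy as np


def reshape_data_vectorized(X):
    """(n, 3072) rows of stacked R, G, B planes -> (n, 32, 32, 3) channels-last images."""
    X = np.asarray(X)
    return X.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)


X = np.arange(2 * 3072, dtype=float).reshape(2, 3072)
images = reshape_data_vectorized(X)
print(images.shape)               # (2, 32, 32, 3)
print(images[0, 0, 1].tolist())   # [1.0, 1025.0, 2049.0]  (R, G, B of pixel (0, 1))
```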

In [None]:
class ImageTransformation:

  @staticmethod
  def flip_image(image):
    # Return a copy of an (H, W, C) image flipped horizontally (columns in reverse order)
    return image[:, ::-1, :].copy()

In [None]:
Xtr = pd.read_csv('/content/drive/MyDrive/mva-mash-kernel-methods-2021-2022/Xtr.csv', header=None)
Ytr = pd.read_csv('/content/drive/MyDrive/mva-mash-kernel-methods-2021-2022/Ytr.csv')
Xte = pd.read_csv('/content/drive/MyDrive/mva-mash-kernel-methods-2021-2022/Xte.csv', header=None)

In [None]:
Xtr

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 3063 | 3064 | 3065 | 3066 | 3067 | 3068 | 3069 | 3070 | 3071 | 3072 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.007018 | 0.000323 | 0.002215 | 0.000781 | -0.005636 | -0.001525 | -0.001090 | -0.001907 | 0.004179 | -0.004225 | ... | -0.002166 | -0.005094 | 0.001906 | -0.006143 | -0.013265 | -0.013873 | 0.005223 | -0.000860 | -0.012881 | NaN |
| 1 | 0.000819 | 0.001688 | 0.002698 | 0.004685 | 0.011166 | 0.017482 | 0.045989 | 0.031377 | 0.032150 | 0.062066 | ... | 0.007203 | 0.008634 | 0.006800 | 0.014114 | 0.000243 | -0.019384 | -0.046763 | -0.048919 | -0.057449 | NaN |
| 2 | -0.016779 | 0.006662 | -0.007226 | -0.003798 | -0.004273 | -0.009955 | -0.030925 | -0.007064 | 0.008136 | 0.000618 | ... | -0.023748 | 0.047707 | 0.072310 | 0.056837 | 0.045410 | 0.015561 | 0.003272 | -0.013745 | 0.000968 | NaN |
| 3 | 0.014936 | 0.004218 | 0.009732 | 0.007309 | 0.004914 | 0.008172 | 0.014205 | -0.023263 | -0.023014 | 0.011482 | ... | -0.029634 | -0.024069 | -0.000788 | -0.005010 | -0.004260 | 0.014308 | -0.010997 | -0.025966 | -0.025786 | NaN |
| 4 | -0.073091 | -0.046800 | -0.056235 | -0.063619 | -0.088387 | -0.044682 | -0.014172 | -0.077535 | -0.100056 | -0.066161 | ... | -0.018166 | 0.012983 | 0.022676 | 0.014233 | 0.047403 | 0.052239 | -0.029272 | 0.001368 | -0.001475 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | -0.021390 | 0.029503 | -0.002429 | 0.006146 | -0.011715 | -0.014993 | 0.000653 | -0.015904 | -0.037338 | -0.076354 | ... | 0.006321 | -0.003823 | -0.004482 | 0.007168 | 0.007442 | 0.008854 | -0.004593 | -0.000920 | -0.017339 | NaN |
| 4996 | -0.028515 | -0.031435 | -0.012834 | -0.029745 | 0.021972 | 0.003324 | 0.023001 | 0.029703 | -0.038012 | 0.059018 | ... | -0.007123 | -0.064752 | -0.078392 | -0.074313 | -0.029270 | 0.056367 | 0.054470 | 0.010949 | 0.062182 | NaN |
| 4997 | 0.049680 | 0.050423 | 0.012001 | -0.003546 | 0.011553 | 0.000293 | 0.003198 | 0.016430 | -0.010582 | -0.019450 | ... | 0.009501 | 0.014085 | 0.010988 | 0.011975 | 0.018873 | -0.010400 | -0.013394 | -0.015530 | -0.001548 | NaN |
| 4998 | 0.000044 | 0.000599 | 0.000182 | 0.000025 | -0.000004 | 0.000004 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.061100 | 0.017464 | 0.011662 | 0.036202 | 0.025268 | 0.020164 | 0.016163 | 0.025354 | 0.034644 | NaN |


In [None]:
Xtr.apply(lambda x: sum(x.isnull())).sort_values(ascending=True) #Checking missing values in training set

0          0
2042       0
2043       0
2044       0
2045       0
        ... 
1027       0
1028       0
1029       0
1535       0
3072    5000
Length: 3073, dtype: int64

The only column with missing values is column 3072, in which all 5000 entries are missing.
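One plausible cause, stated here as an assumption since the raw CSV lines are not shown, is a trailing comma at the end of every line: `pandas.read_csv` then creates one extra, entirely empty column. A minimal reproduction on made-up data:

```python
import io

import pandas as pd

# Made-up two-row CSV whose lines end with a trailing comma
raw = "0.1,0.2,0.3,\n0.4,0.5,0.6,\n"
df = pd.read_csv(io.StringIO(raw), header=None)
print(df.shape)                            # (2, 4)
print(df[3].isnull().all())                # True
print(df.dropna(axis=1, how="all").shape)  # (2, 3)
```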

In [None]:
Xtr.isnull().sum().sum()

5000

In [None]:
Xtr = Xtr.dropna(axis=1, how='all', inplace=False)   ## drop the columns in which every value is missing

In [None]:
Xtr.apply(lambda x: sum(x.isnull())) #Checking missing values in training set

0       0
1       0
2       0
3       0
4       0
       ..
3067    0
3068    0
3069    0
3070    0
3071    0
Length: 3072, dtype: int64

In [None]:
Xte.apply(lambda x: sum(x.isnull())).sort_values(ascending=True) #Checking missing values in the test set

0          0
2042       0
2043       0
2044       0
2045       0
        ... 
1027       0
1028       0
1029       0
1535       0
3072    2000
Length: 3073, dtype: int64

In [None]:
Xte = Xte.dropna(axis=1, how='all', inplace=False)   ## drop the columns in which every value is missing

In [None]:
Xte.apply(lambda x: sum(x.isnull())) #Checking missing values in the test set

0       0
1       0
2       0
3       0
4       0
       ..
3067    0
3068    0
3069    0
3070    0
3071    0
Length: 3072, dtype: int64

In [None]:
Xtrain=Xtr.values
Ytrain=y = Ytr['Prediction'].values
Xtest=Xte.values

In [None]:
data_train  = reshape_data(Xtrain)
data_test = reshape_data(Xtest)

In [None]:
# Flipping images to augment data
start = time()
finish = len(data_train)

augmented_train = []

for row in range(0, finish):
    if row % 50 == 0 or row == finish-1:
        utils.log_process('Flipping image...', row, finish_cursor=finish, start_time = start)
    augmented_train.append(data_train[row])
    augmented_train.append(ImageTransformation.flip_image(data_train[row]))

augmented_train=np.array(augmented_train)

#  augment labels
start = time()
augmented_labels = []
for row in range(len(data_train)):
    label = y[row]
    augmented_labels.append(label)
    augmented_labels.append(label)

augmented_labels = np.array(augmented_labels)


Flipping image... - 100.00% ----- Estimated time: 0 min 0 sec -----
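The loop above interleaves each image with its horizontal mirror and duplicates each label. A vectorized sketch of the same augmentation (the helper name `augment_with_flips` is hypothetical):

```python
import numpy as np


def augment_with_flips(images, labels):
    """Return [x0, flip(x0), x1, flip(x1), ...] and [y0, y0, y1, y1, ...]."""
    images = np.asarray(images)
    flipped = images[:, :, ::-1, :]  # reverse the column (width) axis
    augmented = np.stack([images, flipped], axis=1).reshape(-1, *images.shape[1:])
    return augmented, np.repeat(labels, 2)


rng = np.random.default_rng(0)
imgs = rng.random((4, 32, 32, 3))
aug_x, aug_y = augment_with_flips(imgs, np.array([8, 9, 3, 1]))
print(aug_x.shape, aug_y.tolist())  # (8, 32, 32, 3) [8, 8, 9, 9, 3, 3, 1, 1]
```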

In [None]:
extractor = SIFT( gridSpacing=params[' gridSpacing'],
                 patchSize=params['patchSize'],
                 sift_thres=params['sift_thres'],
                 sigma_edge=params['sigma_edge'],
                 gaussian_thres=params['gaussian_thres'],
                 num_angles=params['num_angles'],
                 num_bins=params['num_bins'],
                 alpha=params['alpha'])


augmented_data = False                      ## True: use the original and flipped images; False: use only the original images
                                            ## Augmentation may generalize better, but it doubles feature-extraction and training time, so it is disabled here.

if augmented_data:
    X_train = extractor.get_X(augmented_train)
else:
    X_train = extractor.get_X(data_train)

SIFT - 100.00% ----- Estimated time: 0 min 0 sec -----

## Building HOG Features: Histogram of Oriented Gradients

In [None]:
from skimage.feature import hog

def hog_features(X):

  """
      HOG features are widely used for object detection. HOG decomposes an image into small square cells, computes a histogram of oriented gradients in each cell,
      normalizes the result using a block-wise pattern, and returns a descriptor for each cell.

      Stacking the cells into a square image region gives an image-window descriptor for object detection, for example with an SVM.

      see : https://www.vlfeat.org/overview/hog.html#:~:text=The%20HOG%20features%20are%20widely,a%20descriptor%20for%20each%20cell.
      see : https://scikit-image.org/docs/stable/auto_examples/features_detection/plot_hog.html

      X: array of shape (m, 3072); each row holds the R, G and B planes (1024 values each, row-major).
      Returns an (m, 1024) array: the flattened HOG visualization image of each sample.
  """

  m, n = X.shape
  Features = np.zeros((m, 1024))

  for k in range(m):
    # (3072,) -> (3, 32, 32) channel planes -> (32, 32, 3) channels-last image
    image = X[k].reshape(3, 32, 32).transpose(1, 2, 0).astype(float)

    # channel_axis=-1 replaces multichannel=True, deprecated since scikit-image 0.19
    fd, hog_image = hog(image, orientations=4, pixels_per_cell=(2, 2), cells_per_block=(1, 1),
                        visualize=True, channel_axis=-1)

    Features[k] = hog_image.reshape(1, -1)

  return Features
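To make the docstring's description concrete, here is a simplified, from-scratch sketch of HOG for a grayscale image with the same settings (2×2-pixel cells, 4 orientations, one cell per block). It yields $16 \times 16 \times 4 = 1024$ values per $32\times32$ image, a descriptor vector analogous to the `fd` returned by `hog`. It is an illustration under stated assumptions, not scikit-image's algorithm: hard orientation binning, unsigned orientations in [0, 180) degrees, and plain L2 normalization per cell; `simple_hog` is a hypothetical name.

```python
import numpy as np


def simple_hog(gray, cell=2, n_orient=4, eps=1e-8):
    """Per-cell histograms of unsigned gradient orientation, weighted by gradient
    magnitude and L2-normalised cell by cell (cells_per_block = (1, 1))."""
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)                      # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    bins = np.minimum((angle / (180.0 / n_orient)).astype(int), n_orient - 1)

    n_cr, n_cc = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((n_cr, n_cc, n_orient))
    for r in range(n_cr * cell):
        for c in range(n_cc * cell):
            hist[r // cell, c // cell, bins[r, c]] += magnitude[r, c]

    # Each cell is its own block, so it is normalised on its own
    hist /= np.sqrt(np.sum(hist**2, axis=2, keepdims=True) + eps**2)
    return hist.ravel()


rng = np.random.default_rng(0)
features = simple_hog(rng.random((32, 32)))
print(features.shape)  # (1024,)
```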


## SVMs Classifiers & Optimization

In this challenge, we use a one-vs-rest scheme with 10 binary Support Vector Machine (SVM) classifiers. Classifier $l$ separates the images of class $l$ from all the others, which yields 10 optimization problems of the same form. For the $l$-th classifier, the problem is:

$$
\min_{f_l \in \mathcal{H},\; b_l \in \mathbb{R}} \; C\sum_{i=1}^n \max\bigl(0,\; 1 - y_{i,l}\,(f_l(x_i) + b_l)\bigr) + \frac{1}{2}\|f_l\|_{\mathcal{H}}^2,
$$

where $y_{i,l} \in \{-1, 1\}$. $y_{i,l} = 1$ if $x_i$ is in class $l$ and $y_{i,l} = -1$ otherwise.

We use the RBF (Gaussian) kernel with bandwidth $\sigma$. For a feature vector $x$, the kernel function $K_x$ is

$$ K_x : v \mapsto \exp\left(-\frac{\|x-v\|^2}{2\sigma^2}\right),$$

so that, by the reproducing property,

$$
\langle K_x, K_v \rangle = \exp\left(-\frac{\|x-v\|^2}{2\sigma^2}\right).
$$

Moreover, by the representer theorem, the optimal $f_l^*$ lies in the span of $K_{x_1}, \dots, K_{x_n}$; writing its coefficients as $\alpha_i y_{i,l}$, we get $f_l^* = \sum_{i=1}^n \alpha_i y_{i,l} K_{x_i}$. This turns the problem into a Quadratic Programming (QP) problem, which we solve in its dual form using SciPy's `optimize.minimize` function with the SLSQP method.
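For reference, the standard dual of the problem above, written in this section's notation (a textbook derivation sketched here, not taken from the original text). With $K_{ij} = K(x_i, x_j)$, the dual of the $l$-th problem is

$$
\max_{\alpha \in \mathbb{R}^n} \; \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n \alpha_i \alpha_j\, y_{i,l}\, y_{j,l}\, K_{ij}
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^n \alpha_i\, y_{i,l} = 0,
$$

and the classifier is recovered as

$$
f_l^*(x) = \sum_{i=1}^n \alpha_i\, y_{i,l}\, K(x_i, x), \qquad b_l = y_{s,l} - f_l^*(x_s) \quad \text{for any } s \text{ with } 0 < \alpha_s < C.
$$

In matrix form, with $D = \operatorname{diag}(y_{1,l}, \dots, y_{n,l})$ and $G = DKD$, maximizing the dual is the same as minimizing $-\mathbf{1}^\top\alpha + \tfrac{1}{2}\alpha^\top G\,\alpha$, which is the `loss` minimized in `KernelSVC.fit`; there, $b_l$ is averaged over all such $s$.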

Having obtained the 10 optimal classifiers $(f_l^*, b_l)$, we assign to a new data point $x$ the class whose classifier returns the largest decision value:

$$ \hat{y}(x) = \underset{l \in \{0, \dots, 9\}}{\operatorname{argmax}} \left( f_l^*(x) + b_l \right)$$
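A small illustration of the one-vs-rest encoding and of this decision rule, using one vector of ten decision values printed by the prediction cell further down. Every value in it is negative, so taking the argmax of the signs would return class 0 no matter what; comparing the raw values recovers class 9. The helper `one_vs_rest_targets` is hypothetical and mirrors the `values[idx] = 1`, `values[idx_not] = -1` step of the training loop.

```python
import numpy as np


def one_vs_rest_targets(y, label):
    """+1 for samples of class `label`, -1 for every other class."""
    return np.where(np.asarray(y) == label, 1, -1)


# Decision values f_l(x) + b_l of the 10 classifiers for one test image (copied from the printed output)
scores = np.array([-2.88749123, -3.55669505, -1.71863103, -3.92166218, -2.02699452,
                   -3.53686866, -3.20186521, -4.11517663, -1.20536999, -0.64349033])

print(one_vs_rest_targets([8, 9, 3, 9], 9).tolist())  # [-1, 1, -1, 1]
print(int(np.argmax(scores)))                         # 9
print(int(np.argmax(np.sign(scores))))                # 0 (all signs are -1, argmax picks the first index)
```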

In [None]:
class RBF:
    def __init__(self, sigma=1.):
        self.sigma = sigma  # bandwidth of the kernel (sigma**2 plays the role of a variance)

    def kernel(self, X, Y):
        # squared L2 norms of each row of X and Y
        X_norm_squared = np.sum(X ** 2, axis=1).reshape(-1, 1)
        Y_norm_squared = np.sum(Y ** 2, axis=1).reshape(1, -1)

        # squared Euclidean distance matrix; clip tiny negative values caused by round-off
        distances_squared = np.maximum(X_norm_squared + Y_norm_squared - 2 * np.dot(X, Y.T), 0)

        # apply the Gaussian (RBF) kernel formula
        K = np.exp(-distances_squared / (2 * self.sigma ** 2))

        return K


class Linear:
    #def __init__(self, sigma=1.):
    #    self.sigma = sigma  # This parameter is not actually used in the linear kernel

    def kernel(self, X, Y):
        # Direct matrix multiplication between X and Y^T performs the dot product between all pairs
        return np.dot(X, Y.T)
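A quick check that the vectorized distance expansion used by `RBF.kernel` matches a direct pairwise computation, and that the resulting Gram matrix has the expected properties (unit diagonal, symmetric, positive semidefinite). The clipping of negative squared distances is a safeguard against round-off; the function names are hypothetical.

```python
import numpy as np


def rbf_vectorized(X, Y, sigma=1.0):
    """||x||^2 + ||y||^2 - 2 x.y expansion, clipped at 0, then the Gaussian kernel."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))


def rbf_naive(X, Y, sigma=1.0):
    """Direct double loop over pairs of rows, for comparison."""
    K = np.empty((len(X), len(Y)))
    for i, x in enumerate(X):
        for j, v in enumerate(Y):
            K[i, j] = np.exp(-np.sum((x - v) ** 2) / (2 * sigma**2))
    return K


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
K = rbf_vectorized(X, X, sigma=1.5)
print(np.allclose(K, rbf_naive(X, X, sigma=1.5)))          # True
print(np.allclose(np.diag(K), 1.0), np.allclose(K, K.T))   # True True
print(np.linalg.eigvalsh(K).min() > -1e-10)                # True
```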

In [None]:
class KernelSVC:

    def __init__(self, C, kernel, epsilon=1e-3):
        self.type = 'non-linear'
        self.C = C
        self.kernel = kernel
        self.alpha = None
        self.support = None
        self.epsilon = epsilon
        self.norm_f = None

    def fit(self, X, y):
        N = y.shape[0]
        y = y.astype(float)
        self.D = np.diag(y)   # kept because the training loop stores model.D
        ke = self.kernel(X, X)
        self.X = X

        # G = D K D, computed elementwise instead of with two N x N matrix products
        G = np.outer(y, y) * ke

        # Lagrange dual problem (negated, to be minimized)
        def loss(alpha):
            return -alpha.sum() + 0.5 * alpha @ G @ alpha

        # Partial derivative of the loss with respect to alpha
        def grad_loss(alpha):
            return -np.ones_like(alpha) + G @ alpha

        # Equality constraint sum_i alpha_i y_i = 0; the box 0 <= alpha_i <= C is passed as bounds
        constraints = ({'type': 'eq', 'fun': lambda alpha: np.dot(alpha, y), 'jac': lambda alpha: y},)
        bounds = [(0.0, self.C)] * N

        optRes = optimize.minimize(fun=loss, x0=np.zeros(N), method='SLSQP', jac=grad_loss,
                                   bounds=bounds, constraints=constraints)
        self.alpha = optRes.x

        # Margin support vectors (0 < alpha < C) satisfy y_s (f(x_s) + b) = 1
        support_mask = (self.alpha > self.epsilon) & (self.alpha < self.C - self.epsilon)
        if not support_mask.any():
            support_mask = self.alpha > self.epsilon
        self.support = X[support_mask]
        # f(x_s) sums over ALL training points, including bound support vectors with alpha = C
        f_support = ke[support_mask] @ (self.alpha * y)
        self.b = np.mean(y[support_mask] - f_support)

        # Squared RKHS norm of the function f
        self.norm_f = self.alpha @ G @ self.alpha

    def separating_function(self, x):
        return np.dot(self.kernel(x, self.X), self.D @ self.alpha)

    def predict(self, X):
        """ Predict y values in {-1, 1} """
        return np.sign(self.separating_function(X) + self.b)
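A self-contained sketch of the same dual solve on a toy problem, written as free functions with hypothetical names. It passes the box constraints $0 \le \alpha_i \le C$ to SLSQP as `bounds` rather than as a $2N \times N$ inequality matrix, and computes the offset from $f(x_s)$ summed over all training points, so that support vectors with $\alpha_i = C$ still contribute.

```python
import numpy as np
from scipy import optimize


def rbf_gram(X, Y, sigma=1.0):
    """RBF kernel matrix exp(-||x - y||^2 / (2 sigma^2)) between the rows of X and Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))


def fit_kernel_svc(K, y, C=10.0, eps=1e-6):
    """Solve the SVM dual with SLSQP. K: n x n Gram matrix, y: labels in {-1, +1}."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    G = np.outer(y, y) * K  # equals diag(y) @ K @ diag(y)
    res = optimize.minimize(
        fun=lambda a: -a.sum() + 0.5 * a @ G @ a,
        x0=np.zeros(n),
        jac=lambda a: -np.ones(n) + G @ a,
        method="SLSQP",
        bounds=[(0.0, C)] * n,  # 0 <= alpha_i <= C
        constraints=[{"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y}],
        options={"maxiter": 500},
    )
    alpha = res.x
    f_train = K @ (alpha * y)  # f(x_j) = sum_i alpha_i y_i K(x_i, x_j) over all i
    margin = (alpha > eps) & (alpha < C - eps)  # free support vectors
    if not margin.any():
        margin = alpha > eps  # fallback: any support vector
    b = np.mean(y[margin] - f_train[margin])
    return alpha, b


def decision_function(K_test_train, alpha, y, b):
    """f(x) + b for each test row, given the test-by-train kernel matrix."""
    return K_test_train @ (alpha * np.asarray(y, dtype=float)) + b


# Toy problem: two well-separated Gaussian blobs labelled -1 and +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)), rng.normal(2.0, 0.5, size=(20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
K = rbf_gram(X, X, sigma=1.0)
alpha, b = fit_kernel_svc(K, y, C=10.0)
train_accuracy = np.mean(np.sign(decision_function(K, alpha, y, b)) == y)
print(train_accuracy)
```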

In [None]:
class out_of_sample_prediction:

    def __init__(self, X, x, alpha, D, b, sigma=1.):
        self.X = X  # Training data
        self.x = x  # New data points for prediction
        self.sigma = sigma
        self.D = D  # Diagonal matrix of labels for training data
        self.alpha = alpha  # Lagrange multipliers
        self.b = b  # Offset

    def kernel(self):
        # Efficient computation of the RBF kernel using broadcasting
        X2 = np.sum(self.X**2, axis=1)
        x2 = np.sum(self.x**2, axis=1)
        cross_term = np.dot(self.x, self.X.T)
        distances = X2 - 2 * cross_term + x2[:, np.newaxis]
        A = -distances / (2 * self.sigma**2)
        return np.exp(A)

    def separating_function(self):
        res = self.kernel()
        gam = np.dot(self.D, self.alpha)
        final = np.dot(res, gam)
        return final

    def predict(self):
        """Predict y values in {-1, 1}"""
        d = self.separating_function()
        return np.sign(d + self.b)
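Building one `out_of_sample_prediction` object per test point and per class recomputes the same kernel row ten times. Assuming the RBF kernel and the training-loop convention that class $l$ gets targets $+1$/$-1$, all decision values $f_l(x) + b_l$ can be computed at once from a single test-by-train kernel matrix. A sketch with hypothetical names:

```python
import numpy as np


def rbf_gram(X, Y, sigma):
    """exp(-||x - y||^2 / (2 sigma^2)) for every pair of rows of X and Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))


def predict_one_vs_rest(X_train, X_test, y_train, alphas, biases, sigma):
    """Return (labels, scores) with labels = argmax_l (sum_i alpha_{l,i} y_{l,i} K(x_i, x) + b_l)."""
    K = rbf_gram(X_test, X_train, sigma)                  # (n_test, n_train), computed once
    classes = np.arange(len(alphas))
    targets = np.where(np.asarray(y_train)[None, :] == classes[:, None], 1.0, -1.0)  # (n_classes, n_train)
    coef = np.asarray(alphas) * targets                   # alpha_{l,i} * y_{l,i}
    scores = K @ coef.T + np.asarray(biases)[None, :]     # (n_test, n_classes)
    return scores.argmax(axis=1), scores
```

In this notebook it would be called, for example, as `predict_one_vs_rest(X_train[:n_samples], X_test, Ytr[:n_samples], alphas, b, sigma=sig)`, which avoids storing one $n \times n$ diagonal matrix per class.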

# Data Processing

In [None]:
#Ytr = pd.read_csv('/content/Ytr.csv')
Ytr = np.array(Ytr.iloc[:,1])

In [None]:
Ytr

array([8, 9, 3, ..., 1, 7, 5])

In [None]:
from tqdm.auto import tqdm

#fig, ax = plt.subplots(1,3, figsize=(20, 5))
#C = 10000.  # noted as working better in an earlier run

test = []
variances = [1.5]  # RBF bandwidths sigma to try; earlier runs tried np.linspace(1000,5000,20), and 1 was the best so far
for sig in variances:
  C = 10
  num_classes = int(max(Ytr) - min(Ytr) + 1)
  n_samples = len(X_train)               # use all training descriptors
  train_dataset = X_train[:n_samples]
  Ytr_copy = Ytr[:n_samples]
  # diag(y) of each one-vs-rest problem: 10 x n x n floats, about 2 GB for n = 5000
  Diagonals = np.zeros((num_classes, n_samples, n_samples))

  alphas = []
  b = []
  for i in tqdm(range(num_classes), desc="Training Classifiers"):
    # One-vs-rest targets: +1 for class i, -1 for every other class
    values = np.copy(Ytr_copy)
    idx = np.where(Ytr_copy == i)
    idx_not = np.where(Ytr_copy != i)
    values[idx] = 1
    values[idx_not] = -1

    ## Select kernel
    kernel = RBF(sigma=sig).kernel   ## RBF kernel
    #kernel = Linear().kernel        ## linear kernel

    ## Select model
    model = KernelSVC(C=C, kernel=kernel)
    model.fit(train_dataset, values)
    alphas.append(model.alpha)
    b.append(model.b)
    Diagonals[i] = model.D


# Submission

In [None]:
X_test = extractor.get_X(data_test)

SIFT - 100.00% ----- Estimated time: 0 min 0 sec -----

In [None]:
X = X_train[:n_samples]
pred = []
n = X_test.shape[0]
for j in range(n):
    # Decision value f_i(x) + b_i of each one-vs-rest classifier; predict() would only return its sign
    res = np.array([out_of_sample_prediction(X, X_test[j].reshape(1, -1), alphas[i], Diagonals[i], b[i], sigma=sig).separating_function()[0] + b[i]
                    for i in range(num_classes)])
    print(res)
    idx = np.argmax(res)
    print(idx)
    pred.append(idx)
    print(j)

Output truncated: only the last 5000 lines are shown.
750
[-2.88749123 -3.55669505 -1.71863103 -3.92166218 -2.02699452 -3.53686866
 -3.20186521 -4.11517663 -1.20536999 -0.64349033]
9
751
[-0.84255468 -2.31296743 -2.77845026 -3.03474433 -3.1473502  -2.84314139
 -2.39051064 -2.43624241 -1.91627474 -4.52870938]
0
752
[ 0.29786148 -5.94444434 -2.77692511 -3.23040198 -2.79875101 -2.81741449
 -3.98074012 -2.49790541 -1.45643851 -4.62979717]
0
753
[-3.2513223  -2.81663024 -3.18171754 -2.46023584 -2.62657867 -2.34872262
 -2.36615678 -2.91535697 -3.07024133 -2.22022393]
9
754
[-3.27168353 -3.49474085 -2.75406249 -3.00839981 -2.39697738 -1.74031864
 -2.58735395 -4.1401113  -2.58550365 -1.04486183]
9
755
[-3.39792099 -3.58857569 -3.39769841 -2.14249793 -2.31675312 -3.30204763
 -2.92492764 -2.51036057 -3.48849655 -0.02352969]
9
756
[-2.69496694 -2.87792278 -3.46364348 -1.83654291 -2.62813262 -2.05823642
 -2.24613135 -2.9095187  -2.02355495 -2.33640839]
3
757

In [None]:
pred                                    ##Predictions

In [None]:
Yte = pd.DataFrame(pred, columns=['Prediction'])
Yte['id'] = np.arange(1, len(Yte)+1)

In [None]:
columns_titles = ["id","Prediction"]
Yte=Yte.reindex(columns=columns_titles)

In [None]:
Yte.head()

| | id | Prediction |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 7 |
| 2 | 3 | 9 |
| 3 | 4 | 9 |
| 4 | 5 | 9 |


In [None]:
## save the predictions

path = '/content/drive/MyDrive/mva-mash-kernel-methods-2021-2022/Yte.csv'

with open(path, 'w', encoding = 'utf-8') as f:
  Yte.to_csv(f, sep=',', index=False)
