## Instructions

1. This SVM implementation requires `cvxopt`. Install it by entering the following in your terminal:

   ```
   sudo pip install cvxopt
   ```

   <br>
   
2. Examine the `SVM` class in `SVM.py`. It takes 2 hyperparameters:

   - `C` is the penalty on the slack variables, i.e. how strongly margin violations are punished
   - `kernel` is the kernel function in `Kernel.py`, e.g. `Kernel.linear()`
   
   <br>
   
3. Fill in the function `gram_matrix()`, which takes the feature matrix `X` as an argument. The Gram matrix is an `n x n` matrix, where `n` is the number of data points: the entry at position `i`, `j` is the kernel function applied to the `i`-th and `j`-th data points.

   Define an `n x n` matrix filled with zeros. Write a nested loop over all pairs of positions `i` and `j`. Apply the kernel function (stored in `self.kernel`) to the data points at positions `i` and `j`, assign the result to entry `i`, `j` of the matrix, and return the Gram matrix at the end of the function.
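
   A minimal sketch of the double loop, assuming NumPy and written as a standalone function rather than the class method. `linear_kernel` is a hypothetical stand-in for whatever `Kernel.linear()` returns, and the `kernel` argument plays the role of `self.kernel`:

   ```python
   import numpy as np

   def linear_kernel(x1, x2):
       # Hypothetical stand-in for the kernel stored in self.kernel
       return float(np.dot(x1, x2))

   def gram_matrix(X, kernel):
       n = X.shape[0]            # number of data points
       K = np.zeros((n, n))      # n x n matrix filled with zeros
       for i in range(n):
           for j in range(n):
               # Kernel applied to the i-th and j-th data points
               K[i, j] = kernel(X[i], X[j])
       return K

   X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
   K = gram_matrix(X, linear_kernel)
   ```

   With a linear kernel the result equals `X @ X.T`, which is a handy sanity check. The Gram matrix is symmetric for any symmetric kernel.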

   <br>
   
4. At the core of the SVM algorithm is a quadratic optimization of the dual form. Implementing the quadratic optimization ourselves would take too long for this practicum, so we will use the `cvxopt` package to do it for us.

   Nonetheless, we will specify the input for the quadratic optimization, i.e. define the elements of the dual. The input we define (later) might not seem to fully specify the dual, because `cvxopt.solvers.qp` already assumes a particular form for the function to optimize.
   
   In the following steps, we will walk through how to specify elements of the dual in `cvxopt` in order to fill in the `solve()` function.
   
   Below is a formulation of the dual:
   ![](images/dual.png)
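
   For reference, the standard soft-margin SVM dual, written as a minimization, lines up with the form that `cvxopt.solvers.qp` expects as sketched below. This is the textbook formulation rather than a transcription of the image above, and the mapping of `G`, `h`, `A` and `b` is an assumption about what the provided part of `solve()` sets up.

   ```math
   \begin{aligned}
   \min_{\alpha}\quad & \tfrac{1}{2}\,\alpha^{\top} P\,\alpha + q^{\top}\alpha,
     \qquad P_{ij} = y_i\,y_j\,K(x_i, x_j),\quad q = -\mathbf{1} \\
   \text{s.t.}\quad & G\alpha \preceq h \;\Longleftrightarrow\; 0 \le \alpha_i \le C \\
    & A\alpha = b \;\Longleftrightarrow\; \textstyle\sum_i \alpha_i y_i = 0
   \end{aligned}
   ```

   Maximizing the dual is the same as minimizing its negation, which is why the linear term `q` is a vector of -1s.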
   
   <br>
   
5. Do the following steps in the `solve()` function:

   - Compute and assign the variables `K`, `P` and `q`
     
     - Call the `gram_matrix()` function on `X` and assign the result to `K`. These are the kernelized features (shown in the red box).
       
       ![](images/gram_matrix.png)
      
       <br>
       
     - Take the outer product of `y` with itself and multiply it element-wise with `K`. Assign the result to `P`.
     
       Since we are going to use `P` later in `cvxopt`, wrap the NumPy array in `cvxopt.matrix(numpy_arr)`,
       
       i.e. `P = cvxopt.matrix(numpy_operations)` 
     
       The result is represented in the green box.
       
       ![](images/p.png)
       
       <br>

     - Assign `q` to be a vector of -1s of length `m` (the number of rows of `X`, the same as `n` above). Wrap it in a `cvxopt.matrix` as you did above.
     
       These -1s are the coefficients of the parameters that the quadratic optimization solves for. You can think of those parameters as a weight for each data point that decides how much influence the data point has on the prediction. Points closer to the decision boundary have more influence.
       
       ![](images/q.png)
       
       <br>
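
     The three assignments above can be sketched with NumPy as follows. The toy `y` and `K` values are made up for illustration, and `to_cvxopt` is a hypothetical helper that only shows where the `cvxopt.matrix` wrapping would happen:

     ```python
     import numpy as np

     # Toy labels and a toy Gram matrix (illustrative values only)
     y = np.array([1.0, -1.0, 1.0])
     K = np.array([[5.0, 11.0, 17.0],
                   [11.0, 25.0, 39.0],
                   [17.0, 39.0, 61.0]])

     m = len(y)
     P_np = np.outer(y, y) * K    # element-wise product: P[i, j] = y[i] * y[j] * K[i, j]
     q_np = -np.ones((m, 1))      # column vector of -1s, one per data point

     def to_cvxopt(arr):
         # Hypothetical helper: wrap a float NumPy array for use with cvxopt
         import cvxopt            # imported lazily so the NumPy part runs on its own
         return cvxopt.matrix(arr)
     ```

     In `solve()` you would then write something like `P = to_cvxopt(P_np)` and `q = to_cvxopt(q_np)`, or call `cvxopt.matrix` directly.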

   - Without exposing too much of the internals of `cvxopt`, we will treat the rest of the function (which is given to you) as a black box.
   
     __When you run the function `solve()`, it will return the weight for each data point__.
   
     If you are interested, you can read more about `cvxopt` and in particular `cvxopt.solvers.qp` [here](http://cvxopt.org/userguide/coneprog.html). 
   
     You can also consult [this book](http://stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf) for more of the theory behind SVM optimization.
     
     <br>
     
6. Now assume the model has been fitted and predictions are to be made for a number of new data points.

   The `_predict()` function accepts new data points as a matrix. 
   
   Write a loop that computes the kernel between each support vector and the data points. This can be achieved with `kernel(support_vector, X)`, which acts as a measure of similarity between the data points and the support vector.
   
   Within the loop, also access the weight and the label of each support vector. Multiply the kernel result by the weight and the label, and cumulatively add the product to the `result` variable.
   
   Finally, take the sign of the result; this classifies each data point as `1` or `-1`.
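
   A minimal sketch of this loop, assuming NumPy and a standalone function instead of the `_predict()` method. The argument names, the `bias` term (defaulting to 0) and the toy values are assumptions for illustration and may not match `SVM.py`:

   ```python
   import numpy as np

   def linear_kernel(support_vector, X):
       # Similarity between one support vector and every row of X
       return X @ support_vector

   def predict(X, support_vectors, weights, labels, kernel, bias=0.0):
       # Start from the (assumed) bias, one running total per new data point
       result = np.full(X.shape[0], bias, dtype=float)
       for sv, weight, label in zip(support_vectors, weights, labels):
           # Weighted, signed similarity to this support vector
           result += weight * label * kernel(sv, X)
       # Note: np.sign returns 0 for a point lying exactly on the boundary
       return np.sign(result)

   support_vectors = np.array([[1.0, 1.0], [-1.0, -1.0]])
   labels = np.array([1.0, -1.0])
   weights = np.array([0.5, 0.5])
   X_new = np.array([[2.0, 1.0], [-1.0, -3.0]])

   preds = predict(X_new, support_vectors, weights, labels, linear_kernel)
   ```

   Here the first new point is classified as `1` and the second as `-1`.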
   
   <br>
   
7. Now run `Demo.py`. You should see output like the following, and the `gap` between `pcost` and `dcost` should shrink with more iterations:

    ```
         pcost       dcost       gap    pres   dres
     0: -1.8408e+02 -1.4058e+03  8e+03  3e+00  8e-15
     1: -1.1419e+02 -9.5030e+02  2e+03  5e-01  7e-15
     2: -6.6777e+01 -2.9565e+02  4e+02  9e-02  6e-15
     3: -4.3733e+01 -8.7563e+01  7e+01  2e-02  6e-15
     4: -4.6412e+01 -5.4696e+01  1e+01  2e-03  5e-15
     5: -4.7316e+01 -5.1159e+01  5e+00  9e-04  4e-15
     6: -4.8005e+01 -4.9241e+01  2e+00  3e-04  3e-15
     7: -4.8244e+01 -4.8673e+01  5e-01  6e-05  3e-15
     8: -4.8351e+01 -4.8469e+01  1e-01  8e-06  3e-15
     9: -4.8401e+01 -4.8405e+01  4e-03  8e-08  3e-15
    10: -4.8403e+01 -4.8403e+01  4e-05  8e-10  3e-15
    ```

    You should also see a plot like this:
    
    ![](images/result.png)
   
    
    


   
