In [None]:
from sklearn import svm
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, accuracy_score
from tensorflow.keras.datasets import mnist
%matplotlib inline

In [None]:


# Load MNIST: 60,000 training and 10,000 test images of 28x28 pixels
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train.shape, X_test.shape, y_train.shape, y_test.shape

In [None]:
# Flatten each 28x28 image into a 784-dimensional feature vector
X_train = X_train.reshape(len(X_train), -1)
X_test = X_test.reshape(len(X_test), -1)

In [None]:
# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255.0
X_test = X_test / 255.0

In [None]:
# Baseline: SVC with default hyperparameters (RBF kernel, C=1.0)
classifier = svm.SVC()
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)


In [None]:
print(confusion_matrix(y_test, y_pred))

In [None]:
accuracy = accuracy_score(y_test,y_pred)
accuracy

In [None]:
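# Note: gamma is ignored by the linear kernel, so the three gamma values
# refit the same model for each value of C.
params_GS = {'kernel': ['linear'],
             'C': [1, 10, 100],
             'gamma': [1, 0.1, 0.01],
             }

GS = GridSearchCV(svm.SVC(), param_grid=params_GS, refit=True, verbose=3, cv=3)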

In [None]:
# fitting the model for grid search 
GS.fit(X_train, y_train)

In [None]:
GS_predictions = GS.predict(X_test) 


In [None]:
accuracy_GS = accuracy_score(y_test, GS_predictions)
accuracy_GS

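After applying GridSearchCV, the accuracy of the SVM decreases. The grid only searches the linear kernel, whereas the first model used the default RBF kernel of `svm.SVC()`. Other combinations of parameters might yield higher accuracy.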
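As a rough sketch of how a broader grid could be laid out, the snippet below compares the grid from the cells above with a hypothetical alternative (`params_alt` is an illustrative name, and the candidate values are simply reused as an assumption). It only counts candidates with `ParameterGrid`; it does not fit any model.

```python
from sklearn.model_selection import ParameterGrid

# Grid from the notebook: gamma is ignored by the linear kernel,
# so the 9 candidates correspond to only 3 distinct models.
params_GS = {'kernel': ['linear'], 'C': [1, 10, 100], 'gamma': [1, 0.1, 0.01]}
n_original = len(ParameterGrid(params_GS))

# Hypothetical alternative: one sub-grid per kernel,
# with gamma listed only where the kernel actually uses it.
params_alt = [
    {'kernel': ['linear'], 'C': [1, 10, 100]},
    {'kernel': ['rbf'], 'C': [1, 10, 100], 'gamma': [1, 0.1, 0.01]},
]
n_alt = len(ParameterGrid(params_alt))

print(n_original, n_alt)  # 9 12
```

A list of dictionaries can be passed to `GridSearchCV` as `param_grid` in the same way. On the full MNIST training set each candidate is expensive, so searching on a subsample first may be more practical.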

## Problem 2

To find the separating vector $\vec{w}$ that achieves the maximal separating margin, minimize
<br>
$\frac{\vec{w}^{T}\vec{w}}{2}$ subject to $y_{i}\,\vec{w}^{T}\vec{x}_{i}\geq 1$, for $i = 1, \dots, l$
<br>

The margin is $\gamma = \frac{1}{\sqrt{\vec{w}^{T}\vec{w}}}$
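As a small numeric illustration of the constraint and the margin formula, the sketch below uses a hypothetical two-dimensional toy set and an assumed candidate vector `w`. Neither comes from the exercise.

```python
import math

# Hypothetical toy data: two points per class, separable through the origin.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]
w = (0.5, 0.0)  # assumed candidate separating vector


def dot(u, v):
    """Inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


# Constraint check: y_i * w^T x_i >= 1 for every point.
feasible = all(yi * dot(w, xi) >= 1 for xi, yi in zip(X, y))

# Margin: gamma = 1 / sqrt(w^T w).
gamma = 1.0 / math.sqrt(dot(w, w))

print(feasible, gamma)
```

Scaling `w` up keeps the constraints satisfied but shrinks `gamma`, which is why the objective minimizes $\vec{w}^{T}\vec{w}$.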

<br>
The Lagrange function can be written as

$L = \frac{\vec{w}^{T}\vec{w}}{2}+\sum_{i} a_{i}\left(1-y_{i}\,\vec{w}^{T}\vec{x}_{i}\right)$

<br>

Lagrange multipliers:
$a_{i}\geq 0$

<br>

Claim:

$\underbrace{\max_{a}\min_{\vec{w}}L}_{\text{dual solution}}\leq \underbrace{\min_{\vec{w}}\max_{a}L}_{\text{primal solution}}$
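The inequality holds for any function of two arguments, not only for the Lagrange function. The sketch below checks it on a hypothetical finite table of values, where rows stand in for choices of $a$ and columns for choices of $w$. The numbers are made up for illustration.

```python
# Hypothetical table of values L[a][w]:
# rows index choices of a, columns index choices of w.
L = [
    [3, 1, 4],
    [2, 5, 0],
]

# Dual side: for each a take the min over w, then maximize over a.
max_min = max(min(row) for row in L)

# Primal side: for each w take the max over a, then minimize over w.
n_rows, n_cols = len(L), len(L[0])
min_max = min(max(L[a][w] for a in range(n_rows)) for w in range(n_cols))

print(max_min, min_max)  # 1 3
```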

<br>

### Dual Problem ###

$L = \frac{\vec{w}^{T}\vec{w}}{2}+\sum_{i} a_{i}\left(1-y_{i}\,\vec{w}^{T}\vec{x}_{i}\right)$
<br>

#### Set the partial derivative of the Lagrange function with respect to the primal variable to 0 ####

$\partial_{\vec{w}}L= \vec{w} -\sum_{i}a_{i}y_{i}\vec{x}_{i}= 0\Rightarrow \vec{w}= \sum_{i}a_{i}y_{i}\vec{x}_{i}$
<br>

#### Second order partial derivative ####

$\partial_{\vec{w}}^{2}L=I\succ 0$, so $\vec{w} =\sum_{i}a_{i}y_{i}\vec{x}_{i}$ minimizes $L$ over $\vec{w}$.


#### Substituting ####
$\vec{w} =\sum_{i}a_{i}y_{i}\vec{x}_{i}$

into the Lagrange function, we get the dual problem of maximizing

$L = \frac{\vec{w}^{T}\vec{w}}{2}+\sum_{i} a_{i}\left(1-y_{i}\,\vec{w}^{T}\vec{x}_{i}\right)$

$= \frac{1}{2}\vec{w}^{T}\sum_{i}a_{i}y_{i}\vec{x}_{i}+\sum_{i}a_{i}-\vec{w}^{T}\sum_{i}a_{i}y_{i}\vec{x}_{i}$

$= \sum_{i}a_{i}-\frac{1}{2}\vec{w}^{T}\sum_{i}a_{i}y_{i}\vec{x}_{i}$

$=\sum_{i}a_{i}-\frac{1}{2}\sum_{i,j}a_{i}a_{j}y_{i}y_{j}\vec{x}_{i}^{T}\vec{x}_{j}$

subject to $a_{i}\geq 0$.
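The substitution can be checked numerically. The sketch below picks hypothetical toy points and multipliers (any $a_{i}\geq 0$ would do for checking the algebra), builds $w=\sum_{i}a_{i}y_{i}x_{i}$, and compares the Lagrange function at that $w$ with the dual expression.

```python
# Hypothetical toy values, chosen only to check the algebra.
X = [(1.0, 2.0), (2.0, -1.0), (-1.0, -1.5)]
y = [1, -1, -1]
a = [0.3, 0.1, 0.4]


def dot(u, v):
    """Inner product of two equal-length tuples."""
    return sum(p * q for p, q in zip(u, v))


# Stationarity condition: w = sum_i a_i y_i x_i
w = tuple(sum(ai * yi * xi[k] for ai, yi, xi in zip(a, y, X)) for k in range(2))

# Lagrange function evaluated at that w.
lagrangian = 0.5 * dot(w, w) + sum(
    ai * (1 - yi * dot(w, xi)) for ai, yi, xi in zip(a, y, X)
)

# Dual objective: sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j x_i^T x_j
n = len(X)
dual = sum(a) - 0.5 * sum(
    a[i] * a[j] * y[i] * y[j] * dot(X[i], X[j])
    for i in range(n)
    for j in range(n)
)

print(abs(lagrangian - dual) < 1e-12)
```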





#### Benefits of maximizing the margin ####

Maximizing the margin regularizes $\vec{w}$ during training, since a larger margin corresponds to a smaller $\vec{w}^{T}\vec{w}$. This helps prevent overfitting and ultimately improves prediction on unseen data. A large margin also lets the model handle fluctuations in the data effectively, because a small perturbation of a point is less likely to move it across the decision boundary. In addition, since SVMs support kernels, non-linear relationships between input and output data can be captured easily.

#### Benefits of solving the dual problem over the primal problem ####

In the dual problem, the number of variables equals the number of data points (one multiplier $a_{i}$ per point) rather than the number of feature dimensions. The dual is therefore more efficient than the primal when there are fewer data points than dimensions, regardless of how large the dimension is. Moreover, the main benefit of the dual over the primal is that the data enter only through the inner products $\vec{x}_{i}^{T}\vec{x}_{j}$, so a kernel can be applied in the model.
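To make the kernel point concrete, here is a minimal sketch in which the dual objective is written against a Gram matrix `K`, so switching kernels only changes how `K` is built. The points, multipliers, helper names, and the RBF width are all hypothetical choices for illustration.

```python
import math

# Hypothetical points, labels, and multipliers.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
y = [1, -1, -1]
a = [0.2, 0.1, 0.1]


def linear_kernel(u, v):
    """Plain inner product x_i^T x_j."""
    return sum(p * q for p, q in zip(u, v))


def rbf_kernel(u, v, width=0.5):
    """exp(-width * ||u - v||^2); width is the kernel parameter, not the margin."""
    return math.exp(-width * sum((p - q) ** 2 for p, q in zip(u, v)))


def gram(kernel, points):
    """n x n matrix of kernel values; its size depends only on the number of points."""
    return [[kernel(u, v) for v in points] for u in points]


def dual_objective(a, y, K):
    """sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j K_ij"""
    n = len(a)
    return sum(a) - 0.5 * sum(
        a[i] * a[j] * y[i] * y[j] * K[i][j] for i in range(n) for j in range(n)
    )


K_lin = gram(linear_kernel, X)
K_rbf = gram(rbf_kernel, X)
print(dual_objective(a, y, K_lin), dual_objective(a, y, K_rbf))
```

Both Gram matrices are 3 x 3 here because there are three points. The feature dimension never appears in `dual_objective`.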
