# SVM

The Goal is to minimize the cost function for SVM


$$
    \min_{w, b} \frac{1}{2} \textbf{w}^T \textbf{w}
$$
$$
    s.t: \,\, \forall i\in[N]: y_n (\textbf{w}^Tx_n+b) \geqslant 1
$$

*Part 1 will use Breast Cancer to illustrate and Part 2 will use iris dataset to illustrate*

we will use the following libraries:
* pandas --- only to load in data
* numpy
* cvxopt
* sklearn --- only to compare the outcome of my SVM with the library SVM in the end

In [1]:
import pandas as pd
import numpy as np
from cvxopt import matrix, solvers

# 1.Using cvxopt.solvers.qp(P, q, G, h, A, b)

A standard form of Quadratic Programming(QP) at cvxopt document is:
$$
    \text{min }\frac{1}{2}x^TPx+q^Tx\\
    Gx \preceq h\\
    Ax=b
$$

## Data loading and preprocessing
* I use Pandas to load in the data from the breast cancer dataset. 
* Also, for the y(label), I have changed every 0 to -1 in order to simplify later work without loss of generosity


In [2]:
cancer_x_train = pd.read_csv('dataset_files/cancer_X_train.csv')
cancer_y_train = pd.read_csv('dataset_files/cancer_Y_train.csv')
cancer_x_train = cancer_x_train.iloc[0:, ].values
cancer_y_train = cancer_y_train.iloc[0:, 0].values

cancer_x_test = pd.read_csv('dataset_files/cancer_X_test.csv')
cancer_y_test = pd.read_csv('dataset_files/cancer_Y_test.csv')
cancer_x_test = cancer_x_test.iloc[0:, ].values
cancer_y_test = cancer_y_test.iloc[0:, 0].values

cancer_y_train=cancer_y_train.astype(float)
cancer_y_train= (cancer_y_train-0.5)*2

cancer_y_test=cancer_y_test.astype(float)
cancer_y_test= (cancer_y_test-0.5)*2

##  Soft-margin SVM with QP by cvxopt 

under this case:

In [3]:
class SVM_soft():

    def __init__(self):
        self.w = self.b = None

    def fit(self, X, y, C=1.0):
        n = len(X)
        d = len(X[0])
        #initialize p q G h
        self.P = matrix(np.identity(d + 1 + n, dtype=np.float))  
        self.q = matrix(np.zeros((d + 1 + n), dtype=np.float))
        self.G = matrix(np.zeros((n + n, d + 1 + n), dtype=np.float))
        self.h = -matrix(np.ones((n+n,), dtype=np.float))     
        
        #put in values for p q G h
        self.q[-n:,0] = C

        self.h[-n:,0] = 0
        
        self.P[0, 0] = 0
        self.P[1+d:, :] = 0
        
        for i in range(n):
            self.G[i, 0] = -y[i]
            self.G[i, 1: 1+d] = -X[i, :] * y[i]
            self.G[i, 1 + d + i] = -1
            self.G[i + n, i + d + 1] = -1 
            
        # QP
        sol = solvers.qp(self.P,self.q,self.G,self.h)

        self.w = np.zeros(d,)
        self.b = sol["x"][0] 
        for i in range(1, d + 1):
            self.w[i - 1] = sol["x"][i]
        
        return self.w, self.b

    def predict(self, X):
        return np.sign(np.dot(self.w, X.T) + self.b)

## Results under different C

### 1. C = 0 (i.e no slack)

In [4]:
svm_soft = SVM_soft()
svm_soft.fit(cancer_x_train, cancer_y_train, 0)

print("Training set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_train) == cancer_y_train).mean() * 100))
print("Testing set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_test) == cancer_y_test).mean() * 100))

     pcost       dcost       gap    pres   dres
 0:  1.0333e+00  5.1985e+01  3e+03  3e+00  2e+05
 1:  3.7636e-01 -1.1988e+02  1e+02  1e-01  7e+03
 2:  6.4760e-03 -7.3443e+00  7e+00  7e-03  4e+02
 3:  1.1177e-03 -2.2768e-01  2e-01  2e-04  1e+01
 4:  5.9596e-05 -2.3184e-02  2e-02  2e-05  1e+00
 5:  1.0621e-06 -1.1134e-03  1e-03  1e-06  6e-02
 6:  5.2714e-07 -4.3139e-05  4e-05  4e-08  2e-03
 7:  7.1622e-08 -6.2193e-06  6e-06  6e-09  3e-04
 8:  2.6419e-10 -4.8977e-07  5e-07  4e-10  2e-05
 9:  2.6866e-14 -4.9455e-09  5e-09  4e-12  3e-07
10:  2.6866e-18 -4.9455e-11  5e-11  4e-14  3e-09
Optimal solution found.
Training set accuracy： 62.6761 %
Testing set accuracy： 62.9371 %


### 2. C = 1.0 (default)

In [5]:
svm_soft.fit(cancer_x_train, cancer_y_train, 1)

print("Training set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_train) == cancer_y_train).mean() * 100))
print("Testing set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_test) == cancer_y_test).mean() * 100))

     pcost       dcost       gap    pres   dres
 0: -3.1583e+02  8.9218e+02  5e+03  4e+00  8e+03
 1:  3.5679e+02 -2.8004e+02  9e+02  6e-01  1e+03
 2:  2.2042e+02 -1.2900e+02  5e+02  3e-01  5e+02
 3:  1.4552e+02 -4.9921e+01  2e+02  1e-01  2e+02
 4:  9.5396e+01 -9.8586e+00  1e+02  6e-02  1e+02
 5:  6.4752e+01  1.5392e+01  6e+01  2e-02  4e+01
 6:  4.7356e+01  2.8937e+01  2e+01  8e-03  1e+01
 7:  4.1609e+01  3.4227e+01  8e+00  9e-04  2e+00
 8:  3.9830e+01  3.5307e+01  5e+00  5e-04  9e-01
 9:  3.8542e+01  3.6125e+01  2e+00  6e-05  1e-01
10:  3.7852e+01  3.6489e+01  1e+00  2e-05  4e-02
11:  3.7139e+01  3.6969e+01  2e-01  2e-06  4e-03
12:  3.7046e+01  3.7041e+01  5e-03  4e-08  8e-05
13:  3.7043e+01  3.7043e+01  9e-05  7e-10  1e-06
14:  3.7043e+01  3.7043e+01  2e-06  9e-12  1e-07
15:  3.7043e+01  3.7043e+01  8e-08  9e-14  6e-06
16:  3.7043e+01  3.7043e+01  2e-09  7e-15  4e-05
Terminated (singular KKT matrix).
Training set accuracy： 96.4789 %
Testing set accuracy： 95.1049 %


### 3. C = 10 

In [6]:
svm_soft.fit(cancer_x_train, cancer_y_train, 10)

print("Training set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_train) == cancer_y_train).mean() * 100))
print("Testing set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_test) == cancer_y_test).mean() * 100))

     pcost       dcost       gap    pres   dres
 0: -3.6643e+04  2.4753e+04  8e+04  2e+01  2e+03
 1: -1.7248e+03 -4.9573e+03  2e+04  4e+00  5e+02
 2:  4.7553e+02 -3.3901e+03  1e+04  2e+00  2e+02
 3:  1.0082e+03 -1.4933e+03  5e+03  9e-01  9e+01
 4:  8.6716e+02 -4.2482e+02  2e+03  3e-01  4e+01
 5:  5.0712e+02  9.9270e+01  6e+02  7e-02  8e+00
 6:  4.5756e+02  1.8448e+02  4e+02  3e-02  4e+00
 7:  4.2379e+02  2.1449e+02  3e+02  2e-02  2e+00
 8:  3.8802e+02  2.4142e+02  2e+02  1e-02  1e+00
 9:  3.6076e+02  2.6155e+02  1e+02  6e-03  6e-01
10:  3.3283e+02  2.8005e+02  6e+01  2e-03  2e-01
11:  3.2647e+02  2.7991e+02  5e+01  7e-04  7e-02
12:  3.1374e+02  2.8905e+02  3e+01  2e-04  2e-02
13:  3.0212e+02  2.9725e+02  5e+00  4e-05  4e-03
14:  2.9962e+02  2.9889e+02  7e-01  8e-15  2e-10
15:  2.9925e+02  2.9924e+02  1e-02  7e-15  2e-09
16:  2.9924e+02  2.9924e+02  2e-04  7e-15  7e-09
Optimal solution found.
Training set accuracy： 97.6526 %
Testing set accuracy： 95.8042 %


### 4. C = 100 

In [7]:
svm_soft.fit(cancer_x_train, cancer_y_train, 100)

print("Training set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_train) == cancer_y_train).mean() * 100))
print("Testing set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_test) == cancer_y_test).mean() * 100))

     pcost       dcost       gap    pres   dres
 0: -3.7140e+06  1.8770e+06  6e+06  2e+02  2e+03
 1: -4.6506e+05 -3.8584e+05  1e+06  4e+01  4e+02
 2: -1.0816e+05 -2.5811e+05  8e+05  2e+01  2e+02
 3:  8.5495e+03 -1.0262e+05  3e+05  7e+00  6e+01
 4:  3.5538e+04 -1.7831e+04  1e+05  1e+00  1e+01
 5:  2.3768e+04 -5.7732e+01  2e+04  2e-13  1e-10
 6:  6.0089e+03  6.5320e+02  5e+03  6e-14  1e-10
 7:  5.8448e+03  8.4877e+02  5e+03  5e-14  2e-10
 8:  5.1260e+03  1.1174e+03  4e+03  4e-14  1e-10
 9:  4.6600e+03  1.1461e+03  4e+03  3e-14  2e-10
10:  3.2560e+03  1.3936e+03  2e+03  1e-14  2e-10
11:  3.2653e+03  1.6005e+03  2e+03  1e-14  8e-11
12:  2.7493e+03  1.6673e+03  1e+03  8e-15  2e-10
13:  2.6646e+03  1.6981e+03  1e+03  7e-15  1e-10
14:  2.3249e+03  1.8632e+03  5e+02  6e-15  7e-11
15:  2.2333e+03  1.9034e+03  3e+02  6e-15  1e-10
16:  2.0754e+03  1.9853e+03  9e+01  7e-15  1e-10
17:  2.0404e+03  2.0066e+03  3e+01  6e-15  6e-10
18:  2.0238e+03  2.0173e+03  6e+00  6e-15  6e-10
19:  2.0204e+03  2.02

### 5. C = 0.01

In [8]:
svm_soft.fit(cancer_x_train, cancer_y_train, 1e-2)

print("Training set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_train) == cancer_y_train).mean() * 100))
print("Testing set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_test) == cancer_y_test).mean() * 100))

     pcost       dcost       gap    pres   dres
 0:  1.5469e+00  6.0390e+01  3e+03  3e+00  2e+05
 1:  8.4936e+00 -1.0587e+02  1e+02  1e-01  6e+03
 2:  7.3162e+00 -1.3283e+01  2e+01  1e-02  8e+02
 3:  4.4006e+00 -2.7591e+00  7e+00  4e-03  2e+02
 4:  2.0767e+00 -5.9346e-01  3e+00  1e-03  8e+01
 5:  1.2542e+00 -5.3720e-02  1e+00  6e-04  4e+01
 6:  8.5149e-01  1.9384e-01  7e-01  3e-04  2e+01
 7:  7.3421e-01  2.7530e-01  5e-01  2e-04  1e+01
 8:  5.3148e-01  4.1719e-01  1e-01  3e-05  2e+00
 9:  4.7478e-01  4.5625e-01  2e-02  3e-06  2e-01
10:  4.6681e-01  4.6166e-01  5e-03  3e-07  2e-02
11:  4.6466e-01  4.6340e-01  1e-03  8e-15  3e-11
12:  4.6402e-01  4.6399e-01  3e-05  8e-15  2e-11
13:  4.6400e-01  4.6400e-01  1e-06  8e-15  7e-10
14:  4.6400e-01  4.6400e-01  1e-08  8e-15  8e-10
Optimal solution found.
Training set accuracy： 95.7746 %
Testing set accuracy：  93.007 %


### 6. C = 0.0001

In [9]:
svm_soft.fit(cancer_x_train, cancer_y_train, 1e-4)

print("Training set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_train) == cancer_y_train).mean() * 100))
print("Testing set accuracy：{:8.6} %".format((svm_soft.predict(cancer_x_test) == cancer_y_test).mean() * 100))

     pcost       dcost       gap    pres   dres
 0:  1.0388e+00  5.2069e+01  3e+03  3e+00  2e+05
 1:  4.5787e-01 -1.1974e+02  1e+02  1e-01  7e+03
 2:  9.3226e-02 -7.4320e+00  8e+00  7e-03  4e+02
 3:  8.6052e-02 -3.1371e-01  4e-01  3e-04  2e+01
 4:  6.9402e-02 -9.9119e-02  2e-01  1e-04  6e+00
 5:  4.2124e-02 -2.5996e-02  7e-02  4e-05  2e+00
 6:  2.2383e-02 -5.3093e-03  3e-02  2e-05  9e-01
 7:  1.4590e-02  1.2355e-03  1e-02  7e-06  4e-01
 8:  1.0050e-02  4.6608e-03  5e-03  2e-06  1e-01
 9:  7.9701e-03  6.1053e-03  2e-03  7e-07  4e-02
10:  7.4128e-03  6.5384e-03  9e-04  2e-07  1e-02
11:  7.0887e-03  6.7618e-03  3e-04  4e-08  2e-03
12:  6.9422e-03  6.8706e-03  7e-05  3e-09  2e-04
13:  6.9187e-03  6.8863e-03  3e-05  6e-10  4e-05
14:  6.9033e-03  6.9000e-03  3e-06  2e-11  1e-06
15:  6.9016e-03  6.9016e-03  5e-08  3e-13  2e-08
Optimal solution found.
Training set accuracy： 93.4272 %
Testing set accuracy： 94.4056 %


## Results under sklearn

In [10]:
from sklearn.svm import SVC

In [11]:
svm_1 = SVC(kernel = 'linear', C=1e-10, random_state=1)
svm_2 = SVC(kernel = 'linear', C=1, random_state=1)
svm_3 = SVC(kernel = 'linear', C=10, random_state=1)
svm_4 = SVC(kernel = 'linear', C=100, random_state=1)
svm_5 = SVC(kernel = 'linear', C=1e-2, random_state=1)
svm_6 = SVC(kernel = 'linear', C=1e-4, random_state=1)
svm_1.fit(cancer_x_train, cancer_y_train)
svm_2.fit(cancer_x_train, cancer_y_train)
svm_3.fit(cancer_x_train, cancer_y_train)
svm_4.fit(cancer_x_train, cancer_y_train)
svm_5.fit(cancer_x_train, cancer_y_train)
svm_6.fit(cancer_x_train, cancer_y_train)

print("C = 1e-10: {:8.6} %".format((svm_1.predict(cancer_x_test) == cancer_y_test).mean() * 100))
print("C = 1: {:8.6} %".format((svm_2.predict(cancer_x_test) == cancer_y_test).mean() * 100))
print("C = 10: {:8.6} %".format((svm_3.predict(cancer_x_test) == cancer_y_test).mean() * 100))
print("C = 100: {:8.6} %".format((svm_4.predict(cancer_x_test) == cancer_y_test).mean() * 100))
print("C = 1e-2: {:8.6} %".format((svm_5.predict(cancer_x_test) == cancer_y_test).mean() * 100))
print("C = 1e-4: {:8.6} %".format((svm_6.predict(cancer_x_test) == cancer_y_test).mean() * 100))

C = 1e-10:  62.9371 %
C = 1:  95.8042 %
C = 10:  96.5035 %
C = 100:  95.8042 %
C = 1e-2:   93.007 %
C = 1e-4:  94.4056 %


From the above statistics, it is very obvious that my SVM has a similar performance with the SVM in the sklearn library.

## Conclusion

1. When C is small, the training set accuracy is small or near zero(when C = 0 i.e no slack). And when C gets bigger, the training set accuracy will also be bigger.
2. When C gets bigger, the time used to find the optimal solution is longer(shown by the number of lines each fit).
3. When C gets bigger, the testing set accuracy has a tendency to be bigger. However, bigger C can also give a lower testing set accuracy than a smaller C considering individual cases.


# 2.Dealing with several classes

## Basic Idea

When we have to deal with multiple case, we will reduce the problem into dealing with 2 classes and use the above SVM:
1. One verses one SVM(OVO):  we will create a SVM between each 2 classes. Thus it will be k(k-1)/2 in total(k is the number of classes). Then we will use a voting strategy called 'Max win' to decide its final result. During this process, each sample will be classified into one class each SVM and we will add one credit to this class. In the end, each sample will have different credits on each class labels and we will label it with the max credit class. If the credit is equal. Then we choose the one that come last for simplicity.

2. One verses rest SVM(OVR): we label one class to be the positive label, and the others to be the negative label. Thus we still have 2 labels in total for one SVM. And we will need to create k SVMs in total(k is the number of classes) and will get k results. If several SVM label one same class to be positive, then we just choose the one that come last for simplicity.If a sample has no positive cases in each SVM, then we label it to be the last one for simplicity. 


## Data loading
* I use Pandas to load in the data from the iris dataset. 
* For the y(label), when training the model, I will use 1 to represent the positive class and -1 to represent the negative class.

In [12]:
iris_x_train = pd.read_csv('dataset_files/iris_X_train.csv')
iris_y_train = pd.read_csv('dataset_files/iris_Y_train.csv')
iris_x_train = iris_x_train.iloc[0:, ].values
iris_y_train = iris_y_train.iloc[0:, 0].values

iris_x_test = pd.read_csv('dataset_files/iris_X_test.csv')
iris_y_test = pd.read_csv('dataset_files/iris_Y_test.csv')
iris_x_test = iris_x_test.iloc[0:, ].values
iris_y_test = iris_y_test.iloc[0:, 0].values.astype(float)

### OVO(One vs one)
#### Data Preprocessing

In [13]:
iris_y_train_0 = np.where(iris_y_train == 0, 1, -1).astype(float)
iris_y_train_1 = np.where(iris_y_train == 1, 1, -1).astype(float)
iris_y_train_2 = np.where(iris_y_train == 2, 1, -1).astype(float)

#### Train models

In [14]:
svm_soft_0 = SVM_soft()
svm_soft_1 = SVM_soft()
svm_soft_2 = SVM_soft()

svm_soft_0.fit(iris_x_train, iris_y_train_0, 1)
svm_soft_1.fit(iris_x_train, iris_y_train_1, 1)
svm_soft_2.fit(iris_x_train, iris_y_train_2, 1)

     pcost       dcost       gap    pres   dres
 0: -2.8841e+01  2.2664e+02  8e+02  4e+00  4e+01
 1:  1.3817e+02  2.0346e+01  1e+02  2e-01  2e+00
 2:  6.6936e+01  5.0224e+01  2e+01  3e-02  3e-01
 3:  6.0970e+01  5.8252e+01  3e+00  4e-03  4e-02
 4:  6.0100e+01  5.9598e+01  5e-01  6e-04  6e-03
 5:  5.9908e+01  5.9828e+01  9e-02  9e-05  8e-04
 6:  5.9874e+01  5.9871e+01  3e-03  3e-06  3e-05
 7:  5.9873e+01  5.9873e+01  8e-05  5e-08  4e-07
 8:  5.9873e+01  5.9873e+01  2e-06  5e-10  5e-09
Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -6.1989e+01  1.9277e+02  8e+02  4e+00  3e+01
 1:  8.4593e+01 -2.1960e+01  1e+02  4e-01  3e+00
 2:  4.4669e+01 -1.1474e+01  7e+01  2e-01  1e+00
 3:  2.0368e+01  6.2209e+00  2e+01  3e-02  3e-01
 4:  1.4985e+01  1.0931e+01  5e+00  9e-03  6e-02
 5:  1.3704e+01  1.2384e+01  1e+00  2e-03  1e-02
 6:  1.3129e+01  1.2901e+01  2e-01  3e-04  2e-03
 7:  1.3037e+01  1.2973e+01  6e-02  1e-15  5e-13
 8:  1.3006e+01  1.3003e+01  2e-03  2e-15  1e-1

(array([-0.04603435,  0.52172246, -1.00316485, -0.46417955]),
 1.4505610996768012)

#### Predict

In [23]:
predict1 = svm_soft_0.predict(iris_x_test)
predict2 = svm_soft_1.predict(iris_x_test)
predict3 = svm_soft_2.predict(iris_x_test)


In [24]:
list0 = []
list1 = []
list2 = []

for i in range(len(predict1)):
    if predict1[i] == 1:
        list0.append(i)
    if predict2[i] == 1:
        list1.append(i)
    if predict3[i] == 1:
        list2.append(i)


result = [2] * len(predict1)

for k in list0:
    result[k] = 0
for k in list1:
    result[k] = 1
for k in list2:
    result[k] = 2

result0 = [0] * len(predict1)

for k in list0:
    result0[k] = 0
for k in list1:
    result0[k] = 1
for k in list2:
    result0[k] = 2

#### Testing

In [25]:
print("if the non-positive are all set to label 0 Accuracy：{:8.6} %".format((result0 == iris_y_test).mean() * 100))
print("if the non-positive are all set to label 1/2 Accuracy：{:8.6} %".format((result == iris_y_test).mean() * 100))

if the non-positive are all set to label 0 Accuracy：   100.0 %
if the non-positive are all set to label 1/2 Accuracy：    82.0 %


### OVR(One vs Rest)
#### Data Preprocessing

In [18]:
iris_y_train_0_1 = list(iris_y_train)
iris_y_train_1_2 = list(iris_y_train)
iris_y_train_0_2 = list(iris_y_train)

iris_x_train_0_1 = list(iris_x_train) 
iris_x_train_1_2 = list(iris_x_train)
iris_x_train_0_2 = list(iris_x_train)

ylist0 = []
ylist1 = []
ylist2 = []

for i in range(len(iris_y_train)):
    if iris_y_train[i] == 2:
        ylist2.append(i)
        
    if iris_y_train[i] == 0:
        ylist0.append(i)
        
    if iris_y_train[i] == 1:
        ylist1.append(i)


ylist0.reverse()
ylist1.reverse()
ylist2.reverse()


for k in ylist2:
    del iris_y_train_0_1[k]
    del iris_x_train_0_1[k]

for k in ylist0:
    del iris_y_train_1_2[k]
    del iris_x_train_1_2[k]
    
for k in ylist1:
    del iris_y_train_0_2[k]
    del iris_x_train_0_2[k]

# 1 is positive and 0 is negative 
iris_x_train_0_1 = np.array(iris_x_train_0_1)
iris_y_train_0_1 = np.array(iris_y_train_0_1)
iris_y_train_0_1 = np.where(iris_y_train_0_1 == 0, -1, 1).astype(float)

# 2 is positive and 1 is negative 
iris_x_train_1_2 = np.array(iris_x_train_1_2)
iris_y_train_1_2 = np.array(iris_y_train_1_2)
iris_y_train_1_2 = np.where(iris_y_train_1_2 == 1, -1, 1).astype(float)

# 2 is positive and 0 is negative 
iris_x_train_0_2 = np.array(iris_x_train_0_2)
iris_y_train_0_2 = np.array(iris_y_train_0_2)
iris_y_train_0_2 = np.where(iris_y_train_0_2 == 0, -1, 1).astype(float)



#### Train models

In [19]:
svm_soft_0_1 = SVM_soft()
svm_soft_1_2 = SVM_soft()
svm_soft_0_2 = SVM_soft()


svm_soft_0_1.fit(iris_x_train_0_1, iris_y_train_0_1, 1)
svm_soft_1_2.fit(iris_x_train_1_2, iris_y_train_1_2, 1)
svm_soft_0_2.fit(iris_x_train_0_2, iris_y_train_0_2, 1)

     pcost       dcost       gap    pres   dres
 0: -4.5715e+01  1.2526e+02  6e+02  4e+00  1e+01
 1:  5.9157e+01 -1.7550e+01  1e+02  4e-01  1e+00
 2:  1.7584e+01  8.3929e+00  1e+01  2e-02  7e-02
 3:  1.4898e+01  1.0949e+01  4e+00  9e-03  3e-02
 4:  1.3801e+01  1.2175e+01  2e+00  3e-03  9e-03
 5:  1.3207e+01  1.2830e+01  4e-01  5e-04  1e-03
 6:  1.3078e+01  1.2943e+01  1e-01  1e-04  4e-04
 7:  1.3017e+01  1.2992e+01  3e-02  1e-15  2e-12
 8:  1.3005e+01  1.3004e+01  6e-04  1e-15  5e-12
 9:  1.3005e+01  1.3005e+01  6e-06  1e-15  2e-11
Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -6.2694e+01  8.5896e+01  4e+02  3e+00  3e+01
 1:  2.1707e+01 -1.7735e+01  6e+01  3e-01  2e+00
 2:  2.5018e+00 -9.8669e-01  4e+00  1e-02  1e-01
 3:  3.7026e-01  7.9160e-02  3e-01  1e-04  1e-03
 4:  2.1866e-01  1.5251e-01  7e-02  3e-05  2e-04
 5:  2.2145e-01  1.8613e-01  4e-02  7e-06  6e-05
 6:  2.0596e-01  2.0098e-01  5e-03  8e-07  7e-06
 7:  2.0377e-01  2.0359e-01  2e-04  2e-08  1e-0

(array([-0.04603488,  0.52172272, -1.00316443, -0.46418011]),
 1.4505624971977702)

In [20]:
predict0_1 = svm_soft_0.predict(iris_x_test).astype(int)
predict1_2 = svm_soft_1.predict(iris_x_test).astype(int)
predict0_2 = svm_soft_2.predict(iris_x_test).astype(int)

In [21]:
final = [-1] *  len(predict0_1)
for i in range(len(final)):
    if predict0_1[i] == -1 and predict0_2[i] == -1:
        final[i] = 0
        continue
    if predict0_1[i] == 1 and predict1_2[i] == -1:
        final[i] = 1
        continue
    if predict1_2[i] == 1 and predict1_2[i] == 1:
        final[i] = 2
        continue
    final[i] = 2

In [22]:
print("Testing set Accuracy：{:8.6} %".format((final == iris_y_test).mean() * 100))


Testing set Accuracy：    56.0 %


## Conclusion


1. OVR 
    * Advantage:
       For problems with large number of labels, it requires to train less SVM. It is also relatively easier to implement in the preprocessing period. For small size problem(like this 3 label), it has a good performance
    * Disadvantage:
        Each SVM are treated equally while they may have different credibility. For problems with large number of labels, it may fail to give a satisfying result.

2. OVO 
    * Advantage: For problems with large number of labels, it may give a satisfying result as the voting strategy will be very useful in large scale.
    * Disadvantage: For problems with small number of labels, it may fail to give a satisfying result as the voting strategy will still cause a lot of uncertainty. For problems with large number of labels, it requires to train more SVMs and it is more computationally expensive.