# Learning From Data - Homework 8
## Ognen Nastov

![](hw8_images/hw8p1.png)

**Answer:**

Primal form: 

minimize: 

$$\frac{1}{2}w^Tw$$

subject to:

$$y_n(w^T x_n + b) \geq 1, \quad n = 1, \dots, N$$

$w$ is $d$-dimensional and $b$ is a scalar.

Although $b$ does not appear in the objective function, it is still an optimization variable: it enters through the constraints and therefore affects the optimal $w$.

It is a quadratic programming problem with $d+1$ variables.
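
To make the variable count explicit, the same problem can be written in the standard quadratic-programming form. The stacked vector $u$ and the matrix $H$ below are notation introduced here only for illustration; they are not part of the problem statement.

$$u = \begin{bmatrix} b \\ w \end{bmatrix} \in \mathbb{R}^{d+1}, \qquad H = \begin{bmatrix} 0 & 0_d^T \\ 0_d & I_d \end{bmatrix}$$

$$\min_{u} \ \frac{1}{2} u^T H u \quad \text{subject to} \quad y_n \begin{bmatrix} 1 & x_n^T \end{bmatrix} u \geq 1, \quad n = 1, \dots, N$$

Here $\frac{1}{2} u^T H u = \frac{1}{2} w^T w$, because the entry of $H$ that multiplies $b$ is zero, and the optimization runs over all $d+1$ components of $u$.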

Answer is [d].

---

![](hw8_images/hw8p2a.png)
![](hw8_images/hw8p2b.png)

**Answer:**

Consider digits 0, 2, 4, 6, and 8 vs. all.

In [1]:
import numpy as np
import time
from sklearn import svm

In [2]:
# SVM with soft margin
# problems 2-6 - polynomial kernels

# read handwritten digits input and test sets
# format of each row is: digit intensity symmetry 
# d = 2
# return X,Y where X = [[x1, ...]] and Y = [y1, ...]
def read_training_set():
    S = np.loadtxt("http://www.amlbook.com/data/zip/features.train")
    return S[:,1:3], S[:,0]

def read_test_set():
    S = np.loadtxt("http://www.amlbook.com/data/zip/features.test")
    return S[:,1:3], S[:,0]

In [3]:
# make Y for one-vs-all classification
# Y is +1 for selected digit, and -1 for the rest of the digits
def make_Y_ova(digit, Y):
    Y_ova = (Y == digit)*(+1.0) + (Y != digit)*(-1.0)
    return Y_ova

# SVM with soft margin
# polynomial kernel
# use sklearn package
def svm_soft_poly(Q, C):
    clf = svm.SVC(C=C, kernel='poly', degree=Q, gamma=1.0, coef0=1.0)
    return clf

# one-vs-all
def problems_2_to_4(X, Y, digit):
    N = np.size(Y)
    clf = svm_soft_poly(2, 0.01)  # Q = 2 (integer degree), C = 0.01
    Y_ova = make_Y_ova(digit, Y)
    clf.fit(X, Y_ova)
    Y_model = clf.predict(X)
    E_in = np.count_nonzero(Y_ova != Y_model) / N
    num_support_vectors = np.size(clf.support_)
    print(f"E_in = {E_in}, # support vectors = {num_support_vectors}")
    return E_in, num_support_vectors
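
The `gamma=1.0, coef0=1.0` settings in `svm_soft_poly` make scikit-learn's `'poly'` kernel, which is defined as `(gamma * <x, x'> + coef0) ** degree`, equal to $(1 + x^T x')^Q$. A minimal NumPy sketch of that kernel follows; the helper `poly_kernel` and the array `X_toy` are hypothetical names used only for illustration and are not part of the homework code.

```python
import numpy as np

def poly_kernel(X1, X2, Q, gamma=1.0, coef0=1.0):
    """Kernel matrix with entries (gamma * <x, x'> + coef0) ** Q."""
    return (gamma * (X1 @ X2.T) + coef0) ** Q

# Toy points (illustrative only, not taken from the digits data)
X_toy = np.array([[1.0, 0.0],
                  [0.0, 2.0],
                  [1.0, 1.0]])

K = poly_kernel(X_toy, X_toy, Q=2)
print(K)
```

The resulting matrix is symmetric, and its diagonal holds $(1 + \|x\|^2)^2$ for each toy point.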

In [4]:
X,Y = read_training_set()

In [5]:
problems_2_to_4(X,Y,0)

E_in = 0.10588396653408312, # support vectors = 2179


(0.10588396653408312, 2179)

In [6]:
problems_2_to_4(X,Y,2)

E_in = 0.10026059525442327, # support vectors = 1970


(0.10026059525442327, 1970)

In [7]:
problems_2_to_4(X,Y,4)

E_in = 0.08942531888629818, # support vectors = 1856


(0.08942531888629818, 1856)

In [8]:
problems_2_to_4(X,Y,6)

E_in = 0.09107118365107666, # support vectors = 1893


(0.09107118365107666, 1893)

In [9]:
problems_2_to_4(X,Y,8)

E_in = 0.07433822520916199, # support vectors = 1776


(0.07433822520916199, 1776)

The highest `E_in` (about `0.106`) is for digit 0. Answer is [a].

---

![](hw8_images/hw8p3a.png)
![](hw8_images/hw8p3b.png)

**Answer:**

Consider digits 1, 3, 5, 7, and 9 vs. all.

In [10]:
problems_2_to_4(X,Y,1)

E_in = 0.014401316691811822, # support vectors = 386


(0.014401316691811822, 386)

In [11]:
problems_2_to_4(X,Y,3)

E_in = 0.09024825126868742, # support vectors = 1950


(0.09024825126868742, 1950)

In [12]:
problems_2_to_4(X,Y,5)

E_in = 0.07625840076807022, # support vectors = 1585


(0.07625840076807022, 1585)

In [13]:
problems_2_to_4(X,Y,7)

E_in = 0.08846523110684405, # support vectors = 1704


(0.08846523110684405, 1704)

In [14]:
problems_2_to_4(X,Y,9)

E_in = 0.08832807570977919, # support vectors = 1978


(0.08832807570977919, 1978)

The lowest `E_in` (about `0.014`) is for digit 1. Answer is [a].

---

![](hw8_images/hw8p4.png)

**Answer:**

- digit 0 classifier has 2179 support vectors.
- digit 1 classifier has 386 support vectors.

The difference is 2179 - 386 = 1793. Answer is [c].

---

![](hw8_images/hw8p5.png)

**Answer:**

In [15]:
# keep only the rows labelled digit_1 or digit_2
# Y is +1 for digit_1 and -1 for digit_2
def make_X_Y_ovo(digit_1, digit_2, X, Y):
    # rows with any other digit get 0.0 here and are deleted below
    Y_ovo = (Y == digit_1)*(+1.0) + (Y == digit_2)*(-1.0)
    indices_to_be_deleted = np.nonzero(Y_ovo == 0.0)
    Y_ovo_trunc = np.delete(Y_ovo, indices_to_be_deleted)
    X_ovo_trunc = np.delete(X, indices_to_be_deleted, axis=0)
    return X_ovo_trunc, Y_ovo_trunc

In [16]:
# one-vs-one
def problems_5_and_6(X, Y, X_test, Y_test, digit_1, digit_2, Q, C):
    X_ovo_trunc, Y_ovo_trunc = make_X_Y_ovo(digit_1, digit_2, X, Y)
    X_test_ovo_trunc, Y_test_ovo_trunc = \
        make_X_Y_ovo(digit_1, digit_2, X_test, Y_test)
    N_trunc = np.size(Y_ovo_trunc)
    N_test_trunc = np.size(Y_test_ovo_trunc)
    clf = svm_soft_poly(Q, C)
    clf.fit(X_ovo_trunc, Y_ovo_trunc)
    Y_model_trunc = clf.predict(X_ovo_trunc)
    E_in = np.count_nonzero(Y_ovo_trunc != Y_model_trunc) / N_trunc
    Y_test_model_trunc = clf.predict(X_test_ovo_trunc)
    E_out = np.count_nonzero(Y_test_ovo_trunc !=
                             Y_test_model_trunc) / N_test_trunc
    num_support_vectors = np.size(clf.support_)
    print(f"E_in = {E_in}, E_out = {E_out}, "
          f"# support vectors = {num_support_vectors}")
    return E_in, E_out, num_support_vectors

In [17]:
X_test,Y_test = read_test_set()

In [18]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, 2, 0.001)

E_in = 0.004484304932735426, E_out = 0.01650943396226415, # support vectors = 76


(0.004484304932735426, 0.01650943396226415, 76)

In [19]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, 2, 0.01)

E_in = 0.004484304932735426, E_out = 0.018867924528301886, # support vectors = 34


(0.004484304932735426, 0.018867924528301886, 34)

In [20]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, 2, 0.1)

E_in = 0.004484304932735426, E_out = 0.018867924528301886, # support vectors = 24


(0.004484304932735426, 0.018867924528301886, 24)

In [21]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, 2, 1.0)

E_in = 0.0032030749519538757, E_out = 0.018867924528301886, # support vectors = 24


(0.0032030749519538757, 0.018867924528301886, 24)

- The number of support vectors goes down as `C` increases from `0.001` to `0.01` to `0.1` (76, 34, 24).
- The number of support vectors is the same (24) for `C = 0.1` and `C = 1.0`.

Thus the number of support vectors does not strictly decrease as `C` goes up, and it never goes up.

`E_out` goes up when `C` goes from `0.001` to `0.01` (about `0.0165` to `0.0189`) and then stays the same for `0.1` and `1.0`, so `E_out` does not go down when `C` goes up.

The lowest `E_in` is attained at `C = 1.0`, i.e. at the maximum `C`.

Thus, the answer is [d].
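
The reasoning above can be checked mechanically. The sketch below copies the four reported results into a list and tests each statement discussed; the variable and helper names are hypothetical, chosen only for this illustration.

```python
# (C, E_in, E_out, number of support vectors), copied from the four runs above,
# listed in order of increasing C
runs = [
    (0.001, 0.004484304932735426, 0.01650943396226415, 76),
    (0.01, 0.004484304932735426, 0.018867924528301886, 34),
    (0.1, 0.004484304932735426, 0.018867924528301886, 24),
    (1.0, 0.0032030749519538757, 0.018867924528301886, 24),
]

def strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))

def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))

E_in = [r[1] for r in runs]
E_out = [r[2] for r in runs]
n_sv = [r[3] for r in runs]

sv_goes_down = strictly_decreasing(n_sv)         # fails at 24 == 24
sv_goes_up = strictly_increasing(n_sv)
E_out_goes_down = strictly_decreasing(E_out)
max_C_has_lowest_E_in = E_in[-1] == min(E_in)    # last entry is the largest C

print(sv_goes_down, sv_goes_up, E_out_goes_down, max_C_has_lowest_E_in)
```

Only the last check holds for these numbers, which matches the conclusion above.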

---

![](hw8_images/hw8p6.png)

**Answer:**

In [22]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, Q=2, C=0.0001)

E_in = 0.008968609865470852, E_out = 0.01650943396226415, # support vectors = 236


(0.008968609865470852, 0.01650943396226415, 236)

In [23]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, Q=5, C=0.0001)

E_in = 0.004484304932735426, E_out = 0.018867924528301886, # support vectors = 26


(0.004484304932735426, 0.018867924528301886, 26)

=> When `C = 0.0001`, `E_in` is NOT higher at `Q = 5`.

In [24]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, Q=2, C=0.001)

E_in = 0.004484304932735426, E_out = 0.01650943396226415, # support vectors = 76


(0.004484304932735426, 0.01650943396226415, 76)

In [25]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, Q=5, C=0.001)

E_in = 0.004484304932735426, E_out = 0.02122641509433962, # support vectors = 25


(0.004484304932735426, 0.02122641509433962, 25)

=> When `C = 0.001`, the number of support vectors is lower at `Q = 5`.

In [26]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, Q=2, C=0.01)

E_in = 0.004484304932735426, E_out = 0.018867924528301886, # support vectors = 34


(0.004484304932735426, 0.018867924528301886, 34)

In [27]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, Q=5, C=0.01)

E_in = 0.003843689942344651, E_out = 0.02122641509433962, # support vectors = 23


(0.003843689942344651, 0.02122641509433962, 23)

=> When `C = 0.01`, `E_in` is NOT higher at `Q = 5`.

In [28]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, Q=2, C=1.0)

E_in = 0.0032030749519538757, E_out = 0.018867924528301886, # support vectors = 24


(0.0032030749519538757, 0.018867924528301886, 24)

In [29]:
problems_5_and_6(X, Y, X_test, Y_test, 1, 5, Q=5, C=1.0)

E_in = 0.0032030749519538757, E_out = 0.02122641509433962, # support vectors = 21


(0.0032030749519538757, 0.02122641509433962, 21)

=> When `C = 1`, `E_out` is NOT lower at `Q = 5`.

Thus, the answer is [b].

---

![](hw8_images/hw8p7a.png)
![](hw8_images/hw8p7b.png)

**Answer:**

In [30]:
# Cross Validation
# problems 7-8

# discard digits not needed, and split X and Y
def split_X_Y(S, num_folds, digit_1, digit_2):
    X = S[:,1:3]
    Y = S[:,0]
    X_ovo_trunc, Y_ovo_trunc = make_X_Y_ovo(digit_1, digit_2, X, Y)
    X_split = np.array_split(X_ovo_trunc, num_folds)
    Y_split = np.array_split(Y_ovo_trunc, num_folds)
    return (X_split, Y_split)

In [31]:
# training set = concatenate num_folds-1 subsets
# validation set = remaining (i_val-th) fold
def make_train_val_sets(X_split, Y_split, i_val):
    X_val = X_split[i_val]
    Y_val = Y_split[i_val]
    # X_split is a list of folds that may differ in length,
    # so count the folds with len() rather than np.size()
    num_folds = len(X_split)
    train_folds = [i for i in range(num_folds) if i != i_val]
    X_train = np.concatenate([X_split[i] for i in train_folds])
    Y_train = np.concatenate([Y_split[i] for i in train_folds])
    return (X_train, Y_train, X_val, Y_val)
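
As a small illustration of the fold handling, the sketch below splits a toy array with `np.array_split` and rebuilds one training/validation pair. The sizes and the array `X_toy` are made up for this example; the point is that `np.array_split` returns a plain Python list whose folds can differ in length, so `len()` is a safe way to count them.

```python
import numpy as np

N, num_folds = 11, 3                          # toy sizes, not the homework's
X_toy = np.arange(2 * N, dtype=float).reshape(N, 2)

X_split = np.array_split(X_toy, num_folds)    # list of arrays, not one array
fold_sizes = [len(fold) for fold in X_split]  # uneven when N % num_folds != 0

i_val = 1                                     # hold out the second fold
X_val = X_split[i_val]
X_train = np.concatenate(
    [fold for i, fold in enumerate(X_split) if i != i_val])

print(len(X_split), fold_sizes, X_train.shape, X_val.shape)
```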

In [32]:
# one-vs-one using cross-validation
def ovo_cv(S, num_folds, digit_1, digit_2, Q, C):
    X_split, Y_split = split_X_Y(S, num_folds, digit_1, digit_2)
    E_cv_array = np.empty(num_folds)
    clf = svm_soft_poly(Q, C)
    for i in range(num_folds):
        X_train, Y_train, X_val, Y_val = \
            make_train_val_sets(X_split, Y_split, i)
        clf.fit(X_train, Y_train)
        Y_val_model = clf.predict(X_val)
        N = np.size(Y_val)
        E_cv_array[i] = np.count_nonzero(Y_val != Y_val_model) / N
    return np.mean(E_cv_array)

In [33]:
# order C_array from smallest C to largest
# np.argmin() returns the index of the first occurrence of the min value,
# so a tie in E_cv is resolved in favor of the smallest C
def problems_7_and_8(num_folds, digit_1, digit_2, Q, C_array, num_runs):
    time_start = time.time()
    S = np.loadtxt("http://www.amlbook.com/data/zip/features.train")
    N_C = np.size(C_array)
    E_cv = np.empty(N_C)
    E_cv_all = np.empty((num_runs, N_C))
    i_C_best_array = np.empty(num_runs)
    for i in range(num_runs):  # i tracks runs
        np.random.shuffle(S)
        for i_C, C in enumerate(C_array):
            E_cv[i_C] = ovo_cv(S, num_folds, digit_1, digit_2, Q, C)
        E_cv_all[i, :] = E_cv
        # index of smallest E_cv
        i_C_best = np.argmin(E_cv)
        # collect indices in an array, they point to the selected C
        i_C_best_array[i] = i_C_best
    # count number of occurrences of each index of C_array
    num_occur_array = np.empty(N_C)
    for i in range(0, N_C):
        num_occur_array[i] = np.count_nonzero(i_C_best_array == i)
    index_max = np.argmax(num_occur_array)
    print(f"Most often selected ({num_occur_array[index_max]} times) \
is C = {C_array[index_max]}")
    # average E_cv for the selected C
    E_cv_sel_avg = np.mean(E_cv_all[:, index_max])
    print(f"Average winning selection E_cv = {E_cv_sel_avg}")
    time_end = time.time()
    print(f"Run time = {(time_end - time_start):3.1f} seconds.")
    return num_occur_array

In [34]:
C_array = np.array([0.0001, 0.001, 0.01, 0.1, 1.0])

In [35]:
problems_7_and_8(10, 1, 5, 2, C_array, 100)

Most often selected (39.0 times) is C = 0.001
Average winning selection E_cv = 0.004849706026457618
Run time = 22.1 seconds.


array([ 0., 39., 35., 14., 12.])

In each run, use the smallest `E_cv` to select the `C`.

After `100` runs, count how many times each `C` was selected.

Most often selected was `C = 0.001`.

The answer is [b].
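
The selection rule (smallest `E_cv` per run, ties going to the smaller `C`, then a count of winners) can be sketched on a few rows of made-up numbers. The `E_cv_runs` values below are hypothetical and chosen only to show the tie-breaking; they are not results from the experiment.

```python
import numpy as np

C_values = np.array([0.0001, 0.001, 0.01, 0.1, 1.0])  # sorted, smallest first

# Hypothetical E_cv values for three runs (one row per run, one column per C)
E_cv_runs = np.array([
    [0.006, 0.004, 0.004, 0.005, 0.005],
    [0.007, 0.005, 0.004, 0.004, 0.006],
    [0.006, 0.004, 0.005, 0.004, 0.004],
])

# argmin returns the first minimum in each row, i.e. the smallest C on a tie
winners = np.argmin(E_cv_runs, axis=1)

# number of runs won by each C
counts = np.bincount(winners, minlength=len(C_values))
most_selected_C = C_values[np.argmax(counts)]

print(winners, counts, most_selected_C)
```

`np.bincount` is used here as a compact alternative to counting each index in a loop.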

---

![](hw8_images/hw8p8.png)

**Answer:**

The average `E_cv` for the winning selection (`C = 0.001`) in problem 7 was about `0.0048`.

The answer is [c].

---

![](hw8_images/hw8p9.png)

**Answer:**

In [36]:
# RBF Kernel
# problems 9 and 10
    
# SVM with soft margin
# RBF kernel
# use sklearn package
def svm_soft_rbf(C):
    clf = svm.SVC(C=C, kernel='rbf', gamma=1.0)
    return clf
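
With `gamma=1.0`, scikit-learn's `'rbf'` kernel is $K(x, x') = \exp(-\|x - x'\|^2)$. A minimal NumPy sketch of that kernel matrix follows; the helper `rbf_kernel_matrix` and the array `X_toy` are hypothetical names used only for illustration.

```python
import numpy as np

def rbf_kernel_matrix(X1, X2, gamma=1.0):
    """Kernel matrix with entries exp(-gamma * ||x - x'||^2)."""
    diffs = X1[:, None, :] - X2[None, :, :]   # all pairwise differences
    sq_dists = np.sum(diffs ** 2, axis=2)     # squared Euclidean distances
    return np.exp(-gamma * sq_dists)

# Toy points (illustrative only, not taken from the digits data)
X_toy = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 2.0]])

K = rbf_kernel_matrix(X_toy, X_toy)
print(K)
```

Every point has kernel value 1 with itself, and the value decays toward 0 as two points move apart.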

In [37]:
# one-vs-one
# function almost identical to problems_5_and_6(), except that the RBF kernel is used here
def problems_9_and_10(X, Y, X_test, Y_test, digit_1, digit_2, C):
    X_ovo_trunc, Y_ovo_trunc = make_X_Y_ovo(digit_1, digit_2, X, Y)
    X_test_ovo_trunc, Y_test_ovo_trunc = \
        make_X_Y_ovo(digit_1, digit_2, X_test, Y_test)
    N_trunc = np.size(Y_ovo_trunc)
    N_test_trunc = np.size(Y_test_ovo_trunc)
    clf = svm_soft_rbf(C)
    clf.fit(X_ovo_trunc, Y_ovo_trunc)
    Y_model_trunc = clf.predict(X_ovo_trunc)
    E_in = np.count_nonzero(Y_ovo_trunc != Y_model_trunc) / N_trunc
    Y_test_model_trunc = clf.predict(X_test_ovo_trunc)
    E_out = np.count_nonzero(Y_test_ovo_trunc !=
                             Y_test_model_trunc) / N_test_trunc
    num_support_vectors = np.size(clf.support_)
    print(f"E_in = {E_in}, E_out = {E_out}, "
          f"# support vectors = {num_support_vectors}")
    return E_in, E_out, num_support_vectors

In [38]:
problems_9_and_10(X, Y, X_test, Y_test, 1, 5, C=0.01)

E_in = 0.003843689942344651, E_out = 0.02358490566037736, # support vectors = 406


(0.003843689942344651, 0.02358490566037736, 406)

In [39]:
problems_9_and_10(X, Y, X_test, Y_test, 1, 5, C=1.0)

E_in = 0.004484304932735426, E_out = 0.02122641509433962, # support vectors = 31


(0.004484304932735426, 0.02122641509433962, 31)

In [40]:
problems_9_and_10(X, Y, X_test, Y_test, 1, 5, C=100.0)

E_in = 0.0032030749519538757, E_out = 0.018867924528301886, # support vectors = 22


(0.0032030749519538757, 0.018867924528301886, 22)

In [41]:
problems_9_and_10(X, Y, X_test, Y_test, 1, 5, C=1e4)

E_in = 0.0025624599615631004, E_out = 0.02358490566037736, # support vectors = 19


(0.0025624599615631004, 0.02358490566037736, 19)

In [42]:
problems_9_and_10(X, Y, X_test, Y_test, 1, 5, C=1e6)

E_in = 0.0006406149903907751, E_out = 0.02358490566037736, # support vectors = 17


(0.0006406149903907751, 0.02358490566037736, 17)

The lowest `E_in` (about `0.0006`) is achieved for `C = 1e6`.

Answer is [e].

---

![](hw8_images/hw8p10.png)

**Answer:**

Looking at the runs from problem 9, the lowest `E_out` (about `0.0189`) is achieved for `C = 100`.

Answer is [c].

---