Kernel SVM implementation is wrong (?) #104
Take a look at the first formula on page 103 of your book, the objective function for (batch) kernel SVM. The parameter is the vector b, and each b_i is tied to exactly one training pair (x_i, y_i). So it does not make sense to reuse the same vector b (with size equal to the batch size) across all mini-batches, as your code does, because the x and y in each mini-batch are different examples.
Take a look at this: http://cs229.stanford.edu/extra-notes/representer-function.pdf. I think the (naive) proper way to implement kernel SVM is to use a parameter vector b whose size equals the size of the whole training set, and in each epoch update only those b_i corresponding to the training examples chosen in that epoch.
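To make the suggestion concrete, here is a minimal sketch of that "one coefficient per training example" approach. All names (`train_kernel_svm`, `rbf_kernel`, the hinge-style update, the RBF kernel choice) are my own illustration, not the book's actual code, and the update rule is a simple Pegasos-like subgradient step rather than whatever the book uses:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # K(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

def train_kernel_svm(X, y, epochs=100, batch_size=3, lam=0.01, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(X)
    b = np.zeros(n)  # one dual coefficient per TRAINING EXAMPLE, not per batch slot
    # Precompute the full kernel matrix (fine for small toy data)
    K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        # Sample a mini-batch of example INDICES...
        idx = rng.choice(n, size=batch_size, replace=False)
        for i in idx:
            # ...and update ONLY the sampled b_i, leaving the rest alone
            margin = y[i] * np.sum(b * y * K[:, i])
            if margin < 1:           # hinge-loss violation
                b[i] += 1.0
        b *= (1 - lam)               # simple shrinkage / regularization
    return b, K

def predict(b, X_train, y_train, x):
    # f(x) = sum_i b_i * y_i * K(x_i, x)
    s = sum(bi * yi * rbf_kernel(xi, x)
            for bi, yi, xi in zip(b, y_train, X_train))
    return 1 if s >= 0 else -1
```

The key difference from the buggy version is that `b` is indexed by training example, so each b_i always multiplies the same (x_i, y_i), no matter which mini-batch it appears in.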
Why does your code still give a good result? Because vector b is applied to random examples across epochs, the only reasonable value it can converge to is one in which all elements are equal (you can verify this by printing b after training for many more epochs). In the prediction formula for kernel SVM, when all b_i are equal, the formula becomes a kind of k-nearest neighbors (with k equal to the batch size, and the weights given by the kernel function).
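To illustrate the observation: when every b_i is equal, the decision function f(x) = sum_i b_i * y_i * K(x_i, x) collapses to a kernel-weighted vote over the training labels, i.e. a soft nearest-neighbor rule. This is my own toy demo with made-up names, assuming an RBF kernel:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision(b, X_train, y_train, x):
    # f(x) = sum_i b_i * y_i * K(x_i, x)
    return sum(bi * yi * rbf_kernel(xi, x)
               for bi, yi, xi in zip(b, y_train, X_train))

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([-1, -1, 1, 1])

b = np.ones(len(X))  # "no training": all coefficients equal

# Each query is scored by its kernel-weighted neighbors alone:
# nearby labels dominate because distant points have near-zero kernel weight.
print(decision(b, X, y, np.array([0.0, 0.5])))  # negative (near the -1 cluster)
print(decision(b, X, y, np.array([5.0, 5.5])))  # positive (near the +1 cluster)
```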
You can get a good result without any training: just initialize vector b to all ones (line 40).
Hi @tatnguyennguyen, this is super interesting! Thanks for finding this. I'm just now triaging and going through the issues in preparation for a book/code v2.
When I get to the SVM (chapter 4), I will investigate this. I see your point, and you are probably right; I think the fix will be to increase the batch size to the size of the dataset, although I'll first see whether I can make it work for smaller batches.
Yes, I found the fix to be making the batch_size equal to the size of the training dataset.
I think the long-term fix would be, as you suggest, to track which indices are selected for training and only update those entries in the b vector. But for now, making the batch_size equal to the dataset size is sufficient.
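For reference, the index-tracking pattern is small; this is a hypothetical fragment of my own (not the book's code) showing just the bookkeeping, with the real SVM update replaced by a stand-in increment:

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch_size = 8, 3
b = np.zeros(n)  # one entry per training example

for epoch in range(5):
    # Sample a mini-batch of example indices each epoch...
    idx = rng.choice(n, size=batch_size, replace=False)
    # ...and touch ONLY those entries of b; the rest stay untouched.
    b[idx] += 1.0  # stand-in for the real per-example SVM update

print(b)  # each entry counts how often that example was sampled
```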