
Kernel SVM implementation is wrong (?) #104

Closed
tatnguyennguyen opened this issue Sep 16, 2017 · 3 comments


@tatnguyennguyen

commented Sep 16, 2017

Take a look at the first formula on page 103 of your book. This is the objective function for (batch) kernel SVM. The parameter is the vector b, and each b_i is tied to exactly one training pair (x_i, y_i). So it does not make sense to reuse the same vector b (sized to the batch) across all mini-batches, as your code does, because the x and y in each mini-batch are different.
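For reference, the standard soft-margin dual that this formula should correspond to (I am writing the textbook form here, so the book's notation may differ slightly) is

\max_{b}\; \sum_{i=1}^{n} b_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} b_i\, b_j\, y_i\, y_j\, K(x_i, x_j), \qquad 0 \le b_i \le C

where n is the number of training examples. Each b_i is indexed by one fixed pair (x_i, y_i), which is exactly why b cannot be shared across different random mini-batches.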

Take a look at http://cs229.stanford.edu/extra-notes/representer-function.pdf. I think the (naive) proper way to implement kernel SVM is to use a parameter vector b whose size equals the size of the whole training set, and at each step update only those b_i corresponding to the training examples chosen for that step, as sketched below.
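A minimal sketch of that approach in TF 1.x style, like the book's script. All names here (x_train, gamma, idx, ...) are my own, and I ignore the box constraint on b just as the original code does:

import numpy as np
import tensorflow as tf

n_train, batch_size, gamma = 200, 50, 10.0
x_train = np.random.randn(n_train, 2).astype(np.float32)
y_train = np.where(x_train[:, 0]**2 + x_train[:, 1]**2 < 1.0, 1.0, -1.0).astype(np.float32)

def rbf(a, c):
    # pairwise squared distances, then the RBF kernel
    d = np.sum(a**2, 1)[:, None] + np.sum(c**2, 1)[None, :] - 2.0 * a.dot(c.T)
    return np.exp(-gamma * d).astype(np.float32)

b = tf.Variable(tf.zeros([n_train]))             # one coefficient per training point
idx = tf.placeholder(tf.int32, [None])           # which training examples this step uses
k_bb = tf.placeholder(tf.float32, [None, None])  # kernel among those examples
y_b = tf.placeholder(tf.float32, [None])

# tf.gather yields sparse (IndexedSlices) gradients, so the optimizer
# updates only the b_i belonging to the sampled indices.
b_sel = tf.gather(b, idx)
by = b_sel * y_b
dual = tf.reduce_sum(b_sel) - 0.5 * tf.reduce_sum(by[:, None] * by[None, :] * k_bb)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(-dual)  # maximize the dual

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(500):
        i = np.random.choice(n_train, batch_size, replace=False)
        sess.run(train_step, feed_dict={idx: i,
                                        k_bb: rbf(x_train[i], x_train[i]),
                                        y_b: y_train[i]})

You could equivalently precompute the full n_train x n_train kernel once and gather rows/columns per batch; the key point is only that b has one entry per training example.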

Why does your code still give a good result? Because the same vector b is applied to random examples across epochs, the only reasonable value for b is one in which all elements are equal (you can verify this by printing b after training for many more epochs). In the kernel SVM prediction formula, when all b_i are equal, the formula becomes a kind of k-Nearest Neighbors (with k equal to the batch size and weights given by the kernel function).
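To make the nearest-neighbor connection concrete, here is a small self-contained check (my own sketch, not code from the book): with all b_i equal, scaling b by any positive constant cannot change the sign of the decision function, so the prediction is just a kernel-weighted vote of the y_i.

import numpy as np

gamma = 10.0
x_ref = np.random.randn(50, 2)                # the "batch" points
y_ref = np.where(x_ref[:, 0] > 0, 1.0, -1.0)  # their labels

def predict(x, b):
    # decision function: sign(sum_i b_i * y_i * K(x_i, x)) with an RBF kernel
    k = np.exp(-gamma * np.sum((x_ref - x)**2, axis=1))
    return np.sign(np.sum(b * y_ref * k))

x_new = np.array([0.5, -0.2])
b_ones = np.ones(50)
assert predict(x_new, b_ones) == predict(x_new, 3.7 * b_ones)  # constant b: same answer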

You can get a good result without any training. Just initialize vector b to all ones (line 40)
b = tf.Variable(tf.ones(shape=[1, batch_size]))
and comment out the training step (line 89)
# sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
Then run the script: the accuracy is above 0.97 and the contour map looks decent.

@Doulrs

commented Mar 16, 2018

I also found this problem.

@nfmcclure (Owner)

commented Mar 21, 2018

Hi @tatnguyennguyen, this is super interesting! Thanks for finding this. I'm just now triaging and going through the issues in preparation for a book/code v2.

When I get to the SVM (chapter 4), I will investigate this. I see your point, and you are probably right; I think the fix will be to increase the batch size to the data size, although I'll see if I can make it work for smaller batches first.

@nfmcclure self-assigned this Mar 21, 2018

@nfmcclure (Owner)

commented Apr 9, 2018

Yes, I found the fix to be making batch_size equal to the size of the training dataset.

I think the long-run fix would be, as you suggest, to track which indices are selected for training and update only those entries of b. But for now, making batch_size equal to the dataset size is sufficient.
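For anyone applying the interim fix, the change amounts to something like this, assuming the chapter 4 script's variable names (x_vals, y_vals, batch_size, rand_index); a sketch, not the exact diff:

batch_size = len(x_vals)  # full batch, so each b_i stays paired with its (x_i, y_i)

# ... and inside the training loop:
rand_index = np.arange(len(x_vals))  # fixed order; every step sees the whole set
rand_x = x_vals[rand_index]
rand_y = np.transpose([y_vals[rand_index]])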
