
Benchmarking Scripts for Mlpack's LMNN, Shogun's LMNN & Matlab's LMNN #123

Open
wants to merge 15 commits into base: master

Conversation

6 participants
@manish7294 (Member) commented Jun 15, 2018

Benchmarking scripts for Mlpack, Shogun & Matlab's LMNN.

manish7294 added some commits Jun 15, 2018

raise Exception("unknown parameters")

# Perform LMNN.
prep = ShogunLMNN(feat, labels, k)


@zoq (Member) commented Jun 18, 2018

Let's use self.k here.


@manish7294 (Author, Member) commented Jun 18, 2018

Right, I again missed pushing the changes :)

@vigsterkr commented Jun 18, 2018

@iglesias @manish7294 has some questions about the LMNN implementation in shogun :)

@manish7294 (Author, Member) commented Jun 18, 2018

@iglesias Hi! Hope you are doing well :)
I am benchmarking LMNN on both mlpack and Shogun, and I am using Shogun's KNN to measure the accuracy metric on the transformed data. Here is the problem: if the query set is the same as the training set, Shogun's KNN considers each query point itself to be its first nearest neighbor (giving 100% accuracy for k = 1). That is not what mlpack's KNN does: mlpack's KNN avoids reporting a point as its own nearest neighbor, so Shogun's second nearest neighbor corresponds to mlpack's first.
So, can you think of a way to put both libraries on the same page for benchmarking purposes? Or is there any way in Shogun to avoid a point being counted as its own first nearest neighbor? Any kind of help will be appreciated :)

@iglesias commented Jun 19, 2018

Hi @manish7294. So this is about KNN.

What about the following: in mlpack use KNN with k = k', and in Shogun use KNN with k = k' + 1. Then you could compare the nearest neighbors given by mlpack with the 2nd through (k' + 1)th nearest neighbors given by Shogun.

Does this make sense to you?
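The suggestion above can be illustrated without either library's API. Below is a minimal NumPy sketch (the function name is hypothetical, not part of mlpack or Shogun) of computing neighbors when the query set equals the training set and dropping the self-match:

```python
import numpy as np

def knn_indices_excluding_self(data, k):
    """Nearest-neighbor indices for each point in `data`, excluding the
    point itself.  Assumes the query set equals the training set, as in
    the benchmark discussed above."""
    # Pairwise squared Euclidean distances.
    sq = np.sum(data ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T
    # Sort each row; column 0 is the point's zero distance to itself,
    # so keep columns 1 .. k (the "2nd through (k'+1)th" neighbors).
    order = np.argsort(d2, axis=1)
    return order[:, 1:k + 1]
```

This is the same trick in library terms: ask for k + 1 neighbors and discard the first column.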

@manish7294 (Author, Member) commented Jun 19, 2018

@iglesias Thanks for looking into this. We thought about that, but it would create a problem in calculating predictions: with Shogun's KNN prediction, the point itself will always be counted as a positive nearest neighbor, which will definitely introduce some error into the accuracy measurement.
Let me take an example. Say we have 2 classes and want k = 2, so we take mlpack's k = 2 and Shogun's k = 3. Suppose the first neighbor belongs to class 1 and the second to class 2. Ideally (as mlpack's KNN does), we should predict class 1. But since we took Shogun's k = 3, and the point itself (which is always counted in the prediction) may have label 2, Shogun would predict label 2, which differs from mlpack's KNN.
On second thought, if Shogun runs a distance-weighted KNN behind the scenes (I'm not sure), then the output will always be the label of the point itself, no matter what k we choose.

@rcurtin (Member) commented Jun 19, 2018

Would it be possible to just get the k+1 nearest neighbors from Shogun itself? If that is the case, then we can do the classification ourselves (and ensure that we are always doing it the same way). If we can't get those from Shogun, maybe it is better to use a different technique for the nearest neighbor search. I don't think it makes such a huge difference either way, since we aren't timing the nearest neighbor search, we are just using it for finding the true nearest neighbors to calculate a metric.
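"Doing the classification ourselves" from precomputed neighbor indices could look like the following minimal sketch (a plain majority vote; the name is illustrative, not any library's API, and the self-neighbor is assumed to be already removed):

```python
import numpy as np

def classify_from_neighbors(neighbor_idx, labels):
    """Majority-vote predictions from precomputed neighbor indices
    (self-neighbor already removed) and the training labels.  Ties go
    to the smallest label, since np.unique returns sorted values."""
    preds = np.empty(neighbor_idx.shape[0], dtype=labels.dtype)
    for i, idx in enumerate(neighbor_idx):
        values, counts = np.unique(labels[idx], return_counts=True)
        preds[i] = values[np.argmax(counts)]
    return preds
```

With this, any library's neighbor search can feed the same classification rule, so both mlpack and Shogun are scored identically.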

@manish7294 (Author, Member) commented Jun 19, 2018

Thanks all for helping out. Hopefully, the new custom KNN accuracy predictor will work as expected :)

count += 1

accuracy = (count / NN.shape[1]) * 100
return [time, accuracy]

@rcurtin (Member) commented Jun 19, 2018

This looks correct to me. But I think maybe it would be useful to split this into a separate function that we could put into methods/metrics/? It seems to me like we could calculate a few metrics using variations of this code (but that code could all be one function):

  • 1-NN accuracy
  • 3-NN accuracy
  • distance-weighted 3-NN accuracy
  • 5-NN accuracy
  • distance-weighted 5-NN accuracy

And the input to the function could be the nearest neighbor indices, the distances, and the labels, plus the number of neighbors to use in the metric calculation and whether or not to use distance weighting. Does that make sense? What do you think?
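The function described above might look roughly like this sketch (the name and signature are hypothetical, using exactly the inputs listed: neighbor indices, distances, labels, a neighbor count, and a distance-weighting flag):

```python
import numpy as np

def knn_accuracy(indices, distances, labels, true_labels, k, weighted=False):
    """k-NN classification accuracy (percent) from precomputed neighbor
    indices and distances.  `labels` holds the reference-set labels and
    `true_labels` the query points' own labels; `weighted` switches on
    inverse-distance vote weighting."""
    correct = 0
    for i in range(indices.shape[0]):
        votes = {}
        for j in range(k):
            neighbor_label = labels[indices[i, j]]
            # Inverse-distance weight (small epsilon avoids division by
            # zero); plain voting uses weight 1.
            w = 1.0 / (distances[i, j] + 1e-10) if weighted else 1.0
            votes[neighbor_label] = votes.get(neighbor_label, 0.0) + w
        if max(votes, key=votes.get) == true_labels[i]:
            correct += 1
    return 100.0 * correct / indices.shape[0]
```

One call per metric then covers the whole list above, e.g. `knn_accuracy(idx, dist, labels, truth, k=5, weighted=True)` for distance-weighted 5-NN accuracy.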

@manish7294 (Author, Member) commented Jun 19, 2018

This seems about right. But one thing is bugging me: you said we can calculate 3-NN and 5-NN accuracy from this code, which is possible, but what significance would 5-NN have if we train our LMNN for, say, k = 3?

@rcurtin (Member) commented Jun 19, 2018

Right, so LMNN is optimizing for k=3 if you run it with k=3, but it may still improve the 5-NN performance. Given that it would be so easy to add, I don't see any issue with doing that, but you are right, that may not be the most interesting metric to compare. Still, it is not a problem to have a few extra metrics. :)

@manish7294 (Author, Member) commented Jun 19, 2018

No problem then, I will add it for sure :)

manish7294 added some commits Jun 20, 2018

@manish7294 manish7294 changed the title Benchmarking Scripts for mlpack_lmnn & shogun_lmnn Benchmarking Scripts for Mlpack's LMNN, Shogun's LMNN & Matlab's LMNN Jun 21, 2018

manish7294 added some commits Jun 21, 2018

@rcurtin (Member) commented Jun 27, 2018

Everything looks good to me, but the build will fail until LMNN is part of the newest release of mlpack. So we can wait to merge until then, and we can be sure to release a newer version of mlpack soon-ish (before the end of the summer) with the LMNN support. For now, you can use the branch to run timing tests.

@zoq (Member) commented Jun 28, 2018

Do you think it would be useful to include mlpack-master as another library? That would allow us to merge this earlier.

@rcurtin (Member) commented Jul 9, 2018

@mlpack-jenkins test this please

2 similar comments
@rcurtin (Member) commented Jul 9, 2018

@mlpack-jenkins test this please

@rcurtin (Member) commented Jul 9, 2018

@mlpack-jenkins test this please

@mlpack-jenkins (Collaborator) commented Jul 9, 2018

Can one of the admins verify this patch?
