
Maximum Inner-Product Search using Tree Data-structures in MLPACK ? #427

Closed
se4u opened this issue Mar 30, 2015 · 5 comments
Comments


se4u commented Mar 30, 2015

Hi, I am a CS graduate student at JHU and I came across the paper "Maximum Inner-Product Search using Tree Data-structures" by Parikshit Ram and Alexander Gray while reading "Speeding Up the Xbox Recommender System..." by Paquet et al., who cite it as a good algorithm for doing NN queries over high-dimensional sparse data. According to the paper by Parikshit, the cone-tree-based NN search should be in MLPACK, but I couldn't find it in the source or documentation. I just wanted to know if there is a stable release of the code from that paper in some other branch or repo somewhere?

Thanks.


rcurtin commented Mar 30, 2015

Hi there,

The exact MIPS algorithm with the cone tree actually isn't in mlpack, so the paper is incorrect. However, I bet Parikshit has an implementation somewhere, which could be polished and added to mlpack (you can email him: p.ram@gatech.edu; whatever code he gives you, I can help you make it compile and work, and possibly just adapt and add it to mlpack so the paper is correct).

On the other hand, what we do have implemented is FastMKS, which solves inner-product search in a more general sense (http://ratml.org/pub/pdf/2014fastmks.pdf). Here's a tutorial on how to use it in mlpack: http://mlpack.org/doxygen.php?doc=fmkstutorial.html . Perhaps that would be useful to you too? I don't know how it will perform for nearest neighbor---our FastMKS implementation doesn't (currently) use cone trees.
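To make the connection concrete: FastMKS generalizes MIPS by finding, for each query q, the reference point p maximizing a kernel K(q, p); with the linear kernel this is exactly maximum inner-product search. Below is a rough brute-force sketch of that problem in NumPy (a conceptual illustration only, not mlpack's API; the function name `exact_mips` is hypothetical). This O(n)-per-query baseline is what tree-based methods like FastMKS try to beat.

```python
import numpy as np

def exact_mips(queries, references, k=1):
    """Brute-force maximum inner-product search: for each query q,
    return indices of the k reference points p maximizing q . p
    (i.e. FastMKS with the linear kernel), best first."""
    # All pairwise inner products: shape (num_queries, num_references).
    products = queries @ references.T
    # Indices of the k largest inner products per query.
    top_k = np.argsort(-products, axis=1)[:, :k]
    return top_k, np.take_along_axis(products, top_k, axis=1)

# Example: three reference points in 2-D, one query.
refs = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
query = np.array([[1.0, 1.0]])
idx, vals = exact_mips(query, refs, k=2)
# The query's inner products are [1, 1, 4], so refs[2] wins.
```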


se4u commented Mar 30, 2015

Hi, thanks for the response. This might be a dumb question, but does FastMKS gracefully handle sparse vectors? I saw in your history that you rely on Armadillo for sparse matrix handling; does the FastMKS implementation work with that?


rcurtin commented Apr 5, 2015

Sorry for the slow response; it took a little while to do some refactoring (90a9d93 to 81fe638). Anyway, now you should be able to do FastMKS with sparse datasets (in the git master branch, at least). You'll have to do a little bit of type wrangling to get it to work right, since it's through template parameters. Take a look at the examples in src/mlpack/tests/fastmks_test.cpp to see how to use FastMKS with sparse datasets. Anyway, I hope this is helpful -- if you have any problems, please feel free to ask for help! :)
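For intuition about what sparse inner-product search buys you, here is a rough SciPy sketch (a conceptual illustration only, not mlpack's `arma::sp_mat`-based template API): with sparse storage, only dimensions where the query and a reference point are both nonzero contribute any work to each inner product.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical sparse reference set: four points in six dimensions,
# mostly zeros, stored in compressed sparse row format.
refs = sp.csr_matrix(np.array([
    [0.0, 0.0, 3.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 4.0, 0.0, 0.0, 0.0, 0.0],
]))
query = sp.csr_matrix(np.array([[0.0, 0.0, 1.0, 0.0, 1.0, 0.0]]))

# Sparse-sparse product: only overlapping nonzeros contribute.
products = (query @ refs.T).toarray().ravel()
best = int(np.argmax(products))
# refs[0] overlaps the query in dimension 2 (3.0 * 1.0 = 3.0),
# refs[1] in dimension 4 (2.0 * 1.0 = 2.0); refs[0] wins.
```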

@rcurtin rcurtin closed this as completed Apr 5, 2015
@rcurtin rcurtin added this to the mlpack 1.1.0 milestone Apr 5, 2015

se4u commented Apr 5, 2015

Wow, thanks.


rcurtin commented Apr 5, 2015

Hmm, I should also add, I don't know how fast it will actually perform in practice, or how useful it will be. Trees on high-dimensional sparse data don't generally work very well, although in this case we are using the cover tree, whose runtime depends instead on a measure of intrinsic dimensionality, so maybe you can get away with using sparse data here (if it's intrinsically low-dimensional). Either way, another consideration is that arma::sp_mat, in practice, can be slower than arma::mat unless the dimension of the sparse data is sufficiently high and the sparsity level is high enough. Exactly what "sufficiently high" and "high enough" mean is a bit hard to pin down... :)
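The tradeoff can be seen in a rough SciPy sketch (an illustration of the general dense-vs-sparse representation tradeoff, not a statement about Armadillo's internals): both representations give identical inner products, but the sparse format only stores and multiplies the nonzero entries, so it pays off only when the nonzero fraction is small relative to the dimension.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
# A 1000-dimensional dataset where only ~1% of entries are nonzero.
dense = rng.random((100, 1000)) * (rng.random((100, 1000)) < 0.01)
sparse = sp.csr_matrix(dense)

# Both representations compute the same inner products...
q = rng.random((1, 1000))
dense_products = q @ dense.T
sparse_products = (sp.csr_matrix(q) @ sparse.T).toarray()

# ...but the sparse format only stores the nonzero entries; whether
# that makes it faster depends on the dimension and sparsity level.
nonzero_fraction = sparse.nnz / dense.size
```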
