Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support other Minkowski metrics #3

Closed
krlmlr opened this issue Jan 16, 2015 · 20 comments
Closed

Support other Minkowski metrics #3

krlmlr opened this issue Jan 16, 2015 · 20 comments

Comments

@krlmlr
Copy link
Collaborator

krlmlr commented Jan 16, 2015

The ANN docs suggest that the Euclidean metric is used by default. Some digging in the code and in above docs suggest that this is also true for RANN.

According to Section 2.2.1 (p. 14 ff.), only the macros in the code linked above need to be redefined to use other Minkowski norms. To define a code base where Manhattan, Euclidean and "max" metric are simultaneously accessible, this probably needs to be replaced by a template parameter that defines the implementation of the four operators.

The purpose is to accelerate statistical matching using the Gower distance: I think I can transform a "Gower distance problem" to a "Manhattan distance problem" but then RANN should be able to solve it (it is called by the StatMatch package).

Does this make sense to you?

@jefferis
Copy link
Member

Hi Kirill,

That all sounds right. Pull requests welcome.

If you do start working on the distance macros:

https://github.com/jefferis/RANN/blob/master/src/ANN/ANN.h#L329-L363

make sure you add the std:: prefix for pow/fabs to keep CRAN happy.

You may also want to take a look at github.com/jefferis/RANN2 (switch to Rcpp, not released to CRAN) and, for template code + Rcpp, https://github.com/jefferis/nabor.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jan 16, 2015

Thanks, Gregory. Is there a reason why you don't simply upload RANN2 as RANN to CRAN, especially if the interface is fully backward-compatible?

@jefferis
Copy link
Member

I think the main thing was I hadn't got round to documenting the WANN class (like https://github.com/jefferis/nabor/blob/master/R/WKNNClasses.R). I would also need to go everything with a fine tooth comb to avoid the wrath of BDR on submission and I've been busy!

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jan 19, 2015

I've looked into it, and I've found three options:

  1. Use virtual functions for the three (!) operators involved
    • not sure about the performance penalty
  2. Convert ANN to a header-only library and use templates
    • a lot more work
    • will essentially give a new library
    • will also affect the thin R<->C++ glue
  3. Invent three new packages RANN2, RANN1 and RANNInf that use the old code base where only the definition of the operators changes, and convert the RANN package to a wrapper package for the three other packages
    • not sure what CRAN will think about it
    • this is by far the easiest option in the short term
    • other kNN implementations such as FNN::knn[x].dist could be made accessible through the same interface

Do you have a preference?

@jefferis
Copy link
Member

Thanks for looking onto this. I have to say I would be much keener on a one-package solution but having something that works for your case with minimum fuss is important.

Do you want to submit this functionality to CRAN?

One option would be to keep the existing package for Euclidean distance, but require 2 packages for the other distances - these could be github only initially.

Gregory Jefferis

On 19 Jan 2015, at 09:36, Kirill Müller notifications@github.com wrote:

I've looked into it, and I've found three options:

Use virtual functions for the three (!) operators involved
not sure about the performance penalty
Convert ANN to a header-only library and use templates
a lot more work
will essentially give a new library
will also affect the thin R<->C++ glue
Invent three new packages RANN2, RANN1 and RANNInf that use the old code base where only the definition of the operators changes, and convert the RANN package to a wrapper package for the three other packages
not sure what CRAN will think about it
this is by far the easiest option in the short term
other kNN implementations such as FNN::knn[x].dist could be made accessible through the same interface
Do you have a preference?


Reply to this email directly or view it on GitHub.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jan 19, 2015

I'd like to have it on CRAN eventually, but this doesn't have to happen yesterday. Also, I'd need RANN1 more than RANNInf. (It is for nearest-neighbor hot-deck imputation based on Gower's distance -- I've found at least two implementations on CRAN that use a naive approach which is too slow for largish data sets. I'm working on packaging the Gower -> Manhattan translation, looks promising but it's far from being complete.)

Your idea sounds great -- keeping RANN as is and forwarding calls that use other metrics to the other package(s). Will RANN still pass CRAN checks if it calls a non-CRAN package internally?

@jefferis
Copy link
Member

I think the normal mechanism would be to make the other package suggested and to fail gracefully when it is missing.

Gregory Jefferis

On 19 Jan 2015, at 13:26, Kirill Müller notifications@github.com wrote:

I'd like to have it on CRAN eventually, but this doesn't have to happen yesterday. Also, I'd need RANN1 more than RANNInf. (It is for nearest-neighbor hot-deck imputation based on Gower's distance -- I've found at least two implementations on CRAN that use a naive approach which is too slow for largish data sets. I'm working on packaging the Gower -> Manhattan translation, looks promising but it's far from being complete.)

Your idea sounds great -- keeping RANN as is and forwarding calls that use other metrics to the other package(s). Will RANN still pass CRAN checks if it calls a non-CRAN package internally?


Reply to this email directly or view it on GitHub.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jan 19, 2015

I have requested transfer of ownership to you for the brand new RANN1 repository -- I don't want to take credit for your work, but I can also act as maintainer if you prefer. Next would be a pull request that adds a new parameter to RANN::nn2.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jan 19, 2015

On the other hand, the RANN1 package should probably live in a branch in this repo. Would you like to create a new branch from RANN1's master?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jan 21, 2015

I still think a branch in this repo for the other package is the best option. If you create the target branch (perhaps master-L1?), I can file a pull request against it.

@jefferis
Copy link
Member

Hi Kirill,

I’m on sabbatical at the moment, so a bit busy. I just gave you access to the repo. Do you want to make the branch and then do a pull request as suggested. thanks, Greg.

On 21 Jan 2015, at 08:26, Kirill Müller notifications@github.com wrote:

I still think a branch in this repo for the other package is the best option. If you create the target branch (perhaps master-L1?), I can file a pull request against it.


Reply to this email directly or view it on GitHub.

Gregory Jefferis, PhD
Division of Neurobiology
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge, CB2 OQH, UK

http://www2.mrc-lmb.cam.ac.uk/group-leaders/h-to-m/g-jefferis
http://jefferislab.org
http://flybrain.stanford.edu

This was referenced Jan 22, 2015
@krlmlr
Copy link
Collaborator Author

krlmlr commented Jan 22, 2015

Hi Gregory

Thanks for giving me access to the repo. The implementation is now contained in #5 and #6 (against master-L1 and master, respectively), #4 is already included in both pull requests but I'm keeping it for reference. Would you like to review and merge?

@jefferis
Copy link
Member

Thanks. I merged the separate unsquare pull. Hope that does not get in the way of the others. I would also have been happy to have the defunct function removal (seems time) in a separate pull but not worth redoing.

For the main pulls, I t looks like you're missing a roxygen run, but I'm sure you've already spotted that.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Apr 13, 2015

Started another round, with a simpler implementation (without the metric argument). If you merge #7 and #11 first, the other pull requests (#9 and #10) will become smaller. Hope it works.

@jefferis
Copy link
Member

OK, @krlmlr I think we're all good on this. Thanks for all your work!

In detail, merged in 6a935ff (closing #7 and also #11) to remove defunct function, then 59e78b4 (replacing #9) to add your new RANN1 package on a branch of the same name, and finally d7dadc5 (replacing #10) to add info about RANN1 package to main RANN README.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Apr 14, 2015

Looks great to me, thanks for your help! Are you releasing to CRAN?

@jefferis
Copy link
Member

Do you mean releasing RANN1 to CRAN or sending the updated RANN (where the changes are essentially cosmetic)?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Apr 14, 2015

I mean RANN1 -- of course RANN can wait as long as the checks pass.

@jefferis
Copy link
Member

OK @krlmlr. Would you mind making yourself maintainer of RANN1 and submitting yourself? I have a v2.5 of RANN on master pending with some minor fixes. You will probably want to merge those changes to get a fix for e.g. a title case violation on 3.2 revealed by build_win().

@jefferis
Copy link
Member

I made that suggestion a bit more specific in #13 – hope that's OK!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants