Unify and simplify UI? #6
I'm not against this in principle, but it seems premature to do a lot of this at this point in the life cycle of the package; it's nowhere near close to finished. One thing I really want to do with the package to bring it closer to pynndescent in usefulness is to add a new method: a forest of random projection trees. Eventually, this will be the default initialization method for nearest neighbor descent, but can be used standalone. This is going to complicate the interface to any single-function API substantially. My other big concern with this is that it reduces copy-and-paste at the cost of increasing abstraction. You can take this for what it's worth (probably not much): in my career I have often come to bitterly regret too much abstraction (almost always introduced by me), but copy-and-paste has only ever caused me mild annoyance. It's possible to have too much of either and I understand if the current state of the package goes too far in one direction for your taste. Here are some more specific thoughts on these proposals.
Sorry to be negative. I appreciate the thought and attention you have given this and I hope my response isn't too demotivating. I think it's unlikely I would want to merge any PRs about 3) or 4) at the moment but for 2) your proposed cleanup for brute force knn is a lot simpler. If you can get a PR working, I would certainly take a look at it favorably. The random neighbors interface might also be fixable. I think merging the nearest neighbor descent routines would be a lot more challenging, so they might not be a good idea in one PR. For 1) if I can understand the problem it's solving that is also something I would be interested in learning more about.
Ha ha. Interesting. I didn't expect such a strong opinion on 3 :) To me 3 feels very natural. One commonly structures a UI by purpose and returned output. If two functions have the same input, a similar parameter set and the same output, and differ only in the algorithm, they should be one function IMO. I think it's very common practice in the R world.
In fact that's my main motivation for proposing all 4 of them. When I come to a package I usually look at the index to get a feeling for what is there. And with rnndescent I was quite confused (and still am, to be honest). There seems to be a lot of functionality with a lot of documentation, but most of it is redundant. With multiple pages carrying roughly the same doc, I find myself doing a lot of skimming because the docs generally look familiar, but there might be an important difference which is easy to miss. If there were only one or two entry points in the entire package, the user would remember those very easily and always come to the same doc, again and again, thus fully understanding the system from one place.
The main disadvantage is tab completion, I agree. An extra parameter in the doc is not an issue in my experience. I was personally never bothered by it. But I was never bothered by extra unused arguments in the tab completion either :)
True, but it's still easier for them to remember one entry point and come back to it for whatever they want to do. It's easier to remember that there is a "method" parameter with N possible values than to deal with N differently named functions.
I agree that it's annoying but it's so common nowadays that I always put initialization in a By far more annoying is the inconsistency in naming. I really had a lot of trouble remembering the nnd_ prefix for that one function, both when typing the function name and when searching for the docs. I think a cryptic prefix like that should either be used for all functions in a package or dropped.
Hmm, my feeling is that such distinctions are artificial. I don't see why querying from self should be functionally different from querying from a reference table. For instance, don't
Ah, no problem. It's your package and you should do whatever you see fit. I just recognize that this package has great potential. There is clearly high-quality stuff going on in here. So I just wanted to share my experience as a starter, which wasn't quite smooth, but still great!
Ok. Let me give it a try and get back to you.
I actually just had a counter use case for this. I am developing a system which currently relies on brute force on a dev set. It might still stay brute force in production eventually, but I don't know yet. I want to be able to quickly switch between the two algorithms. More generally, if a dependent package or a production system wants to keep the choice of algorithm flexible, it has to implement an ugly if-else chain which dispatches on different function names. Instead, if a parameter is used in the nnd functions, then the same parameter can be kept in dependent functions or as a global configuration option.
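To make the use case above concrete, here is a minimal sketch of dispatching on a method name through a single entry point. Everything in it is hypothetical: the names `knn`, `knn_bruteforce`, `knn_random`, `Points` and `NeighborIdx` are invented for illustration and are not the package's API, the data is 1-D to keep the example short, and nearest neighbor descent itself is deliberately left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <numeric>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// All names here are hypothetical; none of them come from the package.
using Points = std::vector<double>;  // 1-D points, for brevity
using NeighborIdx = std::vector<std::vector<std::size_t>>;  // idx[i]: neighbors of point i
using KnnFn = std::function<NeighborIdx(const Points &, std::size_t)>;

// Exact neighbors: order all points by distance to x[i], keep the k closest.
// A point is its own nearest neighbor (distance 0).
NeighborIdx knn_bruteforce(const Points &x, std::size_t k) {
  NeighborIdx out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                       return std::abs(x[a] - x[i]) < std::abs(x[b] - x[i]);
                     });
    order.resize(std::min(k, order.size()));
    out[i] = order;
  }
  return out;
}

// Random neighbors: k distinct indices per point, no distances involved.
NeighborIdx knn_random(const Points &x, std::size_t k) {
  std::mt19937 rng(42);  // fixed seed so runs are repeatable
  NeighborIdx out(x.size());
  for (auto &row : out) {
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), rng);
    order.resize(std::min(k, order.size()));
    row = order;
  }
  return out;
}

// Single entry point: the algorithm is chosen by a string, so a caller
// can forward the same value from its own arguments or configuration.
NeighborIdx knn(const Points &x, std::size_t k, const std::string &method) {
  static const std::map<std::string, KnnFn> methods = {
      {"bruteforce", knn_bruteforce}, {"random", knn_random}};
  const auto it = methods.find(method);
  if (it == methods.end()) {
    throw std::invalid_argument("unknown method: " + method);
  }
  return it->second(x, k);
}
```

A dependent function can then accept a `method` string and pass it straight through, rather than branching on function names itself. The cost, as discussed in this thread, is that every method has to fit the same signature.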
I think I understand a bit better now and I think we probably just have some different expectations around what the package does (at the moment).
Goal 1: a package to present functions for doing (approximate) nearest neighbor search by a specific method. Currently, that's pure nearest-neighbor descent. The brute force method is there to help with benchmarking and comparisons. The task I want to solve with this package is "I would like to generate some nearest neighbors using NND", not "I want to accurately generate nearest neighbors". Therefore it makes sense to me to offer specific functions for each task. This is the current aim of the package.
Goal 2: a package to provide a user-friendly interface to generate nearest neighbors, where the implementation method is a simple detail. This is not currently a goal of the package. This might seem user-hostile but I just haven't given it any thought (also my personal long-term reason to build this package is for use by another method, which would hide all these unpleasant details).
Basically, if you are using this package it's because you are interested in nearest neighbor descent, not generating nearest neighbors per se. I understand that the audience for such a package is basically me. That's ok for now: that's why the version number isn't even Goal 2 is worthy. And in that case having a single, simply-named function would be very useful. But I think if that gets added now it will break a lot and be more trouble to maintain than it's worth.
Could this be solved by package documentation or vignettes?
I do not at all share your confidence about that!
It doesn't seem that cryptic to me, it stands for "nearest neighbor descent", which is specifically what that function does.
No. You can't query with nearest neighbor descent, the I am quite sure that the distinctions between building and querying here are not artificial. It is tempting to assume that the concepts behind the brute force code can be extrapolated to other methods, and I have made that mistake myself several times while developing the package. To understand why it gets more complex you need to know a bit of technical detail about how nearest neighbor descent works. That underlines my point about not attempting to hide these differences behind a single This discussion does raise the issue that it might make sense to change
I don't think so. A vignette would be useful for getting more into the technical details of the algo (which I admit I haven't done so far). The troublesome part is the naming conventions, and that logically related functionality is spread over different functions and documentation pages, but that's all and neither is a big deal.
I see. Thanks for clarifying.
Currently every rnn method has its own function and there is a distinction between query and build. This results in quite a lot of copy-pasted code lying around.
What would you think about the following unification, in increasing order of "density"?
- [...] `query` argument.
- [...] `knn_query` and `knn` with `method =` argument which can be one of `random`, `bruteforce` or `nnd`.
- [...] `knn_query` and `knn` into one function with two arguments, `query` and `reference = NULL`. If `reference` is NULL, it is assumed to be `query`.
To illustrate bullet 2, the entire `rnn_bruteforce.cpp` can probably be boiled down to something like this:
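As a hedged illustration only, and not the snippet originally posted in this issue, the idea of one brute-force routine that takes `query` and an optional `reference` might be sketched as follows. The names `Matrix`, `KnnResult`, `sq_euclidean` and `knn_bruteforce` are assumptions made for this example, as is the choice of squared Euclidean distance; the real `rnn_bruteforce.cpp` may be organized quite differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for this sketch: one row per point.
using Matrix = std::vector<std::vector<double>>;

struct KnnResult {
  // idx[i][j]: row in the reference of the j-th nearest neighbor of query i
  std::vector<std::vector<std::size_t>> idx;
  // dist[i][j]: the matching squared Euclidean distance
  std::vector<std::vector<double>> dist;
};

// Assumes a and b have the same number of columns.
double sq_euclidean(const std::vector<double> &a,
                    const std::vector<double> &b) {
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d) {
    const double diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// One function for both "build" and "query": when reference is null, the
// query points are searched against themselves, so each point is returned
// as its own nearest neighbor at distance 0.
KnnResult knn_bruteforce(const Matrix &query, std::size_t k,
                         const Matrix *reference = nullptr) {
  const Matrix &ref = reference ? *reference : query;
  k = std::min(k, ref.size());  // cannot return more neighbors than exist
  KnnResult res;
  for (const auto &q : query) {
    // Score every reference row, then move the k smallest to the front.
    std::vector<std::pair<double, std::size_t>> cand;
    cand.reserve(ref.size());
    for (std::size_t j = 0; j < ref.size(); ++j) {
      cand.emplace_back(sq_euclidean(q, ref[j]), j);
    }
    std::partial_sort(cand.begin(),
                      cand.begin() + static_cast<std::ptrdiff_t>(k),
                      cand.end());
    std::vector<std::size_t> idx_row;
    std::vector<double> dist_row;
    for (std::size_t j = 0; j < k; ++j) {
      dist_row.push_back(cand[j].first);
      idx_row.push_back(cand[j].second);
    }
    res.idx.push_back(idx_row);
    res.dist.push_back(dist_row);
  }
  return res;
}
```

Calling `knn_bruteforce(data, k)` plays the role of the build function and `knn_bruteforce(new_points, k, &data)` that of the query function. This symmetry holds for brute force; as the replies above point out, it does not automatically carry over to nearest neighbor descent.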