Migrate to methods of Index #15
Should also be a member method.
Need an additional trait constraint on the `query` function for now.
@marcus-pousette, I think this is more or less there. Later it would be nice to spend some time improving the API documentation, but for the moment I think this is good enough. One thing that bothers me a bit is that
Now that `query()` is a method of `Index`, it makes more sense to have the query helper functions in the index module. The query module is kept just for tests. This PR also makes some improvements to the handling of test utility functions and flattens the score module hierarchy for better documentation output.
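A minimal sketch of the layout described above, assuming a toy inverted index rather than the library's real data structures: `query` is a method on `Index`, and the helpers it calls are module-private functions in the same `index` module, so they stay out of the public docs. The names `add_document`, `tokenize`, and `merge_postings` are hypothetical, and `query` takes `&self` here purely for simplicity.

```rust
// Hypothetical sketch: `Index`, `add_document`, `tokenize` and `merge_postings`
// are illustrative names, not the library's actual API.
mod index {
    use std::collections::HashMap;

    /// Toy inverted index: term -> ids of documents containing that term.
    #[derive(Default)]
    pub struct Index {
        postings: HashMap<String, Vec<usize>>,
    }

    impl Index {
        pub fn add_document(&mut self, id: usize, text: &str) {
            for term in tokenize(text) {
                let ids = self.postings.entry(term).or_default();
                if !ids.contains(&id) {
                    ids.push(id);
                }
            }
        }

        /// `query` is a member method; the work is delegated to private
        /// helper functions that live in this same module.
        pub fn query(&self, q: &str) -> Vec<usize> {
            let mut hits = Vec::new();
            for term in tokenize(q) {
                merge_postings(&mut hits, self.postings.get(&term));
            }
            hits.sort_unstable();
            hits
        }
    }

    // Module-private helpers: not visible outside `index`.
    fn tokenize(text: &str) -> Vec<String> {
        text.split_whitespace().map(|t| t.to_lowercase()).collect()
    }

    fn merge_postings(acc: &mut Vec<usize>, postings: Option<&Vec<usize>>) {
        for &id in postings.into_iter().flatten() {
            if !acc.contains(&id) {
                acc.push(id);
            }
        }
    }
}

use index::Index;

fn main() {
    let mut idx = Index::default();
    idx.add_document(1, "Rust search library");
    idx.add_document(2, "fuzzy search");
    println!("{:?}", idx.query("search")); // [1, 2]
}
```

Callers only see `Index::query`; whether the helpers sit next to it or in a separate module is an internal choice that does not change the public API.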
Thanks for the PR, @tmpfs! The only thing I am considering is that, if it is necessary to have "everything" is in the

For example, you could perhaps make

```rust
use crate::query::query;

impl Index {
    fn query(/* ... */) {
        query(/* ... */)
    }
}
```

instead of

```rust
impl Index {
    fn query(/* ... */) {
        // ----- 100 lines of code -----
    }
}

fn query_helper_method_1(/* ... */) {}
fn query_helper_method_2(/* ... */) {}
```

What do you think? I fully agree that passing a mutable index to the query is not right / looks weird. The reason for this is that I am vacuuming removed documents at query time, since you might be iterating over them anyway. I'm not sure this feature is worth the price of having to pass a mutable reference. Maybe it would be better to instead return pointers to nodes that you can remove later, or something similar.
Sure @marcus-pousette, separating out the modules sounds reasonable. I will update this PR with those changes soonish. I thought automatic vacuuming was the reason for the mutable reference; thanks for clarifying that. I would argue that automatic vacuuming is not worthwhile. When I was integrating with my project, I had a nicely encapsulated index but had to expose a mutable reference just to query it. I would like to remove that, as only the owner of the index should be able to mutate it. For consumers of the library that need vacuuming, I think having the query ignore removed documents is enough; the index will eventually be purged when vacuum is called explicitly.
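A minimal sketch of the proposal above, assuming a toy index: `query` takes `&self` and simply skips documents marked as removed, and purging happens only when the owner calls `vacuum` explicitly. The `Library` wrapper is a hypothetical stand-in for an application that keeps its index private and can now offer a `&self` search method. None of these names are the library's real API.

```rust
use std::collections::HashSet;

// Hypothetical toy index: read-only queries, explicit vacuuming.
#[derive(Default)]
pub struct Index {
    docs: Vec<(usize, String)>,
    removed: HashSet<usize>,
}

impl Index {
    pub fn add(&mut self, id: usize, text: &str) {
        self.docs.push((id, text.to_string()));
    }

    /// Marks a document as removed; its data stays until `vacuum` runs.
    pub fn remove(&mut self, id: usize) {
        self.removed.insert(id);
    }

    /// Read-only query that ignores removed documents.
    pub fn query(&self, term: &str) -> Vec<usize> {
        self.docs
            .iter()
            .filter(|(id, text)| !self.removed.contains(id) && text.contains(term))
            .map(|(id, _)| *id)
            .collect()
    }

    /// Explicit purge; only whoever holds `&mut Index` can call it.
    pub fn vacuum(&mut self) {
        let removed = std::mem::take(&mut self.removed);
        self.docs.retain(|(id, _)| !removed.contains(id));
    }

    /// Number of stored documents, including removed-but-not-vacuumed ones.
    pub fn stored(&self) -> usize {
        self.docs.len()
    }
}

/// An owner that keeps the index private and exposes read access only.
pub struct Library {
    index: Index,
}

impl Library {
    /// Searching no longer requires `&mut self`.
    pub fn search(&self, term: &str) -> Vec<usize> {
        self.index.query(term)
    }
}

fn main() {
    let mut index = Index::default();
    index.add(1, "rust search");
    index.add(2, "rust fuzzy");
    index.remove(2);
    let mut library = Library { index };
    println!("{:?}", library.search("rust")); // [1]: document 2 is skipped
    library.index.vacuum(); // owner-side purge
    println!("{}", library.index.stored()); // 1
}
```

Skipping removed documents costs a set lookup per candidate, but it keeps the query path free of side effects, which is what makes the `&self` signature possible.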
@marcus-pousette, please take another look. I think it works better now, in particular the trait bounds for
Yes, looks great now! Now.
Yes, we could put
I agree with you. I realize now that there might even be immediate performance benefits to doing this your way, since we would be removing some checks. I created an issue for this: #16
Query capabilities will always be necessary in some way, but now at least you can implement your own query module and leave this one out when depending on this library (if there is a feature flag). I am not sure how far back one needs to go to implement fuzzy matching, but I am quite sure you do not need to reinvent everything in the query module; there must be some smart abstraction that lets existing functionality coexist nicely with fuzzy matching features.

We could take this up in #12, but maybe also think about what fuzzy matching could represent in a bigger picture. For example, many "fuzzy" matching systems today use synonym/word2vec models to measure the distance between what you are searching for and what exists in the index. Fuzzy matching is in some sense a property where we set the allowed "error distance", where distance could be Levenshtein distance or whatever distance type suits your needs. Continuing on this, perhaps it would be interesting to pass an "error" function that lets you set the precision of your query at query time. This error function could itself be a function of some distance measurement algorithm, like Levenshtein distance or a word2vec word-distance norm.
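A minimal sketch of the "error function" idea, assuming the index exposes a plain list of terms: the query takes a pluggable distance function and a maximum allowed error, and returns the terms within that distance. `fuzzy_terms` is a hypothetical name, and `levenshtein` is the standard edit distance (insertions, deletions, substitutions). An embedding-based metric such as a word2vec distance would more naturally return `f64`; `usize` is used here only to keep the sketch small.

```rust
/// Standard Levenshtein edit distance, computed with two rows of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1; b.len() + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical query helper: keeps the terms whose distance from `query`
/// is at most `max_error`, using whatever `distance` function is passed in.
fn fuzzy_terms<'a, F>(terms: &[&'a str], query: &str, max_error: usize, distance: F) -> Vec<&'a str>
where
    F: Fn(&str, &str) -> usize,
{
    terms
        .iter()
        .copied()
        .filter(|&term| distance(term, query) <= max_error)
        .collect()
}

fn main() {
    let terms = ["search", "serch", "starch", "index"];
    // One edit allowed: "serch" (deletion) and "starch" (substitution) also match.
    println!("{:?}", fuzzy_terms(&terms, "search", 1, levenshtein));
    // Any closure with the same shape works, e.g. a case-insensitive variant.
    let ci = fuzzy_terms(&terms, "SEARCH", 0, |a, b| {
        levenshtein(&a.to_lowercase(), &b.to_lowercase())
    });
    println!("{:?}", ci); // ["search"]
}
```

Keeping the metric as a parameter means the existing query code only needs to know "distance ≤ allowed error", while the choice of metric (edit distance, synonyms, embeddings) stays with the caller.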
Thanks for landing this, @marcus-pousette. I agree that we shouldn't need to make many changes to support fuzzy matching. Allowing complete control of

First, I will deal with #16, and I want to spend a bit more time on the documentation to help my understanding of the code.
Closes #13.