Migrate to methods of Index #15

tmpfs · 2022-08-19T05:19:40Z

Closes #13.

Should also be a member method.

Need an additional trait constraint on the `query` function for now.

tmpfs · 2022-08-19T06:06:57Z

@marcus-pousette, I think this is more or less there. Later I think it would be nice to spend a bit of time improving the API documentation but for the moment I think this is good enough.

One thing that bothers me a bit is that query() takes a mutable reference to the Index and that seems counter-intuitive to what a query function should do. I wonder if there is any way to remove the mutability or otherwise to use interior mutability so we could use &self - maybe better in a separate issue/PR.

Now that query() is a method of Index it makes more sense to have the query helper functions in the index module. Keep the query module just for tests. Some improvements to the handling of test utility functions and flatten the score module hierarchy for better documentation output.

marcus-pousette · 2022-08-19T07:06:12Z

Thanks for the PR! @tmpfs

The only thing I am considering is that, if it is necessary to have "everything" is in the index file (query just contain tests with this PR). I see why you made this changes (because the query fn is now a member of Index). I wonder if it would be possible to have separation of this, so I as a reader can more quickly get an overview/insight into different parts of this library without having to read through a long index file. I am also taking height for a future where the query capabilities are much more feature rich. Which would make the index file even larger if the same approach is used.

For example you could perhaps make

use crate::query::query;

impl Index {
    fn query(....) {
        query(...)
    }
}

instead of

impl Index {
    fn query(....) {
       " ----- 100 lines of code ----- "
    }
}

fn query_helper_method_1 ( ... ) {}

fn query_helper_method_2 ( ... ) {}

What do you think?

I fully agree that passing a mutable index to the query is not right / looks weird. The reason for this is that is that I am vacuuming removed documents during query time since you might be iterating over them. Not sure if this feature is worth the price of having to pass a mutable reference. Maybe it would be better to return pointers to nodes that you later can remove instead, or something.

tmpfs · 2022-08-19T10:29:45Z

Sure @marcus-pousette that sounds reasonable to separate out the modules. I will update this PR with those changes soonish.

I thought that automatic vacuuming was the reason for the mutable reference and thanks for clarifying that. I would argue that it is not worthwhile to automatically vacuum. When i was integrating with my project i had a nicely encapsulated index but had to expose a mutable reference in order to query and i would like to remove that as the owner of the index should only be able to mutate it.

For consumers of the library that need vacuuming then having the query ignore the removed documents is enough i think. Eventually the index will be purged when vacuum is called explicitly.

tmpfs · 2022-08-19T12:28:20Z

@marcus-pousette, please take another look, I think it works better now; in particular the trait bounds for T are clearer and more consistent we only define them twice (in the index and query modules) for each impl Index block 👍

marcus-pousette · 2022-08-19T12:49:08Z

Yes, looks great now!

Now. query module is completely independent of index (i.e. there can be an index without any query capabilities potentially).

tmpfs · 2022-08-19T12:51:29Z

Yes, we could put query behind a feature flag if necessary. I would imagine we would certainly want to do that for the fuzzy matching feature but not sure whether it makes sense for query - is the index useful without query?

marcus-pousette · 2022-08-19T12:54:12Z

I thought that automatic vacuuming was the reason for the mutable reference and thanks for clarifying that. I would argue that it is not worthwhile to automatically vacuum. When i was integrating with my project i had a nicely encapsulated index but had to expose a mutable reference in order to query and i would like to remove that as the owner of the index should only be able to mutate it.

For consumers of the library that need vacuuming then having the query ignore the removed documents is enough i think. Eventually the index will be purged when vacuum is called explicitly.

I agree with you. I realize now that there might even be immediate performance benefits to do this your way since we are to remove some checks. I created an issue for this: #16

marcus-pousette · 2022-08-19T13:09:34Z

Yes, we could put query behind a feature flag if necessary. I would imagine we would certainly want to do that for the fuzzy matching feature but not sure whether it makes sense for query - is the index useful without query?

Query capabilities will always be necessary in someway but now at least you can implement your own query module and not include this query module when depending on this library (if there is a feature flag).

I am not sure how far back one need to go to implement the fuzzy matching. But I am quite sure you do not need to reinvent everything in the query module, there must be some smart abstraction to make that allows existing functionality coexist nicely with fuzzy matching features. We could take this in the #12 issue, but maybe also think about what fuzzy matching could represent in a bigger picture, for example, many "fuzzy" matching systems today are using synonym/word2vec modules to measure distances from what you searching for and what exist in the index. Fuzzy matching is in some sense a property where we set the allowed "error distance" where distance could represent Levenshtein distance, or whatever distance type that is feasible for your need. Continuing on this, perhaps it would be interesting to pass an "error" function, that allows you during query to set the precision of your query. This error function could itself be a function of some distance measurement algorithm, like Levenshtein or word2vec word distance norm

tmpfs · 2022-08-20T01:09:46Z

Thanks for landing this @marcus-pousette, I agree that we shouldn't need to make many changes to support fuzzy matching. Allowing complete control of query through a feature flag is possible but I think a better approach is to allow complete control of the query behavior (scoring, matching, filtering, sorting) via the arguments to query.

First I will deal with #16 and I want to spend a bit more time on the documentation to help my understanding of the code.

tmpfs added 10 commits August 19, 2022 13:18

Migrate add_document_to_index().

a869403

Update README.

3b341f7

Migrate remove_document_from_index().

6de2fbe

Migrate vacuum_index().

031841b

Migrate create_index().

f43f363

Tidy vacuum_node().

ff6771e

Should also be a member method.

Migrate disconnect_and_count_documents().

a685e8f

Need an additional trait constraint on the `query` function for now.

Migrate query() function.

21a994d

Update exports for better documentation.

e36a14a

Move count_nodes() to tests module.

e39f508

tmpfs requested a review from marcus-pousette August 19, 2022 06:07

marcus-pousette mentioned this pull request Aug 19, 2022

query method requires a mutable reference to Index #16

Closed

tmpfs force-pushed the methods branch from bdc031c to 204841e Compare August 19, 2022 12:36

Restore query module.

84f0921

tmpfs force-pushed the methods branch from 204841e to 84f0921 Compare August 19, 2022 12:37

Tidy doc comments.

1bab96c

marcus-pousette merged commit a2cef1d into quantleaf:master Aug 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate to methods of Index #15

Migrate to methods of Index #15

tmpfs commented Aug 19, 2022

tmpfs commented Aug 19, 2022

marcus-pousette commented Aug 19, 2022 •

edited

Loading

tmpfs commented Aug 19, 2022

tmpfs commented Aug 19, 2022

marcus-pousette commented Aug 19, 2022

tmpfs commented Aug 19, 2022

marcus-pousette commented Aug 19, 2022

marcus-pousette commented Aug 19, 2022 •

edited

Loading

tmpfs commented Aug 20, 2022

Migrate to methods of Index #15

Migrate to methods of Index #15

Conversation

tmpfs commented Aug 19, 2022

tmpfs commented Aug 19, 2022

marcus-pousette commented Aug 19, 2022 • edited Loading

tmpfs commented Aug 19, 2022

tmpfs commented Aug 19, 2022

marcus-pousette commented Aug 19, 2022

tmpfs commented Aug 19, 2022

marcus-pousette commented Aug 19, 2022

marcus-pousette commented Aug 19, 2022 • edited Loading

tmpfs commented Aug 20, 2022

marcus-pousette commented Aug 19, 2022 •

edited

Loading

marcus-pousette commented Aug 19, 2022 •

edited

Loading