"More Like This" feature #466

Closed
dimiro1 opened this issue Feb 7, 2020 · 7 comments
Labels
feature request & feedback Go to https://github.com/meilisearch/product/

Comments

@dimiro1

dimiro1 commented Feb 7, 2020

Do you have any plans to implement a "More Like This"[1, 2] feature?

[1] https://lucene.apache.org/core/8_4_1/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html
[2] https://whoosh.readthedocs.io/en/latest/api/searching.html?highlight=more%20like#whoosh.searching.Hit.more_like_this

@tpayet
Member

tpayet commented Feb 9, 2020

Hey, this is not currently on the roadmap. MeiliSearch does have pagination if you want more results from a single search, but that's it.
I think it may add complexity when designing a search-as-you-type box. The search results should be found within two or three words of a query, and are thus reachable with pagination.
Can you provide a use-case?

@dimiro1
Author

dimiro1 commented Feb 9, 2020

Hello @tpayet,

My use case is pretty simple. I am using "More Like This" to implement a very crude item-to-item [1] product recommendation based on a few actions a user can perform.

e.g:

"Product A" is similar to "Product B"; if the user "Likes" "Product A", she might also like "Product B."

Currently, I am using Whoosh [2] to perform this task. However, I am also planning to integrate Meilisearch into this project, so it would be very helpful to have such a feature in Meilisearch and remove my dependency on Whoosh.

I guess I can implement something similar with Meilisearch by somehow extracting the "Key terms" of a document and performing a search with the terms.

[1] https://en.wikipedia.org/wiki/Item-item_collaborative_filtering
[2] https://whoosh.readthedocs.io/en/latest
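
The "key terms" idea above could be sketched client-side roughly as follows. This is only an illustration under assumptions: the `tokenize` and `key_terms` helpers, the TF-IDF scoring, and the tiny in-memory `corpus` are all hypothetical and not part of Meilisearch or Whoosh.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def key_terms(doc_id, corpus, k=3):
    """Return the k highest-scoring TF-IDF terms of corpus[doc_id] (hypothetical helper)."""
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for text in corpus.values():
        df.update(set(tokenize(text)))
    tf = Counter(tokenize(corpus[doc_id]))
    n = len(corpus)
    scores = {w: count * math.log(n / df[w]) for w, count in tf.items()}
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Made-up product descriptions, for illustration only.
corpus = {
    "a": "red cotton shirt for men",
    "b": "blue cotton shirt for men",
    "c": "red leather wallet for men",
}
query = " ".join(key_terms("a", corpus))  # "cotton red shirt"
```

The resulting string would then be sent as an ordinary search query, and the origin document filtered out of the results.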

@tpayet
Member

tpayet commented Feb 12, 2020

I understand ^^

It is not currently on our roadmap, but if you are willing to implement something, @Kerollmops can help you and guide you through understanding MeiliSearch :)

Let me know if you are interested. We can invite you to our Slack, which will make it easier for us to collaborate.

@dimiro1
Author

dimiro1 commented Feb 16, 2020

@tpayet I could try, but besides a few experiments, I don't really have much experience with Rust.

@qdequele qdequele added the feature request & feedback Go to https://github.com/meilisearch/product/ label Feb 17, 2020
@Kerollmops
Member

Kerollmops commented Feb 18, 2020

Hey @dimiro1,

I thought a little bit about this "More Like This" feature.

As I understand it, to find documents that look like a given document or group of documents, we could take the rarest words in the origin document or group of documents and search for all other documents that contain those words too (dropping the most common words when not enough documents are found). A good improvement on this method would be to also use the synonyms of those rare words.
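
A minimal sketch of that idea, assuming the origin document's words are already known and the inverted index is a plain dict of sets. The `more_like_this` function and its `max_words` and `min_results` parameters are hypothetical names for illustration, not MeiliSearch code, and synonyms are left out.

```python
def more_like_this(origin_words, inverted_index, origin_id, max_words=4, min_results=1):
    """Find documents sharing the rarest words of an origin document."""
    # Rank the origin's words from rarest to most common by document frequency.
    known = [w for w in set(origin_words) if w in inverted_index]
    words = sorted(known, key=lambda w: (len(inverted_index[w]), w))[:max_words]
    best = set()
    while words:
        # Documents that contain every remaining word, except the origin itself.
        matches = set.intersection(*(inverted_index[w] for w in words)) - {origin_id}
        if len(matches) >= min_results:
            return matches
        if len(matches) > len(best):
            best = matches
        # Not enough results: relax the query by dropping the most common word.
        words.pop()
    return best


# Made-up index: word -> ids of the documents containing it.
inverted_index = {
    "red": {1, 3},
    "cotton": {1, 2},
    "shirt": {1, 2, 4},
    "men": {1, 2, 3, 4},
}
similar = more_like_this(["red", "cotton", "shirt", "men"], inverted_index, origin_id=1)
```

Requiring every word to match is a strict reading of the description; a scoring variant that counts shared rare words would be another reasonable choice.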

The problem with this approach is that there is no way to get all the words related to a given document. MeiliSearch uses an inverted index: the key is the word and the value is the list of ids of the documents containing that word.

So retrieving the words contained in a document or a group of documents can only be done by iterating through all the words and searching for those that are in the documents. It can take a huge amount of time as it is O(n * m), where n is the number of words and m is the size of the lists of document ids associated with each word.
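
To make the cost concrete, here is a toy version of that scan, assuming the inverted index is a dict mapping each word to a plain list of document ids. The `words_of_document` helper is hypothetical, and the real storage format may make membership checks cheaper than a linear scan.

```python
def words_of_document(doc_id, inverted_index):
    """Recover a document's words by scanning every posting list."""
    found = []
    for word, doc_ids in inverted_index.items():  # one pass per word (n words)
        if doc_id in doc_ids:  # linear scan of a list of up to m ids
            found.append(word)
    return sorted(found)


# Made-up index: word -> list of ids of the documents containing it.
toy_index = {
    "red": [1, 3],
    "cotton": [1, 2],
    "shirt": [1, 2, 4],
    "wallet": [3],
}
words = words_of_document(1, toy_index)  # ["cotton", "red", "shirt"]
```

Every word in the index has to be visited even though the document only contains a handful of them, which is where the O(n * m) comes from.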

@dimiro1
Author

dimiro1 commented Feb 19, 2020

@Kerollmops Thanks for your reply; your explanation makes sense to me. A possible solution would be to start storing the keywords together with the documents. Unfortunately, this has the disadvantages of increasing the index size and requiring some sort of transaction mechanism to make sure the keywords stay in sync.
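
A rough in-memory sketch of what storing the keywords alongside the documents could look like. The `Index` class and its method names are hypothetical. Both maps are updated in the same call, which stands in for the transaction a real on-disk store would need.

```python
class Index:
    """Toy index keeping a forward map in step with the inverted map."""

    def __init__(self):
        self.inverted = {}  # word -> set of document ids
        self.forward = {}   # document id -> set of words (the extra storage)

    def add(self, doc_id, words):
        # Replace any previous version so stale words do not linger.
        self.remove(doc_id)
        self.forward[doc_id] = set(words)
        for word in self.forward[doc_id]:
            self.inverted.setdefault(word, set()).add(doc_id)

    def remove(self, doc_id):
        # The forward map tells us exactly which posting sets to touch.
        for word in self.forward.pop(doc_id, set()):
            ids = self.inverted[word]
            ids.discard(doc_id)
            if not ids:
                del self.inverted[word]


index = Index()
index.add(1, ["red", "cotton"])
index.add(2, ["cotton"])
index.add(1, ["blue"])  # re-indexing document 1 drops "red" and its "cotton" entry
```

With the forward map in place, the words of a document are a single lookup (`index.forward[doc_id]`) instead of a scan over every word, at the price of storing each word-document pair twice.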

@qdequele
Member

qdequele commented Dec 3, 2020

Hello, thanks for this feature proposal!

I will close this issue in favor of our public roadmap.

I invite everyone interested in this feature to update it on the roadmap.

@qdequele qdequele closed this as completed Dec 3, 2020
bors bot added a commit that referenced this issue Jan 16, 2023
466: Bump version to 0.23.1 r=curquiza a=Kerollmops

This PR bumps the crate versions to 0.23.1. Nothing seems to be breaking in the next release.

Co-authored-by: Kerollmops <clement@meilisearch.com>

4 participants