"More Like This" feature #466

dimiro1 · 2020-02-07T15:53:11Z

Do you have any plans to implement a "More Like This"[1, 2] feature?

[1] https://lucene.apache.org/core/8_4_1/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html
[2] https://whoosh.readthedocs.io/en/latest/api/searching.html?highlight=more%20like#whoosh.searching.Hit.more_like_this

tpayet · 2020-02-09T10:34:03Z

Hey, this is not currently on the roadmap. MeiliSearch does have pagination if you want more results for a single search, but that's it.
I think it may add complexity when designing a search-as-you-type box. The relevant results should be found within two or three query words, and are thus reachable with pagination.
Can you provide a use-case?

dimiro1 · 2020-02-09T21:11:56Z

Hello @tpayet,

My use case is pretty simple. I am using a "More Like This" to implement a very crude item-to-item [1] product recommendation based on a few actions a user can perform.

e.g:

If "Product A" is similar to "Product B" and the user "likes" "Product A", she might also like "Product B".

Currently, I am using Whoosh [2] to perform this task. However, I am also planning to integrate MeiliSearch into this project, and it would be very helpful to have such a feature in MeiliSearch so I could remove my dependency on Whoosh.

I guess I can implement something similar with Meilisearch by somehow extracting the "Key terms" of a document and performing a search with the terms.
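One common way to pick a document's "key terms" is TF-IDF weighting: a term scores high when it is frequent in the document but rare across the collection. The sketch below is only an illustration under stated assumptions: the naive tokenizer, the smoothed IDF formula and the `key_terms` helper are hypothetical, not Whoosh's or MeiliSearch's scoring.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed naive tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def key_terms(doc: str, corpus: list[str], k: int = 3) -> list[str]:
    """Hypothetical helper: the k terms of `doc` with the highest TF-IDF over `corpus`."""
    n_docs = len(corpus)
    # Document frequency: how many corpus documents contain each term.
    df = Counter()
    for text in corpus:
        df.update(set(tokenize(text)))
    tf = Counter(tokenize(doc))

    def score(term: str) -> float:
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
        return tf[term] * idf

    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(tf, key=lambda t: (-score(t), t))[:k]
```

For example, with the corpus `["red running shoes", "blue running shoes", "red wool scarf"]`, `key_terms("red wool scarf", corpus, k=2)` picks `scarf` and `wool`, the two words the other products do not share. Joined with spaces, they could be sent as an ordinary search query.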

[1] https://en.wikipedia.org/wiki/Item-item_collaborative_filtering
[2] https://whoosh.readthedocs.io/en/latest

tpayet · 2020-02-12T10:55:16Z

I understand ^^

It is not currently on our roadmap, but if you are willing to implement something, @Kerollmops can help you and guide you through the internals of MeiliSearch :)

Let me know if you are interested; we can invite you to our Slack, where it will be easier for us to collaborate.

dimiro1 · 2020-02-16T19:56:34Z

@tpayet I could try, but apart from a few experiments, I really don't have much experience with Rust.

Kerollmops · 2020-02-18T15:17:20Z

Hey @dimiro1,

I thought a little bit about this "More Like This" feature.

As I understand it, to find documents that look like a given document (or group of documents), we could take the rarest words of the origin document(s) and search for all other documents that contain those words too, dropping the most common of those words when not enough documents are found. A good improvement to this method could be to also use the synonyms of those rare words.
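The rarest-words strategy described above can be sketched in a few lines. This is a minimal illustration, assuming documents are already tokenized into word sets; `more_like_this` and its parameters are hypothetical, synonyms are not handled, and words that appear in no other document are skipped because they can never produce a match.

```python
from collections import Counter


def more_like_this(origin_ids, docs, min_results=2, max_terms=4):
    """Hypothetical sketch: ids of documents sharing the rarest words of the origin documents.

    `docs` maps a document id to its (assumed pre-tokenized) set of words.
    """
    origin = set(origin_ids)
    origin_words = set().union(*(docs[i] for i in origin))
    # Document frequency, counted only over the candidate (non-origin) documents.
    df = Counter()
    for doc_id, words in docs.items():
        if doc_id not in origin:
            df.update(words)
    # Keep words present in at least one other document, rarest first
    # (ties broken alphabetically so the result is deterministic).
    candidates = [w for w in origin_words if df[w] > 0]
    terms = sorted(candidates, key=lambda w: (df[w], w))[:max_terms]
    while terms:
        hits = sorted(
            doc_id
            for doc_id, words in docs.items()
            if doc_id not in origin and all(t in words for t in terms)
        )
        if len(hits) >= min_results:
            return hits
        # Not enough matches: drop the most common remaining word and retry.
        terms.pop()
    return []
```

Returning an empty list when even a single word is not enough keeps the sketch short; a real implementation would more likely return the best partial matches it found along the way.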

The problem with this approach is that there is no way to get all the words contained in a given document: MeiliSearch uses an inverted index, where the key is a word and the value is the list of IDs of the documents containing that word.

So retrieving the words contained in a document or a group of documents can only be done by iterating through all the words and looking for those whose lists contain the documents. This can take a huge amount of time, as it is O(n * m), where n is the number of words and m is the typical length of the list of document IDs associated with each word.
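To make that cost concrete, here is a toy model: an inverted index mapping each word to a sorted list of document IDs, with no reverse mapping. It only illustrates the access pattern; MeiliSearch's real storage differs, and `build_inverted_index` and `words_of` are hypothetical names.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for word in sorted(set(docs[doc_id].lower().split())):
            index[word].append(doc_id)
    return dict(index)


def words_of(doc_ids, index):
    """Recover the words of `doc_ids` using only the inverted index.

    With no document -> words mapping, every word (n of them) must be visited,
    and its id list scanned (up to m ids) for one of the wanted documents.
    """
    wanted = set(doc_ids)
    found = set()
    for word, ids in index.items():        # n words
        if any(i in wanted for i in ids):  # up to m ids per word
            found.add(word)
    return found
```

Going from word to documents is a single lookup (`index["wool"]`), while going from documents to words forces the full scan in `words_of`.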

dimiro1 · 2020-02-19T09:47:42Z

@Kerollmops Thanks for your reply; your explanation makes sense to me. A possible solution would be to start storing the keywords together with the documents. Unfortunately, this has the disadvantage of increasing the index size, and it would require some sort of transaction mechanism to keep the keywords in sync.
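A minimal in-memory sketch of that idea, assuming a hypothetical `DualIndex` that keeps a forward map (document ID to words) next to the inverted one. A `threading.Lock` stands in for the transaction mechanism: it stops readers from seeing one map updated without the other, but unlike a real database transaction it provides no durability or crash atomicity. The forward map is also exactly where the extra index size comes from.

```python
import threading
from collections import defaultdict


class DualIndex:
    """Hypothetical index keeping word -> docs and doc -> words in sync."""

    def __init__(self):
        self._inverted = defaultdict(set)  # word -> ids of documents containing it
        self._forward = {}                 # document id -> its set of words
        self._lock = threading.Lock()      # stand-in for a real transaction

    def _unlink(self, doc_id, word):
        # Remove one (word, doc) pair from the inverted map, dropping empty entries.
        self._inverted[word].discard(doc_id)
        if not self._inverted[word]:
            del self._inverted[word]

    def upsert(self, doc_id, words):
        words = set(words)
        with self._lock:  # both maps change together or not at all (in-process only)
            old = self._forward.get(doc_id, set())
            for w in old - words:
                self._unlink(doc_id, w)
            for w in words - old:
                self._inverted[w].add(doc_id)
            self._forward[doc_id] = words

    def delete(self, doc_id):
        with self._lock:
            for w in self._forward.pop(doc_id, set()):
                self._unlink(doc_id, w)

    def words_of(self, doc_id):
        # O(1) lookup instead of scanning every word of the inverted index.
        with self._lock:
            return set(self._forward.get(doc_id, set()))

    def docs_with(self, word):
        with self._lock:
            return set(self._inverted.get(word, set()))
```

Replacing a document's words through `upsert` also removes its stale entries from the inverted map, which is the synchronization the comment above worries about.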

qdequele · 2020-12-03T13:37:04Z

Hello, thanks for this feature proposal!

I will close this issue in favor of our public roadmap.

I invite everyone interested in this feature to update it on the roadmap.


qdequele added the "feature request & feedback" label Feb 17, 2020

qdequele closed this as completed Dec 3, 2020

bors bot added a commit that referenced this issue Jan 16, 2023

Merge #466

f04ab67

466: Bump version to 0.23.1 r=curquiza a=Kerollmops This PR bumps the crate versions to 0.23.1. Nothing seems to be breaking in the next release. Co-authored-by: Kerollmops <clement@meilisearch.com>
