Define field to search on at search-time #3772

gillian-meilisearch · 2023-05-24T09:10:51Z

Related product team resources: roadmap card (internal only)

Motivation

Both Algolia and Typesense support this feature and it has been flagged several times as a blocker to adoption.

Usage

Enables running searches over a subset of searchableAttributes without modifying Meilisearch’s index settings.

⚠️ Refer to the final spec to know the details and the final decisions about the usage.

TODO

Release a prototype
If prototype validated, merge changes into main
Update the spec

Technical prototype's TODO (@ManyTheFish addition)

Remaining Uncertainties (@ManyTheFish addition) (poke @macraig)

After making a technical tour of the feature, some uncertainties remain:

API: The dedicated search parameter is not defined yet; here is the comment link showing some competitor APIs
- restrictSearchableAttributes
Undefined behavior: when the end user searches in a field that is not part of the searchable attributes, the behavior of Meilisearch is undefined; we could return an error or return an empty response
- Empty response
Expectations of the feature: Does the order of the fields given at search time override the searchable attribute order given in the settings?
- The order doesn't change
Expectations of the feature: it could be implemented as an exclusive filter ("I only want the documents matching in this field") or as a ranking priority ("I want the documents matching in this field to be ranked before the others")
- The feature is implemented as an exclusive filter
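
To make the decisions above concrete, here is a minimal sketch of the exclusive-filter behavior. It is not Meilisearch code: `search_restricted`, the `Document` alias and the whole-word matching are all assumptions made for illustration. Attributes that are not searchable are skipped, and the settings order is kept regardless of the order in the request.

```rust
use std::collections::HashMap;

/// A document modelled as attribute name -> text (an assumption for this sketch).
pub type Document = HashMap<String, String>;

/// Returns the indices of the documents where `word` appears in at least one
/// of the requested attributes (exclusive filter).
/// Requested attributes that are not in `searchable` are ignored.
pub fn search_restricted(
    docs: &[Document],
    searchable: &[&str],
    restrict_to: &[&str],
    word: &str,
) -> Vec<usize> {
    // Keep the order from the settings; the request only narrows the set.
    let allowed: Vec<&str> = searchable
        .iter()
        .copied()
        .filter(|attr| restrict_to.contains(attr))
        .collect();

    let needle = word.to_lowercase();
    docs.iter()
        .enumerate()
        .filter(|(_, doc)| {
            allowed.iter().any(|attr| {
                doc.get(*attr)
                    .map(|text| {
                        // Naive whole-word match, standing in for the real engine.
                        text.to_lowercase().split_whitespace().any(|w| w == needle)
                    })
                    .unwrap_or(false)
            })
        })
        .map(|(i, _)| i)
        .collect()
}
```

With this sketch, restricting a search to an attribute that is not searchable returns no hits rather than an error, which matches the "Empty response" decision listed above.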

Impacted teams

@meilisearch/docs-team @meilisearch/integration-team


dureuill · 2023-06-14T08:51:55Z

Hello @ManyTheFish 👋

Seeing how the PR is already open for this feature, I hope I'm not arriving too late to the party 😃

I have feedback regarding this feature, in light of a discussion I had with @LukasKalbertodt

[...] the core problem is that Meili doesn't know how to compare attributes of different indexes regarding "importance". Just throwing ideas out there: one could, in a future version of Meili, allow users to not only specify the order of importance of attributes, but also assign a value the user picks. So instead of sending ["title", "author", "description"] to POST /indexes/../settings/searchable-attributes, one would send { "title": 1000, "author": 800, ..}. With that, the core problem is solved right? Because it can be now found out which of two attributes of different indexes is more important by comparing their number value.

(I'm quoting them)
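
To illustrate the quoted idea, here is a small sketch of how explicit weights would make attributes of different indexes comparable. The `Weights` alias and `compare_importance` are hypothetical names, and the values are the ones from the quote plus one made-up index.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Per-index attribute weights, e.g. {"title": 1000, "author": 800}.
pub type Weights = HashMap<String, u32>;

/// Compares the importance of an attribute of one index with an attribute of
/// another index. Returns `None` when either attribute has no weight.
pub fn compare_importance(
    left: &Weights,
    left_attr: &str,
    right: &Weights,
    right_attr: &str,
) -> Option<Ordering> {
    let l = left.get(left_attr)?;
    let r = right.get(right_attr)?;
    // A higher weight means a more important attribute.
    Some(l.cmp(r))
}
```

With an ordered list only, "title" in one index and "name" in another cannot be ranked against each other; with weights the comparison is a plain integer comparison.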

So, I think it would be great if the "Define field to search on at search-time" could take this need into account, either in this first cycle if attainable at all, or at least planned as a future extension.

Concretely, the changes it would entail IMO:

Drop the restrict from restrictSearchableAttributes as it could be used to communicate distinct weights rather than a subset of the searchable attributes (must be done in first cycle)
Allow passing either a list (like today) or a JSON object where the key is the attribute name and the value is its weight. The unclear part for me, though, is what the weights should be. (can be done later)
- The score on a scale of 1000 makes sense with today's relevancy prototype, but a future prototype could well change that scale to a unit float scale (from 0.0 to 1.0).
- Implementation-wise, what we need is both the cost associated with each field and the maximum cost that is considered. Ideally both of these could be derived from the weighted list and be index-independent, such that different lists still yield comparable scores. Maybe this could be achieved by asking the user for both the weights (higher is better) and the scale (what is the max weight that gives a perfect score)?
The cost computation in the attribute_fid ranking rule should take into account the list of "restricted" searchable attributes and their weights. Once we know how to convert the weights to costs and a max cost, this can be done easily by enriching the attribute_fid ranking rule with a map of the fid to the cost, and using that map instead of directly the fid as the cost in the conditions. Implementation-wise it may entail modifying the RankingRuleGraphTrait to accept &self parameters so we can have state in the graph-based ranking rules, but it is not a very large change. (not sure if can be done later or must be done in first cycle)
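
As a rough sketch of the weights-to-costs conversion discussed above: the formula `cost = scale - weight` is only one possible choice and was not decided in this thread, and `FieldId`, `costs_from_weights` and `score` are hypothetical names, not existing milli APIs.

```rust
use std::collections::HashMap;

/// Field id as used by the ranking rule (the concrete type is an assumption).
pub type FieldId = u16;

/// Converts user weights (higher is better) into costs (lower is better).
/// `scale` is the weight that would give a perfect score, i.e. a cost of 0.
pub fn costs_from_weights(
    weights: &HashMap<FieldId, u32>,
    scale: u32,
) -> HashMap<FieldId, u32> {
    weights
        .iter()
        // Weights above the scale are clamped to a cost of 0.
        .map(|(&fid, &weight)| (fid, scale.saturating_sub(weight)))
        .collect()
}

/// Normalises a cost to a score in [0.0, 1.0], so that lists using different
/// scales still produce comparable scores.
pub fn score(cost: u32, scale: u32) -> f64 {
    if scale == 0 {
        return 1.0;
    }
    1.0 - (cost.min(scale) as f64 / scale as f64)
}
```

The resulting map is the "map of the fid to the cost" mentioned above; the ranking rule would look up the cost there instead of using the fid directly.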

Thanks for reading,
I'm available to discuss this anytime 😃

Side note:

Undefined behavior: when the end user searches in a field that is not part of the searchable attributes, the behavior of Meilisearch is undefined; we could return an error or return an empty response
- Empty response

What is the rationale for returning an "empty response" in that case? I would expect either:

(strongly preferred) a synchronous error, so that you can catch my typos when I put genre instead of genres or colour instead of color
(less preferred) since this field is not indexed, no document will match it (but other searchable attributes might return some hits). This will certainly be unexpected for users, though, if the field appears in some of their documents but is not searchable in the index settings. In this situation they won't get an error, but these documents won't match either, which is not ideal.
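
The two options can be sketched side by side. Both functions are hypothetical and only illustrate the difference; the error message is made up and is not a Meilisearch error.

```rust
/// Option 1 (strict): fail on the first requested attribute that is not
/// searchable, so a typo such as `genre` for `genres` is reported at once.
pub fn validate_strict<'a>(
    requested: &[&'a str],
    searchable: &[&str],
) -> Result<Vec<&'a str>, String> {
    for attr in requested {
        if !searchable.contains(attr) {
            return Err(format!("attribute `{}` is not searchable", attr));
        }
    }
    Ok(requested.to_vec())
}

/// Option 2 (lenient): silently drop unknown attributes and search on the rest.
pub fn filter_lenient<'a>(requested: &[&'a str], searchable: &[&str]) -> Vec<&'a str> {
    requested
        .iter()
        .copied()
        .filter(|attr| searchable.contains(attr))
        .collect()
}
```

With the lenient variant, a request for `["genre", "title"]` quietly becomes a search on `title` only, which is exactly the surprise described in the second option.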

ManyTheFish · 2023-06-15T08:58:46Z

Hello @dureuill,
I'll start with the easiest:

What is the rationale for returning an "empty response" in that case

Well, that's not really the case: Meilisearch behaves as you explained in (2). Other searchable attributes might return some hits; we only filter on the restricted attributes that Meilisearch knows, and the other attributes are ignored.
About the error, I don't have a strong opinion. I personally prefer returning an error but, after discussing with @irevoire, it's possible that the documents containing the restricted field are not yet indexed, and we would return an error until the task is processed, which could be disturbing.

Drop the restrict from restrictSearchableAttributes as it could be used to communicate distinct weights rather than a subset of the searchable attributes (must be done in first cycle)

We can do this 😄 we only chose restrictSearchableAttributes because it's consistent with competitors' API, but searchableAttributes is fine. (poke @macraig on this change)

Allow to pass either a list (like today) or a JSON object where the key is the attribute name, and the value is its weight. The unclear part for me, though is what the weights should be.

I personally don't like the weight approach, it's too tricky to set a good weight for a field from the end-user perspective. Moreover, it would be inconsistent with the setting API that defines an order.
However, in the current PR, we chose to take an ordered list in parameters that could be used to reorder the fields at search time in the future. 🤔

The cost computation in the attribute_fid ranking rule should take into account the list of "restricted" searchable attributes and their weights. Once we know how to convert the weights to costs and a max cost, this can be done easily by enriching the attribute_fid ranking rule with a map of the fid to the cost, and using that map instead of directly the fid as the cost in the conditions

It's completely doable, but I feel that it is something different than the planned feature for v1.3.

dureuill · 2023-06-15T09:31:58Z

Thank you ManyTheFish for the detailed answer ☺️

About the error, I don't have a strong opinion on that, I personally prefer returning an error, but, after discussing with @irevoire, it's possible that the documents containing the restricted field are not yet indexed and we would return an error until the task is processed, which could be disturbing.

Ah OK, this happens in the case where restrictSearchableAttributes is used with an inferred searchableAttributes, that then gets modified by indexing new documents.

If I have to choose between this particular edge case and fixing the typos I prefer we fix the typos. Maybe this is inconsistent with some other attributes related APIs usable at search time, though? We would need to check that. I'm noting that our competitors only allow restrictSearchableAttributes at search time when the searchableAttributes setting is non empty and non null. I wonder if that makes sense for us to do the same? It would remove the edge case that @irevoire is thinking of. It would also be good to check what our competitor is doing in the case where an attribute that is not part of searchableAttributes is passed to restrictSearchableAttributes, as another data point.

About the naming change and scope change, I think it would be best if we discussed this synchronously. To be clear I'm not suggesting we completely remove the possibility to specify a list of attributes, just advocating that we add the additional capability of specifying the weights. In an ideal world we would also upgrade the setting to accept the weights additionally to the list.

3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish

## Summary

This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:

```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```

This feature acts like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but the retrieved documents will be ordered taking into account the words contained in the attributes-to-search-on.

## Trying the prototype

A dedicated docker image has been released for this feature:

#### latest prototype version:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```

#### other prototype versions:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```

## Technical Detail

The attributes-to-search-on list is given to the search context; then the search context uses the `fid_word_docids` database with only the allowed field ids instead of the global `word_docids` database. The same applies to the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.

### Relevancy limits

Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could misorder documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower.

I put below a failing test showing it:

```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
        {
            "title": "Captain super mega cool. A Marvel story",
            // Perfect distance between words in an ignored attribute
            "desc": "Captain Marvel",
            "id": "1",
        },
        {
            "title": "Captain America from Marvel",
            "desc": "a Shazam ersatz",
            "id": "2",
        }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```

Fixing this would force us to create `fid_word_pair_proximity_docids` and `fid_word_prefix_pair_proximity_docids` databases, which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I suggest doing it in another PR.

## Related

Fixes #3772

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
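
The lookup described under "Technical Detail" can be sketched as follows. This is a simplified model and not the actual milli code: in-memory maps stand in for the LMDB databases, and `SearchContext`, `DocIds`, and the `unions_computed` counter are named or added here for illustration only.

```rust
use std::collections::{BTreeSet, HashMap};

pub type FieldId = u16;
pub type DocIds = BTreeSet<u32>;

pub struct SearchContext {
    /// Stand-in for the `fid_word_docids` database: (field id, word) -> doc ids.
    fid_word_docids: HashMap<(FieldId, String), DocIds>,
    /// Field ids of the attributes to search on.
    allowed_fids: Vec<FieldId>,
    /// Cache of the merged doc ids, keyed by word.
    cache: HashMap<String, DocIds>,
    /// Number of unions actually computed (demonstration only).
    pub unions_computed: usize,
}

impl SearchContext {
    pub fn new(
        fid_word_docids: HashMap<(FieldId, String), DocIds>,
        allowed_fids: Vec<FieldId>,
    ) -> Self {
        SearchContext {
            fid_word_docids,
            allowed_fids,
            cache: HashMap::new(),
            unions_computed: 0,
        }
    }

    /// Doc ids containing `word` in at least one allowed field.
    /// The union is computed only when the word is missing from the cache.
    pub fn word_docids(&mut self, word: &str) -> &DocIds {
        if !self.cache.contains_key(word) {
            let mut merged = DocIds::new();
            for fid in &self.allowed_fids {
                if let Some(ids) = self.fid_word_docids.get(&(*fid, word.to_string())) {
                    merged.extend(ids.iter().copied());
                }
            }
            self.unions_computed += 1;
            self.cache.insert(word.to_string(), merged);
        }
        &self.cache[word]
    }
}
```

Calling `word_docids` twice with the same word computes the union once; the second call is served from the cache.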

3876: Fix invalid attributeToSearchOn error code r=Kerollmops a=ManyTheFish

Fix the invalid attributeToSearchOn error code to be consistent with the other search parameters' error codes: error code `invalid_attributes_to_search_on` becomes `invalid_search_attributes_to_search_on`:

```diff
- invalid_attributes_to_search_on
+ invalid_search_attributes_to_search_on
```

Related to #3772

Co-authored-by: ManyTheFish <many@meilisearch.com>

gillian-meilisearch · 2023-07-06T07:41:43Z

Hello everyone 👋

We have just released the first RC (release candidate) of Meilisearch containing this change!
You can test it by using

the release assets
the Meilisearch Docker image

docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:v1.3.0-rc.0

If you encounter any bugs, please report them here.
Thanks in advance for your help and your involvement in Meilisearch ❤️

🎉 The official and stable release containing this change will be available on July 31st, 2023

⚠️ RC (release candidates) are not recommended for production

gillian-meilisearch added this to the v1.3.0 milestone May 24, 2023

gillian-meilisearch assigned irevoire May 24, 2023

ManyTheFish self-assigned this May 29, 2023

curquiza unassigned irevoire Jun 5, 2023

ManyTheFish mentioned this issue Jun 13, 2023

Define searchable fields at runtime #3834

Merged

maryamsulemani97 mentioned this issue Jun 19, 2023

v1.3: Roadmap meilisearch/documentation#2467

Closed

24 tasks

guimachiavelli mentioned this issue Jun 20, 2023

v1.3: Define search field at search time meilisearch/documentation#2478

Closed

3 tasks

meili-bors bot closed this as completed in d4f1080 Jun 28, 2023

mmachatschek mentioned this issue Jul 3, 2023

attributesToRetrieve: array_flip(): Can only flip string and integer values, entry skipped laravel/scout#750

Closed

This was referenced Jul 3, 2023

Define fields to search on at runtime meilisearch/specifications#251

Merged

Fix invalid attributeToSearchOn error code #3876

Merged

brunoocasali mentioned this issue Jul 5, 2023

Changes regarding Meilisearch v1.3.0 meilisearch/integration-guides#280

Closed

bidoubiwa mentioned this issue Jul 10, 2023

Add attributes to search on for Meilisearch v1.3 meilisearch/meilisearch-js#1538

Merged

meili-bot added the v1.3.0 PRs/issues solved in v1.3.0 released on 2023-07-31 label Aug 2, 2023
