Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define field to search on at search-time #3772

Closed
17 of 20 tasks
gillian-meilisearch opened this issue May 24, 2023 · 4 comments · Fixed by #3834
Closed
17 of 20 tasks

Define field to search on at search-time #3772

gillian-meilisearch opened this issue May 24, 2023 · 4 comments · Fixed by #3834
Assignees
Labels
enhancement New feature or improvement impacts docs This issue involves changes in the Meilisearch's documentation impacts integrations This issue involves changes in the Meilisearch's integrations missing usage Description of the feature usage is missing in the appropriate place v1.3.0 PRs/issues solved in v1.3.0 released on 2023-07-31
Milestone

Comments

@gillian-meilisearch
Copy link
Contributor

gillian-meilisearch commented May 24, 2023

Related product team resources: roadmap card (internal only)

Related product discussions:

Related spec: WIP

Motivation

Both Algolia and Typesense support this feature and it has been flagged several times as a blocker to adoption.

Usage

Enables running searches over a subset of searchableAttributes without modifying Meilisearch’s index settings.

⚠️ Refer to the final spec to know the details and the final decisions about the usage.

TODO

  • Release a prototype
  • If prototype validated, merge changes into main
  • Update the spec

Technical prototype's TODO (@ManyTheFish addition)

  • Milli side:
  • API side:
    • Define a new Search parameter (restrictSearchableAttributes)
    • Make Milli work with the search parameter
  • Test side (@ManyTheFish):
    • A query searching in an unsearchable field MAY return an error or MAY return an empty result (TBD)
    • A query with the terms matching strategy All must only return documents containing all the requested words in the searched field
    • Ensure the criteria take into account the filtered field:
      • Word: A document containing query-words in the searched field must be ranked before a document containing more query-words in all fields but fewer in the searched field.
      • Typo: A document containing query-words without typos in the searched field must be ranked before a document containing more query-words without typos in all fields but fewer in the searched field.
      • Proximity: A document containing query-words with a distance of x in the searched field must be ranked before a document containing query-words with a perfect distance in another field but a distance y worse than x in the searched field.
      • Attribute: A document containing query-words in the searched field must be ranked before a document containing more query-words in a higher-ranked field but fewer in the searched field.
      • Exactness: A document containing exact query-words in the searched field must be ranked before a document containing more exact query-words in all fields but fewer in the searched field

Remaining Uncertainties (@ManyTheFish addition) (poke @macraig)

After making a technical tour of the feature, some uncertainties remain:

  • API: The dedicated search parameter is not defined, here is the comment link showing some competitor APIs
    • restrictSearchableAttributes
  • Undefined behavior: when the end user search in a field that is not part of the searchable attribute the behavior of Meilisearch is undefined, we could return an error or return an empty response
    • Empty response
  • Expectations of the feature: Does the order of the field given at search time reorder the searchable attribute order given in the settings
    • The order doesn't change
  • Expectations of the feature, the feature could be implemented as an exclusive filter ("I only want the documents matching in this field") or as a ranking priority ("I want the documents matching in this field to be ranked before the others")
    • The feature is implemented as an exclusive filter

Impacted teams

@meilisearch/docs-team @meilisearch/integration-team

@gillian-meilisearch gillian-meilisearch added this to the v1.3.0 milestone May 24, 2023
@gillian-meilisearch gillian-meilisearch added enhancement New feature or improvement impacts docs This issue involves changes in the Meilisearch's documentation impacts integrations This issue involves changes in the Meilisearch's integrations missing usage Description of the feature usage is missing in the appropriate place labels May 24, 2023
@ManyTheFish ManyTheFish self-assigned this May 29, 2023
@dureuill
Copy link
Contributor

Hello @ManyTheFish 👋

Seeing how the PR is already open for this feature, I hope I'm not arriving too late to the party 😃

I have feedback regarding this feature, in light of a discussion I had with @LukasKalbertodt

[...] the core problem is that Meili doesn't know how to compare attributes of different indexes regarding "importance". Just throwing ideas out there: one could, in a future version of Meili, allow users to not only specify the order of importance of attributes, but also assign a value the user picks. So instead of sending ["title", "author", "description"] to POST /indexes/../settings/searchable-attributes, one would send { "title": 1000, "author": 800, ..}. With that, the core problem is solved right? Because it can be now found out which of two attributes of different indexes is more important by comparing their number value.

(I'm quoting them)

So, I think it would be great if the "Define field to search on at search-time" could take this need into account, either in this first cycle if attainable at all, or at least planned as a future extension.

Concretely, the changes it would entail IMO:

  • drop the restrict from restrictSearchableAttributes as it could be used to communicate distinct weights rather than a subset of the searchable attributes (must be done in first cycle)
  • Allow to pass either a list (like today) or a JSON object where the key is the attribute name, and the value is its weight. The unclear part for me, though is what the weights should be. (can be done later)
    • The score on a scale of 1000 makes sense with today's relevancy prototype, but a future prototype could well change that scale to a unit float scale (from 0.0 to 1.0).
    • Implementation-wise, what we need is both the cost associated with each field, as well as the maximum cost that is considered. Ideally both of these could be derived from the weighted list and be index independent such that different lists still yield to comparable scores. Maybe this could be achieved by asking the user for both the weights (higher is better) and the scale (what is the max weight that gives a perfect score)?
  • The cost computation in the attribute_fid ranking rule should take into account the list of "restricted" searchable attributes and their weights. Once we know how to convert the weights to costs and a max cost, this can be done easily by enriching the attribute_fid ranking rule with a map of the fid to the cost, and using that map instead of directly the fid as the cost in the conditions (implementation wise it may entail modifying the RankingRuleGraphTrait to accept &self parameters so we can have state in the graph based ranking rules, but it is not a very large change) (not sure if can be done later or must be done in first cycle)

Thanks for reading,
I'm available to discuss this anytime 😃

Side note:

  • Undefined behavior: when the end user search in a field that is not part of the searchable attribute the behavior of Meilisearch is undefined, we could return an error or return an empty response
    - Empty response

What is the rationale for returning an "empty response" in that case? I would expect either:

  1. (strongly preferred) a synchronous error, so that you can catch my typos when putting genre instead of genres or colour instead of color
  2. (less preferred) since this field is not indexed, no document will match it (but other searchable attributes might return some hits). This will certainly be unexpected for users, though, if the field appears in some of their documents, is not searchable in the index settings. In this situation they won't get an error but these documents won't match either, not ideal.

@ManyTheFish
Copy link
Member

ManyTheFish commented Jun 15, 2023

Hello @dureuill,
I'll start with the easiest:

What is the rationale for returning an "empty response" in that case

Well, it's not really the case, Meilisearch behaves as you explained in (2) other searchable attributes might return some hits, we only filter on the restricted attributes that Meilisearch knows, the other attributes are ignored.
About the error, I don't have a strong opinion on that, I personally prefer returning an error, but, after discussing with @irevoire, it's possible that the documents containing the restricted field are not yet indexed and we would return an error until the task is processed, which could be disturbing.

Drop the restrict from restrictSearchableAttributes as it could be used to communicate distinct weights rather than a subset of the searchable attributes (must be done in first cycle)

We can do this 😄 we only chose restrictSearchableAttributes because it's consistent with competitors' API, but searchableAttributes is fine. (poke @macraig on this change)

Allow to pass either a list (like today) or a JSON object where the key is the attribute name, and the value is its weight. The unclear part for me, though is what the weights should be.

I personally don't like the weight approach, it's too tricky to set a good weight for a field from the end-user perspective. Moreover, it would be inconsistent with the setting API that defines an order.
However, in the current PR, we chose to take an ordered list in parameters that could be used to reorder the fields at search time in the future. 🤔

The cost computation in the attribute_fid ranking rule should take into account the list of "restricted" searchable attributes and their weights. Once we know how to convert the weights to costs and a max cost, this can be done easily by enriching the attribute_fid ranking rule with a map of the fid to the cost, and using that map instead of directly the fid as the cost in the conditions

It's completely doable, but I feel that is something different that the planned feature for v1.3.

@dureuill
Copy link
Contributor

Thank you ManyTheFish for the detailed answer ☺️

About the error, I don't have a strong opinion on that, I personally prefer returning an error, but, after discussing with @irevoire, it's possible that the documents containing the restricted field are not yet indexed and we would return an error until the task is processed, which could be disturbing.

Ah OK, this happens in the case where restrictSearchableAttributes is used with an inferred searchableAttributes, that then gets modified by indexing new documents.

If I have to choose between this particular edge case and fixing the typos I prefer we fix the typos. Maybe this is inconsistent with some other attributes related APIs usable at search time, though? We would need to check that. I'm noting that our competitors only allow restrictSearchableAttributes at search time when the searchableAttributes setting is non empty and non null. I wonder if that makes sense for us to do the same? It would remove the edge case that @irevoire is thinking of. It would also be good to check what our competitor is doing in the case where an attribute that is not part of searchableAttributes is passed to restrictSearchableAttributes, as another data point.

About the naming change and scope change, I think it would be best if we discussed this synchronously. To be clear I'm not suggesting we completely remove the possibility to specify a list of attributes, just advocating that we add the additional capability of specifying the weights. In an ideal world we would also upgrade the setting to accept the weights additionally to the list.

meili-bors bot added a commit that referenced this issue Jun 21, 2023
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish

## Summary
This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:

```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```

This feature act like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but, the retrieved documents will be ordered taking into account the word contained in the attributes-to-search-on. 

## Trying the prototype

A dedicated docker image has been released for this feature:

#### last prototype version:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```

#### others prototype versions:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```

## Technical Detail

The attributes-to-search-on list is given to the search context, then, the search context uses the `fid_word_docids`database using only the allowed field ids instead of the global `word_docids` database. This is the same for the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.

### Relevancy limits

Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could miss-order documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower. I put below a failing test showing it:
```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
        {
            "title": "Captain super mega cool. A Marvel story",
            // Perfect distance between words in an ignored attribute
            "desc": "Captain Marvel",
            "id": "1",
        },
        {
            "title": "Captain America from Marvel",
            "desc": "a Shazam ersatz",
            "id": "2",
        }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```

Fixing this would force us to create a `fid_word_pair_proximity_docids` and a `fid_word_prefix_pair_proximity_docids` databases which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I'll suggest doing it in another PR.

## Related

Fixes #3772

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
@meili-bors meili-bors bot closed this as completed in d4f1080 Jun 28, 2023
meili-bors bot added a commit that referenced this issue Jul 3, 2023
3876: Fix invalid attributeToSearchOn error code r=Kerollmops a=ManyTheFish

Fix the invalid attributeToSearchOn error code to be consistent with the other search parameters' error codes:

error code `invalid_attributes_to_search_on` becomes `invalid_search_attributes_to_search_on`:
```diff
- invalid_attributes_to_search_on
+ invalid_search_attributes_to_search_on
```

related to #3772


Co-authored-by: ManyTheFish <many@meilisearch.com>
@gillian-meilisearch
Copy link
Contributor Author

Hello everyone 👋

We have just released the first RC (release candidate) of Meilisearch containing this change!
You can test it by using

docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:v1.3.0-rc.0

If you encounter any bugs, please report them here.
Thanks in advance for your help and your involvement in Meilisearch ❤️

🎉 The official and stable release containing this change will be available on July 31st, 2023

⚠️ RC (release candidates) are not recommended for production

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or improvement impacts docs This issue involves changes in the Meilisearch's documentation impacts integrations This issue involves changes in the Meilisearch's integrations missing usage Description of the feature usage is missing in the appropriate place v1.3.0 PRs/issues solved in v1.3.0 released on 2023-07-31
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants