Split the field id map from the weight of each fields #4631

irevoire · 2024-05-13T14:03:45Z

Pull Request

Related issue

Fixes #4484

What does this PR do?

- Make the (internal) searchable fields database always contain the searchable fields (instead of `None` when the user has not defined any searchable fields)
- Introduce a new « fieldids_weights_map » that maps a `FieldId` to its `Weight`
- Ensure that when two searchable fields are swapped, the field ID map no longer changes (and thus no longer triggers a re-index)
- Use the weight instead of the order of the searchable fields in the attribute ranking rule at search time
- When no searchable attributes are defined, make the weights of all fields equal to zero
- When a field is declared as searchable and contains nested fields, all its subfields share the same weight
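
The split described in the list above can be illustrated with a minimal sketch. This is not the code from `milli/src/fieldids_weights_map.rs`: the struct layouts, the method names (`insert`, `from_searchable`, `weight`), the `u16` aliases, and the field names `title` and `overview` are all assumptions made for illustration. The point is only that field ids are allocated once, while weights live in a separate map that can be rebuilt cheaply.

```rust
use std::collections::HashMap;

// Hypothetical aliases for this sketch; the real definitions live in milli.
type FieldId = u16;
type Weight = u16;

/// Assigns an id to a field name the first time it is seen.
/// Ids never change when the searchable attributes are reordered.
#[derive(Default)]
struct FieldsIdsMap {
    ids: HashMap<String, FieldId>,
    next_id: FieldId,
}

impl FieldsIdsMap {
    /// Returns the existing id of `name`, or allocates a new one.
    fn insert(&mut self, name: &str) -> FieldId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.next_id;
        self.ids.insert(name.to_string(), id);
        self.next_id += 1;
        id
    }
}

/// Separate map from field id to weight (hypothetical layout).
#[derive(Default, Debug, PartialEq)]
struct FieldidsWeightsMap {
    map: HashMap<FieldId, Weight>,
}

impl FieldidsWeightsMap {
    /// Rebuilds the weights from the user-defined order:
    /// the position in `searchable` becomes the weight.
    fn from_searchable(fields: &mut FieldsIdsMap, searchable: &[&str]) -> Self {
        let mut map = HashMap::new();
        for (weight, name) in searchable.iter().enumerate() {
            let id = fields.insert(name);
            map.insert(id, weight as Weight);
        }
        Self { map }
    }

    /// Looks up the weight of a field id, if it is searchable.
    fn weight(&self, fid: FieldId) -> Option<Weight> {
        self.map.get(&fid).copied()
    }
}
```

In this sketch, calling `from_searchable` with `["title", "overview"]` and then with `["overview", "title"]` swaps the two weights while `fields.insert("title")` keeps returning the same id, which is why a pure reordering would not need to touch anything keyed by field id.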

Impact on relevancy

When no searchable attributes are declared

When no searchable attributes are declared, all the fields now have the same importance. Previously, more importance was arbitrarily given to whichever field had been encountered earliest in the life of the index.

This means that before this PR, sending the following JSON:

[
  { "id": 0, "name": "kefir", "color": "white" },
  { "id": 1, "name": "white", "last name": "spirit" }
]

would make the field `name` more important than the fields `color` or `last name`.
Searching for `white` would therefore automatically rank document `1` higher than document `0`.

After this PR, all the fields have the same weight, and none are considered more important than others.
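
A minimal sketch of this rule, under stated assumptions: the function name `weights_for`, its signature, and the `u16` weight type are hypothetical and are not taken from milli. When no searchable attributes are declared, every known field gets weight 0; otherwise the position in the user's list is used.

```rust
// Hypothetical alias for this sketch.
type Weight = u16;

/// `user_defined` is `None` when the user never declared searchable attributes.
fn weights_for(all_fields: &[&str], user_defined: Option<&[&str]>) -> Vec<(String, Weight)> {
    match user_defined {
        // Every known field is searchable and all of them share weight 0.
        None => all_fields.iter().map(|f| (f.to_string(), 0)).collect(),
        // Otherwise the position in the user's list is the weight.
        Some(list) => list
            .iter()
            .enumerate()
            .map(|(i, f)| (f.to_string(), i as Weight))
            .collect(),
    }
}
```

With the two documents above, the known fields are `id`, `name`, `color`, and `last name`. Passing `None` gives all four the weight 0, so a match for `white` in `color` counts the same as a match in `name` as far as the attribute ranking rule is concerned.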

When a nested field is made searchable

The second behavior change concerns nested fields. Consider, for example, this document:

{
  "id": 0,
  "name": "tamo",
  "doggo": {
    "name": "kefir",
    "surname": "le kef"
  },
  "catto": "gromez"
}

Previously, defining the searchable attributes as `["tamo", "doggo", "catto"]` actually defined the « real » searchable attributes in the engine as `["tamo", "doggo", "catto", "doggo.name", "doggo.surname"]`. As a result, `doggo.name` and `doggo.surname` were NOT where the user expected them and had completely different weights than `doggo`.
With this PR, all the weights have been unified, and the « real » searchable fields look like this:

[ "tamo", "doggo", "doggo.name", "doggo.surname", "catto"]
   ^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^    ^^^^^
Weight 0                 Weight 1                  Weight 2
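
The unified weights shown above can be reproduced with a short sketch. The helper names `is_self_or_child` and `nested_weights` and the `u16` weight type are assumptions for illustration, not the functions used in milli. The sketch takes the user-defined list and the flattened field names of the documents, and gives every subfield the weight of its declared parent.

```rust
// Hypothetical alias for this sketch.
type Weight = u16;

/// Returns true when `field` is `parent` itself or one of its nested subfields
/// (`doggo.name` is nested under `doggo`, but `doggos` is not).
fn is_self_or_child(parent: &str, field: &str) -> bool {
    field == parent
        || field
            .strip_prefix(parent)
            .map_or(false, |rest| rest.starts_with('.'))
}

/// Expands the user-defined searchable attributes into the « real » list,
/// where each subfield shares the weight of its declared parent.
fn nested_weights(searchable: &[&str], flattened_fields: &[&str]) -> Vec<(String, Weight)> {
    let mut out = Vec::new();
    for (weight, parent) in searchable.iter().enumerate() {
        // Keep the declared attribute itself, even if no document contains it.
        out.push((parent.to_string(), weight as Weight));
        for field in flattened_fields {
            // Add only strict subfields; the parent was pushed just above.
            if *field != *parent && is_self_or_child(parent, field) {
                out.push((field.to_string(), weight as Weight));
            }
        }
    }
    out
}
```

Called with `["tamo", "doggo", "catto"]` and the flattened fields of the document above, this yields `tamo` at weight 0, `doggo`, `doggo.name`, and `doggo.surname` at weight 1, and `catto` at weight 2, matching the diagram.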

meilisearch/src/search_queue.rs

milli/src/fieldids_weights_map.rs

milli/src/index.rs

milli/src/search/new/ranking_rule_graph/fid/mod.rs

milli/src/update/settings.rs

milli/src/search/new/db_cache.rs

Kerollmops

Looks perfect to me! Can you also ensure that the field IDs have equal weights, please?

Kerollmops

We can merge whenever we can 🙂
bors merge

4631: Split the field id map from the weight of each fields r=Kerollmops a=irevoire

Fixes #4484

Co-authored-by: Tamo <tamo@meilisearch.com>

dureuill · 2024-05-16T09:13:29Z

The windows failure here is a bit disheartening

meili-bors · 2024-05-16T09:28:21Z

Canceled.

Kerollmops

bors merge

meili-bors · 2024-05-16T10:27:36Z

Build succeeded:

irevoire added 6 commits May 14, 2024 17:00

Stops returning an option in the internal searchable fields

c224600

gate a test behind the required feature

4e4a1dd

Fix the indexing of the searchable

685f452

add a test on the current behaviour

9ecde41

stop updating the fields ids map when fields are only swapped

b0afe09

add a failing test on the attribute ranking rule

a0082c4

irevoire force-pushed the introduce-the-horrifying-fields-ids-fieldids-weights-map-to-avoid-reindexing-the-searchable-when-only-updating-their-order branch from 8e6ea3a to 31ed3a2 Compare May 14, 2024 15:00

irevoire added this to the v1.9.0 milestone May 14, 2024

irevoire marked this pull request as ready for review May 14, 2024 15:22

irevoire added 2 commits May 14, 2024 17:36

make the attribute ranking rule use the weights and fix the tests

caa6a71

make clippy happy

9fffb8e

irevoire force-pushed the introduce-the-horrifying-fields-ids-fieldids-weights-map-to-avoid-reindexing-the-searchable-when-only-updating-their-order branch from c30371b to 9fffb8e Compare May 14, 2024 15:36

irevoire requested a review from Kerollmops May 14, 2024 17:09

irevoire changed the title ~~Introduce the horrifying fields ids fieldids weights map to avoid reindexing the searchable when only updating their order~~ Split the field id map from the weight of each fields May 14, 2024

Kerollmops requested changes May 15, 2024

irevoire added 4 commits May 15, 2024 15:02

apply all style review comments

7ec4e2a

stops storing the whole fieldids weights map when no searchable are defined

ad4d850

get back to what we were doingb efore in the DB cache and with the restricted field id

5542f1d

rename method and variable around the attributes to search on feature

c78a2fa

irevoire requested a review from Kerollmops May 15, 2024 16:17

Kerollmops requested changes May 15, 2024

irevoire added the performance, search relevancy, settings diff-indexing, and indexing labels May 15, 2024

when no searchable attributes are defined, makes all the weight equals to zero

f2d0a59

irevoire requested a review from Kerollmops May 15, 2024 23:20

Kerollmops previously approved these changes May 16, 2024

fix a flaky test

673b6e1

irevoire dismissed Kerollmops’s stale review via 673b6e1 May 16, 2024 09:28

Kerollmops approved these changes May 16, 2024

meili-bors bot merged commit 7c19c07 into main May 16, 2024
10 checks passed

meili-bors bot deleted the introduce-the-horrifying-fields-ids-fieldids-weights-map-to-avoid-reindexing-the-searchable-when-only-updating-their-order branch May 16, 2024 10:27

irevoire mentioned this pull request May 16, 2024

[v1.9.0] Relevancy changes #4639

Closed

Kerollmops added a commit that referenced this pull request May 21, 2024

squash-me: FidMap no longer useful thanks to #4631

f43c2b8

Kerollmops added a commit that referenced this pull request May 21, 2024

squash-me: FidMap no longer useful thanks to #4631

c41e979

Kerollmops added a commit that referenced this pull request May 22, 2024

FieldIdsMap no longer useful thanks to #4631

bc5663e
