Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

searchableAttributes change order of fields in response hits #1495

Open
bidoubiwa opened this issue Jul 7, 2021 · 24 comments
Open

searchableAttributes change order of fields in response hits #1495

bidoubiwa opened this issue Jul 7, 2021 · 24 comments
Labels
bug Something isn't working as expected contribution accepted This issue accepts external contributions help wanted Extra attention is needed
Projects

Comments

@bidoubiwa
Copy link
Contributor

bidoubiwa commented Jul 7, 2021

Describe the bug

When adding fields in searchableAttributes the order of fields in the returned document are impacted:

{
  "searchableAttributes": [
    "title",
    "description"
  ],
  "filterableAttributes": [
    "genres"
  ]
}

id field of document goes from first position to second position.
Given this document:

{
  "id": 28,
  "title": "Apocalypse Now",
  "overview": "At the height of the Vietnam war, Captain Benjamin Willard is sent on a dangerous mission that, officially, \"does not exist, nor will it ever exist.\" His goal is to locate - and eliminate - a mysterious Green Beret Colonel named Walter Kurtz, who has been leading his personal army on illegal guerrilla missions into enemy territory.",
  "genres": [
    "Drama",
    "War"
  ],
  "poster": "https://image.tmdb.org/t/p/w500/gQB8Y5RCMkv2zwzFHbUJX3kAhvA.jpg",
  "release_date": 303523200
}

after a search this is how it is returned in the response body:

        {
            "title": "Apocalypse Now",
            "id": 28,
            "overview": "At the height of the Vietnam war, Captain Benjamin Willard is sent on a dangerous mission that, officially, \"does not exist, nor will it ever exist.\" His goal is to locate - and eliminate - a mysterious Green Beret Colonel named Walter Kurtz, who has been leading his personal army on illegal guerrilla missions into enemy territory.",
            "genres": [
                "Drama",
                "War"
            ],
            "poster": "https://image.tmdb.org/t/p/w500/gQB8Y5RCMkv2zwzFHbUJX3kAhvA.jpg",
            "release_date": 303523200
        },

Expected behavior
Same order

MeiliSearch Version: 0.21.0

@bidoubiwa
Copy link
Contributor Author

To reproduce, you should add this as setting:

{
  "searchableAttributes": [
    "title",
    "description"
  ],
  "filterableAttributes": [
    "genres"
  ]
}

@bidoubiwa bidoubiwa changed the title Order of fields in documents after search are not the same as at indexation time SearchableAttributes change order of fields in response hits Jul 8, 2021
@gmourier
Copy link
Member

gmourier commented Jul 8, 2021

Hello!

@MarinPostma (Please, tell me if it's not related to the transplant code), can we keep the order of the fields of the initial document in the response to be ISO with 0.20?

searchableAttributes should not impact this, it would be nice to keep this initial logic in my opinion!

@MarinPostma
Copy link
Contributor

Not possible, this is tied to the representation on the documents in milli @Kerollmops

@Kerollmops
Copy link
Member

Indeed, it can't be returned in the original order, it can only be returned in the searchableAttribute order, it is tied to the internal representation of the documents.

@gmourier
Copy link
Member

gmourier commented Jul 8, 2021

Thanks!

If I understand correctly the user must rewrite all these initially ordered fields in searchableAttributes if he wants to keep the original order.

In this case, he will not search on specific fields and the searchableAttributes feature does not fulfill its initial role.

Should we explain this somewhere to guide users and make it clear that the order can no longer be guaranteed from the moment searchableAttributes operates @dichotommy? Something nice with that is that searched attributes are put at the top of the returned documents but my guess is that we will have some questions about that.

@MarinPostma
Copy link
Contributor

TBH it should not matter. JSON are manipulated programatically, and the order is irrelevant. We make assumptions on the order of the fields at indexing, but the JSON spec specifies that a json object is unordered. Implementation of json deserializer may not preserve order.

@gmourier
Copy link
Member

gmourier commented Jul 8, 2021

Thanks for the clarification, @MarinPostma 👍. Indeed, this is something that should be clarified in this way. I think of the user who might wonder why and not being able to find an answer.

@qdequele
Copy link
Member

qdequele commented Jul 8, 2021

@MarinPostma however we use a linked hashmap to keep the order when indexing. As long as the searchableAttribute or displayedAttributes parameters are not changed the order will remain the same as the first update. @meilisearch/core-team do you know on what is the return order based on? Why does changing searchableAttributes impact the final result?

@MarinPostma
Copy link
Contributor

@qdequele the fieldId doubles as field position, that's why

@dichotommy
Copy link
Contributor

Does displayedAttributes have the same effect on the order of the results?

@MarinPostma
Copy link
Contributor

@dichotommy nope

@qdequele qdequele added this to Unexpected Changes in Fixed issues with v0.21.0 Jul 20, 2021
@curquiza curquiza added the bug Something isn't working as expected label Jul 28, 2021
@curquiza
Copy link
Member

curquiza commented Jul 28, 2021

Even if the order is relevant in JSON, this behavior can impact the front-end. Cf the mini-dashboard: we clearly see the switch of the fields.
Change the front-end is not a behavior the users can expect when they update the searchableAttributes.
But this is indeed not a major problem at the moment and will not impact all the front-end applications. Let's wait for the users feedbacks.

@bidoubiwa
Copy link
Contributor Author

@curquiza Is this not fixed with this meilisearch/milli/pull/293 ?

@curquiza
Copy link
Member

curquiza commented Jul 28, 2021

Nope, it's not in the v0.21.0 milestones. You can check it directly on the issue

I was not there when the decision of removing this from the v0.21.0 was made, but, by reading this issue, it does not seem to be an easy fix since we don't know how to do it.

@curquiza curquiza removed this from Unexpected Changes in Fixed issues with v0.21.0 Jul 28, 2021
@curquiza curquiza reopened this Jul 29, 2021
@curquiza curquiza changed the title SearchableAttributes change order of fields in response hits searchableAttributes change order of fields in response hits Aug 10, 2021
@curquiza curquiza added this to Candidate in Bug triage via automation Aug 10, 2021
@curquiza curquiza moved this from Candidate to Bugs - severity 3 in Bug triage Aug 10, 2021
@curquiza curquiza added the help wanted Extra attention is needed label Jun 16, 2022
@curquiza curquiza added the contribution accepted This issue accepts external contributions label Aug 31, 2022
@mou
Copy link
Contributor

mou commented Oct 16, 2022

I created a draft PR not only to declare my intention to fix this issue but also to provide a reproduction case to anyone who decided to make their own attempt.

@mou
Copy link
Contributor

mou commented Oct 23, 2022

Oh, This is a tricky one. I spent quite some time locating the cause and still could not figure out the best approach to fix it. So let me explain the logic that leads to undesirable output first:

  • order of field names in searchableAttributes determines the order of fields in FieldsIdsMap
  • when we set searchableAttributes, it triggers reindex
  • new representation of the document is created (I do not know why, but I believe it's used in relevancy calculation somehow), and a new document with new fields order is saved in place of the old one
  • at this point, we lost the original order of fields
  • later, the order of fields in resulted JSON representation is determined by the order of fields in the stored document

The whole week I thought about this issue, and I think, in the current implementation, this could not be definitely identified as a bug. Because there are other cases when the order of fields in the original document is not preserved. For example, if we add the first document with fields ["first", "second"] and later add another document with fields ["second", "first']. After that, the order of fields in the second document will always be ["first", "second"] (until it changed via searchableAttributes or displayableAttributes). This is because the order of fields is determined by order of the first appearance in the index.

I think this issue can not be solved without the introduction of additional metadata, which will preserve field order disregarding all other settings. But only if you decide this is a bug, not a feature.

@mou
Copy link
Contributor

mou commented Oct 23, 2022

@curquiza please, take a look at the previous comment. I believe this issue requires an additional round of triage.

@mou
Copy link
Contributor

mou commented Oct 23, 2022

Reordering of fields occurs in milli/src/update/index_documents/transform.rs:545 remap_index_documents, which is called from milli/src/update/settings.rs:269 reindex.

@mou
Copy link
Contributor

mou commented Oct 23, 2022

Here are the reproduction steps for the scenario I mentioned:

curl -X POST -H "Content-Type: application/json" --data-binary '{"uid": "field_order_test", "primaryKey": "id"}' http://localhost:7700/indexes
curl -X POST -H "Content-Type: application/json" --data-binary '[{"id": 1, "first": "foo", "second": "bar"}]' http://localhost:7700/indexes/field_order_test/documents
curl -X POST -H "Content-Type: application/json" --data-binary '[{"id": 2, "second": "foo", "first": "bar"}]' http://localhost:7700/indexes/field_order_test/documents
curl -s -X GET http://localhost:7700/indexes/field_order_test/documents | jq
{
  "results": [
    {
      "id": 1,
      "first": "foo",
      "second": "bar"
    },
    {
      "id": 2,
      "first": "bar",
      "second": "foo"
    }
  ],
  "offset": 0,
  "limit": 20,
  "total": 2
}

@Kerollmops
Copy link
Member

Thank you very much for your investigation @mou,

We have been aware of the issue for a long time and would like to take the time to fix that. Unfortunately, this is a quite complex feature to implement, we will prioritize that in the future. This is related to the internal way we store the documents, the current system is efficient in storing the documents without taking up too much space, but it can be improved a lot by compressing the documents, for example. The new design of this system should try to compress the documents and also keep the order of the fields.

@mou
Copy link
Contributor

mou commented Nov 18, 2022

@Kerollmops, so, if you have already decided to reimplement document storage with new requirements, then I do not see any reason for trying to fix this issue in the current architecture. Or I did not understand you well? Also, feel free to discard my draft PR with the test case.

@Kerollmops
Copy link
Member

Kerollmops commented Nov 23, 2022

Indeed, we want to do a bunch of optimization on the way we store our documents internally to consume less space, and we will evaluate the amount of work required to keep the original order of the fields at this moment.

Thank you very much for your PR, but we will not keep it as you correctly pointed out that we will refactor many of the code parts related to this behavior, and your PR will likely be outdated by then ❤️

@mou
Copy link
Contributor

mou commented Nov 23, 2022

Ok. I will close it then.

@curquiza
Copy link
Member

curquiza commented Jan 9, 2023

Discussed with @Kerollmops: could be done once the Meilisearch DB storage will be re-worked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected contribution accepted This issue accepts external contributions help wanted Extra attention is needed
Projects
No open projects
Bug triage
Bugs - severity 3
Development

Successfully merging a pull request may close this issue.

8 participants