Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested fields #2298

Merged
merged 1 commit into from Apr 11, 2022
Merged

Nested fields #2298

merged 1 commit into from Apr 11, 2022

Conversation

irevoire
Copy link
Member

@irevoire irevoire commented Apr 7, 2022

There are a few things that I want to fix AFTER merging this PR.
For the following RCs.

Stop the useless conversion

In the search.rs I convert a Document to a Value, and then the Value to a Document and then back to a Value etc. I should stop doing all these conversion and stick to one format.
Probably by merging my permissive-json-pointer crate into meilisearch.
That would also give me the opportunity to work directly with obkvs and stops deserializing fields I don't need.

Add more test specific to the nested

Everything seems to works but I should write tests to double check that the nested works well with the formatted field.

See how I could stop iterating on hashmap and instead fill them correctly

This is related to milli. I really often needs to iterate over hashmap to see if a field is a subset of another field. I could probably generate a structure containing all the possible key values.
ie. the user say doggo is an attribute to retrieve. Instead of iterating on all the attributes to retrieve to check if doggo.name is a subset of doggo. I should insert doggo.name in the attributes to retrieve map.

@irevoire irevoire added the breaking change The related changes are breaking for the users label Apr 7, 2022
@curquiza curquiza added this to the v0.27.0 milestone Apr 7, 2022
meilisearch-lib/src/index/search.rs Outdated Show resolved Hide resolved
meilisearch-lib/src/index/search.rs Outdated Show resolved Hide resolved
meilisearch-lib/src/index/search.rs Outdated Show resolved Hide resolved
Comment on lines +468 to +504
// then we need to convert the `serde_json::Map` into an `IndexMap`.
let document = document.into_iter().collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering, is there a risk that the fields in the Map are reordered, and returned in a different order than the original document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was concerned about that too. From what I see it works but it's not specified it should so I guess I'm just lucky.
I would like to move the permissive-json-pointer crate into meilisearch so we could make it work directly with the IndexMap.
Do you think this needs to be fixed before the RC0 or can it wait until the RC1?

@gmourier @curquiza
Basically, the jsons in hits could have their fields in a different order each time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, when the preserve_order feature from serde_json is enables, a serde_json::Map is nothing but an IndexMap under the hood: https://docs.serde.rs/src/serde_json/map.rs.html#23-26

and it turns out that we're lucky: https://github.com/meilisearch/meilisearch/blob/main/meilisearch-lib/Cargo.toml#L45

This now begs the question: Do we need to do the back and forth conversion?

Copy link
Member Author

@irevoire irevoire Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! No we don't, and I guess we could just remove entirely the IndexMap and define a Document as a serde_json::Map<String, Value>!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can indeed 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, the jsons in hits could have their fields in a different order each time.

Ok to wait for rc1 to fix this!

meilisearch-lib/src/index/search.rs Show resolved Hide resolved
meilisearch-lib/src/index/search.rs Outdated Show resolved Hide resolved
meilisearch-lib/src/index/search.rs Show resolved Hide resolved
@irevoire irevoire force-pushed the nested_fields branch 2 times, most recently from 7e45f8a to 68da743 Compare April 7, 2022 15:03
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We can now wait for the users to try to break this feature! I am also awaiting the benchmarks too.

meilisearch-auth/Cargo.toml Outdated Show resolved Hide resolved
meilisearch-lib/Cargo.toml Outdated Show resolved Hide resolved
meilisearch-lib/src/index/search.rs Show resolved Hide resolved
meilisearch-lib/src/index/search.rs Show resolved Hide resolved
meilisearch-lib/src/index/search.rs Show resolved Hide resolved
Kerollmops
Kerollmops previously approved these changes Apr 7, 2022
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Are we waiting for meilisearch/milli#458 to be merged and released?

Kerollmops
Kerollmops previously approved these changes Apr 7, 2022
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clippy fix!

@irevoire
Copy link
Member Author

irevoire commented Apr 7, 2022

Yep we need to wait for the release because we need to publish the new Cargo.lock for the CI to run

Kerollmops
Kerollmops previously approved these changes Apr 7, 2022
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you @irevoire!

@irevoire
Copy link
Member Author

irevoire commented Apr 7, 2022

bors merge

bors bot added a commit that referenced this pull request Apr 7, 2022
2298: Nested fields r=irevoire a=irevoire

There are a few things that I want to fix _AFTER_ merging this PR.
For the following RCs.

## Stop the useless conversion
In the `search.rs` I convert a `Document` to a `Value`, and then the `Value` to a `Document` and then back to a `Value` etc. I should stop doing all these conversion and stick to one format.
Probably by merging my `permissive-json-pointer` crate into meilisearch.
That would also give me the opportunity to work directly with obkvs and stops deserializing fields I don't need.

## Add more test specific to the nested
Everything seems to works but I should write tests to double check that the nested works well with the `formatted` field.

## See how I could stop iterating on hashmap and instead fill them correctly
This is related to milli. I really often needs to iterate over hashmap to see if a field is a subset of another field. I could probably generate a structure containing all the possible key values.
ie. the user say `doggo` is an attribute to retrieve. Instead of iterating on all the attributes to retrieve to check if `doggo.name` is a subset of `doggo`. I should insert `doggo.name` in the attributes to retrieve map.

Co-authored-by: Tamo <tamo@meilisearch.com>
@bors
Copy link
Contributor

bors bot commented Apr 7, 2022

Build failed:

@Kerollmops
Copy link
Member

Is it a bug in Clippy?

@irevoire
Copy link
Member Author

irevoire commented Apr 7, 2022

yeah look like, it says it's an internal compiler error 😒

Also the benchmarks just finished running and it's worse than I thought 😩

group                                                             indexing_main_4ae7aea3                 indexing_nested_fields_0438ab7d
-----                                                             ----------------------                 -------------------------------
indexing/Indexing geo_point                                       1.00       8.7±0.19s        ? ?/sec    2.81      24.5±0.17s        ? ?/sec
indexing/Indexing movies in three batches                         1.00      18.2±0.26s        ? ?/sec    1.03      18.7±0.27s        ? ?/sec
indexing/Indexing movies with default settings                    1.00      17.6±0.11s        ? ?/sec    1.01      17.9±0.08s        ? ?/sec
indexing/Indexing songs in three batches with default settings    1.00      63.4±0.41s        ? ?/sec    1.08      68.3±0.93s        ? ?/sec
indexing/Indexing songs with default settings                     1.00      53.2±0.95s        ? ?/sec    1.21      64.2±0.93s        ? ?/sec
indexing/Indexing songs without any facets                        1.00      49.2±0.96s        ? ?/sec    1.20      59.2±1.22s        ? ?/sec
indexing/Indexing songs without faceted numbers                   1.00      52.0±0.23s        ? ?/sec    1.21      62.8±0.28s        ? ?/sec
indexing/Indexing wiki                                            1.00   1007.1±10.25s        ? ?/sec    1.01   1014.7±10.74s        ? ?/sec
indexing/Indexing wiki in three batches                           1.00    1136.9±7.83s        ? ?/sec    1.01    1152.7±9.64s        ? ?/sec

I don't really understand how the geosearch can be so slow

@irevoire
Copy link
Member Author

bors merge

@bors
Copy link
Contributor

bors bot commented Apr 11, 2022

@bors bors bot merged commit 31584f3 into main Apr 11, 2022
@bors bors bot deleted the nested_fields branch April 11, 2022 12:01
@curquiza curquiza mentioned this pull request Apr 11, 2022
3 tasks
bors bot added a commit that referenced this pull request Apr 12, 2022
2313: fix(search): remove the back and forth between the IndexMap and the serde_json::Map r=irevoire a=irevoire

This is ok because we're using the preserve_order feature in serde_json which is already internally using an IndexMap.

See #2298 (comment)


Co-authored-by: Tamo <tamo@meilisearch.com>
@curquiza curquiza added the v0.27.0 PRs/issues solved in v0.27.0 label Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change The related changes are breaking for the users v0.27.0 PRs/issues solved in v0.27.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants