Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Order of elements in list[Link] is not preserved #930

Open
moonrail opened this issue May 6, 2024 · 2 comments
Open

[BUG] Order of elements in list[Link] is not preserved #930

moonrail opened this issue May 6, 2024 · 2 comments

Comments

@moonrail
Copy link

moonrail commented May 6, 2024

Describe the bug
As #414 and #527 were closed without any interaction by a stale bot (yay...) here another attempt.

When using DBRefs/Links, the ordering is stable when using MongoDB directly or any of its lower level libraries.

This is however not always the case with beanie.

Upon insert/update the order is preserved, as defined.

Upon link-fetching via e.g. fetch_links or fetch_all_links, it is not.

A possible solution can be found here: #414 (comment)
This just copies all Ids to a list with the original order and sorts the fetched documents later based on this order.

To Reproduce
Examples can be found in both older issues.

Expected behavior
Using fetch_all_links and fetch_links keeps the order of items as defined as DBRefs in the database.

Additional context

@jamesmurphy-mc
Copy link

jamesmurphy-mc commented May 31, 2024

Note that in addition to not preserving the order of elements, fetch_links may even change the length of the returned list by removing duplicate elements. In summary, fetch_links behaves like an $in query, not like a list fetch.

The offending code is here, in the LinkTypes.LIST branch of the construct_query function in beanie.odm.utils.find:

"$lookup": {
"from": link_info.document_class.get_motor_collection().name, # type: ignore
"localField": f"{link_info.lookup_field_name}.$id",
"foreignField": "_id",
"as": link_info.field_name,
}

I see this as more of a correctness issue than a convenience issue. As a mongodb datastructure, arrays can contain duplicates and their order is part of their semantics. Users will expect fetching an array to fetch elements in order, including duplicates. I believe there are ways to dereference an array of ids efficiently, but even if it turns out that this is a fundamentally slow operation, then I think the solution would be to warn users of the conditions under which the operation is slow rather than to break array semantics.

As a proof of concept, here is an example query that does an in-order list fetch that could be modified to replace the existing $lookup stage in construct_query (playground link: https://mongoplayground.net/p/t7TyJK1oOEI)

db.orders.aggregate([
  {
    $unwind: "$items"
  },
  {
    $lookup: {
      from: "inventory",
      localField: "items.$id",
      foreignField: "_id",
      as: "items"
    }
  },
  {
    $group: {
      _id: "$_id",
      items: {
        $push: {
          $first: "$items"
        }
      },
      __oldRoot: {
        $last: "$$ROOT"
      }
    }
  },
  {
    $replaceRoot: {
      newRoot: {
        $mergeObjects: [
          "$__oldRoot",
          {
            items: "$items"
          }
        ]
      }
    }
  }
])

The idea is to unwind the array, perform an efficient lookup on the id, group the results into an object containing the fetched array and the old document, then merge the old document with a new document containing the fetched items.

@jamesmurphy-mc
Copy link

@roman-right How do you feel about the solution proposed here? Have I missed anything or do you have other concerns that I could help address? Before submitting a PR I would want to have you on-board with the general plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants