Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable filtering on populate parameter in em.find() #567

Closed
followben opened this issue May 12, 2020 · 4 comments
Closed

Enable filtering on populate parameter in em.find() #567

followben opened this issue May 12, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@followben
Copy link
Contributor

followben commented May 12, 2020

Is your feature request related to a problem? Please describe.
We have Fee and Discount entities with a nullable (and inversed) 1:many relation of fee.discounts. Each of these also has a nullable many:1 relation with Organisation. I need to select only fees and associated discounts that (among other things) match a particular organisation or none at all, i.e. excluding all other organisations.

As it stands I can construct a "match or null" where clause on em.getRespository(Fee).find() to retrieve the fee entities, but I can only populate all discounts for a fee or none at all.

Describe the solution you'd like
I'd like to be able to filter the relation populated like I can filter the primary entity.

Describe alternatives you've considered
We've encountered this requirement numerous times while using mikro-orm. As a workaround we currently either:

  1. populate all entities in the relation and filter in the application code, or
  2. execute multiple queries, or
  3. use QueryBuilder to select multiple rows per primary entity (each with the relation matching a where on the join) and map result columns to entities manually

1 & 2 are suboptimal performance-wise. 3 is better for the db but feels unwieldy and unnecessary - constructing a (psuedo-)sql query and deserialising results is... why we have an ORM :)

Additional context
LLBL achieves this via the PrefetchPath to which you can attach a predicate, sort etc.

Sequelize enables the same via an include statement during eager loading. IMHO, a similar syntax wouldn't look out of place here.

Looks like work is well underway on LoadStrategy at #556. Could this be achieved via the subquery option suggested in #440?

@followben followben added the enhancement New feature or request label May 12, 2020
@B4nan
Copy link
Member

B4nan commented May 12, 2020

I'd like to be able to filter the relation populated like I can filter the primary entity.

As discussed on slack, this is already possible, the partial condition that apply to child/populated entity will be propagated to the query that loads it.

const res = await orm.em.find(Author2, { books: { title: { $in: ['b1', 'b2'] } } }, ['books']);
// [query] select "e0".* from "author2" as "e0" left join "book2" as "e1" on "e0"."id" = "e1"."author_id" where "e1"."title" in ($1, $2) [took 6 ms]
// [query] select "e0".*, "e0".price * 1.19 as "price_taxed" from "book2" as "e0" where "e0"."author_id" in ($1) and "e0"."title" in ($2, $3) order by "e0"."title" asc [took 5 ms]

1 & 2 are suboptimal performance-wise. 3 is better for the db but feels unwieldy and unnecessary - constructing a (psuedo-)sql query and deserialising results is... why we have an ORM :)

First of all, the approach with multiple queries can be actually faster when the DB is really large. Having too complex queries can result in real slowness, I've been there. I don't know why everybody thinks that more simple queries are always slower, that is simply not true, and its not true especially for large databases (and we are talking about those anyway, right?).

Why do you think we need subqueries? How is that different from joined query? (from my experience, subqueries are usually slower)

Sequelize enables the same via an include statement during eager loading. IMHO, a similar syntax wouldn't look out of place here.

I said it somewhere already, I can't understand why we should have so complex api, why not use a query builder for that? QB does work with entities, and support mapping results to entities. Also supports auto-joining, so the API is really similar to the one from EM. And with #556 it will also support mapping of multiple (nested) entities from single (joined) query.

@followben
Copy link
Contributor Author

followben commented May 13, 2020

this is already possible the partial condition that apply to child/populated entity will be propagated to the query that loads it

That's great news. Alas, I'm struggling to get this working - I've created a test case to demonstrate. Be great if you could set me straight on what we're doing wrong :)

I don't know why everybody thinks that more simple queries are always slower, that is simply not true

You're right of course - I meant to say "1 and 2 are potentially suboptimal". OTOH, complex single queries can be faster the multiples if they're correctly designed, planned and optimised. If em.find() supports both then we have the option to choose.

Why do you think we need subqueries?

I don't. You're right: joins can be (although aren't always) better optimised. I was just asking if that's how we'd achieve populate filtering using the forthcoming changes on #556, given that's what Will had mentioned originally at #440.

why not use a query builder... it will also support mapping of multiple (nested) entities from single (joined) query

Because it doesn't support nested mapping yet :) Once it does, we can certainly use QB to achieve this.

I'm arguing filter on populate is sufficiently useful it should be incorporated in higher-level find methods. If it already is and we're just using it wrong... great! If not, why not add it?

QB is essentially SQL re-written in a DSL: powerful and great to have at hand, but verbose and somewhat cumbersome for a useful feature that could have a simpler api.

@B4nan
Copy link
Member

B4nan commented May 13, 2020

Your problem with first query is the { foos: null } in the query condition, this is not yet supported (not even sure if we will support it like this, I was thinking about adding new operators for this use case, $empty and $size). Here is an issue about it #548.

You can do this:

const result1 = await orm.em.find(Bar, { $or: [{ foos: { prop: 'a' } }, { foos: { id: null } }] }, ['foos']);

which does produce this query:

select "e0".* from "bar" as "e0" left join "foo" as "e1" on "e0"."id" = "e1"."bar_id" where ("e1"."prop" = 'a' or "e1"."id" is null)

Still not what you want, as it will return only one entity. In fact there are relations set in both your entities.

So the question is more like - what SQL are you expecting to solve this?

In the comment you have:

Goal: find all bars but only populate the foo relation if foo.prop is 'a', i.e.

Well but this is not what you are doing in the query above. You are requesting bars that either have a relation with foo.prop = a, or there is no relation at all. The condition is first applied to the root entity, so you can't get all bars this way.

If you want to have all bars but populate only something, correct solution (and I am pretty sure that also the fastest when it comes to big data) is to do one query to load the roots and another to populate:

const result1 = await orm.em.find(Bar, {});
await orm.em.populate(result1, ['foos'], { foos: { prop: 'a' } } as any); // there are some issues in v4 typings of the populate method, so had to cast as any otherwise the type of result1 was not used as T
expect(result1).toHaveLength(2);
expect(flatMap(result1, b => b.foos.getItems())).toHaveLength(1);

Doing it otherwise just seems overcomplicated. Also your links to other solutions are about something different I think, I don't think they can do this in single query (and let me say it one more time, the solution in typeorm looks terrible, I would never let this pass my code review :trollface:, if we want to optimise for performance, its much easier to use raw sql instead).

Let's put it this way - if you want to do it in single query, how do you think the query should look like?

If you are fine with multiple queries, then my approach with em.populate() feels much more readable.


Btw I don't understand what are you expecting from having 1:m nullable. Its an inverse side, collection, it can't be nullable, there is nothing to be nulled (you can have it there, but its a no-op, nullable have a meaning only on owning sides).

QB is essentially SQL re-written in a DSL: powerful and great to have at hand, but verbose and somewhat cumbersome for a useful feature that could have a simpler api.

QB in MikroORM is much more powerful, it handles auto-joins, it is aware of entity metadata. I would argue about it being verbose, it can be used almost the same way as with the EM API. You can use the very same query condition here and there, no need to manually join things. And what you are proposing here would basically make the EM api more close to the QB in terms of complexity.

@B4nan
Copy link
Member

B4nan commented Jun 16, 2020

Closing due to inactivity. As said above, this is already possible, conditions are propagated to the populate queries, and in v4 it is possible to use the joined strategy too. I don't see a value in having another strategy, introducing the joined one ended up being the second most complex thing in the whole ORM development (right after composite keys).

@B4nan B4nan closed this as completed Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants