Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several SQL queries for each related item instead of one aggregated query #6035

Closed
MurzNN opened this issue Jun 30, 2021 · 11 comments
Closed

Comments

@MurzNN
Copy link
Contributor

MurzNN commented Jun 30, 2021

Bug report

Describe the bug

When executing very simple GraphQL query with querying data from one relationship field, Keystone executes separate query for each relation, instead of one aggregated query with list of needed ids

To Reproduce

  1. Launch the "examples/basic" demo, enable database logging via enableLogging: true.
  2. Fill 2 authors and 4 posts to database (some linked with author, some - not).
  3. Execute this GraphQL query:
query {
  allPosts {
    id, author {id, name}
  }
}
  1. Lookup to the output, you will see several SQL queries:
prisma:query SELECT `main`.`Post`.`id`, `main`.`Post`.`title`, `main`.`Post`.`status`, `main`.`Post`.`content`, `main`.`Post`.`publishDate`, `main`.`Post`.`author` FROM `main`.`Post` WHERE 1=1 LIMIT ? OFFSET ?
prisma:query SELECT 1
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?

Problems:

  1. Why each author queried via separate SQL query? And even when author is empty!
  2. Why all fields are queried from "Post" table, when we need only id and author for query results?
  3. Why author email is queried using inner query and even inner join inside it?

Those problems give large performance degradation on databases with many items!

Expected behavior

  1. To optimize performance, Keystone must collect id of all relationships, and query them via one SQL query with list of need ids.
  2. We should query only requested fields, not all available fields from table.
  3. There must be no inner joins and subqueries for such simple query.

System information

Keystone version: 21.0.2

@MurzNN MurzNN changed the title Searate SQL queries for each relation instead of one aggregated Several SQL queries for each relation instead of one aggregated query Jun 30, 2021
@MurzNN MurzNN changed the title Several SQL queries for each relation instead of one aggregated query Several SQL queries for each related item instead of one aggregated query Jun 30, 2021
@molomby
Copy link
Member

molomby commented Sep 17, 2021

I believe this is fixed in the current builds.

Not sure where the "basic" example ended up; I tested against the examples/blog project in @keystone-next/keystone v25.0.1 (which uses @prisma/client v2.30.3), running on SQLite and again on Postgres.

Running a GraphQL query like yours:

query {
  authors {
    id
    name
    posts { id title }
  }
}

Now logs:

prisma:query SELECT 1
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE 1=1 LIMIT ? OFFSET ?
prisma:query SELECT 1
prisma:query SELECT `main`.`Post`.`id`, `main`.`Post`.`title`, `main`.`Post`.`status`, `main`.`Post`.`content`, `main`.`Post`.`publishDate`, `main`.`Post`.`author` FROM `main`.`Post` WHERE (`main`.`Post`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Post` AS `t0` INNER JOIN `main`.`Author` AS `j0` ON (`j0`.`id`) = (`t0`.`author`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?

If you format that last query you'll see it's selecting multiple posts:

SELECT `main`.`Post`.`id`, `main`.`Post`.`title`, `main`.`Post`.`status`, `main`.`Post`.`content`, `main`.`Post`.`publishDate`, `main`.`Post`.`author`
FROM `main`.`Post`
WHERE (`main`.`Post`.`id`) IN (
	SELECT `t0`.`id`
	FROM `main`.`Post` AS `t0`
	INNER JOIN `main`.`Author` AS `j0` ON (`j0`.`id`) = (`t0`.`author`)
	WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)
)
LIMIT ?
OFFSET ?

So one query for the author then another for the posts (I'd created 3). It's not the SQL I'd write by hand (that WHERE .. IN seems pretty redundant), but should run ok.

Let me know if you're still seeing an issue.

@molomby molomby closed this as completed Sep 17, 2021
@MurzNN
Copy link
Contributor Author

MurzNN commented Sep 21, 2021

Yes, this looks good when there is only one item in relation, but if there are many items, the problem is still here with current fresh version (commit id 7a7f3f6). How to reproduce with examples/blog:

  1. Create 2 authors.
  2. Create 6 posts with relation to random author.
  3. Enable SQL logging.
  4. Launch the query:
query {
  posts {
    id, title
    author { id, name }
  }
}

and see the several queries in result:

prisma:query SELECT 1
prisma:query SELECT `main`.`Post`.`id`, `main`.`Post`.`title`, `main`.`Post`.`status`, `main`.`Post`.`content`, `main`.`Post`.`publishDate`, `main`.`Post`.`author` FROM `main`.`Post` WHERE 1=1 LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?
prisma:query SELECT `main`.`Author`.`id`, `main`.`Author`.`name`, `main`.`Author`.`email` FROM `main`.`Author` WHERE (`main`.`Author`.`id`) IN (SELECT `t0`.`id` FROM `main`.`Author` AS `t0` INNER JOIN `main`.`Post` AS `j0` ON (`j0`.`author`) = (`t0`.`id`) WHERE (`j0`.`id` = ? AND `t0`.`id` IS NOT NULL)) LIMIT ? OFFSET ?

There are 6 separate queries for each related author, even having 2 authors in database!

Ideally here must be two queries (first - grab all posts and gather relation ids, second - grab authors, mentioned in posts relations), or at least three queries (all posts, and one query per each unique author id).

@MurzNN
Copy link
Contributor Author

MurzNN commented Sep 21, 2021

The source of this issue is the "N+1" problem, that is unsolved yet in Keystone's GraphQL implementation. So maybe reopen this issue to keep this problem active?

And this happens in many GraphQL based API, here are some articles with more detailed description and possible solutions for it:

@molomby
Copy link
Member

molomby commented Sep 23, 2021

Ah, you're right, my test case was flawed.

I've reproduced this now but still surprised this is occurring. We lean heavily on Prisma for query building and it uses a data loaders to avoid N+1 query behaviour. I guess Keystone must be doing something that inadvertently prevents that approach from working.

Thanks @MurzNN, I'll put this on the list to investigate.

@molomby molomby reopened this Sep 23, 2021
@molomby
Copy link
Member

molomby commented Sep 23, 2021

Ok, I've talked this though with @timleslie (apparently you and him have been chatting recently too). My current understanding is that, yes, Prisma does proactively apply a data loader pattern to avoid N+1 problems but only does so when it's really obvious that it can (only findUnique calls using the the same parameters). The access control that Keystone applies unintentionally prevents this from occurring – we call findFirst instead of findUnique so we can supply the additional access filters.

There may be some low hanging fruit here whereby, maybe we can only call findFirst when access filters are actual in effect. This would allow some lists to take advantage of the automatic data loading Prisma provides. It still only solves for querying via many-to-one relationships; coming the direction (one-to-many) will trigger N+1 queries in any case. Otherwise, the longer term solution is to add a data loader layer in our own code and be smarter about how queries are constructed.

Regardless, you shouldn't be relying on Keystone to always build optimal queries. If you have complex queries that are used regularly in your main frontend, you're probably better moving to custom mutations that do exactly what's needed. The CRUD schema Keystone provides is intended to support the Admin UI and as a starting point for general app dev.

@molomby
Copy link
Member

molomby commented Sep 23, 2021

Partial fix underway in #6639 🥳

@molomby
Copy link
Member

molomby commented Sep 27, 2021

Hey @MurzNN, this should now be fixed by #6639. Would love your confirmation.

Note the fix only helps when you're resolving a relationship "many-to-one", as in your example. Ie. when the query builder has the foreign key value and is looking that up in the parent table. There are still cases where N+1 behaviour will be observed, eg. other relationship types, the other query "direction" or if filter-based access control is being applied. Neither Prisma or Keystone is setup to optimally resolve these types of query. Regardless, I believe this specific issue is can be closed.

@molomby molomby closed this as completed Sep 27, 2021
@rburgst
Copy link
Contributor

rburgst commented Oct 15, 2022

Hi, I am running into this issue when using filter-based access control. So I do have n+1 queries. I am wondering whether there is a better way of handling this. E.g. maybe do the filtering on the loaded elements (post query).

Right now I do have a similar filter access query

access: {
  filter: {
     query: (args) => {
        return { 
            tenant: { id: { equals: args.session.tenantId } }
        }
     }
  }
}

This leads to prisma.findMany() in getRelationVal which cannot be 'data-loaded' and therefore leads to an n+1 query.

It would be cool to be able to add a hook that could perform the filtering on the relation result set (e.g. load all the foreign related objects, and then filter out the ones that wont match).

I am guessing that keystone is waiting for prisma to pick up the issue (see prisma/prisma#1477) however, it does not appear that this will happen anytime soon.

@rburgst
Copy link
Contributor

rburgst commented Oct 15, 2022

Hmm, maybe I am in luck, this look like it would do the trick: #8000

@MurzNN
Copy link
Contributor Author

MurzNN commented Oct 16, 2022

The PR number says that you're in luck! 😉🤣

@mariomnts
Copy link

I'm seeing this behaviour when doing simple queries again, not sure if it's a regression or an issue that never got solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants