Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query is 5 times slower when using findMany compared to running identical query via queryRaw #11130

Closed
peterrogov opened this issue Jan 12, 2022 · 14 comments
Labels
bug/2-confirmed Bug has been reproduced and confirmed. domain/client Issue in the "Client" domain: Prisma Client, Prisma Studio etc. kind/bug A reported bug. topic: performance/queries topic: performance topic: skip topic: take

Comments

@peterrogov
Copy link

peterrogov commented Jan 12, 2022

Bug description

I am experiencing a huge discrepancy in query execution time when using prisma 3.8.0 with postgres (local installation).
When I query a large table made of simple records and use pagination params (skip and take) this query takes a very long time to execute via prisma.<model>.findMany however when I use a similar query with prisma.queryRaw the execution time is much less. Normal prisma queries are dramatically slower.

The model I use is very simple and defined as follows

model StringIdModel {
    id    String @id @default(uuid())
    value String
}

Assuming the database table is filled with 1M random records and I run tthis code:

const queryLimit = 1000;
const queryOffset = 950000;
await prisma.stringIdModel.findMany({ skip: queryLimit, take: queryOffset });
await prisma.$queryRaw`SELECT * FROM public."StringIdModel" LIMIT ${queryLimit} OFFSET ${queryOffset}`;

This is the result I get

findMany took 10938.153129ms.
queryRaw took 84.126091ms.

As you can see the difference is 130 (!) times.
I have tried running this many times with different ID column types, different dataset sizes and I can't see any other reason why this is the case. The only thing I can think of is something going wrong inside prisma.

Here's a repo that reproduces the problem
https://github.com/peterrogov/prisma-performance-test

How to reproduce

Use the following repository that has code with reproduction https://github.com/peterrogov/prisma-performance-test

OR

  1. Create a simple model that only has ID field and maybe a single scalar field.
  2. Deploy to DB and fill with random data (I use like 1M records)
  3. Run query with prisma.<model>.findMany and specify skip and take params. Provide skip and take such that you're fetching from the end of the dataset. Measure query execution time.
  4. Make the same query but now use prisma.queryRaw and make query like SELECT * FROM ... OFFSET ... LIMIT .... Run the query and measure execution time
  5. See the huge difference in time.

Expected behavior

There shouldn't be any significant difference executing such simple queries directly via SQL or via prisma methods. The time it takes a query with prisma must be close to a direct raw SQL.

Prisma information

@prisma/client 3.8.0

Environment & setup

PostgreSQL 12.4
MacOS 11.6.2
NodeJS 17.3

Prisma Version

prisma                  : 3.8.0
@prisma/client          : 3.8.0
Current platform        : darwin
Query Engine (Node-API) : libquery-engine 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/libquery_engine-darwin.dylib.node)
Migration Engine        : migration-engine-cli 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/migration-engine-darwin)
Introspection Engine    : introspection-core 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/introspection-engine-darwin)
Format Binary           : prisma-fmt 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/prisma-fmt-darwin)
Default Engines Hash    : 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f
Studio                  : 0.452.0
@peterrogov peterrogov added the kind/bug A reported bug. label Jan 12, 2022
@janpio
Copy link
Member

janpio commented Jan 12, 2022

Can you share the log output of the query generated and executed by Prisma under the hood? (https://pris.ly/d/logging or via https://github.com/peterrogov/prisma-performance-test/blob/master/src/index.ts#L10)

@janpio janpio added domain/client Issue in the "Client" domain: Prisma Client, Prisma Studio etc. topic: performance bug/0-unknown Bug is new, does not have information for reproduction or reproduction could not be confirmed. labels Jan 12, 2022
@peterrogov
Copy link
Author

peterrogov commented Jan 12, 2022

Here's the query generated by Prisma

SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2

If I modify my test code to re-execute the generated query via queryRaw this is what I get in terms of execution time

...findMany took 5415.553473ms.                   <--- run this query and capture the generated query
...queryRaw took 918.921837ms.                    <--- the same query captured on the previous step
...queryRaw took 118.010476ms.                    <--- run my own query 

@janpio janpio added bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. and removed bug/0-unknown Bug is new, does not have information for reproduction or reproduction could not be confirmed. labels Jan 12, 2022
@peterrogov
Copy link
Author

peterrogov commented Jan 12, 2022

Apparently, the most of the execution time difference comes from this

ORDER BY "public"."StringIdModel"."id" ASC

Also, apparently, this bit is a must. Otherwise postgres will return N rows but random ones... To get exactly the same rows it has to order them first.

However, even after rewriting my custom query and including ORDER BY in it, there is still a significant time difference between findMany and queryRaw.

Also, I do not fully understand why sequential integer Id requires ORDER BY but that's probably a question to DBMS not to prisma.

@peterrogov
Copy link
Author

peterrogov commented Jan 12, 2022

Below are the time measurements for for three different models. One has String UUID ID, the other one is Int and the last one is BigInt.

--- String ID model ---
prisma.findMany
...findMany took 1045.720738ms.

prisma.queryRaw -> generated query
...queryRaw took 900.809099ms.

prisma.queryRaw -> custom query
...queryRaw took 770.333674ms.


--- Integer ID model ---
prisma.findMany
...findMany took 252.138898ms.

prisma.queryRaw -> generated query
...queryRaw took 221.481055ms.

prisma.queryRaw -> custom query
...queryRaw took 228.59065ms.


--- BigInt ID model ---
prisma.findMany
...findMany took 293.972744ms.

prisma.queryRaw -> generated query
...queryRaw took 204.726444ms.

prisma.queryRaw -> custom query
...queryRaw took 205.228213ms.

Custom query now includes ORDER BY Id ASC and all three queries to each table return exactly the same data. You can see that the custom query is still significantly faster. Especially when String UUID Id column is used.

@janpio
Copy link
Member

janpio commented Jan 12, 2022

But I see correctly that we are down from 130x performance impact in the initial post to <20% for using Prisma Client API vs. running the same generated query with the $queryRaw method, correct?

What is the difference left between the generated and the custom query?

@peterrogov
Copy link
Author

@janpio Yes, you are right the 130x difference was due to ORDER BY but that was my silly mistake. I can't see any other difference that would make sense apart from a more verbose syntax of Prisma queries. But since the final query is still quite short and simple I don't think that alone could create a ~20% difference in execution time.

Also, as you can see, when I take the same query (generated by Prisma) and execute it via prisma.queryRaw the time is better. I have run many tests and this observation is consistent.

It feels like there's still something happening in prisma that adds 10-20% overhead to query execution even for such a simplistic scenario. I didn't test on more complex queries but I probably should. And I wish to emphasize that this difference in execution time is worse for models that have a string ID column. Those where ID is Int or BigInt are very close in terms of execution time.

@janpio
Copy link
Member

janpio commented Jan 13, 2022

That is the difference between running Prisma the ORM, and using Prisma to execute a plain, predefined SQL statement. The additional work around typing, creating the query and so on is expected to add some overhead.

We still might want to investigate where this overhead comes from, and if we can improve it of course. But it will not have very high priority as we are currently still focused on building out missing functionality. For most users this overhead is totally acceptable (and if not, they can use the raw query as you described).

@peterrogov
Copy link
Author

@janpio I agree with you on this one. I was investigating another issue (#11138) when I stumbled upon this. Initially I failed to realise the huge difference between using and not using ORDER BY. That's where 130x gap came from.

@georgekrax

This comment was marked as outdated.

@janpio

This comment was marked as outdated.

@paulbrickellmaven
Copy link

Just came across this thread. There is an optimisation for order by and skip, to wrap the order query and do the skip after. Like this...

SELECT * FROM (
SELECT * FROM table
WHERE whatever = true
ORDER BY date_field DESC ) tmp LIMIT 100 OFFSET 9000

Is there any way to hint Prisma to do this, or do I have to go to raw SQL? Just asking out of curiosity. TIA.

@ArnaudBarre
Copy link

Note sure if this is an oversight or the root cause but in your example you inverted the skip and offset parameter for the findMany call

@aqrln
Copy link
Member

aqrln commented Aug 21, 2023

While in the original repro the difference was mainly caused by the different query that was missing an ORDER BY and thus produced different results (although on my machine the difference between them is in a ballpark of 45–55 times, not 130x), I can't unfortunately confirm that using the identical query reduces the measured overhead to less than 20%. On my machine, when I run the exact same query via queryRaw instead of findMany, I observe findMany to typically be 5–6 times slower (in one run it was just 2 times slower; overall these are quite hand-wavy stats without actually measuring the parameters of the distribution but the difference seems consistent):

$ ts-node index.ts
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
findMany: 2.775s
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
queryRaw: 550.888ms

The order in which these queries are run doesn't matter:

$ ts-node index.ts
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
queryRaw: 551.117ms
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
findMany: 2.747s

Seeding script:

import { PrismaClient } from '@prisma/client'

async function main() {
  const prisma = new PrismaClient()
  const total = 1_000_000;

  for (let i = 0; i < total; i++) {
    if (i % 1000 === 0) {
      console.log(`${Math.round((i / total) * 100)}%`)
    }
    await prisma.stringIdModel.create({
      data: {
        value: String(Math.random()),
      },
    })
  }
}

void main()

Reproduction script:

import { PrismaClient } from '@prisma/client'

async function main() {
  const prisma = new PrismaClient({
    log: ['query'],
  })

  const queryLimit = 1000
  const queryOffset = 950000

  console.time('findMany')
  await prisma.stringIdModel.findMany({ skip: queryLimit, take: queryOffset })
  console.timeEnd('findMany')

  console.time('queryRaw')
  await prisma.$queryRaw`SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT ${queryLimit} OFFSET ${queryOffset}`
  console.timeEnd('queryRaw')
}

void main()

The schema is based on the original report, posting it here for convenience as well:

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("POSTGRESQL_DATABASE_URL")
}

model StringIdModel {
  id    String @id @default(uuid())
  value String
}

I get approximately the same difference both with latest Prisma 5 and with reported Prisma 3.8.0.

Looking at the traces, one significant contributor for findMany is prisma:engine:serialize but for some reason there's a visible difference even in prisma:engine:db_query alone.

findMany:
Image

queryRaw:
Image

Marking this as "confirmed" for now so we can look into it in more details and see if it makes sense and whether that's expected or not.

@aqrln aqrln removed the bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. label Aug 21, 2023
@aqrln aqrln added the bug/2-confirmed Bug has been reproduced and confirmed. label Aug 21, 2023
@aqrln aqrln changed the title Serious performance issue when executing SELECT queries Query is 5 times slower when using findMany compared to running identical query via queryRaw Aug 21, 2023
@aqrln
Copy link
Member

aqrln commented Aug 21, 2023

@ArnaudBarre

Note sure if this is an oversight or the root cause but in your example you inverted the skip and offset parameter for the findMany call

Thanks so much for catching this! I totally overlooked both this and your comment, and copied the query as is from the original repro.

Fixing this indeed makes the difference to be in the much more expected range:

$ ts-node index.ts
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2 /* traceparent=00-4a24f1bbb09dc546edc6862e35d96ad1-ce64d33559e6dae6-01 */
findMany: 536.658ms
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
queryRaw: 427.648ms

Tracing shows that the actual overhead of findMany here was around 3 ms, and most of the difference was actually caused by the first query establishing the database connection (which became apparent with lower magnitude of difference), I should've added an explicit $connect on the second thought.

I'll go ahead and close the issue. Please let me know if something was not taken into account and it should be reopened, or open an issue if you are facing a significant overhead with Prisma for any query.

@aqrln aqrln closed this as completed Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug/2-confirmed Bug has been reproduced and confirmed. domain/client Issue in the "Client" domain: Prisma Client, Prisma Studio etc. kind/bug A reported bug. topic: performance/queries topic: performance topic: skip topic: take
Projects
None yet
Development

No branches or pull requests

8 participants