Query is 5 times slower when using `findMany` compared to running identical query via `queryRaw` #11130

peterrogov · 2022-01-12T19:54:58Z

Bug description

I am experiencing a huge discrepancy in query execution time when using prisma 3.8.0 with postgres (local installation).
When I query a large table made of simple records and use pagination params (skip and take) this query takes a very long time to execute via prisma.<model>.findMany however when I use a similar query with prisma.queryRaw the execution time is much less. Normal prisma queries are dramatically slower.

The model I use is very simple and defined as follows

model StringIdModel {
    id    String @id @default(uuid())
    value String
}

Assuming the database table is filled with 1M random records and I run tthis code:

const queryLimit = 1000;
const queryOffset = 950000;
await prisma.stringIdModel.findMany({ skip: queryLimit, take: queryOffset });
await prisma.$queryRaw`SELECT * FROM public."StringIdModel" LIMIT ${queryLimit} OFFSET ${queryOffset}`;

This is the result I get

findMany took 10938.153129ms.
queryRaw took 84.126091ms.

As you can see the difference is 130 (!) times.
I have tried running this many times with different ID column types, different dataset sizes and I can't see any other reason why this is the case. The only thing I can think of is something going wrong inside prisma.

Here's a repo that reproduces the problem
https://github.com/peterrogov/prisma-performance-test

How to reproduce

Use the following repository that has code with reproduction https://github.com/peterrogov/prisma-performance-test

OR

Create a simple model that only has ID field and maybe a single scalar field.
Deploy to DB and fill with random data (I use like 1M records)
Run query with prisma.<model>.findMany and specify skip and take params. Provide skip and take such that you're fetching from the end of the dataset. Measure query execution time.
Make the same query but now use prisma.queryRaw and make query like SELECT * FROM ... OFFSET ... LIMIT .... Run the query and measure execution time
See the huge difference in time.

Expected behavior

There shouldn't be any significant difference executing such simple queries directly via SQL or via prisma methods. The time it takes a query with prisma must be close to a direct raw SQL.

Prisma information

@prisma/client 3.8.0

Environment & setup

PostgreSQL 12.4
MacOS 11.6.2
NodeJS 17.3

Prisma Version

prisma                  : 3.8.0
@prisma/client          : 3.8.0
Current platform        : darwin
Query Engine (Node-API) : libquery-engine 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/libquery_engine-darwin.dylib.node)
Migration Engine        : migration-engine-cli 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/migration-engine-darwin)
Introspection Engine    : introspection-core 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/introspection-engine-darwin)
Format Binary           : prisma-fmt 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/prisma-fmt-darwin)
Default Engines Hash    : 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f
Studio                  : 0.452.0

The text was updated successfully, but these errors were encountered:

janpio · 2022-01-12T20:03:38Z

Can you share the log output of the query generated and executed by Prisma under the hood? (https://pris.ly/d/logging or via https://github.com/peterrogov/prisma-performance-test/blob/master/src/index.ts#L10)

peterrogov · 2022-01-12T20:08:02Z

Here's the query generated by Prisma

SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2

If I modify my test code to re-execute the generated query via queryRaw this is what I get in terms of execution time

...findMany took 5415.553473ms.                   <--- run this query and capture the generated query
...queryRaw took 918.921837ms.                    <--- the same query captured on the previous step
...queryRaw took 118.010476ms.                    <--- run my own query

peterrogov · 2022-01-12T20:48:49Z

Apparently, the most of the execution time difference comes from this

ORDER BY "public"."StringIdModel"."id" ASC

Also, apparently, this bit is a must. Otherwise postgres will return N rows but random ones... To get exactly the same rows it has to order them first.

However, even after rewriting my custom query and including ORDER BY in it, there is still a significant time difference between findMany and queryRaw.

Also, I do not fully understand why sequential integer Id requires ORDER BY but that's probably a question to DBMS not to prisma.

peterrogov · 2022-01-12T20:58:37Z

Below are the time measurements for for three different models. One has String UUID ID, the other one is Int and the last one is BigInt.

--- String ID model ---
prisma.findMany
...findMany took 1045.720738ms.

prisma.queryRaw -> generated query
...queryRaw took 900.809099ms.

prisma.queryRaw -> custom query
...queryRaw took 770.333674ms.


--- Integer ID model ---
prisma.findMany
...findMany took 252.138898ms.

prisma.queryRaw -> generated query
...queryRaw took 221.481055ms.

prisma.queryRaw -> custom query
...queryRaw took 228.59065ms.


--- BigInt ID model ---
prisma.findMany
...findMany took 293.972744ms.

prisma.queryRaw -> generated query
...queryRaw took 204.726444ms.

prisma.queryRaw -> custom query
...queryRaw took 205.228213ms.

Custom query now includes ORDER BY Id ASC and all three queries to each table return exactly the same data. You can see that the custom query is still significantly faster. Especially when String UUID Id column is used.

janpio · 2022-01-12T21:39:32Z

But I see correctly that we are down from 130x performance impact in the initial post to <20% for using Prisma Client API vs. running the same generated query with the $queryRaw method, correct?

What is the difference left between the generated and the custom query?

peterrogov · 2022-01-13T07:46:02Z

@janpio Yes, you are right the 130x difference was due to ORDER BY but that was my silly mistake. I can't see any other difference that would make sense apart from a more verbose syntax of Prisma queries. But since the final query is still quite short and simple I don't think that alone could create a ~20% difference in execution time.

Also, as you can see, when I take the same query (generated by Prisma) and execute it via prisma.queryRaw the time is better. I have run many tests and this observation is consistent.

It feels like there's still something happening in prisma that adds 10-20% overhead to query execution even for such a simplistic scenario. I didn't test on more complex queries but I probably should. And I wish to emphasize that this difference in execution time is worse for models that have a string ID column. Those where ID is Int or BigInt are very close in terms of execution time.

janpio · 2022-01-13T13:57:54Z

That is the difference between running Prisma the ORM, and using Prisma to execute a plain, predefined SQL statement. The additional work around typing, creating the query and so on is expected to add some overhead.

We still might want to investigate where this overhead comes from, and if we can improve it of course. But it will not have very high priority as we are currently still focused on building out missing functionality. For most users this overhead is totally acceptable (and if not, they can use the raw query as you described).

peterrogov · 2022-01-13T14:02:27Z

@janpio I agree with you on this one. I was investigating another issue (#11138) when I stumbled upon this. Initially I failed to realise the huge difference between using and not using ORDER BY. That's where 130x gap came from.

paulbrickellmaven · 2022-04-26T16:26:01Z

Just came across this thread. There is an optimisation for order by and skip, to wrap the order query and do the skip after. Like this...

SELECT * FROM (
SELECT * FROM table
WHERE whatever = true
ORDER BY date_field DESC ) tmp LIMIT 100 OFFSET 9000

Is there any way to hint Prisma to do this, or do I have to go to raw SQL? Just asking out of curiosity. TIA.

ArnaudBarre · 2022-09-15T10:15:40Z

Note sure if this is an oversight or the root cause but in your example you inverted the skip and offset parameter for the findMany call

aqrln · 2023-08-21T15:37:18Z

While in the original repro the difference was mainly caused by the different query that was missing an ORDER BY and thus produced different results (although on my machine the difference between them is in a ballpark of 45–55 times, not 130x), I can't unfortunately confirm that using the identical query reduces the measured overhead to less than 20%. On my machine, when I run the exact same query via queryRaw instead of findMany, I observe findMany to typically be 5–6 times slower (in one run it was just 2 times slower; overall these are quite hand-wavy stats without actually measuring the parameters of the distribution but the difference seems consistent):

$ ts-node index.ts
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
findMany: 2.775s
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
queryRaw: 550.888ms

The order in which these queries are run doesn't matter:

$ ts-node index.ts
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
queryRaw: 551.117ms
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
findMany: 2.747s

Seeding script:

import { PrismaClient } from '@prisma/client'

async function main() {
  const prisma = new PrismaClient()
  const total = 1_000_000;

  for (let i = 0; i < total; i++) {
    if (i % 1000 === 0) {
      console.log(`${Math.round((i / total) * 100)}%`)
    }
    await prisma.stringIdModel.create({
      data: {
        value: String(Math.random()),
      },
    })
  }
}

void main()

Reproduction script:

import { PrismaClient } from '@prisma/client'

async function main() {
  const prisma = new PrismaClient({
    log: ['query'],
  })

  const queryLimit = 1000
  const queryOffset = 950000

  console.time('findMany')
  await prisma.stringIdModel.findMany({ skip: queryLimit, take: queryOffset })
  console.timeEnd('findMany')

  console.time('queryRaw')
  await prisma.$queryRaw`SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT ${queryLimit} OFFSET ${queryOffset}`
  console.timeEnd('queryRaw')
}

void main()

The schema is based on the original report, posting it here for convenience as well:

generator client {
  provider = "prisma-client-js"
}

datasource db {
  provider = "postgresql"
  url      = env("POSTGRESQL_DATABASE_URL")
}

model StringIdModel {
  id    String @id @default(uuid())
  value String
}

I get approximately the same difference both with latest Prisma 5 and with reported Prisma 3.8.0.

Looking at the traces, one significant contributor for findMany is prisma:engine:serialize but for some reason there's a visible difference even in prisma:engine:db_query alone.

findMany:

queryRaw:

Marking this as "confirmed" for now so we can look into it in more details and see if it makes sense and whether that's expected or not.

aqrln · 2023-08-21T16:18:19Z

@ArnaudBarre

Note sure if this is an oversight or the root cause but in your example you inverted the skip and offset parameter for the findMany call

Thanks so much for catching this! I totally overlooked both this and your comment, and copied the query as is from the original repro.

Fixing this indeed makes the difference to be in the much more expected range:

$ ts-node index.ts
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2 /* traceparent=00-4a24f1bbb09dc546edc6862e35d96ad1-ce64d33559e6dae6-01 */
findMany: 536.658ms
prisma:query SELECT "public"."StringIdModel"."id", "public"."StringIdModel"."value" FROM "public"."StringIdModel" WHERE 1=1 ORDER BY "public"."StringIdModel"."id" ASC LIMIT $1 OFFSET $2
queryRaw: 427.648ms

Tracing shows that the actual overhead of findMany here was around 3 ms, and most of the difference was actually caused by the first query establishing the database connection (which became apparent with lower magnitude of difference), I should've added an explicit $connect on the second thought.

I'll go ahead and close the issue. Please let me know if something was not taken into account and it should be reopened, or open an issue if you are facing a significant overhead with Prisma for any query.

peterrogov added the kind/bug A reported bug. label Jan 12, 2022

janpio added domain/client Issue in the "Client" domain: Prisma Client, Prisma Studio etc. topic: performance bug/0-unknown Bug is new, does not have information for reproduction or reproduction could not be confirmed. labels Jan 12, 2022

janpio added bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. and removed bug/0-unknown Bug is new, does not have information for reproduction or reproduction could not be confirmed. labels Jan 12, 2022

peterrogov mentioned this issue Jan 13, 2022

Slow query for findMany/findFirst with cursor navigation #11138

Closed

This comment was marked as outdated.

Sign in to view

matthewmueller added the topic: core query optimization label Jun 29, 2022

dpetrick added size/unknown labels Jul 8, 2022

matthewmueller added topic: performance/queries and removed topic: core query optimization labels Jul 13, 2022

janpio removed priority/unknown labels Nov 4, 2022

janpio added the topic: take label Apr 27, 2023

janpio added the topic: skip label Jul 26, 2023

aqrln removed the bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. label Aug 21, 2023

aqrln added the bug/2-confirmed Bug has been reproduced and confirmed. label Aug 21, 2023

aqrln changed the title ~~Serious performance issue when executing SELECT queries~~ Query is 5 times slower when using findMany compared to running identical query via queryRaw Aug 21, 2023

aqrln closed this as completed Aug 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query is 5 times slower when using `findMany` compared to running identical query via `queryRaw` #11130

Query is 5 times slower when using `findMany` compared to running identical query via `queryRaw` #11130

peterrogov commented Jan 12, 2022 •

edited

Loading

janpio commented Jan 12, 2022

peterrogov commented Jan 12, 2022 •

edited

Loading

peterrogov commented Jan 12, 2022 •

edited

Loading

peterrogov commented Jan 12, 2022 •

edited

Loading

janpio commented Jan 12, 2022

peterrogov commented Jan 13, 2022

janpio commented Jan 13, 2022 •

edited

Loading

peterrogov commented Jan 13, 2022

This comment was marked as outdated.

This comment was marked as outdated.

paulbrickellmaven commented Apr 26, 2022

ArnaudBarre commented Sep 15, 2022

aqrln commented Aug 21, 2023 •

edited

Loading

aqrln commented Aug 21, 2023

Query is 5 times slower when using findMany compared to running identical query via queryRaw #11130

Query is 5 times slower when using findMany compared to running identical query via queryRaw #11130

Comments

peterrogov commented Jan 12, 2022 • edited Loading

Bug description

How to reproduce

Expected behavior

Prisma information

Environment & setup

Prisma Version

janpio commented Jan 12, 2022

peterrogov commented Jan 12, 2022 • edited Loading

peterrogov commented Jan 12, 2022 • edited Loading

peterrogov commented Jan 12, 2022 • edited Loading

janpio commented Jan 12, 2022

peterrogov commented Jan 13, 2022

janpio commented Jan 13, 2022 • edited Loading

peterrogov commented Jan 13, 2022

This comment was marked as outdated.

This comment was marked as outdated.

paulbrickellmaven commented Apr 26, 2022

ArnaudBarre commented Sep 15, 2022

aqrln commented Aug 21, 2023 • edited Loading

aqrln commented Aug 21, 2023

Query is 5 times slower when using `findMany` compared to running identical query via `queryRaw` #11130

Query is 5 times slower when using `findMany` compared to running identical query via `queryRaw` #11130

peterrogov commented Jan 12, 2022 •

edited

Loading

peterrogov commented Jan 12, 2022 •

edited

Loading

peterrogov commented Jan 12, 2022 •

edited

Loading

peterrogov commented Jan 12, 2022 •

edited

Loading

janpio commented Jan 13, 2022 •

edited

Loading

aqrln commented Aug 21, 2023 •

edited

Loading