Slow query for findMany/findFirst with cursor navigation #11138

peterrogov · 2022-01-13T12:22:38Z

Bug description

I am researching this behaviour for some time now and the other possible bug I reported (#11130) was discovered while actually testing this one.

The essence of the problem is a query generated by prisma to fetch data from a large table (postgres) using cursor-based pagination. Such query execution time is too high.

My data model looks is

model Person {
    id       BigInt  @id @default(autoincrement())
    email    String  @unique
    name     String
    isActive Boolean
    address  String
}

I use postgres 12.4 and I fill the table with random data (20 million records).

Then I execute the query with prisma and capture the generated SQL query and execution time using prisma's logging and middleware.

Here's the query that I do

const data = await prisma.person.findMany({
    cursor: {
        id: 10000000,
    },
    take: 100,
    skip: 1
});

And the output captured from prisma is like this

======================
params: [10000000,100,1]               <-- these are event.params used in the SQL query
----------------------
SELECT "public"."Person"."id", "public"."Person"."email", "public"."Person"."name", "public"."Person"."isActive", "public"."Person"."address" FROM "public"."Person", (SELECT "public"."Person"."id" AS "Person_id_0" FROM "public"."Person" WHERE ("public"."Person"."id") = ($1)) AS "order_cmp" WHERE "public"."Person"."id" >= "order_cmp"."Person_id_0" ORDER BY "public"."Person"."id" ASC LIMIT $2 OFFSET $3
----------------------
duration: 3851
======================

The query takes almost 4 seconds (my other runs it was up to 10 sec) to execute which is a lot. If I copy this query and execute it in pgAdmin without any modification I get the same execution time.

I reformat the query below for convenience sake

SELECT "public"."Person"."id",
	"public"."Person"."email",
	"public"."Person"."name",
	"public"."Person"."isActive",
	"public"."Person"."address"
FROM "public"."Person",
	(SELECT "public"."Person"."id" AS "Person_id_0"
		FROM "public"."Person"
		WHERE ("public"."Person"."id") = ($1)) AS "order_cmp"
WHERE "public"."Person"."id" >= "order_cmp"."Person_id_0"
ORDER BY "public"."Person"."id" ASC
LIMIT $2
OFFSET $3

Prisma docs tell us that cursor-based pagination does not use OFFSET and therefore scales well with a large dataset.

Cursor-based pagination scales. The underlying SQL does not use OFFSET, but instead queries all Post records with an ID greater than the value of cursor.

However, as we can see, the generated query does use OFFSET and in fact, is very slow exactly because of this. Furthermore, it uses subquery the purpose of which is completely obscure to me and does not make sense.

If I change this query and get rid of OFFSET and subquery it will look like this

SELECT "public"."Person"."id",
	"public"."Person"."email",
	"public"."Person"."name",
	"public"."Person"."isActive",
	"public"."Person"."address"
FROM "public"."Person"
WHERE "public"."Person"."id" > 10000000
ORDER BY "public"."Person"."id" ASC
LIMIT 100

This query time is only 40 msec according to pgAdmin and it returns exactly the same result as prisma autogenerated one and this is consistent.

I am not such a good expert in the underlying specifics of the DBMS engine and do not know exactly why prisma generates the query this way. Perhaps there's something I am not aware of and this query must be like this. However, as I have indicated the query result both for the prisma query and for the one I run manually is exactly the same. The only difference is execution time. And this difference is huge.

My other tests show that in general cursor-based pagination in prisma currently has the same performance as skip-take and does not create any benefit whatsoever on large datasets. I tested with datasets from 500k to 50M rows, different index types (String, Int, BigInt) and the results are always the same. Cursor based pagination is currently very slow.

How to reproduce

Create a simple flat model in Prisma
Fill the table with some random data (preferably a large table, over 1M rows)
Run the query with prisma.findMany and use cursor based pagination fetching data from a far end of the table
Capture the query from prisma logs and measure execution time.

Expected behavior

The query for cursor-based pagination should work as explained in docs and should not use OFFSET as well as subqueries (unless absolutely necessary). The execution time of findMany or findFirst with the cursor must be less than it is now. Cursor-based pagination on large datasets must be faster than skip-take.

Prisma information

Model

model Person {
    id       BigInt  @id @default(autoincrement())
    email    String  @unique
    name     String
    isActive Boolean
    address  String
}

Query

const data = await prisma.person.findMany({
    cursor: {
        id: 10000000,
    },
    take: 100,
    skip: 1
});

SQL

SELECT "public"."Person"."id",
	"public"."Person"."email",
	"public"."Person"."name",
	"public"."Person"."isActive",
	"public"."Person"."address"
FROM "public"."Person",
	(SELECT "public"."Person"."id" AS "Person_id_0"
		FROM "public"."Person"
		WHERE ("public"."Person"."id") = ($1)) AS "order_cmp"
WHERE "public"."Person"."id" >= "order_cmp"."Person_id_0"
ORDER BY "public"."Person"."id" ASC
LIMIT $2
OFFSET $3

Environment & setup

OS: MacOS 11.6.2
Database: postgres 12.4
Node.js version: 17.3.0

Prisma Version

prisma                  : 3.8.0
@prisma/client          : 3.8.0
Current platform        : darwin
Query Engine (Node-API) : libquery-engine 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/libquery_engine-darwin.dylib.node)
Migration Engine        : migration-engine-cli 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/migration-engine-darwin)
Introspection Engine    : introspection-core 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/introspection-engine-darwin)
Format Binary           : prisma-fmt 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f (at node_modules/@prisma/engines/prisma-fmt-darwin)
Default Engines Hash    : 34df67547cf5598f5a6cd3eb45f14ee70c3fb86f
Studio                  : 0.452.0

The text was updated successfully, but these errors were encountered:

peterrogov · 2022-01-13T12:59:43Z

Effectively, cursor-based pagination as described in prisma docs is a way to specify a condition like WHERE ID > XXX. Based on this idea I have tried the following test:

// skip-take pagination with manual ID > X clause
// this mimics cursor-based pagination behaviour
const t1 = await prisma.person.findFirst({
    where: {
        id: {
            gt: 10000000,
        },
    },
    take: 1,
    skip: 0,
});

// built-in cursor-based pagination call
const t2 = await prisma.person.findFirst({
    cursor: {
        id: 10000000,
    },
    take: 1,
    skip: 1,
});

// built-in skip-take call
const t3 = await prisma.person.findFirst({
    take: 1,
    skip: 10000000,
});

console.log("Result IDs:", t1?.id, t2?.id, t3?.id);

The expectation is that all three queries will return exactly the same item from the database. Here's the output

======================
params: [10000000,1,0]
----------------------
SELECT "public"."Person"."id", "public"."Person"."email", "public"."Person"."name", "public"."Person"."isActive", "public"."Person"."address" FROM "public"."Person" WHERE "public"."Person"."id" > $1 ORDER BY "public"."Person"."id" ASC LIMIT $2 OFFSET $3
----------------------
duration: 6
======================

======================
params: [10000000,1,1]
----------------------
SELECT "public"."Person"."id", "public"."Person"."email", "public"."Person"."name", "public"."Person"."isActive", "public"."Person"."address" FROM "public"."Person", (SELECT "public"."Person"."id" AS "Person_id_0" FROM "public"."Person" WHERE ("public"."Person"."id") = ($1)) AS "order_cmp" WHERE "public"."Person"."id" >= "order_cmp"."Person_id_0" ORDER BY "public"."Person"."id" ASC LIMIT $2 OFFSET $3
----------------------
duration: 6282
======================

======================
params: [1,10000000]
----------------------
SELECT "public"."Person"."id", "public"."Person"."email", "public"."Person"."name", "public"."Person"."isActive", "public"."Person"."address" FROM "public"."Person" WHERE 1=1 ORDER BY "public"."Person"."id" ASC LIMIT $1 OFFSET $2
----------------------
duration: 4325
======================

Result IDs: 10000001n 10000001n 10000001n

You can see the queries generated by prisma for each call and their execution times. The manual query that mimics cursor-based access is the fastest and wins by a sheer margin. Built-in cursor call is the slowest of them and the built-in skip-take call is also slow (though that is expected).

The results are the same for each query, I fetch the same record from the DB.

peterrogov · 2022-01-13T21:24:19Z

I've been testing over and over again. With the approach that I showed above, I could reduce the query time for cursor-based pagination from 20-30 seconds per query to 10ms. Both forward and backwards navigation. Both with additional filter and without.

Forward query, no added filter

await prisma.person.findMany({
    where: {
        id: {
            gte: cursor,
        },
    },
    orderBy: {
        id: "asc",
    },
    take: take,
    skip: 1,
});

Forward query, with filter

await prisma.person.findMany({
    where: {
        id: {
            gte: cursor,
        },
        email: {
            contains: "HH"
        }
    },
    orderBy: {
        id: "asc",
    },
    take: take,
    skip: 1,
});

Backwards query, no filter

const data = await prisma.person.findMany({
    where: {
        id: {
            lte: cursor,
        },
    },
    orderBy: {
        id: "desc",
    },
    take: Math.abs(take),
    skip: 1,
});
// Sorting is required because items come from prisma in reversed order
// Do ascendig sort by item id.
data.sort((a, b) => /* compare a.id to b.id depending on the id field type */);

Backwards query, with filter

await prisma.person.findMany({
    where: {
        id: {
            lte: cursor,
        },
        email: {
            contains: "HH"
        }
    },
    orderBy: {
        id: "desc",
    },
    take: Math.abs(take),
    skip: 1,
});
// Sorting is required because items come from prisma in reversed order
// Do ascendig sort by item id.
t1.sort((a, b) => /* compare a.id to b.id depending on the id field type */);

My tests (thousands of random tests with a random cursor and take size) show that these queries are way faster compared to prisma's native cursor-based pagination (especially for backwards). I would appreciate any hints/ideas if you think this approach is somehow flawed or does not account for some very obvious and popular use case.

I am running the test on a dataset with 10M items. I do backwards cursor navigation and apply filter on email column. email is marked as @unique in prisma schema and is indexed. My query average time is under 10msec, prisma native cursor navigation is well over 40sec (not a typo, 40 seconds). Both queries return exactly the same data.

I haven't tested yet on queries with relations or complex filters.

shtse8 · 2022-08-07T19:25:20Z

I found that prisma doesn't always add the limit to the query when using cursor. For example, take: 5 should be LIMIT 0, 5. It seems there are some mechanism to buffer data or trying to solve N+1 issue. but without limit, it would lead to load whole table out (my table has +10M rows.) which needs 1x minutes to get a response. It's terrible slow and crash my database.

For workaround, I made a middleware to bypass it's mechanism.

prisma.$use(async (params, next) => {
    if (params.args.cursor) {
      const key = _.lowerFirst(params.model)
      const result = await prisma[key].findUniqueOrThrow({ where: params.args.cursor, select: _(params.args.orderBy).mapValues(x => true).value() })
      delete params.args.cursor
      params.args.where = {
        ...params.args.where,
        ..._(params.args.orderBy).mapValues((x, k) => ({
          [x === 'desc' ? 'lte' : 'gte']: result[k],
        })).value(),
      }
    }
    return await next(params)
  })

Now, speed is boosted up and can get a response in <10ms.

Prisma Team, Is it a bug or by design?

brian-mcallister-lab49 · 2022-08-10T22:36:59Z

Just chiming in to say that I'm seeing very similar behavior in my use case. I generated a table with 1MM records in it. Looks like this:

// schema.prisma
model transactions {
  id          Int     @unique @default(autoincrement())
  code        String?
  type        String?
  amount      String?
  description String?
  date        String?
}

Using offset pagination, my code looks like this:

const data = await prisma.transactions.findMany({
  skip: 900000,
  take: 10,
});

...which gives me a SQL query like this (I filled in the parameters in this snippet):

--- prisma generated offset query
SELECT "public"."transactions"."id", "public"."transactions"."code", "public"."transactions"."type", "public"."transactions"."amount", "public"."transactions"."description", "public"."transactions"."date" FROM "public"."transactions" WHERE 1=1 ORDER BY "public"."transactions"."id" ASC LIMIT 10 OFFSET 900000;

This consistently takes about 170ms to run.

And with cursor pagination, my code looks like this:

const data = await prisma.transactions.findMany({
  take: 10,
  skip: 1,
  cursor: {
    id: 900000,
  },
  orderBy: {
    id: 'asc',
  },
});

...and generates SQL like this (I filled in the parameters in this snippet):

-- prisma generated cursor query
SELECT "public"."transactions"."id", "public"."transactions"."code", "public"."transactions"."type", "public"."transactions"."amount", "public"."transactions"."description", "public"."transactions"."date" FROM "public"."transactions", (SELECT "public"."transactions"."id" AS "transactions_id_0" FROM "public"."transactions" WHERE ("public"."transactions"."id") = (900000)) AS "order_cmp" WHERE "public"."transactions"."id" >= "order_cmp"."transactions_id_0" ORDER BY "public"."transactions"."id" ASC LIMIT 10 OFFSET 1;

This takes 250-260ms to run.

Finally, if I hand write some SQL like this:

SELECT 
	*
FROM
	transactions
WHERE
	id >= (
		SELECT
			id
		FROM
			transactions
		WHERE 
			id = 900000
	)
ORDER BY id ASC
LIMIT 10
OFFSET 1;

This takes about 1ms to run (there's no difference if I update the query to use <= and ORDER BY id DESC).

Not entirely sure where the issue is, I'm not terribly experienced with SQL. All three of these queries return the same exact data set.

peterrogov added the kind/bug A reported bug. label Jan 13, 2022

peterrogov mentioned this issue Jan 13, 2022

Query is 5 times slower when using findMany compared to running identical query via queryRaw #11130

Closed

janpio added topic: performance/queries bug/0-unknown Bug is new, does not have information for reproduction or reproduction could not be confirmed. domain/client Issue in the "Client" domain: Prisma Client, Prisma Studio etc. topic: cursor labels Jan 13, 2022

casey-chow mentioned this issue Apr 14, 2022

Cursor-based pagination postgresql performance issue with desc order #12650

Closed

pantharshit00 added bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. and removed bug/0-unknown Bug is new, does not have information for reproduction or reproduction could not be confirmed. labels Jun 2, 2022

matthewmueller added the topic: core query optimization label Jul 8, 2022

dpetrick added priority/high labels Jul 8, 2022

matthewmueller removed the topic: core query optimization label Jul 13, 2022

matus-sabo mentioned this issue Jul 20, 2022

Prisma generated query takes 10x as long than a manually composed one #14402

Open

janpio removed priority/high labels Nov 4, 2022

KhooHaoYit mentioned this issue Feb 1, 2023

Improve cursor performance prisma/prisma-engines#3661

Merged

tomhoule closed this as completed in prisma/prisma-engines#3661 Feb 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow query for findMany/findFirst with cursor navigation #11138

Slow query for findMany/findFirst with cursor navigation #11138

peterrogov commented Jan 13, 2022 •

edited

peterrogov commented Jan 13, 2022

peterrogov commented Jan 13, 2022

shtse8 commented Aug 7, 2022 •

edited

brian-mcallister-lab49 commented Aug 10, 2022

Slow query for findMany/findFirst with cursor navigation #11138

Slow query for findMany/findFirst with cursor navigation #11138

Comments

peterrogov commented Jan 13, 2022 • edited

Bug description

How to reproduce

Expected behavior

Prisma information

Environment & setup

Prisma Version

peterrogov commented Jan 13, 2022

peterrogov commented Jan 13, 2022

shtse8 commented Aug 7, 2022 • edited

brian-mcallister-lab49 commented Aug 10, 2022

peterrogov commented Jan 13, 2022 •

edited

shtse8 commented Aug 7, 2022 •

edited