-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow query for findMany/findFirst with cursor navigation #11138
Comments
Effectively, cursor-based pagination as described in prisma docs is a way to specify a condition like // skip-take pagination with manual ID > X clause
// this mimics cursor-based pagination behaviour
const t1 = await prisma.person.findFirst({
where: {
id: {
gt: 10000000,
},
},
take: 1,
skip: 0,
});
// built-in cursor-based pagination call
const t2 = await prisma.person.findFirst({
cursor: {
id: 10000000,
},
take: 1,
skip: 1,
});
// built-in skip-take call
const t3 = await prisma.person.findFirst({
take: 1,
skip: 10000000,
});
console.log("Result IDs:", t1?.id, t2?.id, t3?.id); The expectation is that all three queries will return exactly the same item from the database. Here's the output
You can see the queries generated by prisma for each call and their execution times. The manual query that mimics cursor-based access is the fastest and wins by a sheer margin. Built-in cursor call is the slowest of them and the built-in skip-take call is also slow (though that is expected). The results are the same for each query, I fetch the same record from the DB. |
I've been testing over and over again. With the approach that I showed above, I could reduce the query time for cursor-based pagination from 20-30 seconds per query to 10ms. Both forward and backwards navigation. Both with additional filter and without. Forward query, no added filter await prisma.person.findMany({
where: {
id: {
gte: cursor,
},
},
orderBy: {
id: "asc",
},
take: take,
skip: 1,
}); Forward query, with filter await prisma.person.findMany({
where: {
id: {
gte: cursor,
},
email: {
contains: "HH"
}
},
orderBy: {
id: "asc",
},
take: take,
skip: 1,
}); Backwards query, no filter const data = await prisma.person.findMany({
where: {
id: {
lte: cursor,
},
},
orderBy: {
id: "desc",
},
take: Math.abs(take),
skip: 1,
});
// Sorting is required because items come from prisma in reversed order
// Do ascendig sort by item id.
data.sort((a, b) => /* compare a.id to b.id depending on the id field type */); Backwards query, with filter await prisma.person.findMany({
where: {
id: {
lte: cursor,
},
email: {
contains: "HH"
}
},
orderBy: {
id: "desc",
},
take: Math.abs(take),
skip: 1,
});
// Sorting is required because items come from prisma in reversed order
// Do ascendig sort by item id.
t1.sort((a, b) => /* compare a.id to b.id depending on the id field type */); My tests (thousands of random tests with a random cursor and take size) show that these queries are way faster compared to prisma's native cursor-based pagination (especially for backwards). I would appreciate any hints/ideas if you think this approach is somehow flawed or does not account for some very obvious and popular use case. I am running the test on a dataset with 10M items. I do backwards cursor navigation and apply filter on email column. email is marked as I haven't tested yet on queries with relations or complex filters. |
I found that prisma doesn't always add the limit to the query when using For workaround, I made a middleware to bypass it's mechanism. prisma.$use(async (params, next) => {
if (params.args.cursor) {
const key = _.lowerFirst(params.model)
const result = await prisma[key].findUniqueOrThrow({ where: params.args.cursor, select: _(params.args.orderBy).mapValues(x => true).value() })
delete params.args.cursor
params.args.where = {
...params.args.where,
..._(params.args.orderBy).mapValues((x, k) => ({
[x === 'desc' ? 'lte' : 'gte']: result[k],
})).value(),
}
}
return await next(params)
}) Now, speed is boosted up and can get a response in <10ms. Prisma Team, Is it a bug or by design? |
Just chiming in to say that I'm seeing very similar behavior in my use case. I generated a table with 1MM records in it. Looks like this: // schema.prisma
model transactions {
id Int @unique @default(autoincrement())
code String?
type String?
amount String?
description String?
date String?
} Using offset pagination, my code looks like this: const data = await prisma.transactions.findMany({
skip: 900000,
take: 10,
}); ...which gives me a SQL query like this (I filled in the parameters in this snippet): --- prisma generated offset query
SELECT "public"."transactions"."id", "public"."transactions"."code", "public"."transactions"."type", "public"."transactions"."amount", "public"."transactions"."description", "public"."transactions"."date" FROM "public"."transactions" WHERE 1=1 ORDER BY "public"."transactions"."id" ASC LIMIT 10 OFFSET 900000; This consistently takes about 170ms to run. And with cursor pagination, my code looks like this: const data = await prisma.transactions.findMany({
take: 10,
skip: 1,
cursor: {
id: 900000,
},
orderBy: {
id: 'asc',
},
}); ...and generates SQL like this (I filled in the parameters in this snippet): -- prisma generated cursor query
SELECT "public"."transactions"."id", "public"."transactions"."code", "public"."transactions"."type", "public"."transactions"."amount", "public"."transactions"."description", "public"."transactions"."date" FROM "public"."transactions", (SELECT "public"."transactions"."id" AS "transactions_id_0" FROM "public"."transactions" WHERE ("public"."transactions"."id") = (900000)) AS "order_cmp" WHERE "public"."transactions"."id" >= "order_cmp"."transactions_id_0" ORDER BY "public"."transactions"."id" ASC LIMIT 10 OFFSET 1; This takes 250-260ms to run. Finally, if I hand write some SQL like this: SELECT
*
FROM
transactions
WHERE
id >= (
SELECT
id
FROM
transactions
WHERE
id = 900000
)
ORDER BY id ASC
LIMIT 10
OFFSET 1; This takes about 1ms to run (there's no difference if I update the query to use Not entirely sure where the issue is, I'm not terribly experienced with SQL. All three of these queries return the same exact data set. |
Bug description
I am researching this behaviour for some time now and the other possible bug I reported (#11130) was discovered while actually testing this one.
The essence of the problem is a query generated by prisma to fetch data from a large table (postgres) using cursor-based pagination. Such query execution time is too high.
My data model looks is
I use postgres 12.4 and I fill the table with random data (20 million records).
Then I execute the query with prisma and capture the generated SQL query and execution time using prisma's logging and middleware.
Here's the query that I do
And the output captured from prisma is like this
The query takes almost 4 seconds (my other runs it was up to 10 sec) to execute which is a lot. If I copy this query and execute it in pgAdmin without any modification I get the same execution time.
I reformat the query below for convenience sake
Prisma docs tell us that cursor-based pagination does not use OFFSET and therefore scales well with a large dataset.
However, as we can see, the generated query does use OFFSET and in fact, is very slow exactly because of this. Furthermore, it uses subquery the purpose of which is completely obscure to me and does not make sense.
If I change this query and get rid of OFFSET and subquery it will look like this
This query time is only
40 msec
according to pgAdmin and it returns exactly the same result as prisma autogenerated one and this is consistent.I am not such a good expert in the underlying specifics of the DBMS engine and do not know exactly why prisma generates the query this way. Perhaps there's something I am not aware of and this query must be like this. However, as I have indicated the query result both for the prisma query and for the one I run manually is exactly the same. The only difference is execution time. And this difference is huge.
My other tests show that in general cursor-based pagination in prisma currently has the same performance as skip-take and does not create any benefit whatsoever on large datasets. I tested with datasets from 500k to 50M rows, different index types (String, Int, BigInt) and the results are always the same. Cursor based pagination is currently very slow.
How to reproduce
prisma.findMany
and use cursor based pagination fetching data from a far end of the tableExpected behavior
The query for cursor-based pagination should work as explained in docs and should not use OFFSET as well as subqueries (unless absolutely necessary). The execution time of
findMany
orfindFirst
with the cursor must be less than it is now. Cursor-based pagination on large datasets must be faster than skip-take.Prisma information
Model
Query
SQL
Environment & setup
Prisma Version
The text was updated successfully, but these errors were encountered: