Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prisma generated query takes 10x as long than a manually composed one #14402

Open
jschuur opened this issue Jul 20, 2022 · 8 comments
Open

Prisma generated query takes 10x as long than a manually composed one #14402

jschuur opened this issue Jul 20, 2022 · 8 comments
Labels
bug/2-confirmed Bug has been reproduced and confirmed. kind/bug A reported bug. team/client Issue for team Client. topic: performance/queries topic: performance topic: postgresql

Comments

@jschuur
Copy link

jschuur commented Jul 20, 2022

I have a pretty standard GraphQL resolver Prisma query that ends up being horribly inefficient when I let Prisma generate the SQL, as opposed to just writing the (Postgre) SQL myself. At my current database size, it takes over 5 seconds via Prisma, and about 400 ms via SQL that I have composed.

At this point, I am not sure if this is just a fundamental Prisma issue in how it generates SQL in my case, or if there is anything I can do to avoid having to resort to a raw query.

More details, including my indexes and an EXPLAIN (ANALYZE, BUFFERS) output over at Stack Overflow, but here is the Prisma syntax...

  const videos = await ctx.prisma.video.findMany({
    where: {
      channel: {
        NOT: {
          status: {
            in: [ChannelStatus.HIDDEN, ChannelStatus.ARCHIVED],
          },
        },
      },
      status: {
        in: [VideoStatus.UPCOMING, VideoStatus.LIVE, VideoStatus.PUBLISHED],
      },
    },
    include: {
      channel: {
        include: {
          links: true,
        },
      },
    },
    cursor: _args.cursor
      ? {
          id: _args.cursor,
        }
      : undefined,
    skip: _args.cursor ? 1 : 0,
    orderBy: {
      sortTime: 'desc',
    },
    take: Math.min(_args.limit, config.GRAPHQL_MAX_RECENT_VIDEOS),
  });

...the relevant parts of my Prisma schema (some fields left out for brevity, and some of my indexes have temporarily been defined outside of my Prisma schema, directly on the DB while I debug this)...

model Channel {
  id                 Int           @id @default(autoincrement())
  status             ChannelStatus @default(value: ACTIVE)
  videos             Video[]
  links              ChannelLink[]
}

model Video {
  id                 Int           @id @default(autoincrement())
  channel            Channel       @relation(fields: [channelId], references: [id])
  channelId          Int
  status             VideoStatus   @default(value: PUBLISHED)
  sortTime           DateTime

  @@index([status])
}

...the SQL that this generates (Prisma 4.0.0)...

SELECT
	"public"."Video"."id",
	"public"."Video"."youtubeId",
	"public"."Video"."channelId",
	"public"."Video"."type",
	"public"."Video"."status",
	"public"."Video"."reviewed",
	"public"."Video"."category",
	"public"."Video"."youtubeTags",
	"public"."Video"."language",
	"public"."Video"."title",
	"public"."Video"."description",
	"public"."Video"."duration",
	"public"."Video"."durationSeconds",
	"public"."Video"."viewCount",
	"public"."Video"."likeCount",
	"public"."Video"."commentCount",
	"public"."Video"."scheduledStartTime",
	"public"."Video"."actualStartTime",
	"public"."Video"."actualEndTime",
	"public"."Video"."sortTime",
	"public"."Video"."createdAt",
	"public"."Video"."updatedAt",
	"public"."Video"."publishedAt"
FROM
	"public"."Video",
	(
		SELECT
			"public"."Video"."sortTime" AS "Video_sortTime_0"
		FROM
			"public"."Video"
		WHERE ("public"."Video"."id") = (29949)) AS "order_cmp"
WHERE (("public"."Video"."id")
	IN(
		SELECT
			"t0"."id" FROM "public"."Video" AS "t0"
			INNER JOIN "public"."Channel" AS "j0" ON ("j0"."id") = ("t0"."channelId")
		WHERE ((NOT "j0"."status" IN('HIDDEN', 'ARCHIVED'))
		AND "t0"."id" IS NOT NULL))
	AND "public"."Video"."status" IN('UPCOMING', 'LIVE', 'PUBLISHED')
	AND "public"."Video"."sortTime" <= "order_cmp"."Video_sortTime_0")
ORDER BY
	"public"."Video"."sortTime" DESC OFFSET 0;

...and my much faster version version that I wrote:

SELECT
	*
FROM
	"Video",
	"Channel"
WHERE
	"Channel".id = "Video"."channelId"
	AND "Channel".status NOT IN ('HIDDEN', 'ARCHIVED')
	AND "Video"."status" in('UPCOMING', 'LIVE', 'PUBLISHED')
	AND "Video"."sortTime" <= (
		SELECT
			"sortTime"
		FROM
			"Video"
		WHERE
			id = 29949)
	ORDER BY
		"sortTime" DESC
	LIMIT 84;

I also noticed that the Prisma syntax does not add a LIMIT statement, despite there being a take parameter. Is that to be expected? Curiously, if I add an explicit LIMIT to the Prisma generated SQL, the query takes about twice as long.

I've also tried using a composite index (CREATE INDEX "Video_sortTime_status_idx" ON "Video"("sortTime" DESC, status)) as suggested on SO, but this hasn't helped (updated EXPLAIN output). Over on SO, they suggested Postgres will only use one index, and I guess it's favouring Prisma's default one it set up for the primary key?

@matus-sabo
Copy link

Maybe its relatated to cursor pagination with desc order #12650

@jschuur
Copy link
Author

jschuur commented Jul 20, 2022

Maybe its relatated to cursor pagination with desc order #12650

I do have an explicit DESC index set for sortTime though.

@matus-sabo
Copy link

U can try implement cursor pagination without prisma cursor. You can find some inspiration there #11138 (comment)

@matus-sabo
Copy link

Prisma cursor don't work when u provide non existent cursor #14283

@jschuur
Copy link
Author

jschuur commented Jul 21, 2022

At this point, I've gone back to using an offset based pagination (code or see it in action here when you scroll to the bottom of the page). While this is normally not ideal, I usually only need the first few hundred records of a sorted list of over 130k, so the offset value never gets to big that the usual performance problems of having to traverse too much data via an offset will come up.

As there are a few cursor-related issues reported here, perhaps there will be some optimizations on the Prisma side down the line so I can revisit this.

@jschuur
Copy link
Author

jschuur commented Jul 21, 2022

U can try implement cursor pagination without prisma cursor. You can find some inspiration there #11138 (comment)

I can't do a simple greater than comparison on the ID as a cursor replacement like it's suggested in this comment. The sortTime order and my ID sort order are not always in sync, so I would need to do something like this:

AND "Video"."sortTime" <= (
	SELECT
		"sortTime"
	FROM
		"Video"
	WHERE
		id = 29949)

I can't do that natively in Prisma (without Prisma cursors), and if I split this into two Prisma calls to get the sortTime first, that's another performance hit that negates any improvements.

@aqrln aqrln added team/client Issue for team Client. topic: performance topic: postgresql bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. kind/bug A reported bug. labels Jul 21, 2022
@pranavbadami
Copy link

Having a very similar issue to this with related tables as well.

In my case, I'm trying to filter on a few related tables and seeing much more complicated logic generated by prisma versus when I add a simple join manually.

Here's the query I'm passing to prismaClient:

const invocations = await prismaClient.triggerInvocation.findMany({
    include: {
      triggerStep: {
        include: {
          transitions: true,
        },
      },
    },
    where: {
      scheduledNextStepCount: 0,
      scheduledTask: {
        status: "completed",
      },
      variables: {
        path: ["user", "id"],
        equals: userId,
      },
  
      triggerStep: {
        platform: platform,
        communityId: communityId,
        trigger: {
          isActive: true, < === THIS IS THE PROBLEMATIC PART
        },
        transitions: {
          some: {
            OR: USER_RESPONSE_TYPES.map((type) => ({
              when: {
                path: ["userResponse", "type"],
                equals: type,
              },
            })),
          },
        },
      },
    },
  });

I added the trigger: {isActive: true} because I realized for my app this was going to be more performant. Here's how I added this join in my own SQL (left) vs how Prisma added it (right):

image

The full query diff is here

Does anyone know why Prisma created the join this way? When I run explain analyze with my query I had 198ms execution time and Prisma's is 1.9 seconds (10x difference)

I would love to use the query syntax and not use a raw query, so any help would be greatly appreciated! Thanks in advance 🙏

@pranavbadami
Copy link

My issue above seems to be related to this: #13306 Seems like I might have to go with a raw query after all...

@Weakky Weakky added bug/2-confirmed Bug has been reproduced and confirmed. and removed bug/1-unconfirmed Bug should have enough information for reproduction, but confirmation has not happened yet. labels Aug 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug/2-confirmed Bug has been reproduced and confirmed. kind/bug A reported bug. team/client Issue for team Client. topic: performance/queries topic: performance topic: postgresql
Projects
None yet
Development

No branches or pull requests

6 participants