
chore: cache parsed requests by request #1900

Closed
wants to merge 1 commit from the proposal-cache-parsed-request branch

Conversation

@mattcosta7 (Contributor) commented Nov 30, 2023

Mostly opening this for discussion, but if we decide this path is reasonable, we should be able to land it.

Leaving this in a draft state for now, until I get some cycles to build a repro and do a bit of profiling to validate the assumption that this is worth it.

This improves performance when many GraphQL handlers are defined and the matching handler is towards the bottom of the list.

The PR should look much smaller when viewed without whitespace.

I noticed in a project with many GraphQL handlers that resolution seemed a bit slow.

It turns out that we re-parse and re-log the same request for every handler, even though the resulting values are identical. This PR takes a slightly modified approach: we parse only a single time per request object, and for GraphQL handlers we immediately return that cached result when it exists.

This should save on memory allocation and parsing, because we make fewer request.clone() and .text()/.json() calls, and on time, because we parse once per request instead of once per handler.

If the request objects aren't identical, we'll parse them anew. The cache is a WeakMap, so once a request is garbage-collected, its entry in the cache is cleared as well.
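
For reference, a minimal sketch of the memoization pattern described above (simplified; the real change lives in msw's parseGraphQLRequest, and the cache and helper names here are hypothetical):

// Cache keyed by the Request object itself: the WeakMap entry is released
// as soon as the request is garbage-collected.
const parsedRequestCache = new WeakMap<Request, unknown>()

async function parseOncePerRequest(
  request: Request,
  parse: (request: Request) => Promise<unknown>,
): Promise<unknown> {
  // Every handler after the first gets the memoized result back immediately,
  // skipping the request.clone() and body-read (.text()/.json()) work.
  if (parsedRequestCache.has(request)) {
    return parsedRequestCache.get(request)
  }
  const parsed = await parse(request)
  parsedRequestCache.set(request, parsed)
  return parsed
}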


if (parsedResult instanceof Error) {
const requestPublicUrl = getPublicUrlFromRequest(request)
parsedGraphQLRequestCache.set(request, undefined)
@mattcosta7 (Contributor, Author) commented:

This is the only functional difference: we'll throw for the first handler only, and undefined will be the cached result for the rest, which should avoid spamming the logs.

@mattcosta7 (Contributor, Author) commented:

This still leads to the log occurring, and that may be better anyway: one log per request instead of one log per handler.
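
Condensed, the error path under discussion looks roughly like this (a hypothetical reduction of the diff above; the console.error call stands in for msw's actual logging):

if (parsedResult instanceof Error) {
  const requestPublicUrl = getPublicUrlFromRequest(request)
  // Cache `undefined` rather than the Error: subsequent handlers treat the
  // request as "not a GraphQL request" and neither log nor throw again.
  parsedGraphQLRequestCache.set(request, undefined)
  // Logged once per request, not once per handler.
  console.error(`Failed to parse a GraphQL request to "${requestPublicUrl}"`)
  throw parsedResult
}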

@@ -167,39 +167,44 @@ async function getGraphQLInput(request: Request): Promise<GraphQLInput | null> {
}
}

const parsedGraphQLRequestCache = new WeakMap<Request, ParsedGraphQLRequest>()
@kettanaito (Member) commented:

A silly question here: why not lift the caching into the individual *Handler classes and make it apply to HTTP requests too? The issue you describe is in no way exclusive to the GraphQL handler; it simply surfaces there more visibly.

I think this is a good proposal, and we should add something like private cache: WeakMap<Request, X> onto the HttpRequest/GraphQL request classes. I don't think this belongs to the generic RequestHandler though.

@kettanaito (Member) commented Nov 30, 2023:

But I'm open to discussing this in the context of the RequestHandler class too. I suppose there aren't many cases where we wouldn't want to cache the parsing result using the intercepted Request as the key.

If we move this to the RH class, all userland handlers will benefit from caching automatically. If we decide to go this route, we should make the cache property protected instead.

The main task at hand is to figure out the cases when caching would be harmful.
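
To make that proposal concrete, a rough sketch (hypothetical class and method names; not msw's actual API):

// Hypothetical: a protected, Request-keyed cache on the base handler class,
// so every subclass and userland handler caches its parse result the same way.
abstract class CachingRequestHandler<ParsedResult> {
  protected cache = new WeakMap<Request, ParsedResult>()

  protected abstract parse(args: { request: Request }): Promise<ParsedResult>

  protected async parseCached(request: Request): Promise<ParsedResult> {
    const cached = this.cache.get(request)
    if (cached !== undefined) {
      return cached
    }
    const parsed = await this.parse({ request })
    this.cache.set(request, parsed)
    return parsed
  }
}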

@mattcosta7 (Contributor, Author) commented:

Yep! I think that's a very natural follow-up to make this more generalized.

Right now there are a few issues with doing that, which I'll dig into a bit below. I don't think they're insurmountable, just issues as things stand now:

  1. We can't define the cache on the handler classes, because we want to share the result across handlers.

If there are 10 GraphQL handlers defined, we only want to parse the request once and share that result across all 10 handlers. If we add the cache to the handler class, there will be a single cache per handler, which puts us back where we are now: parsing occurs in each handler instead of being shared across them.

  2. RequestHandler.parse doesn't implement anything yet (and should probably be abstract? Not an issue, just an observation), but because of this there's no simple place to define the caching (and it probably won't go on this class, or at least would look different if it did):

/**
 * Parse the intercepted request to extract additional information from it.
 * Parsed result is then exposed to other methods of this request handler.
 */
async parse(_args: {
  request: Request
  resolutionContext?: ResponseResolutionContext
}): Promise<ParsedResult> {
  return {} as ParsedResult
}

  3. HttpHandlers really only match the URL and extract cookies:

async parse(args: {
  request: Request
  resolutionContext?: ResponseResolutionContext
}) {
  const url = new URL(args.request.url)
  const match = matchRequestUrl(
    url,
    this.info.path,
    args.resolutionContext?.baseUrl,
  )
  const cookies = getAllRequestCookies(args.request)
  return {
    match,
    cookies,
  }
}

We could similarly cache the cookie parsing here, but the URL match is per handler, not per request.

In the end we have two slight variations: things we parse per request and things we parse per handler, so mostly this would be about managing that split efficiently (see the sketch below).

GraphQL parsing is the more expensive of the two, so handling it first and then bringing the pattern elsewhere is probably the lowest cost for the highest value. But I definitely think there's a pattern we could pull on here for more request caching.
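
As an illustration of that split, a sketch built around the two helpers from the parse() method quoted above (the cached wrapper is hypothetical, and the getAllRequestCookies signature is assumed):

// Assumed signature of the helper from the parse() method quoted above.
declare function getAllRequestCookies(request: Request): Record<string, string>

// Cookie parsing depends only on the Request, so it can live in one shared,
// module-level, Request-keyed cache. (A per-instance cache would leave ten
// handlers each parsing the same request: back to square one.)
const cookieCache = new WeakMap<Request, Record<string, string>>()

function getRequestCookiesCached(request: Request): Record<string, string> {
  let cookies = cookieCache.get(request)
  if (!cookies) {
    cookies = getAllRequestCookies(request) // runs once per request
    cookieCache.set(request, cookies)
  }
  return cookies
}

// The URL match, by contrast, also depends on the handler's own path, so the
// same Request yields a different result per handler and cannot be cached by
// Request alone:
//   const match = matchRequestUrl(url, this.info.path, baseUrl)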

@kettanaito (Member) commented Nov 30, 2023:

Oh, a great point!

This really falls under a somewhat larger refactoring I had in mind for getResponse. Maybe we can discuss that and kick it off?

Regarding no. 2, the caching would happen in the .run() method anyway. This way the base class would ensure consistent caching across different overrides of the parse/predicate methods.

I see that the cache should come from above. That's why I mentioned the refactoring. I think the getResponse() function should be made into a class. An orchestrator of sorts. It would hold the cache, perhaps.
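
Purely as a hypothetical sketch of that orchestrator idea (none of these names are msw's actual API):

// The orchestrator owns one Request-keyed cache for the whole handler list
// and threads it through each handler's run(), so caching stays consistent
// across different parse/predicate overrides.
interface CacheAwareHandler {
  run(args: {
    request: Request
    cache: WeakMap<Request, unknown>
  }): Promise<Response | undefined>
}

class HandlerOrchestrator {
  private parseCache = new WeakMap<Request, unknown>()

  public async getResponse(
    request: Request,
    handlers: Array<CacheAwareHandler>,
  ): Promise<Response | undefined> {
    for (const handler of handlers) {
      // The cache "comes from above": every handler sees the same WeakMap.
      const response = await handler.run({ request, cache: this.parseCache })
      if (response) {
        return response
      }
    }
    return undefined
  }
}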

@mattcosta7 (Contributor, Author) commented:

#1901 (comment)

One thing that's a bit more intricate here is that the cache becomes more complicated if we pass it around.

Each request handler type would need its own separate bit of caching. We might not need to generalize this in practice, though; optimizing a few distinct methods might be less effort for most of the payout (as I mentioned in that discussion).

@mattcosta7 (Contributor, Author) commented:
Super naive test example:

2 pages:

  • 2 buttons
    • a GraphQL request that matches the first handler
    • a GraphQL request that matches the last handler
  • 2 buttons
    • an HTTP request that matches the first handler
    • an HTTP request that matches the last handler

On each page: click the first button, wait a few seconds after the response, then click the last button. I cleared handlers in between the two request-type pages.

These are single runs, but pretty close to my general experience over time. I just haven't been more scientific due to time constraints.

Notice that GraphQL responses take significantly more time, and that time grows as the number of defined handlers increases.

[Screenshot: GraphQL request timings, Dec 1 2023]

Notice that HTTP handlers add a much smaller amount of extra time.

[Screenshot: HTTP request timings, Dec 1 2023]

@mattcosta7 (Contributor, Author) commented:
Closing this one in favor of the experiment in #1905.

@mattcosta7 mattcosta7 closed this Dec 3, 2023
@kettanaito kettanaito deleted the proposal-cache-parsed-request branch March 15, 2024 09:19