
chore: cache parsed requests by request #1900

Closed
wants to merge 1 commit from the proposal-cache-parsed-request branch

Conversation

@mattcosta7 (Contributor) commented Nov 30, 2023

Mostly opening this for discussion, but if we decide this path is reasonable, we should be able to land it.

Leaving this in a draft state for now, until I get some cycles to build a repro and do a bit of profiling to validate the assumption that this is worth it.

This improves performance when many GraphQL handlers are defined and the matching handler is towards the bottom of the list.

The PR should look much smaller when viewed without whitespace.

I noticed in a project with many GraphQL handlers that resolution seemed a bit slow.

It turns out that we re-parse and re-log the same request for every handler, even though the resulting values are identical. This PR takes a slightly modified approach: we parse only a single time per request object, and for GraphQL handlers we immediately return that cached result when it exists.

This should save on memory allocation and parsing, because we make fewer request.clone() and .text()/.json() calls, and on time, because we parse once per request instead of once per handler.

If the request objects aren't identical, we'll parse them anew. The cache is a WeakMap, so once a request is garbage-collected, its entry in the cache is cleared as well.
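
For reference, a minimal sketch of the memoization pattern described above (simplified; the real change lives in msw's parseGraphQLRequest, and the cache and helper names here are hypothetical):

// Cache keyed by the Request object itself: the WeakMap entry is released
// as soon as the request is garbage-collected.
const parsedRequestCache = new WeakMap<Request, unknown>()

async function parseOncePerRequest(
  request: Request,
  parse: (request: Request) => Promise<unknown>,
): Promise<unknown> {
  // Every handler after the first gets the memoized result back immediately,
  // skipping the request.clone() and body-read (.text()/.json()) work.
  if (parsedRequestCache.has(request)) {
    return parsedRequestCache.get(request)
  }
  const parsed = await parse(request)
  parsedRequestCache.set(request, parsed)
  return parsed
}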


if (parsedResult instanceof Error) {
const requestPublicUrl = getPublicUrlFromRequest(request)
parsedGraphQLRequestCache.set(request, undefined)
@mattcosta7 (Contributor, Author) commented:

This is the only functional difference: we'll throw for the first handler only, and undefined will be the cached result for the rest, which should avoid spamming the logs.

@mattcosta7 (Contributor, Author) commented:

This still leads to the log occurring, and that may be better anyway: one log per request instead of one log per handler.
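
Condensed, the error path under discussion looks roughly like this (a hypothetical reduction of the diff above; the console.error call stands in for msw's actual logging):

if (parsedResult instanceof Error) {
  const requestPublicUrl = getPublicUrlFromRequest(request)
  // Cache `undefined` rather than the Error: subsequent handlers treat the
  // request as "not a GraphQL request" and neither log nor throw again.
  parsedGraphQLRequestCache.set(request, undefined)
  // Logged once per request, not once per handler.
  console.error(`Failed to parse a GraphQL request to "${requestPublicUrl}"`)
  throw parsedResult
}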

@@ -167,39 +167,44 @@ async function getGraphQLInput(request: Request): Promise<GraphQLInput | null> {
}
}

const parsedGraphQLRequestCache = new WeakMap<Request, ParsedGraphQLRequest>()
@kettanaito (Member) commented:

A silly question here: why not lift the caching into the individual *Handler classes and make it apply to HTTP requests too? The issue you describe is in no way exclusive to the GraphQL handler; it simply surfaces there more visibly.

I think this is a good proposal, and we should add something like private cache: WeakMap<Request, X> onto the HttpRequest/GraphQL request classes. I don't think this belongs to the generic RequestHandler though.

@kettanaito (Member) commented Nov 30, 2023:

But I'm open to discussing this in the context of the RequestHandler class too. I suppose there aren't many cases where we wouldn't want to cache the parsing result using the intercepted Request as the key.

If we move this to the RH class, all userland handlers will benefit from caching automatically. If we decide to go this route, we should make the cache property protected instead.

The main task at hand is to figure out the cases when caching would be harmful.
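
To make that proposal concrete, a rough sketch (hypothetical class and method names; not msw's actual API):

// Hypothetical: a protected, Request-keyed cache on the base handler class,
// so every subclass and userland handler caches its parse result the same way.
abstract class CachingRequestHandler<ParsedResult> {
  protected cache = new WeakMap<Request, ParsedResult>()

  protected abstract parse(args: { request: Request }): Promise<ParsedResult>

  protected async parseCached(request: Request): Promise<ParsedResult> {
    const cached = this.cache.get(request)
    if (cached !== undefined) {
      return cached
    }
    const parsed = await this.parse({ request })
    this.cache.set(request, parsed)
    return parsed
  }
}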

@mattcosta7 (Contributor, Author) commented:

Yep! I think that's a very natural follow-up to make this more generalized.

Right now there are a few issues with doing that, which I'll dig into a bit below. I don't think they're insurmountable, just issues as things stand now:

  1. We can't define the cache on the handler classes, because we want to share the result across handlers.

If there are 10 GraphQL handlers defined, we only want to parse the request once and share that result across all 10 handlers. If we add the cache to the handler class, there will be a single cache per handler, which puts us back where we are now: parsing occurs in each handler instead of being shared across them.

  2. RequestHandler.parse doesn't implement anything yet (and should probably be abstract? Not an issue, just an observation), but because of this there's no simple place to define the caching (and it probably won't go on this class, or at least would look different if it did):

/**
 * Parse the intercepted request to extract additional information from it.
 * Parsed result is then exposed to other methods of this request handler.
 */
async parse(_args: {
  request: Request
  resolutionContext?: ResponseResolutionContext
}): Promise<ParsedResult> {
  return {} as ParsedResult
}

  3. HttpHandlers really only match the URL and extract cookies:

async parse(args: {
  request: Request
  resolutionContext?: ResponseResolutionContext
}) {
  const url = new URL(args.request.url)
  const match = matchRequestUrl(
    url,
    this.info.path,
    args.resolutionContext?.baseUrl,
  )
  const cookies = getAllRequestCookies(args.request)
  return {
    match,
    cookies,
  }
}

We could similarly cache the cookie parsing here, but the URL match is per handler, not per request.

In the end we have two slight variations: things we parse per request and things we parse per handler, so mostly this would be about managing that split efficiently (see the sketch below).

GraphQL parsing is the more expensive of the two, so handling it first and then bringing the pattern elsewhere is probably the lowest cost for the highest value. But I definitely think there's a pattern we could pull on here for more request caching.
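
As an illustration of that split, a sketch built around the two helpers from the parse() method quoted above (the cached wrapper is hypothetical, and the getAllRequestCookies signature is assumed):

// Assumed signature of the helper from the parse() method quoted above.
declare function getAllRequestCookies(request: Request): Record<string, string>

// Cookie parsing depends only on the Request, so it can live in one shared,
// module-level, Request-keyed cache. (A per-instance cache would leave ten
// handlers each parsing the same request: back to square one.)
const cookieCache = new WeakMap<Request, Record<string, string>>()

function getRequestCookiesCached(request: Request): Record<string, string> {
  let cookies = cookieCache.get(request)
  if (!cookies) {
    cookies = getAllRequestCookies(request) // runs once per request
    cookieCache.set(request, cookies)
  }
  return cookies
}

// The URL match, by contrast, also depends on the handler's own path, so the
// same Request yields a different result per handler and cannot be cached by
// Request alone:
//   const match = matchRequestUrl(url, this.info.path, baseUrl)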

@kettanaito (Member) commented Nov 30, 2023:

Oh, a great point!

This really falls under a somewhat larger refactoring I had in mind for getResponse. Maybe we can discuss that and kick it off?

Regarding no. 2, the caching would happen in the .run() method anyway. This way the base class would ensure consistent caching across different overrides of the parse/predicate methods.

I see that the cache should come from above. That's why I mentioned the refactoring. I think the getResponse() function should be made into a class. An orchestrator of sorts. It would hold the cache, perhaps.
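
Purely as a hypothetical sketch of that orchestrator idea (none of these names are msw's actual API):

// The orchestrator owns one Request-keyed cache for the whole handler list
// and threads it through each handler's run(), so caching stays consistent
// across different parse/predicate overrides.
interface CacheAwareHandler {
  run(args: {
    request: Request
    cache: WeakMap<Request, unknown>
  }): Promise<Response | undefined>
}

class HandlerOrchestrator {
  private parseCache = new WeakMap<Request, unknown>()

  public async getResponse(
    request: Request,
    handlers: Array<CacheAwareHandler>,
  ): Promise<Response | undefined> {
    for (const handler of handlers) {
      // The cache "comes from above": every handler sees the same WeakMap.
      const response = await handler.run({ request, cache: this.parseCache })
      if (response) {
        return response
      }
    }
    return undefined
  }
}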

@mattcosta7 (Contributor, Author) commented:

#1901 (comment)

One thing that's a bit more intricate here is that the cache becomes more complicated if we pass it around.

Each request handler type would need its own separate bit of caching. We might not need to generalize this in practice, though; optimizing a few distinct methods might be less effort for most of the payout (as I mentioned in that discussion).

@mattcosta7 (Contributor, Author) commented:
Super naive test example:

2 pages:

  • 2 buttons
    • a GraphQL request that matches the first handler
    • a GraphQL request that matches the last handler
  • 2 buttons
    • an HTTP request that matches the first handler
    • an HTTP request that matches the last handler

On each page: click the first button, wait a few seconds after the response, then click the last button. I cleared handlers in between the two request-type pages.

These are single runs, but pretty close to my general experience over time. I just haven't been more scientific due to time constraints.

Notice that GraphQL responses take significantly more time, and that time grows as the number of defined handlers increases.

[Screenshot: GraphQL request timings, Dec 1 2023]

Notice that HTTP handlers add a much smaller amount of extra time.

[Screenshot: HTTP request timings, Dec 1 2023]

@mattcosta7 (Contributor, Author) commented:
Closing this one in favor of the experiment in #1905.

@mattcosta7 mattcosta7 closed this Dec 3, 2023
@kettanaito kettanaito deleted the proposal-cache-parsed-request branch March 15, 2024 09:19