[experiment] chore: performance updates #1905
Conversation
```
@@ -59,7 +59,9 @@ export abstract class SetupApi<EventsMap extends EventMap> extends Disposable {
        ),
      )

    this.currentHandlers.unshift(...runtimeHandlers)
```
I tested a case of applying a million handlers, which failed due to the stack size, since spreading places all of the array items onto the stack.
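A minimal sketch of the spread-free alternative (the helper name is illustrative; the fix in this branch inlines the loop):

```ts
// Prepend items without spreading them onto the call stack.
// Iterating in reverse and unshifting one element at a time preserves
// the original order and works for arbitrarily large arrays.
function prependAll<T>(target: Array<T>, items: Array<T>): void {
  for (let i = items.length - 1; i >= 0; i--) {
    target.unshift(items[i])
  }
}

// e.g. prependAll(this.currentHandlers, runtimeHandlers)
```

Note that each `unshift` is itself O(n), so for very large arrays a single `concat` into a new array would be cheaper, at the cost of changing the array's identity.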
```
@@ -138,7 +139,7 @@ export class GraphQLHandler extends RequestHandler<
     * If the request doesn't match a specified endpoint, there's no
     * need to parse it since there's no case where we would handle this
     */
    const match = matchRequestUrl(new URL(args.request.url), this.endpoint)
```
URL parsing is kind of expensive, and we can memoize this easily
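A sketch of the memoized call site, assuming the `memoizedUrl` helper introduced later in this diff (the surrounding names come from the hunk above):

```ts
// Before: constructs a fresh URL object in every handler, per request.
// const match = matchRequestUrl(new URL(args.request.url), this.endpoint)

// After: identical URL strings resolve to one cached URL instance.
const match = matchRequestUrl(memoizedUrl(args.request.url), this.endpoint)
```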
```
     * We don't want to copy this for _every_ handler, as it
     * is expensive to do so.
     */
    const mainRequestRef = (() => {
```
Ugly for now, but cloning on every handler is actually quite expensive: it's one of the most expensive operations, contributing to the growth of event listeners (for aborts), memory, and time.
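A sketch of what the cached reference could look like, assuming a module-level `mainRefCache` as described in the summary below (the exact shape in the diff is an inline IIFE):

```ts
// Clone the request at most once per request instead of once per
// handler. Cloning registers abort listeners and copies state, so it
// is one of the most expensive operations in the pipeline.
const mainRefCache = new WeakMap<Request, Request>()

function getMainRequestRef(request: Request): Request {
  let ref = mainRefCache.get(request)
  if (!ref) {
    ref = request.clone()
    mainRefCache.set(request, ref)
  }
  return ref
}
```

A `WeakMap` keyed by the `Request` instance means cached clones are released once the original request is garbage-collected.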
```
/**
 * Determines if a given request can be considered a GraphQL request.
 * Does not parse the query and does not guarantee its validity.
 */
export async function parseGraphQLRequest(
  request: Request,
): Promise<ParsedGraphQLRequest> {
  if (cache.has(request)) return cache.get(request)
```
GraphQL parsing is shared across all GraphQL handlers and onUnhandledRequest, so doing this once per request is a great optimization.
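A sketch of the full memoized function, assuming the existing parse body is factored into a hypothetical `parseGraphQLRequestUncached`, and that `ParsedGraphQLRequest` (the module's existing result type) may legitimately be `undefined`, hence `has`/`get` rather than a falsy check:

```ts
const cache = new WeakMap<Request, ParsedGraphQLRequest>()

export async function parseGraphQLRequest(
  request: Request,
): Promise<ParsedGraphQLRequest> {
  // Every GraphQL handler (and onUnhandledRequest) funnels through this
  // function, so each request pays the parsing cost at most once.
  if (cache.has(request)) {
    return cache.get(request)
  }

  // Hypothetical stand-in for the existing, uncached parsing logic.
  const parsed = await parseGraphQLRequestUncached(request)
  cache.set(request, parsed)
  return parsed
}
```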
```
/**
 * Returns the result of matching given request URL against a mask.
 */
export function matchRequestUrl(url: URL, path: Path, baseUrl?: string): Match {
  const key = `${url}|${path}|${baseUrl}`
```
Request matching is expensive, but for any url/path/base triplet the result is always identical, so we can eat this cost once and cache it.
This cache works across handlers, since handlers can fall through to the same URLs.
We would also want to use a better key than this, since we'll strip URL params/hashes/etc. before matching. Maybe we can extract some logic here and only do this 'URL preparation' once per request instead of once per attempt to match.
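A sketch combining the cache above with that one-time 'URL preparation', under the assumption that query params and hashes are irrelevant to matching (`computeMatch` is a hypothetical stand-in for the existing matching logic; `Match` and `Path` are the module's existing types):

```ts
const matchCache = new Map<string, Match>()

export function matchRequestUrl(url: URL, path: Path, baseUrl?: string): Match {
  // Normalize once so equivalent URLs (differing only in query/hash)
  // share a cache entry and the key stays stable.
  const normalized = `${url.origin}${url.pathname}`
  const key = `${normalized}|${path}|${baseUrl}`

  let match = matchCache.get(key)
  if (match === undefined) {
    match = computeMatch(normalized, path, baseUrl)
    matchCache.set(key, match)
  }
  return match
}
```

Since the key is a plain string, this `Map` grows without bound across distinct URLs; an LRU or per-request cache would cap that.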
```
@@ -0,0 +1,9 @@
const cache = new Map<string, URL>()

export function memoizedUrl(url: string, base?: string): URL {
```
`new URL` is costly, but we see the same URLs frequently since every handler calls this.
We could probably generate this before executing the handlers and pass it around instead of caching it (my preference).
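A sketch of a complete implementation along those lines:

```ts
const cache = new Map<string, URL>()

export function memoizedUrl(url: string, base?: string): URL {
  const key = `${url}|${base}`
  let parsed = cache.get(key)
  if (parsed === undefined) {
    parsed = new URL(url, base)
    cache.set(key, parsed)
  }
  return parsed
}
```

One caveat of this approach: the returned `URL` instance is shared, so callers must treat it as read-only, which is another argument for constructing it once per request and passing it around instead.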
```
@@ -37,7 +38,9 @@ export function getRequestCookies(request: Request): Record<string, string> {
  }
}

const cache = new WeakMap<Request, Record<string, string>>()
export function getAllRequestCookies(request: Request): Record<string, string> {
```
Reading cookies isn't that expensive, but it's idempotent for a single request, so we should be good to cache this.
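A sketch of the cached version, with `parseCookies` as a hypothetical stand-in for the existing cookie-reading logic:

```ts
const cache = new WeakMap<Request, Record<string, string>>()

export function getAllRequestCookies(request: Request): Record<string, string> {
  // Cookie parsing is idempotent for a given request, so the first
  // parse can be reused by every subsequent handler.
  const cached = cache.get(request)
  if (cached) {
    return cached
  }

  const cookies = parseCookies(request)
  cache.set(request, cookies)
  return cookies
}
```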
What areas would be next for improvements?

Currently, potential handlers are evaluated linearly, based on when they were defined. This puts a minimum cost bottleneck that scales with the number of handlers (and the relative position, among those handlers, of the one that finally matches). What does this mean? The first active handler will always resolve almost immediately, while the last handler will always take, at minimum, the time needed to evaluate every handler before it.

Can we re-structure this to be more efficient? Express and similar tools utilize trie structures to handle matching. This would be a significant improvement in some cases, and likely a minimal cost in others, especially since the evaluation cost should be lower than just path extraction. We'll have some slightly interesting considerations here to still maintain the existing definition-order matching semantics.
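A minimal sketch of trie-based path matching (all names are illustrative, not MSW APIs; dynamic segments like `:id` are omitted for brevity):

```ts
type TrieNode<T> = {
  children: Map<string, TrieNode<T>>
  handlers: Array<T>
}

const createNode = <T>(): TrieNode<T> => ({ children: new Map(), handlers: [] })

class PathTrie<T> {
  private root = createNode<T>()

  add(path: string, handler: T): void {
    let node = this.root
    for (const segment of path.split('/').filter(Boolean)) {
      let next = node.children.get(segment)
      if (!next) {
        next = createNode<T>()
        node.children.set(segment, next)
      }
      node = next
    }
    node.handlers.push(handler)
  }

  match(pathname: string): Array<T> {
    // Lookup cost scales with path depth, not with handler count.
    let node: TrieNode<T> | undefined = this.root
    for (const segment of pathname.split('/').filter(Boolean)) {
      node = node.children.get(segment)
      if (!node) return []
    }
    return node.handlers
  }
}
```

To keep definition-order semantics, each handler could carry a global insertion index, with candidates returned by the trie sorted (or selected) by that index.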
Expanding from: #1900
I was interested in trying to optimize the `handleRequest` pipeline without making substantial changes. I hope this can help inform some future updates suggested in #1901 and other related areas. I don't expect this branch to land directly at all, but solely to be used as discovery.
Takeaways
We can make significant improvements to performance and scalability with a few relatively small code changes.
- Hoist `requestUrl = new URL(request.url)` and `mainRequestRef = request.clone()` out of the per-handler path, passing them directly to each handler. We may not even need a `mainRequestRef` at all, as it doesn't seem used anywhere directly. We might not need to memoize `new URL` if we extract that outside of the handlers, but we'd still have this cost where we might not need it as frequently.
- Cache `matchRequestUrl` lazily. Every request has to hit all paths, but once that occurs we'll be able to precompute the matches later, so we should be OK to re-use those after initial matching, given a good cache key.
- Cache `parseGraphQLRequest` results across handlers for a given request. Should we also cache the actual query parse here? It may be the same across multiple sets of variables, where that input might currently not be identical; we should verify this (I did not try that yet).
- Cache `getAllRequestCookies` - this is actually cheap compared to everything else discussed above.

Test structure
Before: MSW v2.0.9 (with the `unshift` fix from this branch)
After: this branch

Results are reported as a single run, not a multi-run average, but they are very stable, so they're indicative of larger test windows.

Using my M1 MacBook Pro, latest Chrome, 2 tabs open (this issue and a Vite sample project - code below).
100,000 handlers configured.

Fresh page load:
- `http` request to the first handler.
- `http` request to the last handler.
- `http` request to the last handler.

Fresh page load:
- `graphql` request to the first handler.
- `graphql` request to the last handler.
- `graphql` request to the last handler.

Before
Http
1st handler: 5ms
Last handler (1x): 10.42s
Last handler (2x): 10.97s
Graphql
1st handler: 6ms
Last handler (1x): 13.20s
Last handler (2x): 13.15s
After
Http
1st handler: 5ms
Last handler (1x): 2.41s
Last handler (2x): 818ms
Graphql
1st handler: 6ms
Last handler (1x): 1.83s
Last handler (2x): 1.81s
Findings
1st handler results stay about identical.
Last handler results improve drastically.
Obviously these results are extreme because of the number of handlers, but even for more moderate handler counts these results are very promising.
When a smaller, 50-handler set is used (raw numbers only):
Before:
After:
Description of changes

Setup changes, equal across tests:

- `src/core/SetupApi.ts` and `src/core/utils/internal/requestHandlerUtils.ts`: Instead of calling unshift with a spread, loop through the handlers in reverse and unshift them individually. This avoids maximum stack issues when a large number of handlers are `use`'d. Spreading forces the arguments all onto the stack, which is why it overflows. [1] [2]
- `tsconfig.base.json`: Updated the TypeScript target version from ES6 to ES2020. This made debugging/testing simpler, since we don't create generators for promise resolution.

Performance improvements:

- `src/core/utils/memoizedUrl.ts`: Introduced a `memoizedUrl` function to create and cache URL objects, reducing the overhead of repeatedly creating new URL objects.
- `src/core/handlers/RequestHandler.ts`: Reduced unnecessary request cloning by introducing a `mainRefCache` to store a reference to the original request. We probably don't need to clone this at all, but minimizing changes to call signatures meant we needed to provide a clone to the execution result for the one handler that matches. We don't seem to read this, so maybe we don't need it; if we do, can we do this prior to starting the resolution pipeline? [1] [2] [3]
- `src/core/utils/internal/parseGraphQLRequest.ts`: Added a caching mechanism to the `parseGraphQLRequest` function to store and reuse the parsed result of a GraphQL request. [1] [2] [3]
- `src/core/utils/matching/matchRequestUrl.ts`: Introduced a cache in the `matchRequestUrl` function to store and reuse the result of URL-path matching. [1] [2]
- `src/core/utils/request/getRequestCookies.ts`: Added a cache to the `getAllRequestCookies` function to store and reuse the parsed cookies from a request. [1] [2]

Code used in test setup: