
perf(engine-core): implement static parts #3694

Merged: 11 commits into master on Oct 6, 2023

Conversation

nolanlawson (Collaborator):

This is step 1 of doing #3624 – optimizing how we render semi-static DOM trees, taking some inspiration from Solid and Lit.

Basically, this optimizes static vnodes beyond the optimizations for event listeners (#3519) and refs (#3550). Rather than just optimizing listeners/refs at the top level, we can deep-traverse into static DOM trees and apply listeners/refs to elements inside.

To do so, I introduce a concept called "static parts," which is similar to what Lit calls template parts. (Total coincidence! I came up with the name before I realized Lit used the same name. 😆 )

Design

Consider a template with many deep lwc:refs:

<template>
    <div>
        <div></div>
        <div>
            <div>
                <div lwc:ref="foo"></div>
                <div lwc:ref="baz"></div>
            </div>
        </div>
        <div lwc:ref="bar"></div>
        <div lwc:ref="quux"></div>
    </div>
</template>

This is one big static fragment, but we need to be able to traverse inside to set the refs. So we can have an array of static parts, which look like this:

const stc0 = {ref: "foo"};
const stc1 = {ref: "baz"};
const stc2 = {ref: "bar"};
const stc3 = {ref: "quux"};
api_static_fragment($fragment1(), 1, [
      api_static_part(4, stc0),
      api_static_part(5, stc1),
      api_static_part(6, stc2),
      api_static_part(7, stc3),
    ])

api_static_part takes two arguments:

  1. The partId, which is an integer corresponding to the order of the element inside a depth-first traversal of the tree (starting from the fragment root).
  2. The data bag for that element, which right now is just ref/on (listener). In the future, this could carry more props, e.g. text content, attributes, etc. (A rough sketch of how such parts might be applied at runtime follows below.)
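
For illustration only, here is a minimal sketch of how such parts might be applied at runtime: traverse the mounted fragment depth-first, counting elements so the counter lines up with the compiler's partId numbering, and record the matching element on each part. The names and shapes below (applyStaticParts, the inline data bag layout) are assumptions for the sketch, not the actual engine-core implementation.

// Illustrative sketch only; the real engine-core types and traversal differ.
interface VStaticPart {
    readonly partId: number; // position in a depth-first traversal of the fragment
    readonly data: { ref?: string; on?: Record<string, EventListener> };
    elm: Element | undefined;
}

function applyStaticParts(fragmentRoot: Element, parts: VStaticPart[]): void {
    const partsById = new Map<number, VStaticPart>();
    for (const part of parts) {
        partsById.set(part.partId, part);
    }
    let partId = 0;
    const stack: Element[] = [fragmentRoot];
    while (stack.length > 0) {
        const elm = stack.shift()!;
        const part = partsById.get(partId);
        if (part !== undefined) {
            part.elm = elm; // the ref/listeners for this part get attached to this element
        }
        partId++;
        // Put the children at the front of the stack to keep the traversal depth-first.
        stack.unshift(...Array.from(elm.children));
    }
}

With the template above, the fragment root gets partId 0, its first (empty) child gets 1, and the four lwc:ref divs land on partIds 4, 5, 6, and 7, which is exactly what the compiled output passes to api_static_part.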

Perf improvement

I wrote a small benchmark to demonstrate the gain (nolanlawson@d427867). (I did not put this benchmark in this PR, because I don't think it's worth running over and over again.)

The improvement is 14-17%:

[Screenshot (2023-08-31): benchmark results showing the 14-17% improvement]

The template for the component looks like this:

<template>
    <div>
        <div>
            <div>
              <div onclick={onClick}></div>
            </div>
        </div>
        <div>
            <div>
                <div onclick={onClick}></div>
            </div>
        </div>
        <div>
            <div>
                <div onclick={onClick}></div>
            </div>
        </div>
        <div>
            <div>
                <div onclick={onClick}></div>
            </div>
        </div>
        <div>
            <div>
                <div onclick={onClick}></div>
            </div>
        </div>
    </div>
</template>

If you compare the compiled output before and after, you can see that, instead of many small fragments, we have one big fragment with parts.

Before:

[Screenshot: compiled output with many small static fragments]

After:

[Screenshot: compiled output with one big static fragment and parts]
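
Since the screenshots don't carry over here, the following is a rough, purely illustrative approximation of the difference (not the literal compiler output; the exact partIds, handler wiring, and fragment names are assumptions):

// Illustrative approximation only, not real compiler output.
//
// Before: every <div onclick={onClick}> sits at the root of its own small
// static fragment, so the template compiles to five separate
// api_static_fragment(...) calls.
//
// After: the whole tree is one fragment, and each handler is addressed by its
// depth-first partId:
api_static_fragment($fragment1(), 1, [
    api_static_part(3, { on: { click: onClick } }),
    api_static_part(6, { on: { click: onClick } }),
    api_static_part(9, { on: { click: onClick } }),
    api_static_part(12, { on: { click: onClick } }),
    api_static_part(15, { on: { click: onClick } }),
]);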

Does this pull request introduce a breaking change?

  • ✅ No, it does not introduce a breaking change.

Does this pull request introduce an observable change?

  • ✅ No, it does not introduce an observable change.

nolanlawson requested a review from a team as a code owner on August 31, 2023.
// in depth-first traversal order.
if (!(ref in refVNodes) || refVNodes[ref].key < vnode.key) {
    refVNodes[ref] = vnode;
}
nolanlawson (Collaborator, Author):

I probably need to confirm that, by removing this check and just using the normal rendering order, I didn't introduce an observable change into how conflicting lwc:refs are resolved.

It turns out that this logic was wrong anyway, because we cannot assume that keys are in depth-first traversal order. Sometimes the key is a string:

if (slotParentName !== undefined) {
    // Prefixing the key is necessary to avoid conflicts with default content for the
    // slot which might have similar keys. Each vnode will always have a key that starts
    // with a numeric character from compiler. In this case, we add a unique notation
    // for slotted vnodes keys, e.g.: `@foo:1:1`. Note that this is *not* needed for
    // dynamic keys, since `api.k` already scopes based on the iteration.
    key = `@${slotParentName}:${key}`;
}

This might not be super-observable anyway, because 1) refs are a new, under-utilized feature, and 2) it's pretty uncommon for them to conflict in the first place. But it would be nice to have a consistent ordering here and to know that it hasn't changed.

nolanlawson (Collaborator, Author):

As I think about this, I'm almost certain that this introduces an observable change, since we messed up key comparison due to slots. But the scenario is so unlikely (duplicate lwc:refs, one of them is in a slotted content) that I don't think we need to worry too much about it.

The new behavior is more consistent, and it's less code, so I think we should just try to sneak it in.

nolanlawson (Collaborator, Author):

OK actually, I guess this isn't an issue, because those string keys only apply to default slotted content, and we don't allow lwc:refs on that. I.e. this will throw a compile-time error:

<template>
  <slot name="bar" lwc:ref="foo"></slot>  
</template>

...which is good, because that <slot> has a string key, so the ordering would be based on the slot name (bar) and not the tree-traversal order.

while (!isNull(child)) {
    stack.unshift(child);
    child = previousSibling(child);
}
nolanlawson (Collaborator, Author):

I could have avoided having to add previousSibling to the Renderer by using firstChild/nextSibling instead, but then I would have had to either create a temporary array and do stack.unshift(...tempArray) or use stack.splice(). Neither option seemed great perf-wise. In any case, this function takes about 1ms total in our entire Karma benchmark.
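
For illustration, here is a minimal sketch of the two ways to push a node's children onto the traversal stack in document order: the previousSibling approach this PR uses, and the firstChild/nextSibling alternative that needs a temporary array (or a splice). The renderer indirection is deliberately omitted, so this is not the engine's actual code.

// Option 1 (this PR): walk backwards from lastChild with previousSibling,
// unshifting as we go, so the children end up at the front of the stack in
// document order.
function pushChildrenViaPreviousSibling(stack: Node[], parent: Node): void {
    let child = parent.lastChild;
    while (child !== null) {
        stack.unshift(child);
        child = child.previousSibling;
    }
}

// Option 2: walk forwards with firstChild/nextSibling, which requires either a
// temporary array (as here) or a splice to preserve the same ordering.
function pushChildrenViaNextSibling(stack: Node[], parent: Node): void {
    const temp: Node[] = [];
    let child = parent.firstChild;
    while (child !== null) {
        temp.push(child);
        child = child.nextSibling;
    }
    stack.unshift(...temp);
}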

nolanlawson (Collaborator, Author):

TODO: benchmark which is fastest:

  1. previousSibling/lastChild/unshift
  2. nextSibling/firstChild/unshift from temporary array
  3. nextSibling/firstChild/splice

return this._renderApiCall(RENDER_APIS.staticPart, [
    t.literal(partId),
    t.objectExpression(databagProperties),
]);
}
nolanlawson (Collaborator, Author), Sep 1, 2023:

I originally considered serializing this as a function rather than as an object. This would be more similar to Solid, which actually generates the minimal firstChild/nextSibling code, avoiding needing to traverse the entire tree. However, this causes several problems:

  1. The on listeners need to be executed synchronously in order for reactivity to work properly, whereas the function would normally be called much later when the elm is actually ready, so we would have to do some fancy hoisting here.
  2. We can't call *Child/*Sibling directly – we have to use the renderer. This makes the codegen way more complex.
  3. Generating the "minimal" sibling/child traversal code is difficult for us because of the renderer: calling renderer.nextSibling(node) prevents Terser from doing dead-code elimination (unlike node.nextSibling). The only solution would be to mark the function as PURE, but astring does not support comments in the places where we would need them, so we would need to contribute that to astring.

So in general, it was way simpler to do an object here, and I'm not sure the perf gain from generating the traversal would even be worth it. It would also lead to more bloated code output versus just putting one single generic traversal function in the engine-core.
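
To make the trade-off concrete, here is a purely illustrative comparison; neither snippet is real compiler output for either approach. A Solid-style compiler would emit a minimal child/sibling chain per template (shown here with direct DOM access, which the Renderer abstraction does not allow), whereas this PR emits plain data and keeps a single generic traversal in engine-core:

// Purely illustrative, not real LWC compiler output.

// Solid-style (rejected): generate a minimal traversal per template, e.g. to
// reach the lwc:ref="foo" div from the Design example above (assuming
// whitespace text nodes are stripped):
const getFooRef = (root: Element): Element =>
    root.firstChild!.nextSibling!.firstChild!.firstChild as Element;

// This PR (object form): the compiler only emits data, e.g.
// api_static_part(4, { ref: "foo" }), and engine-core owns one generic
// depth-first traversal that resolves partId 4 to that same element.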

    readonly partId: number;
    readonly data: VStaticPartData;
    elm: Element | undefined;
}
nolanlawson (Collaborator, Author):

Feedback from Dale: can we extract the elm and data from this interface into something shared with the normal VNode/VElement interface?

nolanlawson (Collaborator, Author):

I looked into this, but I think the types have sufficiently little overlap that DRYing this out actually makes the TypeScript quite a bit more complex. I tried having some HasData, HasElement, and HasNode "mixins", but I think this is actually more cumbersome than just copy-pasting the elm: Element | undefined; and data: ... props.
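
For context, here is a rough sketch of what the rejected mixin-style factoring might look like next to the copy-paste approach that was kept. The type names and the VStaticPartData shape here are hypothetical, not taken from the codebase.

// Hypothetical sketch; names and the VStaticPartData shape are assumptions.
type VStaticPartData = { ref?: string; on?: Record<string, EventListener> };

// Rejected: shared "mixin" interfaces intersected into each vnode-like type.
interface HasElement {
    elm: Element | undefined;
}
interface HasData<D> {
    readonly data: D;
}
type VStaticPartViaMixins = HasElement &
    HasData<VStaticPartData> & { readonly partId: number };

// Kept: just repeat the two small properties on each interface.
interface VStaticPart {
    readonly partId: number;
    readonly data: VStaticPartData;
    elm: Element | undefined;
}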

jmsjtu (Member) left a comment:

@nolanlawson LGTM!

Is there a plan to add perf benchmarks in a future PR? 😄

nolanlawson (Collaborator, Author):

@jmsjtu I did write a benchmark (nolanlawson@d427867), but I declined to put it in this PR, because I figured it's not worth testing on every commit. The benchmark is hyper-tailored to this particular optimization.

nolanlawson merged commit 54c64e2 into master on Oct 6, 2023 (11 checks passed).
nolanlawson deleted the nolan/deep-static branch on October 6, 2023.
nolanlawson added a commit that referenced this pull request Oct 13, 2023
Effectively reverts #3694

Fixes #3796

It keeps a bunch of the tests, however, and also adds some new tests to make sure we don't regress again in the future.

This reverts commit 54c64e2.
nolanlawson mentioned this pull request on Oct 24, 2023.