perf(engine-core): implement static parts #3694
// in depth-first traversal order.
if (!(ref in refVNodes) || refVNodes[ref].key < vnode.key) {
    refVNodes[ref] = vnode;
}
I probably need to confirm that, by removing this check and just using the normal rendering order, I didn't introduce an observable change into how conflicting lwc:refs are resolved.
It turns out that this logic was wrong anyway, because we cannot assume that keys are in depth-first traversal order. Sometimes the key is a string:
lwc/packages/@lwc/template-compiler/src/codegen/index.ts
Lines 628 to 635 in 9e3bdd4
if (slotParentName !== undefined) {
    // Prefixing the key is necessary to avoid conflicts with default content for the
    // slot which might have similar keys. Each vnode will always have a key that starts
    // with a numeric character from compiler. In this case, we add a unique notation
    // for slotted vnodes keys, e.g.: `@foo:1:1`. Note that this is *not* needed for
    // dynamic keys, since `api.k` already scopes based on the iteration.
    key = `@${slotParentName}:${key}`;
}
This might not be super-observable anyway, because 1) refs are a new, under-utilized feature, and 2) it's pretty uncommon for them to conflict in the first place. But it would be nice to have a consistent ordering here and to know that it hasn't changed.
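To make the concern concrete, here is a minimal sketch of how a key-based tie-break and a plain "last one rendered wins" rule can disagree once a string key is involved. The `RefVNode` shape, the `label` field, and both function names are hypothetical and exist only for illustration; the casts to `any` stand in for the untyped JavaScript comparison, where a non-numeric string converts to `NaN` and every `<` comparison against it is `false`.

```typescript
// Hypothetical shapes for illustration only; not the engine's real types.
type Key = string | number;

interface RefVNode {
    key: Key;
    ref: string;
    label: string;
}

// Key-based resolution, modeled on the removed check: keep the vnode with the larger key.
function resolveByKey(vnodes: RefVNode[]): Record<string, RefVNode> {
    const refVNodes: Record<string, RefVNode> = {};
    for (const vnode of vnodes) {
        const existing = refVNodes[vnode.ref];
        // `as any` mirrors plain JS semantics for mixed string/number operands.
        if (existing === undefined || (existing.key as any) < (vnode.key as any)) {
            refVNodes[vnode.ref] = vnode;
        }
    }
    return refVNodes;
}

// Render-order resolution: the vnode visited last simply overwrites earlier ones.
function resolveByRenderOrder(vnodes: RefVNode[]): Record<string, RefVNode> {
    const refVNodes: Record<string, RefVNode> = {};
    for (const vnode of vnodes) {
        refVNodes[vnode.ref] = vnode;
    }
    return refVNodes;
}

// Two conflicting refs, listed in rendering (depth-first) order.
// The second one carries a slot-prefixed string key.
const inRenderOrder: RefVNode[] = [
    { key: 1, ref: 'foo', label: 'first' },
    { key: '@bar:2', ref: 'foo', label: 'second' },
];

console.log(resolveByKey(inRenderOrder)['foo']?.label); // "first", because 1 < NaN is false
console.log(resolveByRenderOrder(inRenderOrder)['foo']?.label); // "second"
```

With purely numeric keys that increase in traversal order, both functions pick the same vnode; they only diverge when a string key enters the comparison.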
As I think about this, I'm almost certain that this introduces an observable change, since we messed up key comparison due to slots. But the scenario is so unlikely (duplicate lwc:refs, one of them in slotted content) that I don't think we need to worry too much about it.
The new behavior is more consistent, and it's less code, so I think we should just try to sneak it in.
OK actually, I guess this isn't an issue, because those string keys only apply to default slotted content, and we don't allow lwc:refs on that. I.e. this will throw a compile-time error:
<template>
<slot name="bar" lwc:ref="foo"></slot>
</template>
...which is good, because that <slot> has a string key, so the ordering would be based on the slot name (bar) and not the tree-traversal order.
while (!isNull(child)) {
    stack.unshift(child);
    child = previousSibling(child);
}
I could have avoided having to add previousSibling to the Renderer by using firstChild/nextSibling instead, but then I would have had to either create a temporary array and do stack.unshift(...tempArray) or use stack.splice(). Neither option seemed great perf-wise. In any case this function takes up like 1ms total in our entire Karma benchmark.
TODO: benchmark which is fastest:
- previousSibling/lastChild/unshift
- nextSibling/firstChild/unshift from temporary array
- nextSibling/firstChild/splice
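As a starting point for that comparison, the three variants can be sketched over a tiny hand-rolled linked tree. This is not the engine code: `SimpleNode`, `node`, and the three `traverse*` functions are hypothetical names, and the sketch reads sibling pointers directly instead of going through the renderer. It only shows that all three produce the same depth-first order, so a benchmark would be comparing equivalent work.

```typescript
// Hypothetical minimal node type with DOM-like sibling/child pointers.
interface SimpleNode {
    name: string;
    firstChild: SimpleNode | null;
    lastChild: SimpleNode | null;
    nextSibling: SimpleNode | null;
    previousSibling: SimpleNode | null;
}

function node(name: string, ...children: SimpleNode[]): SimpleNode {
    children.forEach((child, i) => {
        child.previousSibling = children[i - 1] ?? null;
        child.nextSibling = children[i + 1] ?? null;
    });
    return {
        name,
        firstChild: children[0] ?? null,
        lastChild: children[children.length - 1] ?? null,
        nextSibling: null,
        previousSibling: null,
    };
}

// Variant 1: lastChild/previousSibling/unshift (the shape used in the diff above).
function traverseBackward(root: SimpleNode): string[] {
    const order: string[] = [];
    const stack: SimpleNode[] = [root];
    while (stack.length > 0) {
        const current = stack.shift()!;
        order.push(current.name);
        let child = current.lastChild;
        while (child !== null) {
            stack.unshift(child);
            child = child.previousSibling;
        }
    }
    return order;
}

// Variant 2: firstChild/nextSibling, collected into a temporary array, then unshifted.
function traverseForwardTempArray(root: SimpleNode): string[] {
    const order: string[] = [];
    const stack: SimpleNode[] = [root];
    while (stack.length > 0) {
        const current = stack.shift()!;
        order.push(current.name);
        const temp: SimpleNode[] = [];
        let child = current.firstChild;
        while (child !== null) {
            temp.push(child);
            child = child.nextSibling;
        }
        stack.unshift(...temp);
    }
    return order;
}

// Variant 3: firstChild/nextSibling, spliced into the front of the stack one by one.
function traverseForwardSplice(root: SimpleNode): string[] {
    const order: string[] = [];
    const stack: SimpleNode[] = [root];
    while (stack.length > 0) {
        const current = stack.shift()!;
        order.push(current.name);
        let insertAt = 0;
        let child = current.firstChild;
        while (child !== null) {
            stack.splice(insertAt++, 0, child);
            child = child.nextSibling;
        }
    }
    return order;
}

const tree = node('a', node('b', node('c'), node('d')), node('e'));
console.log(traverseBackward(tree).join(',')); // a,b,c,d,e
```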
    return this._renderApiCall(RENDER_APIS.staticPart, [
        t.literal(partId),
        t.objectExpression(databagProperties),
    ]);
}
I originally considered serializing this as a function rather than as an object. This would be more similar to Solid, which actually generates the minimal firstChild/nextSibling code, avoiding needing to traverse the entire tree. However, this causes several problems:
- The on listeners need to be executed synchronously in order for reactivity to work properly, whereas the function would normally be called much later when the elm is actually ready, so we would have to do some fancy hoisting here.
- We can't call *Child/*Sibling directly; we have to use the renderer. This makes the codegen way more complex.
- Generating the "minimal" sibling/child traversal code is difficult for us because of the renderer, because calling render.nextSibling(node) prevents Terser from dead-code-elimination (unlike node.nextSibling). The only solution would be to mark the function as PURE, but astring does not support comments in the cases where we would need it, so we would need to contribute this to astring.
So in general, it was way simpler to do an object here, and I'm not sure the perf gain from generating the traversal would even be worth it. It would also lead to more bloated code output versus just putting one single generic traversal function in the engine-core.
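For readers unfamiliar with the object-based approach, here is a rough sketch of what a single generic traversal could look like: each part is a plain object carrying a `partId`, and one shared function walks the tree through the renderer and fills in `elm` for every matching id. Everything here is an assumption made for illustration, including the `Renderer` method names, the `FakeNode` tree, the `mountStaticParts` name, numbering the root as `0`, and counting only element-like nodes. It is not the actual engine-core implementation.

```typescript
// Hypothetical renderer surface; only the two calls this sketch needs.
interface Renderer<N> {
    getLastChild(node: N): N | null;
    previousSibling(node: N): N | null;
}

// Hypothetical part shape: an id in depth-first order plus a data bag.
interface StaticPart<N> {
    readonly partId: number;
    readonly data: { ref?: string; on?: Record<string, (event: unknown) => void> };
    elm: N | undefined;
}

// One generic traversal shared by every template, instead of generated per-template code.
function mountStaticParts<N>(root: N, parts: StaticPart<N>[], renderer: Renderer<N>): void {
    const partsById = new Map<number, StaticPart<N>>();
    for (const part of parts) {
        partsById.set(part.partId, part);
    }
    let partId = -1;
    let found = 0;
    const stack: N[] = [root];
    // Stop as soon as every part has been matched; the rest of the tree is irrelevant.
    while (stack.length > 0 && found < parts.length) {
        const current = stack.shift()!;
        partId++;
        const part = partsById.get(partId);
        if (part !== undefined) {
            part.elm = current;
            found++;
        }
        let child = renderer.getLastChild(current);
        while (child !== null) {
            stack.unshift(child);
            child = renderer.previousSibling(child);
        }
    }
}

// Stand-in tree and renderer so the sketch runs outside a browser.
interface FakeNode {
    tag: string;
    children: FakeNode[];
    parent: FakeNode | null;
}

function el(tag: string, ...children: FakeNode[]): FakeNode {
    const created: FakeNode = { tag, children, parent: null };
    for (const child of children) {
        child.parent = created;
    }
    return created;
}

const fakeRenderer: Renderer<FakeNode> = {
    getLastChild: (n) => n.children[n.children.length - 1] ?? null,
    previousSibling: (n) => {
        if (n.parent === null) {
            return null;
        }
        const index = n.parent.children.indexOf(n);
        return n.parent.children[index - 1] ?? null;
    },
};

// div(0) > span(1), p(2) > button(3)
const fragment = el('div', el('span'), el('p', el('button')));
const parts: StaticPart<FakeNode>[] = [
    { partId: 1, data: { ref: 'first' }, elm: undefined },
    { partId: 3, data: { ref: 'second' }, elm: undefined },
];
mountStaticParts(fragment, parts, fakeRenderer);
console.log(parts.map((part) => part.elm?.tag)); // [ 'span', 'button' ]
```

The trade-off described above shows up directly: the compiled template only has to emit small objects, and the cost is one traversal at mount time.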
    readonly partId: number;
    readonly data: VStaticPartData;
    elm: Element | undefined;
}
Feedback from Dale: can we extract the elm and data from this interface into something shared with the normal VNode/VElement interface?
I looked into this, but I think the types have sufficiently little overlap that it actually makes the TypeScript quite a bit more complex to DRY this out. I tried having some HasData and HasElement and HasNode "mixins" but I think this is actually more cumbersome than just copy-pasting the elm: Element | undefined; and data: ... props.
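A rough illustration of the two options, using simplified and partly hypothetical shapes (`Elm`, `VElementData`, the fields inside `VStaticPartData`, and the `A`/`B` suffixed interface names are all made up for this sketch; the real engine types have more fields):

```typescript
// Stand-in for the DOM Element type so the sketch has no DOM dependency.
type Elm = { tagName: string };

interface VStaticPartData {
    ref?: string;
    on?: Record<string, (event: unknown) => void>;
}

interface VElementData {
    key: string | number;
    attrs?: Record<string, string>;
}

// Option A: shared "mixin" interfaces. Each consumer now needs a type parameter
// and an extends list, even though only two properties are actually shared.
interface HasElement {
    elm: Elm | undefined;
}

interface HasData<D> {
    readonly data: D;
}

interface VStaticPartA extends HasElement, HasData<VStaticPartData> {
    readonly partId: number;
}

interface VElementA extends HasElement, HasData<VElementData> {
    readonly sel: string;
}

// Option B: copy-paste the two props. More repetition, but each interface reads on its own.
interface VStaticPartB {
    readonly partId: number;
    readonly data: VStaticPartData;
    elm: Elm | undefined;
}

// Both options describe the same structure, so values are interchangeable.
const viaMixins: VStaticPartA = { partId: 0, data: { ref: 'foo' }, elm: undefined };
const viaCopyPaste: VStaticPartB = viaMixins;
console.log(viaCopyPaste.partId, viaCopyPaste.data.ref);
```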
@nolanlawson LGTM!
Is there a plan to add perf benchmarks in a future PR? 😄
@jmsjtu I did write a benchmark (nolanlawson@d427867), but I declined to put it in this PR, because I figured it's not worth testing on every commit. The benchmark is hyper-tailored to this particular optimization.
Details
This is step 1 of doing #3624 – optimizing how we render semi-static DOM trees, taking some inspiration from Solid and Lit.
Basically, this optimizes static vnodes beyond the optimizations for event listeners (#3519) and refs (#3550). Rather than just optimizing listeners/refs at the top level, we can deep-traverse into static DOM trees and apply listeners/refs to elements inside.
To do so, I introduce a concept called "static parts," which is similar to what Lit calls template parts. (Total coincidence! I came up with the name before I realized Lit used the same name. 😆 )
Design
Consider a template with many deep lwc:refs:
This is one big static fragment, but we need to be able to traverse inside to set the refs. So we can have an array of static parts, which look like this:
api_static_part takes two arguments:
- partId, which is an integer corresponding to the order of the element inside a depth-first traversal of the tree (starting from the fragment root).
- data bag for that element, which right now is just ref/on (listener). In the future, this could hold many more props, e.g. text content, attributes, etc.
Perf improvement
I wrote a small benchmark to demonstrate the gain (nolanlawson@d427867). (I did not put this benchmark in this PR, because I don't think it's worth running over and over again.)
The improvement is 14-17%:
The template for the component looks like this:
click to see
If you compare the compiled output before and after, you can see that, instead of many small fragments, we have one big fragment with parts.
Before:
click to see
After:
click to see
Does this pull request introduce a breaking change?
Does this pull request introduce an observable change?