-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profoundly improve tree-shaking performance #2119
Conversation
Also to note: I have made sure that in order to re-use an existing rollup AST, all that should be necessary is to call |
😍 This is really awesome work, @lukastaegert! Looking forward to this being merged! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably these comments are already thought through, and won't amount to much.
src/ast/nodes/Program.ts
Outdated
this.included = true; | ||
for (const node of this.body) { | ||
if (node.shouldBeIncluded() && node.include()) { | ||
anotherPassNeeded = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor opt: does this require a break
when it finds at least one true
clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if we break once the first include
states that we need another pass, this will destroy performance. include
is meant to include everything we know we need based on the currently included variables. The return value is basically a flag indicating that new variables have been added and previous includes could possibly include more statements.
However if you break as soon as a variable is added, a new tree-shaking pass will start for each added variable, no matter how local it is. This will result in hundreds of tree-shaking passes instead of maybe a dozen. Even though those passes will on average be twice as fast (due to the early bail out), this will basically stall everything. The current goal is to instead include as much as possible as early as possible to reduce the number of passes. Also for many types of included statements, the side-effect detection can quickly skip over these statements.
However I agree that it is confusing that a method that has side-effects (i.e. including the children) also has a return value. I think I should rework this to instead let the nodes that add variables set a "dirty" flag on the module to indicate another pass is needed.
src/ast/nodes/SequenceExpression.ts
Outdated
this.included = true; | ||
for (let i = 0; i < this.expressions.length - 1; i++) { | ||
const node = this.expressions[i]; | ||
if (node.shouldBeIncluded() && node.includeInBundle()) addedNewNodes = true; | ||
if (node.shouldBeIncluded() && node.include()) anotherPassNeeded = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this loop also can break out early if one condition matches? Likely won't be an issue if this.expressions
is a small array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated above, this is a bad idea.
src/ast/nodes/shared/Node.ts
Outdated
(<GenericEsTreeNode>this)[key] = []; | ||
for (const child of value) { | ||
(<GenericEsTreeNode>this)[key].push( | ||
child && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this mean that there need to be checks for boolean
wherever this[key]
is looped over? Instead, if the if (child)
check is moved out of the push
, won't the array structure remain consistent with the type of the items? Was it already considered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was considered. The situation is e.g. an array pattern with wholes, i.e.
const [,a,,b] = [1, 2, 3, 4]
Acorn will represent this as [null, Identifier, null, Identifier]
. The check will make sure that if the child is null
, then null
will be pushed, otherwise a new Identifier
instance will be pushed. null
is in fact the only non-entity value I am aware of that can occur in Acorn ESTree node array fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps an explicit child !== null
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Changed the node inclusion logic to use a dirty flag directly on the graph instead of passing it up as a return value with each inclusion. Breaks encapsulation a little but makes the code much cleaner! |
src/ast/nodes/index.ts
Outdated
UpdateExpression, | ||
VariableDeclarator, | ||
VariableDeclaration, | ||
WhileStatement, | ||
YieldExpression | ||
}; | ||
|
||
export { NodeType } from './NodeType'; | ||
export default nodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change this to export { nodes as default }
that should avoid the circular dependency issue by ensuring a live binding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried it but it does not seem to work. As the circular dependency is still present (though with the live binding), the modules are executed in an order where the actual node definitions, e.g. ArrayExpression
, are evaluated before the base class NodeBase
is evaluated. As TypeScript then wants to synchronously access NodeBase.prototype
, rollup crashes. But maybe there is another way without passing it explicitly as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is the first access to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the NodeBase.prototype
access in ArrayExpression
:)
In order to improve encapsulation, I currently investigate if instead of attaching the module to each node, I attach a module proxy object with all the interface that the nodes need. That way we could reduce the API surface of Module and Graph but I could also just attach the node constructors there to avoid the circularity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to rearrange the import statements to ensure circular execution happens in the desired order.
I just checked master and we currently have shared/Node.ts
executing before ArrayExpression.ts
, so perhaps somehow this post-order changed in this PR? If so, it should be possible to ensure shared/Node.ts
comes first in the post-order to maintain the correct circular execution order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be but I'm not a big fan. For once, I have to add an exception to ignore circular dependency warnings when building rollup (not a big issue but still). My bigger concern, however, is that new code can always break the execution order again. My new approach, which also improves encapsulation, is to create a special wrapper layer between the module and the AST/scopes/variables that also contains the node constructors. That way, ast nodes never directly access module or graph which means those can be easily refactored as long as the wrapper layer is refactored accordingly. I could even make a few more fields private on Module
.
As it turns out, instead of making things slower this wrapper layer seems to make things slightly faster, probably because all wrappers have the same shape and thus property access can be monomorphic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! This has been so needed, excellent to see things moving forward on JS engine optimization. This also inspires a bunch of stuff I want to try further (if time will ever allow!).
I left a comments out of general interest in the approach, nothing that absolutely needs changing though I don't think.
Edit: actually just the nodeConstructors live binding would be nice to see fixed.
src/ast/nodes/shared/Node.ts
Outdated
constructor( | ||
esTreeNode: GenericEsTreeNode, | ||
// we need to pass down the node constructors to avoid a circular dependency | ||
nodeConstructors: { [p: string]: typeof NodeBase }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the previous comment, it should be possible to ensure this is a live binding by switching the default export form in nodes/index.ts
.
src/ast/nodes/BlockStatement.ts
Outdated
this.scope = scope; | ||
this.initialiseNode(scope); | ||
this.initialiseChildren(scope); | ||
createScope(parentScope: Scope, preventNewScope: boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing preventNewScope
all the way through here, could we rather reverse and simplify the bandwidth of information transfer and just have a check of if (isFunction(this.parent) || isCatch(this.parent) && this.parent.catchClause === this)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not like when child nodes need to access their parent to determine the type. I feel this somehow separates logic specific to the parent node from the parent node which is easily overlooked when refactoring the parent unless you do a full text search.
In understand, though, that two usages is probably not enough to warrant its own constructor argument so I reused a pattern we established for the rendering logic of adding a flag preventChildBlockScope
to functions and catch clauses that is checked by the block statement to determine if it should create a new scope.
src/Graph.ts
Outdated
@@ -62,6 +62,7 @@ export default class Graph { | |||
scope: GlobalScope; | |||
treeshakingOptions: TreeshakingOptions; | |||
varOrConst: 'var' | 'const'; | |||
needsTreeshakingPass: boolean = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of interest, what was the reason for this change? I actually preferred the dirty
state being passed through... I feel like we should try to reduce the surface area of Node assumptions on Module / Graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was more or less inspired by @kgrz 's comments. It just makes the include
functions so much cleaner as
- dirty flags are only added in very few places now (basically identifiers and member expressions), but
- all include functions have to handle the overhead of tracking return values
I totally agree with you that this is bad for the Graph API, though. That was originally the reason why I built it using return values.
One thing I could do is move it to the module. At the moment, there are already a few reasons for nodes to access the module, both to register imports/exports and to access certain flags. Will need to think about this a little.
src/Module.ts
Outdated
@@ -203,7 +214,7 @@ export default class Module { | |||
code: string; | |||
originalCode: string; | |||
originalSourcemap: RawSourceMap; | |||
ast: Program; | |||
ast: any; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely this can remain Program
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the problem is when casting back into this.esTreeAst
, perhaps an <any>
can be used when this.esTreeAst = <any>ast || ...
is assigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed back to Program
src/ast/nodes/ImportDeclaration.ts
Outdated
|
||
initialise() { | ||
this.included = false; | ||
this.module.addImport(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much nicer!
|
||
bind() { | ||
this.isBound = true; | ||
if (this.bound) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Although ideally wouldn't we ensure that bind()
in general is visit-once? I'd be interested to know why we might duplicate call bind, what the logic reasons are behind that? (I know this has been here a while, just thinking about it now...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that if another node calls reassignPath
or forEachReturnExpressionAtPath
, it is possible that those methods are called on this node before it has been bound which means if this represents a namespace, the variable will not yet be assigned. This is hard to control as both methods are forwarded through variable assignments.
Another reason I would leave this in place is that, if time permits, one of my next goals is to inline binding into the actual tree-shaking so that
- unused nodes will not be bound, and
- in case unused code mutates included variables, these mutations will no longer taint the tree-shaking result and, at least in some scenarios, more code can be removed.
My expectation is that this will provide another slight performance boost.
Btw Identifier
s follow a similar logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it helps a lot to get a better idea of how this logic works. I guess it could be avoided by a multi-pass process, but this does sound more optimized.
src/ast/nodes/shared/Node.ts
Outdated
this.parseNode(esTreeNode, nodeConstructors); | ||
this.initialise(); | ||
this.module.magicString.addSourcemapLocation(this.start); | ||
this.module.magicString.addSourcemapLocation(this.end); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps set this.included = false
here rather than in each implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, this would not make a difference. However I actively decided to reset included
on each node individually so that if we want to reuse an existing AST, we just need to call initialise
on each node to get it back to its state before tree-shaking.
If you feel this is a premature change as this logic is currently not used, I can of course move this here.
[name: string]: string[]; | ||
} = { | ||
Program: ['body'], | ||
Literal: [] | ||
}; | ||
|
||
export default keys; | ||
export function getAndCreateKeys(esTreeNode: GenericEsTreeNode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps fully encapsulate the implementation here as getOrCreateKeys
where this does the check of return keys[esTreeNode.type] || keys[esTreeNode.type] = ...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be more elegant but, as Node
construction is one of the most performance critical tasks, I wanted to prevent the function call if possible. The checking logic is still only performed once, albeit in Node.parseNode
.
src/ast/nodes/shared/Node.ts
Outdated
(<GenericEsTreeNode>this)[key] = []; | ||
for (const child of value) { | ||
(<GenericEsTreeNode>this)[key].push( | ||
child && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps an explicit child !== null
then?
} | ||
} else if (value.hasEffects(options)) return true; | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be interested to know how much inlining these loops itself affected performance. Reason being it might be nice to have centrailzed visitor pattern at some point that is highly optimized, and this would effectively be moving away from that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is one of the ugliest parts of this refactoring as we basically have the same loop four times with only slight variations (five times if you count parseNode
but this method also does a lot more). The performance gains, however, were pretty substantial. Of the 50% less tree-shaking time, at least 15-20% can be accounted to these loops.
Unless we change everything fundamentally (and I thought about this many times already; did not come up with a good approach yet which I would expect to provide an actual performance-improvement due to the zoo of different node effects—run, called, accessed, assigned + circularity prevention mechanisms for each—that need to be taken into account), I think this is close to the best we can do. Even a magically better optimized visitor capable of execting arbitrary methods on nodes would probably still be callback based and thus need two nested function calls to e.g. call bind
on a child + the additional call to the visitor itself. I think this is as flat as it can get in the current pattern. Incidentally, call stacks are also more than halved with this approach which is not a paradigm shift but at least should not hurt.
As all loops of this kind are in Node
, it should not be difficult to change them together if we find a better approach in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the stack savings alone seem worth inlining these loops I suppose, even if it means forgoing the visitor pattern.
Btw I did experiment with a flat for-based visitor pattern on a performance branch previously to avoid stack calls and there was an absolutely negligible performance change, so that might even void the argument that a unified visitor could be more optimized.
daaa374
to
2127cc0
Compare
Rebased to the new typings now and also finally updated the source-map dependency to its latest version. I hoped this would also provide a performance boost but it seems performance did not change very much with this update. |
Had to undo the source-map dependency update as the new version seems to contain a WebAssembly reference that is incompatible with older node versions. |
0346f53
to
be8b955
Compare
Re source map, see also #2055. Perhaps it's indeed worth just removing the |
Ah, thanks! Once I stumbled upon this it sounded somewhat familiar but I could not remember what the context was. Yes, we should definitely do that! |
return this.someRelevantBranch(node => | ||
node.someReturnExpressionWhenCalledAtPath(path, callOptions, predicateFunction, options) | ||
); | ||
return this.hasUnknownLeftValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I see it is fully idempotent, not to worry then.
|
||
bind() { | ||
this.isBound = true; | ||
if (this.bound) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it helps a lot to get a better idea of how this logic works. I guess it could be avoided by a multi-pass process, but this does sound more optimized.
} | ||
} else if (value.hasEffects(options)) return true; | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the stack savings alone seem worth inlining these loops I suppose, even if it means forgoing the visitor pattern.
Btw I did experiment with a flat for-based visitor pattern on a performance branch previously to avoid stack calls and there was an absolutely negligible performance change, so that might even void the argument that a unified visitor could be more optimized.
* Slightly more efficient algorithm to find included children
should be included but include missing parents when including a variable
* Do not ignore side-effects in if-statement condition
imports with their module
but let those nodes register themselves on the module
the future, we can just call "initialise" on all nodes to reuse a module
a WebAssembly reference that is incompatible with node 6 and earlier
be8b955
to
f942d6e
Compare
This is in response to #2050 and, apart from switching from prototype manipulations to AST re-creation, also includes many other improvements to make tree-shaking faster, see below. There is still much more I want to do but the diff was getting large and I am also not so sure how much time I will have on my hands over the next weeks to keep on working on this.
First, some measurements. This is when using rollup to build a modified (older) version of rollup that also imports (unused) ramda to have a little more dead code in our code in our code base :). The base line is current master.
Things to note:
Overall performance has not improved as much (as a percentile) due to the fact that the majority of the time is still spend within plugins, mostly TypeScript.
Detailed list of implemented improvements:
eachChild
andsomeChild
no longer exist but are done ad-hoc where needed which gives a noticeable speed boost.includeInBundle
andincludeVariable
are now bothinclude
.include
only returns true if new variables have been added and not if just any node has been added. This usually makes the last tree-shaking pass unnecessary (which was unnecessary anyway).Program
node now works like any other node and gets an included flag if and only if at least one statement has been included. This is not actually used at the moment but could be used to prevent emitting empty files when preserving modules.There is still much more I want to implement in the future to make things even faster but for now, I hope to get this into master soon first.