
Profoundly improve tree-shaking performance #2119

Merged
merged 23 commits on Apr 16, 2018

Conversation

lukastaegert
Member

This is in response to #2050 and, apart from switching from prototype manipulations to AST re-creation, also includes many other improvements to make tree-shaking faster; see below. There is still much more I want to do, but the diff was getting large and I am not sure how much time I will have over the next few weeks to keep working on this.

First, some measurements. These were taken when using rollup to build a modified (older) version of rollup that also imports (unused) ramda to have a little more dead code in our code base :). The baseline is current master.

[Screenshot: performance comparison of build phases against master, 2018-04-11]

Things to note:

  • The fact that "generate ast" is faster is probably random noise
  • "analyse ast" is slightly slower. This is where the actual re-creation of the AST happens by copying all attributes from acorn's nodes. Scopes are also created and populated here.
  • "analyse dependency graph" is slightly faster as identifier binding has been slightly streamlined. In the future, I hope to move binding into the tree-shaking phase.
  • And lastly, tree-shaking is now twice as fast!

Overall performance has not improved by as large a percentage because the majority of the time is still spent within plugins, mostly TypeScript.

Detailed list of implemented improvements:

  • The AST is re-created instead of switching the prototypes. This alone already provided about 15% improvement to tree-shaking despite the higher initialisation cost.
  • Node initialisation has been moved into the constructors. To perform custom initialisation, nodes can override
    • Node.initialise to perform extra initialisation, e.g. declare variables, or
    • Node.parseNode, which is responsible for adding the child nodes to this node
  • (Dynamic and static) imports and exports now register themselves on the module during initialisation which makes another recursion over the AST unnecessary
  • Many loops have been streamlined and inlined. eachChild and someChild no longer exist; their work is now done ad hoc where needed, which gives a noticeable speed boost.
  • Some method names have been improved. includeInBundle and includeVariable are now both include.
  • include only returns true if new variables have been added, not whenever any node has been added. This usually makes the final tree-shaking pass unnecessary (it never included anything anyway).
  • When including declarations for a variable, all parents are marked as included right away. That way, we no longer need to check all children to see if a node has an included child.
  • If-statement handling of hoisted variables in dead branches has been greatly simplified
  • The Program node now works like any other node and gets an included flag if and only if at least one statement has been included. This is not actually used at the moment but could be used to prevent emitting empty files when preserving modules.
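The constructor-driven initialisation described above can be sketched roughly as follows. This is a simplified illustration, not rollup's actual classes; `registeredImports` is an invented stand-in for the module registration mentioned in the imports/exports bullet.

```typescript
// Minimal sketch of the node initialisation pattern (invented names).
interface EsTreeNode {
  type: string;
  [key: string]: any;
}

// Stand-in for this.module.addImport(this) in the real code.
const registeredImports: NodeBase[] = [];

class NodeBase {
  type: string;
  included = false;

  constructor(esTreeNode: EsTreeNode) {
    this.type = esTreeNode.type;
    // Child nodes are attached here; most nodes use the generic version.
    this.parseNode(esTreeNode);
    // Custom per-node setup (e.g. declaring variables) happens here.
    this.initialise();
  }

  parseNode(_esTreeNode: EsTreeNode): void {
    // the real implementation copies acorn attributes and instantiates
    // child node classes; omitted in this sketch
  }

  initialise(): void {
    this.included = false;
  }
}

class ImportDeclaration extends NodeBase {
  initialise(): void {
    super.initialise();
    // imports register themselves during initialisation, so no extra
    // AST traversal is needed to collect them
    registeredImports.push(this);
  }
}
```

Because both hooks run from the base constructor, re-running initialise on every node is enough to reset an existing AST, which is what the later comment about watch mode relies on.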

There is still much more I want to implement in the future to make things even faster but for now, I hope to get this into master soon first.

@lukastaegert
Member Author

Also note: I have made sure that, in order to re-use an existing rollup AST, all that should be necessary is to call initialise on all nodes. This could be used to make watch mode faster. It is not done yet, though, as I was becoming a little overwhelmed by the scope of this PR, and re-using the rollup AST would probably require some adjustments to the overall API.

@mourner
Contributor

mourner commented Apr 11, 2018

😍 This is really awesome work, @lukastaegert! Looking forward to this being merged!

@kgrz left a comment

Probably these comments are already thought through, and won't amount to much.

this.included = true;
for (const node of this.body) {
if (node.shouldBeIncluded() && node.include()) {
anotherPassNeeded = true;

minor opt: does this require a break when it finds at least one true clause?

Member Author

No, if we break once the first include states that we need another pass, this will destroy performance. include is meant to include everything we know we need based on the currently included variables. The return value is basically a flag indicating that new variables have been added and previous includes could possibly include more statements.

However, if we break as soon as a variable is added, a new tree-shaking pass will start for each added variable, no matter how local it is. This would result in hundreds of tree-shaking passes instead of maybe a dozen. Even though those passes would on average be twice as fast (due to the early bail-out), this would basically stall everything. The current goal is instead to include as much as possible as early as possible to reduce the number of passes. Also, for many types of included statements, the side-effect detection can quickly skip over them.

However I agree that it is confusing that a method that has side-effects (i.e. including the children) also has a return value. I think I should rework this to instead let the nodes that add variables set a "dirty" flag on the module to indicate another pass is needed.
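The pass structure being discussed can be sketched as follows. The flag name `needsTreeshakingPass` comes from the diff later in this review; everything else here is invented and heavily simplified (the real shouldBeIncluded/include logic considers side effects, not just an included flag).

```typescript
// Sketch of multi-pass inclusion with a "dirty" flag on the graph.
class Graph {
  needsTreeshakingPass = false;
}

class Module {
  constructor(private graph: Graph, private statements: Statement[]) {}

  includeStatements(): void {
    for (const statement of this.statements) {
      // no early break: each pass includes as much as possible
      if (statement.shouldBeIncluded()) statement.include(this.graph);
    }
  }
}

class Statement {
  included = false;

  constructor(private addsNewVariables: boolean) {}

  shouldBeIncluded(): boolean {
    return !this.included;
  }

  include(graph: Graph): void {
    this.included = true;
    // Only nodes that actually add new variables mark the graph dirty;
    // that is what triggers another full pass.
    if (this.addsNewVariables) graph.needsTreeshakingPass = true;
  }
}

function treeShake(graph: Graph, modules: Module[]): number {
  let passes = 0;
  do {
    passes++;
    graph.needsTreeshakingPass = false;
    for (const module of modules) module.includeStatements();
  } while (graph.needsTreeshakingPass);
  return passes;
}
```

Breaking out of the inner loop on the first dirty result would restart a full pass per added variable; letting every pass run to completion keeps the pass count small.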

this.included = true;
for (let i = 0; i < this.expressions.length - 1; i++) {
const node = this.expressions[i];
- if (node.shouldBeIncluded() && node.includeInBundle()) addedNewNodes = true;
+ if (node.shouldBeIncluded() && node.include()) anotherPassNeeded = true;

Perhaps this loop also can break out early if one condition matches? Likely won't be an issue if this.expressions is a small array.

Member Author

As stated above, this is a bad idea.

(<GenericEsTreeNode>this)[key] = [];
for (const child of value) {
(<GenericEsTreeNode>this)[key].push(
child &&

Won't this mean that there need to be checks for boolean wherever this[key] is looped over? Instead, if the if (child) check is moved out of the push, won't the array structure remain consistent with the type of the items? Was it already considered?

Member Author

This was considered. The situation is e.g. an array pattern with holes, i.e.

const [,a,,b] = [1, 2, 3, 4]

Acorn will represent this as [null, Identifier, null, Identifier]. The check will make sure that if the child is null, then null will be pushed, otherwise a new Identifier instance will be pushed. null is in fact the only non-entity value I am aware of that can occur in Acorn ESTree node array fields.
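The null-preserving child handling under discussion might look roughly like this. It is a standalone sketch with invented names, not the actual parseNode code:

```typescript
// Acorn represents holes in patterns like `const [, a, , b] = ...` as
// null entries, so null children must be preserved as null to keep the
// array shape intact.
interface EsTreeNode {
  type: string;
  [key: string]: any;
}

function parseChildArray(
  value: (EsTreeNode | null)[],
  instantiate: (child: EsTreeNode) => object
): (object | null)[] {
  const children: (object | null)[] = [];
  for (const child of value) {
    // an explicit !== null (rather than a truthiness check) makes the
    // intent clear: holes stay holes, everything else becomes a node
    children.push(child !== null ? instantiate(child) : null);
  }
  return children;
}
```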

Contributor

Perhaps an explicit child !== null then?

Member Author

Done

@lukastaegert
Member Author

Changed the node inclusion logic to use a dirty flag directly on the graph instead of passing it up as a return value with each inclusion. Breaks encapsulation a little but makes the code much cleaner!

UpdateExpression,
VariableDeclarator,
VariableDeclaration,
WhileStatement,
YieldExpression
};

export { NodeType } from './NodeType';
export default nodes;
Contributor

If you change this to export { nodes as default } that should avoid the circular dependency issue by ensuring a live binding.
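For illustration, here is the semantic difference the suggestion relies on, modelled with getter-based namespace objects (the technique bundlers themselves use to emulate live bindings; all names here are invented): `export default nodes` exports a snapshot of the value at evaluation time, while `export { nodes as default }` exports a live binding.

```typescript
// The exporting module's binding:
let nodes: Record<string, string> = {};

// `export default nodes` — the value is copied once at evaluation time:
const snapshotNamespace = { default: nodes };

// `export { nodes as default }` — a getter re-reads the binding on every
// access, which is how live bindings behave:
const liveNamespace = {
  get default() {
    return nodes;
  }
};

// Later (e.g. after a circular import has already grabbed the namespace)
// the module reassigns its binding:
nodes = { ArrayExpression: 'ArrayExpression' };
```

With the live-binding form, an importer in a circular dependency that reads the export after evaluation completes sees the final value; with the snapshot form it keeps the stale one.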

Member Author

I just tried it but it does not seem to work. As the circular dependency is still present (though with the live binding), the modules are executed in an order where the actual node definitions, e.g. ArrayExpression, are evaluated before the base class NodeBase is evaluated. As TypeScript then wants to synchronously access NodeBase.prototype, rollup crashes. But maybe there is another way without passing it explicitly as a parameter.

Contributor

Which is the first access to fail?

Member Author

It is the NodeBase.prototype access in ArrayExpression :)

In order to improve encapsulation, I am currently investigating whether, instead of attaching the module to each node, I should attach a module proxy object exposing all the interface that the nodes need. That way we could reduce the API surface of Module and Graph, but I could also just attach the node constructors there to avoid the circularity.

Contributor

It should be possible to rearrange the import statements to ensure circular execution happens in the desired order.

I just checked master and we currently have shared/Node.ts executing before ArrayExpression.ts, so perhaps somehow this post-order changed in this PR? If so, it should be possible to ensure shared/Node.ts comes first in the post-order to maintain the correct circular execution order.

Member Author

It might be, but I'm not a big fan. For one, I would have to add an exception to ignore circular dependency warnings when building rollup (not a big issue, but still). My bigger concern, however, is that new code can always break the execution order again. My new approach, which also improves encapsulation, is to create a special wrapper layer between the module and the AST/scopes/variables that also contains the node constructors. That way, AST nodes never directly access module or graph, which means those can be easily refactored as long as the wrapper layer is refactored accordingly. I could even make a few more fields private on Module.

As it turns out, instead of making things slower this wrapper layer seems to make things slightly faster, probably because all wrappers have the same shape and thus property access can be monomorphic.

Contributor

@guybedford left a comment

Great work! This has been so needed, excellent to see things moving forward on JS engine optimization. This also inspires a bunch of stuff I want to try further (if time will ever allow!).

I left a few comments out of general interest in the approach; nothing that absolutely needs changing, I don't think.

Edit: actually just the nodeConstructors live binding would be nice to see fixed.

constructor(
esTreeNode: GenericEsTreeNode,
// we need to pass down the node constructors to avoid a circular dependency
nodeConstructors: { [p: string]: typeof NodeBase },
Contributor

As mentioned in the previous comment, it should be possible to ensure this is a live binding by switching the default export form in nodes/index.ts.

this.scope = scope;
this.initialiseNode(scope);
this.initialiseChildren(scope);
createScope(parentScope: Scope, preventNewScope: boolean) {
Contributor

Instead of passing preventNewScope all the way through here, could we rather reverse and simplify the bandwidth of information transfer and just have a check of if (isFunction(this.parent) || isCatch(this.parent) && this.parent.catchClause === this)?

Member Author

I do not like it when child nodes need to access their parent to determine its type. I feel this separates logic specific to the parent node from the parent node itself, which is easily overlooked when refactoring the parent unless you do a full-text search.

I understand, though, that two usages are probably not enough to warrant a dedicated constructor argument, so I reused a pattern we established for the rendering logic: adding a flag preventChildBlockScope to functions and catch clauses that is checked by the block statement to determine whether it should create a new scope.
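The flag pattern described here can be sketched as follows (simplified, invented names; the real nodes carry much more state):

```typescript
// Function-like parents set preventChildBlockScope; the child block
// statement checks it when creating its scope, so the parent-specific
// knowledge stays with the parent.
class Scope {
  constructor(readonly parent: Scope | null = null) {}
}

interface BlockParent {
  preventChildBlockScope: boolean;
}

class BlockStatement {
  scope!: Scope;

  constructor(private parent: BlockParent) {}

  createScope(parentScope: Scope): void {
    // a function body or catch block reuses the parent scope (where the
    // parameters live) instead of nesting a fresh block scope
    this.scope = this.parent.preventChildBlockScope
      ? parentScope
      : new Scope(parentScope);
  }
}
```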

src/Graph.ts Outdated
@@ -62,6 +62,7 @@ export default class Graph {
scope: GlobalScope;
treeshakingOptions: TreeshakingOptions;
varOrConst: 'var' | 'const';
needsTreeshakingPass: boolean = false;
Contributor

Out of interest, what was the reason for this change? I actually preferred the dirty state being passed through... I feel like we should try to reduce the surface area of Node assumptions on Module / Graph.

Member Author

This was more or less inspired by @kgrz's comments. It just makes the include functions so much cleaner:

  • dirty flags are now only set in very few places (basically identifiers and member expressions), whereas
  • previously, all include functions had to handle the overhead of tracking return values.

I totally agree with you that this is bad for the Graph API, though. That was originally the reason why I built it using return values.

One thing I could do is move it to the module. At the moment, there are already a few reasons for nodes to access the module, both to register imports/exports and to access certain flags. Will need to think about this a little.

src/Module.ts Outdated
@@ -203,7 +214,7 @@ export default class Module {
code: string;
originalCode: string;
originalSourcemap: RawSourceMap;
- ast: Program;
+ ast: any;
Contributor

Surely this can remain Program?

Contributor

If the problem is when casting back into this.esTreeAst, perhaps an <any> can be used when this.esTreeAst = <any>ast || ... is assigned.

Member Author

Changed back to Program


initialise() {
this.included = false;
this.module.addImport(this);
Contributor

Much nicer!


bind() {
this.isBound = true;
if (this.bound) return;
Contributor

Nice! Although ideally wouldn't we ensure that bind() in general is visit-once? I'd be interested to know why we might duplicate call bind, what the logic reasons are behind that? (I know this has been here a while, just thinking about it now...)

Member Author

The problem is that if another node calls reassignPath or forEachReturnExpressionAtPath, it is possible that those methods are called on this node before it has been bound which means if this represents a namespace, the variable will not yet be assigned. This is hard to control as both methods are forwarded through variable assignments.

Another reason I would leave this in place is that, if time permits, one of my next goals is to inline binding into the actual tree-shaking so that

  • unused nodes will not be bound, and
  • in case unused code mutates included variables, these mutations will no longer taint the tree-shaking result and, at least in some scenarios, more code can be removed.

My expectation is that this will provide another slight performance boost.

Btw Identifiers follow a similar logic.

Contributor

Thanks, it helps a lot to get a better idea of how this logic works. I guess it could be avoided by a multi-pass process, but this does sound more optimized.

this.parseNode(esTreeNode, nodeConstructors);
this.initialise();
this.module.magicString.addSourcemapLocation(this.start);
this.module.magicString.addSourcemapLocation(this.end);
Contributor

Perhaps set this.included = false here rather than in each implementation.

Member Author

At the moment, this would not make a difference. However, I deliberately decided to reset included on each node individually so that, if we ever want to reuse an existing AST, we just need to call initialise on each node to reset it to its state before tree-shaking.

If you feel this is a premature change as this logic is currently not used, I can of course move this here.

[name: string]: string[];
} = {
Program: ['body'],
Literal: []
};

export default keys;
export function getAndCreateKeys(esTreeNode: GenericEsTreeNode) {
Contributor

Perhaps fully encapsulate the implementation here as getOrCreateKeys where this does the check of return keys[esTreeNode.type] || keys[esTreeNode.type] = ....

Member Author

This would be more elegant but, as node construction is one of the most performance-critical tasks, I wanted to avoid the function call where possible. The checking logic is still only performed once, albeit in Node.parseNode.
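A minimal sketch of this cache and the inlined check (the `typeof` filter is an invented stand-in; the real logic is more selective about which properties count as child keys):

```typescript
// Per-type child-key lists, computed once per node type and cached.
interface EsTreeNode {
  type: string;
  [key: string]: any;
}

const keys: { [type: string]: string[] } = {
  Program: ['body'],
  Literal: []
};

function getAndCreateKeys(esTreeNode: EsTreeNode): string[] {
  keys[esTreeNode.type] = Object.keys(esTreeNode).filter(
    key => typeof esTreeNode[key] === 'object'
  );
  return keys[esTreeNode.type];
}

// Call sites inline the cache check so the hot path pays no function
// call once a type has been seen:
function childKeysFor(node: EsTreeNode): string[] {
  return keys[node.type] || getAndCreateKeys(node);
}
```

Wrapping the check into a `getOrCreateKeys` helper (as suggested) would be cleaner; the inlined form just trades that elegance for one fewer call on the hot path.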

}
} else if (value.hasEffects(options)) return true;
}
return false;
Contributor

I'd be interested to know how much inlining these loops itself affected performance. Reason being, it might be nice to have a centralized, highly optimized visitor pattern at some point, and this would effectively be moving away from that.

Member Author

I know this is one of the ugliest parts of this refactoring, as we basically have the same loop four times with only slight variations (five times if you count parseNode, but that method also does a lot more). The performance gains, however, were pretty substantial: of the 50% reduction in tree-shaking time, at least 15-20% can be attributed to these loops.

Unless we change everything fundamentally (and I have thought about this many times already; I have not yet come up with a good approach that I would expect to provide an actual performance improvement, due to the zoo of different node effects that need to be taken into account: run, called, accessed and assigned, each with its own circularity-prevention mechanism), I think this is close to the best we can do. Even a magically better optimized visitor capable of executing arbitrary methods on nodes would probably still be callback-based and thus need two nested function calls, e.g. one to call bind on a child plus the additional call to the visitor itself. I think this is as flat as it can get in the current pattern. Incidentally, call stacks are also more than halved with this approach, which is not a paradigm shift but at least should not hurt.

As all loops of this kind are in Node, it should not be difficult to change them together if we find a better approach in the future.
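The trade-off under discussion can be illustrated as follows (invented names; the real methods also iterate per-type key lists and handle several effect kinds):

```typescript
interface ChildNode {
  hasEffects(options: object): boolean;
}

// Generic helper in the removed eachChild/someChild style: every child
// visit pays for an extra callback frame.
function someChild(
  children: (ChildNode | null)[],
  callback: (child: ChildNode) => boolean
): boolean {
  for (const child of children) {
    if (child !== null && callback(child)) return true;
  }
  return false;
}

// Inlined variant: one direct method call per child, roughly halving
// the stack depth of each traversal.
function hasEffects(children: (ChildNode | null)[], options: object): boolean {
  for (const child of children) {
    if (child !== null && child.hasEffects(options)) return true;
  }
  return false;
}
```

Both return the same results; the inlined form simply drops the callback indirection on one of the hottest paths.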

Contributor

Yes the stack savings alone seem worth inlining these loops I suppose, even if it means forgoing the visitor pattern.

Btw I did experiment with a flat for-based visitor pattern on a performance branch previously to avoid stack calls and there was an absolutely negligible performance change, so that might even void the argument that a unified visitor could be more optimized.

@lukastaegert
Member Author

Rebased to the new typings now and also finally updated the source-map dependency to its latest version. I hoped this would also provide a performance boost but it seems performance did not change very much with this update.

@lukastaegert
Member Author

Had to undo the source-map dependency update as the new version seems to contain a WebAssembly reference that is incompatible with older node versions.

@lukastaegert force-pushed the preserve-hidden-classes branch 2 times, most recently from 0346f53 to be8b955 on April 16, 2018
@mourner
Contributor

mourner commented Apr 16, 2018

Re source map, see also #2055. Perhaps it's indeed worth just removing the source-map dependency fully in favor of just 20 lines of code.

@lukastaegert
Member Author

Ah, thanks! Once I stumbled upon this it sounded somewhat familiar but I could not remember what the context was. Yes, we should definitely do that!

return this.someRelevantBranch(node =>
node.someReturnExpressionWhenCalledAtPath(path, callOptions, predicateFunction, options)
);
return this.hasUnknownLeftValue
Contributor

Ahh I see it is fully idempotent, not to worry then.



@lukastaegert added this to the 0.58.0 milestone on Apr 16, 2018