
Profoundly improve tree-shaking performance #2119

Merged
merged 23 commits on Apr 16, 2018

Conversation

lukastaegert
Member

This is in response to #2050 and, apart from switching from prototype manipulations to AST re-creation, also includes many other improvements to make tree-shaking faster; see below. There is still much more I want to do, but the diff was getting large and I am not sure how much time I will have over the next few weeks to keep working on this.

First, some measurements. These were taken when using rollup to build a modified (older) version of rollup that also imports (unused) ramda to have a little more dead code in our code base :). The baseline is current master.

[Screenshot: performance comparison of build phases against master, 2018-04-11]

Things to note:

  • The fact that "generate ast" is faster is probably random noise
  • "analyse ast" is slightly slower. This is where the actual re-creation of the AST happens by copying all attributes from acorn's nodes. Scopes are also created and populated here.
  • "analyse dependency graph" is slightly faster as identifier binding has been slightly streamlined. In the future, I hope to move binding into the tree-shaking phase.
  • And lastly, tree-shaking is now twice as fast!

Overall performance has not improved by as large a percentage because the majority of the time is still spent within plugins, mostly TypeScript.

Detailed list of implemented improvements:

  • The AST is re-created instead of switching the prototypes. This alone already provided about 15% improvement to tree-shaking despite the higher initialisation cost.
  • Node initialisation has been moved into the constructors. To perform custom initialisation, nodes can override
    • Node.initialise to perform extra initialisation, e.g. declare variables, or
    • Node.parseNode, which is responsible for adding the child nodes to this node
  • (Dynamic and static) imports and exports now register themselves on the module during initialisation which makes another recursion over the AST unnecessary
  • Many loops have been streamlined and inlined. eachChild and someChild no longer exist; their work is now done ad hoc where needed, which gives a noticeable speed boost.
  • Some method names have been improved. includeInBundle and includeVariable are now both include.
  • include only returns true if new variables have been added, not whenever any node has been added. This usually makes the final tree-shaking pass unnecessary (it never included anything anyway).
  • When including declarations for a variable, all parents are marked as included right away. That way, we no longer need to check all children to see if a node has an included child.
  • If-statement handling of hoisted variables in dead branches has been greatly simplified
  • The Program node now works like any other node and gets an included flag if and only if at least one statement has been included. This is not actually used at the moment but could be used to prevent emitting empty files when preserving modules.
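The constructor-driven initialisation described above can be sketched roughly as follows. This is a simplified illustration, not rollup's actual classes; `registeredImports` is an invented stand-in for the module registration mentioned in the imports/exports bullet.

```typescript
// Minimal sketch of the node initialisation pattern (invented names).
interface EsTreeNode {
  type: string;
  [key: string]: any;
}

// Stand-in for this.module.addImport(this) in the real code.
const registeredImports: NodeBase[] = [];

class NodeBase {
  type: string;
  included = false;

  constructor(esTreeNode: EsTreeNode) {
    this.type = esTreeNode.type;
    // Child nodes are attached here; most nodes use the generic version.
    this.parseNode(esTreeNode);
    // Custom per-node setup (e.g. declaring variables) happens here.
    this.initialise();
  }

  parseNode(_esTreeNode: EsTreeNode): void {
    // the real implementation copies acorn attributes and instantiates
    // child node classes; omitted in this sketch
  }

  initialise(): void {
    this.included = false;
  }
}

class ImportDeclaration extends NodeBase {
  initialise(): void {
    super.initialise();
    // imports register themselves during initialisation, so no extra
    // AST traversal is needed to collect them
    registeredImports.push(this);
  }
}
```

Because both hooks run from the base constructor, re-running initialise on every node is enough to reset an existing AST, which is what the later comment about watch mode relies on.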

There is still much more I want to implement in the future to make things even faster but for now, I hope to get this into master soon first.

@lukastaegert
Member Author

Also note: I have made sure that, in order to re-use an existing rollup AST, all that should be necessary is to call initialise on all nodes. This could be used to make watch mode faster. It is not done yet, though, as I was becoming a little overwhelmed by the scope of this PR, and re-using the rollup AST would probably require some adjustments to the overall API.

@mourner
Contributor

mourner commented Apr 11, 2018

😍 This is really awesome work, @lukastaegert! Looking forward to this being merged!

@kgrz left a comment

Probably these comments are already thought through, and won't amount to much.

this.included = true;
for (const node of this.body) {
if (node.shouldBeIncluded() && node.include()) {
anotherPassNeeded = true;

minor opt: does this require a break when it finds at least one true clause?

Member Author

No, if we break once the first include states that we need another pass, this will destroy performance. include is meant to include everything we know we need based on the currently included variables. The return value is basically a flag indicating that new variables have been added and previous includes could possibly include more statements.

However, if we break as soon as a variable is added, a new tree-shaking pass will start for each added variable, no matter how local it is. This would result in hundreds of tree-shaking passes instead of maybe a dozen. Even though those passes would on average be twice as fast (due to the early bail-out), this would basically stall everything. The current goal is instead to include as much as possible as early as possible to reduce the number of passes. Also, for many types of included statements, the side-effect detection can quickly skip over them.

However I agree that it is confusing that a method that has side-effects (i.e. including the children) also has a return value. I think I should rework this to instead let the nodes that add variables set a "dirty" flag on the module to indicate another pass is needed.
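The pass structure being discussed can be sketched as follows. The flag name `needsTreeshakingPass` comes from the diff later in this review; everything else here is invented and heavily simplified (the real shouldBeIncluded/include logic considers side effects, not just an included flag).

```typescript
// Sketch of multi-pass inclusion with a "dirty" flag on the graph.
class Graph {
  needsTreeshakingPass = false;
}

class Module {
  constructor(private graph: Graph, private statements: Statement[]) {}

  includeStatements(): void {
    for (const statement of this.statements) {
      // no early break: each pass includes as much as possible
      if (statement.shouldBeIncluded()) statement.include(this.graph);
    }
  }
}

class Statement {
  included = false;

  constructor(private addsNewVariables: boolean) {}

  shouldBeIncluded(): boolean {
    return !this.included;
  }

  include(graph: Graph): void {
    this.included = true;
    // Only nodes that actually add new variables mark the graph dirty;
    // that is what triggers another full pass.
    if (this.addsNewVariables) graph.needsTreeshakingPass = true;
  }
}

function treeShake(graph: Graph, modules: Module[]): number {
  let passes = 0;
  do {
    passes++;
    graph.needsTreeshakingPass = false;
    for (const module of modules) module.includeStatements();
  } while (graph.needsTreeshakingPass);
  return passes;
}
```

Breaking out of the inner loop on the first dirty result would restart a full pass per added variable; letting every pass run to completion keeps the pass count small.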

this.included = true;
for (let i = 0; i < this.expressions.length - 1; i++) {
const node = this.expressions[i];
- if (node.shouldBeIncluded() && node.includeInBundle()) addedNewNodes = true;
+ if (node.shouldBeIncluded() && node.include()) anotherPassNeeded = true;

Perhaps this loop also can break out early if one condition matches? Likely won't be an issue if this.expressions is a small array.

Member Author

As stated above, this is a bad idea.

(<GenericEsTreeNode>this)[key] = [];
for (const child of value) {
(<GenericEsTreeNode>this)[key].push(
child &&

Won't this mean that there need to be checks for boolean wherever this[key] is looped over? Instead, if the if (child) check is moved out of the push, won't the array structure remain consistent with the type of the items? Was it already considered?

Member Author

This was considered. The situation is e.g. an array pattern with holes, i.e.

const [,a,,b] = [1, 2, 3, 4]

Acorn will represent this as [null, Identifier, null, Identifier]. The check will make sure that if the child is null, then null will be pushed, otherwise a new Identifier instance will be pushed. null is in fact the only non-entity value I am aware of that can occur in Acorn ESTree node array fields.
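The null-preserving child handling under discussion might look roughly like this. It is a standalone sketch with invented names, not the actual parseNode code:

```typescript
// Acorn represents holes in patterns like `const [, a, , b] = ...` as
// null entries, so null children must be preserved as null to keep the
// array shape intact.
interface EsTreeNode {
  type: string;
  [key: string]: any;
}

function parseChildArray(
  value: (EsTreeNode | null)[],
  instantiate: (child: EsTreeNode) => object
): (object | null)[] {
  const children: (object | null)[] = [];
  for (const child of value) {
    // an explicit !== null (rather than a truthiness check) makes the
    // intent clear: holes stay holes, everything else becomes a node
    children.push(child !== null ? instantiate(child) : null);
  }
  return children;
}
```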

Contributor

Perhaps an explicit child !== null then?

Member Author

Done

@lukastaegert
Member Author

Changed the node inclusion logic to use a dirty flag directly on the graph instead of passing it up as a return value with each inclusion. Breaks encapsulation a little but makes the code much cleaner!

UpdateExpression,
VariableDeclarator,
VariableDeclaration,
WhileStatement,
YieldExpression
};

export { NodeType } from './NodeType';
export default nodes;
Contributor

If you change this to export { nodes as default } that should avoid the circular dependency issue by ensuring a live binding.
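For illustration, here is the semantic difference the suggestion relies on, modelled with getter-based namespace objects (the technique bundlers themselves use to emulate live bindings; all names here are invented): `export default nodes` exports a snapshot of the value at evaluation time, while `export { nodes as default }` exports a live binding.

```typescript
// The exporting module's binding:
let nodes: Record<string, string> = {};

// `export default nodes` — the value is copied once at evaluation time:
const snapshotNamespace = { default: nodes };

// `export { nodes as default }` — a getter re-reads the binding on every
// access, which is how live bindings behave:
const liveNamespace = {
  get default() {
    return nodes;
  }
};

// Later (e.g. after a circular import has already grabbed the namespace)
// the module reassigns its binding:
nodes = { ArrayExpression: 'ArrayExpression' };
```

With the live-binding form, an importer in a circular dependency that reads the export after evaluation completes sees the final value; with the snapshot form it keeps the stale one.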

Member Author

I just tried it but it does not seem to work. As the circular dependency is still present (though with the live binding), the modules are executed in an order where the actual node definitions, e.g. ArrayExpression, are evaluated before the base class NodeBase is evaluated. As TypeScript then wants to synchronously access NodeBase.prototype, rollup crashes. But maybe there is another way without passing it explicitly as a parameter.

Contributor

Which is the first access to fail?

Member Author

It is the NodeBase.prototype access in ArrayExpression :)

In order to improve encapsulation, I am currently investigating whether, instead of attaching the module to each node, I should attach a module proxy object exposing all the interface that the nodes need. That way we could reduce the API surface of Module and Graph, but I could also just attach the node constructors there to avoid the circularity.

Contributor

It should be possible to rearrange the import statements to ensure circular execution happens in the desired order.

I just checked master and we currently have shared/Node.ts executing before ArrayExpression.ts, so perhaps somehow this post-order changed in this PR? If so, it should be possible to ensure shared/Node.ts comes first in the post-order to maintain the correct circular execution order.

Member Author

It might be, but I'm not a big fan. For one, I would have to add an exception to ignore circular dependency warnings when building rollup (not a big issue, but still). My bigger concern, however, is that new code can always break the execution order again. My new approach, which also improves encapsulation, is to create a special wrapper layer between the module and the AST/scopes/variables that also contains the node constructors. That way, AST nodes never directly access module or graph, which means those can be easily refactored as long as the wrapper layer is refactored accordingly. I could even make a few more fields private on Module.

As it turns out, instead of making things slower this wrapper layer seems to make things slightly faster, probably because all wrappers have the same shape and thus property access can be monomorphic.

Contributor

@guybedford left a comment

Great work! This has been so needed, excellent to see things moving forward on JS engine optimization. This also inspires a bunch of stuff I want to try further (if time will ever allow!).

I left a few comments out of general interest in the approach; nothing that absolutely needs changing, I don't think.

Edit: actually just the nodeConstructors live binding would be nice to see fixed.

constructor(
esTreeNode: GenericEsTreeNode,
// we need to pass down the node constructors to avoid a circular dependency
nodeConstructors: { [p: string]: typeof NodeBase },
Contributor

As mentioned in the previous comment, it should be possible to ensure this is a live binding by switching the default export form in nodes/index.ts.

this.scope = scope;
this.initialiseNode(scope);
this.initialiseChildren(scope);
createScope(parentScope: Scope, preventNewScope: boolean) {
Contributor

Instead of passing preventNewScope all the way through here, could we rather reverse and simplify the bandwidth of information transfer and just have a check of if (isFunction(this.parent) || isCatch(this.parent) && this.parent.catchClause === this)?

Member Author

I do not like it when child nodes need to access their parent to determine its type. I feel this separates logic specific to the parent node from the parent node itself, which is easily overlooked when refactoring the parent unless you do a full-text search.

I understand, though, that two usages are probably not enough to warrant a dedicated constructor argument, so I reused a pattern we established for the rendering logic: adding a flag preventChildBlockScope to functions and catch clauses that is checked by the block statement to determine whether it should create a new scope.
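The flag pattern described here can be sketched as follows (simplified, invented names; the real nodes carry much more state):

```typescript
// Function-like parents set preventChildBlockScope; the child block
// statement checks it when creating its scope, so the parent-specific
// knowledge stays with the parent.
class Scope {
  constructor(readonly parent: Scope | null = null) {}
}

interface BlockParent {
  preventChildBlockScope: boolean;
}

class BlockStatement {
  scope!: Scope;

  constructor(private parent: BlockParent) {}

  createScope(parentScope: Scope): void {
    // a function body or catch block reuses the parent scope (where the
    // parameters live) instead of nesting a fresh block scope
    this.scope = this.parent.preventChildBlockScope
      ? parentScope
      : new Scope(parentScope);
  }
}
```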

src/Graph.ts Outdated
@@ -62,6 +62,7 @@ export default class Graph {
scope: GlobalScope;
treeshakingOptions: TreeshakingOptions;
varOrConst: 'var' | 'const';
needsTreeshakingPass: boolean = false;
Contributor

Out of interest, what was the reason for this change? I actually preferred the dirty state being passed through... I feel like we should try to reduce the surface area of Node assumptions on Module / Graph.

Member Author

This was more or less inspired by @kgrz's comments. It just makes the include functions so much cleaner:

  • dirty flags are now only set in very few places (basically identifiers and member expressions), whereas
  • previously, all include functions had to handle the overhead of tracking return values.

I totally agree with you that this is bad for the Graph API, though. That was originally the reason why I built it using return values.

One thing I could do is move it to the module. At the moment, there are already a few reasons for nodes to access the module, both to register imports/exports and to access certain flags. Will need to think about this a little.

src/Module.ts Outdated
@@ -203,7 +214,7 @@ export default class Module {
code: string;
originalCode: string;
originalSourcemap: RawSourceMap;
- ast: Program;
+ ast: any;
Contributor

Surely this can remain Program?

Contributor

If the problem is when casting back into this.esTreeAst, perhaps an <any> can be used when this.esTreeAst = <any>ast || ... is assigned.

Member Author

Changed back to Program


initialise() {
this.included = false;
this.module.addImport(this);
Contributor

Much nicer!


bind() {
this.isBound = true;
if (this.bound) return;
Contributor

Nice! Although ideally wouldn't we ensure that bind() in general is visit-once? I'd be interested to know why we might duplicate call bind, what the logic reasons are behind that? (I know this has been here a while, just thinking about it now...)

Member Author

The problem is that if another node calls reassignPath or forEachReturnExpressionAtPath, it is possible that those methods are called on this node before it has been bound which means if this represents a namespace, the variable will not yet be assigned. This is hard to control as both methods are forwarded through variable assignments.

Another reason I would leave this in place is that, if time permits, one of my next goals is to inline binding into the actual tree-shaking so that

  • unused nodes will not be bound, and
  • in case unused code mutates included variables, these mutations will no longer taint the tree-shaking result and, at least in some scenarios, more code can be removed.

My expectation is that this will provide another slight performance boost.

Btw Identifiers follow a similar logic.

Contributor

Thanks, it helps a lot to get a better idea of how this logic works. I guess it could be avoided by a multi-pass process, but this does sound more optimized.

this.parseNode(esTreeNode, nodeConstructors);
this.initialise();
this.module.magicString.addSourcemapLocation(this.start);
this.module.magicString.addSourcemapLocation(this.end);
Contributor

Perhaps set this.included = false here rather than in each implementation.

Member Author

At the moment, this would not make a difference. However, I deliberately decided to reset included on each node individually so that, if we ever want to reuse an existing AST, we just need to call initialise on each node to reset it to its state before tree-shaking.

If you feel this is a premature change as this logic is currently not used, I can of course move this here.

[name: string]: string[];
} = {
Program: ['body'],
Literal: []
};

export default keys;
export function getAndCreateKeys(esTreeNode: GenericEsTreeNode) {
Contributor

Perhaps fully encapsulate the implementation here as getOrCreateKeys where this does the check of return keys[esTreeNode.type] || keys[esTreeNode.type] = ....

Member Author

This would be more elegant but, as node construction is one of the most performance-critical tasks, I wanted to avoid the function call where possible. The checking logic is still only performed once, albeit in Node.parseNode.
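A minimal sketch of this cache and the inlined check (the `typeof` filter is an invented stand-in; the real logic is more selective about which properties count as child keys):

```typescript
// Per-type child-key lists, computed once per node type and cached.
interface EsTreeNode {
  type: string;
  [key: string]: any;
}

const keys: { [type: string]: string[] } = {
  Program: ['body'],
  Literal: []
};

function getAndCreateKeys(esTreeNode: EsTreeNode): string[] {
  keys[esTreeNode.type] = Object.keys(esTreeNode).filter(
    key => typeof esTreeNode[key] === 'object'
  );
  return keys[esTreeNode.type];
}

// Call sites inline the cache check so the hot path pays no function
// call once a type has been seen:
function childKeysFor(node: EsTreeNode): string[] {
  return keys[node.type] || getAndCreateKeys(node);
}
```

Wrapping the check into a `getOrCreateKeys` helper (as suggested) would be cleaner; the inlined form just trades that elegance for one fewer call on the hot path.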

}
} else if (value.hasEffects(options)) return true;
}
return false;
Contributor

I'd be interested to know how much inlining these loops itself affected performance. Reason being, it might be nice to have a centralized, highly optimized visitor pattern at some point, and this would effectively be moving away from that.

Member Author

I know this is one of the ugliest parts of this refactoring, as we basically have the same loop four times with only slight variations (five times if you count parseNode, but that method also does a lot more). The performance gains, however, were pretty substantial: of the 50% reduction in tree-shaking time, at least 15-20% can be attributed to these loops.

Unless we change everything fundamentally (and I have thought about this many times already; I have not yet come up with a good approach that I would expect to provide an actual performance improvement, due to the zoo of different node effects that need to be taken into account: run, called, accessed and assigned, each with its own circularity-prevention mechanism), I think this is close to the best we can do. Even a magically better optimized visitor capable of executing arbitrary methods on nodes would probably still be callback-based and thus need two nested function calls, e.g. one to call bind on a child plus the additional call to the visitor itself. I think this is as flat as it can get in the current pattern. Incidentally, call stacks are also more than halved with this approach, which is not a paradigm shift but at least should not hurt.

As all loops of this kind are in Node, it should not be difficult to change them together if we find a better approach in the future.
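The trade-off under discussion can be illustrated as follows (invented names; the real methods also iterate per-type key lists and handle several effect kinds):

```typescript
interface ChildNode {
  hasEffects(options: object): boolean;
}

// Generic helper in the removed eachChild/someChild style: every child
// visit pays for an extra callback frame.
function someChild(
  children: (ChildNode | null)[],
  callback: (child: ChildNode) => boolean
): boolean {
  for (const child of children) {
    if (child !== null && callback(child)) return true;
  }
  return false;
}

// Inlined variant: one direct method call per child, roughly halving
// the stack depth of each traversal.
function hasEffects(children: (ChildNode | null)[], options: object): boolean {
  for (const child of children) {
    if (child !== null && child.hasEffects(options)) return true;
  }
  return false;
}
```

Both return the same results; the inlined form simply drops the callback indirection on one of the hottest paths.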

Contributor

Yes the stack savings alone seem worth inlining these loops I suppose, even if it means forgoing the visitor pattern.

Btw I did experiment with a flat for-based visitor pattern on a performance branch previously to avoid stack calls and there was an absolutely negligible performance change, so that might even void the argument that a unified visitor could be more optimized.

@lukastaegert
Member Author

Rebased to the new typings now and also finally updated the source-map dependency to its latest version. I hoped this would also provide a performance boost but it seems performance did not change very much with this update.

@lukastaegert
Member Author

Had to undo the source-map dependency update as the new version seems to contain a WebAssembly reference that is incompatible with older node versions.

@lukastaegert force-pushed the preserve-hidden-classes branch 2 times, most recently from 0346f53 to be8b955 on April 16, 2018
@mourner
Contributor

mourner commented Apr 16, 2018

Re source map, see also #2055. Perhaps it's indeed worth just removing the source-map dependency fully in favor of just 20 lines of code.

@lukastaegert
Member Author

Ah, thanks! Once I stumbled upon this it sounded somewhat familiar but I could not remember what the context was. Yes, we should definitely do that!

return this.someRelevantBranch(node =>
node.someReturnExpressionWhenCalledAtPath(path, callOptions, predicateFunction, options)
);
return this.hasUnknownLeftValue
Contributor

Ahh I see it is fully idempotent, not to worry then.



@lukastaegert added this to the 0.58.0 milestone on Apr 16, 2018