Tree shake logical expressions #2098

Merged
merged 6 commits into from
Apr 1, 2018

lukastaegert
Member

This was inspired by, supersedes and closes #2007 and therefore resolves #2004.
Also resolves #2091.

Originally I wanted to collect some ideas on how to improve #2007 to get it merged before I started implementing larger refactorings to our tree-shaking logic. As it turned out, the changes necessary were a little more than I anticipated so I instead created a generic solution which should now be able to handle all combinations of the now three expressions that support tree-shaking:

  • sequence expressions (a, b)
  • logical expressions (a || b, a && b)
  • conditional expressions (a ? b : c)

For now, conditional and logical expressions are only tree-shaken if a is a literal or computed from literals, not from variables. This is usually the case when something like https://github.com/rollup/rollup-plugin-replace is used to set compiler flags. However, the new generic solution is designed to handle more general situations as well, provided the getValue functionality is extended at some point (which I decided was out of scope for this PR). This PR focuses on functionality only, not speed, which will be the focus of an upcoming PR. Nevertheless, performance seems to be about on par with the previous version.

This enables tree-shaking things like

// input
function x() {
  console.log('unused');
}

if (false && x()) console.log('ok');

// output: nothing
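
To illustrate the "literal or calculated from literals" restriction, here is a minimal sketch of a literal-only evaluator. It is not Rollup's getValue implementation: the function name getLiteralValue, the UNKNOWN sentinel and the hand-built ESTree-shaped nodes are assumptions made for this example, and only && and || are handled.

```javascript
// Sentinel for "cannot be determined at build time" (hypothetical name).
const UNKNOWN = Symbol('unknown');

// Hypothetical evaluator over ESTree-shaped nodes. Only literals and
// logical/conditional combinations of literals are resolved.
function getLiteralValue(node) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'LogicalExpression': {
      const left = getLiteralValue(node.left);
      if (left === UNKNOWN) return UNKNOWN;
      // `a && b` evaluates b only if a is truthy, `a || b` only if a is falsy.
      const takeRight = node.operator === '&&' ? Boolean(left) : !left;
      return takeRight ? getLiteralValue(node.right) : left;
    }
    case 'ConditionalExpression': {
      const test = getLiteralValue(node.test);
      if (test === UNKNOWN) return UNKNOWN;
      return getLiteralValue(test ? node.consequent : node.alternate);
    }
    default:
      // Identifiers, calls, member accesses and everything else stay unknown.
      return UNKNOWN;
  }
}

// false && x()
const shaken = {
  type: 'LogicalExpression',
  operator: '&&',
  left: { type: 'Literal', value: false },
  right: { type: 'CallExpression', callee: { type: 'Identifier', name: 'x' }, arguments: [] }
};

// flag && x(), where flag is a variable
const kept = { ...shaken, left: { type: 'Identifier', name: 'flag' } };

console.log(getLiteralValue(shaken)); // false
console.log(getLiteralValue(kept) === UNKNOWN); // true
```

In the first case the test of the if statement is known to be false, so neither the call to x nor the statement needs to be kept. In the second case nothing can be decided and the whole expression must be retained.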

Also, sequence expressions have been refactored to resolve some lingering bugs that no one has stumbled upon, probably because most people are sensible enough not to use a lot of sequence expressions. If one of the three expression types is simplified, the result is the callee of a call expression, and the simplification would change the context of the call (its this value), the result is wrapped as (0, expression), Babel style. This works even in nested situations:

// input
(true, true ? true && x.y : null)()

// output
(0, x.y)()
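
The wrapping rule can be sketched as a small string-level helper. This is only an illustration under stated assumptions: renderRetainedBranch and its parameters are hypothetical, it treats member expressions as the only case where the context would change, and it ignores the eval case discussed further down in this thread.

```javascript
// Hypothetical helper: decide how to print the branch that survives
// simplification.
//   source        - source text of the retained branch
//   type          - its ESTree node type
//   becomesCallee - whether the simplified expression was the callee of a call
function renderRetainedBranch(source, type, becomesCallee) {
  // A bare member expression used as callee binds `this` to the object,
  // which the original (wrapped) expression did not do.
  const changesContext = type === 'MemberExpression';
  return becomesCallee && changesContext ? `(0, ${source})` : source;
}

console.log(renderRetainedBranch('x.y', 'MemberExpression', true) + '()'); // (0, x.y)()
console.log(renderRetainedBranch('x.y', 'MemberExpression', false)); // x.y
console.log(renderRetainedBranch('f', 'Identifier', true) + '()'); // f()
```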

Also, handling of default exports that are rendered for side-effects only has been refined and synced with these changes.

Contributor

@guybedford left a comment

Looks great. The main thing to clarify is the indirect call handling in strict mode that may not be having an effect as far as I can tell.

Also I've been thinking about the tests and it's starting to seem really redundant to have all module format outputs for cases like this that aren't really testing module format output. Perhaps worth thinking about a new test section that just outputs es modules that can shorten the diffs for unit testing these cases?

(0, wrapper.foo)();
(0, wrapper.foo)();
(0, wrapper.foo)();
(0, wrapper.foo)();
Contributor

I'm not sure this is true for strict mode actually, are you sure about this? In the following case:

(function () {
  "use strict";
  var a = { b () { console.log(this) } };
  a.b();
  var c = a.b;
  c();
  (0, c)();
})()

the output I get is a, undefined, undefined, not a, undefined, this.

Member Author

Not sure what you expect. If c = a.b, then c() and (0, c)() will always have the same output. In non-strict mode it is the global object while in strict mode (which rollup always adds to the output) it is undefined. The important problem is rather the following (with a as defined in your example):

'use strict';
(0, a.b)(); // logs undefined
(true && a.b)(); // logs undefined
(true ? a.b : a.b)(); // logs undefined
(a.b)(); // logs a, so this is not equivalent to any of the above
a.b(); // logs a as well

This is one issue that is solved by this transformation. If you remove 'use strict', undefined is just replaced by global.
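
For anyone who wants to check this without reading console output, the following self-contained sketch collects the this value of each call form. The function name collectReceivers is made up for this example, and the 'use strict' directive is placed inside the function so that the result does not depend on how the snippet is loaded.

```javascript
function collectReceivers() {
  'use strict';
  const a = {
    b() {
      return this;
    }
  };
  return {
    a,
    sequence: (0, a.b)(), // value of a.b, called without a receiver
    logical: (true && a.b)(), // same: the operator yields the bare function
    conditional: (true ? a.b : a.b)(), // same
    parenthesized: (a.b)(), // parentheses alone keep the reference, so `this` is a
    plain: a.b()
  };
}

const receivers = collectReceivers();
console.log(receivers.sequence === undefined); // true
console.log(receivers.logical === undefined); // true
console.log(receivers.conditional === undefined); // true
console.log(receivers.parenthesized === receivers.a); // true
console.log(receivers.plain === receivers.a); // true
```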

Member Author

BTW I added tests both to the form as well as the function section, where the latter "prove" the effect of the transformation. I would suggest playing around with these tests.

Member Author

(Admittedly, I did not trust this myself)

Contributor

Ah of course, thanks for explaining.

}
}

if (hasBecomeCallee && name === 'eval') {
Contributor

@guybedford Mar 29, 2018

In theory eval can be a variable name. It is probably worth adding an extra check here up the scope hierarchy to be sure it isn't.

Contributor

I think the check is isGlobalVariable just like in CallExpression bindNode.

Member Author

Usually, doing the (0, ...) transformation does no harm to an identifier except that it is a little inefficient but you are right, we have the information available and should probably go all the way.

Member Author

Ah, as we always have strict mode, this check is actually unnecessary, cf. here: https://developer.mozilla.org/de/docs/Web/JavaScript/Reference/Strict_mode (in "Making eval and arguments simpler")

First, the names eval and arguments can't be bound or assigned in language syntax

Just checked it myself: One cannot create a variable named eval. So any eval must be the global eval and no additional check is necessary. Will add a comment to explain this.
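
The claim is easy to reproduce. The sketch below uses the Function constructor, whose bodies are parsed as sloppy code unless they contain their own 'use strict' directive. The helper name parses is an assumption of this example.

```javascript
// Returns true if `source` parses as a function body, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source); // only parsed here, never called
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const sloppyAllowed = parses('var eval = 1;');
const strictAllowed = parses('"use strict"; var eval = 1;');

console.log(sloppyAllowed); // true: sloppy mode lets you shadow eval
console.log(strictAllowed); // false: strict mode rejects binding the name eval
```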

Contributor

Ah yes, strict mode reserves eval, don't know how I forgot that. Hmm, in theory we shouldn't need this check in CallExpression then either... I wonder if we could remove this.

Member Author

Removed

if (this.consequent.includeInBundle()) addedNewNodes = true;
if (this.alternate.includeInBundle()) addedNewNodes = true;
} else if (testValue ? this.consequent.includeInBundle() : this.alternate.includeInBundle())
addedNewNodes = true;
Contributor

I've always worked with the code style that if the if has curlies, so should the else clause.

render(
code: MagicString,
options: RenderOptions,
{ hasBecomeCallee, hasBecomeStatement }: NodeRenderOptions = BLANK
Contributor

Just wondering if we could simplify this interface and if we passed through the parent itself as a consistent third argument and then deduced these from that?

Eg a isStatementOfParent(this, parent) check, although perhaps that is unnecessary repeated work.

Member Author

I guess some of this could be handled via a parameter renderedParent. Might also come in handy once we start inlining expressions. Will experiment a little with this.

branchToRetain.render(code, options, {
hasBecomeStatement,
hasBecomeCallee:
hasBecomeCallee || (isCallExpression(this.parent) && this.parent.callee === this),
Contributor

This seems to be acting on old parent information, which would then simplify to eg isCallExpression(curParent) && curParent.callee === this?

hasBecomeStatement,
hasBecomeCallee:
hasBecomeCallee || (isCallExpression(this.parent) && this.parent.callee === this),
hasDifferentParent: true
Contributor

Similarly this would reduce to parent !== this.parent?


// Indirectly invoked eval is executed in the global scope
function testEval() {
console.log((0, eval)('this'));
Contributor

Indirect eval still holds though of course.
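
To make the difference concrete, here is a small sketch contrasting direct and indirect eval. The names scopeMarker and compareEvalScopes are invented for the example, and globalThis is assumed to be available (any reasonably recent engine).

```javascript
// A global binding that the indirect eval will find.
globalThis.scopeMarker = 'global';

function compareEvalScopes() {
  const scopeMarker = 'local'; // shadows the global inside this function
  return {
    direct: eval('scopeMarker'), // direct eval sees the local scope
    indirect: (0, eval)('scopeMarker') // indirect eval runs in the global scope
  };
}

const evalScopes = compareEvalScopes();
console.log(evalScopes.direct); // local
console.log(evalScopes.indirect); // global
```

This is why simplifying something like (true, eval)('this') to a plain eval('this') would change behaviour, and why the (0, eval) form has to be preserved when eval ends up as a callee.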

@lukastaegert
Member Author

Perhaps worth thinking about a new test section that just outputs es modules that can shorten the diffs for unit testing these cases?

Have been thinking about something like this myself for some time. Either a new section or we let the form test runner check if output.format is provided and if so, it should just create and check the corresponding output. Would also have the advantage of enabling tests that check problems only with specific outputs.

My favourite would be to not add a new section but

  • enable the format check as outlined
  • enable folders to group the samples to make it easier to find tests that handle certain topics

@guybedford
Contributor

enable the format check as outlined
enable folders to group the samples to make it easier to find tests that handle certain topics

This sounds nice to me :) Folders could perhaps have an options.js default template that applies in that subfolder precedence.

@lukastaegert
Member Author

This sounds nice to me :) Folders could perhaps have an options.js default template that applies in that subfolder precedence.

Very good idea! Thus, sub-folders can really become coherent test suites that cover certain situations with little boilerplate. Hope to find some time to look into this for the next PR (but probably not for this one just yet).

@lukastaegert lukastaegert force-pushed the tree-shake-logical-expressions branch from 3c8771a to 80cfff0 Compare March 30, 2018 21:44
@lukastaegert
Member Author

Ok, this is now ready for another review. I actually took up your suggestion and made the logic now much more generic and future-proof (think: large-scale expression inlining):

  • There are now two NodeRenderOptions that control the process:
    • renderedParent?: Node: Only present if the rendered parent is different from the actual parent (to avoid overhead)
    • fieldOfRenderedParent?: string: Indicates on which field of the parent the child will be rendered (e.g. to detect if an expression has become a callee)
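
As a rough sketch of how a node's render method might consume these two options: the helper name hasBecomeCallee and the plain-object nodes below are assumptions for illustration, not the code from this PR.

```javascript
// Hypothetical helper: has the node being rendered ended up as the callee
// of a call expression because an enclosing expression was simplified?
function hasBecomeCallee({ renderedParent, fieldOfRenderedParent } = {}) {
  return (
    Boolean(renderedParent) &&
    renderedParent.type === 'CallExpression' &&
    fieldOfRenderedParent === 'callee'
  );
}

const call = { type: 'CallExpression' };

// Rendered in place under its actual parent: no options are passed at all.
console.log(hasBecomeCallee()); // false
// Rendered as the callee of a call after simplification.
console.log(hasBecomeCallee({ renderedParent: call, fieldOfRenderedParent: 'callee' })); // true
// Rendered as an argument of the same call.
console.log(hasBecomeCallee({ renderedParent: call, fieldOfRenderedParent: 'arguments' })); // false
```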

I also found and fixed another bug that occurs when, e.g., an object expression (which always needs round brackets when rendered as a statement) becomes a statement due to tree-shaking simplifications.
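
The reason for the round brackets can be shown with a two-line experiment: at the start of a statement, a curly brace opens a block, not an object literal. The variable names here are chosen for the example only.

```javascript
// Calling eval through another name makes it an indirect eval, so the
// source is parsed as a standalone script.
const indirectEval = eval;

// Parsed as a block containing the labelled statement `a: 1`.
const asBlock = indirectEval('{ a: 1 }');
// Parsed as an object literal because of the surrounding parentheses.
const asObject = indirectEval('({ a: 1 })');

console.log(asBlock); // 1
console.log(asObject.a); // 1
console.log(typeof asBlock, typeof asObject); // number object
```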

Contributor

@guybedford left a comment

Very nice to see the simplifications, glad it could work out!

Only worry here is the repeated work in lookups for parent and field information if this approach does start to scale out more. Interested to hear your thoughts on this, but this is trickier to implement I know, although you did mention you're looking into performance next, so maybe can come as part of that.

import { ObjectPath } from '../values';

export function isCallExpression(node: Node | { type?: string }): node is CallExpression {
Contributor

This seems to be defined but not actually used?

Member Author

Is removed👍

@@ -1,14 +1,17 @@
import CallOptions from '../CallOptions';
import ExecutionPathOptions from '../ExecutionPathOptions';
import SpreadElement from './SpreadElement';
import { isGlobalVariable } from '../variables/GlobalVariable';
Contributor

There seems to be one other use of this in TaggedTemplateExpression. If doing both, that actually removes all usages of this function entirely as well, although can be left to tree shaking too :)

Member Author

Removed

name === 'eval' &&
renderedParent &&
renderedParent.type === 'CallExpression' &&
fieldOfRenderedParent === 'callee'
Contributor

Nice! Slightly shorter names might be renderParent and renderParentField, but perhaps these are more descriptive anyway.

let firstStart = 0,
lastEnd,
includedNodes = 0;
for (const { node, start, end } of getCommaSeparatedNodesWithBoundaries(
Contributor

Nice!

Member Author

When I wrote getCommaSeparatedNodesWithBoundaries, I always hoped I could use it not only for variable declarations but also for sequence expressions. So should you decide to write your entire app using sequence expressions, you now get really nice comment handling 😉


export function childIsStatement(parent: { type?: string }) {
return (
parent.type === 'Program' || parent.type === 'ExpressionStatement' // e.g. default exports rendered for side-effects only
Contributor

I take it these will expand over time?

An alternative here might be Program.prototype.isStatementContainer = true and ExpressionStatement.prototype.isStatementContainer = true where NodeBase.prototype.isStatementContainer = false.
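
A minimal sketch of that prototype-flag alternative, using stand-in classes instead of the real node classes (the class bodies and the CallExpression example are assumptions for illustration):

```javascript
// Stand-ins for the AST node classes; only the flag matters here.
class NodeBase {}
class Program extends NodeBase {}
class ExpressionStatement extends NodeBase {}
class CallExpression extends NodeBase {}

// Default on the base prototype, overridden on the two container types.
NodeBase.prototype.isStatementContainer = false;
Program.prototype.isStatementContainer = true;
ExpressionStatement.prototype.isStatementContainer = true;

// The type-string comparison becomes a single property read.
function childIsStatement(parent) {
  return parent.isStatementContainer;
}

console.log(childIsStatement(new Program())); // true
console.log(childIsStatement(new ExpressionStatement())); // true
console.log(childIsStatement(new CallExpression())); // false
```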

Member Author

Actually not. In fact at the moment if an expression is simplified so that it becomes a statement, there will always be an expression statement left as the actual parent except for simplified default exports. Changed this now to get rid of this exception.

if (Array.isArray(value)) {
for (const nestedValue of value) {
if (nestedValue === child) return key;
}
Contributor

This does worry me a little - eg unnecessary lookup for long sequence expressions. This information is available to the render call itself, so we could just enforce it being passed for all render calls getting us back to a monomorphic render function, but I know that is a lot of work and may not even be itself desirable for performance (although maybe it would...).

If I remember correctly there are only two node types with more than one array field - template elements and tagged template elements, so that knowledge could permit an optimization here, but perhaps worth seeing if it becomes a bottleneck. I just prefer to avoid unnecessary loops that is all, and the keys loop is already unnecessary in theory due to this being parent contextual information the render function could have passed.

Contributor

Would running Rollup on uglify code that is rewritten to use long sequence expressions hit this loop?

Member Author

At the moment, expression tree-shaking is powerful but very rare, and the field will only be determined if the child is actually simplified. I do not like adding this information for every render simply because it is totally unnecessary 98% of the time. The only information currently used is whether the child is a callee, so I simplified it accordingly and removed the loop. Should we require more information at some point, we can certainly revisit that.
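
The trade-off discussed here can be sketched as follows. Both functions are illustrative assumptions, not the code that was merged: the first mirrors the generic key lookup under review, the second the specialised callee check that replaced it.

```javascript
// Generic lookup: scan every field of the parent (including array fields)
// until the child is found. The cost grows with the number of siblings.
function findFieldOfParent(parent, child) {
  for (const key of Object.keys(parent)) {
    const value = parent[key];
    if (value === child) return key;
    if (Array.isArray(value) && value.includes(child)) return key;
  }
  return null;
}

// Specialised check: one type comparison and one identity comparison.
function isCalleeOfParent(parent, child) {
  return parent.type === 'CallExpression' && parent.callee === child;
}

const callee = { type: 'Identifier', name: 'f' };
const argument = { type: 'Literal', value: 1 };
const callNode = { type: 'CallExpression', callee, arguments: [argument] };

console.log(findFieldOfParent(callNode, callee)); // callee
console.log(findFieldOfParent(callNode, argument)); // arguments
console.log(isCalleeOfParent(callNode, callee)); // true
console.log(isCalleeOfParent(callNode, argument)); // false
```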

@@ -4,7 +4,8 @@ import * as rollup from 'rollup';
 import batchWarnings from './batchWarnings';
 import relativeId from '../../../src/utils/relativeId';
 import { handleError, stderr } from '../logging';
-import { InputOptions, OutputChunk } from '../../../src/rollup/index';
+import { InputOptions } from '../../../src/rollup/index';
+import { Bundle } from '../../../src/rollup';
Member

rollup/index?

Member Author

I guess the /index is unnecessary due to some TypeScript weirdness (which mirrors a node weirdness) 😉

Member Author

Ok, it seems removing the /index does not really play nice with our setup so I'll leave it.

@lukastaegert
Member Author

Posted another version to check out @guybedford.

Contributor

@guybedford left a comment

Yeah, does seem better to use more specific render options. Can always reconsider more general ones in future too.

@lukastaegert lukastaegert force-pushed the tree-shake-logical-expressions branch from 2ffea8f to b047228 Compare April 1, 2018 16:13
@lukastaegert lukastaegert added this to the 0.58.0 milestone Apr 1, 2018
@lukastaegert lukastaegert merged commit b047228 into master Apr 1, 2018
@lukastaegert lukastaegert deleted the tree-shake-logical-expressions branch April 1, 2018 16:21
@guybedford
Contributor

I've just hit the (0, eval) bug in the latest SystemJS release with the Rollup build... great timing on this one :)
