Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor tree-shaking rendering #1949

Merged
merged 27 commits into from
Feb 15, 2018
Merged

Conversation

lukastaegert
Copy link
Member

@lukastaegert lukastaegert commented Feb 7, 2018

This PR is now ready to be reviewed.

To try it out, install rollup via

npm install rollup/rollup#refactor-treeshake-rendering

Issues resolved by this PR:

This will refactor core parts of the rendering logic especially with regard to tree-shaking and comment-handling. Previously, tree-shaking was managed by the rendering logic of the Statement base node, i.e. nodes that were not included would remove themselves from the tree. This often lead to unwanted white-space and sup-optimal results with regard to comments.

Instead, the parent node is now responsible for not rendering its children based on their .included flag.

To that end, modules, block scopes and switch cases now use a new optimized statement border detection algorithm. For block statements, it will e.g. look like this:

// input

{ // retained comment
  // removed leading comment
  removedStatement; // removed trailing comment
  // retained leading comment
  removedStatement; // retained trailing comment  
} // retained comment

// output

{ // retained comment
  // retained leading comment
  removedStatement; // retained trailing comment  
} // retained comment

i.e.

  • if there are no statements on the first line of the block statement, all comments here are retained with the block statement, otherwise they are retained/removed together with the first statement
  • if there are no statements on the same line and after another statement, all comments here are retained with the leading statement, otherwise they are retained/removed together with the next statement
  • all other comments not on the same line as a statement are retained/removed together with the next statement (this is important for e.g. annotations)
  • line-breaks inside comments are ignored

For modules this means that any comment starting on the first line of the module will be retained as a header comment while all subsequent comments belong to the first statement.

Also, the logic for variable declarations was thoroughly rewritten to provide, in my opinion, the best possible output in most situations, e.g.

// input

var /* removed leading comment */ removed1 = 1, // removed trailing comment
  // retained leading comment
  retained = 2, // retained trailing comment
  // removed leading comment
  removed1 = 3;

console.log(retained);

// output

var // retained leading comment
  retained = 2; // retained trailing comment

console.log(retained);

This also affects the way named export declarations are rendered.

The only thing I am not 100% happy about is the complexity of the variable declaration rendering function. Any suggestions for e.g. extracting functionality here are very welcome.

@guybedford
Copy link
Contributor

Great work! Happy to review when ready.

@lukastaegert lukastaegert changed the title [WIP] Refactor tree-shaking rendering Refactor tree-shaking rendering Feb 11, 2018
@lukastaegert
Copy link
Member Author

From my side this is ready for review! I hope I did the system bindings correctly as I rewrote this logic simply by looking at the previous output 😳

@guybedford
Copy link
Contributor

guybedford commented Feb 11, 2018

Sure, will get to it.

Just testing #1937, if I alter the example to use destructuring I'm getting the statement removed incorrectly:

const FONT_HINGE_SIZE = 0.07;
const BACK_HINGE_SIZE = 0.16;

function PointerGauge () {
	const { backHinge, frontHinge } = { backHinge: BACK_HINGE_SIZE, frontHinge: FONT_HINGE_SIZE };

	return [
		//backHinge,
		fontHinge
	]
}

export default PointerGauge;

strips the whole destructuring statement above.

@guybedford
Copy link
Contributor

Ahh, no that was just spelling - seems to be working fine.

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I really like the direction this rendering logic is going.

In a test on a large codebase I'm getting about a 5 - 10% slowdown, do you think we could try and focus on some performance aspects here for landing?

I've added some suggestions to reduce allocations, but will have to see if that does much.

I have tested on Windows here and all is fine, but even so it may be worth also checking that all of the line break logic correctly handles \r characters too, as it seems quite \n specific currently.

type: NodeType.DoWhileStatement;
body: StatementNode;
body: Node;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we need to get rid of the concept of a StatementNode here? Quite nice to match the spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It serves no purpose but I can put in StatementNode as an alias for Node so that it is there if we need it in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh originally I had the compound ESTree types as representing a union of their members so there was some typing benefit here on the generic interfaces. But I remember now you refactored the Node methods to no longer rely on these union properties, which is cleaner anyway. But perhaps nice to at least keep some level of the semantic if it turns out there are places where this typing information might be useful to reconsider in future.

const declarationEnd = findFirstOccurrenceOutsideComment(code, declarationKeyword) + declarationKeyword.length;
code = code.slice(declarationEnd);
code = code.slice(0, findFirstOccurrenceOutsideComment(code, '{'));
const generatorStarPos = findFirstOccurrenceOutsideComment(code, '*');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor performance suggestion here, although its probably trivial, but instead of creating a new string, the findFirstOccurrenceOutsideComment method could take an offset argument like the standard string indexOf function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my shame I must admit I was not actually aware of the second parameter of String.indexOf 😳Yes, of course it makes sense to use it here and in other places. I see how I can rewrite this to reduce the number of slices.

let commentStart, commentLength, lineBreakPos;
while (true) {
commentStart = code.indexOf('/');
lineBreakPos = (~commentStart ? code.slice(0, commentStart) : code).indexOf(searchString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly these slice calls could be refactored to use an index into the indexOf method.

code = code.slice(declarationEnd);
code = code.slice(0, findFirstOccurrenceOutsideComment(code, '{'));
const generatorStarPos = findFirstOccurrenceOutsideComment(code, '*');
if (~generatorStarPos) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I can handle this convention anymore :P

When using binary operators like this in JavaScript, it means the 64-bit JavaScript float is converted into a 32-bit twos-complement integer with rounding, then the not operation is applied.

In addition these can fail for large numbers when the 32 bit limit gets reached, which can still be in the safe operating range of the integer values of a 64 bit float (although the ranges are well out of what Rollup will ever see).

generatorStarPos === 0 is explicit and a single comparison operation in the final assembly over all the above multi-step complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I do not really like this either but running all different versions of checking that indexOf (or something similar) returned a result through jsben.ch on Chrome, the tilde operator was always fastest

Ok, running with a larger sample set and a slightly different test, the tilde operator actually seems to be on par with !== -1 so I'll use that.

let commentStart, commentLength, lineBreakPos;
while (true) {
commentStart = code.indexOf('/');
lineBreakPos = (~commentStart ? code.slice(0, commentStart) : code).indexOf(searchString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really lineBreakPos or rather searchPos?

this.declaration.render(code, options);
render (code: MagicString, options: RenderOptions, { start, end }: NodeRenderOptions = {}) {
if (this.declaration === null) {
code.remove(start || this.start, end || this.end);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will start not be defined here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -63,9 +59,7 @@ export default class BlockStatement extends StatementBase {

render (code: MagicString, options: RenderOptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should make start and end mandatory arguments for all render calls, defaulting to start = this.start?

render (code: MagicString, _options: RenderOptions) {
code.remove(this.leadingCommentStart || this.start, this.next || this.end);
render (code: MagicString, _options: RenderOptions, { start, end }: NodeRenderOptions = {}) {
code.remove(start || this.start, end || this.end);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having start and end as arguments with defaults would avoid these scenarios too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As defaults work with object patterns as well, this is certainly an improvement

this.end - (code.original[this.end - 1] === ';' ? 1 : 0)
);
let renderedContentEnd;
if (/\n\s*$/.test(code.slice(this.start, separatedNodes[0].start))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\r?

}
leadingString = '';
nextSeparatorString = '';
if (isIdentifier(node.id) && isReassignedPartOfExportsObject(node.id.variable)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps isReassignedAsExportsMember? Object as a term isn't descriptive enough for me to immediately think of a member expression form without further prompting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@lukastaegert
Copy link
Member Author

I am pretty sure the reason for the worse performance is that now each statement in every module and block statement has its boundaries calculated though only very few really need this. This was my naive implementation but seeing this, I will try to change the algorithm to only calculate boundaries when they are actually needed (since not running code is always the best performance improvement...).

* Get rid of unnecessary defaults
* Replace tilde operator
@lukastaegert
Copy link
Member Author

lukastaegert commented Feb 13, 2018

Ok, I did not do everything yet (especially line-breaks still assume "\n") but

  • the render helpers now all use start indices
  • boundary detection is skipped if a boundary is not needed

I expect especially the latter to provide a very noticeable performance boost.
@guybedford Could you check your codebase again and tell me if there still is a performance degradation?

@guybedford
Copy link
Contributor

With the more accurate timing I'm only seeing a slowdown now from around 260ms for render to 340ms in a large tree.

The string search times seems to make sense in that well-written string search algorithms offer better than O(n) complexity, while taking an initial hit in additional mechanics setup.

I've created a PR to get the performance timings working again. Perhaps these can be watched closely as any further changes are made... so hard to speculate here.

@lukastaegert
Copy link
Member Author

Nice to hear that render is still rather fast. I have got some ideas on how to skip boundary detection for variable declarations when not necessary that should further noticeably speed up rendering. Let's see what we get to in the end.

@guybedford
Copy link
Contributor

The last commit seems to have added a good 20ms or so speedup. Funny how it doesn't look like it does much.

The performance costs seem pretty negligible at this point so count me approved... just worth making a habit of checking it for all compiler work I think.

@@ -22,4 +24,16 @@ export default class SwitchCase extends NodeBase {
});
return addedNewNodes;
}

render (code: MagicString, options: RenderOptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this render function not have a third argument? Just wondering how much the polymorphism might bite us in performance as predictable singular argument signatures are important to optimizations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for each SwitchCase, the signature is the same. As far as I know, call signatures are more important. Might be reasonable to consider adding "undefined" as a third parameter when calling render instead. But maybe we wait until we got rollup fit for the V8 web tooling benchmark so that we can talk to the actual experts here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a switch case is a statement I was thinking of a call site like the render call from renderStatementList, although I see this is distinguished by needsBoundaries now.

@lukastaegert
Copy link
Member Author

Ok, this is again ready for review. I added a few smaller improvements i.e. charCodeAt instead of direct character access and just counting up an index instead of searching in some situations. Most importantly, the expensive variable declaration rendering algorithm will be skipped if none of its transformations are necessary (which is probably the reason for the 20ms speedup).

As for line-breaks, I decided to ignore the uncommon "\r"-only line-breaks (the result will still work, it will just not be as pretty). As for "\r\n" line-breaks, those should have already worked in most situations out of the box except in situations where we remove a line-break. This is now accounted for.

@guybedford
Copy link
Contributor

One super minor nit in testing:

// comment
export var p = 5; /*asdf1*/ switch (test) {
		// comment
	default:
		'asdf'; //asdf
}
/* asdf2 */

Returns export var p = 5;/*asdf*/ losing the spacing before the first block comment.

// discover this module's imports and exports
let lastNode: Node;

enhance(this.ast, this, this.dynamicImports);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're no longer using this.comments does that mean this property can be removed entirely, including the use of the onComment hook?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested this and the answer is yes :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not as it is still required to remove source map comments. This, however, is the only reason why we keep this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. Can you point me to the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe at some point we should implement our own algorithm to scan for and remove those comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. A sourceMappingURL comment can only be a block comment so it's fine to do something custom. Also it can be based on lastIndexOf as a backwards search through the string would be quicker.

Here's the code I have used for this previously - https://github.com/systemjs/builder/blob/master/lib/sourcemaps.js#L12.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this example, maybe the issue is not that trivial and we should probably rather trust acorns parsing here and leave it as it is:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise we might break the sourceMap handling of other libraries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's v8's method to do this - https://github.com/v8/v8/blob/8d38c15e04598f9e48e6230327da0e41a2445957/src/inspector/search-util.cc#L16

Note that it doesn't use a lexer at all.

I think it would be much simpler to make this code its own method and deprecate this.comments entirely. But happy to make this a separate PR too if you prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this further detection is a less stringent problem than removal - and that’s what I got confused here. It makes sense that Rollup has to hold to a higher standard for removal not to break code.

Perhaps we can inline the removal into the parsing code without separately gathering the comments at least.

@guybedford
Copy link
Contributor

Including the removal of the this.comments tracking on module, the performance seems just about completely equal now to me.

Good to merge with those changes.

@guybedford
Copy link
Contributor

Just to be clear, this seems good to land anytime to me, given that the points mentioned are no longer relevant.

@lukastaegert
Copy link
Member Author

Nice! Depending on what else is lying around here, I guess I'll try to assemble a release tomorrow!

@lukastaegert lukastaegert added this to the 0.56.0 milestone Feb 14, 2018
@lukastaegert lukastaegert merged commit bb8b81d into master Feb 15, 2018
@lukastaegert lukastaegert deleted the refactor-treeshake-rendering branch February 15, 2018 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment