Refactor AST and variable handling to use interfaces instead of types #1840

lukastaegert · 2018-01-05T15:20:15Z

This should not change any functionality but it fundamentally refactors and cleans up the way nodes and variables are handled. At the core of the refactoring wherever possible, we no longer describe nodes and variables via their types but rather via the minimal interfaces they expose. This is following established best practices, namely the Dependency inversion principle and the Open/closed principle and also has some immediate benefits (see below).

The following interfaces are available:

Entity: Common base type for nodes, variables and virtual entities. Methods:
- toString
WritableEntity (extends Entity): An entity that Expressions can be assigned to. These are most importantly Patterns but also all kinds of Expressions. Additional methods:
- hasEffectsWhenAssignedAtPath
- reassignPath
ExpressionEntity (extends WritableEntity): An entity that Expressions can be assigned to and that itself can be assigned to other WritableEntities. Additional methods:
- forEachReturnExpressionWhenCalledAtPath
- getValue
- hasEffectsWhenAccessedAtPath
- hasEffectsWhenCalledAtPath
- someReturnExpressionWhenCalledAtPath
Node (extends ExpressionEntity): Actual AST nodes that can be rendered. Exposes lifecycle methods as well as hasEffects and render

All variables now implement the ExpressionEntity interface which means that they now also have a getValue method. For now these methods return UNKNOWN_VALUE but in the future we can use these to implement really powerful partial evaluation features that can statically evaluate expressions involving (non-reassigned) variables.

This refactoring also means that in many places, we no longer have to check if we are dealing with a node, a variable, or e.g. an UNKNOWN_EXPRESSION. Which not only feels very nice but also opens the doors for implementing lots of new "virtual expressions" as a kind of internal type system. As a first example, the VirtualObjectExpression used for function prototype objects has been replaced by a static UNKNOWN_OBJECT_EXPRESSION which makes thinks a lot cleaner. I am currently actively working on extending this to cover e.g. string and array prototype methods.

As AST nodes often combine the Node interface with other interfaces, ExpressionNode is available as a shorthand. The previous Node base class has been replaced by NodeBase. Though possible, this should no longer be used as a type!

Other changes in this PR:

Use type guards wherever possible to make the code cleaner. These are usually placed next to the types they guard 😉 (e.g. isIdentifier next to Identifier). Note that type guards can still be a code smell, i.e. they can make things possible that might be better implemented via a common interface.
Inline illegal import reassignment checks into reassignPath methods
Rework and clean up the namespace resolution in MemberExpression

guybedford

Looks great and much neater. Always nice to see negative diffs too :)

Mostly nitpicking from me here... just the one interface / class naming suggestion was the main thing that tripped me up a bit, otherwise seems very sensible.

I just noticed the tests measure their time in seconds for a basic eye to perf... wondering if we should make a habit of checking the diffs on this for PRs. I assume with these changes though there is completely negligible performance difference.

On a somewhat unrelated tangent, one thing I would mention though is that the codebase seems to be riding off into a functional / class / interface style. My one word of warning on this would be that as a compiler, performance is of the utmost concern and is more important than beautiful code patterns. Loops and function calls with flat stacks will always be way faster then classes and deep stack-based algorithms, and we shouldn't let the code style make applying loops and plain functions feel "wrong", which it is currently on the verge of doing.

guybedford · 2018-01-06T18:48:24Z

src/ast/ExecutionPathOptions.ts

 export default class ExecutionPathOptions {
-	_optionValues: Map<string, Node | Variable>;
+	_optionValues: Map<string, any>;


Can we not do better than any here?

guybedford · 2018-01-06T18:49:02Z

src/ast/ExecutionPathOptions.ts


-	/**
-	 * @returns {ExecutionPathOptions}
-	 */


Is this an intentional move to undocument these sorts of classes?

(I did find these quite useful with the TS suggestions, although haven't got into habit of doing it myself)

guybedford · 2018-01-06T18:51:02Z

src/ast/enhance.ts

@@ -31,18 +31,21 @@ export default function enhance (ast: any, module: Module, comments: Comment[])
 	}
 }

-function enhanceNode (raw: Node | Node[], parent: Node | Module, module: Module, code: MagicString) {
+function isArrayOfNodes (raw: Node | Node[]): raw is Node[] {
+	return 'length' in raw;


Any reason for this over raw.length !== undefined?

in was used by the original code but I guess there is no harm in changing this.

Ok, in definitely works better with TypeScript so I'll leave this.

guybedford · 2018-01-06T18:51:47Z

src/ast/enhance.ts

-	if ('length' in raw) {
-		for (let i = 0; i < (<Node[]>raw).length; i += 1) {
-			enhanceNode((<Node[]>raw)[i], parent, module, code);
+	if (isArrayOfNodes(raw)) {


Definitely a step in the right direction, although I'd still prefer a visitor that can work this out from Node context over these detections.

I agree though I guess that would require a larger refactoring beyond the scope of this PR.

guybedford · 2018-01-06T18:54:08Z

src/ast/nodes/shared/Expression.ts

+
+export interface ExpressionNode extends Expression, Node {}
+
+export class GenericExpressionNode extends GenericNode implements Expression {


Any reason not to just call this ExpressionNode?

Once confusion I find with the naming here is it is difficult to tell what is an interface and what is an actual class.

It could be useful to make this clearer in the name, say ExpressionNode and ExpressionNodeInterface or something like that over GenericExpressionNode and ExpressionNode.... still trying to fully get the constructs though, but that's the first impression.

Or how about NodeBase and ExpressionBase instead of GenericNode and GenericExpressionNode... just so it's clear these are actual real base classes and not interfaces?

guybedford · 2018-01-06T19:15:47Z

src/ast/variables/LocalVariable.ts

@@ -18,7 +15,7 @@ export default class LocalVariable extends Variable {

 	constructor (
 		name: string, declarator: Identifier | ExportDefaultDeclaration | null,
-		init: Expression | Declaration | UnknownAssignment
+		init: Expression


Are you sure this will always be an expression? I may be wrong, but I seem to remember testing this before and seeing declarations come up.

Well, the declarations in question all satisfy the Expression interface now (and Declaration no longer is a concept). This will probably be clearer once I employ your interface naming suggestions.

guybedford · 2018-01-06T19:16:42Z

src/ast/variables/Variable.ts


-export default class Variable {
+export default class Variable implements Expression {


How about ExpressionEntity as the base shared between Variable and Node?

Then we've have ExpressionEntity, ExpressionNode and ExpressionBase?

I like that. So that's what it's gonna be!

guybedford · 2018-01-06T19:17:26Z

src/ast/variables/VariableReassignmentTracker.ts


 export const UNKNOWN_KEY: UnknownKey = { type: 'UNKNOWN_KEY' };

-type PathCallback = (path: ObjectPath, expression: Expression | Declaration | UnknownAssignment) => void;
-type PathPredicate = (path: ObjectPath, expression: Expression | Declaration | UnknownAssignment) => boolean;
+type PathCallback = (path: ObjectPath, expression: Expression) => void;


You sure about declaration again here?

guybedford · 2018-01-06T19:18:40Z

src/ast/nodes/MemberExpression.ts

+	const nextPathKey = memberExpression.propertyKey;
+	const object = memberExpression.object;
+	if (isUnknownKey(nextPathKey)) return null;
+	if (isIdentifier(object)) return [{key: object.name, pos: object.start}, {key: nextPathKey, pos: memberExpression.property.start}];


guybedford · 2018-01-06T19:19:07Z

src/ast/nodes/MemberExpression.ts

+	if (isIdentifier(object)) return [{key: object.name, pos: object.start}, {key: nextPathKey, pos: memberExpression.property.start}];
+	if (isMemberExpression(object)) {
+		const parentPath = getPathIfNotComputed(object);
+		return parentPath && [...parentPath, {key: nextPathKey, pos: memberExpression.property.start}];


Are we doing { x: 'asdf' } or {x:'asdf'} style for objects?

alan-agius4 · 2018-01-07T20:37:27Z

src/Module.ts

-interface ExportDescription { localName: string, identifier?: string }
-interface ReexportDescription { localName: string, start: number, source: string, module: Module }
+interface CommentDescription {
+	block: boolean,


I believe this should be a semi colon instead of a comma.

Not sure if that causes a build error or not. But at least that's the norm.

alan-agius4 · 2018-01-07T20:48:58Z

src/ast/nodes/Identifier.ts


-export default class Identifier extends Node {
+export function isIdentifier (node: Node): node is Identifier {
+	return node.type === 'Identifier';


I am seeing a lot of these string running around.

Wouldn't it make sense to have a string enum instead, if you really don't want to have numeric representation.

Example;

export enum NodeType { Identifier = "Identifier", BlockElement = "BlockElement" ... }

And than from here for instance you'd be able to use

return node.type === NodeType.Identifier;

The nice thing about this, if that refactors would be easier and even finding symbol references etc...

That makes sense, I'll add this.

If wanting to switch to an enum here, perhaps consider going for the numeric form? For example add to enhance:

node.nodeType = NodeType[node.type];

Then the is check would be the same as the above.

Actually even better than the above would be to add the type to the prototype of the Node type itself:

class Identifier { } Identifier.prototype.nodeType = NodeType.Identifier;

That would avoid adding to the memory footprint of Nodes or having anything in enhance, while getting the benefit of a numeric enum.

I am not a seasoned TypeScript expert but to me it looks like there are no advantages of using a numeric enum over a const string enum which has no overhead except that the code base might be just a few bytes smaller. Acorn is already adding the right string constants to all nodes anyway. I also could not find any information about performance benefits of one over the other. Unless there is some hard argument for using numeric enums I would go for a const enum with strings (which is basically the original code)

nodes and they can easily be referenced via Entity or WritableEntity

individual nodes

lukastaegert · 2018-01-08T08:00:34Z

I've pushed most of the refactorings so I guess it is safe to rebase your other PRs on this now, @guybedford. Will take a look at the enum this evening then I guess it is good to be reviewed again.

one thing I would mention though is that the codebase seems to be riding off into a functional / class / interface style. My one word of warning on this would be that as a compiler, performance is of the utmost concern and is more important than beautiful code patterns. Loops and function calls with flat stacks will always be way faster then classes and deep stack-based algorithms

I totally agree that especially functional patterns always carry a performance risk. Interfaces and classes not so much as long as you do not create a lot of instances. At least my own benchmarks I did some time ago showed that on current engines, there is hardly any performance penalty to calling class members instead of directly calling functions. And interfaces are mainly a symbolic means of encapsulation for the programmer.

Rollup has greatly benefitted from these changes as a lot of difficult bugs could be resolved and the implementation of new features has become much simpler. And another very important goal is to also make it easier for strangers to make changes to the code as they are only required to understand small portions of the algorithm to be able to make changes in contrast to large loops that need to be read as a whole and easily look like Spaghetti code (even to oneself once enough time passed). This is thus aiming at long-term maintainability.

Nevertheless I would very much welcome a move to e.g. loop based visitor patterns if they

provide a measurable performance benefit
maintain the code separate, e.g. node type specific code should reside with the specific node

guybedford · 2018-01-08T11:12:21Z

src/ast/nodes/ArrayPattern.ts


-export default class ArrayPattern extends Node {
+export default class ArrayPattern extends GenericPatternNode {


GenericPatternNode -> PatternBase?

guybedford · 2018-01-08T11:12:41Z

src/ast/nodes/BreakStatement.ts


-export default class BreakStatement extends Statement {
+export default class BreakStatement extends GenericStatementNode {


StatementBase?

guybedford · 2018-01-08T11:22:02Z

@lukastaegert definitely I'm not arguing against the existing patterns at all, just cautioning that they shouldn't be to the exclusion of loop patterns when loop patters will be faster. In many places I found myself using forEach in the PR to match code style, even though a simple for loop would have been much quicker. I can also guarantee that loop-based visitor algorithms would provide a measurable performance benefit for any code that is dominated by visit iteration time. The enhance function is probably the most ripe for this sort of performance improvement (and I hope to find the time at some point to contribute to this). If you ever want to explore this area further would be good to discuss.

In terms of code separation, my other concern is that Nodes may have some overloaded meaning between render and analysis concerns. I'm wondering here if there might be a pattern where the methods called exist as more virtual layers as opposed to being direct class methods on the nodes themselves... imagining separate folders for ast, analysis and render layers on Nodes. Although I'm not sure what best pattern that might be, or what the performance implications would be.

lukastaegert · 2018-01-08T12:34:17Z

Oh, please do use for loops instead of forEach if they provide a performance benefit! I am really not attached to these micro patterns and a for loop is probably just as readable as a forEach.

Otherwise I guess long prototype chains are probably a bad idea performance-wise. Maybe we should go back to having each node extending ExpressionNodeBase regardless if it actually is an expression. Or rather putting everything back into Node. We could still keep the various interfaces Entity, WritableEntity, ExpressionEntity, Node etc. when referring to them to be able to more easily refactor things. What do you think?

I actually like your suggestion of separating rendering and analysis. We would just need to figure out how to pass data around. If we could find a way to separate data objects and code here, that would be awesome.

guybedford · 2018-01-08T12:59:50Z

@lukastaegert sure sounds good. I'm not too worried about the prototype chains actually... mainly just the use of deep stacks and stack-based visitors.

* Improve some names

* Take care all nodes implement the ExpressionEntity interface to simplify things

lukastaegert · 2018-01-09T07:51:37Z

Ok, simplified the Node interface to contain the ExpressionEntity interface so that all nodes look similar. Should make extensions a little easier.

Also changed all node types to be represented by an enum. As reasoned above, I used a const enum with strings which is 1-to-1 equivalent to the previous situation and will just help to prevent typos.

@guybedford If there are no objections, we can merge this today and I can finally move on to review your stuff which I am really looking forward to :)

guybedford · 2018-01-09T09:07:38Z

src/ast/nodes/Property.ts

+							 innerOptions.addAccessedReturnExpressionAtPath(path, this)
+						 ),
+					 options
+				 ))


Were these tab -> space conversions intended?

I guess not :) I think I need to do the "prettier" PR soon.

guybedford · 2018-01-09T09:09:48Z

Looks great! @lukastaegert the only difference between strings and enums would be the difference between rows 3 and 4 here - https://jsperf.com/string-number-comparison. Which is negligible in that test anyway.

Certainly go ahead with the merge.

guybedford · 2018-01-09T09:22:14Z

Minor nit - if all nodes implement ExpressionEntity then perhaps ExecutableEntity would be a better name?

lukastaegert · 2018-01-09T09:56:32Z

Ah, that benchmark is nice to know if we need to do higher performance comparisons. As for the name: I would still stick with ExpressionEntity as the interface is meant to say "you can do things with this that you can do with an expression" even though for non-expressions, only the conservative default handlers would kick in.

lukastaegert requested a review from guybedford January 5, 2018 15:20

guybedford approved these changes Jan 6, 2018

View reviewed changes

alan-agius4 reviewed Jan 7, 2018

View reviewed changes

lukastaegert added 9 commits January 8, 2018 07:38

Switch to using interfaces instead of concrete types for nodes

c4a5bbe

Use type guards for variable types

3657766

Replace VirtualObjectExpression by a static Expression

43abf4a

Remove Statement and Pattern interfaces as they are only ever used with

42914b8

nodes and they can easily be referenced via Entity or WritableEntity

Inline disallowIllegalReassignment into the reassignPath methods of the

37b1c4a

individual nodes

Clean up namespace and computed key resolution in member expressions

510f9dc

Basic -> Generic and move some things around

aee201a

Forgot to move assignment check out of Node

d7213d0

Export missing interface

66b842b

lukastaegert force-pushed the refactor-ast-handling branch from d6f060f to 66b842b Compare January 8, 2018 06:41

Refactor interface names + minor adjustments

a27884d

guybedford reviewed Jan 8, 2018

View reviewed changes

lukastaegert added 2 commits January 9, 2018 06:55

* Refactor ExecutionPathOptions types

181b016

* Improve some names

* Use enum for node types

0f98a30

* Take care all nodes implement the ExpressionEntity interface to simplify things

guybedford reviewed Jan 9, 2018

View reviewed changes

Rich-Harris mentioned this pull request Jan 10, 2018

Code Splitting #1841

Merged

lukastaegert added this to the 0.53.4 milestone Jan 10, 2018

lukastaegert merged commit 0f98a30 into master Jan 10, 2018

dependencies bot mentioned this pull request Jan 10, 2018

Update rollup in /bundle from 0.52.1 to 0.53.4 src-zone/material#217

Closed

lukastaegert deleted the refactor-ast-handling branch January 19, 2018 14:31


		export interface ExpressionNode extends Expression, Node {}

		export class GenericExpressionNode extends GenericNode implements Expression {


		export default class Variable {
		export default class Variable implements Expression {


		export default class ArrayPattern extends Node {
		export default class ArrayPattern extends GenericPatternNode {


		export default class BreakStatement extends Statement {
		export default class BreakStatement extends GenericStatementNode {

Refactor AST and variable handling to use interfaces instead of types #1840

Refactor AST and variable handling to use interfaces instead of types #1840

Conversation

lukastaegert commented Jan 5, 2018 • edited Loading

guybedford left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lukastaegert commented Jan 8, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guybedford commented Jan 8, 2018

lukastaegert commented Jan 8, 2018

guybedford commented Jan 8, 2018

lukastaegert commented Jan 9, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guybedford commented Jan 9, 2018

guybedford commented Jan 9, 2018

lukastaegert commented Jan 9, 2018

lukastaegert commented Jan 5, 2018 •

edited

Loading