Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor AST and variable handling to use interfaces instead of types #1840

Merged
merged 12 commits into from
Jan 10, 2018

Conversation

lukastaegert
Copy link
Member

@lukastaegert lukastaegert commented Jan 5, 2018

This should not change any functionality but it fundamentally refactors and cleans up the way nodes and variables are handled. At the core of the refactoring wherever possible, we no longer describe nodes and variables via their types but rather via the minimal interfaces they expose. This is following established best practices, namely the Dependency inversion principle and the Open/closed principle and also has some immediate benefits (see below).

The following interfaces are available:

  • Entity: Common base type for nodes, variables and virtual entities. Methods:
    • toString
  • WritableEntity (extends Entity): An entity that Expressions can be assigned to. These are most importantly Patterns but also all kinds of Expressions. Additional methods:
    • hasEffectsWhenAssignedAtPath
    • reassignPath
  • ExpressionEntity (extends WritableEntity): An entity that Expressions can be assigned to and that itself can be assigned to other WritableEntities. Additional methods:
    • forEachReturnExpressionWhenCalledAtPath
    • getValue
    • hasEffectsWhenAccessedAtPath
    • hasEffectsWhenCalledAtPath
    • someReturnExpressionWhenCalledAtPath
  • Node (extends ExpressionEntity): Actual AST nodes that can be rendered. Exposes lifecycle methods as well as hasEffects and render

All variables now implement the ExpressionEntity interface which means that they now also have a getValue method. For now these methods return UNKNOWN_VALUE but in the future we can use these to implement really powerful partial evaluation features that can statically evaluate expressions involving (non-reassigned) variables.

This refactoring also means that in many places, we no longer have to check if we are dealing with a node, a variable, or e.g. an UNKNOWN_EXPRESSION. Which not only feels very nice but also opens the doors for implementing lots of new "virtual expressions" as a kind of internal type system. As a first example, the VirtualObjectExpression used for function prototype objects has been replaced by a static UNKNOWN_OBJECT_EXPRESSION which makes thinks a lot cleaner. I am currently actively working on extending this to cover e.g. string and array prototype methods.

As AST nodes often combine the Node interface with other interfaces, ExpressionNode is available as a shorthand. The previous Node base class has been replaced by NodeBase. Though possible, this should no longer be used as a type!

Other changes in this PR:

  • Use type guards wherever possible to make the code cleaner. These are usually placed next to the types they guard 😉 (e.g. isIdentifier next to Identifier). Note that type guards can still be a code smell, i.e. they can make things possible that might be better implemented via a common interface.
  • Inline illegal import reassignment checks into reassignPath methods
  • Rework and clean up the namespace resolution in MemberExpression

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great and much neater. Always nice to see negative diffs too :)

Mostly nitpicking from me here... just the one interface / class naming suggestion was the main thing that tripped me up a bit, otherwise seems very sensible.

I just noticed the tests measure their time in seconds for a basic eye to perf... wondering if we should make a habit of checking the diffs on this for PRs. I assume with these changes though there is completely negligible performance difference.

On a somewhat unrelated tangent, one thing I would mention though is that the codebase seems to be riding off into a functional / class / interface style. My one word of warning on this would be that as a compiler, performance is of the utmost concern and is more important than beautiful code patterns. Loops and function calls with flat stacks will always be way faster then classes and deep stack-based algorithms, and we shouldn't let the code style make applying loops and plain functions feel "wrong", which it is currently on the verge of doing.

export default class ExecutionPathOptions {
_optionValues: Map<string, Node | Variable>;
_optionValues: Map<string, any>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not do better than any here?


/**
* @returns {ExecutionPathOptions}
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an intentional move to undocument these sorts of classes?

(I did find these quite useful with the TS suggestions, although haven't got into habit of doing it myself)

@@ -31,18 +31,21 @@ export default function enhance (ast: any, module: Module, comments: Comment[])
}
}

function enhanceNode (raw: Node | Node[], parent: Node | Module, module: Module, code: MagicString) {
function isArrayOfNodes (raw: Node | Node[]): raw is Node[] {
return 'length' in raw;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for this over raw.length !== undefined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in was used by the original code but I guess there is no harm in changing this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in definitely works better with TypeScript so I'll leave this.

if ('length' in raw) {
for (let i = 0; i < (<Node[]>raw).length; i += 1) {
enhanceNode((<Node[]>raw)[i], parent, module, code);
if (isArrayOfNodes(raw)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a step in the right direction, although I'd still prefer a visitor that can work this out from Node context over these detections.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree though I guess that would require a larger refactoring beyond the scope of this PR.


export interface ExpressionNode extends Expression, Node {}

export class GenericExpressionNode extends GenericNode implements Expression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to just call this ExpressionNode?

Once confusion I find with the naming here is it is difficult to tell what is an interface and what is an actual class.

It could be useful to make this clearer in the name, say ExpressionNode and ExpressionNodeInterface or something like that over GenericExpressionNode and ExpressionNode.... still trying to fully get the constructs though, but that's the first impression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or how about NodeBase and ExpressionBase instead of GenericNode and GenericExpressionNode... just so it's clear these are actual real base classes and not interfaces?

@@ -18,7 +15,7 @@ export default class LocalVariable extends Variable {

constructor (
name: string, declarator: Identifier | ExportDefaultDeclaration | null,
init: Expression | Declaration | UnknownAssignment
init: Expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this will always be an expression? I may be wrong, but I seem to remember testing this before and seeing declarations come up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the declarations in question all satisfy the Expression interface now (and Declaration no longer is a concept). This will probably be clearer once I employ your interface naming suggestions.


export default class Variable {
export default class Variable implements Expression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about ExpressionEntity as the base shared between Variable and Node?

Then we've have ExpressionEntity, ExpressionNode and ExpressionBase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that. So that's what it's gonna be!


export const UNKNOWN_KEY: UnknownKey = { type: 'UNKNOWN_KEY' };

type PathCallback = (path: ObjectPath, expression: Expression | Declaration | UnknownAssignment) => void;
type PathPredicate = (path: ObjectPath, expression: Expression | Declaration | UnknownAssignment) => boolean;
type PathCallback = (path: ObjectPath, expression: Expression) => void;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You sure about declaration again here?

const nextPathKey = memberExpression.propertyKey;
const object = memberExpression.object;
if (isUnknownKey(nextPathKey)) return null;
if (isIdentifier(object)) return [{key: object.name, pos: object.start}, {key: nextPathKey, pos: memberExpression.property.start}];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline?

if (isIdentifier(object)) return [{key: object.name, pos: object.start}, {key: nextPathKey, pos: memberExpression.property.start}];
if (isMemberExpression(object)) {
const parentPath = getPathIfNotComputed(object);
return parentPath && [...parentPath, {key: nextPathKey, pos: memberExpression.property.start}];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing { x: 'asdf' } or {x:'asdf'} style for objects?

src/Module.ts Outdated
interface ExportDescription { localName: string, identifier?: string }
interface ReexportDescription { localName: string, start: number, source: string, module: Module }
interface CommentDescription {
block: boolean,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be a semi colon instead of a comma.

Not sure if that causes a build error or not. But at least that's the norm.


export default class Identifier extends Node {
export function isIdentifier (node: Node): node is Identifier {
return node.type === 'Identifier';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am seeing a lot of these string running around.

Wouldn't it make sense to have a string enum instead, if you really don't want to have numeric representation.

Example;

export enum NodeType {
    Identifier = "Identifier",
    BlockElement = "BlockElement"
    ...
}

And than from here for instance you'd be able to use

return node.type === NodeType.Identifier;

The nice thing about this, if that refactors would be easier and even finding symbol references etc...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I'll add this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If wanting to switch to an enum here, perhaps consider going for the numeric form? For example add to enhance:

node.nodeType = NodeType[node.type];

Then the is check would be the same as the above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually even better than the above would be to add the type to the prototype of the Node type itself:

class Identifier {
}
Identifier.prototype.nodeType = NodeType.Identifier;

That would avoid adding to the memory footprint of Nodes or having anything in enhance, while getting the benefit of a numeric enum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a seasoned TypeScript expert but to me it looks like there are no advantages of using a numeric enum over a const string enum which has no overhead except that the code base might be just a few bytes smaller. Acorn is already adding the right string constants to all nodes anyway. I also could not find any information about performance benefits of one over the other. Unless there is some hard argument for using numeric enums I would go for a const enum with strings (which is basically the original code)

@lukastaegert
Copy link
Member Author

I've pushed most of the refactorings so I guess it is safe to rebase your other PRs on this now, @guybedford. Will take a look at the enum this evening then I guess it is good to be reviewed again.

one thing I would mention though is that the codebase seems to be riding off into a functional / class / interface style. My one word of warning on this would be that as a compiler, performance is of the utmost concern and is more important than beautiful code patterns. Loops and function calls with flat stacks will always be way faster then classes and deep stack-based algorithms

I totally agree that especially functional patterns always carry a performance risk. Interfaces and classes not so much as long as you do not create a lot of instances. At least my own benchmarks I did some time ago showed that on current engines, there is hardly any performance penalty to calling class members instead of directly calling functions. And interfaces are mainly a symbolic means of encapsulation for the programmer.

Rollup has greatly benefitted from these changes as a lot of difficult bugs could be resolved and the implementation of new features has become much simpler. And another very important goal is to also make it easier for strangers to make changes to the code as they are only required to understand small portions of the algorithm to be able to make changes in contrast to large loops that need to be read as a whole and easily look like Spaghetti code (even to oneself once enough time passed). This is thus aiming at long-term maintainability.

Nevertheless I would very much welcome a move to e.g. loop based visitor patterns if they

  • provide a measurable performance benefit
  • maintain the code separate, e.g. node type specific code should reside with the specific node


export default class ArrayPattern extends Node {
export default class ArrayPattern extends GenericPatternNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GenericPatternNode -> PatternBase?


export default class BreakStatement extends Statement {
export default class BreakStatement extends GenericStatementNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StatementBase?

@guybedford
Copy link
Contributor

@lukastaegert definitely I'm not arguing against the existing patterns at all, just cautioning that they shouldn't be to the exclusion of loop patterns when loop patters will be faster. In many places I found myself using forEach in the PR to match code style, even though a simple for loop would have been much quicker. I can also guarantee that loop-based visitor algorithms would provide a measurable performance benefit for any code that is dominated by visit iteration time. The enhance function is probably the most ripe for this sort of performance improvement (and I hope to find the time at some point to contribute to this). If you ever want to explore this area further would be good to discuss.

In terms of code separation, my other concern is that Nodes may have some overloaded meaning between render and analysis concerns. I'm wondering here if there might be a pattern where the methods called exist as more virtual layers as opposed to being direct class methods on the nodes themselves... imagining separate folders for ast, analysis and render layers on Nodes. Although I'm not sure what best pattern that might be, or what the performance implications would be.

@lukastaegert
Copy link
Member Author

Oh, please do use for loops instead of forEach if they provide a performance benefit! I am really not attached to these micro patterns and a for loop is probably just as readable as a forEach.

Otherwise I guess long prototype chains are probably a bad idea performance-wise. Maybe we should go back to having each node extending ExpressionNodeBase regardless if it actually is an expression. Or rather putting everything back into Node. We could still keep the various interfaces Entity, WritableEntity, ExpressionEntity, Node etc. when referring to them to be able to more easily refactor things. What do you think?

I actually like your suggestion of separating rendering and analysis. We would just need to figure out how to pass data around. If we could find a way to separate data objects and code here, that would be awesome.

@guybedford
Copy link
Contributor

@lukastaegert sure sounds good. I'm not too worried about the prototype chains actually... mainly just the use of deep stacks and stack-based visitors.

* Take care all nodes implement the ExpressionEntity interface to simplify
  things
@lukastaegert
Copy link
Member Author

Ok, simplified the Node interface to contain the ExpressionEntity interface so that all nodes look similar. Should make extensions a little easier.

Also changed all node types to be represented by an enum. As reasoned above, I used a const enum with strings which is 1-to-1 equivalent to the previous situation and will just help to prevent typos.

@guybedford If there are no objections, we can merge this today and I can finally move on to review your stuff which I am really looking forward to :)

innerOptions.addAccessedReturnExpressionAtPath(path, this)
),
options
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these tab -> space conversions intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not :) I think I need to do the "prettier" PR soon.

@guybedford
Copy link
Contributor

Looks great! @lukastaegert the only difference between strings and enums would be the difference between rows 3 and 4 here - https://jsperf.com/string-number-comparison. Which is negligible in that test anyway.

Certainly go ahead with the merge.

@guybedford
Copy link
Contributor

Minor nit - if all nodes implement ExpressionEntity then perhaps ExecutableEntity would be a better name?

@lukastaegert
Copy link
Member Author

Ah, that benchmark is nice to know if we need to do higher performance comparisons. As for the name: I would still stick with ExpressionEntity as the interface is meant to say "you can do things with this that you can do with an expression" even though for non-expressions, only the conservative default handlers would kick in.

@Rich-Harris Rich-Harris mentioned this pull request Jan 10, 2018
@lukastaegert lukastaegert added this to the 0.53.4 milestone Jan 10, 2018
@lukastaegert lukastaegert merged commit 0f98a30 into master Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants