Skip to content
This repository has been archived by the owner on Oct 9, 2020. It is now read-only.

Bundle arithmetic now supports parens. #535

Merged
merged 6 commits into from Mar 29, 2016
Merged

Bundle arithmetic now supports parens. #535

merged 6 commits into from Mar 29, 2016

Conversation

arackaf
Copy link
Contributor

@arackaf arackaf commented Mar 26, 2016

I really, really wanted to support this without re-writing your tokenizing code, but there was just no way. A regex ninja may have been able to pull it off, but even then it would have been literally impossible to support nested parens. So the code now manually walks the string, pulling out what it needs. All your prior code to account for canonicalization and single-modules I left as is.

The tests I previously reported are still failing, but it does look like it's a Windows-only issue; they were failing from the moment I cloned the repo.

@guybedford
Copy link
Member

Wow, this is really amazing. Thank you so much for working on this.

The main feedback I have is to handle syntax errors. For example getNextOperator should verify that the given character is a valid operator, and getNextIdentifier should verify that the given identifier is not an operator character (as was provided by https://github.com/systemjs/builder/pull/535/files#diff-2e0496fccad838c52136f5d29f3e85e2L27). The other thing that opens up here is to throw a syntax error when there are unmatching parenthesis too? Or at least confirming what happens currently makes sense there? +1 to the whitespace handling :)

The rest of the feedback would all just be code style and naming stuff (eg since it's no longer operator / module name, changing the name to operator / expression everywhere), which I know can be a pain, but let me know if you're happy for me to provide that sort of feedback? Otherwise I can always do a manual merge with my own mostly stylistic edits (keeping your commit and attribution in-tact) if you're ok with that too.

@arackaf
Copy link
Contributor Author

arackaf commented Mar 27, 2016

Ah - good feedback. I've got your master merged in locally (ostensibly I should now be able to npm run test so thanks for that.) I'll add some better error handling - question - do you have tests for this already? I don't see anything for bundling something like bundle.trace('a +') and catching an expected error.

Do you have them which I'm missing, or should I add?

And WebStorm handles local variable renames quite simply, so that's no big deal at all.

I should have that done tonight.

@guybedford
Copy link
Member

No there aren't any tests for the syntax error cases currently, if you'd like to add these that is welcome.

@guybedford
Copy link
Member

And just let me know if / when you want me to go ahead with the style feedback?

@arackaf
Copy link
Contributor Author

arackaf commented Mar 27, 2016

OH - yeah go ahead and shoot with the style feedback. I tried my best to match your styles, so sorry about that if I didn't quite.

@arackaf
Copy link
Contributor Author

arackaf commented Mar 27, 2016

@guybedford - If you don't like the wording on these, now's the time to tell me :)

image

@arackaf
Copy link
Contributor Author

arackaf commented Mar 27, 2016

Ok I renamed that variable, and also added some needed validation. (The validation messages do NOT normally show in the test console, unless you manually enable them).

Validation checks verify against missing operators, ie a + b c, missing identifiers, ie a + b + + c and unclosed parens.

Let me know if you think there's anything else needed.

expandPromise = expandPromise.then(function() {
return Promise.resolve(expandedTreePromise)
.then(function(fullTree){
expandedOperations = expandedOperations.concat({ preExpandedTree: fullTree, operator: operation.operator })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this still be called bulkOperation instead of preExpandedTree or am I missing something here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right I see what you are doing here, running the actual arithmetic during this phase recursively.

To avoid conflating steps I think I'd rather take one of two approaches here:

  1. Instead of a single traceExpression separate it out into traceExpression and traceBulkExpression, where traceBulkExpression handles this sort of logic instead.
  2. During this expansion step, recursively expand globs only, leaving the arithmetic syntax in-tact.

I'd be happy with either method here, let me know if that makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify the reason being here that each "step" of the algorithm should have a clear purpose. To do globbing and paren expansion together is an arbitrary conflation just to take advantage of the same loop, something which can rather be split up into two loops (corresponding to either (1) or (2)).

@guybedford
Copy link
Member

One other thing here - previously expressions were defined as being separated by the string matching /\s[\+\-\&]\s/. Since we now allow arbitrary spaces as the terminator I think this will break the ability to support spaces in file names.

That is, currently we can do:

jspm bundle 'My App/x.js + My App/y.js'

With this PR it would no longer be possible to bundle paths with spaces in the names.

So unfortunately that seems like a tokenizing adjustment may be necessary to use a regex like the above. We could still still adjust this regex to be "space-eating" like /\s+[\+\-&]\s+/.

@guybedford
Copy link
Member

And thanks for being open to all this feedback. If you're getting sick of it let me know and I'm happy to jump in too.

@arackaf
Copy link
Contributor Author

arackaf commented Mar 28, 2016

The problem with splitting with a regex isn't just supporting more than one space: that was the last thing I cared about; it was just a bonus that I wound up being able to support multiple spaces. The problem is that it's hard to get a regex to split in a way that leaves parenthetical expressions alone, and of course it's impossible to have a nested expression that's parsed by a regex, ie (a + b + (c & d)) [citation] :-) http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Even if we were to ditch nested expressions, which I'd hate to do, you wind up with this sort of unholy mess

var expressionString = 'x + (a + b + (c & d))'
console.log(expressionString.split(/(\s+\(?=.*\)\s+|[\+\-\&])/));

which still doesn't even work - it's aggressively splitting up the inner expression. And again, even if we could tweak the lookahead to work, it still would never / could never support nested parens.

So can we support filenames with spaces by just quoting them? Ie builder.trace('"My file 1.js" + file2.js');


I could separate out traceExpression with traceExpressionBulk, but they would share a non-trivial amount of code, with the latter being recursive in the same way. For example, consider

bundle.trace('(a + b + (c & d))')

traceExpressionBulk would get a + b + (c & d) and would wind up calling itself to get the results of c & d.

It would be something vaguely like what's below, though that's literally untested and almost certainly wrong. That would leave me duplicated code to handle single operations, like a + b which should probably be re-factored to yet another utility method.

Are you sure we can't leave this in the single, recursive traceExpression method? Parsing things like (a + b + (c & d)) is inherently recursive. Recursively calling the same traceExpression to handle nested expressions seems like a fairly natural solution.

javascript
exports.traceExpression = function traceExpression(builder, expression, traceOpts) {
if (!expression)
throw new Error('A module expression must be provided to trace.');

var operations = parseExpression(expression);
var expandedOperations = [];

// expand any globbing operations in the expression
var expandPromise = Promise.resolve();
operations.forEach(function(operation) {
if (operation.bulkOperation) {
traceExpressionBulk(builder, operation.bulkOperation, traceOpts,expandedOperations);
} else if (operation.operator) {
expandPromise = expandPromise.then(function() {
return Promise.resolve(expandGlobAndCanonicalize(builder, operation))
.then(function (expanded) {
expandedOperations = expandedOperations.concat(expanded);
});
});
}
});

return expandPromise.then(function() {
// chain the operations, applying them with the trace of the next module
return expandedOperations.reduce(function(p, op) {
return p.then(function(curTree) {
// tree . module
if (op.singleModule)
return getTreeModuleOperation(builder, op.operator)(curTree, op.moduleName);

    if (op.preExpandedTree){
      return getTreeOperation(op.operator)(curTree, op.preExpandedTree);
    }
    // tree . tree
    return builder.tracer.traceCanonical(op.moduleName, traceOpts)
    .then(function(nextTrace) {
      return getTreeOperation(op.operator)(curTree, nextTrace.tree);
    });
  });
}, Promise.resolve({}));

});
};

exports.traceExpressionBulk = function traceExpressionBulk(builder, expression, traceOpts, expandedOperations) {
var operations = parseExpression(expression);

// expand any globbing operations in the expression
var expandPromise = Promise.resolve();
operations.forEach(function(operation) {
if (operation.bulkOperation) {
var expandedTreePromise = traceExpressionBulk(builder, operation.bulkOperation, traceOpts)
expandPromise = expandPromise.then(function() {
return Promise.resolve(expandedTreePromise)
.then(function(fullTree){
expandedOperations = expandedOperations.concat({ preExpandedTree: fullTree, operator: operation.operator })
});
});
} else if (operation.operator) {
expandPromise = expandPromise.then(function() {
return Promise.resolve(expandGlobAndCanonicalize(builder, operation))
.then(function (expanded) {
expandedOperations = expandedOperations.concat(expanded);
});
});
}
});
};

@arackaf
Copy link
Contributor Author

arackaf commented Mar 28, 2016

Thinking about this more at the gym .... I think the tl;dr is that your builder now supports child expressions. So parseExpression now gets recursively called to handle each such child expression.

I don't think this is a terribly unnatural implementation, but I'd be happy to see alternatives if you really think they're necessary.

Also, for the filenames with the spaces - would your canonicalization stuff handle that? I haven't dug into that part of the code to see what it does, and it's not documented, so I'm not sure if

builder.bundle("a.js + `my file.js` + b.js"

would work.

@guybedford
Copy link
Member

I don't mean splitting with a regex from the beginning, rather just updating the readOperator step to read using the /\s+[\+\-\&]\s+/ regex rather. that way spaces in module names can continue to be supported.

For the other thing. The problem is not so much that parseExpression is recursive, which is completely right, rather just about the "expand and canonicalize" step being separate to the "evaluate" step. That is because expand and canonicalize is a pre-processing of the expression, while calculating the expression value is a later step conceptually. For example, in future we may make expression simplifications at a symbolic level, allowing us to cut out unnecessary execution. That way we could allow (A + B) - A to not need to trace A at all. If we mix expression evaluation with the pre-compilation, it will make this sort of stuff harder, that is all. So it's just a conceptual separation, which is exactly a duplication of the loop.

@arackaf
Copy link
Contributor Author

arackaf commented Mar 28, 2016

@guybedford it took me an embarrassing amount of time to figure out what you meant, but I finally got it (I think). And I gained a much better understanding of the code in the process.

Filenames with spaces are still not done yet, but please take a look at my last commit: I think it achieves what you were wanting with the expression parsing/globbing/canonicalization all happening prior to the actual trees being computed and intersected / added / subtracted.

Thanks for calling this out - the code is much more nicely organized.

@arackaf
Copy link
Contributor Author

arackaf commented Mar 29, 2016

Ok - last commit supports files with spaces. Hopefully this is good to go now.

@guybedford guybedford merged commit b5cf432 into systemjs:master Mar 29, 2016
@guybedford
Copy link
Member

Thanks for your great work on this!

@arackaf
Copy link
Contributor Author

arackaf commented Mar 29, 2016

@guybedford my pleasure. Can you drop me a quick line, here or elsewhere, when this is up on npm?

@guybedford
Copy link
Member

@arackaf sure this has been released in 0.15.14... jspm release to follow now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants