Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhance collapse_vars #1862

Merged
merged 8 commits into from
May 6, 2017
Merged

enhance collapse_vars #1862

merged 8 commits into from
May 6, 2017

Conversation

alexlamsl
Copy link
Collaborator

@alexlamsl alexlamsl commented May 1, 2017

  • extend expression types
    • a++
    • a=x;
  • extend scan range
    • for(init;;);
    • switch(expr){case expr:}
    • a = x; a = a || y;
  • terminate upon debugger;

fixes #27
fixes #1858

Another break-out of #1821, as mentioned in #1850 (comment)

I have managed not to overlap with cascade or reduce_vars too much, and they should not interfere with each other too badly. This can be viewed as cascade but across multiple AST_Statements instead of within a single AST_Sequence, with the additional support for AST_VarDef (since you can't have that within AST_Sequence anyway).

@kzc
Copy link
Contributor

kzc commented May 1, 2017

Do you still run the fuzzer from time to time? Not clear if it'd catch problems in optimizations like this though.

@alexlamsl
Copy link
Collaborator Author

Haven't started running test/ufuzz.js yet, though I'd expect it to catch some basic stuff at least.

This thing is actually quite scary, for instance it can reach within the first AST_Case.expression of a switch statement. So I'm thinking of writing a few more test/compress cases first.

x = i += 2,
y = i += 3;
log(x, i += 4, y, i);
console.log.bind(console)(x, i += 4, y, i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you and I can deduce this optimization is correct, how can collapse_vars know that the side effects in console.log.bind do not interact with the modification of i above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does this:

  • begin scanning on log
  • collect all assignments within the value (none in this case, but if say log = i = 10 it would have marked i down)
  • realise console.log.bind(console) may have side-effects
  • every time it hits AST_SymbolRef (i in this case), make sure all references to i are within this scope, i.e. f4()

So collapse_vars move past each i safely because it knows whatever side-effects console.log.bin(console) may have, it can't possibly modify i which is exclusively modified within f4().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming that.

return {
get b() { return 7; },
r: z
r: x + y
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What in collapse_vars prevented this particular optimization before? I don't see any side effects above it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AST_Accessor.has_side_effects() on get b() { return 7; }, which aborts the scan previously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - of course! Thanks.

@kzc
Copy link
Contributor

kzc commented May 1, 2017

for instance it can reach within the first AST_Case.expression of a switch statement.

That was certainly not the case in 2.7.5 at least. I explicitly rejected such nodes in preorder(). Did you change code to that effect?

@alexlamsl
Copy link
Collaborator Author

Did you change code to that effect?

Yeah - I basically wrote another version in parallel to test between the two back and forth.

@alexlamsl
Copy link
Collaborator Author

I guess I should put that compressor.info() back in... 😅

@kzc
Copy link
Contributor

kzc commented May 1, 2017

In the first collapse_vars implementation I naïvely did not exclude the branches of if/else, the ternary operator and the && and || operators. That's when I put in the preorder exclusions of branching nodes including AST_Case as a precaution.

@kzc
Copy link
Contributor

kzc commented May 1, 2017

Even AST_While.condition has to be excluded in preorder since it is executed multiple times during the course of the loop. That had tricked me as well.

@alexlamsl
Copy link
Collaborator Author

alexlamsl commented May 1, 2017

Here's a list of cases that would terminate the scan:

  • AST_Do
  • AST_While
  • AST_ForIn
  • AST_For (after trying .init)
  • AST_Try (which includes catch and finally)
  • AST_With
  • AST_SymbolRef that are unsafe (undeclared, referenced out-of-scope or assigned within collapsing value)
  • AST_Call (after trying .expression and .args)
  • AST_Exit (after trying .value)
  • AST_Dot (after trying .expression)
  • AST_Sub (after trying .expression and .property)
  • && and || (after trying .left)
  • AST_Conditional (after trying .condition)
  • AST_If (after trying .condition)
  • AST_Switch (after trying .expression and the first AST_Case.expression)

@kzc
Copy link
Contributor

kzc commented May 1, 2017

Perhaps should add AST_Debugger to the list? It's a bit of an oddball instruction.

@alexlamsl
Copy link
Collaborator Author

Perhaps should add AST_Debugger to the list? It's a bit of an oddball instruction.

I'll confess I never used that feature of JavaScript before... 😅

Better safe than sorry, indeed.

@alexlamsl
Copy link
Collaborator Author

OT: TIL null[console.log("This gets printed before the runtime error.")]

@kzc
Copy link
Contributor

kzc commented May 1, 2017

null[console.log("This gets printed before the runtime error.")]

That's whack, but it sort of makes of makes sense from a JS interpreter point of view. It doesn't consider the implications of the expression until the subscript is evaluated.

Keep in mind that the expression is still evaluated before the subscript:

$ echo '(console.log("expr"))[console.log("subscript")];' | node
expr
subscript
[stdin]:1
(console.log("expr"))[console.log("subscript")];
                     ^

TypeError: Cannot read property 'undefined' of undefined

lib/compress.js Outdated
@@ -677,7 +677,7 @@ merge(Compressor.prototype, {
|| node instanceof AST_IterationStatement && !(node instanceof AST_For)
|| node instanceof AST_SymbolRef
&& (node.undeclared()
|| lvalues && node.name in lvalues
|| lvalues && lvalues[node.name]
Copy link
Contributor

@kzc kzc May 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be more readable, but it's slower because the value needs to be evaluated in a boolean context rather than merely checking for key existence.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an optimisation - I do need the boolean value here. lvalues stores true if variable is modified (same as before), but false when the variable is read-only.

This is to facilitate the use case below when dealing with var a=b; return b++ + a where the modifying case is encountered during scanning as opposed to the assigned value. This case used to be taken care of by the post-order has_side_effects()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. I did not realize you changed its use.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries - always good to make sure we know what I'm doing 😉

@alexlamsl
Copy link
Collaborator Author

OT: Node.js oddity of the day

$ node
> !function f(){console.log(f, f ^= 0, f)}()
[Function: f] 0 [Function: f]
true
> !function f(){console.log(typeof f, f ^= 0, f)}()
function 0 function f(){console.log(typeof f , f ^= 0, f)}
true

So in that sense, typeof isn't side-effect-free 🤣

@kzc
Copy link
Contributor

kzc commented May 1, 2017

So in that sense, typeof isn't side-effect-free

How a JS engine prints out a function is a crap shoot in the best of circumstances.

@alexlamsl
Copy link
Collaborator Author

Right, I'll leave test/ufuzz.js to have a go at this PR, since it's 7am which means it's a good time to take a nap 😴

@alexlamsl
Copy link
Collaborator Author

Every sub-domain of angularjs.org is down at the moment - will restart CI jobs once they sort themselves out.

@kzc
Copy link
Contributor

kzc commented May 2, 2017

Where did all the comments in the collapse_vars implementation go?

No matter - it's beyond my ability to understand it at this point.

Please fuzz this PR for a few hundred CPU hours since it's bound to have unforeseen issues.

@alexlamsl
Copy link
Collaborator Author

Where did all the comments in the collapse_vars implementation go?

They went the same way as compressor.info() 😅 - I wrote this version next to the original code, so the comments weren't copied over. I'll see what I can do.

No matter - it's beyond my ability to understand it at this point.

Any suggestions to improve readability? 😅

Please fuzz this PR for a few hundred CPU hours since it's bound to have unforeseen issues.

The last few commits are bugs found by test/ufuzz.js before I need the box for $day_job. I shall run to 1MFuzz without a single incident before merging this PR.

@alexlamsl
Copy link
Collaborator Author

alexlamsl commented May 2, 2017

Issues fixed so far are discovered around 10~30kFuzz.

@kzc
Copy link
Contributor

kzc commented May 2, 2017

On the comment front all I can suggest is for you to ask yourself the question - what would someone unfamiliar with the project want to know about particularly dense sections of code? And answer those questions with brief comments just to say where you're going. Clearly you don't need comments to understand the code, but mortals like the rest of us do. :-)

Before I forget, there was one other section of code unrelated to this PR in which variables were renamed in a confusing manner:

                        var name = d.name.length;
                        var overhead = 0;
                        if (compressor.option("unused") && (!d.global || compressor.toplevel(d))) {
                            overhead = (name + 2 + value) / d.references.length;
                        }
                        d.should_replace = value <= name + overhead ? fn : false;

name and value are misleading - they should be name_length and value_length respectively to convey what they actually represent.

I'm going to take this opportunity to step back from this project since it's in very capable hands. I think it's never been in better shape. You've whittled the Issues down from a dozen pages to just 4. With the upcoming 3.x release you've simplified the interface and made it more consistent. In the event you need a second opinion on some issue, feel free to @kzc me.

@alexlamsl
Copy link
Collaborator Author

Thanks for all the help so far - much appreciated 👍

name and value are misleading - they should be name_length and value_length respectively to convey what they actually represent.

Will do.

Clearly you don't need comments to understand the code, but mortals like the rest of us do.

I'd be one of said mortals when I come back to read that code in a few weeks' time, so better write something down now before I regret it. 😅

@alexlamsl
Copy link
Collaborator Author

Wrote a summary and added short descriptions to cut up the main code section in 6c3a7b0

- extend expression types
  - `a++`
  - `a=x;`
- extend scan range
  - `for(init;;);`
  - `switch(expr){case expr:}`
  - `a = x; a = a || y;`
- terminate upon `debugger;`

fixes mishoo#27
fixes mishoo#1858
@alexlamsl
Copy link
Collaborator Author

Rebased from master after #1863 & #1864 - had 100kFuzz without incident.

@kzc
Copy link
Contributor

kzc commented May 2, 2017

@alexlamsl I haven't benchmarked anything on master in a while, but going forward if you could keep an eye on the default -m -c times for the files mentioned in test/benchmark.js to make sure they don't regress substantially from v2.7.5 that'd be great. Any new optimization taking a lot of time should be put behind an option defaulting to false.

With default options Uglify users have grown to expect simplicity, decent minification and speed - otherwise they'd be using one of the other minifiers.

Thanks again and good luck with the 3.x release!

@alexlamsl
Copy link
Collaborator Author

alexlamsl commented May 2, 2017

test/ufuzz.js just presented me with an interesting discovery:

var s = "" + function(){...};
for (var a in s) {
    // this will loop through every character of the string `s`, so `c` is going to be dependent on the function body
    c = 1 + c;
}

And all hell broke loose for that false positive of failed test case 😓

@alexlamsl alexlamsl merged commit dee5a27 into mishoo:master May 6, 2017
@alexlamsl alexlamsl deleted the collapse_vars branch May 6, 2017 08:15
papandreou added a commit to papandreou/uglifyproblem that referenced this pull request Sep 14, 2017
@papandreou
Copy link
Contributor

@alexlamsl, I've run into a problem with a web app (the Calendar part of the One.com Webmail suite), and several levels of bisecting landed me here:

dee5a27516cb574dda5fc3d23a64344f0ea654b6 is the first bad commit
commit dee5a27516cb574dda5fc3d23a64344f0ea654b6
Author: Alex Lam S.L <alexlamsl@gmail.com>
Date:   Sat May 6 16:15:43 2017 +0800

    enhance `collapse_vars` (#1862)
    
    - extend expression types
      - `a++`
      - `a=x;`
    - extend scan range
      - `for(init;;);`
      - `switch(expr){case expr:}`
      - `a = x; a = a || y;`
    - terminate upon `debugger;`
    
    closes #1821
    fixes #27
    fixes #315
    fixes #1858

:040000 040000 0036eaa992e75618cb992c9aee45710b6914fe39 35d7ef56a99cde5950421194c91d1f2fa7e08367 M	lib
:040000 040000 b64e87760056fd566af3970179d637162175a937 bd4600092a0f237d2fcaf4c1a8bd81f7ce8e264c M	test

The symptom in the application is that recurring calendar events start appearing in the wrong places, so it seems like this transformation doesn't preserve the semantics of the code, so the math comes out wrong.

Here's a diff between the output of uglify-js before and after this change landed: papandreou/uglifyproblem@7cffa47 -- I'll try to skim through it to see if I can spot the error myself, but help would be appreciated :)

@papandreou
Copy link
Contributor

Further debugging indicates that it's this change that causes the application to break:

diff --git a/http-pub-universal/static/calendarfrontend.js b/http-pub-universal/static/calendarfrontend.js
index e948030d2..e5ad03505 100644
--- a/http-pub-universal/static/calendarfrontend.js
+++ b/http-pub-universal/static/calendarfrontend.js
@@ -20585,7 +20585,7 @@ _gaq.push([ "_trackPageLoadTime" ]), _gaq.push([ "_gat._anonymizeIp" ]), functio
                 increment_monthday: function(inc) {
                     for (var i = 0; i < inc; i++) {
                         var daysInMonth = ICAL.Time.daysInMonth(this.last.month, this.last.year);
-                        this.last.day++, this.last.day > daysInMonth && (this.last.day -= daysInMonth, this.increment_month());
+                        ++this.last.day > daysInMonth && (this.last.day -= daysInMonth, this.increment_month());
                     }
                 },
                 increment_month: function() {

@papandreou
Copy link
Contributor

The original code looks like this:

    increment_monthday: function increment_monthday(inc) {
      for (var i = 0; i < inc; i++) {
        var daysInMonth = ICAL.Time.daysInMonth(this.last.month, this.last.year);
        this.last.day++;

        if (this.last.day > daysInMonth) {
          this.last.day -= daysInMonth;
          this.increment_month();
        }
      }
    },

@alexlamsl
Copy link
Collaborator Author

@papandreou your example seems to work as intended:

$ cat test.js
var obj = {
  last: {
    day: 0
  },
  increment_month: function() {
    console.log("next month");
  },
  inc: function(inc) {
    for (var i = 0; i < inc; i++) {
      var daysInMonth = 28;
      this.last.day++;

      if (this.last.day > daysInMonth) {
        this.last.day -= daysInMonth;
        this.increment_month();
      }
    }
  }
};

obj.inc(100);
console.log(obj.last.day);
$ uglifyjs test.js -cb bracketize
var obj = {
    last: {
        day: 0
    },
    increment_month: function() {
        console.log("next month");
    },
    inc: function(inc) {
        for (var i = 0; i < inc; i++) {
            ++this.last.day > 28 && (this.last.day -= 28, this.increment_month());
        }
    }
};

obj.inc(100), console.log(obj.last.day);
$ cat test.js | node
next month
next month
next month
16

$ uglifyjs test.js -cb bracketize | node
next month
next month
next month
16

Please narrow the issue down to an reproducible case, then file a new report (thanks in advance for your efforts!)

@papandreou
Copy link
Contributor

Ah, it seems to happen because this.last.day is a getter/setter, and the transformation is changing the number of times it's being dereferenced. In other words this source transformation is unsafe in the presence of getters and setters.

@papandreou
Copy link
Contributor

This outputs yay before being uglified, and nothing after being uglified:

var foo = { _bar: 0 };
Object.defineProperty(foo, 'bar', {
    get: function () {
        this._bar += 1;
        return this._bar;
    },
    set: function (bar) {
        this._bar = bar;
    }
});

foo.bar++;
if (foo.bar === 3) {
    console.log('yay');
}

@kzc
Copy link
Contributor

kzc commented Sep 14, 2017

@papandreou Please open a new ticket for this issue.

alexlamsl added a commit that referenced this pull request Jan 31, 2018
First introduced in #1862 to stop assignments to migrate beyond `return` or `throw`. Since then `collapse_vars` has been improved to handle various side-effect-related corner cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants