added README

commit ce8e8d57c0d346dba9527b7a11b03364ce9ad1bb 1 parent bf70205
@mishoo authored
Showing with 290 additions and 0 deletions.
  1. +290 −0 README.md
290 README.md
@@ -0,0 +1,290 @@
+> **Tl;dr** — I want to make UglifyJS2 faster, better, easier to maintain
+> and more useful than version 1. If you enjoy using UglifyJS v1, I can
+> promise you that you will love its successor.
+
+> Please help me make this happen by funding the development!
+
+> <a href='http://www.pledgie.com/campaigns/18110'><img alt='Click here to lend your support to: Funding development of UglifyJS 2.0 and make a donation at www.pledgie.com !' src='http://www.pledgie.com/campaigns/18110.png?skin_name=chrome' border='0' /></a>
+
+UglifyJS v2
+===========
+
+[UglifyJS](https://github.com/mishoo/UglifyJS) is a popular JavaScript
+parser/compressor/beautifier and it's itself written in JavaScript. Version
+1 is battle-tested and used in many production systems. The parser is
+[included in WebKit](http://src.chromium.org/multivm/trunk/webkit/Source/WebCore/inspector/front-end/UglifyJS/parse-js.js).
+In two years UglifyJS got over 3000 stars on GitHub and hundreds of bugs
+have been identified and fixed, thanks to a great and expanding community.
+
+I'd say version 1 is rock stable. However, its architecture can't be
+stretched much further. Some features are hard to add, such as source maps
+or keeping comments in the compressed AST. I started work on version 2 in
+May, but I gave up quickly because I lacked time. What prompted me to
+resume it was investigating the difficulty of adding source maps (an
+[increasingly popular](https://github.com/mishoo/UglifyJS/issues/315)
+feature request).
+
+Status and goals
+----------------
+
+In short, the goals for v2 are:
+
+- better modularity, cleaner and more maintainable code; (✓ it's better already)
+- parser generates objects instead of arrays for nodes; (✓ done)
+- store location information in all nodes; (✓ done)
+- better scope representation and mangler; (✓ done)
+- better code generator; (✓ done)
+- compression options at least as good as in v1; (⌛ in progress)
+- support for generating source maps;
+- better regression tests; (⌛ in progress)
+- ability to keep certain comments;
+- command-line utility compatible with UglifyJS v1;
+- documentation for the `AST` node hierarchy and the API.
+
+Longer term goals—beyond compressing JavaScript:
+
+- provide a linter; (started)
+- feature to dump an AST in a simple JSON format, along with information
+ that could be useful for an editor (such as Emacs);
+- write a minor JS mode for Emacs to highlight obvious errors, locate symbol
+ definition or warn about accidental globals;
+- support type annotations like Closure does (though I'm thinking of a
+ syntax different from comments; no big plans for this yet).
+
+### Objects for nodes
+
+Version 1 uses arrays to represent AST nodes. This model worked well for
+most operations, but adding additional information in nodes could only be
+done with hacks I don't really like (you _can_ add properties to an array
+just as if it were an object, but that's just a dirty hack; also, such
+properties were not propagated in the compressor).
+
+In v2 I switched to a more “object oriented” approach. Nodes are objects
+and there's also an inheritance tree that aims to be useful in practice.
+For example in v1 in order to see if a node is an aborting statement, we
+might do something like this:
+
+    if (node[0] == "return"
+        || node[0] == "throw"
+        || node[0] == "break"
+        || node[0] == "continue") aborts();
+
+In v2 they all inherit from the base class `AST_Jump`, so I can say:
+
+    if (node instanceof AST_Jump) aborts();
+
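A minimal sketch of the kind of hierarchy described above. Only `AST_Jump` is named in the README; the other class names, the constructor arguments and the `aborts` helper are assumptions made for illustration, and modern `class` syntax is used for brevity rather than whatever mechanism v2 actually uses.

```javascript
// Hypothetical, simplified hierarchy; only AST_Jump is named in the README.
class AST_Node {
  constructor(start, end) {
    this.start = start; // location info lives on every node
    this.end = end;
  }
}
class AST_Statement extends AST_Node {}
class AST_Jump extends AST_Statement {}
class AST_Return extends AST_Jump {}
class AST_Throw extends AST_Jump {}
class AST_Break extends AST_Jump {}
class AST_Continue extends AST_Jump {}
class AST_SimpleStatement extends AST_Statement {}

// One instanceof check replaces the chain of string comparisons used in v1.
function aborts(node) {
  return node instanceof AST_Jump;
}

console.log(aborts(new AST_Return(0, 6)));          // true
console.log(aborts(new AST_SimpleStatement(0, 6))); // false
```

Adding a new kind of jump statement then only requires a new subclass; existing `instanceof AST_Jump` checks pick it up without changes.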
+The parser was _heavily_ modified to support the new node types; however, you
+can still find the same code layout as in v1, and I trust it's just as
+stable. Except for the parser, all other parts of UglifyJS are rewritten
+from scratch.
+
+The parser itself got a bit slower (430ms instead of 330ms on my usual 650K
+test file).
+
+#### A word about Esprima
+
+[Esprima](http://esprima.org/) is a really nice JavaScript parser. It
+supports EcmaScript 5.1 and it claims to be “up to 3x faster than UglifyJS's
+parse-js”. I thought that's quite cool and I considered using Esprima in
+UglifyJS v2, but then I did some tests.
+
+On my 650K test file, UglifyJS v1's parser takes 330ms and Esprima about
+250ms. That's not exactly “3x faster” but very good indeed! However, I
@ariya
ariya added a note

The 3x difference is only visible using the latest Chrome (V8) on the said benchmark suite. I'll try to clarify that in the doc.
I'd be happy to include your 650K test file since I'm still looking for a representative corpus for the benchmark.

@ariya
ariya added a note

Also, post Esprima 1.0, optimizing parsing with location info is the priority. Based on the feedback and use-cases, I also come to the same conclusion that nobody really needs plain vanilla syntax tree anymore. Glad that we're going in a similar direction here :)

@mishoo Owner
mishoo added a note
@ariya
ariya added a note

Thanks, I'll revise our benchmark corpus and include it.
Chrome 21 still shows the 3x difference for me with that file.

+noticed that in the default configuration Esprima does not keep location
+information in the nodes. With that enabled, parse time grew to 680ms.
+
+Some would claim it's a fair
+[comparison](http://esprima.org/test/compare.html), because UglifyJS doesn't
+keep location information either, but that's not entirely accurate. It's
+true that the `parse()` function will not propagate location into the AST
+unless you set `embed_tokens`, but the lexer _always_ stores it in the
+tokens.
+
@ariya
ariya added a note

Since Esprima does the same, i.e. its lexer always track the location info of each token, isn't the comparison still quite apple-to-apple?

@mishoo Owner
mishoo added a note

@mishoo: I think you're ignoring the most important aspect of comparison to esprima. Esprima would parse the JS into a standardised AST representation, around which many tools have been built. By using your own AST representation, it forces tools that want to interact with uglifyjs to use JS as an intermediate representation. For a concrete example, this would require a compiler like my CoffeeScript compiler to first use escodegen to transform the standardised AST into JS, then use your parser to parse that JS into an AST of the form you use, then use your tool to generate JS yet again. Ideally, your minifier could consume and produce ASTs in this standard format, then escodegen or another code generator with minimal-size formatting built in could generate JS. This separates responsibilities so that your project can worry about optimisation/rewriting and ignore parsing and code generation, which are solved problems. And, of course, the process will be much simpler (and faster!) because of the standardised internal representation and interface.

@mishoo Owner
mishoo added a note

Generating valid code from an AST seems to me the most complex job that a compressor must do
(at least that was the source of most problems in UglifyJS).

I agree, it is very complex, and escodegen does a wonderful job with it. A minifier is an optimising rewriter, it should not be responsible for JS parsing (an extremely difficult problem) or JS code generation (which you yourself admit is also a very difficult problem). Why waste your time re-engineering these tools when they already exist? I'm just trying to push uglifyjs toward being a simple but powerful tool with a single purpose.

@ariya
ariya added a note

To be honest, there might not be a killer AST format which suits all purposes. One format can be generic enough but for the reason of speed or something else, a particular format tweaked for the minifying purpose (like UglifyJS) can be more optimal.

@ariya: I completely agree. And I particularly dislike the format specified by the spidermonkey parser API. But interoperability stands above all.

@ariya
ariya added a note

Interoperability is one thing, but there is another factor to consider. For example, UglifyJS parser is more battle hardened than any other parsers simply because it is widely used. Making tweaks and modifications to it seems like a reasonable iterative approach, instead of using a completely foreign code.

@Sarah-C
Sarah-C added a note

Is there a blog around anywhere?

I'd love to use this online, without cracking open a copy of Node.js...... does anyone know of any?

+Enabling `embed_tokens` makes UglifyJS do it in 400ms, which is still a lot
+better than Esprima's 680ms.
+
+In version 2 we always maintain location info and comments in the AST nodes,
+which is why the parser in v2 takes about 430ms on that file (some
+milliseconds get lost because it's more work to create object nodes than
+arrays). I might try to speed it up, though I'm not sure it's worth the
+trouble: parsing 650K in 430ms on my rather outdated machine, to get an
+object-based AST with full location/range info and comments, seems good
+enough for me.
+
@ariya
ariya added a note

@mishoo A tentative test is at http://ariya.github.com/js/parsing/. With that 650 KB, the latest Chrome shows that Esprima (with location info and comment) is just 15% slower, and not as drastic as 430 ms vs 680 ms.

@staabm
staabm added a note

maybe you could add this into a jsperf so you can get an x-browser result?

@mishoo Owner
mishoo added a note
@mishoo Owner
mishoo added a note
@ariya
ariya added a note

@staabm My test page is just missing Browserscope otherwise it's very similar to JSPerf (using the same BenchmarkJS). When I originally tried to put in in JSPerf, I didn't figure out yet on how to pull the 650 KB payload for the test.

@ariya
ariya added a note

@mishoo Interesting point on loc. The feedback to me so far is that line/col is not useful for syntax post-processing while range is nice because you can point directly to the original source string. For user report etc, the next Esprima will have a way to map back a given index to line/col without manual work. In short, loc is mainly there for Mozilla Reflect compatibility.

@marijnh
marijnh added a note

Wait, how does the repeated mention of esprima being slightly slower than parse-js in this discussion relate to http://esprima.org 's claim of being three times as fast?

Edit: Ah, I get it now. He's comparing apples to cored apples.

@ariya
ariya added a note

What is the specific of apples vs cored apples here?

@marijnh
marijnh added a note

I mean that featuring "3x faster than UglifyJS" prominently on the website, when that speed difference actually disappears when you store the same information as UglifyJS does, is somewhat misleading.

@mishoo Owner
mishoo added a note

@marijnh, that statement is about UglifyJS v1, which did not quite store location information in the AST, although the tokenizer did keep it in the tokens. I'm fine with it, it's maybe not 3x faster in the general case but it's really fast indeed when location info is not requested.

As for v2, indeed, for a fair benchmark I think Esprima should enable location/range info.

@ariya
ariya added a note

UglifyJS v1 tokenizer always has the location info but the parser does not output the info in the syntax tree. Esprima 1.0 does exactly the same thing. So I think it's apple to apple. It is not misleading. This is the "3x faster" comparison (try it yourself) mentioned on the website.

UglifyJS v2 always output the location info in the parse output. When such option is enabled in Esprima, it's then an apple to apple comparison. The result is however never advertised anywhere on the website. Once Esprima 1.1 (which handles location info much better) is ready, the comparison will be made available.

@mishoo Owner
mishoo added a note

BTW Esprima does some micro-optimizations that I don't quite like (I mean the code looks funny) such as the switch here. @ariya, did you benchmark, does checking the length first really save enough time (compared to just looking up the keyword in an object with .hasOwnProperty) to justify the code ugliness?

I could do various similar hacks I guess, but for now I've more important stuff to work on. I have a new laptop, UglifyJS just got faster. :-p (well, and so did Esprima ;-)

@ariya
ariya added a note

Ugliness is a matter of taste. As for the optimizations, I have blogged about them. For the switch, it's http://ariya.ofilabs.com/2012/08/determining-objects-in-a-set-examples-in-javascript.html.

@marijnh
marijnh added a note

So did you find that object lookup was slower, or is "because of the abuse" your reason not to use an object there?

@ariya
ariya added a note

It was slower.

@marijnh
marijnh added a note

@ariya: Awesome. See http://marijnhaverbeke.nl/blog/acorn.html for a new contender.

@ariya
ariya added a note

@marijnh Acorn looks really nice. Well done!

@mishoo Owner
mishoo added a note

“But I wasn't about to write out all these boring predicates myself, so I defined a function that, given a list of words, builds up the text for such a predicate automatically, and then evals it to produce a function.” — hehe, that's what I was thinking too... No way I'm gonna macro-expand manually. :-)

Nice job!

@marijnh
marijnh added a note

@mishoo It would be easy to make Acorn configurable on the point of the actual values it uses to tag the AST types. You could then pass it, instead of the default set of strings, a set of objects in the style of {name: "Identifier", isLVal: true, visit: function(){/*etc*/}}. Would you be at all inclined to use it in UglifyJS, or are you happy with your existing parser?

@mishoo Owner
mishoo added a note

@marijnh I could make UglifyJS use the Mozilla AST relatively easy with some glue code that takes a Mozilla AST and creates an UglifyJS2 AST, if that's what you're asking. I have to do some benchmarks but my rough guess is that time to transform the AST should be insignificant.

(to make it work on a Mozilla AST directly would be quite tedious; the mangler, compressor and code generator expect nodes to be instanceof something, and there's some hierarchy that it relies on.)

@marijnh
marijnh added a note

@mishoo The intent of the isLVal flag in the proposal above was that you'd do node.type.isLVal instead of node instanceof LVal (I'm not sure if LVal is actually a thing in your hierarchy, but you get the idea).

@mishoo Owner
mishoo added a note

@marijnh That would work for replacing the instanceof tests (though they are many... quite a big refactoring there), but it won't help much in picking the right method when you have i.e. node.print(stream). Most of my code relies on the standard JS prototype inheritance... Yeah, in light of that, I should say I'm pretty happy with the existing parser. I don't exclude having some conversion to/from Mozilla AST in the future though, that should be trivial to write.

@marijnh
marijnh added a note

Fair enough!

+### The code generator, V2 vs. V1
+
+The code generator in v1 is a big function that takes a node and applies
+various walkers on it in order to generate code. The code was _returned_
+from each walker function, and finally assembled into a big string by
+concatenation or `Array.prototype.join`, and further returned. With that
+design it is impossible to know the current line/column of the output, which
+would be necessary for source maps. For the same reason, v1 required an additional
+step to split very long lines (that includes an additional run of the
+tokenizer). It's _slow_.
+
+The rules for inserting parentheses in v1 are an unholy mess; we know at
+least [one case](https://github.com/mishoo/UglifyJS/issues/368) where it
+inserts unnecessary parens (non-trivial to fix), and I just discovered one
+case where it generates invalid code—UglifyJS can properly parse the
+following (valid) statement:
+
+    for (var a = ("foo" in bar), i = 0; i < 5; ++i);
+
+however, the code generator in version 1 will break it by not including the
+parens (the `in` operator is not allowed in a `for` initializer, unless it's
+parenthesized).
+
+The codegen in V2 is a thing of beauty. Since I now use objects for AST
+nodes, I defined a "print" method on each object type. This method takes an
+object (an OutputStream) and instead of returning the source code for the
+node, it prints it in the output stream. The stream object keeps track of
+current line/column in the output and provides helper functions to insert
+semicolons, to indent etc. The code is somewhat bigger than the `gen_code`
+in v1, but it's much easier to understand, it's faster and does not require
+an additional pass for splitting long lines. Also the rules for inserting
+parens are nicely separated from the `print` method definitions.
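The print-into-a-stream design can be sketched roughly as follows. This is an illustration under assumptions, not the actual v2 code: the `OutputStream` shape, its `position` method and the toy `Num` and `Binary` node types are all hypothetical.

```javascript
// Hypothetical minimal output stream: accumulates text and tracks position.
function OutputStream() {
  let out = "";
  let line = 1, col = 0;
  return {
    print(str) {
      for (const ch of str) {
        if (ch === "\n") { line++; col = 0; } else { col++; }
      }
      out += str;
    },
    position() { return { line, col }; },
    toString() { return out; },
  };
}

// Toy node types; each prints itself into the stream instead of returning a string.
class Num {
  constructor(value) { this.value = value; }
  print(stream) { stream.print(String(this.value)); }
}
class Binary {
  constructor(op, left, right) { this.op = op; this.left = left; this.right = right; }
  print(stream) {
    this.left.print(stream);
    stream.print(this.op);
    this.right.print(stream);
  }
}

const stream = OutputStream();
new Binary("+", new Num(1), new Num(23)).print(stream);
console.log(stream.toString());     // 1+23
console.log(stream.position().col); // 4
```

Because every character passes through `print`, the stream always knows the current output position, which is the information a source map generator or a long-line splitter would need.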
+
+### More aggressive compressing
+
+As I
+[blogged](http://lisperator.net/blog/javascript-minification-is-it-worth-it/)
+a few days ago, it seems to me that the squeezer works really hard for not
+too much benefit. On my test file, passing `--no-squeeze` to UglifyJS v1
+adds only 500 bytes after `gzip`, that is 0.68% of the gzipped file size;
+every byte counts, but to be frank, that's not a very big deal either.
+
+Beyond doing what V1 does, I'd like to make it smarter in certain
+situations, for example:
+
+    function foo() {
+        var something = compute_something();
+        var something_else = compute_something_else(something);
+        return something_else;
+    }
+
+I sometimes write this kind of code because it's cleaner, it nests less and
+it avoids the need to add explanatory comments. It could _safely_ compress
+into:
+
+    function foo() {
+        return compute_something_else(compute_something());
+    }
+
+which makes it a single statement (further compressible into sequences and
+allowing us to drop brackets in other cases) and it avoids the `var`
+declarations. That's one tricky optimization to do in V1, but I feel that
+with the new architecture it is doable, at least for the simple cases.
+
+Currently the compressor in V2 is far from complete (where by “complete” I
+mean as good as V1), and I'll actually put it on hold to add support for
+generating source maps first. However the mangler is complete (seems to be
+working properly) as well as the code generator, so V2 is already usable for
+achieving pretty good compression.
+
+### Better regression test suite
+
+The existing test suite in UglifyJS v1 has been contributed (thanks!).
+Unfortunately it's not great because it employs all the compression
+techniques in each test. Eventually I'd like to port all existing tests to
+v2, but for now I started it from scratch.
+
+Tests broke many times for no good reason as I added new features; for
+example the feature that transforms consecutive simple statements into
+sequences:
+
+    INPUT  → function f(){ if (x) { foo(); bar(); baz(); }}
+    OUTPUT → function f(){ x && (foo(), bar(), baz()) }
+
+It's a useful technique; without meshing consecutive statements into an
+`AST_Seq` we would have to keep the `if` and the brackets.
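A small check of the statements-to-sequence rewrite, assuming nothing beyond plain JavaScript semantics (the `run` helper and the three function names are made up for this example). The sequence has to be parenthesized as a whole, because the comma operator binds more loosely than `&&`.

```javascript
// Record which calls run for each form of the statement.
function run(body, x) {
  const calls = [];
  const foo = () => calls.push("foo");
  const bar = () => calls.push("bar");
  const baz = () => calls.push("baz");
  body(x, foo, bar, baz);
  return calls.join(",");
}

const original = (x, foo, bar, baz) => { if (x) { foo(); bar(); baz(); } };
const sequence = (x, foo, bar, baz) => { x && (foo(), bar(), baz()); };
// Without the parentheses this parses as (x && foo()), bar(), baz().
const unparenthesized = (x, foo, bar, baz) => { x && foo(), bar(), baz(); };

console.log(run(original, false) === run(sequence, false)); // true
console.log(run(original, true) === run(sequence, true));   // true
console.log(run(unparenthesized, false));                   // bar,baz
```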
+
+Having a test only for this feature is fine; but if the feature is applied
+to all tests, then tests where the “expected” file contains consecutive
+statements will break, although the output is perfectly fine.
+
+In v2 I started a new test suite (I actually took the “test driven
+development” approach: I'm progressing on both compressor and test suite at
+once; for each new compressor option I add a test case). Tests look like
+this:
+
+    keep_debugger: {
+        options = {
+            drop_debugger: false
+        };
+        input: {
+            debugger;
+        }
+        expect: {
+            debugger;
+        }
+    }
+
+    drop_debugger: {
+        options = {
+            drop_debugger: true
+        };
+        input: {
+            debugger;
+            if (foo) debugger;
+        }
+        expect: {
+            if (foo);
+        }
+    }
+
+That might look funny, but it's syntactically valid JS. A test file
+consists of a sequence of labeled block statements. Each label names a test
+in that file. In each block you can assign to the `options` variable to
+override compressor options (for the purpose of running the tests, all
+compression options are turned off, so you just enable the stuff you test).
+Then you include two other labeled statements: `input` and `expect`. The
+compressor test suite simply parses these statements to get two ASTs. It
+applies the compressor on the `input` AST, then the `codegen` on the
+compressed AST. It applies the `codegen` to the `expect` AST (without
+compressing it). Then it compares the results and if they match, the test
+passes.
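The comparison step can be sketched with toy stand-ins. Everything here is an assumption for illustration: the "AST" is just an array of statement objects, and `compress`, `codegen` and `runTest` are hypothetical helpers, not UglifyJS APIs. The real suite gets its ASTs by parsing the labeled blocks.

```javascript
// Toy stand-ins (assumptions, not UglifyJS APIs): an "AST" is an array of
// statement objects, e.g. { type: "debugger" } or { type: "call", name: "foo" }.
function compress(ast, options) {
  // Only one toy option: drop `debugger` statements when enabled.
  return options.drop_debugger ? ast.filter((s) => s.type !== "debugger") : ast;
}
function codegen(ast) {
  return ast.map((s) => (s.type === "debugger" ? "debugger;" : s.name + "();")).join("");
}

function runTest(test) {
  // All options start off; the test enables only what it exercises.
  const options = Object.assign({ drop_debugger: false }, test.options);
  const actual = codegen(compress(test.input, options));
  const expected = codegen(test.expect); // printed without compressing
  return actual === expected;
}

const drop_debugger = {
  options: { drop_debugger: true },
  input: [{ type: "debugger" }, { type: "call", name: "foo" }],
  expect: [{ type: "call", name: "foo" }],
};
console.log(runTest(drop_debugger)); // true
```

Comparing generated code on both sides means a test only fails when the compressed program differs from the expected one, not when formatting differs.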
+
+I expect this model to give far fewer false negatives, and it would work
+quite well for the name mangling too (no tests for that yet).
+
+For the code generator we'll need something more fine-tuned, since we care
+exactly what the output is going to look like. I don't yet have any plans
+about code generator tests.
+
+
+Play with it
+------------
+
+We don't yet have a nice command line utility, but there's a test script for
+NodeJS in tmp/test-node.js. To play with UglifyJS v2 just clone the
+repository anywhere you like and run `tmp/test-node.js script.js` (script.js
+being the script that you'd like to compress). Take a look at the source of
+`test-node.js` to see what the API looks like, and how to enable/disable
+steps or compressor options.
+
+To run the existing tests, run `test/run-tests.js`.
+
+
+Status of UglifyJS v1
+---------------------
+
+We didn't have any significant new features in the last few months; most
+commits are about bug fixes. I plan to continue to fix show-stopper bugs in
+v1 for a while, depending on how time permits, but there won't be any new
+development.
+
+
+Help me complete the new version
+--------------------------------
+
+I've put a lot of energy already into this project and I think it comes out
+nicely. It's based on all my previous experience from working on version 1
+and I'm working carefully, trying not to introduce bugs that were already
+fixed, trying to keep it fast and clean. If you'd like to help me dedicate
+more time to it, please consider making a donation!
+
+<a href='http://www.pledgie.com/campaigns/18110'><img alt='Click here to
+lend your support to: Funding development of UglifyJS 2.0 and make a
+donation at www.pledgie.com !'
+src='http://www.pledgie.com/campaigns/18110.png?skin_name=chrome' border='0'
+/></a>