
Performance: ESLint is slow #574

Closed
nzakas opened this issue Feb 1, 2014 · 35 comments

@nzakas (Member) commented Feb 1, 2014

I'd like to get some better information about why ESLint is so much slower than JSHint.

I added a perf test to the build, which can be run like this:

node Makefile.js perf

It runs ESLint over jshint.js. Currently it takes over 3 seconds for this test to complete. When I run JSHint on itself, it takes less than 0.5 seconds, meaning that we are six times as slow. JSCS takes just about the same as JSHint.

My hunch as to why we're six times slower is that we're doing multiple traversals of the AST:

  1. To attach comments
  2. To determine token ranges
  3. For escope
  4. The primary traversal that applies our rules

We need to start figuring this out.
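For illustration only (this is not ESLint's actual code), the four passes listed above could in principle be folded into one generic walk that fans each node out to several listeners; traverseOnce and the listener shape here are hypothetical:

```javascript
// Hypothetical sketch: one traversal dispatching each node to several
// listeners, instead of four separate walks over the AST.
function traverseOnce(ast, listeners) {
  (function walk(node) {
    // Only descend into objects that look like AST nodes.
    if (!node || typeof node.type !== "string") {
      return;
    }
    listeners.forEach(function (listener) {
      listener(node);
    });
    Object.keys(node).forEach(function (key) {
      var child = node[key];
      if (Array.isArray(child)) {
        child.forEach(walk);
      } else if (child && typeof child.type === "string") {
        walk(child);
      }
    });
  })(ast);
}
```

Comment attachment, token-range computation, escope, and rule dispatch would each register a listener, so the tree is walked once instead of four times.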

@nzakas (Member Author) commented Feb 1, 2014

I ran ESLint with the V8 profiler. The results are here:
https://gist.github.com/nzakas/8757103

@nzakas (Member Author) commented Feb 2, 2014

Some other timings I captured:

  • Esprima parse time: 1147ms
  • escope Evaluation Time: 78ms
  • ESLint Traversal: 1629ms
    • Attach Comments: 26ms
@ilyavolodin (Member) commented Feb 2, 2014

Another profiler results: https://docs.google.com/spreadsheet/ccc?key=0At6DZUsxLY1TdDlzaDlQVGg4R19NTDhvcEV1eHlOYWc&usp=sharing#gid=0

From the looks of it, both profiles are saying that the top offender we have control over is getTokens.

@nzakas (Member Author) commented Feb 2, 2014

Yup. Ironic because that's exactly the pull request I made to Esprima:
https://github.com/ariya/esprima/pull/192

I'm going to follow up and see what it will take to get that change in.

@ilyavolodin (Member) commented Feb 2, 2014

Unfortunately, I don't think this will improve performance significantly. I think we will have to go rule by rule to clear out all of the bottlenecks. I would suggest running ESLint on the JSHint code and turning rules off one by one, to see which ones to focus on first (no-unused-vars would be number 1 on that list, I know that much).

@nzakas (Member Author) commented Feb 2, 2014

If getTokens() were reduced to a simple array lookup, that wouldn't help?
If that's truly where we're spending a lot of time, it seems like that
would be useful.

And not disagreeing with going rule-by-rule in any case.

@ilyavolodin (Member) commented Feb 2, 2014

It will help, it's just not going to close the 6x gap on its own. Based on the profiling results, while getTokens is at the top of the list, it still doesn't account for much of the total.

@nzakas (Member Author) commented Feb 5, 2014

Just sharing this page of resources for optimizing V8 code:
http://mrale.ph/blog/2011/12/18/v8-optimization-checklist.html

@nzakas (Member Author) commented Feb 6, 2014

Top places where time is spent in ESLint

(profiler chart)

The first couple are in Esprima, then GC, then findVariable() (which is the subject of #592), and then getTokens().

nzakas added a commit that referenced this issue Feb 6, 2014

@nzakas (Member Author) commented Feb 6, 2014

I should note that that chart is for the large profile. no-unused-vars definitely needs some attention, specifically findVariable(). If we can get the token-range changes into Esprima, then hopefully we can knock out getTokens() as well. I think getScope() is being affected primarily by no-unused-vars.

nzakas added a commit that referenced this issue Feb 7, 2014

Merge pull request #592 from btmills/faster-find-variable
WIP: Optimize findVariable() in no-unused-vars (refs #574)
@nzakas (Member Author) commented Feb 7, 2014

@btmills did some great work to chop off around 400-600ms from the time of findVariable(). It's still one of the top offenders, so it would be great to keep hacking away at it.

@mcloide commented Feb 7, 2014

This may help fix it. It's just an idea and really needs to be put into code and tested to be sure, but the concept is to make it a bit more DRY and only continue when you actually need to. In other words, it should exit early if scopeVariable.length is 0.

https://gist.github.com/mcloide/e5f630901e1be4b9b128

@mcloide commented Feb 7, 2014

Just some notes on what the issue might be:

  • the function doesn't check whether there is anything to check, so it goes through a whole iteration before exiting
  • while loops can be considerably slower than for or forEach loops, and in the current structure there are situations where that loop never exits
  • I'm not fond of the function nested inside the function. It looks like a design flaw; even though it's simple enough, I would move that check to the appropriate place and take it out of the function, unless it really needs to be a function, in which case consider a class instead
  • you're using scopeVariable as a magic variable; reconsider that

The gist above, if modified correctly, can improve performance, but I would still re-check the function to see whether something could be done differently.
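A minimal sketch of the early-exit idea (the function shape and names here are illustrative, not ESLint's actual implementation):

```javascript
// Hypothetical sketch: bail out before entering the loop when there is
// nothing to search, instead of running a full iteration to discover that.
function findVariable(scopeVariables, name) {
  if (!scopeVariables || scopeVariables.length === 0) {
    return null; // nothing to scan, so exit immediately
  }
  for (var i = 0, len = scopeVariables.length; i < len; i++) {
    if (scopeVariables[i].name === name) {
      return scopeVariables[i]; // found it, stop looping
    }
  }
  return null;
}
```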

@nzakas (Member Author) commented Feb 7, 2014

Thanks. I'm curious, why would while loops be slower than other types of loops?

@mcloide commented Feb 7, 2014

Because they are widely misused. A while loop should simply:

  • loop until a given condition is true, then exit

Most of the time it's used incorrectly, in situations where the condition can repeat many times. With a for loop you enforce control, and with a forEach loop you enforce control again, because it will only loop through the elements of that array.

It's true that any loop, if improperly written, can become an endless loop; I just see while being widely misused and more prone to that happening.

@ilyavolodin (Member) commented Feb 7, 2014

My information might be very outdated (I haven't really done serious performance optimization for JS in a long while), but as far as I recall, decrementing while loops were always considered the fastest way of going through an array, outperforming for loops and especially forEach loops.

@nzakas (Member Author) commented Feb 8, 2014

I disproved the decrementing-loops theory in High Performance JavaScript. :) It has more to do with how you set up your loop than the fact that you're going in reverse.
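A sketch of the loop-setup point: both of these read the array length only once, which is where the win usually comes from, regardless of iteration direction. Illustrative only:

```javascript
// Forward loop with the length lookup hoisted out of the condition.
function sumForward(arr) {
  var total = 0;
  for (var i = 0, len = arr.length; i < len; i++) {
    total += arr[i];
  }
  return total;
}

// Decrementing loop: also reads length once, but that is a property of
// the setup, not of going in reverse.
function sumReverse(arr) {
  var total = 0;
  for (var i = arr.length - 1; i >= 0; i--) {
    total += arr[i];
  }
  return total;
}
```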

@chrisdickinson (Contributor) commented Feb 8, 2014

To weigh in briefly: any kind of built-in loop construct will likely be faster than _.each, [].forEach, or any API that takes a function that references a variable in an enclosing scope -- those functions often cannot be inlined into loops. This goes doubly so for situations like the following:

function calledOften(arr) {
  var by = 2

  return arr.filter(function inner(item) {
    return item % by === 0
  })
}

The inner function will not share optimizations across multiple calls of calledOften. It cannot be inlined.

function calledOften(arr) {
  return arr.filter(function inner(item) {
    return item % 2 === 0
  })
}

As I understand it, inner may be inlined in this version: there are no references to outer scopes. Make sure that hot path functions do not themselves contain function expressions / statements to ensure that those inner functions get optimized once, and (more importantly) that those inner functions are able to be inlined -- the overhead of function calls is pretty big.

function calledOften(arr) {
  return arr.filter(filterItems)
}

function filterItems(item) {
  return item % 2 === 0
}

Now filterItems will be optimized once and all calls to calledOften will share the optimized filterItems.

@ariya commented Mar 24, 2014

Isn't api.getTokens bound to be expensive, since it always needs to create a new array of tokens? In some cases that's a lot of tokens to process.

Applying the following:

diff --git a/lib/eslint.js b/lib/eslint.js
index ae42626..8bd3e43 100755
--- a/lib/eslint.js
+++ b/lib/eslint.js
@@ -560,6 +560,10 @@ module.exports = (function() {
                 ++cursor;
             }
         }
+        if (tokens.length > 1000) {
+            console.log("getTokens returns", tokens.length + beforeTokens.length + afterTokens.length,
+            "tokens for", node.type);
+        }
         return beforeTokens.concat(tokens, afterTokens);
     };

and running node Makefile.js perf gives the list of calls with more than a thousand tokens:

getTokens returns 60773 tokens for ExpressionStatement
getTokens returns 60770 tokens for CallExpression
getTokens returns 60750 tokens for ExpressionStatement
getTokens returns 60748 tokens for CallExpression
getTokens returns 60541 tokens for ObjectExpression
getTokens returns 1062 tokens for ArrayExpression
getTokens returns 1056 tokens for FunctionExpression
getTokens returns 1044 tokens for ExpressionStatement
getTokens returns 1044 tokens for CallExpression
getTokens returns 2208 tokens for ArrayExpression
getTokens returns 2205 tokens for FunctionExpression
getTokens returns 2193 tokens for ExpressionStatement
getTokens returns 2193 tokens for CallExpression
getTokens returns 1198 tokens for ExpressionStatement
getTokens returns 1193 tokens for ObjectExpression
getTokens returns 27672 tokens for ArrayExpression
getTokens returns 27634 tokens for FunctionExpression
getTokens returns 27622 tokens for ExpressionStatement
getTokens returns 27622 tokens for CallExpression
getTokens returns 27520 tokens for VariableDeclaration
getTokens returns 27514 tokens for CallExpression
getTokens returns 1158 tokens for ExpressionStatement
getTokens returns 1155 tokens for CallExpression
getTokens returns 1122 tokens for ExpressionStatement
getTokens returns 1119 tokens for FunctionExpression
getTokens returns 2109 tokens for VariableDeclaration
getTokens returns 2107 tokens for FunctionExpression
getTokens returns 7850 tokens for ArrayExpression
getTokens returns 7832 tokens for FunctionExpression
getTokens returns 7820 tokens for ExpressionStatement
getTokens returns 7820 tokens for CallExpression
getTokens returns 1777 tokens for VariableDeclaration
getTokens returns 1773 tokens for ArrayExpression
getTokens returns 5630 tokens for ExpressionStatement
getTokens returns 5625 tokens for ObjectExpression
getTokens returns 7181 tokens for ArrayExpression
getTokens returns 7178 tokens for FunctionExpression
getTokens returns 7166 tokens for ExpressionStatement
getTokens returns 7166 tokens for CallExpression
getTokens returns 7157 tokens for ExpressionStatement
getTokens returns 7156 tokens for CallExpression
getTokens returns 7151 tokens for FunctionExpression
getTokens returns 2121 tokens for ArrayExpression
getTokens returns 2115 tokens for FunctionExpression
getTokens returns 1230 tokens for ExpressionStatement
getTokens returns 1227 tokens for FunctionExpression
getTokens returns 1479 tokens for ArrayExpression
getTokens returns 1469 tokens for FunctionExpression
getTokens returns 1457 tokens for ExpressionStatement
getTokens returns 1457 tokens for CallExpression
getTokens returns 6997 tokens for ArrayExpression
getTokens returns 6983 tokens for FunctionExpression
getTokens returns 6971 tokens for ExpressionStatement
getTokens returns 6971 tokens for CallExpression

Would it make sense to separate the process of annotating every syntax node with its associated token range into its own function? That way, hopefully, we could see the cost difference once getTokens already has that information and just needs to produce the array.

@nzakas (Member Author) commented Mar 24, 2014

That's a good point; we are probably redoing work at various points by not precalculating the token range. I was hoping my token-range feature would land in Esprima so we wouldn't have to do this with yet another traversal, but maybe it's time to bite the bullet and do it.
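A hypothetical sketch of that precomputation, assuming each node carries an Esprima-style range and the token list is sorted by source position; none of these helper names are ESLint's actual API:

```javascript
// Binary search: index of the first token starting at or after pos.
function lowerBound(tokens, pos) {
  var lo = 0, hi = tokens.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (tokens[mid].range[0] < pos) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// One-time annotation: record which slice of the token list a node spans.
function annotateTokenRange(node, tokens) {
  node._tokenStart = lowerBound(tokens, node.range[0]);
  node._tokenEnd = lowerBound(tokens, node.range[1]);
}

// With the indices precomputed, getTokens is just an array slice.
function getTokens(node, tokens) {
  return tokens.slice(node._tokenStart, node._tokenEnd);
}
```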

@michaelficarra (Member) commented Mar 24, 2014

The API for getTokens could probably use a redesign. Right now you ask for the tokens for a node including n tokens before the node and m tokens after. This is often misused in rules that are just looking for the preceding n or following m tokens. If we expose that functionality individually, we can probably eliminate most of these requests for huge portions of the token stream.
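A hypothetical sketch of the split API, assuming each node has been annotated with the index of its first token and the index just past its last one (the _tokenStart/_tokenEnd fields are invented for illustration):

```javascript
// Fetch only the n tokens preceding a node, instead of the node's whole
// token span plus padding on both sides.
function getTokensBefore(node, tokens, n) {
  var start = node._tokenStart; // assumed precomputed elsewhere
  return tokens.slice(Math.max(0, start - n), start);
}

// Fetch only the m tokens following a node.
function getTokensAfter(node, tokens, m) {
  var end = node._tokenEnd; // index just past the node's last token
  return tokens.slice(end, end + m);
}
```

A rule that only needs the token before a node would then touch one array element rather than a copy of tens of thousands.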

@nzakas (Member Author) commented Mar 24, 2014

Very good point, as well.

@ariya commented Mar 27, 2014

This is slightly improved with the latest Esprima 1.1 (running node Makefile.js perf).

Before:

Took 2947843.983ms

After:

Took 1968021.825ms
@nzakas (Member Author) commented Mar 27, 2014

Nice! Will update the version today.

@ariya commented Mar 27, 2014

@nzakas Already updated, since package.json doesn't pin Esprima version :-)

@michaelficarra (Member) commented Mar 27, 2014

Which is something that should probably be fixed. Maybe pin to 1.x? I'll submit a PR.

@nzakas (Member Author) commented Mar 27, 2014

I updated to ~1.1.1 for now in package.json.

@nzakas (Member Author) commented Mar 29, 2014

The results of the latest profile run with the Esprima upgrade:

large: 2147.000ms
medium: 896.000ms
small: 633.000ms

As a comparison, here was @chrisdickinson's first run:

large: 5546.956ms
medium: 2649.375ms
small: 1389.801ms

So with the Esprima upgrade and the no-unused-vars fixes, we've managed to cut most of these times in half (some by even more). Awesome work, everyone!

@jwmerrill commented Apr 2, 2014

Have you evaluated acorn as an alternative to esprima for parsing? It targets the same AST format as esprima, and is significantly faster than esprima according to esprima's speed comparisons. Acorn has especially large performance advantages when tracking location data, as eslint must.

I'm not sure how deeply embedded esprima is in this system, but if you just use the AST data structure that parse spits out, then acorn might be a drop-in replacement.

@jwmerrill commented Apr 2, 2014

Here's another benchmark where you can see the effect of including location data or not: http://marijnhaverbeke.nl/acorn/test/bench.html

@ariya commented Apr 3, 2014

@jwmerrill That benchmark site is using an outdated Esprima. In addition, the current bottleneck is not in the parser; a difference of < 50 ms between parsers won't matter much.

@nzakas (Member Author) commented Apr 3, 2014

We've considered other AST generators, Acorn included. The deciding factor for Esprima was its continuing development and roadmap as part of the JS tooling ecosystem. Plus, the performance improved greatly with the latest release, so we're pretty happy with it.

@michaelficarra (Member) commented Apr 3, 2014

@jwmerrill: Running the latest acorn and esprima, I got these results:

Acorn      438603 lines per second  100%
Esprima    500865 lines per second  114%
UglifyJS2   90277 lines per second   21%

@jwmerrill commented Apr 3, 2014

Awesome. Great work Esprima team :-)

@nzakas (Member Author) commented Apr 17, 2014

Closing this out as we have made some great improvements.

@nzakas nzakas closed this Apr 17, 2014

@eslint eslint bot locked and limited conversation to collaborators Feb 7, 2018

@eslint eslint bot added the archived due to age label Feb 7, 2018
