Avoid lazy parsing by adding parens around factory #774

Merged · 1 commit · Aug 14, 2016
Conversation

bmaurer
Contributor

@bmaurer bmaurer commented Jul 8, 2016

Most javascript engines do a lazy parsing optimization. For example if they see:

function X() {...}

They won't parse the contents of X until it is executed. However, they often have to scan the contents of X to see whether X uses any variables from the outer scope, so X ends up getting parsed twice.

Realizing that one often executes a function immediately, browsers have optimized statements of the form:

(function X() {...})()

So that X is not parsed lazily. Traditionally this optimization looks for an open paren before the function. For example, here's the optimization in v8:

https://github.com/v8/v8/blob/203391bcc0e970d3cae27ba99afd337c1c5f6d42/src/parsing/parser.cc#L4256

Since the factory is immediately executed (at least in the CommonJS and global-export cases), this change avoids the double parsing that comes with parsing the function lazily.
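
Concretely, for a generic UMD wrapper (a sketch; global.MyLib is an illustrative name, not Rollup's exact output), the change looks like:

// Before: the factory is a plain function-expression argument. The heuristic
// sees no "(" directly before it, so the body is pre-parsed lazily and then
// fully parsed again when factory() runs.
(function (global, factory) {
  typeof exports === 'object' && typeof module !== 'undefined' ? module.exports = factory() :
  typeof define === 'function' && define.amd ? define(factory) :
  (global.MyLib = factory());
}(this, function () { /* entire library body */ }));

// After: parens directly around the factory trigger the eager-parse
// heuristic, so the body is parsed only once.
(function (global, factory) {
  typeof exports === 'object' && typeof module !== 'undefined' ? module.exports = factory() :
  typeof define === 'function' && define.amd ? define(factory) :
  (global.MyLib = factory());
}(this, (function () { /* entire library body */ })));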

@kangax

kangax commented Jul 8, 2016

OT: I wonder why they (V8) don't also check for other common "patterns" like !function (airbnb/javascript#44 (comment)) and +function (https://github.com/twbs/bootstrap/blob/master/js/affix.js#L10).
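
Those patterns look like this; the leading ! or + just forces the function into expression position so it can be invoked inline (the call's result is discarded or coerced):

// "bang" pattern: unary ! makes the function an expression
!function () {
  console.log('runs immediately');
}();

// "plus" pattern, as used by Bootstrap's plugins: same idea with unary +
+function () {
  console.log('also runs immediately');
}();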

@getify

getify commented Jul 8, 2016

May not affect this case, but the discussed technique changes the meaning of the code significantly... the 'X' in the former case is added to the enclosing scope, but is not in the latter case.

Looking at the commit, you're talking about a different case here, where the function is already an expression. Are you sure the engine doesn't already parse function expressions eagerly without the () around them?

Do you have any benchmark for this?

@getify

getify commented Jul 8, 2016

Moreover, why does v8 lazy parse any function if it's gonna cause a double parse? The purpose of lazy parsing is to save on startup time, but if they're ever doing the parse work twice then they're surely losing overall perf, right?

Is there some measurement that found such functions are much less commonly encountered? I use function decls in all of my code. It probably happens a lot, I would think.

@bmaurer
Contributor Author

bmaurer commented Jul 8, 2016

So here's a demo of the difference, using about:tracing

Existing method

function immutable() {
  (function (global, factory) {
    typeof exports === 'object' && typeof module !== 'undefined' ? module.exports = factory() :
    typeof define === 'function' && define.amd ? define(factory) :
    (global.Immutable = factory());
  }(this, function () { ... }));
}

http://output.jsbin.com/tofixojuya
[about:tracing screenshot]

New version

function immutable() {
  (function (global, factory) {
    typeof exports === 'object' && typeof module !== 'undefined' ? module.exports = factory() :
    typeof define === 'function' && define.amd ? define(factory) :
    (global.Immutable = factory());
  }(this, (function () { ... })));
}

http://output.jsbin.com/wapepakoge/1
[about:tracing screenshot]

Notice how in the old version there were two lazy parsing blocks, each of which took about 5 ms.

The idea of lazy parsing is that the first parse does slightly less work. The v8 team is testing ideas that would allow them to remove the optimization, but for now this approach is likely to be a win in major JS engines.

@pvdz

pvdz commented Jul 8, 2016

How much JS is being parsed here? I have a hard time believing Chrome takes on the order of milliseconds just to parse scripts. I can parse scripts from within JS faster than that. So unless we're talking megabytes of code (and even then), can you bench some real-world code and show that this change actually makes a difference?

Because you're adding code, so it takes longer to parse; you're only making it worse... ;)

No, but seriously: you're adding "dead code" to guard an edge case of a heuristic that could just as well be removed or changed tomorrow. I'm just dipping my nose into something that doesn't affect me at all, so take that for whatever you think it's worth :)

Bottom line: I don't think the change here is worth the expected result, on various levels.

@bmaurer
Contributor Author

bmaurer commented Jul 8, 2016

Parsing takes a long time. Not only does the code have to be parsed, but some basic level of symbol resolution and syntax validation needs to happen. In this case immutable (which is a commonly used library) took ~5 ms to parse. That's the unminified version, but the minified version isn't much faster.

In our discussions with the v8 team, it's not uncommon to see websites spend 20-30% of their time parsing. If they use a large amount of code with UMD, the double parsing caused by the extra function UMD adds could be a substantial contributor.

I realize it seems really silly to add an extra paren to trigger behavior in the parser. But this kind of stuff adds up.

@kangax

kangax commented Jul 8, 2016

I have a hard time believing chrome takes time in order of ms to just parse scripts.

It sounds crazy (and I was skeptical too) but it actually does turn out to be significant when you have a shitload of code. For example, on the FB homepage we could spend as much as 370ms just parsing all of the JS. Delayed loading/execution aside, that's pretty nuts.

@Rich-Harris
Contributor

Thanks all, this is fascinating. Personally I can't see any good reason not to do this – it seems silly to forgo a potentially significant optimisation for the sake of two bytes. Nothing is set in stone and we can always change it later if engines change their behaviour. I do wonder about minifiers though – do they respect this pattern?

Obviously the tests will need to be fixed before we can merge this 😀

@chadhietala
Contributor

Can confirm this has a decent sized win. Here is another thread that talks about exploiting this heuristic.

@getify

getify commented Jul 8, 2016

I can't see any good reason not to do this

I can think of a couple.

Years ago everyone was sure that caching the length of an array at for-loop init time was better perf than checking the length each iteration. Countless blog posts and books were written to spread that new cult knowledge.
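
The pattern in question looked like this:

var arr = [1, 2, 3];

// The once-recommended form: cache .length at loop init...
for (var i = 0, len = arr.length; i < len; i++) {
  console.log(arr[i]);
}

// ...versus the "naive" form that engines later made just as fast:
for (var j = 0; j < arr.length; j++) {
  console.log(arr[j]);
}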

Then, a funny thing happened... engines started caching the length for you. So guess what? Now your once faster code is slower because you bet against the future of the browser.

Sure you think you'll just change the tool if something like that happens again, and all will be better. Except there will be tens of thousands of files that will never get reprocessed and will "forever" stay slower.

As a tool author, I think you have an even greater responsibility than most devs, to make sure not to do things that look like quick wins which lose in the long run.

Just some cents to rattle around. Sorry for barging in. :)

@krisselden
Contributor

uglify on its default settings will remove this

@krisselden
Contributor

@bmaurer fyi mishoo/UglifyJS#886

@kangax

kangax commented Jul 9, 2016

So uglify at least has an option to turn this off, good.

I don't know what the right solution is here; it's up to the authors and the community to decide. It's just something we (FB) came across due to the sheer size of our code, and we thought it would make sense to suggest the optimization to a commonly used bundler like rollup, not just make local changes.

fwiw, this is all even more noticeable on mobile. React Native engineers found that up to 30% of time could be spent on parsing.

It's definitely one of those engine-specific optimizations that won't live long. But it might be worth the "hassle" for the time being.

@Rich-Harris
Contributor

Sorry for barging in. :)

Not at all! Appreciate the input, and you're totally right that toolmakers have an extra responsibility here.

I'm just trying to figure out what the worst case scenario is here – it's hard to imagine that any future engine changes would penalise parenthesized function expressions to anything like the degree that unparenthesized ones are currently penalised, given they're being parsed twice. To use the array.length analogy: surely the penalty for caching the length is miniscule compared to the initial savings from not doing a property lookup on each iteration, so it was still basically the right thing to do at the time, no?

To be honest I'm speaking here from a position of ignorance as to what JS engines actually do under the hood – am relying on all of your advice 😀

@sebmarkbage

There is one significant difference between this and the .length case. You can know something about how your code will be used that the browser can't. The reason this heuristic exists is that in bundles, most code isn't immediately used, so lazy parsing makes sense. However, the browser can't know that. For example, at FB we also found that adding it to individually wrapped modules regressed performance when those modules are lazily evaluated and few of them are immediately needed.

This is a tradeoff; in the rollup case, however, you know which one of these scenarios you're in, right?
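
As a sketch (the registry shape here is illustrative, not FB's actual module system): when most factories run late or never, lazy parsing is exactly what you want, and parenthesizing every factory would force wasted eager parses:

var registry = {};

function define(id, factory) {
  registry[id] = { factory: factory, exports: null };
}

function requireModule(id) {
  var mod = registry[id];
  if (mod.exports === null) mod.exports = mod.factory(); // evaluated on first use
  return mod.exports;
}

// Most of these factories may never run, so their bodies should stay lazily parsed:
define('rarely-used', function () { /* big module body */ return {}; });
define('entry', function () { return { main: true }; });

requireModule('entry'); // only this factory's body is needed at startup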

@Rich-Harris
Contributor

This is a tradeoff, however, in the rollup case you know which one of these scenarios this will be, right?

Yeah, in the case of a UMD block we know that the function will be called immediately (unless we're in AMD-land in which case it'll be called as soon as any dependencies have been loaded, but that basically amounts to the same thing – nothing to be gained by being lazy there).

@getify

getify commented Jul 9, 2016

Just for consideration:

What bothers me about this mindset is that it's essentially trying to set a precedent: any time we know a function will run "soon" but isn't a traditional IIFE, it encourages us to always put these otherwise-totally-unnecessary parentheses around it. Imagine all the promise callbacks we're going to have to wrap.

This hack is as silly IMO as the CSS transform: translateZ(0); hack to "force" GPU acceleration. I know a lot of people felt they needed to do it for pragmatic reasons, but it just created a whole mess of propagated bad practice. Those lines will be polluting CSS files for decades.

@pvdz

pvdz commented Jul 9, 2016

It sounds crazy (and I was skeptical too) but it actually does turn out to be significant when you have a shitload of code.

@kangax No, it kind of makes sense for megabytes of code, which I presume you're talking about. And that probably all ends up in a single function with a tool like this. You're still coding against a heuristic that may be replaced with something else at any time.

I hope this won't lead to this in five years.

@getify

getify commented Jul 9, 2016

To clarify, I understand if facebook or whatever does these hacks in specific cases... what bothers me is a popular tool setting a precedent that a lot of other people will be inclined to just blindly follow.

@jdalton
Contributor

jdalton commented Jul 9, 2016

Pinging @pleath of Chakra and @bzbarsky of SpiderMonkey for insights into this from their engine's perspectives.

The gist is covered here. V8 performs a double lazy-parse because it doesn't know the callback is being executed immediately. So because of their heuristic, wrapping the callback in parens, changing
foo(a, function(){/* lots of code */}) to
foo(a, (function(){/* lots of code */})),
avoids the double lazy-parse.

@pvdz

pvdz commented Jul 10, 2016

Btw, another solution is to eliminate that factory pattern (which I've always found silly and unclear), where you pass the business logic into a pseudo-IIFE as an argument: just put that logic inside the intro IIFE's main body.

In fact, now that I look at it, the factory function is obviously only called conditionally, so it seems your perf improvement is in fact a degradation in some cases. But perhaps that only affects test-case situations or whatever.
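
Roughly like this (a sketch; note the AMD branch still needs a callable, so a trivial wrapper remains):

(function (global) {
  // module body inlined directly into the outer IIFE instead of a separate factory
  var api = { /* ... */ };

  if (typeof exports === 'object' && typeof module !== 'undefined') {
    module.exports = api;
  } else if (typeof define === 'function' && define.amd) {
    define(function () { return api; }); // AMD still wants a factory function
  } else {
    global.MyLib = api;
  }
}(this));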

@bzbarsky

@jdalton I'm not really a SpiderMonkey developer, and certainly haven't worked on the frontend (parser) side of it. You probably want @jswalden or @syg.

That said, I'm 99% sure that the IIFE optimizations in Spidermonkey don't depend on heuristic sniffing of the actual bytes in the byte stream. So chances are, you're basically adding these contortions to work around what is fundamentally a V8 bug...

@jswalden

SpiderMonkey's predict-invoked heuristic applies when the function is inside parentheses or immediately follows new. There's no dependence on actual bytes in the byte stream.

@jswalden

...other than whether the bytes encode an open-parenthesis or new, at least. (Which I assumed was obvious, but maybe wasn't.)

@bzbarsky

Hey, I left myself 1% wiggle room. ;)

@pleath

pleath commented Jul 28, 2016

I think there was a question here about tokens that engines use as immediate-execution hints if they immediately precede a function expression (with the result that lazy/deferred parsing is suppressed). Chakra treats "(" this way, as well as all the unary operators.

-- Paul


@jdalton
Contributor

jdalton commented Jul 29, 2016

For engine folks wanting to dig in it's the UMD factory pattern (also seen in this PR).

@ghost

ghost commented Jul 29, 2016

@jdalton I see you mention v8. In that case I'll just quickly mention that there are a lot of things in the Rollup source code that trigger deopts in the latest Chrome versions now, e.g. use of instanceof, assigning a value to an empty object without any reference to it, switch statements without a default, etc.

@jswalden

jswalden commented Jul 29, 2016

@getify

I was assuming the heuristics they're talking about are during tokenization, prior to (full) parsing, since the whole point is avoiding parsing on some functions.

The heuristic is not applied during tokenization. It's applied during parsing, after a ( in a PrimaryExpression, or new in a MemberExpression (or, apparently, other unary operators in non-SpiderMonkey engines), is consumed.

At that point subsequent code hasn't been parsed, and the next token may or may not have been tokenized. In a recursive-descent parser, the idea is to invoke the function that parses the next subcomponent of the production passing in InvokedPrediction invoked = PredictInvoked, as opposed to the typical InvokedPrediction invoked = PredictUninvoked. That InvokedPrediction is passed down through grammar-subproduction parsing. If it bottoms out in parsing a function, then that function is full-parsed instead of lazy-parsed. If it bottoms out in some other production/terminal, PredictInvoked is not passed down to any subproductions (e.g. (foo.bar[function() {}]) doesn't pass PredictInvoked to the subcomponent for the property specifier), and so they aren't eagerly full-parsed.

Hopefully that clarifies things. The ( in foo(function() {}) is not a ( in a PrimaryExpression and so is not subject to the heuristic. The ( in (function() {}) is a ( in a PrimaryExpression, so it is.
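
Restated as runnable examples (heavyN stands in for a function with a large body; comments reflect the SpiderMonkey heuristic as described above):

function foo(x) { return x; }
var obj = { bar: {} };

(function heavy1() {})();          // "(" opens a PrimaryExpression: predicted invoked, eagerly parsed
new function heavy2() {}();        // immediately follows `new`: predicted invoked
foo(function heavy3() {});         // this "(" is an argument list, not a PrimaryExpression: lazily parsed
foo((function heavy4() {}));       // the inner "(" is a PrimaryExpression: eagerly parsed
(obj.bar[function heavy5() {}]);   // prediction isn't passed into the property specifier: lazily parsed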

@kangax

kangax commented Jul 29, 2016

@pleath it'd be great to have access to Edge tracing data like we have with Chrome. We could then plug it into our system and see the exact difference between two versions (Mozilla's effort is here, btw: https://bugzilla.mozilla.org/show_bug.cgi?id=1250290).

have to rescan and reparse, since we haven't preserved the contents of the function in any form

Makes sense. In light of this, we had some ideas about transporting JS (via a hypothetical format) in such a way that function boundaries and scope info are marked, making it easier to re-scan/re-parse and compile. Nothing is certain yet, but I'd be curious to hear your thoughts.

@camillobruni

V8 dev here.

You are probably already familiar with the --trace_lazy and --trace_parse flags, with which you can easily detect which functions are parsed lazily and the impact your parenthesis "fix" has on total compilation time.
V8 tends to lazy-parse too aggressively, which works well on websites that include many libraries.
For pages that ship only what's necessary, V8 could perform a bit better by parsing more eagerly (you can observe this using the --nolazy flag).

Depending on how you measure, you will see that if we avoid lazy parsing, the total time spent in parsing (eager parsing + preparsing + lazy parsing) is lower, since we spend no time in preparsing (which runs at 2-3 times the speed of normal parsing, with cost proportional to the amount of lazily parsed code).

That said, we are aware of the issue and will actively look into improving parsing and preparsing.

@Rich-Harris Rich-Harris merged commit a9c5850 into rollup:master Aug 14, 2016
@Rich-Harris
Contributor

Have gone ahead and released this as 0.34.8, as a result of mrdoob/three.js#9310:

The library now takes 4x as long to evaluate (Chrome 52, MacBook Air) 😟

Next to a real-world slowdown of tens of milliseconds for a widely used library, discussions over whether future JS engines may or may not change their behaviour seem a bit... ivory tower, especially when the downside to adding the parens is so negligible.

@KFlash this is a discussion about speeding up the code that Rollup generates – are you talking about performance gotchas in Rollup itself? Would welcome any insight into those in a separate issue, thanks

@getify

getify commented Aug 14, 2016

If a recent "improvement" in chrome caused a 4x slowdown in a hugely popular library, that is a massive failure on chrome's part and they should roll that back.

Despite the patronizing "ivory tower" quip, I continue to dissent and assert that this change is not only band-aiding the real problem but setting a bad precedent.

That strenuous objection registered, I'll drop the argument since it's clearly not going to get anywhere.

@Rich-Harris
Contributor

that is a massive failure on chrome's part and they should roll that back

Agreed (with the caveat that they apparently view it as a performance win overall, and know a lot more about this than I do – as @camillobruni says they're aware of this issue) – and if they do we can revert this change.

Despite the patronizing "ivory tower" quip, I continue to dissent and assert that this change is not only band-aiding the real problem but setting a bad precedent.

I'm sorry you took it that way, because I wasn't quipping – I was being deadly serious. We have users, who expect us to make the decisions that are best for them and their users – as a tool author, my responsibility is to them, not to the people who might cargo-cult this pattern in future. (As one of the most influential JavaScript educators, you have the opportunity to help make sure that doesn't happen.) It feels very wrong for us to say that the hundreds of thousands (more?) of users of Three.js apps (or any other library that uses Rollup) should put up with slower startup times because someone might one day include some useless-but-harmless parentheses after this gets fixed. If I'd understood the magnitude of the problem sooner this PR would have been merged long ago.

Band-aiding real problems is what libraries and tools have always done, because we can't fix the real problem – we can only bring it to people's attention and muddle through as best we can in the meantime.

@krisselden
Contributor

Just a reminder to those following this thread that minifiers will alter tricks like this, so you have to retest after minifying. Also, you don't want to do this trick with all factories in a single-page app, since lazy parsing helps initial load for factories that aren't used right away.

@samccone
Contributor

Landing this PR was the correct move imho +1

Band-aiding real problems is what libraries and tools have always done, because we can't fix the real problem – we can only bring it to people's attention and muddle through as best we can in the meantime.

^ I mostly agree with this statement; however, I think being aware of performance, measuring, and knowing your runtime is an essential part of any engineer's job -- blindly trusting any runtime is going to lead to observable performance regressions.

but why should I optimize for a runtime? v8 / spidermonkey.. etc etc.. it should always be fast

This is not some one off instance of optimizing for runtimes (v8 or similar). As one example look at one of the canonical perf tuning docs from bluebird:
https://github.com/petkaantonov/bluebird/wiki/Optimization-killers

There is a reason why bluebird is so fast today, performance and perf tuning was a key part of the development process.

As to:

we can't fix the real problem

I am going to disagree here. It is pretty trivial to reach out to v8 devs, or to people who can put you in touch with them; they do not live in some "ivory tower", they are software engineers who want to deliver the best product possible, just like you or me.


How to "fix" the real problem

Tweeting into the void is not really a solution:
[screenshot of a tweet]

Step 1: make a test case
Step 2: file a bug at crbug.com, or share a testcase with me or @paulirish or @s3ththompson
Step 3: Let's understand what is going on and come up with a sound solution!

+1 to people caring about perf, +1 to measuring, +1 to making the web fast! 🐐

@Rich-Harris
Contributor

@samccone 🐐 ❤️

Step 2: file a bug at crbug.com, or share a testcase with me or @paulirish or @s3ththompson

I took @camillobruni's comment to mean that this was already being tracked somewhere, but I just created a gist illustrating the problem here with live demo here. Hope it's useful!

@krisselden
Contributor

@samccone @Rich-Harris this isn't a bug though in Chrome or the other browsers that lazy parse.

Lazy parsing generally saves time if you have a large number of define() modules that aren't needed right away.

This is only a good thing to do for rollup output that is eagerly required during load. It's a bad thing if you're rolling up a module that isn't intended to run right away.

I know you want the fastest default with the least thought and that is fair enough, but parsing everything up front may not be the fastest path for a large app.

@TheLarkInn

@Rich-Harris @samccone @getify waaaaaaaaaaat.

Are you saying that we could be leveraging the same technique?
Just wrap the modules? 👀
https://github.com/webpack/webpack/tree/master/examples/commonjs#jsoutputjs

@TheLarkInn

It appears that way, I'm going to get an issue in on our end.

But I think we can write a plugin for this. @bmaurer are there any other perceived tradeoffs?

@Rich-Harris I'm going to lurk the repo more often; otherwise, if you see sweet things like this, ping me <3 nice work guys.

@samccone
Contributor

samccone commented Sep 8, 2016

@bmaurer are there any other perceived tradeoffs?


I know you want the fastest default with the least thought and that is fair enough, but parsing everything up front may not be the fastest path for a large app.

basically there is a little bit of a dance that happens when you step into lazy-parse mode: in Chrome at least, you pay for a slower parse when you finally hit that code, but you avoid the initial slam of JS lockout... which I think is a pretty good thing.

So there is a line to walk, and of course using this lazy-parse technique for everything can actually slow things down more.

There are no hard and fast rules here, your best bet is to measure 📊 and iterate on what is best for you and your application.

@TheLarkInn

Oooo, okay, but this leads to some good ideas for optimization (if possible). I could technically wrap the main loaded JS, but leave lazy-loaded code unwrapped, because it won't suffer as much (not as many modules, and the initial load has already rendered and painted).

@nolanlawson
Contributor

nolanlawson commented Sep 14, 2016

@kangax Not sure exactly what kind of tracing data you'd like for Edge, but the best way to profile it is with the Windows Performance Analyzer. There's a blog post and video tutorial here: https://blogs.windows.com/msedgedev/2016/05/11/top-down-analysis-wpt

Using that, you can see exactly which DOM APIs and JS functions are taking up the most time, as well as low-level details of Chakra code itself.

@nolanlawson
Contributor

I adapted @bmaurer's JSBins into a little benchmark. It's pretty impressive what wins you can get in Chrome and Edge, although Safari and Firefox show basically no difference.

I used a Surface Book i5 for Chrome/FF/Edge and a 2013 i5 MacBook Pro for Safari (so it's not apples-to-apples for Safari). All runs were the median of 99.

| Browser | unwrapped time (ms) | wrapped time (ms) |
| --- | --- | --- |
| Edge 14 | 8.05 | 5.285 |
| Chrome 52 | 22.005 | 11.99 |
| Firefox 48 | 5.275 | 4.91 |
| Safari 10 | 1.535 | 1.375 |

And just for fun, some mobile browsers:

| Browser | unwrapped time (ms) | wrapped time (ms) |
| --- | --- | --- |
| Chrome 52 on Nexus 5X (Android 7) | 72.2 | 40.995 |
| Firefox 48 on Nexus 5X (Android 7) | 21.93 | 21.645 |
| Safari on iPod Touch 6 (iOS 9.3.1) | 4.405 | 4.43 |
| Edge 14 on Lumia 950 (Win10 RS1) | 56.895 | 44.19 |

So yeah, this mostly seems to be a V8 and Chakra story unless my methodology is wrong somehow.

@nolanlawson
Contributor

🎉 Profiler time! 🎉

I ran this benchmark on Edge with Windows Performance Recorder (using the same steps as described in the post above; you don't need to be an MS employee to do this), and confirmed that the unwrapped function causes a double parse in Chakra. If you drill down into the stack trace, you can see an extra DeferredParsingThunk, whereas in the wrapped version there's only one. Here's a profiler walkthrough on imgur.

@kurtextrem

Since we have a great thread here: is anyone able to answer comment #2 here? That would be interesting.

Also, as this is getting more and more attention, it would be great to sum up the drawbacks somewhere, to prevent people from jumping on the "parentheses around everything" hype train.

@kangax

kangax commented Sep 14, 2016

@nolanlawson yep, I've used Windows Performance Analyzer in the past — it's great. I was talking about programmatically collecting tracing data to use in CI tools like the one we wrote about here: https://code.facebook.com/posts/1368798036482517/browserlab-automated-regression-detection-for-the-web/. Ideally, tracing data would conform to a single standard for ease of use and aggregation.

@nolanlawson
Contributor

@kangax Ah OK, I getcha. Yeah, we are actually working on a tool right now to automatically run profiles and generate ETL files (it'll be open-sourced... very soon 😃), but unfortunately we'd need to go a step further to give you an easy way to programmatically pull data out of the ETL (as opposed to just loading it in WPA). Thanks for the feedback, though; it's good to know this is something worth pursuing.

@nolanlawson
Contributor

Update: hey guess what we open-sourced it: https://github.com/MicrosoftEdge/Elevator

stevemao added a commit to stevemao/awesome-pull-requests that referenced this pull request Jan 30, 2017