Avoid lazy parsing by adding parens around factory #774
Conversation
Most JavaScript engines perform a lazy parsing optimization. For example, if they see:

```js
function X() {...}
```

they won't parse the contents of `X` until it is executed. However, they often still have to read the contents of `X` to see whether `X` uses any variables from the outer scope, so `X` ends up getting parsed twice. Recognizing that a function is often executed immediately, engines have optimized statements of the form:

```js
(function X() {...})()
```

so that `X` is not parsed lazily. Traditionally this optimization looks for an open paren before the function. For example, here's the optimization in V8: https://github.com/v8/v8/blob/203391bcc0e970d3cae27ba99afd337c1c5f6d42/src/parsing/parser.cc#L4256

Since the factory is immediately executed (at least in the CommonJS and global export cases), this change avoids the double parse that comes with parsing the function lazily.
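To make the change concrete, here is a hedged sketch of the before/after shapes (illustrative only, not Rollup's literal output; the `factory` wrapper and the `answer` property are made up for the example):

```javascript
// Illustrative sketch of the change, not Rollup's actual output.
// Before: the factory is a bare function expression, so engines using the
// open-paren heuristic preparse it lazily, then parse it again when called.
var before = (function (factory) {
  return factory();
})(function () {
  return { answer: 42 };
});

// After: an extra pair of parens around the factory matches the IIFE
// heuristic, so the body is parsed eagerly, exactly once.
var after = (function (factory) {
  return factory();
})((function () {
  return { answer: 42 };
}));

console.log(before.answer === after.answer); // behavior is identical either way
```

The two forms are semantically equivalent; only the parser's lazy/eager decision differs.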
OT: I wonder why they (V8) don't also check for other common patterns.
May not affect this case, but the discussed technique changes the meaning of the code significantly... the `X` in the former case is added to the enclosing scope, but is not added in the latter case. Looking at the commit, you're talking about a different case here, where the function is already an expression. Are you sure the engine doesn't already parse function expressions eagerly without the () around them? Do you have any benchmark for this?
Moreover, why does V8 lazy-parse any function if it's going to cause a double parse? The purpose of lazy parsing is to save on startup time, but if they're ever doing the parse work twice then they're surely losing overall perf, right? Is there some measurement that found such functions are much less commonly encountered? I use function decls in all of my code. Probably happens a lot, I would think.
So here's a demo of the difference, using about:tracing.

Existing method: http://output.jsbin.com/tofixojuya
New version: http://output.jsbin.com/wapepakoge/1

Notice how in the old version there were two lazy parsing blocks, each of which took about 5 ms. The idea of lazy parsing is that the first parse does slightly less work. The V8 team is testing ideas that would allow them to remove the optimization, but for now this approach is likely to be a win in major JS engines.
How much JS is being parsed here? I have a hard time believing Chrome takes time on the order of milliseconds just to parse scripts. I can parse scripts from within JS faster than that. So unless we're talking megabytes of code (and even then), can you bench some real-world code and show that this change actually makes a difference? Because you're adding code, so that takes longer to parse, so you're only making it worse... ;) No but seriously, you're adding "dead code" to guard an edge case of a heuristic that could just as well be removed or changed tomorrow. I'm just dipping my nose into something that doesn't affect me at all, so take that for whatever you think it's worth :) Bottom line: I don't think the change here is worth the expected result, on various levels.
Parsing takes a long time. Not only does the code have to be parsed, but some basic level of symbol resolution and syntax validation needs to happen. In this case Immutable (which is a commonly used library) took ~5 ms to parse. It's an unminified version, but the minified version isn't that much faster. In our discussions with the V8 team, it's not uncommon to see websites spend 20-30% of their time parsing. For a site using a large amount of UMD code, the double parsing caused by the extra function UMD adds could be a substantial contributor. I realize it seems really silly to add an extra paren to trigger behavior in the parser. But this kind of stuff adds up.
It sounds crazy (and I was skeptical too) but it actually does turn out to be significant when you have a shitload of code. For example, on FB homepage we could spend as much as 370ms just on parsing all of the JS. Delayed loading/execution aside, that's pretty nuts.
Thanks all, this is fascinating. Personally I can't see any good reason not to do this – it seems silly to forgo a potentially significant optimisation for the sake of two bytes. Nothing is set in stone and we can always change it later if engines change their behaviour. I do wonder about minifiers though – do they respect this pattern? Obviously the tests will need to be fixed before we can merge this 😀
Can confirm this has a decent-sized win. Here is another thread that talks about exploiting this heuristic.
I can think of a couple. Years ago everyone was sure that caching the length of an array at for-loop init time gave better perf than checking the length each iteration. Countless blog posts and books were written to spread that new cult knowledge. Then a funny thing happened: engines started caching the length for you. So guess what? Now your once-faster code is slower, because you bet against the future of the browser. Sure, you think you'll just change the tool if something like that happens again, and all will be better. Except there will be tens of thousands of files that will never get reprocessed and will "forever" stay slower. As a tool author, I think you have an even greater responsibility than most devs to make sure not to do things that look like quick wins but lose in the long run. Just some cents to rattle around. Sorry for barging in. :)
uglify on its default settings will remove this
So uglify at least has an option to turn this off; good. I don't know what the right solution is here, and it's up to the authors and community to decide. It's just something we (FB) came across due to the sheer size of our code, and we thought it would make sense to suggest the optimization to a commonly used bundler like Rollup, not just make local changes. FWIW, this is all even more noticeable on mobile. React Native engineers found that up to 30% of time could be spent on parsing. It's definitely one of those engine-specific optimizations that won't live long. But it might be worth the "hassle" for the time being.
Not at all! Appreciate the input, and you're totally right that toolmakers have an extra responsibility here. I'm just trying to figure out what the worst-case scenario is here – it's hard to imagine that any future engine changes would mean that parenthesized function expressions were penalised to anything like the degree that unparenthesized ones currently are, if they're being parsed twice? To be honest I'm speaking here from a position of ignorance as to what JS engines actually do under the hood – I'm relying on all of your advice 😀
There is one significant difference between this and the case mentioned above. This is a tradeoff; however, in the Rollup case you know which one of these scenarios it will be, right?
Yeah, in the case of a UMD block we know that the function will be called immediately (unless we're in AMD-land, in which case it'll be called as soon as any dependencies have been loaded, but that basically amounts to the same thing – nothing to be gained by being lazy there).
Just for consideration: what bothers me about this mindset is that it's essentially trying to set a precedent. Any time we know there's a function that will run "soon" but isn't a traditional IIFE, it encourages us to put these otherwise-totally-unnecessary parentheses around it. Imagine all the promise callbacks we're going to have to wrap. This hack is, IMO, as silly as the old CSS performance hacks.
@kangax No, it kind of makes sense for megabytes of code, which I presume you're talking about. And that probably ends up in a single function with a tool like this. You're still coding against a heuristic that may be replaced with something else at any time. I hope this won't lead to a situation like that in five years.
To clarify, I understand if Facebook or whoever does these hacks in specific cases... what bothers me is a popular tool setting a precedent that a lot of other people will be inclined to just blindly follow.
Pinging @pleath of Chakra and @bzbarsky of SpiderMonkey for insights into this from their engines' perspectives. The gist is covered here: V8 performs a double lazy-parse because it doesn't know the callback is being executed immediately. So, because of their heuristic, wrapping the callback in parens changes how it's parsed.
Btw, another solution is to eliminate that factory pattern (which I've always found silly and unclear), where you pass the business logic as an argument in a pseudo-IIFE, and just put that logic inline instead.
@jdalton I'm not really a SpiderMonkey developer, and certainly haven't worked on the frontend (parser) side of it. You probably want @jswalden or @syg. That said, I'm 99% sure that the IIFE optimizations in SpiderMonkey don't depend on heuristic sniffing of the actual bytes in the byte stream. So chances are, you're basically adding these contortions to work around what is fundamentally a V8 bug...
SpiderMonkey's predict-invoked heuristic applies when the function is inside parentheses or immediately follows certain operator tokens.
...other than whether the bytes encode an open-parenthesis or some similar token.
Hey, I left myself 1% wiggle room. ;)
I think there was a question here about tokens that engines use as immediate-execution hints if they immediately precede a function expression (with the result that lazy/deferred parsing is suppressed). Chakra treats "(" this way, as well as all the unary operators. -- Paul
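To illustrate the comment above: under a heuristic of that kind, any of these prefix tokens would mark the function as likely-invoked. All three forms below execute the function immediately (the `!` and `void` variants just coerce or discard the return value; the `ran` array here is only for demonstration):

```javascript
var ran = [];

// '(' hint: the classic parenthesized IIFE.
var viaParen = (function () { ran.push('paren'); return 1; })();

// '!' hint: common in minifier output; the result is coerced to a boolean.
var viaBang = !function () { ran.push('bang'); return 1; }();

// 'void' hint: the return value is discarded entirely.
var viaVoid = void function () { ran.push('void'); }();

console.log(viaParen, viaBang, viaVoid, ran.length); // 1 false undefined 3
```

All three bodies run exactly once, which is why an engine treating these prefixes as invocation hints can safely skip the lazy preparse for them.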
For engine folks wanting to dig in, it's the UMD factory pattern (also seen in this PR).
@jdalton I see you mention V8. In that case I'll just mention quickly that there are a lot of things in the Rollup source code that trigger deopts with the latest Chrome versions now, e.g. the use of certain constructs.
The heuristic is not applied during tokenization. It's applied during parsing, once the parser has recognized a function expression. At that point subsequent code hasn't been parsed, and the next token may or may not have been tokenized. In a recursive-descent parser, the idea is to invoke the function that parses the next subcomponent of the production, passing in a flag indicating whether the function body may be deferred. Hopefully that clarifies things.
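A toy sketch of that flow (not Chakra's actual code; all names here are invented for illustration): at the point where the parser reaches a function expression, it consults the preceding token to decide whether the body can be deferred:

```javascript
// Toy model of the predict-invoked decision in a recursive-descent parser.
// Real engines track far more state; this shows only the token-based hint.
var INVOKE_HINTS = ['(', '!', '+', '-', '~', 'void', 'typeof', 'delete'];

function canDeferBody(prevToken) {
  // If the previous token suggests the function is invoked immediately,
  // parse the body eagerly in a single pass; otherwise defer it.
  return INVOKE_HINTS.indexOf(prevToken) === -1;
}

console.log(canDeferBody('(')); // false: eager parse, single pass
console.log(canDeferBody(';')); // true: defer (preparse now, full parse later)
```

The point of passing this decision down as a flag is that the subcomponent parser can choose between full parsing and preparsing without re-reading earlier tokens.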
@pleath it'd be great to have access to Edge tracing data like we have with Chrome. We could then plug it into our system and see the exact difference between the two versions (Mozilla's effort is here, btw: https://bugzilla.mozilla.org/show_bug.cgi?id=1250290).
Makes sense. In light of this, we had some ideas on transporting JS (via a hypothetical format) in such a way that function boundaries and scope info are marked, making it easier to re-scan/re-parse and compile. Nothing is certain yet, but I'd be curious to hear your thoughts.
V8 dev here. Probably you are already familiar with this. Depending on how you measure, you will see that if we avoid lazy parsing, the total time spent in parsing (eager parsing + preparsing + lazy parsing) is lower, since we spend no time in preparsing (which is 2-3 times the speed of normal parsing, and is proportional to the lazy-parsed functions). That said, we are aware of the issue and will actively look into improving parsing and preparsing.
Have gone ahead and released this as 0.34.8, as a result of mrdoob/three.js#9310:
Next to a real-world slowdown of tens of milliseconds for a widely used library, discussions over whether or not future JS engines may change their behaviour seem a bit... ivory tower, especially when the downside to adding the parens is so negligible. @KFlash this is a discussion about speeding up the code that Rollup generates – are you talking about performance gotchas in Rollup itself? Would welcome any insight into those in a separate issue, thanks
If a recent "improvement" in Chrome caused a 4x slowdown in a hugely popular library, that is a massive failure on Chrome's part and they should roll it back. Despite the patronizing "ivory tower" quip, I continue to dissent and assert that this change is not only band-aiding the real problem but setting bad precedent. That strenuous objection registered, I'll drop the argument since it's clearly not going to get anywhere.
Agreed (with the caveat that they apparently view it as a performance win overall, and know a lot more about this than I do – as @camillobruni says they're aware of this issue) – and if they do we can revert this change.
I'm sorry you took it that way, because I wasn't quipping – I was being deadly serious. We have users who expect us to make the decisions that are best for them and their users – as a tool author, my responsibility is to them, not to the people who might cargo-cult this pattern in future. (As one of the most influential JavaScript educators, you have the opportunity to help make sure that doesn't happen.) It feels very wrong for us to say that the hundreds of thousands (more?) of users of Three.js apps (or any other library that uses Rollup) should put up with slower startup times because someone might one day include some useless-but-harmless parentheses after this gets fixed. If I'd understood the magnitude of the problem sooner, this PR would have been merged long ago. Band-aiding real problems is what libraries and tools have always done, because we can't fix the real problem – we can only bring it to people's attention and muddle through as best we can in the meantime.
Just a reminder to those following this thread that minifiers will alter tricks like this, so you have to retest after minifying. Also, you don't want to do this trick with all factories in a single-page app, since lazy parsing helps initial load for factories that aren't used right away.
Landing this PR was, imho, the correct move +1
^ I mostly agree with this statement; however, I think being aware of performance, measuring, and being aware of your runtime is an essential part of any engineer's job – blindly trusting any runtime is going to lead to observable performance regressions.

> but why should I optimize for a runtime? v8 / spidermonkey etc. – it should always be fast

This is not some one-off instance of optimizing for runtimes (V8 or similar). As one example, look at one of the canonical perf-tuning docs from Bluebird: there is a reason why Bluebird is so fast today – performance and perf tuning were a key part of the development process. As to:
I am going to disagree here. It is pretty trivial to reach out to V8 devs, or to people who can put you in touch with them; they do not live in some "ivory tower", they are software engineers who want to deliver the best product possible, just like you or me. How to "fix" the real problem? Tweeting into the void is not really a solution. Step 1: make a test case. +1 to people caring about perf, +1 to measuring, +1 to making the web fast! 🐐
@samccone 🐐 ❤️
I took @camillobruni's comment to mean that this was already being tracked somewhere, but I just created a gist illustrating the problem here, with a live demo here. Hope it's useful!
@samccone @Rich-Harris this isn't a bug, though, in Chrome or the other browsers that lazy-parse. Lazy parsing generally saves time if you have a large number of define() modules that aren't needed right away. This is only a good thing to do for rolled-up modules that are eagerly required during load; it's a bad thing if you're rolling up a module not intended to be loaded right away. I know you want the fastest default with the least thought, and that's fair enough, but parsing everything up front may not be the fastest path for a large app.
@Rich-Harris @samccone @getify waaaaaaaaaaat. Are you saying that we could be leveraging the same technique?
It appears that way; I'm going to get an issue in on our end. But I think we can write a plugin for this. @bmaurer are there any other perceived tradeoffs? @Rich-Harris I'm going to lurk the repo more often; otherwise, if you see sweet things like this, ping me <3 nice work guys.
Basically there is a little bit of a dance that happens here when you step into lazy-parse mode. In Chrome, at least, you are paying for a slower parse when you finally hit that code, but you avoid the initial slam of JS lockout... which I think is a pretty good thing. So there is a line to walk, and of course using this lazy-parse technique for everything can actually slow things down more. There are no hard and fast rules here; your best bet is to measure 📊 and iterate on what is best for you and your application.
Oooo, okay, but this leads to some good optimization ideas (if possible). I could technically wrap the main loaded JS, but then leave lazy-loaded code unwrapped, because it won't suffer as much (not as many modules, and the initial load is already rendered and painted).
@kangax Not sure exactly what kind of tracing data you'd like for Edge, but the best way to profile it is with the Windows Performance Analyzer. There's a blog post and video tutorial here: https://blogs.windows.com/msedgedev/2016/05/11/top-down-analysis-wpt Using that, you can see exactly which DOM APIs and JS functions are taking up the most time, as well as low-level details of Chakra code itself. |
I adapted @bmaurer's JSBins into a little benchmark. The wins you can get in Chrome and Edge are pretty impressive, although Safari and Firefox seem to show basically no difference. I used a Surface Book i5 for Chrome/FF/Edge and a 2013 i5 MacBook Pro for Safari (so it's not apples-to-apples for Safari). All runs were the median of 99.
And just for fun, some mobile browsers:
So yeah, this mostly seems to be a V8 and Chakra story, unless my methodology is wrong somehow.
🎉 Profiler time! 🎉 I ran this benchmark on Edge with Windows Performance Recorder (using the same steps as described in the post above; you don't need to be a MS employee to do this), and confirmed that the unwrapped function causes a double parse in Chakra. If you drill down the stack trace, you can see an extra DeferredParsingThunk, whereas in the wrapped version there's only one. Here's a profiler walkthrough on imgur.
As we have a great thread here, is anyone able to answer comment #2 here? That would be interesting. Also, as this is getting more and more attention, it would be great to sum up the drawbacks somewhere, to prevent people from jumping on the "parentheses around everything" hype train.
@nolanlawson yep, I've used Windows Performance Analyzer in the past — it's great. I was talking about programmatically collecting tracing data to use in CI tools like the one we wrote about here: https://code.facebook.com/posts/1368798036482517/browserlab-automated-regression-detection-for-the-web/ Ideally, tracing data would conform to a single standard for ease of use and aggregation.
@kangax Ah OK, I getcha. Yeah, we are actually working on a tool right now to automatically run profiles and generate ETL files (it'll be open-sourced... very soon 😃), but unfortunately we'd need to go a step further to give you an easy way to programmatically pull data out of the ETL (as opposed to just loading it in WPA). Thanks for the feedback, though; it's good to know this is something worth pursuing.
Update: hey guess what we open-sourced it: https://github.com/MicrosoftEdge/Elevator |