
Less 2.x very slow in Safari #2339

Closed
k-lange opened this issue Dec 15, 2014 · 42 comments

Comments

@k-lange

k-lange commented Dec 15, 2014

Version 2.x seems to be about 10x slower in Safari than 1.7.5. I created the following jsfiddles to demonstrate this:

I used the Browserify CDN to get less.js, benchmark.less and ten iterations.

My results (all on OS X Yosemite, 2.4GHz dual core i5):

Version | Safari 8 | Chrome 39 | Firefox 34
1.7.5   | ~210ms   | ~150ms    | ~200ms
2.1.1   | ~2000ms  | ~200ms    | ~190ms

Some background: I'm using less.js in a webworker for a live, in-browser less-editor. With 2.x it becomes completely unusable in Safari.
I understand that this is probably not a supported use-case, but it would still be great if someone could have a look at this since it probably applies to the distributed browser version as well.

@lukeapage
Member

Is there a profiler in Safari? I don't have a Mac, so we need some help to
figure out what is causing the problem.

@k-lange
Author

k-lange commented Dec 16, 2014

Thanks for looking into this!
I'm no expert on profiling in Safari, but it looks like most of the time is spent on this regex:

[Screenshots of the Safari profiler, Dec 16 2014, showing most of the time spent in the quoted-string regex]

@matthew-dean
Member

Could it be a case of this? http://www.regular-expressions.info/catastrophic.html

I'm no regex expert, but maybe someone who is could take a look at the regexes?
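
For reference, the failure mode described in that article can be reproduced with a tiny pattern. This is an illustrative sketch (not Less's actual regex) of how nested quantifiers cause exponential backtracking:

```javascript
// Illustration only: nested quantifiers such as (a+)+ give the engine
// exponentially many ways to partition the input, and it explores them
// one by one when the overall match ultimately fails.
const pathological = /^(a+)+$/;

// Matching input succeeds quickly.
const ok = pathological.test('aaaa'); // true

// Non-matching input still returns false, but the time taken roughly doubles
// with every extra 'a' before the trailing 'b'; keep n small when trying this.
function failsSlowly(n) {
  return pathological.test('a'.repeat(n) + 'b'); // always false, but O(2^n)
}

const result = failsSlowly(18);
```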

@matthew-dean
Member

I tried poking around the Safari debugger. I couldn't track down the same line of code, but when recording timeline events, something certainly is very computationally intensive for Safari, as the browser locked up indefinitely as it tried to record events at a certain point. (It actually made this MacBook Air sluggish for a couple minutes.)

@jonschlinkert
Contributor

Just curious: what cases should the regex in question support? The code example in the code comment only needs:

/^("([^"]*)"|'([^']*)')/

The match groups seem to be collecting more than that (e.g. [^"\\\r\n]|\\.), but I'm not sure what string patterns we want to match with that.

@matthew-dean
Member

This page also has a regex for this case, but it looks like we're also testing for carriage returns. http://www.metaltoad.com/blog/regex-quoted-string-escapable-quotes

Theirs is: ((?<![\\])['"])((?:.(?!(?<![\\])\1))*.?)\1

@jonschlinkert I tested your regex in regex101.com and it doesn't seem to match escaping quotes in the middle of the quoted string, per the comments in the code.

@lukeapage
Member

I would go through the code and see if that regex got changed. If it didn't,
the cause must be me removing the chunker. I think I may have left in an
option to enable it (though it's disabled for very good reasons); it will at
least allow you to narrow down the cause.

@jonschlinkert
Contributor

@matthew-dean that uses negative lookbehinds, which don't work in JavaScript; it would be awesome if they did.

Forgot about the escaping when I did the example; this should do it:

/^'([^'\\]*\\.)*[^']*'|"([^"\\]*\\.)*[^"]*"/
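
As a quick sanity check (my own test strings, not from the thread), that pattern does accept backslash-escaped quotes inside the string. One caveat: as written, the ^ anchor applies only to the single-quote alternative, so the sketch below wraps the alternation in a non-capturing group to anchor both branches:

```javascript
// jonschlinkert's proposed pattern, wrapped in (?:...) so the ^ anchor
// applies to both the single- and double-quoted alternatives.
const quoted = /^(?:'([^'\\]*\\.)*[^']*'|"([^"\\]*\\.)*[^"]*")/;

const plain   = quoted.test('"hello"');        // true: simple quoted string
const escaped = quoted.test('"say \\"hi\\""'); // true: escaped quotes inside
const nope    = quoted.test('plain text');     // false: no leading quote
```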

@matthew-dean
Member

@jonschlinkert Ah, well, it's all Greek to me. Since you seem to know a bit about regex, can you tell why the existing regex might be a problem (if this is indeed the source of the issue)?

@jonschlinkert
Contributor

I'm not sure. I haven't looked at the method that's being called yet (parserInput.$re) to see what that's doing.

I don't think this would cause any issues, but sometimes when everything is "optional" it can be more processing intensive. e.g. when quantifiers are stars, *, which means "zero or more", versus +, which means "one or more", it can cause ambiguity in the pattern. But... this pattern isn't really ambiguous since it's bookended by quotation marks - meaning the engine only has to figure out what's happening in between.

hmm, I should just look at the method lol

@jonschlinkert
Contributor

A couple of things that are not really important, and not solutions, but they help simplify a little (stuff that is very easily overlooked while refactoring):

  • here we're already doing .replace(/\r\n/g, '\n'). A simpler pattern is .replace(/\r/g, '') (note the g flag); I don't know of any use cases where it matters if carriage returns exist without a newline. There might be... would this actually make a difference? Probably not, just thought it was fun to mention
  • either way, anywhere afterwards we can get rid of \r in the regex patterns, since it doesn't exist anymore (like here; I think that's the only place)

still looking...

@jonschlinkert
Contributor

the cause must be me removing the chunker

Not sure; it would take me a while to get up to speed on everything, but it seems that parserInput.$re takes quite a while to loop over the matched tokens. This is super anecdotal; I'm just trying to look for a cause.

@rjgotten
Contributor

Capturing groups can be very expensive; I'd start with reducing those. In fact, you have a number of match groups which do not need to be capturing groups at all if you strip the terminator quotes manually, rather than letting the regex handle it.

Also, I vaguely remember something about it being bad for regex performance to have a leading optional capturing group. Try making the ~ token optional inside the first capturing group, rather than making the entire capturing group optional.

E.g.

str = parserInput.$re( /^(~?)("(?:(?:[^"\\\r\n]|\\.)*)"|'(?:(?:[^'\\\r\n]|\\.)*)')/ );
if ( str ) {
  return new( tree.Quoted )( str[ 2 ], str[ 2 ].slice( 1, -1 ), Boolean( str[ 1 ]), index, fileInfo );
}
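
To make the suggestion concrete, here is an illustrative example (my own, not less.js code) using a simplified form of the pattern above; the inner repetitions are non-capturing, so the engine records positions only for the two groups that are actually read back:

```javascript
// Simplified form of the pattern above: inner repetitions use (?:...) so
// only the ~ marker and the full quoted string are captured.
const quotedRe = /^(~?)("(?:[^"\\\r\n]|\\.)*"|'(?:[^'\\\r\n]|\\.)*')/;

const m = quotedRe.exec('~"hello" rest');
const escaped = m[1] === '~';      // optional ~ marker was present
const content = m[2].slice(1, -1); // strip terminator quotes manually
```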

@lukeapage
Member

@k-lange in the profile you posted, quoted is taking 70ms, but per the totals at the start of the post, Safari is taking 10x Chrome/Firefox at 2000ms. So even if quoted were made faster, it must be a general problem, not just one with that specific regex? Or was your profiling part of a different test where 70ms is a significant % of the total time?

I don't want to spend time micro-fixing a single regex till we know the problem. As I said before, it may be due to the removal of the chunker, making every regex slower (and the quoted one just happens to be the slowest?). Can anyone shed any light on the issue? I don't have a Mac to test on.

You can re-enable chunking with {chunkInput: true} in the less options (or data-chunk-input="true" attribute on the less link). Can you try that and report back?

@k-lange
Author

k-lange commented Jan 5, 2015

@lukeapage thanks, with the chunker enabled safari is as fast as it was in 1.7.5, so this change indeed caused the performance regression.

On the regex: all the screenshots are from one run; the quoted regex appears in different places of the profiling 'tree'. I'm sure I didn't find all occurrences, but these three alone add up to 250ms…

@matthew-dean
Member

I was checking this out today using @k-lange's JSFiddle. I think the regular expressions may be a red herring. The reason I say that is that the memory allocation is increasing over time, with frequent garbage collection dumps of a pretty significant size. Take a look at this:

[Screenshot of the Chrome Timeline, Feb 3 2015, showing a stair-stepping JS heap with frequent garbage collection]

As soon as Less begins its work, it (apparently, if I'm reading this correctly) starts allocating resources like crazy, enough that Chrome is busily garbage collecting chunks of 10MB or more dozens of times in a span of a couple seconds, and the total JS heap rises pretty quickly. Adding chunkInput: true seemed to have no effect on this graph. I researched a bit to get a basic understanding of the Timeline profiler. Chrome says that frequent GC and this "stair-step" kind of chart is probably an indication of a memory leak. Or it could be that we're just constructing a massive object that takes double digits of memory to construct and retain.

Of note: I did this in Chrome because Safari can never complete profiling. The memory / CPU requirements of profiling Less.js are too great for my MacBook Air, and it ran for 14 minutes without ever finishing.

@matthew-dean
Member

On the other hand... I ran this on Less 1.7.5 and 1.6.3, and there was no change in the graph either. Meaning that my discovery is probably the red herring, and Less has probably always consumed and retained a large amount of memory in the browser. Or if it has a memory leak now, it's probably always had a memory leak. The only way I can see this being relevant is as a compounding problem: memory / CPU usage overwhelming Safari at a certain tipping point.

@matthew-dean
Member

Possibly related to: #1306

@matthew-dean
Member

I think I may have discovered a possible reason for all the GC at least, but I want to run some tests to make sure.

@kuus

kuus commented Feb 5, 2015

+1 for knowing more about that. I tried to take a look at this diff: 1.1.5-extend_patch...v1.3.1,
but it's quite hard to find something meaningful without knowing the codebase well.

@matthew-dean
Member

I've found some clues, but haven't nailed down anything that would make a difference. I want to figure out these performance issues, mostly because I want to understand how to solve them with Web Inspector profiling anyway. But also, solving them may improve Node compile performance, who knows.

@matthew-dean
Member

Okay, I've done some more extensive profiling today, looking at heap snapshots in Chrome, and recording multiple timelines in Chrome and Safari, and I think I have an answer.

First: in my testing on my system, setting chunkInput: true made no difference. The run times in Safari were equal with and without. So what is the answer to this question:

Q: Why is Safari so much slower than Chrome?
A: Safari is much slower than Chrome.

I know that's an entirely unsatisfying answer. But there was no single code path, or regex that seemed to be significantly adding to the total. A particular piece of the parser may take longer than another (as in @k-lange's example), but not amazingly longer. As expected, the big totals are in parsing the input, but that's most of the work Less does anyway. I believe it's a cumulative effect, and V8 in Chrome is simply faster than JavaScriptCore in Safari, and has optimizations that either JavaScriptCore does not have, or does them better.

The logical next question is: why is 1.7.5 faster in Safari? Well, with more functions exposed in the plugin syntax, one possibility is that each segment in parser functions has a little further to travel. So, if you call an additional function 1,000 times, and your JS engine is 1 millisecond slower than another at calling that function (plus all the functions that function calls), then you've added a total of 1 second to parsing.

iOS: Same story. And Firefox. V8 is the fastest of the bunch at these particular operations. That's it.

So, what are some things V8 might do faster?

  • Garbage collection - Less.js allocates large numbers of objects and arrays in the parsing process. It also de-references many of them almost as soon as they are created. When they're no longer referenced, the GC can jump into action, releasing memory the objects used to hold. V8 may have a more efficient garbage collector.
  • Object allocation - V8 may be faster at allocating memory in the first place for objects and arrays.
  • Array resizing - Some JS engines drag when you use .pop(), .shift(), .push(), because they need to change the memory structure of the array. JS doesn't define HOW to change the reference to that array at the engine level. You just put whatever you please into an array or object, change the structure of that object at your whim, and expect the engine to keep up and give you back the new transformed object as quickly as possible. Guessing the types of non-typed objects is an area where V8 is strongest, and where Google has spent a lot of engineering time. Anyway, Less.js does array transformations A LOT. So, I suspect it's adding a chunk of time in Safari.
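
As a hedged micro-example of that last point (generic JavaScript, not Less's actual code), a work list can be consumed by index instead of by repeated shift() calls, which restructure the array each time:

```javascript
// Consuming via shift(): every call removes the head element, which can force
// the engine to move or re-key the remaining elements on each iteration.
function consumeByShift(items) {
  let total = 0;
  while (items.length > 0) {
    total += items.shift();
  }
  return total;
}

// Consuming via an index: the array is never restructured, and the caller can
// drop its reference when finished so the whole array becomes garbage at once.
function consumeByIndex(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total += items[i];
  }
  return total;
}
```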

Note that these are just educated guesses. There was a lot of data to go through in the respective Inspectors, so I really couldn't say how much any of these things might be affecting the outcome.

Now, can we address it? Probably. Maybe. There are a number of techniques for getting objects and arrays to perform better, and to not work the GC as hard, such as:

  • Object pooling - Creating objects of like size up front, then borrowing an object from the pool when needed, rather than creating all objects on the fly.
  • More de-referencing - Less.js exposes the AST now (I think?) and a number of other objects. The more objects - the more memory to keep those objects. Of note - if you run Less.render on the same source multiple times, it SHOULD keep a rather level memory / object allocation. However, that's not what I see, which may be what was causing the stair-stepping of memory in the screenshot I did. So, it's possible that even when Less is creating an entirely new AST that exactly resembles the previous one, it's still not de-referencing existing objects. That implies a memory leak. There are objects which we don't need in memory anymore, which the JS engine doesn't know it can get rid of. I'm not sure that memory leaking in objects would slow down Safari during parsing, but it's possible, if that memory leaking is forcing more frequent GC of other objects to make up the difference.
  • Less frequent array transformations - Instead of frequent pop()s, it's more efficient (if possible) to iterate through an array, and when finished, set the array to null. It looks like Less is transforming arrays often, so if it's transforming them in order to essentially iterate through each item in a list and consume it, that could be dragging down the other JS engines.
  • Fewer circular references - As I was drilling down through some of the objects created, I noticed that there were tree objects which had a property that pointed back to the root Less object. When I was researching how to interpret Inspector data, someone mentioned that Safari does poorly with circular references, even though they're technically fine in JS.
  • Type hinting - Apparently, there are ways that you can "imply" types in JavaScript for a JS engine so that it can more efficiently create a strict type that performs faster. I don't have links offhand that describe it, but I've read about it.
  • (EDIT) Fewer global objects - When plugins were introduced, I think the approach we took was to create more root references to Less functions. (I could be entirely wrong about this.) BUT in short, functions are objects too, and any function that relates to or is only relevant to parsing but is exposed to the root context will not be de-allocated. If possible, objects should be exposed to plugins, and plugin functions themselves should only be defined and exist for the length of time that Less.js is parsing / outputting. That may already be happening, and someone more familiar with the code would know for sure, but just as a general principle: the fewer global functions, the better.
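
The object-pooling bullet above can be sketched as follows; this is an illustrative sketch under my own assumptions about node shape, not less.js code:

```javascript
// Minimal object-pool sketch: reuse like-shaped node objects instead of
// allocating a fresh one per parse step, so the GC has less churn to collect.
class NodePool {
  constructor() {
    this.free = [];
  }

  acquire(type, value) {
    // Reuse a pooled node when one is available; otherwise allocate.
    const node = this.free.pop() || { type: null, value: null };
    node.type = type;
    node.value = value;
    return node;
  }

  release(node) {
    // Clear fields so the pool never pins other objects in memory.
    node.type = null;
    node.value = null;
    this.free.push(node);
  }
}

const pool = new NodePool();
const a = pool.acquire('Quoted', '"hi"');
pool.release(a);
const b = pool.acquire('Anonymous', 'x');
// b is the same recycled object as a: no new allocation happened
```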

Anyway, this is a tough nut to crack, so if anyone else has different data / conclusions, chime in. But as far as I can tell, it's not an easy fix. It's mostly doing performance enhancements here and there across the entire library.

@rjgotten
Contributor

The earlier posted fiddle calls less.render in a recursive loop. I forked it and made it callable via a clickable button, which is better suited to interactive profiling with heap snapshots:

https://jsfiddle.net/yoe8utbe/6/embedded/result/

  • Open Chrome -> Dev tools -> Profiles
  • Take 1st heap snapshot
  • Click 'Run' button to invoke Less compiler
  • Take 2nd heap snapshot
  • Click 'Run' button again
  • Take 3rd heap snapshot
  • Open 3rd heap snapshot's summary view
  • Filter to objects allocated between snapshot 1 and 2

This should give you a good view of what actually leaked. And it's really not that much.

I had a look at the JavaScript CPU profile; using Firefox's 'invert call stack' functionality, most CPU time is spent in Ruleset.prototype.eval and Visitor.prototype.visit.

@matthew-dean
Member

@rjgotten Yes, I did the same thing to isolate it (trigger it on a button). I also found that JSFiddle was not helping because CodeMirror had some memory leaks going.

And I agree, the object difference is not that much, so while some leaking may be present, I think it's just overall processing time in each of the JS engines. Those two functions have more time spent because they are logically used most.

@kuus

kuus commented Feb 28, 2015

@matthew-dean thanks for the insight about less.js performance!
I was just wondering: if it's really all about these 'small' optimizations and the speed of the browser engine, do you have any idea how it is possible that version 1.7.5 was that much faster, as reported by @k-lange? There must have been some particular code change that caused that, right?

Also, can someone else confirm those test results on Safari? (sorry, I'm on Linux)

@matthew-dean
Member

@kuus My guess is that there is something added to Less which Safari is particularly bad at, or wasn't optimal, but something that Chrome was especially good at optimizing. But yes, I'd love to have other people gather data or run the same tests.

It's hard to know without taking individual pieces and running browser tests, but I don't know if that's possible. I'm just learning some of the more advanced Webkit profiling tools, so someone who knows more about it may have more insight.

@rjgotten
Contributor

rjgotten commented Mar 1, 2015

My guess is that there is something added to Less which Safari is particularly bad at, or wasn't optimal, but something that Chrome was especially good at optimizing.

I think the main difference feature-wise is that V8 has a generational garbage collector, which I don't believe JavaScriptCore has available yet. That would help immensely when you have a mix of long-lived and extremely short-lived objects. That, and V8 is better in general at handling multiple 'shapes' (type variants) with the optimizing compiler.

If you want to increase performance, I'd start by looking at stable types and object pooling.

@matthew-dean
Member

I think the main difference feature-wise is that V8 has a generational garbage collector, which I don't believe JavaScriptCore has available yet. That would help immensely when you have a mix of long-lived and extremely short-lived objects. That, and V8 is better in general at handling multiple 'shapes' (type variants) with the optimizing compiler.

I think that's exactly right, based on what I've read about V8. And I agree with those solutions.

@lukeapage
Member

I think you are missing what was confirmed above.

  1. With large files, non-V8 regex gets very slow. In v1 the input was chunked, meaning regexes ran on smaller pieces. This is only visible with very large Less files (it was not visible in Bootstrap, which is well split up). The removal of chunking was because it had bugs to do with comment handling; it started to look like we would have to implement a mini parser in the chunker. You can still test this mode with chunkInput. This is what is giving the 2x slowdown for some files and browsers. Maybe we should make an incremental fix to the chunker so the unit tests pass and re-enable it by default? It would still not be perfect...
  2. Every fix and feature makes Less slower, and v2 is a little bit slower than v1. I found a few slow functions that might be able to cache results, but it was non-trivial to fix.

@matthew-dean
Member

I think you are missing what was confirmed above.

The chunkInput setting was confirmed by one person, but I couldn't replicate it. Whether or not chunkInput was enabled had no effect for me in Safari. It makes sense in theory that a regex running on an entire file could be orders of magnitude slower than on a small chunk. I just couldn't replicate it. Has anyone else been able to replicate that result?

@k-lange
Author

k-lange commented Mar 2, 2015

@matthew-dean here's my fiddle from the beginning with { chunkInput: true }: http://jsfiddle.net/5p2wp2bp/2/
It's dramatically faster for me, can't be just my machine?

@matthew-dean
Member

@k-lange God damn it, you're right. I was setting less options before script load, but not passing them directly into render().

Well, at least I learned a lot about developer tools and JavaScript optimization. Son of a....

Welp, I guess the chunker is a necessity then @lukeapage.

@simonferndriger

Just wanted to add that in the current version 2.4.0 it is also very slow in Firefox.
Chrome is the only browser so far that renders it quickly (in client mode).

@simonferndriger

Also, the main difference I see is that in Chrome, the website does not get rendered at all (white page) until LESS is ready. Then the website shows up correctly all at once.

In Firefox, the website already starts to render without the LESS being ready which looks like a website with no CSS at all, and then, after a few seconds, it shows up correctly.

This prerendering in Firefox without CSS is troublesome for jQuery's document.ready, because a lot of metrics (width/height/etc.) are not correct during this delicate time.

@matthew-dean
Member

Just wanted to add that in the current version 2.4.0 it is also very slow in Firefox. Chrome is the only browser so far that renders it quickly (in client mode).

What about with the chunkInput option enabled?

@simonferndriger

Total page load in 8.96s, with or without "var less = { chunkInput: true };"

PS: Will there be Dreamweaver CC support for syntax highlighting?



@bd82
Contributor

bd82 commented Dec 29, 2015

I've experienced severe performance issues with Safari RegExps in a similar context, though not as a Less.js user.

This may be related because:

  • Similarity to the regexp mentioned above (negation pattern + anchor + capturing groups)
  • More severe issues the larger the input (consistent with the removal of the chunker)

See the jsperf benchmark that demonstrates the issue.

For large enough inputs, there is a five(!) orders of magnitude performance degradation in Safari.

See the WebKit bug report.

@matthew-dean
Member

@bd82 The fix for this is to set {chunkInput: true} but I'm not clear on why that's no longer the default setting. It seems Safari doesn't work well without it.

@bd82
Contributor

bd82 commented Dec 29, 2015

@matthew-dean
Thanks, but I am not using less.js (currently), so I do not require a fix/workaround.
My comment is about what is (possibly?) the root cause of the performance issues.

I've edited my original comment to be more explicit about this to avoid future misunderstandings.

@bd82
Contributor

bd82 commented Aug 28, 2016

@matthew-dean

Is this still relevant?

I wanted to reproduce this on the 3.x branch to attempt a fix, but was unsuccessful at reproducing it.

So I've tried modifying the original JSFiddle to use the latest Less version (2.7.1), but I still can't reproduce it even if I configure chunkInput: false.

JSFiddle with 2.7.1

Works well in Safari (9.1) / Chrome / Firefox.

Am I doing something incorrectly, or perhaps this was somehow resolved?
I can reproduce the original issue using the original JSFiddle, just not with the latest Less.js (2.7.1).

@AnirudhaGohokar

AnirudhaGohokar commented May 19, 2017

I tested with the latest Less version (2.7.2) on Safari 5.1.7 for Windows with 130 KB of CSS:
chunkInput: true: parse time 148 ms.
chunkInput: false: parse time 3000 ms.

I can see a significant performance improvement for Safari; a 148 ms parse time seems completely acceptable to me. I don't see much difference on IE 11+, Chrome, or FF when chunkInput is toggled.

@matthew-dean
Member

@bd82 I'm not sure if this is still relevant. I know that making sure XHR was not sync by default helped things out tremendously.

Closing for now. Re-open if there are new Safari issues.
