Build leaking strings with push/join instead of string concatenation. #48
Conversation
Thanks for the code. As usual with performance improvements, I've set up a small jsperf: http://jsperf.com/lzstring-1-4-strings-vs-arrays I'll wait a couple of days so that I can test it on all the platforms at hand, both at home and from work. Can you do the same on your end? Also, please have a look at the jsperf just to make sure I didn't screw something up. |
Ah, don't worry about the minified version, I update it before any release. |
The array-based implementation could be a bit slower in this kind of micro-benchmark (5% slower in Chromium 41), but that doesn't really matter. |
Try compressing a block of length 10 000 (10k) 10 000 times, saving each result. Or a block of length 100 000 (100k) 1000 times. Or a block of length 1000 (1k) 100 000 times. |
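A sketch of that kind of benchmark (not from the thread; it assumes the lz-string npm package loaded as LZString, and uses the 10k-block, 10 000-iteration variant):
var LZString = require('lz-string'); // assumption: the npm package name
var block = new Array(10000 + 1).join('x'); // a 10k-character block
var results = [];
var start = Date.now();
for (var i = 0; i < 10000; i++) {
  // save every result so the produced strings stay reachable
  results.push(LZString.compress(block));
}
console.log('kept ' + results.length + ' results in ' + (Date.now() - start) + ' ms');
console.log((process.memoryUsage().rss / 1024 / 1024) + ' MiB');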
Just to let you know, your implementation is consistently slower on FF and Chrome on MacOS and Windows, consistently faster on FF & Chrome on Linux, and about the same on Safari (MacOS & iOS). A mixed bag, really. And while I'm working almost exclusively on Linux, I know that Windows + Mac is more than 85% of my desktop traffic. On Android, things look about equal. As a side note, I've opened an issue on jsperf because the graph is essentially meaningless since they don't mention the OS: mathiasbynens/jsperf.com#222 As usual, there is not really a clear-cut decision to take here... Can you tell me how you measured the consumed memory so we can try to benchmark the change from that point of view? |
The problem with string concatenation is that the resulting string, while being equal to the expected result, consumes much more memory. If you discard the result immediately, you most probably won't notice anything, but if you store the result somewhere, your memory will be exhausted pretty soon, bringing hell to your setup: the GC will start consuming 100% CPU, and if your physical memory is exhausted (not just the soft limit), side apps and native code will start failing, etc. |
push/join:
var len = 1000,
iter = 50000,
m = [];
function foo0() {
var s = [];
for (var j = 0; j < len; j++) {
s.push('x');
}
return s.join('');
}
for (var i = 0; i < iter; i++) {
m.push(foo0());
}
console.log((process.memoryUsage().rss / 1024 / 1024) + ' MiB');

String concatenation:
var len = 1000,
iter = 50000,
m = [];
function foo1() {
var s = "";
for (var j = 0; j < len; j++) {
s += 'x';
}
return s;
}
for (var i = 0; i < iter; i++) {
m.push(foo1());
}
console.log((process.memoryUsage().rss / 1024 / 1024) + ' MiB');

Run with iojs/node. If you wish to test browsers, make two separate pages, run them, and measure memory usage, for example, with the developer tools. Make sure that you perform the measurement before the GC cleans things up. |
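For a browser-side variant of the same measurement (a sketch, not from the thread: performance.memory is a non-standard, Chrome-only API, so in other browsers the developer-tools memory panels mentioned above are the way to go), the concatenation page could contain something like:
var len = 1000,
    iter = 50000,
    m = [];
function foo1() {
  var s = "";
  for (var j = 0; j < len; j++) {
    s += 'x';
  }
  return s;
}
for (var i = 0; i < iter; i++) {
  m.push(foo1());
}
// Chrome-only, non-standard API; keep `m` reachable while you measure
if (window.performance && performance.memory) {
  console.log((performance.memory.usedJSHeapSize / 1024 / 1024) + ' MiB');
}
and the push/join page the same thing with foo0 from above.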
The difference between these tests and your jsperf test is that your test immediately throws the resulting strings out. That doesn't happen in real applications, those strings always go somewhere, for example, into the message queue that is about to be sent. |
True, but bigger strings should consume more memory and then trigger more GCs, leading to slower compression. That's a lot of theory right there, so it's almost certainly false. But I need to get things straight and understand this - at least from a theoretical standpoint - before going further. LZ-String was made for browsers. I don't even understand what people do with it on the server side, save decompressing stuff from the client. Can you tell me your use case for compressing huge strings on the server side? |
The strings were not huge, generally below 1000 chars each. Compressing WebSocket messages before transferring them (not everything was fine with native compression). I disabled that already. |
http://jsperf.com/lzstring-1-4-strings-vs-arrays/3 — here, I modified your testcase by storing the results. Now the Array-based implementation is faster in Chromium. |
First, about your use case. Of course, I generally advise people to turn on gzip compression when sending stuff from the server to the client. But even then, you were in the exact same use case as my initial jsperf: compress a string, send it over the network and throw it away. I don't see the problem of having strings bigger than what they represent here. About your jsperf: well, strings are consistently faster there. In particular, this jsperf is making me doubt the pertinence of this patch for in-browser lz-string usage. But now that I understand your issue, I'll try to think of something to mitigate it. |
There is a message queue in my case that contains messages that are about to be sent. I already disabled lz-string compression because the native
Looks like I didn't use jsperf correctly. It runs the setup/teardown inside the loop, not outside of it. Wait a moment, I will fix that. |
http://jsperf.com/lzstring-1-4-strings-vs-arrays/4 — take a look at this one. strings-normalize wins in Chromium and Firefox, but that is a hack. I also changed the string that is compressed a bit: it's now the same for all the tests during one run, a bit shorter, and it contains letters. |
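The "normalize" variant is not shown in the thread; one guess at what such a hack can look like (an assumption about the technique, and engine-dependent behavior) is to build the string by concatenation but force the engine to flatten the internal rope it builds before the string is stored, for example by touching one of its characters:
function buildNormalized(len) {
  var s = "";
  for (var j = 0; j < len; j++) {
    s += 'x';
  }
  // Touching a character may force the engine to flatten the rope/cons-string
  // built up by the concatenations (engine-dependent; this is the hacky part).
  s.charCodeAt(0);
  return s;
}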
About the memory consumption: In Chromium developer tools, under «Timeline», choose «Memory» and press the record button. Same in Firefox. http://oserv.org/bugs/lz-string/memory — use this to measure memory with in-browser developer tools.
My results in Chromium: array — 53M, normalize — 49M, strings — 409M.
My results in Firefox e10s: array — 68M, normalize — 110M, strings — 270M.
That's just 1000 strings with 1000 random numbers each. |
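A sketch of that kind of test page (an approximation, not the actual source of the linked page; it assumes LZString is loaded on the page):
var kept = [];
for (var i = 0; i < 1000; i++) {
  var s = '';
  for (var j = 0; j < 1000; j++) {
    s += Math.floor(Math.random() * 10); // 1000 random digits per string
  }
  kept.push(LZString.compress(s));
}
// record in the devtools Timeline/Memory panel while `kept` is still reachable
console.log('kept ' + kept.length + ' compressed strings');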
Even if string concatenation is/were 5% faster, you shouldn't go for that 5% performance win when it increases memory usage 10-100 times in a major browser. In JS, strings are immutable, and engines are not optimized for «build a string with char-by-char concat» use cases; no one expects that. «a += b» does not modify the value of «a»: it constructs a new string object and overwrites the reference. About your «server-side» and «large messages» arguments: performance should be optimized, for example, with the native Map/Set classes, not by misusing strings in a harmful way. |
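As an illustration of the Map suggestion (a sketch of the kind of lookup being described, not lz-string's actual dictionary code): an LZ-style dictionary that assigns a numeric code to every distinct word can use a native Map instead of a plain object with string keys.
var dictionary = new Map();
function codeFor(word) {
  // assign each distinct word a numeric code the first time it is seen
  if (!dictionary.has(word)) {
    dictionary.set(word, dictionary.size);
  }
  return dictionary.get(word);
}
console.log(codeFor('ab'), codeFor('abc'), codeFor('ab')); // 0 1 0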
Sorry, my link was broken, I meant to link to this jsperf. Of course, it doesn't really apply to me since I'm concatenating characters one by one. In any case, thanks for all the information. I'm going to look into it closely this week and see what the best solution is. |
http://jsperf.com/javascript-concat-vs-join/4 — try this one. And I still don't understand why you are referring to jsperf micro-benchmarks. |
On your jsperf, the concat version is consistently faster (way faster, actually) for me, except on Chrome/Linux and IE9... Again a mixed bag. Unfortunately I have extremely little free time these days. I'll try to look into this over the weekend to find a way that consumes less memory and doesn't hit performance in the process. If I don't, I'll merge this pull request. Thanks for your time and patience. |
Build leaking strings with push/join instead of string concatenation. Slightly slower on some browsers but consumes far less memory.
Thanks. |
This solves #46.
I haven't updated lz-string.min.js; there is no build script.