Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build leaking strings with push/join instead of string concatenation. #48

Merged
merged 1 commit into from Mar 23, 2015

Conversation

ChALkeR
Copy link
Contributor

@ChALkeR ChALkeR commented Mar 15, 2015

This solves #46.

I haven't updated lz-string.min.js, there is no build script.

@pieroxy
Copy link
Owner

pieroxy commented Mar 15, 2015

Thanks for the code. As usual with performance improvements, I've setup a small JSPERF: http://jsperf.com/lzstring-1-4-strings-vs-arrays

I'll wait a couple of days so that I can test it with all platforms at hand both at home and from work. Can you do the same on your end?

Also, please have a look at the JSPERF just to make sure I didn't screw something up.

@pieroxy
Copy link
Owner

pieroxy commented Mar 15, 2015

Ah, don't worry about the minified version, I update it before any release.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

The Arrays-based implementation could be a bit slower in this kind of micro-benchmarks (5% slower in Chromium 41), but that doesn't really matter.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

Try compressing a block of length 10 000 (10k) 10 000 times, saving each result. Or a block of length 100 000 (100k) 1000 times. Or a block of length 1000 (1k) 100 000 times.

@pieroxy
Copy link
Owner

pieroxy commented Mar 15, 2015

Just to let you know, your implementation is consistently slower on FF and Chrome on MacOS and Windows, consistently faster on FF & Chrome on Linux, and about the same on Safari (MacOS & iOS).

A mixed bag, really. And while I'm working almost exclusively on Linux, I know that Windows + Mac is more than 85% of my desktop traffic.

On Android, things look globally equal.

As a side note, I've opened an issue on jsperf because the graph is essentially meaningless since they don't mention the OS: mathiasbynens/jsperf.com#222

As usual, there is not really a clear cut decision to take here...

Can you tell me how you measured the consumed memory so we can try to benchmark the change from that point of view?

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

The problem with strings concatenation is that the resulting string, while being equal to the expected result, consumes much more memory. If you discard the result immediately — you most probably won't notice anything, but if you store the result somewhere — your memory will exhaust pretty soon, brining hell to your setup: gc will start consuming 100% CPU, if your physical memory is exhausted (not just the soft-limit) — side apps and native code will start failing, etc.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

push/join:

var len = 1000,
    iter = 50000,
    m = [];

function foo0() {
    var s = [];
    for (var j = 0; j < len; j++) {
        s.push('x');
    }
    return s.join('');
}
for (var i = 0; i < iter; i++) {
    m.push(foo0());
}
console.log((process.memoryUsage().rss / 1024 / 1024) + ' MiB');

String concatenation:

var len = 1000,
    iter = 50000,
    m = [];

function foo1() {
    var s = "";
    for (var j = 0; j < len; j++) {
        s += 'x';
    }
    return s;
}
for (var i = 0; i < iter; i++) {
    m.push(foo1());
}
console.log((process.memoryUsage().rss / 1024 / 1024) + ' MiB');

Run with iojs/node.

If you wish to test browsers, make two separate pages, run them, and measure memory usage, for example, using the developer tools. Make sure that you perform the measurement before gc cleans up things.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

The difference between these tests and your jsperf test is that your test immediately throws the resulting strings out. That doesn't happen in real applications, those strings always go somewhere, for example, into the message queue that is about to be sent.

@pieroxy
Copy link
Owner

pieroxy commented Mar 15, 2015

True, but bigger strings should consume more memory and then trigger more GCs, leading to a slower compression. That's a lots of theory right there so it's almost certainly false. But I need to get things straight and understand this - at least in a theoretical standpoint - before going further.

LZ-String was made for browsers. I don't even understand what people do with it on the server side, save decompressing stuff from the client.

Can you tell me your use case of compressing huge strings on the server side?

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

Can you tell me your use case of compressing huge strings on the server side?

The strings were not huge, generally below 1000 chars each. Compressing WebSocket messages before transfering them (not everything was fine with native compression). I disabled that already

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

http://jsperf.com/lzstring-1-4-strings-vs-arrays/3 — here, I modified your testcase by storing the results. Now the Array-based implementation is faster in Chromium.
It's the same microbenchmark in the browser as yours, it just doesn't throw out the results immediately.

@pieroxy
Copy link
Owner

pieroxy commented Mar 15, 2015

First, about your use case. Of course, I generally advise people to turn on gzip compression when sending stuff from the server to the client. But even then, you were in the exact same use case as my initial jsperf: Compress a string, send it to the network and throw it away. I don't see the problem of having string bigger than what they represent here.

About your jsperf. Well, strings are consistently faster with += in your jsperf ;-) But don't worry I get the issue. I'm just think

In particular, this jsperf is making me doubt about the pertinence of this patch for in-browsers lz-string usages.

But now that I understand your issue, I'll try to think of something to mitigate

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

First, about your use case. Of course, I generally advise people to turn on gzip compression when sending stuff from the server to the client. But even then, you were in the exact same use case as my initial jsperf: Compress a string, send it to the network and throw it away. I don't see the problem of having string bigger than what they represent here.

There is a message queue in my case that contains messages that are about to be sent. I already disabled lz-string compression because the native permessage-deflate is working fine nowdays, so don't worry about me =).

About your jsperf. Well, strings are consistently faster with += in your jsperf ;-)

Looks I didn't use jsperf correctly. It runs the setup/teardown inside the loop, not outside of it. Wait a moment, I will fix that.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

http://jsperf.com/lzstring-1-4-strings-vs-arrays/4 — take a look at this one.

strings-normalize wins in Chromium and Firefox, but that is a hack.

And I changed the string that is compressed a bit. It's now the same during one run for all the tests, a bit shorter and has letters.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

About the memory consumption: In Chromium developer tools, under «Timeline», choose «Memory» and press the record button. Same in Firefox.

http://oserv.org/bugs/lz-string/memory — use this to measure memory with in-browser developer tools.
My results in Chromium: array — 53M, normalize — 49M, strings — 409M.
My results if Firefox e10s: array — 68M, normalize — 110M, strings — 270M.

That's just 1000 strings with 1000 random numbers each.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

Even if strings concatenation is/were 5% faster, you shouldn't go for that 5% performance win when it increases memory usage 10-100 times in a major browser.

In JS strings are immutable, and engines are not optimized for «build a string with char-by-char concat» use-cases, no one expects that. «a += b» does not modify «a» value, it constructs a new string object and overrides the reference.

About your «server-side» and «large messages» arguments:
The same engine is used in Chromium, on the client-side.
And if you say that this lib shouldn't be used for large messages or large amounts of messages, what's the point in 5% performance optimization?

The performance should be optimized, for example, by native Map/Set classes, not by misusing strings in a harmful way.

@pieroxy
Copy link
Owner

pieroxy commented Mar 15, 2015

Sorry, my link was broken, I meant to link to this JSPERF. Of course, it doesn't really apply to me since I'm concatenating characters one by one.

In any case, thanks for all the informations. I'm going to look into it closely this week and see what the best solution is.

@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 15, 2015

http://jsperf.com/javascript-concat-vs-join/4 — try this one.
It builds 1000 strings with length 5000, one by one.

And I still don't understand why are you referring to jsperf micro-benchmarks.

@pieroxy
Copy link
Owner

pieroxy commented Mar 20, 2015

On your jsperf, the concat version is consistently faster (way faster actually) for me except on Chrome/Linux and IE9... Again a mixed bag.

Unfortunately I have extremely little free time these days. I'll try to look into this this weekend to find a way that consumes less memory and doesn't hit performance in the process. If I don't I'll merge this pull request.

Thanks for your time and patience.

pieroxy added a commit that referenced this pull request Mar 23, 2015
Build leaking strings with push/join instead of string concatenation. Slightly slower on some browsers but consumes far less memory.
@pieroxy pieroxy merged commit d15462a into pieroxy:master Mar 23, 2015
@ChALkeR
Copy link
Contributor Author

ChALkeR commented Mar 23, 2015

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants