Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use += instead of [].join for string concatenation #1977

Closed
wants to merge 4 commits into from

Conversation

alubbe
Copy link
Member

@alubbe alubbe commented Jun 2, 2015

see #1975

@alubbe alubbe mentioned this pull request Jun 2, 2015
5 tasks
@jeromew
Copy link
Contributor

jeromew commented Jun 2, 2015

Hello

is this really needed ? it would make the release of then-jade a lot harder because we use the buf.push mechanism to hijack the buffer and push on the stream.

https://github.com/jadejs/then-jade

I would highly prefer if we could wait until 2.0 for this since I hope that 2.0 will provide new mechanism to maintain then-jade more easily

@alubbe
Copy link
Member Author

alubbe commented Jun 2, 2015

First of all - please ignore the failing tests, I just want to collect some feedback on pursuing this implementation
@jeromew Thanks for the feedback, I had no idea it was used for this.
Regarding whether this is needed: by itself, it speeds up jade only a little bit, but paired with my other improvements, it yields 50-80% performance gains. It is needed as a basis for v8's optimizations.
It also makes jade extremely fast whenever the buf array has less than 5 items.

How about we add a flag for jade-then to add additional calls to some function you can highjack? Couldn't you also monkeypatch into the jade_debug array?

@jeromew
Copy link
Contributor

jeromew commented Jun 2, 2015

I simply reacted to the PR with a fear that it would make things more complicated for then-jade. I don't exactly know what impact it would have except that buf.push is heavily used.

I know that @ForbesLindesay does not want then-jade to stop the evolution of jade so then-jade can't block performance optimization on this (I agree that nowadays += is faster than .join())

I haven't had time to dig into jade 2.0 yet so maybe there will be ways to override this kind of things with the new architecture.

My goal was to do a last 1.x upgrade of then-jade when jade 2.0 is published and then move on to try and use 2.0 for then-jade. If this PR is merged into 1.* I suspect I won't be able to bind then-jade on the last 1.x release.

I let you decide what is best. If you merge it, can you publish an interim release with the multi-line javascript block before ?

@alubbe
Copy link
Member Author

alubbe commented Jun 2, 2015

Seeing that you're generally in favour of this, I'll keep playing around with it until the tests are green. We'll figure out how to release this once we get there - 2.0.0 might be released by then ;)

@ForbesLindesay
Copy link
Member

This needs to wait for 2.0.0 as it's not that uncommon for people to manually inspect the buffer array as a way to do things like runtime filters via mixins.

It is my expectation that then-jade will maintain its own completely separate copy of the compiler in 2.0.0, so it can do its own thing.

@TimothyGu
Copy link
Member

One reason why .push is used is because it disregards undefined and null strings. A simple example:

Before:

var buf = [];
buf.push('a');
buf.push(undefined);
buf.push('b');
buf.join('');
//< 'ab'

After:

var buf = '';
buf += 'a';
buf += undefined;
buf += 'b';
buf
//< 'aundefinedb'

If you want to keep on using +=, you'll need to use something like this, which decreases performance dramatically.

@rlidwka
Copy link
Member

rlidwka commented Jun 2, 2015

What is the memory impact on this?

As far as I remember, one of the reasons [].join is widely used is because with += all partial strings are stored into memory. Make them large enough, and GC would slow it down a lot.

@alubbe
Copy link
Member Author

alubbe commented Jun 2, 2015

@rlidwka I will run some extra tests on memory.

@TimothyGu AFAIK, that is handled before, like this:

text += "<p>" + (null == (jade_interp = thing.a) ? "" : jade_interp) + "</p><p>" + (null == (jade_interp = thing.b) ? "" : jade_interp) + "</p>";

This 'escaping' of null and undefined was done even for the [].push. Does this take care of your issue?

@TimothyGu
Copy link
Member

@alubbe oh okay then.

@alubbe
Copy link
Member Author

alubbe commented Jun 5, 2015

Fixed the failing tests.

@rlidwka As far as I can tell, it uses less RAM than before. I re-ran my benchmark suite and saw less RAM and a 15-50% increase in output.
Also, this simple test shows less RAM and 100% more performance:

/usr/bin/time -v node -e "for(var i = 0; i < 29999; i++){var a = [];for(var j = 0; j < 200; j++) a.push(Math.random().toString());a.join('')}"
/usr/bin/time -v node -e "for(var i = 0; i < 29999; i++){var a = '';for(var j = 0; j < 200; j++) a += Math.random().toString();}"

@ForbesLindesay how do you want to proceed with this PR for now?

@rlidwka
Copy link
Member

rlidwka commented Jun 6, 2015

@alubbe , I just remember a discussion about join vs concat in io.js gitter (March 14 2015, search for "concat").

This bug was discussed:
pieroxy/lz-string#46

Quoting:

ChALkeR March 15 2015

https://paste.kde.org/p0zw84xk2 — one more.
When there is no leak of the resulting string variable, everything is freed instantly. That is dynamic.
When the resulting string leaks, all intermediate strings that were built during it construction are not freed, and are freed only on a gc run.
They only valid reason which I can think of is that concatenation does not realloc the string and does not let the os memory management take care of things, but stores the string as a pile of separate chunks (in this example — 1-symbol chunks) and that the chunk itself costs much more memory than it stores.
And that gc run optimizes strings, allocating them as a single memory buffer, destroying the old chunks list.
That could explain things.

Also, in your benchmarks resulting string is not used anywhere. It can be freed by GC right away, and I won't be surprised if V8 will just throw away all that for loop because it is doing nothing.

If we start to store output to an array, that's what happens:

$ /usr/bin/time -v node -e "var arr=[];for(var i = 0; i < 29999; i++){var a = [];for(var j = 0; j < 200; j++) a.push(Math.random().toString());arr.push(a.join(''))}"
  User time (seconds): 3.96
  Maximum resident set size (kbytes): 215664

$ /usr/bin/time -v node -e "var arr=[];for(var i = 0; i < 29999; i++){var a = '';for(var j = 0; j < 200; j++) a += Math.random().toString();arr.push(a)}"
  User time (seconds): 4.66
  Maximum resident set size (kbytes): 591128

I don't know about this matter any further, honestly. Just remembered that conversation and thought that it might be relevant here.

@alubbe
Copy link
Member Author

alubbe commented Jun 6, 2015

Yes you are right, but after some point in time (I think it was 90 seconds after process launch), v8 starts small, short incremental GC cycles that collect these dangling strings. Meaning that past this point memory usage should be the same.

@alubbe
Copy link
Member Author

alubbe commented Jun 8, 2015

Here is a new idea - if we know the length of the array, [].join can actually keep up, see

/usr/bin/time -v node -e "var arr=[];for(var i = 0; i < 29999; i++){var a = new Array(200);for(var j = 0; j < 200; j++) a[j]=Math.random().toString();arr.push(a.join(''))}"

or http://jsperf.com/optimized-array-join

And within jade templates, we have this information! This should gives us the best of both worlds - faster compilation at lower RAM.

@alubbe
Copy link
Member Author

alubbe commented Jun 8, 2015

Never mind, of course we don't know the length of the buf array, since if statements will heavily influence its length.

So we have to discuss whether the RAM impact have any implications on us.
In synchronous, GC-free for loops of tens of thousands of iterations where none of the results are ever released, the RAM impact does look pretty horrible.
On the client, I don't see this number of iterations being reached (and definitely not before the first GC run)
On the server, these high numbers of iterations can indeed be reached within the GC limit, but there is a crucial difference here - upon sending the response to the client, all strings are released. Therefore, we never actually see the RAM increasing for += beyond what Array.join uses.

I have hit my local laptop with 6000 requests per second for 2 minutes, first against an unclustered express server running jade 1.10.0 and then against the same app using my branch of all optimizations. Here are the results:

jade 1.10.0: Maximum resident set size (kbytes): 80968
optimized branch (using +=): Maximum resident set size (kbytes): 80660

The optimized branch also returned 10% more responses than jade 1.10.0, so we do see an improvement, even when we have all the networking and express overhead.

Now that I understand the impact += has on memory and that it doesn't seem to affect us, my vote would be to use +=.

@alubbe
Copy link
Member Author

alubbe commented Jun 8, 2015

I've also noticed that

a += "some string"

is as fast as

a = a + "some string"

but

a += "some string" + "another string"

is slower than

a = a + "some string" + "another string"

so I have now changed the implementation to actually use a = a + ..., yielding another couple of percentage points of performance.

@alubbe
Copy link
Member Author

alubbe commented Jun 16, 2015

Moved over to jade-code-gen, closing this PR

@alubbe alubbe closed this Jun 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants