Garbage collection causes bad performance #8670

Open
dsehnal opened this Issue Sep 20, 2016 · 17 comments


@dsehnal
dsehnal commented Sep 20, 2016
  • Version: 6.6.0, x64
  • Platform: win10, x64
  • Subsystem: v8 (probably)

A little background: I need to parse a lot of text-based data. Everything was running smoothly and the code was very fast, until I ran it on a larger amount of data (about 15GB). About halfway through the computation, the code suddenly slowed down and sections that should have taken 30s took up to 10 minutes.

The issue seems to be that if garbage collection runs enough times, it causes some code to run very slowly. I have also tested Node 4.5 and 6.3 on different machines and the issue was the same.


I experimented with the problem and narrowed it down to the code available here: https://gist.github.com/dsehnal/004f7f46a238ab2e391a62a172b9b19c.

The code includes two slightly different implementations that are stripped-down versions of my parser's tokenizer (a simplified sketch of both follows the list):

  • In the original (slow) version of the code I use a "class" (prototype + access the state of the tokenizer via this). So moving to the next token is done through tokenizer.moveNext().
  • In the updated (fast) version of the code I use a literal object to represent the state and use "normal" functions that I pass the state to explicitly: moveNext(state).
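
Roughly, the two variants have this shape (a simplified sketch only; the real tokenizer in the gist keeps more state and tracks token boundaries, and the helper names here are made up):

// Slow variant: a "class" whose state is accessed via `this`
function Tokenizer(data) {
  this.data = data;
  this.position = 0;
}
Tokenizer.prototype.moveNext = function () {
  // skip spaces, then consume one whitespace-delimited token
  while (this.position < this.data.length && this.data.charCodeAt(this.position) === 32) this.position++;
  var start = this.position;
  while (this.position < this.data.length && this.data.charCodeAt(this.position) !== 32) this.position++;
  return this.position > start;
};

// Fast variant: a plain state object plus free functions that take the state explicitly
function createTokenizerState(data) {
  return { data: data, position: 0 };
}
function moveNext(state) {
  while (state.position < state.data.length && state.data.charCodeAt(state.position) === 32) state.position++;
  var start = state.position;
  while (state.position < state.data.length && state.data.charCodeAt(state.position) !== 32) state.position++;
  return state.position > start;
}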

I have also included code that triggers garbage collection manually, to reproduce the issue:

if (global.gc && t % 100 === 0) {
  global.gc();
}

When I run the code without the manual GC, both approaches perform the same:

> node test slow
7 runs: min 125ms, max 129ms, median 127ms

> node test fast
7 runs: min 129ms, max 133ms, median 132ms

However, when I enable manual garbage collection (via --expose_gc), the "slow" code is more than 2 times slower:

> node --expose_gc test slow
7 runs: min 574ms, max 629ms, median 611ms

> node --expose_gc test fast
7 runs: min 257ms, max 296ms, median 276ms
@targos targos added the V8 label Sep 20, 2016
@targos
Member
targos commented Sep 20, 2016

Running your code with the --trace-deopt flag shows that the slow case is deoptimized by V8 and the fast one is not.
The reason given by V8 is "weak-code". It's the first time I've seen it.
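
For reference, an invocation along these lines should show the deopts (assuming the gist is saved as test.js):

> node --expose_gc --trace-deopt test slow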

/cc @nodejs/v8

@dsehnal
dsehnal commented Sep 20, 2016

@targos So do you have any idea what the weak aspect of the code might be? And why it only triggers after GC is called multiple times?

@dsehnal
dsehnal commented Sep 20, 2016

So the description for 'weak code' is:

Garbage collector forced deoptimization of this code because it contained a weak reference to a map (hidden class) that died during this GC cycle.

If I create an instance of the "slow tokenizer" that persists through the entire runtime of the program, the deoptimization does not happen.

So what is happening is that during the GC, the var Tokenizer is not actively referenced anywhere in the code, so its "hidden class" gets garbage collected, which in turn causes the deoptimization.

The question remains to what degree this is the correct behaviour...

@bnoordhuis
Member

I had a look and the logic seems correct to me, even if its effects are unfortunate in this case. My understanding from reading the code is as follows:

  1. The code space is scanned less frequently than the data space(s).
  2. To prevent code->data references from keeping data objects alive too long, such references are weak references (i.e., they don't count towards liveness of the pointed-to object.)
  3. The time/space trade-off that V8 makes is that it sometimes has to recompile code containing stale weak references.
@dsehnal
dsehnal commented Sep 21, 2016

@bnoordhuis I agree that the base logic of what the GC does is sound. However, are you sure the implementation (especially the recompilation of the code, but there can be other details I am missing) is correct?

I have rerun my original computation using the procedural version of the tokenizer and the running time went down from 42 to 11 minutes. Don't tell me that recompiling ~300 lines of code (the CifTokenizer class from https://github.com/dsehnal/LiteMol/blob/master/LiteMol.Core/Formats/CIF/Text/Parser.ts) 15000 times (at most, because that's the number of times the tokenizer object was created in my computation) takes 31 minutes.

@targos
Member
targos commented Sep 21, 2016

@dsehnal It won't compile it 15000 times. After a few cycles of opt / deopt (I think it's 10), V8 will give up on compiling the function.

@targos
Member
targos commented Sep 21, 2016

Perhaps you can work around the issue by keeping a reference to a dummy CifTokenizer (could be the first one that is created) during the computation?
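
Something like this rough sketch (untested, and I'm guessing at the constructor signature and the surrounding wrapper):

// Keep one tokenizer instance reachable for the whole computation so that its
// hidden class stays alive across GC cycles and the optimized code is not thrown away.
var keepAlive = null;

function tokenizeFile(data) {
  var tokenizer = new CifTokenizer(data);
  if (!keepAlive) keepAlive = tokenizer; // pin the first instance
  // ... run the actual tokenization with `tokenizer` ...
}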

@dsehnal
dsehnal commented Sep 21, 2016

@targos Yes, I can work around this particular problem either by refactoring or by keeping the references. But who knows when a similar issue will pop up again, and it might not be so easy to spot.

Do you think it would be possible to modify the v8 garbage collection heuristics/rules to not collect hidden classes for function objects (or whatever it should be called) that have only 1 hidden class? Because I believe that is the case here.

@bnoordhuis
Member

You could try filing a V8 bug report but your suggestion would probably break down in some cases; e.g., when generating JS code on the fly with eval() or new Function().
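
A contrived illustration of that failure mode (hypothetical sketch; the exact V8 behaviour is more involved):

// Each dynamically generated constructor gets its own hidden class for its instances.
// If hidden classes belonging to "single hidden class" functions were never collected,
// a loop like this could keep growing the heap even though nothing stays reachable.
for (var i = 0; i < 1e6; i++) {
  var Ctor = new Function('this.x = ' + i + ';');
  var obj = new Ctor();
  // Ctor and obj become garbage here, but their hidden classes would stay pinned
}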

@addaleax
Member

maybe relevant (or at least a good explanation of where this kind of issue comes from): http://benediktmeurer.de/2016/10/11/the-case-of-temporary-objects-in-chrome/

@bmeurer
Contributor
bmeurer commented Oct 11, 2016

Yeah, that looks a lot like we hit the deoptimization limit here after a couple of GCs.

@dsehnal
dsehnal commented Oct 11, 2016

@bmeurer So do you think that the solution proposed at https://docs.google.com/document/d/1VEeXn6BfKmVkYpfMuxNs3otLLNPIxIVA1cLoSwGMIxI will fix this?

@bmeurer
Contributor
bmeurer commented Oct 11, 2016

@dsehnal This is only the first step. Removing the constant tracking on the map is a hard requirement to make transition tree links strong, otherwise you'll risk unbounded memory leaks. But there's more work, for example, we also need to deal with weak vs. strong links for prototype transitions and Object.create (which is a special case of the prototype transitions). The constant tracking is definitely the big bummer here; the rest should be comparably easy.

@mscdex
Contributor
mscdex commented Dec 25, 2016 edited

FWIW I just noticed that this same issue occurs in several places in node core, such as ondata() in Readable.prototype.pipe(). While tracing the net/net-pipe.js benchmark, I noticed that ondata() gets reoptimized several times because of the reference to dest (which is getting GC'ed) and is finally permanently deoptimized for being reoptimized too many times.

I tested this on master (currently V8 5.4).

@bmeurer
Contributor
bmeurer commented Dec 25, 2016

@mscdex Looking through _stream_readable.js, it looks like there are a lot of things in there where V8 doesn't shine at the moment. I guess we could do a bit better here.

Which benchmark did you run?

@mscdex
Contributor
mscdex commented Dec 25, 2016

@bmeurer benchmark/net/net-pipe.js

@bmeurer
Contributor
bmeurer commented Dec 26, 2016

@mscdex Awesome. I was observing the problem that, at some point, every Node server ends up running mostly in unoptimized code. This is a super-simple example, very nice.

~/Projects/node/benchmark/net (master)$ node --version
v8.0.0-pre
~/Projects/node/benchmark/net (master)$ node --trace-opt net-pipe.js | grep 'disabled optimization'
[disabled optimization for 0x1ef47b675121 <SharedFunctionInfo ondata>, reason: Optimized too many times]
[disabled optimization for 0xce0ea447fe9 <SharedFunctionInfo writeOrBuffer>, reason: Optimized too many times]
[disabled optimization for 0xce0ea443ae1 <SharedFunctionInfo readableAddChunk>, reason: Optimized too many times]
[disabled optimization for 0xce0ea48e931 <SharedFunctionInfo Writer.write>, reason: Optimized too many times]
[disabled optimization for 0x24ecb9c70041 <SharedFunctionInfo ondata>, reason: Optimized too many times]
[disabled optimization for 0x2faf60447fe9 <SharedFunctionInfo writeOrBuffer>, reason: Optimized too many times]
[disabled optimization for 0x2faf60443ae1 <SharedFunctionInfo readableAddChunk>, reason: Optimized too many times]
[disabled optimization for 0x2faf6048e931 <SharedFunctionInfo Writer.write>, reason: Optimized too many times]
[disabled optimization for 0x37e713db7011 <SharedFunctionInfo ondata>, reason: Optimized too many times]
[disabled optimization for 0x37e713d43ae1 <SharedFunctionInfo readableAddChunk>, reason: Optimized too many times]
[disabled optimization for 0x37e713d47fe9 <SharedFunctionInfo writeOrBuffer>, reason: Optimized too many times]
[disabled optimization for 0x37e713d8e931 <SharedFunctionInfo Writer.write>, reason: Optimized too many times]
[disabled optimization for 0x92b9d343ae1 <SharedFunctionInfo readableAddChunk>, reason: Optimized too many times]
[disabled optimization for 0x92b9d3b66f9 <SharedFunctionInfo ondata>, reason: Optimized too many times]
[disabled optimization for 0x92b9d347fe9 <SharedFunctionInfo writeOrBuffer>, reason: Optimized too many times]
~/Projects/node/benchmark/net (master)$ node --trace-opt --trace-deopt --trace-gc net-pipe.js | grep weak-code | wc -l
     189

Definitely something we need to investigate, but thanks for the example.
