Performance studies #1

Open
staltz opened this issue Dec 3, 2021 · 13 comments
staltz commented Dec 3, 2021

Hey @jerive, I was about to write some BIPF improvements to this module, but then I started thinking about the overhead of crossing from JS to C++, so I made some simple benchmarks.

I created a simple "increment by two" function in C++ like this:

// Using the napi-macros helpers
NAPI_METHOD(inc) {
  NAPI_ARGV(1)
  NAPI_ARGV_INT32(numb, 0)
  numb += 2;
  NAPI_RETURN_UINT32(numb)
}

Then ran these benchmarks:

let bipf = require('node-gyp-build')(__dirname);

const obj = {
  inc(x) {
    return x + 2;
  },
};

let res = 0;
console.time('increment JS');
for (let i = 0; i < 1000000; i++) {
  res = obj.inc(i);
}
console.timeEnd('increment JS');

console.time('increment CPP');
for (let i = 0; i < 1000000; i++) {
  res = bipf.inc(i);
}
console.timeEnd('increment CPP');

And the numbers came out as:

increment JS: 2.988ms
increment CPP: 38.592ms

Let's try to get those C++ numbers down to something competitive with JS. Most likely what's going on here is that N-API is doing some copying. I want to discover whether we can have zero-copy, somehow. The next thing I'll try is to use the V8 APIs directly.
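A caveat with microbenchmarks like this: V8 can inline obj.inc inside the loop, and if the result were unused it could even eliminate the work entirely. A sketch of a fairer harness (the bench helper and warm-up pass are my own additions, shown for the JS side only):

```javascript
// Microbenchmark sketch: warm up first so V8 collects type feedback
// and optimizes the hot function before timing starts, and accumulate
// the results so the loop cannot be removed as dead code.
function inc(x) {
  return x + 2;
}

function bench(label, fn, iterations) {
  let acc = 0;
  // Warm-up pass: lets the optimizing compiler kick in for `fn`.
  for (let i = 0; i < 10000; i++) acc += fn(i);
  console.time(label);
  for (let i = 0; i < iterations; i++) acc += fn(i);
  console.timeEnd(label);
  return acc; // Returning acc keeps the work observable.
}

const total = bench('increment JS', inc, 1000000);
```

The same harness can then be pointed at bipf.inc for the native side; timing only after warm-up avoids mixing interpreter and optimized-code time into the JS number.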


jerive commented Dec 3, 2021

I read Dominic's comment.
Is it conceivable to have jitdb in Rust within a timeframe that would make the maintenance cost of a V8 implementation too high, compared to something solid although maybe more future-proof? If I make myself clear.


staltz commented Dec 3, 2021

I think JITDB in Rust or C++ or Zig is going to be a huge project, and getting it right (fixing bugs and benchmarking it) is going to take a lot of work. I think it should eventually be built, but realistically we're talking about at least 3 months of full time work. That's what it took us to build JITDB (and it includes async-append-only-log).

So I recommend not trying that, unless or until we get budget/resources for many months of full time work, and I'm assuming this doesn't fit into anyone's hobby time.


staltz commented Dec 3, 2021

PS: I rewrote the above "inc" function in V8 C++, and it looks like this:

#include <node.h>

namespace demo {

using v8::Exception;
using v8::FunctionCallbackInfo;
using v8::Isolate;
using v8::Local;
using v8::NewStringType;
using v8::Number;
using v8::Object;
using v8::String;
using v8::Value;

void Inc(const FunctionCallbackInfo<Value>& args) {
  Isolate* isolate = args.GetIsolate();

  double value = args[0].As<Number>()->Value() + 2;
  Local<Number> num = Number::New(isolate, value);

  args.GetReturnValue().Set(num);
}

void Init(Local<Object> exports) {
  NODE_SET_METHOD(exports, "inc", Inc);
}

NODE_MODULE(NODE_GYP_MODULE_NAME, Init)

}  // namespace demo

Benchmark results are:

increment JS: 3.700ms
increment CPP: 28.621ms

A bit better, but still very bad.


staltz commented Dec 3, 2021

I did some profiling on the inc functions, and V8::Number::New takes a big chunk of the time budget. I don't know exactly what the JS version is doing, but it might be that V8's optimizing compiler is inlining the function and translating the number operations directly into machine code.

I think we are close to saying we can quit this experiment, and maybe we should try to benchmark/profile BIPF (JS) and see if we can do V8 tricks.


jerive commented Dec 3, 2021

So I recommend not trying that, unless or until we get budget/resources for many months of full time work, and I'm assuming this doesn't fit into anyone's hobby time.

🤣


jerive commented Dec 4, 2021

I think we are close to saying we can quit this experiment, and maybe we should try to benchmark/profile BIPF (JS) and see if we can do V8 tricks.

Were you thinking of any particular code paths?
It feels like, since we don't have any JSON schema to map to, the object/array optimizations described in the article you mentioned can't really be applied.

I tried the same switch/case optimization that is done in encodingLengthers for decode, but it makes strictly no difference in performance.
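Roughly the shape of the change, sketched with stand-in decoders (not bipf's real ones; only the low-3-bits type tag follows BIPF's encoding):

```javascript
// Sketch: replace a switch over the BIPF type tag (low 3 bits of the
// tag varint) with an array of decoder functions indexed by type,
// mirroring what bipf does for encodingLengthers. The decoders here
// are simplified stand-ins.
const TYPE_MASK = 7;

function decodeSwitch(type, payload) {
  switch (type & TYPE_MASK) {
    case 0: return payload.toString('utf8'); // string
    case 2: return payload.readInt32LE(0);   // int
    default: throw new Error('unknown type');
  }
}

// The same dispatch expressed as a lookup table.
const decoders = [];
decoders[0] = (payload) => payload.toString('utf8');
decoders[2] = (payload) => payload.readInt32LE(0);

function decodeTable(type, payload) {
  const decode = decoders[type & TYPE_MASK];
  if (!decode) throw new Error('unknown type');
  return decode(payload);
}
```

A plausible reason the table changes nothing: V8 already compiles a dense switch over small integers into a jump table, so both forms end up with equivalent dispatch cost.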


staltz commented Dec 4, 2021

Timely comment. I just sat down to try some V8 tricks. I think the first step is to get debug information on which functions are being inlined/optimized and which functions are not being inlined, and then step-by-step try to make them all optimizable. So I don't know yet which tricks to try but getting the information out is the 1st step.


staltz commented Dec 4, 2021

In bipf repo: node --trace-opt --trace-opt-stats --trace-deopt test/perf.js


jerive commented Dec 4, 2021

This is interesting:

[bailout (kind: deopt-eager, reason: out of bounds): begin. deoptimizing 0x0a5c6b2d7b61 <JSFunction decode (sfi = 0x2eadd490a709)>, opt id 15, bytecode offset 21, deopt exit 16, FP to SP delta 104, caller SP 0x7ffcc61e89d0, pc 0x7f3592872206]

No, it happens after the test.


staltz commented Dec 4, 2021

I came here to say the same thing. 😅

Mine was

[deoptimizing (DEOPT eager): begin 0x0b5b1b0a10d9 <JSFunction decode (sfi = 0x3f7f9f7a7f09)> (opt #64) @4, FP to SP delta: 96, caller sp: 0x7fff0e1c1068]
            ;;; deoptimize at </home/staltz/oss/bipf/node_modules/.pnpm/varint@5.0.2/node_modules/varint/decode.js:19:12> inlined at </home/staltz/oss/bipf/index.js:228:20>, out of bounds

What I think it means is that it inlined varint/decode inside BIPF's decode function and then for some reason it deoptimized varint/decode. Maybe there's a way to prevent that deopt from happening. Maybe if we tweak the varint code a bit.
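For context on why that deopt reason shows up: varint-style decoding reads bytes until it finds one without the continuation bit, so a malformed or truncated input makes it read past the end of the buffer, and buf[i] beyond buf.length yields undefined. A sketch of a bounds-checked variant (my own rewrite, not varint's actual code):

```javascript
// Varint (LEB128-style) decode with an explicit length guard, so
// every buffer read stays in bounds. Each byte contributes its low
// 7 bits; the high bit marks continuation.
function decodeVarint(buf, offset = 0) {
  let result = 0;
  let shift = 0;
  let i = offset;
  while (true) {
    if (i >= buf.length) throw new RangeError('truncated varint');
    const byte = buf[i++];
    // Past 28 bits, << would overflow 32-bit ints, so switch to
    // floating-point multiplication.
    result += shift < 28
      ? (byte & 0x7f) << shift
      : (byte & 0x7f) * Math.pow(2, shift);
    shift += 7;
    if (byte < 0x80) return { value: result, bytes: i - offset };
  }
}
```

Whether the guard alone is enough to keep V8 from deoptimizing the inlined caller would need to be confirmed with --trace-deopt.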


jerive commented Dec 4, 2021

https://github.com/P0lip/v8-deoptimize-reasons


staltz commented Dec 4, 2021

I keep seeing Smi and I have no idea what that means. 😅


jerive commented Dec 4, 2021

--print-opt-code says a lot of things, for example:

Inlined functions (count = 4)
 0x1ec3e348af79 <SharedFunctionInfo decode_string>
 0x1ec3e348ef71 <SharedFunctionInfo read>
 0x30907c666b39 <SharedFunctionInfo toString>
 0x30907c665359 <SharedFunctionInfo slice>

I keep seeing Smi and I have no idea what that means. 😅

Me neither but found:
https://stackoverflow.com/questions/57348783/how-does-v8-store-integers-like-5
