
napi_get_cb_info has a lot of overhead #261

Closed
magiclen opened this issue Jun 25, 2017 · 10 comments

Comments

@magiclen

magiclen commented Jun 25, 2017

Currently, I implement each of my functions/methods by writing code that starts with a call to napi_get_cb_info. For example,

napi_value testNAPI(napi_env env, napi_callback_info info){
    napi_value me;
    size_t argsLength = 2; // predicted maximum number of arguments
    napi_value args[2];
    napi_get_cb_info(env, info, &argsLength, args, &me, 0);
    return me;
}

Then, let's do some tests.

Given this partial JS code,

this.testJS = function(a, b){
    return this;
}

Write the test code below,

var s, t1, t2;
s = Date.now();
for(let i = 0; i < 10000000; ++i){
  this.testJS();
}
t1 = Date.now() - s;
s = Date.now();
for(let i = 0; i < 10000000; ++i){
  this.testNAPI();
}
t2 = Date.now() - s;
console.log(t1, t2);

The result is,

26 428

You can see that the method implemented with N-API is much slower than the one implemented in JS.

Other napi_get_* APIs show the same behavior. Maybe there is something that needs to be optimized?

@jasongin
Member

jasongin commented Jun 26, 2017

There is some overhead from crossing the JS <-> native code boundary. Also, the JavaScript compiler/JIT might be inlining that simple function call when it is pure JS, while inlining is not possible when the function is implemented as a native call. I suspect either or both of those factors contribute most of the difference you observed in the experiment above.

A fairer test of N-API performance would be to compare it to equivalent code written using the v8/nan APIs. Then I expect you would find the results to be more similar, though for some scenarios (especially a tight loop that just stresses the JS <-> native boundary) you may still find some small amount of overhead, since N-API unavoidably adds a layer of abstraction over v8.

@digitalinfinity
Contributor

Adding on to what @jasongin said, even in your current benchmark it's unclear whether you're measuring the cost of napi_get_cb_info or simply the cost of calling into an N-API method.
In general, the optimal use case for N-API methods is when you have something non-trivial to do in the method itself: if your methods are fairly trivial, then the overhead of calling out of JS into native code will dominate the CPU time.
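The trade-off described above can be sketched with a toy cost model (the numbers here are assumptions for illustration, not measurements from this thread):

```javascript
// Toy model: a fixed JS -> native crossing cost is amortized by the
// real work done inside the native method. All numbers are assumed.
const crossingNs = 100;              // assumed fixed transition cost per call
const workNs = units => units * 10;  // assumed cost of `units` of real work
const overheadShare = units => crossingNs / (crossingNs + workNs(units));

console.log(overheadShare(1));    // trivial method: crossing cost dominates
console.log(overheadShare(1000)); // non-trivial method: crossing cost is negligible
```

With almost no work per call, the crossing cost is over 90% of the total; with substantial work, it drops below 1%.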

@magiclen
Author

@jasongin @digitalinfinity
My apologies, my original post above was unclear. I'll clarify in this comment.

To run the test, we need these files.

test.c

#include <node_api.h>

napi_value testNAPI2(napi_env env, napi_callback_info info){
    // should do something but here is nothing for the test
    // and finally create the result
    napi_value result;
    napi_create_number(env, 1, &result);
    return result;
}

napi_value testNAPI(napi_env env, napi_callback_info info){
    napi_value me;
    size_t argsLength = 2; // predicted maximum number of arguments
    napi_value args[2];
    napi_get_cb_info(env, info, &argsLength, args, &me, 0);
    uint8_t* buffer;
    size_t bufferLength;
    napi_get_buffer_info(env, args[0], (void**)&buffer, &bufferLength);
    size_t strLength;
    napi_get_value_string_utf8(env, args[1], NULL, 0, &strLength);
    char str[strLength + 1];
    napi_get_value_string_utf8(env, args[1], str, strLength + 1, 0);

    // should do something but here is nothing for the test
    return me;
}

napi_value constructor(napi_env env, napi_callback_info info){
    napi_value me;
    napi_get_cb_info(env, info, 0, 0, &me, 0);
    return me;
}

void Init(napi_env env, napi_value exports, napi_value module, void* priv) {
    napi_property_descriptor testAllDesc[] = {
        {"testNAPI", 0, testNAPI, 0, 0, 0, napi_default, 0},
        {"testNAPI2", 0, testNAPI2, 0, 0, 0, napi_default, 0}
    };
    napi_value cons;
    napi_define_class(env, "Test", constructor, 0, 2, testAllDesc, &cons);
    napi_set_named_property(env, exports, "Test", cons);
}

NAPI_MODULE(test, Init);

binding.gyp

{
  "targets": [
    {
      "target_name": "test",
      "sources": [ "./test.c" ]
    }
  ]
}

index.js

const test = require('./build/Release/test');
const Test = test.Test;

var t = new Test();

t.testJS = function(a, b) {
  var buffer = a;
  var bufferLength = a.length;
  var str = b;
  var strLength = str.length;
  // should do something but here is nothing for the test
  return this;
};

t.testJS2 = function(a, b) {
  // should do something but here is nothing for the test
  // and finally create the result
  return 1;
};

var a = Buffer.alloc(8192, 1); // some data
var b = 'This is a test.'; // some data

var s, t1, t2, t3, t4;

s = Date.now();
for (let i = 0; i < 10000000; ++i) {
  t.testJS(a, b);
}
t1 = Date.now() - s;

s = Date.now();
for (let i = 0; i < 10000000; ++i) {
  t.testJS2(a, b);
}
t2 = Date.now() - s;

s = Date.now();
for (let i = 0; i < 10000000; ++i) {
  t.testNAPI(a, b);
}
t3 = Date.now() - s;

s = Date.now();
for (let i = 0; i < 10000000; ++i) {
  t.testNAPI2(a, b);
}
t4 = Date.now() - s;

console.log('testJS:', t1, 'ms');
console.log('testJS2:', t2, 'ms');
console.log('testNAPI:', t3, 'ms');
console.log('testNAPI2:', t4, 'ms');

To compile test.c, execute this command,

node-gyp rebuild

Finally, to run the test, execute this command,

node --napi-modules index.js

And here is my result,

testJS: 25 ms
testJS2: 23 ms
testNAPI: 1289 ms
testNAPI2: 650 ms

We can see that there is real overhead in retrieving the arguments passed in from JS via napi_get_cb_info and the other napi_get_* functions. The overhead is not small, and it is separate from the cost of calling into an N-API method.

The more arguments I use, the more napi_get_* functions I need to call, and the more overhead there is.

@mhdawson
Member

It would be interesting to see a comparison with the case where the native functions are written using Nan. @magiclen is that something you can try?

@mhdawson
Member

The other thing to note is that it is accepted that for "chatty" applications, where there are lots of transitions back and forth with almost no work done in the native code, we do expect it to be slower. We believe that for most modules that won't be the case.

If I do the math, I think what you show is about 0.1289 µs per call (1289 ms / 10,000,000). This includes the transition from JS to C and then at least one call back from C to get the arguments.
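That arithmetic can be checked directly (the numbers are taken from the measurements earlier in this thread):

```javascript
// Per-call cost implied by the testNAPI measurement above.
const totalMs = 1289;    // measured time for the testNAPI loop
const iterations = 1e7;  // 10,000,000 calls
const perCallUs = (totalMs * 1000) / iterations; // microseconds per call
console.log(perCallUs); // ~0.1289 µs per call, including the JS -> C transition
```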

Are you working on a public module? If so, can you point us to the repo so we can take a look? That would help us understand the use case and see how the overhead of the call (which I think is what the examples above measure) compares with the time spent doing the module's intended work.

@jasongin
Member

If you are passionate about this issue, you are welcome to get involved and help make N-API faster!

In the experiment above, testNAPI does 10 million iterations (each with 4 napi_ calls) in 1289 ms, and testNAPI2 does 10 million iterations (each with 1 napi_ call) in 650 ms. The difference between them consists of 30 million napi_ calls and 1289 - 650 = 639 ms, or about 21 nanoseconds for each napi_ call. (This math assumes that each napi_ call takes the same amount of time, which is certainly not precisely correct, but probably a reasonable enough approximation.) That is very close to the 25 ns per-call cost that I had measured (on different hardware) several months ago; see the end of this comment. And we have done some optimization since that initial performance investigation discussed there.
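The per-call estimate above can be reproduced in a few lines (numbers from the measurements in this thread):

```javascript
// Difference between testNAPI (4 napi_ calls) and testNAPI2 (1 napi_ call).
const diffMs = 1289 - 650;   // extra time for the 3 additional napi_ calls
const extraCalls = 3 * 1e7;  // 3 extra napi_ calls x 10,000,000 iterations
const perCallNs = (diffMs * 1e6) / extraCalls; // nanoseconds per napi_ call
console.log(perCallNs); // ~21.3 ns per napi_ call
```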

So yes, each napi_ function call has a small cost (as any C function call does, really). Unfortunately, none of the napi_ functions can be inlined, because that could break ABI stability. Certainly, if you write a benchmark that specifically measures the cost of the N-API interface, you will be able to measure that cost. But for most real-world native add-ons that actually do some non-trivial work, the overhead of N-API should be small to negligible. Native add-ons that are highly performance-sensitive may still choose to use the v8 APIs directly, forgoing the ABI stability and JS-VM neutrality offered by N-API. (There is no plan to deprecate use of v8 APIs by native add-ons.) But in general, the Node CTC has agreed that a small performance degradation is an acceptable tradeoff for the benefits N-API provides.

Regarding the JS tests above, I think those are still not a very valid comparison, for the reasons mentioned in my previous comment, and also because when the results of some operations (with no side-effects) are discarded, a clever JS optimizer is free to not execute those operations at all. That might be why the two JS measurements were so similar, even though one of them had a lot more statements.
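One common way to guard a microbenchmark against this kind of dead-code elimination is to accumulate each result into a value that is used afterwards. A minimal sketch, with a hypothetical stand-in for testJS (the names and numbers are illustrative, not from the thread's benchmark):

```javascript
// Hypothetical stand-in for testJS; the actual work doesn't matter here.
function testJS(a, b) {
  return a.length + b.length;
}

const a = Buffer.alloc(8192, 1);
const b = 'This is a test.';

let sink = 0; // accumulating into `sink` keeps the loop body observable
for (let i = 0; i < 1e6; ++i) {
  sink += testJS(a, b);
}
console.log(sink); // using the result prevents the optimizer from dropping the work
```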

@magiclen
Author

magiclen commented Jun 27, 2017

Thank you both. I will try the v8/nan APIs.

And this is my public module written using N-API. It's a StringBuilder module, like the StringBuilder class in Java. The module was implemented in JS before, but I recently rewrote it using N-API. I expected the performance of the new version to be better than that of the old version. After all, it's a C program! And in fact, most of the methods are quicker now. However, I noticed the append (JS version) (N-API version) function is slower than before because of the cost of calling into an N-API method and the time consumed by calling napi_get_* functions.

By the way, both append functions are considerably slower than the JS + operator, which is expected.

There used to be several napi_get_cb_* functions; now they are merged into a single napi_get_cb_info. That's good, and it gives me an idea: is it possible to have a new API that gets all the argument pointers (instead of napi_value handles) in one call?

@jasongin
Member

See notes from a performance investigation I did here: nodejs/node#14379. I found that the 5th optimization suggested there does speed up all native callback invocations a little bit.

Is it possible to have a new API which can get all the pointers of arguments instead of napi_value at one time?

I don't think that would work very well. The challenge is that argument values can be any type: integers, strings, objects, arrays, functions, etc. So what kind of pointers would you return? Furthermore, strings need to be copied into another buffer. (There is no V8 API that returns a char* directly for a string value.) So that would be an extremely complicated API.

@digitalinfinity
Contributor

Hi @magiclen, did you get a chance to compare against nan and v8? Please let us know if you need anything else from us!

@mhdawson
Member

Closing since there has not been a response for quite some time. Please let us know if that was not the right thing to do.
