
Measure perf for leveldown after error handling implementation. #55

Closed
aruneshchandra opened this issue Jan 20, 2017 · 24 comments
@aruneshchandra
Contributor

No description provided.

@aruneshchandra
Contributor Author

@boingoing - any update on this ?

@boingoing

boingoing commented Feb 8, 2017

| | Leveldown Nan on V8-Node 8.x | Leveldown NAPI on V8-Node-Napi 8.x |
| --- | --- | --- |
| Node | https://github.com/nodejs/abi-stable-node, branch api-prototype-8.x, commit 1b30df1 | https://github.com/nodejs/abi-stable-node, branch api-prototype-8.x, commit 6e70e77 |
| Leveldown | https://github.com/boingoing/leveldown, branch master, commit boingoing/leveldown@ef05005 | https://github.com/boingoing/leveldown, branch api-opaque-prototype, commit boingoing/leveldown@48c8b83 |
| Run 1 | Elapsed: 45.951, Kernel: 1.234, User: 2.359 | Elapsed: 45.169, Kernel: 1.234, User: 2.296 |
| Run 2 | Elapsed: 43.893, Kernel: 1.265, User: 2.109 | Elapsed: 45.044, Kernel: 0.968, User: 2.171 |
| Run 3 | Elapsed: 46.150, Kernel: 0.968, User: 2.328 | Elapsed: 44.609, Kernel: 1.109, User: 1.812 |
| Run 4 | Elapsed: 45.691, Kernel: 1.078, User: 2.500 | Elapsed: 45.070, Kernel: 1.156, User: 2.171 |
| Run 5 | Elapsed: 42.064, Kernel: 0.859, User: 2.031 | Elapsed: 44.671, Kernel: 1.062, User: 1.953 |
| avg(elapsed) | 44.7498 | 44.9126 (+0.3%) |

The above runs the leveldown unit test suite with plain (Nan) or instrumented (NAPI) Node/Leveldown binaries built from the respective repositories. Elapsed time for the suite is measured as the performance metric.

Tests were run on a Windows machine with this hardware:
- Intel Xeon E5-1620 v3 @ 3.50GHz
- 16GB DDR4 @ 2133MHz
- Samsung XP941 SSD

@mhdawson
Member

mhdawson commented Feb 9, 2017

Is there a perf test in leveldown, or is this just running the unit tests?

If there is no perf test, I think we might want to write at least a simple one that does something typical and measure that as well.

@aruneshchandra
Contributor Author

Next steps:
- Try it with a custom perf test, such as inserting a large data set into the database and reading it back.
- Run it on other platforms (macOS, Linux, etc.).

@ianwjhalliday

leveldown does have its own perf benchmark suite. @boingoing is that included in your data?

@boingoing

boingoing commented Feb 27, 2017

Thanks @ianwjhalliday for pointing to the leveldown benchmark. It looks like a fairly representative test of ordinary database operations: it writes about a million entries, producing a database of around 110MB. I ran the benchmark on the same machine and binaries listed in the comment above and found NAPI has about a 5% perf impact. Here are the results for a few runs:

| Leveldown Nan on V8-Node 8.x | Leveldown NAPI on V8-Node-Napi 8.x |
| --- | --- |
| Elapsed: 45.867s | Elapsed: 47.619s |
| Elapsed: 44.805s | Elapsed: 47.535s |
| Elapsed: 45.134s | Elapsed: 47.506s |
| Elapsed: 45.054s | Elapsed: 46.482s |
| Elapsed: 44.739s | Elapsed: 47.694s |
| avg(elapsed) 45.1198s | avg(elapsed) 47.3672s (+4.98%) |

This workload is not very chatty in terms of calls out to the N-API, so the measured perf impact may be broadly representative. Reducing the overhead of N-API callback functions, which need to fetch the argument count, argument values, this value, etc., by consolidating all of those into a single call may minimize the perf impact here as well as in other modules. This needs some more investigation.

@sampsongao
Collaborator

@boingoing what is the unit of elapsed time?

@boingoing

@sampsongao all units are in seconds.

@jorangreef

jorangreef commented Jul 6, 2018

> Reducing the overhead for napi callback functions which need to fetch the argument count, argument values, this value, etc by consolidating all those into a single call may prove to minimize the perf impact here, as well as in other modules.

I am working on a cuckoo hash table. Just calling into JS to fetch the various arguments is costing half the run time. Being able to fetch all arguments in a single call would be terrific.

@gabrielschulhof
Collaborator

@jorangreef I do not see a lot of opportunities to improve napi_get_cb_info().

We have already implemented the change you quoted (whereby you retrieve all function arguments in one call).

@kenny-y do you see a way we can further reduce the overhead of that API call?

@jorangreef

Thanks @gabrielschulhof

Sorry, I was not clear. What I meant was that I have a napi method which just does this:

```c
size_t argc = 5;
napi_value argv[5];
napi_get_cb_info(env, info, &argc, argv, NULL, NULL);
napi_get_value_external(env, argv[0], (void**) &context);
napi_get_buffer_info(env, argv[1], (void**) &key, &keyLength);
napi_get_value_int64(env, argv[2], &keyOffset);
napi_get_buffer_info(env, argv[3], (void**) &value, &valueLength);
napi_get_value_int64(env, argv[4], &valueOffset);
```

This alone takes a total of 200ms for 1,000,000 calls. Each napi call here adds overhead; about 100ms is in napi_get_cb_info and the rest is split across the others.

Is that right?

@gabrielschulhof
Collaborator

@jorangreef I experimentally added a benchmark to https://github.com/nodejs/node/tree/master/benchmark/napi/function_args with your combination of parameters in order to examine how much faster V8 is than N-API:

The V8 benchmark:

```cpp
void CallWithCombo1(const FunctionCallbackInfo<Value>& args) {
  v8::Local<v8::Context> context = args.GetIsolate()->GetCurrentContext();
  void* data = args[0].As<External>()->Value();
  (void) data;
  size_t buf1_length = node::Buffer::Length(args[1]);
  (void) buf1_length;
  char* buf1_data = node::Buffer::Data(args[1]);
  (void) buf1_data;
  int64_t int1 = args[2]->IntegerValue(context).FromJust();
  (void) int1;
  size_t buf2_length = node::Buffer::Length(args[3]);
  (void) buf2_length;
  char* buf2_data = node::Buffer::Data(args[3]);
  (void) buf2_data;
  int64_t int2 = args[4]->IntegerValue(context).FromJust();
  (void) int2;
}
```

The N-API benchmark:

```c
static napi_value CallWithCombo1(napi_env env, napi_callback_info info) {
  size_t argc = 5;
  napi_value argv[5];
  napi_get_cb_info(env, info, &argc, argv, NULL, NULL);

  void* data;
  napi_get_value_external(env, argv[0], &data);

  size_t buf1_length;
  void* buf1_data;
  napi_get_buffer_info(env, argv[1], &buf1_data, &buf1_length);

  int64_t int1;
  napi_get_value_int64(env, argv[2], &int1);

  size_t buf2_length;
  void* buf2_data;
  napi_get_buffer_info(env, argv[3], &buf2_data, &buf2_length);

  int64_t int2;
  napi_get_value_int64(env, argv[4], &int2);
  return NULL;
}
```

The result:

[chart: combo1-perf-diff]

There's definitely a performance degradation there, but, of course, that is to be expected. The question is: does the benefit of ABI stability outweigh the performance cost, and can we improve the performance?

@gabrielschulhof
Collaborator

One potential improvement I've found is a version of napi_get_value_int64() which is faster than what we currently have, at the expense of not dealing consistently with NaN and positive/negative infinity:

```cpp
napi_status napi_get_value_int64_fast(napi_env env,
                                      napi_value value,
                                      int64_t* result) {
  // Omit NAPI_PREAMBLE and GET_RETURN_STATUS because V8 calls here cannot throw
  // JS exceptions.
  CHECK_ENV(env);
  CHECK_ARG(env, value);
  CHECK_ARG(env, result);

  v8::Local<v8::Value> val = v8impl::V8LocalValueFromJsValue(value);

  // Empty context: https://github.com/nodejs/node/issues/14379
  v8::Local<v8::Context> context;
  *result = val->IntegerValue(context).FromJust();

  return napi_clear_last_error(env);
}
```

This doesn't seem to buy us that much though 😕:
[chart: combo1-perf-diff-fast]
... so I'm not sure if it's worth implementing napi_get_value_int64_fast().

@kenny-y

kenny-y commented Jul 9, 2018

@gabrielschulhof Not yet, I'm still trying to figure out whether there is anything that I can squeeze out...

@jorangreef

jorangreef commented Jul 9, 2018

Thanks for the benchmark @gabrielschulhof, that's awesome! 😃

> at the expense of not dealing consistently with NaN, and positive and negative infinity

Just speaking for myself, I would not mind if methods like napi_get_value_int64 had these optimizations by default, and we had slower _validated alternatives for those who really need the checks done on the C side. The _validated methods could then do more checks and afford to be more paranoid. For example, with something like napi_get_value_uint32 I think I have seen that overflow (without any exception) can occur if the user passes something greater than a uint32. If these checks are important to the module author, they can always do the validation on the JS side of their module, which should be faster; if they really need the validation done on the C side, they can fall back to the _validated alternatives.

Are there any optimizations that can be done for napi_get_buffer_info? For example, a version that only gets a pointer to the buffer, not the length? This might help in cases where the module author is already doing validation and throwing exceptions on the JS side.

Regarding napi_get_cb_info: is there a way to have JS call an N-API module method AND collect the arguments while still on the JS side, i.e. without the C method having to call back to JS?

If I understand things correctly, at present, it's like this:

JS calls C method, C method calls back to JS asking for arguments.

Is there theoretically any way to do this:

N-API JS collects all arguments and calls N-API C method with argv.

At present, the N-API version of my hash table is slower than the JS version: the JS version can set an element in the hash table faster than the N-API version can even fetch the arguments (let alone set an element). I understand a hash table is by nature chattier than typical N-API use cases, but it would be great if argument passing were not such a bottleneck.

@gabrielschulhof
Collaborator

We don't really call back into JS when retrieving arguments; no JS code gets executed as part of argument retrieval. We do have to copy the arguments out of C++, though, because, although they appear to be an array, with C++'s operator overloading we cannot assume that they map to a plain C array – in fact, they most likely don't.

@gabrielschulhof
Collaborator

@jorangreef one possible solution I can think of would be to use an ArrayBuffer instance and several TypedArray instances to construct a native structure and then pass only a single argument to the binding. The binding can then cast the ArrayBuffer data to a native structure.

@gabrielschulhof
Collaborator

@jorangreef I tested the uint32_t truncation, and AFAICT it's working as advertised. Please have a look at the PR and let me know if I've missed any scenarios!

@gabrielschulhof
Collaborator

@jorangreef OK, the external still has to go in a separate argument.

@gabrielschulhof
Collaborator

I just realized that using a large ArrayBuffer to hold the arguments implies that the memory for the two Buffers is contiguous, otherwise it requires copying. Is this the case for your addon and, if not, can you rearrange the memory to make this the case?

@gabrielschulhof
Collaborator

Argh ... no Int64 TypedArray!

@mhdawson
Member

mhdawson commented Jul 9, 2018

Would it make sense to have this in a new open issue?

@gabrielschulhof
Collaborator

@jorangreef I'm curious - are you using N-API directly, or via the node-addon-api package? If via the package, are you building on a version of Node.js that has an implementation of N-API? We've recently improved the performance of N-API function invocation.

This improvement has not yet landed in node-addon-api's version of N-API, but I have a PR in the pipe which should bring the changes to node-addon-api.

The performance gains are charted in the PR.

@gabrielschulhof
Collaborator

@jorangreef napi_get_buffer_info() accepts nullptr for the information you're not interested in. So, you could pass nullptr for the length to only receive the data.

gabrielschulhof pushed a commit to nodejs/node that referenced this issue Jul 12, 2018
Re: nodejs/abi-stable-node#55 (comment)
PR-URL: #21722
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
targos pushed a commit to nodejs/node that referenced this issue Jul 12, 2018
Re: nodejs/abi-stable-node#55 (comment)
PR-URL: #21722
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>

8 participants