Measure nanomsg perf after error handling implementation #57

Closed · aruneshchandra opened this issue Jan 20, 2017 · 20 comments

@aruneshchandra (Contributor)

No description provided.

@sampsongao (Collaborator)

Raw Output

After Error Handling

nanomsg sampsongao/node-nanomsg@df36232
node-api ae24137

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 164.607 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 3841 [msg/s]
throughput: 0.031 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 171.08 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 3913 [msg/s]
throughput: 0.031 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 154.051 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4307 [msg/s]
throughput: 0.034 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 140.03 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4275 [msg/s]
throughput: 0.034 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 160.49 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4055 [msg/s]
throughput: 0.032 [Mb/s]

real 0m9.445s
user 0m3.991s
sys 0m0.798s

real 0m9.632s
user 0m4.179s
sys 0m0.828s

real 0m9.159s
user 0m3.817s
sys 0m0.763s

real 0m9.645s
user 0m3.892s
sys 0m0.745s

real 0m9.607s
user 0m4.191s
sys 0m0.732s

Before Error Handling

nanomsg sampsongao/node-nanomsg@c70c99c
node-api 67f4581

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 146.274 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4077 [msg/s]
throughput: 0.033 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 162.018 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4037 [msg/s]
throughput: 0.032 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 147.787 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4105 [msg/s]
throughput: 0.033 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 140.077 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4592 [msg/s]
throughput: 0.037 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 137.002 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4532 [msg/s]
throughput: 0.036 [Mb/s]

real 0m9.819s
user 0m3.851s
sys 0m0.724s

real 0m9.638s
user 0m4.009s
sys 0m0.829s

real 0m9.554s
user 0m4.452s
sys 0m0.730s

real 0m9.789s
user 0m4.198s
sys 0m0.831s

real 0m9.465s
user 0m4.173s
sys 0m0.701s

Before napi

nanomsg: sampsongao/node-nanomsg@4e2bccf
node-api: a4705f6

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 132.443 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4081 [msg/s]
throughput: 0.033 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 125.822 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4415 [msg/s]
throughput: 0.035 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 119.545 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4328 [msg/s]
throughput: 0.035 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 136.911 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4250 [msg/s]
throughput: 0.034 [Mb/s]

node-6.2 perf/local_lat.js tcp://127.0.0.1:5555 1 100000& node-6.2 perf/remote_lat.js tcp://127.0.0.1:5555 1 100000 && wait
message size: 1 [B]
roundtrip count: 100000
average latency: 122.18 [us]
node-6.2 perf/local_thr.js tcp://127.0.0.1:5556 1 100000& node-6.2 perf/remote_thr.js tcp://127.0.0.1:5556 1 100000 && wait
message size: 1 [B]
message count: 100000
throughput: 4392 [msg/s]
throughput: 0.035 [Mb/s]

real 0m9.668s
user 0m4.390s
sys 0m0.829s

real 0m9.390s
user 0m3.865s
sys 0m0.773s

real 0m9.513s
user 0m4.075s
sys 0m0.846s

real 0m9.577s
user 0m4.176s
sys 0m0.812s

real 0m9.638s
user 0m4.108s
sys 0m0.735s

@sampsongao (Collaborator) commented Feb 6, 2017

After introduction of error handling

nanomsg sampsongao/node-nanomsg@df36232
node-api ae24137

| Items | 1 | 2 | 3 | 4 | 5 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| average latency [us] | 164.607 | 171.080 | 154.051 | 140.030 | 160.490 | 158.0516 |
| throughput [msg/s] | 3841 | 3913 | 4307 | 4275 | 4055 | 4078.2 |
| throughput [Mb/s] | 0.031 | 0.031 | 0.034 | 0.034 | 0.032 | 0.0324 |
| real [s] | 9.445 | 9.632 | 9.159 | 9.645 | 9.607 | 9.4976 |
| user [s] | 3.991 | 4.179 | 3.817 | 3.892 | 4.191 | 4.014 |
| sys [s] | 0.798 | 0.828 | 0.763 | 0.745 | 0.732 | 0.773 |

After introduction of napi

nanomsg sampsongao/node-nanomsg@c70c99c
node-api 67f4581

| Items | 1 | 2 | 3 | 4 | 5 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| average latency [us] | 146.274 | 162.018 | 147.787 | 140.077 | 137.002 | 146.6316 |
| throughput [msg/s] | 4077 | 4037 | 4105 | 4592 | 4532 | 4268.6 |
| throughput [Mb/s] | 0.033 | 0.032 | 0.033 | 0.037 | 0.036 | 0.0342 |
| real [s] | 9.819 | 9.638 | 9.554 | 9.789 | 9.465 | 9.653 |
| user [s] | 3.851 | 4.009 | 4.452 | 4.198 | 4.173 | 4.1366 |
| sys [s] | 0.724 | 0.829 | 0.73 | 0.831 | 0.701 | 0.763 |

Original

nanomsg: sampsongao/node-nanomsg@4e2bccf
node-api: a4705f6

| Items | 1 | 2 | 3 | 4 | 5 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| average latency [us] | 132.443 | 125.822 | 119.545 | 136.911 | 122.180 | 127.3802 |
| throughput [msg/s] | 4081 | 4415 | 4328 | 4250 | 4392 | 4293.2 |
| throughput [Mb/s] | 0.033 | 0.035 | 0.035 | 0.034 | 0.035 | 0.0344 |
| real [s] | 9.668 | 9.39 | 9.513 | 9.577 | 9.638 | 9.5572 |
| user [s] | 4.39 | 3.865 | 4.075 | 4.176 | 4.108 | 4.1228 |
| sys [s] | 0.829 | 0.773 | 0.846 | 0.812 | 0.735 | 0.799 |
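The Average column in these tables is the arithmetic mean of the five runs, and the [Mb/s] figure follows from the message rate and message size. A small sanity-check script (run values copied from the raw output above, "After Error Handling" dataset):

```python
# Check the averaged summary values against the five raw runs
# (After Error Handling dataset; 1-byte messages).
latencies = [164.607, 171.080, 154.051, 140.030, 160.490]  # [us]
throughputs = [3841, 3913, 4307, 4275, 4055]               # [msg/s]

def mean(xs):
    return sum(xs) / len(xs)

print(mean(latencies))    # ~158.0516, as in the summary table
print(mean(throughputs))  # 4078.2

# throughput in Mb/s = msg/s * message_size_bytes * 8 bits / 1e6,
# e.g. the first run: 3841 msg/s * 1 B -> ~0.031 Mb/s
print(round(3841 * 1 * 8 / 1e6, 3))  # 0.031
```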

@sampsongao (Collaborator)

| Items | delta before/after error handling [%] | delta before/after napi [%] | combined [%] |
| --- | --- | --- | --- |
| average latency [us] | 7.225488385 | 13.12909359 | 24.07862446 |
| throughput [msg/s] | -4.668726399 | -0.576301363 | -5.007919501 |
| throughput [Mb/s] | -5.555555556 | -0.584795322 | -5.813953488 |
| real [s] | -1.63620283 | 0.992437584 | -0.623613611 |
| user [s] | -3.054309915 | 0.33360731 | -2.638983215 |
| sys [s] | 1.319192964 | -4.718217562 | -3.229036295 |

So we introduced about 24% more latency and about a 5% regression in throughput, relative to the original (pre-N-API) implementation.
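The combined column matches a percentage change relative to the original (pre-N-API) averages; a quick check against the summary tables above:

```python
# Reproduce the "combined" deltas from the summary-table averages.
# Baselines: original (pre-N-API) vs. after error handling.
orig_latency, err_latency = 127.3802, 158.0516  # average latency [us]
orig_thr, err_thr = 4293.2, 4078.2              # throughput [msg/s]

def pct_delta(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100

combined_latency = pct_delta(err_latency, orig_latency)
combined_thr = pct_delta(err_thr, orig_thr)

print(f"combined latency delta:    {combined_latency:+.2f}%")   # ≈ +24.08%
print(f"combined throughput delta: {combined_thr:+.2f}%")       # ≈ -5.01%
```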

@mhdawson (Member) commented Feb 7, 2017

I'm not sure, but does this mean it is for 1-byte messages ("message size: 1 [B]")? If so, we probably want data with more realistic message sizes as well (hoping it's configurable), as I'd expect that in real life 1-byte messages won't necessarily be the norm.

@mhdawson (Member) commented Feb 7, 2017

It might also be useful to find the section of code that most affects the latency, so we can understand where the additional time is being added.

@sampsongao (Collaborator) commented Feb 8, 2017

For a message size of 10 bytes:

| Items | delta before/after error handling [%] | delta before/after napi [%] | combined [%] |
| --- | --- | --- | --- |
| average latency [us] | 0.189636483 | 22.50655384 | 29.28834857 |
| throughput [msg/s] | -0.537879479 | 5.889553863 | 5.689646866 |
| throughput [Mb/s] | -0.626780627 | 6.002265006 | 5.722891566 |

@mhdawson (Member) commented Feb 8, 2017

We should be able to use standard profiling tools; perf even has Node.js support, so I'd start with that on Linux.

@aruneshchandra (Contributor, Author)

To understand why latency is ~30% slower with N-API, the next steps are: stub out N-API calls on the hot path with static data or direct V8 calls to find out which N-API calls are causing the overhead, and also try a larger message size, such as 1000 bytes.

@sampsongao (Collaborator) commented Feb 10, 2017

Weight within napi

```
9.55% node node              [.] napi_get_value_int32
9.38% node node              [.] napi_get_cb_args
7.09% node node              [.] napi_set_return_value
5.53% node node              [.] napi_get_current_env
5.30% node node              [.] napi_create_number
5.19% node node_nanomsg.node [.] Send(napi_env__, napi_callback_info__)
4.97% node node              [.] napi_escape_handle
4.47% node node              [.] napi_make_callback
4.36% node node              [.] napi_get_element
4.24% node node              [.] v8impl::FunctionCallbackWrapper::Args(napi_value__**
4.13% node node              [.] napi_open_handle_scope
3.85% node node              [.] napi_get_persistent_value
3.29% node node              [.] napi_get_global
3.13% node node              [.] napi_close_handle_scope
3.02% node node_nanomsg.node [.] Recv(napi_env__, napi_callback_info__)
2.90% node node              [.] napi_buffer_copy
2.62% node node              [.] napi_open_escapable_handle_scope
2.62% node node              [.] napi_buffer_has_instance
2.29% node node              [.] v8impl::FunctionCallbackWrapper::SetReturnValue(napi
2.01% node node              [.] napi_close_escapable_handle_scope
1.12% node node              [.] napi_buffer_data
0.89% node node_nanomsg.node [.] PollSendSocket(napi_env__, napi_callback_info__)
0.89% node node              [.] napi_create_persistent
0.67% node node              [.] napi_buffer_new
0.61% node node_nanomsg.node [.] napi_get_current_env@plt
0.61% node node_nanomsg.node [.] napi_set_return_value@plt
0.61% node node              [.] napi_buffer_length
0.56% node node_nanomsg.node [.] napi_create_number@plt
0.56% node node              [.] napi_create_object
0.45% node node_nanomsg.node [.] napi_get_cb_args@plt
0.34% node node_nanomsg.node [.] napi_close_handle_scope@plt
0.34% node node_nanomsg.node [.] napi_get_value_int32@plt
0.34% node node_nanomsg.node [.] PollStop(napi_env__, napi_callback_info__)
0.28% node node_nanomsg.node [.] napi_make_callback@plt
0.28% node node_nanomsg.node [.] napi_buffer_has_instance@plt
0.28% node node_nanomsg.node [.] napi_open_escapable_handle_scope@plt
0.28% node node_nanomsg.node [.] napi_close_escapable_handle_scope@plt
0.28% node node              [.] napi_set_element
0.11% node node_nanomsg.node [.] napi_buffer_copy@plt
0.11% node node_nanomsg.node [.] napi_buffer_data@plt
0.11% node node_nanomsg.node [.] napi_get_global@plt
0.11% node node_nanomsg.node [.] napi_open_handle_scope@plt
0.11% node node_nanomsg.node [.] napi_create_object@plt
0.06% node node_nanomsg.node [.] napi_get_element@plt
0.06% node node_nanomsg.node [.] napi_create_persistent@plt
```

@sampsongao (Collaborator) commented Feb 10, 2017

System Info

Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-55-generic x86_64)

CPU Info

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2400.000
BogoMIPS: 4800.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0,1

@sampsongao (Collaborator) commented Feb 28, 2017

Performance update after fixing the GC issue from Napi::Callback (#120):

| Items | delta before/after error handling [%] | delta before/after napi [%] | combined [%] |
| --- | --- | --- | --- |
| latency [us] | 0.95 | 6.82 | 7.83 |
| throughput [msg/s] | 0.21 | -0.12 | 0.09 |
| throughput [Mb/s] | 0.53 | 0.00 | 0.53 |

@jasongin (Member)

@sampsongao What was the fix?

@digitalinfinity (Contributor)

Also, what was the workload here? Same as before, 10 byte messages?

@sampsongao (Collaborator)

This is using a 1-byte message.

@aruneshchandra (Contributor, Author)

Data from @sampsongao using a 1-byte message size:

| Items | Non N-API | N-API | Delta |
| --- | --- | --- | --- |
| Latency [us] | 107.1128 | 115.5018 | 7.83% |
| Throughput [msg/s] | 4679.6 | 4683.6 | 0.09% |
| Throughput [Mb/s] | 0.0374 | 0.0376 | 0.53% |

@sampsongao (Collaborator) commented Mar 6, 2017

Update: I investigated further into the reason for this GC difference. The conclusion is that the nanomsg code itself introduces a leak by never freeing the Napi::Callback it allocates, so the extra object we allocate inside Napi::Callback takes up extra space. Once Napi::Callback is freed, the Nan and Napi implementations of Callback are both OK. But I'm not sure whether we want to be different from Nan in this case?

So the fix is as follows:

```diff
diff --git a/src/node_nanomsg.cc b/src/node_nanomsg.cc
index a8f7c58..7588826 100644
--- a/src/node_nanomsg.cc
+++ b/src/node_nanomsg.cc
@@ -499,6 +499,12 @@ NAPI_METHOD(PollReceiveSocket) {
   }
 }
 
+static void close_cb(uv_handle_t *handle) {
+  const nanomsg_socket_t* const context = static_cast<nanomsg_socket_t*>(handle->data);
+  delete context->callback;
+  delete context;
+}
+
 NAPI_METHOD(PollStop) {
   napi_status status;
   napi_value args[1];
@@ -507,6 +513,7 @@ NAPI_METHOD(PollStop) {
 
   nanomsg_socket_t *context = UnwrapPointer<nanomsg_socket_t *>(args[0]);
   int r = uv_poll_stop(&context->poll_handle);
+  uv_close(reinterpret_cast<uv_handle_t*>(&context->poll_handle), close_cb);
   napi_value ret;
   status = napi_create_number(env, r, &ret);
   CHECK_STATUS;
```

This is already resolved in the original node-nanomsg repo, so my next step is to pull in the update from the original repo and remeasure performance against the most up-to-date nanomsg code.

@jasongin (Member) commented Mar 6, 2017

> not sure if we want to be different from nan in this case?

Different in what way?

In general, we should not be too concerned with doing things exactly like Nan. First, Nan is very V8-specific. Second, Nan is basically unable to make breaking changes to fix any issues in its API, because its entire purpose is to provide a more stable API. Since we are introducing an entirely new API, now is our chance to make improvements. See this comment from the Nan maintainer.

@sampsongao (Collaborator)

In Nan::Callback, the callback is stored directly in the persistent value; in Napi::Callback, we first store the callback in an object and then put that object in the persistent value. I'm not sure that indirection is necessary or that it brings us any improvement.

@jasongin (Member) commented Mar 6, 2017

That indirection serves no purpose as far as I can see.

Note that the Callback class, along with everything else in node_api_helpers.h, is obsolete and will not be included in the PR to node that we make soon. We need to update all our ported modules to use the new C++ wrappers instead. There is no Callback class there; you should use Napi::Reference<Napi::Function> instead. (Maybe there should be a typedef?) What you then get is a persistent reference to a Function, without the extra Object indirection.

@sampsongao (Collaborator) commented Mar 7, 2017

Ported nanomsg onto their current master at napi-port-3.2.4.
New measurements with a message size of 1 byte:

| Items | Non N-API | N-API | Delta |
| --- | --- | --- | --- |
| Latency [us] | 102.6978 | 112.7798 | 9.82% |
| Throughput [msg/s] | 4730.4 | 4697.6 | -0.69% |
| Throughput [Mb/s] | 0.038 | 0.0376 | -1.05% |
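Assuming the Delta column is computed relative to the non-N-API baseline (consistent with the earlier tables in this thread), these figures check out:

```python
# Verify the deltas in the final 1-byte measurement table
# (after the napi-port-3.2.4 rebase).
non_napi = {"latency": 102.6978, "msg_s": 4730.4, "mb_s": 0.038}
napi     = {"latency": 112.7798, "msg_s": 4697.6, "mb_s": 0.0376}

def pct_delta(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100

for key in non_napi:
    print(key, round(pct_delta(napi[key], non_napi[key]), 2))
# latency  9.82  (N-API ~9.8% slower)
# msg_s   -0.69
# mb_s    -1.05
```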
