Performance on AM3358 / BBB / Raspberry Pi #14

Closed
WerVbn opened this issue Jul 30, 2014 · 6 comments

Comments

@WerVbn

WerVbn commented Jul 30, 2014

I'm running a simple OPC UA server on my BeagleBone Black and reading its data with an OPC UA client on another computer. Both client and server are written with node-opcua. The client reads a few variables every second, and the CPU load on the BBB (the server) is about 100%!
Apparently node-opcua needs much more hardware performance than a Raspberry Pi or BBB offers.
Does anyone have experience with the BBB and this module, or any idea how to reduce the CPU load?

@erossignon
Member

Hi tesfel,
Would you be able to share a gist of your server code?
Which version of nodejs are you using?
Which version of node-opcua are you using?
Have you tried to validate node-opcua by running the unit tests? What is the outcome on a BBB?

@WerVbn
Author

WerVbn commented Aug 4, 2014

Hello,
I'm using the current versions of nodejs and node-opcua. The outcome seems to be pretty bad... I ran the tests; here are the results:

node-opcua@0.0.26 test /home/debian/node-opcua
mocha test -R spec --recursive --timeout 8000

testing dump browseDescriptions
✓ should provide a service to build NodeClassMask easily
✓ should dump references
✓ should dump a browseDescription
✓ should provide a convenient a way to construct the node full name

testing NodeSet XML file loading
✓ should load a nodeset xml file (1894ms)
1) should load a large nodeset xml file
[....]
XMLToJSON
✓ should parse a simple xml data string
✓ should parse a UTF8 encoded xml file with a BOM (5334ms)

565 passing (3m)
5 pending
4 failing

  1. testing NodeSet XML file loading should load a large nodeset xml file:
    Error: timeout of 10000ms exceeded
    at done (/home/debian/node-opcua/node_modules/mocha/lib/runnable.js:202:67)
    at /home/debian/node-opcua/node_modules/mocha/lib/runnable.js:223:9
    at /home/debian/node-opcua/test/address_space/test_load_nodeset2.js:56:13
    at /home/debian/node-opcua/lib/address_space/load_nodeset2.js:321:18
    at null.<anonymous> (/home/debian/node-opcua/lib/xml2json/lib.js:197:25)
    at EventEmitter.emit (events.js:117:20)
    at null.onEndDocument (/home/debian/node-opcua/node_modules/ersatz-node-expat/lib.js:32:18)
    at SaxParser._fireEvent (/home/debian/node-opcua/node_modules/node-xml/lib/node-xml.js:953:22)
    at SaxParser.parseString (/home/debian/node-opcua/node_modules/node-xml/lib/node-xml.js:831:14)
    at Parser.end (/home/debian/node-opcua/node_modules/ersatz-node-expat/lib.js:59:15)
    at BomStrippingStream.onend (_stream_readable.js:483:10)
    at BomStrippingStream.g (events.js:180:16)
    at BomStrippingStream.EventEmitter.emit (events.js:117:20)
    at _stream_readable.js:920:16
    at process._tickCallback (node.js:415:13)

  2. NodeCrawler should crawl for a complete tree:
    Error: timeout of 10000ms exceeded
    at null.<anonymous> (/home/debian/node-opcua/node_modules/mocha/lib/runnable.js:156:19)
    at Timer.listOnTimeout as ontimeout

  3. NodeCrawler should crawl one at a time:
    Error: timeout of 10000ms exceeded
    at null.<anonymous> (/home/debian/node-opcua/node_modules/mocha/lib/runnable.js:156:19)
    at Timer.listOnTimeout as ontimeout

  4. NodeCrawler should crawl faster the second time:
    Error: timeout of 10000ms exceeded
    at null.<anonymous> (/home/debian/node-opcua/node_modules/mocha/lib/runnable.js:156:19)
    at Timer.listOnTimeout as ontimeout

@erossignon
Member

Hi tesfel,
Some of the intensive tests definitely need a timeout greater than 10 s on an RPi or BBB.

I am currently looking for performance gains by profiling the code.

Here are the improvements that have already been made:

  • a9a7441 avoid using if(a in b) move require(...) outside function scope
  • d5f3394 speeding up isSubtypeOf (by memoization), ensuring a 10-fold performance improvement
  • 8dd7ed9 further optimisation (avoid hasOwnProperty, cached nodeID)
  • d3a7ff7 speed-up : avoid using forEach and hasOwnProperty in critical code
  • 4c6e94e improve resolveNodeId perf
  • 9cd1161 improve performance of date <-> int64 conversion
  • 5db0902 avoid using Buffer.concat when unnecessary
  • 47c1b87 avoid embedded function to improve V8 code optimization & speed
  • 5b8df9a increase test timeout to cope with RaspberryPI & BeagleBoneBlack performance
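To illustrate the kind of memoization mentioned for isSubtypeOf in the list above, here is a minimal sketch. It is not the node-opcua implementation: the type hierarchy, `isSubtypeOfSlow`, and `memoize2` are hypothetical names made up for this example.

```javascript
// Hypothetical type hierarchy: type name -> parent type name (null at the root).
const parentOf = new Map([
  ['BaseDataType', null],
  ['Number', 'BaseDataType'],
  ['Integer', 'Number'],
  ['Int32', 'Integer'],
  ['String', 'BaseDataType'],
]);

// Uncached check: walk up the parent chain until `base` is found or the chain ends.
function isSubtypeOfSlow(type, base) {
  for (let t = type; t != null; t = parentOf.get(t)) {
    if (t === base) return true;
  }
  return false;
}

// Wrap a two-argument function so each (a, b) pair is computed only once.
function memoize2(fn) {
  const cache = new Map();
  return function (a, b) {
    const key = a + '|' + b;
    if (cache.has(key)) return cache.get(key);
    const result = fn(a, b);
    cache.set(key, result);
    return result;
  };
}

const isSubtypeOf = memoize2(isSubtypeOfSlow);

console.log(isSubtypeOf('Int32', 'Number'));  // true (computed by walking the chain)
console.log(isSubtypeOf('Int32', 'Number'));  // true (served from the cache)
console.log(isSubtypeOf('String', 'Number')); // false
```

A cache like this is only valid while the hierarchy is unchanged, so it would need to be cleared whenever types are added or removed.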

There is a nice tool you can use to compare the performance of the V8 engine under nodejs and inside a browser:

$ npm install benchmark-octane -g
$ benchmark-octane

You can also run Octane inside a Chrome/Firefox browser:
http://octane-benchmark.googlecode.com/svn/latest/index.html

I have also noticed that a node-opcua server uses quite a lot of CPU at the beginning, and that after some hours of work the CPU% settles down. I guess this is the V8 optimizer working in the background, trying to optimize the JavaScript code behind the scenes.
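One way to check whether the load really settles over time is to log the process's own CPU usage at regular intervals. The sketch below is an assumption-laden illustration: it relies on `process.cpuUsage()`, which only exists in newer Node.js releases (not in the v0.10 series used in this thread), and `cpuPercentSince` and `startCpuLogger` are hypothetical helper names.

```javascript
// Percentage of one CPU core used by this process since the given baseline.
function cpuPercentSince(startUsage, startTime) {
  const used = process.cpuUsage(startUsage);     // { user, system } deltas in microseconds
  const [sec, nsec] = process.hrtime(startTime); // elapsed wall-clock time since baseline
  const elapsedMicros = sec * 1e6 + nsec / 1e3;
  return (100 * (used.user + used.system)) / elapsedMicros;
}

// Hypothetical helper: log the load once per interval until the timer is cleared.
function startCpuLogger(intervalMs) {
  let usage = process.cpuUsage();
  let time = process.hrtime();
  return setInterval(() => {
    console.log('cpu: ' + cpuPercentSince(usage, time).toFixed(1) + '%');
    // Reset the baseline so each line covers only the last interval.
    usage = process.cpuUsage();
    time = process.hrtime();
  }, intervalMs);
}
```

Calling `startCpuLogger(60000)` next to the server start-up code would print one figure per minute; `clearInterval` on the returned timer stops the logging.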

Can you tell me the Octane score you get on a BBB?

@brycheiniog

I am interested in this as well. I have quite a simple OPC-UA server running on the BBB and it frequently hits >30% CPU. We have not investigated yet, but I have run the benchmarks on both a BBB and an Intel Edison. Which of the benchmarks are most critical to the performance of the OPC-UA stack?

BBB Results:

   hostname     : beaglebone
    node version : v0.10.25
      V8 version : 3.14.5.8
 platform & arch : linux arm

 config : { target_defaults:
   { cflags: [],
     default_configuration: 'Release',
     defines: [],
     include_dirs: [],
     libraries:
      [ '-lz',
        '-lcares',
        '-lv8',
        '-lssl',
        '-lcrypto' ] },
  variables:
   { arm_fpu: 'vfpv3',
     arm_neon: 0,
     armv7: 1,
     clang: 0,
     gcc_version: 46,
     host_arch: 'arm',
     node_install_npm: false,
     node_prefix: '/usr',
     node_shared_cares: true,
     node_shared_http_parser: false,
     node_shared_libuv: false,
     node_shared_openssl: true,
     node_shared_v8: true,
     node_shared_zlib: true,
     node_tag: '',
     node_unsafe_optimizations: 0,
     node_use_dtrace: false,
     node_use_etw: false,
     node_use_openssl: true,
     node_use_perfctr: false,
     node_use_systemtap: false,
     python: '/usr/bin/python',
     target_arch: 'arm',
     v8_enable_gdbjit: 0,
     v8_no_strict_aliasing: 1,
     v8_use_arm_eabi_hardfloat: true,
     v8_use_snapshot: false } }

Richards            : 1310
DeltaBlue           : 1044
Crypto              : 1551
RayTrace            : 741
EarleyBoyer         : 1697
RegExp              : 200
Splay               : 459
SplayLatency        : 1254
NavierStokes        : 767
PdfJS               : 737
Mandreel            : 546
MandreelLatency     : 730
Gameboy             : 1016
CodeLoad            : 1204
Box2D               : 331
zlib                : 1418
Typescript          : 1053
Score (version 9): 830
 duration  325.90286258095875  seconds

I also ran it on the Intel Edison:

    hostname     : ubilinux
    node version : v0.10.36
      V8 version : 3.14.5.9
 platform & arch : linux ia32

 config : { target_defaults:
   { cflags: [],
     default_configuration: 'Release',
     defines: [],
     include_dirs: [],
     libraries: [] },
  variables:
   { clang: 0,
     gcc_version: 47,
     host_arch: 'ia32',
     node_install_npm: true,
     node_prefix: '/usr',
     node_shared_cares: false,
     node_shared_http_parser: false,
     node_shared_libuv: false,
     node_shared_openssl: false,
     node_shared_v8: false,
     node_shared_zlib: false,
     node_tag: '',
     node_unsafe_optimizations: 0,
     node_use_dtrace: false,
     node_use_etw: false,
     node_use_openssl: true,
     node_use_perfctr: false,
     node_use_systemtap: false,
     openssl_no_asm: 0,
     python: '/usr/bin/python',
     target_arch: 'ia32',
     v8_enable_gdbjit: 0,
     v8_no_strict_aliasing: 1,
     v8_use_snapshot: false,
     want_separate_host_toolset: 0 } }

Richards            : 1130
DeltaBlue           : 1490
Crypto              : 1484
RayTrace            : 1031
EarleyBoyer         : 2739
RegExp              : 272
Splay               : 577
SplayLatency        : 1474
NavierStokes        : 1850
PdfJS               : 870
Mandreel            : 130
MandreelLatency     : 554
Gameboy             : 1240
CodeLoad            : 1269
Box2D               : 517
zlib                : 1255
Typescript          : 1588

Score (version 9): 933
 duration  332  seconds

@erossignon
Member

The overall score (the "Score (version 9)" line) is a good indication of the performance of your box:

Name              Score
beaglebone        830
Intel Edison      933
RaspberryPi old   tbd
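For context on how the overall number relates to the individual results: Octane's overall score is, as far as I know, the geometric mean of the individual benchmark scores, so a single very low result (such as RegExp) pulls the total down noticeably. The sketch below recomputes it from the rounded BBB scores posted earlier in this thread; `geometricMean` is a helper written for this example, not part of benchmark-octane.

```javascript
// Individual Octane scores reported for the BeagleBone Black in this thread.
const bbbScores = [
  1310, 1044, 1551, 741, 1697, 200, 459, 1254, 767,
  737, 546, 730, 1016, 1204, 331, 1418, 1053,
];

// Geometric mean, computed via logarithms to avoid overflowing the product.
function geometricMean(values) {
  const sumOfLogs = values.reduce((acc, v) => acc + Math.log(v), 0);
  return Math.exp(sumOfLogs / values.length);
}

console.log(Math.round(geometricMean(bbbScores))); // 830, matching the reported BBB score
```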

@brycheiniog

Here are the results from a RaspberryPi 2 (Not overclocked):

[root@alarmpi benchmark-octane]# node run.js
hostname : alarmpi
node version : v0.12.2
V8 version : 3.28.73
platform & arch : linux arm

config : { target_defaults:
{ cflags: [],
default_configuration: 'Release',
defines: [],
include_dirs: [],
libraries: [ '-lz', '-lssl', '-lcrypto' ] },
variables:
{ arm_float_abi: 'hard',
arm_fpu: 'vfpv3',
arm_neon: 0,
arm_thumb: 0,
arm_version: '7',
clang: 0,
gcc_version: 49,
host_arch: 'arm',
icu_small: false,
node_install_npm: true,
node_prefix: '/usr',
node_shared_cares: false,
node_shared_http_parser: false,
node_shared_libuv: false,
node_shared_openssl: true,
node_shared_v8: false,
node_shared_zlib: true,
node_tag: '',
node_use_dtrace: false,
node_use_etw: false,
node_use_mdb: false,
node_use_openssl: true,
node_use_perfctr: false,
openssl_no_asm: 0,
python2: '/usr/bin/python2',
target_arch: 'arm',
uv_library: 'static_library',
uv_parent_path: '/deps/uv/',
uv_use_dtrace: false,
v8_enable_gdbjit: 0,
v8_enable_i18n_support: 0,
v8_no_strict_aliasing: 1,
v8_optimized_debug: 0,
v8_random_seed: 0,
v8_use_snapshot: false,
want_separate_host_toolset: 0 } }

Richards : 1864
DeltaBlue : 2596
Crypto : 1854
RayTrace : 2804
EarleyBoyer : 2784
RegExp : 308
Splay : 600
SplayLatency : 3684
NavierStokes : 1901
PdfJS : 1154
Mandreel : 1151
MandreelLatency : 790
Gameboy : 2922
CodeLoad : 1439
Box2D : 1309
zlib : 2789
Typescript : 2423

Score (version 9): 1617
duration 170.29 seconds


3 participants