Active node gave errors (exited: arango (exit status 1; not expected)). After reboot it will not start (Error: connect ECONNREFUSED 127.0.0.1:8529) and keeps writing log files in engine-rocksdb/journals/ #1611

Closed
Fluzyxy opened this issue Aug 16, 2021 · 2 comments

Fluzyxy commented Aug 16, 2021

Expected Behavior

The OT node starts up, and diff/var/lib/arangodb3/engine-rocksdb/journals/ does not fill up with log files (or the files are archived and garbage collected).

Actual Behavior

The problem splits into two parts:

  • Before 14/08, while the node was still running (the problem probably started on 08/08)
  • After 14/08, when I tried to restart it and the node would not start

Before 14/08

Last week I noticed I was winning hardly any jobs, so I logged into my nodes on 14/08. They were still running, but two were producing errors and disk usage was around 68GB, when it should be around 30GB.

The logging showed it started on 08/08:
2021-08-08 11:46:16,602 INFO exited: arango (exit status 1; not expected)
2021-08-08 11:46:17,642 INFO spawned: 'arango' with pid 53967
2021-08-08 11:46:18,646 INFO success: arango entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Other error
2021-08-12T07:34:44.320Z - error - Unhandled Rejection:
Error: Timed out waiting for response
at KademliaNode._timeout (/ot-node/5.1.0/node_modules/@deadcanaries/kadence/lib/node-abstract.js:260:15)
at Timeout.setInterval [as _onTimeout] (/ot-node/5.1.0/node_modules/@deadcanaries/kadence/lib/node-abstract.js:172:28)
at ontimeout (timers.js:466:11)
at tryOnTimeout (timers.js:304:5)
at Timer.listOnTimeout (timers.js:267:5)

The ArangoDB log for 08/08 starts with:

2021-08-08T11:46:16Z [12] ERROR [8a210] JavaScript exception in file '/usr/share/arangodb3/js/common/bootstrap/modules.js' at 68,37: ArangoError 2: cannot get current working directory: No such file or directory
2021-08-08T11:46:16Z [12] ERROR [409ee] ! const ROOT_PATH = fs.normalize(fs.makeAbsolute(internal.startupPath));
2021-08-08T11:46:16Z [12] ERROR [cb0bd] ! ^
2021-08-08T11:46:16Z [12] ERROR [8a210] JavaScript exception in file '/usr/share/arangodb3/js/common/bootstrap/modules.js' at 68,37: ArangoError 2: cannot get current working directory: No such file or directory
2021-08-08T11:46:16Z [12] ERROR [409ee] ! const ROOT_PATH = fs.normalize(fs.makeAbsolute(internal.startupPath));
2021-08-08T11:46:16Z [12] ERROR [cb0bd] ! ^
2021-08-08T11:46:16Z [12] FATAL [69ac3] {v8} error during execution of JavaScript file 'server/initialize.js'
2021-08-08T11:46:17Z [53967] INFO [e52b0] ArangoDB 3.5.3 [linux] 64bit, using jemalloc, build tags/v3.5.3-0-gf9ff700153, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d 10 Sep 2019
2021-08-08T11:46:17Z [53967] INFO [75ddc] detected operating system: Linux version 5.4.0-80-generic (buildd@lcy01-amd64-030) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021
2021-08-08T11:46:17Z [53967] WARNING [118b0] {memory} maximum number of memory mappings per process is 65530, which seems too low. it is recommended to set it to at least 128000
2021-08-08T11:46:17Z [53967] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=128000"'
2021-08-08T11:46:17Z [53967] INFO [43396] {authentication} Jwt secret not specified, generating...
2021-08-08T11:46:17Z [53967] INFO [144fe] using storage engine rocksdb
2021-08-08T11:46:17Z [53967] INFO [3bb7d] {cluster} Starting up with role SINGLE
2021-08-08T11:46:17Z [53967] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2021-08-08T11:46:17Z [53967] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2021-08-08T11:46:17Z [53967] WARNING [d5c49] {engines} ignoring value for option --rocksdb.max-write-buffer-number because it is lower than recommended
2021-08-08T11:46:27Z [53967] ERROR [8a210] JavaScript exception in file '/usr/share/arangodb3/js/common/bootstrap/modules.js' at 68,37: ArangoError 2: cannot get current working directory: No such file or directory
2021-08-08T11:46:27Z [53967] ERROR [409ee] ! const ROOT_PATH = fs.normalize(fs.makeAbsolute(internal.startupPath));
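
To see how often each error repeats in a long ArangoDB log, the lines can be grouped by level and log ID. This is a minimal sketch, assuming Python 3 and a line layout inferred only from the excerpts above (timestamp, pid, level, log ID, optional {topic}, message). The names parse_line and count_problems are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the quoted lines:
# <UTC timestamp> [<pid>] <LEVEL> [<log id>] <optional {topic}> <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) "
    r"\[(?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<log_id>[0-9a-f]+)\] "
    r"(?:\{(?P<topic>[^}]+)\} )?"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_problems(lines):
    """Count ERROR and FATAL lines, keyed by (level, log id)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["level"] in ("ERROR", "FATAL"):
            counts[(record["level"], record["log_id"])] += 1
    return counts
```

Calling count_problems(open(path)) on the log file would give one counter entry per distinct error ID, which makes it easier to see whether the 8a210 exception is the only recurring failure.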

And today, when I try to restart the OT node:
2021-08-16T07:41:08Z [12] INFO [43396] {authentication} Jwt secret not specified, generating...
2021-08-16T07:41:08Z [12] INFO [144fe] using storage engine rocksdb
2021-08-16T07:41:08Z [12] INFO [3bb7d] {cluster} Starting up with role SINGLE
2021-08-16T07:41:08Z [12] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2021-08-16T07:41:08Z [12] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2021-08-16T07:41:08Z [12] WARNING [b387d] found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 13, but that process seems to be dead already
2021-08-16T07:41:08Z [12] WARNING [d5c49] {engines} ignoring value for option --rocksdb.max-write-buffer-number because it is lower than recommended
2021-08-16T07:41:24Z [12] INFO [6ea38] using endpoint 'http+tcp://0.0.0.0:8529' for non-encrypted requests
2021-08-16T07:41:27Z [12] INFO [e52b0] ArangoDB 3.5.3 [linux] 64bit, using jemalloc, build tags/v3.5.3-0-gf9ff700153, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d 10 Sep 2019
2021-08-16T07:41:27Z [12] INFO [75ddc] detected operating system: Linux version 5.4.0-81-generic (buildd@lgw01-amd64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021
2021-08-16T07:41:27Z [12] WARNING [118b0] {memory} maximum number of memory mappings per process is 65530, which seems too low. it is recommended to set it to at least 128000
2021-08-16T07:41:27Z [12] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=128000"'
2021-08-16T07:41:27Z [12] INFO [43396] {authentication} Jwt secret not specified, generating...
2021-08-16T07:41:27Z [12] INFO [144fe] using storage engine rocksdb
2021-08-16T07:41:27Z [12] INFO [3bb7d] {cluster} Starting up with role SINGLE
2021-08-16T07:41:27Z [12] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2021-08-16T07:41:27Z [12] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2021-08-16T07:41:27Z [12] WARNING [ad4b2] found existing lockfile '/var/lib/arangodb3/LOCK' of previous process with pid 12, and that process seems to be still running
2021-08-16T07:41:27Z [12] WARNING [d5c49] {engines} ignoring value for option --rocksdb.max-write-buffer-number because it is lower than recommended
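
The memory warning in these logs compares the kernel's vm.max_map_count against a recommended minimum of 128000. A minimal sketch for checking the current value, assuming a Linux host with Python 3; the helper names are hypothetical, and this only reports the value. It is not known whether this setting is related to the failure.

```python
from pathlib import Path

# Threshold quoted in the ArangoDB warning above.
RECOMMENDED_MIN = 128000


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current limit from procfs (the path is parameterised for testing)."""
    return int(Path(path).read_text().strip())


def map_count_ok(current, recommended=RECOMMENDED_MIN):
    """True if the current limit meets the recommended minimum."""
    return current >= recommended


# Example use on a Linux host:
#   current = read_max_map_count()
#   print(current, "ok" if map_count_ok(current) else "below recommendation")
```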

After 14/08

My node can no longer start, and diff/var/lib/arangodb3/engine-rocksdb/journals/ is filled with small log files that take up a huge amount of space (41000 files of 1KB taking 40GB). The writing starts on a reboot/restart, as soon as the Docker container is active again. Normal disk usage should be around 30GB, but my node is now at 70GB.
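
For scale: 41000 files of about 1KB each add up to roughly 40MB, so a 40GB footprint suggests that the listed file size and the space actually allocated differ, or that other files are involved. Here is a minimal sketch for comparing the two, assuming a Linux host with Python 3. The function name directory_usage is made up for illustration, and the argument must be the full journals path on your host (only the diff/... suffix is given above).

```python
import os


def directory_usage(root):
    """Return (file_count, apparent_bytes, allocated_bytes) for files under root.

    apparent_bytes sums st_size (what a plain file listing shows);
    allocated_bytes sums st_blocks * 512 (what the filesystem reports as allocated).
    """
    count = apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # the file disappeared between listing and stat
            count += 1
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return count, apparent, allocated
```

If allocated_bytes is far larger than apparent_bytes, the space is going to preallocated or sparse-file blocks rather than to file contents. If both are small, the 40GB is somewhere else.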

Every time my node is started I get the following error:

2021-08-16T07:40:22.288Z - error - Please make sure Arango server is up and running
{ Error: connect ECONNREFUSED 127.0.0.1:8529
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1174:14)
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED',
syscall: 'connect',
address: '127.0.0.1',
port: 8529,
response: undefined }
2021-08-16T07:40:22.290Z - error - Whoops, terminating with code: 1
2021-08-16 07:40:22,299 INFO exited: otnode (exit status 1; expected)
2021-08-16 07:40:23,301 WARN received SIGTERM indicating exit request
2021-08-16 07:40:23,301 INFO waiting for remote_syslog, arango, otnodelistener to die
2021-08-16 07:40:26,137 INFO stopped: arango (terminated by SIGTERM)
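
ECONNREFUSED here means nothing was accepting TCP connections on 127.0.0.1:8529 when the node tried to connect. A minimal sketch, assuming Python 3 is available inside the container (or the port is reachable from where the script runs), for checking whether ArangoDB ever starts listening. The helper names and the retry counts are illustrative choices, not anything ot-node does.

```python
import socket
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and timeouts
        return False


def wait_for_port(host="127.0.0.1", port=8529, attempts=10, delay=1.0):
    """Poll the port up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

If wait_for_port() keeps returning False while the ArangoDB log shows repeated startups, the database is probably exiting before it binds its endpoint, which would match the restart loop in the logs above.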

I rebooted, restarted, killed the process, and increased the swapfile from 1GB to 6GB. Nothing helped.
Strangely, however, one of the nodes purged the 40GB of log files and started working again after a reboot (yesterday).

Steps to Reproduce the Problem

  1. Docker start OT node. Since I cannot start the node anymore, I can no longer reproduce the first part.

Specifications

  • Version: OT node 5.1.0

  • Platform: Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-81-generic x86_64), Linux version 5.4.0-81-generic (buildd@lgw01-amd64-052) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021

  • ArangoDB 3.5.3 [linux] 64bit, using jemalloc, build tags/v3.5.3-0-gf9ff700153, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d 10 Sep 2019

  • Hosting: Digital Ocean 4GB, 80GB storage


Fluzyxy commented Aug 19, 2021

The workaround (not a root-cause fix, and no explanation of why it crashed and kept writing to the journals) was:

  • Run find . -type f -name "*.log" -delete in /var/lib/docker/overlay2
  • Reboot.
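
Because that find command deletes every *.log file under the directory, including RocksDB journal files that ArangoDB may still need, it can help to list what would be removed first. This is a minimal dry-run sketch, assuming Python 3 and read access to the directory (typically root for /var/lib/docker/overlay2). The function names are hypothetical, and nothing is deleted.

```python
from pathlib import Path


def find_log_files(root, pattern="*.log"):
    """List regular files matching pattern under root as (size, path), largest first."""
    found = []
    for path in Path(root).rglob(pattern):
        # Mirror `find -type f`: regular files only, symlinks skipped.
        if path.is_file() and not path.is_symlink():
            found.append((path.stat().st_size, path))
    found.sort(key=lambda item: item[0], reverse=True)
    return found


def summarize(found):
    """Return (number of files, total size in bytes) for a find_log_files result."""
    return len(found), sum(size for size, _path in found)
```

For example, summarize(find_log_files("/var/lib/docker/overlay2")) shows how many files and bytes the workaround would touch, and the first few entries of the list show which containers' layers they belong to.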

@kotlarmilos
Contributor

This issue was resolved with our latest release (5.1.2). If you believe this is still a problem, please create a new issue and confirm that it is reproducible in the current ot-node release version. We are working towards closing open issues that meet specific criteria and ask you to create a new one for those that are truly bugs in the current release. We'll be monitoring those issues so that they are properly managed.

Thank you!
