Segmentation fault #4268
Please cut and paste the actual log output rather than including a screenshot. Screenshots are unreadable for people with high DPI monitors.
What version (use the git hash) is this? Is this repeatable? I.e., if you run it again, do you get the same error?
That git hash is from Sep 2019. Try checking out tag 3.2.0, which is from Nov 22, 2019. If that does not fix it, I will look at this on Monday morning. It's currently 9pm Friday here.
Thank you @erikd
@erikd set -euo pipefail if [[ "${1-}" == "--delete-state" ]]; then echo "Keeping state in state-wallet-mainnet" echo "Launching a node connected to 'mainnet' ..." if [ ! -d state-wallet-mainnet/tls ]; then exec /nix/store/ya8iqz0l34w9mszd06ir3pchasryqz4a-cardano-wallet-3.2.0-exe-cardano-node/bin/cardano-node
There seem to be some extra spaces around the I just did:
and it worked as expected.
@erikd The process ended abruptly.
Do I need to upgrade my server? It currently has 2 CPU cores and 4 GB of RAM.
4 GB should be enough RAM, especially if nothing else is happening on that machine. I am currently running the
And now it segfaults for me too! At
I think this error occurs when the process reaches 1 GB of memory. After restarting the process, it continues to synchronize. I have rebooted once and it is now at the 10365th slot of the 13th epoch.
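One way to test the 1 GB theory is to log the node's memory use right up to the moment it dies. The following is a minimal, Linux-only sketch (the script name `watch_rss.py` is hypothetical) that samples the `VmRSS` field of `/proc/<pid>/status` at a fixed interval until the process disappears:

```python
import os
import sys
import time


def rss_kib(pid: int) -> int:
    """Return the resident set size (VmRSS) of `pid` in KiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:\t  123456 kB".
                return int(line.split()[1])
    # Zombie processes and kernel threads have no VmRSS line.
    raise ValueError(f"no VmRSS entry for pid {pid}")


def watch(pid: int, interval_s: float = 10.0) -> None:
    """Print one timestamped RSS sample every `interval_s` seconds until `pid` goes away."""
    while True:
        try:
            kib = rss_kib(pid)
        except (OSError, ValueError):
            break  # the process has exited (or is in the middle of exiting)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} pid={pid} rss={kib / 1024:.1f} MiB", flush=True)
        time.sleep(interval_s)
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} pid={pid} is gone", flush=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        watch(int(sys.argv[1]))
    else:
        print(f"usage: {sys.argv[0]} PID", file=sys.stderr)
```

It could be run alongside the node, for example `python3 watch_rss.py "$(pidof cardano-node)" | tee rss.log` (assuming exactly one `cardano-node` process). If the last samples before a crash are well below 1 GB, memory exhaustion becomes an unlikely explanation.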
I am running on a 16 GB VM and I was able to recreate the problem, so that is not it. Oh, hang on, you are running on a 64-bit CPU, aren't you?
I also checked out the Then I switched back to the I wonder if there is a peer somewhere on the network that is serving up corrupted blocks.
admin@ada:/data/ada$ getconf LONG_BIT
On Friday, I restarted the node many times and it finished synchronizing. On Monday, the process was killed.
What base OS and OS version are you running this on?
Ubuntu 18.04
Ubuntu 18.04 should be fine. I would try deleting the
8280th slot of 8th epoch
Ok, if you delete the
What do you mean?
Delete
Do I need to delete this directory when I run it again?
Yes. That is the state directory where the node stores blocks.
The log stops partway through because the process was killed, but this time no errors were reported.
@shenyaqi9527 I need a list of where (i.e. epoch and slot number) the process gets killed, starting from scratch. Please run it again so I can compare it with the last list.
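Positions reported as "Nth slot of Mth epoch" are easier to compare across runs once converted to absolute slot numbers. This sketch assumes Byron-era mainnet epochs of 21,600 slots (10k, with k = 2160) and that the logged slot is the index within its epoch; the helper names are made up for illustration:

```python
# Assumption: Byron-era mainnet epochs are 21,600 slots long (10 * k, with k = 2160).
SLOTS_PER_EPOCH = 21_600


def absolute_slot(epoch: int, slot_in_epoch: int) -> int:
    """Convert an (epoch, slot-within-epoch) pair into an absolute slot number."""
    if not 0 <= slot_in_epoch < SLOTS_PER_EPOCH:
        raise ValueError(f"slot {slot_in_epoch} is outside a {SLOTS_PER_EPOCH}-slot epoch")
    return epoch * SLOTS_PER_EPOCH + slot_in_epoch


def crash_table(points):
    """Return (epoch, slot, absolute slot) rows sorted by position on the chain."""
    rows = [(epoch, slot, absolute_slot(epoch, slot)) for epoch, slot in points]
    return sorted(rows, key=lambda row: row[2])


# Positions mentioned earlier in this thread, as (epoch, slot).
reported = [(13, 10365), (8, 8280)]
for epoch, slot, abs_slot in crash_table(reported):
    print(f"epoch {epoch:>3}  slot {slot:>5}  ->  absolute slot {abs_slot}")
```

If fresh runs keep dying at the same absolute slot, that would point at specific chain data (as in the corrupted-block theory); scattered positions are harder to explain that way.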
Do I need to delete "./state-wallet-mainnet"?
I am really beginning to suspect that your machine is having hardware issues. Can you run some form of diagnostic on it?
What epoch/slot are you up to? |
14651st slot of 54th epoch |
17635th slot of 67th epoch. This is where the process was killed: [74788.914053] cardano-node:w[8914]: segfault at 840ccae7a0 ip 00007f4de11d9c55 sp 00007f4dc59cea98 error 6 in libc-2.27.so[7f4de1088000+1aa000]
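A kernel segfault line like the one above can be decoded mechanically. On x86, the `error` value is the page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode), and `ip` minus the mapping base gives the faulting instruction's offset inside the library. A minimal sketch (the function name `decode_segfault` is hypothetical):

```python
import re

# Matches the x86 kernel "segfault at ..." message, e.g.
# cardano-node:w[8914]: segfault at 840ccae7a0 ip 00007f4de11d9c55 sp ... error 6 in libc-2.27.so[7f4de1088000+1aa000]
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Split a kernel segfault line into its parts and decode the page-fault error code."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip, base, size = (int(m[k], 16) for k in ("ip", "base", "size"))
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    offset = ip - base
    return {
        "object": m["obj"],
        "fault_addr": hex(int(m["addr"], 16)),
        "offset_in_object": hex(offset),
        "ip_inside_mapping": 0 <= offset < size,
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


if __name__ == "__main__":
    line = ("[74788.914053] cardano-node:w[8914]: segfault at 840ccae7a0 ip 00007f4de11d9c55 "
            "sp 00007f4dc59cea98 error 6 in libc-2.27.so[7f4de1088000+1aa000]")
    for key, value in decode_segfault(line).items():
        print(f"{key}: {value}")
```

For the line above this gives error code 6, i.e. a user-mode write to a page that was not present, at offset 0x151c55 into libc-2.27.so. Assuming the reported mapping starts at file offset 0 of the library, that offset could then be looked up in the matching libc build with `addr2line -f -e` or gdb.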
The node has been running for over 20 hours under Unfortunately, I am also a bit concerned that
I gave up on that Trying without
Another segfault (without valgrind):
According to this, this is probably due to some C code accessed via the C FFI.
I noticed that the Nix build we are using links against version 5.11 of RocksDB, whereas Debian has version 5.17 of that library. Now trying a native Debian build of
And I almost immediately got a segfault with the version I built without Nix. 😢
Currently building a profiled version of
Ok, profiling got me my first traceback:
Going to run it a couple more times to make sure it crashes the same way each time.
Have run this a number of times and always get a traceback starting with @dcoutts asked why there is no C code in that traceback. I think that is because of incompatible debug formats. GHC's profiling uses DWARF debugging symbols, and debug info for the C code in the backtrace may either not be enabled or be in an incompatible format.
Has the problem been dealt with? @erikd
It has not. I have been assigned to other, higher-priority work.
Is no one dealing with this problem right now? @erikd
No one that I am aware of. It seems that you are one of the few people who have been hitting this problem, and hence the priority has been downgraded. My advice is to set the node up as a
We are also running into the same issue with all of our 3 nodes. This is the Dockerfile we are using to run our node:
entrypoint.sh:
nix.conf:
I can confirm that it happens in both v3.2.0 and v3.1.0.
The code base in this repo has been in maintenance mode for close to a year. This bug is difficult to reproduce, suggesting it is machine-specific (insufficient memory?). As such, it almost certainly will never be fixed. However, the new code base in the To advise you on the best path forward, it would be useful to know your goals.
Thanks @erikd. So, is The goal is to be able to operate a full and reliable node for various operations on the mainnet (create transactions, verify existing ones, etc.).
Yes!
What is causing this error? After running for a while, the error is reported and the process is killed. When I run it again it continues to synchronize, but then the error occurs again. Is this a memory limit? The error occurs when memory usage reaches 1 GB.
![image](https://user-images.githubusercontent.com/56912884/72592554-c8085900-393d-11ea-9262-92af0156d3ba.png)