Monerod Syncing problem #9141
Please see #9139
How do you know the segfault is related to this longer chain? Can you share a backtrace?
No segfault issues during sync until it switched to the longer chain. I had to run monerod in a while loop to restart it whenever it stopped. Tell me what to do to get the backtrace.
It didn't switch to this longer chain, it just logged that a node has sent a new top block candidate. So far I haven't seen another person report that this crashed their node, so I'm not sure it's related to your issue.
Which OS are you using?
This is on Alpine Linux; I'm new to this distro.
What kind of hardware do you use?
3900X / 16 GB RAM / 2.5-inch SSD / ASRock B550
For a backtrace you need gdb installed. Start gdb with the monerod binary, wait for it to load, and start the program; monerod should begin to sync. Wait for it to segfault, then collect a backtrace of every thread and share the output.
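The exact commands appear to have been lost in formatting; with stock gdb, a session along these lines produces the requested output (`run` and `thread apply all bt` are standard gdb commands, and `./monerod` is whatever path your binary lives at):

```
$ gdb ./monerod
(gdb) run                    <- monerod starts and begins syncing
...                          <- wait here for the SIGSEGV
(gdb) thread apply all bt    <- prints a backtrace for every thread
```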
I blocked the IP advertising the longer chain and the problem went away; unable to reproduce it now. Thanks for the replies. Edit: never mind, I'm seeing the segmentation fault again and will try to reproduce the problem.
```
alpine:~/monero-x86_64-linux-gnu-v0.18.3.1$ gdb monerod
For help, type "help".
Thread 39 "ld-musl-x86_64." received signal SIGSEGV, Segmentation fault.
(gdb)
Thread 52 (LWP 26610 "ld-musl-x86_64."):
Thread 51 (LWP 26609 "ld-musl-x86_64."):
[... threads 50-29, all "ld-musl-x86_64.", backtraces cut off ...]
Thread 28 (LWP 26586 "ZMQbg/IO/0"):
Thread 27 (LWP 26585 "ZMQbg/Reaper"):
[... threads 26-2, all "ld-musl-x86_64.", backtraces cut off ...]
Thread 1 (LWP 26557 "ld-musl-x86_64."):
```
Can you use paste.debian.net to share the backtrace? Also, is this the full log? I'm specifically looking for thread 39; it seems to be missing from your comment.
Sorry about that; some parts of the log got cut off.
Can you make sure everything is updated in your Alpine install? I have seen a similar error caused by ABI incompatibility. Also, what version of Alpine are you using?
```
$ cat /etc/alpine-release
```
Great. I downloaded the exact version and synced monerod against my local node, but was not able to reproduce. How familiar are you with package compilation in Alpine? Can you build a debug monero package? I am not familiar with Alpine at all; I did a quick search and saw this [1].
Hi, thanks for the response. I'm not familiar at all with package compilation in Alpine.
If we change this line [1]:
to
that would at least generate better debugging information when you are debugging it. It seems the official link for how to build
In the meantime, I left my Alpine VM running, but so far no luck reproducing. If you are using any specific flags or config to run monerod, please share them.
Strange, I see the same behavior:

```
I [165.232.190.164:50514 INC] Sync data returned a new top block candidate: 3076846 -> 3493679 [Your node is 416833 blocks (1.6 years) behind]
```
@Haraade you can ignore it, it's just a node sending false data. It's harmless.
Switched to Debian; the problem solved itself.
I am closing this issue as it looks like an Alpine issue.
My guess is the pthread stack size is too small, as musl (not Alpine specifically) has a much lower default than glibc. The (presumed) fix is for Monero to explicitly increase its stack size when the system default is too low. This is monerod failing to run on a widely used environment. While we could declare the environment at fault (an entire libc, which has plenty of reasons to use it), I'm affected and would like monerod + musl to work as expected.
What are the steps that reproduce this bug? |
Run monerod on Alpine. If you have rootless Docker and a Rust toolchain, the following will do that:

```
git clone https://github.com/serai-dex/serai
cd serai
git checkout f0694172ef2cdf7dfde0d286e693243e4bdcacca
cargo run -p serai-orchestrator -- key_gen testnet
cargo run -p serai-orchestrator -- setup testnet
cargo run -p serai-orchestrator -- start testnet monero-daemon
```

This will create a key in a file. The container should SIGSEGV, presumably due to the pthread stack size, within a few minutes (<30 minutes I'd expect, though likely as soon as 5-10). Effectively all of our users complained of this, and @j-berman can confirm trivial replication. While we've moved to Debian, that has an increased attack surface, increased memory requirements, and slower boot times. This isn't specific to Serai either, as Alpine is largely preferred for Docker containers. Alpine is also a Linux distro not exclusive to Docker, so this potentially impacts personal machines as well. If it is the theorized issue (pthread stack size defaults), this actually affects all musl systems.
Thanks, I ran and synced the entire mainnet blockchain on Alpine and didn't have this issue [1]. If you have specific steps that reproduce this issue, I am happy to take a look at it.
Given that effectively every participant I've had has reported the SIGSEGV, that's my current recommendation. I'll also note that configuration doesn't sync the mainnet blockchain and does pass a variety of CLI flags.
Hi,
Trying to sync a pruned node, I'm getting some weird log output:
```
2024-01-29 20:01:36.247 I [207.244.240.82:18080 OUT] Sync data returned a new top block candidate: 3067647 -> 3072795 [Your node is 5148 blocks (7.2 days) behind]
2024-01-29 20:01:36.248 I SYNCHRONIZATION started
2024-01-29 20:01:37.555 I [110.40.229.103:18080 OUT] Sync data returned a new top block candidate: 3067647 -> 3344842 [Your node is 277195 blocks (1.1 years) behind]
```
There seems to be another chain, and it's ahead by 1.1 years (height 3344842)?
I know this cannot be right, because the current chain is only at height 30676XX.
Feels like an attacker trying to disrupt network stability.
Also getting a segmentation fault after it auto-switches to this longer chain.
Is the DNS-blacklist not working properly?