No Inbound peers after upgrade #29312
Comments
Besu team is tracking a similar issue; Besu currently only seems to peer with non-Geth peers. Watching.
Geth peering issues should not make a difference in attestations, because attestations are handled by the beacon chain client.
@J1a-wei can you please double-check your firewall configuration? There haven't been any changes in geth's p2p code between the releases you mentioned.
Hi @fjl, we contacted the AWS engineers the day before yesterday (our nodes are deployed on AWS) and confirmed that it is not a firewall or security group issue. The beacon chain node is experiencing the same issue: no inbound peers. We ran comparative experiments and found that with inbound peers the number of missed attestations is significantly reduced, by roughly half.
Considering that EL and CL peering are unrelated, the fact that both are affected seems to imply some type of configuration issue.
Hi @lightclient, additionally: earlier on we didn't have NAT mapping enabled, and our monitoring showed a relatively high rate of missed signatures. Once NAT mapping was enabled and we had inbound peers, the overall situation improved by about a factor of two. This is based on our practical experience. Ref: https://www.symphonious.net/2021/08/14/exploring-eth2-why-open-ports-matter/
I need to correct myself: the inbound peer count is not 0, but all of the inbound peers are our own intranet addresses. There are no external peers. Our test commands:
Geth
Prysm or Lighthouse
Additionally, it's worth noting that 172.1.xx.xx is our K8s CIDR, and within this subnet we have deployed some testnet nodes (Holesky, Sepolia, and so on). However, only the mainnet node has NAT mapping.
My point is that it doesn't really matter how well connected your EL is with regard to attestation performance. What I find interesting is that two completely different pieces of software / p2p stacks are having similar peering issues. Do all of your CL clients have peering issues, or just Prysm?
What is the number of inbound peers which are your own intranet addresses?
Hi @lightclient, our own intranet inbound peers currently number only 5, so I don't think that's the issue. In addition, I have re-run geth + lighthouse on a new machine with the same parameters, and it is currently running well with plenty of inbound peers. So I somewhat suspect that we may be under attack?
In your screenshot, I can see you have tried to connect to the node. I'm asking you to do this to confirm whether your firewall permits inbound UDP traffic correctly.
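The inbound-UDP check fjl describes can be approximated with go-ethereum's `devp2p` tool, assuming it is installed on a machine outside the network; `$ENODE` is a hypothetical placeholder for the node's public `enode://…` URL.

```shell
# From a machine OUTSIDE your network, send a discv4 ping over UDP
# to the node's public endpoint.
devp2p discv4 ping "$ENODE"

# A PONG reply means inbound UDP reaches the node; a timeout points
# at a firewall/NAT problem on the UDP side.
```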
@J1a-wei on the Kubernetes side, I'm assuming that $MY_PUBLIC_IP is the IP of the node where the pod is running, is that right? Based on the port numbers that you used in the example, I assume you're using a Service of type NodePort, am I correct? If that's the case, did you enable externalTrafficPolicy: Local?
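For reference, a NodePort Service with `externalTrafficPolicy: Local` might look roughly like this (a minimal sketch; the names and labels are illustrative, not taken from the reporter's setup):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: geth-p2p                   # illustrative name
spec:
  type: NodePort
  externalTrafficPolicy: Local     # keep traffic on the node running the pod,
                                   # preserving the client's real source IP
  selector:
    app: geth                      # illustrative label
  ports:
    - name: p2p-tcp
      protocol: TCP
      port: 30303
      nodePort: 30303
    - name: p2p-udp
      protocol: UDP
      port: 30303
      nodePort: 30303
```

With the default `Cluster` policy, traffic may be SNATed as it hops between nodes, so geth sees cluster-internal source addresses, which can interfere with discovery.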
OP - can you run What do you see under discovery?
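One way to inspect what fjl may be asking about is the geth console's `admin.nodeInfo`, which reports the ports the node actually uses (a sketch; the IPC path is an assumption):

```shell
# Attach to the running node and print the p2p ports it is using.
geth attach --exec 'admin.nodeInfo.ports' /path/to/geth.ipc

# Also inspect the ENR, which records the advertised endpoint:
geth attach --exec 'admin.nodeInfo.enr' /path/to/geth.ipc
```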
My discovery port consistently changes from my setting (30303) to a random value. This happens shortly after start. I see it change in the logs, but I'm not sure what triggers it. This caused low inbound for me, especially when I first booted up geth, because I have all ports blocked on my machine except for what's required (i.e. 30303 TCP/UDP). The longer my discovery port stayed as expected (i.e. 30303), the more inbound peers I had. Curious if this is your issue? This is also new to me, and I've searched everywhere and haven't found anything; it just started happening. Running Ubuntu 23.04.
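If the port change is caused by NAT/UPnP renegotiation, pinning the ports and the external IP explicitly may help (a sketch, not a confirmed fix; the IP is a placeholder):

```shell
geth --port 30303 \
     --discovery.port 30303 \
     --nat extip:203.0.113.10   # placeholder public IP; avoids UPnP port guessing
```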
@fjl
@skylenet When we first deployed, we didn't enable externalTrafficPolicy: Local, and p2p was still in a normal state. From what I've researched, setting it to Cluster adds an extra hop to the network. What impact would this have? Also, I'm quite curious how Ethereum p2p communication works: after each node starts up, it generates an ENR, but I've observed that the ENR doesn't seem to regenerate when the node restarts. How can I make it regenerate, and where is it stored? @amplicity
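On the ENR question: geth derives its ENR from the node key, which persists on disk, so the node identity survives restarts. A sketch of where it lives and how to force a new one (the default datadir path is an assumption):

```shell
# The node key (and hence the stable part of the ENR) lives in the datadir:
ls ~/.ethereum/geth/nodekey

# Deleting it while geth is stopped forces a brand-new identity/ENR on the
# next start (this also discards the node's standing in other peers' tables):
rm ~/.ethereum/geth/nodekey
```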
@J1a-wei did this resolve itself 'organically'? Or is peering still a problem? Or did you find out some other cause of this?
This seems to be a bug related to both prysm and geth.
Besu team is no longer concerned this may be related.
System information
Geth version:
geth version
geth v1.13.14
prysm v5.0.1
geth command
prysm command
Expected behaviour
A lot of inbound peers
Actual behaviour
No inbound peers
Steps to reproduce the behaviour
After the Cancun upgrade, I also upgraded geth from v1.13.8 to v1.13.14.
However, I noticed that the number of inbound peers dropped to 0, which is very bad for staking nodes. Our statistical analysis indicates that missed attestations have doubled in the past week.
I suspect there has been a change in the p2p code, or perhaps this is a bug.
We run geth in containers and NAT our p2p port. Here is our startup command:
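The actual command was attached as an image; for a containerized setup with a NATed p2p port, an invocation looks roughly like this (illustrative flags only, not the reporter's exact command; $MY_PUBLIC_IP is the NATed public address discussed above):

```shell
geth --mainnet \
     --port 30303 \
     --nat "extip:${MY_PUBLIC_IP}" \
     --authrpc.addr 0.0.0.0 \
     --authrpc.port 8551 \
     --authrpc.jwtsecret /secrets/jwt.hex
```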
I have also done some testing and troubleshooting. When I test from inside the container, using telnet to localhost or using the devp2p tool, the connection succeeds and stays open for a while.
But when connecting from outside, for example via the local network card address, the server (geth) immediately returns EOF. Packet captures show the same.
This makes me very curious.
Trace
When I access the p2p port via 127.0.0.1, it consistently returns data. However, when I use the local public network interface, a significant number of the 10 connection attempts are dropped.
I used tcpdump to capture traffic. Through analysis, it can be seen that geth actively sends FIN packets to the client.
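The capture described above can be reproduced with a filter like the following (run on the node or in the pod's network namespace; the `any` interface is an assumption):

```shell
# Watch for FIN/RST on the p2p port; whichever host sends FIN first is the
# one actively closing the connection.
tcpdump -i any -nn 'tcp port 30303 and (tcp[tcpflags] & (tcp-fin|tcp-rst) != 0)'
```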
Any help is greatly appreciated.