-
Notifications
You must be signed in to change notification settings - Fork 948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 BUG: Event Log spam when handshake timeout fails #1149
Comments
I don't think this is a bug. This log message is of an "info" level. |
@SamSirry it is not technically broken, but service should be user friendly in the default configuration. Either it should not send messages to Event Log, or this particular message should have its level lowered, or the logging frequency should be reduced for this message. |
Hi @lostmsu - can you help me understand the motivation for preventing these log messages from appearing in the log? We believe that when things are working correctly, you shouldn't need to read the log file. There are many more debug level messages, but the handshake messages are intentionally info level as they are highly pertinent when dealing with support tickets. The log messages occur frequently because a process on your system is attempting to reach out to the node in question and failing. If your host were not trying to access the remote host, or if you were able to handshake successfully, these log messages would not occur. As it stands, there seems to be a potential problem in your network, and that's what these logs are intended to help with. |
Thanks for explaining the impact to your workflow.
This log message occurs when a host attempt to handshake with a peer that presents a certificate with the same VPN IP (Nebula IP) as itself. It is a misconfiguration to have multiple hosts on a network with the same IP address. Therefore this generally means something has gone very wrong, either in the network configuration or internally to Nebula. Can you explain the normal operation that you're seeing this in? |
@johnmaguire I am using a public lighthouse is the only potential reason that comes to mind for "Refusing to handshake with myself". Seeing the message I just assumed it is similar to the behavior of BitTorrent protocol where when you get peer endpoints from a tracker, and try to connect to all of them, you might hit your own endpoint you might not otherwise know about. For instance: you listen behind NAT on [local IP:your port], and you get [public router IP:your port] endpoint which router actually forwards to [local IP:your port]. So when you try to connect to [public router IP:your port], the handshake would notice that the remote peer ID is the same as yours and you drop the connection. |
Thank you again for the context.
The Lighthouse learns underlay IP/port combos from hosts when nodes report their IP addresses. Nodes look at all IP addresses assigned to them (e.g. When behind a NAT, each device behind the NAT will receive a different public port associated with the Nebula instance. So, It is possible that two hosts on separate LANs will report the same private IP address / port combo. Indeed, this is more likely with public Lighthouses, since they are tailored towards home users which likely all use the 192.168.0.0/16 range. FYI, the Nebula project does not support public Lighthouses. Lighthouses, as designed, were not intended to be shared between organizations, and networks should not reuse IP addresses. You can read through #642 and #306 to understand some of the challenges here. (For example, a Lighthouse can only hold on to so many underlay IP addresses [20?] for a given vpn IP - so if you have multiple hosts with the same vpn IP sharing a Lighthouse, you may never learn the underlay IP addresses for the node you are trying to access.) While it is possible that two hosts on separate LANs will get the same IP address and choose the same port on the LAN (especially if configured with a static port), the message would still only appears when these two hosts attempt to handshake with each other. I suspect this is not very common - and this log message is intended to uncover issues such as misconfigured Regarding your initial issue with excessive Nebula logs appearing in the Event Viewer, this occurs when you install Nebula as a service using the "nebula-service" utility (packaged as One option you have is to simply restrict the log verbosity such that only warning and error messages are logged. Alternatively, instead of installing Nebula with |
The "Refusing to handshake with myself" message is outside the scope of my initial bug report, if you wish we can move to discussions for it. TBH I do not understand the scenario where this message appears. E.g. what is the node doing when it eventually ends up printing this message? If it has to do with other nodes having same Re: public lighthouses. I would expect this to work similar to SNI. A single node with a single public
I thought we already discussed that and the issue is that I also want logs to be present, because, for example, I might want to know when a peer is connected/disconnected (definitely info level). But I don't need to know of every single failed attempt to connect to a remote peer, especially if it happens 3 times a minute. Who would unless they are debugging? |
This suggestion is addressed in the linked thread: #306 (comment) That being said, we have no plans to support public Lighthouses.
In my experience, most people only look at Nebula logs when they are debugging. You seem to have a different preference for log levels than others do. These log levels have been set based on years of debugging Nebula issues. I'm not exactly sure what you mean by "I might want to know when a peer is connected/disconnected" but the debug SSH server may be a better fit for that goal. I understood the specific problem you are having is that they are causing a lot of noise in Event Viewer, which is why I suggested logging to a file instead:
|
After discussing with the wider team, we feel that these log levels are too important to move behind a "debug" level. They are invaluable in understanding and debugging users' connectivity issues in Nebula support tickets and on the Nebula OSS Slack channel. We advise decreasing the log verbosity as needed or disconnecting Nebula from Event Viewer if it interferes with your workflow. |
What version of
nebula
are you using? (nebula -version
)1.9.0
What operating system are you using?
Windows
Describe the Bug
Nebula generates an entry in Event Log/Application/Nebula every ~10 seconds when it can't handshake with a remote host.
Logs from affected hosts
I get enormous number of messages like this one:
Config files from affected hosts
N/A
The text was updated successfully, but these errors were encountered: