
autonat (?): node frequently changes its mind about its reachability status #2046

Closed · Tracked by #2062 · Fixed by #2092
marten-seemann opened this issue Feb 3, 2023 · 2 comments

Labels: effort/days (Estimated to take multiple days, but less than a week), kind/bug (A bug in existing code, including security flaws), need/analysis (Needs further analysis before proceeding)

Comments

@marten-seemann (Contributor) commented Feb 3, 2023
Using the event bus metrics (#2038) on a Kubo node with the Accelerated DHT client enabled, it looks like the node changes its mind about its reachability status fairly frequently. The event metrics don't tell us what the node currently believes its reachability to be, but on a public node I wouldn't expect any changes once it has been running for more than 5 minutes or so.

[image: event bus metrics showing repeated reachability change events]
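For context, the events being counted here are EvtLocalReachabilityChanged notifications on the host's event bus. A minimal sketch of observing them directly (standard go-libp2p API; host construction simplified):

```go
package main

import (
	"fmt"
	"log"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/event"
)

func main() {
	h, err := libp2p.New()
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()

	// Subscribe to reachability changes on the host's event bus. On a stable
	// public node we'd expect at most one of these shortly after startup,
	// not the churn shown in the metrics above.
	sub, err := h.EventBus().Subscribe(new(event.EvtLocalReachabilityChanged))
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Close()

	for e := range sub.Out() {
		evt := e.(event.EvtLocalReachabilityChanged)
		fmt.Println("reachability changed:", evt.Reachability)
	}
}
```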

This can have interesting consequences for higher layers. For example, when a node determines it is private, it will leave the DHT by switching to client mode (see the sketch below). I'm wondering how much of the observed churn can be attributed to this.
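For reference, this is what the DHT's auto mode does: it subscribes to reachability events and flips between server and client mode. A small sketch of constructing it that way (assuming go-libp2p-kad-dht; the helper name is illustrative):

```go
package dhtmode

import (
	"context"

	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
)

// newAutoDHT constructs a DHT in auto mode. In this mode the DHT subscribes to
// EvtLocalReachabilityChanged and flips between server mode (node is public)
// and client mode (node is private/unknown), so every spurious reachability
// event can translate into DHT mode churn.
func newAutoDHT(ctx context.Context, h host.Host) (*dht.IpfsDHT, error) {
	return dht.New(ctx, h, dht.Mode(dht.ModeAuto))
}
```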

As expected, this is accompanied by a change in supported protocols, presumably the DHT switching back and forth between client and server mode:
[image: supported-protocol change events, matching the reachability churn]

Unexpectedly, we don't observe any change in local addresses. It's not clear to me why we're not obtaining a relay reservation. Maybe we're switching back and forth too quickly to actually obtain one? Alternatively, there could be a bug in AutoRelay.
[image: local address change events (none observed)]

This issue suggests that it would be valuable to pick up the AutoNAT metrics (#2017) next. This will hopefully give us a better understanding of what's going on.

cc @Jorropo @dennis-tra @yiannisbot

@sukunrt (Member) commented Feb 11, 2023

I added a lot of logging on a local kubo node and observed that all of these changes happen when the current autonat status is Public with a "quic" address and the AutoNAT server we contact dials back on a "quic-v1" address, or when the current address is "quic-v1" and the server dials back with "quic".

The reason (bug?) for emitting a reachability change event is these lines:
https://github.com/libp2p/go-libp2p/blob/master/p2p/host/autonat/autonat.go#L306-L309
They emit an event even when only the observed address has changed. This seems incorrect, since reachability is still public and has not changed.
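To illustrate the point, here is a simplified, hypothetical sketch (not the actual autonat.go code; the type, constant, and method names are made up): the event should only fire when the reachability value itself changes.

```go
package main

import (
	"fmt"

	"github.com/libp2p/go-libp2p/core/network"
)

// maxConfidence is an assumption for this sketch; the real constant may differ.
const maxConfidence = 3

// tracker is a hypothetical, simplified stand-in for the autonat state machine.
type tracker struct {
	status     network.Reachability
	confidence int
}

// record returns true only when the reachability value itself changed.
// A dial-back that merely arrives over a different transport (quic vs quic-v1)
// keeps the status and should not produce an EvtLocalReachabilityChanged.
func (t *tracker) record(observed network.Reachability) (changed bool) {
	if observed == t.status {
		if t.confidence < maxConfidence {
			t.confidence++
		}
		return false
	}
	t.status = observed
	t.confidence = 0
	return true
}

func main() {
	t := &tracker{}
	observations := []network.Reachability{
		network.ReachabilityPublic, // dial-back over quic: status changes, emit
		network.ReachabilityPublic, // dial-back over quic-v1: no change, no event
	}
	for _, obs := range observations {
		if t.record(obs) {
			fmt.Println("reachability changed to", obs)
		}
	}
}
```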

LocalAddresses don't change here because both "quic" and "quic-v1" addresses are used to communicate with some peers, so both are available from the ObservedAddrsManager.

On this branch (#2086), I see that my node's status stays public before and after events are emitted.

@marten-seemann (Contributor, Author) commented

> The reason (bug?) for emitting a reachability change event is these lines:
> https://github.com/libp2p/go-libp2p/blob/master/p2p/host/autonat/autonat.go#L306-L309
> They emit an event even when only the observed address has changed. This seems incorrect, since reachability is still public and has not changed.

This code clearly comes from a time when nodes were listening on a single TCP address, and that's it. Those times are long over... We should:

  1. Not emit an event here. It probably doesn't cause too many problems since the event contains the status, but it's still awkward to emit an EvtLocalReachabilityChanged if the reachability didn't change.
  2. Fix our confidence metric. It will be a (very!) common occurrence that nodes observe a successful dial-back on a TCP address, then on a QUIC (draft-29) address, then on a QUIC v1 address, and so on. This will only get worse as we add more transports.

What I'm looking for here is the easiest fix to make this work.

Really, what we should be doing is getting the AutoNAT v2 project rolling. AutoNAT should be a system that tests individual addresses for their reachability and integrates into an "address pipeline". Unfortunately, that is a larger change.
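To make the per-address idea concrete, a rough sketch of what that tracking could look like (hypothetical types, not an existing go-libp2p API):

```go
package addrreach

import (
	"sync"

	"github.com/libp2p/go-libp2p/core/network"
	ma "github.com/multiformats/go-multiaddr"
)

// addrReachability tracks reachability per multiaddr instead of keeping a
// single node-wide value, so a successful quic-v1 dial-back can never
// "contradict" an earlier quic or TCP one.
type addrReachability struct {
	mu     sync.Mutex
	status map[string]network.Reachability // keyed by multiaddr string
}

func (a *addrReachability) record(addr ma.Multiaddr, r network.Reachability) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.status == nil {
		a.status = make(map[string]network.Reachability)
	}
	a.status[addr.String()] = r
}

func (a *addrReachability) get(addr ma.Multiaddr) network.Reachability {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.status[addr.String()] // zero value is ReachabilityUnknown
}
```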
