Skip to content
This repository has been archived by the owner on May 11, 2022. It is now read-only.

NAT Auto Discovery #1

Merged
merged 51 commits into from
Oct 16, 2018
Merged
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
32e8ab9
protobuf
vyzo May 5, 2018
70f7dd8
basic client
vyzo May 5, 2018
f3d9a24
add E_BAD_REQUEST to protobuf
vyzo May 6, 2018
cc058d6
service implementation
vyzo May 6, 2018
ef097b5
NAT autodetection
vyzo May 6, 2018
6efad8f
remove left-over design notes
vyzo May 6, 2018
2aa66e5
don't delete autonat peers on disconnect, just mark them as disconnected
vyzo May 6, 2018
ea43bf5
we only track autonat peers
vyzo May 6, 2018
00fb7e7
fix autonat peer tracking
vyzo May 6, 2018
0377627
add TODO in service about skipping private network addresses
vyzo May 6, 2018
7fad996
add network notifee
vyzo May 6, 2018
9af8715
don't try to dial private network addresses
vyzo May 6, 2018
bc41c7a
add localhost to private addr ranges
vyzo May 6, 2018
dcbcfce
no need to select; it's a one shot sync
vyzo May 7, 2018
9efd0ec
typed NATStatus constants
vyzo May 7, 2018
aaaa90e
bump initial autodiscovery delay to 15s
vyzo May 7, 2018
1562e1b
AutoNATState is AmbientAutoNAT
vyzo May 8, 2018
6d4bc41
variables for background delays
vyzo May 8, 2018
fa14117
named magic number incantations
vyzo May 9, 2018
d16ca79
refactor getPeers for locked scope
vyzo May 9, 2018
b1733eb
don't throw away read errors; log them.
vyzo May 9, 2018
bb5cad4
simplify autonat client itnerface
vyzo May 9, 2018
cd7a875
mutex hat
vyzo May 11, 2018
7b3981e
docstrings and another mutex hat.
vyzo May 11, 2018
cf04a09
improve docstring for NewAutoNAT
vyzo May 11, 2018
7c097ed
improve NATStatusUknown docstring
vyzo May 11, 2018
5837cc5
fix typo
vyzo May 12, 2018
56a0966
update gx deps
vyzo Sep 7, 2018
54fb466
regenerate protobuf
vyzo Sep 7, 2018
66ca387
svc: construct dialer host without listen addrs
vyzo Sep 8, 2018
3abf9c7
accept libp2p options for the dialer constructor in NewAutoNATService
vyzo Sep 8, 2018
3b679e0
make service dialback timeout configurable; useful for tests
vyzo Sep 8, 2018
1cba297
basic service tests
vyzo Sep 8, 2018
dd7c7a9
Makefile and travis build file
vyzo Sep 8, 2018
9ff7df3
test for ambient autonat functionality
vyzo Sep 8, 2018
0fdf1b0
address review comments
vyzo Sep 29, 2018
46d352f
use the protocol list by identify, don't emit chatter on every connec…
vyzo Sep 29, 2018
0a4e215
add observed address to the dialback set
vyzo Oct 3, 2018
91c209c
ensure test hosts are only on loopback
vyzo Oct 3, 2018
00d2fea
add /libp2p prefix in protocol string
vyzo Oct 3, 2018
8ea9f1b
configurable throttle for service rate limiter
vyzo Oct 3, 2018
d9a0d1a
call AutoNATService.peers something else (reqs)
vyzo Oct 3, 2018
aadb8db
use more peer dial errors for increased confidence in NATPrivate state
vyzo Oct 4, 2018
d7f55b0
use more peers if less than 3 are connected
vyzo Oct 4, 2018
852f4e0
adjust AutoNATRetryInterval in autonat tests
vyzo Oct 4, 2018
9c8ee52
increase identify delay to 500ms
vyzo Oct 12, 2018
b2c65b0
jenkins file
vyzo Oct 12, 2018
8d2e2ae
add README
vyzo Oct 12, 2018
9ef3734
reduce depgraph to just go-libp2p, update to 6.0.19
vyzo Oct 12, 2018
6a3a9cb
Add configurable Identify delay, with default value of 5 secs
vyzo Oct 16, 2018
67bccae
add docstring for confidence
vyzo Oct 16, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
70 changes: 70 additions & 0 deletions addr.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
package autonat

import (
"net"

ma "github.com/multiformats/go-multiaddr"
)

var private4, private6 []*net.IPNet
var privateCIDR4 = []string{
// localhost
"127.0.0.0/8",
// private networks
"10.0.0.0/8",
"100.64.0.0/10",
"172.16.0.0/12",
"192.168.0.0/16",
// link local
"169.254.0.0/16",
}
var privateCIDR6 = []string{
// localhost
"::1/128",
// ULA reserved
"fc00::/7",
// link local
"fe80::/10",
}

func init() {
private4 = parsePrivateCIDR(privateCIDR4)
private6 = parsePrivateCIDR(privateCIDR6)
}

func parsePrivateCIDR(cidrs []string) []*net.IPNet {
ipnets := make([]*net.IPNet, len(cidrs))
for i, cidr := range cidrs {
_, ipnet, err := net.ParseCIDR(cidr)
if err != nil {
panic(err)
}
ipnets[i] = ipnet
}
return ipnets
}

func isPublicAddr(a ma.Multiaddr) bool {
ip, err := a.ValueForProtocol(ma.P_IP4)
if err == nil {
return !inAddrRange(ip, private4)
}

ip, err = a.ValueForProtocol(ma.P_IP6)
if err == nil {
return !inAddrRange(ip, private6)
}

return false
}

func inAddrRange(s string, ipnets []*net.IPNet) bool {
ip := net.ParseIP(s)
for _, ipnet := range ipnets {
if ipnet.Contains(ip) {
return true
}
}

return false
}
160 changes: 160 additions & 0 deletions autonat.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
package autonat

import (
"context"
"errors"
"math/rand"
"sync"
"time"

host "github.com/libp2p/go-libp2p-host"
peer "github.com/libp2p/go-libp2p-peer"
ma "github.com/multiformats/go-multiaddr"
)

type NATStatus int

const (
NATStatusUnknown NATStatus = iota
NATStatusPublic
NATStatusPrivate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "no nat"? Do we need that state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does that state mean though? We have Uknown and Public -- no nat is equivalent to public.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was thinking:

  • NatStatusPrivate -> Behind a nat and undiablable.
  • NatStatusPublic -> Behind a nat and dialable.

(although we may not need to track the undialable case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more of "dialable" or not "dialable".
Do we gain anything by knowing that there is no NAT whatsoever?
Note that the inference might be hard to make.

)

var (
AutoNATBootDelay = 15 * time.Second
AutoNATRefreshInterval = 15 * time.Minute

AutoNATRequestTimeout = 60 * time.Second
)

type AutoNAT interface {
Status() NATStatus
PublicAddr() (ma.Multiaddr, error)
}

type AmbientAutoNAT struct {
magik6k marked this conversation as resolved.
Show resolved Hide resolved
ctx context.Context
host host.Host
peers map[peer.ID]struct{}
status NATStatus
addr ma.Multiaddr
mx sync.Mutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We started using anonymous/nested struct pattern to describe what is protected by mutex.
I am not 100% sure if the peers map is the only thing but if it is it would look:

peers struct {
    sync.Mutex
    set map[peer.ID]struct{}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@vyzo vyzo May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mutex protects the state/address as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the mutex hat idiom maybe clearer then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mutex hat in cd7a875

}

func NewAutoNAT(ctx context.Context, h host.Host) AutoNAT {
as := &AmbientAutoNAT{
ctx: ctx,
host: h,
peers: make(map[peer.ID]struct{}),
status: NATStatusUnknown,
}

h.Network().Notify(as)
go as.background()

return as
}

func (as *AmbientAutoNAT) Status() NATStatus {
return as.status
}

func (as *AmbientAutoNAT) PublicAddr() (ma.Multiaddr, error) {
as.mx.Lock()
defer as.mx.Unlock()

if as.status != NATStatusPublic {
return nil, errors.New("NAT Status is not public")
}

return as.addr, nil
}

func (as *AmbientAutoNAT) background() {
// wait a bit for the node to come online and establish some connections
// before starting autodetection
time.Sleep(AutoNATBootDelay)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that important but this should probably select in a context and a time.After.

for {
as.autodetect()
select {
case <-time.After(AutoNATRefreshInterval):
case <-as.ctx.Done():
return
}
}
}

func (as *AmbientAutoNAT) autodetect() {
peers := as.getPeers()

if len(peers) == 0 {
log.Debugf("skipping NAT auto detection; no autonat peers")
return
}

cli := NewAutoNATClient(as.host)

for _, p := range peers {
ctx, cancel := context.WithTimeout(as.ctx, AutoNATRequestTimeout)
a, err := cli.Dial(ctx, p)
cancel()

switch {
case err == nil:
log.Debugf("NAT status is public; address through %s: %s", p.Pretty(), a.String())
as.mx.Lock()
as.addr = a
as.status = NATStatusPublic
as.mx.Unlock()
return

case IsDialError(err):
log.Debugf("NAT status is private; dial error through %s: %s", p.Pretty(), err.Error())
as.mx.Lock()
as.status = NATStatusPrivate
as.mx.Unlock()
return

default:
log.Debugf("Error dialing through %s: %s", p.Pretty(), err.Error())
}
}

as.mx.Lock()
as.status = NATStatusUnknown
as.mx.Unlock()
}

func (as *AmbientAutoNAT) getPeers() []peer.ID {
as.mx.Lock()
defer as.mx.Unlock()

if len(as.peers) == 0 {
return nil
}

peers := make([]peer.ID, 0, len(as.peers))
for p := range as.peers {
if len(as.host.Network().ConnsToPeer(p)) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Network().Connectedness() == inet.Connected avoids allocating.

peers = append(peers, p)
}
}

if len(peers) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this black or white decision could yield erratic results, e.g. if you only have 1 active connection to an autonat peer, we're going to restrict our query to a single peer. I much rather have a minimum threshold we strive for, e.g. 5 peers, for resilience purposes, starting with peers we hold a connection to.

As it is, the autodetect does a round-robin, so no risk of establishing redundant connections if we shuffle the connected and non-connected sublists separately, i.e.

A..G: connected
U..Z: not connected

A B C D E F G || U V W X Y Z
<- shuffle ->   <- shuffle ->

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow. The code tries to use an already existing connection purely to avoid creating unnecessary new connections.
Do you want to try multiple peers? And each peer multiple times?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code tries to use an already existing connection purely to avoid creating unnecessary new connections.

Currently, if we happen to be connected to 1 autonat peer only, we'll restrict ourselves to it. If it fails, we're out of luck. This makes us fragile, especially because we expect scarcity in autonat peers.

What I'm proposing is to target N peers (e.g. 5), preferring connected peers, and falling back to disconnected ones to fill up the slice. To avoid connected and unconnected peers getting mixed up in the shuffle, we keep track of the pivot index and shuffle both sublists separately.

Since autodetect is round-robin, it'll only resort to disconnected peers if the connected ones fail. This makes us more resilient overall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, but do we want to dial more than one peers when we get a DIAL_ERROR?

Copy link
Member

@raulk raulk Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. I'm not sure I have the answer. Right now we flag NATStatusPrivate on the first dial err, and abort. However, what if that peer is behind some kind of firewall (corporate, geographical, etc.)? Wouldn't it be better to corroborate with more observations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could do a few more tries if we have more known autonat peers, but accept the failure if we don't have enough.
I would arbitrarily go for "3 times is enemy action".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could do a few more tries if we have more known autonat peers, but accept the failure if we don't have enough.

Yep. If we don't have enough, we'd defer to the next iteration. If by then we've found more autonat peers, with this new logic we'll query them even if not connected, and hence have a chance to improve our connectivity.

I would arbitrarily go for "3 times is enemy action".

We detect "enemy action" on the receiving side through the throttling, no? (3 is fine for that)

That makes me realise that we should probably move peers who have sent us E_DIAL_REFUSED or E_INTERNAL_ERROR to a blacklist, to avoid dwelling on them, and to be well-behaved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably too much complexity for marginal improvement :)

Also, I think I want some slightly more clever strategy for making multiple dial attempts -- if our nat status was unknown or public, then try 3 times.
If it was private, then a single failure should be enough to convince us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the "3 times is enemy action" strategy in aadb8db, with memory of past failures so that it stops asking multiple peers once it has enough confidence we are NATed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In d7f55b0 we ensure that we have at least 3 autonat peers in the candidate set, even when we are connected to less than that.
This uses the strategy you suggested for ordering.

// we don't have any open connections, try any autonat peer that we know about
for p := range as.peers {
peers = append(peers, p)
}
}

shufflePeers(peers)

return peers
}

func shufflePeers(peers []peer.ID) {
for i := range peers {
j := rand.Intn(i + 1)
peers[i], peers[j] = peers[j], peers[i]
}
}
91 changes: 91 additions & 0 deletions client.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,91 @@
package autonat

import (
"context"
"fmt"

pb "github.com/libp2p/go-libp2p-autonat/pb"

ggio "github.com/gogo/protobuf/io"
host "github.com/libp2p/go-libp2p-host"
inet "github.com/libp2p/go-libp2p-net"
peer "github.com/libp2p/go-libp2p-peer"
pstore "github.com/libp2p/go-libp2p-peerstore"
ma "github.com/multiformats/go-multiaddr"
)

type AutoNATClient interface {
Dial(ctx context.Context, p peer.ID) (ma.Multiaddr, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this something else? When I see "Dial" I think "establish a connection". When I saw this function used in the code, I had absolutely no idea why dialing a peer would tell us anything about our NAT status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will rename.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Called it DialBack.

}

type AutoNATError struct {
Status pb.Message_ResponseStatus
Text string
}

func NewAutoNATClient(h host.Host) AutoNATClient {
return &client{h: h}
}

type client struct {
h host.Host
}

func (c *client) Dial(ctx context.Context, p peer.ID) (ma.Multiaddr, error) {
s, err := c.h.NewStream(ctx, p, AutoNATProto)
if err != nil {
return nil, err
}
defer s.Close()

r := ggio.NewDelimitedReader(s, inet.MessageSizeMax)
w := ggio.NewDelimitedWriter(s)

req := newDialMessage(pstore.PeerInfo{ID: c.h.ID(), Addrs: c.h.Addrs()})
err = w.WriteMsg(req)
if err != nil {
return nil, err
}

var res pb.Message
err = r.ReadMsg(&res)
if err != nil {
return nil, err
}

if res.GetType() != pb.Message_DIAL_RESPONSE {
return nil, fmt.Errorf("Unexpected response: %s", res.GetType().String())
}

status := res.GetDialResponse().GetStatus()
switch status {
case pb.Message_OK:
addr := res.GetDialResponse().GetAddr()
return ma.NewMultiaddrBytes(addr)

default:
return nil, AutoNATError{Status: status, Text: res.GetDialResponse().GetStatusText()}
}
}

func (e AutoNATError) Error() string {
return fmt.Sprintf("AutoNAT error: %s (%s)", e.Text, e.Status.String())
}

func (e AutoNATError) IsDialError() bool {
return e.Status == pb.Message_E_DIAL_ERROR
}

func (e AutoNATError) IsDialRefused() bool {
return e.Status == pb.Message_E_DIAL_REFUSED
}

func IsDialError(e error) bool {
ae, ok := e.(AutoNATError)
return ok && ae.IsDialError()
}

func IsDialRefused(e error) bool {
ae, ok := e.(AutoNATError)
return ok && ae.IsDialRefused()
}
31 changes: 31 additions & 0 deletions notify.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
package autonat

import (
inet "github.com/libp2p/go-libp2p-net"
peer "github.com/libp2p/go-libp2p-peer"
ma "github.com/multiformats/go-multiaddr"
)

var _ inet.Notifiee = (*AmbientAutoNAT)(nil)

func (as *AmbientAutoNAT) Listen(net inet.Network, a ma.Multiaddr) {}
func (as *AmbientAutoNAT) ListenClose(net inet.Network, a ma.Multiaddr) {}
func (as *AmbientAutoNAT) OpenedStream(net inet.Network, s inet.Stream) {}
func (as *AmbientAutoNAT) ClosedStream(net inet.Network, s inet.Stream) {}

func (as *AmbientAutoNAT) Connected(net inet.Network, c inet.Conn) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we really don't need a large set of peers that support this protocol. Instead of testing every one, How about we:

a. Keep a list of known autonat peers (discovered as we try to use them, not when we first connect).
b. Keep a list of known non-autonat peers.

Then, periodically*, we can:

  1. Try every connected peer in the known good set.
  2. Try every open connection not in the bad set, adding peers to the good set and bad set as we try them.

That way we aren't unnecessarily noisy.

*later, we can get even fancier and set the period to be "time since last inbound connection from a public address", or something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can reduce the noise by simply checking on the protocols reported by identify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to look at the protocols reported by identify through the peerstore in 46d352f
I don't quite like the delay for identify, but it seems to be necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite like the delay for identify, but it seems to be necessary.

Yeah... that annoys me to me to no end as well.

Copy link
Member

@raulk raulk Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identify keeps the stream open after identification. If we made it close the stream, we could hook onto the ClosedStream event to know when identify was over deterministically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's really not. I know it should work, but that's just horrible.

Copy link
Member

@raulk raulk Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but that's all we have now 😅

Should we think about setting up some kind of "in-mem event bus" so that different layers of libp2p can emit and react to events? Identify would then emit a protocols:identify/1.0.0:complete event when it finished, or a ...:error one if it failed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... Ideally services would just hook into identify (or the peerstore? that doesn't seem right) and get called when we connect to a peer supporting protocol X.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the delay configurable, with an initial value of 5 sec (per @magik6k's suggestion)

go func(p peer.ID) {
s, err := as.host.NewStream(as.ctx, p, AutoNATProto)
if err != nil {
return
}
s.Close()

log.Infof("Discovered AutoNAT peer %s", p.Pretty())
as.mx.Lock()
as.peers[p] = struct{}{}
as.mx.Unlock()
}(c.RemotePeer())
}

func (as *AmbientAutoNAT) Disconnected(net inet.Network, c inet.Conn) {}