Connection errors found when using listening address '0.0.0.0'. #7961

Closed
disksing opened this issue May 22, 2017 · 13 comments

@disksing
Contributor


Hey,
I'm trying to run etcd with the listen address '0.0.0.0', and the etcd server ends up repeatedly logging the error 'transport: dial tcp [::]:3379: connect: network is unreachable'.
I'm not sure where the transport is started, but I guess it should dial the advertised address instead of the listen address.


Output:

$ ./etcd --listen-client-urls 'http://0.0.0.0:3379' --advertise-client-urls 'http://192.168.199.118:3379' --listen-peer-urls 'http://192.168.199.118:3380'
2017-05-22 12:01:42.648920 I | etcdmain: etcd Version: 3.2.0-rc.1+git
2017-05-22 12:01:42.648961 I | etcdmain: Git SHA: 4cd5e7e
2017-05-22 12:01:42.648966 I | etcdmain: Go Version: go1.8
2017-05-22 12:01:42.648971 I | etcdmain: Go OS/Arch: linux/amd64
2017-05-22 12:01:42.648976 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
2017-05-22 12:01:42.648985 W | etcdmain: no data-dir provided, using default data-dir ./default.etcd
2017-05-22 12:01:42.649014 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-05-22 12:01:42.649116 I | embed: listening for peers on http://192.168.199.118:3380
2017-05-22 12:01:42.649190 I | embed: listening for client requests on 0.0.0.0:3379
2017-05-22 12:01:42.650673 I | etcdserver: name = default
2017-05-22 12:01:42.650684 I | etcdserver: data dir = default.etcd
2017-05-22 12:01:42.650689 I | etcdserver: member dir = default.etcd/member
2017-05-22 12:01:42.650694 I | etcdserver: heartbeat = 100ms
2017-05-22 12:01:42.650699 I | etcdserver: election = 1000ms
2017-05-22 12:01:42.650704 I | etcdserver: snapshot count = 100000
2017-05-22 12:01:42.650712 I | etcdserver: advertise client URLs = http://192.168.199.118:3379
2017-05-22 12:01:42.651090 I | etcdserver: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 4
2017-05-22 12:01:42.651118 I | raft: 8e9e05c52164694d became follower at term 2
2017-05-22 12:01:42.651129 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 2, commit: 4, applied: 0, lastindex: 4, lastterm: 2]
2017-05-22 12:01:42.702847 W | auth: simple token is not cryptographically signed
2017-05-22 12:01:42.755776 I | etcdserver: starting server... [version: 3.2.0-rc.1+git, cluster version: to_be_decided]
2017-05-22 12:01:42.756689 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2017-05-22 12:01:42.756811 N | etcdserver/membership: set the initial cluster version to 3.2
2017-05-22 12:01:42.756880 I | etcdserver/api: enabled capabilities for version 3.2
2017-05-22 12:01:43.651339 I | raft: 8e9e05c52164694d is starting a new election at term 2
2017-05-22 12:01:43.651525 I | raft: 8e9e05c52164694d became candidate at term 3
2017-05-22 12:01:43.651595 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 3
2017-05-22 12:01:43.651656 I | raft: 8e9e05c52164694d became leader at term 3
2017-05-22 12:01:43.651699 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 3
2017-05-22 12:01:43.652030 I | etcdserver: published {Name:default ClientURLs:[http://192.168.199.118:3379]} to cluster cdf818194e3a8c32
2017-05-22 12:01:43.652127 I | embed: ready to serve client requests
2017-05-22 12:01:43.652565 N | embed: serving insecure client requests on [::]:3379, this is strongly discouraged!
2017-05-22 12:01:43.652762 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: connect: network is unreachable"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 12:01:43.652860 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: connect: network is unreachable"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 12:01:43.652946 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: connect: network is unreachable"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 12:01:43.653028 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: connect: network is unreachable"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 12:01:43.653106 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: connect: network is unreachable"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 12:01:43.653189 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: connect: network is unreachable"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 12:01:43.653266 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: connect: network is unreachable"; Reconnecting to {[::]:3379 <nil>}
@siddontang
Contributor

/cc @xiang90

@heyitsanthony
Contributor

Probably related to the json gateway. I can partially reproduce this on osx, but not the exact error:

$ nc -l localhost 3379 &
$  ./bin/etcd  --listen-client-urls 'http://0.0.0.0:3379' --advertise-client-urls 'http://192.168.199.118:3379' --listen-peer-urls 'http://0.0.0.0:3380' &
...
2017-05-22 11:13:42.231696 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: getsockopt: connection reset by peer"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 11:13:42.231762 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: getsockopt: connection reset by peer"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 11:14:02.231361 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: i/o timeout"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 11:14:02.231429 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: i/o timeout"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 11:14:02.231458 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: i/o timeout"; Reconnecting to {[::]:3379 <nil>}
2017-05-22 11:14:02.231683 I | etcdserver/api/v3rpc: grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp [::]:3379: i/o timeout"; Reconnecting to {[::]:3379 <nil>}

No luck on Linux; I get 'etcdmain: listen tcp 0.0.0.0:3379: bind: address already in use' instead. What Linux distro is this? Is there any special configuration or anything else running on the system?

@gyuho
Contributor

gyuho commented May 22, 2017

On a Linux machine, I've tried both the master branch and 3.2.0-rc.1, and both work fine for me.

@disksing
Contributor Author

@heyitsanthony You could try 2379 instead; I'm using 3379 only because another etcd instance already occupies 2379.

@gyuho Reproducing the error seems to depend on the system or network configuration somehow. I've also noticed that on some systems it's fine to dial '0.0.0.0' to establish a connection. But I think the real problem is that we should always dial the advertised address instead of 0.0.0.0.

@disksing
Contributor Author

https://github.com/coreos/etcd/blob/f75e33326406a4f05bd2a92ddf214ec4018e5b0b/embed/serve.go#L167 may be related to this issue. It uses the listener's addr to register some handlers.

@heyitsanthony
Contributor

@disksing yes, it's from the grpc json gateway dial-out. I'd like to have a way to reproduce this bug instead of blindly using the advertise address and saying it's fixed.

@disksing
Contributor Author

@heyitsanthony If you cannot easily reproduce the problem, perhaps you could apply the fix blindly and let me help test it.

@heyitsanthony
Contributor

@disksing I really need to know the root cause here. I don't think using the advertise address is the right fix: the machine could be NATed in such a way that going through the advertise address would need more hops than serving directly off the listen address, or it might not be able to connect at all.

@butellicarlo

butellicarlo commented Jun 28, 2017

+1
Which is the last version that works? Is there an estimate of when this issue will be fixed?

Thanks

@ghovat

ghovat commented Jun 28, 2017

+1
We really need this fix.

@heyitsanthony
Contributor

@ghovat @carlitos26 is there a way to reliably reproduce this?

heyitsanthony added a commit to heyitsanthony/etcd that referenced this issue Jul 6, 2017
net.Listener says its address is [::] when given 0.0.0.0, breaking
hosts that have ipv6 disabled.

Fixes etcd-io#8151
Fixes etcd-io#7961
gyuho pushed a commit that referenced this issue Jul 7, 2017
net.Listener says its address is [::] when given 0.0.0.0, breaking
hosts that have ipv6 disabled.

Fixes #8151
Fixes #7961
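
The commit message attributes the failure to net.Listener reporting [::] for a listener that was configured as 0.0.0.0. A minimal Go sketch of one way to sidestep that mismatch follows, assuming the in-process client may dial the host the user configured. dialAddr is a hypothetical helper for illustration and is not claimed to be the code in the referenced commit.

```go
package main

import (
	"fmt"
	"net"
)

// dialAddr is a hypothetical helper (not etcd's API). It keeps the host the
// user configured and takes the port from the address the listener reports,
// so a configured port of 0 still resolves to the real port.
func dialAddr(configured, reported string) (string, error) {
	host, _, err := net.SplitHostPort(configured)
	if err != nil {
		return "", fmt.Errorf("configured address %q: %v", configured, err)
	}
	_, port, err := net.SplitHostPort(reported)
	if err != nil {
		return "", fmt.Errorf("reported address %q: %v", reported, err)
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	// The configured value comes from the flags in the report; the reported
	// value is the listener address seen in the log lines.
	addr, err := dialAddr("0.0.0.0:3379", "[::]:3379")
	if err != nil {
		fmt.Println("bad address:", err)
		return
	}
	fmt.Println("dial target:", addr)
}
```

Whether dialing 0.0.0.0 itself succeeds is platform dependent, as noted earlier in the thread, so this only removes the IPv6 address from the picture and does not guarantee connectivity.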
@tarvitz

tarvitz commented Aug 11, 2017

Got this bug running etcd from the alpine:edge testing repository. I don't know whether it was fixed in version 3.2.1, but here's how to reproduce it:

etcd --version

etcd Version: 3.2.1
Git SHA: 16b3950000
Go Version: go1.8.3
Go OS/Arch: linux/amd64

Dockerfile.alpine (nfox/etcd:alpine docker image):

FROM alpine:edge

RUN set -x \
    && sed -e 's/main/testing/' -i /etc/apk/repositories \
    && apk update \
    && apk add etcd etcd-ctl

CMD ["etcd"]

Docker container run script:

HOST_IP=0.0.0.0

docker run -d -p 4001:4001 -p 2380:2380 -p 2379:2379 \
  -v /usr/share/ca-certificates/:/etc/ssl/certs \
  --name etcd nfox/etcd:alpine etcd \
  --name etcd0 \
  --advertise-client-urls http://${HOST_IP}:2379,http://${HOST_IP}:4001 \
  --listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
  --initial-advertise-peer-urls http://${HOST_IP}:2380 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcd0=http://${HOST_IP}:2380 \
  --initial-cluster-state new

@heyitsanthony
Contributor

@tarvitz this is fixed in 3.2.2

yudai pushed a commit to yudai/etcd that referenced this issue Oct 5, 2017
net.Listener says its address is [::] when given 0.0.0.0, breaking
hosts that have ipv6 disabled.

Fixes etcd-io#8151
Fixes etcd-io#7961