Failed to get final advertise address: No private IP address found, and explicit IP not provided #615

Closed
durandom opened this issue Nov 6, 2018 · 6 comments

Comments

@durandom
Contributor

durandom commented Nov 6, 2018

Trying to replicate the example at https://github.com/improbable-eng/thanos/tree/master/kube#long-term-storage-setup on OpenShift 3.9. On a minishift environment it works, but on a managed setup I'm running into problems with the gossip component.

The sidecar is started with:

sidecar --cluster.peers=thanos-peers.aiops-dev-prometheus-lts.svc:10900

And the logs produce:

level=warn ts=2018-11-06T11:22:35.773034959Z caller=runutil.go:69 component=sidecar msg="detected close error" err="store gRPC listener: close tcp [::]:10901: use of closed network connection"
level=error ts=2018-11-06T11:22:35.776183146Z caller=main.go:171 msg="running command failed" err="join cluster: create memberlist: Failed to get final advertise address: No private IP address found, and explicit IP not provided"

Being a bit more verbose doesn't improve things:

sidecar --cluster.address=0.0.0.0:10900 --cluster.peers=thanos-peers.aiops-dev-prometheus-lts.svc:10900 --cluster.advertise-address=thanos-peers.aiops-dev-prometheus-lts.svc

Seems thanos has problems binding to 0.0.0.0 ?!

thanos:master-2018-11-05-78e412c
OpenShift 3.9

cc @vpavlin @bwplotka @brancz

@durandom
Contributor Author

durandom commented Nov 6, 2018

Here is an internal e-mail thread on the same problem a couple of months ago:

> 
> Thanos, on the other hand, fails with error:
> 
> level=error ts=2018-05-18T13:38:29.043484643Z caller=main.go:147 msg="running command failed" err="join cluster: create memberlist: Failed to get final advertise address: No private IP address found, and explicit IP not provided"
> 
> I tracked the error message down to
> 
> https://github.com/hashicorp/memberlist/blob/master/net_transport.go#L151

In this case, the NetTransport object is being created with a
BindAddress of 0.0.0.0 (which isn't uncommon) which just says "serve on
any IP address".

Guessing from the code it tries to advertise itself via a non-public IP
address, which it needs to find out since it wasn't given one.  Which
is what the sockaddr.GetPrivateIP() in FinalAdvertiseAddr() tries to
do.  GetPrivateIP() finds all IPs on the system and then filters
certain ones out.  In this case, anything in this list is filtered out:

https://github.com/hashicorp/go-sockaddr/blob/6d291a969b86c4b633730bfc6b8b9d64c3aafed9/rfc.go#L908

It then requires the IP be in this list:

https://github.com/hashicorp/go-sockaddr/blob/6d291a969b86c4b633730bfc6b8b9d64c3aafed9/rfc.go#L366


So basically, if I understand correctly, you need to either:

1) configure Thanos to use an explicit advertisement address, which
would be hard if it's running in a container since you don't always
know what that address would be


Yeah, I am trying to run it on Upshift, so I have no control over the pod IP and no way to guess it. Maybe as a workaround I could get the pod's IP address in an entrypoint script and set it directly.

2) convince Thanos or Hashicorp's memberlist library to lessen
restrictions on the advertisement address autodetection

I've seen an issue mentioning that, but cannot find it now :( I'll probably file an issue for it and see what they come back with.

3) re-number your hosts or container network (is Thanos running in the
host net namespace?) to be in this list:

10.0.0.0/8
100.64.0.0/10
172.16.0.0/12
192.88.99.0/24
192.168.0.0/16
198.18.0.0/15

Thanos is not running in host net.

I cannot influence networking in the cluster 


Any idea what the IP address of the Thanos container is (or, if it's
hostnetwork, the node's IP addresses)?

IPs I generally get are

172.50.0.0/16


@vpavlin

vpavlin commented Nov 7, 2018

To sum up, the problem is that our cluster's pod network does not fit into any of the subnets allowed by go-sockaddr. As mentioned above, one option is to lift the restriction. Another option might be to let the user configure the range and use that instead of the built-in list of ranges.
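For anyone who wants to check their own pod network against the list quoted above, here is a minimal sketch using Python's standard `ipaddress` module. It only models the "must be in this list" step from the e-mail, not the exclusion list or anything else go-sockaddr does. The function name and the sample addresses are hypothetical; `172.50.1.23` is just an arbitrary address inside the `172.50.0.0/16` range reported above.

```python
import ipaddress

# Ranges quoted earlier in this thread as acceptable for autodetection.
ALLOWED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "100.64.0.0/10",
        "172.16.0.0/12",
        "192.88.99.0/24",
        "192.168.0.0/16",
        "198.18.0.0/15",
    )
]


def matching_range(ip):
    """Return the first listed range that contains ip, or None if none does."""
    addr = ipaddress.ip_address(ip)
    for net in ALLOWED_RANGES:
        if addr in net:
            return net
    return None


# Hypothetical pod IP from the 172.50.0.0/16 network: no range matches,
# because 172.16.0.0/12 only covers 172.16.0.0 through 172.31.255.255.
print(matching_range("172.50.1.23"))  # None

# Hypothetical address that does fall inside a listed range.
print(matching_range("172.17.0.5"))   # 172.16.0.0/12
```

If the first call returns `None` for your pod IPs, autodetection with a `0.0.0.0` bind address would presumably hit the same "No private IP address found" error.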

@bwplotka
Member

bwplotka commented Nov 8, 2018

OK, so a private IP address is required for the gossip cluster, and simply getting the IP address with the sockaddr function does not work here. So we have a couple of options to unblock you:

  1. Since you use Kubernetes, we could maybe set cluster.address=$(POD_IP) and have in envs:
     env:
       - name: POD_IP
         valueFrom:
           fieldRef:
             fieldPath: status.podIP
  2. We could add more sophisticated ways of configuring/deducing the private IP.
  3. Just don't use gossip, as it will be removed soon.

I would vote for number 3, especially when using Kubernetes. The only question is how to turn off the gossip logic for a component. I think we need a PR for it with some flag like cluster.disable.
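As a rough illustration of option 1 (and of the entrypoint-script workaround mentioned earlier), a wrapper could read `POD_IP` from the environment and build the sidecar arguments from it. This is only a sketch under assumptions: the helper name `sidecar_args` is hypothetical, the flags and port 10900 are the ones already used in this issue, and whether binding to the pod IP is enough to get past the advertise-address check has not been verified here.

```python
def sidecar_args(env, peers, port=10900):
    """Build a sidecar command line that binds gossip to the pod's own IP.

    env   -- mapping of environment variables (e.g. os.environ)
    peers -- value for --cluster.peers, as used earlier in this issue
    port  -- gossip port; 10900 is the one used in this issue
    """
    pod_ip = env.get("POD_IP")
    if not pod_ip:
        # Without the env entry shown above there is nothing to bind to.
        raise ValueError("POD_IP is not set; add the fieldRef env entry to the pod spec")
    return [
        "sidecar",
        f"--cluster.address={pod_ip}:{port}",
        f"--cluster.peers={peers}",
    ]


# Demonstration with a hypothetical pod IP instead of the real environment.
demo_env = {"POD_IP": "172.50.1.23"}
print(" ".join(sidecar_args(demo_env, "thanos-peers.aiops-dev-prometheus-lts.svc:10900")))
```

In a real entrypoint you would pass `os.environ` and hand the resulting list to something like `os.execvp`; the demo only prints the command so it can be inspected first.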

@durandom
Contributor Author

durandom commented Nov 8, 2018

I'm good with 3.
I don't have a cluster setup anyway, I'm only interested in the storage components.

@mreichardt95
Contributor

All Thanos components now have a --cluster.disable flag (#652), which disables all gossip logic. Does this solve your issue?

@povilasv
Member

Gossip code was removed, I suggest you try configuring with static store nodes. Closing this issue as it's related to gossip.
