Mesos-master sees framework registration requests as coming from localhost #193

Closed
stelcheck opened this issue Mar 8, 2014 · 11 comments

Comments

@stelcheck

I have a 3-node Mesos cluster on which I run the masters, slaves, and Marathon. For the moment I run Chronos only on node1.mesos1.dev. If the active Mesos master is not on the same node as Chronos, I get the following in the master log:

Mar  8 01:39:26 node3.mesos1.dev mesos-master[17463]: I0308 01:39:26.158910 17481 master.cpp:818] Received registration request from scheduler(1)@127.0.0.1:36198
Mar  8 01:39:26 node3.mesos1.dev mesos-master[17463]: I0308 01:39:26.159025 17481 master.cpp:823] Framework 2014-03-08-01:01:57-3053457580-5050-17463-0000 (scheduler(1)@127.0.0.1:36198) already registered, resending acknowledgement

This is how I run Chronos:

java -cp /opt/chronos/target/chronos-2.1.0_mesos-0.14.0-rc4.jar com.airbnb.scheduler.Main --master zk://172.16.0.180:2181,172.16.0.181:2181,172.16.0.182:2181/mesos --zk_hosts 172.16.0.180:2181,172.16.0.181:2181,172.16.0.182:2181 --hostname 172.16.0.180 --http_port 4400

Marathon does not seem to suffer from this issue, so my guess is that the problem lies in how Chronos advertises itself.

@vinodkone
Member

Looks like the Chronos scheduler driver is binding to a local address (127.0.0.1), so the registration cannot succeed. You can force the driver to use a public address by setting the LIBPROCESS_IP environment variable.

e.g.:
LIBPROCESS_IP=172.16.0.180 java -cp /opt/chronos/target/chronos-2.1.0_mesos-0.14.0-rc4.jar com.airbnb.scheduler.Main --master zk://172.16.0.180:2181,172.16.0.181:2181,172.16.0.182:2181/mesos --zk_hosts 172.16.0.180:2181,172.16.0.181:2181,172.16.0.182:2181 --hostname 172.16.0.180 --http_port 4400
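
On hosts with more than one network interface, the address to export can be chosen by subnet instead of being hard-coded. A minimal sketch, assuming the cluster lives on 172.16.x.x as in the command above; `pick_ip_with_prefix` is a hypothetical helper name, not something shipped with Mesos or Chronos:

```shell
# Hypothetical helper: print the first candidate address that starts with
# the given prefix (e.g. "172.16."), so a loopback or NAT address is skipped.
# Usage: pick_ip_with_prefix PREFIX ADDR...
pick_ip_with_prefix() {
    prefix=$1
    shift
    for addr in "$@"; do
        case "$addr" in
            "$prefix"*)
                printf '%s\n' "$addr"
                return 0
                ;;
        esac
    done
    return 1    # no candidate is on the requested subnet
}
```

For example, `LIBPROCESS_IP="$(pick_ip_with_prefix 172.16. $(hostname -I))"` could stand in for the hard-coded value in front of the `java` command; on Linux, `hostname -I` lists the host's configured addresses.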

@brndnmtthws
Member

Is it possible that the master is reporting itself as being 127.0.0.1 to ZK instead of a reachable address?

@vinodkone
Member

It is possible, but that doesn't look like the case from the master log above. The master received the registration request from the remote scheduler, but its reply never reached the scheduler because it was sent to scheduler(1)@127.0.0.1:36198, which from the master's node is the master's own loopback interface.
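
To spot this in a master log, the address a scheduler advertises can be pulled out of the `name@ip:port` endpoint string and checked for loopback. A small sketch; `endpoint_ip` and `is_loopback_endpoint` are hypothetical helper names used only for illustration:

```shell
# Hypothetical helper: extract the IP from an endpoint string such as
# "scheduler(1)@127.0.0.1:36198" as it appears in the master log.
endpoint_ip() {
    hostport=${1#*@}                  # drop everything up to and including '@'
    printf '%s\n' "${hostport%:*}"    # drop the trailing ':port'
}

# Exit status 0 when the endpoint advertises a 127.x.x.x address, i.e. an
# address that a master on another node cannot use to send its reply.
is_loopback_endpoint() {
    case "$(endpoint_ip "$1")" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

`is_loopback_endpoint 'scheduler(1)@127.0.0.1:36198'` succeeds for the endpoint in the log above, which suggests the driver needs LIBPROCESS_IP set.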

@brndnmtthws
Member

Yeah, that sounds right.

I would also suggest explicitly passing the IP to the mesos processes (which is what we do):

# mesos-master --ip=10.10.10.10 <args...>

@stelcheck
Author

@vinodkone I'll give LIBPROCESS_IP a shot. That being said, I guess I should regard this as a temporary workaround?

@brndnmtthws both masters and slaves are already running with the IP configured.

@stelcheck
Author

@vinodkone Setting LIBPROCESS_IP=myip does seem to fix the issue.

That being said, it does look like something might be wrong in Mesos itself, at least in the RC currently available (using 0.16.0 seems to do the trick for me when LIBPROCESS_IP is set).

I don't think I mentioned this before, but the Vagrant machines I am working on are set up with two network interfaces. Could that be a reason for this issue?

@ian-kent

I get the same, and it also affects Marathon. I have the Mesos master and slaves configured to use the right IP, but there is no obvious way to do that with Marathon or Chronos (I will try the LIBPROCESS_IP trick). Also using Vagrant!

@suchisubhra

I am having the same issue with the Jenkins Mesos plugin. It sees 127.0.0.1. I set LIBPROCESS_IP=myip and then start Jenkins with "service jenkins start". Any help appreciated.

@vinodkone
Member

@suchisubhra setting LIBPROCESS_IP in the environment should make the plugin pick up the IP address. If it is not picking it up, I suspect the environment is getting cleared somewhere along the way of the plugin startup.
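
One way to check whether the variable survived startup is to read the environment of the running process. A minimal sketch, assuming a Linux host where `/proc/<pid>/environ` holds the NUL-separated environment the process was started with; `environ_value` is a hypothetical helper name:

```shell
# Hypothetical helper: print the value of variable NAME from a NUL-separated
# environment dump such as /proc/<pid>/environ.
# Usage: environ_value FILE NAME
environ_value() {
    # Turn NUL separators into newlines, then keep only NAME's value.
    tr '\0' '\n' < "$1" | sed -n "s/^$2=//p" | head -n 1
}
```

For a Jenkins process with PID 1234 (a made-up PID), `environ_value /proc/1234/environ LIBPROCESS_IP` would print the address if it was inherited. Empty output means the variable was dropped before the JVM started. Reading another user's `environ` file usually requires root.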

@corgidesu

This is pretty old, but it popped up when I was searching for a very similar issue, and I figured this might help anyone having similar problems.

My environment is Ubuntu Server (14.04, Trusty Tahr I think is what it's called) running 4 small VM clusters: 1 ZK, 1 Master, and 3 Slaves.

As @vinodkone explained, setting LIBPROCESS_IP helped me solve the first problem of 127.0.0.1 appearing in the Master logs, but it didn't help with the re-registering logs in the Master.

What I found was that Ubuntu has a pre-installed firewall pretty much blocking all access to all ports. The master tried to connect to an open port on my slave process (in my case it was port 47250). The slave was listening on that port, but because of the firewall the Master couldn't get through. By poking a hole on port 47250 the whole thing magically worked for me and the process successfully completed.
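
A quick way to tell a firewall problem from an address problem is to probe the port from the master's node. A rough sketch, assuming `bash` and the coreutils `timeout` command are installed; `port_open` is a hypothetical helper name, and the `/dev/tcp` path is a bash redirection feature rather than a real file:

```shell
# Hypothetical helper: exit status 0 if a TCP connection to HOST:PORT can be
# opened within 3 seconds, non-zero if it is refused, filtered, or times out.
# Usage: port_open HOST PORT
port_open() {
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
```

Run from the master, `port_open <slave-ip> 47250` failing while the slave is listening points at a firewall in between. If the firewall is ufw (an assumption; the comment above does not name it), something like `sudo ufw allow 47250/tcp` on the slave would open that port.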

I'm rather new to this whole Mesos thing and that really got me stuck. Once you start distributing across more than one machine, these little gotchas seem to pop up all over the place.

@gkleiman
Member

Closing this, as different people reported that setting LIBPROCESS_IP fixes the issue.

7 participants