Mesos-master sees framework registration requests as coming from localhost #193

stelcheck · 2014-03-08T01:52:39Z

I have a 3-node Mesos cluster on which I run the masters, slaves, and Marathon. For the moment I run Chronos only on node1.mesos1.dev. If the active Mesos master is not on the same node as Chronos, I get the following error:

Mar  8 01:39:26 node3.mesos1.dev mesos-master[17463]: I0308 01:39:26.158910 17481 master.cpp:818] Received registration request from scheduler(1)@127.0.0.1:36198
Mar  8 01:39:26 node3.mesos1.dev mesos-master[17463]: I0308 01:39:26.159025 17481 master.cpp:823] Framework 2014-03-08-01:01:57-3053457580-5050-17463-0000 (scheduler(1)@127.0.0.1:36198) already registered, resending acknowledgement

This is how I run Chronos:

java -cp /opt/chronos/target/chronos-2.1.0_mesos-0.14.0-rc4.jar com.airbnb.scheduler.Main --master zk://172.16.0.180:2181,172.16.0.181:2181,172.16.0.182:2181/mesos --zk_hosts 172.16.0.180:2181,172.16.0.181:2181,172.16.0.182:2181 --hostname 172.16.0.180 --http_port 4400

Marathon does not seem to suffer from this issue, so my guess is that the problem lies in how Chronos advertises itself.

vinodkone · 2014-03-09T18:19:38Z

Looks like the Chronos scheduler driver is binding to a local address (127.0.0.1), so the registration cannot succeed. You can force the driver to use a public address by setting the "LIBPROCESS_IP" environment variable.

e.g.:
LIBPROCESS_IP=172.16.0.180 java -cp /opt/chronos/target/chronos-2.1.0_mesos-0.14.0-rc4.jar com.airbnb.scheduler.Main --master zk://172.16.0.180:2181,172.16.0.181:2181,172.16.0.182:2181/mesos --zk_hosts 172.16.0.180:2181,172.16.0.181:2181,172.16.0.182:2181 --hostname 172.16.0.180 --http_port 4400

brndnmtthws · 2014-03-09T19:46:48Z

Is it possible that the master is reporting itself as being 127.0.0.1 to ZK instead of a reachable address?

vinodkone · 2014-03-09T19:58:00Z

It is possible but doesn't look like the case from the master log above. The master was able to receive the registration request from the remote scheduler. But the master's reply to registration request didn't make it to the scheduler because it was sent to scheduler(1)@127.0.0.1:36198.
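To make this diagnosis concrete, here is a minimal Python sketch (not part of Mesos or Chronos; the function names are made up for illustration). It pulls the scheduler address out of a master log line like the ones quoted in the first comment and flags it when it is a loopback address, which a master on another node cannot reply to.

```python
import ipaddress
import re

# Matches the "name@ip:port" address that follows "from" in the master log
# lines quoted in the first comment. The pattern is an assumption based only
# on those two lines, not on any documented log format.
SOURCE_RE = re.compile(
    r"from (?P<name>\S+)@(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)"
)


def registration_source(line):
    """Return (ip, port) of the scheduler named in a log line, or None."""
    match = SOURCE_RE.search(line)
    if match is None:
        return None
    return match.group("ip"), int(match.group("port"))


def is_loopback(ip):
    """True if a reply sent to this address would never leave the master's host."""
    return ipaddress.ip_address(ip).is_loopback


line = (
    "I0308 01:39:26.158910 17481 master.cpp:818] Received registration "
    "request from scheduler(1)@127.0.0.1:36198"
)
source = registration_source(line)
if source is not None and is_loopback(source[0]):
    print("scheduler advertised a loopback address:", source)
```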

brndnmtthws · 2014-03-09T20:01:34Z

Yeah, that sounds right.

I would also suggest explicitly passing the IP to the mesos processes (which is what we do):

# mesos-master --ip=10.10.10.10 <args...>

stelcheck · 2014-03-10T04:09:54Z

@vinodkone I'll give LIBPROCESS_IP a shot. That being said, I guess I should regard this as a temporary patch?

@brndnmtthws both masters and slaves are already running with the IP configured.

stelcheck · 2014-03-11T05:05:08Z

@vinodkone LIBPROCESS_IP=myip does seem to fix the issue.

That being said, it does look like something might be wrong in Mesos itself, at least in the RC currently available (using 0.16.0 seems to do the trick for me when LIBPROCESS_IP is set).

I don't think I mentioned this before, but the Vagrant machines I am working on are set up with two network interfaces. Could that be a reason for this issue?
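Here is a quick way to test one possible cause. It is an assumption and is not confirmed anywhere in this thread. When LIBPROCESS_IP is unset, the address a framework advertises may be derived from whatever the local hostname resolves to, and some Vagrant boxes map the hostname to a loopback address such as 127.0.1.1 in /etc/hosts. The following Python sketch, with hypothetical helper names, checks this on the node running Chronos:

```python
import ipaddress
import socket


def resolve_hostname_ip(hostname=None):
    """Resolve a hostname (default: this machine's own) to an IPv4 address.

    Returns None if the name cannot be resolved.
    """
    name = hostname or socket.gethostname()
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def needs_libprocess_ip(ip):
    """True if the resolved address is missing or loopback, so a master on
    another node could not send replies to it."""
    return ip is None or ipaddress.ip_address(ip).is_loopback


if __name__ == "__main__":
    ip = resolve_hostname_ip()
    print("hostname resolves to:", ip)
    if needs_libprocess_ip(ip):
        print("consider setting LIBPROCESS_IP to a reachable address")
```

If the hostname resolves to the address of the wrong interface rather than to loopback, this check will not flag it. In that case, compare the printed address with the one the masters can actually reach.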

ian-kent · 2014-03-16T00:28:33Z

I get the same, and it also affects Marathon. I have the Mesos master and slaves configured to use the right IP, but there is no obvious way to do that with Marathon or Chronos (I will try the LIBPROCESS_IP trick). I'm also using Vagrant!

suchisubhra · 2014-10-10T19:26:00Z

I'm having the same issue with the Jenkins Mesos plugin. It sees 127.0.0.1. I set LIBPROCESS_IP=myip and then started Jenkins with "service jenkins start". Any help appreciated.

vinodkone · 2014-10-12T16:07:19Z

@suchisubhra setting LIBPROCESS_IP in the environment should make the plugin pick up the IP address. If it is not picking it up, I suspect the environment is getting cleared somewhere along the way of the plugin startup.
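One way to check whether the variable actually reached the process, on Linux, is to read /proc/<pid>/environ, which holds the environment the process was started with. This is a minimal Python sketch with hypothetical function names. Reading the file for another user's process typically requires root.

```python
def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_env(pid):
    """Return the start-up environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as handle:
        return parse_environ(handle.read())


def libprocess_ip_of(pid):
    """Return the LIBPROCESS_IP the process was started with, or None."""
    return process_env(pid).get("LIBPROCESS_IP")
```

Calling libprocess_ip_of() with the PID of the Jenkins Java process and getting None back would suggest that the variable was dropped before the JVM started.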

corgidesu · 2015-04-24T08:31:36Z

This is pretty old but it popped up when I was searching for a very similar issue and figured it might help anyone having similar issues.

My environment is Ubuntu Server (14.04, Trusty Tahr I think is what it's called) running 4 small VM clusters: 1 ZK, 1 Master, and 3 Slaves.

As @vinodkone explained, setting up the LIBPROCESS_IP helped me solve the first problem of 127.0.0.1 appearing in the Master logs, but it didn't help with the re-registering logs in the Master.

What I found was that Ubuntu has a pre-installed firewall pretty much blocking all access to all ports. The master tried to connect to an open port on my slave process (in my case it was port 47250). The slave was listening on that port, but because of the firewall the Master couldn't get through. By poking a hole on port 47250 the whole thing magically worked for me and the process successfully completed.
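A reachability check like the following Python sketch can help tell a firewall problem apart from an addressing problem. It only attempts a TCP connection; the function name is made up for illustration, and the host name in the usage note is a placeholder.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    A False result means the connection was refused, filtered, or timed out.
    It does not say which of those happened.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from the master node, port_reachable("slave-host", 47250) returning False while the slave is known to be listening on that port would point at a firewall between the two machines.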

I'm rather new to this whole Mesos thing and that really got me stuck. Once you start distributing across more than one machine, these little gotchas seem to pop up all over the place.

gkleiman · 2015-08-18T16:09:35Z

Closing this, as different people reported that setting LIBPROCESS_IP fixes the issue.

stelcheck referenced this issue in apache/mesos Mar 8, 2014

Added hostname attribute to framework and displayed it in master's web ui. Review: https://reviews.apache.org/r/18361 (f19db57)

michiroth mentioned this issue Feb 13, 2015

Random IP publishing - Marathon Scheduler mesosphere/marathon#1198

Closed

gkleiman closed this as completed Aug 18, 2015

pdericson mentioned this issue Aug 26, 2015

In some new mesosclouds frameworks are unable to create tasks in mesos mesoscloud/mesoscloud-do#1

Closed
