This repository has been archived by the owner on Dec 5, 2017. It is now read-only.

investigate: node registrator doesn't play well with slaves that die and come back #778

Closed
jdef opened this issue Feb 10, 2016 · 7 comments

jdef commented Feb 10, 2016

reported here: #768 (comment)

/cc @ravilr

jdef commented Feb 10, 2016

@ravilr do you happen to know the conditions under which the slave came back? For example, did it come back up with a different slaveID or the same slaveID as before?

ravilr commented Feb 12, 2016

The previously reported scenario involved the mesos slave registering back with a different slaveID:

I0129 02:30:33.423883 2985 slave.cpp:859] Registered with master master@1.1.1.1:5050; given slave ID 20160129-022011-3340029194-5050-31106-S5
.....
I0205 02:10:24.347128 21347 slave.cpp:606] Slave asked to shut down by master@1.1.1.1:5050 because 'Slave attempted to re-register after removal'
.....
I0205 02:10:45.308372 21408 slave.cpp:859] Registered with master master@1.1.1.1:5050; given slave ID 20160201-215024-3340029194-5050-15361-S0

ravilr commented Mar 3, 2016

some more details:

I0302 22:11:20.818402 1 nodecontroller.go:450] Deleting node (no longer present in cloud provider): xyz.com

https://github.com/mesosphere/kubernetes/blob/v0.7.2-v1.1.5/pkg/controller/node/nodecontroller.go#L449
This seems to happen when the mesos-slave agent on the node is brought down: the k8s nodecontroller deletes the node from the API because the listSlaves call in the mesos cloud provider no longer returns the slave host.
Once the slave agent is started back up, the scheduler fails to add the node back, because offers from that host don't pass the Compat check, which requires the node to already be registered in the k8s API:
https://github.com/mesosphere/kubernetes/blob/v0.7.2-v1.1.5/contrib/mesos/pkg/scheduler/components/framework/framework.go#L122
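A minimal sketch of that deletion path (not the linked nodecontroller code; the function and variable names here are hypothetical stand-ins):

```go
package main

import "fmt"

// listSlavesFromCloudProvider stands in for the mesos cloud provider's
// slave listing. While the mesos-slave agent on a host is down, that
// host is absent from the result.
func listSlavesFromCloudProvider() map[string]bool {
	return map[string]bool{
		"slave-a.example.com": true,
		// "xyz.com" is missing: its mesos-slave agent is stopped.
	}
}

func main() {
	// Nodes currently registered in the k8s API.
	apiNodes := []string{"slave-a.example.com", "xyz.com"}

	present := listSlavesFromCloudProvider()
	for _, node := range apiNodes {
		if !present[node] {
			// Mirrors the log line quoted above:
			// "Deleting node (no longer present in cloud provider)".
			fmt.Printf("Deleting node (no longer present in cloud provider): %s\n", node)
		}
	}
}
```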

jdef commented Mar 3, 2016

[EDIT] Thanks for investigating this further. What should happen in this situation is that when the slave comes back up, the mesos cloud provider should see it and begin to report it in the list of nodes that it knows about. Presumably k8s would see this change and create an api.Node in the apiserver. Are we sure that this isn't happening at all, or is it just a matter of timing (as in, if we wait long enough, the right thing happens)?
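
To make that expected path concrete, a hypothetical sketch (none of these names are from the real code) of the reconcile direction described above, where a slave reappearing in the cloud provider's list leads to a fresh api.Node:

```go
package main

import "fmt"

func main() {
	// The slave has re-registered, so the cloud provider lists it again.
	cloudNodes := map[string]bool{"xyz.com": true}
	// ...but the nodecontroller deleted its api.Node earlier.
	apiNodes := map[string]bool{}

	for host := range cloudNodes {
		if !apiNodes[host] {
			// Expected behavior: k8s notices a node known to the cloud
			// provider but missing from the API and re-creates it.
			fmt.Printf("creating api.Node for %s\n", host)
			apiNodes[host] = true
		}
	}
}
```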

ravilr commented Mar 3, 2016

Yes, the mesos slave node that comes back up with a different slaveID never seems to be re-registered as a k8s api.Node object unless the k8sm scheduler is restarted.
My understanding is that the k8sm scheduler registers slave nodes in the k8s API registry, with slave attributes converted to k8s labels, based on the offers it sees from the mesos master. For some reason this doesn't seem to happen after the slave re-registers with the mesos master (a rough sketch of the gating follows the repro steps below). I do see the re-added slave offering its resources, and the mesos master recovering all resources from the k8sm framework:
I0303 20:22:42.809595 1765 hierarchical.hpp:814] Recovered ports():[31000-32000]; cpus():24; mem():62791; disk():208307 (total: ports():[31000-32000]; cpus():24; mem():62791; disk():208307, allocated: ) on slave 20160212-004300-3609200202-5050-1762-S5 from framework 20160211-225703-3609200202-5050-14672-0000

But once the scheduler is restarted, the node appears in the k8s node registry.

Sequence of steps to repro:

  1. The mesos slave is stopped.
  2. After the node grace period (default 40s), the k8s nodecontroller sees there has been no heartbeat and asks the cloud provider about the node. Since the node is not in the master's slave list, the nodecontroller deletes the node from the k8s node API registry.
  3. The mesos slave is brought up after the slave ping timeout; the master asks the slave, which tries to re-register with the same slave ID, to shut down. The slave is brought up again by the systemd manager and registers with a new slaveID.
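
As flagged above, a rough sketch of the gating; the Offer struct, registry map, and compatible function are hypothetical stand-ins, not the real framework.go types, but they illustrate why offers from the re-added slave are declined once its node has been deleted from the API:

```go
package main

import "fmt"

// Offer mimics the relevant parts of a mesos offer: the host it comes from
// and the slave attributes the scheduler would convert into node labels.
type Offer struct {
	Hostname   string
	Attributes map[string]string
}

// registry stands in for the k8s node API registry.
var registry = map[string]bool{}

// compatible mirrors the described Compat check: it only accepts offers
// from hosts already registered as nodes, which is exactly what fails
// after the nodecontroller has deleted the node.
func compatible(o Offer) bool {
	return registry[o.Hostname]
}

func main() {
	offer := Offer{Hostname: "xyz.com", Attributes: map[string]string{"rack": "r1"}}
	if !compatible(offer) {
		// The node was deleted in step 2 of the repro, so every offer from
		// the re-registered slave is declined and the node is never re-added.
		fmt.Printf("declining offer from %s: node not registered in k8s api\n", offer.Hostname)
		return
	}
	fmt.Printf("registering node %s with labels %v\n", offer.Hostname, offer.Attributes)
}
```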

jdef commented Mar 4, 2016

I found the problem: a bug in the queue/ package. Will push a fix shortly.
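
The thread doesn't spell out the exact defect (see the xref below for the fix), so purely as an illustration of this class of bug, here is a hypothetical keyed queue whose delete leaves a tombstone that silently swallows later re-adds; nothing in it is taken from the real queue/ package, but the symptom matches what was observed (the node only reappears after a scheduler restart):

```go
package main

import "fmt"

type queue struct {
	deleted map[string]bool   // tombstones for removed keys
	items   map[string]string // live entries keyed by hostname
}

func newQueue() *queue {
	return &queue{deleted: map[string]bool{}, items: map[string]string{}}
}

func (q *queue) Delete(key string) {
	delete(q.items, key)
	q.deleted[key] = true // bug: the tombstone is never cleared
}

func (q *queue) Add(key, val string) {
	if q.deleted[key] {
		// Bug: a key that was ever deleted is silently ignored, so a host
		// that comes back (even with a new slaveID) never re-enters the
		// queue until the process restarts and the tombstones are gone.
		return
	}
	q.items[key] = val
}

func main() {
	q := newQueue()
	q.Add("xyz.com", "slave S5")
	q.Delete("xyz.com")          // node removed after the agent went down
	q.Add("xyz.com", "slave S0") // agent back with a new slaveID
	fmt.Println(len(q.items))    // 0 -- the re-add was dropped
}
```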

jdef commented Mar 4, 2016

xref kubernetes/kubernetes#22500


@jdef jdef added this to the v0.7.3 milestone Mar 4, 2016
@jdef jdef closed this as completed Mar 4, 2016
@jdef jdef removed the LGTM label Mar 4, 2016