This repository has been archived by the owner on Dec 5, 2017. It is now read-only.

Enable Kubernetes Mesos Integration support revocable resources #795

Open
gyliu513 opened this issue Feb 27, 2016 · 5 comments

Comments

@gyliu513

Mesos is getting many enhancements, especially in the allocator, to improve resource utilisation, and revocable resources are designed for exactly this purpose. If multiple frameworks run on top of Mesos, including Kubernetes, the revocable resources from one framework can be used by another framework, which improves resource utilisation.

I built a prototype and ran some tests here: https://github.com/jay-lau/jay-work/blob/master/k8s/mesos/revocable.diff

The idea is simple and straightforward:

  1. Add a flag to enable revocable resources.
  2. Add new metadata to the Pod YAML so that a Pod can specify that it wants to use revocable resources.
  3. Update procurement to add checks for revocable resources when revocable is enabled for the Pod.
  4. Update the Task Info so that the task uses revocable resources before it is launched, when revocable is enabled for the Pod.

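
The four steps above can be sketched roughly as follows. This is a minimal Python model, not the prototype's code: the annotation key `k8s.mesosphere.io/revocable`, the `enable_revocable` flag name, and the `Resource`/`Pod` shapes are all hypothetical stand-ins chosen for illustration. It also assumes that a Pod which opts in takes only revocable CPU, which the prototype may or may not do.

```python
from dataclasses import dataclass, field

# Hypothetical annotation key; the real prototype may use a different name.
REVOCABLE_ANNOTATION = "k8s.mesosphere.io/revocable"


@dataclass
class Resource:
    name: str               # e.g. "cpus" or "mem"
    amount: float
    revocable: bool = False


@dataclass
class Pod:
    name: str
    cpus: float
    annotations: dict = field(default_factory=dict)


def wants_revocable(pod: Pod) -> bool:
    """Step 2: the Pod opts in through its metadata."""
    return pod.annotations.get(REVOCABLE_ANNOTATION) == "true"


def procure_cpus(pod: Pod, offer: list, enable_revocable: bool) -> list:
    """Steps 1 and 3: pick CPU resources from an offer.

    Revocable resources are considered only when the scheduler flag is on
    and the Pod asked for them; otherwise only regular resources are used.
    """
    use_revocable = enable_revocable and wants_revocable(pod)
    needed = pod.cpus
    picked = []
    for res in offer:
        if res.name != "cpus" or res.revocable != use_revocable:
            continue
        take = min(res.amount, needed)
        if take > 0:
            picked.append(Resource("cpus", take, res.revocable))
            needed -= take
    if needed > 1e-9:
        raise ValueError(f"offer cannot satisfy pod {pod.name}")
    return picked


def build_task_info(pod: Pod, resources: list) -> dict:
    """Step 4: the task carries the resources exactly as procured."""
    return {
        "name": pod.name,
        "resources": [
            {"name": r.name, "amount": r.amount, "revocable": r.revocable}
            for r in resources
        ],
    }
```

With the flag off, the same annotated Pod falls back to regular resources, so the opt-in has no effect unless the operator has enabled the feature.
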
@jdef

jdef commented Feb 27, 2016

Thanks so much for sharing this! Revocable resources have definitely come up in prior conversations. A sticking point w/ respect to the current implementation in Mesos is this:

NOTE: If any resource used by a task or executor is revocable, the whole container is treated as a revocable container and can therefore be killed or throttled by the QoS Controller.

For k8sm "the whole container" means the custom executor container that hosts the kubelet-executor and kube-proxy processes (as well as the k8s-instantiated Docker containers/procs if you're running w/ cgroup reparenting). This means that even if just one pod is using revocable resources and a QoS controller decides that it wants those particular resources back, the controller will end up killing all k8sm-related procs on the slave (read: all pods die because one pod's resources needed to be revoked).
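
To make the blast radius concrete, here is a small sketch of the rule quoted above. It is a toy model under stated assumptions: the `Task` and `ExecutorContainer` classes and the `qos_kill` function are hypothetical names for illustration and do not correspond to Mesos or k8sm APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    pod: str
    uses_revocable: bool = False


@dataclass
class ExecutorContainer:
    """Toy stand-in for the single k8sm executor container on a slave."""
    tasks: list = field(default_factory=list)

    def is_revocable(self) -> bool:
        # Per the Mesos note: one revocable resource anywhere in the
        # container makes the whole container revocable.
        return any(t.uses_revocable for t in self.tasks)


def qos_kill(container: ExecutorContainer) -> list:
    """Return the pods lost if the QoS Controller kills a revocable container."""
    if not container.is_revocable():
        return []
    # Killing the container takes every task with it, revocable or not.
    return [t.pod for t in container.tasks]
```

For a container holding pods `a`, `b`, and a revocable pod `c`, `qos_kill` reports all three as lost, which is the scenario described above.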

A related topic that's surfaced in Mesos-land is nested containerization support, so that custom executors may spawn child containers that are independently isolated via mesos containerization. This appears, at least at the surface, to have interesting implications for revocable resources.

Did you have any work-arounds in mind for dealing with the current QoS-killed-all-my-pods scenario?

@gyliu513
Author

Thanks @jdef, just appending some of my thoughts here:

  1. The current revocable resources are a kind of "scavenge resources": the QoS Controller kills the executors that use revocable resources, and when an executor is terminated, all of its tasks are killed too. The "scavenge resources" model does not seem to fit the k8sm user scenario well; we may need to enhance the Mesos QoS Controller so that it kills only the related tasks, not the executors.
  2. The Mesos community is planning to add more kinds of revocable resources, such as allocation slack (MESOS-1607) and quota slack (MESOS-4392). Those revocable resources will trigger task/executor eviction from the allocator, so Mesos will kill the task/executor based on some eviction policies.
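
The task-level eviction suggested in item 1 could look roughly like the sketch below. Everything here is an assumption for illustration: `select_victims` is a hypothetical helper, not a Mesos QoS Controller API, and the "largest revocable holder first" ordering is just one possible eviction policy.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pod: str
    revocable_cpus: float = 0.0  # 0 means the task uses only regular resources


def select_victims(tasks: list, cpus_to_reclaim: float) -> list:
    """Pick only tasks holding revocable CPU, largest holders first,
    until the requested amount is covered."""
    victims = []
    reclaimed = 0.0
    candidates = sorted(
        (t for t in tasks if t.revocable_cpus > 0),
        key=lambda t: t.revocable_cpus,
        reverse=True,
    )
    for task in candidates:
        if reclaimed >= cpus_to_reclaim:
            break
        victims.append(task.pod)
        reclaimed += task.revocable_cpus
    return victims
```

Tasks that use only regular resources never appear in the result, so pods that did not opt in to revocable resources would survive a revocation.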

For now, only the "scavenge resources" kind of revocable resource is supported. I will raise this issue in the Mesos community to see how we can move this forward. Hope this helps ;-)

@ghost

ghost commented Feb 29, 2016

I did some investigation earlier on supporting revocable resources in Kubernetes (filed an issue at kubernetes/kubernetes#19529). Currently, only QoS supports revocable resources, so the behaviour is simple: kill the revocable resources directly. There are several tickets in the Mesos community on the behaviour of revocable resources (MESOS-4303, MESOS-1607 and MESOS-4392).

I'd suggest mentioning k8sm's case on those tickets, and holding this work (Kubernetes revocable resources) until the behaviour of revocable resources is finalised in Mesos.

@gyliu513
Author

What do you mean by "kill the revocable resources directly for current revocable resources"?

@ghost

ghost commented Mar 1, 2016

I mean QoS's current behaviour: there is no grace period when it kills an executor/container.
