Skip to content

Commit

Permalink
scheduler: implement role awareness
Browse files Browse the repository at this point in the history
  • Loading branch information
Sergiusz Urbaniak committed Nov 23, 2015
1 parent 1a43dcf commit 9eae47c
Show file tree
Hide file tree
Showing 45 changed files with 2,576 additions and 899 deletions.
8 changes: 4 additions & 4 deletions cluster/mesos/docker/docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ mesosmaster1:
- MESOS_QUORUM=1
- MESOS_REGISTRY=in_memory
- MESOS_WORK_DIR=/var/lib/mesos
- MESOS_ROLES=role1
links:
- etcd
- "ambassador:apiserver"
Expand All @@ -40,15 +41,15 @@ mesosslave:
DOCKER_NETWORK_OFFSET=0.0.$${N}.0
exec wrapdocker mesos-slave
--work_dir="/var/tmp/mesos/$${N}"
--attributes="rack:$${N};gen:201$${N}"
--attributes="rack:$${N};gen:201$${N};role:role$${N}"
--hostname=$$(getent hosts mesosslave | cut -d' ' -f1 | sort -u | tail -1)
--resources="cpus:4;mem:1280;disk:25600;ports:[8000-21099];cpus(role$${N}):1;mem(role$${N}):640;disk(role$${N}):25600;ports(role$${N}):[7000-7999]"
command: []
environment:
- MESOS_MASTER=mesosmaster1:5050
- MESOS_PORT=5051
- MESOS_LOG_DIR=/var/log/mesos
- MESOS_LOGGING_LEVEL=INFO
- MESOS_RESOURCES=cpus:4;mem:1280;disk:25600;ports:[8000-21099]
- MESOS_SWITCH_USER=0
- MESOS_CONTAINERIZERS=docker,mesos
- MESOS_ISOLATION=cgroups/cpu,cgroups/mem
Expand All @@ -58,8 +59,6 @@ mesosslave:
- etcd
- mesosmaster1
- "ambassador:apiserver"
volumes:
- ${MESOS_DOCKER_WORK_DIR}/mesosslave:/var/tmp/mesos
apiserver:
hostname: apiserver
image: mesosphere/kubernetes-mesos
Expand Down Expand Up @@ -145,6 +144,7 @@ scheduler:
--mesos-executor-cpus=1.0
--mesos-sandbox-overlay=/opt/sandbox-overlay.tar.gz
--static-pods-config=/opt/static-pods
--mesos-roles=*,role1
--v=4
--executor-logv=4
--profiling=true
Expand Down
88 changes: 88 additions & 0 deletions contrib/mesos/docs/scheduler.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,93 @@ example, the Kubernetes-Mesos executor manages `k8s.mesosphere.io/attribute`
labels and will auto-detect and update modified attributes when the mesos-slave
is restarted.

## Resource Roles

A Mesos cluster can be statically partitioned using [resources roles][2]. Each
resource is assigned such a role (`*` is the default role, if none is explicitly
assigned in the mesos-slave command line). The Mesos master will send offers to
frameworks for `*` resources and – optionally – for one extra role that a
framework is assigned to. Right now only one such extra role for a framework is
supported.

### Configuring Roles for the Scheduler

Every Mesos framework scheduler can choose among the offered `*` resources and
those of the extra role. The Kubernetes-Mesos scheduler supports this by setting
the framework roles in the scheduler command line, e.g.

```bash
$ km scheduler ... --mesos-roles="*,role1" ...
```

This will tell the Kubernetes-Mesos scheduler to default to using `*` resources
if a pod is not specially assigned to another role. Moreover, the extra role
`role1` is allowed, i.e. the Mesos master will send resources or role `role1`
to the Kubernetes scheduler.

Note the following restrictions and possibilities:
- Due to the restrictions of Mesos, only one extra role may be provided on the
command line.
- It is allowed to only pass an extra role without the `*`, e.g. `--mesos-roles=role1`.
This means that no `*` resources should be considered by the scheduler at all.
- It is allowed to pass the extra role first, e.g. `--mesos-roles=role1,*`.
This means that `role1` is the default role for pods without special role
assignment (see below). But `*` resources would be considered for pods with a special `*`
assignment.

### Specifying Roles for Pods

By default a pod is scheduled using resources of the role which comes first in
the list of scheduler roles.

A pod can opt-out of this default behaviour using the `k8s.mesosphere.io/roles`
label:

```yaml
k8s.mesosphere.io/roles: role1,role2,role3
```

The format is a comma separated list of allowed resource roles. The scheduler
will try to schedule the pod with `role1` resources first, using `role2`
resources if the former are not available and finally falling back to `role3`
resources.

The `*` role may be specified as well in this list.

**Note:** An empty list will mean that no resource roles are allowed which is
equivalent to a pod which is unschedulable.

For example:

```yaml
apiVersion: v1
kind: Pod
metadata:
name: backend
labels:
k8s.mesosphere.io/roles: *,prod,test,dev
namespace: prod
spec:
...
```

This `prod/backend` pod will be scheduled using resources from all four roles,
preferably using `*` resources, followed by `prod`, `test` and `dev`. If none
of those for roles provides enough resources, the scheduling fails.

**Note:** The scheduler will also allow to mix different roles in the following
sense: if a node provides `cpu` resources for the `*` role, but `mem` resources
only for the `prod` role, the upper pod will be schedule using `cpu(*)` and
`mem(prod)` resources.

**Note:** The scheduler might also mix within one resource type, i.e. it will
use as many `cpu`s of the `*` role as possible. If a pod requires even more
`cpu` resources (defined using the `pod.spec.resources.limits` property) for successful
scheduling, the scheduler will add resources from the `prod`, `test` and `dev`
roles, in this order until the pod resource requirements are satisfied. E.g. a
pod might be scheduled with 0.5 `cpu(*)`, 1.5 `cpu(prod)` and 1 `cpu(test)`
resources plus e.g. 2 GB `mem(prod)` resources.

## Tuning

The scheduler configuration can be fine-tuned using an ini-style configuration file.
Expand All @@ -49,6 +136,7 @@ offer-ttl = 5s
; duration an expired offer lingers in history
offer-linger-ttl = 2m
<<<<<<< HEAD
; duration between offer listener notifications
listener-delay = 1s
Expand Down
59 changes: 53 additions & 6 deletions contrib/mesos/pkg/executor/executor.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ limitations under the License.
package executor

import (
"bytes"
"encoding/json"
"fmt"
"strings"
Expand All @@ -33,6 +34,7 @@ import (
"k8s.io/kubernetes/contrib/mesos/pkg/executor/messages"
"k8s.io/kubernetes/contrib/mesos/pkg/node"
"k8s.io/kubernetes/contrib/mesos/pkg/podutil"
"k8s.io/kubernetes/contrib/mesos/pkg/scheduler/executorinfo"
"k8s.io/kubernetes/contrib/mesos/pkg/scheduler/meta"
"k8s.io/kubernetes/pkg/api"
unversionedapi "k8s.io/kubernetes/pkg/api/unversioned"
Expand Down Expand Up @@ -223,13 +225,21 @@ func (k *Executor) sendPodsSnapshot() bool {
}

// Registered is called when the executor is successfully registered with the slave.
func (k *Executor) Registered(driver bindings.ExecutorDriver,
executorInfo *mesos.ExecutorInfo, frameworkInfo *mesos.FrameworkInfo, slaveInfo *mesos.SlaveInfo) {
func (k *Executor) Registered(
driver bindings.ExecutorDriver,
executorInfo *mesos.ExecutorInfo,
frameworkInfo *mesos.FrameworkInfo,
slaveInfo *mesos.SlaveInfo,
) {
if k.isDone() {
return
}
log.Infof("Executor %v of framework %v registered with slave %v\n",
executorInfo, frameworkInfo, slaveInfo)

log.Infof(
"Executor %v of framework %v registered with slave %v\n",
executorInfo, frameworkInfo, slaveInfo,
)

if !(&k.state).transition(disconnectedState, connectedState) {
log.Errorf("failed to register/transition to a connected state")
}
Expand All @@ -241,8 +251,22 @@ func (k *Executor) Registered(driver bindings.ExecutorDriver,
}
}

annotations, err := executorInfoToAnnotations(executorInfo)
if err != nil {
log.Errorf(
"cannot get node annotations from executor info %v error %v",
executorInfo, err,
)
}

if slaveInfo != nil {
_, err := node.CreateOrUpdate(k.client, slaveInfo.GetHostname(), node.SlaveAttributesToLabels(slaveInfo.Attributes))
_, err := node.CreateOrUpdate(
k.client,
slaveInfo.GetHostname(),
node.SlaveAttributesToLabels(slaveInfo.Attributes),
annotations,
)

if err != nil {
log.Errorf("cannot update node labels: %v", err)
}
Expand Down Expand Up @@ -270,7 +294,13 @@ func (k *Executor) Reregistered(driver bindings.ExecutorDriver, slaveInfo *mesos
}

if slaveInfo != nil {
_, err := node.CreateOrUpdate(k.client, slaveInfo.GetHostname(), node.SlaveAttributesToLabels(slaveInfo.Attributes))
_, err := node.CreateOrUpdate(
k.client,
slaveInfo.GetHostname(),
node.SlaveAttributesToLabels(slaveInfo.Attributes),
nil, // don't change annotations
)

if err != nil {
log.Errorf("cannot update node labels: %v", err)
}
Expand Down Expand Up @@ -988,3 +1018,20 @@ func nodeInfo(si *mesos.SlaveInfo, ei *mesos.ExecutorInfo) NodeInfo {
}
return ni
}

func executorInfoToAnnotations(ei *mesos.ExecutorInfo) (annotations map[string]string, err error) {
annotations = map[string]string{}
if ei == nil {
return
}

var buf bytes.Buffer
if err = executorinfo.EncodeResources(&buf, ei.GetResources()); err != nil {
return
}

annotations[meta.ExecutorIdKey] = ei.GetExecutorId().GetValue()
annotations[meta.ExecutorResourcesKey] = buf.String()

return
}
34 changes: 30 additions & 4 deletions contrib/mesos/pkg/executor/executor_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -168,10 +168,23 @@ func TestExecutorLaunchAndKillTask(t *testing.T) {
}

pod := NewTestPod(1)
podTask, err := podtask.New(api.NewDefaultContext(), "", pod)
executorinfo := &mesosproto.ExecutorInfo{}
podTask, err := podtask.New(
api.NewDefaultContext(),
"",
pod,
executorinfo,
nil,
)

assert.Equal(t, nil, err, "must be able to create a task from a pod")

taskInfo := podTask.BuildTaskInfo(&mesosproto.ExecutorInfo{})
podTask.Spec = &podtask.Spec{
Executor: executorinfo,
}
taskInfo, err := podTask.BuildTaskInfo()
assert.Equal(t, nil, err, "must be able to build task info")

data, err := testapi.Default.Codec().Encode(pod)
assert.Equal(t, nil, err, "must be able to encode a pod's spec data")
taskInfo.Data = data
Expand Down Expand Up @@ -370,8 +383,21 @@ func TestExecutorFrameworkMessage(t *testing.T) {

// set up a pod to then lose
pod := NewTestPod(1)
podTask, _ := podtask.New(api.NewDefaultContext(), "foo", pod)
taskInfo := podTask.BuildTaskInfo(&mesosproto.ExecutorInfo{})
executorinfo := &mesosproto.ExecutorInfo{}
podTask, _ := podtask.New(
api.NewDefaultContext(),
"foo",
pod,
executorinfo,
nil,
)

podTask.Spec = &podtask.Spec{
Executor: executorinfo,
}
taskInfo, err := podTask.BuildTaskInfo()
assert.Equal(t, nil, err, "must be able to build task info")

data, _ := testapi.Default.Codec().Encode(pod)
taskInfo.Data = data

Expand Down

1 comment on commit 9eae47c

@k8s-teamcity-mesosphere

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TeamCity OSS :: Kubernetes Mesos :: 4 - Smoke Tests Build 6347 outcome was SUCCESS
Summary: Tests passed: 1, ignored: 202 Build time: 00:07:22

Please sign in to comment.