Triton compatibility for Mesos #1

Merged
misterbisson merged 1 commit into joyent:triton from blakeolsen/master on Aug 6, 2015

blakeolsen commented Aug 6, 2015

Making Mesos compatible with Triton requires a few changes. Triton is a Docker deployment service that deploys containers across an entire data center. Triton's implementation of Docker improves the scalability, security, and reliability of Docker while maintaining bare-metal performance. Mesos can add further elasticity to Triton when deploying containers across data centers, and it provides a variety of frameworks that can use Triton as a service.

The following slide deck outlines the opportunity, my solution so far, and the outstanding questions we should consider:

https://docs.google.com/presentation/d/1o6EfTteCDqKwneVRS7tDnvbVmoatvyX6pQQ_kWGQO9I/edit?usp=sharing

Architecture

In traditional Mesos deployments, a Mesos slave runs on each physical host. Triton's approach to containers eliminates the notion of a host and treats the entire data center as a single host, so with Triton a Mesos slave represents an entire data center rather than a single machine. Multiple data centers can be addressed via multiple slaves to support multi-region deployments.

The relationship between the slave(s) and the master, schedulers, and other Mesos components is unchanged. This has been tested with Marathon, which runs unmodified. We expect other frameworks on top of Mesos to work unmodified as well.

Necessary alterations

Mesos sandbox and Docker volumes

For a variety of reasons, Triton makes host volumes read-only, so the -v flag for linking host volumes is disabled. The Mesos Docker executor therefore must not attempt to add the volume link between the MESOS_SANDBOX directory and the Docker container. We do not, however, interfere with Mesos establishing a location for the stderr and stdout files (MESOS_SANDBOX), since we may yet find a roundabout way of migrating these files into the MESOS_SANDBOX.
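To illustrate the sandbox change, here is a minimal sketch that builds the sandbox-related arguments for docker run. The helper name sandboxArgs and the mountSandbox switch are hypothetical and are not part of the Mesos code base; the actual changeset edits Docker::run directly.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the Mesos API): returns the sandbox-related
// arguments that would be passed to `docker run`.
std::vector<std::string> sandboxArgs(
    const std::string& sandboxDirectory,  // host path chosen by Mesos
    const std::string& mappedDirectory,   // path seen inside the container
    bool mountSandbox)                    // assumed switch; false on Triton
{
  std::vector<std::string> argv;

  // The task is always told where its sandbox is supposed to live.
  argv.push_back("-e");
  argv.push_back("MESOS_SANDBOX=" + mappedDirectory);

  // The host directory is only bind-mounted where host volumes are usable.
  if (mountSandbox) {
    argv.push_back("-v");
    argv.push_back(sandboxDirectory + ":" + mappedDirectory);
  }

  return argv;
}
```

With mountSandbox set to false, only the -e MESOS_SANDBOX=... pair is produced, which matches the behavior described above for Triton.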

Port mapping

Mesos treats ports as a consumable resource. In Triton, however, each container gets a unique NIC and IP address, so port collisions are impossible and this accounting is no longer needed. We still track ports as a resource, but when a container is created we do not “consume” (remove) the used port from the slave's resources.
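The accounting idea can be sketched with a simplified model. The Resources struct, the afterLaunch function, and the consumePorts switch below are assumptions made for illustration; the real Mesos Resources class is considerably richer, and the changeset modifies Master::_accept instead.

```cpp
#include <set>

// Simplified stand-in for the Mesos resource model (illustration only).
struct Resources {
  double cpus;
  double memMb;
  std::set<int> ports;
};

// Returns what the slave still offers after a task is launched.
// With consumePorts == false (the Triton case), ports stay on offer.
Resources afterLaunch(Resources offered, const Resources& task,
                      bool consumePorts)
{
  offered.cpus -= task.cpus;
  offered.memMb -= task.memMb;

  if (consumePorts) {
    for (int port : task.ports) {
      offered.ports.erase(port);
    }
  }

  return offered;
}
```

CPU and memory are still subtracted as usual; only the port set is left untouched when the switch is off.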

Container naming

There is a minor inconsistency between the Mesos executor and Docker: the Mesos executor constructs container names by piecing together the executor name with other elements, and the result can include capital letters, which are not allowed in the names of Docker containers. As a result, illegal names cause Docker naming failures for some requests. The changeset coerces container names in the executor to fit the Docker convention.
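A minimal sketch of the name coercion follows. The function names and the prefix and separator values are assumptions for illustration; the changeset itself works on DOCKER_NAME_PREFIX and DOCKER_NAME_SEPERATOR and appears to lowercase the slave ID, whereas this sketch lowercases the whole name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercases every character of the given name.
std::string toLowerName(std::string name)
{
  std::transform(
      name.begin(), name.end(), name.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return name;
}

// Hypothetical name builder: prefix + slave ID + separator + container ID.
std::string containerName(const std::string& slaveId,
                          const std::string& containerId)
{
  const std::string prefix = "mesos-";  // assumed value
  const std::string separator = ".";    // assumed value
  return toLowerName(prefix + slaveId + separator + containerId);
}
```

Casting through unsigned char before calling std::tolower avoids undefined behavior for characters with negative values.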

Docker container removal upon destruction

When destroying a Docker container, Mesos would send a docker kill but leave the stopped container in place. This caused garbage collection issues, so we now remove the container with docker rm after killing it.
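The kill-then-remove sequence can be sketched as a list of command lines. The destroyCommands helper and its removeAfterKill switch are hypothetical; the changeset instead calls the containerizer's own remove step inside DockerContainerizerProcess::destroy.

```cpp
#include <string>
#include <vector>

using Command = std::vector<std::string>;

// Hypothetical helper: the Docker CLI invocations used to tear down a
// container. Stock behavior stops at `docker kill`; the Triton changeset
// follows it with `docker rm` so no stopped container is left behind.
std::vector<Command> destroyCommands(const std::string& containerName,
                                     bool removeAfterKill)
{
  std::vector<Command> commands;
  commands.push_back(Command{"docker", "kill", containerName});

  if (removeAfterKill) {
    commands.push_back(Command{"docker", "rm", containerName});
  }

  return commands;
}
```

Returning the commands rather than executing them keeps the sketch free of side effects; a caller would run them in order and only issue the rm once the kill has completed.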

Next steps

  1. Dockerize this. It's currently tested with all the pieces manually installed and running in an infrastructure container. Dockerizing probably includes creating a custom Dockerfile for the slave.
  2. Identify which of these changes should be treated as bugs in upstream code. Container naming is a significant issue.
  3. Try to generalize the changes, possibly turning some of them into switches that can be set at runtime, so that we can upstream them.

@blakeolsen blakeolsen changed the title from New Triton to Triton compatibility for Mesos Aug 6, 2015

misterbisson added a commit that referenced this pull request Aug 6, 2015

Merge pull request #1 from blakeolsen/master
Triton compatibility for Mesos

@misterbisson misterbisson merged commit edc2674 into joyent:triton Aug 6, 2015

misterbisson commented Oct 9, 2015

Additionally, #3 was required to eliminate the cgroups assumptions in the Docker containerizer.

@@ -381,6 +381,8 @@ Future<Nothing> Docker::run(
argv.push_back("-e");
argv.push_back("MESOS_SANDBOX=" + mappedDirectory);
/*

@misterbisson

misterbisson Oct 16, 2015

Remove the attempt to mount the Mesos sandbox as a host volume in the Docker container. Because Docker containers run in a multi-tenant environment, there's no access to the underlying host filesystem. This is an important factor in multi-tenant security.

@@ -2816,7 +2816,13 @@ void Master::_accept(
// Add task.
if (pending) {
_offeredResources -= addTask(task_, framework, slave);
Resources taskResources; Resources ports;

@misterbisson

misterbisson Oct 16, 2015

This removes network ports as a consumable resource. Because every container gets one or more unique network interfaces, there's never a port conflict to worry about. This simplified networking is one of the many advantages of Joyent's container-native infrastructure.

@@ -1303,11 +1303,14 @@ void DockerContainerizerProcess::destroy(
container->termination.set(termination);
containers_.erase(containerId);
remove(container->name(), None());

@misterbisson

misterbisson Oct 16, 2015

Here (and in a number of places throughout this file), we're removing stopped Docker containers. Billing accrues for every provisioned container in Joyent's container-native infrastructure, so this step eliminates the need for garbage collection of stopped containers.

return DOCKER_NAME_PREFIX + slaveId.value() + DOCKER_NAME_SEPERATOR +
stringify(id);
std::string slaveIdstring = slaveId.value();
std::transform(slaveIdstring.begin(), slaveIdstring.end(),

@misterbisson

misterbisson Oct 16, 2015

Here we force the Docker container name to be lower case to prevent bugs elsewhere.
