
Containerization

robnagler edited this page May 18, 2016 · 1 revision

Why Containers?

This article is designed to help physicists understand containers. It's a work in progress.

Computers

A computer is a collection of devices that talk to one another electronically over buses. There are many kinds of buses in a computer, and even more kinds of devices. Yet the way devices talk to one another is basically the same: each device has an address (name), and the bus knows how to deliver messages to and from addresses. Think of a computer as a translator: it converts 1's and 0's (addresses and messages) to and from the physical world, e.g. spinning disks, LEDs, and mouse clicks.
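On a Linux machine you can see the buses the kernel knows about, and the device addresses on each, in sysfs. A minimal sketch, assuming a Linux host with sysfs mounted at /sys (the function name list_buses is only illustrative):

```shell
#!/bin/sh
# Print each bus type the kernel has registered, with the number of
# device addresses it currently knows on that bus.
# Assumes Linux with sysfs mounted at /sys.
list_buses() {
  for bus in /sys/bus/*; do
    [ -d "$bus/devices" ] || continue                   # skip anything that is not a bus directory
    count=$(ls "$bus/devices" | wc -l | tr -d ' ')      # each entry is one device address on this bus
    echo "$(basename "$bus"): $count devices"
  done
}

list_buses
```

On a PC, for example, the entries under /sys/bus/pci/devices are PCI addresses such as 0000:00:02.0.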

Kernels

A kernel is a translator, too. Modules within the kernel know how to take messages from programs and turn them into messages for devices, and vice versa. Kernel modules are layered software abstractions. For example, the kernel's SATA module translates between SATA disks and an abstract interface called block devices.
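Whatever driver sits underneath (SATA, NVMe, a virtual disk), programs see the same block-device abstraction. A minimal sketch that lists block devices and their sizes, assuming Linux with sysfs at /sys; list_block_devices is a hypothetical helper name:

```shell
#!/bin/sh
# List the block devices the kernel exposes, regardless of the hardware
# driver underneath them. Assumes Linux with sysfs mounted at /sys.
# The kernel reports each device's size in 512-byte sectors.
list_block_devices() {
  for dev in /sys/block/*; do
    [ -r "$dev/size" ] || continue     # skip if there is no readable size file
    sectors=$(cat "$dev/size")
    echo "$(basename "$dev") $((sectors * 512)) bytes"
  done
}

list_block_devices
```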

Kernel modules define many different types of abstractions (resources), such as users, processes, and files. These resources have names (addresses), and there are APIs (buses) for communicating within the kernel and with user-space processes.
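Under the hood, the kernel's names for these resources are mostly numbers. A small sketch that prints the kernel's name for one user, one process, and one file, assuming Linux with GNU coreutils (stat -c is GNU syntax); show_resource_names is an illustrative name:

```shell
#!/bin/sh
# Show the kernel's names (addresses) for three kinds of resources.
# Assumes Linux with GNU coreutils; BSD/macOS stat uses -f instead of -c.
show_resource_names() {
  echo "user id:    $(id -u)"                    # users are named by numeric UID
  echo "process id: $$"                          # processes are named by PID
  echo "file inode: $(stat -c %i /etc/passwd)"   # files are named by inode number
}

show_resource_names
```

Human-friendly names such as login names and file paths are translated to these numbers before the kernel acts on them.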

Processes

Just as the kernel is copied from disk into memory by a boot loader, a process is an in-memory copy of a program that the kernel loads from the file system. Processes are translators, too: messages coming from the kernel run through the program and result in messages going back to the kernel.
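On Linux the kernel remembers which file each process was loaded from. A minimal sketch, assuming Linux, where /proc/PID/exe is a symbolic link the kernel maintains to that process's program file (show_program_file is a hypothetical name):

```shell
#!/bin/sh
# Show which program file on disk the current shell process was loaded from.
# Assumes Linux: /proc/<pid>/exe links to the executable the kernel loaded.
show_program_file() {
  readlink /proc/$$/exe     # $$ is the PID of the shell running this script
}

show_program_file
```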

Hypervisors

A hypervisor fakes the 1's and 0's (addresses and messages) that a kernel expects to exchange with a physical computer. It's actually very easy to trick the kernel. It is, after all, just another program, and once it is loaded into memory, the hypervisor runs it like any other process. When the kernel sends a message to a device, it is actually talking to the hypervisor, which emulates the device.
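A guest kernel can sometimes tell it is being fooled. A rough sketch, assuming Linux on x86, where a CPU emulated by a hypervisor usually advertises a "hypervisor" flag that the kernel lists in /proc/cpuinfo; detect_hypervisor is an illustrative name, and on systemd-based systems systemd-detect-virt is a more thorough check:

```shell
#!/bin/sh
# Report whether the kernel appears to be running on an emulated machine.
# Assumes Linux on x86; other architectures may lack the flag even in a VM.
detect_hypervisor() {
  if grep -qw hypervisor /proc/cpuinfo 2>/dev/null; then
    echo "hypervisor detected"
  else
    echo "no hypervisor flag found"
  fi
}

detect_hypervisor
```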

There are two kinds of hypervisors: hosted and native. When you rent a virtual machine from Amazon, it runs on a native hypervisor; that is, the hypervisor is the "kernel" for the physical computer (as described above). Virtual machines on your laptop run on a hosted hypervisor (VirtualBox, Parallels, VMware, etc.), which is itself a program running on your laptop's operating system. From the virtual machine's perspective, however, there's no difference. The hypervisor provides a complete emulation of a physical computer for the guest operating system, which is why you can run off-the-shelf Windows or Linux on your MacBook.

Containers

Virtual machines (VMs) are very cool. However, they have to do a lot of work to maintain a complete abstraction of a physical machine. This is inefficient and, for the vast majority of processes (programs), unnecessary. Most programs use only the operating system's high-level abstractions (files, time, TCP/IP, and pseudo terminals) and have no need to go any deeper (block devices, hardware clock, Ethernet, and keyboards).

Instead of emulating the computer, LXC (Linux Containers) isolates kernel resources for a collection of processes (a container). Essentially, LXC is a lightweight, hosted hypervisor that needs no emulation, since all of the guest containers share one kernel. The trick is that LXC fakes the names of kernel resources, using Linux kernel namespaces, so that process identifiers, file systems, users, etc. are translated between the container and the kernel.
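You can see the namespaces a process belongs to under /proc. A minimal sketch, assuming Linux 3.8 or later; show_namespaces is an illustrative name:

```shell
#!/bin/sh
# List the namespaces the current shell belongs to. Processes that share
# a namespace see the same names for that kind of resource (PIDs, mounts,
# users, ...); processes in different containers get different namespaces.
# Assumes Linux 3.8 or later.
show_namespaces() {
  for ns in /proc/$$/ns/*; do
    readlink "$ns"          # prints e.g. pid:[4026531836]; the number varies
  done
}

show_namespaces
```

If you run the same loop inside a container, the identifiers should differ from the host's; that difference is what lets the container's first process see itself as PID 1.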

Containers are a complete abstraction of all the relevant resources used by the vast majority of programs.

Vagrant

When you launch VirtualBox on your computer, it opens a GUI from which you can start virtual machines willy-nilly, each running in a window on your laptop's desktop. This is nice if you want to run Microsoft Word on your Linux laptop. However, if you are developing terminal-based software, which might run on a supercomputer, you don't need the overhead.

Vagrant is a program for configuring headless VMs (and containers). It's almost like magic. To start a VM running CentOS, you simply say:

vagrant init centos
vagrant up
vagrant ssh

The init step writes a configuration file (Vagrantfile) that names the centos box; up downloads a tiny CentOS image (only the first time) and boots a VM from it; and ssh logs you in, so you can interact with that server just as if it were in the cloud.

Vagrant wraps all of the above technology in one simple program, vagrant. As noted above, you can also run containers with Vagrant, but it's a bit more awkward and doesn't provide the best interface to containers.

Docker

Like Vagrant, Docker is a productivity enhancer; it wraps LXC. It assumes you are running on Linux, so on a Mac or Windows laptop you will need Vagrant to boot a Linux VM before you can run a container with Docker. Once you've got Docker installed, however, you can start a container like this:

# docker run centos:centos6 /bin/echo hello, world
hello, world

Containers vs VMs

An important difference between VMs and (LXC) containers is that VMs can be run by an ordinary user, whereas containers must be run as root. You can run non-root processes inside a container, but you have to configure the container to do so.

Containers, unlike VMs, start instantly. Here's a timed Docker command:

# time docker run centos:centos6 /bin/echo hello
hello

real	0m0.144s
user	0m0.006s
sys	0m0.006s

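Here's the same thing with a VM, which is several hundred times slower. (The final command in this transcript fails because vagrant ssh treats its argument as a machine name; to run a single command inside the VM, use vagrant ssh -c 'echo hello'.)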

$ time vagrant init chef/centos-6.5
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.

real    0m0.985s
user    0m0.893s
sys     0m0.087s
$ time vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'chef/centos-6.5'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'chef/centos-6.5' is up to date...
==> default: Setting the name of the VM: test_default_1422602006210_79210
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 => 2222 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
default: /vagrant => /Users/nagler/vagrant/test

real    0m56.531s
user    0m4.612s
sys     0m2.241s
$ time vagrant ssh echo hello
The machine with the name 'echo' was not found configured for
this Vagrant environment.

real    0m1.362s
user    0m1.009s
sys     0m0.120s
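The slowdown can be checked from the real (wall-clock) times in the Docker and Vagrant transcripts above. A small sketch; slowdown is a hypothetical helper, and awk does the floating-point division:

```shell
#!/bin/sh
# Ratio of two wall-clock times, rounded to the nearest integer.
slowdown() {
  awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.0f\n", slow / fast }'
}

# "real" times copied from the transcripts above.
docker_run=0.144      # docker run centos:centos6 /bin/echo hello
vagrant_init=0.985    # vagrant init chef/centos-6.5
vagrant_up=56.531     # vagrant up

vm_total=$(awk -v a="$vagrant_init" -v b="$vagrant_up" 'BEGIN { print a + b }')
echo "vagrant up alone:   $(slowdown "$vagrant_up" "$docker_run")x slower"
echo "init + up combined: $(slowdown "$vm_total" "$docker_run")x slower"
```

On these numbers the VM comes out roughly 400 times slower (393 and 399).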