# Optimize your Docker Infrastructure with Python

## PyData NYC

November 11, 2015

Ryan J. O'Neil  
<ryanjoneil@gmail.com>  

### Intro

By Day: Lead Engineer @ Yhat, Inc.  

Formerly:
* Simulation, modeling, optimization @ MITRE
* Data Journo @ The Washington Post

By Night: PhD Candidate in SEOR @ George Mason University

Research:
* Combinatorial optimization
* Scheduling problems
* Cutting & packing problems

### Motivation

Consider the DevOps Engineer.

#### It's a noble group...

![](images/scotty.jpg)

#### ...with grave responsibilities.

One of the important functions of DevOps is the creation of environments for:

* Software development
* Testing & quality assurance
* Running operational systems
* Whatever else you might need a computing environment for

#### There's even a Venn diagram about DevOps.

And we all know how much we like Venn diagrams.

![](images/devops-venn.svg)

#### A typical DevOps function

Say you work on _Yet Another Enterprise Java Application (TM)_. YAEJA generates lots of \$\$\$\$ for _Yet Another Enterprise Software Company_ and it keeps you gainfully employed!

Say you need to upgrade portions of the system YAEJA runs _(and depends)_ on.

You don't want to just upgrade the production instances without testing first. So you need two environments from your DevOps person.

One for running the production instance:

|     | Command                                                        |
|-----|----------------------------------------------------------------|
| $A$ | Install the Java compiler and runtime environment.             |
| $B$ | Download a set of external dependencies.                       |
| $C$ | Set up an EnterpriseDB (TM) schema and populate it with data.  |

And another for testing it with the new version of the utility:

|     | Command                                                        |
|-----|----------------------------------------------------------------|
| $A$ | Install the Java compiler and runtime environment.             |
| $B$ | Download a set of external dependencies.                       |
| $C$ | Set up an EnterpriseDB (TM) schema and populate it with data.  |
| $D$ | Update the underlying system utility.                          |

If _Yet Another Enterprise Java Application (TM)_ integrates with a number of optional third party applications, you might need an additional test environment for each one.

If those interact in interesting ways, you may need test environments for different combinations of third party integrations.

#### Looks like your DevOps person will be working all weekend

![](images/saturday.png)

#### In the old days...

At a big software shop, DevOps may be responsible for the continual setup and teardown of dozens (or _hundreds_) of system configurations for a single software project.

This used to be done with physical hardware on _(sometimes)_ fresh operating system installs.

Most medium-to-large software shops had an air-conditioned room that looked like this.

![](images/cable-spaghetti.jpg)

#### Environmental setup was often pretty manual

If you had to reproduce an environment, you had a few options:

* Start with a fresh install of the operating system


* Save the results of your setup to a CD and load that onto a box


* Hope that uninstalling and reinstalling the relevant components is good enough

    + A _lot_ of people did this
    + It's probably not good enough
    + Most software leaves behind relics
        - Logs...
        - Data files...
        - Configuration...

#### Nowadays...

![](images/cloud-docker.png)

### What's a container?

Docker is so hot right now.

Containers are lightweight virtualization. They make it seem like a process is in its own operating system on its own hardware, without loading up heavy stuff like a kernel.

Container architectures have been around since `chroot` jails in V7 Unix.


They're not exactly _new_...

But now they're so convenient they feel _(and are raising venture capital)_ that way!

#### A tiny bit of history

* 1979: `chroot` jails added to Version 7 Unix at Bell Labs
    + Last version of Unix before it was commercialized by AT&T
    + Ran on a DEC PDP-11 minicomputer

* 1982: Bill Joy ports `chroot` to BSD

* 2005: Sun releases Solaris Containers
    + Zones provide fully isolated virtual servers on a single host

* 2007: Initial implementation of `cgroups` by Google for Linux Kernel 2.6.24
    + Isolation of system resources

* 2013: **Namespace isolation** rounded out with user namespaces in Linux Kernel 3.8
    + Process IDs
    + Network interfaces, iptables, routing
    + Inter-Process Communication
    + etc...

#### Containers are about isolation...

Processes and system resources behave as if they are on their own computers.

##### Container 1

```
[ryan@localhost ~]$ docker run -it ubuntu:trusty /bin/bash
root@19867869f71d:/# echo spam and eggs > /ingredients.txt
root@19867869f71d:/# cat /ingredients.txt                         
spam and eggs
```

##### Container 2

```
[ryan@localhost ~]$ docker run ubuntu:trusty cat /ingredients.txt
cat: /ingredients.txt: No such file or directory
```

They have their own process spaces.

```
[ryan@localhost ~]$ docker run --cidfile=cid -it ubuntu:trusty
root@fc347d97db8c:/# echo $$
1
```

And they are convinced they have their own hardware resources.

```
[ryan@localhost ~]$ docker stats --no-stream=true $(cat cid)

CONTAINER
fc347d97db8c7d02b870eff3e2d1e92747e8c0fc1cb7f9b1c76bf534fcd21ba0

CPU %               MEM USAGE/LIMIT     MEM %               NET I/O
0.00%               524.3 kB/7.945 GB   0.01%               648 B/648 B
```

#### ...but containers are also about sharing.

And this is what we care about today.

Specifically, saving and retrieving the results of a computation from the Docker image cache.

Why?

Smart cache use == time saved building out environments!

### Docker cache mechanics

Maybe I should call this section UnionFS mechanics, but then Docker is so hot right now.

#### A Tale of Two Dockerfiles

Two similar Dockerfiles: one runs commands $A$, $B$, $C$ and the other runs $A$, $B$, $C$, $D$.

#### What happens when we build these into images?
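Take the two environments from earlier: commands $A$, $B$, $C$ for production and $A$, $B$, $C$, $D$ for testing. The sketch below is a toy model of the cache, not Docker's implementation. It assumes a layer can be reused whenever the same command is applied on top of the same parent layer, and it ignores details such as file checksums for `ADD` and `COPY`. The `LayerCache` class is hypothetical.

```python
class LayerCache:
    """Toy image cache: one layer per (parent layer, command) pair."""

    def __init__(self):
        self.layers = {}  # (parent_id, command) -> layer_id

    def build(self, commands):
        """Build an image from a list of commands; return (hits, misses)."""
        parent, hits, misses = None, 0, 0
        for command in commands:
            key = (parent, command)
            if key in self.layers:
                hits += 1  # same command on the same parent: reuse the layer
            else:
                misses += 1  # new layer: the command actually has to run
                self.layers[key] = len(self.layers)
            parent = self.layers[key]
        return hits, misses


cache = LayerCache()
production = cache.build(["A", "B", "C"])
testing = cache.build(["A", "B", "C", "D"])
reordered = cache.build(["D", "A", "B", "C"])
print(production, testing, reordered)
```

In this model the test image only has to run $D$, because it shares the $A$, $B$, $C$ prefix with production. Put $D$ first and nothing is shared, so all four commands run again. Command order is what decides how much of the cache you get to use.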

### Problem statement

Let's write this thing in LaTeX!

### Partitioning sets with PuLP

We'll start with a really easy NP-Complete problem...
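For reference, this is the textbook set partitioning problem written as an integer program. The notation is an assumption made here, not taken from the talk: $E$ is a ground set of elements, $S_1, \dots, S_n \subseteq E$ are candidate subsets with costs $c_j$, and $x_j = 1$ when subset $S_j$ is chosen.

$$
\begin{aligned}
\min \quad & \sum_{j=1}^{n} c_j x_j \\
\text{s.t.} \quad & \sum_{j \,:\, i \in S_j} x_j = 1 && \forall\, i \in E \\
& x_j \in \{0, 1\} && j = 1, \dots, n
\end{aligned}
$$

Every element has to land in exactly one chosen subset. Relax the equality to $\geq 1$ and you get set covering; make it $\leq 1$ and you get set packing.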

### Finding maximal cliques with NetworkX

...add another NP-Complete subproblem...
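The section title points at NetworkX. To keep things dependency-free, the sketch below swaps in a plain-Python Bron-Kerbosch enumeration (the basic version, without pivoting). NetworkX's `find_cliques` enumerates the same maximal cliques. The graph is a made-up toy, not data from the talk.

```python
def maximal_cliques(adjacency):
    """Yield each maximal clique of an undirected graph as a frozenset.

    `adjacency` maps every node to the set of its neighbours.
    """
    def expand(clique, candidates, excluded):
        # Nothing left to add and nothing excluded: the clique is maximal.
        if not candidates and not excluded:
            yield frozenset(clique)
            return
        for node in list(candidates):
            yield from expand(
                clique | {node},
                candidates & adjacency[node],
                excluded & adjacency[node],
            )
            # Mark the node as processed before trying the next one.
            candidates = candidates - {node}
            excluded = excluded | {node}

    yield from expand(set(), set(adjacency), set())


# Hypothetical toy graph: a triangle a-b-c with a tail c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

cliques = set(maximal_cliques(graph))
print(sorted(sorted(c) for c in cliques))
```

The toy graph has two maximal cliques: the triangle and the single tail edge.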

### Model construction

...and then we'll solve both of them.

### Results

No, really. That's how we do this thing.