# How Docker Works

### Introduction

We have now seen bits and pieces of Docker.  If there is one thing to focus on, it is Docker's concept of containerization represented by both images and containers.  To really appreciate it, we need to understand what the world looked like before Docker and how Docker did things differently.

### Why Docker Again?

As we saw in the last lesson, Docker allows us to perform all of the environment setup needed for a task, that is, to run a process.  In our last lesson, the task was booting up the Ghost blogging platform on our computer.  Think about everything that needed to happen.  We needed the correct underlying software installed (Ghost runs on Node.js), a copy of the Ghost codebase, and an initialized database with all of the necessary tables, among other things.  With Docker we were able to do all of this without worrying about any of the underlying steps.

### Keeping it Isolated

One of the key pieces to ensuring that this worked was having a relatively isolated environment.  How isolated?  Well, before Docker, people tended to use virtual machines to make sure the software was *really* isolated.  

A virtual machine is:

> "an efficient, isolated duplicate of a real computer machine."

> [Wikipedia](https://en.wikipedia.org/wiki/Virtual_machine#Definitions)

Below is a Windows virtual machine running inside of a Mac.  

<img src="./win-virtual-machine.png" width="40%">

> In the example above, we call the Mac the **host** and the Windows operating system the **guest**.  The host provides the underlying hardware and computing resources that support a particular guest virtual machine.

As we may suspect, starting up an entire new operating system on our computer has some downsides.  
1. Running a separate operating system on our computer takes up a lot of **space**
2. Booting up this separate operating system takes up a good amount of **time**

This can become a real issue if, say, we want to develop two separate applications (for example, our blog plus a chat application) on the same computer.  To keep the two applications isolated, should we really set up a second virtual machine?

This is the promise of Docker:

> Create an isolated environment while limiting the time and space that setting up this environment requires.

### How Docker Maintains this Balance

So the promise of Docker is to provide an isolated environment for each individual task (e.g., our chat application and our blogging application) *without* creating a separate virtual machine for each one.

How does Docker achieve this?  It takes advantage of a few features built into the Linux kernel (Windows containers rely on similar features built into the Windows kernel).

### 1. A union filesystem  

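The first feature that Docker takes advantage of is a union filesystem.  Union filesystems were used back in the days of CDs: live CDs, for example, let you run an operating system from a read-only CD while still appearing to save changes.  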

Imagine that we borrowed the first Harry Potter book, `Sorcerer's Stone`, on a CD from our friend, and that we have the second Harry Potter book in a folder on our computer named `Chamber of Secrets`.  With a union filesystem, we could make the contents of both books *appear* as if they were in a single `harry_potter` folder.  

> That's one feature of a union filesystem: it allows our folders to act as if **they contain certain files** even when the files really live somewhere else.

<img src="./directories.png">
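
To make the "appear as if" idea concrete, here is a minimal Python sketch that models each directory as a dictionary mapping file names to contents, then builds the merged view a union filesystem would present. The file names and contents are hypothetical, and a real union filesystem (such as Linux's OverlayFS) does this inside the kernel rather than with dictionaries.

```python
from typing import Dict, List

# Each "directory" is modeled as a dict: file name -> file contents.
Layer = Dict[str, str]


def union_view(layers: List[Layer]) -> Layer:
    """Merge layers into one combined view, bottom layer first.

    If two layers contain the same file name, the higher layer wins.
    The merged dict holds references to the same content objects,
    so no file contents are duplicated.
    """
    merged: Layer = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical contents for the borrowed CD and our local folder.
cd = {"sorcerers_stone.txt": "Book 1 text"}
chamber_of_secrets = {"chamber_of_secrets.txt": "Book 2 text"}

harry_potter = union_view([cd, chamber_of_secrets])
print(sorted(harry_potter))  # ['chamber_of_secrets.txt', 'sorcerers_stone.txt']
```

Neither book is moved or copied; the `harry_potter` view simply points at files that still live in their original locations.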

Now imagine that we want to edit a chapter that is on that CD.  If we actually edited the chapter on the CD, our friend would be justifiably annoyed.  So instead, we copy the chapter into an edit layer on top of the original, make our changes to the copy, and leave the original CD unchanged.  This technique, known as **copy-on-write**, lets a filesystem appear writable without actually changing the read-only layers underneath. 

<img src="./harry_copy_on_write.png" width="40%">

This is a **union file system**.  

> A union filesystem presents files from several locations under one combined directory, even if those files live elsewhere and are also used by other resources.  The underlying layers are read-only.  Any change is made by first copying the relevant file into a writable layer on top and then editing the copy.
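
Here is a small sketch of that copy-on-write rule, assuming a toy `UnionFS` class (a hypothetical name, not a real library) with read-only lower layers and one writable upper layer. It is loosely modeled on how OverlayFS copies a file up into its writable layer the first time the file is modified; the real mechanism works on files and directories on disk, not Python strings.

```python
from typing import Dict, List


class UnionFS:
    """Toy union filesystem: read-only lower layers plus one writable upper layer."""

    def __init__(self, lower_layers: List[Dict[str, str]]) -> None:
        self.lower = lower_layers          # treated as read-only; never modified here
        self.upper: Dict[str, str] = {}    # writable layer that receives all edits

    def read(self, name: str) -> str:
        # The writable layer is checked first, then the read-only layers top-down.
        if name in self.upper:
            return self.upper[name]
        for layer in reversed(self.lower):
            if name in layer:
                return layer[name]
        raise FileNotFoundError(name)

    def append(self, name: str, extra: str) -> None:
        # Copy-on-write, step 1: copy the file up into the writable layer
        # the first time it is modified.
        if name not in self.upper:
            self.upper[name] = self.read(name)
        # Step 2: edit only the copy; the lower layer keeps the original.
        self.upper[name] += extra


cd = {"chapter1.txt": "Original chapter."}
fs = UnionFS([cd])
fs.append("chapter1.txt", " Our notes.")

print(fs.read("chapter1.txt"))  # Original chapter. Our notes.
print(cd["chapter1.txt"])       # Original chapter.
```

Reading through `fs` shows our edited chapter, while the "CD" itself still holds the original text.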

Docker uses the union filesystem so that different images can share overlapping pieces of software.  For example, if both our blogging application and our chat app are built on the same Python 3 image, Docker does not download and store a new copy of those Python 3 layers for each image; instead, it just "appears" as if each image has its own copy of Python.  And if our blogging application changes any of the Python files, the chat app is unaffected, because Docker performs a copy-on-write.
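
Docker identifies layers by SHA-256 digests of their content, so two images built on the same base end up referencing the same layer IDs.  The sketch below models that with a hypothetical `LayerStore` class: each image is just a list of layer digests, and a layer with a given digest is stored only once.  The layer contents are made-up byte strings standing in for real filesystem archives.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Identify a layer by the SHA-256 hash of its content."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


class LayerStore:
    """Toy content-addressed store: each distinct layer is kept only once."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def add_image(self, layers: List[bytes]) -> List[str]:
        """Store an image's layers and return the image as a list of layer digests."""
        ids = []
        for content in layers:
            layer_id = digest(content)
            # setdefault stores the layer only if this digest is not already present.
            self.blobs.setdefault(layer_id, content)
            ids.append(layer_id)
        return ids


store = LayerStore()
python_base = [b"ubuntu files", b"python 3 files"]  # hypothetical shared layers
blog_image = store.add_image(python_base + [b"blog app code"])
chat_image = store.add_image(python_base + [b"chat app code"])

print(len(blog_image) + len(chat_image))  # 6 layer references across both images
print(len(store.blobs))                   # 4 layers actually stored
```

Both images "contain" Ubuntu and Python, but those two layers occupy space only once; only the application layers differ.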

We saw evidence of Docker reusing these shared pieces when we pulled down our Ghost image.

```
latest: Pulling from library/ghost
d121f8d1c412: Already exists
3a54a24e4e59: Already exists
8aa65a634fc0: Already exists
7634d710af87: Already exists
ec150ee2ad17: Already exists
8d63bf0b1e87: Already exists
2f2b08fee21d: Already exists
c90a3cd5a740: Already exists
8d1c1d735844: Already exists
Digest: sha256:d36769ce35d3ad3c868a359ad48d9a0b37f886ef4df3571c54936beb4c23689e
Status: Downloaded newer image for ghost:latest
docker.io/library/ghost:latest
```

Each identifier above is a separate **layer**, and because of the union filesystem a single layer can be shared by many images.  Docker is telling us that it does not need to download these layers, because they already exist on the computer.  
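
The decision Docker made during that pull can be sketched as a simple set-membership check.  The `pull` function and the layer names below are hypothetical; the status strings imitate the style of the output above, though the real client also shows download and extraction progress, which this sketch leaves out.

```python
from typing import List, Set


def pull(image_layers: List[str], local_layers: Set[str]) -> List[str]:
    """Toy pull: fetch only the layers that are not already on this machine."""
    log = []
    for layer_id in image_layers:
        if layer_id in local_layers:
            log.append(f"{layer_id}: Already exists")
        else:
            local_layers.add(layer_id)  # stand-in for actually downloading the layer
            log.append(f"{layer_id}: Pull complete")
    return log


local = {"base-os", "node-runtime"}                # hypothetical layers already on disk
ghost = ["base-os", "node-runtime", "ghost-app"]  # hypothetical layer list for an image

log = pull(ghost, local)
for line in log:
    print(line)
# base-os: Already exists
# node-runtime: Already exists
# ghost-app: Pull complete
```

Pulling the same image a second time would report every layer as already existing, since nothing new needs to be fetched.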

### Containers

In fact, the difference between an image and a container is simply that a container adds another layer on top of the image.  When we boot up a container, Docker adds a thin writable layer on top of the image's layers so that we can make changes if need be, while the underlying image layers stay read-only.  Let's take a look at a diagram of the architecture.

<img src="./copy-on-write-container.png" width="30%">

Towards the bottom of the diagram are image layers of fundamental software like Ubuntu, then emacs, and at the top is our writable container layer.  The benefit of this structure is that the underlying image layers can be shared across different containers and images, and any change we need to make happens only in the layer on top.  
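
The diagram can be sketched in a few lines of Python, assuming two hypothetical classes, `Image` and `Container`.  The image is a stack of read-only layers (wrapped in `types.MappingProxyType` so they cannot be edited through the image), and each container adds its own empty writable layer when it starts.  For simplicity, `write` replaces a whole file, so no copy-up step is needed here.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping


class Image:
    """An image: an ordered stack of read-only layers, bottom layer first."""

    def __init__(self, layers: List[Dict[str, str]]) -> None:
        # MappingProxyType wraps a private copy of each layer in a read-only view.
        self.layers: List[Mapping[str, str]] = [
            MappingProxyType(dict(layer)) for layer in layers
        ]


class Container:
    """A container: a shared image plus its own thin writable layer."""

    def __init__(self, image: Image) -> None:
        self.image = image
        self.writable: Dict[str, str] = {}  # created empty when the container starts

    def read(self, path: str) -> str:
        if path in self.writable:
            return self.writable[path]
        for layer in reversed(self.image.layers):  # topmost image layer first
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path: str, text: str) -> None:
        # Writes land only in this container's layer; the image is never touched.
        self.writable[path] = text


# Hypothetical files standing in for the Ubuntu and emacs layers.
image = Image([{"/etc/motd": "hello from ubuntu"}, {"/usr/bin/emacs": "emacs"}])
c1, c2 = Container(image), Container(image)
c1.write("/etc/motd", "edited in c1")

print(c1.read("/etc/motd"))  # edited in c1
print(c2.read("/etc/motd"))  # hello from ubuntu
```

Both containers share one image, yet a change made in `c1` is invisible to `c2` because it lives only in `c1`'s writable layer.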

So one way Docker achieves isolation in a space-efficient way is with a union filesystem, which takes shared, read-only images and makes them appear under a specific directory.  Any changes are made with copy-on-write to prevent conflicts.  

Remember that our goal was to provide the isolation of a virtual machine with lower space requirements and a shorter startup time.  With the union filesystem, Docker makes the **files** required by each container appear isolated.  But it must also make the processes in each container appear isolated, and it does this with namespaces.

### 2. Namespaces

Imagine that our blogging application and our chat application both use the same Python image by virtue of the union filesystem.  If we're running the two applications simultaneously, we want to make sure any tasks we perform in Python in the chat application do not affect our blogging application.  This is achieved by placing the two separate Python processes in different namespaces.   

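It can help to picture a namespace as a walled-off folder, but it is really a feature of the Linux kernel: a namespace wraps a global system resource, such as process IDs, mount points, network interfaces, or the hostname, so that processes inside it see their own isolated instance of that resource.  For example, processes in one PID namespace are unaware of processes outside of it, and with a mount namespace each container can see its own filesystem, so one container could have one version of Python installed while another has a totally separate version.  So this is one way Docker can maintain isolation.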
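
The PID namespace is the easiest one to picture.  The sketch below uses a hypothetical `PidNamespace` class: each namespace numbers its processes starting from 1 and can only list its own processes.  In real Linux, the kernel creates PID namespaces (for example via the `CLONE_NEWPID` flag to `clone(2)` or `unshare(2)`), and namespaces are nested, so the host can still see every container's processes under different PIDs; this sketch models only the view from inside each namespace.

```python
import itertools
from typing import Dict, List


class PidNamespace:
    """Toy PID namespace: its own PID counter (starting at 1) and process table."""

    def __init__(self) -> None:
        self._pids = itertools.count(1)
        self.processes: Dict[int, str] = {}  # namespace-local PID -> command

    def spawn(self, command: str) -> int:
        pid = next(self._pids)
        self.processes[pid] = command
        return pid

    def ps(self) -> List[str]:
        # A process inside a namespace can only list processes in that namespace.
        return [f"{pid} {cmd}" for pid, cmd in sorted(self.processes.items())]


blog = PidNamespace()
chat = PidNamespace()
blog.spawn("python blog_server.py")
chat.spawn("python chat_server.py")
chat.spawn("python worker.py")

print(blog.ps())  # ['1 python blog_server.py']
print(chat.ps())  # ['1 python chat_server.py', '2 python worker.py']
```

Both Python servers believe they are PID 1, and neither can see the other, even though in reality they run on the same kernel.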

Docker uses the namespacing features in Linux to keep software and processes isolated where they need to be.  And it can achieve this without a separate virtual machine for each piece of software.  

### Summary

In this lesson, we learned about virtual machines, a predecessor to containerization.  While virtual machines create an isolated and reproducible computing environment, one downside is that running a separate operating system does not use time and space efficiently: each new environment needs time to boot up and space to run and store the system.  

Docker instead allows files to be shared across different pieces of software, and maintains isolation by making any changes with copy-on-write.  In fact, a container is just another layer on top of an image where changes can be made.  In addition, containers take advantage of namespaces to control what can be seen and accessed from inside of the container.

### Resources

[Cgroups and Linux Containers](https://www.youtube.com/watch?v=el7768BNUPw)

[Understanding Docker Internals](https://medium.com/@nagarwal/understanding-the-docker-internals-7ccb052ce9fe)

[Docker Namespace and Cgroups](https://medium.com/@kasunmaduraeng/docker-namespace-and-cgroups-dece27c209c7)

[Docker Containers and Filesystem](https://medium.com/@nagarwal/docker-containers-filesystem-demystified-b6ed8112a04a)