# Going Deeper with Building Images

### Introduction

### Unpacking the steps 

In the last lesson, we saw that building an image involves three steps:

```dockerfile
# 1. FROM base_image
# 2. RUN build onto environment
# 3. CMD to run this task
```

We applied them with the following Dockerfile:

```dockerfile
# jupyter-kaggle/Dockerfile
FROM jupyter/scipy-notebook
RUN conda install 'kaggle'
CMD ["jupyter", "notebook"]
```

Now, let's unpack these steps.

**1. Choosing the base image**

We choose a base image so that we do not have to build a new image from scratch.  To choose a base image, we can look at the image's Dockerfile on Docker Hub to see what software the image comes pre-installed with.  For different Jupyter images, we can visit the [jupyter/docker-stacks repo](https://github.com/jupyter/docker-stacks).  In our earlier lesson, we used the [jupyter/scipy-notebook](https://hub.docker.com/r/jupyter/scipy-notebook/dockerfile) image.

**2. Adding customizations**

From there, we build on the base image.  We customized our image with the line:

```dockerfile
RUN conda install 'kaggle'
```

Now `RUN` is a Dockerfile instruction.  But how could we execute a command like `conda install`?  Well, we could only do so if `conda` was already installed in our image.  And while we do not see it installed in the current `jupyter/scipy-notebook` Dockerfile, we can see that its base image is `minimal-notebook`.  Following that image's base image to [jupyter/base-notebook](https://hub.docker.com/r/jupyter/base-notebook/dockerfile), we see that `conda` is installed there.

```dockerfile
RUN cd /tmp && \
    wget --quiet https://repo.continuum.io/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh && \

```

And this `wget` command only works because `wget` is already available at that point in the build, on top of *base-notebook's* own base image of *ubuntu*.
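To make the chain of base images concrete, here is a minimal Python sketch that follows each image back to its base and reports which image first provides a given tool.  The `BASE_IMAGE` and `INSTALLS` tables are hand-written from the description above, not read from Docker Hub, and the helper names `ancestry` and `provider` are hypothetical.

```python
# Hand-written lookup tables based on the lesson text (assumptions, not
# data pulled from Docker Hub). None marks where we stop following the chain.
BASE_IMAGE = {
    "jupyter/scipy-notebook": "jupyter/minimal-notebook",
    "jupyter/minimal-notebook": "jupyter/base-notebook",
    "jupyter/base-notebook": "ubuntu",
    "ubuntu": None,
}

# Which image's Dockerfile installs which tool (only conda is modelled here).
INSTALLS = {
    "jupyter/base-notebook": {"conda"},
}


def ancestry(image):
    """Return the image followed by each of its base images, in order."""
    chain = []
    while image is not None:
        chain.append(image)
        image = BASE_IMAGE[image]
    return chain


def provider(image, tool):
    """Return the first image in the ancestry that installs the tool, or None."""
    for ancestor in ancestry(image):
        if tool in INSTALLS.get(ancestor, set()):
            return ancestor
    return None


print(ancestry("jupyter/scipy-notebook"))
print(provider("jupyter/scipy-notebook", "conda"))
```

Because every image inherits what its base images installed, `conda` is usable in our own Dockerfile even though we never installed it ourselves.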

**3. Finishing with CMD**

Finally, we end our Dockerfile with `CMD`.  Here `CMD` takes a list as its argument (the exec form), although Docker also accepts a plain string (the shell form).  The command runs when the image is used to start a container.
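As a rough illustration of the two ways `CMD` can be written, the Python sketch below mimics how the argument is interpreted: a JSON array of strings is used as the command directly, while anything else is treated as a string and handed to a shell.  The function name `resolve_cmd` is hypothetical, and this is a simplified model rather than Docker's actual parser.

```python
import json


def resolve_cmd(argument):
    """Return the argv list a CMD argument would roughly translate to."""
    try:
        parsed = json.loads(argument)
    except json.JSONDecodeError:
        parsed = None

    # Exec form: a JSON array of strings is used as-is.
    if isinstance(parsed, list) and all(isinstance(p, str) for p in parsed):
        return parsed

    # Shell form: the whole string is passed to /bin/sh -c.
    return ["/bin/sh", "-c", argument]


print(resolve_cmd('["jupyter", "notebook"]'))
print(resolve_cmd("jupyter notebook"))
```

Note that the list form needs double quotes to be valid JSON, which is why our Dockerfile writes `["jupyter", "notebook"]` rather than using single quotes.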

### The build process

Now to build our image we run `docker build .` 

That `.` at the end of `docker build` is our build context: the directory whose contents are sent to the Docker daemon for the build.  By default, Docker also looks for the Dockerfile at the root of that directory, which in this case is the current directory.
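To get a feel for what "sending the build context" means, here is a small Python sketch that packs a directory into an in-memory tar archive, which is roughly what the Docker client does before handing the context to the daemon.  The helpers `pack_context` and `context_file_names` are hypothetical, and the sketch ignores details such as `.dockerignore`.

```python
import io
import os
import tarfile


def pack_context(context_dir):
    """Pack every file under context_dir into an in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for root, _dirs, files in os.walk(context_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to the context directory.
                archive.add(full_path, arcname=os.path.relpath(full_path, context_dir))
    return buffer.getvalue()


def context_file_names(archive_bytes):
    """List the file names stored in a packed context."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        return sorted(archive.getnames())
```

Calling `context_file_names(pack_context("."))` from a directory that holds only our Dockerfile would list just that one file, which is why the build context in this lesson is so small.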

After running `docker build .`, our terminal displays all of the steps in the docker build process.  Each additional instruction in our Dockerfile leads to another step.  This is what we saw:

```
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM jupyter/scipy-notebook
 ---> 844815ed865e
Step 2/3 : RUN conda install 'kaggle'
 ---> Running in 4f5b1210656d
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

...

Removing intermediate container 4f5b1210656d
 ---> 29886ca05035
Step 3/3 : CMD ["jupyter", "notebook"]
 ---> Running in 5b0a8c54d511
Removing intermediate container 5b0a8c54d511
 ---> 09c2125c0a49
Successfully built 09c2125c0a49
```

Ok, so let's unpack what is happening here. 

* Step 1/3: Docker finds the existing `jupyter/scipy-notebook` image and prints its image id, `---> 844815ed865e`.

* Step 2/3: In `---> Running in 4f5b1210656d`, the value `4f5b1210656d` is a *container id*.  This means that Docker used the previous image to boot up a new container with id `4f5b1210656d`, and in that container installed kaggle.

* `Removing intermediate container 4f5b1210656d` followed by `---> 29886ca05035` tells us that the step is complete: Docker saved the updated container as a new image, with id `29886ca05035`, and then removed the container.

* Step 3/3: A similar process occurs: Docker starts a new container from the previous image, applies the instruction (here `CMD ["jupyter", "notebook"]`, which records the image's default command), produces a new image, and removes the container.
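The loop described in these steps can be sketched as a toy Python model: for every instruction after `FROM`, start a container from the previous image, apply the instruction, save the result as a new image, and remove the container.  The function `simulate_build` and the `image-N` / `container-N` labels are made up for illustration; real builds print hexadecimal ids.

```python
def simulate_build(base_image, instructions):
    """Toy model of the build loop; returns (final_image, log_lines)."""
    total = len(instructions) + 1  # FROM counts as the first step
    image_number = 1
    image = f"image-{image_number}"  # stands in for the base image's id
    log = [f"Step 1/{total} : FROM {base_image}", f" ---> {image}"]

    for step, instruction in enumerate(instructions, start=2):
        # Start a temporary container from the previous image.
        container = f"container-{step}"
        log.append(f"Step {step}/{total} : {instruction}")
        log.append(f" ---> Running in {container}")

        # Save the modified container as a new image, then discard it.
        image_number += 1
        image = f"image-{image_number}"
        log.append(f"Removing intermediate container {container}")
        log.append(f" ---> {image}")

    return image, log


final_image, lines = simulate_build(
    "jupyter/scipy-notebook",
    ["RUN conda install 'kaggle'", 'CMD ["jupyter", "notebook"]'],
)
print("\n".join(lines))
```

The printed lines follow the same shape as the build output above: one image id for `FROM`, then a container and a new image for each later step.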

### The build cache

Finally, notice what happens when we rebuild the image, this time tagging it with `-t`.  Docker, in its own words, is `Using cache`.

`docker build -t jeffkatzy/jupyter-kaggle .`

```
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM jupyter/scipy-notebook
 ---> 844815ed865e
Step 2/3 : RUN conda install 'kaggle'
 ---> Using cache
 ---> 29886ca05035
Step 3/3 : CMD ["jupyter", "notebook"]
 ---> Using cache
 ---> 09c2125c0a49
Successfully built 09c2125c0a49
Successfully tagged jeffkatzy/jupyter-kaggle:latest
```

Essentially, in `Step 2/3`, Docker sees that it has already applied `RUN conda install 'kaggle'` on top of this same `jupyter/scipy-notebook` image, so it does not need to start a new container and turn it into an image.  Instead it reuses the cache, here the intermediate image `29886ca05035` from the previous build, and moves on from there.  The same happens in `Step 3/3`.
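One way to picture the cache is as a lookup table keyed by the parent image and the instruction applied to it.  The Python sketch below is a simplified model under that assumption; `build_with_cache` and `make_id` are hypothetical names, and Docker's real cache also takes other inputs into account, such as the contents of copied files.

```python
import hashlib


def make_id(parent, instruction):
    """Derive a short fake image id from the parent id and the instruction."""
    digest = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
    return digest[:12]


def build_with_cache(base_image_id, instructions, cache):
    """Apply instructions in order, reusing cached images where possible."""
    parent = base_image_id
    log = []
    for instruction in instructions:
        key = (parent, instruction)
        if key in cache:
            log.append("Using cache")
        else:
            # Cache miss: pretend to run a container and save a new image.
            cache[key] = make_id(parent, instruction)
            log.append("Built new image")
        parent = cache[key]
    return parent, log


cache = {}
steps = ["RUN conda install 'kaggle'", 'CMD ["jupyter", "notebook"]']
first_image, first_log = build_with_cache("844815ed865e", steps, cache)
second_image, second_log = build_with_cache("844815ed865e", steps, cache)
print(first_log)
print(second_log)
```

In this model, changing an early instruction changes the parent id of every later step, so those later steps miss the cache as well.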

### Summary