# Working with data on the host system (mounting a directory)

## Non-persistence of file system changes in container

While the container is a convenient way to provide a tailored installation environment and computational tools, it does not retain changes to its file system. As soon as the container process is stopped, all changes to the files within the container are forgotten:

In [1]:
!docker run ubuntu:18.04 echo "Hello" > hello.txt && ls -l hello.txt

-rw-r--r--  1 fangohr  staff  6 Sep  8 21:40 hello.txt


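In the example above, `echo "Hello"` runs in the container, and its output is stored in the file `hello.txt`. The `&&` means that the next command is executed only if the first one succeeded. That next command (`ls -l hello.txt`) lists the file `hello.txt`, and it succeeds. Note that both `>` and `&&` are interpreted by the shell that calls `docker`, so in this form `hello.txt` is written and listed on the host, as the host user name in the listing shows.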

When the command completes, the container session stops, and any changes made inside the container are forgotten: each `docker run` starts a fresh container from the image. If we run `ls -l` inside a new container, we find that the file `hello.txt` is not there:

In [2]:
!docker run ubuntu:18.04 ls -l hello.txt

ls: cannot access 'hello.txt': No such file or directory
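
To see the same effect with a file that is written inside the container, a minimal sketch is to hand the whole command line to a shell in the container with `sh -c` (an assumption of this example; the commands above do not use it), so that `>` and `&&` are interpreted there:

```shell
# The single quotes pass the whole command line to the shell inside the
# container, so hello.txt is created in the container's file system.
docker run ubuntu:18.04 sh -c 'echo "Hello" > hello.txt && ls -l hello.txt'

# A second "docker run" starts a fresh container from the same image,
# so the file written by the first container is not there.
docker run ubuntu:18.04 ls -l hello.txt || echo "hello.txt is not in the new container"
```

With this form, no `hello.txt` should appear in the current directory on the host.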


While it is possible to create special data containers for persistent data, I find it more straightforward to mount a directory from the host system into the container, and to save any output data in this mounted directory.

## Mounting a directory from the host to be available in the container

Let's create a new container image to demonstrate this. We will use `cowsay` as the application we install in the container. We want it to "say" something that comes from an input file on the host system, and we want its output to be saved to the host file system, so that we can make use of it after the container execution has completed.

In [3]:
%%file Dockerfile
FROM ubuntu:18.04

RUN apt-get update 
RUN apt-get install -y cowsay

# cowsay installs into /usr/games. Make available in PATH:
RUN ln -s /usr/games/cowsay /usr/local/bin 

# create directory we use for input and output 
RUN mkdir /io

# change into that directory
WORKDIR /io

Overwriting Dockerfile


In [4]:
!docker build -t cowimage-mount .

Sending build context to Docker daemon  163.3kB
Step 1/6 : FROM ubuntu:18.04
 ---> cd6d8154f1e1
Step 2/6 : RUN apt-get update
 ---> Using cache
 ---> 864c48282361
Step 3/6 : RUN apt-get install -y cowsay
 ---> Using cache
 ---> 951b8495700a
Step 4/6 : RUN ln -s /usr/games/cowsay /usr/local/bin
 ---> Using cache
 ---> 532aedd4e4e6
Step 5/6 : RUN mkdir /io
 ---> Using cache
 ---> a9d42e44c301
Step 6/6 : WORKDIR /io
 ---> Using cache
 ---> 426382c2da8b
Successfully built 426382c2da8b
Successfully tagged cowimage-mount:latest


Let's check that we start in `/io` if we use the container:

In [5]:
!docker run cowimage-mount pwd

/io


Now we need to mount our local directory to `/io` when we call docker. Let's first create an input data file:

In [6]:
%%file cow-input.txt
Hello from file

Overwriting cow-input.txt


Let's also make sure no file `cow-output.txt` is on the disk. (We create the file in the next command.)

In [7]:
!rm -f cow-output.txt

In [8]:
!docker run -v `pwd`:/io cowimage-mount  cowsay `cat cow-input.txt` > cow-output.txt

Let's first check that this has created our output file `cow-output.txt`, and that the file is available on the host system:

In [9]:
!ls -l cow-output.txt

-rw-r--r--  1 fangohr  staff  181 Sep  8 21:40 cow-output.txt


The file exists. What does it contain?

In [10]:
!cat cow-output.txt

 _________________
< Hello from file >
 -----------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||



This is good. We discuss all parts of the command line now.

The `-v` option tells Docker to mount a volume. In particular, the notation `-v A:B` asks Docker to mount the path `A` from the host file system at the path `B` in the container. As we would like to mount our current directory from the host, we use the `pwd` command (which stands for Print Working Directory) to obtain its absolute path. By enclosing `pwd` in backticks (\`), the output of the `pwd` command is used in place of `A`.

The path `B` to which we mount the directory is `/io`. Note that we have asked in the Dockerfile that the process in the container should start in this directory.
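
The backtick form can also be written as `$(pwd)`, which the shell treats the same way. As a minimal sketch, the following mounts the current directory and lists `/io` from inside the container, which should show the files of the host directory. The quotes around `$(pwd)` are an addition here, so that a path containing spaces is passed as one argument; `cowimage-mount` is the image built above.

```shell
# $(pwd) is expanded by the calling shell to the absolute path of the
# current host directory before docker starts.
docker run -v "$(pwd)":/io cowimage-mount ls -l /io
```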



The remainder of the command line is
```
cowsay `cat cow-input.txt` > cow-output.txt
```

- `cowsay` is the name of the executable that runs in the container.
- `cow-input.txt` is a file on the host file system, which is also available within the container in `/io` because we mounted the directory.
- With `` `cat cow-input.txt` `` we take the content of the `cow-input.txt` file (which is `Hello from file`, as we created the file earlier in this notebook) and pass it to `cowsay`. As a result, the cow prints this in the speech bubble. The backticks are evaluated by the shell on the host before `docker` starts.
- With `> cow-output.txt` we send the (standard) output of the process into the file `cow-output.txt`. As the file doesn't exist, it is created. In this form of the command, the redirection is also handled by the shell on the host, so the file is written directly to the current host directory, which is the directory mounted at `/io`. A process in the container that writes to `/io` stores its files in the same host directory, where they remain available after the docker container process has completed.
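
A variant in which reading the input and writing the output both happen inside the container, and therefore go through the mounted `/io` directory, can be sketched by wrapping the command in `sh -c` (an assumption of this example, not part of the command above). It relies on the `cowimage-mount` image and the `cow-input.txt` file created earlier:

```shell
# The single quotes keep the host shell from expanding $(...) and ">";
# the shell inside the container evaluates them in /io (the WORKDIR).
docker run -v "$(pwd)":/io cowimage-mount \
    sh -c 'cowsay "$(cat cow-input.txt)" > cow-output.txt'

# Back on the host: the file written in /io is in the current directory.
cat cow-output.txt
```

On a Linux host, a file created this way is typically owned by `root`, because the process in the container runs as `root` by default.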