# Ensure that your code works on any system with Docker

We use Docker to run our code inside containers. The benefit of this approach is that your code will work similarly on any OS. Containerization also makes it easy to deploy to cloud systems (or any system that can run containers).

In [None]:
%%capture
! docker build -t dbt-image .

In [None]:
! rm -rf ./tpch_warehouse/models/*/.ipynb_checkpoints # run before every dbt run: Jupyter creates these checkpoint folders, so this is unnecessary when working from a terminal

In [None]:
! docker run --name dbt-container --rm -v $(pwd)/tpch_warehouse:/dbt dbt-image dbt debug

In [None]:
! docker run --name dbt-container --rm -v $(pwd)/tpch_warehouse:/dbt dbt-image dbt run

In [None]:
! docker run --name dbt-container  --rm -v $(pwd)/tpch_warehouse:/dbt dbt-image dbt test

In [None]:
! docker run --name dbt-container  --rm -v $(pwd)/tpch_warehouse:/dbt dbt-image dbt docs generate

In [None]:
! docker run --name dbt-container  --rm -v $(pwd)/tpch_warehouse:/dbt -p 8080:8080 dbt-image dbt docs serve

In [None]:
! docker stop dbt-container

In [None]:
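! docker rm dbt-container # only needed if the container was started without --rm; with --rm, stopping the container already removes it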

In [None]:
! docker ps

In [None]:
%%capture
! docker compose up --build -d

In [None]:
! docker compose run dbt dbt run 

In [None]:
! docker compose run dbt dbt test

In [None]:
! docker compose run dbt dbt docs generate

In [None]:
! docker compose exec dbt dbt docs serve

In [None]:
%%capture
! docker compose down 

In [None]:
! docker ps

## Define the OS you want to run your code on with an Image

An image is a blueprint for creating Docker containers. In it you define the base OS, the packages to install, the environment variables to set, etc.

## Containers are where your OS (& code) runs; they are created from images

Once an image defines the blueprint, we can use it to create one or more containers. A container is the actual running OS in which your code executes.

Note that we can create multiple containers from the same image.

An image is usually defined in a file named `Dockerfile`. The commands in a Dockerfile are run in order. Let’s go over the key commands:

* `FROM`: We need a base operating system on which to set our configurations. We can also start from existing Docker images available on Docker Hub and add our config on top of them. In our example, we use the official Delta Lake Docker image.
* `COPY`: Copies files or folders from our local filesystem into the image. It is usually used when building the image to copy settings, static files, etc. In our example, we copy over the tpch-dbgen folder, which contains the logic to create tpch data. We also copy over our `requirements.txt` file and our `entrypoint.sh` file.
* `RUN`: Runs a shell command while the image is being built. It is typically used to install libraries, create folders, etc.
* `ENV`: Sets the image’s environment variables. In our example, we set Spark environment variables.
* `ENTRYPOINT`: Specifies the command or script that is executed when a container starts from the image. In our example, we use a script file (`entrypoint.sh`) to start Spark master and worker nodes, depending on the inputs given to the Docker CLI when starting a container from this image.
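
To make the commands above concrete, here is a minimal sketch of what a Dockerfile for a dbt image could look like, written from the shell so it can be pasted into a terminal. It is not the Dockerfile used in this chapter: the base image `python:3.11-slim`, the file name `Dockerfile.example`, the `DBT_PROFILES_DIR` value, and the assumption that `requirements.txt` lists dbt are all illustrative.

```shell
# Write a hypothetical Dockerfile to Dockerfile.example
# (a separate name, so the real Dockerfile in this folder is not overwritten).
cat > Dockerfile.example <<'EOF'
# FROM: start from an existing base image (assumed here: the official Python image)
FROM python:3.11-slim

# ENV: set for later build steps and for every container started from the image
# (assumes profiles.yml lives in the project folder mounted at /dbt)
ENV DBT_PROFILES_DIR=/dbt

# COPY: copy a file from the local build context into the image
COPY requirements.txt /tmp/requirements.txt

# RUN: executed once, while the image is being built
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# WORKDIR: default folder for commands run in the container
WORKDIR /dbt

# No ENTRYPOINT in this sketch, so whatever follows the image name in
# `docker run` (e.g. `dbt run`) is executed as the container's command.
EOF
```

Assuming a `requirements.txt` exists next to it, the sketch could be built with `docker build -f Dockerfile.example -t dbt-image .`.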


### Containers can be always running or only run for the duration of your code

Docker containers are ephemeral by default: a container only runs for as long as the process inside it runs.

For a web server, this means that the container is always on, since the web server process does not exit on its own.

In our case, running `dbt` commands, the container only needs to run for the duration of the command.

In certain cases we want our containers to be always running (e.g. the Airflow scheduler, which we will see in the next chapter).

`docker run` starts a new container from an image, while `docker exec` runs a command inside a container that is already running.
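
The difference between the two commands can be sketched with a pair of small shell helpers. The helper names `dbt_oneoff` and `dbt_in_running` are hypothetical; the image name, container name, and mount path follow the cells at the top of this chapter. The second helper assumes a container named `dbt-container` is already up.

```shell
# One-off: start a NEW container from the image, run a single dbt command,
# and remove the container when the command exits (--rm).
dbt_oneoff() {
    docker run --rm -v "$(pwd)/tpch_warehouse:/dbt" dbt-image dbt "$@"
}

# Long-running: run a dbt command inside an EXISTING, already running container.
# This fails if no container named dbt-container is up.
dbt_in_running() {
    docker exec dbt-container dbt "$@"
}
```

For example, `dbt_oneoff debug` mirrors the `dbt debug` cell above, while `dbt_in_running test` reuses whichever container is currently running under that name.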

## Containers can interact with your local OS

When we run containers, we typically want to:
* Sync code changes. During development, your IDE edits files on your local OS, and those changes should be reflected in the copy inside the container.
* Open ports. When a containerized system exposes a UI or another port-based interface, that container port must be reachable from your local OS.

### Ensure ports are open for your code to interact with other systems

In our setup, we want the docs generated and served by the dbt CLI (from inside the container) to be accessible from our local OS.

To do this, we publish container port 8080 to the host with `-p 8080:8080`. When we then open http://localhost:8080 in a web browser, we can see the dbt docs UI.
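
The `-p` flag takes the form `HOST_PORT:CONTAINER_PORT`, so the host side can differ from the port dbt listens on inside the container. A small sketch, assuming the docs server listens on port 8080 in the container as in the cell above; the helper name `dbt_docs_serve` is hypothetical:

```shell
# Serve the dbt docs and publish container port 8080 on a host port
# of your choice (first argument, defaults to 8080).
dbt_docs_serve() {
    host_port="${1:-8080}"
    docker run --name dbt-container --rm \
        -v "$(pwd)/tpch_warehouse:/dbt" \
        -p "${host_port}:8080" \
        dbt-image dbt docs serve
}
```

Calling `dbt_docs_serve 9090` would make the docs reachable at http://localhost:9090, which is handy when port 8080 is already taken on your machine.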

### Ensure code/data is synced between your local OS and your container with `volume mounts`

Using mounted volumes (the `-v` flag), we can ensure that files are synced between the containers and the local operating system. In addition to syncing local files, we can create Docker volumes to share files between our containers.

This is especially critical when we are developing locally, since we would want the changes to our code reflected inside the containers (where our code would actually run).
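
The two kinds of mounts can be sketched side by side. The bind mount is the one used in the cells above; the named volume `dbt-shared`, its mount point `/shared`, and both helper names are hypothetical and only serve as an illustration.

```shell
# Create a Docker-managed named volume (hypothetical name: dbt-shared).
create_shared_volume() {
    docker volume create dbt-shared
}

# Run a dbt command with both kinds of mounts:
#   bind mount:   host folder ./tpch_warehouse -> /dbt    (edits on the host show up in the container)
#   named volume: dbt-shared                   -> /shared (managed by Docker, can be mounted by several containers)
dbt_with_shared_volume() {
    docker run --rm \
        -v "$(pwd)/tpch_warehouse:/dbt" \
        -v dbt-shared:/shared \
        dbt-image dbt "$@"
}
```

Any other container started with `-v dbt-shared:/shared` would see the same files under `/shared`, which is how containers can exchange data without going through the host filesystem.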

## Let's run our dbt pipeline with `docker exec`



## Orchestrate multiple containers with `docker compose`

The Docker CLI is simple to use, but when we need to start multiple containers, or start them in a specific order, a Docker Compose YAML file can greatly simplify our setup.



## Start containers with `docker compose up`

`docker compose up` starts all the containers defined in the `docker-compose.yml` file.
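
As a rough illustration, a compose file for a single `dbt` service could look like the sketch below, written from the shell. It is not this chapter's actual `docker-compose.yml`: the file name `docker-compose.example.yml` and the keep-alive `command` are assumptions, while the service name `dbt`, the mount, and the port follow the cells above.

```shell
# Write a hypothetical compose file
# (a separate name, so the real docker-compose.yml is not overwritten).
cat > docker-compose.example.yml <<'EOF'
services:
  dbt:
    # Build the image from the Dockerfile in the current folder
    build: .
    # Same bind mount as the -v flag in the docker run cells
    volumes:
      - ./tpch_warehouse:/dbt
    # Same port mapping as the -p flag (HOST:CONTAINER)
    ports:
      - "8080:8080"
    # Assumption: keep the container alive so that `docker compose exec`
    # has a running container to attach to
    command: tail -f /dev/null
EOF
```

It could be started with `docker compose -f docker-compose.example.yml up --build -d` and torn down with `docker compose -f docker-compose.example.yml down`.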

## Run dbt commands with `docker exec`