# Docker and Python Packaging

<center>
    <img src="https://miro.medium.com/max/504/1*iBGlEPUruUqqT5NreeEF8g.png" width=200>
</center>

## Motivation

Data professionals often hear phrases like "it worked on my machine" or "nothing changed, but my workflow stopped running."

You might already be familiar with virtual environments. Virtual environment managers such as `poetry`, `pipenv`, and `conda` are useful since they allow us to fix package versions. 


Similarly, Docker lets us build images and run them in other execution environments. An image bundles all of the dependencies our application needs, and a container is a running instance of that image. Because an image does not change once it is built, the dependencies inside it stay the same from run to run.

## What are some reasons to use **Docker**?

- We can run processes that need different Python versions in separate containers!

- We can include non-Python dependencies in our runtime, such as C compilers or Java for Spark

- Encourages reproducible work and lightweight runtimes (exactly what we need &mdash; nothing more, nothing less)

- Eases the transition from development to deployment

- Containerized environments are the standard in scalable orchestration platforms like Kubernetes


## Docker Architecture

The Aqua Security [documentation](https://www.aquasec.com/cloud-native-academy/docker-container/docker-architecture/) has a very good diagram about the Docker architecture. There are three main parts:

* Client - the Docker CLI or a programmatic client such as the Docker SDK for Python; it sends requests to the daemon

* Daemon - the background service that builds images and runs containers

* Registry - stores images so they can be pushed from one machine and pulled down on another

![img](docker_architecture.png)
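The client/daemon split can be seen from Python as well. The sketch below assumes the Docker SDK for Python (installed with `pip install docker`) and a running local daemon; `daemon_summary` is a hypothetical helper, not part of any library. Every call on the client is just an API request, and the daemon does the actual work.

```python
from typing import Any, Optional


def daemon_summary(client: Optional[Any] = None) -> dict:
    """Ask the Docker daemon about itself through a client object.

    Assumes the Docker SDK for Python is installed when no client is passed in.
    """
    if client is None:
        import docker  # imported lazily so the helper can be used without the SDK

        # from_env() reads settings such as DOCKER_HOST and connects to the daemon
        client = docker.from_env()

    # Each of these is a request from the client to the daemon
    reachable = client.ping()
    version = client.version()
    images = client.images.list()

    return {
        "reachable": reachable,
        "server_version": version.get("Version"),
        "image_count": len(images),
    }
```

Calling `daemon_summary()` with no argument connects to the local daemon; passing a client in keeps the helper easy to test with a stand-in object.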

## Looking at a sample project...

Alongside this notebook, there is a folder named `docker_with_custom_module`. 

It represents a Prefect Flow that uses custom modules defined by the user. 

Packaging custom code as a Python module allows us to import it from any directory within the container, as well as reuse it in other projects more easily.

Using this folder, we will build a custom Docker image to support our Flow. The directory structure looks like this:

```
docker_with_custom_module/
├── components/
│   ├── __init__.py
│   ├── componentA.py
│   ├── componentB.py
├── workflow/
│   ├── custom_flow.py
├── requirements.txt
├── Dockerfile
└── setup.py
```

Here is a brief description of each part:

* `components/*.py` - contains the custom code that will be used in multiple flows

* `workflow/` - contains the Prefect flow

* `requirements.txt` - dependencies of the project

* `Dockerfile` - the instructions to package this folder into a Docker image

* `setup.py` - `pip` looks at this file for instructions on how to install the module

### What if we want to use custom Python code in our container?

The components are very simple Python classes. Imagine they're mission-critical custom Python modules that we want to use in a production Prefect flow.

Below is `componentA.py`:

```python
class ComponentA:
    def __init__(self, n=2) -> None:
        self.n = n
```

`componentB.py` is very similar.
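The source only says `componentB.py` is very similar, so the `ComponentB` below is an assumption: it simply mirrors `ComponentA`. With both classes defined, the arithmetic that `custom_task` performs can be checked in plain Python, without Prefect.

```python
class ComponentA:
    def __init__(self, n=2) -> None:
        self.n = n


# Assumed shape of componentB.py: the original only says it is "very similar"
class ComponentB:
    def __init__(self, n=2) -> None:
        self.n = n


# The same computation custom_task performs, minus the Prefect decorators
x = ComponentA(2)
y = ComponentB(2)
print(f"Test {x.n + y.n}!")  # prints "Test 4!"
```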

<hr>

Now, let's look at the `custom_flow.py` in the `workflow` folder. This just imports the components and uses them inside a task.

```python
from prefect import flow, task

from components.componentA import ComponentA 
from components.componentB import ComponentB

@task
def custom_task():
    x = ComponentA(2)
    y = ComponentB(2)
    _sum = x.n + y.n
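    print(f"Test {_sum}!")  # prints "Test 4!"
    return _sum

@flow
def custom_flow():
    custom_task()

# Run the flow when the script is executed directly, e.g. `python workflow/custom_flow.py`
if __name__ == "__main__":
    custom_flow()
```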

### Making a module with a `setup.py`

Importing these custom modules works well enough without Docker when we run our code locally, as long as the project root is on the Python path.

However, to package `components` as a portable module, the folder needs an `__init__.py` file, and the project root needs a `setup.py` like the one below. The package name and version set in `setup.py` are used by pip to keep track of the package, but they don't affect how the package is used in Python code.

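The `find_packages()` call collects every subdirectory that contains an `__init__.py` (here, only `components`, since `workflow/` has none) and includes it in the `mypackage` distribution. We still write `import components`, not `import mypackage`, because the distribution name and the import name are independent.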
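To make the `__init__.py` rule concrete, here is a simplified, stdlib-only sketch of the idea behind `find_packages()`. It is not setuptools' actual implementation (which also supports `include`/`exclude` patterns), and `find_packages_sketch` is a hypothetical name. Like setuptools, it does not descend into directories that are not packages themselves.

```python
from pathlib import Path
from typing import List


def find_packages_sketch(where: str = ".") -> List[str]:
    """Return dotted names of directories under `where` that contain __init__.py."""
    found: List[str] = []

    def walk(directory: Path, prefix: str) -> None:
        for child in sorted(directory.iterdir()):
            if not child.is_dir() or "." in child.name:
                continue  # skip files and names that cannot be package names
            if not (child / "__init__.py").is_file():
                continue  # no __init__.py: not a package, so don't recurse into it
            name = f"{prefix}{child.name}"
            found.append(name)
            walk(child, name + ".")  # look for subpackages inside this package

    walk(Path(where), "")
    return found
```

Run against the project tree above, this would return `["components"]`: `workflow/` is skipped because it has no `__init__.py`.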

Notice that `setup.py` also reads `requirements.txt` and passes it to `install_requires`, so `pip` installs the project's dependencies along with it. The `setup()` call is what `pip` runs to install the library:

```python
from setuptools import setup, find_packages

with open('requirements.txt') as f:
    requirements = f.read().splitlines()

setup(
    name="mypackage",
    version='0.1',
    packages=find_packages(),
    install_requires=requirements
)
```
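One caveat with `f.read().splitlines()`: every line of `requirements.txt` is passed through verbatim, including blank lines and `#` comments. A slightly more defensive reader could filter those out first. This is a sketch; `read_requirements` is a hypothetical helper, and it assumes the file holds only plain requirement specifiers (pip options such as `-r other.txt` are not handled). A `#` counts as a comment only at the start of a line or after whitespace, so URL fragments like `#egg=` survive.

```python
import re
from typing import List

# A comment starts at the beginning of a line or after whitespace
_COMMENT = re.compile(r"(^|\s+)#.*$")


def read_requirements(path: str = "requirements.txt") -> List[str]:
    """Read requirement specifiers, skipping blank lines and comments."""
    requirements: List[str] = []
    with open(path) as f:
        for raw in f:
            line = _COMMENT.sub("", raw).strip()
            if line:
                requirements.append(line)
    return requirements
```

In `setup.py`, this would replace the `with open(...)` block: `install_requires=read_requirements()`.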

### Installing the custom module

With this file written, we can install the library in editable (development) mode:

```
pip install -e .
```

This lets us import `components` from any directory, because the package is now registered with the active Python environment instead of being found relative to the current folder.
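A quick way to confirm the install worked is to ask Python's import system whether it can locate the package. `importlib.util.find_spec` returns `None` for a top-level name it cannot find; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name` without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Run this from a directory other than the project root: inside the root,
# the local components/ folder would be found even without installing.
print(is_importable("components"))
```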

## Building the Docker image

Now that we have the custom module installed, we want to create the Docker image so that we can run it in other execution environments. 

```Dockerfile
FROM prefecthq/prefect:latest

WORKDIR /app

ADD . .

RUN pip install .
```

<div style="background-color: #70c6ff;border-radius: 10px;padding: 20px;">

💡 **note**:

context on Dockerfile keywords we used above:

`FROM` — this is the base image that we’ll build our image on top of


`WORKDIR` — set the working directory for the container. It will be created if it doesn’t exist


`ADD` — here we copy all of our files from the current directory to the container `WORKDIR`


`RUN` - runs a command while the image is being built and saves the result as a new layer. In this case, `pip install .` installs our module along with its dependencies

</div>

In order to build the image, we can run a command like the following:

```
docker build . -t my_image:latest
```

where `my_image` is the image name and `latest` is the image tag.
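The same build can be driven from the Python client mentioned in the architecture section. This sketch assumes the Docker SDK for Python (`pip install docker`) and a running daemon; `build_image` is a hypothetical helper name. `images.build` returns the built image together with a stream of build-log entries.

```python
from typing import Any, Optional


def build_image(path: str = ".", tag: str = "my_image:latest", client: Optional[Any] = None):
    """Build the Dockerfile in `path` and tag the result, like `docker build . -t <tag>`."""
    if client is None:
        import docker  # lazy import: only needed when talking to a real daemon

        client = docker.from_env()

    # images.build returns (Image, iterable of decoded JSON log chunks)
    image, logs = client.images.build(path=path, tag=tag)
    for chunk in logs:
        # Most chunks carry a "stream" key holding a line of build output
        if "stream" in chunk:
            print(chunk["stream"], end="")
    return image
```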

### Using the image

To check that everything works, we can start a container from the image and open an interactive shell in it:

```
docker run --name containername -i -t my_image:latest /bin/bash
```

Since the `WORKDIR` is `/app`, the shell starts in the app directory, and we can run our flow with:

```
python workflow/custom_flow.py
```
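`docker run` has a Python counterpart too. Rather than opening an interactive shell, this sketch runs the flow script directly and returns its output. It assumes the Docker SDK for Python and a running daemon; `run_flow_in_container` is a hypothetical name. With the default `detach=False`, `containers.run` waits for the container to exit and returns its output as bytes, and `remove=True` deletes the container afterwards.

```python
from typing import Any, Optional


def run_flow_in_container(
    image: str = "my_image:latest",
    command: str = "python workflow/custom_flow.py",
    client: Optional[Any] = None,
) -> str:
    """Run `command` in a fresh container from `image` and return its stdout."""
    if client is None:
        import docker  # lazy import: only needed when talking to a real daemon

        client = docker.from_env()

    # Blocks until the container exits; the relative path resolves against WORKDIR /app
    output = client.containers.run(image, command, remove=True)
    return output.decode("utf-8")
```

If the command exits with a non-zero status, the SDK raises `docker.errors.ContainerError` instead of returning.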

## Upload to a Registry

Now that the image has been created, you can push it to a registry (Docker Hub, AWS ECR, etc.) using the `docker push` command. Each registry handles authentication a little differently; here we cover Docker Hub, which is the default registry for the Docker CLI.

### Auth
We need to authenticate our CLI session with the registry before we can push to it. For Docker Hub, `docker login` prompts for a username and password:

```console
docker login
```

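### Build, Tag, and Push

Build the image with your Docker Hub username as a prefix (replace `zzstoatzz` with your own), then push it:

```console
docker build . --tag zzstoatzz/my_image:latest
docker push zzstoatzz/my_image:latest
```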
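Docker Hub only accepts a push when the repository name starts with your username or organization, which is why the tag carries the `zzstoatzz/` prefix. The sketch below splits an image reference into its parts. It is a simplified, hypothetical helper (`parse_image_ref`), not Docker's own parser: it ignores digests (`@sha256:...`) and assumes `latest` when no tag is given, matching the CLI's default.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: Optional[str]  # e.g. "localhost:5000"; None means Docker Hub
    repository: str          # e.g. "zzstoatzz/my_image"
    tag: str                 # e.g. "latest"


def parse_image_ref(ref: str) -> ImageRef:
    """Split "[registry/]repository[:tag]" into its parts (digests not handled)."""
    registry = None
    first, sep, rest = ref.partition("/")
    # The first component is a registry host only if it contains "." or ":"
    # or is exactly "localhost"; otherwise it is a Docker Hub namespace
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, ref = first, rest

    # With any registry port stripped off, a tag follows the last ":"
    name, colon, tag = ref.rpartition(":")
    if not colon:
        name, tag = ref, "latest"
    return ImageRef(registry, name, tag)
```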

## Using the image for a Flow

To use the image for a flow, we can create a deployment whose flow runner is a `DockerFlowRunner` pointing at the image we just pushed:

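```python
from prefect.deployments import DeploymentSpec
from prefect.flow_runners import DockerFlowRunner

# Assumes the project root is on the Python path so workflow/custom_flow.py is importable
from workflow.custom_flow import custom_flow

DeploymentSpec(
    name="docker-example",
    flow=custom_flow,
    # Flow runners are configured with keyword arguments
    flow_runner=DockerFlowRunner(image="zzstoatzz/my_image:latest"),
)
```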

In the next section, we'll look at advanced patterns for workflow orchestration before doing an end-to-end example.