# How to create a Devcontainer for your Python project 🐳

![Docker ship background wallpaper image](https://godatadriven.com/wp-content/uploads/2022/09/without-ship-taller-img.png)

<img src="images/presentation/jeroen-overschie-intro.png" alt="Jeroen Overschie" width="600px" />

## The use case

You are assigned to set up a new repo for a team. The requirements are as follows:

<table>
<tr>
<td>
<img src="images/presentation/logo-spark.png" alt="Apache Spark" width="200px" class="fragment" />
</td>
<td>
<img src="images/presentation/logo-pyspark.png" alt="Apache Spark - pyspark" width="200px" class="fragment" />
</td>
<td>
<img src="images/presentation/logo-python.png" alt="Python" width="200px" class="fragment" />
</td>
</tr>
</table>

- Apache Spark to **process data** (we have a cluster set up; Spark also needs **Java**)
- pyspark (pip package)
- Python (Python version)

So we need to align on:

- 📌 A specific Java version

- 📌 A specific Python version

- 📌 A specific `pyspark` version

→ otherwise we do not enjoy the guarantees we want in **production** code
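
To make the alignment problem concrete, here is a minimal sketch of a script that reports the three versions for whatever environment it runs in. The helper names (`python_version`, `package_version`, `java_version`, `report_versions`) are made up for this illustration and are not part of the template repo; only the standard library is used.

```python
import shutil
import subprocess
import sys
from importlib import metadata
from typing import Optional


def python_version() -> str:
    """Return the running interpreter version as 'major.minor.patch'."""
    return ".".join(str(part) for part in sys.version_info[:3])


def package_version(name: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def java_version() -> Optional[str]:
    """Return the first line of the `java -version` banner, or None without Java."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=False
    )
    banner = (result.stderr or result.stdout).strip()
    return banner.splitlines()[0] if banner else None


def report_versions() -> dict:
    """Collect the three versions the team needs to agree on."""
    return {
        "python": python_version(),
        "pyspark": package_version("pyspark"),
        "java": java_version(),
    }


if __name__ == "__main__":
    for tool, version in report_versions().items():
        print(f"{tool}: {version or 'not found'}")
```

If two colleagues run this and get different answers, their environments have drifted apart, which is exactly what we want to prevent.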

# “I never get to the actual work because I am always configuring Python virtual environments”

— Python Developer, source and date unknown

<table style="background-color: #F5F5F5">
    <tr style="background: none">
        <td style="border: none">
            <img src="./images/presentation/teams-logo-32.png" />
        </td>
    </tr>
    <tr style="background: none">
        <td style="border: none">
            <img class="fragment" src="./images/presentation/messed-up-pyenv.png" />
        </td>
        <td style="border: none">
        </td>
        <td style="border: none">
            <img class="fragment" src="images/presentation/project-deps-dont-work-on-modern-os.png" />
        </td>
    </tr>
    <tr style="background: none">
        <td style="border: none">
            <img class="fragment" src="./images/presentation/difficulties-dockerizing-and-going-into-production.png" />
        </td>
        <td style="border: none" colspan="2">
            <img class="fragment" src="./images/presentation/misaligned-environments.png" />
        </td>
    </tr>
    <tr style="background: none">
        <td style="border: none" colspan="3">
            <img class="fragment" src="./images/presentation/corrupted-venv.png" />
        </td>
    </tr>
</table>

1. _Python Dev colleague_. Hi! Sorry I couldn't join the meeting just now. I messed up my pyenv again 🥲. Wanted to fix it fast so I could at least have it working this afternoon.
2. _John the Python Dev_. Hey so I was working on maintaining the old project. Just found out that some of the dependencies don't even work on my MacOS version anymore. FML
3. _Corporate colleague using a Windows laptop_. Hey I'm trying to dockerize the project but it's haaard. Everything that works on my Windows laptop seems to fail on Linux. We should have done this earlier. Anyway: I think we should ask the PO for an extension for running the entire thing in production.
4. _Me_: Hii. Can you maybe run `pip show pyspark`? I'm curious which pyspark version you are running 🧐. Because if it works for you but not for me & also not in the CI maybe your environment is different. Just checking.
5. _A fellow Data Scientist_. There weren't any docs on how to set up the correct venv right? Mine somehow got corrupted 🙃. It's saying "No module named 'numpy'" even though I seemingly have it installed. Not sure how to fix.

## Devcontainers to the rescue ⛑!

**Docker** helps us create a formal definition of our environment. **Devcontainers** allow you to connect your editor (IDE) to that container.

- **Docker**
- **Devcontainer**: connect to your IDE
- **Formal** instruction set for dev env setup

<span style="color: #aaa;">📝 Note that this does mean running Docker images on your laptop (performance requirement).</span>

<span style="color: #4d4d4d;">Devcontainers can help us:</span>

- 🔄 Reproducible development environment
- ⚡️ Faster project setup → faster onboarding
- 👨‍👩‍👧‍👦 Better alignment between team members
- ⏱ Forced to keep your dev environment up-to-date & reproducible
    - → saves your team time going into **production** later

## 👷🏻‍♂️ Let's build a Devcontainer!

Let’s say we have a really simple project that looks like this:

```bash
$ tree .
.
├── README.md
├── requirements.txt
├── requirements-dev.txt
├── sales_analysis.py
└── test_sales_analysis.py
```

### The `.devcontainer` folder
Your Devcontainer spec will live inside the `.devcontainer` folder.

There will be two main files:

- `devcontainer.json`
- `Dockerfile`


Create a new file called `devcontainer.json`:

```json
{
    "build": {
        "dockerfile": "Dockerfile",
        "context": ".."
    }
}
```


This basically means: as a base for our Devcontainer, use the `Dockerfile` located in the same folder as `devcontainer.json`, and build it with `..` (the project root) as the Docker *build context*. Both paths are relative to `devcontainer.json`.
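
As an illustration of how those two relative paths resolve, here is a small sketch. The function name `resolve_build_paths` is hypothetical, and the sketch assumes the file is plain JSON (real `devcontainer.json` files may contain comments, which `json.loads` does not accept).

```python
import json
from pathlib import Path


def resolve_build_paths(devcontainer_json: Path) -> dict:
    """Resolve `build.dockerfile` and `build.context` to absolute paths.

    Both values are interpreted relative to the folder that holds
    devcontainer.json, which is `.devcontainer` in our project.
    """
    config = json.loads(devcontainer_json.read_text())
    build = config["build"]
    base = devcontainer_json.parent
    return {
        "dockerfile": (base / build["dockerfile"]).resolve(),
        # Assumption for this sketch: fall back to the config folder itself
        # when no context is given.
        "context": (base / build.get("context", ".")).resolve(),
    }
```

Calling `resolve_build_paths(Path(".devcontainer/devcontainer.json"))` from the project root would point `dockerfile` at `.devcontainer/Dockerfile` and `context` at the project root. That is why the `COPY requirements.txt ...` instruction in the `Dockerfile` can see files at the top of the repo.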


So what does this `Dockerfile` look like?

```docker
FROM python:3.10

# Install Java
RUN apt update && \
    apt install -y sudo && \
    sudo apt install default-jdk -y

## Pip dependencies
# Upgrade pip
RUN pip install --upgrade pip
# Install production dependencies
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt
# Install development dependencies
COPY requirements-dev.txt /tmp/requirements-dev.txt
RUN pip install -r /tmp/requirements-dev.txt && \
    rm /tmp/requirements-dev.txt
```


We are building our image on top of `python:3.10`, which is a Debian-based image. This is one of the Linux distributions that a Devcontainer can be built on. The main requirement is that **Node.js** should be able to run: VSCode automatically installs VSCode Server on the machine.

For an extensive list of supported distributions, see [“Remote Development with Linux”](https://code.visualstudio.com/docs/remote/linux).

### Opening the Devcontainer

With the `.devcontainer` folder in place, it’s time to open our Devcontainer.


Open up the command palette (<kbd>CMD</kbd> + <kbd>Shift</kbd> + <kbd>P</kbd>) and select “*Dev Containers: Reopen in Container*”:

![Dev Containers: Reopen in Container](https://godatadriven.com/wp-content/uploads/2022/10/reopen-in-devcontainer-notification.png)


VSCode also notifies you automatically when you open a repo with a valid `.devcontainer` folder:

![folder contains a dev container config file](https://godatadriven.com/wp-content/uploads/2022/10/folder-contains-a-dev-container-config-file.png)


Your VSCode is now connected to the Docker container 🙌🏻:

<img src="https://godatadriven.com/wp-content/uploads/2022/10/opening-the-devcontainer.gif" width="80%" alt="VSCode is now connected to the Docker container"/>


### What is happening under the hood 🚗

<span style="color: #ccc;">Besides starting the Docker container and attaching the terminal to it, VSCode does a couple more things:</span>

<ol>
    <li class="fragment"><b><a href="https://code.visualstudio.com/docs/remote/vscode-server">VSCode Server</a></b> is being installed on your Devcontainer.
        <br><span style="color: #aaa;">VSCode Server is installed as a service in the container itself so your VSCode installation can communicate with the container. <br>For example, this lets VSCode install and run <b>extensions</b> inside the container.</span>
    </li>
    <li class="fragment"><b>Config is copied</b> over.
        <br><span style="color: #aaa;">Config like <code>~/.gitconfig</code> and <code>~/.ssh/known_hosts</code> are copied over to their respective locations in the container.</span>
    </li>
    <li class="fragment"><b>Filesystem mounts</b>. 
        <br>
        <span style="color: #aaa;">VSCode automatically takes care of mounting: 
        <ul>
            <li>The folder you are running the Devcontainer from.</li>
            <li>Your VSCode workspace folder.</li>
        </ul>
        </span>
    </li>
</ol>


### Opening your Devcontainer with the click of a button

<span style="color:#bbb;">Your entire project setup is now encapsulated in the Devcontainer. So actually we can add a <b style="color: black;">Markdown</b> button to open up the Devcontainer:</span>


<pre><code data-line-numbers="1">
[
    ![Open in Remote - Containers](
        https://img.shields.io/static/v1?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode
    )
](
    https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/godatadriven/python-devcontainer-template
)
</code></pre>

<span style="color:#bbb;">Just replace the GitHub URL after the last `url=` with your own repo ✓.</span>

This renders the following button:

[![Open in Remote - Containers](https://img.shields.io/static/v1?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/godatadriven/python-devcontainer-template)
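
If you maintain several repos, the badge Markdown can be generated instead of hand-edited. The following is only a sketch: `devcontainer_badge` is a hypothetical helper, and the two URL templates are copied from the snippet above.

```python
BADGE_IMAGE = (
    "https://img.shields.io/static/v1"
    "?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode"
)
REDIRECT_PREFIX = (
    "https://vscode.dev/redirect"
    "?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url="
)


def devcontainer_badge(repo_url: str) -> str:
    """Return one line of Markdown for an 'open in Devcontainer' badge."""
    return f"[![Open in Remote - Containers]({BADGE_IMAGE})]({REDIRECT_PREFIX}{repo_url})"


if __name__ == "__main__":
    print(devcontainer_badge("https://github.com/godatadriven/python-devcontainer-template"))
```

The printed line can be pasted straight into a README.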


Which kind of README would you prefer?

Manual installation             |  Using a Devcontainer 🙌🏻
:-------------------------:|:-------------------------:
![](https://godatadriven.com/wp-content/uploads/2022/10/installation-instructions-manual.png)  |  ![](https://godatadriven.com/wp-content/uploads/2022/10/installation-instructions-devcontainer.png)


## Extending the Devcontainer

<span style="color:#bbb;">We have built a working Devcontainer, that is great! But a couple things are still missing.</span>


- Add a **non-root user** for extra safety and good practice

- Pass in **custom VSCode settings** and install extensions by default


- Be able to access Spark UI (**opening up port 4040**)


- Run **Continuous Integration** (CI) in the Devcontainer


Let's see how.


### Installing a non-root user

If you `pip install` a new package, you will see the following message:

![The warning message: "WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv"](https://godatadriven.com/wp-content/uploads/2022/10/running-pip-as-root.png)


So let's go ahead and create a non-root user by adding the following to our `Dockerfile`:

```docker
# Add non-root user
ARG USERNAME=nonroot
RUN groupadd --gid 1000 $USERNAME && \
    useradd --uid 1000 --gid 1000 -m $USERNAME
## Make sure to reflect new user in PATH
ENV PATH="/home/${USERNAME}/.local/bin:${PATH}"
USER $USERNAME
```



Add the following property to `devcontainer.json`:

```json
    "remoteUser": "nonroot"
```

That's great! When we now start the container we should connect as the user `nonroot`.
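
A quick way to verify this from inside the container is to look at the effective user id. The sketch below uses a hypothetical helper, `is_root`, and relies on `os.geteuid`, which is available on Unix-like systems such as our Debian-based image.

```python
import os
from typing import Optional


def is_root(uid: Optional[int] = None) -> bool:
    """Return True when the given (or current effective) user id is root's."""
    if uid is None:
        uid = os.geteuid()  # Unix only; fine inside a Linux container
    return uid == 0


if __name__ == "__main__":
    if is_root():
        print("Running as root: check `remoteUser` in devcontainer.json")
    else:
        print(f"Running as a non-root user with uid {os.geteuid()}")
```

With the `Dockerfile` changes above, the reported uid should be 1000, the value passed to `useradd`.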


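### Passing custom VSCode settings

We can also preconfigure VSCode itself. Add a `customizations` property to `devcontainer.json` that lists the extensions to install and the default settings to apply:
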
```json
     "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python"
            ],
            "settings": {
                "python.testing.pytestArgs": [
                    "."
                ],
                "python.testing.unittestEnabled": false,
                "python.testing.pytestEnabled": true,
                "python.formatting.provider": "black",
                "python.linting.mypyEnabled": true,
                "python.linting.enabled": true
            }
        }
    }
```

The defined extensions are always installed in the Devcontainer. The defined settings, however, only provide a **default** and can still be overridden by other setting scopes such as User Settings, Remote Settings or Workspace Settings.
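
The idea of "defaults that later scopes can override" can be pictured as a layered dictionary merge. This is a simplified mental model, not VSCode's actual implementation: `effective_settings` is a hypothetical function, the user override is invented for the example, and real VSCode handles some object-valued settings in a more fine-grained way.

```python
def effective_settings(*scopes: dict) -> dict:
    """Merge setting scopes in order; later scopes win over earlier ones."""
    merged: dict = {}
    for scope in scopes:
        merged.update(scope)
    return merged


# Defaults taken from the `customizations.vscode.settings` block above.
devcontainer_defaults = {
    "python.formatting.provider": "black",
    "python.testing.pytestEnabled": True,
}
# Hypothetical personal preference of one team member.
user_settings = {"python.formatting.provider": "autopep8"}
# This project defines no workspace-level overrides in the sketch.
workspace_settings: dict = {}

settings = effective_settings(devcontainer_defaults, user_settings, workspace_settings)
print(settings["python.formatting.provider"])    # the user's choice wins
print(settings["python.testing.pytestEnabled"])  # the default still applies
```

In other words, the Devcontainer gives everyone the same starting point without taking away individual preferences.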


### Accessing Spark UI

Since we are using pyspark, it would be nice to be able to access **Spark UI**, which listens on port 4040 by default. Add the following to `devcontainer.json`:


```json
    "portsAttributes": {
        "4040": {
            "label": "SparkUI",
            "onAutoForward": "notify"
        }
    },

    "forwardPorts": [
        4040
    ]
```

When we now run our code, we get a notification that we can open Spark UI in the browser:

![open Spark UI in the browser](https://godatadriven.com/wp-content/uploads/2022/10/application-running-on-port-4040.png)



The result is the Spark UI as we know it:

![spark UI in the browser](https://godatadriven.com/wp-content/uploads/2022/10/spark-ui-visible-in-localhost-4040.png)

✨
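
If the notification does not show up, it can help to check whether anything is listening on the forwarded port at all. Below is a minimal sketch with a hypothetical helper, `is_port_open`, using only the standard library. The default of 4040 matches the port configured above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 4040, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Spark UI port 4040 is {state}")
```

Keep in mind that Spark UI only exists while a Spark application is running, so the port is expected to be closed when no job is active.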

### Running our CI in the Devcontainer


Wouldn't it be convenient if we could re-use our Devcontainer to run our Continuous Integration (CI) pipeline as well? Indeed, we can do this with Devcontainers. Similarly to how the Devcontainer image is built locally using `docker build`, the same can be done _within_ a CI/CD pipeline.


There are two basic options:

1. Build the Docker image _within_ the CI/CD pipeline
2. Prebuild the image

<span style="color:#aaa">Let's look at option (1).</span>

#### 1. Build the Docker image _within_ the CI/CD pipeline

Luckily, a GitHub Action has already been set up to do exactly this:

[devcontainers/ci](https://github.com/devcontainers/ci)


Building, pushing, and running a command in the Devcontainer is now as easy as:

```yaml
name: Python app

on:
  ...

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout (GitHub)
        uses: actions/checkout@v3

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and run dev container task
        uses: devcontainers/ci@v0.2
        with:
          imageName: ghcr.io/${{ github.repository }}/devcontainer
          runCmd: pytest .
```

That's great! Whenever this workflow runs on your main branch, the image will be pushed to the configured registry; in this case GitHub Container Registry (GHCR).
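
For context, `runCmd: pytest .` simply runs whatever tests pytest discovers in the repo, inside the Devcontainer. The contents of `test_sales_analysis.py` are not shown in this post, so the following is a purely hypothetical stand-in: `total_sales` is an invented function, and plain Python is used instead of pyspark to keep the sketch self-contained.

```python
def total_sales(rows: list) -> float:
    """Sum the `amount` field over a list of sale records (hypothetical logic)."""
    return sum(row["amount"] for row in rows)


def test_total_sales():
    """pytest collects functions whose names start with `test_`."""
    rows = [{"amount": 10.0}, {"amount": 5.5}]
    assert total_sales(rows) == 15.5


def test_total_sales_empty():
    """An empty list of sales adds up to zero."""
    assert total_sales([]) == 0
```

Because the tests run in the same image developers use locally, a failure in CI should be reproducible on your own machine.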

See below a trace of the executed GitHub Action:

![running-ci-in-the-devcontainer-github-actions](https://godatadriven.com/wp-content/uploads/2022/10/running-ci-in-the-devcontainer-github-actions.png)



Awesome!


## The final Devcontainer definition

We built the following Devcontainer definitions. 


First, `devcontainer.json`:

```json
{
    "build": {
        "dockerfile": "Dockerfile",
        "context": ".."
    },

    "remoteUser": "nonroot",

    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python"
            ],
            "settings": {
                "python.testing.pytestArgs": [
                    "."
                ],
                "python.testing.unittestEnabled": false,
                "python.testing.pytestEnabled": true,
                "python.formatting.provider": "black",
                "python.linting.mypyEnabled": true,
                "python.linting.enabled": true
            }
        }
    },

    "portsAttributes": {
        "4040": {
            "label": "SparkUI",
            "onAutoForward": "notify"
        }
    },

    "forwardPorts": [
        4040
    ]
}
```


And our `Dockerfile`:

```docker
FROM python:3.10

# Install Java
RUN apt update && \
    apt install -y sudo && \
    sudo apt install default-jdk -y

# Add non-root user
ARG USERNAME=nonroot
RUN groupadd --gid 1000 $USERNAME && \
    useradd --uid 1000 --gid 1000 -m $USERNAME
## Make sure to reflect new user in PATH
ENV PATH="/home/${USERNAME}/.local/bin:${PATH}"
USER $USERNAME

## Pip dependencies
# Upgrade pip
RUN pip install --upgrade pip
# Install production dependencies
COPY --chown=nonroot:1000 requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt
# Install development dependencies
COPY --chown=nonroot:1000 requirements-dev.txt /tmp/requirements-dev.txt
RUN pip install -r /tmp/requirements-dev.txt && \
    rm /tmp/requirements-dev.txt
```

## Three environments 🎁

<span style="color:#aaa;">The above idea sets us up for having 3 different images for our entire lifecycle. One for Development, one for CI, and finally one for production.</span>



![three-environments-docker-images-devcontainer](./images/presentation/three-environments.png)

<!-- https://excalidraw.com/#json=xiyT30e-6fLScXqtXOx9-,pqW03Hw6D6FFb-0Q06VTQw -->

## Going further 🔮



- Devcontainer [features](https://containers.dev/features)

- Devcontainer [templates](https://containers.dev/templates)

- [Mounting directories](https://code.visualstudio.com/remote/advancedcontainers/add-local-file-mount)

    💡 Pro tip: mount your AWS/GCP/Azure credentials


... and much more:


### Awesome resources

- [devcontainers/ci](https://github.com/devcontainers/ci). Run your CI in your Devcontainers. Built on the [Devcontainer CLI](https://github.com/devcontainers/cli).
- [https://containers.dev/](https://containers.dev/). The official Devcontainer specification.
- [devcontainers/images](https://github.com/devcontainers/images). A collection of ready-to-use Devcontainer images.
- [Add a non-root user to a container](https://code.visualstudio.com/remote/advancedcontainers/add-nonroot-user). More explanations & instructions for adding a non-root user to your `Dockerfile` and `devcontainer.json`.
- [Pre-building dev container images](https://code.visualstudio.com/docs/remote/containers#_prebuilding-dev-container-images)
- [awesome-devcontainers](https://github.com/manekinekko/awesome-devcontainers). A repo pointing to even more awesome resources.


## Concluding
Devcontainers have proven useful for:

- 🔄 Reproducible development environment
- ⚡️ Faster project setup → faster onboarding
- 👨‍👩‍👧‍👦 Better alignment between team members
- ⏱ Forced to keep your dev environment up-to-date & reproducible
    - → saves your team time going into **production** later


For now, Devcontainers are mostly a VSCode feature, but an [open specification](https://containers.dev/) is taking shape.



## Thanks! 🙌🏻


<table>
<tr>
<td>
<img src="./images/presentation/cat.webp" alt="moving cat gif" />
</td>
<td>
<img src="./images/presentation/qrcode_github_repo.png" alt="python-devcontainer-template github QR code link" width="400px" /><br>
<span style="text-align:center;">
Repo: <a href="https://github.com/godatadriven/python-devcontainer-template">godatadriven/python-devcontainer-template</a>
</span>
</td>
</tr>
</table>



## About

This blog post was written by [Jeroen Overschie](https://www.github.com/dunnkers), working at [GoDataDriven](https://godatadriven.com/).