# Docker 
2024-04-17 <br>
[Per Halvorsen](https://perhalvorsen.com) | [GitHub](https://github.com/pmhalvor/ocean-species-identification) | [LinkedIn](https://www.linkedin.com/in/pmhalvor/)

---

## Abstract
This note explores the basics of containerization with Docker.
It starts with the background terminology needed to understand these technologies, largely summarizing the [Docker overview](https://docs.docker.com/get-started/overview/).
From here, we introduce a small example project and go through the necessary steps to build custom images suited specifically to the needs of the project.
Iterating on these simpler images, we then wrap a simplified machine learning model with a GUI into a container, and show how to run and interact with it locally.
The note closes with some tips on how such a model could be deployed to a server.

# Outline 
Introduction
* Problem definition
    * Scope 
    * Fields 
* Best solution
    * Basic idea of containerization
    * Runnable anywhere
    * Opportunities: local development runnable on cloud, or easily portable app, like a tool-box. 

Background
* Vocabulary: summarized notes from [Docker overview](https://docs.docker.com/get-started/overview/)
* Diagrams: from tutorial or draw own
* Local requirements
    * Docker Desktop
    * Docker Hub account
    * Installation
* Basic starter commands 
    * Browsing images 
    * Pushing and pulling 
    * Run examples 
    * Build custom images

Our simple example
* Use the img2vid.py file to start
* Bake into image and run as executable 
* Bake into a Gradio app, and run locally
* (Attempt) Run from server, or just mention and say “more on that later”

<!-- A template project
* Bare bones
* Scripts to build project  -->
<!-- 
Our complex example
* Prepare the image 
    * All repos necessary 
    * All configs necessary  -->

Conclusion
* Recap
* Future work
    * Deploy to server
    * More complex scenarios
        * Lightweight images using multistage build or alpine
        * More complex models: multiple stage pipelines or multiple models
        * Running multiple containers using docker-compose
        * Networking and volumes using docker-compose
        * Dockerfile best practices


# Introduction

## Problem
In software development, it is often necessary to run code on different machines, with different operating systems, and with different dependency versions.
The underlying code is the same, but the environment in which it runs can vary significantly.
Some may argue that well-written code should be executable on any machine, but reality is not always that simple.

Often, systems require specific drivers, modules, or libraries in order to run, or the code is written in a language that is not natively supported by the target machine.
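Virtualization makes it possible to run multiple (even nested) operating systems on a single physical machine as virtual machines, providing the essential first step towards solving the problem of cross-platform collaboration.
[OS-level virtualization](https://en.wikipedia.org/wiki/Operating-system-level_virtualization), the approach containers build on, instead isolates multiple user-space environments on top of one shared kernel.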

However, even with virtual machines, the code still needs to be packaged and shipped with all its dependencies.
Dependency management within these systems can be a hassle, especially as outdated libraries become vulnerable to security threats, or as new versions of libraries break compatibility with existing code.

Many Python projects nowadays make use of package managers like [pip](https://pypi.org/project/pip/) or [conda](https://docs.conda.io/en/latest/) together with a `requirements.txt` file to manage and specify the exact versions of needed dependencies. 
Similar setups exist for other languages too, like [Maven](https://maven.apache.org/) for Java/Scala or [Cargo](https://doc.rust-lang.org/cargo/) for Rust, but these are not always enough to ensure that the code runs as expected on all machines.
Even with these package managers, the target machine would still need to have the correct version of the language runtime installed, which introduces another potential source of conflicts.
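
To make the last two points concrete, here is a minimal sketch of the kind of check a target machine would have to pass before pinned code can be expected to run. It assumes pins are written as `name==version` lines, as is common in a `requirements.txt`. The function names and the default minimum Python version are hypothetical and chosen only for illustration.

```python
import sys
from importlib import metadata


def parse_pins(requirements_text):
    """Collect 'name==version' pins, skipping blanks, comments, and unpinned lines."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def check_environment(pins, min_python=(3, 9)):
    """Return readable mismatches between the pins and the current machine."""
    problems = []
    # The language runtime itself is a dependency that pip cannot install.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"python: need >= {min_python[0]}.{min_python[1]}, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found} (want {wanted})")
    return problems


if __name__ == "__main__":
    # Assumed example pins; a real project would read its own requirements.txt.
    example = "numpy==1.26.4\nrequests==2.31.0  # HTTP client\n"
    for problem in check_environment(parse_pins(example)):
        print(problem)
```

Every machine that runs the project has to pass a check like this, and the check itself only covers Python packages, not drivers or system libraries.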

Moving from technical project development to high-level product management, the need for code to run the same way on different machines becomes even more critical.
Whether the product is deployed to a server running a different OS than the development team's local machines, or handed over to a client as a tool, the code should always run as expected.


## Solution
To solve this problem, we need a way to package code and its dependencies into a single, self-contained unit that can be run anywhere.
This is where containerization comes in.

Containerization is a lightweight, portable, and self-sufficient way to package software.
Containers are isolated environments that contain everything needed to run an application, including the code, runtime, system tools, libraries, and settings.
This makes it easy to run the same code on any machine with a compatible container runtime, without reinstalling dependencies or worrying about version conflicts on the host.
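
As a rough illustration of what "everything needed" means in practice, the sketch below renders the text of a small Dockerfile from Python. The helper `render_dockerfile` is hypothetical, and the `python:3.11-slim` base image, the `requirements.txt` file name, and running `img2vid.py` as a plain script are assumptions rather than this project's actual setup.

```python
def render_dockerfile(script, base_image="python:3.11-slim",
                      requirements="requirements.txt"):
    """Return Dockerfile text bundling a runtime, dependencies, and one script."""
    lines = [
        f"FROM {base_image}",                                 # runtime and system tools
        "WORKDIR /app",                                       # working directory inside the image
        f"COPY {requirements} .",                             # dependency list
        f"RUN pip install --no-cache-dir -r {requirements}",  # libraries
        f"COPY {script} .",                                   # the code itself
        f'ENTRYPOINT ["python", "{script}"]',                 # how the container starts
    ]
    return "\n".join(lines) + "\n"


# Hypothetical usage with the example script named in the outline.
print(render_dockerfile("img2vid.py"))
```

Each line of the rendered file maps onto one item from the list above: the base image supplies the runtime and system tools, the install step supplies the libraries, and the entrypoint records how the application is launched.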




# References

1. [Docker overview](https://docs.docker.com/get-started/overview/)
2. [OS-level virtualization](https://en.wikipedia.org/wiki/Operating-system-level_virtualization)
3. [What is containerization](https://www.ibm.com/topics/containerization)
4. [Docker](https://www.docker.com/)


