## Welcome to Data Engineering Workshop

# Orientation & Setup

## Your computer
* Docker <br />
    * [Linux](https://docs.docker.com/install/) 
    * [Mac OS](https://docs.docker.com/docker-for-mac/) 
    * [Windows](https://docs.docker.com/docker-for-windows/) 

## You
* Linux/Unix skills
* Command Line Interface (CLI)
* Editing files
* Python programming

## What is a container? <br>
["A standard Unit of Software"](https://www.docker.com/resources/what-container) that runs on top of an operating system's kernel.

**Other container technologies**
* [OpenVZ](https://openvz.org/)
* [Solaris Zones](https://docs.oracle.com/cd/E18440_01/doc.111/e18415/chapter_zones.htm#OPCUG426)
* [Linux containers](https://linuxcontainers.org/)

## Main idea -- isolate the computing environment<br>
   * Allow regenerating computing environments
   * Allow sharing your computing environments

## Docker <br>
* An open-source project that automates the deployment of applications into containers
* According to the Docker documentation, containers are:
    * flexible - you can containerize complex applications
        * <span style="color:green">Although some might disagree: they can generally run the same or similar guest operating systems as the underlying host</span>
    * lightweight - you can leverage and share the host kernel
    * interchangeable - you can deploy updates and upgrades on-the-fly
    * portable - you can build locally, deploy to the cloud, and run anywhere
    * scalable - you can increase and automatically distribute container replicas
   

## Why do we need containers? <br>

* Each project in a lab depends on a complex software environment
    * Operating system
    * Drivers
    * Software dependencies: Python/MATLAB/R + libraries
* To avoid
    * "You <span style="color:red">cannot</span> update this computer, EVER, or the application your research depends on will stop running."
    * "It worked in development; now it is an operations problem."
    * "It works on my computer..."
* Collaboration with others
    * Sharing your code, a repository, or a Jupyter notebook might not be enough
* Freedom to experiment!
   * Build and test complex applications locally before launching them into a production environment

* A list of projects built with Docker can be found [here](https://blog.docker.com/2013/07/docker-projects-from-the-docker-community/)

## Docker Terminology <br>
**images** -- executable packages that include everything needed to run an application: the code, a runtime, libraries, environment variables, and configuration files <br>
**containers** -- runtime instances of an image<br>
**Dockerfile** -- a text file of build instructions that defines what goes into the environment inside your container<br>
**Docker Hub** -- a registry of repositories for sharing Docker images


## Testing your Docker Installation: <br>

1. Run ```docker info``` (or ```docker version```, without ```--```) to view details about your Docker installation
2. Run ```docker run hello-world``` in a CLI to confirm that you have successfully installed Docker
3. Run ```docker image ls``` to list the images that are downloaded to your machine
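
The first check can also be scripted. The following is a minimal sketch, assuming Python 3.7+ on the host; the helper names `docker_cli_path` and `docker_version` are hypothetical and only wrap the `docker version` command shown above:

```python
import shutil
import subprocess


def docker_cli_path():
    """Return the full path of the docker executable, or None if it is not on PATH."""
    return shutil.which("docker")


def docker_version():
    """Return the output of `docker version`, or None if the command is missing or fails."""
    if docker_cli_path() is None:
        return None
    result = subprocess.run(["docker", "version"], capture_output=True, text=True)
    # A non-zero exit code usually means the CLI is installed but the daemon is unreachable.
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    print(docker_version() or "Docker is not installed or the daemon is not running")
```

If the script prints the fallback message, revisit the installation links in the setup section before continuing.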

## Package and run a custom app using Dockerfile

```
# Use an official Python runtime as a parent image
FROM python:3.7.0-slim

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME=World

# Run app.py when the container launches
CMD ["python", "app.py"]

```

## Create requirements.txt <br>

Flask<br> 

## Create app.py
```
from flask import Flask
import os
import socket

app = Flask(__name__)

@app.route("/")
def hello():
    # Build the page from the NAME environment variable and the container's hostname
    html = "<h3>Hello {name}!</h3>" \
           "<b>Hostname:</b> {hostname}<br/>"
    return html.format(name=os.getenv("NAME", "world"), hostname=socket.gethostname())

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80)

```
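
The `NAME` variable set in the Dockerfile reaches the app through `os.getenv`, which falls back to `"world"` when the variable is unset. The sketch below isolates that behavior without Flask so it can be tried outside a container; the `render` helper and `TEMPLATE` constant are hypothetical names introduced only for this illustration:

```python
import os
import socket

# Same markup that hello() returns in app.py
TEMPLATE = "<h3>Hello {name}!</h3><b>Hostname:</b> {hostname}<br/>"


def render(environ=os.environ):
    """Build the page from a mapping of environment variables."""
    # Equivalent to os.getenv("NAME", "world") when environ is os.environ
    name = environ.get("NAME", "world")
    return TEMPLATE.format(name=name, hostname=socket.gethostname())


if __name__ == "__main__":
    print(render({}))                 # NAME unset: greets "world"
    print(render({"NAME": "World"}))  # NAME set, as in the Dockerfile
```

Inside the container the hostname is typically the container ID, which is why the page shows a different value than it does on the host.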

## Build the app
```
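# Optional: install the dependencies on the host to run app.py outside Docker
pip install -r requirements.txt

# Build the image from the Dockerfile in the current directory and tag it friendlyhello
docker build -t friendlyhello .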
```

## Run the app
```
docker run -p 4000:80 friendlyhello
```

Go to ```http://localhost:4000``` in a web browser, or run ```curl http://localhost:4000```. The ```-p 4000:80``` option maps port 4000 on the host to port 80 in the container.

Run the app in the background, in detached mode:
```
docker run -d -p 4000:80 friendlyhello
```
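
With the container running, the browser or `curl` check can also be scripted. This is a minimal sketch using only the Python standard library; the `fetch` helper and its 5-second timeout are assumptions for illustration, not part of the workshop app:

```python
import urllib.error
import urllib.request


def fetch(url, timeout=5):
    """Return the response body as text, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return None


if __name__ == "__main__":
    body = fetch("http://localhost:4000")
    print(body if body is not None else "No response; is the container running?")
```

A `None` result right after `docker run -d` may only mean the app has not finished starting, so retrying after a short pause is reasonable.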

## Stop the app
```
docker container ls -a

docker container stop <CONTAINER_ID>
```
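
Finding the right container ID by eye gets tedious once several containers exist. The sketch below assumes the listing was produced with `docker container ls -a --format "{{.ID}} {{.Image}}"`, which prints one ID and image name per line; the function name and the sample IDs are hypothetical:

```python
def container_ids_for_image(listing, image):
    """Return the container IDs whose image column equals `image`."""
    ids = []
    for line in listing.splitlines():
        parts = line.split()
        # Expect exactly two columns: container ID and image name
        if len(parts) == 2 and parts[1] == image:
            ids.append(parts[0])
    return ids


# Hypothetical output of: docker container ls -a --format "{{.ID}} {{.Image}}"
sample = """\
1fa4ab2cf395 friendlyhello
9c3b7e0d12aa hello-world
"""

if __name__ == "__main__":
    print(container_ids_for_image(sample, "friendlyhello"))  # ['1fa4ab2cf395']
```

Each returned ID can then be passed to `docker container stop`.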

## Share your image

If you don’t have a Docker account, sign up for one at [hub.docker.com](https://hub.docker.com/). Make note of your username.<br><br>
**Log in with your Docker ID**
```
docker login
```
**Tag the image**
```
docker tag image username/repository:tag
```
**Verify new image**
```
docker image ls 
```
**Publish the image**
```
docker push username/repository:tag
```
**Pull and run the image from the remote repository**
```
docker run -p 4000:80 username/repository:tag
```
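
The `username/repository:tag` reference appears in the tag, push, and run commands, so it helps to build it once. The following is a rough sketch; `image_reference`, `publish_commands`, and the example account name are hypothetical, and the `latest` default mirrors what Docker assumes when no tag is given:

```python
def image_reference(username, repository, tag="latest"):
    """Compose a Docker Hub reference of the form username/repository:tag."""
    if not username or not repository:
        raise ValueError("username and repository are required")
    return f"{username}/{repository}:{tag}"


def publish_commands(local_image, username, repository, tag="latest"):
    """Return the tag and push commands as argument lists, ready for subprocess.run."""
    reference = image_reference(username, repository, tag)
    return [
        ["docker", "tag", local_image, reference],
        ["docker", "push", reference],
    ]


if __name__ == "__main__":
    # "jane" and "get-started" are placeholder values, not real accounts
    for command in publish_commands("friendlyhello", "jane", "get-started", "part2"):
        print(" ".join(command))
```

Printing the commands before running them is a cheap way to catch a mistyped username or tag.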

# Break

## Interesting tutorials and blog posts:

* [A beginner friendly intro to VMs and Docker](https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b)
* [Intro to Docker from Neurohackweek](https://neurohackweek.github.io/docker-for-scientists/)
* [Understanding Images](https://code.tutsplus.com/tutorials/docker-from-the-ground-up-understanding-images--cms-28165)
* [Cloud Native Trail Map](https://raw.githubusercontent.com/cncf/landscape/master/trail_map/CNCF_TrailMap_latest.png)
* [Cloud Native Landscapes](https://raw.githubusercontent.com/cncf/landscape/master/landscape/CloudNativeLandscape_latest.png)
