dbczumar Improve docs and example README for MLflow Docker projects (#970)
Latest commit db34340 Mar 12, 2019
Dockerized Model Training with MLflow

This directory contains an MLflow project that trains a linear regression model on the UC Irvine Wine Quality Dataset. The project uses a Docker image to capture the dependencies needed to run the training code. Running a project in a Docker environment (as opposed to Conda) makes it possible to capture non-Python dependencies, such as Java libraries. In the future, we also hope to add tools to MLflow for running Dockerized projects at scale, for example on a Kubernetes cluster.

Structure of this MLflow Project

This MLflow project contains a train.py file that trains a scikit-learn model and uses MLflow Tracking APIs to log the model and its metadata (e.g., hyperparameters and metrics) for later use and reference. train.py operates on the Wine Quality Dataset, which is included in wine-quality.csv.

Most importantly, the project also includes an MLproject file, which uses the docker_env field to specify the Docker container environment in which to run the project:

docker_env:
  image:  mlflow-docker-example

Here, image can be any valid argument to docker run, such as the tag, ID or URL of a Docker image (see Docker docs). The above example references a locally-stored image (mlflow-docker-example) by tag.

Finally, the project includes a Dockerfile that is used to build the image referenced by the MLproject file. The Dockerfile specifies library dependencies required by the project, such as mlflow and scikit-learn.

Running this Example

First, install MLflow (via pip install mlflow) and install Docker.

Then, build the image for the project's Docker container environment. You must use the same image name that is given by the docker_env.image field of the MLproject file. In this example, the image name is mlflow-docker-example. Issue the following command to build an image with this name:

docker build -t mlflow-docker-example -f Dockerfile .

Note that the name of the image used in the docker build command, mlflow-docker-example, matches the name of the image referenced in the MLproject file.
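The name-matching requirement can be checked mechanically. The following is a minimal sketch, not part of this example project: it reads docker_env.image out of MLproject-style text with a deliberately simplified line scanner (a real project would use a YAML parser) and assembles the matching docker build command. The helper names docker_env_image and docker_build_command are hypothetical.

```python
# Sketch only: derive the `docker build` command from the MLproject's
# docker_env.image value so the two names cannot drift apart.

MLPROJECT_TEXT = """\
docker_env:
  image:  mlflow-docker-example
"""


def docker_env_image(mlproject_text):
    """Return docker_env.image from MLproject-style text, or None if absent.

    Simplified scanner: handles only a top-level `docker_env:` key with an
    indented `image:` entry. It is not a general YAML parser.
    """
    in_docker_env = False
    for raw in mlproject_text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not raw[0].isspace():
            # An unindented line starts a new top-level key.
            in_docker_env = line.split(":", 1)[0] == "docker_env"
            continue
        if in_docker_env:
            key, sep, value = line.strip().partition(":")
            if sep and key == "image":
                return value.strip()
    return None


def docker_build_command(image, dockerfile="Dockerfile", context="."):
    """Return the argument list for building `image` from `dockerfile`."""
    return ["docker", "build", "-t", image, "-f", dockerfile, context]


image = docker_env_image(MLPROJECT_TEXT)
print(" ".join(docker_build_command(image)))
```

Passing the returned list to subprocess.run would execute the build; the sketch only prints the command so that it can be inspected first.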

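Finally, run the example project using mlflow run examples/docker -P alpha=0.5. The path examples/docker assumes that you issue the command from the root of the MLflow repository; adjust it if you run from a different directory.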
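The same run can also be launched from Python. The sketch below assumes that MLflow is installed and that the project path is examples/docker, as in the command above. It uses mlflow.projects.run with its uri and parameters arguments; the wrapper names cli_equivalent and run_docker_example are hypothetical.

```python
# Sketch only: launch the example project from Python instead of the CLI.


def cli_equivalent(project_uri, parameters):
    """Build the `mlflow run` argument list for the given parameters."""
    args = ["mlflow", "run", project_uri]
    for name, value in parameters.items():
        args += ["-P", f"{name}={value}"]
    return args


def run_docker_example(project_uri="examples/docker", alpha=0.5):
    """Run the project through the MLflow Projects API and return its run ID."""
    # Imported lazily so that cli_equivalent works without MLflow installed.
    import mlflow.projects

    # By default this call blocks until the run finishes.
    submitted_run = mlflow.projects.run(
        uri=project_uri,
        parameters={"alpha": alpha},
    )
    return submitted_run.run_id


# Show the CLI form that corresponds to the default arguments above.
print(" ".join(cli_equivalent("examples/docker", {"alpha": 0.5})))
```

Calling run_docker_example() requires the mlflow-docker-example image to have been built beforehand, exactly as for the command-line invocation.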

What happens when the project is run?

Running mlflow run examples/docker builds a new Docker image that is based on mlflow-docker-example and also contains the project code. The resulting image is tagged as mlflow-docker-example-<git-version>, where <git-version> is the git commit ID. After the image is built, MLflow executes the default (main) project entry point within the container using docker run.
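To make the naming scheme concrete, here is a small sketch that composes the derived image name as described above. It illustrates the stated format only and is not MLflow's internal code; the function names are hypothetical, and obtaining the commit ID with git rev-parse HEAD is an assumption.

```python
# Sketch only: compose "<base-image>-<git-version>" as described in the text.
import subprocess


def current_git_commit(repo_dir="."):
    """Return the commit ID of HEAD for the repository at repo_dir (assumed source)."""
    output = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir, text=True
    )
    return output.strip()


def project_image_name(base_image, git_version):
    """Join the base image name and the git version with a hyphen."""
    return f"{base_image}-{git_version}"


# A placeholder keeps the example independent of any particular repository.
print(project_image_name("mlflow-docker-example", "<git-version>"))
```

After a run, docker images lists the derived image alongside the base image, which is a quick way to confirm which commit a given run was built from.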

Environment variables, such as MLFLOW_TRACKING_URI, are propagated inside the container during project execution. When running against a local tracking URI, MLflow mounts the host system's tracking directory (e.g., a local mlruns directory) inside the container so that metrics and params logged during project execution are accessible afterwards.
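The mechanism can be pictured with docker run's standard -e (environment variable) and -v (volume mount) flags. The sketch below only illustrates that idea; the exact command MLflow constructs may differ. The function name docker_run_args and the container path /mlflow/tmp/mlruns are assumptions made for this example.

```python
# Sketch only: assemble a `docker run` command that forwards environment
# variables and mounts a local tracking directory into the container.
import os


def docker_run_args(image, env, tracking_dir=None,
                    container_dir="/mlflow/tmp/mlruns"):
    """Return a docker run argument list for `image`.

    env:           mapping of variable names to values, passed with -e
    tracking_dir:  host directory to mount with -v (skipped if None)
    container_dir: mount point inside the container (hypothetical path)
    """
    args = ["docker", "run", "--rm"]
    if tracking_dir is not None:
        # Bind mounts need an absolute host path.
        host_dir = os.path.abspath(tracking_dir)
        args += ["-v", f"{host_dir}:{container_dir}"]
    for name in sorted(env):
        args += ["-e", f"{name}={env[name]}"]
    args.append(image)
    return args


example = docker_run_args(
    "mlflow-docker-example",
    {"MLFLOW_TRACKING_URI": "file:///mlflow/tmp/mlruns"},
    tracking_dir="mlruns",
)
print(" ".join(example))
```

Because the host directory is mounted rather than copied, anything the training code logs under the container path is written to the host's mlruns directory and remains available after the container exits.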
