Madminer workflow

About

This repository serves as a wrapper around the Madminer physics workflow and the Madminer ML workflow, linking them into a single workflow that can be executed in REANA.

Both workflows are defined as Git submodules in this repository. Submodules let one repository include the contents of others, which is useful when an operation needs code from several repositories that nonetheless remain separate projects.

Workflow definition

The workflow specification is composed of two sub-workflows:

Physics workflow: generating events.
ML workflow: analysing those events.

The combined workflow has this shape:

[Figure: diagram of the combined workflow]

Development

To fetch all the source code needed to work on this project, clone the repository together with its submodules:

git clone --recurse-submodules https://github.com/madminer-tool/madminer-workflow

If the project has already been cloned without its submodules, initialize them:

git submodule update --init --recursive

The repositories defined as submodules follow their own development pace. If they have been updated on GitHub and you want to propagate those changes to your local copy:

git submodule update --remote
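
Before or after updating, it can help to see which submodules are out of step with the commit recorded in this repository. The sketch below classifies the lines printed by `git submodule status`, relying on the prefix characters that command documents (`-` for not initialized, `+` for a checked-out commit that differs from the recorded one, `U` for merge conflicts). The function name `classify_submodules` is hypothetical and not part of this repository, and the parsing assumes submodule paths without spaces.

```shell
# Hypothetical helper: reads `git submodule status` output on stdin and
# prints one "<state> <path>" line per submodule.
classify_submodules() {
    while IFS= read -r line; do
        # First character is the status prefix; the rest is "<sha> <path> [(describe)]".
        prefix=$(printf '%s' "$line" | cut -c1)
        path=$(printf '%s' "$line" | cut -c2- | awk '{print $2}')
        case "$prefix" in
            -) state="uninitialized" ;;
            +) state="differs" ;;
            U) state="conflict" ;;
            *) state="ok" ;;
        esac
        printf '%s %s\n' "$state" "$path"
    done
}

# Usage, from the root of the clone:
#   git submodule status --recursive | classify_submodules
```

A submodule reported as `uninitialized` needs `git submodule update --init --recursive`; one reported as `differs` has a checkout that no longer matches the commit pinned by this repository, which is expected right after `git submodule update --remote`.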

MLFlow ♻️

The MLFlow framework has been integrated with some steps of the workflow to keep track of each run's initial parameters, results, and generated artifacts.

To deploy your own tracking server locally:

# Deploy local tracking server
mlflow server \
    --host "0.0.0.0" \
    --port 5000 \
    --workers 2 \
    --backend-store-uri "file:///tmp/mlflow/runs/metadata" \
    --default-artifact-root "file:///tmp/mlflow/runs/artifacts"

# Specify server URL to interact with it
export MLFLOW_TRACKING_URI="http://0.0.0.0:5000"

# Create experiments to avoid race conditions on parallelized steps.
mlflow experiments create --experiment-name "madminer-ml-sample"
mlflow experiments create --experiment-name "madminer-ml-train"
mlflow experiments create --experiment-name "madminer-ml-eval"
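
When the commands above are combined into one script, the experiments can only be created once the tracking server is accepting requests. The sketch below is one way to wait for that: a small retry helper that re-runs a probe command until it succeeds. The helper name `wait_for` and the `WAIT_DELAY` variable are hypothetical, and probing the `/health` path is an assumption about the MLFlow server version in use; substitute any request your server is known to answer.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <max_attempts> <command> [args...]
# The pause between attempts is WAIT_DELAY seconds (default: 1).
wait_for() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the probe quietly; stop as soon as it succeeds.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_DELAY:-1}"
    done
    return 1
}

# Example (assumes the tracking server answers on /health):
#   wait_for 30 curl -fsS "$MLFLOW_TRACKING_URI/health" &&
#       mlflow experiments create --experiment-name "madminer-ml-sample"
```

Keeping the probe as an argument, not hard-coding it, means the same helper works whatever request is used to check the server.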

Execution

The full workflow can be launched using Yadage, a YAML-based workflow specification language backed by a set of utilities for coordinating workflow execution. Note that defining Yadage workflows can be difficult because the Yadage documentation is incomplete. To learn about Yadage's undocumented features, contact Lukas Heinrich, Yadage's creator.

Running the workflow with Yadage requires both Docker environment images (physics and ML) to have been pushed already. If they have not, follow the instructions in the Madminer physics workflow and Madminer ML workflow repositories.

Once the Docker images are available on Docker Hub, run the workflow locally:

export MLFLOW_TRACKING_URI="http://host.docker.internal:5000"
export PACKTIVITY_DOCKER_CMD_MOD="--add-host host.docker.internal:host-gateway"  # Linux only
make yadage-run
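
The `--add-host` flag is marked "Linux only" because Docker Desktop on macOS and Windows resolves `host.docker.internal` on its own, whereas the Docker Engine on Linux needs the explicit `host-gateway` mapping. A script shared across platforms could set the variable conditionally, as in the sketch below. The function name `docker_cmd_mod_for` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: print the extra Docker flag only when running on Linux.
# The OS name defaults to `uname -s`; passing it explicitly makes the function easy to test.
docker_cmd_mod_for() {
    os_name=${1:-$(uname -s)}
    if [ "$os_name" = "Linux" ]; then
        printf '%s\n' "--add-host host.docker.internal:host-gateway"
    fi
}

export MLFLOW_TRACKING_URI="http://host.docker.internal:5000"

# Export the modifier only when the helper printed something.
mod=$(docker_cmd_mod_for)
if [ -n "$mod" ]; then
    export PACKTIVITY_DOCKER_CMD_MOD="$mod"
fi

# Then launch the workflow as usual:
#   make yadage-run
```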

Deployment

Local debugging

To debug the workflow locally using REANA, first install Docker and kind (Kubernetes in Docker), the CLI tool used to deploy a local cluster. Then follow the REANA local deployment documentation to set up REANA.

To start the workflow:

$ source ~/.virtualenvs/reana/bin/activate
(reana) $ eval $(reana-dev client-setup-environment)
(reana) $ export REANA_WORKON=madminer-workflow
(reana) $ export MLFLOW_TRACKING_URI=http://host.docker.internal:5000
(reana) $ make reana-run

Remote deployment

If you have access to a remote REANA cluster and want to deploy there, set the environment variables yourself:

$ source ~/.virtualenvs/reana/bin/activate
(reana) $ export REANA_ACCESS_TOKEN=[..]
(reana) $ export REANA_SERVER_URL=[..]
(reana) $ export REANA_WORKON=madminer-workflow
(reana) $ export MLFLOW_TRACKING_URI=<tracking_server_url>
(reana) $ make reana-run
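
Because the remote setup depends on four variables being set by hand, a quick check before launching can catch a forgotten export. The sketch below reports every variable from the listing above that is unset or empty. The function name `check_reana_env` is hypothetical and not part of this repository or of the REANA client.

```shell
# Hypothetical helper: report every required variable that is unset or empty.
# Returns 0 only when all of them are set.
check_reana_env() {
    missing=0
    for name in REANA_ACCESS_TOKEN REANA_SERVER_URL REANA_WORKON MLFLOW_TRACKING_URI; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf 'missing: %s\n' "$name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_reana_env && make reana-run
```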

The workflow may take some time to finish, depending on the job and the cluster. Once it has finished, list and download the output files:

(reana) $ reana-client ls
(reana) $ reana-client download <path/to/file/on/reana/workon>
