# MLOps Parsl workflow

This notebook is the stand-alone companion to the Parsl MLOps workflow in `main.py` in this repository. The notebook is designed to run directly on an HPC resource, whereas `main.py` uses `parsl_utils` to launch MLOps applications from a central coordinating node (e.g., a laptop or the Parallel Works platform). The workflow simulates a typical MLOps scenario with the following tasks:
1. start an MLFlow tracking server
2. start DVC tracking within an archive repository and its remote
3. download and preprocess training data
4. run training loop and store results on-the-fly with MLFlow
5. commit and push resulting models with DVC to repo + remote
6. use the model for inference and generate figures
7. reuse the stored model for inference and generate figures
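The tasks above are listed in order, but some of them only depend on a subset of the earlier ones. A minimal sketch, using the standard-library `graphlib` module (Python 3.9+) rather than Parsl, of how the steps could be expressed as a dependency graph. The step names and the exact dependencies are assumptions for illustration; they are not read from `main.py`.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each maps to the set of steps it must wait for.
# These dependencies are an illustrative assumption, not taken from main.py.
MLOPS_STEPS = {
    "start_mlflow_server": set(),
    "init_dvc": set(),
    "preprocess_data": set(),
    "train_model": {"start_mlflow_server", "preprocess_data"},
    "commit_and_push_model": {"train_model", "init_dvc"},
    "inference_and_figures": {"train_model"},
    "reuse_model_for_inference": {"commit_and_push_model"},
}


def execution_order(graph):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(graph).static_order())


def ready_batches(graph):
    """Group steps into batches whose members could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


if __name__ == "__main__":
    for batch in ready_batches(MLOPS_STEPS):
        print(batch)
```

In Parsl, such dependencies are typically expressed by passing one app's future as an argument to the next app, which then waits for that input before it runs.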


## Installs

In [None]:
# Conda does not install monitoring, so use pip.
#! conda install -y -c conda-forge parsl

! pip install 'parsl[monitoring, visualization]'
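Because the install runs inside a notebook cell, it can help to confirm that the packages are importable before moving on. A minimal sketch using `importlib.util.find_spec`; the module names checked here are an assumption (Parsl monitoring relies on `sqlalchemy`, for example), so adjust the list to your environment.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Modules assumed to be needed by this notebook; edit as required.
    print(missing_modules(["parsl", "sqlalchemy"]))
```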

## Imports

These imports follow the instructions in the [Parsl Tutorial](https://parsl.readthedocs.io/en/latest/1-parsl-introduction.html).

In [None]:
import parsl
import os
from parsl.app.app import python_app, bash_app
from parsl.config import Config

# We want to use monitoring, so we must use HTEX
from parsl.executors import HighThroughputExecutor
from parsl.monitoring.monitoring import MonitoringHub
from parsl.addresses import address_by_hostname
import logging

#parsl.set_stream_logger() # <-- log everything to stdout

print(parsl.__version__)

## Configure Parsl

This configuration must use the HighThroughputExecutor (HTEX) because we also want to enable [Parsl monitoring](https://parsl.readthedocs.io/en/latest/userguide/monitoring.html).

In [None]:
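config = Config(
    executors=[
        HighThroughputExecutor(
            label="local_htex",
            cores_per_worker=1,      # each worker uses one core
            max_workers_per_node=2,  # run at most two workers on this node
            address=address_by_hostname(),
        )
    ],
    # Monitoring stores task and resource information under runinfo/
    monitoring=MonitoringHub(
        hub_address=address_by_hostname(),
        hub_port=55055,
        monitoring_debug=False,
        resource_monitoring_interval=10,  # seconds between resource samples
    ),
    strategy='none',  # no automatic scaling of blocks
)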

parsl.load(config)
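The two HTEX sizing parameters interact: with `cores_per_worker=1` and `max_workers_per_node=2`, this node runs at most two workers even if it has more cores. A simplified sketch of that arithmetic, assuming the worker count is the smaller of `max_workers_per_node` and the number of cores divided by `cores_per_worker`; Parsl's own calculation can also account for memory and other limits, and `workers_per_node` is a hypothetical helper.

```python
import math
import os


def workers_per_node(cores_on_node, cores_per_worker=1.0, max_workers_per_node=math.inf):
    """Simplified HTEX-style worker count: CPU-derived slots, capped by the maximum."""
    cpu_slots = math.floor(cores_on_node / cores_per_worker)
    return int(min(max_workers_per_node, cpu_slots))


if __name__ == "__main__":
    # Values from the config above, applied to the current machine
    cores = os.cpu_count() or 1
    print(workers_per_node(cores, cores_per_worker=1, max_workers_per_node=2))
```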

## Define Parsl apps

Parsl workflows are built from apps, the smallest unit of execution. There are two types of Parsl apps:
1. Python apps run pure Python code (e.g., TensorFlow training)
2. Bash apps run command-line tasks (e.g., starting the MLFlow server)

Here, the applications are *defined* but not run.
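Calling a decorated app does not run it inline; the call submits the task and immediately returns a future. The toy decorator below mimics that pattern with the standard library's `ThreadPoolExecutor`. It is a hypothetical stand-in for illustration (`toy_python_app` is not part of Parsl), although Parsl's `AppFuture` is likewise a subclass of `concurrent.futures.Future`.

```python
import functools
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Shared pool standing in for Parsl's executor (hypothetical, for illustration)
_POOL = ThreadPoolExecutor(max_workers=2)


def toy_python_app(func):
    """Hypothetical decorator: calling the function submits it and returns a Future."""
    @functools.wraps(func)
    def submit(*args, **kwargs):
        return _POOL.submit(func, *args, **kwargs)
    return submit


@toy_python_app
def slow_hello_toy(delay=0.1):
    time.sleep(delay)
    return "Hello World from slow toy app!"


if __name__ == "__main__":
    fut = slow_hello_toy()
    print(fut.done())    # may still be False while the task sleeps
    print(fut.result())  # blocks until the task finishes
```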

In [None]:
@python_app
def slow_hello():
    import time
    time.sleep(5)
    return 'Hello World from slow Python app!'

@bash_app
def echo_hello(stdout='echo-hello.stdout', stderr='echo-hello.stderr'):
    return 'echo "Hello World from fast Bash app!"'
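A Bash app's body returns a command *string*; Parsl runs that string in a shell and redirects its output to the files named by the `stdout` and `stderr` keyword arguments. The sketch below reproduces that behaviour for a single command with `subprocess`. `run_command_string` is a hypothetical helper, not a Parsl function, and it runs synchronously instead of returning a future.

```python
import os
import subprocess
import tempfile


def run_command_string(command, stdout_path, stderr_path):
    """Run a shell command string, sending stdout/stderr to files; return the exit code."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(command, shell=True, stdout=out, stderr=err)
    return completed.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "echo-hello.stdout")
        err_path = os.path.join(tmp, "echo-hello.stderr")
        code = run_command_string('echo "Hello World from fast Bash app!"', out_path, err_path)
        with open(out_path) as f:
            print(code, f.read().strip())
```

Like a Parsl Bash app, the helper reports the command's exit code; the text the command prints is only available from the stdout file.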

## Start Parsl Monitoring

In [None]:
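# parsl-visualize runs a web server in the foreground, so this cell blocks until it is interrupted.
# It still needs to be moved to the background, just like the MLFlow server.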
! parsl-visualize
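One way to keep the notebook responsive is to start `parsl-visualize` as a background process from Python instead of with `!`, so the cell returns immediately. A minimal sketch with `subprocess.Popen`; `start_background` and `stop_background` are hypothetical helpers, and the log file name is an assumption.

```python
import shutil
import subprocess
import sys


def start_background(command, log_path):
    """Start a long-running command without blocking; output goes to log_path."""
    log = open(log_path, "w")
    # Popen returns as soon as the process has started
    proc = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
    log.close()  # the child process keeps its own copy of the file descriptor
    return proc


def stop_background(proc, timeout=10):
    """Ask the process to exit, and kill it if it does not stop in time."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


if __name__ == "__main__":
    if shutil.which("parsl-visualize"):
        viz = start_background(["parsl-visualize"], "parsl-visualize.log")
        # ... run the workflow and browse the dashboard, then:
        stop_background(viz)
```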

## Run the workflow

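The workflow code below calls the apps. Each call returns a future immediately; calling `.result()` on the future waits for the app to finish and returns its result.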

In [None]:
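# Example Python app: the call returns a future immediately
future = slow_hello()

# .result() blocks until the app finishes
print(future.result())

# Example Bash app
future = echo_hello()

# Wait for the command to finish so the stdout file is complete
future.result()

with open('echo-hello.stdout', 'r') as f:
    print(f.read())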
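Calling `.result()` on each future in turn waits for the tasks one at a time. When several apps are launched together, the futures can instead be collected as they finish. Because Parsl's `AppFuture` subclasses `concurrent.futures.Future`, the standard `concurrent.futures.as_completed` helper should accept them; the sketch below demonstrates the pattern with a `ThreadPoolExecutor` standing in for Parsl, and `square_slowly` is a hypothetical task.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def square_slowly(x):
    """Hypothetical task: simulate some work, then return x squared."""
    time.sleep(0.01)
    return x * x


def run_all(inputs):
    """Launch one task per input and collect (input, result) pairs as they complete."""
    with ThreadPoolExecutor(max_workers=max(1, len(inputs))) as pool:
        futures = {pool.submit(square_slowly, x): x for x in inputs}
        finished = []
        for fut in as_completed(futures):
            finished.append((futures[fut], fut.result()))
    return finished


if __name__ == "__main__":
    print(run_all([1, 2, 3]))
```

The pairs come back in completion order, which need not match the submission order.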

## Parsl monitoring

## Clean up some log files

In [None]:
# Application logs
! rm echo-hello.stdout
! rm echo-hello.stderr

# This directory contains Parsl monitoring along with other logs
! rm -rf runinfo
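The `!` commands depend on the shell, and `rm` reports an error if a file is already gone (for example, when the cell is run twice). A portable sketch of the same cleanup in Python, assuming the same file and directory names as above; `clean_up` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def clean_up(paths):
    """Delete files or directories that exist; return the paths actually removed."""
    removed = []
    for name in paths:
        path = Path(name)
        if path.is_symlink() or path.is_file():
            path.unlink()
        elif path.is_dir():
            shutil.rmtree(path)  # like rm -rf for a directory tree
        else:
            continue  # nothing to delete
        removed.append(str(path))
    return removed


if __name__ == "__main__":
    print(clean_up(["echo-hello.stdout", "echo-hello.stderr", "runinfo"]))
```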