# Demonstration Parsl workflow for multiple ensemble members

This notebook is a stand-alone sandbox to explore a Parsl workflow that sets up multiple ensemble member apps that depend on initial and final apps. This notebook is designed to be run directly on an HPC resource.

## Workflow visualization

There are three options for visualizing a Parsl workflow:
1. Manual "direct" launch of `parsl-visualize` before workflow runs;
2. Visualization launched as part of the workflow (i.e. self-launch); and
3. Offline visualization (i.e. for browsing previous Parsl workflows or starting visualization completely independently of the workflow).

Examples of all three are provided below: option \#1 is active, option \#2 is commented out, and option \#3 is described in a markdown cell at the end of the notebook.

## Interactive usage

This notebook can be used via a `JupyterLab` interactive session on a cluster head node. The visualization can be accessed via a `Desktop` interactive session on the same cluster head node. You will probably need to bootstrap the `parsl` Conda environment first (see below), select the `parsl` kernel for the notebook, and then restart the kernel to use Parsl functionality.

## Workflow parameters

The key customizable parameters for this workflow are defined immediately below. For a fully automated (i.e. non-interactive) workflow in `main.py`, these parameters are typically specified in the workflow launch form (and the corresponding `.json` package for API launch) and are then passed on the command line when `main.py` is launched on the head node of the cluster.
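As a rough sketch of how these parameters could reach `main.py` on the command line, the block below parses them with Python's standard `argparse`. The flag names simply mirror the variable names in the next cell and are hypothetical; the actual launch form may name or pass them differently.

```python
import argparse

def parse_workflow_args(argv=None):
    """Parse hypothetical main.py flags that mirror the notebook's parameters."""
    parser = argparse.ArgumentParser(description="Parsl ensemble workflow")
    parser.add_argument("--conda_base_path", default="~/pw/software/.miniconda3c/")
    parser.add_argument("--conda_env_name", default="parsl")
    parser.add_argument("--param_log_dir", default="./parsl-app-logs")
    parser.add_argument("--param_ens_size", type=int, default=10)
    # argv=None makes argparse read sys.argv[1:], as it would inside main.py.
    return parser.parse_args(argv)

# Example: what `python main.py --param_ens_size 20` would produce.
args = parse_workflow_args(["--param_ens_size", "20"])
print(args.param_ens_size, args.conda_env_name)
```

Passing an explicit list keeps the example safe to run inside Jupyter, where `sys.argv` contains the kernel's own arguments.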

In [None]:
# Workflow parameters

# Bootstrap installation info
install=False
install_from_scratch=False
conda_base_path="~/pw/software/.miniconda3c/"
conda_env_name="parsl"

# App workdir path info
param_log_dir='./parsl-app-logs'

# Ensemble size
param_ens_size=10

## Installs

There are two install options (both run only when `install = True`):
1. `install_from_scratch = True` runs, and thereby documents, the steps used to build the environment.
2. `install_from_scratch = False` reconstructs the Conda environment from an exported env file (`.yaml`), which is faster than rebuilding from scratch and promotes reproducibility.

The reconstruction branch is the default here since env files are distributed with this notebook. Once the environment has been reconstructed, you may need to tell this notebook to use the kernel from that Conda environment with the `Kernel > Change kernel...` option in the menu above.
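After switching kernels, a quick way to confirm which environment the kernel is running in is to inspect `sys.prefix`. This is a minimal sketch that assumes the environment lives in a directory named after the env (e.g. `<conda_base>/envs/parsl`, the usual Conda layout for named environments); the expected name `parsl` is copied from `conda_env_name`.

```python
import os
import sys

EXPECTED_ENV = "parsl"  # copied from conda_env_name in the parameters cell

def kernel_env_name(prefix=None):
    """Return the last path component of the kernel's environment prefix."""
    prefix = sys.prefix if prefix is None else prefix
    return os.path.basename(os.path.normpath(prefix))

if kernel_env_name() != EXPECTED_ENV:
    print(f"Kernel is running from {sys.prefix}; "
          f"select the '{EXPECTED_ENV}' kernel via Kernel > Change kernel...")
```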

In [None]:
# You don't need to rerun the install if the environment has already
# been built and selected as the kernel for this notebook.
if install:
    if install_from_scratch:
        
        # Currently there is a dependency bug with ipykernel and Python 3.13,
        # so pin to a different Python.
        ! conda create -y --name {conda_env_name} python=3.9
        
        # To use a Jupyter notebook with a
        # specific conda environment:
        ! conda install -y --name {conda_env_name} requests
        ! conda install -y --name {conda_env_name} ipykernel
        ! conda install -y --name {conda_env_name} -c anaconda jinja2
        
        # pip installs
        # Conda does not install Parsl's monitoring/visualization extras, so use pip.
        # Each Conda env has its own pip, so activate the env first.
        ! source {conda_base_path}/etc/profile.d/conda.sh; conda activate {conda_env_name}; pip install --upgrade pip
        ! source {conda_base_path}/etc/profile.d/conda.sh; conda activate {conda_env_name}; pip install 'parsl[monitoring, visualization]'
        
        # The environment was then exported with:
        ! conda env export --name {conda_env_name} > ./requirements/{conda_env_name}.yaml
    else:
        # You can rebuild the environment with:
        ! conda env update -f ./requirements/{conda_env_name}.yaml --name {conda_env_name}


## Imports

These imports are based on the instructions in the [Parsl Tutorial](https://parsl.readthedocs.io/en/latest/1-parsl-introduction.html).

In [None]:
import os
import numpy as np
#import pandas as pd

# parsl dependencies
import parsl
import logging
from parsl.app.app import python_app, bash_app
from parsl.config import Config
from parsl.executors import HighThroughputExecutor # We want to use monitoring, so we must use HTEX
from parsl.executors import MPIExecutor # MPIExecutor wraps the HTEX; need to test if can be used with monitoring
from parsl.monitoring.monitoring import MonitoringHub
from parsl.addresses import address_by_hostname
from parsl.providers import SlurmProvider, LocalProvider

# to display Parsl monitoring GUI in notebook
# Experimental - does not work yet
from IPython.display import IFrame

#=================================================
# Log everything to stdout (ends up in pink boxes 
# in the notebook). This information is logged anyway
# in ./runinfo/<run_id>/parsl.log. Careful - this has
# the potential to slow the notebook down significantly
# for complex workflows.
# parsl.set_stream_logger() # <-- log everything to stdout
#==================================================

print(parsl.__version__)

## Configure Parsl

This configuration must use the `HighThroughputExecutor` (HTEX) since we also want to enable [Parsl monitoring](https://parsl.readthedocs.io/en/latest/userguide/monitoring.html).

In [None]:
config = Config(
    retries=3,
    executors=[
        # Use slurm_htex for running SLURM jobs on the worker nodes
        HighThroughputExecutor(
            label="slurm_htex",
            # cores_per_worker is often more general than cores_per_node
            # and often allows Parsl to scale out multiple Parsl workers on
            # a single worker node in the most flexible way (i.e. worker
            # nodes of different sizes on different CSPs).
            cores_per_worker=4,
            #max_workers_per_node=3,
            address=address_by_hostname(),
            provider=SlurmProvider(
                partition='big',
                nodes_per_block=1,
                #cores_per_node=4, # Remove this for now - is one core reserved for the Parsl worker?
                init_blocks=2,
                min_blocks=2,
                max_blocks=10,
                exclusive=True,
                worker_init="source "+conda_base_path+"/etc/profile.d/conda.sh; conda activate "+conda_env_name,
                # Known issue: Parsl has not been relaunching pilot jobs
                # via SLURM once they hit this walltime. A likely cause is
                # strategy='none' below, which disables Parsl's block
                # scaling: blocks that expire are never replaced, so once
                # the workers are down Parsl keeps waiting to run tasks on
                # them. strategy='simple' (or 'htex_auto_scale') should let
                # Parsl submit new blocks, up to max_blocks, as needed.
                walltime="01:00:00")
        )
    ],
    # If this part of the config is not present, no monitoring
    # (and therefore no visualization) information is gathered.
    monitoring=MonitoringHub(
        hub_address=address_by_hostname(),
        hub_port=55055,
        monitoring_debug=False,
        resource_monitoring_interval=10,
    ),
    strategy='none'
)

# Loading the configuration starts a Parsl DataFlowKernel
# Pilot jobs are also automatically launched at this point 
# if init_blocks and min_blocks are non-zero. You can
# verify this by checking for files in ./runinfo/<run_id>/submit_scripts/
# as well as monitoring your SLURM allocation with squeue.
dfk = parsl.load(config)
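Instead of browsing `./runinfo` by hand, the helper below lists the submit scripts of the most recent run. It is a sketch that assumes Parsl names run directories with a zero-padded counter (`000`, `001`, ...) under `./runinfo`, which is what we have observed, and that the scripts sit in the `submit_scripts/` subdirectory mentioned above.

```python
import os

def latest_submit_scripts(runinfo_dir="./runinfo"):
    """Return the sorted file names in <latest run>/submit_scripts, or []."""
    if not os.path.isdir(runinfo_dir):
        return []
    # Run directories are assumed to be numeric (000, 001, ...); sort numerically.
    run_ids = sorted(
        (d for d in os.listdir(runinfo_dir)
         if d.isdigit() and os.path.isdir(os.path.join(runinfo_dir, d))),
        key=int,
    )
    if not run_ids:
        return []
    scripts_dir = os.path.join(runinfo_dir, run_ids[-1], "submit_scripts")
    if not os.path.isdir(scripts_dir):
        return []
    return sorted(os.listdir(scripts_dir))

# Usage: print(latest_submit_scripts())
```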

## Define Parsl apps

The smallest unit of execution in a Parsl workflow is the app. There are two types of Parsl apps:
1. `python_app`s are useful when launching pure Python code. They are also particularly useful if you want to pass a *small* amount of output from a running app directly back into the workflow. For example, `make_dir` below could be set up as either a `python_app` or a `bash_app`, but I chose a `python_app` because I want the value of `my_dir` to be explicitly available to the workflow via the `.result()` of the app. This is not possible with a `bash_app`, whose `.result()` only reports the exit code of the command.
2. `bash_app`s are useful when launching tasks on the command line.

Here, the applications are *defined* but not run. The `@python_app` and `@bash_app` decorators are the "flags" that tell Parsl that these functions are special and need to be tracked as part of the workflow. Undecorated functions execute locally as regular Python in whatever runtime the notebook is in.

### Python Apps

In [None]:
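@python_app # Make the directory that holds all the workflow app logs
def make_dir(my_dir):
    import os  # imports go inside the app because it runs on a remote worker
    # exist_ok=False: rerunning this cell fails if the directory already exists
    os.makedirs(my_dir, exist_ok = False)
    return my_dir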

### Bash Apps

In [None]:
@bash_app # Start the Parsl visualizer
def start_parsl_visualize(
    stdout='parsl_vis_app.stdout', 
    stderr='parsl_vis_app.stderr'):
    return 'parsl-visualize --listen 127.0.0.1 --port 8080'

In [None]:
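@bash_app # Test broadcasting jobs across the worker nodes
def worker_hello(enforce, stdout='worker_hello_app.stdout', stderr='worker_hello_app.stderr'):
    # `enforce` is not used in the command: passing a future here (e.g.
    # make_log_dir_future) just makes Parsl wait for it before running.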
    return '''
    date
    hn=`hostname`
    echo Host: $hn
    pwd
    whoami
    '''

@bash_app # Test multiple peer dependencies
def count_workers(inputs=(), log_dir="./logs", stdout='count_workers_app.stdout', stderr='count_workers_app.stderr'):
    return '''
    echo `cat {log_dir}/parsl_hello_app.stdout | col | grep Host | sort | uniq -c`
    '''.format(
        log_dir=log_dir
    )
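The `grep Host | sort | uniq -c` pipeline in `count_workers` can also be done in Python, which makes it easier to strip the stray control characters produced by many workers appending to the same file. This is a hypothetical helper, not part of the workflow; it assumes the `Host: <name>` line format written by `worker_hello`.

```python
import re
from collections import Counter

# Matches the "Host: <name>" lines written by worker_hello.
HOST_RE = re.compile(r"Host:\s*(\S+)")
# Control characters except tab (\x09) and newline (\x0a).
CONTROL_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")

def count_hosts(text):
    """Count how many worker_hello reports each host wrote."""
    clean = CONTROL_RE.sub("", text)
    return Counter(HOST_RE.findall(clean))

# Usage (after the hello apps have finished):
# with open(os.path.join(log_dir, "parsl_hello_app.stdout")) as f:
#     print(count_hosts(f.read()))
```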

## Start Parsl monitoring - Option 1 - direct shell invocation to background

This step can be done at any point provided that a monitoring database file exists. The default location of this file is `./runinfo/monitoring.db`, and it is created when the Parsl configuration is loaded. When the notebook kernel is restarted, information from additional Parsl workflow runs is appended to the monitoring database in `./runinfo`. It is possible to view this information "offline" (i.e. with no actively running Parsl workflows) and also fully independently of this notebook (see Option 3 at the end of this notebook for how to specify a custom location for the monitoring DB). For our purposes here, we assume that the monitoring DB is in `./runinfo`, in the same directory as the notebook runtime.
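To check that the monitoring database exists and is being populated, without starting the web server, you can open it with Python's built-in `sqlite3` module. The sketch below opens the file read-only and reports the row count of every table, so it does not assume a particular schema (we expect tables such as `workflow` and `task`, but treat those names as an assumption).

```python
import sqlite3

def summarize_monitoring_db(db_path="./runinfo/monitoring.db"):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # mode=ro opens read-only and raises instead of creating an empty file.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                for t in tables}
    finally:
        conn.close()

# Usage: print(summarize_monitoring_db())
```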

This launch can be commented out since it is also possible to launch `parsl-visualize` from a Parsl app within the workflow (see Option 2 below). The advantage of running `parsl-visualize` as a Parsl app is that the visualization server is up while the workflow is running and is shut down when the workflow is cleaned up. In contrast, when `parsl-visualize` is launched via `os.system`, the background child process can persist even after the workflow shuts down or the notebook kernel is restarted. Here, however, we opt not to run `parsl-visualize` inside the workflow: for simplicity, this workflow has only one executor (which sends jobs to the SLURM scheduler), whereas `parsl-visualize` would need to run on an executor that is local to the head node.

You can rerun this command even if `parsl-visualize` is already running: the new instance cannot bind to the port that is already in use and exits with an error. That error is written to `parsl_vis.stderr`, so nothing appears in the notebook (on the command line you would see it directly).
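If you would rather check the port before launching a second copy, a small socket probe is enough. This is a sketch: it only tells you that *something* is listening on the port (not necessarily `parsl-visualize`), and it assumes the address `127.0.0.1:8080` used elsewhere in this notebook.

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

# Usage:
# if not port_in_use(8080):
#     os.system('parsl-visualize 1> parsl_vis.stdout 2> parsl_vis.stderr &')
```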

In [None]:
# Launch parsl-visualize in the background; its output goes to parsl_vis.stdout/.stderr
os.system('parsl-visualize 1> parsl_vis.stdout 2> parsl_vis.stderr &')
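Because `os.system(... &)` does not return a handle on the background process, the server can outlive the workflow, as noted above. One alternative, sketched here with the standard `subprocess` module, is to keep a `Popen` handle so the server can be stopped explicitly. The helper name `launch_background` is hypothetical.

```python
import subprocess

def launch_background(cmd, stdout_path, stderr_path):
    """Start cmd (a list of arguments) in the background and return its Popen handle."""
    # The child keeps its own copies of these file descriptors, so closing
    # the parent's file objects when the with-block exits is safe.
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        return subprocess.Popen(cmd, stdout=out, stderr=err)

# Usage in this notebook (not run here):
# vis_proc = launch_background(["parsl-visualize"], "parsl_vis.stdout", "parsl_vis.stderr")
# ... run the workflow ...
# vis_proc.terminate()  # stop the server, e.g. right after dfk.cleanup()
# vis_proc.wait()
```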

## Parsl Workflow starts here

### Make a directory for workflow app logs

As part of the workflow, we make a directory for all the app logs. In this configuration every app, including `make_dir`, is sent to the `slurm_htex` executor, so this first app cannot run until at least one SLURM pilot job has started and its workers have connected. This means that you may need to wait some time for this first app to complete if there is a queue/cloud spin-up wait time associated with getting an allocation of worker nodes.

Note that calling `*_future.result()` blocks notebook execution (i.e. the cell stays in a pending state) until the result of the app future is available. You can, however, check the state of a future without blocking, e.g. by displaying `*_future` or calling `*_future.done()`. If you don't call `*_future.result()`, notebook execution continues and builds out the whole Parsl DAG until you reach the first blocking cell.
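Parsl's app futures subclass Python's `concurrent.futures.Future`, so the non-blocking `.done()` check and the blocking `.result()` call behave as described above. The sketch below uses a local `ThreadPoolExecutor` as a stand-in for Parsl so it runs without a cluster; `slow_square` is a hypothetical task.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    """Hypothetical task standing in for a Parsl app."""
    time.sleep(0.5)
    return x * x

with ThreadPoolExecutor(max_workers=2) as pool:
    fut = pool.submit(slow_square, 7)
    # Non-blocking: returns immediately (almost certainly False at this point).
    print("done right after submit:", fut.done())
    # Blocking: waits until the task finishes, like *_future.result() above.
    value = fut.result()
    print("done after result():", fut.done(), "value:", value)
```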

In [None]:
# Launch the app
make_log_dir_future = make_dir(param_log_dir)

# Get the result
log_dir = make_log_dir_future.result()

### Start Parsl monitoring - Option 2 - Monitoring as an app within a Parsl workflow

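This approach is helpful if we want the Parsl visualization process to be cleaned up after the workflow is complete. Note that this command is tracked by Parsl and is considered to be part of the workflow. For the server to run on the head node of the cluster, the app would need to be sent to an executor that is local to the head node (e.g. a hypothetical `local_htex` executor with a `LocalProvider`, selected with `@bash_app(executors=['local_htex'])`); the configuration above only defines `slurm_htex`, which is why this cell is commented out. You can verify where the process is running in the terminal with `ps -u $USER -HF -ww | grep vis`.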

In [None]:
# Start Parsl visualization in a
# separate cell since we only want
# to run this app one time.
#parsl_vis_future = start_parsl_visualize(
    #make_log_dir_future, 
#    stdout=log_dir+'/parsl_vis_app.stdout', 
#    stderr=log_dir+'/parsl_vis_app.stderr')

In [None]:
# I'd love to view Parsl monitoring in the notebook,
# but this doesn't work.
# IFrame('http://localhost:8080', width=600, height=500)

### Example parallel ensemble Parsl workflow

The cells below run a very simple parallel workflow example. See below for a workflow running TIEGCM.

For development, it's often nice to put each Parsl app in its own cell so you can more easily see error messages; otherwise, error messages can be obscured. Note that the `worker_hello` app is helpful for testing broadcasting an application to many nodes/workers. It is important to keep a **list** of app futures when launching many Parsl apps: if you overwrite the future of a still-running application, Parsl tends to get confused, leaving those applications blocked in the `launched` state and unable to update the overwritten futures.

In [None]:
# Example of launching 100 parallel tasks in Parsl.
# Note that this example is simple but somewhat problematic:
# all the Parsl workers write in parallel to the SAME
# stdout and stderr files. This results in lots of strange
# control characters in those two files and even
# interrupted/partially overwritten lines -> concurrent
# writes to one file are definitely NOT safe. BUT, it sure is easy to do!
future_list=[]
for _ in range(100):
    future_list.append(
        worker_hello(
            make_log_dir_future,
            stdout=log_dir+'/parsl_hello_app.stdout',
            stderr=log_dir+'/parsl_hello_app.stderr'))
    

count_future = count_workers(
    inputs=future_list,
    log_dir=log_dir,
    stdout=log_dir+'/parsl_count_app.stdout',
    stderr=log_dir+'/parsl_count_app.stderr')
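One way to avoid the interleaved-output problem is to have each task *return* its hostname and aggregate the returned values after all futures in the list complete (fan-out, then fan-in, as `count_workers(inputs=future_list)` does). In Parsl this would require a `python_app`, since a `bash_app` only returns an exit code. The sketch below uses `concurrent.futures.ThreadPoolExecutor` as a local stand-in for Parsl so it runs anywhere; every task runs on the notebook host, so expect a single hostname in the counts.

```python
import socket
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, wait

def hello(task_id):
    """Stand-in for worker_hello: report which host ran this task."""
    return socket.gethostname()

with ThreadPoolExecutor(max_workers=8) as pool:
    # Fan out: keep EVERY future in a list instead of overwriting one variable.
    futures = [pool.submit(hello, i) for i in range(100)]
    # Fan in: like count_workers(inputs=futures), wait for all of them first.
    wait(futures)
    host_counts = Counter(f.result() for f in futures)

print(host_counts)
```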

## Stop Parsl

The cells above can be rerun any number of times; this will simply send more and more apps to be run by Parsl. When the workflow is truly complete, it is time to call `dfk.cleanup()`. This runs implicitly when a `main.py` script finishes executing, but it is *not* run in a notebook unless it is explicitly called, as it is below.

In [None]:
dfk.cleanup()

## Clean up Parsl log files

In [None]:
# This directory contains the Parsl run logs and the monitoring DB;
# skip this line if you want to browse this run offline (Option 3 below)
! rm -rf runinfo

# This directory contains the Parsl app logs
! rm -rf {log_dir}

## Start Parsl Monitoring - Option 3 - Post workflow manual invocation

Once the Parsl `./runinfo/monitoring.db` is created, it is possible to start Parsl Monitoring and browse the results of the workflow offline. In this scenario, `parsl-visualize` can be started on the command line provided that a Conda env with `parsl[monitoring,visualization]` installed is activated. For example:
```
source ~/pw/software/.miniconda3c/etc/profile.d/conda.sh
conda activate parsl
parsl-visualize sqlite:///${HOME}/<work_dir>/runinfo/monitoring.db
```
(You may need to adjust the path to the Conda environment, its name, and the path to `monitoring.db`.)
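When the DB lives somewhere other than the current directory, the main pitfall is the number of slashes: an absolute path is appended to `sqlite:///`, giving four slashes on Linux. The helper below is a hypothetical convenience (not part of Parsl) that builds that URL from a relative or `~`-prefixed path.

```python
import os

def monitoring_db_url(db_path):
    """Build the sqlite URL that parsl-visualize takes as its argument."""
    abs_path = os.path.abspath(os.path.expanduser(db_path))
    # "sqlite:///" + "/home/..." -> four slashes on POSIX systems.
    return "sqlite:///" + abs_path

print("parsl-visualize " + monitoring_db_url("./runinfo/monitoring.db"))
```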