# Command Line Tools

As mentioned before, running Pegasus in a Jupyter notebook is very convenient for tutorials and smaller workflows, but production workflows are most commonly submitted from dedicated HTCondor submit nodes using command line tools. This section of the tutorial uses the same workflow as the previous sections, generated inside the notebook. Planning, submitting, and checking status are done with the command line tools.

First, execute the following cells to generate the workflow. Note that the last step only writes the workflow to disk; it is not planned or submitted from the notebook.

## 0. Set Jupyter Environment

We need to add the Pegasus Python libraries to the Python path so that they can be imported successfully in the notebooks.

In [None]:
pegasus_python_path=!pegasus-config --python 
import sys
sys.path.append(pegasus_python_path.pop())
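
The `!` syntax in the cell above only works inside IPython/Jupyter. Outside a notebook, the same lookup could be sketched with `subprocess`, as below. The helper name `append_path_from_command` is hypothetical, and the sketch assumes that the command prints the library path as the last line of its output, which is what the `.pop()` call above relies on.

```python
import shutil
import subprocess
import sys


def append_path_from_command(argv):
    """Run argv, take the last non-empty line of its stdout, and add it to sys.path."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("command printed no path: {}".format(argv))
    path = lines[-1]
    if path not in sys.path:
        sys.path.append(path)
    return path


# Only attempt the real lookup when pegasus-config is actually installed.
if shutil.which("pegasus-config"):
    print(append_path_from_command(["pegasus-config", "--python"]))
```

The guard at the end keeps the snippet harmless on machines without Pegasus installed.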

## 1. Create the Workflow

**Specify your SLURM information**

At a minimum, you need to specify variables that declare
* the project/account under which your jobs run
* the SLURM partition to which the jobs should be submitted.

In [None]:
# Some variables for the SLURM cluster.
# Please update them for your cluster.
slurm_partition="XXXXX"
slurm_account="YYYYY"
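
Since the values above are placeholders, it can help to fail early if they were not replaced. The following is a minimal sketch of such a check; `check_slurm_settings` is a hypothetical helper that is not part of Pegasus, and the example values at the bottom are stand-ins.

```python
# Placeholder values used in the cell above.
PLACEHOLDERS = {"XXXXX", "YYYYY"}


def check_slurm_settings(partition, account):
    """Raise ValueError if either value is empty or still a placeholder."""
    for name, value in (("slurm_partition", partition), ("slurm_account", account)):
        if not value or value in PLACEHOLDERS:
            raise ValueError("{} is not set: {!r}".format(name, value))
    return True


# Hypothetical example values; use the ones for your own cluster.
print(check_slurm_settings("main", "my_project"))
```

In the notebook you would call it as `check_slurm_settings(slurm_partition, slurm_account)` before building the site catalog.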

In [None]:
import logging

from pathlib import Path

from Pegasus.api import *

logging.basicConfig(level=logging.DEBUG)
BASE_DIR = Path(".").resolve()
EXECUTABLES_DIR = Path(BASE_DIR / ".." / ".." / "executables").resolve()

# --- Properties ---------------------------------------------------------------
props = Properties()
props["pegasus.monitord.encoding"] = "json"                                                                    
props["pegasus.catalog.workflow.amqp.url"] = "amqp://friend:donatedata@msgs.pegasus.isi.edu:5672/prod/workflows"
props["pegasus.mode"] = "tutorial" # speeds up tutorial workflows - remove for production ones
props.write() # written to ./pegasus.properties 

# --- Replicas -----------------------------------------------------------------
with open("f.a", "w") as f:
   f.write("This is sample input to KEG")

fa = File("f.a").add_metadata(creator="ryan")
rc = ReplicaCatalog().add_replica("local", fa, Path(".").resolve() / "f.a")

# --- Transformations ----------------------------------------------------------
preprocess = Transformation(
                "preprocess",
                site="local",
                pfn="{}/pegasus-keg.py".format(EXECUTABLES_DIR),
                is_stageable=True,
                arch=Arch.X86_64,
                os_type=OS.LINUX
            )

findrange = Transformation(
                "findrange",
                site="local",
                pfn="{}/pegasus-keg.py".format(EXECUTABLES_DIR),
                is_stageable=True,
                arch=Arch.X86_64,
                os_type=OS.LINUX
            )

analyze = Transformation(
                "analyze",
                site="local",
                pfn="{}/pegasus-keg.py".format(EXECUTABLES_DIR),
                is_stageable=True,
                arch=Arch.X86_64,
                os_type=OS.LINUX
            ) 

tc = TransformationCatalog().add_transformations(preprocess, findrange, analyze)

# --- Sites -----------------------------------------------------------------
# add a local site with an optional job env file to use for compute jobs
shared_scratch_dir = "{}/LOCAL/work".format(BASE_DIR)
local_storage_dir = "{}/LOCAL/storage".format(BASE_DIR)

# slurm_partition and slurm_account are set in the cell under
# "Specify your SLURM information" above and are used unchanged here

local = Site("local") \
    .add_directories(
    Directory(Directory.SHARED_SCRATCH, shared_scratch_dir)
        .add_file_servers(FileServer("file://" + shared_scratch_dir, Operation.ALL)),
    Directory(Directory.LOCAL_STORAGE, local_storage_dir)
        .add_file_servers(FileServer("file://" + local_storage_dir, Operation.ALL)))

slurm_scratch_dir = "{}/SLURM/work".format(BASE_DIR)
slurm_storage_dir = "{}/SLURM/storage".format(BASE_DIR)

slurm = Site("slurm")\
    .add_directories(
    Directory(Directory.SHARED_SCRATCH, slurm_scratch_dir)
        .add_file_servers(FileServer("file://" + slurm_scratch_dir, Operation.ALL)),
    Directory(Directory.LOCAL_STORAGE, slurm_storage_dir)
        .add_file_servers(FileServer("file://" + slurm_storage_dir, Operation.ALL)))

slurm.add_pegasus_profile(
                        style="glite",
                        queue=slurm_partition,
                        project=slurm_account,
                        data_configuration="nonsharedfs",
                        auxillary_local="true",
                        nodes=1,
                        ppn=1,
                        runtime=1800,
                        clusters_num=2
                    )
slurm.add_condor_profile(grid_resource="batch slurm")

sc = SiteCatalog().add_sites(local, slurm)

# --- Workflow -----------------------------------------------------------------
# raw string so the backslashes in the diagram are not treated as escapes
r'''
                     [f.b1] - (findrange) - [f.c1]
                     /                             \
[f.a] - (preprocess)                               (analyze) - [f.d]
                     \                             /
                     [f.b2] - (findrange) - [f.c2]

'''
wf = Workflow("blackdiamond")

fb1 = File("f.b1")
fb2 = File("f.b2")
job_preprocess = Job(preprocess)\
                    .add_args("-a", "preprocess", "-T", "3", "-i", fa, "-o {},{}".format(fb1, fb2))\
                    .add_inputs(fa)\
                    .add_outputs(fb1, fb2)

fc1 = File("f.c1")
job_findrange_1 = Job(findrange)\
                    .add_args("-a", "findrange", "-T", "3", "-i", fb1, "-o", fc1)\
                    .add_inputs(fb1)\
                    .add_outputs(fc1)

fc2 = File("f.c2")
job_findrange_2 = Job(findrange)\
                    .add_args("-a", "findrange", "-T", "3", "-i", fb2, "-o", fc2)\
                    .add_inputs(fb2)\
                    .add_outputs(fc2)

fd = File("f.d")
job_analyze = Job(analyze)\
                .add_args("-a", "analyze", "-T", "3", "-i {},{}".format(fc1, fc2), "-o", fd)\
                .add_inputs(fc1, fc2)\
                .add_outputs(fd)

wf.add_jobs(job_preprocess, job_findrange_1, job_findrange_2, job_analyze)
wf.add_replica_catalog(rc)
wf.add_transformation_catalog(tc)
wf.add_site_catalog(sc)
wf.write()


## 2. Opening the Jupyter terminal

To open a new terminal window, navigate back to the listings tab of the Jupyter notebook, where you have been opening all the sections from. In the top right corner of the listing, click `New` and then `Terminal`. It looks something like this:

![Terminal Start](../images/terminal-start.png)

Once started, arrange your browser tabs/windows side by side so that you can see these instructions and the terminal window at the same time. In the following sections, a leading `$` marks a command you can type or copy and paste into the terminal window. Where you have to substitute your own values, they are marked with square brackets `[]`.

First, cd to the correct directory:

    $ cd ~/hpc-examples/notebooks/03-Command-Line-Tools/
    
If you run `ls`, you should see these files:

    $ ls
    03-Command-Line-Tools.ipynb
    f.a
    pegasus.properties
    workflow.yml
    
The last three were just generated by the cells above.
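
If you prefer to verify this from Python rather than with `ls`, a minimal sketch could look like the following. The file names come from the listing above; `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the `ls` listing above.
EXPECTED = ("f.a", "pegasus.properties", "workflow.yml")


def missing_files(directory, names=EXPECTED):
    """Return the names that do not exist as regular files in directory."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


missing = missing_files(".")
if missing:
    print("Missing, re-run the workflow cells:", ", ".join(missing))
else:
    print("All generated files are present.")
```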

## 3. Planning and submitting

Plan the workflow for the `slurm` site and submit it in one step:

    $ pegasus-plan --sites slurm --submit workflow.yml
    
In the output of the plan command, you will see references to several other Pegasus commands, such as `pegasus-status`. More importantly, a workflow directory was generated for the new workflow instance. This directory is the handle to the workflow instance and is used by the Pegasus command line tools. Some useful tools to know about:

 * **pegasus-status -v [wfdir]** Provides status on a workflow instance
 * **pegasus-analyzer [wfdir]** Provides debugging clues about why a workflow failed. Run this after a workflow has failed
 * **pegasus-statistics [wfdir]** Provides statistics, such as walltimes, on a workflow after it has completed
 * **pegasus-remove [wfdir]** Removes a workflow from the system
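
All of these tools take the workflow directory as their argument, so scripting them is mostly a matter of building the right argument list. The sketch below is one hedged way to do that from Python. The names `TOOLS`, `build_command`, and `run_tool` are hypothetical, and only the tool names and flags listed above are used.

```python
import shutil
import subprocess

# Tool names and flags as listed above; nothing else is assumed about them.
TOOLS = {
    "status": ["pegasus-status", "-v"],
    "analyzer": ["pegasus-analyzer"],
    "statistics": ["pegasus-statistics"],
    "remove": ["pegasus-remove"],
}


def build_command(action, wfdir):
    """Return the argv list for one of the tools, with wfdir as the last argument."""
    if action not in TOOLS:
        raise KeyError("unknown action: {}".format(action))
    return TOOLS[action] + [str(wfdir)]


def run_tool(action, wfdir):
    """Run the tool if it is installed; return its exit code, or None if it is missing."""
    argv = build_command(action, wfdir)
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv).returncode


# "/path/to/wfdir" is a stand-in for the directory printed by pegasus-plan.
print(build_command("status", "/path/to/wfdir"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the workflow directory contains spaces.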


## 4. Workflow status

Use the workflow directory given in the output of the `pegasus-plan` command to determine the status of your workflow:

    $ pegasus-status -l [wfdir]

The flag `-l` gives you more verbose output. See `pegasus-status --help` for all available options.

You can keep running `pegasus-status` until the workflow has completed, or you can use the `-w` flag to mimic the `wait()` function we used in the API. This flag will make `pegasus-status` run periodically until the workflow is complete:

    $ pegasus-status -l -w 30 [wfdir]
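
Conceptually, waiting like this is a polling loop: check, sleep, and repeat until the workflow is done. The sketch below illustrates that idea only and says nothing about how `pegasus-status` is implemented. `wait_until` and `fake_check` are hypothetical names, and the completion check is a stand-in that you would replace with your own status test.

```python
import time


def wait_until(is_done, interval=30, timeout=None, sleep=time.sleep, clock=time.monotonic):
    """Call is_done() every `interval` seconds until it returns True.

    Returns True on completion, or False if `timeout` seconds pass first.
    """
    start = clock()
    while True:
        if is_done():
            return True
        if timeout is not None and clock() - start >= timeout:
            return False
        sleep(interval)


# Stand-in check that reports completion on the third call.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_check, interval=0))
```

The optional `timeout` is an addition for illustration; it keeps a script from waiting forever if the workflow never reaches a final state.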
    

## 5. Workflow statistics

Once the workflow is complete, you can extract statistics from the provenance database:

    $ pegasus-statistics -s all [wfdir]

 
## What's Next?

The next notebook, `04-Containers/`, shows you how to use a Docker container to execute the jobs in your workflow.