# Quickstart Guide

Welcome to the Pegasus quickstart notebook, which is intended for new users who want to get a quick overview of Pegasus concepts and usage. 

In this notebook, we cover

 - Using the Pegasus API to generate an abstract workflow
 - Using the API to plan the abstract workflow into an executable workflow and submit it
 - Monitoring the workflow and getting runtime statistics
 
## No Allocation Required

Typically, running workflows with ACCESS Pegasus requires users to link their own allocations. However, the initial notebooks in this guide are pre-configured to run on a modest resource bundled with ACCESS Pegasus. As you progress to more complex sample workflows, such as Variant Calling, you will need to use your own allocation.

## Diamond Workflow

This notebook will generate the **diamond workflow** illustrated below, then plan and execute the workflow on the local condorpool. Rectangles represent input/output files, and ovals represent compute jobs. The arrows represent file dependencies between compute jobs.

![Diamond Workflow](../images/diamond.svg)
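
The code in this notebook never declares job-to-job dependencies explicitly; the edges of the diamond follow from which job writes a file and which job reads it. The sketch below illustrates that idea in plain Python, without Pegasus. The `jobs` dictionary and the `infer_dependencies` helper are hypothetical names made up for this illustration (only the file names come from the notebook), and this is not how Pegasus implements it internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plain-dict description of the diamond workflow: each job lists
# the files it reads and the files it writes.
jobs = {
    "preprocess":  {"inputs": ["f.a"],          "outputs": ["f.b1", "f.b2"]},
    "findrange_1": {"inputs": ["f.b1"],         "outputs": ["f.c1"]},
    "findrange_2": {"inputs": ["f.b2"],         "outputs": ["f.c2"]},
    "analyze":     {"inputs": ["f.c1", "f.c2"], "outputs": ["f.d"]},
}


def infer_dependencies(jobs):
    """Map each job to the set of jobs that produce one of its input files."""
    # Which job writes each file?
    producer = {f: name for name, job in jobs.items() for f in job["outputs"]}
    # A job depends on the producers of its inputs. Files with no producer
    # (here only f.a) are external inputs and create no dependency.
    return {
        name: {producer[f] for f in job["inputs"] if f in producer}
        for name, job in jobs.items()
    }


deps = infer_dependencies(jobs)
# One valid execution order: every job appears after the jobs it depends on.
order = list(TopologicalSorter(deps).static_order())
print(deps)
print(order)
```

In any valid order `preprocess` runs first and `analyze` runs last; the two `findrange` jobs are independent of each other and can run in parallel.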

The abstract workflow description that you specify to Pegasus is portable and usually does not contain the locations of physical input files, executables, or the endpoints where jobs are executed. Pegasus uses three information catalogs during the planning process. In this quickstart guide, we will not configure any of these three catalogs. Instead:

- we provide hints in the jobs of the abstract workflow as to the locations of the executables
- Pegasus picks up the input data from an input directory and places the outputs in an output directory
- the workflow runs on a default compute site named **condorpool**

In [None]:
from Pegasus.api import *

from pathlib import Path

import logging

logging.basicConfig(level=logging.DEBUG)
BASE_DIR = Path(".").resolve()

INPUT_DIR = "./inputs"
Path(INPUT_DIR).mkdir(parents=True, exist_ok=True)
# Create the single input file (f.a) that the preprocess job reads.
with open(Path(INPUT_DIR) / "f.a", "w") as f:
    f.write("This is the contents of the input file for the diamond workflow!")

# --- Workflow -----------------------------------------------------------------
wf = Workflow("blackdiamond")

props = Properties()
# Allow the jobs to run on the test cluster. You do not need to provision
# resources from your own allocations in this case, but the cluster is small
# and should not be used for production workloads.
props.add_site_profile("condorpool", "condor", "+run_on_test_cluster", "true")
props.write()

fa  = File("f.a")
fb1 = File("f.b1")
fb2 = File("f.b2")
job_preprocess = Job("preprocess", node_label="preprocess")\
                    .add_args("-a", "preprocess", "-T", "3", "-i", fa, "-o", fb1, fb2)\
                    .add_inputs(fa)\
                    .add_outputs(fb1, fb2)

fc1 = File("f.c1")
job_findrange_1 = Job("findrange", node_label="findrange")\
                    .add_args("-a", "findrange", "-T", "3", "-i", fb1, "-o", fc1)\
                    .add_inputs(fb1)\
                    .add_outputs(fc1)

fc2 = File("f.c2")
job_findrange_2 = Job("findrange", node_label="findrange")\
                    .add_args("-a", "findrange", "-T", "3", "-i", fb2, "-o", fc2)\
                    .add_inputs(fb2)\
                    .add_outputs(fc2)

fd = File("f.d")
job_analyze = Job("analyze", node_label="analyze")\
                .add_args("-a", "analyze", "-T", "3", "-i", fc1, fc2, "-o", fd)\
                .add_inputs(fc1, fc2)\
                .add_outputs(fd)

wf.add_jobs(job_preprocess, job_findrange_1, job_findrange_2, job_analyze)

# write out the workflow to files, and graph it
try:
    wf.write()
    wf.graph(include_files=True,  output="graph.png")
except PegasusClientError as e:
    print(e)

# view rendered workflow
from IPython.display import Image
Image(filename='graph.png')

In [None]:
# Plan and submit the workflow, then wait for it to finish.
# The transformations_dir argument tells Pegasus which directory holds your executables.
try:
    wf.plan(input_dirs=[INPUT_DIR], output_dir="./outputs", transformations_dir="./executables", submit=True)\
        .wait()
except PegasusClientError as e:
    print(e)
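
Once the workflow has finished, you may want the runtime statistics mentioned at the top of this notebook. The sketch below assumes the Pegasus 5.x Python API, in which a planned and submitted `Workflow` exposes `analyze()` and `statistics()` as wrappers around the `pegasus-analyzer` and `pegasus-statistics` tools. `report_on_run` is a hypothetical helper name introduced here, not part of Pegasus.

```python
def report_on_run(wf):
    """Print debugging information and runtime statistics for a workflow.

    Assumption: `wf` is a Pegasus.api.Workflow that has already been planned
    and submitted, as in the cell above. Errors raised by the underlying
    calls are left to propagate to the caller.
    """
    # Assumed to wrap pegasus-analyzer, which reports on failed jobs, if any.
    wf.analyze()
    # Assumed to wrap pegasus-statistics, which prints the runtime summary.
    wf.statistics()
```

To use it in this notebook, call `report_on_run(wf)` in a new cell after the submit cell has completed.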

## What's Next?

To continue exploring Pegasus, and specifically to learn how to run the same workflow using your ACCESS allocation, please open the notebook in `01-Introduction/`. The introduction chapter runs the same diamond workflow, but this time it walks you through how to set up:

- a Replica Catalog, to provide the locations of datasets that are not available locally
- a Transformation Catalog, to define executables and the containers in which they run
- a Site Catalog, to describe the directories used for data staging
- your ACCESS allocation, so that your workflows run on supported ACCESS resources