# Running a Workflow on a Single ACCESS Resource and using the Shared Filesystem 

**Objective:** Learn how to run a workflow on a single ACCESS resource (Expanse is used as an example) and use its shared filesystem.

In the previous notebook, you learned how to provision resources on an ACCESS resource using the `htcondor annex` command.

<div class="alert alert-block alert-info">
<b>Note:</b>
If you don't have an allocation on Expanse, you can run the annex command against the ACCESS HPC resource on which you do have an allocation.
</div>



## 1. Annex Setup (Needed Only Once)



To launch the pilot jobs from the ACCESS Pegasus submit host, use the `htcondor annex create` command. If this is your first time running an `annex` against a resource, you need to do some one-time setup.
Please refer to the
[ACCESS Pegasus Annex documentation](https://access-ci.atlassian.net/wiki/spaces/ACCESSdocumentation/pages/564887666/HTCondor+Annex) for details.

## 2. Run a workflow against the Annex

We will now run the same workflow that we ran in the `03-Tutorial-Software` notebook, which runs each job in a container, this time against the annex.

### 2.1 Setup the Replica and Transformation Catalog

In [1]:
from Pegasus.api import *
import sys
import os
from pathlib import Path

import logging

logging.basicConfig(level=logging.DEBUG)

# we specify the directories used by the workflow
# - directory where the executables that the workflow uses are placed.
# - directory where the outputs should be placed.

BASE_DIR = Path(".").resolve()
EXECUTABLES_DIR = Path(BASE_DIR / ".." /  "executables").resolve()
OUTPUT_DIR = Path(BASE_DIR /  "output").resolve() 

# --- Replicas -----------------------------------------------------------------
fin = File("f.in").add_metadata(creator="vahi")
rc = ReplicaCatalog()\
    .add_replica("remote", fin, "http://download.pegasus.isi.edu/tutorial/inputs/f.in")\
    .write() # written to ./replicas.yml 

tc = TransformationCatalog()
        
wf_container = Container("wf_container",
    container_type = Container.SINGULARITY,
    image = "http://download.pegasus.isi.edu/containers/hello-world/hello-world.sif",
    image_site = "web"
)

# For each type of job in the workflow specify a transformation
# When you instantiate a Job() object, you specify a transformation name
# which is a logical identifier for the executable you want to run
# when the job is launched on a remote node. 
#
# In this workflow, we have two transformations "hello" and "world",
# with each mapping to the same executable that is installed in
# the container. is_stageable parameter is set to False to indicate
# the executable is installed in the container.
# Note how cores, memory and disk space are requested via Pegasus profiles.
hello = Transformation("hello", 
                         site="web", 
                         pfn="/opt/pegasus-tutorial/pegasus-keg.py", 
                         is_stageable=False, 
                         container=wf_container)\
          .add_pegasus_profiles(cores=1, memory="1 GB", diskspace="1 GB")

world = Transformation("world", 
                         site="web", 
                         pfn="/opt/pegasus-tutorial/pegasus-keg.py", 
                         is_stageable=False, 
                         container=wf_container)\
          .add_pegasus_profiles(cores=1, memory="1 GB", diskspace="1 GB")

tc.add_containers(wf_container)
tc.add_transformations(hello, world)
tc.write()

### 2.2 Define a site for the Annex

First update the variables `cluster_shared_dir` and `cluster_home_dir` to match your username on Expanse.
If you are running an annex against a resource other than Expanse, you need to update the full paths accordingly.

In [2]:
# note: the variables below should be updated to refer to your
# username on Expanse. Usernames are of the form uxXXXXX
cluster_shared_dir = "/expanse/lustre/scratch/ux454545/temp_project"
cluster_home_dir = "/home/ux454545"
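
If you prefer to set your username in one place, both paths can be derived from it. The sketch below assumes the Expanse directory layout used in the cell above; `expanse_username` and `expanse_paths` are hypothetical names introduced for illustration and are not part of the tutorial code.

```python
import posixpath


def expanse_paths(username):
    """Return (shared scratch dir, home dir) for an Expanse user.

    Assumes the directory layout used in this notebook; other
    resources will need different path prefixes.
    """
    shared_dir = posixpath.join("/expanse/lustre/scratch", username, "temp_project")
    home_dir = posixpath.join("/home", username)
    return shared_dir, home_dir


# hypothetical variable: replace with your own Expanse username (form uxXXXXX)
expanse_username = "ux454545"
cluster_shared_dir, cluster_home_dir = expanse_paths(expanse_username)
print(cluster_shared_dir)
print(cluster_home_dir)
```

The remaining cells can then use `cluster_shared_dir` and `cluster_home_dir` unchanged.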

We now describe the Expanse site in the Site Catalog.

In [3]:
sc = SiteCatalog()   

# local directories on the submit host
local_scratch_dir = os.path.join(BASE_DIR, "scratch")
local_storage_dir = os.path.join(BASE_DIR, "output")

# define the site layout for the expanse resource.
# the second pegasus profile indicates an env script that should be
# sourced before a job is run. In this case, it just loads
# singularitypro before a job is executed on Expanse.
EXEC_SITE = "expanse"
expanse = (Site(EXEC_SITE)
           .add_pegasus_profile(
               style="condor",
               data_configuration="nonsharedfs",
           )
           .add_pegasus_profile(
               pegasus_lite_env_source=str(Path(BASE_DIR / "hpc_env_setup.sh").resolve())
           )
)


exec_site_shared_scratch_dir = os.path.join(cluster_shared_dir, "pegasuswfs/scratch")
exec_site_shared_storage_dir = os.path.join(cluster_home_dir, "pegasuswfs/outputs")

expanse.add_directories(
    Directory(Directory.SHARED_SCRATCH, exec_site_shared_scratch_dir)
    .add_file_servers(FileServer("file://" + exec_site_shared_scratch_dir, Operation.ALL)),
    Directory(Directory.LOCAL_STORAGE, exec_site_shared_storage_dir)
    .add_file_servers(FileServer("file://" + exec_site_shared_storage_dir, Operation.ALL))
)
expanse.add_profiles(Namespace.ENV, LANG='C')
expanse.add_profiles(Namespace.ENV, PYTHONUNBUFFERED='1')

# exclude the ACCESS Pegasus TestPool
# we want it to run on our annex
expanse.add_condor_profile(requirements="TestPool =!= True")

sc.add_sites(expanse)
sc.write()

In [4]:
!cat sites.yml

x-pegasus:
  apiLang: python
  createdBy: vahi
  createdOn: 11-06-25T17:51:08Z
pegasus: 5.0.4
sites:
- name: expanse
  directories:
  - type: sharedScratch
    path: /expanse/lustre/scratch/ux454545/temp_project/pegasuswfs/scratch
    sharedFileSystem: false
    fileServers:
    - url: file:///expanse/lustre/scratch/ux454545/temp_project/pegasuswfs/scratch
      operation: all
  - type: localStorage
    path: /home/ux454545/pegasuswfs/outputs
    sharedFileSystem: false
    fileServers:
    - url: file:///home/ux454545/pegasuswfs/outputs
      operation: all
  profiles:
    condor:
      requirements: TestPool =!= True
    env:
      LANG: C
      PYTHONUNBUFFERED: '1'
    pegasus:
      style: condor
      data.configuration: nonsharedfs
      pegasus_lite_env_source: /home/vahi/ACCESS-Pegasus-Examples/08-Tutorial-SharedFS/hpc_env_setup.sh
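
The `pegasus_lite_env_source` profile above expects a file named `hpc_env_setup.sh` next to the notebook. Its contents are not shown in this notebook; the sketch below writes a minimal version, assuming that loading the `singularitypro` module is all Expanse needs (as the comment in the site definition suggests). `write_env_script` is a hypothetical helper, and the module name is likely to differ on other resources.

```python
import tempfile
from pathlib import Path

# Assumed script contents: load the Singularity module before each job runs.
ENV_SCRIPT_LINES = [
    "#!/bin/bash",
    "# sourced by PegasusLite before each job on the annex",
    "module load singularitypro",
]


def write_env_script(path):
    """Write the environment setup script to `path` and return the path."""
    path = Path(path)
    path.write_text("\n".join(ENV_SCRIPT_LINES) + "\n")
    return path


# Demonstration in a temporary directory; in the notebook you would
# pass Path(".") / "hpc_env_setup.sh" instead.
with tempfile.TemporaryDirectory() as tmp:
    script = write_env_script(Path(tmp) / "hpc_env_setup.sh")
    demo_text = script.read_text()
    print(demo_text)
```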


### 2.3 Define and Execute the Workflow

In [5]:
# --- Workflow -----------------------------------------------------------------
wf = Workflow("hello-world")


finter = File("f.inter")
fout = File("f.out")

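# "hello" reads f.in and writes the intermediate file f.inter,
# which is not staged out to the output site.
job_hello = Job("hello")\
                    .add_args("-T", "3", "-i", fin, "-o", finter)\
                    .add_inputs(fin)\
                    .add_outputs(finter, stage_out=False)

# "world" reads f.inter and writes the final output f.out.
job_world = Job("world")\
                    .add_args("-T", "3", "-i", finter, "-o", fout)\
                    .add_inputs(finter)\
                    .add_outputs(fout)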

wf.add_jobs(job_hello, job_world)    

# --- Run the Workflow ---------------------------------------------------
# we plan the workflow to run on site expanse, and have the outputs placed
# on the expanse site.
try:
    wf.write()
    wf.plan(sites=[EXEC_SITE], output_sites=[EXEC_SITE], submit=True)\
      .status()      
except PegasusClientError as e:
    print(e)

INFO:Pegasus.api.workflow:hello-world added Job(_id=ID0000001, transformation=hello)
INFO:Pegasus.api.workflow:hello-world added Job(_id=ID0000002, transformation=world)
INFO:Pegasus.api.workflow:inferring hello-world dependencies
INFO:Pegasus.api.workflow:workflow hello-world with 2 jobs generated and written to workflow.yml

################
# pegasus-plan #
################
2025.11.06 17:51:16.618 UTC:
2025.11.06 17:51:16.623 UTC:   -----------------------------------------------------------------------
2025.11.06 17:51:16.628 UTC:   File for submitting this DAG to HTCondor           : hello-world-0.dag.condor.sub
2025.11.06 17:51:16.634 UTC:   Log of DAGMan debugging messages                   : hello-world-0.dag.dagman.out
2025.11.06 17:51:16.639 UTC:   Log of HTCondor library output                     : hello-world-0.dag.lib.out
2025.11.06 17:51:16.644 UTC:   Log of HTCondor library error messages             : hello-world-0.dag.lib.err
2025.11.06 17:51:16.649 UTC:   Log of the 


STAT  IN_STATE  JOB                      
 Run    00:01   hello-world-0 (/home/vahi/ACCESS-Pegasus-Examples/08-Tutorial-SharedFS/vahi/pegasus/hello-world/run0014)
Summary: 1 Condor job total (R:1)

UNREADY  READY   PRE   QUEUED   POST   SUCCESS  FAILURE  %DONE  
   10      1      0      0       0        0        0      0.0   
Summary: 1 DAG total (Running:1)


## 3. Setting up an Annex against an HPC ACCESS Resource

To launch the pilot jobs from the ACCESS Pegasus submit host, use the `htcondor annex create` command.

A sample invocation against SDSC Expanse is listed below.

<div class="alert alert-block alert-info">
<b>Note:</b>
You need to run this command on the command line in a terminal.
</div>

<br>

```
htcondor annex create --project <project-id> --lifetime 3600   --nodes 1  $USER QUEUE@RESOURCE
```


Please note that the annex should be named `$USER`, because the ACCESS Pegasus HTCondor configuration automatically adds the annex name (the same as your ACCESS Pegasus username) to the jobs as a job transform. Specify your own project id in place of `<project-id>`, and update the `QUEUE` and `RESOURCE` keywords to reflect the ACCESS resource against which you are running the annex.

Below is an invocation for an annex against the queue named `compute` on the SDSC resource `expanse`, which requests 1 node (128 cores) for 60 minutes.

```
htcondor annex create --project <project-id> --lifetime 3600   --nodes 1  $USER compute@expanse
```
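
The `--lifetime` value is given in seconds, so 60 minutes becomes 3600. As a small sketch of how the pieces of the command fit together, the hypothetical helper `build_annex_command` below assembles the same invocation from its parts, using only the flags shown above. It only prints the command; you still need to run it yourself in a terminal, since the command prompts interactively.

```python
def build_annex_command(project, minutes, nodes, annex_name, queue, resource):
    """Assemble an `htcondor annex create` command as a list of arguments."""
    lifetime_seconds = minutes * 60  # --lifetime is expressed in seconds
    return [
        "htcondor", "annex", "create",
        "--project", str(project),
        "--lifetime", str(lifetime_seconds),
        "--nodes", str(nodes),
        annex_name,               # should be $USER on ACCESS Pegasus
        f"{queue}@{resource}",    # e.g. compute@expanse
    ]


cmd = build_annex_command("<project-id>", 60, 1, "$USER", "compute", "expanse")
command_line = " ".join(cmd)
print(command_line)
```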

Please open a terminal and type the above command. Remember to replace `<project-id>` with your own project id.

A sample session against Expanse is shown below.

```
htcondor annex create --project XXXX --lifetime 3600   --nodes 1  $USER compute@expanse
This will (as the project 'XXX') request 1 nodes for 1.00 hours for an annex named 'vahi' from the queue named 'compute' on the system named 
'Expanse'.  To change the project, use --project.  To change the resources requested, use either --nodes or one or more of --cpus and --mem_mb. 
 To change how long the resources are reqested for, use --lifetime (in seconds).
This command will access the system named 'Expanse' via SSH.  To proceed, follow the prompts from that system below; to cancel, hit CTRL-C.
Enter passphrase for key '/home/vahi/.ssh/annex': 
TOTP code for ux454545: 453580
Thank you.
Populating annex temporary directory... done.
Requesting annex named 'vahi' from queue 'compute' on the system named 'Expanse'...
    Step 8 of 8: Submitting SLURM job............    
... requested.
It may take some time for the system named 'Expanse' to establish the requested annex.
To check on the status of the annex, run 'htcondor annex status vahi'.

```
    
    
## 4. Wait for the workflow to finish
    
We will now wait for the workflow to finish by calling the `wait()` method on the workflow.

In [None]:
# --- Wait for the workflow to finish ---------------------------------------------------
# wait() blocks until the workflow finishes and displays
# a progress bar while it runs.
try:
    wf.wait()     
except PegasusClientError as e:
    print(e)

[##################-------] 72.73% ..Running (Unready: 1, Completed: 8, Queued: 0, Running: 1, Failed: 0))

## 5. Inspecting the generated output of the workflow

In this case, the workflow places the outputs in the `~/pegasuswfs/outputs` directory on Expanse, which is the storage directory defined for the site in section 2.2.

```
[expanse ~]$ cat ~/pegasuswfs/outputs/f.out 
===================== contents start f.out =====================
Hostname: exp-5-56 IP Addr: 198.202.102.215
        --- start f.inter ----
        ===================== contents start f.inter =====================
        Hostname: exp-5-56 IP Addr: 198.202.102.215
                --- start f.in ----
                This is the contents of the input file hosted remotely for the hello world workflow!
                --- end f.in ----
        ===================== contents end   f.inter =====================
        --- end f.inter ----
===================== contents end   f.out =====================
```
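
The nesting in `f.out` reflects the workflow's data flow: `f.out` embeds `f.inter`, which in turn embeds `f.in`. As a rough sketch of how you might check this programmatically, the hypothetical helper `nesting_chain` below extracts the file names in the order they are embedded. It assumes the marker format shown in the output above and works on the text of the file, however you choose to fetch it from Expanse.

```python
import re


def nesting_chain(text):
    """Return file names in the order their contents are embedded."""
    # The outermost file is announced by the first "contents start" banner.
    first = re.search(r"contents start (\S+)", text)
    chain = [first.group(1)] if first else []
    # Each embedded input is introduced by a "--- start <name> ----" line.
    chain += re.findall(r"--- start (\S+) ----", text)
    return chain


# Sample text copied from the f.out listing above.
sample = """\
===================== contents start f.out =====================
Hostname: exp-5-56 IP Addr: 198.202.102.215
        --- start f.inter ----
        ===================== contents start f.inter =====================
        Hostname: exp-5-56 IP Addr: 198.202.102.215
                --- start f.in ----
                This is the contents of the input file hosted remotely for the hello world workflow!
                --- end f.in ----
        ===================== contents end   f.inter =====================
        --- end f.inter ----
===================== contents end   f.out =====================
"""

chain = nesting_chain(sample)
print(chain)
```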