# Utilizing advanced features
In this tutorial we will explore how to use the advanced features of the framework. We build upon the material covered in the [Basic Test Config](./BasicTestConfig.ipynb) tutorial; please review it first if you haven't already. Anything in `"code-quoted"` format refers specifically to the test config, and anything in <ins>underlined</ins> format refers to terminology defined in the repo's [README.md](../README.md); refer to it for any underlined terms that need clarification.


In [1]:
# Get notebook location
shellReturn = !pwd
notebookDirectory = shellReturn[0]
print( "Working from " + notebookDirectory )

Working from /home/aislas/hpc-workflows/tutorials


## Test Config Advanced Usage
Advanced usage of the JSON config options focuses on the `"<regex>::<argpack>"` option under `"arguments"` and on the host-specific `"<match-fqdn>"` subsets.


In [2]:
# Output template file documenting options
from IPython.display import Markdown as md
md( "```jsonc\n" + open( notebookDirectory + "/../.ci/template.json", "r" )
                        .read()
                        .split( "// dict of argpacks" )[1]
                        .split( "// No limit on number of host-specific" )[0] + 
   "\n```" )

```jsonc

    "arguments"         :
    {
      // argpacks can be any valid string but all must be unique
      // Recommended to not contain spaces or periods, character pattern
      // '::' is reserved for regex-based argpacks
     
      // list of arguments to this specific <argpack>
      // They DO NOT undergo shell-expansion, so $ENV_VAR will be verbatim passed in
      "<argpack>"          : [ "list", "of", "arguments" ],
     
      // <regex> should be a valid regex usable by python's re module
      // <argpack> can match the above <argpack> string since the full
      // strings are unique, but they will be considered separate and
      // future definitions will only override the specific unique match
      // Single arguments with spaces should be entered as one string
      "<regex>::<argpack>" : [ "-f", "one whole arg", "-x" ]
    },
    // NO MORE KEYWORDS AT THIS LEVEL, ANYTHING ELSE IS CONSIDERED A HOST-SPECIFIC SUBSET
    // subset will be applied if key is found inside host FQDN
    // NOT REGEX - this is just python `if key in fqdn`
    "<match-fqdn>" :      
    {
      // Any of the "submit_options" KEYWORDS - host-specific subsets cannot be nested
      // When steps are resolved if a host-specific subset matches it will be applied
      // AFTER all generic submit_options have been applied
    },
    
```

### Regex-based <ins>argpacks</ins>

The following section assumes you have a general understanding of regular expressions. For further reading please refer to :
* [the wikipedia article](https://en.wikipedia.org/wiki/Regular_expression#Basic_concepts)
* an excellent [open-source guide available in many languages](https://github.com/ziishaned/learn-regex)
* a python-specific [`re` introduction](https://docs.python.org/3/howto/regex.html#regex-howto)
* (something I personally find incredibly useful) an [online regex tester with in-depth explanations](https://regex101.com/)

When using the `"<regex>::"`-style <ins>argpack</ins>, it is important to note a few things : 
1. The regex flavor is that of Python's `re` module
2. The check uses `re.match()` with no flags ([`re` flags](https://docs.python.org/3/library/re.html#flags))
3. __ALL__ inherited arguments are evaluated for application only when a step begins execution
4. The regex is applied to the full <ins>ancestry</ins>
5. To override a regex-<ins>argpack</ins>, the new key must match the full `"<regex>::<argpack>"` string exactly
6. The `"<argpack>"` portion of `"<regex>::<argpack>"` is what determines sort order


For (1) and (2), please refer to the `re` reference links. Simply put, (1) gives wide flexibility in the regex constructs that may be used, and (2) means the regex is interpreted as literally as possible: it is case-sensitive, `^` and `$` only apply to the beginning and end of the string, and `.` does not match a newline. Note also that `re.match()` anchors only at the start of the string, which is why patterns meant to match anywhere begin with `.*`.
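To make these defaults concrete, here is a small sketch using plain Python. The helper `applies()` and the sample string are hypothetical and only mimic a flag-less `re.match()` check; they are not taken from the framework's source, and the string is not a real <ins>ancestry</ins>.

```python
import re


def applies(regex: str, text: str) -> bool:
    """Mimic a flag-less re.match() check: anchored at the start only."""
    return re.match(regex, text) is not None


name = "recv-send-step"  # hypothetical string, not a real ancestry

print(applies("recv", name))               # True: matches at the start, end is not anchored
print(applies("send", name))               # False: re.match() does not scan ahead
print(applies(".*send.*", name))           # True: leading ".*" allows a later match
print(applies(".*SEND.*", name))           # False: case-sensitive without re.IGNORECASE
print(applies("recv.send", "recv\nsend"))  # False: "." does not match a newline
```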


The importance of (3) is that the cumulative set of arguments inherited through a <ins>step</ins>'s <ins>ancestry</ins> is always evaluated. Without regex-<ins>argpacks</ins>, one has to be careful to pass arguments down only to the appropriate steps. With regex-based conditional application of <ins>argpacks</ins>, it becomes easy to grow lazy or sloppy and rely on this feature too heavily. Think of it as akin to global variables: solely relying on regexes to route arguments from the top-level `"arguments"` option pollutes the <ins>argpack</ins> namespace unnecessarily. Writing regexes is already complex enough; having _all_ the regexes applied to _all_ <ins>steps</ins> when they only need to apply to a few under a single specific <ins>test</ins> is a disaster waiting to happen, and is more easily solved by scoping the regex to that <ins>test</ins>'s `"arguments"` option.

Point (4) is a corollary to (3): because the regex is applied to the <ins>ancestry</ins>, making maximal use of this flexibility requires being mindful of test and step naming in conjunction with writing well-defined regexes. With this in mind, one could devise naming conventions for a <ins>test config</ins> that allow filtering specific tests or steps.

For instance, if I have a series of steps across multiple tests that all require a specific set of arguments when executing the "build" step of each test, I can extract the common build arguments and place them under a `".*build.*::build_args"` <ins>argpack</ins>, provided my "build" steps are prefixed with `"build"`. Furthermore, if my build steps differ slightly across tests based on compiler, I can name my steps `"build-gcc"`, `"build-icc"`, `"build-clang"`, and so on, and write other <ins>argpacks</ins> such as `".*-gcc.*::gcc_env"` to load a specific environment.
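The routing in this naming example can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it matches step names only (the framework matches the full <ins>ancestry</ins>), and `matching_argpacks()` plus the extra names `run-gcc` and `rebuild-docs` are hypothetical, added to show how a broad regex also catches names you may not have intended.

```python
import re

# Hypothetical argpack keys following the naming convention described above.
ARGPACK_KEYS = [".*build.*::build_args", ".*-gcc.*::gcc_env"]


def matching_argpacks(name: str) -> list:
    """Return the argpack names whose regex re.match()-es `name`, sorted."""
    hits = []
    for key in ARGPACK_KEYS:
        # "::" is reserved, so split once from the right to isolate the argpack name
        regex, argpack = key.rsplit("::", 1)
        if re.match(regex, name):
            hits.append(argpack)
    return sorted(hits)


for step in ["build-gcc", "build-icc", "build-clang", "run-gcc", "rebuild-docs"]:
    print(step, matching_argpacks(step))
```

Note that `rebuild-docs` also picks up `build_args`, which is exactly the kind of accidental match that scoping the regex to a single <ins>test</ins> avoids.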


Finally, (5) and (6) are restatements from the [README.md](../README.md) "How it works"-><ins>Submit Options</ins> subsection. These matter when argument order is important or when overriding an <ins>argpack</ins> is necessary. Generally speaking, if you find you often need to override an <ins>argpack</ins>, whether regex or not (but especially if regex), you are most likely overcomplicating things. Consider rescoping the arguments or moving them to the specific `"arguments"` option inside a <ins>step</ins> (not the `"submit_options"`).
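Point (5) can be pictured with ordinary dictionary merging. The sketch below assumes, as a simplification, that `"arguments"` dictionaries from each level of an <ins>ancestry</ins> are merged parent-first so deeper definitions win; the level names and values are hypothetical. Only an identical full key replaces an earlier entry, while a key sharing just the `<argpack>` name survives as a separate entry. How the framework orders two entries with the same `<argpack>` name is not modeled here.

```python
# Hypothetical "arguments" dictionaries from two levels of one ancestry,
# merged parent-first so later (deeper) definitions win.
file_level = {
    ".*send.*::prefix": ["[send] "],
    "verbose": ["-v"],
}
test_level = {
    ".*send.*::prefix": ["[tx] "],  # identical full key: overrides the file-level entry
    "send.*::prefix": ["[SEND] "],  # same <argpack>, different <regex>: a separate entry
}

merged = {**file_level, **test_level}
for key, args in merged.items():
    print(key, args)
```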

#### Simple regex-<ins>argpack</ins> config

Let's start with something small : a test with two types of steps - a sender and receiver - with prefixes `"send"` and `"recv"`. We'd like for each to identify themselves with a defined string before output.


In [3]:
%%bash -s "$notebookDirectory"
cat << EOF > $1/../our-config.json
{
  "regex-test" : 
  {
    "submit_options" :
    {
      "arguments" :
      {
        ".*send.*::send_prefix" : [ "[send] " ],
        ".*recv.*::recv_prefix" : [ "[recv] " ]
      }
    },
    "steps" :
    {
      "send-step0" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Hello!" ] },
      "recv-step1" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Hello back!" ] },
      "send-step2" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Ping 1" ] },
      "send-step3" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Ping 2" ] },
      "recv-step4" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Pings received" ] }
    }
  }
}
EOF

echo "$( realpath $1/../our-config.json ) :"
cat $1/../our-config.json

/home/aislas/hpc-workflows/our-config.json :
{
  "regex-test" : 
  {
    "submit_options" :
    {
      "arguments" :
      {
        ".*send.*::send_prefix" : [ "[send] " ],
        ".*recv.*::recv_prefix" : [ "[recv] " ]
      }
    },
    "steps" :
    {
      "send-step0" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Hello!" ] },
      "recv-step1" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Hello back!" ] },
      "send-step2" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Ping 1" ] },
      "send-step3" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Ping 2" ] },
      "recv-step4" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "Pings received" ] }
    }
  }
}


Let's run this config, passing `--forceSingle` to our <ins>run script</ins> so the output is easier to follow. Likewise, we use `--inlineLocal` to avoid the need to peek at log files.

In [4]:
%%bash -s "$notebookDirectory"
$1/../.ci/runner.py $1/../our-config.json -t regex-test --forceSingle --inlineLocal

Using Python version : 
3.9.18 (main, Jan  4 2024, 00:00:00) 
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)]
Inline stdout for steps requested, but steps' threadpool is greater than 1 - forcing threadpool to size 1 (serial)
[file::our-config]  Root directory is : /home/aislas/hpc-workflows
[file::our-config]  Preparing working directory
[file::our-config]    Running from root directory /home/aislas/hpc-workflows
[test::regex-test]  Preparing working directory
[test::regex-test]    Running from root directory /home/aislas/hpc-workflows
[test::regex-test]  Checking if results wait is required
[test::regex-test]    No HPC submissions, no results job added
[step::send-step0]  Preparing working directory
[step::send-step0]    Running from root directory /home/aislas/hpc-workflows
[step::send-step0]    Current directory : /home/aislas/hpc-workflows
[step::send-step0]  Submitting step send-step0...
[step::send-step0]    From regex-test adding arguments pack '.*send.*::send_prefix' : ['[send] ']
[st

Inspecting the output, the lines following `Submitting step <stepname>...` show where <ins>argpacks</ins> are applied, and the appropriate prefix is selected for each step. However, our prefix is being applied as a suffix. More explicitly, the line after `"Running command:"` outputs the command and arguments verbatim, showing two argument strings being supplied with our "prefix" last. Not quite what we wanted...

<div class="alert alert-block alert-info">
<b>Recall:</b>
The order of <ins>arguments</ins> applied is step-specific arguments first, then all cumulative <ins>argpacks</ins> from <code>"submit_options"</code> in alphabetical order, with conflicts resolved by order of first appearance.
</div>
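The ordering rule above explains why our "prefix" landed last. Here is a minimal sketch of that rule, assuming the <ins>argpacks</ins> passed in have already been filtered down to the ones that apply to the step; `assemble_command()` and `argpack_name()` are hypothetical helpers, not the framework's API, and conflict resolution is not modeled.

```python
def argpack_name(key: str) -> str:
    """Return the <argpack> part of a key; plain argpacks have no "::"."""
    return key.rsplit("::", 1)[-1]


def assemble_command(command: str, step_args: list, argpacks: dict) -> list:
    """Hypothetical sketch: step arguments first, then argpacks sorted by name.

    `argpacks` is assumed to contain only the packs that apply to the step.
    """
    final = [command, *step_args]
    for key in sorted(argpacks, key=argpack_name):
        final.extend(argpacks[key])
    return final


print(assemble_command(
    "./tests/scripts/echo_normal.sh",
    ["Hello!"],
    {".*send.*::send_prefix": ["[send] "]},
))
# ['./tests/scripts/echo_normal.sh', 'Hello!', '[send] ']
```

Renaming an <ins>argpack</ins> changes its position relative to other argpacks, but under this rule no argpack can ever precede the step's own arguments.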


When applied to a script that expects its arguments in this order, regex-<ins>argpacks</ins> can become a very powerful feature. Let's further explore how one might apply this to a broader scope of procedures.

#### Advanced regex-<ins>argpack</ins> config

To get a feel for how best to use the regex feature of <ins>argpacks</ins>, we will "build" a more complex set of <ins>steps</ins> that would do more than just echo their arguments. The term "build" here should be taken figuratively, as actually developing the outlined logic is beyond the scope of this tutorial. We will still use the `echo_normal.sh` script for ease of use.

First, assume we have a build system (CMake, Make, etc.) that can switch build types based on flags or configuration inputs. Assuming builds can be run in parallel, we _could_ run each build as a separate <ins>test</ins> unto itself. However, these could also be grouped categorically as one <ins>test</ins>: a compilation test. Let's do just that with the following assumptions:
* `build.sh` translates our flags into the corresponding build system options
* `-o` determines the build output location
* `-d` sets debug mode
* `--mpi` sets build with MPI
* `--omp` sets build with OpenMP
* `--double` sets build with double precision
* `-a` enables feature A
* `-b` enables feature B
* `-c` enables feature C which is mutually exclusive with A

We don't want a combination of every option, just a select few for our most critical tests. 


In [5]:
%%bash -s "$notebookDirectory"
cat << EOF > $1/../our-config.json
{
  "regex-test" : 
  {
    "submit_options" :
    {
      "arguments" :
      {
        ".*dbg.*::dbg_flag" : [ "-d" ],
        ".*mpi.*::mpi_flag" : [ "--mpi" ],
        ".*omp.*::omp_flag" : [ "--omp" ],
        ".*fp64.*::fp64_flag" : [ "--double" ],
        ".*ft[^A]*A.*::feature_a" : [ "-a" ],
        ".*ft[^B]*B.*::feature_b" : [ "-b" ],
        ".*ft[^C]*C.*::feature_c" : [ "-c" ]
      }
    },
    "steps" :
    {
      "build-omp-fp32-dbg"  : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp32-dbg" ] },
      "build-omp-fp32-ftA"  : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp32-ftA" ] },
      "build-omp-fp32-ftAB" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp32-ftAB" ] },
      "build-omp-fp32-ftBC" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp32-ftBC" ] },
      "build-omp-fp64-ftB"  : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp64-ftB" ] },
      "build-mpi-fp32-dbg"  : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-mpi-fp32-dbg" ] },
      "build-mpi-fp32"      : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-mpi-fp32" ] },
      "build-mpi-fp64"      : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-mpi-fp64" ] }
    }
  }
}
EOF

echo "$( realpath $1/../our-config.json ) :"
cat $1/../our-config.json

/home/aislas/hpc-workflows/our-config.json :
{
  "regex-test" : 
  {
    "submit_options" :
    {
      "arguments" :
      {
        ".*dbg.*::dbg_flag" : [ "-d" ],
        ".*mpi.*::mpi_flag" : [ "--mpi" ],
        ".*omp.*::omp_flag" : [ "--omp" ],
        ".*fp64.*::fp64_flag" : [ "--double" ],
        ".*ft[^A]*A.*::feature_a" : [ "-a" ],
        ".*ft[^B]*B.*::feature_b" : [ "-b" ],
        ".*ft[^C]*C.*::feature_c" : [ "-c" ]
      }
    },
    "steps" :
    {
      "build-omp-fp32-dbg"  : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp32-dbg" ] },
      "build-omp-fp32-ftA"  : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp32-ftA" ] },
      "build-omp-fp32-ftAB" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp32-ftAB" ] },
      "build-omp-fp32-ftBC" : { "command" : "./tests/scripts/echo_normal.sh", "arguments" : [ "-o", "build-omp-fp32-ftBC" ] },
      "build-omp-fp64-

Let's run this config and see how flags are routed to the appropriate steps.

In [6]:
%%bash -s "$notebookDirectory"
$1/../.ci/runner.py $1/../our-config.json -t regex-test --forceSingle --inlineLocal

Using Python version : 
3.9.18 (main, Jan  4 2024, 00:00:00) 
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)]
Inline stdout for steps requested, but steps' threadpool is greater than 1 - forcing threadpool to size 1 (serial)
[file::our-config]  Root directory is : /home/aislas/hpc-workflows
[file::our-config]  Preparing working directory
[file::our-config]    Running from root directory /home/aislas/hpc-workflows
[test::regex-test]  Preparing working directory
[test::regex-test]    Running from root directory /home/aislas/hpc-workflows
[test::regex-test]  Checking if results wait is required
[test::regex-test]    No HPC submissions, no results job added
[step::build-omp-fp32-dbg] Preparing working directory
[step::build-omp-fp32-dbg]   Running from root directory /home/aislas/hpc-workflows
[step::build-omp-fp32-dbg]   Current directory : /home/aislas/hpc-workflows
[step::build-omp-fp32-dbg] Submitting step build-omp-fp32-dbg...
[step::build-omp-fp32-dbg]   From regex-test adding arguments pac

Solely through apt naming of our steps, we get the correct flags applied to each respective step. This, of course, is a heavily simplified example.
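Since the run output above is cut short, the routing can be reproduced offline with a sketch. It mirrors the `regex-test` `"arguments"` dictionary, assuming the third feature pack is named `feature_c`, and it matches step names only, whereas the framework matches the full <ins>ancestry</ins>. The ordering (step arguments first, then <ins>argpacks</ins> sorted by name) follows the rule recalled earlier; `final_arguments()` is a hypothetical helper.

```python
import re

# Mirror of the regex-test "arguments" dictionary (third pack assumed to be feature_c).
ARGPACKS = {
    ".*dbg.*::dbg_flag": ["-d"],
    ".*mpi.*::mpi_flag": ["--mpi"],
    ".*omp.*::omp_flag": ["--omp"],
    ".*fp64.*::fp64_flag": ["--double"],
    ".*ft[^A]*A.*::feature_a": ["-a"],
    ".*ft[^B]*B.*::feature_b": ["-b"],
    ".*ft[^C]*C.*::feature_c": ["-c"],
}

STEPS = [
    "build-omp-fp32-dbg", "build-omp-fp32-ftA", "build-omp-fp32-ftAB",
    "build-omp-fp32-ftBC", "build-omp-fp64-ftB", "build-mpi-fp32-dbg",
    "build-mpi-fp32", "build-mpi-fp64",
]


def final_arguments(step: str) -> list:
    """Step-specific arguments first, then matching argpacks sorted by name."""
    matched = []
    for key, args in ARGPACKS.items():
        regex, argpack = key.rsplit("::", 1)
        if re.match(regex, step):
            matched.append((argpack, args))
    matched.sort(key=lambda pair: pair[0])
    return ["-o", step] + [arg for _, args in matched for arg in args]


for step in STEPS:
    print(f"{step:20} {final_arguments(step)}")
```

For example, `build-omp-fp64-ftB` resolves to `-o build-omp-fp64-ftB -b --double --omp` in this model.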


### Joining Tests for Single HPC Submission

Tests can specify general resource usage, and steps can further refine those requirements. This works well for submitting each step independently to an HPC grid. However, for relatively small steps, or if the smallest resource allocations are comparatively large (e.g. whole-node allocations for high core-count CPUs), one may want to aggregate the tests into larger workloads to use the resources more effectively. 

This can also be an appealing option if queue times on the grids are long, since individually submitted steps must each wait in the queue. For steps with dependencies, this amounts to re-entering the queue multiple times.

To avoid over-specializing the test suite to one particular machine (remember, we want it to remain generic and flexible), the framework natively supports joining tests and steps into single job submissions. Tests and steps can thus remain small, logically separate components, steering away from large, fragile, machine-specific designs that pack multiple tests into one script.

#### Accumulate Resources
Recall that each step ultimately resolves its own individual set of `"arguments"` from its cumulative ancestry. The `"resources"` string field is inherited the same way as all keys under `"submit_options"`. The difference is that `"resources"` is a single string, as opposed to a dictionary where you can update part of it.

As HPC systems can differ greatly in scheduler arguments, format, and the resources available to request, the simplest approach is to carry only the single string. Internally, when steps and tests are joined based on scheduler type, the framework makes a best guess as to how resources should be combined and aggregated.
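The string-versus-dictionary distinction can be sketched as follows, assuming levels of `"submit_options"` are walked from outermost to innermost. The `resolve()` helper and the PBS-style resource strings are hypothetical placeholders, and the join-time aggregation described above is not modeled.

```python
def resolve(levels: list) -> tuple:
    """Hypothetical sketch: walk "submit_options" from outermost to innermost."""
    resources = None
    arguments = {}
    for options in levels:
        if "resources" in options:
            resources = options["resources"]           # the whole string is replaced
        arguments.update(options.get("arguments", {}))  # merged per argpack key
    return resources, arguments


levels = [
    {"resources": "select=1:ncpus=4", "arguments": {"common": ["-v"]}},          # outer level
    {"resources": "select=1:ncpus=8:mem=16gb", "arguments": {"extra": ["-x"]}},  # inner level
]
print(resolve(levels))
# ('select=1:ncpus=8:mem=16gb', {'common': ['-v'], 'extra': ['-x']})
```

The inner string replaces the outer one outright; nothing from `select=1:ncpus=4` survives, whereas both <ins>argpacks</ins> do.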

The join option allows overriding specific 


### Host-Specific `"submit_options"`

One can conditionally control whether `"submit_options"` are applied based on the FQDN ([fully qualified domain name](https://en.wikipedia.org/wiki/Fully_qualified_domain_name)) of the host running the step. The `"<match-fqdn>"` key is matched as a plain substring of the FQDN, not as a regex.

<div class="alert alert-block alert-info">
<b>HPC Users:</b>
The last part is worth reiterating: when using the cumulative join features for HPC runs, the host that runs the final individual step is the HPC compute node, not the login node from which the command was launched. Joining tests under one HPC job bundles all the specified tests and their respective steps into one HPC launch step that then runs the tests normally on the node. Depending on how your computing system is configured, this can lead to different FQDNs between where the job is initially launched and where the step finally runs.
<br><br>
    
Final `"arguments"` to the step depend only on the host where the step script is started (i.e. where "Running command" is shown for that step). This will be affected by using the joining capabilities.

The `"resources"` required for the steps, whether aggregated via joining or submitted normally, only ever depend on the host that starts the test suite (the HPC login node).
</div>
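The host-specific behavior can be sketched with the rule stated in the template: a non-keyword key is applied if `key in fqdn`, after all generic options. The keyword set here is a hypothetical, partial stand-in (the real list lives in `.ci/template.json`), `apply_host_subsets()` is not the framework's API, and merging host-specific `"arguments"` per <ins>argpack</ins> is an assumption.

```python
import socket
from typing import Optional

# Hypothetical, partial set of "submit_options" keywords; any other key is
# treated as a host-specific subset.
KEYWORDS = {"arguments", "resources"}


def apply_host_subsets(submit_options: dict, fqdn: Optional[str] = None) -> dict:
    """Apply generic options first, then any subset whose key is in the FQDN."""
    fqdn = socket.getfqdn() if fqdn is None else fqdn
    resolved = {k: v for k, v in submit_options.items() if k in KEYWORDS}
    resolved["arguments"] = dict(resolved.get("arguments", {}))
    for key, subset in submit_options.items():
        # Plain substring test (`key in fqdn`), not a regex match
        if key in KEYWORDS or key not in fqdn:
            continue
        for option, value in subset.items():
            if option == "arguments":
                resolved["arguments"].update(value)
            else:
                resolved[option] = value
    return resolved


options = {
    "resources": "generic",
    "arguments": {"common": ["-v"]},
    "hpc.example": {"resources": "hpc-only", "arguments": {"site": ["--site"]}},
}
print(apply_host_subsets(options, "login1.hpc.example.org"))
# {'resources': 'hpc-only', 'arguments': {'common': ['-v'], 'site': ['--site']}}
print(apply_host_subsets(options, "login1.hpcXexample.org"))  # "." is literal here
# {'resources': 'generic', 'arguments': {'common': ['-v']}}
```

Passing the FQDN explicitly makes it easy to check what a compute node (rather than the login node) would resolve to.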

In [7]:
%%bash -s "$notebookDirectory"
$1/../.ci/runner.py $1/../our-config.json -t our-test -fc -i # using shorthand options

usage: runner.py [-h] [-t TESTS [TESTS ...]] [-s {PBS,SLURM,LOCAL}]
                 [-a ACCOUNT] [-d DIROFFSET] [-j [JOINHPC]]
                 [-alt [ALTDIRS ...]] [-l LABELLENGTH] [-g GLOBALPREFIX]
                 [-dry] [-nf] [-nw] [-np] [-k KEY] [-p POOL] [-tp THREADPOOL]
                 [-r REDIRECT] [-i] [-fs]
                 testsConfig
runner.py: error: unrecognized arguments: -fc


Using Python version : 
3.9.18 (main, Jan  4 2024, 00:00:00) 
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)]


CalledProcessError: Command 'b'$1/../.ci/runner.py $1/../our-config.json -t our-test -fc -i # using shorthand options\n'' returned non-zero exit status 2.

## Adding step arguments
Okay, now that we have that printing neatly we can see that our example script doesn't do a whole lot aside from echoing our success <ins>keyphrase</ins> `TEST echo_normal.sh PASS`. Not much of a test?

Let's add some <ins>arguments</ins> and observe how they get routed to the step.

In [None]:
%%bash -s "$notebookDirectory"
cat << EOF > $1/../our-config.json
{
  "our-test" : 
  { 
    "steps" : 
    { 
      "our-step0" : 
      { 
        "command" : "./tests/scripts/echo_normal.sh",
        "arguments" : [ "foobar" ]
      }
    }
  }
}
EOF

echo "$( realpath $1/../our-config.json ) :"
cat $1/../our-config.json

Now we run again, but this time note the changes in both the step command listed after the line starting with `[step::our-step0]...Running command` and the actual step output.

In [None]:
%%bash -s "$notebookDirectory"
$1/../.ci/runner.py $1/../our-config.json -t our-test -fs -i # using shorthand options

## Step dependencies
Let's go ahead and add another step, this one having a <ins>dependency</ins> on the first so that it runs only after the first has completed.

In [None]:
%%bash -s "$notebookDirectory"
cat << EOF > $1/../our-config.json
{
  "our-test" : 
  { 
    "steps" : 
    { 
      "our-step0" : 
      { 
        "command" : "./tests/scripts/echo_normal.sh",
        "arguments" : [ "foobar" ]
      },
      "our-step1" : 
      { 
        "command" : "./tests/scripts/echo_normal.sh",
        "arguments" : [ "why", "not more", "args?" ],
        "dependencies" : { "our-step0" : "afterany" }
      }
    }
  }
}
EOF

echo "$( realpath $1/../our-config.json ) :"
cat $1/../our-config.json

In [None]:
%%bash -s "$notebookDirectory"
$1/../.ci/runner.py $1/../our-config.json -t our-test -fs -i # using shorthand options

Most of the output should look very similar, but notice that after running `our-step0` there is an additional line now stating `Notifying children...` just before `our-step1` begins to run. This tells us that we have properly tied a dependency between `our-step0` as a parent step and `our-step1` as a dependent child step.

Going a little further, if we look at `our-step1`'s `Running command` line, we see that `"not more"` is passed in as one whole argument, exactly as it was listed in the step's `"arguments"`.
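The template notes that arguments do not undergo shell expansion. A sketch of what that means for the receiving process, assuming arguments are handed over as a list without a shell (this mirrors the stated behavior; it is not taken from `runner.py`): each list element arrives as exactly one argument, spaces and all, and `$HOME` is passed verbatim.

```python
import subprocess
import sys

# Print the argv a child process receives when arguments are passed as a list
# (no shell). "$HOME" is an illustrative stand-in for an environment variable.
arguments = ["why", "not more", "args?", "$HOME"]
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", *arguments],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # ['why', 'not more', 'args?', '$HOME']
```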

## Adding argpacks
Imagine we now want to add some additional generalized arguments to both our steps. We can add these higher-defined arguments as <ins>argpacks</ins> from any level of `"submit_options"` that appears in a step's <ins>ancestry</ins>. For the sake of demonstration, we will not add an <ins>argpack</ins> at the highest level, and will instead show how one can be inherited from the <ins>test</ins> level.

In [None]:
%%bash -s "$notebookDirectory"
cat << EOF > $1/../our-config.json
{
  "our-test" : 
  { 
    "submit_options" :
    {
      "arguments" :
      {
        "our-default-argpack" : [ "foobar" ]
      }
    },
    "steps" : 
    { 
      "our-step0" : 
      { 
        "command" : "./tests/scripts/echo_normal.sh",
        "arguments" : [ "foobar" ]
      },
      "our-step1" : 
      { 
        "command" : "./tests/scripts/echo_normal.sh",
        "arguments" : [ "why", "not more", "args?" ],
        "dependencies" : { "our-step0" : "afterany" }
      }
    }
  }
}
EOF

echo "$( realpath $1/../our-config.json ) :"
cat $1/../our-config.json

In [None]:
%%bash -s "$notebookDirectory"
$1/../.ci/runner.py $1/../our-config.json -t our-test -fs -i # using shorthand options

# Clean up all generated logs and files
rm $1/../our-config.json $1/../*.log

Now notice how, in the step preparation phase between `Submitting step ...` and `Running command` for each respective step, we have a new output line `From our-test adding arguments pack 'our-default-argpack'...`. This line tells us both the origin of the <ins>argpack</ins> (which level in our step's <ins>ancestry</ins> provided the definition) and the effective values of the arguments to be added. Any additional lines of the format `From <origin> adding arguments pack '<argpack>'...` would also be listed in the order applied to the step's run command, where the `<argpack>` name determines that order.

This <ins>argpack</ins> is always listed after our step's own <ins>arguments</ins> in the step's final command - this is important to note!

This concludes our simplest example, which provides enough of an overview for users to put together a sufficiently capable test <ins>suite</ins>.