# Utilizing advanced features - Joining <ins>tests</ins> for HPC submissions
In this tutorial we will be exploring how to use advanced features of the framework. We will be using common terminology found in the repo's README.md - refer to that for any underlined terms that need clarification. Additionally, we will be building upon the material covered in the [Advanced Test Config - HPC argpacks](./AdvancedTestConfig_hpc_argpacks.ipynb); please review that tutorial if you haven't already. Anything in `"code-quoted"` format refers specifically to the test config, and anything in <ins>underlined</ins> format refers to specific terminology that can be found in the [README.md](../README.md).


In [None]:
# Get notebook location
shellReturn = !pwd
notebookDirectory = shellReturn[0]
print( "Working from " + notebookDirectory )

Advanced usage of the <ins>run script</ins> command line option `-j` will be the focus of this tutorial :

In [None]:
%%bash -s "$notebookDirectory"
$1/../.ci/runner.py $1/../our-config.json -h | \
  tr $'\n' '@' | \
  sed -e 's/[ ]\+-h.*\?entire suite from/.../g' | \
  sed -e 's/[ ]\+-alt.*/.../g' | \
  tr '@' $'\n'


## Joining Tests for Single HPC Submission

Tests can specify general resource usage and steps can further refine those requirements. This works well for submitting each step independently to an HPC grid. However, for relatively small steps, or if the smallest resource allocations are comparatively large (e.g. whole node allocations for large core-count CPUs), one may want to aggregate the tests into larger workloads to use the resources more effectively.

This can also be an appealing option if queue times on the grids are long, since individually submitted steps must each wait in the queue. For steps with dependencies, this amounts to re-entering the queue multiple times.

To avoid the need to over-spec the test suite to one particular machine (remember we want this to remain generic and flexible), the framework natively supports joining tests and steps into single job submissions. Tests and steps can thus remain small, logically separate components, steering away from large, fragile, machine-specific "multiple tests in one script" designs.

<div class="alert alert-block alert-info">
<b>Check the FQDN of the node if using host-specific selections!</b>
When using cumulative join features for HPC runs, the host that runs each individual step is the HPC node and not the login node from which the command was launched. Joining tests under one HPC job bundles all the specified tests and their respective steps into one HPC launch step that then runs the tests normally in the node. Depending on how your computing system is configured, this can lead to a different FQDN between where the job is initially launched and where the step is finally run.
<br><br>
    
The final <ins>step argpacks</ins> passed to a step rely only on the location where the step script is started (i.e. when "Running command" for that step is shown). This can be affected by the joining capabilities, as the selected <ins>submit options</ins> may differ between where you launch and where the step runs.

The <ins>hpc argpacks</ins> required for the steps to be submitted to the grid, whether aggregated via joining or submitted normally, rely only on the host that starts the test suite (the HPC login node). They are never used in the node environment for the fully joined submission and thus are not affected by changes in FQDN.

**ALTERNATIVELY** consider using the `--forceFQDN` option to manually specify the selection criteria
</div>

### Where do resource amounts come from?
Recall that each step ultimately specifies its own individual set of `"arguments"` from its cumulative ancestry. The `"hpc_arguments"` work the same way as all keys under `"submit_options"` do. The difference here is that `"hpc_arguments"` is a slightly more complex dictionary as opposed to a single dictionary of lists.

As HPC systems can differ greatly in scheduler args, format, and the resources available to request, one of the simplest approaches is to let the user specify the details as explained in the [Advanced Test Config - HPC argpacks](./AdvancedTestConfig_hpc_argpacks.ipynb) tutorial. This then provides a generally easy means to isolate resource amounts as the <ins>resource argpacks</ins> values.


### Accepted Formats
Internally, when steps and tests are joined for HPC submission, the framework attempts to join the values from <ins>resource argpacks</ins>; if that fails, the first value for that resource key is used. It's not a perfect solution but is easy to understand and works for the most part. Supported values that can be joined are :
* integers
* memory amount strings in the format `[0-9]+(t|g|m|k)?(b|w)` where `(t|g|m|k)` correspond to unit prefix multipliers and `(b|w)` to bytes or words as units, case insensitive - e.g. "4GB"
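
The joining rule above can be sketched in a few lines of Python. This is only an illustration and not the framework's implementation : the helper names are hypothetical, binary (1024-based) multipliers are assumed, and values with differing `b`/`w` units are treated as non-joinable since the word size is not stated here.

```python
import re

# Pattern quoted from the list above; matching is case insensitive.
MEMORY_PATTERN = re.compile(r"([0-9]+)(t|g|m|k)?(b|w)", re.IGNORECASE)

# Assumption: binary multipliers. The framework may use different factors.
MULTIPLIERS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_memory(text):
    """Return (amount in base units, unit) or None if text is not a memory string."""
    match = MEMORY_PATTERN.fullmatch(text)
    if match is None:
        return None
    amount, prefix, unit = match.groups()
    return int(amount) * MULTIPLIERS[(prefix or "").lower()], unit.lower()


def format_memory(total, unit):
    """Write total using the largest prefix that divides it evenly."""
    for prefix in ("t", "g", "m", "k"):
        if total >= MULTIPLIERS[prefix] and total % MULTIPLIERS[prefix] == 0:
            return f"{total // MULTIPLIERS[prefix]}{prefix}{unit}"
    return f"{total}{unit}"


def join_values(first, second):
    """Hypothetical helper: add two values when possible, else keep the first."""
    values = (first, second)
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return first + second
    if isinstance(first, str) and isinstance(second, str):
        lhs, rhs = parse_memory(first), parse_memory(second)
        if lhs and rhs and lhs[1] == rhs[1]:
            return format_memory(lhs[0] + rhs[0], lhs[1])
    return first  # not joinable: fall back to the first value


print(join_values(1, 1))                  # integers are added
print(join_values("32gb", "32gb"))        # memory strings are added
print(join_values("economy", "economy"))  # anything else keeps the first value
```

Keeping the fallback as "first value wins" mirrors the description above and keeps the behavior predictable for strings such as a job priority.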

### Initial Config
To see this in action we will show one resource of each style in PBS-style :
* nodes as integers
* memory per node as memory string
* job priority as non-joinable string 

First, an initial setup :

In [None]:
%%bash -s "$notebookDirectory" 
cat << EOF > $1/../our-config.json
{
  "submit_options" :
  {
    "submission" : "PBS",
    "queue"      : "main",
    "timelimit"  : "00:01:00",
    "hpc_arguments" :
    {
      "select" : 
      { 
        "-l " : 
        {
          "select" : 1,
          "mem"    : "32gb"
        }
      },
      "priority" :
      {
        "-l " :
        {
          "job_priority" : "economy"
        }
      }
    }
  },
  "our-test" :
  {
    "steps" : { "our-step0" : { "command" : "./tests/scripts/echo_normal.sh" } }
  }
}
EOF

echo "$( realpath $1/../our-config.json ) :"
cat $1/../our-config.json

$1/../.ci/runner.py $1/../our-config.json -t our-test -fs -i -dry -a WORKFLOWS

### Accumulating Resources

Now that we have a setup, let's make some additional tests. As the resources are defined for the whole config, all tests will use and request the same amount of resources :

In [None]:
%%bash -s "$notebookDirectory" 
cat << EOF > $1/../our-config.json
{
  "submit_options" :
  {
    "submission" : "PBS",
    "queue"      : "main",
    "timelimit"  : "00:01:00",
    "hpc_arguments" :
    {
      "select" : 
      { 
        "-l " : 
        {
          "select" : 1,
          "mem"    : "32gb"
        }
      },
      "priority" :
      {
        "-l " :
        {
          "job_priority" : "economy"
        }
      }
    }
  },
  "our-test0" :
  {
    "steps" : { "our-step0" : { "command" : "./tests/scripts/echo_normal.sh" } }
  },
  "our-test1" :
  {
    "steps" : { "our-step1" : { "command" : "./tests/scripts/echo_normal.sh" } }
  },
  "our-test2" :
  {
    "steps" : { "our-step2" : { "command" : "./tests/scripts/echo_normal.sh" } }
  }
}
EOF

echo "$( realpath $1/../our-config.json ) :"
cat $1/../our-config.json

$1/../.ci/runner.py $1/../our-config.json -t our-test0 our-test1 our-test2 -fs -i -dry -a WORKFLOWS

Each should have its own set of resources requested via `qsub` in the final output command. If we go ahead and auto-join them for one large job we would instead get (we remove the `-fs` flag as that takes precedence over joining) :

In [None]:
%%bash -s "$notebookDirectory" 

$1/../.ci/runner.py $1/../our-config.json -t our-test0 our-test1 our-test2 -i -dry -a WORKFLOWS -j

We see that the output looks completely different from what we are used to. We start with a flurry of phase preprocessing, resource calculation, threadpool simulation, and so on... confusing stuff. All this happens before we even get to something familiar : `Submitting step submit...`.

What is going on? Before we can even submit an aggregated job we need to know what each test would take as resources. To get that we need to know what each step would take _and_ any explicit order of execution dictated by dependencies _AS WELL AS_ implicit order based on the size of the thread pool.

That is used to tally the total maximum resources a test would use. As a pessimistic approximation, we use that maximum as what a test would use for its entire duration. From there the process pool is used to account for implicit execution order of tests, once again doing the same process of maximum resource usage but now based on the tests' allotments instead of steps. The final aggregation is what will be placed on the command line for the HPC submission command arguments.

Looking at the final maximums reported by `[file::our-config]` at `Maximum calculated resources|timelimit...`, it should be intuitive that we've effectively tripled our resource amounts for anything numeric and supported since with a process pool of 4 all the tests we listed will run concurrently. Likewise, our timelimit did not change for that same reason.

What follows is an in-situ generated test and step combo that submits a job to run this suite again locally in the node(s) requested with the aggregated resource total, with all options (implicit, default, provided and auto-filled) to the <ins>run script</ins> explicitly listed including, especially, the tests we initially wanted to run. In essence, the join feature acts as a wrapper that resubmits a multi-test run to execute locally in a node environment and calculates the resources it would take to accomplish that.


To demonstrate that our command line options are respected, say we want our aggregated tests to run in a process pool of 1 because, as an arbitrary reason, we can only check out one node at a time. Regardless, we want to control the run options of the local run that would occur, and the resource request should reflect that :

In [None]:
%%bash -s "$notebookDirectory" 

$1/../.ci/runner.py $1/../our-config.json -t our-test0 our-test1 our-test2 -i -dry -a WORKFLOWS -j -p 1

The maximum outputs are now :
```
[file::our-config]  Maximum calculated resources for running all tests is '-l select=1:mem=32gb -l job_priority=economy'
[file::our-config]  Maximum calculated timelimit for running all tests is '00:03:00'
```

We are running the joined tests serially in our HPC job submission, so only our runtime increases since they all require the same resources.
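
The two outcomes seen so far (three nodes for one minute with a pool of 4, one node for three minutes with a pool of 1) can be reproduced with a small pool simulation. The sketch below is a simplified stand-in for the pessimistic tally described earlier, not the framework's algorithm : `simulate_pool` is a hypothetical name, each test is reduced to a node count and a duration in seconds, and tests are assumed to start in the order listed as soon as a pool slot frees up.

```python
import heapq


def simulate_pool(jobs, pool_size):
    """Return (peak nodes in use, total seconds) for jobs run through a pool.

    jobs is a list of (nodes, seconds) tuples in start order.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")

    running = []  # min-heap of (finish time, nodes)
    now = 0
    in_use = 0
    peak = 0
    for nodes, seconds in jobs:
        if len(running) >= pool_size:
            now = running[0][0]  # wait for the earliest job to finish
        # Release every job that has finished by the current time.
        while running and running[0][0] <= now:
            _, freed = heapq.heappop(running)
            in_use -= freed
        heapq.heappush(running, (now + seconds, nodes))
        in_use += nodes
        peak = max(peak, in_use)

    total = max((finish for finish, _ in running), default=0)
    return peak, total


# Three tests, each requesting 1 node with a 00:01:00 timelimit.
tests = [(1, 60), (1, 60), (1, 60)]
print(simulate_pool(tests, 4))  # all concurrent: 3 nodes, 60 seconds
print(simulate_pool(tests, 1))  # serial: 1 node, 180 seconds
```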

### Overriding joined resources

In the first joined resource grouping you may have noticed that we selected 3 nodes :

`-l select=3:mem=96gb -l job_priority=economy`

This might be fine, but let's say that each node can request up to 128GB and has more than enough resources to run all the tests at the same time. We _could_ rewrite the tests to all fit together specifically and only request one node in one of the tests so the final output requests just `-l select=1...`, but that is bad design for a few reasons :
1. Now the tests __MUST__ run together, limiting our ability to mix and match other tests
2. The tests' definitions do not stand on their own, limiting our ability to run each test independently
3. Leaving out resources to be filled by another test obfuscates that test's requirements
4. Writing the tests in this manner assumes particular hardware capabilities rather than writing to what a test _needs_

The suggested alternative is to override elements of the joined `"hpc_arguments"` at the command line. Automating this would be difficult, as it would require inquiring about all the potential combinations of resources, what nodes are at your disposal, their respective capabilities, and so on. It _could_ be done, but is beyond the scope of this framework and its philosophy of remaining simple.

To override the final resource aggregation, we would pass in an argument to the `-j` flag in the format of the `"hpc_arguments"` `{}` dictionary that was accumulated. Recall that the joining aggregates your resultant `"hpc_arguments"` for each step (resolving all host-specifics and regex <ins>hpc argpacks</ins>) and then tests into one large set of <ins>hpc argpacks</ins>. Due to the uniqueness requirements of each plus the added output help of `From [<origin>] adding '<hpc argpack>'...From <origin> adding resource '<resource argpack>[ : <value if present> ]`, it should be possible to write an `"hpc_arguments"` `{}` that can index and override any resource value.

<div class="alert alert-block alert-info">
<b>Flags and resource names cannot be changed!</b>

As the flags and resource names are used as dictionary keys, it is impossible to change their value when writing the override dictionary. This may be changed in the future, but is a limitation worth noting for now.
</div>

Let's go ahead and re-run our joined tests submission but set the node count to 1, and for extra credit bump the priority to premium. 

As our node selection is under <ins>hpc argpack</ins> `"select"`, then `"-l "`, and finally under <ins>resource argpack</ins> `"select"` we would use `{"select":{"-l ":{"select":1}}}` to override that (unnecessary spaces are left out for compactness, but can be left in).

Job priority is under `"priority"` -> `"-l "` -> `"job_priority"`. Thus `{"priority":{"-l ":{"job_priority":"premium"}}}` should be what we want.

Since `-j` only accepts one argument, we will need to merge these two dictionaries. Once again, the uniqueness requirements save us by disallowing confusing conflicts. As a final override we should now have :

`{"select":{"-l ":{"select":1}},"priority":{"-l ":{"job_priority":"premium"}}}`
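
For larger overrides it may be easier to build the string programmatically than by hand. The following is a minimal sketch using only Python's standard `json` module; `merge_overrides` is a hypothetical helper written for this illustration and is not part of the framework.

```python
import json


def merge_overrides(base, extra):
    """Recursively merge two nested override dictionaries (extra wins on conflicts)."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


node_override = {"select": {"-l ": {"select": 1}}}
priority_override = {"priority": {"-l ": {"job_priority": "premium"}}}

merged = merge_overrides(node_override, priority_override)

# Compact separators drop the optional spaces, matching the form shown above.
argument = json.dumps(merged, separators=(",", ":"))
print(argument)
```

The printed string can then be passed, single-quoted, as the argument to `-j`.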

Now to run with that as a string input argument to `-j` :


In [None]:
%%bash -s "$notebookDirectory" 

$1/../.ci/runner.py $1/../our-config.json -t our-test0 our-test1 our-test2 \
                                          -i -dry -a WORKFLOWS \
                                          -j '{"select":{"-l ":{"select":1}},"priority":{"-l ":{"job_priority":"premium"}}}'

After the max calculations, we get another line now : `Requested override of resources with <our input>...New maximum...is <what we wanted as final resources>`. Additionally, at the `[step::submit] Submitting step submit...Gathering HPC argument packs...` we now see that the `<origin>` for `'select'` is listed as `cli`, same for our `'job_priority'` <ins>resource argpack</ins>.

### Usage with regex <ins>hpc and resource argpacks</ins>

The above example was fairly simple as all the tests and steps use the same resources. To provide a more realistic, complex setup that makes use of regex <ins>argpacks</ins>, let's adjust our config.


In [None]:
%%bash -s "$notebookDirectory" 
cat << EOF > $1/../our-config.json
{
  "submit_options" :
  {
    "submission" : "PBS",
    "queue"      : "main",
    "timelimit"  : "00:01:00",
    "hpc_arguments" :
    {
      ".*quartnode.*::select" : 
      { 
        "-l " : 
        {
          "select" : 1,
          "mem"    : "32gb",
          "ncpus"  : 32,
          ".*mpi.*::mpiprocs" : 32
        }
      },
      ".*fullnode.*::select" :
      { 
        "-l " : 
        {
          "select" : 1,
          "mem"    : "128gb",
          "ncpus"  : 128,
          ".*mpi.*::mpiprocs" : 128
        }
      },
      "priority" :
      {
        "-l " :
        {
          "job_priority" : "economy"
        }
      }
    }
  },
  "quartnode-simple" :
  {
    "submit_options" :
    {
      "hpc_arguments" : { ".*quartnode.*::select" : { "-l " : { "mem" : "16gb" } } }
    },
    "steps" : 
    {
      "our-step0" :
      {
        "command" : "./tests/scripts/echo_normal.sh"
      },
      "our-step0-mpi" :
      {
        "command" : "./tests/scripts/echo_normal.sh"
      }
    }
  },
  "quartnode" :
  {
    "steps" : 
    {
      "our-step0" :
      {
        "command" : "./tests/scripts/echo_normal.sh"
      },
      "our-step0-mpi" :
      {
        "command" : "./tests/scripts/echo_normal.sh"
      }
    }
  },
  "fullnode-simple" :
  {
    "steps" : 
    {
      "our-step0" :
      {
        "command" : "./tests/scripts/echo_normal.sh"
      },
      "our-step0-mpi" :
      {
        "command" : "./tests/scripts/echo_normal.sh"
      }
    }
  },
  "fullnode-double" :
  {
    "submit_options" :
    {
      "hpc_arguments" : { ".*fullnode.*::select" : { "-l " : { "select" : 2 } } }
    },
    "steps" : 
    {
      "our-step0" :
      {
        "command" : "./tests/scripts/echo_normal.sh"
      },
      "our-step0-mpi" :
      {
        "command" : "./tests/scripts/echo_normal.sh"
      }
    }
  }
}
EOF

echo "$( realpath $1/../our-config.json ) :"
cat $1/../our-config.json

If our nodes can support 128GB, 128 CPUs (and that many mpi ranks), joining all with a process pool of 4+ and thread pool of 2+ per process would have each running concurrently (we remove the `-i` option which is forcing serialization of steps). That means all the `"quartnode..."` tests can be combined into one node and the `"fullnode..."` tests take 1+ nodes apiece. The `"fullnode-double"` will have each step taking two nodes. Let's see what the joining would look like without modifications :

In [None]:
%%bash -s "$notebookDirectory" 

$1/../.ci/runner.py $1/../our-config.json -t quartnode-simple quartnode fullnode-simple fullnode-double \
                                          -dry -a WORKFLOWS -j -p 4 -tp 2

What? Our results are completely wrong as in they would not even be allowed to be submitted!

The framework has no idea what our fictitious hardware capabilities are, what nodes we have, or the like. All it will do is aggregate the `"hpc_arguments"`, adding together things that can be added. That in the end gives us this monstrosity :

`-l select=10:mem=608gb:ncpus=640:mpiprocs=320 -l job_priority=economy`

8 steps, each taking at least one node :
* cores :
  * 4 take only 32 cpus
  * 2 take 128 cpus
  * 2 take 128 cpus per node at 2x nodes
* mpi ranks :
  * 2 take only 32 ranks
  * 1 takes 128 ranks
  * 1 takes 128 ranks per node at 2x nodes
* memory :
  * 2 take 16GB
  * 2 take 32GB
  * 2 take 128GB
  * 2 take 128GB per node at 2x nodes
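
The totals in the joined request can be checked by hand. The short sketch below simply adds up the per-step values read from the config above; the tuple layout and variable names are chosen for this illustration only.

```python
# (test name, number of steps, select, mem in GB, ncpus) per step,
# as resolved from the config above.
tests = [
    ("quartnode-simple", 2, 1, 16, 32),
    ("quartnode", 2, 1, 32, 32),
    ("fullnode-simple", 2, 1, 128, 128),
    ("fullnode-double", 2, 2, 128, 128),
]

totals = {"select": 0, "mem": 0, "ncpus": 0, "mpiprocs": 0}
for _name, steps, select, mem_gb, ncpus in tests:
    totals["select"] += steps * select
    totals["mem"] += steps * mem_gb
    totals["ncpus"] += steps * ncpus
    # Only the single "-mpi" step of each test matches ".*mpi.*::mpiprocs",
    # and in this config its value equals ncpus.
    totals["mpiprocs"] += ncpus

summary = "-l select={select}:mem={mem}gb:ncpus={ncpus}:mpiprocs={mpiprocs}".format(**totals)
print(summary)
```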

Clearly the framework has simply added all these together without consideration that our particular setup is 128 CPUs/128GB per node. And of course, how would it know? That's our job to inform it, since node configuration and heterogeneous setups make it difficult to account for every possible aggregation.

Interestingly, it does the math right, logically joining the `".*mpi.*::mpiprocs"` <ins>resource argpacks</ins> and, even more so, the `"<regex>::select"` <ins>hpc argpacks</ins> despite the `<regex>` portions being different. If we look at the `[step::submit] Submitting step submit...Gathering HPC argument packs...` section, we see that the <ins>argpack</ins> names no longer have regexes in them. When joining, the <ins>argpack</ins> "basename" for <ins>hpc and resource argpacks</ins> is used to aggregate common entries. Furthermore, as we only care about the "basename" and many regex <ins>argpacks</ins> may be contributed from respective steps, it makes no sense to track or use the original regexes as keys into our final joined `"hpc_arguments"`, so they are simply stripped out.

This is one of the key reasons for why <ins>hpc argpacks</ins> also carry a uniqueness limitation, as it ensures proper aggregation in a logical manner when joining. It also makes indexing for overriding at the command line more obvious rather than matching any one regex.
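
As a rough illustration of the "basename" idea, the snippet below strips the regex portion from the <ins>argpack</ins> names used in this config. It assumes the `<regex>::<basename>` layout seen above and that the basename itself never contains `::`; `basename` is a hypothetical helper, not a framework function.

```python
def basename(argpack_name):
    """Return the part after the last '::', or the whole name if there is none."""
    return argpack_name.rsplit("::", 1)[-1]


names = [
    ".*quartnode.*::select",
    ".*fullnode.*::select",
    "priority",
    ".*mpi.*::mpiprocs",
]

stripped = [basename(name) for name in names]
print(stripped)
```

Both `select` entries collapse to the same key, which is why they can be aggregated and later indexed by a single override.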


We've started to see that for large jobs we may not want to use this joining feature unless we want to only enter the batch queue system once. Its strength primarily lies in joining jobs into one node, but you are free to use the feature as you see fit. Regardless, for the sake of this exercise let's fix this up with overrides that make more sense. 

Firstly, we _know_ our nodes are limited to 128 cores/ranks/GB so we will use the following : 

`{"select":{"-l ":{"mem":"128GB","ncpus":128,"mpiprocs":128}}}`

Note that we don't need to use the regexes here as we are modifying the final maximum, which has them stripped. Also, we are fully requesting `mpiprocs` even though for half the steps none are used and for a quarter they only need half that amount if packed into a single node (2x tests each with 32 mpi ranks). This might be better solved with a heterogeneous resource submission and further highlights where this feature breaks down, but let's continue with this less-efficient approach for completeness.

Second, as we are now maximizing the request of our nodes, we pack the 4 steps of the `quartnode...` tests into one node, reducing the count from 10 to 7 and resulting in :

`{"select":{"-l ":{"select":7,"mem":"128GB","ncpus":128,"mpiprocs":128}}}`
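
The node count of 7 follows from a quick packing check : the four steps that each request a quarter of a node fit together within our fictitious 128 CPU / 128GB / 128 rank limit, while the remaining steps keep their full nodes. A sketch of that arithmetic, with all names chosen for this illustration :

```python
# Fictitious per-node capabilities assumed in this tutorial.
NODE_LIMIT = {"mem_gb": 128, "ncpus": 128, "mpiprocs": 128}

# The four quarter-node steps (two from each "quartnode..." test).
quartnode_steps = [
    {"mem_gb": 16, "ncpus": 32, "mpiprocs": 0},
    {"mem_gb": 16, "ncpus": 32, "mpiprocs": 32},
    {"mem_gb": 32, "ncpus": 32, "mpiprocs": 0},
    {"mem_gb": 32, "ncpus": 32, "mpiprocs": 32},
]

packed = {key: sum(step[key] for step in quartnode_steps) for key in NODE_LIMIT}
fits = all(packed[key] <= NODE_LIMIT[key] for key in NODE_LIMIT)

# fullnode-simple: 2 steps x 1 node, fullnode-double: 2 steps x 2 nodes.
full_nodes = 2 * 1 + 2 * 2
total_nodes = full_nodes + (1 if fits else len(quartnode_steps))
print(packed, fits, total_nodes)
```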

In [None]:
%%bash -s "$notebookDirectory" 

$1/../.ci/runner.py $1/../our-config.json -t quartnode-simple quartnode fullnode-simple fullnode-double \
                                          -dry -a WORKFLOWS -p 4 -tp 2 \
                                          -j '{"select":{"-l ":{"select":7,"mem":"128GB","ncpus":128,"mpiprocs":128}}}'

That worked, but we had to basically do the whole aggregation calculations ourselves. This was an extreme example where all the tests and steps run at the same time, but effectively demonstrates the complexities involved with this sort of feature. 

Making it "smarter" might complicate the system further and/or begin to lean toward overspecifying to a particular hardware configuration or batch scheduler. This may be revisited in the future.


Note : A heterogeneous submission could be injected (very poorly) via override like so :

`{"select":{"-l ":{"select":3,"mem":"128GB","ncpus":128,"mpiprocs":"128+1:mem:48GB:ncpus:64:mpiprocs:64+1:..."}}}`

Note : When using the `-j` option you may also use the `-jn` / `--joinName` option to manually specify the joined test's name rather than using the default `"joinHPC_<all tests concatenated>"` which may be overly long for many tests.