
Commit

add example of using flux tree with variables (#147)
* add script to run

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch committed Apr 21, 2023
1 parent aed30f0 commit 9b28b18
Showing 4 changed files with 136 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/tutorials/index.md
@@ -37,6 +37,7 @@ Although some of the others above are also workflows, these examples are going t
submit different job hierarchies and get around the etcd bottleneck in Kubernetes.

- [Basic Tree](https://github.com/flux-framework/flux-operator/blob/main/examples/workflows/tree)
- [Instance Variables](https://github.com/flux-framework/flux-operator/blob/main/examples/workflows/tree-with-variables)

We have just started this arm of our experiments and you can expect more as we go!

81 changes: 81 additions & 0 deletions examples/workflows/tree-with-variables/README.md
@@ -0,0 +1,81 @@
# Tree with Variables

We can use [flux tree](https://github.com/flux-framework/flux-sched/blob/master/t/t2001-tree-real.t#L43-L51)
to create instances inside of instances. For this example, we will start with a root, create
two instances under it, and two instances under each of those. Instead of running `hostname`, we will run
a script that writes out the environment variables available to each sub-instance.
You can read more about [the utility here](https://github.com/flux-framework/flux-sched/blob/master/resource/utilities/README.md).
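
To make the shape of the tree concrete, here is a small sketch that counts the instances implied by a topology string such as `2x2`. It assumes each `x`-separated number is the fan-out at one level, which matches the description above (one root, two children, four leaves). The function name `count_instances` is hypothetical and not part of Flux.

```bash
#!/bin/bash
# Hypothetical helper (not part of flux): count the instances implied by a
# topology string like "2x2", assuming each number is the fan-out per level.
count_instances() {
    local topology="$1"
    local total=1        # the root instance
    local level_size=1   # number of instances at the current level
    local fanout
    local IFS='x'        # split the topology string on "x"
    for fanout in ${topology}; do
        level_size=$((level_size * fanout))
        total=$((total + level_size))
    done
    echo "${total}"
}

count_instances 2x2   # prints 7: 1 root + 2 children + 4 leaves
```

Under this assumption, a `2x2` topology gives seven instances in total.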

## Usage

First, let's create a kind cluster. From the context of this directory:

```bash
$ kind create cluster --config ../../kind-config.yaml
```

And then install the operator, create the namespace, and apply the MiniCluster YAML here.

```bash
$ kubectl apply -f ../../dist/flux-operator.yaml
$ kubectl create namespace flux-operator
$ kubectl apply -f ./minicluster.yaml
```

When the cluster is created, the present working directory (where you are reading this file)
is bound to `/tmp/workflow`, and the `flux tree` command runs there. You can follow the logs
for the run (the suffix of your pod name will differ) via:

```bash
$ kubectl logs -n flux-operator flux-sample-0-7tx7s -f
```

When it's done, the performance output (written to `/tmp/workflow/tree.out` in the cluster) will appear as `tree.out` in the present working directory.
This is the command that ran, followed by the file it produced:

```bash
$ flux tree -T2x2 -J 4 -N 4 -c 4 -o /tmp/workflow/tree.out -Q easy:fcfs /bin/bash /tmp/workflow/run-on-instance.sh /tmp/workflow
```
```console
$ cat tree.out
TreeID    Elapsed(sec)  Begin(Epoch)       End(Epoch)         Match(usec)  NJobs  NNodes  CPN  GPN
tree      3.646440      1682094481.024492  1682094484.670933  0.000000     4      4       4    0
tree.2    1.847760      1682094482.167398  1682094484.015160  0.000000     2      2       4    0
tree.2.2  0.146933      1682094483.195491  1682094483.342424  0.000000     1      1       4    0
tree.2.1  0.098842      1682094483.068877  1682094483.167719  0.000000     1      1       4    0
tree.1    1.789910      1682094482.071364  1682094483.861272  0.000000     2      2       4    0
tree.1.2  0.102510      1682094483.056029  1682094483.158540  0.000000     1      1       4    0
tree.1.1  0.119904      1682094482.937050  1682094483.056954  0.000000     1      1       4    0
```
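
If you want to pull just the leaf instances out of this file, a minimal sketch with `awk` follows. It assumes whitespace-separated columns as shown above and a two-level topology such as `2x2`, so that leaves are the IDs with three dot-separated parts. The sample data is copied from the output above so the snippet runs on its own; point `tree_file` at your real `tree.out` instead.

```bash
#!/bin/bash
# Sketch: list leaf instances and their elapsed time from a flux tree output file.
# A temporary copy of the sample output is used so this runs standalone.
tree_file="$(mktemp)"
cat > "${tree_file}" <<'EOF'
TreeID Elapsed(sec) Begin(Epoch) End(Epoch) Match(usec) NJobs NNodes CPN GPN
tree 3.646440 1682094481.024492 1682094484.670933 0.000000 4 4 4 0
tree.2 1.847760 1682094482.167398 1682094484.015160 0.000000 2 2 4 0
tree.2.2 0.146933 1682094483.195491 1682094483.342424 0.000000 1 1 4 0
tree.2.1 0.098842 1682094483.068877 1682094483.167719 0.000000 1 1 4 0
tree.1 1.789910 1682094482.071364 1682094483.861272 0.000000 2 2 4 0
tree.1.2 0.102510 1682094483.056029 1682094483.158540 0.000000 1 1 4 0
tree.1.1 0.119904 1682094482.937050 1682094483.056954 0.000000 1 1 4 0
EOF

# Skip the header (NR > 1) and keep IDs with three dot-separated parts
# (assumption: two levels below the root, so tree.X.Y is a leaf).
leaves="$(awk 'NR > 1 && split($1, parts, ".") == 3 { print $1, $2 }' "${tree_file}")"
echo "${leaves}"
rm -f "${tree_file}"
```

For a deeper topology you would adjust the expected number of parts accordingly.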

This information is repeated from the [basic tree](../tree) example, and you can look there for details about what the above means.
For this example, we focus on the variables available in the script, and we write files that are named by the tree ID. You
should be able to see them in the present working directory:

```bash
$ ls
```
```console
minicluster.yaml README.md run-on-instance.sh tree.1.1-output.txt tree.1.2-output.txt tree.2.1-output.txt tree.2.2-output.txt tree.out
```

If we look in one of the output files we can see the variables available to that instance:

```bash
$ cat tree.1.2-output.txt
```
```console
FLUX_TREE_ID tree.1.2
FLUX_TREE_JOBSCRIPT_INDEX 1
FLUX_TREE_NNODES 1
FLUX_TREE_NCORES_PER_NODE 1
FLUX_TREE_NGPUS_PER_NODE 0
```
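
To compare a single variable across all instances, you could loop over the output files. The sketch below assumes the `NAME value` line format shown above and the `tree.*-output.txt` naming used in this example. It writes two sample files to a temporary directory so that it runs on its own; set `workdir` to your real directory instead.

```bash
#!/bin/bash
# Sketch: collect FLUX_TREE_NNODES from every per-instance output file.
# Sample files are created in a temporary directory so this runs standalone.
workdir="$(mktemp -d)"
printf 'FLUX_TREE_ID tree.1.1\nFLUX_TREE_NNODES 1\n' > "${workdir}/tree.1.1-output.txt"
printf 'FLUX_TREE_ID tree.1.2\nFLUX_TREE_NNODES 1\n' > "${workdir}/tree.1.2-output.txt"

summary="$(
    for file in "${workdir}"/tree.*-output.txt; do
        # Each line is "NAME value"; read splits on the first space.
        while read -r name value; do
            if [ "${name}" = "FLUX_TREE_NNODES" ]; then
                echo "$(basename "${file}"): ${value} node(s)"
            fi
        done < "${file}"
    done
)"
echo "${summary}"
rm -rf "${workdir}"
```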

Note that for this example we are only running the script on the leaves, which is why `FLUX_TREE_NNODES` is 1 above. The `tree.out` table
above shows the node count going from `4 > 2 > 1` as we descend the tree. You would put custom logic in this little script to control execution of your job, likely with different instances using different resources.
It's super cool!
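
As a rough sketch of what such custom logic might look like, the script could branch on `FLUX_TREE_ID`. The task names (`preprocess`, `simulate`, `noop`) and the `choose_task` function are placeholders invented for illustration, and the default value for the ID is only there so the snippet runs outside of `flux tree`.

```bash
#!/bin/bash
# Sketch of per-instance logic; the task names here are placeholders.
# FLUX_TREE_ID is set by flux tree; the default only lets this run standalone.
tree_id="${FLUX_TREE_ID:-tree.1.1}"

choose_task() {
    case "$1" in
        tree.1.*) echo "preprocess" ;;   # leaves under the first child
        tree.2.*) echo "simulate" ;;     # leaves under the second child
        *)        echo "noop" ;;         # anything else, e.g. the root
    esac
}

task="$(choose_task "${tree_id}")"
echo "instance ${tree_id} would run: ${task}"
```

In a real job script you would replace the `echo` with the command for each branch, and could also consult `FLUX_TREE_NNODES` or `FLUX_TREE_NCORES_PER_NODE` to size the work.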

```bash
$ kubectl delete -f minicluster.yaml
```
34 changes: 34 additions & 0 deletions examples/workflows/tree-with-variables/minicluster.yaml
@@ -0,0 +1,34 @@
apiVersion: flux-framework.org/v1alpha1
kind: MiniCluster
metadata:
  name: flux-sample
  namespace: flux-operator
spec:
  # suppress all output except for test run
  logging:
    quiet: false

  # Number of pods to create for MiniCluster
  size: 4
  tasks: 4

  # Make this kind of persistent volume and claim available to pods
  volumes:
    data:
      storageClass: hostpath
      path: /tmp/workflow

  # See examples in this test file:
  # https://github.com/flux-framework/flux-sched/blob/master/t/t2001-tree-real.t#L43-L51
  # And documentation here:
  # https://github.com/flux-framework/flux-sched/blob/master/resource/utilities/README.md
  containers:
    - image: ghcr.io/flux-framework/flux-restful-api:latest
      launcher: true
      cores: 4

      # provide the /tmp/workflow as an output directory for each tree to write to!
      command: flux tree -T2x2 -J 4 -N 4 -c 4 -o /tmp/workflow/tree.out -Q easy:fcfs /bin/bash /tmp/workflow/run-on-instance.sh /tmp/workflow
      volumes:
        data:
          path: /tmp/workflow
20 changes: 20 additions & 0 deletions examples/workflows/tree-with-variables/run-on-instance.sh
@@ -0,0 +1,20 @@
#!/bin/bash

# It's hard to see this for a quick job, so let's write to a file!
# The first argument is the directory to write the output file to.
outdir="${1}"
outfile="${outdir}/${FLUX_TREE_ID}-output.txt"

# ID string uniquely identifying the hierarchical path of the Flux instance on which Jobscript is being executed
echo "FLUX_TREE_ID ${FLUX_TREE_ID}" > "${outfile}"

# the integer ID of each jobscript invocation local to the Flux instance. It starts from 1 and sequentially increases.
echo "FLUX_TREE_JOBSCRIPT_INDEX ${FLUX_TREE_JOBSCRIPT_INDEX}" >> "${outfile}"

# the number of nodes assigned to the instance
echo "FLUX_TREE_NNODES ${FLUX_TREE_NNODES}" >> "${outfile}"

# the number of cores per node assigned to the instance
echo "FLUX_TREE_NCORES_PER_NODE ${FLUX_TREE_NCORES_PER_NODE}" >> "${outfile}"

# the number of GPUs per node assigned to the instance.
echo "FLUX_TREE_NGPUS_PER_NODE ${FLUX_TREE_NGPUS_PER_NODE}" >> "${outfile}"
