<div>
<center><img src="../assets/Flux-logo.svg" width="400"/></center>
</div>

# Chapter 3: Process, Monitoring, Utilities, and More, Oh My!

Now that we have learned about basic Flux commands and hierarchical scheduling and its benefits, let's dive deeper into the structure of the individual Flux instances that make up a hierarchy and talk about some additional "plumbing" that helps Flux run. In this module, we cover:
1. More advanced Flux commands for querying data
2. The structure of Flux instances
3. Examples of `flux kvs`, which powers many higher-level commands
4. Generating and submitting job specifications with `flux submit --dry-run` and `flux job submit`
<br>

# Process, Monitoring, and Job Utilities ⚙️
## flux exec 👊️

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Executing commands across ranks
</div>

Have you ever wanted a quick way to execute a command on all of the nodes in a Flux instance? It might be to create a directory, or otherwise interact with a file. This can be hugely useful in environments where you don't have a shared filesystem, for example. This is a job for `flux exec`! Here is a toy example that runs `echo` on every rank (`-r all`).

In [23]:
!flux exec -r all echo "Hello from a flux rank!"

Hello from a flux rank!
Hello from a flux rank!
Hello from a flux rank!
Hello from a flux rank!


You can also use `-x` to exclude ranks. For example, we often do custom actions on the main or "leader" rank, and just want to issue commands to the workers.

In [24]:
! flux exec -r all -x 0 echo "Hello from everyone except the lead (0) rank!"

Hello from everyone except the lead (0) rank!
Hello from everyone except the lead (0) rank!
Hello from everyone except the lead (0) rank!


Here is a similar example that executes only on rank 2 (`-r 2`) and prints the rank of the broker it runs on.

In [25]:
!flux exec -r 2 flux getattr rank 

2


And of course, we can do the same for all ranks! When `-r` is omitted, `flux exec` targets every rank, so this is a variation on the first example. Note that the order of the output is not guaranteed.

In [26]:
!flux exec flux getattr rank

0
3
1
2


You can imagine that `flux exec` is hugely useful in the context of batch jobs.
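In a batch script written in Python, one option is to call `flux exec` through `subprocess`. The sketch below is a minimal illustration under that assumption; `flux_exec_argv` and `flux_exec` are hypothetical helper names, and the only `flux exec` options used are the `-r` and `-x` flags shown above.

```python
import subprocess


def flux_exec_argv(command, ranks="all", exclude=None):
    """Build the argv for `flux exec` (hypothetical helper, not part of Flux).

    ranks:   idset passed to -r, e.g. "all", "2", or "1-3"
    exclude: optional idset passed to -x, e.g. "0" to skip the leader rank
    """
    argv = ["flux", "exec", "-r", ranks]
    if exclude is not None:
        argv += ["-x", exclude]
    return argv + list(command)


def flux_exec(command, ranks="all", exclude=None):
    """Run `command` on the selected ranks; requires a running Flux instance."""
    argv = flux_exec_argv(command, ranks, exclude)
    # check=True raises CalledProcessError if `flux exec` exits non-zero
    return subprocess.run(argv, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Only print the command line; call flux_exec(...) inside Flux to run it.
    print(flux_exec_argv(["mkdir", "-p", "/tmp/scratch"], exclude="0"))
```

Keeping argument construction separate from execution makes it easy to log or test the exact command before sending it to every rank.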

## flux jobs

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Querying all jobs on a cluster
</div>

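Flux provides a way to query information about the jobs on a cluster, which was discussed in Chapter 1, but we skipped over the powerful ways in which you can customize this command. By default, `flux jobs` lists only active (pending and running) jobs, which is why the first table below is empty; `-a` includes inactive jobs as well, and `--format` selects which fields to print.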

In [31]:
!flux jobs

       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO


In [33]:
!flux jobs -a

       JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
  ƒ28G7V4TKD jovyan   echo       CD      1      1   0.139s a100cf62dc4f
  ƒ28G7TaU2v jovyan   echo       CD      1      1   0.131s a100cf62dc4f
  ƒ28G7TaU2u jovyan   echo       CD      1      1   0.130s a100cf62dc4f
  ƒ28G7TaU2s jovyan   echo       CD      1      1   0.128s a100cf62dc4f
  ƒ28G7TaU2t jovyan   echo       CD      1      1   0.126s a100cf62dc4f
  ƒ286xAsqfd jovyan   echo       CD      4      1   0.128s a100cf62dc4f
  ƒ283fwmzTR jovyan   echo       CD      1      1   0.118s a100cf62dc4f
  ƒ27p3Bm4Tu jovyan   true       CD      1      1   0.129s a100cf62dc4f
  ƒ27p3Bm4Tt jovyan   true       CD      1      1   0.128s a100cf62dc4f
  ƒ27p3Bm4Tq jovyan   true       CD      1      1   0.126s a100cf62dc4f
  ƒ27p3Bm4Ts jovyan   true       CD      1      1   0.125s a100cf62d

In [38]:
!flux jobs -a --no-header --format="{status}" ƒ28G7V4TKD

COMPLETED

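Since `--format` controls exactly what `flux jobs` prints, its output is easy to consume from a script. Here is a minimal Python sketch, assuming the `{id.f58}` and `{status}` format fields and the `--color=never` option to suppress color codes; `parse_jobs` and `query_jobs` are our own hypothetical names, and the demo parses sample text rather than calling Flux.

```python
import subprocess
from collections import Counter


def parse_jobs(text):
    """Turn '<jobid> <status>' lines into (jobid, status) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        jobid, status = line.split(maxsplit=1)
        rows.append((jobid, status.strip()))
    return rows


def query_jobs():
    """Query active and inactive jobs; requires a running Flux instance."""
    out = subprocess.run(
        ["flux", "jobs", "-a", "--no-header", "--color=never",
         "--format={id.f58} {status}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_jobs(out)


if __name__ == "__main__":
    # Sample text in the shape query_jobs() expects, using job IDs from above
    sample = "ƒ28G7V4TKD COMPLETED\nƒ28G7TaU2v COMPLETED\n"
    print(Counter(status for _, status in parse_jobs(sample)))
```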
## flux job info

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Querying information about a Flux job
</div>

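`flux job info` retrieves information about a job from its JOBID, such as its jobspec, eventlog, and R (the resource set assigned to the job). The cells below use `flux job last` to get the ID of the most recently submitted job.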

In [45]:
!echo "Querying the jobspec"
!flux job info $(flux job last) jobspec | jq .resources
!echo "Querying the resource set"
!flux job info $(flux job last) R | jq .execution.nodelist

Querying the jobspec
[
  {
    "type": "slot",
    "count": 1,
    "with": [
      {
        "type": "core",
        "count": 1
      }
    ],
    "label": "task"
  }
]
Querying the resource set
[
  "a100cf62dc4f"
]

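Because `jq` is only parsing JSON here, the same data is easy to use from Python. The sketch below walks a jobspec `resources` list and totals the requested cores, multiplying counts down the nesting. `count_cores` is a hypothetical helper, and it assumes every `count` is a plain integer, as in the jobspec above.

```python
import json


def count_cores(resources, multiplier=1):
    """Total the cores requested by a jobspec "resources" list.

    Each entry's "count" multiplies everything nested in its "with" list,
    so 2 slots with 4 cores each request 8 cores.
    """
    total = 0
    for res in resources:
        n = multiplier * res.get("count", 1)
        if res["type"] == "core":
            total += n
        total += count_cores(res.get("with", []), n)
    return total


# The resources section printed above for the last job
resources = json.loads(
    '[{"type": "slot", "count": 1, "label": "task",'
    ' "with": [{"type": "core", "count": 1}]}]'
)

if __name__ == "__main__":
    print(count_cores(resources))
```

To use it on a live job, you could feed it the `resources` field of the JSON printed by `flux job info JOBID jobspec`.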

In [10]:
!echo "Querying the job eventlog"
!flux job eventlog -H $(flux job last)
!echo "Querying the exec eventlog"
!flux job eventlog -H -p exec $(flux job last)
!echo "Querying the output eventlog"
!flux job eventlog -H -p output $(flux job last)

Querying the job eventlog
[May21 22:47] submit userid=1000 urgency=16 flags=0 version=1
[  +0.016689] validate
[  +0.030445] depend
[  +0.030472] priority priority=16
[  +0.034623] alloc
[  +0.047154] start
[  +0.174103] finish status=0
[  +0.175727] release ranks="all" final=true
[  +0.175769] free
[  +0.175785] clean
Querying the exec eventlog
[May21 22:47] init
[  +0.001631] starting
[  +0.107334] shell.init service="1000-shell-f28G7V4TKD" leader-rank=2 size=1
[  +0.112651] shell.start taskmap={"version":1,"map":[[0,1,1

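Eventlogs are also handy for measuring how long a job actually ran. The sketch below assumes that `flux job eventlog --format=json JOBID` prints one JSON object per line with `timestamp`, `name`, and `context` keys (check `flux job eventlog --help` on your version). `parse_eventlog` and `run_time` are hypothetical helpers, and the sample input reuses the `start` and `finish` offsets from the job eventlog above with a made-up base timestamp.

```python
import json


def parse_eventlog(text):
    """Parse eventlog text with one JSON event object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def run_time(events):
    """Seconds from the 'start' event to the 'finish' event, or None if either is missing."""
    stamps = {ev["name"]: ev["timestamp"] for ev in events}
    if "start" not in stamps or "finish" not in stamps:
        return None
    return stamps["finish"] - stamps["start"]


# Illustrative input: start/finish offsets from the job eventlog above,
# added to a made-up base timestamp of 100.0 seconds
sample = (
    '{"timestamp": 100.047154, "name": "start", "context": {}}\n'
    '{"timestamp": 100.174103, "name": "finish", "context": {"status": 0}}\n'
)

if __name__ == "__main__":
    events = parse_eventlog(sample)
    print(f"ran for {run_time(events):.3f} s, exit status {events[-1]['context']['status']}")
```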
## flux uptime

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Showing how long a flux instance has been running
</div>

Flux provides an `uptime` utility that displays properties of the current Flux instance, such as its state, how long it has been running, its size, and whether job submission is disabled or scheduling is stopped. The output below shows the current time, the instance state (`run`) and how long it has been up, the instance owner, the instance depth (depth in the Flux hierarchy), and the size of the instance (number of brokers).

In [46]:
!flux uptime

 01:07:51 run 5.3h,  owner jovyan,  depth 0,  size 4
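A script can use the same line to check where it is running in a hierarchy, for example whether it is inside a nested instance (depth greater than 0). The regular expression below is a sketch fitted to the output format shown above; `parse_uptime` is a hypothetical helper, and any extra fields after `size` (for example, the disabled/stopped notices mentioned above) are ignored.

```python
import re

# Fitted to lines like " 01:07:51 run 5.3h,  owner jovyan,  depth 0,  size 4"
UPTIME_RE = re.compile(
    r"^\s*(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<state>\S+)\s+(?P<uptime>[^,]+),"
    r"\s+owner\s+(?P<owner>[^,]+),"
    r"\s+depth\s+(?P<depth>\d+),"
    r"\s+size\s+(?P<size>\d+)"
)


def parse_uptime(line):
    """Parse one line of `flux uptime` output into a dict (None if it doesn't match)."""
    m = UPTIME_RE.match(line)
    if m is None:
        return None
    info = m.groupdict()
    info["depth"] = int(info["depth"])
    info["size"] = int(info["size"])
    return info


if __name__ == "__main__":
    print(parse_uptime(" 01:07:51 run 5.3h,  owner jovyan,  depth 0,  size 4"))
```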


## flux top 

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Showing a table of real-time Flux processes
</div>

Flux provides a full-featured version of `top` for nested Flux instances and jobs. In the <button data-commandLinker-command="terminal:open" data-name="flux" href="#">JupyterLab terminal</button>, invoke `flux top` to see the "sleep" jobs. If they have already completed, you can resubmit them. 

We recommend not running `flux top` in the notebook as it is not designed to display output from a command that runs continuously.

## flux pstree 

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Showing a flux process tree (and seeing nesting in instances)
</div>

Similarly, Flux provides `flux pstree`, an analog of the Unix `pstree` command that shows jobs and nested Flux instances as a tree. Try it out in the <button data-commandLinker-command="terminal:open" data-name="flux" href="#">JupyterLab terminal</button> or here in the notebook.

## flux proxy

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Interacting with a job hierarchy
</div>

`flux proxy` routes messages to and from a Flux instance. We can use `flux proxy` to connect to a running Flux instance and then submit more nested jobs inside it. From the <button data-commandLinker-command="terminal:open" data-name="flux" href="#">JupyterLab terminal</button>, run the commands below!

```bash
# Outputs the JOBID
flux batch --nslots=2 --cores-per-slot=1 --nodes=2 ./sleep_batch.sh

# Put the JOBID into an environment variable
JOBID=$(flux job last)

# See the flux process tree
flux pstree -a

# Connect to the Flux instance corresponding to JOBID above
flux proxy ${JOBID}

# Note the depth is now 1 and the size is 2: we're one level deeper in a Flux hierarchy and we have only 2 brokers now.
flux uptime

# This instance has 2 "nodes" and 2 cores allocated to it
flux resource list

# Leave the proxy shell and return to the parent instance
exit
```

## flux queue

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Interacting with and inspecting Flux queues
</div>

Flux has a command for controlling the queue within the `job-manager`: `flux queue`.  This includes disabling job submission, re-enabling it, waiting for the queue to become idle or empty, and checking the queue status:

In [51]:
!flux queue enable
!flux queue -h

Job submission is enabled
usage: flux-queue [-h] {list,status,enable,disable,start,stop,drain,idle} ...

options:
  -h, --help            show this help message and exit

subcommands:

  {list,status,enable,disable,start,stop,drain,idle}


## flux getattr

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Getting attributes about your system and environment
</div>

Each Flux instance has a set of attributes, set at startup, that affect how Flux operates, such as `rank`, `size`, and `local-uri` (the Unix socket used to communicate with Flux). Many of these attributes can be modified at runtime, such as `log-stderr-level`, which uses syslog-style severity levels: 0 logs only emergency messages to stderr, while 7 logs everything, including debug messages. Here is an example set that you might be interested in looking at:

In [52]:
!flux getattr rank
!flux getattr size
!flux getattr local-uri
!flux setattr log-stderr-level 3
!flux lsattr -v

0
4
local:///tmp/flux-nygpcr/local-0
broker.boot-method                      simple
broker.cleanup-timeout                  none
broker.critical-ranks                   0
broker.mapping                          [[0,1,4,1]]
broker.pid                              8
broker.quorum                           4
broker.quorum-warn                      1m
broker.rc1_path                         /etc/flux/rc1
broker.rc3_path                         /etc/flux/rc3
broker.sd-stop-timeout                  none
broker.shutdown-warn                    1m
broker.starttime                        1747857107.61
conf.shell_initrc                       /etc/flux/shell/initrc.lua
conf.shell_pluginpath                   /usr/lib/flux/shell/plugins
config.path                             -
content.backing-module                  content-sqlite
content.hash                            sha1
hostlist                                a100cf62dc4f,a100cf62dc4f,a100cf62dc4f,a100cf62dc4f
instance-level                 
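The `log-stderr-level 3` setting above is easier to read once you know the scale. Assuming the levels follow the standard syslog severities (0 is the most severe, 7 is debug) and that a message is written to stderr when its severity number is at most the configured level, this small sketch lists what each setting lets through. `shown_at` is a hypothetical helper.

```python
# Standard syslog severity numbers (lower number = more severe)
SYSLOG_LEVELS = {
    0: "emerg", 1: "alert", 2: "crit", 3: "err",
    4: "warning", 5: "notice", 6: "info", 7: "debug",
}


def shown_at(threshold):
    """Severity names written to stderr when log-stderr-level is `threshold`.

    Assumes a message is written when its severity number is <= threshold.
    """
    return [name for level, name in sorted(SYSLOG_LEVELS.items()) if level <= threshold]


if __name__ == "__main__":
    for threshold in (1, 3, 7):
        print(threshold, shown_at(threshold))
```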

## flux module

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Managing Flux extension modules
</div>

Services within a Flux instance are implemented by modules. To query and manage broker modules, use `flux module`. Modules that we have already interacted with directly in this tutorial include `resource` (via `flux resource`), `job-ingest` (via the `flux` submission commands and the Python API), `job-list` (via `flux jobs`), and `job-manager` (via `flux queue`). For the most part, services are implemented by modules of the same name. In some circumstances, where multiple implementations of a service exist, a module with a different name implements a given service (e.g., in this instance, `sched-fluxion-qmanager` provides the `sched` service and thus `sched.alloc`, but in another instance `sched-simple` might provide the `sched` service).

In [53]:
!flux module list

Module                   Idle  S Sendq Recvq Service
job-manager                57  R     0     0 
job-exec                   60  R     0     0 
resource                   63  R     0     0 
content-sqlite             60  R     0     0 content-backing
sched-fluxion-qmanager     60  R     0     0 sched
kvs                        60  R     0     0 
heartbeat                   1  R     0     0 
cron                     idle  R     0     0 
connector-local             0  R     0     0 
job-ingest               idle  R     0     0 
sched-fluxion-resource     60  R     0     0 feasibility
job-info                 idle  R     0     0 
job-list                 idle  R     0     0 
kvs-watch                idle  R     0     0 
barrier                  idle  R     0     0 
content                    60  R     0     0 
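Which module provides a given service appears in the last column of this table, which is worth checking before swapping schedulers as in the next cell. Here is a minimal Python sketch that turns `flux module list` output into a mapping; `services_by_module` is a hypothetical helper, and it assumes whitespace-separated columns with `Service` last, as in both versions of the table shown in this chapter.

```python
def services_by_module(table):
    """Map module name -> list of services from `flux module list` output.

    Rows with an empty Service column have one field fewer than the header,
    so they get an empty list.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split()
    result = {}
    for line in lines[1:]:
        fields = line.split()
        services = fields[-1].split(",") if len(fields) == len(header) else []
        result[fields[0]] = services
    return result


SAMPLE = """\
Module                   Idle  S Sendq Recvq Service
job-manager                57  R     0     0
sched-fluxion-qmanager     60  R     0     0 sched
sched-fluxion-resource     60  R     0     0 feasibility
"""

if __name__ == "__main__":
    mapping = services_by_module(SAMPLE)
    print([module for module, svcs in mapping.items() if "sched" in svcs])
```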


In [57]:
!flux module unload sched-fluxion-resource
!flux module load sched-simple
!flux module list

Module                   Idle  S Sendq Recvq Service
job-manager                 0  R     0     0 
job-exec                 idle  R     0     0 
resource                    0  R     0     0 
content-sqlite           idle  R     0     0 content-backing
kvs                      idle  R     0     0 
heartbeat                   0  R     0     0 
cron                     idle  R     0     0 
connector-local             0  R     0     0 
job-ingest               idle  R     0     0 
sched-simple                0  R     0     0 feasibility,sched
job-info                 idle  R     0     0 
job-list                 idle  R     0     0 
kvs-watch                idle  R     0     0 
barrier                  idle  R     0     0 
content                  idle  R     0     0 


## flux dmesg

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Viewing Flux system messages
</div>


If you need some additional help debugging your Flux setup, you might be interested in `flux dmesg`, which is akin to [Linux dmesg](https://man7.org/linux/man-pages/man1/dmesg.1.html) but displays messages from the Flux broker's log ring buffer.

In [58]:
!flux dmesg -H

[May21 19:51] broker.debug[0]: insmod connector-local
[  +0.005402] broker.info[0]: start: none->join 12.9368ms
[  +0.008454] broker.info[0]: parent-none: join->init 2.93317ms
[  +0.027397] connector-local.debug[0]: allow-guest-user=false
[  +0.027451] connector-local.debug[0]: allow-root-owner=false
[  +0.041013] broker.debug[0]: accepting connection from a100cf62dc4f (rank 3) status full
[  +0.041447] broker.debug[0]: accepting connection from a100cf62dc4f (rank 1) status full
[  +0.041517] broker.debug[0]: accepting connection from a100cf62dc4f (rank 2) status full
[  +0.133438] broker.debug[0]: insmod content
[  +0.170387] broker.debug[0]: insmod barrier
[  +0.284711] broker.debug[0]: insmod content-

<br>

## Flux Modules

To manage and query modules, Flux provides the `flux module` command. The sub-commands provided by `flux module` can be seen by running the cell below.

In [2]:
!flux module --help

Usage: flux-module COMMAND [OPTIONS]
  -h, --help             Display this message.

flux module subcommands:
   list            List loaded modules
   remove          Unload module
   load            Load module
   reload          Reload module
   stats           Display stats on module
   debug           Get/set module debug flags


Some examples of Flux modules include:
* `job-ingest` (used by Flux submission commands like `flux batch` and `flux run`)
* `job-list` (used by `flux jobs`)
* `sched-fluxion-qmanager` (used by `flux tree`)
* `sched-fluxion-resource` (also used by `flux tree`)

We can see that these services are loaded and available by running the cell below.

In [3]:
!flux module list

Module                   Idle  S Service
job-exec                 idle  R 
heartbeat                   1  R 
job-list                 idle  R 
sched-fluxion-resource   idle  R 
content-sqlite           idle  R content-backing
resource                 idle  R 
job-ingest               idle  R 
content                  idle  R 
job-info                 idle  R 
sched-fluxion-qmanager   idle  R sched
kvs-watch                idle  R 
kvs                      idle  R 
cron                     idle  R 
job-manager              idle  R 
barrier                  idle  R 
connector-local             0  R 


In [60]:
!flux module stats job-manager | jq .inactive_jobs

[0;39m90[0m


Users and system administrators can easily load and unload modules using the `flux module load` and `flux module remove` commands. To show this, let's unload Fluxion (Flux's graph-based scheduler) and replace it with the built-in simple scheduler.

In [4]:
!flux module remove sched-fluxion-qmanager
!flux module remove sched-fluxion-resource
!flux module load sched-simple
!flux module list

Module                   Idle  S Service
job-exec                 idle  R 
heartbeat                   0  R 
job-list                 idle  R 
content-sqlite           idle  R content-backing
resource                    0  R 
job-ingest               idle  R 
content                     0  R 
job-info                 idle  R 
kvs-watch                idle  R 
kvs                         0  R 
cron                     idle  R 
job-manager                 0  R 
sched-simple                0  R sched
barrier                  idle  R 
connector-local             0  R 


In this code block, we unload the two modules that make up Fluxion: `sched-fluxion-qmanager` and `sched-fluxion-resource`. Next, we load the simple scheduler (`sched-simple`), and, finally, we list the loaded modules. We now see that Fluxion is not available and the simple scheduler is. Next, let's reload Fluxion, but this time let's pass some extra arguments to specialize our Flux instance. In particular, we will limit Fluxion's queue depth (the number of pending jobs it considers when scheduling) to 4 and populate Fluxion's resource graph with only these resource types:
* Nodes
* Sockets
* Cores

In [5]:
# Clear the broker's log ring buffer (flux dmesg -C) before swapping schedulers
!flux dmesg -C
!flux module remove sched-simple
!flux module load sched-fluxion-resource load-allowlist=node,socket,core
!flux module load sched-fluxion-qmanager queue-params=queue-depth=4
!flux module list

Module                   Idle  S Service
job-exec                 idle  R 
heartbeat                   1  R 
job-list                 idle  R 
sched-fluxion-qmanager      0  R sched
content-sqlite           idle  R content-backing
resource                    0  R 
job-ingest               idle  R 
content                     0  R 
job-info                 idle  R 
kvs-watch                idle  R 
sched-fluxion-resource      0  R 
kvs                         0  R 
cron                     idle  R 
job-manager                 0  R 
barrier                  idle  R 
connector-local             0  R 


### flux kvs

One of the core services built into Flux is the key-value store (KVS). It is used by many other services, including most of Flux's resource management services and the `flux archive` command described below.

The `flux kvs` command provides a utility to list and manipulate values in the KVS. As an example of using `flux kvs`, let's use the command to examine information saved by the `resource` service.

In [None]:
!flux kvs ls
!flux kvs ls resource
!flux kvs get resource.R | jq
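KVS keys are hierarchical: a period separates directory components, so `resource.R` is the key `R` inside the `resource` directory that `flux kvs ls resource` lists. The toy model below uses nested Python dictionaries to illustrate only this naming scheme; it is not how Flux stores data, and `kvs_put`, `kvs_get`, and `kvs_ls` are hypothetical names (for the real Python interface, see the `flux.kvs` module linked below).

```python
def kvs_put(store, key, value):
    """Store value under a dotted key, creating intermediate directories."""
    *dirs, name = key.split(".")
    node = store
    for d in dirs:
        node = node.setdefault(d, {})
    node[name] = value


def kvs_get(store, key):
    """Look up a dotted key; raises KeyError if any component is missing."""
    node = store
    for part in key.split("."):
        node = node[part]
    return node


def kvs_ls(store, directory=None):
    """List entries of the root, or of a dotted directory, like `flux kvs ls`."""
    node = store if directory is None else kvs_get(store, directory)
    return sorted(node)


if __name__ == "__main__":
    store = {}
    kvs_put(store, "resource.R", '{"version": 1}')
    kvs_put(store, "donuts.old_fashioned", "best")
    print(kvs_ls(store), kvs_ls(store, "resource"), kvs_get(store, "donuts.old_fashioned"))
```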

The KVS is such an essential component of Flux that we provide C and Python APIs to interact with it. To learn more about interacting with the KVS from these languages, take a look at these documentation pages:
* C's `flux_kvs_commit` [family of functions](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man3/flux_kvs_commit.html)
* C's `flux_kvs_copy` [family of functions](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man3/flux_kvs_copy.html)
* C's `flux_kvs_getroot` [family of functions](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man3/flux_kvs_getroot.html)
* C's `flux_kvs_lookup` [family of functions](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man3/flux_kvs_lookup.html)
* C's `flux_kvs_namespace_create` [family of functions](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man3/flux_kvs_namespace_create.html)
* C's `flux_kvs_txn_create` [family of functions](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man3/flux_kvs_txn_create.html)
* Python's `flux.kvs` [module](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/python/autogenerated/flux.kvs.html#module-flux.kvs)

In [64]:
## a humorous example of writing to the Flux KVS using the python bindings
from functools import partial

import flux
import flux.future
import flux.kvs

handle = flux.Flux()

def we_await_donuts(future):
    print("We await donuts")
    with flux.kvs.KVSTxn(flux_handle=handle) as kt:
        kt.mkdir("donuts")
        kt.put("donuts.old_fashioned", "best")
        kt.put("donuts.plain_raised", "good ol' standby")
        kt.put("donuts.apple_fritter", "excellent alternative")

def samir_brought_donuts(hand, watcher, revents, args, future):
    ## Fetch donut ratings from the flux key-value store
    donuts = flux.kvs.KVSDir(hand, ".donuts")
    for donut in donuts.files():
        print(f'{donut}: {flux.kvs.get(hand, donuts.key_at(donut))}')
    future.fulfill()

def donuts_are_here(future):
    print("The donut future has been fulfilled")

## Create a future for the donuts on our flux handle
donut_future = flux.future.FutureExt(we_await_donuts, flux_handle=handle)

## When the donut future (promise) is fulfilled, we tell everyone the donuts are here
donut_future.then(donuts_are_here)

## We schedule the donut future to be fulfilled in 6 seconds
donut_watcher = handle.timer_watcher_create(6, partial(samir_brought_donuts, future=donut_future))

donut_watcher.start()

handle.reactor_run()

We await donuts
apple_fritter: excellent alternative
old_fashioned: best
plain_raised: good ol' standby
The donut future has been fulfilled


0

<br>

## flux jobspec generation

Underlying much of the interaction with jobs is the creation of job specifications. When you submit a command or script from the command line or the Python SDK, under the hood (back to that plumbing reference) Flux creates a job specification, or "jobspec", that is passed further through Flux. Adding `--dry-run` to `flux submit` generates and prints the jobspec instead of running the job. Let's do that now to generate and view a jobspec for a simple "hello world" job.

In [61]:
! flux submit --dry-run echo hello potato 🥔️🍠️ > potato-job.txt
! cat potato-job.txt | jq

{
  "resources": [
    {
      "type": "slot",
      "count": 1,
      "with": [
        {
          "type": "core",
          "count": 1
        }
      ],
      "label": "task"
    }
  ],
  "tasks": [
    {
      "command": [
        "echo",
        "hello",
        "potato",
        "🥔️🍠️"
      ],
      "slot": "task",
      "count": {
        "per_slot"

You'll notice there is a lot of content in there! At this point you could write it to a file (as we did, saving it to `potato-job.txt`), edit it, and provide it directly to `flux job submit` to run it. Let's try that now.

In [62]:
! flux job submit ./potato-job.txt
! flux job attach $(flux job last)

ƒ3KaQMY4kX
hello potato 🥔️🍠️
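The edit step can also be scripted. Here is a minimal Python sketch that swaps the command in a jobspec file before you hand it to `flux job submit`; `with_command` and `rewrite_jobspec_file` are hypothetical helpers, and the demo uses a trimmed-down jobspec with the same `resources`/`tasks` shape as `potato-job.txt` (the real file also contains other fields, such as attributes).

```python
import json


def with_command(jobspec, command):
    """Return a copy of a jobspec whose first task runs `command` (a list of strings)."""
    spec = json.loads(json.dumps(jobspec))  # deep copy of plain JSON data
    spec["tasks"][0]["command"] = list(command)
    return spec


def rewrite_jobspec_file(src, dst, command):
    """Read a jobspec JSON file, swap its command, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        spec = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII arguments (like emoji) readable
        json.dump(with_command(spec, command), f, ensure_ascii=False)


if __name__ == "__main__":
    spec = {
        "resources": [{"type": "slot", "count": 1, "label": "task",
                       "with": [{"type": "core", "count": 1}]}],
        "tasks": [{"command": ["echo", "hello", "potato"], "slot": "task",
                   "count": {"per_slot": 1}}],
    }
    print(with_command(spec, ["echo", "hello", "sweet potato"])["tasks"][0]["command"])
```

For example, `rewrite_jobspec_file("potato-job.txt", "yam-job.txt", ["echo", "hello", "yam"])` followed by `flux job submit ./yam-job.txt` would run the edited job; the output filename here is purely illustrative.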


# This concludes Chapter 3.

## Some light reading...
## The structure of Flux instances

As mentioned in [Chapter 1](./01_flux_tutorial.ipynb), a Flux instance consists of one or more Flux brokers. A high-level depiction of the design of a Flux broker is shown in the figure below.

<figure>
<img src="../img/flux-broker-design.png">
<figcaption>
<i>Image created by Ian Lumsden for the Flux tutorials</i></figcaption>
</figure>

Each broker is a program built on top of the ∅MQ networking library. The broker contains two main components. First, the broker implements Flux-specific networking abstractions over ∅MQ, such as remote procedure call (RPC) and publication-subscription (pub-sub). Second, the broker contains several core services, such as PMI (for MPI support), run control support (for enabling automatic startup of other services), and, most importantly, broker module management. The remainder of a Flux broker's functionality comes from broker modules: specially designed services that the broker can deploy in independent OS threads. Some examples of broker modules provided by Flux include:
* Job scheduling (both traditional and hierarchical)
* [Fluxion](https://github.com/flux-framework/flux-sched) (Flux's advanced graph-based scheduler)
* Banks and accounting (for system-wide deployments of Flux)
* [PMIx](https://github.com/openpmix/openpmix) (for OpenMPI)
* An in-memory content store (useful for preloading data into pods on cloud)

When Flux starts, it launches one or more brokers across the resources it manages. By default, Flux will launch one broker per node, but this can be configured (e.g., with the `--test-size` flag to `flux start` shown in [Chapter 1](./01_flux_tutorial.ipynb)). After launching the brokers, Flux will designate one broker as the "leader" and the rest as "followers". The leader serves as the entry point into the Flux instance, and it serves as the starting point for most Flux commands. The distribution of brokers and the "leader-follower" designations are shown in the following figure:

<figure>
<img src="../img/flux-instance-pre-tbon.png">
<figcaption>
<i>Image created by Vanessa Sochat for Flux Framework Components documentation</i></figcaption>
</figure>

After launching the brokers and designating a leader, Flux uses the brokers' network abstractions to connect the brokers together into what we call the "tree-based overlay network" or TBON for short. This network is shown in the figure below. This overlay network connects brokers together in a pre-defined tree-based topology (e.g., *k*-ary and binomial). Whenever brokers or instances of distributed services running on top of the brokers need to communicate, they can send messages up and down this tree-structured network. This tree-structured network is used over alternative designs (e.g., all-to-all networks used by MPI) because it provides better scalability (by minimizing communication), security, and fault tolerance for a service-focused framework. More information about these benefits and Flux's overall design can be found in our [publications](https://flux-framework.org/publications/) (particularly our [2014 paper on Flux](https://ieeexplore.ieee.org/document/7103433) presented at ICPP).

<figure>
<img src="../img/flux-instance-w-tbon.png">
<figcaption>
<i>Image created by Vanessa Sochat for Flux Framework Components documentation</i></figcaption>
</figure>
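To make the idea of a tree-based topology concrete, here is a sketch of a *k*-ary tree in which rank *r*'s children are ranks *k·r*+1 through *k·r*+*k*. This breadth-first numbering is an assumption for illustration (Flux's actual topology is configurable), and the function names are hypothetical. It shows why the TBON scales: each broker talks only to its parent and at most *k* children, and a message from any broker reaches the leader in roughly log_k(N) hops.

```python
def tbon_parent(rank, k):
    """Parent rank in a k-ary tree numbered breadth-first from rank 0 (None for the root)."""
    return None if rank == 0 else (rank - 1) // k


def tbon_children(rank, k, size):
    """Child ranks of `rank` in a k-ary tree of `size` brokers."""
    first = k * rank + 1
    return list(range(first, min(first + k, size)))


def hops_to_leader(rank, k):
    """Number of tree edges a message travels from `rank` up to rank 0."""
    hops = 0
    while rank != 0:
        rank = tbon_parent(rank, k)
        hops += 1
    return hops


if __name__ == "__main__":
    size, k = 16, 2
    for r in range(4):
        print(r, "->", tbon_children(r, k, size))
    print("max hops:", max(hops_to_leader(r, k) for r in range(size)))
```

With 16 brokers and *k* = 2, no broker in this model is more than 4 hops from rank 0, whereas an all-to-all design would need every broker connected to every other.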

Flux functionality can be extended with modules, which you might think of like services. For Flux instances, additional services are typically implemented as broker modules that can be deployed across one or more brokers. Once deployed, these services can leverage the other components of the broker, including message routing over the TBON and services provided by other broker modules. As a result, broker modules allow for the creation of composable, easily deployable services for Flux instances.
    
    
# This concludes the Flux tutorial! 😄️

In this tutorial, we:
* Introduced Flux, and showed you how to get started
* Showed how to perform traditional batch scheduling with Flux
* Showed how to perform hierarchical scheduling with Flux
* Described the structure of Flux instances and Flux modules

And don't worry, you'll have more opportunities for using Flux! We hope you reach out to us on any of our [project repositories](https://flux-framework.org) and ask any questions that you have. We'd love your contribution to code, documentation, or just saying hello! 👋️ If you have feedback on the tutorial, please let us know so we can improve it for next year. 

> But what do I do now?

Feel free to experiment more with Flux here, or (for more freedom) in the terminal. You can try more of the examples in the `flux-workflow-examples` directory in the window to the left. If you're using a shared system like the one on the HPCIC AWS tutorial please be mindful of other users and don't run compute intensive workloads. If you're running the tutorial in a job on an HPC cluster... compute away! ⚾️

> Where can I learn to set this up on my own?

If you're interested in installing Flux on your cluster, take a look at the [system instance instructions](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/guide/admin.html). If you are interested in running Flux on Kubernetes, check out the [Flux Operator](https://github.com/flux-framework/flux-operator). 

> How can I run this tutorial on my own?

All materials for this tutorial (including other versions of the tutorial) can be found in our [Tutorials repo](https://github.com/flux-framework/Tutorials). To run this tutorial on your own, you can clone this repo, enter the directory for the version of the tutorial you want to run, and follow the instructions in that directory's README. All versions of this tutorial are designed to either be deployed to cloud (e.g., AWS) or be run locally using Docker.


## How can I learn more about Flux?

We've got lots of resources for learning about Flux!
- [https://flux-framework.org/](https://flux-framework.org/) Flux Framework portal for projects, releases, and publications.
 - [Flux Documentation](https://flux-framework.readthedocs.io/en/latest/).
 - [Flux Framework Cheat Sheet](https://flux-framework.org/cheat-sheet/)
 - [Flux Glossary of Terms](https://flux-framework.readthedocs.io/en/latest/glossary.html)
 - [Flux Comics](https://flux-framework.readthedocs.io/en/latest/comics/fluxonomicon.html) come and meet FluxBird - the pink bird who knows things!
 - [Flux Learning Guide](https://flux-framework.readthedocs.io/en/latest/guides/learning_guide.html) learn about what Flux does, how it works, and real research applications 
 - [Getting Started with Flux and Go](https://converged-computing.github.io/flux-go/)
 - [Getting Started with Flux in C](https://converged-computing.github.io/flux-c-examples/) *looking for contributors*

We also have talks and recent publications or work related to Flux in the cloud:

 - [Flux Alongside User-Space Kubernetes](https://arxiv.org/abs/2406.06995): A possible future for running Kubernetes in user space on a traditional HPC cluster (with Flux)!
 - [The Flux Operator](https://flux-framework.org/flux-operator/getting_started/user-guide.html): For deploying an entire Flux cluster in seconds in Kubernetes.
 - [Fluence, a scheduler-plugin for Kubernetes](https://github.com/flux-framework/flux-k8s): to schedule pods with Fluxion.

And, of course, you can always reach out to us on any of our [project repositories](https://flux-framework.org) and ask any questions that you have. We'd love your contribution to code, documentation, or just saying hello!

![https://flux-framework.org/flux-operator/_static/images/flux-operator.png](https://flux-framework.org/flux-operator/_static/images/flux-operator.png)

>> See you next year! 👋️😎️

## flux archive 📚️

<div class="alert alert-block" style="background-color:lightgreen">
<span style="font-weight:600">Description:</span> Creating file and content archives to access later and between ranks
</div>

As Flux is used more in cloud environments, we might find ourselves in a situation where we have a cluster without a shared filesystem. The `flux archive` command helps with this situation. At a high level, `flux archive` allows us to save named pieces of data (e.g., files) to the Flux KVS for later retrieval.

When using `flux archive`, we first have to create a named archive. In the code below, we will create a text file and then save it into an archive using `flux archive`. Note that, for larger files, you can speed up the creation and extraction of archives by using the `--mmap` flag.

In [35]:
!echo "Sweet dreams 🌚️ are made of cheese, who am I to diss a brie? 🧀️" > shared-file.txt
!flux archive create --name myarchive --directory $(pwd) shared-file.txt

When we run this code, we are creating an archive in the leader broker. Now that the archive is created, we will want to extract its contents onto the other nodes of our cluster. To do this, we first need to ensure that the directory that we will extract into exists on those nodes. This can be done using `flux exec`. The `flux exec` command will execute a command on the nodes associated with specified brokers. Let's use `flux exec` to run `mkdir` on all the nodes of our cluster except the leader broker's node.

In [36]:
!flux exec -r all -x 0 mkdir -p $(pwd)

The flags provided to `flux exec` do the following:
* `-r all`: run across all brokers in the Flux instance
* `-x 0`: don't run on broker 0 (i.e., the leader broker)

Now that the directory has been created on all our nodes, we can extract the archive onto those nodes by combining `flux exec` and `flux archive extract`.

In [37]:
!flux exec -r all -x 0 flux archive extract --name myarchive --directory $(pwd) shared-file.txt

flux-archive: shared-file.txt: write: Attempt to overwrite existing file
flux-archive: shared-file.txt: write: Attempt to overwrite existing file
flux-archive: shared-file.txt: write: Attempt to overwrite existing file
[1-3]: Exit 1

In this tutorial, all four brokers run on the same host (see the `hostlist` attribute above), so ranks 1-3 already see the `shared-file.txt` that we created on rank 0, and `flux archive extract` refuses to overwrite it. On a cluster without a shared filesystem, this step would instead write a copy of the file on each node.

Finally, when we're done with the archive, we can remove it with `flux archive remove`.

In [38]:
!flux archive remove --name myarchive

Note that `flux archive` was named `flux filemap` in earlier versions of Flux.
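The steps in this section can be collected into one script. The Python sketch below (hypothetical helper names, using only the commands and flags shown above) returns the command lines in the order they must run and executes them with `subprocess`, removing the archive even if an earlier step fails, as the extract step did in this shared-host tutorial.

```python
import subprocess


def distribute_file_argvs(filename, directory, name="myarchive"):
    """Command lines, in order, for copying `filename` to ranks 1..N-1 (hypothetical helper)."""
    others = ["flux", "exec", "-r", "all", "-x", "0"]
    return [
        # 1. Store the file in a named archive in the KVS
        ["flux", "archive", "create", "--name", name, "--directory", directory, filename],
        # 2. Make sure the target directory exists on every other rank
        others + ["mkdir", "-p", directory],
        # 3. Extract the file from the archive on every other rank
        others + ["flux", "archive", "extract", "--name", name,
                  "--directory", directory, filename],
        # 4. Remove the archive from the KVS
        ["flux", "archive", "remove", "--name", name],
    ]


def distribute_file(filename, directory, name="myarchive"):
    """Run the steps above; requires a running Flux instance."""
    *steps, cleanup = distribute_file_argvs(filename, directory, name)
    try:
        for argv in steps:
            subprocess.run(argv, check=True)
    finally:
        # Always try to remove the archive, even if an earlier step failed
        subprocess.run(cleanup, check=False)
```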
