get through steps of providing command for final burst
For the time being, while we are developing, it is nice to have all the configs
generated along with a start script that we can run manually. This commit
adds all of these steps; next I need to move it onto an allocation and test
how to actually turn on the other hosts.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch committed Jul 14, 2023
1 parent 34126a6 commit 1fc934f
Showing 7 changed files with 258 additions and 63 deletions.
1 change: 0 additions & 1 deletion README.md
@@ -7,7 +7,6 @@ in the context of simply starting a set of nodes that are alongside one another
in an allocation.

For instructions, see the [main flux-burst repository](https://github.com/converged-computing/flux-burst).
Tutorials are available under the [flux operator](https://github.com/flux-framework/flux-operator/tree/main/examples/experimental/bursting)

![https://raw.githubusercontent.com/converged-computing/flux-burst/main/docs/assets/img/logo.png](https://raw.githubusercontent.com/converged-computing/flux-burst/main/docs/assets/img/logo.png)

102 changes: 100 additions & 2 deletions example/README.md
@@ -13,6 +13,9 @@ lead broker. We are choosing this design because likely a local burst will need
to do (and automate) both steps.

```bash
# Ensure the installed executable is on your path
export PATH=$HOME/.local/bin:$PATH

# If you are using a custom flux install
$ python burst-slurm-allocation.py --config-dir ./configs --flux-root /path/to/flux/root --network-device eno1

@@ -22,6 +25,101 @@
$ python burst-slurm-allocation.py --config-dir ./configs --network-device eno1
# Development with one node (e.g., DevContainer)
$ python3 burst-slurm-allocation.py --config-dir ./configs --network-device eno1 --hostnames $(hostname)
```
```
🌳️ Flux root set to /usr
🦩️ Writing flux config to /workspaces/flux-burst-local/example/configs/system/system.toml
🌀️ Done! Use the following command to start your Flux instance and burst!
It is also written to /workspaces/flux-burst-local/example/configs/start.sh
/usr/bin/flux start --broker-opts --config /workspaces/flux-burst-local/example/configs -Stbon.fanout=256 -Srundir=/workspaces/flux-burst-local/example/configs/run -Sstatedir=/workspaces/flux-burst-local/example/configs/run -Slocal-uri=local:///workspaces/flux-burst-local/example/configs/run/local -Slog-stderr-level=7 -Slog-stderr-mode=local /home/vscode/.local/bin/flux-burst-local --config-dir /workspaces/flux-burst-local/example/configs --flux-root /usr
```
The script above sets up the configs and gives you a command that uses them
to start a Flux instance and then run a more standard flux-burst plugin flow (with the same
local configs) so you can burst to local instances. Note that the flux root should have lib, bin, libexec, etc. in it; it's the `--prefix`
you chose for the install. Here is what the generated tree looks like under configs:

```bash
tree ./configs
```
```console
$ tree example/configs/
example/configs/
├── curve.cert
├── R
├── run
│   └── content.sqlite
├── start.sh
└── system
└── system.toml

2 directories, 5 files
```
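The `system.toml` under `system` uses flux's bootstrap TOML format. The sketch below is only illustrative of the kind of content that ends up there; the keys come from flux-config-bootstrap, but the values (port, interface, hostnames) are assumptions for this walkthrough rather than the exact generated file:

```toml
# Illustrative only: real values come from the generator and your nodes
[bootstrap]
curve_cert = "/workspaces/flux-burst-local/example/configs/curve.cert"
default_port = 8050
default_bind = "tcp://eno1:%p"
default_connect = "tcp://%h:%p"
hosts = [
    { host = "c35948d1ed31" },
]
```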
The sockets (e.g., "local") will be generated under `run`. Here is what startup looks like before the
secondary brokers have been started:

```console
$ bash configs/start.sh
broker.debug[0]: insmod connector-local
broker.info[0]: start: none->join 0.6186ms
broker.info[0]: parent-none: join->init 0.034561ms
connector-local.debug[0]: allow-guest-user=false
connector-local.debug[0]: allow-root-owner=false
broker.debug[0]: insmod barrier
broker.debug[0]: insmod content-sqlite
content-sqlite.debug[0]: /workspaces/flux-burst-local/example/configs/run/content.sqlite (0 objects) journal_mode=WAL synchronous=NORMAL
broker.debug[0]: content backing store: enabled content-sqlite
broker.debug[0]: insmod kvs
broker.debug[0]: insmod kvs-watch
broker.debug[0]: insmod resource
resource.debug[0]: reslog_cb: resource-init event posted
resource.debug[0]: reslog_cb: resource-define event posted
broker.debug[0]: insmod cron
cron.info[0]: synchronizing cron tasks to event heartbeat.pulse
broker.debug[0]: insmod job-manager
job-manager.debug[0]: jobtap plugin .history registered method job-manager.history.get
job-manager.info[0]: restart: 0 jobs
job-manager.info[0]: restart: 0 running jobs
job-manager.info[0]: restart: checkpoint.job-manager not found
job-manager.debug[0]: restart: max_jobid=ƒ1
job-manager.debug[0]: duration-validator: updated expiration to 0.00
broker.debug[0]: insmod job-info
broker.debug[0]: insmod job-list
job-list.debug[0]: job_state_init_from_kvs: read 0 jobs
broker.debug[0]: insmod job-ingest
job-ingest.debug[0]: configuring validator with plugins=(null), args=(null) (enabled)
job-ingest.debug[0]: fluid ts=1ms
broker.debug[0]: insmod job-exec
job-exec.debug[0]: using default shell path /usr/libexec/flux/flux-shell
broker.debug[0]: insmod heartbeat
broker.info[0]: rc1.0: running /etc/flux/rc1.d/01-sched-fluxion
broker.debug[0]: insmod sched-fluxion-resource
sched-fluxion-resource.info[0]: version 0.27.0-38-ge0b49993
sched-fluxion-resource.debug[0]: mod_main: resource module starting
sched-fluxion-resource.warning[0]: create_reader: allowlist unsupported
sched-fluxion-resource.debug[0]: resource graph datastore loaded with rv1exec reader
sched-fluxion-resource.info[0]: populate_resource_db: loaded resources from core's resource.acquire
sched-fluxion-resource.debug[0]: resource status changed (rankset=[all] status=DOWN)
sched-fluxion-resource.debug[0]: mod_main: resource graph database loaded
broker.debug[0]: insmod sched-fluxion-qmanager
sched-fluxion-qmanager.info[0]: version 0.27.0-38-ge0b49993
sched-fluxion-qmanager.debug[0]: service_register
sched-fluxion-qmanager.debug[0]: enforced policy (queue=default): fcfs
sched-fluxion-qmanager.debug[0]: effective queue params (queue=default): default
sched-fluxion-qmanager.debug[0]: effective policy params (queue=default): default
sched-fluxion-qmanager.debug[0]: handshaking with sched-fluxion-resource completed
job-manager.debug[0]: scheduler: hello
job-manager.debug[0]: scheduler: ready unlimited
sched-fluxion-qmanager.debug[0]: handshaking with job-manager completed
broker.info[0]: rc1.0: running /etc/flux/rc1.d/02-cron
broker.info[0]: rc1.0: /etc/flux/rc1 Exited (rc=0) 0.4s
broker.info[0]: rc1-success: init->quorum 0.3982s
broker.debug[0]: groups: broker.online=0
broker.info[0]: online: c35948d1ed31 (ranks 0)
broker.info[0]: quorum-full: quorum->run 0.100979s
resource.debug[0]: reslog_cb: online event posted
sched-fluxion-resource.debug[0]: resource status changed (rankset=[0] status=UP)
TODO START OTHER WORKERS
...
```
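Once the lead broker is up, you can talk to it from another shell through the local socket under `run`. A minimal sketch, assuming the same config directory as above (the URI matches the `-Slocal-uri` option in the generated start command):

```bash
# Point flux commands at the lead broker started by start.sh
export FLUX_URI=local:///workspaces/flux-burst-local/example/configs/run/local

# Sanity checks against the running instance
flux getattr size
flux resource list
```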

I wasn't able to get an allocation, so I'll continue developing this tomorrow.
31 changes: 3 additions & 28 deletions example/burst-slurm-allocation.py
@@ -58,36 +58,11 @@ def main():
    # {'gke': <module 'fluxburst_gke' from '/home/flux/.local/lib/python3.8/site-packages/fluxburst_gke/__init__.py'>}

    # Load our plugin and provide the dataclass to it!
    # Unlike other plugins, the local one handles setting up the flux instance
    # (and then issuing the burst). This could change (e.g., if we have already
    # generated configs or started the cluster).
    client.load("local", params)

    # Sanity check loaded
    print(f"flux-burst client is loaded with plugins for: {client.choices}")

    # We are using the default algorithms to filter the job queue and select jobs.
    # If we weren't, we would add them via:
    # client.set_ordering()
    # client.set_selector()

    # Here is how we can see the jobs that are contenders to burst!
    # client.select_jobs()

    # Now let's run the burst! The active plugins will determine if they
    # are able to schedule a job, and if so, will do the work needed to
    # burst. Unmatched jobs (those we weren't able to schedule) are
    # returned, maybe to do something with? Note that the default mock
    # generates an N=4 job. For compute engine that will be 3 compute
    # nodes and 1 login node.
    unmatched = client.run_burst()
    assert not unmatched
    plugin = client.plugins["compute_engine"]
    print(
        f"Terraform configs and working directory are found at {plugin.params.terraform_dir}"
    )
    input("Press Enter when you are ready to destroy...")

    # Get a handle to the plugin so we can cleanup!
    plugin.cleanup()


if __name__ == "__main__":
    main()
3 changes: 2 additions & 1 deletion fluxburst_local/__init__.py
@@ -17,9 +17,10 @@ def init(dataclass, **kwargs):
    this means starting another flux instance with the resources.
    If SLURM we assume we are inside a SLURM allocation.
    """
    from .plugin import FluxBurstSlurm, SlurmBurstParameters
    from .plugin import FluxBurstLocal, FluxBurstSlurm, SlurmBurstParameters

    if isinstance(dataclass, SlurmBurstParameters):
        # Set variables from slurm
        FluxBurstSlurm.setup(dataclass)
        return FluxBurstSlurm(dataclass, **kwargs)
    return FluxBurstLocal(dataclass, **kwargs)
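A hedged sketch of how a caller exercises this dispatch; only parameters that already appear elsewhere in this commit are used, and the values are illustrative:

```python
from fluxburst_local import init
from fluxburst_local.plugin import BurstParameters

# Not a SlurmBurstParameters instance, so init() falls through to FluxBurstLocal
params = BurstParameters(
    flux_root="/usr",        # the --prefix of the flux install
    config_dir="./configs",  # where configs were (or will be) generated
    regenerate=False,        # reuse configs that already exist
)
plugin = init(params)
```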
75 changes: 75 additions & 0 deletions fluxburst_local/flux.py
@@ -0,0 +1,75 @@
#!/usr/bin/env python

# This is a script called by flux-burst-local, and it's assumed that files
# are generated in the directed config directory, and we've started the
# main broker and can now burst.

import argparse

from fluxburst.client import FluxBurst

# How we provide custom parameters to a flux-burst plugin
from fluxburst_local.plugin import BurstParameters


def get_parser():
    parser = argparse.ArgumentParser(
        description="Flux Local Broker Start",
        formatter_class=argparse.RawTextHelpFormatter,
    )
    parser.add_argument("--config-dir", help="Configuration directory for flux")
    parser.add_argument(
        "--flux-root", help="Flux root (should correspond with broker running Flux)"
    )
    return parser


def main():
    parser = get_parser()
    args, _ = parser.parse_known_args()

    # Create the dataclass for the plugin config
    # We use a dataclass because it does implicit validation of required params, etc.
    params = BurstParameters(
        flux_root=args.flux_root,
        config_dir=args.config_dir,
        # This says to not re-generate our configs!
        regenerate=False,
    )
    assert params
    client = FluxBurst()

    # For debugging, here is a way to see plugins available
    # import fluxburst.plugins as plugins
    # print(plugins.burstable_plugins)
    print("TODO START OTHER WORKERS")

    # Load our plugin and provide the dataclass to it!
    # client.load("local", params)

    # Sanity check loaded
    client = FluxBurst()
    print(f"flux-burst client is loaded with plugins for: {client.choices}")

    # We are using the default algorithms to filter the job queue and select jobs.
    # If we weren't, we would add them via:
    # client.set_ordering()
    # client.set_selector()

    # Here is how we can see the jobs that are contenders to burst!
    # client.select_jobs()

    # Now let's run the burst! The active plugins will determine if they
    # are able to schedule a job, and if so, will do the work needed to
    # burst. Unmatched jobs (those we weren't able to schedule) are
    # returned, maybe to do something with? Note that the default mock
    # generates an N=4 job. For compute engine that will be 3 compute
    # nodes and 1 login node.
    unmatched = client.run_burst()
    assert not unmatched
    plugin = client.plugins["local"]
    print(plugin)


if __name__ == "__main__":
    main()
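Since the module keeps an `if __name__ == "__main__"` guard, it can also be run directly while iterating, instead of going through the `flux-burst-local` entry point that the generated start command uses (both take the same `--config-dir` and `--flux-root` flags). A minimal sketch, assuming the package is installed and the lead broker from the example above is already running (e.g., with `FLUX_URI` pointed at its local socket):

```bash
python -m fluxburst_local.flux \
    --config-dir /workspaces/flux-burst-local/example/configs \
    --flux-root /usr
```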
