# AI Agent Integration with HPC Slurm Jobs

In this second tutorial, you will use an **AI agent framework** to create an AI agent that helps you write code and generate Slurm job scripts for submitting that code to an HPC system's scheduler.

This agent framework is built around the `Agent` class (see `./TACC_exAI/agent.py`) and supports multiple **actions** and features, such as:
- Creating and running multi-step plans.
- Summarizing context and replying to the user.
- Generating code snippets.
- Writing Slurm job scripts for HPC clusters.
- Optional runtime tracing with Arize Phoenix for observability.

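As in Tutorial 1, the first examples below run the language model on a local **Ollama** server through an OpenAI-compatible API, so everything stays on your machine while still using an LLM backend. The later Slurm examples set `force_ollama=False`, so they may use a different model backend.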
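You can talk to an OpenAI-compatible endpoint using only the Python standard library. The minimal sketch below assumes Ollama's default address (`http://localhost:11434`) and its `/v1/chat/completions` route. The helper names are hypothetical and are not part of `TACC_exAI`.

```python
import json
import urllib.request

# Assumption: Ollama's default host/port; change this if your server runs elsewhere.
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def build_chat_request(model, user_message, system_message=None):
    """Build an OpenAI-style chat-completions payload (hypothetical helper)."""
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages, "stream": False}


def send_chat_request(payload, base_url=OLLAMA_BASE_URL, timeout=300):
    """POST the payload to the chat-completions route and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # OpenAI-compatible responses put the text under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("qwen3-coder:30b-20k", "Say hello."))` should return the model's reply as a string.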

## Planning versus ReAct Style Agents

In the data analysis tutorial, we built a ReAct-style agent, which chooses each action only after observing the result of the previous one. In this tutorial we will build a planning agent, which first generates a plan covering a series of actions and then performs them in sequence. Planning up front lets the agent coordinate long-range dependencies between actions, so it can tackle longer tasks autonomously.
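The two styles differ mainly in *when* decisions are made. The sketch below contrasts the two control loops using hypothetical stand-in callables (`choose_next`, `make_plan`, `execute`) in place of LLM calls and tool execution. It illustrates the idea only and is not the `TACC_exAI` implementation.

```python
from typing import Callable, List, Optional

# Hypothetical stand-ins for LLM calls and tool execution:
#   choose_next(task, history) -> next action name, or None when done
#   make_plan(task)            -> full list of action names, decided up front
#   execute(action, history)   -> observation string produced by the action


def react_loop(task: str,
               choose_next: Callable[[str, List[str]], Optional[str]],
               execute: Callable[[str, List[str]], str],
               max_steps: int = 10) -> List[str]:
    """ReAct style: pick one action, observe its result, then decide again."""
    history: List[str] = []
    for _ in range(max_steps):
        action = choose_next(task, history)  # each decision sees all prior observations
        if action is None:
            break
        history.append(execute(action, history))
    return history


def plan_then_execute(task: str,
                      make_plan: Callable[[str], List[str]],
                      execute: Callable[[str, List[str]], str]) -> List[str]:
    """Planning style: decide the whole sequence first, then run it in order."""
    plan = make_plan(task)  # one up-front decision covering every step
    history: List[str] = []
    for action in plan:
        history.append(execute(action, history))
    return history
```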

## Configuring a Long Context Model on the Local Ollama Backend

The `Agent` class in `agent.py` is designed to be flexible: you can run it from the command line or import and use it as a Python class. In this tutorial, you will work with it directly as a class object inside this notebook.

The code below configures our Ollama instance with a 20,000-token context limit for the `qwen3-coder:30b` model. We will point our agent to this new long-context model, `qwen3-coder:30b-20k`.

In [1]:
# Create the extended-context model in Ollama
import subprocess
import os

# Create the Modelfile in the current directory
modelfile_content = '''FROM qwen3-coder:30b

PARAMETER num_ctx 20000

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{- end }}{{- if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{- end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""'''

with open("Modelfile", "w") as f:
    f.write(modelfile_content)

# Run the ollama create command
model_name = "qwen3-coder:30b-20k"
print(f"Creating model {model_name} with 20k context...")

try:
    result = subprocess.run(
        ["ollama", "create", model_name, "-f", "Modelfile"],
        capture_output=True,
        text=True,
        check=True
    )
    print("✅ Model created successfully!")
    print(result.stdout)
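except subprocess.CalledProcessError as e:
    print("❌ Error creating model:")
    print(e.stderr)
except FileNotFoundError:
    # Raised when the `ollama` executable is not on PATH
    print("❌ The `ollama` command was not found; make sure Ollama is installed and on your PATH.")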
finally:
    # Optionally clean up Modelfile
    if os.path.exists("Modelfile"):
        os.remove("Modelfile")

Creating model qwen3-coder:30b-20k with 20k context...
✅ Model created successfully!
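You can check that the new model really has the larger context window by querying its parameters. This sketch makes two assumptions: the `ollama` CLI is on your `PATH`, and `ollama show <model> --parameters` prints one whitespace-separated `name value` pair per line. The helper names are hypothetical.

```python
import subprocess


def parse_ollama_parameters(text):
    """Parse `ollama show --parameters` output into {name: [values, ...]}.

    Keys such as `stop` can appear several times, so values are collected in lists.
    """
    params = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            params.setdefault(parts[0], []).append(parts[1].strip())
    return params


def get_context_length(model_name):
    """Return the model's num_ctx setting as an int, or None if it is not set."""
    result = subprocess.run(
        ["ollama", "show", model_name, "--parameters"],
        capture_output=True, text=True, check=True,
    )
    values = parse_ollama_parameters(result.stdout).get("num_ctx")
    return int(values[0]) if values else None
```

If the model was created successfully, `get_context_length("qwen3-coder:30b-20k")` should return `20000`.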



## 1. Single-Tool Chatbot

We will start with the **simplest possible configuration** of this agent: give it **one tool** and let it execute that tool once, so it behaves like a basic chatbot.

Key choices for this first example:
- Use an **isolated session** so the agent does *not* load or reuse any previous conversation history.
- Disable session saving with `no_save=True` so no history is written to disk.
- Configure `self.actions` to contain only `SummarizeAndReplyAction`, so the agent simply summarizes the user input and responds.

Conceptually, you can think of it as: *"The agent receives a message, runs a single Summarize-and-Reply step, and returns a friendly answer."*
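To illustrate the single-step pipeline, consider a toy agent that stores the conversation and runs each configured action once, in order. `MiniAgent`, `EchoSummarizeAction`, and the `execute()` method name are all hypothetical. The real `Agent` and `SummarizeAndReplyAction` classes call the LLM and have their own interfaces.

```python
class EchoSummarizeAction:
    """Hypothetical stand-in for SummarizeAndReplyAction: no LLM, just a canned reply."""

    def __init__(self, agent):
        self.agent = agent  # actions keep a reference so they can read/write agent state

    def execute(self):
        prompt = self.agent.messages[-1]["content"]
        reply = f"You said {len(prompt.split())} words."
        self.agent.messages.append({"role": "assistant", "content": reply})


class MiniAgent:
    """Toy agent: run every action in self.actions once, in order."""

    def __init__(self):
        self.actions = []
        self.messages = []

    def run(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        for action in self.actions:
            action.execute()
        return self.messages[-1]["content"]  # the final assistant reply
```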

In [1]:
import sys
import os

# Add the agent framework code to our system path so we can import it
notebook_dir = os.getcwd()
notebook_dir = os.path.join(notebook_dir, "TACC_exAI")
if notebook_dir not in sys.path:
    sys.path.insert(0, notebook_dir)

# import agent framework
from TACC_exAI.agent import Agent
from TACC_exAI.actions.summarize_and_reply import SummarizeAndReplyAction

# Instantiate the agent as a single-turn chatbot:
agent = Agent(
    experiment=True,           # run one non-interactive cycle
    experiment_prompt=None,    # we'll set the prompt manually below
    force_ollama=True,         # use our local ollama backend as our llm inferencing provider
    default_action_model_name="qwen3-coder:30b-20k",  # set the model name to use
    isolated_session=True,     # do not load logs of other chats
    no_save=True,              # do not save current conversation to chat history logs
    display_mode="light",      # configure console display color palette
    mode="dev",                # verbose logging outputs including prompts
)

# The agent will run each action listed in this array in series when agent.run() is called
# Note: we need to give our action a reference to the agent so it can read/write to agent 
# runtime variables
agent.actions = [SummarizeAndReplyAction(agent=agent)]

# Give the agent a simple prompt: ask for a dad joke.
agent.experiment_prompt = (
    "Tell me a science-themed dad joke about the experiment below:\n"
    "Deep in the forest, a team of ambitious squirrels launched the world’s first acorn-based computing experiment. Using hollowed acorns as data nodes and twigs for circuitry, the rodents organized a network that stored information in nut patterns rather than binary code. Their system processed “bit-nuts” faster than expected—especially when powered by leftover winter energy reserves. Researchers observed the squirrels debugging by gnawing on faulty connections, proving nature’s resourcefulness knows no bounds. The Acorn Computer 1.0 might not browse the web, but it has successfully cracked the code of woodland ingenuity."
)

# Run the agent once and capture the reply.
response = agent.run()
print("\nAgent response:")
print(response)


Agent response:
Why don't squirrels ever need computer repairs? Because they just gnaw on the faulty connections! Though I suppose that's why their Acorn Computer 1.0 only runs at 300 MHz... it's powered by leftover winter energy reserves and 'bit-nuts'! The debugging process is so inefficient that even the acorn's CPU has to take a break and nibble on the power supply for energy.


In this configuration:
- `agent.actions` contains only `SummarizeAndReplyAction`, so the agent’s pipeline is a single step: summarize the input and respond to the user.
- Because `experiment=True`, `agent.run()` processes a single prompt (stored in `agent.experiment_prompt`) and then returns the final assistant reply instead of entering an interactive loop.
- Using `isolated_session=True` and `no_save=True` prevents any previous or future sessions from influencing this run.

The effect is a simple, well-contained chatbot that behaves similarly to the structured joke generator you built in Tutorial 1, but now implemented on top of a general-purpose agent framework.

## 2. Planning and Tool Use Agent

Now we will enable more of the agent framework’s features. Instead of executing a single fixed action, the agent will:

1. **Create a plan** using structured generation that describes a sequence of actions to accomplish the user’s request.
2. **Run the plan**, invoking the appropriate tools (actions) in order and revising the plan if an action fails.

We will configure:
- `self.actions` (the pipeline) to include `CreatePlanAction` followed by `RunPlanAction`.
- `self.available_actions` (the tool set) to include:
  - `SummarizeAndReplyAction`: summarize the messages in the agent's context and compose a new message to the user.
  - `GenerateCodeAction`: write Python code to solve the problem, execute it, and report the execution outputs.
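Restricting `available_actions` matters because the plan is model-generated text. The minimal sketch below shows how a plan can be checked against the allowed tool set before anything runs. It assumes a hypothetical JSON plan format; the real `CreatePlanAction` schema may differ.

```python
import json

# Hypothetical plan format: a JSON list of {"action": <tool name>, "instructions": <text>}.
AVAILABLE_TOOLS = {"SummarizeAndReplyAction", "GenerateCodeAction"}


def parse_plan(raw_json, available=AVAILABLE_TOOLS):
    """Parse an LLM-produced plan and reject steps that name unknown tools."""
    steps = json.loads(raw_json)
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must be a non-empty JSON list")
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i}: expected a JSON object")
        name = step.get("action")
        if name not in available:
            raise ValueError(f"step {i}: unknown action {name!r}")
        if not str(step.get("instructions", "")).strip():
            raise ValueError(f"step {i}: missing instructions")
    return steps
```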

In [2]:
from actions.create_plan import CreatePlanAction
from actions.run_plan import RunPlanAction
from actions.generate_code import GenerateCodeAction

# Instantiate another agent configured for planning + tool use.
plan_agent = Agent(
    experiment=True,           # run one non-interactive cycle
    experiment_prompt=None,    # we'll set the prompt manually below
    force_ollama=True,         # use our local ollama backend as our llm inferencing provider
    default_action_model_name="qwen3-coder:30b-20k",  # use the long-context model created above
    isolated_session=True,     # do not load logs of other chats
    no_save=True,              # do not save current conversation to chat history logs
    display_mode="light",      # configure console display color palette
    mode="dev",                # verbose logging outputs including prompts
)

# Configure the pipeline actions: first create a plan, then run it.
plan_agent.actions = [
    CreatePlanAction(agent=plan_agent),
    RunPlanAction(agent=plan_agent),
]

# Restrict the available tools the plan can choose from.
plan_agent.available_actions = [
    SummarizeAndReplyAction(agent=plan_agent),
    GenerateCodeAction(agent=plan_agent),
]

# User task: write and reason about analysis code.
plan_agent.experiment_prompt = (
    "I want you to first write code that"
    " reports the speed of searching a random 1000 object test array using two different methods."
    " After writing the code, message me with a final report summarizing the code output benchmark"
    " results and a theory for why one method is faster."
    
)

plan_response = plan_agent.run()
print("\nAgent response:")
print(plan_response)
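The agent's code changes from run to run, so it helps to know what a reasonable answer looks like. Below is one possible solution to this prompt. It is a hypothetical reference, not the agent's output. It compares a linear scan with binary search from the standard-library `bisect` module, and it sorts the data once, outside the timed region.

```python
import bisect
import random
import timeit


def linear_search(items, target):
    """O(n): scan the list until the target is found."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1


def binary_search(sorted_items, target):
    """O(log n): requires sorted input."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1


def benchmark(n=1000, repeats=1000, seed=0):
    """Time `repeats` lookups of existing values with each method; returns seconds."""
    rng = random.Random(seed)
    data = rng.sample(range(n * 10), n)  # n distinct random integers
    sorted_data = sorted(data)           # sorting cost is deliberately not timed
    targets = [rng.choice(data) for _ in range(repeats)]
    t_linear = timeit.timeit(lambda: [linear_search(data, t) for t in targets], number=1)
    t_binary = timeit.timeit(lambda: [binary_search(sorted_data, t) for t in targets], number=1)
    return t_linear, t_binary
```

On a 1,000-element array, binary search should win by a wide margin because each lookup touches about 10 elements instead of about 500 on average.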

### How the planning flow works

With this configuration, a typical execution sequence looks like:

1. **CreatePlanAction** inspects the prompt and generates a structured plan: for example, steps like *"call GenerateCodeAction with these instructions"* followed by *"call SummarizeAndReplyAction with the results"*.
2. **RunPlanAction** executes each step in order, handing off context (such as generated code or intermediate outputs) between actions.
3. **GenerateCodeAction** produces the requested Python code, executes it in an isolated environment, revises the code when it hits errors, and reports the console outputs back to the agent.
4. **SummarizeAndReplyAction** summarizes the code outputs and responds with a user-friendly summary as the final answer.
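The execution half of this flow can be sketched as a small executor. Each tool receives the outputs of earlier steps, and a failed step goes to a `revise` callback and is then retried. `run_plan`, the tool signature, and `revise` are hypothetical names; the real `RunPlanAction` logic may differ.

```python
from typing import Callable, Dict, List


def run_plan(plan: List[Dict[str, str]],
             tools: Dict[str, Callable[[str, List[str]], str]],
             revise: Callable[[Dict[str, str], Exception], Dict[str, str]],
             max_retries: int = 1) -> List[str]:
    """Execute plan steps in order, passing earlier outputs to later steps.

    On a failure, ask `revise` for a replacement step and retry up to max_retries times.
    """
    outputs: List[str] = []
    for step in plan:
        attempt = 0
        while True:
            try:
                # Each tool sees its own instructions plus all previous outputs
                outputs.append(tools[step["action"]](step["instructions"], outputs))
                break
            except Exception as err:  # any tool failure triggers a revision
                if attempt >= max_retries:
                    raise
                step = revise(step, err)
                attempt += 1
    return outputs
```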

In later steps of this tutorial, you will extend this pattern by enabling additional actions such as **GenerateSlurmScriptAction** so that the agent can not only write analysis code but also author complete Slurm job scripts that you can submit on your HPC system.

By the end of this tutorial, you will have an end-to-end workflow in which an AI agent plans, generates, and refines Slurm job scripts for your HPC workloads. In the next tutorial, you can extend this pattern to more complex agents that analyze job outputs, adapt parameters, or orchestrate multi-stage HPC pipelines.

## 3. Generating Slurm scripts

This section demonstrates how the AI agent can author compute tasks and prepare them to run on an HPC system. Below, we configure the agent to write and test code, and then to generate a Slurm job script that runs that code on multiple nodes with GPU resources.

The agent will:

1) Write and test a short Python script that records the hostname and general system statistics of the node it runs on.

2) Generate a Slurm job script that would launch this Python code on multiple nodes, one task per node, and then aggregate the results.
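Here is one possible version of step 1, for comparison with what the agent produces. It assumes the third-party `psutil` package and falls back to standard-library values when `psutil` is missing. The function names and the exact statistics are illustrative choices, not the agent's output.

```python
import os
import platform
import socket
from datetime import datetime, timezone

try:
    import psutil  # third-party; may not be installed on every node
except ImportError:
    psutil = None


def collect_node_info():
    """Gather the hostname plus some general system statistics."""
    info = {
        "hostname": socket.gethostname(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "platform": platform.platform(),
        "logical_cpus": os.cpu_count(),
    }
    if psutil is not None:
        mem = psutil.virtual_memory()
        info["memory_total_gb"] = round(mem.total / 1e9, 2)
        info["memory_used_percent"] = mem.percent
        info["cpu_percent"] = psutil.cpu_percent(interval=0.5)
    return info


def write_node_info(directory="."):
    """Write node_info_<hostname>.txt, print its contents, and return the path."""
    info = collect_node_info()
    text = "\n".join(f"{key}: {value}" for key, value in info.items())
    path = os.path.join(directory, f"node_info_{info['hostname']}.txt")
    with open(path, "w") as f:
        f.write(text + "\n")
    print(text)
    return path


if __name__ == "__main__":
    write_node_info()
```

When launched with `srun` and one task per node, each node runs this script independently and writes its own `node_info_<hostname>.txt`.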



In [1]:
import sys
import os

# Add the agent framework code to our system path so we can import it
notebook_dir = os.getcwd()
notebook_dir = os.path.join(notebook_dir, "TACC_exAI")
if notebook_dir not in sys.path:
    sys.path.insert(0, notebook_dir)

# import agent framework
from TACC_exAI.agent import Agent
from TACC_exAI.actions.summarize_and_reply import SummarizeAndReplyAction
from actions.create_plan import CreatePlanAction
from actions.run_plan import RunPlanAction
from actions.generate_code import GenerateCodeAction

from TACC_exAI.actions.generate_slurm_script import GenerateSlurmScriptAction


# Initialize the agent
agent = Agent(
    experiment=True,
    experiment_prompt=None,
    force_ollama=False,
    # force_ollama=True,
    # default_action_model_name="llama3.2",  # long context model
    isolated_session=True,
    no_save=True,
    display_mode="light",
    mode="dev",
)

# Configure toolset: enable code generation and slurm generation
agent.available_actions = [
    GenerateCodeAction(agent=agent),
    GenerateSlurmScriptAction(agent=agent),
    SummarizeAndReplyAction(agent=agent),
]

# Configure pipeline: plan creation and plan execution
agent.actions = [
    CreatePlanAction(agent=agent),
    RunPlanAction(agent=agent),
]

# Set the experiment prompt
agent.experiment_prompt = (
    "First, write and test a Python script that saves the hostname and the output "
    "of some general system statistics using psutil to a local file named `node_info_<hostname>.txt` "
    "and prints that information to the console. "
    "Then, create a Slurm job script that runs this Python code across 2 nodes "
    "(1 task per node), requesting GPUs appropriately. "
    "After all tasks finish, combine all the generated output files "
    "into one file named `combined_node_info.txt`."
)

# Run the agent and capture output
response = agent.run()
print("\nAgent response:")
print(response)



Agent response:
Here is the Slurm job script to execute the Python code across 2 nodes and combine results:

```bash
#!/bin/bash
#SBATCH --job-name=system_stats
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --tasks-per-node=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_output_%j.log

# Load Python environment
module load python/3.9

# Run Python script on each node
srun python node_info_script.py

# Combine output files
cat node_info_*.txt > combined_node_info.txt

echo "Combined system statistics saved to combined_node_info.txt"
```

To use:
1. Save this as `submit_job.slurm`
2. Submit with: `sbatch submit_job.slurm`
3. Check progress in `slurm_output_<jobid>.log`

The script requests 2 nodes with 1 GPU each, executes the Python script in parallel, and automatically combines the output files after completion. Adjust GPU allocation (`--gres=gpu:1`) based on your cluster's configuration if needed.
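Check generated job scripts before you submit them. For example, `module load python/3.9` and the `--gres` syntax are site-specific. The minimal sketch below parses long-form `#SBATCH --option=value` lines and checks that the task layout is self-consistent. Its helper names are hypothetical. It ignores short options such as `-N 2` and assumes plain integer counts.

```python
import re

# Matches lines like "#SBATCH --nodes=2" or "#SBATCH --gres=gpu:1" (long options only).
SBATCH_RE = re.compile(r"^#SBATCH\s+--([\w-]+)(?:=(\S+))?")


def parse_sbatch_directives(script_text):
    """Return {option: value} for every long-form #SBATCH line (bare flags map to True)."""
    directives = {}
    for line in script_text.splitlines():
        match = SBATCH_RE.match(line.strip())
        if match:
            key, value = match.groups()
            directives[key] = value if value is not None else True
    return directives


def check_task_layout(directives):
    """Return a warning string if --ntasks disagrees with nodes * tasks-per-node."""
    nodes = int(directives.get("nodes", 1))
    per_node = directives.get("ntasks-per-node", directives.get("tasks-per-node"))
    ntasks = directives.get("ntasks")
    if per_node is not None and ntasks is not None and int(ntasks) != nodes * int(per_node):
        return f"ntasks={ntasks} but nodes*ntasks-per-node={nodes * int(per_node)}"
    return None
```

For the script above, `check_task_layout(parse_sbatch_directives(script_text))` returns `None`, since 2 nodes with 1 task each matches `--ntasks=2`.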


## Slurm Script Chaining for Multi-Job Workflows

In this section, we task our agent with a more complex, multi-stage workflow that combines code authoring, Slurm script generation, and iterative job submission.

We will instruct the agent to:

1. **Write a Python script** that computes digits of π to high precision using the Chudnovsky algorithm (or similar).  
   - The script will store results in a file, e.g. `pi_digits.txt`.  
   - On each run, it will detect how many digits are already present and extend the file by another 10,000 digits.  
   - After completion, it will report how many digits are now stored.

2. **Create a Slurm job submission script** to run this computation efficiently on an HPC system.  
   - The job requests reasonable CPU, memory, and walltime resources.  
   - All standard output and error streams will be captured in log files (e.g., `pi_job.out`, `pi_job.err`).

3. **Write a Python “launch” script** that uses `sbatch` to queue jobs iteratively.  
   - Each job will sit in the queue, but begin only after the previous one finishes (using Slurm’s dependency feature).  
   - This automates chained, incremental computation (e.g., 10,000 → 20,000 → 30,000 digits), a pattern common in model training and simulation workflows that resume from earlier results.  
   - Because we plan to test and launch this script manually, we ask the agent to bypass its normal code execution test. To do this, we have it wrap the runnable logic in a **guarded flag block**: a skip flag that defaults to `True`, which users later set to `False` to enable actual execution.
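Step 1 can be sketched in pure Python using the Chudnovsky series with binary splitting and `math.isqrt` (Python 3.8+). This is a minimal sketch, not the agent's output. The file format (`3.` followed by the digits) and the function names are assumptions. Because the series cannot cheaply resume from a stored prefix, the sketch recomputes all digits on each run. It uses the stored digits only to count them and to check consistency.

```python
import math
import os

C = 640320
C3_OVER_24 = C ** 3 // 24
DIGITS_PER_TERM = math.log10(C3_OVER_24 / 72)  # each series term adds ~14.18 digits


def _binary_split(a, b):
    """Binary splitting over Chudnovsky terms a..b-1; returns (P, Q, T)."""
    if b - a == 1:
        if a == 0:
            p = q = 1
        else:
            p = (6 * a - 5) * (2 * a - 1) * (6 * a - 1)
            q = a * a * a * C3_OVER_24
        t = p * (13591409 + 545140134 * a)
        if a & 1:
            t = -t  # the series alternates in sign
        return p, q, t
    m = (a + b) // 2
    p1, q1, t1 = _binary_split(a, m)
    p2, q2, t2 = _binary_split(m, b)
    return p1 * p2, q1 * q2, q2 * t1 + p1 * t2


def pi_digits(n):
    """Return '3' followed by the first n decimal digits of pi (no decimal point)."""
    guard = 10  # extra digits absorb truncation error
    digits = n + guard
    terms = int(digits / DIGITS_PER_TERM) + 2
    _, q, t = _binary_split(0, terms)
    sqrt_10005 = math.isqrt(10005 * 10 ** (2 * digits))  # sqrt(10005) * 10**digits
    pi_scaled = (q * 426880 * sqrt_10005) // t           # ~ pi * 10**digits
    return str(pi_scaled // 10 ** guard)


def extend_pi_file(path="pi_digits.txt", step=10_000):
    """Rewrite the file with `step` more digits than before; return the new count."""
    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read().strip().replace(".", "")  # stored as "3.1415..."
    have = max(len(existing) - 1, 0)                      # digits after the leading "3"
    text = pi_digits(have + step)
    if existing and not text.startswith(existing):
        raise ValueError(f"{path} does not match the recomputed digits")
    with open(path, "w") as f:
        f.write(text[0] + "." + text[1:] + "\n")
    print(f"{path} now stores {have + step} digits of pi")
    return have + step
```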

The code cell below initializes the agent, configures its available toolset and planning pipeline, and then defines a detailed prompt specifying the desired behavior. Once executed, the agent will design and output all three scripts, each adapted for HPC use with Slurm.

> **Tip:** The final “launch” script demonstrates how to **chain Slurm jobs automatically** to build cumulative results without manual resubmission, a powerful automation pattern for iterative HPC workloads.
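The chaining mechanism itself is small. The sketch below uses two documented `sbatch` options. `--parsable` makes `sbatch` print only the job ID, and `--dependency=afterok:<jobid>` holds a job until the named job completes successfully. The helper names are hypothetical. With `dry_run=True`, the sketch prints the commands instead of submitting them.

```python
import subprocess


def build_sbatch_command(script_path, dependency_job_id=None):
    """Assemble an sbatch call; --parsable makes sbatch print just the job ID."""
    cmd = ["sbatch", "--parsable"]
    if dependency_job_id is not None:
        # afterok: start only if the previous job finished with exit code 0
        cmd.append(f"--dependency=afterok:{dependency_job_id}")
    cmd.append(script_path)
    return cmd


def parse_job_id(sbatch_stdout):
    """--parsable output is '<jobid>' or '<jobid>;<cluster>'."""
    return sbatch_stdout.strip().split(";")[0]


def submit_chain(script_path, n_jobs, dry_run=True):
    """Queue n_jobs copies of the script, each depending on the one before it."""
    job_ids, previous = [], None
    for i in range(n_jobs):
        cmd = build_sbatch_command(script_path, previous)
        if dry_run:
            print(" ".join(cmd))
            previous = f"<job{i}>"  # placeholder ID so the printed chain is readable
        else:
            out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
            previous = parse_job_id(out)
        job_ids.append(previous)
    return job_ids
```

Use `afterany` instead of `afterok` if later jobs should start even when an earlier one fails. On a login node, `submit_chain("pi_job.slurm", 3, dry_run=False)` would queue three chained jobs (the script name here is only an example).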

In [1]:
import sys
import os

# Add the agent framework code to our system path so we can import it
notebook_dir = os.getcwd()
notebook_dir = os.path.join(notebook_dir, "TACC_exAI")
if notebook_dir not in sys.path:
    sys.path.insert(0, notebook_dir)

# import agent framework
from TACC_exAI.agent import Agent
from TACC_exAI.actions.summarize_and_reply import SummarizeAndReplyAction
from actions.create_plan import CreatePlanAction
from actions.run_plan import RunPlanAction
from actions.generate_code import GenerateCodeAction

from TACC_exAI.actions.generate_slurm_script import GenerateSlurmScriptAction


# Initialize the agent
agent = Agent(
    experiment=True,
    experiment_prompt=None,
    force_ollama=False,
    # force_ollama=True,
    default_action_model_name="DeepSeek-V3-0324",  # long context model
    isolated_session=True,
    no_save=True,
    display_mode="light",
    mode="dev",
)

# Configure toolset: enable code generation and slurm generation
agent.available_actions = [
    GenerateCodeAction(agent=agent),
    GenerateSlurmScriptAction(agent=agent),
    SummarizeAndReplyAction(agent=agent),
]

# Configure pipeline: plan creation and plan execution
agent.actions = [
    CreatePlanAction(agent=agent),
    RunPlanAction(agent=agent),
]

# Set the experiment prompt
agent.experiment_prompt = (
    "Write three scripts:\n"
    "1. **A Python script** that computes pi to 10,000 digits (using a high-precision method such as "
    "Chudnovsky), reads any previously computed digits from a file (e.g., `pi_digits.txt`) if it exists, "
    "extends the total by 10,000 digits beyond what is already stored, saves the updated digits back "
    "to the file, and prints how many digits are now stored before exiting.\n"
    "2. **A Slurm script** that submits this Python script as a job, requests reasonable CPU, memory, "
    "and walltime, and writes stdout and stderr to log files (e.g., `pi_job.out` and `pi_job.err`), "
    "exiting when the Python script finishes.\n"
    "3. **A Python launch script** that submits the Slurm script **iteratively** using sbatch to queue "
    "successive runs so that each job waits in the Slurm queue but starts only after the previous job "
    "finishes, adding 10,000 more digits of pi with each run (from 10,000 → 20,000 → 30,000), without "
    "checking the file contents; assume each run simply appends 10,000 more digits on top of the prior "
    "result. Use sbatch features to accomplish the Slurm script chaining.\n"
    "Note: Write the final script with a flag that defaults to true and skips the whole script, telling "
    "the user to edit the script to flip it to false. Within the skipped section, give the script "
    "a CLI to ingest the paths to the Slurm and Python scripts; I'll run it later."
)

# Run the agent and capture output
response = agent.run()
print("\nAgent response:")
print(response)


Agent response:
Here is the generated code that solved the user's task:

```python
import sys
import subprocess
import os
import argparse

# Flag to skip execution by default
SKIP_EXECUTION = True

if SKIP_EXECUTION:
    print("This script is skipped by default. To run it, edit the script and set SKIP_EXECUTION = False.")
    print("Within this skipped section, you can use the CLI to provide paths to the Slurm and Python scripts.")
    sys.exit(0)

# CLI for paths
def parse_args():
    parser = argparse.ArgumentParser(description="Submit Slurm jobs iteratively to compute pi digits.")
    parser.add_argument("--slurm_script", type=str, required=True, help="Path to the Slurm script.")
    parser.add_argument("--python_script", type=str, required=True, help="Path to the Python script for pi computation.")
    return parser.parse_args()

def submit_job(slurm_script, python_script, dependency_job_id=None):
    command = ["sbatch"]
    if dependency_job_id:
        command.extend(["--depend