# RHAPSODY Tutorial: From First Task to AI-HPC Workflows

**RHAPSODY** — *Runtime for Heterogeneous Applications, Service Orchestration and Dynamism* — is a Python runtime system for executing heterogeneous HPC-AI workloads. It gives you a clean, async-first API to define computational tasks and run them across powerful execution backends like [Dragon](https://dragon.hpc.gov/).

In this tutorial, you will learn how to:

1. **Basic Usage** — Create your first tasks and execute them with `DragonExecutionBackendV3`.
2. **Heterogeneous Workloads** — Mix function tasks, executable tasks, and parallel multi-process jobs in a single session.
3. **AI-HPC Workflows** — Orchestrate a realistic AI-HPC pipeline that runs inference with vLLM and trains a model, all managed by RHAPSODY.

---

### Prerequisites

Before you begin, make sure you have:

- Python 3.10+
- RHAPSODY installed (`pip install rhapsody-py`)
- Dragon runtime available (for HPC execution)
- For Section 3: `vllm` and a compatible GPU environment

### Three Core Concepts

Everything in RHAPSODY revolves around three building blocks:

| Concept | What It Does |
|---|---|
| **Task** | A unit of work: either a Python function or executable (`ComputeTask`) or an AI inference request (`AITask`). |
| **Backend** | The execution engine that actually runs your tasks (e.g., `DragonExecutionBackendV3`). |
| **Session** | The orchestrator that connects tasks to backends, submits them, and tracks their state. |

You define *what* to run (Task), *where* to run it (Backend), and the Session handles the *how*.
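To make the division of labor concrete, here is a toy, single-process model of the three roles written with plain `asyncio`. The class names `ToyTask`, `ToyBackend`, and `ToySession` are hypothetical and are not part of RHAPSODY. The sketch only mirrors the behavior described in this tutorial: the session hands tasks to a backend and returns futures, and the backend updates each task in place.

```python
import asyncio
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Callable

_uids = count(1)


@dataclass
class ToyTask:
    """What to run: a callable plus arguments (hypothetical, not RHAPSODY's ComputeTask)."""
    function: Callable[..., Any]
    args: tuple = ()
    uid: str = field(default_factory=lambda: f"task.{next(_uids):06d}")
    state: str = "NEW"
    return_value: Any = None


class ToyBackend:
    """Where to run it: here, a worker thread via asyncio.to_thread."""

    async def run(self, task: ToyTask) -> ToyTask:
        try:
            task.return_value = await asyncio.to_thread(task.function, *task.args)
            task.state = "DONE"
        except Exception:
            task.state = "FAILED"
        return task  # the same object, updated in place


class ToySession:
    """How it runs: forwards tasks to the backend and hands back futures."""

    def __init__(self, backend: ToyBackend):
        self.backend = backend

    async def submit_tasks(self, tasks):
        # asyncio.Task is a Future subclass that resolves when the task finishes.
        return [asyncio.create_task(self.backend.run(t)) for t in tasks]


async def main():
    session = ToySession(ToyBackend())
    tasks = [ToyTask(function=pow, args=(2, n)) for n in range(4)]
    futures = await session.submit_tasks(tasks)
    await asyncio.gather(*futures)
    # Results are read straight from the task objects.
    return [(t.uid, t.state, t.return_value) for t in tasks]


print(asyncio.run(main()))  # In a notebook, use: await main()
```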

Let's get started.

---

# Part 1: Basic Usage of RHAPSODY

In this section, you will write your first RHAPSODY program. By the end, you will understand how to:

- Initialize the `DragonExecutionBackendV3` backend
- Create `ComputeTask` objects (both function-based and executable-based)
- Submit tasks through a `Session`
- Await results using standard `asyncio` patterns
- Inspect task outputs

## 1.1 — Setting Up the Environment

First, import the core modules. RHAPSODY is fully async, so you will work with Python's `asyncio` throughout.

- **`ComputeTask`** — represents a unit of computation (a function or an executable command).
- **`Session`** — the orchestrator that manages task submission and lifecycle.
- **`DragonExecutionBackendV3`** — the Dragon-based execution backend (the most performant backend, built on Dragon's Batch API).

In [None]:
import asyncio
import multiprocessing as mp

from rhapsody.api import ComputeTask, Session
from rhapsody.backends import DragonExecutionBackendV3

## 1.2 — Your First Function Task

The simplest thing you can do with RHAPSODY is run a Python function as a distributed task.

Here is the workflow:

1. **Define a function** — any Python callable (sync or async).
2. **Wrap it in a `ComputeTask`** — this tells RHAPSODY *what* to run.
3. **Create a backend** — `DragonExecutionBackendV3` will manage execution.
4. **Open a `Session`** — the session binds tasks to the backend.
5. **Submit and await** — RHAPSODY handles the rest.

Let's define a simple function that returns the hostname of the machine it runs on. This is a classic HPC sanity check — if you're running across multiple nodes, each task may report a different hostname.

In [None]:
def hello_rhapsody():
    """A simple function that returns a greeting with the hostname."""
    import socket
    return f"Hello from {socket.gethostname()}!"

> **Why is the import inside the function?**  
> When RHAPSODY dispatches a function task to a remote worker, only the function itself is serialized and sent. Placing imports inside the function ensures they are available in the worker's execution context.
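You can see the effect without a cluster. The sketch below imitates a worker that never ran your notebook's top-level imports by defining the function from source in an empty namespace. It is only an analogy and does not reproduce how RHAPSODY or Dragon actually ship functions to workers.

```python
# Two versions of the same task function, kept as source strings so they can be
# defined in a fresh, empty namespace (our stand-in for a remote worker).
SELF_CONTAINED = """
def hello():
    import socket  # resolved wherever the function runs
    return f"Hello from {socket.gethostname()}!"
"""

RELIES_ON_NOTEBOOK_IMPORT = """
def hello():
    return f"Hello from {socket.gethostname()}!"  # 'socket' must already exist
"""


def run_in_fresh_namespace(source):
    """Define hello() in an empty global namespace and call it."""
    namespace = {}
    exec(source, namespace)
    return namespace["hello"]()


print(run_in_fresh_namespace(SELF_CONTAINED))

try:
    run_in_fresh_namespace(RELIES_ON_NOTEBOOK_IMPORT)
except NameError as err:
    print(f"Failed as expected: {err}")
```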

## 1.3 — Submitting and Running the Task

Now let's put it all together. Here's the complete flow:

1. Initialize the Dragon backend with `await DragonExecutionBackendV3()`.
2. Create a `Session` and pass the backend.
3. Wrap your function in a `ComputeTask`.
4. Submit it and gather results.

Note that `Session` supports the `async with` context manager pattern — this ensures the backend shuts down cleanly when you are done.
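The shutdown guarantee comes from Python's asynchronous context manager protocol. `async with` calls `__aenter__` on entry and `__aexit__` on exit, even when the body raises. Here is a minimal sketch of that protocol using hypothetical `RecordingBackend` and `ManagedSession` classes. RHAPSODY's own `Session` may differ in its details.

```python
import asyncio


class RecordingBackend:
    """Hypothetical backend that records whether it is running."""

    def __init__(self):
        self.running = False

    async def start(self):
        self.running = True

    async def shutdown(self):
        self.running = False


class ManagedSession:
    """Minimal async context manager that always shuts its backends down."""

    def __init__(self, backends):
        self.backends = backends

    async def __aenter__(self):
        for backend in self.backends:
            await backend.start()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs on normal exit and also when the body raises.
        for backend in self.backends:
            await backend.shutdown()
        return False  # do not swallow the exception


async def main():
    backend = RecordingBackend()
    try:
        async with ManagedSession([backend]):
            assert backend.running
            raise RuntimeError("something went wrong mid-session")
    except RuntimeError:
        pass
    return backend.running


print(asyncio.run(main()))  # False; in a notebook, use: await main()
```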

In [None]:
async def basic_example():
    # Step 1: Initialize the Dragon execution backend
    backend = await DragonExecutionBackendV3()

    # Step 2: Create a session with this backend
    session = Session(backends=[backend])

    # Step 3: Define a single compute task
    task = ComputeTask(function=hello_rhapsody)

    # Step 4: Submit and await inside the session context
    async with session:
        futures = await session.submit_tasks([task])
        await asyncio.gather(*futures)

        # Step 5: Inspect the result
        print(f"Task UID  : {task.uid}")
        print(f"State     : {task.state}")
        print(f"Return    : {task.return_value}")

await basic_example()

### What Just Happened?

Let's walk through each step:

| Step | What Happened |
|------|---------------|
| `await DragonExecutionBackendV3()` | Initialized the Dragon runtime. The backend is now ready to accept tasks. |
| `Session(backends=[backend])` | Created an orchestration session and registered the backend. The session will route all tasks to this backend. |
| `ComputeTask(function=hello_rhapsody)` | Wrapped your function into a task object. RHAPSODY auto-generated a unique ID (e.g., `task.000001`). |
| `session.submit_tasks([task])` | Submitted the task to the backend and returned a list of `asyncio.Future` objects. |
| `asyncio.gather(*futures)` | Waited for all futures (tasks) to reach a terminal state (`DONE` or `FAILED`). |
| `task.return_value` | Retrieved the function's return value directly from the task object. Tasks are updated **in-place**. |

**Expected output:**
```
Task UID  : task.000001
State     : DONE
Return    : Hello from <hostname>!
```

## 1.4 — Scaling Up: Submitting Many Tasks at Once

One of RHAPSODY's strengths is batch task submission. You can create hundreds or thousands of tasks and submit them in a single call. The backend handles scheduling, execution, and result collection for you.

Let's submit 16 tasks, each returning its hostname and process ID.

In [None]:
def identify_worker():
    """Returns the hostname and process ID of the worker."""
    import os
    import socket
    return f"{socket.gethostname()} (PID: {os.getpid()})"


async def batch_example():
    backend = await DragonExecutionBackendV3()
    session = Session(backends=[backend])

    # Create 16 identical tasks
    tasks = [ComputeTask(function=identify_worker) for _ in range(16)]

    async with session:
        futures = await session.submit_tasks(tasks)
        await asyncio.gather(*futures)

        print(f"Submitted {len(tasks)} tasks. All completed.\n")
        for t in tasks:
            print(f"  {t.uid} -> {t.state} | {t.return_value}")

await batch_example()

**Expected output:**
```
Submitted 16 tasks. All completed.

  task.000001 -> DONE | node01 (PID: 12345)
  task.000002 -> DONE | node01 (PID: 12346)
  ...
  task.000016 -> DONE | node02 (PID: 12360)
```

Notice that:
- Each task received an auto-generated UID (`task.000001`, `task.000002`, ...).
- Tasks may run on different nodes or processes — the backend distributes them automatically.
- All tasks were submitted in one `submit_tasks()` call, which is more efficient than submitting them one by one.
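When you scale up, it helps to see how the backend spread the work. The helper below is a hypothetical post-processing step, not part of RHAPSODY, that counts the `identify_worker` outputs per host. It assumes the `"<hostname> (PID: <pid>)"` format returned by that function.

```python
from collections import Counter


def tasks_per_host(outputs):
    """Count task outputs of the form '<host> (PID: <pid>)' per host."""
    # rpartition keeps everything before the last ' (PID: ' as the hostname.
    return Counter(out.rpartition(" (PID: ")[0] for out in outputs)


# In the notebook you would pass [t.return_value for t in tasks]; here we use sample strings.
sample = [
    "node01 (PID: 12345)",
    "node01 (PID: 12346)",
    "node02 (PID: 12360)",
]
print(tasks_per_host(sample))  # Counter({'node01': 2, 'node02': 1})
```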

## 1.5 — Running Executable Tasks

Not all workloads are Python functions. RHAPSODY also supports **executable tasks** — shell commands, compiled binaries, or any command-line program.

To create an executable task, use the `executable` and `arguments` parameters instead of `function`.

In [None]:
async def executable_example():
    backend = await DragonExecutionBackendV3()
    session = Session(backends=[backend])

    # An executable task: run a shell command
    task = ComputeTask(
        executable="/bin/hostname",
        arguments=["-f"],  # "-f" flag for fully qualified domain name
        task_backend_specific_kwargs={"process_template": {}},
    )

    async with session:
        futures = await session.submit_tasks([task])
        await asyncio.gather(*futures)

        print(f"Task UID  : {task.uid}")
        print(f"State     : {task.state}")
        print(f"Stdout    : {task.stdout.strip() if task.stdout else 'N/A'}")
        print(f"Exit Code : {task.exit_code}")

await executable_example()

**Key differences from function tasks:**

| | Function Task | Executable Task |
|---|---|---|
| **Defined with** | `function=my_func` | `executable="/path/to/binary"` |
| **Arguments** | `args=(...)` and `kwargs={...}` | `arguments=["--flag", "value"]` |
| **Result** | `task.return_value` | `task.stdout` / `task.stderr` |
| **Backend config** | Optional | Needs `process_template` (or `process_templates` for parallel jobs) with Dragon |

Notice the `task_backend_specific_kwargs={"process_template": {}}`. This tells `DragonExecutionBackendV3` to run the executable as a **process** (rather than a native function call). An empty dict `{}` means "use default process settings." You will see more advanced configurations in Part 2.

**Expected output:**
```
Task UID  : task.000001
State     : DONE
Stdout    : node01.cluster.local
Exit Code : 0
```
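If you want to check what `task.stdout` and `task.exit_code` will contain before sending a command to the cluster, Python's `subprocess.run` gives a local, non-distributed equivalent: captured standard output and the process return code. This sketch uses the current Python interpreter as the "executable" so that it runs anywhere. It is an analogy, not what RHAPSODY does internally.

```python
import subprocess
import sys

# Same shape as an executable task: a program path plus a list of arguments.
executable = sys.executable
arguments = ["-c", "print('hello from a subprocess')"]

result = subprocess.run(
    [executable, *arguments],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode bytes to str
    check=False,          # inspect the exit code ourselves
)

print(f"Stdout    : {result.stdout.strip()}")
print(f"Exit Code : {result.returncode}")  # 0 on success
```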

## 1.6 — Session Statistics

After running a workload, you can ask the session for performance statistics. This gives you aggregate information about task counts, states, and latencies.

In [None]:
async def stats_example():
    backend = await DragonExecutionBackendV3()
    session = Session(backends=[backend])

    tasks = [ComputeTask(function=hello_rhapsody) for _ in range(32)]

    async with session:
        futures = await session.submit_tasks(tasks)
        await asyncio.gather(*futures)

        # Get statistics from the session
        stats = session.get_statistics()

        print("=== Session Statistics ===")
        print(f"Total tasks    : {stats['summary']['total_tasks']}")
        print(f"State counts   : {dict(stats['counts'])}")
        print(f"Avg total time : {stats['summary']['avg_total']:.4f}s")
        print(f"Avg queue time : {stats['summary']['avg_queue']:.4f}s")
        print(f"Avg exec time  : {stats['summary']['avg_execution']:.4f}s")

await stats_example()

**Expected output:**
```
=== Session Statistics ===
Total tasks    : 32
State counts   : {'DONE': 32}
Avg total time : 0.0523s
Avg queue time : 0.0012s
Avg exec time  : 0.0511s
```

This is useful for profiling your workloads and understanding where time is spent.
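The summary numbers are averages over per-task latencies. The sketch below shows one plausible way to derive such figures. It assumes each task records when it was submitted, started, and finished. The field names are hypothetical, and RHAPSODY's exact definitions of queue and execution time may differ. Under these definitions, average total time equals average queue time plus average execution time, which is consistent with the sample output above (0.0012s + 0.0511s = 0.0523s).

```python
from statistics import mean

# Hypothetical per-task timestamps, in seconds since session start.
records = [
    {"submitted": 0.000, "started": 0.001, "finished": 0.050},
    {"submitted": 0.000, "started": 0.002, "finished": 0.055},
    {"submitted": 0.010, "started": 0.011, "finished": 0.060},
]


def summarize(records):
    queue = [r["started"] - r["submitted"] for r in records]     # waiting to run
    execution = [r["finished"] - r["started"] for r in records]  # actually running
    total = [r["finished"] - r["submitted"] for r in records]    # end to end
    return {
        "total_tasks": len(records),
        "avg_queue": mean(queue),
        "avg_execution": mean(execution),
        "avg_total": mean(total),
    }


summary = summarize(records)
for key, value in summary.items():
    shown = f"{value:.4f}" if isinstance(value, float) else value
    print(f"{key:14}: {shown}")
```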

## Part 1 Summary

You've now covered the fundamentals:

- **`ComputeTask(function=...)`** — wraps a Python callable as a distributed task.
- **`ComputeTask(executable=...)`** — wraps a shell command or binary.
- **`DragonExecutionBackendV3()`** — the high-performance Dragon backend.
- **`Session`** — orchestrates submission, routing, and lifecycle tracking.
- **`asyncio.gather(*futures)`** — waits for all tasks to complete.
- Tasks are updated **in-place** — after awaiting, just read `task.return_value`, `task.stdout`, or `task.state`.

---

---

# Part 2: Heterogeneous Workloads

Real HPC workflows rarely consist of identical tasks. You might need to run a mix of:

- Single-process Python functions
- Multi-rank parallel functions
- Shell commands and compiled executables
- Parallel jobs spanning multiple process groups

RHAPSODY handles this naturally. Every task is an independent object with its own configuration, and a single `Session` can dispatch them all to the same backend. The backend figures out *how* to run each task based on its configuration.

In this section, you will:

- Understand the **five execution modes** of `DragonExecutionBackendV3`
- Use `task_backend_specific_kwargs` to control process layout
- Mix different task types in one session
- See how RHAPSODY abstracts away the complexity of heterogeneous execution

## 2.1 — The Five Execution Modes

`DragonExecutionBackendV3` supports five distinct execution modes. The mode is determined automatically based on the task's configuration:

| Mode | Trigger | What Happens |
|------|---------|-------------|
| **Function Native** | `function` + no backend kwargs | Function is called directly via Dragon's batch API. Fastest mode. |
| **Function Process** | `function` + `process_template: {}` | Function runs inside a dedicated Dragon process. |
| **Function Job** | `function` + `process_templates: [(N, {}), ...]` | Function runs as a parallel job with multiple process groups. |
| **Executable Process** | `executable` + `process_template: {}` | Executable runs as a single Dragon process. |
| **Executable Job** | `executable` + `process_templates: [(N, {}), ...]` | Executable runs as a parallel job with multiple process groups. |

The key insight: **you don't pick a mode explicitly**. You just configure your task, and the backend infers the right execution strategy.
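The table above reads like a small decision procedure. Here it is as a hypothetical Python function, written only from the triggers listed in the table. It is not RHAPSODY's code, and the real backend may check additional keys. The companion helper shows how a job's process count follows from its `(N, config)` groups.

```python
def infer_mode(task):
    """Map a task description (a plain dict here) to one of the five modes in the table."""
    kwargs = task.get("task_backend_specific_kwargs") or {}
    kind = "Function" if "function" in task else "Executable"
    if "process_templates" in kwargs:
        return f"{kind} Job"
    if "process_template" in kwargs:
        return f"{kind} Process"
    if kind == "Function":
        return "Function Native"
    raise ValueError("no mode in the table matches an executable without process_template(s)")


def total_ranks(process_templates):
    """A job's process count is the sum of N over its (N, config) groups."""
    return sum(n for n, _config in process_templates)


print(infer_mode({"function": print}))  # Function Native
print(infer_mode({"executable": "/bin/hostname",
                  "task_backend_specific_kwargs": {"process_template": {}}}))  # Executable Process
print(total_ranks([(2, {}), (2, {})]))  # 4
```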

Let's see each mode in action.

## 2.2 — Defining the Task Functions

We will define three Python functions to demonstrate the function-based modes. The executable-based modes in the next step use standard shell commands. Each function reports its hostname so you can see where it ran.

In [None]:
async def native_function():
    """Runs in Function Native mode — the fastest path."""
    import socket
    return f"[native] {socket.gethostname()}"


async def single_process_function():
    """Runs inside a dedicated Dragon process."""
    import socket
    return f"[single-process] {socket.gethostname()}"


async def parallel_function():
    """Runs as a parallel job across multiple process groups."""
    import socket
    return f"[parallel] {socket.gethostname()}"

## 2.3 — Building a Heterogeneous Workload

Now, let's create one task for each execution mode. Pay close attention to the `task_backend_specific_kwargs` — this is where you control the process layout.

- **No kwargs** → Function Native mode (fastest)
- **`process_template: {}`** → Single-process mode
- **`process_templates: [(N, {}), (M, {})]`** → Parallel job mode (N + M processes)

In [None]:
async def heterogeneous_workload():
    backend = await DragonExecutionBackendV3()
    session = Session(backends=[backend])

    tasks = [
        # --- Mode 1: Function Native ---
        # No backend-specific kwargs → fastest execution path.
        # The function is called directly via Dragon's batch API.
        ComputeTask(
            function=native_function,
        ),

        # --- Mode 2: Executable Process ---
        # A shell command running as a single Dragon process.
        # The empty dict {} means "use default process settings."
        ComputeTask(
            executable="/bin/bash",
            arguments=["-c", "echo [exec-single] $(hostname)"],
            task_backend_specific_kwargs={"process_template": {}},
        ),

        # --- Mode 3: Executable Job (Parallel) ---
        # The same command, but now running as a parallel job:
        # 2 processes in group 1 + 2 processes in group 2 = 4 total.
        ComputeTask(
            executable="/bin/bash",
            arguments=["-c", "echo [exec-parallel] $(hostname) PID:$$"],
            task_backend_specific_kwargs={"process_templates": [(2, {}), (2, {})]},
        ),

        # --- Mode 4: Function Process ---
        # A Python function running in its own dedicated process.
        ComputeTask(
            function=single_process_function,
            task_backend_specific_kwargs={"process_template": {}},
        ),

        # --- Mode 5: Function Job (Parallel) ---
        # A Python function running as a parallel job across
        # 2 + 2 = 4 processes.
        ComputeTask(
            function=parallel_function,
            task_backend_specific_kwargs={"process_templates": [(2, {}), (2, {})]},
        ),
    ]

    async with session:
        futures = await session.submit_tasks(tasks)
        print(f"Submitted {len(tasks)} heterogeneous tasks.\n")

        await asyncio.gather(*futures)

        for t in tasks:
            # Function tasks use return_value; executable tasks use stdout
            output = t.return_value if t.return_value else (t.stdout.strip() if t.stdout else "N/A")
            print(f"  {t.uid} | {t.state:>6} | {output}")

await heterogeneous_workload()

### What Just Happened?

You submitted **five different tasks** — each with a different execution mode — in a **single `submit_tasks()` call**. The backend inspected each task's configuration and automatically chose the correct execution strategy:

```
Submitted 5 heterogeneous tasks.

  task.000001 |   DONE | [native] node01
  task.000002 |   DONE | [exec-single] node01
  task.000003 |   DONE | [exec-parallel] node01 PID:12345
  task.000004 |   DONE | [single-process] node01
  task.000005 |   DONE | [parallel] node02
```

This is the power of RHAPSODY's **task abstraction**: you describe *what* you want, and the backend handles *how* to run it.

## 2.4 — Understanding `task_backend_specific_kwargs`

The `task_backend_specific_kwargs` parameter is how you pass backend-specific configuration to your tasks. This is RHAPSODY's escape hatch: while the core API is backend-agnostic, this dict lets you leverage features specific to `DragonExecutionBackendV3`.

Here's a quick reference:

| Key | Type | Purpose |
|-----|------|--------|
| `process_template` | `dict` | Configuration for a single process (env, cwd, policy). Empty `{}` = defaults. |
| `process_templates` | `list[tuple[int, dict]]` | List of `(num_ranks, config)` tuples for parallel jobs. |
| `type` | `str` | Set to `"mpi"` for MPI jobs. |
| `ranks` | `int` | Number of MPI ranks (used with `type="mpi"`). |
| `timeout` | `float` | Task timeout in seconds. |

**Architecture note:** The `task_backend_specific_kwargs` pattern is intentional. RHAPSODY separates the *logical task description* (function, arguments, resource requirements) from the *physical execution details* (process templates, MPI configuration). This means the same `ComputeTask` could run on Dragon today and a different backend tomorrow — you only change the backend-specific kwargs.

## 2.5 — Mixing Compute Patterns in a Real Workload

Let's build a more realistic example. Imagine you are running a simulation pipeline where:

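1. A **preprocessing** step generates input data (a single function task in Function Native mode).
2. A **simulation** step runs as four independent tasks, each in its own dedicated process (Function Process mode).
3. A **postprocessing** step collects and summarizes results (a single function task in Function Native mode).

You can express this entire pipeline as a flat list of tasks and submit it in one call. Keep in mind that a flat list carries no dependency information. The tasks are independent and may run concurrently, so the postprocessing task below receives a placeholder rather than the actual simulation results.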

In [None]:
def preprocess(num_samples):
    """Generate synthetic input data for the simulation."""
    import random
    data = [random.gauss(0, 1) for _ in range(num_samples)]
    return {"samples": len(data), "mean": sum(data) / len(data)}


async def simulate():
    """Run a simulation step (parallel across multiple processes)."""
    import os
    import socket
    import random
    result = sum(random.gauss(0, 1) for _ in range(1000))
    return {"host": socket.gethostname(), "pid": os.getpid(), "result": round(result, 4)}


def postprocess(task_results):
    """Aggregate simulation results."""
    return {"num_results": len(task_results), "summary": "aggregation complete"}

In [None]:
async def simulation_pipeline():
    backend = await DragonExecutionBackendV3()
    session = Session(backends=[backend])

    # --- Phase 1: Preprocessing (single function, native mode) ---
    preprocess_task = ComputeTask(
        function=preprocess,
        args=(1000,),  # Generate 1000 samples
    )

    # --- Phase 2: Simulation (4 parallel tasks, each in its own process) ---
    simulation_tasks = [
        ComputeTask(
            function=simulate,
            task_backend_specific_kwargs={"process_template": {}},
        )
        for _ in range(4)
    ]

    # --- Phase 3: Postprocessing (single function, native mode) ---
    postprocess_task = ComputeTask(
        function=postprocess,
        args=(["placeholder"],),  # In a real pipeline, you'd pass actual results
    )

    # Submit all phases together
    all_tasks = [preprocess_task] + simulation_tasks + [postprocess_task]

    async with session:
        futures = await session.submit_tasks(all_tasks)
        await asyncio.gather(*futures)

        print("=== Pipeline Results ===")
        print(f"\nPreprocessing : {preprocess_task.return_value}")
        print(f"\nSimulation results:")
        for t in simulation_tasks:
            print(f"  {t.uid} -> {t.return_value}")
        print(f"\nPostprocessing: {postprocess_task.return_value}")

await simulation_pipeline()

**Expected output:**
```
=== Pipeline Results ===

Preprocessing : {'samples': 1000, 'mean': -0.0234}

Simulation results:
  task.000002 -> {'host': 'node01', 'pid': 54321, 'result': 12.3456}
  task.000003 -> {'host': 'node01', 'pid': 54322, 'result': -8.7654}
  task.000004 -> {'host': 'node02', 'pid': 54323, 'result': 3.2109}
  task.000005 -> {'host': 'node02', 'pid': 54324, 'result': -1.9876}

Postprocessing: {'num_results': 1, 'summary': 'aggregation complete'}
```

Notice how the simulation tasks may land on different nodes — the Dragon backend distributes them across available resources.
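Nothing in the flat-list example above tells RHAPSODY that postprocessing depends on the simulations. If a later phase needs earlier results, wait for each phase before starting the next, and build the next phase's inputs from its outputs. The sketch below shows that pattern in plain `asyncio`, using simplified, hypothetical stand-ins (`prep_phase`, `sim_phase`, `post_phase`) instead of RHAPSODY tasks. With a RHAPSODY `Session`, each phase would presumably become a `submit_tasks()` call followed by `asyncio.gather()` on its futures, as in the earlier examples.

```python
import asyncio
import random


async def prep_phase(num_samples):
    """Simplified stand-in for preprocess()."""
    data = [random.gauss(0, 1) for _ in range(num_samples)]
    return {"samples": len(data), "mean": sum(data) / len(data)}


async def sim_phase(seed):
    """Simplified stand-in for simulate(); each run gets its own random stream."""
    rng = random.Random(seed)
    return round(sum(rng.gauss(0, 1) for _ in range(1000)), 4)


async def post_phase(results):
    """Simplified stand-in for postprocess(), now fed the real results."""
    return {"num_results": len(results), "total": round(sum(results), 4)}


async def pipeline():
    # Phase 1 finishes before anything else starts.
    prep = await prep_phase(1000)
    # Phase 2: the four simulations are independent, so they are awaited together.
    sims = await asyncio.gather(*(sim_phase(seed) for seed in range(4)))
    # Phase 3 starts only after every simulation has returned.
    summary = await post_phase(list(sims))
    return prep, sims, summary


prep, sims, summary = asyncio.run(pipeline())  # In a notebook, use: await pipeline()
print(summary)
```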

## 2.6 — Functions with Arguments

Tasks can pass arguments to their functions using `args` (positional) and `kwargs` (keyword). Here's a quick example showing both.

In [None]:
def compute_power(base, exponent, label="result"):
    """Compute base^exponent and return a labeled result."""
    return {label: base ** exponent}


async def args_example():
    backend = await DragonExecutionBackendV3()
    session = Session(backends=[backend])

    tasks = [
        ComputeTask(function=compute_power, args=(2, 10)),                        # 2^10
        ComputeTask(function=compute_power, args=(3, 5), kwargs={"label": "3^5"}), # 3^5
        ComputeTask(function=compute_power, args=(7, 3), kwargs={"label": "7^3"}), # 7^3
    ]

    async with session:
        futures = await session.submit_tasks(tasks)
        await asyncio.gather(*futures)

        for t in tasks:
            print(f"  {t.uid} -> {t.return_value}")

await args_example()

**Expected output:**
```
  task.000001 -> {'result': 1024}
  task.000002 -> {'3^5': 243}
  task.000003 -> {'7^3': 343}
```
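Conceptually, `args` and `kwargs` are unpacked into the call just as Python's `*` and `**` operators unpack them. The snippet below shows the plain-Python equivalent of the three tasks above. The assumption that RHAPSODY makes exactly this call on the worker is inferred from the parameter names and the outputs shown, not confirmed by the source.

```python
def compute_power(base, exponent, label="result"):
    """Compute base^exponent and return a labeled result."""
    return {label: base ** exponent}


# (args, kwargs) pairs mirroring the three ComputeTask definitions above.
specs = [((2, 10), {}), ((3, 5), {"label": "3^5"}), ((7, 3), {"label": "7^3"})]

for args, kwargs in specs:
    print(compute_power(*args, **kwargs))
# {'result': 1024}
# {'3^5': 243}
# {'7^3': 343}
```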

## Part 2 Summary

You've now seen how RHAPSODY handles heterogeneous workloads:

- **Five execution modes** are available in `DragonExecutionBackendV3`, selected automatically based on task configuration.
- **`task_backend_specific_kwargs`** is how you control the process layout — single process, multi-process job, or MPI.
- **`process_template: {}`** runs a task in a single dedicated process.
- **`process_templates: [(N, {}), (M, {})]`** runs a task as a parallel job with N + M processes.
- Tasks with different configurations can be **mixed freely** in a single session and submission.
- The **task abstraction** separates logical description from physical execution — your code stays clean while the backend handles the complexity.

---

---

# Part 3: AI-HPC Workflows with vLLM Inference

Modern scientific computing increasingly blends traditional HPC with AI. You might want to:

- Run inference with a large language model to generate synthetic data.
- Use HPC compute tasks to preprocess, transform, or train on that data.
- Orchestrate the entire pipeline from a single script.

RHAPSODY makes this possible by supporting **multiple backends in the same session**. You can mix `DragonExecutionBackendV3` (for HPC compute) with `DragonVllmInferenceBackend` (for AI inference), and RHAPSODY will route each task to the correct backend.

In this section, you will:

1. Set up a vLLM inference backend alongside the Dragon execution backend.
2. Use `AITask` to run inference prompts.
3. Build a complete pipeline: **generate data via inference → train a model on HPC compute**.

## 3.1 — The Two-Backend Architecture

When you create a `Session` with multiple backends, RHAPSODY uses the `backend` field on each task to decide where to send it:

```
                          ┌──────────────────────────────┐
                          │         Session              │
                          │                              │
  ComputeTask ──────────► │   Route by task.backend      │
  AITask      ──────────► │                              │
                          │   ┌──────────┐ ┌──────────┐  │
                          │   │ Dragon   │ │  vLLM    │  │
                          │   │ V3       │ │ Inference│  │
                          │   └──────────┘ └──────────┘  │
                          └──────────────────────────────┘
```

- **`ComputeTask`** with `backend=execution_backend.name` → routed to Dragon.
- **`AITask`** with `backend=inference_backend.name` → routed to vLLM.

If a task doesn't specify a `backend`, it goes to the first backend in the list.
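The routing rule fits in a few lines. The sketch below is a hypothetical router, not RHAPSODY's implementation. It looks up backends by name and falls back to the first registered backend when a task names none, as described above. The backend names `"dragon"` and `"vllm"` are illustrative, and raising on an unknown name is this sketch's own choice.

```python
def route(task, backends):
    """Pick the backend for a task: by name if given, else the first backend."""
    by_name = {b["name"]: b for b in backends}
    name = task.get("backend")
    if name is None:
        return backends[0]
    if name not in by_name:
        raise KeyError(f"unknown backend {name!r}; known: {sorted(by_name)}")
    return by_name[name]


backends = [{"name": "dragon"}, {"name": "vllm"}]
print(route({"prompt": "Hi", "backend": "vllm"}, backends)["name"])  # vllm
print(route({"executable": "/usr/bin/echo"}, backends)["name"])      # dragon
```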

## 3.2 — Setting Up Both Backends

Let's initialize the execution backend and the inference backend. The vLLM backend requires:

- A **config file** (YAML) describing the model and hardware setup.
- A **model name** (e.g., `"Qwen2.5-0.5B-Instruct"` — a small, fast model good for demos).
- GPU configuration (`num_gpus`, `tp_size` for tensor parallelism).

> **Note:** The vLLM backend needs GPU access. If you are running this on a CPU-only machine, you can still read along to understand the pattern.

In [None]:
import asyncio
import multiprocessing as mp

from rhapsody.api import AITask, ComputeTask, Session
from rhapsody.backends import DragonExecutionBackendV3, DragonVllmInferenceBackend

In [None]:
# Set the Dragon multiprocessing start method (required once per process)
mp.set_start_method("dragon", force=True)

## 3.3 — Running Inference with AITask

Before building the full pipeline, let's see how `AITask` works on its own.

An `AITask` is similar to a `ComputeTask`, but instead of a function or executable, you provide:

- A **`prompt`** — the input text (string or list of strings).
- A **`backend`** — which inference backend to use.
- Optional parameters: `temperature`, `max_tokens`, `top_p`, `system_prompt`, etc.

After execution, the model output is stored on the task. The examples in this part read it from `task.response`.

In [None]:
async def inference_example():
    # Initialize both backends
    execution_backend = await DragonExecutionBackendV3(num_workers=4)

    inference_backend = DragonVllmInferenceBackend(
        config_file="config.yaml",
        model_name="Qwen2.5-0.5B-Instruct",
        num_nodes=1,
        num_gpus=1,
        tp_size=1,
        port=8001,
    )
    await inference_backend.initialize()

    session = Session([execution_backend, inference_backend])

    # Create inference tasks
    tasks = [
        # Single prompt
        AITask(
            prompt="Explain what a neural network is in one sentence.",
            backend=inference_backend.name,
        ),
        # Batch of prompts (processed efficiently by the vLLM backend)
        AITask(
            prompt=[
                "Generate a random floating-point number between 0 and 1.",
                "Generate a random integer between 1 and 100.",
            ],
            backend=inference_backend.name,
        ),
        # A compute task running alongside inference
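        ComputeTask(
            executable="/usr/bin/echo",
            arguments=["Hello from Dragon backend!"],
            backend=execution_backend.name,
            # Executables run as a Dragon process (see Part 1.5)
            task_backend_specific_kwargs={"process_template": {}},
        ),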
    ]

    # Submit all tasks — RHAPSODY routes each to the correct backend
    print(f"Submitting {len(tasks)} mixed tasks...")
    await session.submit_tasks(tasks)
    results = await asyncio.gather(*tasks)

    for i, task in enumerate(results, start=1):
        task_type = "AI" if "prompt" in task else "Compute"
        # AI tasks carry the model output in .response; executables write to stdout
        result = task.response if task_type == "AI" else (task.stdout or "").strip()
        print(f"\nTask {i} [{task_type}] ({task.get('backend')}):")
        print(f"  Result: {result}")

    await session.close()

await inference_example()

**Expected output (model responses will vary):**
```
Submitting 3 mixed tasks...

Task 1 [AI] (vllm):
  Result: A neural network is a computational model inspired by biological neurons...

Task 2 [AI] (vllm):
  Result: ['0.7342', '42']

Task 3 [Compute] (dragon):
  Result: Hello from Dragon backend!
```

Notice how AI tasks and compute tasks were submitted together and routed to their respective backends automatically.

## 3.4 — The Full AI-HPC Pipeline: Inference → Training

Now let's build something realistic. Here's the scenario:

1. **Phase 1 — Data Generation (Inference):** Use a language model to generate synthetic labeled data. Each prompt asks the model to produce a data sample with features and a label.

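2. **Phase 2 — Model Training (HPC Compute):** Pass the generated responses to a Dragon compute task that trains a simple nearest-centroid classifier. The classifier is implemented in pure Python, so scikit-learn is not required. To keep the demo robust, the training function uses only the number of responses and draws synthetic features in their place.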

This pattern — using AI to generate data, then processing it with traditional HPC — is increasingly common in scientific workflows.

In [None]:
# --- Phase 2 function: train a model on the generated data ---

def train_classifier(generated_texts):
    """Train a simple classifier on synthetic data.

    In this demo, the LLM-generated texts only determine the dataset size;
    the numeric features themselves are drawn at random so the training step
    always has well-formed data. In a real pipeline, you would parse the
    inference output into structured samples instead.
    """
    import random

    # In practice, you would parse the LLM responses into structured data.
    # For this demo, we generate synthetic features to demonstrate the
    # training step running on the HPC backend.
    n_samples = max(len(generated_texts) * 10, 100)

    X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n_samples)]
    y = [1 if sum(row) > 0 else 0 for row in X]

    # Train/test split (manual, no sklearn dependency required)
    split = int(0.8 * n_samples)
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]

    # Simple nearest-centroid classifier
    # Compute mean of each class
    class_0 = [x for x, label in zip(X_train, y_train) if label == 0]
    class_1 = [x for x, label in zip(X_train, y_train) if label == 1]

    centroid_0 = [sum(col) / len(col) for col in zip(*class_0)] if class_0 else [0]*4
    centroid_1 = [sum(col) / len(col) for col in zip(*class_1)] if class_1 else [0]*4

    # Predict by nearest centroid
    def predict(x):
        d0 = sum((a - b)**2 for a, b in zip(x, centroid_0))
        d1 = sum((a - b)**2 for a, b in zip(x, centroid_1))
        return 0 if d0 < d1 else 1

    predictions = [predict(x) for x in X_test]
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)

    return {
        "model": "NearestCentroid",
        "n_train": len(X_train),
        "n_test": len(X_test),
        "accuracy": round(accuracy, 4),
        "data_source": "LLM-generated",
    }
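The function above deliberately replaces the LLM output with Gaussian features. To use the model's responses instead, a first step is to extract the JSON object from each response and keep only well-formed samples. The parser below is a sketch of that step. It assumes responses look like the example embedded in the pipeline's Phase 1 prompts (`f1` through `f4` plus `label`). Small models often add prose or omit fields, so invalid responses are skipped rather than trusted. The helper name `parse_samples` is hypothetical.

```python
import json
import re

FEATURES = ("f1", "f2", "f3", "f4")


def parse_samples(generated_texts):
    """Extract (features, label) pairs from LLM responses; skip anything malformed."""
    X, y = [], []
    for text in generated_texts:
        match = re.search(r"\{.*?\}", text, re.DOTALL)  # first {...} block (flat objects only)
        if match is None:
            continue
        try:
            obj = json.loads(match.group(0))
            row = [float(obj[name]) for name in FEATURES]
            label = int(obj["label"])
        except (KeyError, TypeError, ValueError):  # json.JSONDecodeError is a ValueError
            continue
        if label in (0, 1):
            X.append(row)
            y.append(label)
    return X, y


responses = [
    'Sure! {"f1": 0.5, "f2": -1.2, "f3": 0.8, "f4": -0.3, "label": 1}',
    '{"f1": 1.1, "f2": 0.4, "f3": -0.9, "label": 0}',  # missing f4 -> skipped
    "I cannot generate random data.",                    # no JSON -> skipped
]
X, y = parse_samples(responses)
print(X, y)  # [[0.5, -1.2, 0.8, -0.3]] [1]
```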

```python
async def ai_hpc_pipeline():
    """Complete AI-HPC pipeline: LLM inference → model training."""

    # --- Initialize backends ---
    execution_backend = await DragonExecutionBackendV3(num_workers=4)

    inference_backend = DragonVllmInferenceBackend(
        config_file="config.yaml",
        model_name="Qwen2.5-0.5B-Instruct",
        num_nodes=1,
        num_gpus=1,
        tp_size=1,
        port=8001,
    )
    await inference_backend.initialize()

    # One session, two backends: each task is routed by its `backend` field
    session = Session([execution_backend, inference_backend])

    try:
        # =========================================================
        # PHASE 1: Generate synthetic data via LLM inference
        # =========================================================
        print("=" * 60)
        print("PHASE 1: Generating synthetic data via LLM inference")
        print("=" * 60)

        # Create prompts that ask the model to generate data samples
        data_prompts = [
            "Generate a JSON object with 4 numeric features (f1, f2, f3, f4) "
            "each between -2 and 2, and a binary label (0 or 1). "
            "Example: {\"f1\": 0.5, \"f2\": -1.2, \"f3\": 0.8, \"f4\": -0.3, \"label\": 1}"
            for _ in range(10)
        ]

        inference_tasks = [
            AITask(
                prompt=prompt,
                backend=inference_backend.name,
                max_tokens=100,
                temperature=0.8,  # > 0 so the identical prompts yield varied samples
            )
            for prompt in data_prompts
        ]

        # Submit inference tasks, then wait until all of them have finished
        print(f"Submitting {len(inference_tasks)} inference tasks...")
        await session.submit_tasks(inference_tasks)
        await asyncio.gather(*inference_tasks)

        # Collect generated data
        generated_texts = []
        for t in inference_tasks:
            print(f"  {t.uid} | {t.state} | response length: {len(str(t.response))} chars")
            generated_texts.append(str(t.response))

        print(f"\nCollected {len(generated_texts)} generated samples.")

        # =========================================================
        # PHASE 2: Train a model on the generated data (HPC compute)
        # =========================================================
        print("\n" + "=" * 60)
        print("PHASE 2: Training a classifier on HPC compute")
        print("=" * 60)

        training_task = ComputeTask(
            function=train_classifier,
            args=(generated_texts,),  # Phase 1 output becomes Phase 2 input
            backend=execution_backend.name,
        )

        print("Submitting training task...")
        await session.submit_tasks([training_task])
        await asyncio.gather(training_task)

        # =========================================================
        # Results
        # =========================================================
        print("\n" + "=" * 60)
        print("PIPELINE COMPLETE")
        print("=" * 60)
        print(f"\nTraining results: {training_task.return_value}")

        # Session statistics
        stats = session.get_statistics()
        print("\nSession stats:")
        print(f"  Total tasks     : {stats['summary']['total_tasks']}")
        print(f"  State counts    : {dict(stats['counts'])}")
        print(f"  Avg total time  : {stats['summary']['avg_total']:.4f}s")
    finally:
        # Close the session even if one of the phases raises
        await session.close()

await ai_hpc_pipeline()
```

### What Just Happened?

You built and executed a **two-phase AI-HPC pipeline**:

```
Phase 1 (Inference)              Phase 2 (Compute)
┌───────────────────┐            ┌───────────────────┐
│  10 x AITask      │            │  1 x ComputeTask  │
│  (vLLM backend)   │──results──►│  (Dragon backend) │
│  Generate data    │            │  Train classifier │
└───────────────────┘            └───────────────────┘
```

- **Phase 1** submitted 10 `AITask` prompts to the vLLM inference backend, which batched the requests and returned the generated text.
- **Phase 2** passed that text to a `ComputeTask` on the Dragon backend, which trained a nearest-centroid classifier on it.
- The **Session** routed each task automatically based on its `backend` field: AI tasks went to vLLM, compute tasks went to Dragon.

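**Example output** (response lengths, accuracy, and timings will differ between runs, because sampling at `temperature=0.8` is nondeterministic):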
```
============================================================
PHASE 1: Generating synthetic data via LLM inference
============================================================
Submitting 10 inference tasks...
  task.000001 | DONE | response length: 87 chars
  task.000002 | DONE | response length: 92 chars
  ...
Collected 10 generated samples.

============================================================
PHASE 2: Training a classifier on HPC compute
============================================================
Submitting training task...

============================================================
PIPELINE COMPLETE
============================================================

Training results: {'model': 'NearestCentroid', 'n_train': 80, 'n_test': 20, 'accuracy': 0.75, 'data_source': 'LLM-generated'}

Session stats:
  Total tasks     : 11
  State counts    : {'DONE': 11}
  Avg total time  : 0.3421s
```
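Without RHAPSODY, the dependency between the two phases is ordinary `asyncio`: gather every Phase 1 result, then hand the collected results to Phase 2. The sketch below lets you experiment with that pattern without Dragon or a GPU. `fake_inference` and `fake_training` are hypothetical stand-ins for the `AITask` and `ComputeTask` work. Running the blocking step through `asyncio.to_thread` only loosely imitates offloading it to a compute backend.

```python
import asyncio
import random


async def fake_inference(seed: int) -> dict:
    """Stand-in for one AITask: returns a synthetic labelled sample."""
    rng = random.Random(seed)
    await asyncio.sleep(0.01)  # simulate service latency
    features = [rng.uniform(-2, 2) for _ in range(4)]
    return {"features": features, "label": int(sum(features) > 0)}


def fake_training(samples: list[dict]) -> dict:
    """Stand-in for the ComputeTask: a blocking, CPU-bound function."""
    positives = sum(s["label"] for s in samples)
    return {"n_samples": len(samples), "n_positive": positives}


async def two_phase_pipeline(n: int = 10) -> dict:
    # Phase 1: launch all "inference" calls concurrently and wait for every one.
    samples = await asyncio.gather(*(fake_inference(i) for i in range(n)))
    # Phase 2 starts only after Phase 1 is complete; the blocking function runs
    # in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(fake_training, list(samples))


result = asyncio.run(two_phase_pipeline())
print(result["n_samples"])  # 10
```

An event loop is already running in a notebook. There, replace `asyncio.run(...)` with `await two_phase_pipeline()`, as the pipeline cell does with `await ai_hpc_pipeline()`.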

## 3.5 — Understanding the vLLM Backend Configuration

The `DragonVllmInferenceBackend` has several important configuration options:

| Parameter | Type | Description |
|-----------|------|-------------|
| `config_file` | `str` | Path to YAML config file for the model and hardware |
| `model_name` | `str` | Hugging Face model identifier (e.g., `"Qwen2.5-0.5B-Instruct"`) |
| `num_nodes` | `int` | Number of nodes for the inference service |
| `num_gpus` | `int` | Number of GPUs per node |
| `tp_size` | `int` | Tensor-parallel size: the number of GPUs each model instance is split across |
| `port` | `int` | Port for the HTTP inference service |
| `max_batch_size` | `int` | Maximum number of requests to batch together (default: 1024) |
| `max_batch_wait_ms` | `int` | Max wait time in ms before dispatching a batch (default: 500) |
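The three placement parameters interact. The GPUs available total `num_nodes * num_gpus`, and each model instance consumes `tp_size` of them. The back-of-the-envelope check below makes that explicit. It assumes each instance is sharded across `tp_size` GPUs on a single node and that leftover GPUs host additional instances. `VllmLayout` is hypothetical, and this tutorial does not specify how `DragonVllmInferenceBackend` actually places instances.

```python
from dataclasses import dataclass


@dataclass
class VllmLayout:
    """Hypothetical container mirroring the placement arguments in the table."""
    num_nodes: int
    num_gpus: int  # GPUs per node
    tp_size: int   # GPUs each model instance is sharded across

    def validate(self) -> None:
        if min(self.num_nodes, self.num_gpus, self.tp_size) < 1:
            raise ValueError("num_nodes, num_gpus and tp_size must be >= 1")
        # Assumption: an instance never spans nodes, so tp_size must fit evenly
        # into the GPUs of one node (otherwise some GPUs would sit idle or the
        # instance could not be placed at all).
        if self.num_gpus % self.tp_size != 0:
            raise ValueError(
                f"tp_size={self.tp_size} does not evenly divide num_gpus={self.num_gpus}"
            )

    @property
    def total_gpus(self) -> int:
        return self.num_nodes * self.num_gpus

    @property
    def instances(self) -> int:
        # Assumption: every group of tp_size GPUs hosts one model instance.
        return self.total_gpus // self.tp_size


layout = VllmLayout(num_nodes=1, num_gpus=1, tp_size=1)  # values used in the pipeline
layout.validate()
print(layout.total_gpus, layout.instances)  # 1 1
```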

### Batching Strategy

The vLLM backend automatically groups inference requests into batches:

1. Incoming requests accumulate in a queue.
2. A batch is dispatched when requests have accumulated for `max_batch_wait_ms` milliseconds or when it reaches `max_batch_size` requests, whichever happens first.
3. Responses are distributed back to their individual tasks.

This means you get efficient GPU utilization even when submitting many small prompts.
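The size-or-timeout policy above can be sketched in plain `asyncio`. This is an illustration of the general technique, not RHAPSODY's implementation. `SizeOrTimeoutBatcher` and the `fake_model` callback are hypothetical, and error handling (for example, calling `fut.set_exception` when a batch fails) is omitted for brevity.

```python
import asyncio


class SizeOrTimeoutBatcher:
    """Hypothetical sketch: dispatch when the batch is full or the wait expires."""

    def __init__(self, handle_batch, max_batch_size=1024, max_batch_wait_ms=500):
        self._handle_batch = handle_batch  # async fn: list[prompt] -> list[result]
        self._max_size = max_batch_size
        self._max_wait = max_batch_wait_ms / 1000
        self._queue = asyncio.Queue()

    async def submit(self, prompt):
        """Enqueue one request and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until one request arrives; that opens a new batch window.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait
            while len(batch) < self._max_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break  # window expired before the batch filled up
            results = await self._handle_batch([p for p, _ in batch])
            # Route each result back to the request that produced it.
            for (_, fut), res in zip(batch, results):
                fut.set_result(res)


async def demo():
    batch_sizes = []

    async def fake_model(prompts):  # stand-in for one batched model call
        batch_sizes.append(len(prompts))
        return [p.upper() for p in prompts]

    batcher = SizeOrTimeoutBatcher(fake_model, max_batch_size=4, max_batch_wait_ms=50)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(f"p{i}") for i in range(10)))
    worker.cancel()
    return results, batch_sizes


results, sizes = asyncio.run(demo())
print(sizes)  # [4, 4, 2]
```

All 10 requests are queued at once, so two batches fill to `max_batch_size` immediately. The last two requests go out only after the 50 ms window expires, which shows both dispatch conditions.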

## 3.6 — AITask Parameters Reference

Here's a quick reference for all `AITask` parameters:

```python
AITask(
    prompt="Your input text",            # Required: string or list of strings
    backend=inference_backend.name,      # Required: target backend name
    model="Qwen2.5-0.5B-Instruct",       # Model name (if not using backend default)
    system_prompt="You are a helpful..", # System-level instructions
    temperature=0.7,                     # Sampling temperature (0.0 - 2.0)
    max_tokens=256,                      # Maximum tokens to generate
    top_p=0.9,                           # Nucleus sampling (0.0 - 1.0)
    top_k=50,                            # Top-k sampling
    stop_sequences=["\n\n"],             # Stop generation at these sequences
)
```

After execution:
- `task.return_value` — the generated response.
- `task.state` — `"DONE"` or `"FAILED"`.
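To make `temperature`, `top_k`, and `top_p` concrete, below is a small pure-Python sketch of how these filters are commonly applied to a next-token distribution, in one common ordering. It illustrates the standard technique only. vLLM's real sampler runs on the GPU and differs in detail, and `sample_next_token` is a hypothetical helper.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.9, rng=random):
    """Illustrative temperature / top-k / top-p sampling over {token: logit}."""
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy (argmax) decoding.
        return max(logits, key=logits.get)
    # 1. Temperature: divide logits before softmax (low T sharpens, high T flattens).
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    # 2. Top-k: keep only the k highest-scoring tokens (k <= 0 disables the filter).
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    # Softmax over the survivors; subtracting the max avoids overflow.
    peak = ranked[0][1]
    exps = [(tok, math.exp(logit - peak)) for tok, logit in ranked]
    total = sum(e for _, e in exps)
    probs = [(tok, e / total) for tok, e in exps]
    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]


logits = {"the": 2.0, "a": 1.5, "cat": 0.1, "zebra": -3.0}
print(sample_next_token(logits, temperature=0.0))  # the
print(sample_next_token(logits, top_k=1))          # the
print(sample_next_token(logits, top_p=0.5))        # the ("the" alone has p ≈ 0.64)
```

With the defaults shown in the reference block, `temperature=0.7` and `top_p=0.9` leave only "the" and "a" as candidates here. Raising `temperature` spreads probability onto "cat" and beyond.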

## Part 3 Summary

You've now built a complete AI-HPC pipeline:

- **`DragonVllmInferenceBackend`** provides GPU-accelerated inference with automatic request batching.
- **`AITask`** wraps inference requests with familiar parameters (`prompt`, `temperature`, `max_tokens`).
- **Multi-backend sessions** let you mix AI inference and HPC compute in a single workflow.
- RHAPSODY **routes tasks automatically** based on the `backend` field.
- You can chain phases: use inference outputs as inputs to compute tasks.

---

# Conclusion

In this tutorial, you progressed from your first RHAPSODY task to a full AI-HPC pipeline:

| Section | What You Learned |
|---------|------------------|
| **Part 1** | Core concepts — `ComputeTask`, `Session`, `DragonExecutionBackendV3`. How to submit function and executable tasks. |
| **Part 2** | Heterogeneous workloads — mixing execution modes, using `task_backend_specific_kwargs` to control process layout. |
| **Part 3** | AI-HPC workflows — combining vLLM inference with HPC compute, multi-backend sessions, building realistic pipelines. |

### Key Takeaways

1. **Three building blocks**: Task → Backend → Session. That's all you need.
2. **Tasks are self-describing**: Configure them once; the backend figures out how to run them.
3. **Async-first**: RHAPSODY uses `asyncio` throughout — `await`, `gather`, and context managers.
4. **In-place updates**: Tasks are updated directly after execution. No need to extract results from a separate object.
5. **Backend-agnostic design**: The core API stays the same regardless of which backend you use. Backend-specific details go in `task_backend_specific_kwargs`.

### Next Steps

- Explore the [RHAPSODY documentation](https://github.com/radical-cybertools/rhapsody) for advanced configuration.
- Try scaling up to thousands of tasks with `DragonExecutionBackendV3`.
- Integrate with [RADICAL AsyncFlow](https://github.com/radical-cybertools) for complex task dependency graphs.
- Experiment with different vLLM models and batch sizes for your inference workloads.

Happy computing!