## Prerequisites

- `gcloud`
- `kubectl`
- `uv`
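Before you continue, you can confirm that all three tools are on your `PATH` with a short Python check. This is a sketch: `missing_tools` is a hypothetical helper and is not part of any of these tools.

```python
import shutil

# Command-line tools this notebook expects on PATH.
REQUIRED_TOOLS = ("gcloud", "kubectl", "uv")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```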

### Create a GKE Cluster

These are the steps for a GKE Standard cluster. You can also use an Autopilot cluster, which handles
scaling and node pools for you.

```bash
export PROJECT_ID=$(gcloud config get project)
export CLUSTER_NAME=tunix-demo
export LOCATION=us-west1
export NODE_POOL_NAME="gvisor-node-pool"
export MACHINE_TYPE="n2-standard-8"
export NUM_NODES=1
```

Create a Standard GKE Cluster. This may take a few minutes:

```bash
gcloud container clusters create ${CLUSTER_NAME} \
    --location=${LOCATION}
```

Creating the cluster automatically retrieves the cluster credentials, which lets you run `kubectl`
commands. If you need to fetch them again, run:

```bash
gcloud container clusters get-credentials ${CLUSTER_NAME} --location ${LOCATION} --project ${PROJECT_ID}
```

Create a gVisor node pool. This may take a few minutes:

```bash
gcloud container node-pools create ${NODE_POOL_NAME} \
  --cluster=${CLUSTER_NAME} \
  --location=${LOCATION} \
  --machine-type=${MACHINE_TYPE} \
  --image-type=cos_containerd \
  --sandbox type=gvisor \
  --num-nodes=${NUM_NODES} \
  --enable-autoscaling \
  --min-nodes=1 \
  --max-nodes=5 \
  --node-labels="cloud.google.com/gke-nodepool=${NODE_POOL_NAME}"
```
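If you prefer to drive cluster setup from Python, you can build the node-pool command as an argument list and pass it to `subprocess.run`, which avoids shell quoting entirely. This is a minimal sketch that assumes the same flags and defaults as the bash block in this step. `node_pool_create_args` is a hypothetical helper that only builds the list and runs nothing.

```python
import shlex


def node_pool_create_args(cluster, location, pool, machine_type="n2-standard-8",
                          num_nodes=1, min_nodes=1, max_nodes=5):
    """Build the argv for `gcloud container node-pools create` (gVisor pool).

    Mirrors the flags in the bash command; nothing is executed here.
    """
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        f"--location={location}",
        f"--machine-type={machine_type}",
        "--image-type=cos_containerd",
        "--sandbox", "type=gvisor",
        f"--num-nodes={num_nodes}",
        "--enable-autoscaling",
        f"--min-nodes={min_nodes}",
        f"--max-nodes={max_nodes}",
        f"--node-labels=cloud.google.com/gke-nodepool={pool}",
    ]


if __name__ == "__main__":
    args = node_pool_create_args("tunix-demo", "us-west1", "gvisor-node-pool")
    # Print a copy-pasteable shell version of the command.
    print(shlex.join(args))
    # To actually create the pool: subprocess.run(args, check=True)
```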

### Install Agent-Sandbox Controller into GKE Cluster

These commands come from the [agent-sandbox releases page](https://github.com/kubernetes-sigs/agent-sandbox/releases) and pin release v0.1.1:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/v0.1.1/manifest.yaml
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/v0.1.1/extensions.yaml
```

### Install Notebook Dependencies

Create a virtual environment in the root `~/tunix` directory using the commands below.
In your IDE, select `.venv/bin/python` as the Python interpreter and kernel for this notebook.

Note that the `assistant` responses are mocked in this notebook, so it does not require a TPU.
For real use, follow the instructions for creating and connecting to a TPU in
`~/tunix/examples/README.rst`.

We pin the same Python version as the Jupyter server on the TPU.
We use `uv` because it is the installation method R2E-Gym prefers.

Run these commands in your terminal at the project root, not in this notebook:

```bash
sudo apt install python3.12-venv
uv python pin 3.12.9
uv sync -U
```

```bash
source ~/tunix/.venv/bin/activate
uv pip install -U ipykernel ipywidgets kubernetes
```
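After you select the interpreter, you can check from a notebook cell that the kernel is a virtual-environment Python at the pinned 3.12 release. This is a minimal sketch, and `kernel_info` is a hypothetical helper. Note that `uv python pin 3.12.9` also pins the patch release, but this check only compares the major and minor versions.

```python
import sys


def kernel_info(expected=(3, 12)):
    """Report whether this interpreter matches `expected` and runs inside a venv."""
    # In a virtual environment, sys.prefix points at the venv, not the base install.
    in_venv = sys.prefix != sys.base_prefix
    return {
        "executable": sys.executable,
        "version": ".".join(map(str, sys.version_info[:3])),
        "version_ok": sys.version_info[:2] == tuple(expected),
        "in_venv": in_venv,
        "venv_path": sys.prefix if in_venv else None,
    }


if __name__ == "__main__":
    print(kernel_info())
```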

#### Install Fork of R2E-Gym

Install R2E-Gym from commit
[`eb28abc`](https://github.com/R2E-Gym/R2E-Gym/commit/eb28abc44e6c6756b54cc0766a65c03722f0d653),
which includes the agentic-sandbox changes:

```bash
uv pip install "git+https://github.com/R2E-Gym/R2E-Gym.git@eb28abc44e6c6756b54cc0766a65c03722f0d653"
```
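To confirm which commit was actually installed, you can read the PEP 610 `direct_url.json` record that installers write for VCS installs. This sketch assumes the installer recorded that file and that the distribution is named `r2egym`. The name is an assumption, so check `uv pip list` for the real one. `installed_git_commit` is a hypothetical helper.

```python
import json
from importlib import metadata


def installed_git_commit(dist_name):
    """Return the git commit a distribution was installed from, or None.

    Reads the PEP 610 direct_url.json file, which is present only for
    direct-URL installs such as "git+https://...@<commit>".
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    raw = dist.read_text("direct_url.json")
    if not raw:
        return None
    return json.loads(raw).get("vcs_info", {}).get("commit_id")


if __name__ == "__main__":
    # "r2egym" is an assumed distribution name.
    commit = installed_git_commit("r2egym")
    print(commit, commit == "eb28abc44e6c6756b54cc0766a65c03722f0d653")
```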

<!-- Install local, editable version of R2E-Gym. Must first have the repo forked / cloned locally.

```bash
uv pip install -e ~/R2E-Gym
```

If installing from R2E-Gym main, upgrade `datasets` after installing R2E-Gym: R2E-Gym pins
`datasets` to 2.19, which does not work with Python 3.12.

```bash
uv pip install -U datasets
``` -->

#### Install Agentic Sandbox Client from Main

The client is not yet released to PyPI, so install it from `main`. It should be importable directly
from PyPI soon. Once it is published there, the command below will break because the package name
changes, and the R2E-Gym fork above will need a matching fix.

```bash
export VERSION="main"
uv pip install "git+https://github.com/kubernetes-sigs/agent-sandbox.git@${VERSION}#subdirectory=clients/python/agentic-sandbox-client"
```


In [None]:
import subprocess
import sys


def get_uv_pip_list():
    """
    Executes 'uv pip list' for this kernel's interpreter and prints the output.
    Passing --python sys.executable makes uv inspect the notebook's .venv even
    when no virtual environment is activated in the kernel's environment.
    """
    try:
        result = subprocess.run(
            ['uv', 'pip', 'list', '--python', sys.executable],
            capture_output=True,
            text=True,
            check=True
        )
        print(result.stdout)
    except subprocess.CalledProcessError as e:
        print(f"Error executing uv: {e.stderr}")
    except FileNotFoundError:
        print("The 'uv' executable was not found. Please ensure uv is installed.")


if __name__ == "__main__":
    get_uv_pip_list()
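If you want to check for specific packages in code instead of reading the table, `uv pip list` also accepts `--format json`. This sketch assumes the JSON output is a list of objects with `name` and `version` keys, the same shape that `pip list --format=json` produces. `parse_uv_pip_list_json` and `installed_packages` are hypothetical helpers.

```python
import json
import subprocess
import sys


def parse_uv_pip_list_json(text):
    """Turn `uv pip list --format json` output into {lowercased_name: version}."""
    return {pkg["name"].lower(): pkg["version"] for pkg in json.loads(text)}


def installed_packages():
    """Run `uv pip list` for this kernel's interpreter and parse the JSON output."""
    result = subprocess.run(
        ["uv", "pip", "list", "--python", sys.executable, "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return parse_uv_pip_list_json(result.stdout)


if __name__ == "__main__":
    # Made-up sample output, so this runs without uv installed.
    print(parse_uv_pip_list_json('[{"name": "Example-Pkg", "version": "1.2.3"}]'))
```

In the notebook, `"kubernetes" in installed_packages()` would check for the Kubernetes client.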

In [None]:
import os
from datasets import load_dataset
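# Expand "~" so the cache path is absolute no matter which library consumes it.
DATASET_CACHE = os.path.expanduser(os.getenv('DATASET_CACHE', '~/scratch/dataset_cache'))
# Number of dataset rows to scan for tasks.
TASKS_TO_PROCESS = 100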

In [None]:
from typing import Any, cast

dataset = load_dataset(
    "R2E-Gym/R2E-Gym-V1",
    split="train",
    cache_dir=DATASET_CACHE,
    num_proc=32,
)
entries = []
unique_images = set()
for i, entry in enumerate(dataset):
    # Cast entry to dict to satisfy Pylance
    row = cast(dict[str, Any], entry)
    if "docker_image" in row:
        unique_images.add(row["docker_image"])
        entries.append(entry)
    if i >= TASKS_TO_PROCESS - 1:
        break
unique_images = list(unique_images)
print(f"Found {len(unique_images)} unique Docker images to download")
IDS = [f"task-{i}" for i in range(len(entries))]
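The previous cell scans the first `TASKS_TO_PROCESS` rows and keeps those that have a `docker_image` key. You can write the same selection as a pure function, which lets you test it without downloading the dataset. This is a sketch, and `select_tasks` is a hypothetical name. Unlike the cell, it returns the images sorted, so their order is reproducible.

```python
from itertools import islice


def select_tasks(rows, limit):
    """Scan the first `limit` rows and keep those with a "docker_image" key.

    Returns (kept_rows, images), where images is the sorted list of
    distinct Docker images among the kept rows.
    """
    entries, images = [], set()
    for row in islice(rows, limit):
        if "docker_image" in row:
            images.add(row["docker_image"])
            entries.append(row)
    return entries, sorted(images)


if __name__ == "__main__":
    # Made-up rows standing in for dataset entries.
    fake_rows = [{"docker_image": "img-a"}, {"repo": "x"}, {"docker_image": "img-a"}]
    kept, images = select_tasks(fake_rows, limit=3)
    print(len(kept), images)
```

In the notebook, `entries, unique_images = select_tasks(dataset, TASKS_TO_PROCESS)` would replace the loop.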

In [None]:
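from kubernetes import client, config
import os

# NB: must match the name of the gVisor node pool you created (NODE_POOL_NAME in the setup steps).
NODE_POOL_NAME = "gvisor-node-pool"

# Expand "~" so tools that read KUBECONFIG without expanding it still find the file.
os.environ["KUBECONFIG"] = os.path.expanduser("~/.kube/config")
# Node selector that targets the gVisor node pool by its GKE node-pool label.
os.environ["NODE_SELECTOR_KEY"] = "cloud.google.com/gke-nodepool"
os.environ["NODE_SELECTOR_VAL"] = NODE_POOL_NAME

config.load_kube_config()
k8s_client = client.CoreV1Api()
# Optional connectivity check:
# k8s_client.list_namespace(timeout_seconds=5)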
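The two `NODE_SELECTOR_*` variables identify the gVisor node pool by its `cloud.google.com/gke-nodepool` label. This notebook does not show how R2E-Gym's sandbox backend uses them. Assuming they become a pod `nodeSelector`, the scheduling part of the pod spec would look roughly like this sketch. `runtimeClassName: gvisor` is how GKE Sandbox runs a pod under gVisor. `sandbox_pod_scheduling` is a hypothetical helper.

```python
import os


def sandbox_pod_scheduling(env=os.environ):
    """Build pod-spec scheduling fields from NODE_SELECTOR_KEY / NODE_SELECTOR_VAL.

    Assumption: the env vars are meant to become a nodeSelector; the
    "gvisor" runtime class is what GKE Sandbox uses for gVisor pods.
    """
    spec = {"runtimeClassName": "gvisor"}
    key, value = env.get("NODE_SELECTOR_KEY"), env.get("NODE_SELECTOR_VAL")
    if key and value:
        spec["nodeSelector"] = {key: value}
    return spec


if __name__ == "__main__":
    print(sandbox_pod_scheduling())
```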

In [None]:
from r2egym.agenthub.environment.env import EnvArgs, RepoEnv
import os
import r2egym

print(r2egym.__file__)

env_args = EnvArgs(ds=entries[1])
env = RepoEnv(env_args, backend="kubernetes-sandbox")
# env = RepoEnv(env_args, backend="kubernetes")

try:
    R2EGYM_PATH = os.path.dirname(r2egym.__file__)
except Exception:
    R2EGYM_PATH = ""

R2EGYM_COMMAND_FILES = [
    os.path.join(R2EGYM_PATH, "agenthub/tools/r2egym/file_editor.py"),
    os.path.join(R2EGYM_PATH, "agenthub/tools/search.py"),
    os.path.join(R2EGYM_PATH, "agenthub/tools/r2egym/execute_bash.py"),
    os.path.join(R2EGYM_PATH, "agenthub/tools/finish.py"),
]

env.add_commands(cmd_files=R2EGYM_COMMAND_FILES)
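`env.add_commands` expects every tool script to exist under the installed package. If `R2EGYM_PATH` falls back to `""`, the paths become relative and the failure can be confusing, so a quick pre-check helps. This is a sketch, and `missing_command_files` is a hypothetical helper. In the notebook, you would call `missing_command_files(R2EGYM_COMMAND_FILES)` before `env.add_commands`.

```python
import os


def missing_command_files(paths):
    """Return the paths that are not existing regular files."""
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == "__main__":
    # A directory is not a regular file, so it is reported as missing.
    print(missing_command_files([os.getcwd()]))
```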

In [None]:
output, exit_code = env.runtime.run("ls -F /testbed")

if exit_code == "0":
    print("Pod is responsive! Contents of /testbed:")
    print(output)
else:
    print(f"Execution failed with error: {exit_code}")

In [None]:
# Check that the tools loaded
output, _ = env.runtime.run("ls -F /usr/local/bin/")
print(output)

In [None]:
# Test if the search tool is functional
output, exit_code = env.runtime.run("search --help")
print(f"Tool Exit Code: {exit_code}")
print(output)

In [None]:
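# Check the Python version and that the task's codebase can be imported.
# The Orange import assumes the selected task comes from the Orange repository; adjust it for other tasks.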
output, _ = env.runtime.run(
    "python --version && python -c 'import Orange; print(\"Orange version:\", Orange.__version__)'")
print(output)

# Check the git state to ensure it's at the correct base commit
output, _ = env.runtime.run("git rev-parse HEAD")
print(f"Current commit in pod: {output.strip()}")

In [None]:
# Get pod details from the Kubernetes API
def get_pods():

    pod_list = k8s_client.list_namespaced_pod("default")
    for pod in pod_list.items:
        print("%s\t%s\t%s" % (pod.metadata.name,
                              pod.status.phase,
                              pod.status.pod_ip))


get_pods()

In [None]:
import re

# Minimal parser for the mocked LLM output. A production parser would need to be more robust.


def parse_action(llm_output: str) -> dict[str, str]:
    """
    Parses XML-like format: <parameter=key>value</parameter>
    """
    args = {}
    # Regex captures the key inside <parameter=KEY> and the value inside the tags
    matches = re.findall(
        r"<parameter=([^>]+)>(.*?)<\/parameter>", llm_output, re.DOTALL)

    for key, value in matches:
        args[key.strip()] = value.strip()

    print("parse_action", args)
    return args
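`parse_action` silently skips any tag its regex cannot match. A malformed tag such as `<parameter=path=/testbed</parameter>` therefore drops that argument, and the caller falls back to its default. The stricter sketch below counts the opening `<parameter=` tags and raises an error when some of them did not parse. `parse_action_strict` is a hypothetical name. It also tightens the key pattern to `[^<>]+`, so a key cannot run into the next tag.

```python
import re

# Well-formed parameter: <parameter=KEY>VALUE</parameter>
PARAM_RE = re.compile(r"<parameter=([^<>]+)>(.*?)</parameter>", re.DOTALL)
# Every opening tag, well-formed or not.
OPEN_RE = re.compile(r"<parameter=")


def parse_action_strict(llm_output: str) -> dict[str, str]:
    """Like parse_action, but raise ValueError on tags the regex cannot parse."""
    matches = PARAM_RE.findall(llm_output)
    opened = len(OPEN_RE.findall(llm_output))
    if opened != len(matches):
        raise ValueError(
            f"{opened - len(matches)} malformed <parameter=...> tag(s) in output")
    return {key.strip(): value.strip() for key, value in matches}
```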

In [None]:
import shlex


def execute_mock_action(env, action_str: str):
    # parse_action returns a dict like {'command': 'view', 'path': '/testbed'}
    args = parse_action(action_str)
    command_type = args.get('command')

    if command_type in ['view', 'create', 'str_replace', 'insert', 'undo_edit']:
        # The positional command goes last to match the usage signature
        # "file_editor ... command", though argparse usually accepts it anywhere.
        # shlex.quote escapes every value for bash, including the path.
        bash_cmd = f"file_editor --path {shlex.quote(args.get('path', ''))}"

        # Optional flags, in the same order as the file_editor usage.
        for flag in ['view_range', 'file_text', 'old_str', 'new_str', 'insert_line']:
            if flag in args:
                bash_cmd += f" --{flag} {shlex.quote(args[flag])}"

        # Append the positional command at the end
        bash_cmd += f" {shlex.quote(command_type)}"

    elif command_type == 'search':
        # usage: search --search_term SEARCH_TERM [--path PATH]
        term = args.get('search_term', '')
        path = args.get('path', '.')  # Default to current dir if not provided
        bash_cmd = f"search --search_term {shlex.quote(term)} --path {shlex.quote(path)}"

    else:
        print(f"Unknown command: {command_type}")
        return

    print(f"Executing in Sandbox: {bash_cmd}")
    output, exit_code = env.runtime.run(bash_cmd)
    print(
        f"--- Output (Exit: {exit_code}) ---\n{output[:500]}...\n----------------")
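Shell quoting is easy to get wrong when the arguments come from model output. You can check a quoting scheme locally, without a sandbox, by building the command with `shlex.quote` and confirming that `shlex.split` recovers the original arguments. This is a sketch. `build_search_cmd` is a hypothetical helper that mirrors the `search` branch.

```python
import shlex


def build_search_cmd(term: str, path: str = ".") -> str:
    """Build the `search` tool invocation with every argument shell-quoted."""
    return f"search --search_term {shlex.quote(term)} --path {shlex.quote(path)}"


if __name__ == "__main__":
    cmd = build_search_cmd("it's $HOME; echo hi", "/testbed")
    print(cmd)
    # shlex.split parses with POSIX shell rules, so the argv should round-trip.
    print(shlex.split(cmd))
```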

In [None]:
mock_responses = [
    """
    <function>
    <parameter=command>view</parameter>
    <parameter=path>/testbed</parameter>
    </function>
    """,
    """
    <function>
    <parameter=command>search</parameter>
    <parameter=search_term>ContextHandler</parameter>
    <parameter=path>/testbed</parameter>
    </function>
    """,
    """
    <function>
    <parameter=command>search</parameter>
    <parameter=search_term>initialize</parameter>
    <parameter=path>/testbed</parameter>
    </function>
    """,
    """
    <function>
    <parameter=command>search</parameter>
    <parameter=search_term>migrate_context</parameter>
    <parameter=path>/testbed</parameter>
    </function>
    """
]

for i, response in enumerate(mock_responses):
    print(f"\n>>> Step {i+1} <<<")
    execute_mock_action(env, response)

In [None]:
# Close the runtime to delete the SandboxClaim, Sandbox, and Pod.
# env.runtime.close()
# print("Sandbox Claim deleted. Pod termination initiated.")

In [None]:
# Shared Resource Cleanup (Deletes the Template for ALL runs using this image)
# Run this only when you are done with all tasks for this Docker image.
# env.runtime.delete_template()
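The sandbox and its Pod stay alive until `env.runtime.close()` is called. Tying cleanup to a `with` block ensures that an exception partway through an experiment does not leave the SandboxClaim behind. `contextlib.closing` calls `.close()` on exit, even when an exception is raised. The sketch below uses a hypothetical `FakeRuntime` stand-in so it runs without a cluster. In the notebook, you would pass `env.runtime` instead, assuming its `close()` takes no arguments as in the commented-out cell.

```python
from contextlib import closing


class FakeRuntime:
    """Hypothetical stand-in for env.runtime so the pattern runs without a cluster."""

    def __init__(self):
        self.closed = False

    def run(self, cmd):
        # Mimics the (output, exit_code) pair returned by env.runtime.run.
        return f"ran: {cmd}", "0"

    def close(self):
        self.closed = True


runtime = FakeRuntime()
with closing(runtime) as rt:
    output, exit_code = rt.run("ls /testbed")
# close() has already been called here, even if the block had raised.
print(output, exit_code, runtime.closed)
```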