# TNAD Colab Runner (GPU)

End-to-end notebook for executing the Tensor Network-Augmented Decoding (TNAD) experiments on Google Colab with CUDA-enabled GPUs. Follow the cells sequentially.

## 1. Runtime Preparation

1. In Colab, switch the runtime to **GPU**: `Runtime → Change runtime type → Hardware accelerator → GPU`.
2. Execute the cells below in order. Installation steps may take a few minutes the first time.

In [None]:
# Verify that a CUDA GPU is visible to PyTorch.
import torch

print(f"PyTorch version : {torch.__version__}")
print(f"CUDA available  : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    gpu = torch.cuda.get_device_properties(0)
    print(f"GPU name        : {gpu.name}")
    print(f"Total memory    : {gpu.total_memory / 1e9:.2f} GB")
else:
    raise EnvironmentError("A CUDA-enabled runtime is required. Switch the Colab runtime to GPU.")
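
The total memory reported above can be compared against a rough estimate of a model's weight footprint. The sketch below does that arithmetic for a 2.7B-parameter model (the size of the default `microsoft/phi-2` used later in this notebook), at 2 bytes per parameter for float16 and 1 byte per parameter for 8-bit weights. The helper name `weight_memory_gb` is illustrative and not part of TNAD, and the figures cover weights only: activations, the attention cache, and TNAD's own state come on top.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 2.7e9  # parameter count quoted for microsoft/phi-2 in this notebook

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
int8_gb = weight_memory_gb(NUM_PARAMS, 1)  # 8-bit: 1 byte per parameter

print(f"float16 weights : {fp16_gb:.1f} GB")
print(f"8-bit weights   : {int8_gb:.1f} GB")
```

Both estimates use the same `1e9` divisor as the GPU check, so they can be read directly against the `Total memory` line.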

In [None]:
%%capture
!pip install -U pip
!pip install -q accelerate bitsandbytes datasets loguru matplotlib pyyaml seaborn sentencepiece tqdm transformers

## 2. Retrieve the Project

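This cell clones the repository containing the TNAD source code and makes it the working directory. Set `REPO_URL` to your repository or fork before running. If you uploaded the files manually into `quantum-search-llm/`, still run the cell: it detects the existing directory, skips the clone, and only changes the working directory, which the relative paths in later cells depend on.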

In [None]:
import os
from pathlib import Path

REPO_URL = "https://github.com/your-org/quantum-search-llm.git"  # TODO: replace with your repository URL
REPO_DIR = Path("quantum-search-llm")

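if Path.cwd().name == REPO_DIR.name:
    # Re-running this cell after the chdir below: already inside the repository.
    print(f"Already inside {Path.cwd()}")
else:
    if REPO_DIR.exists():
        print(f"Repository already present at {REPO_DIR.resolve()}")
    else:
        # Clone into REPO_DIR explicitly so the folder name does not depend on the URL.
        !git clone --depth 1 {REPO_URL} {REPO_DIR}
    os.chdir(REPO_DIR)  # raises FileNotFoundError if the clone failed

print(f"Working directory: {Path.cwd()}")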

In [None]:
%%capture
# Install the project in editable mode so notebooks can import the TNAD package.
!pip install -e .
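
Because the install cell runs under `%%capture`, a failed install produces no visible output. A quick importability check can confirm the package is reachable before continuing. This is a minimal sketch using only the standard library. The helper name `is_importable` is illustrative, and the package name `tnad` is taken from the import used in the interactive cell near the end of this notebook.

```python
import importlib
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    importlib.invalidate_caches()  # pick up packages installed after kernel start
    return importlib.util.find_spec(package_name) is not None


status = "found" if is_importable("tnad") else "NOT found"
print(f"tnad package: {status}")
```

If the package is reported as not found right after installation, restarting the runtime and re-running the cells may be needed for the editable install to become visible.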

## 3. Configure Experiment Parameters

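Adjust the variables below to control which model runs, how many benchmark samples are evaluated, and the fidelity-guided beam search (FGBS) hyperparameters. The cell loads `configs/default.yaml`, applies these overrides, and writes the result to `configs/colab_autogen.yaml`. The defaults are chosen to fit comfortably within a Colab T4/A100 session while still exercising the full pipeline.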

In [None]:
from pathlib import Path
import yaml

MODEL_NAME = "microsoft/phi-2"          # Use a 2.7B model that fits easily on Colab GPUs.
USE_8BIT = True                         # Requires bitsandbytes (installed above). Keeps VRAM usage low for larger models.
ALPHA = 0.5                             # Fluency vs coherence balance.
BOND_DIM = 16                           # Logical bandwidth χ.
BEAM_WIDTH = 5                          # Number of beams.
NUM_EXAMPLES = 25                       # Evaluate this many samples per benchmark (set to -1 for full dataset).

BASE_CONFIG_PATH = Path("configs/default.yaml")
COLAB_CONFIG_PATH = Path("configs/colab_autogen.yaml")

with BASE_CONFIG_PATH.open("r") as f:
    config = yaml.safe_load(f)

config["model"]["name"] = MODEL_NAME
config["model"]["device"] = "cuda"
config["model"]["load_in_8bit"] = bool(USE_8BIT)
config["model"]["torch_dtype"] = "float16"

config["fgbs"]["alpha"] = float(ALPHA)
config["fgbs"]["bond_dim"] = int(BOND_DIM)
config["fgbs"]["beam_width"] = int(BEAM_WIDTH)

config["experiment"]["num_examples"] = int(NUM_EXAMPLES)

# Opening in "w" mode truncates any config left over from a previous run.
with COLAB_CONFIG_PATH.open("w") as f:
    yaml.safe_dump(config, f)

print(f"Wrote Colab-specific configuration to {COLAB_CONFIG_PATH}")
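
The cell above assigns into nested sections directly, so a missing `model`, `fgbs`, or `experiment` section in the base config surfaces as a bare `KeyError`. As a sketch of an alternative, the helper below applies overrides given as dotted paths, leaves the base dictionary untouched, and names the missing section in its error. `apply_overrides` and the sample dictionaries are hypothetical and not part of the TNAD code base.

```python
from copy import deepcopy


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with dotted-path overrides applied."""
    result = deepcopy(base)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            if not isinstance(node.get(key), dict):
                raise KeyError(f"Config section '{key}' missing for '{dotted_key}'")
            node = node[key]
        node[leaf] = value
    return result


# Stand-in for the parsed base config (illustrative values only).
base_config = {"model": {"name": "placeholder"}, "fgbs": {"alpha": 0.1}}

colab_config = apply_overrides(
    base_config,
    {"model.name": "microsoft/phi-2", "fgbs.alpha": 0.5, "fgbs.bond_dim": 16},
)
print(colab_config)
```

Copying before modifying keeps the parsed base config available for comparison, which can help when checking which values a run actually changed.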

## 4. Run the Reproduction Script on GPU

This cell executes the main experiment driver using the Colab-friendly configuration. Logs and tables will be stored under `paper_results/`.

In [None]:
import subprocess
import sys

cmd = [
    sys.executable,  # same interpreter as the notebook kernel, so the editable install is visible
    "experiments/reproduce_paper_results.py",
    "--model", MODEL_NAME,
    "--num_examples", str(NUM_EXAMPLES),
    "--alpha", str(ALPHA),
    "--bond_dim", str(BOND_DIM),
    "--config", str(COLAB_CONFIG_PATH),
]

print("Running command:\n", " ".join(cmd))
subprocess.run(cmd, check=True)
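
Depending on the notebook front end, output from `subprocess.run` may only appear once the process ends, or not at all. The sketch below shows one way to echo a child process's output line by line while it runs, using only the standard library. `run_and_stream` is an illustrative helper, not part of TNAD, and the demo command is a stand-in for the experiment driver.

```python
import subprocess
import sys


def run_and_stream(cmd: list[str]) -> list[str]:
    """Run `cmd`, echo each output line as it arrives, and return all lines."""
    lines = []
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log order is preserved
        text=True,
        bufsize=1,  # line-buffered reads on the parent side
    ) as proc:
        for line in proc.stdout:
            print(line, end="")
            lines.append(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return lines


demo_lines = run_and_stream([sys.executable, "-c", "print('step 1'); print('step 2')"])
```

A Python child process buffers its own output when writing to a pipe, so adding the interpreter's `-u` flag right after the executable in the command list helps lines arrive promptly.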

## 5. Inspect Results

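The reproduction script saves both raw JSON and formatted tables. The cell below prints the table file that sorts last by name, which is the latest run as long as the filenames embed a sortable timestamp, and then lists the top-level keys of the last JSON file.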

In [None]:
from pathlib import Path
import json

results_dir = Path("paper_results")
table_files = sorted(results_dir.glob("tables_*.txt"))
if not table_files:
    raise FileNotFoundError("No results found. Ensure the previous cell completed successfully.")

latest_tables = table_files[-1]
print(f"Displaying summary from {latest_tables}")
print(latest_tables.read_text())

json_files = sorted(results_dir.glob("full_results_*.json"))
if json_files:
    latest_json = json_files[-1]
    with latest_json.open("r") as f:
        payload = json.load(f)
    print("\nStored result keys:", list(payload.keys()))
else:
    print("No JSON artifact found.")
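
The cell above picks the "latest" file by sorting filenames, which only matches chronological order if the names embed a sortable timestamp. If that assumption does not hold for your runs, selecting by modification time is one alternative. The sketch below demonstrates the difference in a temporary directory. `latest_by_mtime` and the demo filenames are illustrative.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_by_mtime(directory: Path, pattern: str) -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None."""
    candidates = list(directory.glob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    older = tmp_dir / "tables_b.txt"  # sorts last by name but is the older file
    newer = tmp_dir / "tables_a.txt"
    older.write_text("old run")
    newer.write_text("new run")
    # Set explicit modification times so the demo does not depend on timing.
    os.utime(older, (1_000_000_000, 1_000_000_000))
    os.utime(newer, (2_000_000_000, 2_000_000_000))

    picked_name = latest_by_mtime(tmp_dir, "tables_*.txt").name
    by_name = sorted(tmp_dir.glob("tables_*.txt"))[-1].name
    missing = latest_by_mtime(tmp_dir, "full_results_*.json")

print(f"Latest by mtime: {picked_name}; last by name: {by_name}")
```

Modification times can change when files are copied or restored from storage, so name-based ordering remains the safer choice whenever the filenames do carry a timestamp.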

## 6. Optional: Interactive Generation

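Use TNAD directly for ad-hoc reasoning experiments. The cell below loads the model into the notebook process, because the run in Section 4 executed in a separate subprocess and left nothing in memory, and then runs fidelity-guided beam search on a single prompt.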

In [None]:
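import torch
from tnad import FidelityGuidedBeamSearcher
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Recent transformers releases expect 8-bit loading to be requested through
# quantization_config rather than the bare load_in_8bit keyword.
quantization_config = BitsAndBytesConfig(load_in_8bit=True) if USE_8BIT else None

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quantization_config,
)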
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

interactive_searcher = FidelityGuidedBeamSearcher(
    model=model,
    tokenizer=tokenizer,
    beam_width=BEAM_WIDTH,
    alpha=ALPHA,
    bond_dim=BOND_DIM,
)

prompt = "Solve: If a train travels 60 miles in 1.5 hours, what is its average speed in miles per hour?\nAnswer:"
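result = interactive_searcher.generate(prompt, max_length=128, min_length=12, return_details=False)
# With return_details=False the searcher may return a plain string rather than a dict; handle both.
print(result["text"] if isinstance(result, dict) else result)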
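
The notebook describes `ALPHA` only as a "fluency vs coherence balance", and the actual scoring rule lives inside the `tnad` package. As a toy illustration of what such a balance can mean, the sketch below ranks candidates by a convex combination of a language-model log-probability and the log of a coherence score in (0, 1]. The formula, the direction of the weighting, the function names, and the candidate values are all assumptions made for illustration, not the TNAD implementation.

```python
import math


def combined_score(log_prob: float, fidelity: float, alpha: float) -> float:
    """Hypothetical score: alpha weights fluency, (1 - alpha) weights coherence."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 < fidelity <= 1.0:
        raise ValueError("fidelity must lie in (0, 1]")
    return alpha * log_prob + (1.0 - alpha) * math.log(fidelity)


def rank(candidates, alpha: float, beam_width: int) -> list[str]:
    """Keep the `beam_width` best candidate names under the combined score."""
    ordered = sorted(
        candidates,
        key=lambda c: combined_score(c[1], c[2], alpha),
        reverse=True,
    )
    return [name for name, _, _ in ordered[:beam_width]]


# (name, log-probability, coherence score): made-up values for illustration.
toy_candidates = [
    ("fluent but incoherent", -1.0, 0.10),
    ("balanced", -1.5, 0.80),
    ("coherent but awkward", -4.0, 0.95),
]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, rank(toy_candidates, alpha, beam_width=2))
```

Under these assumptions, `alpha = 1.0` reduces to ordinary likelihood ranking and `alpha = 0.0` ranks by coherence alone, which is why intermediate values such as the default `0.5` act as a trade-off.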