# Remora / VibeCheck quickstart

This notebook loads a preset vision-language model (VLM), optionally applies Triton surgery to its Linear layers, and runs batched generation through `VibeCheckModel`.

## Environment setup

Uncomment the cell below if you need to install dependencies in a fresh runtime.

In [None]:
# Optional: install remora with VLMEval extras.
# !pip install -e ".[eval]"
# Or minimal install without extras:
# !pip install -e .


## Load model and wrap with VibeCheck

Choose a preset and a device, and decide whether to apply Triton surgery. The presets shipped here are `molmo-7b`, `smolvlm-base`, and `qwen2.5-vl-7b`. If Triton is unavailable, for example on a CPU-only machine, set `apply_surgery = False`.
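
If you would rather not set the flag by hand, one option is to check whether the `triton` package is importable before enabling surgery. The sketch below uses only the standard library; `triton_available` and `should_apply_surgery` are hypothetical helper names, not part of remora, and the rule "CUDA device plus Triton installed" is an assumption about when surgery is safe to apply.

```python
import importlib.util


def triton_available() -> bool:
    """Return True if the `triton` package can be located (without importing it)."""
    return importlib.util.find_spec("triton") is not None


def should_apply_surgery(device: str) -> bool:
    """Hypothetical rule: enable surgery only on CUDA with Triton installed."""
    return device == "cuda" and triton_available()
```

In the cell below you could then write `apply_surgery = should_apply_surgery(device)` instead of hard-coding `True`.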

In [None]:
import torch

from remora.integration import VibeCheckModel
from remora.models import load_model_and_tokenizer
from remora.surgery import hijack_model

# Core knobs
preset = "smolvlm-base"  # options: molmo-7b, smolvlm-base, qwen2.5-vl-7b
device = "cuda" if torch.cuda.is_available() else "cpu"
apply_surgery = True  # set False to keep stock nn.Linear layers

# Queue behavior for batching sequential calls
batch_size = 4
flush_ms = 10.0
max_queue = 64

print(f"Loading preset '{preset}' on {device}...")
model, tokenizer = load_model_and_tokenizer(preset, device=device)

if apply_surgery:
    print("Applying Triton surgery to Linear layers...")
    hijack_model(model)
else:
    print("Skipping Triton surgery (using stock nn.Linear layers).")

vibe = VibeCheckModel(
    model=model,
    tokenizer=tokenizer,
    batch_size=batch_size,
    flush_ms=flush_ms,
    max_queue=max_queue,
)
print("VibeCheck is ready.")
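
To build intuition for `batch_size`, `flush_ms`, and `max_queue`, here is a toy micro-batching queue. It is a sketch of one common pattern, not the `VibeCheckModel` implementation: the `MicroBatcher` class, its `submit` method, and the exact semantics (flush when the batch is full or when `flush_ms` has elapsed since the first queued request; block producers once `max_queue` requests are waiting) are all assumptions made for illustration.

```python
import queue
import threading
import time


class MicroBatcher:
    """Toy batching queue; a hypothetical stand-in, not remora's code."""

    def __init__(self, handler, batch_size=4, flush_ms=10.0, max_queue=64):
        self.handler = handler                # takes a list of prompts, returns a list of results
        self.batch_size = batch_size
        self.flush_s = flush_ms / 1000.0
        self.requests = queue.Queue(maxsize=max_queue)  # put() blocks when full
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, prompt):
        """Enqueue one prompt and block until its result is ready."""
        slot = {"done": threading.Event(), "result": None}
        self.requests.put((prompt, slot))
        slot["done"].wait()
        return slot["result"]

    def _run(self):
        while True:
            item = self.requests.get()        # wait for the first request of a batch
            if item is None:                  # sentinel sent by close()
                return
            batch = [item]
            deadline = time.monotonic() + self.flush_s
            while len(batch) < self.batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    nxt = self.requests.get(timeout=remaining)
                except queue.Empty:
                    break
                if nxt is None:               # finish pending work, then stop
                    self._flush(batch)
                    return
                batch.append(nxt)
            self._flush(batch)

    def _flush(self, batch):
        """Run the handler once for the whole batch and wake each caller."""
        results = self.handler([prompt for prompt, _ in batch])
        for (_, slot), result in zip(batch, results):
            slot["result"] = result
            slot["done"].set()

    def close(self):
        """Ask the worker to stop and wait for it to exit."""
        self.requests.put(None)
        self.worker.join()
```

Under these assumptions, a lone request waits up to `flush_ms` before it runs, so a larger `flush_ms` gives fuller batches at the cost of latency. The sketch omits error handling: if `handler` raises, waiting callers are never woken.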


## Text-only generation

Send a simple prompt through the batching wrapper. Adjust `max_new_tokens` or other generation kwargs as needed.

In [None]:
prompt = "Describe how Triton-accelerated linear layers can speed up VLM inference."
text_only = vibe.generate(prompt, max_new_tokens=64)
print(text_only)


## Vision + text generation (optional)

Set `image_path` to an image file on disk to add vision context to the prompt. Leave it as `None` to skip this cell.

In [None]:
from PIL import Image

image_path = None  # e.g., "samples/cat.png"

if image_path:
    image = Image.open(image_path).convert("RGB")
    vision_prompt = "Describe the image in one short paragraph."
    with torch.inference_mode():
        vision_out = vibe.generate(vision_prompt, image=image, max_new_tokens=64)
    print(vision_out)
else:
    print("Set image_path to a file on disk to run vision + text generation.")


## Cleanup

Call `close()` when you are done to stop the worker thread and clear any queued requests.

In [None]:
vibe.close()
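
In a script rather than a notebook, you may want the cleanup to run even when generation raises. The standard library's `contextlib.closing` calls `close()` on any object when the `with` block exits. The sketch below demonstrates the pattern with `DummyWrapper`, a hypothetical stand-in class; whether `VibeCheckModel` itself supports the `with` statement directly is not confirmed here, which is why `closing` is used.

```python
from contextlib import closing


class DummyWrapper:
    """Hypothetical stand-in for any object exposing close(), such as `vibe`."""

    def __init__(self):
        self.closed = False

    def generate(self, prompt):
        return f"echo: {prompt}"

    def close(self):
        self.closed = True


wrapper = DummyWrapper()
with closing(wrapper) as w:
    reply = w.generate("hello")
# close() has already run at this point, and would have run even if generate() had raised.
```

Applied to this notebook, the same pattern would read `with closing(vibe): ...` around the generation calls.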
