# tviz + Tinker RL Training

This notebook shows the key integration points for adding tviz to a Tinker RL training loop.

For the complete working example, see [`examples/rl_loop_with_tviz.py`](https://github.com/sdan/tviz/blob/main/examples/rl_loop_with_tviz.py).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sdan/tviz/blob/main/examples/tinker_rl.ipynb)

## Installation

```bash
pip install tinker tinker-cookbook tviz
```

## Integration Points

Adding tviz to `tinker_cookbook/recipes/rl_loop.py` takes four changes:

### 1. Import tviz

```python
from tviz import TvizLogger
```

### 2. Create logger and add to multiplex

After `ml_log.setup_logging()` has created `ml_logger`:

```python
# >>> tviz <<<
tviz_logger = TvizLogger(run_name="math_rl_gsm8k")
tviz_logger.log_hparams(vars(config))
ml_logger.loggers.append(tviz_logger)
# >>> tviz <<<
```
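
Appending to `ml_logger.loggers` works because the logger fans each call out to every backend in that list. The sketch below illustrates the pattern with stand-in classes: `FanOutLogger`, `RecordingLogger`, and the `log_metrics` method name are assumptions made for illustration, not the actual tinker_cookbook or tviz classes.

```python
from typing import Any, Protocol


class MetricLogger(Protocol):
    def log_metrics(self, metrics: dict[str, Any], step: int) -> None: ...


class FanOutLogger:
    """Hypothetical stand-in for a multiplexing logger; not the tinker_cookbook class."""

    def __init__(self, loggers: list[MetricLogger]) -> None:
        self.loggers = list(loggers)

    def log_metrics(self, metrics: dict[str, Any], step: int) -> None:
        # Forward the same call to every registered backend.
        for logger in self.loggers:
            logger.log_metrics(metrics, step=step)


class RecordingLogger:
    """Hypothetical backend that stores what it receives, for demonstration."""

    def __init__(self) -> None:
        self.records: list[tuple[int, dict[str, Any]]] = []

    def log_metrics(self, metrics: dict[str, Any], step: int) -> None:
        self.records.append((step, dict(metrics)))


console = RecordingLogger()
fan_out = FanOutLogger([console])
fan_out.log_metrics({"reward/mean": 0.25}, step=0)

# Appending after construction: later calls reach the new backend too.
extra = RecordingLogger()
fan_out.loggers.append(extra)
fan_out.log_metrics({"reward/mean": 0.5}, step=1)

print(len(console.records), len(extra.records))
```

Under this pattern, a backend appended partway through a run only sees calls made after it was added, which is presumably why the tviz logger is attached right after logging setup.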

### 3. Collect and log rollouts

Inside the training loop, collect each group's trajectories as they are sampled, then log the whole batch once per step:

```python
# >>> tviz <<<
tviz_rollouts: list[dict] = []
# >>> tviz <<<

for group_idx, (sample_futures, prompt_tokens, question, answer) in enumerate(...):

    # >>> tviz <<<
    tviz_trajectories: list[dict] = []
    # >>> tviz <<<

    for traj_idx, future in enumerate(sample_futures):
        # ... existing sampling code ...

        # >>> tviz <<<
        tviz_trajectories.append({
            "trajectory_idx": traj_idx,
            "reward": reward,
            "output_text": content,
            "output_tokens": sampled_tokens,
            "logprobs": sampled_logprobs,
        })
        # >>> tviz <<<

    # >>> tviz <<<
    tviz_rollouts.append({
        "group_idx": group_idx,
        "prompt_text": question,
        "trajectories": tviz_trajectories,
    })
    # >>> tviz <<<

# >>> tviz <<<
tviz_logger.log_rollouts(tviz_rollouts, step=batch_idx)
# >>> tviz <<<
```
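
To make the payload shape concrete, here is a minimal sketch that builds one rollout group from made-up data and checks it before it would be handed to `log_rollouts`. The dictionary keys come from the snippet above. The `check_rollouts` helper is hypothetical, and the rule that `output_tokens` and `logprobs` have equal length is an assumption, not something tviz is known to enforce.

```python
from statistics import mean

# Keys taken from the snippet above; the validation helper is illustrative only.
TRAJECTORY_KEYS = {"trajectory_idx", "reward", "output_text", "output_tokens", "logprobs"}
ROLLOUT_KEYS = {"group_idx", "prompt_text", "trajectories"}


def check_rollouts(rollouts: list[dict]) -> float:
    """Validate the payload shape and return the mean reward over all trajectories."""
    rewards: list[float] = []
    for rollout in rollouts:
        missing = ROLLOUT_KEYS - rollout.keys()
        if missing:
            raise KeyError(f"rollout missing keys: {sorted(missing)}")
        for traj in rollout["trajectories"]:
            missing = TRAJECTORY_KEYS - traj.keys()
            if missing:
                raise KeyError(f"trajectory missing keys: {sorted(missing)}")
            # Assumption: one logprob per sampled token, so the lists should align.
            if len(traj["output_tokens"]) != len(traj["logprobs"]):
                raise ValueError("output_tokens and logprobs differ in length")
            rewards.append(float(traj["reward"]))
    return mean(rewards) if rewards else 0.0


# Made-up data: one prompt with two sampled completions.
example_rollouts = [
    {
        "group_idx": 0,
        "prompt_text": "What is 2 + 3?",
        "trajectories": [
            {"trajectory_idx": 0, "reward": 1.0, "output_text": "5",
             "output_tokens": [20], "logprobs": [-0.1]},
            {"trajectory_idx": 1, "reward": 0.0, "output_text": "6",
             "output_tokens": [21], "logprobs": [-2.3]},
        ],
    }
]

print(check_rollouts(example_rollouts))  # prints 0.5
```

A check like this can run just before `tviz_logger.log_rollouts(...)` during development to catch a missing key early, and be dropped once the loop is stable.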

### 4. Close the logger

At the end of training:

```python
# >>> tviz <<<
tviz_logger.close()
print(f"View run at: {tviz_logger.get_logger_url()}")
# >>> tviz <<<
```

## Full Example

Run the complete annotated script, then start the dashboard:

```bash
# Run the example
python examples/rl_loop_with_tviz.py

# Start the dashboard
cd tviz && bun dev
```

Open http://localhost:3003 to see:
- Reward curves over training
- Individual rollout trajectories with rewards
- Token-level logprobs visualization

## Alternative: Using the Tinker Adapter

If you're using `tinker_cookbook.rl.train` with `TrajectoryGroup` objects, use the adapter:

```python
from tviz.adapters.tinker import from_tinker_batch

# After sampling:
# trajectory_groups: list[TrajectoryGroup] = env.step(...)

# Convert and log
rollouts = from_tinker_batch(trajectory_groups, tokenizer=tokenizer)
tviz_logger.log_rollouts(rollouts, step=batch_idx)
```