# Maze Environment Visualization and Model Demo

This notebook demonstrates how to:
1. Load the Maze environment configuration.
2. Generate examples from the environment.
3. Visualize the raw prompts sent to the model.
4. Run the model to see its formatted output.

In [1]:
import yaml
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from environments.maze_env import MazeEnvironment

# 1. Load Configuration
config_path = '../configs/config_maze.yaml'
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

print(f"Loaded config for environment: {config['environment']['name']}")

Loaded config for environment: maze


In [2]:
# 2. Initialize Environment and Dataset
env = MazeEnvironment(config)
dataset = env.get_dataset(config)

print(f"Dataset generated with {len(dataset)} examples.")

Generating Maze dataset with config: MazeConfig(min_dist=5, max_dist=10, min_grid_size=5, max_grid_size=10, seed=42, size=500)


Map: 100%|██████████| 500/500 [00:00<00:00, 5819.17 examples/s]

Dataset generated with 500 examples.
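
The log line above prints the generator settings. As a rough guide to what they imply, the sketch below mirrors those fields in a stand-in dataclass and checks that the ranges are consistent. `MazeConfigSketch` and its `validate` method are hypothetical; the real `MazeConfig` lives in the project and may behave differently, and the field meanings in the comments are inferred from the names only.

```python
from dataclasses import dataclass


@dataclass
class MazeConfigSketch:
    """Hypothetical stand-in that mirrors the fields printed in the log."""
    min_dist: int = 5        # assumed: smallest allowed start-to-goal distance
    max_dist: int = 10       # assumed: largest allowed start-to-goal distance
    min_grid_size: int = 5   # assumed: smallest grid side length
    max_grid_size: int = 10  # assumed: largest grid side length
    seed: int = 42           # RNG seed for dataset generation
    size: int = 500          # number of examples, matching the 500 reported above

    def validate(self):
        """Raise ValueError if a range is empty or the dataset size is not positive."""
        if not 0 < self.min_dist <= self.max_dist:
            raise ValueError("min_dist must be positive and <= max_dist")
        if not 0 < self.min_grid_size <= self.max_grid_size:
            raise ValueError("min_grid_size must be positive and <= max_grid_size")
        if self.size <= 0:
            raise ValueError("size must be positive")
        return self


cfg = MazeConfigSketch().validate()
print(cfg.size)
```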


In [9]:
# 3. Visualize a Sample
import random

idx = random.randint(0, len(dataset) - 1)  # randint is inclusive on both ends
sample = dataset[idx]

print(f"--- Sample Index: {idx} ---\n")
print("System Prompt:")
print(sample['prompt'][0]['content'])
print("\nUser Prompt (The Maze):")
print(sample['prompt'][1]['content'])
print("\nGround Truth Answer:")
print(sample['answer'])

--- Sample Index: 124 ---

System Prompt:
You are a helpful assistant. You must output your reasoning steps within <think></think> tags and your final answer within <answer></answer> tags.
            You goal is to find the shortest path from start to end, you can move steps in passage, while avoiding the Walls.

User Prompt (The Maze):
Navigate from ':' (start) to 'Q' (goal):

```
mmmmmmmm
m=m=:==m
m===m=mm
m=m====m
m=m==m=m
m=m==mmm
mmm==Q=m
mmmmmmmm
```
Legend: 'm' = Wall, '=' = Passage

What is the minimum number of steps to reach the goal?
Give only the number of steps as your final answer, no other text or formatting.

Ground Truth Answer:
8
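
The ground truth can be checked independently with a breadth-first search over the grid. The sketch below is not the environment's own solver; it assumes moves are 4-directional (up, down, left, right), that each move counts as one step, and that any non-wall cell is walkable. The function name `shortest_path_steps` is introduced here for illustration.

```python
from collections import deque


def shortest_path_steps(grid, start, goal, wall):
    """Return the minimum number of 4-directional steps from start to goal, or None."""
    rows = [list(line) for line in grid.strip().splitlines()]
    # Locate the start and goal cells by their symbols.
    pos = {ch: (r, c)
           for r, row in enumerate(rows)
           for c, ch in enumerate(row)
           if ch in (start, goal)}
    queue = deque([(pos[start], 0)])
    seen = {pos[start]}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == pos[goal]:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < len(rows) and 0 <= nc < len(rows[nr])
            if in_bounds and rows[nr][nc] != wall and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None  # goal unreachable


maze = """
mmmmmmmm
m=m=:==m
m===m=mm
m=m====m
m=m==m=m
m=m==mmm
mmm==Q=m
mmmmmmmm
"""
print(shortest_path_steps(maze, start=":", goal="Q", wall="m"))  # 8
```

Under these assumptions the search returns 8, which agrees with the ground truth printed for sample 124.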


In [4]:
# 4. Load Model (Optional - can skip if no GPU or just want to see data)
print("Loading model...")
model_name = "/home/rlvr_experiments/outputs/maze-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype='auto',
    device_map='auto'
)
print("Model loaded.")

Loading model...
Model loaded.


In [5]:
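# Inspect the raw chat messages. This output is from execution In [5], before the
# sampling cell was re-run as In [9], so it shows a different maze than sample 124.
sample['prompt']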

[{'content': 'You are a helpful assistant. You must output your reasoning steps within <think></think> tags and your final answer within <answer></answer> tags.\n            You goal is to find the shortest path from start to end, you can move steps in passage, while avoiding the Walls.',
  'role': 'system'},
 {'content': "Navigate from 'g' (start) to 'V' (goal):\n\n```\niiiiiiiiii\ni>>>>>>i>i\ni>>>>>iiii\ni>>>>>>>>i\ni>>>>>>>>i\ni>i>>ii>>i\ni>>>V>iiii\ni>>>i>>>ii\ni>i>i>i>gi\niiiiiiiiii\n```\nLegend: 'i' = Wall, '>' = Passage\n\nWhat is the minimum number of steps to reach the goal?\nGive only the number of steps as your final answer, no other text or formatting.",
  'role': 'user'}]

In [11]:
# 5. Run Inference
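def generate_response(prompt_messages):
    """Render the chat prompt, sample a completion, and return only the newly generated text."""
    text = tokenizer.apply_chat_template(prompt_messages, tokenize=False, add_generation_prompt=True)
    # Strip the "number only" instruction from the rendered prompt before generation
    text = text.replace("Give only the number of steps as your final answer, no other text or formatting.", "")

    print(text)
    inputs = tokenizer(text, return_tensors='pt').to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=config['generation']['max_completion_length'],
            do_sample=True,  # stochastic decoding: repeated runs can give different responses
        )

    # Decode only the new tokens; special tokens are kept so markers such as <|im_end|> stay visible
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
    return response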

print("Running inference on the sample above...\n")
response = generate_response(sample['prompt'])

print("--- Model Response ---")
print(response)


Running inference on the sample above...

<|im_start|>system
You are a helpful assistant. You must output your reasoning steps within <think></think> tags and your final answer within <answer></answer> tags.
            You goal is to find the shortest path from start to end, you can move steps in passage, while avoiding the Walls.<|im_end|>
<|im_start|>user
Navigate from ':' (start) to 'Q' (goal):

```
mmmmmmmm
m=m=:==m
m===m=mm
m=m====m
m=m==m=m
m=m==mmm
mmm==Q=m
mmmmmmmm
```
Legend: 'm' = Wall, '=' = Passage

What is the minimum number of steps to reach the goal?
<|im_end|>
<|im_start|>assistant

--- Model Response ---
<think>To find the minimum number of steps required to reach the goal, we need to consider all possible paths between the start and goal positions.</th>
<answer>4</answer></think><answer>5</answer><|im_end|>
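
The response above contains two `<answer>` blocks and a malformed `</th>` tag, so scoring it requires a parsing rule. The notebook does not show how the environment scores responses; the sketch below assumes one plausible convention (take the last `<answer>` block) and adds a simple check that each tag appears exactly once. Both helper names are introduced here for illustration.

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(response):
    """Return the text of the last <answer>...</answer> pair, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None


def has_single_tag_pairs(response):
    """True if each of the four expected tags appears exactly once."""
    tags = ("<think>", "</think>", "<answer>", "</answer>")
    return all(response.count(tag) == 1 for tag in tags)


response = (
    "<think>To find the minimum number of steps required to reach the goal, "
    "we need to consider all possible paths between the start and goal positions.</th>\n"
    "<answer>4</answer></think><answer>5</answer><|im_end|>"
)

predicted = extract_answer(response)
print(predicted)                       # 5
print(predicted == "8")                # False: ground truth for sample 124 is 8
print(has_single_tag_pairs(response))  # False: <answer> appears twice
```

Under this rule the response is both incorrect and badly formatted; a stricter scorer could reject it on format alone.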


In [7]:
sample.keys()

dict_keys(['question', 'answer', 'metadata', 'prompt'])

In [8]:
# Ground truth for the `sample` bound when this cell ran (In [8], before In [9] re-drew it)
sample["answer"]

'6'
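
The answer `'6'` here differs from the 8 printed for sample 124 because `sample` is drawn with an unseeded `random.randint` and the cells were executed out of order. One way to keep every cell pointing at the same example is to draw the index from a seeded, private generator. This is a minimal sketch; `pick_index` is a hypothetical helper, not part of the environment.

```python
import random


def pick_index(n_examples, seed=0):
    """Return a reproducible index in [0, n_examples) using a private RNG."""
    rng = random.Random(seed)  # local generator; leaves the global random state untouched
    return rng.randrange(n_examples)


idx = pick_index(500, seed=0)  # 500 matches the dataset size reported earlier
print(idx)
```

With a fixed seed, re-running the sampling cell selects the same index, so later cells such as `sample['prompt']` and `sample["answer"]` stay consistent with the visualized maze.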