In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
os.chdir("..")

In [3]:
import torch
import numpy as np
from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from tqdm.auto import tqdm
from datasets import load_dataset
from cluster_intrep_repo.utils import (
    initialize_tokenizer,
    tokenize_blocksworld_generation,
    THINK_TOKEN,
    THINK_START_TOKEN,
)

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

compute_dtype = torch.bfloat16
device = "cuda:2"  # inputs are sent here; the model itself is placed by device_map="auto" when loaded
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

In [4]:
tokenizer = initialize_tokenizer(model_id)

In [5]:
blocksworld_type = "6-blocks"

dataset = load_dataset(f"dmitriihook/deepseek-r1-qwen-32b-planning-{blocksworld_type}")["train"]

In [6]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=compute_dtype,
    attn_implementation="sdpa",
    device_map="auto",
)

In [7]:
item = dataset[0]
generation = item["generation"]
tokens = tokenize_blocksworld_generation(tokenizer, item)
tokens.shape

torch.Size([1, 3909])

In [8]:
import warnings
warnings.simplefilter('ignore')

In [10]:
text = ""

for line_n, line in enumerate(generation.split("\n\n")):
    text = text + line + "\n\n"
    if line_n < 20 or len(line) < 50:
        continue

    print(line)
    print("==========="*3)
    print()

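    # Append a fixed probe sentence so the model restates the current block configuration.
    _text = text + "Now, the stacks are:\n\n"

    # Drop the last token so the model continues from the probe text.
    tokens = tokenize_blocksworld_generation(tokenizer, item, _text)[:, :-1]

    with torch.no_grad():
        new_generation = model.generate(
            tokens.to(device),
            attention_mask=torch.ones_like(tokens).to(device),
            do_sample=False,  # greedy decoding
            max_new_tokens=50,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the prompt.
    new_text = tokenizer.decode(new_generation[0][tokens.shape[1]:], skip_special_tokens=True)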

    print(new_text)
    print("----------"*3)
    print()

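    # Forward pass over the same prompt to collect per-layer hidden states.
    with torch.no_grad():
        outputs = model(tokens.to(device), output_hidden_states=True)

    # Keep the last 10 token positions of every layer as float16 NumPy arrays.
    hidden_states = [x[0][-10:].cpu().to(torch.float16).numpy() for x in outputs.hidden_states]

    # print(hidden_states[-1].shape)
    # break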

- F is on the table, with B on top, D on top of B, and C on top of D. So, stack F-B-D-C.

- F (table) → B → D → C
- A (table)
- E (table)

Goal stacks:

- D (table) → B → C → E → A → F

Wait, no. Because in the goal,
------------------------------

- D on the table? Wait, no. Because in the goal, D is under B, which is under C, which is under E, which is under A, which is under F.

- D is the base, then B, then C, then E, then A, then F.

So, to get there, I need to move F from the table to on top of A, but A is on the table. So,
------------------------------

Wait, no, the goal says F is on top of A, which is on E, which is on C, which is on B, which is on D. So, D is the base, then B, then C, then E, then A, then F.

Initial:
- F (table) → B → D → C
- A (table)
- E (table)

Goal:
- D (table) → B → C → E → A → F

So, I need to move F from
------------------------------

But initially, D is on top of B, which is on F. So, to get D to be under B, I need to move B and D.

- F → B → D → 
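
The loop in the cell above grows a running prefix of the reasoning trace, one paragraph at a time, and appends the probe sentence "Now, the stacks are:" before each generation. The sketch below isolates only that prompt-building step so it can be checked without loading the model. The helper name `iter_probe_prompts`, its parameters, and the toy trace are hypothetical and not part of `cluster_intrep_repo`; the filter thresholds are copied from the loop above.

```python
from typing import Iterator, Tuple

PROBE = "Now, the stacks are:\n\n"  # probe sentence used in the loop above


def iter_probe_prompts(
    generation: str,
    probe: str = PROBE,
    skip_first: int = 20,
    min_chars: int = 50,
) -> Iterator[Tuple[int, str, str]]:
    """Yield (paragraph_index, paragraph, prompt) for each probed paragraph.

    The prompt is every paragraph up to and including the current one,
    followed by the probe sentence.
    """
    prefix = ""
    for idx, paragraph in enumerate(generation.split("\n\n")):
        prefix += paragraph + "\n\n"
        # Same filter as the loop above: skip early and short paragraphs.
        if idx < skip_first or len(paragraph) < min_chars:
            continue
        yield idx, paragraph, prefix + probe


# Toy trace standing in for item["generation"] (illustrative data only).
toy = "\n\n".join(["short", "x" * 60, "y" * 10, "z" * 55])
prompts = list(iter_probe_prompts(toy, skip_first=1))
```

Each yielded prompt could then be passed to `tokenize_blocksworld_generation(tokenizer, item, prompt)` in place of `_text`, which keeps the paragraph filtering separate from the generation and hidden-state code.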

In [None]:
print(tokenizer.decode(tokens[0]))

<｜begin▁of▁sentence｜><｜User｜>I am playing with a set of blocks where I need to arrange the blocks into stacks. Here are the actions I can do

Pick up a block
Unstack a block from on top of another block
Put down a block
Stack a block on top of another block

I have the following restrictions on my actions:
I can only pick up or unstack one block at a time.
I can only pick up or unstack a block if my hand is empty.
I can only pick up a block if the block is on the table and the block is clear. A block is clear if the block has no other blocks on top of it and if the block is not picked up.
I can only unstack a block from on top of another block if the block I am unstacking was really on top of the other block.
I can only unstack a block from on top of another block if the block I am unstacking is clear.
Once I pick up or unstack a block, I am holding the block.
I can only put down a block that I am holding.
I can only stack a block on top of another block if I am holding the block being

In [None]:
print(new_text[len(_text):])

D, Block D is on top of Block B, Block A is on the table, Block E is on the table and Block F is on the table.
My goal is for the following to be true: Block A is on top of Block E, Block B is on top of Block D, Block C is on top of Block B, Block E is on top of Block C and Block F is on top of Block A.

My plan is as follows:<｜Assistant｜>Okay, so I have this block-stacking problem to solve. Let me try to figure out the steps needed to get from the initial state to the goal state. I'll take it step by step because I'm still getting the hang of these kinds of problems.

First, let me understand the initial conditions. The blocks are A, B, C, D, E, F. The hand is empty. The clear blocks are A, C, and E. That means these blocks have nothing on top of them and are on the table or on another block that's clear.

Looking at the initial setup:

- Block B is on top of F.
- Block C is on top of D.
- Block D is on top of B.
- So, the stack from bottom to top is F, B, D, C.
- Blocks A and E are o