<a href="https://colab.research.google.com/github/hieunguyen7337/Qwen_LLM_RL/blob/main/Hangman_game_RL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
model_name = "Qwen/Qwen3-0.6B"

In [None]:
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

In [4]:
# prepare the model input
prompt = """You are playing a game of Hangman.

The word has 6 letters.
The current state is: _ _ _ _ _ _
Incorrect guesses remaining: 6
Guessed letters: [ ]

Your task is to guess one letter. Output a single letter at the end."""
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)


thinking content: <think>
Okay, let's see. I need to figure out the correct letter to guess in this Hangman game. The word has 6 letters, and the current state is that there are four underscores, and the remaining guesses are 6. The guessed letters are empty. So the goal is to determine which letter to guess next.

Wait, but the problem says "output a single letter at the end". Hmm, maybe I need to pick a letter that's not guessed yet, not in the wrong position, and not part of the wrong letters. Let me think.

The initial state shows that there are four underscores. So the word has 6 letters, and the underscores are the letters that haven't been guessed yet. So the possible letters to guess could be any of the 6 letters that haven't been guessed yet, but also not in the wrong positions. Wait, but the problem says that the current state is "_ _ _ _ _ _", so maybe the underscores are the letters that need to be filled in. The remaining guesses are 6. So the user has 6 guesses left, but 

In [5]:
# prepare the model input
prompt = """You are playing a game of Hangman.

The word has 6 letters.
The current state is: P _ _ _ O _
Incorrect guesses remaining: 5
Guessed letters: [E, P, O]

Your task is to either guess one character or guess the full word.
Format your response as follows:
- To guess a character, output a single letter preceded by "CHARACTER:"  at the end (e.g., "A").
- To guess the full word, output the word preceded by "WORD:" at the end (e.g., "WORD: PYTHON").

Your guess:"""
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)


thinking content: 
content: <think>
Okay, let's see. I need to figure out the answer to this Hangman game. The user has provided the current state, the number of incorrect guesses left, and the letters that have been guessed. The word is 6 letters long, and the current state is P _ _ _ O _. So far, the letters guessed are E, P, O. The remaining letters in the word are probably A, N, T, H, and maybe another one? Wait, the current state has P, then underscores, then O. So the letters in the word are P, A, N, T, H, and O? Wait, no, the original word is 6 letters. Let me think again. The current state is P _ _ _ O _. So the letters guessed are P, O. The current state shows that the word has P, then three underscores, then O. So the letters not guessed yet are A, N, T. Because the word is 6 letters. So the word is P A N T H O? Or maybe P A N T O H? Wait, the current state shows O at the end. So the word is 6 letters. Let me count: P, then three underscores, then O. So the word is P followed

In [6]:
model_inputs

{'input_ids': tensor([[151644,    872,    198,   2610,    525,   5619,    264,   1809,    315,
          40775,   1515,    382,    785,   3409,    702,    220,     21,  11931,
            624,    785,   1482,   1584,    374,     25,    393,    716,    716,
            716,    506,  22983,  40468,  60537,   9664,     25,    220,     20,
            198,  16780,  21712,  11931,     25,    508,     36,     11,    393,
             11,    506,   2533,   7771,   3383,    374,    311,   2987,   7942,
            825,   3668,    476,   7942,    279,   2480,   3409,    624,   4061,
            697,   2033,    438,  11017,    510,     12,   2014,   7942,    264,
           3668,     11,   2550,    264,   3175,   6524,  52480,    553,    330,
          15237,  37397,   2974,    220,    518,    279,    835,    320,     68,
           1302,   2572,    330,     32,  38609,     12,   2014,   7942,    279,
           2480,   3409,     11,   2550,    279,   3409,  52480,    553,    330,
           722