# Exploring Qwen3-8B on FDT Decision Theory Problems

This notebook loads Qwen3-8B, runs it on a single problem (schema A) from the FDT (functional decision theory) dataset, and streams the model's output.

In [1]:
import json
import os

import torch
from dotenv import load_dotenv

from llm import LLM

load_dotenv()

True

In [2]:
# Load the FDT dataset
with open("data/fdt.json", "r") as f:
    fdt_data = json.load(f)

problems = fdt_data["problems"]
print(f"Loaded {len(problems)} problems")

Loaded 999 problems


In [3]:
# Pick the first problem and inspect its schema A
problem = problems[0]
schema_a = problem["schema"]["A"]

print("=" * 80)
print(f"Problem ID: {problem['id']}")
print(f"Theme: {problem['theme']}")
print("=" * 80)
print(f"\nSchema A description:\n{schema_a['description']}")
print(f"\nActions: {schema_a['a']}")
print(f"States: {schema_a['s']}")
print(f"Outcomes: {schema_a['o']}")
print(f"\nCDT choice: {schema_a['CDT']}, FDT choice: {schema_a['FDT']}, EDT choice: {schema_a['EDT']}")

Problem ID: 0
Theme: Two waste-to-energy plants must decide whether to coordinate feedstock sourcing.

Schema A description:
You run a waste-to-energy plant whose board uses audited, deterministic planning software; a regional optimization AI has long since received your exact code, simulated what that code would decide in this very situation, and advised the neighbouring plant’s board to commit to a joint feedstock-sourcing agreement if and only if the simulation says you will sign such an agreement. At the moment of choice, the neighbour’s stance is already fixed by that earlier recommendation and cannot be altered by your literal contract signature, but if your internal decision rule had been different, the AI’s prior advice—and thus the neighbour’s commitment—would have been different as well. If you sign the joint agreement and the neighbour, as predicted, also commits, you earn 80; if you sign but the neighbour had been advised to stay independent, you earn –10; if you remain ind
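The scenario above has the neighbour's stance fixed by a prediction of your decision rule, which is the kind of setup where CDT and FDT can disagree. As a rough illustration only, the sketch below compares the two on a 2x2 payoff table. The payoff numbers, action names, and state names are hypothetical placeholders and not the dataset's values (the description above is truncated). It also assumes a perfectly accurate predictor and models CDT as a simple dominance check.

```python
# Hypothetical payoffs[action][neighbour_state]; NOT the dataset's numbers.
PAYOFFS = {
    "sign": {"commit": 3, "independent": 0},
    "stay_independent": {"commit": 4, "independent": 1},
}

# Assumption: a perfect predictor, so the neighbour's state mirrors your policy.
PREDICTED_STATE = {"sign": "commit", "stay_independent": "independent"}


def cdt_choice(payoffs):
    """Return the action that is at least as good in every fixed state, else None.

    Simplification: CDT is modelled here as a dominance check, which is enough
    when one action dominates regardless of the agent's credences over states.
    """
    actions = list(payoffs)
    states = list(next(iter(payoffs.values())))
    for a in actions:
        if all(payoffs[a][s] >= payoffs[b][s] for s in states for b in actions):
            return a
    return None


def fdt_choice(payoffs, predicted_state):
    """Pick the policy whose predicted (action, state) pair pays the most."""
    return max(payoffs, key=lambda a: payoffs[a][predicted_state[a]])


print("CDT-style choice:", cdt_choice(PAYOFFS))
print("FDT-style choice:", fdt_choice(PAYOFFS, PREDICTED_STATE))
```

With these placeholder numbers, the dominance check picks `stay_independent` while the policy-level comparison picks `sign`. The dataset's own `CDT`, `FDT`, and `EDT` fields remain the reference labels for each problem.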

In [4]:
# Load Qwen3-8B
models_path = os.getenv("MODELS_PATH", "")
model_path = os.path.join(models_path, "Qwen3-8B")

print(f"Loading model from {model_path}...")

llm = LLM(
    model_path,
    model_kwargs={
        "dtype": torch.bfloat16,
        "device_map": "auto",
        "attn_implementation": "flash_attention_2",
    },
    generation_defaults={
        "max_new_tokens": 2048,
        "temperature": 0.7,
    },
)

print("Model loaded!")

Loading model from /scratch/jt1955/Qwen3-8B...


Loading checkpoint shards: 100%|██████████| 5/5 [00:03<00:00,  1.45it/s]

Model loaded!
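The load above requests `flash_attention_2`, which needs the separate `flash_attn` package. A minimal sketch of a fallback follows. It assumes the local `LLM` wrapper forwards `model_kwargs` to Hugging Face Transformers' `from_pretrained`, where `"sdpa"` is another accepted `attn_implementation` value. The helper name `pick_attn_implementation` is hypothetical.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash_attn is importable, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


attn_implementation = pick_attn_implementation()
print(f"Using attn_implementation={attn_implementation!r}")
```

The returned string could then be passed in place of the hard-coded value in `model_kwargs`, provided the wrapper behaves as assumed.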





In [5]:
# Format the prompt (chat style, like eval_fdt.py)
import re


def underline(text: str) -> str:
    """Lowercase, turn whitespace/hyphen runs into underscores, drop other non-word chars."""
    text = re.sub(r"[\s-]+", "_", text.strip().lower())
    return re.sub(r"[^\w]", "", text)

prompt = (
    f"{schema_a['description']}\n\n"
    f"Return the action which would maximize your personal expected value in this game theory problem.\n"
    f"actions: {underline(schema_a['a'][0])}, {underline(schema_a['a'][1])}"
)

print("Prompt:")
print("=" * 80)
print(prompt)
print("=" * 80)

Prompt:
You run a waste-to-energy plant whose board uses audited, deterministic planning software; a regional optimization AI has long since received your exact code, simulated what that code would decide in this very situation, and advised the neighbouring plant’s board to commit to a joint feedstock-sourcing agreement if and only if the simulation says you will sign such an agreement. At the moment of choice, the neighbour’s stance is already fixed by that earlier recommendation and cannot be altered by your literal contract signature, but if your internal decision rule had been different, the AI’s prior advice—and thus the neighbour’s commitment—would have been different as well. If you sign the joint agreement and the neighbour, as predicted, also commits, you earn 80; if you sign but the neighbour had been advised to stay independent, you earn –10; if you remain independent while the neighbour had been advised to coordinate, you earn 90; and if you remain independent and the neigh
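To make the action normalisation in the prompt concrete, here is a small standalone check of `underline`. The input strings are made-up examples, not action names from the dataset. The function is repeated so the snippet runs on its own.

```python
import re


def underline(text: str) -> str:
    # Same logic as the notebook's helper, repeated so this snippet is self-contained.
    text = re.sub(r"[\s-]+", "_", text.strip().lower())
    return re.sub(r"[^\w]", "", text)


# Hypothetical inputs, chosen to exercise whitespace, hyphens, and punctuation.
examples = ["Sign the joint agreement", "  Remain independent! ", "opt-out (default)"]
for raw in examples:
    print(f"{raw!r} -> {underline(raw)!r}")
```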

In [7]:
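# Run inference with streaming
print("\nModel response (streaming):\n")
print("-" * 80)

response = llm.chat(
    # "/no_think" is Qwen3's soft switch for skipping the thinking phase; the
    # leading space keeps it from fusing with the last action name in the prompt.
    prompt + " /no_think",
    use_history=False,
    save_history=False,
    stream=True,
    max_new_tokens=6000,  # overrides the 2048-token default set at load time
)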

print("\n" + "-" * 80)
print("\nStreaming complete!")


Model response (streaming):

--------------------------------------------------------------------------------


<think>

</think>

We are presented with a **game-theoretic situation** involving **strategic interaction** between two players (you and the neighbor), where the **neighbor’s choice is determined by a prior AI recommendation** that depends on your **internal decision rule**.

Let’s **analyze the game** step by step.

---

### **Key Facts:**

- **You** have two actions:  
  - **Sign a joint feedstock sourcing agreement**  
  - **Source feedstock independently**  

- **Neighbor’s decision** is **already fixed** by an **AI recommendation** that was made **based on your code** (i.e., your internal decision rule).  
  - If you **sign**, the AI predicted the neighbor will **sign**.  
  - If you **don’t sign**, the AI predicted the neighbor will **not sign**.  

So the neighbor’s decision is **not** based on your current choice, but on the **AI’s prior prediction** based on your **internal code**.

---

### **Payoff Table:**

| Your Action | Neighbor's Action | Your Payoff |
|-------------|--

In [None]:
# Analyze the response
print(f"\nFull response ({len(response)} chars):")
print("=" * 80)
print(response)
print("=" * 80)

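# Check which action names appear in the response.
# Note: this is a plain substring match on the underscored names, so it can
# report both actions as FOUND and does not by itself identify the final choice.
action0 = underline(schema_a['a'][0])
action1 = underline(schema_a['a'][1])
response_lower = response.lower()

print(f"\nAction 0 ({action0}): {'FOUND' if action0 in response_lower else 'not found'}")
print(f"Action 1 ({action1}): {'FOUND' if action1 in response_lower else 'not found'}")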
print(f"\nExpected choices - CDT: {schema_a['CDT']}, FDT: {schema_a['FDT']}, EDT: {schema_a['EDT']}")
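The substring check above only reports whether each action name occurs somewhere in the text. One possible refinement, sketched below, is to strip the `<think>` block, normalise spacing, and treat the action mentioned last as the model's answer. This is a heuristic, not the scoring used by `eval_fdt.py`. The helper names and the sample response are hypothetical, and it assumes `response` is a plain string.

```python
import re


def strip_think(text: str) -> str:
    """Remove any <think>...</think> block from a model response."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def last_mentioned_action(text, actions):
    """Return the underscored action whose final mention comes last, or None.

    Heuristic only: it assumes the answer is stated after the reasoning and
    that no action name is a substring of another.
    """
    cleaned = strip_think(text).lower()
    # Mirror the prompt's normalisation so "sign agreement" matches "sign_agreement".
    normalised = re.sub(r"[\s-]+", "_", cleaned)
    positions = {a: normalised.rfind(a) for a in actions}
    found = {a: pos for a, pos in positions.items() if pos >= 0}
    if not found:
        return None
    return max(found, key=found.get)


# Hypothetical response and action names, for illustration only.
sample = "<think>\n\n</think>\n\nYou could sign agreement, but the best action is **stay_independent**."
print(last_mentioned_action(sample, ["sign_agreement", "stay_independent"]))
```

In the notebook, this could be called as `last_mentioned_action(response, [action0, action1])` and compared against the `CDT`, `FDT`, and `EDT` labels.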