# Correct Question Generator — Training (Improved)
**Colab-ready notebook** to fine-tune a causal language model (GPT-2 family) that maps a **student state vector** ➜ to a **QUESTION_PROMPT** for a downstream generator.

### What’s improved vs. your original
- Fixed `evaluation_strategy="steps"` (was `eval_strategy`).
- Pass tokenizer correctly via `tokenizer=...` (removed `processing_class`).
- Dynamic padding via `DataCollatorForLanguageModeling(mlm=False)`.
- Proper label handling (no loss on padding).
- Reproducibility: global seeds (`random`, `numpy`, `torch`).
- Metrics & selection: load best model at end by `eval_loss`; early stopping.
- Safer hyperparams & scheduler; optional gradient accumulation.
- Clear testing cell with schema/regex checks + human-readable generation.

## 1) Setup & installs

In [13]:



!pip -q install -U "transformers>=4.40.0" "datasets>=2.18.0" "accelerate>=0.28.0"
!pip -q install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121


In [2]:
import os, re, math, random, json, time, gc, textwrap
from dataclasses import dataclass
import numpy as np


import torch
from transformers import (
    AutoTokenizer, AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer, TrainingArguments, EarlyStoppingCallback
)
from datasets import Dataset, DatasetDict
from typing import Dict, Any

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))


PyTorch: 2.8.0+cu126
CUDA available: True
GPU: Tesla T4


## 2) Reproducibility — set seeds

In [3]:
def set_global_seeds(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_global_seeds(42)
print("Seeds set to 42.")


Seeds set to 42.


## 3) Synthetic data generation

In [5]:
# ==== Data Synthesis ====
import random

def generate_synthetic_student():
    """Generate a random student profile with expanded features."""
    return {
        "mastery": {k: round(random.uniform(0, 1), 2) for k in
                    ["fractions", "geometry", "percentages", "algebra", "statistics"]},
        "psychology": {k: round(random.uniform(0, 1), 2) for k in
                       ["attention_level", "anxiety_sensitivity", "patience", "impulsivity", "motivation"]},
        "learning_style": {k: round(random.uniform(0, 1), 2) for k in
                           ["visual", "verbal", "kinesthetic"]},
        "interests": {k: round(random.uniform(0, 1), 2) for k in
                      ["sports", "puzzles", "history", "music", "science", "art"]},
        "goals": {k: round(random.uniform(0, 1), 2) for k in
                  ["grades", "understanding", "speed", "creativity"]}
    }

def create_varied_prompts(student_state, topic, difficulty):
    """Create diverse question prompts using exact student parameters."""
    mastery = student_state["mastery"]
    psychology = student_state["psychology"]
    learning_style = student_state["learning_style"]
    interests = student_state["interests"]
    goals = student_state["goals"]

    # Get exact values
    topic_mastery = mastery.get(topic, 0.5)
    anxiety = psychology.get("anxiety_sensitivity", 0.5)
    attention = psychology.get("attention_level", 0.5)
    patience = psychology.get("patience", 0.5)
    motivation = psychology.get("motivation", 0.5)

    # Learning preferences
    visual_pref = learning_style.get("visual", 0.5)
    verbal_pref = learning_style.get("verbal", 0.5)
    kinesthetic_pref = learning_style.get("kinesthetic", 0.5)
    dominant_style = max(learning_style, key=learning_style.get)

    # Goals and interests
    creativity_goal = goals.get("creativity", 0.5)
    understanding_goal = goals.get("understanding", 0.5)
    speed_goal = goals.get("speed", 0.5)
    top_interest = max(interests, key=interests.get)

    # Create VERY different prompt structures (12+ styles)
    prompt_styles = []

    # 1) Direct instruction
    prompt_styles.append(
        f"Generate {topic} question: difficulty={difficulty}, mastery={topic_mastery:.2f}, "
        f"anxiety={anxiety:.2f}, style={dominant_style}, interest={top_interest}"
    )

    # 2) Narrative
    prompt_styles.append(
        f"A student loves {top_interest} and has {topic_mastery:.2f} mastery in {topic} with {anxiety:.2f} anxiety. "
        f"They learn best through {dominant_style} methods. Create a level {difficulty} question that excites them."
    )

    # 3) Problem-solving (for anxious students)
    if anxiety > 0.7:
        prompt_styles.append(
            f"Challenge: Help an anxious student (anxiety={anxiety:.2f}) with {topic_mastery:.2f} {topic} mastery. "
            f"Design a confidence-building level {difficulty} question using {dominant_style} approach and {top_interest} context."
        )

    # 4) Teacher briefing
    prompt_styles.append(
        f"Teacher briefing: Student profile shows {topic_mastery:.2f} {topic} mastery, {anxiety:.2f} anxiety, "
        f"{attention:.2f} attention span. Prefers {dominant_style} learning and enjoys {top_interest}. "
        f"Need level {difficulty} question."
    )

    # 5) Adaptive (weak/strong mastery)
    if topic_mastery < 0.3:
        prompt_styles.append(
            f"Struggling student alert: {topic} mastery only {topic_mastery:.2f}, anxiety at {anxiety:.2f}. "
            f"Needs gentle level {difficulty} question with {dominant_style} methods and {top_interest} motivation."
        )
    elif topic_mastery > 0.8:
        prompt_styles.append(
            f"Advanced student ready: {topic} mastery {topic_mastery:.2f}, low anxiety {anxiety:.2f}, "
            f"creativity goal {creativity_goal:.2f}. Needs challenging level {difficulty} question with {top_interest} themes."
        )

    # 6) Learning-focused (visual/kinesthetic)
    if visual_pref > 0.7:
        prompt_styles.append(
            f"Visual learner needs {topic} question: mastery={topic_mastery:.2f}, anxiety={anxiety:.2f}, "
            f"visual_pref={visual_pref:.2f}. Level {difficulty} with diagrams and {top_interest} imagery."
        )
    elif kinesthetic_pref > 0.7:
        prompt_styles.append(
            f"Hands-on learner profile: {topic} mastery {topic_mastery:.2f}, kinesthetic preference {kinesthetic_pref:.2f}, "
            f"anxiety {anxiety:.2f}. Create level {difficulty} manipulative-based question with {top_interest} activities."
        )

    # 7) Goal-oriented (creativity)
    if creativity_goal > 0.7:
        prompt_styles.append(
            f"Creative student seeking: {topic} exploration at level {difficulty}. Current mastery: {topic_mastery:.2f}, "
            f"anxiety: {anxiety:.2f}, creativity drive: {creativity_goal:.2f}. Incorporate {top_interest} and {dominant_style} methods."
        )

    # 8) Interest-first (very high interest)
    if interests[top_interest] > 0.8:
        prompt_styles.append(
            f"{top_interest.title()} enthusiast needs {topic} question! Mastery: {topic_mastery:.2f}, anxiety: {anxiety:.2f}, "
            f"learning style: {dominant_style}. Make level {difficulty} question that connects {topic} to {top_interest}."
        )

    # 9) Conversational
    buckets = ['struggling with', 'getting better at', 'excelling in']
    idx = min(int(topic_mastery * 2), 2)
    prompt_styles.append(
        f"I have a student who's {buckets[idx]} {topic} (mastery: {topic_mastery:.2f}). "
        f"They have {anxiety:.2f} anxiety and love {top_interest}. "
        f"Can you create a level {difficulty} question using {dominant_style} methods?"
    )

    # 10) Concise
    prompt_styles.append(
        f"Student: {topic}={topic_mastery:.2f}, anxiety={anxiety:.2f}, {dominant_style} learner, loves {top_interest}. "
        f"Need level {difficulty} question."
    )

    return prompt_styles

def create_student_vector_string(student_state):
    """Convert student state to comprehensive vector string."""
    mastery = student_state.get("mastery", {})
    psychology = student_state.get("psychology", {})
    learning_style = student_state.get("learning_style", {})
    interests = student_state.get("interests", {})
    goals = student_state.get("goals", {})

    parts = []
    parts.append("mastery: " + ", ".join([f"{k}={v:.2f}" for k, v in mastery.items()]))
    parts.append("psychology: " + ", ".join([f"{k}={v:.2f}" for k, v in psychology.items()]))
    parts.append("learning_style: " + ", ".join([f"{k}={v:.2f}" for k, v in learning_style.items()]))
    parts.append("interests: " + ", ".join([f"{k}={v:.2f}" for k, v in interests.items()]))
    parts.append("goals: " + ", ".join([f"{k}={v:.2f}" for k, v in goals.items()]))
    return " | ".join(parts)

# ---- Build examples ----
print(" Generating diverse training examples...")
examples = []
topics = ["fractions", "geometry", "percentages", "algebra", "statistics"]
N_EXAMPLES = int(os.environ.get("N_EXAMPLES", "2500"))

for i in range(N_EXAMPLES):
    student_state = generate_synthetic_student()
    topic = random.choice(topics)
    difficulty = random.randint(1, 5)

    input_vector = create_student_vector_string(student_state)
    input_text = f"STUDENT: {input_vector} | TOPIC: {topic} | DIFFICULTY: {difficulty}"

    prompt_options = create_varied_prompts(student_state, topic, difficulty)
    output_prompt = random.choice(prompt_options)

    examples.append({
        "input": input_text,
        "output": output_prompt,
        "topic": topic,
        "difficulty": str(difficulty)
    })

    if (i + 1) % 500 == 0:
        print(f"Generated {i + 1}/{N_EXAMPLES} examples...")

print(f" Created {len(examples)} varied training examples with 12+ different prompt structures!")
sample = examples[0]
print("\n Sample Training Example:")
print("INPUT:", sample["input"])
print("OUTPUT:", sample["output"])


 Generating diverse training examples...
Generated 500/2500 examples...
Generated 1000/2500 examples...
Generated 1500/2500 examples...
Generated 2000/2500 examples...
Generated 2500/2500 examples...
 Created 2500 varied training examples with 12+ different prompt structures!

 Sample Training Example:
INPUT: STUDENT: mastery: fractions=0.10, geometry=0.55, percentages=0.12, algebra=0.39, statistics=0.17 | psychology: attention_level=0.28, anxiety_sensitivity=0.89, patience=0.86, impulsivity=0.77, motivation=0.40 | learning_style: visual=0.29, verbal=0.32, kinesthetic=0.59 | interests: sports=0.07, puzzles=0.94, history=0.43, music=0.05, science=0.18, art=0.14 | goals: grades=0.43, understanding=0.98, speed=0.33, creativity=0.71 | TOPIC: algebra | DIFFICULTY: 5
OUTPUT: Challenge: Help an anxious student (anxiety=0.89) with 0.39 algebra mastery. Design a confidence-building level 5 question using kinesthetic approach and puzzles context.


## 4) Build HF Dataset & split

In [6]:
dataset = Dataset.from_list(examples)
dataset = dataset.train_test_split(test_size=0.1, seed=42)
dataset = DatasetDict({
    "train": dataset["train"],
    "test": dataset["test"]
})
dataset


DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'topic', 'difficulty'],
        num_rows: 2250
    })
    test: Dataset({
        features: ['input', 'output', 'topic', 'difficulty'],
        num_rows: 250
    })
})

## 5) Load tokenizer & model

In [7]:
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.resize_token_embeddings(len(tokenizer))

print("Model and tokenizer loaded.")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Model and tokenizer loaded.


## 6) Tokenization (dynamic padding via collator)

In [8]:
def tokenize_with_labels_fixed(batch):
    full_texts   = [f"{inp}\n{out}{tokenizer.eos_token}" for inp, out in zip(batch["input"], batch["output"])]
    prefix_texts = [f"{inp}\n" for inp in batch["input"]]

    enc_full = tokenizer(full_texts, truncation=True, max_length=400, padding="max_length")
    enc_pref = tokenizer(prefix_texts, truncation=True, max_length=400, padding="max_length", add_special_tokens=False)

    labels = []
    for ids, pref in zip(enc_full["input_ids"], enc_pref["input_ids"]):
        L = min(len(pref), len(ids))
        lab = list(ids)
        for i in range(L):
            lab[i] = -100
        labels.append(lab)

    enc_full["labels"] = labels
    return enc_full

tokenized = dataset.map(
    tokenize_with_labels_fixed,
    batched=True,
    remove_columns=dataset["train"].column_names
)


from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)


Map:   0%|          | 0/2250 [00:00<?, ? examples/s]

Map:   0%|          | 0/250 [00:00<?, ? examples/s]

## 7) Training setup

In [9]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

training_args = TrainingArguments(
    output_dir="correct-question-generator",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    warmup_steps=50,
    logging_steps=50,
    save_steps=300,
    eval_steps=300,
    eval_strategy="steps",
    save_strategy="steps",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=5e-4,
    lr_scheduler_type="linear",
    weight_decay=0.01,
    fp16=torch.cuda.is_available(),
    report_to=None,
    seed=42
)

trainer = Trainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=data_collator,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)]
)

print("Trainer is ready.")


  trainer = Trainer(


Trainer is ready.


## 8) Train

In [10]:
# --- Hard-kill W&B in this runtime (no restart needed) ---
import os, sys, types

# כבה שירותים חיצוניים
os.environ["WANDB_DISABLED"] = "true"
os.environ["WANDB_MODE"] = "offline"
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"

# נקה import קודם אם נטען
if "wandb" in sys.modules:
    del sys.modules["wandb"]

# הזרק מודול דמה שמבטל כל קריאה ל-wandb
wandb = types.SimpleNamespace(
    init=lambda *a, **k: None,
    log=lambda *a, **k: None,
    watch=lambda *a, **k: None,
    finish=lambda *a, **k: None,
    run=None,
    config={}
)
sys.modules["wandb"] = wandb

print("W&B stubbed and disabled.")


W&B stubbed and disabled.


In [11]:
# ודא שאין Callback של W&B
try:
    from transformers.integrations import WandbCallback
    trainer.remove_callback(WandbCallback)
    print("WandbCallback removed.")
except Exception as e:
    print("No WandbCallback to remove:", e)


train_result = trainer.train()
trainer.save_model("correct-question-generator")
metrics_train = train_result.metrics
metrics_eval = trainer.evaluate()

print("Train metrics:", metrics_train)
print("Eval metrics:", metrics_eval)

eval_loss = metrics_eval.get("eval_loss", None)
if eval_loss is not None:
    try:
        ppl = math.exp(eval_loss)
        print(f"Perplexity: {ppl:.3f}")
    except OverflowError:
        print("Perplexity overflow; eval_loss too large.")


The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 50256}.


WandbCallback removed.


`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.


Step,Training Loss,Validation Loss
300,0.5599,0.559218
600,0.5539,0.55346


There were missing keys in the checkpoint model loaded: ['lm_head.weight'].


Train metrics: {'train_runtime': 350.5123, 'train_samples_per_second': 19.258, 'train_steps_per_second': 2.414, 'total_flos': 1377907200000000.0, 'train_loss': 0.5966421244555895, 'epoch': 3.0}
Eval metrics: {'eval_loss': 0.5534602999687195, 'eval_runtime': 3.9738, 'eval_samples_per_second': 62.912, 'eval_steps_per_second': 8.053, 'epoch': 3.0}
Perplexity: 1.739


## 9) Inference utilities

In [None]:
def synth_state(
    fractions=0.40, geometry=0.35,
    percentages=0.60, algebra=0.2, probability=0.50
    anxiety_sensitivity=0.30, patience=0.70, learning_style="visual",
    interests=("music", "coding"), goals="improve problem solving"
):
    mastery = {
        "fractions": fractions,
        "geometry": geometry,
        "percentages": percentages,
        "algebra": algebra,
        "probability": probability,

    }
    psychology = {"anxiety_sensitivity": anxiety_sensitivity, "patience": patience}
    return {
        "mastery": {k: round(v, 2) for k, v in mastery.items()},
        "psychology": {k: round(v, 2) for k, v in psychology.items()},
        "learning_style": learning_style,
        "interests": list(interests),
        "goals": goals
    }

def make_input_from_state(st: Dict[str, Any], topic: str, difficulty: str):
    return (
        f"STUDENT_STATE: {create_student_vector_string(st)}\n"
        f"TOPIC: {topic}\n"
        f"DIFFICULTY: {difficulty}\n"
        f"QUESTION_PROMPT:"
    )

@torch.inference_mode()
def generate_prompt_v2(model, tokenizer, student_state: Dict[str, Any],
                       topic: str = "fractions", difficulty: int = 2,
                       max_new_tokens: int = 80,     # קצר = פחות חזרות
                       temperature: float = 0.7,
                       top_p: float = 0.9, top_k: int = 50,
                       repetition_penalty: float = 1.15,
                       no_repeat_ngram_size: int = 5,
                       do_sample: bool = True) -> str:
    model.eval()
    text = f"STUDENT: {create_student_vector_string(student_state)} | TOPIC: {topic} | DIFFICULTY: {difficulty}"
    inputs = tokenizer([text], return_tensors="pt")
    if torch.cuda.is_available():
        inputs = {k: v.cuda() for k, v in inputs.items()}
        model = model.cuda()

    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_p=top_p, top_k=top_k,
        do_sample=do_sample,
        repetition_penalty=repetition_penalty,
        no_repeat_ngram_size=no_repeat_ngram_size,
        pad_token_id=tokenizer.eos_token_id
    )

    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    gen = decoded[len(text):].strip()

    # ✂️ חיתוך הזנב החזרתי
    gen = trim_repetitive_tail(gen, max_sentences=3)
    return gen


## 10) Quick test on a controlled profile

In [12]:
# ==== Quick test (fixed) — anti-repetition + self-contained helpers ====
import re, torch, random
from typing import Dict, Any
from collections import OrderedDict

# If these helpers already exist earlier, you can delete these re-definitions.
def create_student_vector_string(student_state: Dict[str, Dict[str, float]]) -> str:
    mastery = student_state.get("mastery", {})
    psychology = student_state.get("psychology", {})
    learning_style = student_state.get("learning_style", {})
    interests = student_state.get("interests", {})
    goals = student_state.get("goals", {})
    parts = []
    parts.append("mastery: " + ", ".join([f"{k}={float(v):.2f}" for k, v in mastery.items()]))
    parts.append("psychology: " + ", ".join([f"{k}={float(v):.2f}" for k, v in psychology.items()]))
    parts.append("learning_style: " + ", ".join([f"{k}={float(v):.2f}" for k, v in learning_style.items()]))
    parts.append("interests: " + ", ".join([f"{k}={float(v):.2f}" for k, v in interests.items()]))
    parts.append("goals: " + ", ".join([f"{k}={float(v):.2f}" for k, v in goals.items()]))
    return " | ".join(parts)

def make_input_v2(student_state: Dict[str, Any], topic: str, difficulty: int) -> str:
    return f"STUDENT: {create_student_vector_string(student_state)} | TOPIC: {topic} | DIFFICULTY: {difficulty}"

def make_state_v2(
    mastery: Dict[str, float] = None,
    psychology: Dict[str, float] = None,
    learning_style: Dict[str, float] = None,
    interests: Dict[str, float] = None,
    goals: Dict[str, float] = None,
):
    _mastery = {"fractions": 0.40, "geometry": 0.55, "percentages": 0.45, "algebra": 0.50, "statistics": 0.50}
    _psych = {"attention_level": 0.60, "anxiety_sensitivity": 0.30, "patience": 0.70, "impulsivity": 0.40, "motivation": 0.65}
    _style = {"visual": 0.50, "verbal": 0.50, "kinesthetic": 0.50}
    _ints  = {"sports": 0.30, "puzzles": 0.30, "history": 0.30, "music": 0.30, "science": 0.30, "art": 0.30}
    _goals = {"grades": 0.50, "understanding": 0.60, "speed": 0.50, "creativity": 0.50}
    if mastery: _mastery.update(mastery)
    if psychology: _psych.update(psychology)
    if learning_style: _style.update(learning_style)
    if interests: _ints.update(interests)
    if goals: _goals.update(goals)
    r2 = lambda d: {k: round(float(v), 2) for k, v in d.items()}
    return {"mastery": r2(_mastery), "psychology": r2(_psych), "learning_style": r2(_style), "interests": r2(_ints), "goals": r2(_goals)}

# --- Tail trimming: cut repetitive tail and keep 1–3 clean sentences ---
def trim_repetitive_tail(text: str, max_sentences: int = 3) -> str:
    text = re.sub(r'\s+', ' ', text).strip()
    text = re.sub(r'([?!\.])\1{1,}', r'\1', text)   # "????" -> "?"
    parts = re.split(r'(?<=[\.\!\?])\s+', text)
    clean, seen_norm = [], set()
    for s in parts:
        s = s.strip()
        if not s:
            continue
        norm = re.sub(r'\d+', '<NUM>', s.lower())
        # stop if we already saw a very similar sentence
        if norm in seen_norm:
            break
        # after we already kept at least one good sentence, stop when generic "create/make/need..." repeats start
        if len(clean) >= 30 and re.match(r'^(create|make|need|can you|design)\b.*\b(level|question)\b', s.lower()):
            break
        clean.append(s)
        seen_norm.add(norm)
        if len(clean) >= max_sentences:
            break
    return " ".join(clean).strip()

@torch.inference_mode()
def generate_prompt_v2(model, tokenizer, student_state: Dict[str, Any],
                       topic: str = "fractions", difficulty: int = 2,
                       max_new_tokens: int = 80,
                       temperature: float = 0.7,
                       top_p: float = 0.9, top_k: int = 50,
                       repetition_penalty: float = 1.15,
                       no_repeat_ngram_size: int = 5,
                       do_sample: bool = True) -> str:
    model.eval()
    prefix = make_input_v2(student_state, topic, difficulty)
    inputs = tokenizer([prefix], return_tensors="pt")
    if torch.cuda.is_available():
        inputs = {k: v.cuda() for k, v in inputs.items()}
        model = model.cuda()
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        top_p=top_p, top_k=top_k,
        do_sample=do_sample,
        repetition_penalty=repetition_penalty,
        no_repeat_ngram_size=no_repeat_ngram_size,
        pad_token_id=tokenizer.eos_token_id
    )
    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    gen = decoded[len(prefix):].strip()
    return trim_repetitive_tail(gen, max_sentences=3)

# ----- Example runs -----
state1 = make_state_v2(
    mastery={"fractions": 0.10, "algebra": 0.95},
    psychology={"anxiety_sensitivity": 0.90, "attention_level": 0.45, "patience": 0.60},
    learning_style={"visual": 0.90, "verbal": 0.35, "kinesthetic": 0.25},
    interests={"music": 0.92, "science": 0.55, "sports": 0.10},
    goals={"understanding": 0.80, "creativity": 0.65, "speed": 0.35}
)

state2 = make_state_v2(
    mastery={"geometry": 0.88, "fractions": 0.40},
    psychology={"anxiety_sensitivity": 0.20, "motivation": 0.85},
    learning_style={"kinesthetic": 0.85, "visual": 0.40, "verbal": 0.30},
    interests={"puzzles": 0.85, "art": 0.20},
    goals={"speed": 0.85, "grades": 0.70}
)

tests = [
    ("fractions", 1, state1),
    ("geometry", 4, state2),
    ("statistics", 3, make_state_v2())
]

for i, (topic, diff, st) in enumerate(tests, 1):
    gen = generate_prompt_v2(trainer.model, tokenizer, st, topic=topic, difficulty=diff,
                             max_new_tokens=100, temperature=0.7, top_p=0.9, do_sample=True)
    print(f"\n=== TEST {i} | TOPIC={topic} | DIFFICULTY={diff} ===")
    print("INPUT PREFIX:")
    print(make_input_v2(st, topic, diff))
    print("\nGENERATED PROMPT:")
    print(gen)
    print("-"*110)



=== TEST 1 | TOPIC=fractions | DIFFICULTY=1 ===
INPUT PREFIX:
STUDENT: mastery: fractions=0.10, geometry=0.55, percentages=0.45, algebra=0.95, statistics=0.50 | psychology: attention_level=0.45, anxiety_sensitivity=0.90, patience=0.60, impulsivity=0.40, motivation=0.65 | learning_style: visual=0.90, verbal=0.35, kinesthetic=0.25 | interests: sports=0.10, puzzles=0.30, history=0.30, music=0.92, science=0.55, art=0.30 | goals: grades=0.50, understanding=0.80, speed=0.35, creativity=0.65 | TOPIC: fractions | DIFFICULTY: 1

GENERATED PROMPT:
Teacher briefing: Student profile shows 0.10 fractions mastery, 0.90 anxiety, 0.45 attention span. Prefers visual learning and enjoys music. Need level 1 question.
--------------------------------------------------------------------------------------------------------------

=== TEST 2 | TOPIC=geometry | DIFFICULTY=4 ===
INPUT PREFIX:
STUDENT: mastery: fractions=0.40, geometry=0.88, percentages=0.45, algebra=0.50, statistics=0.50 | psychology: attenti

## 11) Save & Download (Colab)

In [None]:
save_dir = "correct-question-generator"
trainer.model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Zip for download if you're on Colab
from google.colab import files
!zip -r correct-question-generator.zip correct-question-generator
files.download("correct-question-generator.zip")

print("Model saved to:", save_dir)


updating: correct-question-generator/ (stored 0%)
updating: correct-question-generator/vocab.json (deflated 59%)
updating: correct-question-generator/model.safetensors (deflated 7%)
updating: correct-question-generator/config.json (deflated 52%)
updating: correct-question-generator/training_args.bin (deflated 53%)
updating: correct-question-generator/merges.txt (deflated 53%)
updating: correct-question-generator/checkpoint-846/ (stored 0%)
updating: correct-question-generator/checkpoint-846/vocab.json (deflated 59%)
updating: correct-question-generator/checkpoint-846/model.safetensors (deflated 7%)
updating: correct-question-generator/checkpoint-846/config.json (deflated 52%)
updating: correct-question-generator/checkpoint-846/optimizer.pt (deflated 8%)
updating: correct-question-generator/checkpoint-846/training_args.bin (deflated 53%)
updating: correct-question-generator/checkpoint-846/merges.txt (deflated 53%)
updating: correct-question-generator/checkpoint-846/generation_config.jso

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

Model saved to: correct-question-generator
