<a href="https://colab.research.google.com/github/hamidb201214-svg/Lectures/blob/main/M3_3_NLG_8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How Smart Are They? Understanding the Scale of GPT-3 and GPT-4


| Assumption                                  | Description                                                                                       |
|---------------------------------------------|---------------------------------------------------------------------------------------------------|
| **Average Tokens per Book**                 | Estimated at 135,000 tokens per book, based on an average book length of 80,000 to 100,000 words.  |
| **Average Reading Lifetime of an Individual** | Estimated at 510 books per lifetime, assuming a moderate reading habit of 5-12 books per year over 60 years. |
| **Tokens per Word**                         | Estimated at 1.5 tokens per word, accounting for spaces and punctuation.                          |



| Detail                             | GPT-3                                   | GPT-4                                   |
|------------------------------------|-----------------------------------------|-----------------------------------------|
| **Developed By**                   | OpenAI                                  | OpenAI                                  |
| **Approximate Training Data Size** | 45 terabytes of text data               | Larger than GPT-3 (exact size unknown)  |
| **Estimated Token Count**          | 300-400 billion tokens                  | Likely over 500 billion tokens          |
| **Equivalent Number of Books**     | 2,222,222 - 2,962,963 books             | >3,703,704 books                        |
| **Equivalent Knowledge of People** | 4,357 - 5,810 people                    | >7,262 people                           |
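
The figures in the second table follow directly from the assumptions in the first. The sketch below reproduces that arithmetic; the constant and function names are illustrative, the midpoints (90,000 words per book, 8.5 books per year) are the values implied by the table, and results are rounded to the nearest whole number.

```python
# Assumptions taken from the first table above.
TOKENS_PER_WORD = 1.5
WORDS_PER_BOOK = 90_000   # midpoint of 80,000 to 100,000 words
BOOKS_PER_YEAR = 8.5      # midpoint of 5 to 12 books per year
READING_YEARS = 60

TOKENS_PER_BOOK = TOKENS_PER_WORD * WORDS_PER_BOOK    # 135,000 tokens
BOOKS_PER_LIFETIME = BOOKS_PER_YEAR * READING_YEARS   # 510 books


def books_equivalent(tokens: float) -> int:
    """Number of average books that hold `tokens` tokens."""
    return round(tokens / TOKENS_PER_BOOK)


def lifetimes_equivalent(tokens: float) -> int:
    """Number of reading lifetimes needed to read `tokens` tokens."""
    return round(tokens / TOKENS_PER_BOOK / BOOKS_PER_LIFETIME)


for label, tokens in [
    ("GPT-3 (low estimate)", 300e9),
    ("GPT-3 (high estimate)", 400e9),
    ("GPT-4 (lower bound)", 500e9),
]:
    print(f"{label}: {books_equivalent(tokens):,} books, "
          f"{lifetimes_equivalent(tokens):,} reading lifetimes")
```

Changing any single assumption (for example, tokens per word) rescales every row, which is a reminder that these are order-of-magnitude comparisons rather than measurements.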


![](https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/transformermodel_architecture.png)

# Why adapt the language model?

- LMs are trained in a task-agnostic way.
- Downstream tasks can be very different from language modeling on the Pile.
For example, consider the natural language inference (NLI) task (is the hypothesis entailed by the premise?):

      Premise: I have never seen an apple that is not red.
      Hypothesis: I have never seen an apple.
      Correct output: Not entailment (the reverse direction would be entailment)

- The format of such a task may not be very natural for the model.
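
One way to make such a task look more like language modeling is to write it as text that the model can continue. The following is a minimal sketch of that idea; `build_nli_prompt`, the `Premise:/Hypothesis:/Answer:` layout, and the label strings are assumptions chosen for illustration, and the second demonstration pair is made up.

```python
from typing import List, Tuple

# (premise, hypothesis, label)
Example = Tuple[str, str, str]


def build_nli_prompt(examples: List[Example], premise: str, hypothesis: str) -> str:
    """Format solved NLI examples followed by one unsolved query."""
    blocks = [
        f"Premise: {p}\nHypothesis: {h}\nAnswer: {label}"
        for p, h, label in examples
    ]
    # The query ends with "Answer:" so the natural continuation is a label.
    blocks.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:")
    return "\n\n".join(blocks)


demos = [
    # The reverse direction of the example above, which is an entailment.
    ("I have never seen an apple.",
     "I have never seen an apple that is not red.",
     "entailment"),
    # Invented pair, only here to show the second label.
    ("A man is playing a guitar on stage.",
     "Nobody is playing music.",
     "not entailment"),
]

prompt = build_nli_prompt(
    demos,
    premise="I have never seen an apple that is not red.",
    hypothesis="I have never seen an apple.",
)
print(prompt)
```

The resulting string can be passed to any text-generation model; no weights change, only the input.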

# Ways downstream tasks can be different

- **Formatting**: for example, NLI takes in two sentences and compares them to produce a single binary output. This is different from generating the next token or filling in MASKs. Another example is the presence of MASK tokens in BERT training vs. no MASKs in downstream tasks.
- **Topic shift**: the downstream task is focused on a new or very specific topic (e.g., medical records)
- **Temporal shift**: the downstream task requires knowledge that was unavailable during pre-training because (1) the knowledge is new (e.g., GPT-3 was trained before Biden became President) or (2) the knowledge needed for the downstream task is not publicly available.


# Optimizing Large Language Models

There are several options for optimizing Large Language Models:

- Prompt engineering by providing samples (In-Context Learning)
- Prompt Tuning
- Fine-Tuning
  - Supervised fine-tuning (SFT): classic fine-tuning that changes all weights
  - Transfer learning with PEFT: fine-tuning that changes only a few weights
  - Reinforcement Learning from Human Feedback (RLHF)

Two important questions are which of these options is the most effective and which of them can overwrite earlier optimizations.

### Understanding Prompt Engineering, Prompt Tuning, and PEFT
These techniques are essential for efficiently adapting large, pre-trained models like GPT or BERT to specialized tasks or domains, optimizing resource usage and reducing training time.


1. **Prompt Engineering (In-Context Learning)**:
   - **Definition**: Crafting input prompts to guide a Large Language Model (LLM) for desired outputs.
   - **Application**: Uses natural language prompts to "program" the LLM, leveraging its contextual understanding.
   - **Model Change**: No alteration to the model's parameters; relies on the model's existing knowledge and interpretive abilities.

2. **Prompt Tuning**:
   - **Difference from Prompt Engineering**: Instead of hand-writing text, it prepends a trainable tensor (soft prompt tokens) to the LLM's input embeddings.
   - **Process**: Fine-tunes this tensor for a specific task and dataset, keeping other model parameters unchanged.
   - **Example**: Adapting a general LLM for specific tasks like sentiment classification by adjusting prompt tokens.

3. **Parameter-Efficient Fine-Tuning (PEFT)**:
   - **Overview**: A set of techniques to enhance model performance on specific tasks or datasets by tuning a small subset of parameters.
   - **Objective**: Targeted improvements without the need for full model retraining.
   - **Relation to Prompt Tuning**: Prompt tuning is one PEFT technique: it trains only the added prompt tensor and leaves the model's own weights frozen.
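
To make the difference in scale concrete, here is a toy sketch of prompt tuning in plain Python. It only shows the bookkeeping: a small trainable tensor is placed in front of the (frozen) token embeddings, and the number of trainable values is tiny compared with the full model. All sizes are hypothetical, the soft prompt is placed at the front of the sequence as an assumption of this sketch, and no training loop is included.

```python
import random

HIDDEN_SIZE = 8          # toy embedding width; real models use thousands
NUM_VIRTUAL_TOKENS = 4   # length of the soft prompt (hypothetical value)

random.seed(0)
# The only trainable tensor: one vector per virtual token.
soft_prompt = [
    [random.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
    for _ in range(NUM_VIRTUAL_TOKENS)
]


def with_soft_prompt(token_embeddings):
    """Return the sequence the frozen model would receive."""
    return soft_prompt + token_embeddings


def trainable_fraction(total_params, hidden_size, num_virtual_tokens):
    """Share of all parameters that prompt tuning updates."""
    return (hidden_size * num_virtual_tokens) / total_params


# Stand-in for five embedded input tokens.
token_embeddings = [[0.0] * HIDDEN_SIZE for _ in range(5)]
print(len(with_soft_prompt(token_embeddings)))  # 9 = 4 virtual + 5 real tokens

# Hypothetical 4-billion-parameter model, hidden size 2560, 20 virtual tokens.
print(f"{trainable_fraction(4e9, 2560, 20):.5%}")
```

Under these assumed numbers, the soft prompt holds 51,200 values, roughly a thousandth of a percent of the model, which is why prompt tuning is far cheaper to train and store than full fine-tuning.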



![](https://raw.githubusercontent.com/aaubs/ds-master/main/data/Images/PEFT_LLMs.png)

### Challenges

Fine-tuning models can certainly help to get models to do what you want them to do. However, there are some potential issues:

> - **Catastrophic forgetting**: Fine-tuning can overwrite knowledge and behaviors that the model acquired during pre-training.
> - **Overfitting**: If the model is fine-tuned on only one task, its performance on other tasks can suffer.

In general, fine-tuning should be used judiciously and with best practices in mind. For example, data quality matters more than quantity, and multiple tasks should be fine-tuned together rather than one after another.

# Applications

Many tools and platforms can be used to build LLM applications:


| Tool                                                                                                    | Category                             | Best For                                                                         | Type        |
| :------------------------------------------------------------------------------------------------------ | :----------------------------------- | :------------------------------------------------------------------------------- | :---------- |
| **[LangChain](https://docs.langchain.com)**                                                             | Orchestration                        | Agents, tools, RAG, observability                                                | Open-source |
| **[Flowise](https://docs.flowiseai.com)**                                                               | App Builder / Orchestration (Visual) | Low-code drag-and-drop LLM apps (chatbots, RAG flows), rapid prototyping         | Open-source |
| **[CrewAI](https://docs.crewai.com)**                                                                   | Agent Orchestration (Multi-agent)    | Role-based multi-agent workflows, task delegation, coordinated tool-using agents | Open-source |
| **[Hugging Face](https://huggingface.co/docs)**                                                         | Model Hub                            | Open models, fine-tuning, hosting                                                | Platform    |
| **[vLLM](https://docs.vllm.ai)** / **[SGLang](https://github.com/sgl-project/sglang)**                  | Serving                              | High-throughput / Structured generation                                          | Open-source |
| **[Ollama](https://github.com/ollama/ollama)** / **[llama.cpp](https://github.com/ggml-org/llama.cpp)** | Local Run                            | Local inference & model management                                               | Open-source |
| **[bitsandbytes](https://huggingface.co/docs/transformers/en/quantization/bitsandbytes)**               | Quantization (4/8-bit)               | Fit models into less VRAM; decent speed/quality tradeoffs                        | Open-source |
| **[Pydantic](https://docs.pydantic.dev/)**                                                              | Validation / Schemas                 | Type-safe data validation; enforce structured outputs and tool I/O               | Open-source |
| **[LlamaIndex](https://docs.llamaindex.ai)**                                                            | Data / RAG                           | Ingestion, indexing, retrieval                                                   | Open-source |
| **[Haystack](https://haystack.deepset.ai)**                                                             | RAG Pipelines                        | Production pipelines, Doc QA                                                     | Open-source |
| **[Semantic Kernel](https://github.com/microsoft/semantic-kernel)**                                     | Orchestration                        | Enterprise workflows (C#/Python)                                                 | Open-source |


In [None]:
!pip install --upgrade transformers

In [None]:
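import gc

import torch

# Drop references to the model and tokenizer if they exist.
# pop() avoids the NameError that `del model` raises when nothing is loaded yet.
globals().pop("model", None)
globals().pop("tokenizer", None)

# Force garbage collection
gc.collect()

# Clear the PyTorch CUDA cache
torch.cuda.empty_cache()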


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B-Thinking-2507"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content) # no opening <think> tag
print("content:", content)
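
The split between reasoning and answer in the cell above is plain list logic, so it can be checked without loading a model. The sketch below isolates it in a hypothetical helper, `split_at_last`; the token ids other than `151668` are made-up placeholders.

```python
from typing import List, Tuple

END_THINK_ID = 151668  # id of </think>, as used in the cell above


def split_at_last(ids: List[int], marker: int = END_THINK_ID) -> Tuple[List[int], List[int]]:
    """Split `ids` right after the last `marker`; without a marker, everything is answer."""
    try:
        cut = len(ids) - ids[::-1].index(marker)
    except ValueError:
        cut = 0
    return ids[:cut], ids[cut:]


thinking_ids, answer_ids = split_at_last([11, 12, END_THINK_ID, 21, 22])
print(thinking_ids)  # [11, 12, 151668]
print(answer_ids)    # [21, 22]

# No </think> token at all: the thinking part is empty.
print(split_at_last([21, 22]))  # ([], [21, 22])
```

One consequence of this logic: if generation stops at `max_new_tokens` before the model emits `</think>`, the whole (unfinished) reasoning trace ends up in `content`. Raising `max_new_tokens` is the usual remedy.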


In [None]:
!nvidia-smi

In [None]:
!pip install -U "bitsandbytes>=0.46.1"

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Qwen/Qwen3-4B-Thinking-2507"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,   # <-- match fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,              # <-- match fp16
    quantization_config=bnb_config,
)
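
A rough way to see why 4-bit loading helps is to estimate the memory taken by the weights alone. The sketch below is back-of-the-envelope arithmetic, assuming a round 4 billion parameters; it ignores activations, the KV cache, quantization constants, and any layers that stay in higher precision, so the usage reported by `nvidia-smi` will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 4e9  # assumed round figure for a "4B" model

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label:>9}: {weight_memory_gib(NUM_PARAMS, bits):5.2f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint to about a quarter, from roughly 7.5 GiB to under 2 GiB.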

In [None]:


# prepare the model input (same prompt as before, now run on the 4-bit model)
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content) # no opening <think> tag
print("content:", content)


In [None]:
!nvidia-smi

In [None]:
from typing import List, Literal
from pydantic import BaseModel, Field, ValidationError

from transformers import AutoModelForCausalLM, AutoTokenizer

# -----------------------------
# 1) Pydantic models (schemas)
# -----------------------------
class ChatMessage(BaseModel):
    role: Literal["system", "user", "assistant"]
    content: str = Field(min_length=1)

class GenerationRequest(BaseModel):
    prompt: str = Field(min_length=1)
    max_new_tokens: int = Field(default=512, ge=1, le=4096)

class GenerationResult(BaseModel):
    thinking: str = ""
    answer: str = Field(min_length=1)

# -----------------------------
# 2) Generation code, with validation
# -----------------------------
model_name = "Qwen/Qwen3-4B-Thinking-2507"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

req = GenerationRequest(prompt="Give me a short introduction to large language model.", max_new_tokens=512)

# validate messages
messages: List[ChatMessage] = [ChatMessage(role="user", content=req.prompt)]
messages_dicts = [m.model_dump() for m in messages]  # convert to plain dicts for HF

text = tokenizer.apply_chat_template(
    messages_dicts,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=req.max_new_tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split the reasoning trace from the final answer at the last </think> token
try:
    index = len(output_ids) - output_ids[::-1].index(151668)  # </think> token id
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

# validate / package output
try:
    result = GenerationResult(thinking=thinking_content, answer=content)
except ValidationError as e:
    # e.g. an empty answer surfaces as a clear error instead of silent bad data
    print(e)
    raise

print("thinking content:", result.thinking)
print("content:", result.answer)


# LangChain
## Deep Agents overview



### Step 1: Install dependencies

In [None]:
!pip install deepagents tavily-python "langchain[google-genai]"

### Step 2: Set up your API keys

In [None]:
import os
from getpass import getpass

os.environ["GEMINI_API_KEY"] = getpass("Enter GEMINI_API_KEY: ").strip()
os.environ["TAVILY_API_KEY"] = getpass("Enter TAVILY_API_KEY: ").strip()

print("GEMINI_API_KEY and TAVILY_API_KEY are set for this session.")

### Step 3: Create a search tool

In [None]:
import os
from typing import Literal
from tavily import TavilyClient
from deepagents import create_deep_agent

tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def internet_search(
    query: str,
    max_results: int = 5,
    topic: Literal["general", "news", "finance"] = "general",
    include_raw_content: bool = False,
):
    """Run a web search"""
    return tavily_client.search(
        query,
        max_results=max_results,
        include_raw_content=include_raw_content,
        topic=topic,
    )

### Step 4: Create a deep agent

In [None]:
from langchain.chat_models import init_chat_model

# System prompt to steer the agent to be an expert researcher
research_instructions = """You are an expert researcher. Your job is to conduct thorough research and then write a polished report.

You have access to an internet search tool as your primary means of gathering information.

## `internet_search`

Use this to run an internet search for a given query. You can specify the max number of results to return, the topic, and whether raw content should be included.
"""

# Initialize the Gemini model; the API key is read from the environment variable set earlier.
# 'gemini-2.5-flash-lite' is a lightweight Gemini model; swap in another id if needed.
model = init_chat_model(model="google_genai:gemini-2.5-flash-lite")

agent = create_deep_agent(
    model=model, # Explicitly pass the initialized model
    tools=[internet_search],
    system_prompt=research_instructions
)

### Step 5: Run the agent

In [None]:
result = agent.invoke({"messages": [{"role": "user", "content": "What is langgraph?"}]})

# Print the agent's response
print(result["messages"][-1].content)

In [None]:
result

In [None]:
# %pip -q install -U google-genai

import os
from google import genai

api_key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
client = genai.Client(api_key=api_key)

available = []
for m in client.models.list():
    # Docs example uses m.supported_actions and checks for "generateContent";
    # `or []` guards against models where the attribute is missing or None
    if "generateContent" in (getattr(m, "supported_actions", None) or []):
        available.append(m.name.replace("models/", ""))

print("Models that support generateContent:")
for name in available[:50]:
    print(" -", name)


In [None]:
!pip install langchain_huggingface

In [None]:
%pip -q install -U "protobuf>=5.26.1,<6" "grpcio-status>=1.71.2,<2" jedi


In [None]:
!pkill -f vllm || true
!nvidia-smi


In [None]:
!pip install -U langgraph deepagents "langchain[openai]" "langchain[google-genai]"

## Human-in-the-loop

This section shows how to configure human approval for sensitive tool operations.

Some tool operations are sensitive and should require human approval before they run. Deep agents support human-in-the-loop workflows through LangGraph’s interrupt capabilities. You configure which tools require approval with the `interrupt_on` parameter.
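
Before the full agent, the core idea can be sketched without any framework: a tool marked as sensitive is paused until a reviewer decides, while safe tools run directly. Every name below (`run_tool`, `needs_approval`, `demo_record_verdict`) is hypothetical and only mimics the concept; it is not the deepagents or LangGraph API, which additionally persists state so a run can resume later.

```python
from typing import Any, Callable, Dict

# A reviewer callback: receives the tool name and arguments, returns "approve" or "reject".
Decider = Callable[[str, Dict[str, Any]], str]


def run_tool(
    name: str,
    func: Callable[..., Any],
    args: Dict[str, Any],
    needs_approval: Dict[str, bool],
    decide: Decider,
) -> Any:
    """Run func(**args), asking the reviewer first if the tool is marked sensitive."""
    if needs_approval.get(name, False) and decide(name, args) != "approve":
        return f"{name} was rejected by the reviewer"
    return func(**args)


def demo_record_verdict(student_id: str, verdict: str) -> str:
    return f"Recorded {verdict} for {student_id}"


needs_approval = {"record_verdict": True, "auto_validate": False}
call_args = {"student_id": "student_001", "verdict": "valid"}

# A real reviewer would be asked via input(); fixed answers keep the sketch non-interactive.
approved = run_tool("record_verdict", demo_record_verdict, call_args,
                    needs_approval, decide=lambda name, args: "approve")
rejected = run_tool("record_verdict", demo_record_verdict, call_args,
                    needs_approval, decide=lambda name, args: "reject")
print(approved)  # Recorded valid for student_001
print(rejected)  # record_verdict was rejected by the reviewer
```

The mapping from tool name to approval requirement plays the same role as `interrupt_on` in the cell below.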

In [None]:
import os
import json
import uuid
import getpass
import argparse
from pathlib import Path
from typing import Any, Dict, List, Optional

from langchain.tools import tool
from langchain.chat_models import init_chat_model
from deepagents import create_deep_agent
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command

# -----------------------------
# 0) Provider selection + API key prompting
# -----------------------------
OPENAI_DEFAULT_MODEL = "openai:gpt-4o-mini"
GEMINI_DEFAULT_MODEL = "google_genai:gemini-2.5-flash-lite"

def choose_provider(cli_value: Optional[str]) -> str:
    if cli_value in {"openai", "gemini"}:
        return cli_value

    # Interactive prompt if not provided
    while True:
        choice = input("Choose provider [openai/gemini] (default: openai): ").strip().lower()
        if choice == "":
            return "openai"
        if choice in {"openai", "gemini"}:
            return choice
        print("Please type 'openai' or 'gemini'.")

def ensure_api_key(provider: str) -> None:
    """
    Prompt for the provider's API key if missing, and store in env.
    - OpenAI: OPENAI_API_KEY
    - Gemini: GOOGLE_API_KEY (LangChain checks this first; GEMINI_API_KEY is also supported as fallback)
    """
    if provider == "openai":
        if not os.getenv("OPENAI_API_KEY"):
            key = getpass.getpass("Enter OPENAI_API_KEY (input hidden): ").strip()
            if not key:
                raise RuntimeError("OPENAI_API_KEY was not provided.")
            os.environ["OPENAI_API_KEY"] = key

    elif provider == "gemini":
        # Prefer GOOGLE_API_KEY because that's what LangChain docs show; GEMINI_API_KEY is also accepted.
        if not (os.getenv("GOOGLE_API_KEY") or os.getenv("GEMINI_API_KEY")):
            key = getpass.getpass("Enter GOOGLE_API_KEY (Gemini) (input hidden): ").strip()
            if not key:
                raise RuntimeError("GOOGLE_API_KEY was not provided.")
            os.environ["GOOGLE_API_KEY"] = key

    else:
        raise ValueError("Unknown provider. Use 'openai' or 'gemini'.")

def pick_model_id(provider: str, override: Optional[str]) -> str:
    if override:
        return override
    return OPENAI_DEFAULT_MODEL if provider == "openai" else GEMINI_DEFAULT_MODEL

# -----------------------------
# Storage layout: ./submissions/<student_id>/*
# -----------------------------
ROOT = Path("./submissions").resolve()
ROOT.mkdir(exist_ok=True)

def _student_dir(student_id: str) -> Path:
    p = (ROOT / student_id).resolve()
    if ROOT not in p.parents:
        raise ValueError("Invalid student_id (path traversal blocked).")
    p.mkdir(exist_ok=True)
    return p

def _list_files(student_id: str) -> List[str]:
    d = _student_dir(student_id)
    return sorted([p.name for p in d.iterdir() if p.is_file()])

def _read_file(student_id: str, filename: str) -> str:
    d = _student_dir(student_id)
    p = (d / filename).resolve()
    if d not in p.parents:
        raise ValueError("Invalid filename (path traversal blocked).")
    if not p.exists():
        return ""
    return p.read_text(encoding="utf-8")

def _append_outbox(text: str) -> None:
    # Append mode creates the file if needed and avoids re-reading it on every call
    with (ROOT / "OUTBOX.txt").open("a", encoding="utf-8") as f:
        f.write(text)

# -----------------------------
# Tools (LangChain)
# -----------------------------
@tool
def list_submission_files(student_id: str) -> List[str]:
    """List files in a student's submission folder."""
    return _list_files(student_id)

@tool
def read_submission_file(student_id: str, filename: str) -> str:
    """Read a file from a student's submission folder."""
    text = _read_file(student_id, filename)
    if text == "":
        return f"(empty or missing) {filename}"
    return text

@tool
def auto_validate(student_id: str) -> Dict[str, Any]:
    """
    Run simple validity checks and return a report + recommended verdict.
    Verdict: 'valid' or 'resubmit'
    """
    files = _list_files(student_id)
    required = {"report.md", "solution.py"}
    missing_files = sorted(list(required - set(files)))

    report = _read_file(student_id, "report.md")
    solution = _read_file(student_id, "solution.py")

    required_headings = ["# Problem", "# Method", "# Results"]
    missing_headings = [h for h in required_headings if h not in report]

    has_required_function = "def solve(" in solution

    issues = []
    if missing_files:
        issues.append(f"Missing required files: {missing_files}")
    if "report.md" in files and missing_headings:
        issues.append(f"Missing required headings in report.md: {missing_headings}")
    if "solution.py" in files and not has_required_function:
        issues.append("solution.py missing required function signature: def solve(...)")

    recommended_verdict = "valid" if not issues else "resubmit"

    recommended_message = (
        "✅ Your submission looks valid. Nice work!"
        if recommended_verdict == "valid"
        else "⚠️ Please fix the issues listed and resubmit."
    )

    return {
        "student_id": student_id,
        "files": files,
        "issues": issues,
        "recommended_verdict": recommended_verdict,
        "recommended_message": recommended_message,
    }

@tool
def record_verdict(student_id: str, verdict: str, notes: str) -> str:
    """
    Record the official verdict (sensitive).
    Writes to ./submissions/verdicts.json
    """
    out = ROOT / "verdicts.json"
    data = json.loads(out.read_text(encoding="utf-8")) if out.exists() else {}
    data[student_id] = {"verdict": verdict, "notes": notes}
    out.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return f"Recorded verdict for {student_id}: {verdict}"

@tool
def message_student(student_id: str, message: str) -> str:
    """
    Mock messaging (sensitive).
    Appends to ./submissions/OUTBOX.txt instead of actually sending email.
    """
    _append_outbox(f"\n=== TO {student_id} ===\n{message}\n")
    return f"Queued message to {student_id} (see submissions/OUTBOX.txt)"

# -----------------------------
# Console HITL "review UI"
# -----------------------------
def _prompt_decision(tool_name: str, args: Dict[str, Any], allowed: List[str]) -> Dict[str, Any]:
    print("\n--- HUMAN REVIEW REQUIRED ---")
    print(f"Tool: {tool_name}")
    print("Proposed args:")
    print(json.dumps(args, indent=2))
    print(f"Allowed decisions: {allowed}")

    while True:
        choice = input(f"Type one of {' / '.join(allowed)}: ").strip().lower()
        # approve / reject carry no extra payload
        if choice in {"approve", "reject"} and choice in allowed:
            return {"type": choice}
        if choice == "edit" and "edit" in allowed:
            print(
                "Paste edited args as JSON "
                "(e.g. {\"student_id\": \"student_001\", \"verdict\": \"valid\", \"notes\": \"...\"})"
            )
            try:
                edited_args = json.loads(input("> ").strip())
            except json.JSONDecodeError as exc:
                # Malformed JSON should not crash the review loop
                print(f"Could not parse JSON ({exc}). Try again.")
                continue
            return {"type": "edit", "edited_action": {"name": tool_name, "args": edited_args}}
        print("Invalid choice for this tool. Try again.")

# -----------------------------
# Main runner
# -----------------------------
def run(student_id: str, provider: str, model_id: str) -> None:
    # Ensure correct key exists before model init
    ensure_api_key(provider)

    checkpointer = MemorySaver()

    # init_chat_model accepts provider:model identifiers like openai:... and google_genai:...
    model = init_chat_model(model_id)

    agent = create_deep_agent(
        model=model,
        tools=[
            list_submission_files,
            read_submission_file,
            auto_validate,
            record_verdict,
            message_student,
        ],
        system_prompt=(
            "You are a TA agent.\n"
            "Workflow:\n"
            "1) Call auto_validate(student_id).\n"
            "2) Summarize the issues (if any).\n"
            "3) Propose record_verdict(student_id, verdict, notes).\n"
            "4) If helpful, propose message_student(student_id, message).\n"
            "Keep notes short and factual."
        ),
        interrupt_on={
            # Sensitive: human must approve/edit/reject official verdict
            "record_verdict": True,  # default allows approve/edit/reject
            # Sensitive: outbound message needs approval (no edit allowed here)
            "message_student": {"allowed_decisions": ["approve", "reject"]},
            # Safe: no interrupts
            "auto_validate": False,
            "read_submission_file": False,
            "list_submission_files": False,
        },
        checkpointer=checkpointer,
    )

    config = {"configurable": {"thread_id": str(uuid.uuid4())}}
    user_prompt = (
        f"Validate {student_id}. "
        "Run auto checks, then record an official verdict, and message the student with next steps."
    )

    result = agent.invoke({"messages": [{"role": "user", "content": user_prompt}]}, config=config)

    while result.get("__interrupt__"):
        payload = result["__interrupt__"][0].value
        action_requests = payload["action_requests"]
        review_configs = {cfg["action_name"]: cfg for cfg in payload["review_configs"]}

        decisions = []
        for action in action_requests:
            name = action["name"]
            args = action["args"]
            allowed = review_configs[name]["allowed_decisions"]
            decisions.append(_prompt_decision(name, args, allowed))

        result = agent.invoke(Command(resume={"decisions": decisions}), config=config)

    print("\n=== FINAL ASSISTANT MESSAGE ===")
    print(result["messages"][-1].content)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--provider", choices=["openai", "gemini"], help="Model provider")
    parser.add_argument("--model", help="Override model id (e.g., openai:gpt-4o-mini or google_genai:gemini-2.5-flash-lite)")
    parser.add_argument("--student", default="student_001", help="Student submission folder name")
    parser.add_argument("--seed", action="store_true", help="Create a demo submission if missing")
    # parse_known_args ignores the extra arguments a Jupyter/Colab kernel passes in
    args, unknown = parser.parse_known_args()

    provider = choose_provider(args.provider)
    model_id = pick_model_id(provider, args.model)

    # Optional demo seed
    if args.seed:
        sid = args.student
        sdir = _student_dir(sid)
        if not (sdir / "report.md").exists():
            (sdir / "report.md").write_text("# Problem\n...\n# Method\n...\n# Results\n...\n", encoding="utf-8")
        if not (sdir / "solution.py").exists():
            (sdir / "solution.py").write_text("def solve(x):\n    return x\n", encoding="utf-8")
        print(f"Seeded demo submission in: {sdir}")

    run(args.student, provider, model_id)
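
The script above blocks path traversal by resolving each path and checking that the storage root is one of its parents. The sketch below isolates that check so it can be tried safely in a temporary directory; `safe_child` is a hypothetical helper written for this illustration, not part of the script.

```python
import tempfile
from pathlib import Path


def safe_child(root: Path, name: str) -> Path:
    """Resolve root/name and refuse anything that is not strictly inside root."""
    root = root.resolve()
    candidate = (root / name).resolve()
    if root not in candidate.parents:
        raise ValueError(f"blocked path outside {root}: {name!r}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(safe_child(root, "student_001").name)  # student_001

    # Relative escape, absolute path, and the root itself are all refused.
    for bad in ["../other", "/etc/passwd", "."]:
        try:
            safe_child(root, bad)
        except ValueError:
            print("blocked:", bad)
```

Resolving before comparing matters: a purely textual check on the name would miss `..` segments and symlinks. This is especially relevant here because the `student_id` and `filename` values are chosen by the model, not by a trusted user.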
