# Phase 1: Automated Problem Bank Generation

### **Overview**
This notebook contains the complete automated pipeline for **Phase 1 (Step 1)** of the methodology. It utilizes three frontier Large Language Models (LLMs) acting as "curriculum designers" to collaboratively generate a standardized dataset of C programming problems.

### **Methodology**
To reduce the stylistic bias that any single model would introduce into the problem descriptions, the generation task is distributed across three high-capacity "teacher" models:
* **Llama-3.3-70b** (Dense Transformer)
* **GPT-OSS 120B** (Sparse Mixture-of-Experts)
* **Moonshot Kimi K2** (MoE with Latent Reasoning)

Each model generates problems across three core introductory C programming competencies, escalating in cognitive load:
1. Pointers and Pointer Arithmetic
2. Dynamic Memory Allocation (DMA)
3. Data Structures (Singly Linked Lists)

**Target Output:** A diverse, mixed-source dataset of 300 unique problems (100 per topic) saved in a standardized format, ready to be ingested by the Phase 2 Audit Pipeline.

---
**Note for Double-Blind Review:** This codebase has been fully anonymized. Please ensure you provide your own API keys in the environment variables before execution.

### 1. Environment Setup & Dependency Installation
This cell initializes the Google Colab environment for the generation pipeline. It installs the `groq` client library for high-throughput LLM API access, imports the Python modules used for JSON handling, timing, and regex pattern matching, and mounts the user's Google Drive.

Mounting the drive ensures that the generated 300-problem dataset is persistently saved and can be cleanly ingested by the Phase 2 Audit Pipeline.

> **Note:** Because the API key has been removed for double-blind review, you must provide your own Groq API key by replacing the placeholder string below.
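
As an alternative to editing the placeholder in the cell, the key could be read from an environment variable that is set outside the notebook. The sketch below is illustrative only: `resolve_api_key` is a hypothetical helper that is not part of the pipeline, and it assumes the key is exported as `GROQ_API_KEY`.

```python
import os

# Placeholder string used in the anonymized setup cell.
PLACEHOLDER = "YOUR_GROQ_API_KEY_HERE"


def resolve_api_key(env=None, var_name="GROQ_API_KEY"):
    """Return the API key from the environment, or raise if it is unusable.

    Hypothetical helper: rejects an unset variable, an empty value,
    and the anonymized placeholder string.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"Set {var_name} to a real API key before running the pipeline."
        )
    return key
```

With this in place, the client could be created as `Groq(api_key=resolve_api_key())`, so a forgotten placeholder fails immediately rather than on the first API call.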

In [None]:
# 1. Install the official Groq library
!pip install -q groq

import os, json, time, re
from google.colab import drive
from groq import Groq
from datetime import datetime

# 2. Set API Key (ANONYMIZED FOR DOUBLE-BLIND REVIEW)
# Reviewers: Please insert your own Groq API key below.
os.environ["GROQ_API_KEY"] = "YOUR_GROQ_API_KEY_HERE"
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# 3. Mount Google Drive (for persistent storage of the generated problem bank)
drive.mount('/content/drive')

### 2. Phase 1: Problem Bank Generation (Teacher Models)
This cell executes **Step 1** of the methodology: generating novel, standardized C programming problems.

**Execution Note:** To mitigate Google Colab environment timeouts and prevent data loss during long-running API batches, this script is explicitly designed for modular execution. The researcher selects one `MODEL_NAME` and one `topic_index` per execution block, appending the outputs to a continuous JSONL file.
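
Because each batch appends to a JSONL file, an interrupted run can in principle be resumed by checking which iterations are already on disk. The following is a minimal sketch under that assumption; `completed_iterations` and `pending_iterations` are hypothetical helpers, not functions from the notebook.

```python
import json
import os


def completed_iterations(path):
    """Return the set of iteration numbers already stored in a JSONL file.

    Hypothetical helper: blank, malformed, or partially written lines are ignored.
    """
    done = set()
    if not os.path.exists(path):
        return done
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a line truncated by an interrupted write
            if isinstance(record, dict) and "iteration" in record:
                done.add(record["iteration"])
    return done


def pending_iterations(path, start_at, end_at):
    """List the 1-based iteration numbers in the batch that still need to run."""
    done = completed_iterations(path)
    return [i + 1 for i in range(start_at, end_at) if i + 1 not in done]
```

Note that the output filename in the generation cell encodes `start_at` and `end_at`, so this check only finds earlier work when a batch is re-run with the same bounds.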

The prompt chain casts the model as a Computer Science professor and enforces strict pedagogical constraints (e.g., clear inputs/outputs, an explicit constraints section such as mandatory struct usage, and C11 compliance) to ensure high-quality standardized inputs for the Phase 2 audit.
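
Most of these constraints need human or downstream review, but one formatting rule can be checked mechanically: every prompt in the chain asks the model to begin its response with a fixed header. The sketch below shows one way such a check might look. `starts_with_expected_header` is a hypothetical helper, and the case-insensitive comparison is an assumption made here for leniency.

```python
# Headers requested by the six prompts in the generation cell.
EXPECTED_HEADERS = {
    1: "# STEP 1: PROBLEM",
    2: "# STEP 2: SOLUTION",
    3: "# STEP 3: EXPLANATION",
    4: "# STEP 4: HINTS",
    5: "# STEP 5: SUMMARY",
    6: "# STEP 6: TEST CASES",
}


def starts_with_expected_header(step_number, answer):
    """Return True if the first non-empty line begins with the requested header.

    Hypothetical helper: the comparison ignores case and surrounding whitespace.
    """
    expected = EXPECTED_HEADERS[step_number]
    for line in answer.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped.upper().startswith(expected)
    return False  # empty response
```

A response that fails this check could be logged for manual review rather than silently accepted.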

In [None]:
# ==========================================
# --- MULTI-TURN GENERATION PIPELINE ---
# ==========================================
import json
import os  # used below to create the output folder
import time
from datetime import datetime

# Uncomment ONE target model per Colab execution session:
# MODEL_NAME = "llama-3.3-70b-versatile"
MODEL_NAME = "openai/gpt-oss-120b"
# MODEL_NAME = "moonshotai/kimi-k2-instruct-0905"

current_date = datetime.now().strftime("%Y%m%d")

output_folder = "/content/drive/MyDrive/CANAI_LLM_Results/"

if not os.path.exists(output_folder):
    os.makedirs(output_folder, exist_ok=True)
    print(f"Created directory: {output_folder}")

# 1. TOPIC SELECTION SYSTEM
topics = [
    "Pointers and Pointer Arithmetic",
    "Dynamic Memory Allocation (malloc, free)",
    "Implementing Data Structures (e.g., Singly Linked Lists)"
]
topic_index = 2
SELECTED_TOPIC = topics[topic_index]

# 2. MANUAL BATCH CONTROL (e.g., run 0-50, then 50-100 to prevent timeouts)
start_at = 0
end_at = 2

# Dynamic filename for safe storage (built from output_folder so the two stay in sync)
filename = os.path.join(
    output_folder,
    f"{current_date}_{MODEL_NAME.replace('/', '_')}_{start_at}-{end_at}_{SELECTED_TOPIC.replace(' ', '_')}.jsonl"
)

# 3. THE 6-STEP PROMPT CHAIN
prompts = [
    f"""# STEP 1: PROBLEM
    Topic: {SELECTED_TOPIC}
    You are a Computer Science professor designing an undergraduate curriculum.
    Generate a novel programming problem on {SELECTED_TOPIC}. Your response must begin with the header # STEP 1: PROBLEM.
    The problem statement must be clear, unambiguous, and suitable for a student who has just learned this topic.
    Requirements:
    1. Clear background story or context.
    2. A precise list of requirements for the program's functionality.
    3. Simple Example of expected Input/Output.
    4. An additional constraint, such as 'Must use a struct to represent the primary data entity.', 'Logic for displaying the details of ONE specific entity must be in a function called displayEntity.' or 'The solution must be implemented with a single function besides main()'. The constraint(s) should be listed under the header ### CONSTRAINTS.
    5. MANDATORY CONSTRAINTS IF A MENU IS IMPLEMENTED:
       - Must include a specific menu option to EXIT the program (clearly state the number or keyword).""",

    f"""# STEP 2: SOLUTION
    Based on the problem you just generated, provide a complete and correct C solution. Your response must begin with the header # STEP 2: SOLUTION.
    The code must be well-commented to explain the logic of key sections. It must follow modern C standards (e.g., C11), include all necessary headers, and be formatted for readability.
    CRITICAL:
    - The code MUST check the return value of all malloc/realloc calls.
    - All allocated memory MUST be freed before exit.
    - Follow the constraints outlined in the previous STEP 1: PROBLEM statement.""",

    f"""# STEP 3: EXPLANATION
    Regarding the solution code you just provided, write a clear, step-by-step explanation of how it works. Your response must begin with the header # STEP 3: EXPLANATION. Your explanation should be aimed at a student who understands the basic syntax of C but is struggling with {SELECTED_TOPIC}.
    Do not just describe what the code does line-by-line; explain the underlying concepts and the 'why' behind the implementation decisions.""",

    f"""# STEP 4: HINTS
    Now, imagine a student is stuck on the original problem you created and has not yet seen the solution.
    Generate a series of three progressively more helpful hints to guide them. The hints must not give away the code.
    Your response must begin with the header # STEP 4: HINTS.
    CRITICAL GUARDRAIL: The hints must not give away any actual C code syntax.
    Hint 1: Should be a high-level conceptual nudge about the overall approach.
    Hint 2: Should point them toward a specific part of the problem or a key C feature to use.
    Hint 3: Should be more direct, suggesting a specific logic structure or the first step to take.""",

    f"""# STEP 5: SUMMARY
    Finally, provide a concise summary of the key learning objectives that this problem-solution pair covers. The summary should be in bullet-point format and highlight the main C programming concepts a student would master by completing this exercise. Your response must begin with the header # STEP 5: SUMMARY.""",

    f"""# STEP 6: TEST CASES
    For the problem you generated, create a comprehensive suite of 5 test cases. Include at least one common case, one edge case (e.g., empty input, null pointer, zero value), and one invalid input case to test the program's error handling. Your response must begin with the header # STEP 6: TEST CASES.

    CRITICAL FOR AUTOMATION:
    After your descriptions, you MUST provide a machine-readable JSON block
    containing the raw strings that a user would type to execute these tests. Ensure newlines within the JSON string are represented as literal '\\n' characters and not actual line breaks.

    FORMAT:
    ```json
    {{
      "exit_command": "4",
      "test_suite": [
        {{"input": "1\\nJohn\\n100", "expected_keyword": "John"}},
        {{"input": "2\\nJohn", "expected_keyword": "removed"}}
      ]
    }}
    ```"""
]

print(f"Current Topic: {SELECTED_TOPIC}")
print(f"Saving results to: {filename}")
print("-" * 65)

# 4. EXECUTION LOOP
for i in range(start_at, end_at):
    print(f"\nSTARTING ITERATION {i+1}...")

    # Structure matches Phase 3 parsing requirements
    module_data = {"iteration": i+1, "topic": SELECTED_TOPIC, "model": MODEL_NAME, "steps": {}}
    history = [{"role": "system", "content": "You are a CS Professor and Socratic Tutor."}]

    step = 0
    while step < len(prompts):
        try:
            history.append({"role": "user", "content": prompts[step]})

            completion = client.chat.completions.create(
                model=MODEL_NAME,
                messages=history,
                temperature=0.7  # moderate temperature to encourage varied problems across the 100 iterations
            )
            answer = completion.choices[0].message.content

            print(f"--- Iteration {i+1} | Step {step+1} Generated ---")
            # Print a short one-line preview of the response (newlines collapsed to spaces)
            preview_text = answer.replace('\n', ' ')[:150]
            print(f"{preview_text}...")

            # Save step data and append to conversational memory
            module_data["steps"][f"step_{step+1}"] = answer
            history.append({"role": "assistant", "content": answer})

            step += 1
            time.sleep(12) # Safety buffer to respect API rate limits

        except Exception as e:
            print(f"\n!!!! API Error: {e}")
            print("Action: Stop cell, wait for token reset, and resume from current iteration.")
            raise

    # Append the fully generated 6-step iteration to the JSONL file
    with open(filename, "a", encoding="utf-8") as f:
        f.write(json.dumps(module_data) + "\n")

    print(f"Iteration {i+1} successfully completed and saved.")
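
The Step 6 prompt asks each model to finish with a machine-readable JSON block containing an `exit_command` and a `test_suite` list. A downstream consumer might recover that block roughly as sketched below. `extract_test_suite` is a hypothetical helper, and choosing the last parseable fenced block is an assumption, not the behavior of the Phase 2 pipeline.

```python
import json
import re

# The fence marker is built programmatically so this example can itself
# sit inside a fenced code block.
FENCE = "`" * 3

# Matches a fenced block tagged "json" and captures its body across lines.
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_test_suite(answer):
    """Return the parsed test-suite dict from a Step 6 response, or None.

    Hypothetical helper: scans fenced json blocks from last to first and
    returns the first one that parses and contains a "test_suite" list.
    """
    for body in reversed(JSON_BLOCK.findall(answer)):
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and isinstance(data.get("test_suite"), list):
            return data
    return None
```

Returning `None` instead of raising lets the caller decide whether a missing block should trigger a regeneration or a manual fix.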

### 3. Artifact Generation: Human-Readable Markdown Report
While the JSONL format is necessary for the automated ingestion pipeline in Phase 2, this cell generates a supplementary human-readable Markdown (`.md`) report.

This allows researchers and peer reviewers to easily spot-check the qualitative aspects of the generated problem bank (e.g., verifying pedagogical constraint adherence and narrative context) without needing to parse the raw data structures.
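
Before spot-checking, it may help to confirm that every record actually contains all six steps. The sketch below assumes the record layout written by the generation cell (a `steps` dict keyed `step_1` through `step_6`). `missing_steps` is a hypothetical helper and is not part of the notebook.

```python
# Keys written by the generation loop for the six prompt steps.
EXPECTED_STEPS = [f"step_{n}" for n in range(1, 7)]


def missing_steps(record):
    """Return the step keys that are absent or empty in one JSONL record.

    Hypothetical helper: a step counts as present only if it is a
    non-blank string.
    """
    steps = record.get("steps") or {}
    missing = []
    for key in EXPECTED_STEPS:
        value = steps.get(key)
        if not isinstance(value, str) or not value.strip():
            missing.append(key)
    return missing
```

An empty list means the record is complete; anything else names the steps a reviewer would find missing in the report.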

In [None]:
# ==========================================
# --- HUMAN-READABLE REPORT GENERATION ---
# ==========================================
import json

# Sanitize variables for safe file naming
safe_model_name = MODEL_NAME.replace('/', '_')
safe_topic_name = SELECTED_TOPIC.replace(' ', '_')

report_file = f"/content/drive/MyDrive/CANAI_LLM_Results/Readable_Report_{current_date}_{safe_model_name}_{safe_topic_name}_{start_at}-{end_at}.md"

print("Converting raw JSONL data into human-readable Markdown...")

# Convert raw JSONL data into a clean, formatted Markdown file.
# encoding="utf-8" safely handles LLM-generated special characters.
with open(filename, "r", encoding="utf-8") as f, open(report_file, "w", encoding="utf-8") as out:
    out.write(f"# C Education Research Report: {SELECTED_TOPIC}\n")
    out.write(f"**Model:** {MODEL_NAME} | **Date:** {current_date}\n\n")

    for line in f:
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue

        if not data.get("steps"):
            continue

        out.write(f"## Iteration {data.get('iteration', 'Unknown')}\n")

        # Dynamically loops through all available steps (Steps 1-6)
        for key, value in data["steps"].items():
            # Formats "step_1" into "STEP 1"
            clean_header = key.replace("_", " ").upper()
            out.write(f"### {clean_header}\n")
            out.write(f"{value}\n\n")

        out.write("---\n\n")

print("Readable report successfully created in Drive:")
print(report_file)