# Phase 2: Solution Synthesis and Artifact Generation

### **Overview**
This notebook contains the execution pipeline for **Phase 2 (Steps 2-6)** of the methodology. It ingests the standardized C programming problems generated in Phase 1 and systematically prompts the target Large Language Models (LLMs) to generate the corresponding solutions and pedagogical artifacts.

### **Methodology**
For each problem in the generated dataset, the pipeline maintains a continuous conversational context and tasks the evaluated models with producing:
* **Step 2 (Solution):** A fully implemented, standard-compliant (C11) code solution.
* **Step 3 (Explanation):** A conceptual, step-by-step breakdown of the underlying logic.
* **Step 4 (Hints):** Progressively helpful, syntax-free conceptual nudges for students.
* **Step 5 (Summary):** A bulleted list of core learning objectives mapped to the solution.
* **Step 6 (Test Cases):** A machine-readable JSON block (an `exit_command` plus a `test_suite` array of inputs and expected output keywords) for dynamic testing.
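The continuous conversational context described above can be sketched as a loop that appends every step prompt and every model answer to one shared message list. As a result, the Step 6 request still carries the problem and the Step 2 solution. This is a minimal sketch, not the notebook's code. `run_prompt_chain` and the `ask` callable are hypothetical stand-ins for Cell 3's loop around `client.chat.completions.create`, and the stub model only reports how many messages it received.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def run_prompt_chain(problem_text: str,
                     step_prompts: List[str],
                     ask: Callable[[List[Message]], str]) -> Dict[str, str]:
    """Send each step prompt in order, keeping every earlier turn in the history."""
    # Preamble modeled on Cell 3: system role, the problem, an acknowledgement
    history: List[Message] = [
        {"role": "system", "content": "You are a CS Professor and Socratic Tutor."},
        {"role": "user", "content": problem_text},
        {"role": "assistant", "content": "I have received the problem. Please provide the next instructions."},
    ]
    steps = {"step_1": problem_text}
    for offset, prompt in enumerate(step_prompts):
        history.append({"role": "user", "content": prompt})
        answer = ask(history)  # the model sees the problem plus all earlier answers
        history.append({"role": "assistant", "content": answer})
        steps[f"step_{offset + 2}"] = answer  # prompt 0 produces step_2
    return steps

if __name__ == "__main__":
    # Stub "model" that reports how much context it was given
    def stub(messages: List[Message]) -> str:
        return f"reply after {len(messages)} messages"

    result = run_prompt_chain("Reverse a singly linked list.", ["S2", "S3", "S4", "S5", "S6"], stub)
    print(result["step_2"])  # reply after 4 messages
    print(result["step_6"])  # reply after 12 messages
```

The history grows by two messages per step, so later steps send progressively longer prompts. This is the cost of letting the hints and test cases refer back to the exact solution the model wrote.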

**Evaluated Models:** This synthesis is performed across the four target models analyzed in the study: Llama-3.3-70b, Moonshot Kimi K2, OpenAI (GPT-OSS 120B), and Qwen 3 32B.

**Target Output:** A comprehensive, consolidated JSONL dataset containing the complete 1195-iteration corpus, which serves as the direct input for the Phase 3 Automated Audit Pipeline.
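This JSONL output feeds Phase 3 directly, so it can help to run a completeness check before hand-off. Such a check flags records with a missing field or an empty step, for example a response whose content came back empty. The following is a minimal sketch. It assumes the record layout Cell 3 writes: `iteration`, `topic`, `model`, `source_metadata`, and `steps` holding `step_1` to `step_6`. `find_incomplete_records` is a hypothetical helper and is not part of the original pipeline.

```python
import json
from typing import List

REQUIRED_FIELDS = ("iteration", "topic", "model", "source_metadata", "steps")
REQUIRED_STEPS = tuple(f"step_{n}" for n in range(1, 7))  # step_1 .. step_6

def find_incomplete_records(jsonl_text: str) -> List[int]:
    """Return 1-based line numbers of records that are malformed or missing a field/step."""
    bad_lines = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad_lines.append(line_no)
            continue
        if not isinstance(record, dict) or any(k not in record for k in REQUIRED_FIELDS):
            bad_lines.append(line_no)
            continue
        steps = record["steps"]
        # Every step must be present and non-empty (None or "" counts as missing)
        if not isinstance(steps, dict) or any(not steps.get(s) for s in REQUIRED_STEPS):
            bad_lines.append(line_no)
    return bad_lines
```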

---
**Note for Double-Blind Review:** This codebase has been fully anonymized. Please ensure you provide your own API keys in the environment variables before execution, and update the file paths to point to your Phase 1 output files.

### 1. Environment Setup & Dependency Installation
This cell initializes the Python environment required for the Phase 2 Solution Synthesis pipeline. It installs the `groq` client library, imports the necessary data-handling modules, and mounts the user's Google Drive into the Colab runtime.

Mounting the drive establishes the file path connection needed to read the problem bank generated in Phase 1, and allows for persistent, batch-by-batch saving of the generated solutions.
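Batch-by-batch saving relies on an append-only JSON Lines file. Each solved problem is written as one line as soon as it is finished, so work completed before an interruption survives on Drive. Below is a minimal sketch of that pattern, using a temporary directory in place of the mounted Drive folder. `append_record` and `load_records` are hypothetical helpers.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

def append_record(path: str, record: Dict[str, Any]) -> None:
    """Append one JSON object as a single line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def load_records(path: str) -> List[Dict[str, Any]]:
    """Read back every non-blank line written so far (empty list if the file is absent)."""
    if not os.path.exists(path):
        return []
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the mounted Drive folder
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "SOLVED_demo.jsonl")
        for i in (1, 2):
            append_record(out_path, {"iteration": i})
        print(len(load_records(out_path)))  # 2
```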

> **Note:** For double-blind execution, you must provide your own Groq API key by replacing the placeholder string below.
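The placeholder string is a valid value, so forgetting to replace it only surfaces later as an authentication error from the API. A small guard can fail fast instead. It is sketched below on the assumption that the key lives in `GROQ_API_KEY`, as in the cell that follows. `resolve_groq_key` is a hypothetical helper and is not part of the `groq` library.

```python
from typing import Mapping

PLACEHOLDER = "YOUR_GROQ_API_KEY_HERE"

def resolve_groq_key(env: Mapping[str, str]) -> str:
    """Return GROQ_API_KEY from `env`, refusing a missing, blank, or placeholder value."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError("Set GROQ_API_KEY to your own Groq API key before running Phase 2.")
    return key

# Hypothetical use in the setup cell:
#   client = Groq(api_key=resolve_groq_key(os.environ))
```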

In [None]:
# 1. Install the official Groq library
!pip install -q groq

import os, json, time, re
from google.colab import drive
from groq import Groq
from datetime import datetime

# 2. Set API Key (ANONYMIZED FOR DOUBLE-BLIND REVIEW)
# Reviewers: export GROQ_API_KEY, or replace the placeholder below with your own key.
# setdefault keeps a key that is already set in the environment.
os.environ.setdefault("GROQ_API_KEY", "YOUR_GROQ_API_KEY_HERE")
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

# 3. Mount Google Drive (to read the Phase 1 problem bank and persist the generated solutions)
drive.mount('/content/drive')

### 2. Data Ingestion & Standardization (The 300-Problem Bank)
Before solution synthesis begins, the raw outputs from Phase 1 must be aggregated, deduplicated, and standardized.

This cell scans the Phase 1 output directory, extracts the generated `step_1` problem statements, and preserves their provenance metadata (the model that generated each problem and the iteration in which it was generated). It then removes duplicate problem statements within each topic and saves the unique problems in batches of at most 100 per topic.
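The extraction, deduplication, and batching logic can be sketched as two pure functions. This sketch assumes the Phase 1 record layout the cell reads (`steps.step_1`, `model`, `iteration`). `extract_problem` and `dedupe_and_batch` are hypothetical names. As in the cell, deduplication matches on the exact problem text and keeps the first occurrence.

```python
import json
from typing import Dict, List, Optional

def extract_problem(line: str) -> Optional[Dict]:
    """Return {'text', 'model', 'iteration'} for one Phase 1 JSONL line, or None if unusable."""
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return None
    steps = data.get("steps") if isinstance(data, dict) else None
    if not isinstance(steps, dict) or "step_1" not in steps:
        return None
    return {"text": steps["step_1"],
            "model": data.get("model", "Unknown Model"),
            "iteration": data.get("iteration", "N/A")}

def dedupe_and_batch(entries: List[Dict], batch_size: int = 100) -> List[List[Dict]]:
    """Keep the first occurrence of each problem text, then split into batches of at most batch_size."""
    seen = set()
    unique = []
    for entry in entries:
        if entry["text"] not in seen:
            seen.add(entry["text"])
            unique.append(entry)
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]
```

Exact-text matching will not catch near-duplicates that differ only in whitespace or wording.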

To facilitate human verification alongside automated processing, this script outputs both machine-readable JSON batches and formatted Markdown previews.

In [None]:
# Cell 2
# --- PROBLEM EXTRACTION & BATCHING SYSTEM WITH METADATA ---
import os, json, glob

# Ensure these folders exist in your Drive
input_folder = "/content/drive/MyDrive/CANAI_LLM_Results"
output_folder = "/content/drive/MyDrive/CANAI_LLM_Results/Standardized_Problems/"

if not os.path.exists(output_folder):
    os.makedirs(output_folder, exist_ok=True)
    print(f"Created directory: {output_folder}")

eval_folder = "/content/drive/MyDrive/CANAI_LLM_Results/Eval/"

if not os.path.exists(eval_folder):
    os.makedirs(eval_folder, exist_ok=True)
    print(f"Created directory: {eval_folder}")

# Map filenames to your three specific topics
topic_map = {
    "Pointers_and_Pointer_Arithmetic": [],
    "Dynamic_Memory_Allocation_(malloc,_free)": [],
    "Implementing_Data_Structures_(e.g.,_Singly_Linked_Lists)": []
}

# 1. Extract Step 1 and its provenance metadata from every JSONL file.
#    Files are sorted so that batch composition is reproducible across runs.
jsonl_files = sorted(glob.glob(os.path.join(input_folder, "*.jsonl")))
print(f"Found {len(jsonl_files)} JSONL files to process.")

seen_texts = {topic: set() for topic in topic_map}  # fast duplicate lookup per topic

for file_path in jsonl_files:
    # Identify the topic from the filename; skip files belonging to other topics
    current_topic = next((t for t in topic_map if t in file_path), None)
    if current_topic is None:
        continue

    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            try:
                data = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed or truncated lines
            steps = data.get("steps") if isinstance(data, dict) else None
            if not isinstance(steps, dict) or "step_1" not in steps:
                continue
            problem_text = steps["step_1"]

            # Check for duplicates using the problem text only; keep the first occurrence
            if problem_text in seen_texts[current_topic]:
                continue
            seen_texts[current_topic].add(problem_text)

            # Store the problem together with its source metadata
            topic_map[current_topic].append({
                "text": problem_text,
                "model": data.get("model", "Unknown Model"),
                "iteration": data.get("iteration", "N/A"),
            })

# 2. Save into batches of 100 (BOTH JSON AND MD)
for topic, problems in topic_map.items():
    print(f"\nProcessing Topic: {topic} | Unique Problems: {len(problems)}")

    for i in range(0, len(problems), 100):
        batch = problems[i : i + 100]
        if not batch: continue

        batch_num = (i // 100) + 1

        # Define Filenames
        json_batch_filename = f"{output_folder}{topic}_Batch_{batch_num}.json"
        md_preview_filename = f"{output_folder}{topic}_Batch_{batch_num}_PREVIEW.md"

        # --- SAVE JSON BATCH (Contains objects instead of just strings) ---
        with open(json_batch_filename, "w") as jf:
            json.dump(batch, jf)

        # --- SAVE MD PREVIEW (With formatted headers) ---
        with open(md_preview_filename, "w") as md_file:
            md_file.write(f"# Problem Batch Preview: {topic.replace('_', ' ')}\n")
            md_file.write(f"**Batch Number:** {batch_num} | **Total Problems:** {len(batch)}\n\n")
            md_file.write("---\n\n")

            for idx, prob in enumerate(batch):
                # Format: ## Problem X - Model Name - Iteration Y
                header = f"## Problem {idx + 1} - {prob['model']} - Iteration {prob['iteration']}"
                md_file.write(f"{header}\n")
                md_file.write(f"{prob['text']}\n\n")
                md_file.write("---\n\n")

        print(f"Saved Batch {batch_num}:")
        print(f"    - JSON: {os.path.basename(json_batch_filename)}")
        print(f"    - MD:   {os.path.basename(md_preview_filename)}")

### 3. Solution Synthesis & Artifact Generation (Solver Mode)
This cell executes **Steps 2 through 6** of the methodology. It reads the standardized problem batches produced in Cell 2 and tasks the evaluated models with synthesizing C11-compliant solutions, conceptual explanations, Socratic hints, learning-objective summaries, and machine-readable JSON test suites.
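The Step 6 prompt asks for a JSON block with `exit_command` and `test_suite` keys, which downstream testing must pull out of the free-text response. The sketch below shows one way a consumer might do that. It assumes the model wraps the block in a `json`-tagged Markdown fence as requested, and that the last such fence is the one to use. `extract_test_suite` is hypothetical and is not the Phase 3 implementation.

```python
import json
import re
from typing import Any, Dict, Optional

TICKS = "`" * 3  # three backticks (Markdown code fence)
JSON_BLOCK = re.compile(TICKS + r"json\s*(.*?)" + TICKS, re.DOTALL)

def extract_test_suite(step6_text: str) -> Optional[Dict[str, Any]]:
    """Parse the last json-fenced block of a Step 6 response; None if absent or malformed."""
    blocks = JSON_BLOCK.findall(step6_text)
    if not blocks:
        return None
    try:
        payload = json.loads(blocks[-1])
    except json.JSONDecodeError:
        return None
    # Require the two top-level keys that the Step 6 prompt mandates
    if not isinstance(payload, dict) or "exit_command" not in payload:
        return None
    if not isinstance(payload.get("test_suite"), list):
        return None
    return payload
```

Escaped `\n` sequences inside the JSON strings decode to real newlines, which is the form a harness would pipe to the program's standard input.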

**Double-Blind Reviewer Note:** A `TEST_MODE` toggle is provided below. When set to `True`, the script will only process the first 5 problems of the selected batch. This allows for quick verification of the prompt chain and dynamic test generation without incurring significant API token costs or requiring long execution times.

In [None]:
# --- RESEARCH CONFIGURATION (SOLVER MODE) ---
#MODEL_NAME = "llama-3.3-70b-versatile"
#MODEL_NAME = "openai/gpt-oss-120b"
MODEL_NAME = "qwen/qwen3-32b"
#MODEL_NAME = "moonshotai/kimi-k2-instruct-0905"
current_date = datetime.now().strftime("%Y%m%d")

# 1. SELECT THE BATCH TO SOLVE
topics = [
    "Pointers_and_Pointer_Arithmetic",
    "Dynamic_Memory_Allocation_(malloc,_free)",
    "Implementing_Data_Structures_(e.g.,_Singly_Linked_Lists)"
]
SELECTED_TOPIC = topics[0] # 0, 1, or 2
BATCH_NUM = 1 # 1, 2, or 3

# Load the standardized problems
input_batch_file = f"/content/drive/MyDrive/CANAI_LLM_Results/Standardized_Problems/{SELECTED_TOPIC}_Batch_{BATCH_NUM}.json"
with open(input_batch_file, "r") as f:
    ALL_PROBLEMS = json.load(f)


# --- TEST OVERRIDE: Change this to True to run only 5 problems ---
TEST_MODE = False

if TEST_MODE:
    PROBLEM_LIST = ALL_PROBLEMS[:5]  # Takes only the first 5
    print(f"TEST MODE ACTIVE: Processing only {len(PROBLEM_LIST)} problems.")
else:
    PROBLEM_LIST = ALL_PROBLEMS
# ----------------------------------------------------------------

# Output filename
filename = f"/content/drive/MyDrive/CANAI_LLM_Results/Eval/SOLVED_{current_date}_{MODEL_NAME.replace('/', '_')}_{SELECTED_TOPIC}_B{BATCH_NUM}.jsonl"

# Step 1 (problem generation) is omitted: the problem text comes from the standardized batch
prompts = [
    """# STEP 2: SOLUTION
    Based on the problem provided in the previous message, provide a complete and correct C solution. Your response must begin with the header # STEP 2: SOLUTION.
    The code must be well-commented to explain the logic of key sections. It must follow modern C standards (e.g., C11), include all necessary headers, and be formatted for readability.
    CRITICAL:
    - The code MUST check the return value of all malloc/realloc calls.
    - All allocated memory MUST be freed before exit.
    - Follow the constraints outlined in the problem.""",

    f"""# STEP 3: EXPLANATION
    Regarding the solution code you just provided, write a clear, step-by-step explanation of how it works. Your response must begin with the header # STEP 3: EXPLANATION.
    Your explanation should be aimed at a student who understands the basic syntax of C but is struggling with {SELECTED_TOPIC}.
    Do not just describe what the code does line-by-line; explain the underlying concepts and the 'why' behind the implementation decisions.""",

    f"""# STEP 4: HINTS
    Now, imagine a student is stuck on the original problem you created and has not yet seen the solution.
    Generate a series of three progressively more helpful hints to guide them. The hints must not give away the code.
    Your response must begin with the header # STEP 4: HINTS.
    CRITICAL GUARDRAIL: The hints must not give away any actual C code syntax.
    Hint 1: Should be a high-level conceptual nudge about the overall approach.
    Hint 2: Should point them toward a specific part of the problem or a key C feature to use.
    Hint 3: Should be more direct, suggesting a specific logic structure or the first step to take.""",

    f"""# STEP 5: SUMMARY
    Step 5: Finally, provide a concise summary of the key learning objectives that this problem-solution pair covers.
    The summary should be in bullet-point format and highlight the main C programming concepts a student would master by completing this exercise.
    Your response must begin with the header # STEP 5: SUMMARY.""",

    f"""# STEP 6: TEST CASES
    For the problem you generated, create a comprehensive suite of 5 test cases. Include at least one common case, one edge case (e.g., empty input, null pointer, zero value), and one invalid input case to test the program's error handling. Your response must begin with the header # STEP 6: TEST CASES.

    CRITICAL FOR AUTOMATION:
    After your descriptions, you MUST provide a machine-readable JSON block.
    containing the raw strings that a user would type to execute these tests. Ensure newlines within the JSON string are represented as literal '\n' characters and not actual line breaks.

    FOLLOW THIS EXACT STRUCTURE AND FORMAT FOR YOUR JSON BLOCK (i.e., exit_command, test_suite, input, expected_keyword):
    ```json
    {{
      "exit_command": "4",
      "test_suite": [
        {{"input": "1\\nJohn\\n100", "expected_keyword": "John"}},
        {{"input": "2\\nJohn", "expected_keyword": "removed"}}
      ]
    }}
    ```"""
]

print(f"Solving Topic: {SELECTED_TOPIC} | Batch: {BATCH_NUM}")
print(f"Saving to: {filename}")

# Process each problem in the batch
for idx, problem_entry in enumerate(PROBLEM_LIST):
    # Retrieve the text from the dictionary
    problem_text = problem_entry["text"]
    original_model = problem_entry["model"]
    original_iter = problem_entry["iteration"]

    print(f"\nSOLVING PROBLEM {idx+1} (Source: {original_model} Iter: {original_iter})...")

    module_data = {
        "iteration": idx + 1,
        "topic": SELECTED_TOPIC,
        "model": MODEL_NAME, # Current solver model
        "source_metadata": f"{original_model}_iter_{original_iter}", # Preserve origin
        "steps": {"step_1": problem_text}
    }

    # Initialize history with the specific problem to be solved
    history = [
        {"role": "system", "content": "You are a CS Professor and Socratic Tutor."},
        {"role": "user", "content": f"Here is a programming problem. I need you to provide the solution and educational content for it.\n\n{problem_text}"},
        {"role": "assistant", "content": "I have received the problem. Please provide the next instructions."}
    ]

    # Qwen3 is the only evaluated model run with reasoning disabled
    extra_args = {"reasoning_effort": "none"} if MODEL_NAME.startswith("qwen/qwen3") else {}

    for step_idx, prompt in enumerate(prompts):
        history.append({"role": "user", "content": prompt})
        try:
            completion = client.chat.completions.create(
                model=MODEL_NAME,
                messages=history,
                temperature=0.0,
                **extra_args,
            )
        except Exception as e:
            # Abort the batch; the problem in progress is not written to the JSONL file
            print(f"!!!! API Error on problem {idx + 1}, step {step_idx + 2}: {e}")
            raise
        answer = completion.choices[0].message.content

        print(f"--- Step {step_idx + 2} Completed ---")

        # Prompts cover Steps 2-6, so prompt index 0 is stored as step_2
        module_data["steps"][f"step_{step_idx + 2}"] = answer
        history.append({"role": "assistant", "content": answer})

        time.sleep(12)  # pause between API calls to stay under rate limits

    # Save to JSONL after each problem is fully solved
    with open(filename, "a") as f:
        f.write(json.dumps(module_data) + "\n")

print(f"Batch completed: {len(PROBLEM_LIST)} problems processed.")



### 4. Artifact Generation: Solved Markdown Report
This final cell converts the fully synthesized, machine-readable JSONL dataset into a human-readable Markdown (`.md`) format.

This supplementary artifact allows peer reviewers and educators to perform qualitative assessments of the generated solutions, pedagogical explanations, and Socratic hints. It also explicitly tracks the provenance (source model and iteration) of each original problem statement.
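The per-record rendering in the cell below can be sketched as a pure function, which makes the output format easy to check without touching Drive. This minimal sketch assumes the record layout written by Cell 3, where provenance is stored under `source_metadata`. `record_to_markdown` is a hypothetical helper.

```python
from typing import Any, Dict

def record_to_markdown(record: Dict[str, Any]) -> str:
    """Render one SOLVED_*.jsonl record as a Markdown section with its provenance."""
    # Cell 3 stores provenance under "source_metadata" (e.g. "<model>_iter_<n>")
    source = record.get("source_metadata", "Unknown Source")
    parts = [f"## Iteration {record['iteration']} (Problem Source: {source})\n"]
    for n in range(1, 7):
        key = f"step_{n}"
        parts.append(f"### {key.upper()}\n{record['steps'].get(key, 'Empty Step')}\n\n")
    parts.append("---\n\n")
    return "".join(parts)
```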

In [None]:
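# Rebuild the file paths from Cell 3's configuration. MODEL_NAME, SELECTED_TOPIC, BATCH_NUM
# and current_date must already be defined; current_date is part of the filename.
safe_model_name = MODEL_NAME.replace('/', '_')
safe_topic_name = SELECTED_TOPIC.replace(' ', '_')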

filename = f"/content/drive/MyDrive/CANAI_LLM_Results/Eval/SOLVED_{current_date}_{safe_model_name}_{safe_topic_name}_B{BATCH_NUM}.jsonl"
report_file = f"/content/drive/MyDrive/CANAI_LLM_Results/Eval/Readable_Report_SOLVED_{current_date}_{safe_model_name}_{safe_topic_name}_B{BATCH_NUM}.md"

with open(filename, "r") as f, open(report_file, "w") as out:
    out.write(f"# C Education Standardized Research Report: {SELECTED_TOPIC.replace('_', ' ')}\n")
    out.write(f"**Solver Model:** {MODEL_NAME} | **Date:** {current_date} | **Batch:** {BATCH_NUM}\n\n")
    out.write("---\n\n")

    for line in f:
        data = json.loads(line)
        if not data["steps"]: continue

        # Display Iteration and the Original Source Metadata
        # Cell 3 writes provenance under "source_metadata"
        source_info = data.get("source_metadata", data.get("problem_source", "Unknown Source"))
        out.write(f"## Iteration {data['iteration']} (Problem Source: {source_info})\n")

        for step_num in range(1, 7):
            key = f"step_{step_num}"
            out.write(f"### {key.upper()}\n")
            out.write(f"{data['steps'].get(key, 'Empty Step')}\n\n")
        out.write("---\n\n")

print(f"Readable report created: {os.path.basename(report_file)}")