<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_04_5_coder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 4: LangChain: Chat and Memory**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 4 Material

* Part 4.1: LangChain Conversations [[Video]]() [[Notebook]](t81_559_class_04_1_langchain_chat.ipynb)
* Part 4.2: Conversation Buffer Window Memory [[Video]]() [[Notebook]](t81_559_class_04_2_memory_buffer.ipynb)
* Part 4.3: Chat with Summary and Fixed Window [[Video]]() [[Notebook]](t81_559_class_04_3_summary.ipynb)
* Part 4.4: Chat with Persistence, Rollback and Regeneration [[Video]]() [[Notebook]](t81_559_class_04_4_persistence.ipynb)
* **Part 4.5: Automated Coder Application** [[Video]]() [[Notebook]](t81_559_class_04_5_coder.ipynb)

# Google CoLab Instructions

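The following code detects whether Google CoLab is running. In CoLab, it loads your OpenAI API key from CoLab secrets, installs the needed libraries, and downloads the `util_chat.py` helper module.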

In [None]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai
    !wget -q https://raw.githubusercontent.com/jeffheaton/app_generative_ai/main/util_chat.py -O util_chat.py
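Outside CoLab, the cell above does not set `OPENAI_API_KEY`, so the key has to come from your own environment. A minimal sketch of a fail-fast check, assuming the key is exported as an environment variable before Jupyter starts (`require_openai_key` is a hypothetical helper, not part of `util_chat`):

```python
import os

def require_openai_key() -> str:
    """Hypothetical helper: return the OpenAI key or fail with a clear message."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Failing here is clearer than an authentication error deep inside the loop
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting Jupyter."
        )
    return key
```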

# 4.5: Coder Assistant

This script builds directly on the **ChatConversation** class you developed earlier, extending it into a structured two-agent workflow. Instead of using a single conversational agent, this design sets up both a **code generator** and a **code reviewer**, each implemented as its own ChatConversation. The generator produces candidate Python code for a defined task (in this case, solving the Traveling Salesman Problem with dynamic programming), while the reviewer evaluates the submission against a strict rubric. This setup allows the two agents to interact iteratively, refining solutions until they either meet the acceptance criteria or the process terminates due to stagnation or iteration limits.

At a higher level, the code provides the infrastructure to manage this review loop within a Colab/Jupyter-friendly environment. It handles details like extracting fenced code blocks from model output, repairing syntax if necessary, and enforcing reviewer discipline so that acceptance occurs only when the exact token is returned. With clear logging, persistence of agent state, and flexible configuration, this script demonstrates how the earlier ChatConversation abstraction can be scaled up into a **collaborative system of agents** that generate, critique, and converge on production-ready code.
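The control flow of that collaboration can be sketched without any API calls. The following is a minimal sketch, assuming two hypothetical stub agents (`StubGenerator`, `StubReviewer`) that expose the same `.invoke(prompt).content` shape the notebook uses; `toy_loop` is a simplified stand-in for the real orchestrator, without fence extraction, syntax repair, strict-accept normalization, or stagnation checks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ACCEPT = "<<<ACCEPTED>>>"

@dataclass
class Reply:
    content: str  # mirrors the `.content` attribute the notebook reads

@dataclass
class StubGenerator:
    """Hypothetical generator: each call emits a slightly 'improved' version."""
    version: int = 0

    def invoke(self, prompt: str) -> Reply:
        self.version += 1
        return Reply(f"def solve():\n    return {self.version}")

@dataclass
class StubReviewer:
    """Hypothetical reviewer: accepts only code that returns `target`."""
    target: int = 3

    def invoke(self, prompt: str) -> Reply:
        if f"return {self.target}" in prompt:
            return Reply(ACCEPT)
        return Reply(f"solve() must return {self.target}.")

def toy_loop(gen, rev, max_rounds: int = 5) -> Tuple[str, Optional[int]]:
    """Generate, review, revise; return (code, accepting round or None)."""
    code = gen.invoke("Write solve().").content
    for round_num in range(1, max_rounds + 1):
        review = rev.invoke(code).content
        if review.strip() == ACCEPT:
            return code, round_num
        code = gen.invoke(f"Reviewer feedback: {review}").content
    return code, None

code, accepted_round = toy_loop(StubGenerator(), StubReviewer())
print(accepted_round)  # 3
```

Swapping the stubs for the `ChatConversation` agents gives the real workflow; stubs like these are handy for testing orchestration logic offline.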

This block centralizes the switches you will most often tweak in Colab/Jupyter to control how the two-agent loop behaves. You **do not** need to change the orchestration code below; instead, adjust these constants to change verbosity, iteration limits, minimum code length checks, persistence, and the default task prompt. The `IPython.display` import is wrapped with a safe fallback so the script can run outside notebooks without errors.

**What to modify (typical tweaks):**
- `VERBOSE`: Set to `True` to see full generator/reviewer messages during each round. Keep `False` for compact “Iteration N: …” summaries.
- `MAX_ROUNDS`: Upper bound on generator↔reviewer iterations. Increase if the reviewer keeps asking for small fixes and you want more chances to converge.
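- `CODE_MIN_CHARS`: Minimum length, in characters, that a fenced code block must have to be extracted from the model’s reply; if no block qualifies, the extractor falls back to the raw reply text. Raise this if you see trivial stubs; lower it if your tasks are intentionally short.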
- `SAVE_STATE`: When `True`, each agent persists its memory to disk after revisions. Set `False` for ephemeral runs.
- `GENERATOR_STATE_PATH` / `REVIEWER_STATE_PATH`: Filenames for on-disk memory snapshots. Change these if you want separate runs or to avoid overwriting prior state.
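- `TASK`: Default instruction for the generator. Replace the string to target a different coding task. Because `RUBRIC` (defined below) is written specifically for the dynamic-programming TSP task, update it as well so the reviewer judges the new code against relevant criteria.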


In [None]:
from __future__ import annotations
from typing import Optional, List, Dict, Any, Tuple
import ast
import hashlib
import html
import json
import logging
import re
import textwrap
import time
from pathlib import Path

try:
    from IPython.display import display_markdown  # type: ignore
except Exception:
    def display_markdown(*args, **kwargs):  # fallback no-op
        pass

# ============================================================
# Notebook constants (tweak here)
# ============================================================
VERBOSE: bool = False                 # True: full transcripts; False: concise iteration summaries
MAX_ROUNDS: int = 5
CODE_MIN_CHARS: int = 120
SAVE_STATE: bool = True
GENERATOR_STATE_PATH: str = "generator_mem.json"
REVIEWER_STATE_PATH: str = "reviewer_mem.json"

# Override TASK here (before the orchestrator cell runs) or pass task=... to run_review_loop
TASK = "provide traveling salesman solution using dynamic programming"
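One Python detail matters when you tweak these constants in a running notebook: `run_review_loop` uses them as default argument values, and Python evaluates defaults once, when the `def` statement runs. Reassigning `MAX_ROUNDS` or `TASK` after the orchestrator cell has executed therefore does not change its defaults; either re-run that cell or pass the value explicitly. Globals the function reads inside its body, such as `VERBOSE` and `RUBRIC`, are looked up at call time instead. A minimal demonstration, using a hypothetical `run_loop` in place of the real function:

```python
MAX_ROUNDS = 5

def run_loop(*, max_rounds: int = MAX_ROUNDS) -> int:
    # The default is evaluated once, when this `def` statement executes.
    return max_rounds

MAX_ROUNDS = 10  # e.g. edited and re-run in a later cell

print(run_loop())                       # 5: the old value was captured
print(run_loop(max_rounds=MAX_ROUNDS))  # 10: passed explicitly
```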

### Reviewer’s Contract and Code-Extraction Rules

This block defines exactly **how acceptance is signaled** and **what standards are enforced** during review. The reviewer must return the literal `ACCEPT_TOKEN` string `<<<ACCEPTED>>>` with nothing else when the submission is production-ready. The `RUBRIC` is the reviewer’s checklist: it covers DP correctness (recurrence, base cases, path reconstruction), matching complexity claims, robust edge-case handling, testability, readability, and performance hygiene.

To make the pipeline resilient to model formatting quirks, `FENCE_PATTERNS` lists regexes, tried in order, that extract a single code block whether the fence is labeled (`python`, `py`, or any other tag) or unlabeled. A logger with a `NullHandler` is also created so logging stays silent unless you attach handlers elsewhere. **If you customize anything here**, you can change `ACCEPT_TOKEN` (the reviewer’s system prompt and the per-round review prompt both interpolate it, so they stay in sync), refine `RUBRIC` for your domain, or extend `FENCE_PATTERNS` if your model uses unusual fencing.
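Pattern order matters: a `python`-labeled block wins even when another fenced block appears earlier in the reply, and an unlabeled fence is still caught by the catch-all patterns. The sketch below rebuilds the same four patterns and applies them the way the extraction helper does; `first_match` is a hypothetical simplification that ignores the minimum-length check, and `FENCE` is built programmatically only so the example can sit inside a Markdown code block.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built this way to avoid closing the Markdown fence

FENCE_PATTERNS = [
    FENCE + r"python\s*\n(.*?)" + FENCE,
    FENCE + r"py\s*\n(.*?)" + FENCE,
    FENCE + r"[a-zA-Z0-9_+\-]*\s*\n(.*?)" + FENCE,
    FENCE + r"\s*\n(.*?)" + FENCE,
]

def first_match(text: str) -> Optional[str]:
    """Return the body found by the first pattern that matches anywhere."""
    for pat in FENCE_PATTERNS:
        m = re.search(pat, text, flags=re.DOTALL | re.IGNORECASE)
        if m:
            return m.group(1).strip()
    return None

reply = (
    f"Sample output:\n{FENCE}text\ncost = 80\n{FENCE}\n"
    f"Solution:\n{FENCE}python\nprint('tour')\n{FENCE}\n"
)
print(first_match(reply))  # print('tour')
# Trying the catch-all pattern first would have grabbed the log block instead:
print(re.search(FENCE_PATTERNS[2], reply, flags=re.DOTALL).group(1).strip())  # cost = 80
print(first_match(f"{FENCE}\nx = 1\n{FENCE}"))  # x = 1
print(first_match("no fenced code here"))       # None
```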

In [None]:
# ============================================================
# Task / review config
# ============================================================
ACCEPT_TOKEN = "<<<ACCEPTED>>>"

RUBRIC = (
    "Review criteria:\n"
    "• Correctness (dynamic programming recurrence, base cases, path reconstruction)\n"
    "• Complexity claims match implementation (O(n^2·2^n) time, O(n·2^n) memory)\n"
    "• Edge cases (n=0/1, non-square matrix, inf edges)\n"
    "• Testability (clear API, deterministic output)\n"
    "• Readability (docstring, type hints)\n"
    "• Performance (avoid quadratic copies, precompute bitmasks)\n"
)

FENCE_PATTERNS = [
    r"```python\s*\n(.*?)```",
    r"```py\s*\n(.*?)```",
    r"```[a-zA-Z0-9_+\-]*\s*\n(.*?)```",
    r"```\s*\n(.*?)```",
]

logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())




The helper utilities below turn raw model replies into something the loop can act on. `extract_code_from_markdown` tries each pattern in `FENCE_PATTERNS` and returns the first block that is at least `min_chars` long; failing that, it joins all fenced blocks and uses them if the result is long enough, and as a last resort it returns the stripped reply text. `is_valid_python` checks syntax with `ast.parse`, and `ensure_syntax_or_repair` sends one repair request to the generator when that check fails.

The remaining helpers support acceptance and retries. `_digest` hashes text with SHA-256 for stagnation detection. `is_strict_accept` unescapes HTML entities, unwraps a surrounding code fence, and strips one pair of matching quotes before requiring an exact match with `ACCEPT_TOKEN`. `ask_generator_strict` re-sends a prompt up to `retries` times (default 2) when the reply contains no fenced block, sleeping 0.8 s and then 1.6 s with the default backoff.
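The syntax gate is deliberately shallow. `ast.parse` only builds a syntax tree and never executes the code, so it rejects a missing colon but accepts code that would fail at runtime; catching that kind of defect is left to the reviewer. A self-contained sketch of the same check as `is_valid_python`:

```python
import ast
from typing import Optional, Tuple

def is_valid_python(code: str) -> Tuple[bool, Optional[str]]:
    """Same idea as the notebook helper: parse, but never execute."""
    try:
        ast.parse(code)
        return True, None
    except SyntaxError as e:
        return False, f"{e.msg} (line {e.lineno}, col {e.offset})"

runtime_bug = "def tour_cost(d):\n    return undefined_name[0]\n"
syntax_bug = "def tour_cost(d)\n    return 0\n"

print(is_valid_python(runtime_bug))    # (True, None): the NameError would only appear at runtime
print(is_valid_python(syntax_bug)[0])  # False: missing colon
```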

In [None]:
# ============================================================
# Helper utilities
# ============================================================
def extract_code_from_markdown(text: str, *, min_chars: int = 120) -> str:
    for pat in FENCE_PATTERNS:
        m = re.search(pat, text, flags=re.DOTALL | re.IGNORECASE)
        if m:
            code = textwrap.dedent(m.group(1)).strip()
            if len(code) >= min_chars:
                return code
    blocks = re.findall(r"```(?:[a-zA-Z0-9_+\-]*)?\s*\n(.*?)```", text, flags=re.DOTALL)
    if blocks:
        code = textwrap.dedent("\n\n".join(b.strip() for b in blocks)).strip()
        if len(code) >= min_chars:
            return code
    return text.strip()

def is_valid_python(code: str) -> Tuple[bool, Optional[str]]:
    try:
        ast.parse(code)
        return True, None
    except SyntaxError as e:
        return False, f"{e.msg} (line {e.lineno}, col {e.offset})"

def ensure_syntax_or_repair(code: str, generator) -> str:
    ok, err = is_valid_python(code)
    if ok:
        return code
    repair_prompt = (
        "The following code has a syntax error.\n"
        f"Error: {err}\n\n"
        "Return ONLY a single ```python fenced block with a syntactically valid fix. "
        "Keep functionality identical where possible.\n\n"
        f"```python\n{code}\n```"
    )
    repaired = generator.invoke(repair_prompt).content
    return extract_code_from_markdown(repaired, min_chars=80)

def _digest(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def _normalize_accept_text(s: str) -> str:
    s = html.unescape(s or "").strip()
    m = re.fullmatch(r"```(?:\w+)?\s*\n?(.*?)\n?```", s, flags=re.DOTALL)
    if m:
        s = m.group(1).strip()
    if (s.startswith('"') and s.endswith('"')) or (s.startswith("'") and s.endswith("'")):
        s = s[1:-1].strip()
    return s

def is_strict_accept(review_text: str, token: str) -> bool:
    return _normalize_accept_text(review_text) == token

def ask_generator_strict(generator, prompt: str, retries: int = 2, backoff: float = 0.8) -> str:
    last_reply = ""
    for i in range(retries + 1):
        last_reply = generator.invoke(prompt).content
        code = extract_code_from_markdown(last_reply, min_chars=100)
        # Stop retrying once the reply contains a fenced block
        if "```" in last_reply:
            return code
        if i < retries:
            time.sleep(backoff * (2 ** i))
    return extract_code_from_markdown(last_reply, min_chars=80)
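The exact-match rule in `is_strict_accept` exists because a reviewer often mentions the token while rejecting. The sketch below contrasts a naive substring check with the same normalization the helper applies; `naive_accept` and `strict_accept` are illustrative names, not part of the notebook, and `FENCE` is built programmatically only to keep the example inside a Markdown code block.

```python
import html
import re

ACCEPT_TOKEN = "<<<ACCEPTED>>>"
FENCE = "`" * 3  # three backticks

def naive_accept(reply: str) -> bool:
    # Substring test: fooled by a review that merely mentions the token
    return ACCEPT_TOKEN in reply

def strict_accept(reply: str) -> bool:
    # Same normalization as the notebook helper: unescape HTML entities,
    # unwrap one surrounding fence, drop one pair of matching quotes,
    # then require exact equality with the token.
    s = html.unescape(reply or "").strip()
    m = re.fullmatch(FENCE + r"(?:\w+)?\s*\n?(.*?)\n?" + FENCE, s, flags=re.DOTALL)
    if m:
        s = m.group(1).strip()
    if (s.startswith('"') and s.endswith('"')) or (s.startswith("'") and s.endswith("'")):
        s = s[1:-1].strip()
    return s == ACCEPT_TOKEN

rejection = "Do not send <<<ACCEPTED>>> yet: the n=1 base case is wrong."
samples = {
    "<<<ACCEPTED>>>": True,
    "&lt;&lt;&lt;ACCEPTED&gt;&gt;&gt;": True,
    FENCE + "\n<<<ACCEPTED>>>\n" + FENCE: True,
    '"<<<ACCEPTED>>>"': True,
    rejection: False,
}
for reply, expected in samples.items():
    assert strict_accept(reply) == expected
print(naive_accept(rejection))  # True: the naive check would accept a rejection
```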



### Orchestrate the Generator Reviewer Loop

The `run_review_loop` function drives the entire two-agent workflow: it prompts the **generator** for code, extracts a fenced Python block, auto-repairs syntax if needed, and then sends that code to the **reviewer** with the formal rubric and acceptance rules. Each iteration prints a compact status line (or full transcripts when `VERBOSE=True`), checks whether the reviewer returned exactly `ACCEPT_TOKEN`, and either stops on success or continues with a targeted revision prompt. To avoid wheel-spinning, the loop detects **stagnation** by hashing the current code and the reviewer’s feedback; if both hashes match those from the previous round, the run exits early with a clear reason.

You typically won’t modify this function’s logic. Instead, use its keyword arguments—wired to the notebook constants—to tune behavior: `max_rounds` (how many review cycles), `code_min_chars` (minimum code size to accept from the generator), and `save_state` plus the `*_STATE_PATH` settings (to persist each agent’s memory via `.save(...)`). The function ends with a clean **final report** and returns `(final_code, reason)`, so you can programmatically capture the accepted artifact or diagnose why acceptance was not reached.
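The stagnation check can be isolated into a few lines. A minimal sketch, assuming the same SHA-256 hashing the loop uses; `StagnationDetector` is a hypothetical helper, not a class in the notebook.

```python
import hashlib
from typing import Optional, Tuple

def digest(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

class StagnationDetector:
    """Hypothetical helper mirroring the loop's check: report stagnation when
    both the code and the review are identical to the previous round's."""

    def __init__(self) -> None:
        self.last: Optional[Tuple[str, str]] = None

    def check(self, code: str, review: str) -> bool:
        current = (digest(code), digest(review))
        stalled = current == self.last
        self.last = current  # only the immediately preceding round is remembered
        return stalled

d = StagnationDetector()
print(d.check("v1", "Fix the n=1 case."))  # False: first round
print(d.check("v1", "Fix the n=1 case."))  # True: nothing changed
print(d.check("v2", "Fix the n=1 case."))  # False: the code changed
```

Because only the previous round is kept, an A, B, A oscillation is not detected; storing every `(code_hash, review_hash)` pair in a set would catch such cycles.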


In [None]:
# ============================================================
# Orchestrator
# ============================================================
def run_review_loop(
    code_generator,
    code_reviewer,
    task: str = TASK,
    *,
    max_rounds: int = MAX_ROUNDS,
    code_min_chars: int = CODE_MIN_CHARS,
    save_state: bool = SAVE_STATE,
    generator_state_path: str = GENERATOR_STATE_PATH,
    reviewer_state_path: str = REVIEWER_STATE_PATH,
):
    print("=== Task ===")
    print(task)
    print("============")

    gen_prompt = (
        f"Generate Python code to {task}.\n"
        "Return ONLY a single fenced block formatted exactly as:\n"
        "```python\n# code here\n```\n"
        "No prose, no explanations."
    )

    # Initial generation
    gen_text = code_generator.invoke(gen_prompt).content
    code_only = extract_code_from_markdown(gen_text, min_chars=code_min_chars)
    code_only = ensure_syntax_or_repair(code_only, code_generator)

    if VERBOSE:
        print("\n--- Code Generator (raw) ---\n")
        print(gen_text)
        print("\n--- Code Extracted For Review ---\n")
        print(code_only)
    else:
        print("\n(Beginning iterations...)\n")

    last_code_hash = last_review_hash = None
    final_reason = None

    for round_num in range(1, max_rounds + 1):
        review_prompt = (
            f"{RUBRIC}\n\n"
            "Review the following Python code. Do NOT quote or restate these instructions.\n"
            f"If the code is production-ready, reply EXACTLY with {ACCEPT_TOKEN} and nothing else.\n\n"
            f"{code_only}"
        )
        review_text = code_reviewer.invoke(review_prompt).content

        # If reviewer output includes code, force non-accept
        if "```" in review_text:
            review_text += "\n\n(Note: Reviewer included a code block, violating rules. Treating as NOT accepted.)"

        accepted = is_strict_accept(review_text, ACCEPT_TOKEN)

        if VERBOSE:
            print(f"\n--- Code Reviewer (round {round_num}) ---\n")
            print(review_text)
        else:
            print(f"Iteration {round_num}: {'ACCEPTED ✅' if accepted else 'rejected ❌'}")

        if accepted:
            final_reason = "Reviewer accepted the solution."
            break

        # Stagnation detection
        code_hash = _digest(code_only)
        review_hash = _digest(review_text)
        if last_code_hash == code_hash and last_review_hash == review_hash:
            final_reason = "Stagnation detected (same code and same review)."
            break
        last_code_hash, last_review_hash = code_hash, review_hash

        # Revise based on feedback
        revise_prompt = (
            "Reviewer feedback (do NOT include this text in your output):\n"
            f"{review_text}\n\n"
            "Apply ONLY the changes requested. Do not modify unrelated code. "
            "Return ONLY one ```python fenced block."
        )
        gen_text = ask_generator_strict(code_generator, revise_prompt)
        code_only = extract_code_from_markdown(gen_text, min_chars=code_min_chars)
        code_only = ensure_syntax_or_repair(code_only, code_generator)

        if VERBOSE:
            print(f"\n--- Code Generator (revised raw, round {round_num}) ---\n")
            print(gen_text)
            print("\n--- Code Extracted For Review ---\n")
            print(code_only)

        if save_state:
            try:
                code_generator.save(generator_state_path)
                code_reviewer.save(reviewer_state_path)
            except Exception as e:
                logger.debug("Persistence skipped: %s", e)
    else:
        final_reason = "Loop ended without reviewer acceptance (max rounds)."

    # Final report
    print("\n=== Final Result ===")
    print(final_reason or "(no reason captured)")
    print("\n--- Final Code ---\n")
    print(code_only)

    return code_only, final_reason or ""
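The max-rounds message relies on Python’s `for ... else`: the `else` block runs only when the loop finishes without hitting `break`, so acceptance and stagnation (which both `break`) skip it. A minimal sketch with a hypothetical `outcome` function:

```python
from typing import Optional

def outcome(accept_at: Optional[int], max_rounds: int = 3) -> str:
    for round_num in range(1, max_rounds + 1):
        if round_num == accept_at:
            reason = f"accepted in round {round_num}"
            break
    else:
        # Reached only when every round ran without a `break`
        reason = "max rounds reached"
    return reason

print(outcome(2))     # accepted in round 2
print(outcome(None))  # max rounds reached
```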



### Build the two agents from your existing `ChatConversation`

This helper wires up a **code generator** and a **code reviewer** using the `ChatConversation` contract you built earlier—no new agent class required. Each agent gets a focused system prompt: the generator is constrained to emit a single fenced Python block (no prose), and the reviewer is constrained to **never** emit code and to accept only by returning the exact `ACCEPT_TOKEN`. Because both agents conform to the same `.invoke(...)` / `.save(...)` interface, they slot directly into the orchestrator without additional glue code.

**What you might customize here:**
- `model`: Swap `"gpt-5-mini"` for another model you use in your notebook.
- `GEN_SYS_PROMPT` / `REVIEWER_SYS_PROMPT`: Tighten or relax guardrails, or tailor to a different domain (e.g., SQL, data viz).
- `strategy_name` / `strategy_kwargs`: Adjust your summarization window (`keep_last`, `trigger_len`, `max_summary_chars`) to balance recall vs. token usage.
- `temperature`: Use `0.0–0.2` for more consistent generation and reviews (even `0.0` does not guarantee identical outputs), or raise it slightly if the generator needs more exploration.

The function returns `(code_generator, code_reviewer)`, ready to pass to `run_review_loop(...)`.


In [None]:
# ============================================================
# Agent builder (uses your existing ChatConversation)
# ============================================================
def build_agents():
    """
    Construct the two agents using your existing ChatConversation class contract.
    Assumes ChatConversation(model, system_prompt, strategy_name, strategy_kwargs, temperature)
    with .invoke(...) and .save(...).
    """
    from util_chat import ChatConversation

    GEN_SYS_PROMPT = (
        "You are a coding assistant. Output ONLY Python code.\n"
        "Rules:\n"
        "1) Return exactly one fenced block:\n"
        "```python\n# optional brief comments\n# then real code\n```\n"
        "2) No prose outside the fence. If you need to explain, use Python comments at the top of the file."
    )

    REVIEWER_SYS_PROMPT = (
        "You are a strict code reviewer. Do not write or paste code yourself.\n"
        f"If the code is production-ready for the stated task, reply with EXACTLY this token and nothing else:\n{ACCEPT_TOKEN}\n"
        "Do not quote or restate the token unless you are accepting. Do not echo any part of the user's prompt.\n"
        "Otherwise, identify specific problems and recommended changes in words only. No code blocks."
    )

    code_generator = ChatConversation(
        model="gpt-5-mini",
        system_prompt=GEN_SYS_PROMPT,
        strategy_name="summary",
        strategy_kwargs={"keep_last": 6, "trigger_len": 10, "max_summary_chars": 1000},
        temperature=0.2,
    )

    code_reviewer = ChatConversation(
        model="gpt-5-mini",
        system_prompt=REVIEWER_SYS_PROMPT,
        strategy_name="summary",
        strategy_kwargs={"keep_last": 6, "trigger_len": 10, "max_summary_chars": 1000},
        temperature=0.2,
    )

    return code_generator, code_reviewer
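The orchestrator depends on only two methods: `.invoke(prompt)`, returning an object with a `.content` string, and `.save(path)`, whose failures are caught and logged. That contract can be written down as a `typing.Protocol`, which also makes it easy to drop in an offline agent for tests. `Agent`, `Message`, and `EchoAgent` below are hypothetical names, and this sketch assumes nothing about how `util_chat.ChatConversation` is implemented internally.

```python
import json
from dataclasses import dataclass, field
from typing import List, Protocol, runtime_checkable

@dataclass
class Message:
    content: str  # the attribute run_review_loop reads from each reply

@runtime_checkable
class Agent(Protocol):
    """Hypothetical description of what run_review_loop calls on each agent."""
    def invoke(self, prompt: str) -> Message: ...
    def save(self, path: str) -> None: ...

@dataclass
class EchoAgent:
    """Hypothetical offline agent: echoes prompts and can persist its history."""
    history: List[str] = field(default_factory=list)

    def invoke(self, prompt: str) -> Message:
        self.history.append(prompt)
        return Message(f"echo: {prompt}")

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.history, f)

agent = EchoAgent()
# runtime_checkable only verifies that the methods exist, not their signatures
print(isinstance(agent, Agent))    # True
print(agent.invoke("hi").content)  # echo: hi
```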

We can now build both agents and run the review loop on the default `TASK`.

In [None]:
gen, rev = build_agents()
final_code, reason = run_review_loop(gen, rev)
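Since `run_review_loop` returns `(final_code, reason)`, a follow-up cell can keep the artifact only when the reviewer accepted it. A minimal sketch, assuming you captured the tuple as `final_code, reason = run_review_loop(gen, rev)`; `save_if_accepted` and the default filename are hypothetical, and comparing against the exact reason string will break if you edit the orchestrator’s messages.

```python
import ast
from pathlib import Path
from typing import Optional

ACCEPTED_REASON = "Reviewer accepted the solution."  # copied from run_review_loop

def save_if_accepted(final_code: str, reason: str,
                     path: str = "tsp_solution.py") -> Optional[Path]:
    """Hypothetical helper: write the result only if the reviewer accepted it."""
    if reason != ACCEPTED_REASON:
        return None
    ast.parse(final_code)  # raises SyntaxError rather than saving invalid Python
    out = Path(path)
    out.write_text(final_code.rstrip() + "\n", encoding="utf-8")
    return out
```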