# Lecture 2 — Marketing Automation with HITL

**Goal**: Build a reliable outreach pipeline:

1. Summarize lead context → **JSON**
2. Draft outreach email + subject → **JSON**
3. Run a QC pass that flags risky outputs
4. Create a simple **human-in-the-loop** review queue
5. Run a tiny evaluation to compare prompt versions

## Setup
This notebook reads its OpenRouter settings from environment variables:

- `OPENROUTER_API_KEY` (required)
- `OPENROUTER_MODEL` (optional; default set below)

You will be provided keys in class. Do **not** commit keys to git.
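
One way to keep the key out of the notebook file (and therefore out of git) is to prompt for it at runtime when the environment variable is unset. The sketch below is only an illustration: `ensure_env` is a hypothetical helper, not part of this notebook, and it uses only the standard library.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env(name: str, prompt_fn: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name], prompting once if it is unset or empty.

    Hypothetical helper: the value is stored only in this process's
    environment, so it is never written into the notebook source.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        # getpass does not echo what is typed, so the key does not show up
        # in the cell output.
        value = prompt_fn(f"Enter {name}: ").strip()
        if not value:
            raise RuntimeError(f"{name} is required but was not provided.")
        os.environ[name] = value
    return value
```

If you use something like this, call `ensure_env("OPENROUTER_API_KEY")` before the setup cell so that `os.getenv` finds the key.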


In [None]:
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

import httpx
import pandas as pd

DATA_DIR = Path("../data")
OUTPUT_DIR = DATA_DIR / "outputs"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
OPENROUTER_MODEL = os.getenv("OPENROUTER_MODEL", "openai/gpt-4o-mini")

if not OPENROUTER_API_KEY:
    raise RuntimeError(
        "Missing OPENROUTER_API_KEY. Set it in your environment before running this notebook."
    )

print("Using model:", OPENROUTER_MODEL)



In [None]:
def openrouter_chat(messages: List[Dict[str, str]], *, temperature: float = 0.2) -> str:
    """Call OpenRouter Chat Completions API and return the assistant text."""
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": OPENROUTER_MODEL,
        "messages": messages,
        "temperature": temperature,
    }

    with httpx.Client(timeout=60) as client:
        resp = client.post(url, headers=headers, json=payload)
        resp.raise_for_status()
        data = resp.json()

    return data["choices"][0]["message"]["content"]


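def parse_json_strict(text: str) -> Dict[str, Any]:
    """Parse a JSON object; raise with a useful message if parsing fails."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as e:
        # Chain the original error so the traceback shows where parsing broke.
        raise ValueError(f"Model returned invalid JSON: {e}\n---\n{text}") from e
    # Callers use .get(), so a top-level list or string is also a failure.
    if not isinstance(obj, dict):
        raise ValueError(f"Expected a JSON object, got {type(obj).__name__}\n---\n{text}")
    return obj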



In [None]:
leads = pd.read_csv(DATA_DIR / "leads.csv")
leads.head(3)



In [None]:
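# Read as UTF-8 explicitly so the result does not depend on the OS default encoding.
brand = (DATA_DIR / "brand_guidelines.md").read_text(encoding="utf-8")
one_pager = (DATA_DIR / "product_one_pager.md").read_text(encoding="utf-8")
rubric = (DATA_DIR / "rubric.md").read_text(encoding="utf-8")

# Preview the first few lines of the brand guidelines.
print(brand.splitlines()[:6])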



## Step 1 — Lead summarization (structured JSON)

Edit the prompt to improve:
- faithfulness to the lead notes
- extracting concrete pain points
- listing missing info questions that would help outreach

**Constraint**: return *only* JSON.
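
Models sometimes drop a key or return a string where a list is expected, so it can help to check the parsed object against the schema before passing it downstream. The following is a minimal sketch, not the notebook's own method. `validate_shape` and `EXAMPLE_SCHEMA` are hypothetical names, and the validator assumes the same convention as the schema dicts used here (`"string"` for a string value, `["string"]` for a list of strings).

```python
from typing import Any, Dict, List

# Same shape convention as the notebook's schema dicts:
# "string" means a str value, ["string"] means a list of str values.
EXAMPLE_SCHEMA: Dict[str, Any] = {
    "lead_summary": "string",
    "pain_points": ["string"],
    "suggested_angle": "string",
    "missing_info": ["string"],
}


def validate_shape(obj: Any, schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the object matches."""
    if not isinstance(obj, dict):
        return [f"expected a JSON object, got {type(obj).__name__}"]

    problems: List[str] = []
    for key, spec in schema.items():
        if key not in obj:
            problems.append(f"missing key: {key}")
        elif isinstance(spec, list):
            value = obj[key]
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                problems.append(f"{key} must be a list of strings")
        elif not isinstance(obj[key], str):
            problems.append(f"{key} must be a string")

    # Extra keys usually mean the model drifted from the requested schema.
    for key in obj:
        if key not in schema:
            problems.append(f"unexpected key: {key}")
    return problems
```

A non-empty result is a reasonable trigger for a retry or for routing the lead straight to human review.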


In [None]:
LEAD_SUMMARY_SCHEMA = {
    "lead_summary": "string",
    "pain_points": ["string"],
    "suggested_angle": "string",
    "missing_info": ["string"],
}


def summarize_lead(row: pd.Series) -> Dict[str, Any]:
    system = """You are a careful assistant.
Return ONLY valid JSON. No markdown. No backticks.
"""

    prompt = f"""Summarize the lead for outreach drafting.

JSON schema (keys and types):
{json.dumps(LEAD_SUMMARY_SCHEMA, indent=2)}

Lead:
- name: {row['name']}
- role: {row['role']}
- company: {row['company']}
- industry: {row['industry']}
- company_size: {row['company_size']}
- region: {row['region']}
- notes: {row['notes']}

Rules:
- Do not invent facts.
- Pain points must be grounded in the notes.
- Missing info must be phrased as questions.
"""

    text = openrouter_chat(
        [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ]
    )
    obj = parse_json_strict(text)
    return obj


example_summary = summarize_lead(leads.iloc[0])
example_summary



## Step 2 — Draft outreach email (structured JSON)

Use the lead summary + the allowed product facts + brand guidelines.

**Constraints** (from the brand doc):
- Subject: 4–8 words, no emojis
- Body: 90–140 words
- No hype, no guarantees, no advice

Edit the prompt to improve quality and reduce compliance risk.
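
The length and emoji constraints above can be checked exactly in code instead of trusting the model to count. Below is a rough sketch under stated assumptions: `check_format` is a hypothetical helper, words are split on whitespace, and emoji detection is a heuristic based on the Unicode category "So" (Symbol, other), which can also flag signs such as "©".

```python
import unicodedata
from typing import List


def check_format(subject: str, body: str) -> List[str]:
    """Return format violations for the stated subject and body limits."""
    problems: List[str] = []

    subject_words = len(subject.split())
    if not 4 <= subject_words <= 8:
        problems.append(f"subject has {subject_words} words (expected 4-8)")

    # Heuristic (an assumption): most emoji are in Unicode category "So".
    if any(unicodedata.category(ch) == "So" for ch in subject):
        problems.append("subject contains an emoji or symbol")

    body_words = len(body.split())
    if not 90 <= body_words <= 140:
        problems.append(f"body has {body_words} words (expected 90-140)")

    return problems
```

For example, `check_format(draft["subject"], draft["email_body"])` returns an empty list when a draft meets all three limits.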


In [None]:
DRAFT_SCHEMA = {
    "subject": "string",
    "email_body": "string",
    "personalization_tokens": ["string"],
}


def draft_email(row: pd.Series, summary: Dict[str, Any]) -> Dict[str, Any]:
    system = """You write high-quality, compliant outreach.
Return ONLY valid JSON. No markdown. No backticks.
"""

    prompt = f"""Write a first-touch outreach email.

You may ONLY use facts from:
1) The lead fields and notes
2) The product one-pager

Brand guidelines:
{brand}

Product one-pager:
{one_pager}

JSON schema:
{json.dumps(DRAFT_SCHEMA, indent=2)}

Lead:
- name: {row['name']}
- role: {row['role']}
- company: {row['company']}
- industry: {row['industry']}
- notes: {row['notes']}
- compliance_tags: {row['compliance_tags']}

Lead summary:
{json.dumps(summary, indent=2)}

Hard rules:
- Subject line must be 4–8 words, no emojis.
- Email body must be 90–140 words.
- Avoid guarantees and advice.
- Do not invent metrics, customers, or outcomes.
- End with a simple sign-off.
"""

    text = openrouter_chat(
        [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        temperature=0.4,
    )
    return parse_json_strict(text)


example_draft = draft_email(leads.iloc[0], example_summary)
example_draft



## Step 3 — QC pass (risk scoring)

We use a second pass to flag drafts that might be unsafe to ship.

Edit the QC prompt to:
- catch invented facts / outcomes
- catch compliance issues based on tags
- catch format violations (word count, subject length)

Return JSON with a `risk_level` and `reasons`.
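
Because `risk_level` comes from a model, it may arrive in an unexpected form ("Low", "severe", or missing). One possible approach, sketched here with hypothetical helper names (`normalize_risk`, `merge_qc`), is to fail closed on unknown labels and to let rule-based violations raise the level but never lower it.

```python
from typing import Any, Dict, List

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def normalize_risk(value: Any) -> str:
    """Map a model-supplied label to low/medium/high.

    Anything unrecognized is treated as "high" so that it fails closed.
    """
    label = str(value).strip().lower()
    return label if label in RISK_ORDER else "high"


def merge_qc(model_result: Dict[str, Any], rule_violations: List[str]) -> Dict[str, Any]:
    """Combine the model's verdict with deterministic rule violations."""
    risk = normalize_risk(model_result.get("risk_level"))

    raw_reasons = model_result.get("reasons") or []
    if isinstance(raw_reasons, str):
        raw_reasons = [raw_reasons]  # tolerate a single string instead of a list
    reasons = [str(r) for r in raw_reasons]

    if rule_violations:
        reasons.extend(rule_violations)
        # A rule violation means at least "medium"; it never downgrades "high".
        if RISK_ORDER[risk] < RISK_ORDER["medium"]:
            risk = "medium"

    return {**model_result, "risk_level": risk, "reasons": reasons}
```

Treating unknown labels as "high" is a design choice: it sends surprising outputs to a human instead of letting them through.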


In [None]:
QC_SCHEMA = {
    "risk_level": "low|medium|high",
    "reasons": ["string"],
    "required_edits": ["string"],
}


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def qc_draft(row: pd.Series, summary: Dict[str, Any], draft: Dict[str, Any]) -> Dict[str, Any]:
    system = """You are a strict QA reviewer.
Return ONLY valid JSON. No markdown. No backticks.
"""

    prompt = f"""You are reviewing an outreach email for compliance, correctness, and formatting.

Allowed facts:
- Lead fields and notes
- Product one-pager

Brand guidelines:
{brand}

Product one-pager:
{one_pager}

JSON schema:
{json.dumps(QC_SCHEMA, indent=2)}

Lead:
{json.dumps(row.to_dict(), indent=2, default=str)}

Lead summary:
{json.dumps(summary, indent=2)}

Draft:
{json.dumps(draft, indent=2)}

Checks:
1) Invented facts: flag any claims not supported by lead notes or one-pager.
2) Compliance tags: enforce the lead's compliance_tags.
3) Formatting: subject 4–8 words, body 90–140 words.
4) Tone: avoid hype, guarantees, advice.

Decide risk_level:
- low: safe to send
- medium: needs edits but safe after fixes
- high: should not send; major issues

Provide concrete required_edits.
"""

    text = openrouter_chat(
        [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,
    )
    result = parse_json_strict(text)

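    # Deterministic checks: word counts are cheap to verify exactly, so do not
    # rely on the model for them.
    format_issues = []
    subject_wc = word_count(draft.get("subject", ""))
    body_wc = word_count(draft.get("email_body", ""))
    if not (4 <= subject_wc <= 8):
        format_issues.append(f"Subject word count out of range: {subject_wc}")
    if not (90 <= body_wc <= 140):
        format_issues.append(f"Body word count out of range: {body_wc}")

    if format_issues:
        result.setdefault("reasons", []).extend(format_issues)
        # A format violation needs an edit, so never leave the draft at "low",
        # which the default review policy would auto-approve.
        if result.get("risk_level") == "low":
            result["risk_level"] = "medium"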

    return result


example_qc = qc_draft(leads.iloc[0], example_summary, example_draft)
example_qc



## Step 4 — Human-in-the-loop queue

We create a review table and simulate a simple approve/edit/reject loop.

In class, you’ll implement one of:
- a minimal terminal-like loop in the notebook, or
- a simple `review_decision` column you fill manually.

The key idea: **models draft; humans ship**.
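
The first option, a terminal-like loop, might look like the sketch below. It is only an illustration: `review_queue` and `ask_in_notebook` are hypothetical names, and the loop works on plain dict records so that the human decision function can be swapped out.

```python
from typing import Any, Callable, Dict, List, Tuple

VALID_DECISIONS = {"approve", "edit", "reject"}
Record = Dict[str, Any]


def review_queue(
    records: List[Record],
    decide: Callable[[Record], Tuple[str, str]],
) -> List[Record]:
    """Ask `decide` for a (decision, notes) pair on every undecided record.

    Returns new dicts; the input records are left unchanged.
    """
    reviewed: List[Record] = []
    for record in records:
        item = dict(record)
        current = str(item.get("review_decision") or "").strip()
        if not current:  # only rows without a decision go to the reviewer
            decision, notes = decide(item)
            decision = decision.strip().lower()
            if decision not in VALID_DECISIONS:
                raise ValueError(f"Unknown decision: {decision!r}")
            item["review_decision"] = decision
            item["review_notes"] = notes
        reviewed.append(item)
    return reviewed


def ask_in_notebook(record: Record) -> Tuple[str, str]:
    """Interactive reviewer: show the draft, then read a decision via input()."""
    print("Subject:", record.get("subject", ""))
    print(record.get("email_body", ""))
    print("Risk:", record.get("risk_level", ""), "|", record.get("reasons", ""))
    decision = input("approve / edit / reject: ")
    notes = input("notes (optional): ")
    return decision, notes
```

With a pandas report you could convert the rows using `qc_report.to_dict("records")`, pass them through `review_queue(..., ask_in_notebook)`, and rebuild a DataFrame from the result.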


In [None]:
# Batch run: summaries -> drafts -> QC

rows: List[Dict[str, Any]] = []

for _, r in leads.iterrows():
    s = summarize_lead(r)
    d = draft_email(r, s)
    q = qc_draft(r, s, d)

    rows.append(
        {
            "lead_id": r["lead_id"],
            "name": r["name"],
            "company": r["company"],
            "compliance_tags": r["compliance_tags"],
            "subject": d.get("subject", ""),
            "email_body": d.get("email_body", ""),
            "risk_level": q.get("risk_level", "unknown"),
            "reasons": " | ".join(q.get("reasons", [])),
            "required_edits": " | ".join(q.get("required_edits", [])),
        }
    )

qc_report = pd.DataFrame(rows)
qc_report

# Simple HITL: you (or students) fill decisions
qc_report["review_decision"] = ""  # approve | edit | reject
qc_report["review_notes"] = ""  # optional

# Suggested default policy (students can change): auto-approve low risk and mark
# medium/high as needing edits. Rows with any other risk_level stay blank for manual review.
qc_report.loc[qc_report["risk_level"] == "low", "review_decision"] = "approve"
qc_report.loc[qc_report["risk_level"].isin(["medium", "high"]), "review_decision"] = "edit"

qc_report

# Save outputs
qc_path = OUTPUT_DIR / "qc_report.csv"
qc_report.to_csv(qc_path, index=False)

# Final drafts: approved only
final_drafts = qc_report[qc_report["review_decision"] == "approve"].copy()
final_path = OUTPUT_DIR / "drafts.csv"
final_drafts.to_csv(final_path, index=False)

print("Wrote:")
print("-", qc_path)
print("-", final_path)

# Mini-eval: pass-rate by risk_level (proxy). In class, replace with rubric scoring.
pass_rate = (qc_report["risk_level"] == "low").mean()
print(f"Proxy pass-rate (risk_level==low): {pass_rate:.2%}")



## Extensions / Optional challenges

- **Rubric-based grader**: have the model score drafts using `data/rubric.md`; compare to human scores.
- **Batching + cost controls**: cache lead summaries; estimate token/cost; compare one-pass vs two-pass QC.
- **Policy-driven compliance**: translate `compliance_tags` into explicit deny/allow rules and required disclaimers.
- **Prompt/version tracking**: log prompt templates, model/version, and params alongside outputs.
- **Multi-variant testing**: run A/B prompt variants and choose winners via eval metrics.
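
As a starting point for the prompt-variant comparison, the proxy pass-rate can be computed per prompt version. This is a minimal sketch: `pass_rate_by_version` is a hypothetical helper, and it assumes each run is logged as a `(prompt_version, risk_level)` pair. With only a handful of leads, differences between versions will be noisy.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def pass_rate_by_version(results: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the share of "low" risk results for each prompt version."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for version, risk_level in results:
        totals[version] += 1
        if risk_level == "low":
            passes[version] += 1
    return {version: passes[version] / totals[version] for version in totals}
```

The same grouping works with rubric scores in place of `risk_level` once a rubric-based grader exists.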
