<p style="text-align:center">
  <a href="https://www.linkedin.com/company/100622063" target="_blank" title="Follow LevelUp360 on LinkedIn">
    <img src="../../assets/levelup360-inverted-logo-transparent.svg" alt="LevelUp360" width="220">
  </a>
</p>

# Marketing Team – Week 04: Agentic Content Generation Workflow (LangGraph)

This notebook validates the full agentic workflow implemented in Week 4:

**content_planning → tools → content_generation → content_evaluation → (loop or end)**

A content planning node decides which tools to call (RAG, web search, both, or none), and LangGraph’s `ToolNode` executes them. A generation node synthesizes the tool contexts into a draft, and an evaluation node scores the draft and controls the regeneration loop by comparing `iteration_count` with `max_iterations` and the score with the quality threshold. For deterministic refinement, the generation node applies an optimization system message on regeneration.

---

## What We’re Testing

- Tool routing: content planning node chooses `rag_search`, `web_search`, both, or none
- Agentic execution: ToolNode fetches contexts; generation consumes ToolMessages (no duplicate calls)
- Draft creation: ContentGenerator.generate_from_context builds prompts via PromptBuilder
- Evaluation & loop: ContentEvaluator scores content; evaluation node increments `iteration_count`, sets `meets_quality_threshold`, and routes regenerate/end
- Optional deterministic refinement: `evaluator_optimizer` applies `models.content_optimization.system_message` during regeneration

**Architecture:** content_planning (LLM tool-calling) → ToolNode (exec) → content_generation (uses tool_contexts) → content_evaluation (Critique + loop control)
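
The loop control described above can be sketched in plain Python. This is a minimal illustration, not the project's evaluation node: the function name `route_after_evaluation` and the return labels `"regenerate"` and `"end"` are assumptions, and only the state keys `iteration_count`, `max_iterations`, and `meets_quality_threshold` come from this notebook.

```python
from typing import Any, Dict


def route_after_evaluation(state: Dict[str, Any]) -> str:
    """Decide whether to regenerate the draft or stop (hypothetical helper)."""
    # Stop as soon as the evaluator reports that the draft is good enough.
    if state.get("meets_quality_threshold"):
        return "end"
    # Stop when the iteration budget is spent, even if quality is still low.
    if state.get("iteration_count", 0) >= state.get("max_iterations", 1):
        return "end"
    # Otherwise send the draft back to content_generation.
    return "regenerate"


print(route_after_evaluation(
    {"iteration_count": 1, "max_iterations": 3, "meets_quality_threshold": False}
))
```

A function of this shape is the kind of callable a conditional edge would consult after `content_evaluation`; the real routing may weigh additional state.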

---

## Environment Setup

### Prerequisites
- Python 3.10+
- `.env` with API keys (see below)
- Virtual environment in repo root
- VS Code Jupyter extension (or JupyterLab)

### Required Environment Variables
    # OpenRouter
    OPENROUTER_API_KEY=sk-...

    # Azure (optional for embeddings)
    AZURE_OPENAI_ENDPOINT=https://...
    AZURE_OPENAI_API_KEY=...
    AZURE_OPENAI_API_VERSION=...

    # Web search
    TAVILY_API_KEY=...

    # LangSmith (optional tracing)
    LANGSMITH_TRACING_V2=true
    LANGSMITH_API_KEY=...
    LANGSMITH_PROJECT=levelup360-marketing-team
    LANGSMITH_ENDPOINT=https://api.smith.langchain.com

    # (Optional) Pricing for local cost tracking
    GPT4O_INPUT_PRICE_PER_1K=...
    GPT4O_OUTPUT_PRICE_PER_1K=...
    EMBEDDING_PRICE_PER_1K=...
    TAVILY_PRICE_PER_CALL=...
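
A quick check for missing keys can save a failed run later. The sketch below is one possible approach, assuming that `OPENROUTER_API_KEY` and `TAVILY_API_KEY` are the only mandatory variables (the others are marked optional above); the helper name `missing_env_vars` is hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumption: these two keys are mandatory for the scenarios in this notebook;
# the Azure, LangSmith, and pricing variables are treated as optional.
REQUIRED_VARS = ("OPENROUTER_API_KEY", "TAVILY_API_KEY")


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set")
```

Run it after `load_dotenv()` so that values from `.env` are visible to `os.environ`.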

### One-Time Setup (PowerShell on Windows)
    # From workspace root:
    python -m venv .venv
    .\.venv\Scripts\Activate.ps1

    # If activation is blocked by the execution policy, run this once, then activate again:
    Set-ExecutionPolicy -Scope CurrentUser RemoteSigned

    # Upgrade tooling
    python -m pip install --upgrade pip setuptools wheel

    # Install dependencies
    if (Test-Path ./requirements.txt) {
      pip install -r requirements.txt
    } else {
      pip install openai python-dotenv pydantic langsmith pandas chromadb tavily-python tiktoken pyyaml rich ipykernel langchain-openai langgraph langchain-core langchain
    }

    # Editable install (so `from src...` works everywhere)
    # Ensure pyproject.toml exists at the project root where src/ lives:
    #   [build-system]
    #   requires = ["setuptools>=68", "wheel"]
    #   build-backend = "setuptools.build_meta"
    #
    #   [project]
    #   name = "marketing-team"
    #   version = "0.1.0"
    #   requires-python = ">=3.10"
    #
    #   [tool.setuptools.packages.find]
    #   where = ["."]
    #   include = ["src*"]
    pip install -e .

    # (Optional) Register kernel
    python -m ipykernel install --user --name marketing-team --display-name "Python (marketing-team)"

### One-Time Setup (macOS/Linux bash)
    # From workspace root:
    python -m venv .venv
    source .venv/bin/activate

    # Upgrade tooling
    python -m pip install --upgrade pip setuptools wheel

    # Install dependencies
    if [ -f ./requirements.txt ]; then
      pip install -r requirements.txt
    else
      pip install openai python-dotenv pydantic langsmith pandas chromadb tavily-python tiktoken pyyaml rich ipykernel langchain-openai langgraph langchain-core langchain
    fi

    # Editable install (-e) so `from src...` imports work in scripts/notebooks/tests
    pip install -e .

    # (Optional) Register kernel
    python -m ipykernel install --user --name marketing-team --display-name "Python (marketing-team)"

### Verify Installation
    # Should print "OK" without ImportError
    python -c "import src; from src.agents.graphs.content_generation_graph import build_content_workflow; print('OK')"

---

## Notebook Flow

1. **Setup** – Autoreload, env, logging  
2. **Build Workflow** – `build_content_workflow(brand)` compiles the LangGraph app  
3. **Helpers** – Utilities to run scenarios and render message flow, draft, evaluation  
4. **Scenarios** – RAG-only, Web-only, Both, None  
5. **Optional: Optimization Loop** – Switch to `evaluator_optimizer` to validate deterministic refinement behavior  
6. **Summary** – Inspect loop stats, metadata, outcomes; capture for documentation

---

## Data & Config

- `config/brands/` – Brand YAMLs (models, retrieval, formatting, voice, CTA)  
- `data/chroma_db/` – Vector store persistence (RAG)  
- `outputs/week4_validation/` – Optional export path for results
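
If you want to persist results under the optional export path, a small JSON writer is enough. This is a sketch under assumptions: the helper name `export_result`, the file naming scheme, and the selection of state keys are all hypothetical and not part of the project code.

```python
import json
from pathlib import Path
from typing import Any, Dict


def export_result(state: Dict[str, Any], name: str,
                  out_dir: str = "outputs/week4_validation") -> Path:
    """Write a JSON summary of a final state and return the file path."""
    summary = {
        "topic": state.get("topic"),
        "draft_content": state.get("draft_content"),
        "iteration_count": state.get("iteration_count"),
        "max_iterations": state.get("max_iterations"),
        "quality_threshold": state.get("quality_threshold"),
        "meets_quality_threshold": state.get("meets_quality_threshold"),
    }
    path = Path(out_dir) / f"{name}.json"
    # Create the export directory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    # default=str keeps the export from failing on values JSON cannot encode.
    path.write_text(json.dumps(summary, indent=2, default=str), encoding="utf-8")
    return path
```

Messages and critique objects are left out on purpose, since they may not serialize cleanly; add them only after converting them to plain dictionaries.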

---

## Key Settings Per Run

- `template`: e.g., `LINKEDIN_POST_ZERO_SHOT`, `LINKEDIN_LONG_POST_ZERO_SHOT`, `BLOG_POST`  
- `pattern`: `single_pass`, `reflection`, `evaluator_optimizer`  
- `max_iterations`: int (e.g., 3)  
- `quality_threshold`: float (e.g., 7.0)  
- `use_cot`: bool; adds reasoning scaffold to the prompt (not included in output)  
- `brand`: `"levelup360"` or `"ossie_naturals"`
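
The settings above can be grouped and validated before a run. The following is a minimal sketch, not part of the project: the `RunSettings` class and its validation rules are assumptions, while the field names, defaults, and pattern values mirror the list above.

```python
from dataclasses import dataclass

# Pattern names as listed in this notebook; the validation rules are assumptions.
PATTERNS = {"single_pass", "reflection", "evaluator_optimizer"}


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical container for the per-run settings listed above."""
    template: str = "LINKEDIN_POST_ZERO_SHOT"
    pattern: str = "single_pass"
    max_iterations: int = 3
    quality_threshold: float = 7.0
    use_cot: bool = False
    brand: str = "levelup360"

    def __post_init__(self) -> None:
        if self.pattern not in PATTERNS:
            raise ValueError(f"Unknown pattern: {self.pattern!r}")
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        # Assumption: scores live on a 0-10 scale, as the 7.0 example suggests.
        if not 0.0 <= self.quality_threshold <= 10.0:
            raise ValueError("quality_threshold must be between 0 and 10")


settings = RunSettings(pattern="evaluator_optimizer", use_cot=True)
print(settings)
```

Failing fast on a misspelled pattern is cheaper than discovering it after a full graph run has consumed API calls.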

---

## Expected Outputs

- Message flow with clear tool routing (`AIMessage.tool_calls`, `ToolMessage`s)  
- Draft content (preview)  
- Evaluation summary with score, reasoning, violations (if any)  
- Loop stats: `iteration_count`, `max_iterations`, `meets_quality_threshold`  
- Generation/Evaluation metadata (model, cost, latency, tokens) if available

---

## Notes

- Agentic path: tools are called once by content planning; generation consumes tool results; no duplicate tool calls.  
- Deterministic refinement: when `evaluator_optimizer` is used and the draft fails threshold, generation uses `models.content_optimization.system_message` during regeneration.  
- Routing evaluator (from Week 4) remains separate; this notebook focuses on end-to-end content generation and evaluation loop.
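
The deterministic refinement rule can be illustrated as a small selection function. This is a sketch under assumptions: the helper name `regeneration_system_message` is hypothetical, and it assumes `brand_config` is a nested dictionary in which `models.content_optimization.system_message` can be looked up key by key.

```python
from typing import Any, Dict, Optional


def regeneration_system_message(state: Dict[str, Any]) -> Optional[str]:
    """Return the optimization system message when a refinement pass applies."""
    is_regeneration = state.get("iteration_count", 0) > 0
    failed = state.get("meets_quality_threshold") is False
    if state.get("pattern") != "evaluator_optimizer" or not (is_regeneration and failed):
        return None
    # Config path named in the notes: models.content_optimization.system_message
    models = (state.get("brand_config") or {}).get("models", {})
    return models.get("content_optimization", {}).get("system_message")


demo_state = {
    "pattern": "evaluator_optimizer",
    "iteration_count": 1,
    "meets_quality_threshold": False,
    "brand_config": {
        "models": {"content_optimization": {"system_message": "Tighten the draft."}}
    },
}
print(regeneration_system_message(demo_state))
```

Returning `None` for every other case leaves the default generation prompt untouched, so the first pass and the other patterns behave as before.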

In [1]:
%load_ext autoreload
%autoreload 2

import logging
from dotenv import load_dotenv
load_dotenv()

logging.basicConfig(level=logging.INFO, format='%(asctime)s | %(levelname)s | %(name)s | %(message)s')
logger = logging.getLogger("week4_notebook")
logger.info("Environment & logging initialized")

2025-11-16 13:29:23,653 | INFO | week4_notebook | Environment & logging initialized


In [None]:
from typing import Dict, Any
from pprint import pprint

from langchain_core.messages import HumanMessage, AIMessage, ToolMessage

from src.utils.config_loader import load_brand_config
from src.agents.graphs.content_generation_graph import build_content_workflow
from src.agents.states.content_generation_state import ContentGenerationState


## Build Workflow App

We build the agentic content workflow for a given brand.
The compiled app executes the full graph with a single `invoke()` call.


In [None]:
brand = "itconsulting"  # brand key; expected to correspond to a brand YAML under config/brands/
brand_config = load_brand_config(brand)

app = build_content_workflow(brand)
logger.info("Workflow built and compiled")

## Helpers

Utility functions to run a scenario and display the message flow, the generated draft, and the evaluation outcome.


In [None]:
def print_messages(messages):
    """Print one line per message, noting requested tool calls and tool results."""
    print("\n--- Message Flow ---")
    for i, m in enumerate(messages):
        line = f"{i:02d}: {type(m).__name__}"
        if isinstance(m, AIMessage):
            # Planner output: list the tools the model asked for, if any.
            tool_calls = getattr(m, 'tool_calls', None)
            if tool_calls:
                tool_names = [tc.get('name') for tc in tool_calls]
                line += f" (calls: {tool_names})"
        elif isinstance(m, ToolMessage):
            # Tool result: show which tool produced it (name may be unset).
            line += f" (tool: {getattr(m, 'name', None) or 'unknown'})"
        print(line)

def run_scenario(topic: str, *, thread_id: str, template: str = "LINKEDIN_POST_ZERO_SHOT", use_cot: bool = False,
                 max_iterations: int = 3, quality_threshold: float = 7.0, pattern: str = "single_pass") -> Dict[str, Any]:
    """
    Run a full end-to-end scenario:
      content_planning → tools → content_generation → content_evaluation → (loop or end)
    Returns the final state.
    """
    initial_state: ContentGenerationState = ContentGenerationState(
        messages=[HumanMessage(content=topic)],
        topic=topic,
        brand=brand,
        brand_config=brand_config,
        template=template,
        use_cot=use_cot,
        draft_content="",
        critique=None,
        iteration_count=0,
        max_iterations=max_iterations,
        quality_threshold=quality_threshold,
        meets_quality_threshold=None,
        generation_metadata=None,
        evaluation_metadata=None,
        pattern=pattern,
    )

    result_state = app.invoke(initial_state, config={"configurable": {"thread_id": thread_id}})
    return result_state

def show_result(state: Dict[str, Any]):
    print_messages(state.get("messages", []))
    draft = state.get("draft_content", "")
    print("\n--- Draft (first 600 chars) ---\n")
    print(draft[:600])
    if len(draft) > 600:
        print(f"... ({len(draft) - 600} more chars)")

    print("\n--- Evaluation ---\n")
    critique = state.get("critique")
    if critique:
        score = getattr(critique, "average_score", None)
        meets = state.get("meets_quality_threshold")
        score_text = f"{score:.2f}" if isinstance(score, (int, float)) else "N/A"
        print(f"Weighted average score: {score_text} | Meets threshold: {meets}")

        try:
            scores = critique.scores
            print("Dimension scores:", {k: f"{v:.2f}" for k, v in scores.items()})
        except (AttributeError, TypeError, ValueError):
            # Dimension scores are optional; skip them if missing or non-numeric.
            pass

        reasoning = getattr(critique, "reasoning", None)
        if reasoning:
            print("Reasoning:")
            print(reasoning)
        violations = getattr(critique, "violations", None) or []
        if violations:
            print("Violations:")
            for v in violations:
                print(f"- {v}")
    else:
        print("No critique present (possibly ended before evaluation).")

    print("\n--- Loop Stats ---\n")
    print(
        f"iteration_count={state.get('iteration_count')} | "
        f"max_iterations={state.get('max_iterations')} | "
        f"quality_threshold={state.get('quality_threshold')} | "
        f"meets_quality_threshold={state.get('meets_quality_threshold')}"
    )

    gen_meta = state.get("generation_metadata")
    eval_meta = state.get("evaluation_metadata")
    if gen_meta:
        print("\nGeneration metadata:")
        # Show only the keys of interest that are actually populated.
        keys = ["model", "cost", "latency", "input_tokens", "output_tokens", "pattern", "iterations"]
        sel = {k: gen_meta.get(k) for k in keys if gen_meta.get(k) is not None}
        pprint(sel)
    if eval_meta:
        print("\nEvaluation metadata:")
        pprint(eval_meta)


## Test 1 — RAG-only scenario (Internal brand content)

Expected: content_planning calls `rag_search`; tools execute; generation uses RAG context; evaluation runs; loop ends when threshold met or max iterations reached.


In [None]:
topic_rag = "Create a post about our AI governance approach at LevelUp360"
state_rag = run_scenario(
    topic_rag,
    thread_id="w4_rag_1",
    template="LINKEDIN_POST_ZERO_SHOT",
    use_cot=True,
    max_iterations=3,
    quality_threshold=7.0,
)
show_result(state_rag)

## Test 2 — Web-only scenario (Current events)

Expected: content_planning calls `web_search`; generation uses web context.


In [None]:
topic_web = "Create a post about the latest AI regulation news in 2025"
state_web = run_scenario(
    topic_web,
    thread_id="w4_web_1",
    template="LINKEDIN_POST_ZERO_SHOT",
    use_cot=False,
    max_iterations=3,
    quality_threshold=7.0,
)
show_result(state_web)

## Test 3 — Both tools scenario (Internal + Industry)

Expected: content_planning calls both `rag_search` and `web_search`.


In [None]:
topic_both = "Compare our AI governance approach to current industry standards in 2025"
state_both = run_scenario(
    topic_both,
    thread_id="w4_both_1",
    template="LINKEDIN_POST_ZERO_SHOT",
    use_cot=True,
    max_iterations=3,
    quality_threshold=7.0,
)
show_result(state_both)

## Test 4 — No-tools scenario (Opinion)

Expected: content_planning calls no tools; generation runs directly from the topic, with no tool context.


In [None]:
topic_none = "Share my opinion on why AI governance matters for small teams"
state_none = run_scenario(
    topic_none,
    thread_id="w4_none_1",
    template="LINKEDIN_POST_ZERO_SHOT",
    use_cot=False,
    max_iterations=3,
    quality_threshold=7.0,
)
show_result(state_none)

## Summary

- content_planning routed correctly (RAG / Web / Both / None)
- tool results flowed into content_generation via ToolMessages → tool_contexts
- content_evaluation returned Critique, updated iteration_count, and set meets_quality_threshold
- Conditional routing looped back to regeneration when the threshold was unmet and iterations remained
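
The routing claims above can be checked programmatically rather than by reading the message flow. The sketch below is an assumption-laden illustration: the helper `tools_called` is hypothetical, and it relies only on messages exposing `tool_calls` as a list of dictionaries with a `name` key, which is how `print_messages` reads them earlier in this notebook.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Set


def tools_called(messages: Iterable[Any]) -> Set[str]:
    """Collect the tool names requested across all messages."""
    names: Set[str] = set()
    for message in messages:
        # Messages without tool calls (or without the attribute) contribute nothing.
        for call in getattr(message, "tool_calls", None) or []:
            names.add(call["name"])
    return names


# Stand-in messages so the sketch runs without the workflow or API keys.
fake_messages = [
    SimpleNamespace(content="topic"),
    SimpleNamespace(tool_calls=[{"name": "rag_search"}, {"name": "web_search"}]),
]
assert tools_called(fake_messages) == {"rag_search", "web_search"}
```

With real results, the expectation from Test 1 would read `tools_called(state_rag["messages"]) == {"rag_search"}`, and Test 4 would expect an empty set.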
