<p style="text-align:center">
  <a href="https://www.linkedin.com/company/100622063" target="_blank" title="Follow LevelUp360 on LinkedIn">
    <img src="../../assets/levelup360-inverted-logo-transparent.svg" alt="LevelUp360" width="220">
  </a>
</p>

# Marketing Team - Week 02: RAG Pipeline & Content Generation

Production-ready RAG system for brand-aware content generation.

---

## What We're Testing

**RAG Pipeline**: DocumentLoader → RAGHelper → VectorStore → PromptBuilder → LLM

**Components**:
- Document loading (brand guidelines, past posts)
- Vector storage (ChromaDB with metadata filtering)
- Prompt building (templates + RAG context + web search)
- Content generation (LinkedIn/Facebook posts)
- Evaluation (manual scoring against rubric)

**Architecture**: 3-layer separation of concerns (load → process → store)

---

## Environment Setup

### Prerequisites
- Python 3.10+
- `.env` file with API keys (see `.env.example`)
- Virtual environment in workspace root

### Required Environment Variables
```
# NOTE: The current LLMProvider class works with both Azure AI Foundry (AZURE_OPENAI env variables) and OpenRouter models. Update the class as needed if you use a different provider.
# NOTE: OpenRouter doesn't offer embedding models; use Azure or your preferred provider for embeddings.

OPENROUTER_API_KEY=sk-...          # For OpenRouter
AZURE_OPENAI_ENDPOINT=https://...  # For Azure
AZURE_OPENAI_API_KEY=...
TAVILY_API_KEY=...                 # For web search
LANGSMITH_API_KEY=...              # Optional: tracing
```
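
Before running the notebook, it can help to confirm that the variables listed above are actually set. The following is a minimal sketch using only the standard library. The grouping of variables per provider and the helper name `missing_env_vars` are assumptions made for illustration, not part of the project code. If the keys live in `.env`, load them first (for example with `python-dotenv`).

```python
import os

# Variable names come from the list above. Grouping them by provider
# is an assumption made for this sketch.
REQUIRED_VARS = {
    "openrouter": ["OPENROUTER_API_KEY"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"],
    "search": ["TAVILY_API_KEY"],
}


def missing_env_vars(env=None):
    """Return {group: [missing variable names]} for groups with gaps."""
    env = os.environ if env is None else env
    missing = {}
    for group, names in REQUIRED_VARS.items():
        # Treat unset and empty values the same way.
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[group] = absent
    return missing


for group, names in missing_env_vars().items():
    print(f"{group}: missing {', '.join(names)}")
```

`LANGSMITH_API_KEY` is left out of the check because it is marked optional.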

### One-Time Setup (PowerShell)
```powershell
# From workspace root (this repo):
python -m venv .venv
.\.venv\Scripts\Activate.ps1

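# If activation is blocked, allow local scripts for the current user, then re-run the activation above
# (the CurrentUser scope does not require an elevated/admin shell):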
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned

# Upgrade packaging tooling
python -m pip install --upgrade pip setuptools wheel

# Install project dependencies (unlocked). Prefer installing from requirements.txt
# which should contain the top-level packages (no pinned versions).
if (Test-Path ./requirements.txt) {
  pip install -r requirements.txt
} else {
  # Fallback: install core packages used in the examples
  pip install openai python-dotenv pydantic pandas pyyaml rich langsmith chromadb tavily-python tiktoken
}

```

## Notebook Flow

1. **Setup** - Import modules, initialize LLM/embedding clients
2. **RAG Pipeline** - Load brand docs, chunk, generate embeddings, store in vector DB
3. **Query Testing** - Test vector search with metadata filters
4. **Prompt Building** - Build prompts with templates, RAG context, web search
5. **Content Generation** - Generate LinkedIn posts with full context
6. **Evaluation** - Score content against brand rubric

**Data directories**:
- `configs/` - Brand YAML file (test with the examples provided or replace with yours)
- `data/chroma_db/` - Persistent vector store
- `data/past_posts/` - Past posts to load into the vector store
- `data/week_02_scores.csv` - Evaluation scores


## 1. Setup

Import modules and initialize clients (LLM, embeddings, cost tracking).

In [None]:
import sys
import yaml
import pandas as pd
import os
from pathlib import Path
from rich import print as rprint
from chromadb import Settings

# Add marketing_team/src to path
current_dir = Path.cwd()
src_path = current_dir.parent / "src"
if src_path.exists() and str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

# Import all modules
from utils.llm_client import LLMClient
from utils.cost_tracker import CostTracker
from utils.scoring import ScoringHelper
from rag.vector_store import VectorStore
from rag.rag_helper import RAGHelper
from rag.document_loader import DocumentLoader
from search.tavily_client import TavilySearchClient
from prompts.templates import (
    LINKEDIN_POST_ZERO_SHOT, 
    LINKEDIN_POST_FEW_SHOT,
    LINKEDIN_ARTICLE,
    FACEBOOK_POST_ZERO_SHOT,
    FACEBOOK_POST_FEW_SHOT
)
from prompts.prompt_builder import PromptBuilder
from generation.generator import ContentGenerator

rprint("[green]✓ All modules imported[/green]")

In [None]:
# Initialize LLM clients
import os
import warnings

# Suppress LangSmith warnings if tracing fails
warnings.filterwarnings('ignore', category=UserWarning, module='langsmith')

completion_client = LLMClient()
completion_client.get_client("openrouter")

embedding_client = LLMClient()
embedding_client.get_client("azure")

rprint("[green]✓ LLM clients initialized[/green]")
rprint("[dim]Note: LangSmith tracing errors can be safely ignored[/dim]")

In [None]:
# Quick test: completion + cost tracking

user_message = "What are the most effective strategies for scaling enterprise cloud migrations while ensuring security, compliance, and cost control?"
messages = [{"role": "user", "content": user_message}]

response = completion_client.get_completion(
    model="gpt-4o-mini", 
    messages=messages
)

rprint(f"\nResponse: {response.content}")
rprint(f"Cost: €{response.cost:.6f} | Latency: {response.latency:.2f}s")

tracker = CostTracker()
summary = tracker.get_cost_summary()
rprint(f"\nTotal cost today: €{summary.total_cost:.6f} ({summary.total_calls} calls)")

## 2. RAG Pipeline

Load brand docs → chunk → generate embeddings → store in vector DB.
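
To make the chunking step concrete, here is a minimal sketch of splitting text into overlapping windows. It assumes word-based units and treats `chunk_threshold` as "do not split texts at or below this length". The actual `RAGHelper` may count tokens and interpret these parameters differently, so read this as an illustration of the idea rather than the project's implementation.

```python
def chunk_words(text, chunk_size=150, chunk_overlap=30, chunk_threshold=150):
    """Split text into overlapping word windows.

    Texts with at most `chunk_threshold` words are returned as one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    if len(words) <= chunk_threshold:
        return [" ".join(words)] if words else []

    # Each new window starts `step` words after the previous one,
    # so consecutive chunks share `chunk_overlap` words.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(400))
for i, chunk in enumerate(chunk_words(sample)):
    print(i, len(chunk.split()), "words")
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of embedding some words twice.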

In [None]:
# Initialize RAG components
loader = DocumentLoader(base_path=current_dir.parent)

rag_helper = RAGHelper(
    embedding_client=embedding_client,
    embedding_model="text-embedding-3-small",
    chunk_size=150,
    chunk_overlap=30,
    chunk_threshold=150
)

settings = Settings(anonymized_telemetry=False)
persist_dir = str(current_dir.parent / "data" / "chroma_db")
vector_store = VectorStore(persist_directory=persist_dir, settings=settings)

collection_name = "marketing_content"
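# Reset the collection so this run re-ingests from scratch.
# Comment this line out to reuse chunks stored by a previous run.
vector_store.clear_collection(collection_name)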
collection = vector_store.get_or_create_collection(
    collection_name=collection_name,
    metadata={"hnsw:space": "cosine"}
)

rprint("[green]✓ RAG components initialized[/green]")

In [None]:
# Load brand past posts from subdirectories
past_posts_dir = current_dir.parent / "data" / "past_posts"

# Load itconsulting past posts 
itconsulting_docs = loader.load_files(
    directory=past_posts_dir,
    pattern="*itconsulting.md",
    metadata={"brand": "itconsulting", "doc_type": "past_post"},
    recursive=True
)

# Load cosmetics past posts 
cosmetics_docs = loader.load_files(
    directory=past_posts_dir,
    pattern="*cosmetics.md",
    metadata={"brand": "cosmetics", "doc_type": "past_post"},
    recursive=True
)

all_docs = itconsulting_docs + cosmetics_docs

rprint(f"Loaded {len(all_docs)} brand past posts ({len(itconsulting_docs)} IT Consulting, {len(cosmetics_docs)} Cosmetics)")

In [None]:
# Process and store documents (skip if already done)
doc_count = vector_store.get_document_count(collection_name)

if doc_count == 0:
    rprint("Processing documents (chunking + embeddings)...")
    documents = rag_helper.prepare_past_posts(all_docs, verbose=True)
    
    rprint(f"Adding {len(documents)} chunks to vector store...")
    count = vector_store.add_documents(collection_name, documents)
    rprint(f"[green]✓ Stored {count} chunks[/green]")
else:
    rprint(f"[yellow]Collection already has {doc_count} documents - skipping[/yellow]")

In [None]:
# Test vector query (need to embed the query text first)
query_text = "What are the most effective strategies for scaling enterprise cloud migrations while ensuring security, compliance, and cost control?"
query_embedding = rag_helper.embed_batch([query_text])[0]

query_results = vector_store.query(
    collection_name=collection_name,
    query_embeddings=query_embedding,
    n_results=3,
    where={"brand": "itconsulting"}
)

rprint(f"\n[cyan]Query:[/cyan] '{query_text}'")
rprint(f"[green]Found {len(query_results.ids)} results[/green]")
for i, (metadata, text, distance) in enumerate(zip(query_results.metadatas, query_results.texts, query_results.distances), 1):
    rprint(f"{i}. Distance: {distance:.4f} | Metadata: {metadata} | Text: {text[:200]}")
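
The query above combines a metadata filter (`where`) with nearest-neighbour search under cosine distance, which is what `"hnsw:space": "cosine"` selects. The sketch below reproduces that behaviour by brute force on toy 2-D vectors. The record layout and the `query` helper are hypothetical, and a real vector store uses an approximate index instead of scanning every record.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def query(records, query_embedding, n_results=3, where=None):
    """Return the n closest records whose metadata matches every `where` pair."""
    where = where or {}
    matches = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in where.items())
    ]
    scored = [(cosine_distance(query_embedding, r["embedding"]), r) for r in matches]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:n_results]


# Toy records for illustration only.
records = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"brand": "itconsulting"}},
    {"id": "b", "embedding": [0.0, 1.0], "metadata": {"brand": "itconsulting"}},
    {"id": "c", "embedding": [1.0, 0.0], "metadata": {"brand": "cosmetics"}},
]
for distance, record in query(records, [1.0, 0.0], where={"brand": "itconsulting"}):
    print(record["id"], round(distance, 4))
```

Record `c` points in the same direction as the query but is excluded by the brand filter, which is why filtering by `brand` matters when several brands share one collection.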

## 3. Prompt Building

Build prompts with templates, optionally adding RAG context and web search results.
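
As a rough illustration of how a template, retrieved examples, and search results can be combined, the sketch below fills a template and drops any optional section that has no content. The template text, the `build_prompt` helper, and the brand name `Acme` are hypothetical. The real `PromptBuilder` and the templates in `prompts/templates.py` may be structured differently.

```python
# Hypothetical template, not one of the project's templates.
TEMPLATE = (
    "Write a LinkedIn post about: {topic}\n\n"
    "Brand: {brand_name}\n"
    "{examples_section}{search_section}"
)


def build_prompt(topic, brand_name, rag_hits=(), search_results=(), max_distance=0.50):
    """Fill the template; optional sections are omitted when empty.

    `rag_hits` is a sequence of (text, distance) pairs. Only hits closer
    than `max_distance` are kept as style examples.
    """
    examples = [text for text, distance in rag_hits if distance < max_distance]
    examples_section = ""
    if examples:
        examples_section = (
            "\nPast posts for style reference:\n"
            + "\n".join(f"- {text}" for text in examples)
            + "\n"
        )
    search_section = ""
    if search_results:
        search_section = (
            "\nRecent web findings:\n"
            + "\n".join(f"- {item}" for item in search_results)
            + "\n"
        )
    return TEMPLATE.format(
        topic=topic,
        brand_name=brand_name,
        examples_section=examples_section,
        search_section=search_section,
    )


print(build_prompt(
    "cloud migration at scale",
    "Acme",
    rag_hits=[("close post", 0.2), ("far post", 0.8)],
))
```

Filtering by distance before building the prompt means a weak match never reaches the LLM, which keeps the prompt shorter and avoids steering the style with unrelated posts.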

In [None]:
# Initialize prompt builder
tavily_client = TavilySearchClient()
prompt_builder = PromptBuilder(vector_store, rag_helper, tavily_client)

# Load brand config
brand_config_path = current_dir.parent / "configs" / "itconsulting.yaml"
with open(brand_config_path, 'r', encoding='utf-8') as f:
    brand_config = yaml.safe_load(f)

topic = "What are the most effective strategies for scaling enterprise cloud migrations while ensuring security, compliance, and cost control?"
brand = "itconsulting"

rprint(f"[green]✓ Prompt builder ready for brand: {brand_config['name']}[/green]")

In [None]:
# Test 1: Zero-shot (no RAG, no search)
# This creates a basic prompt with only brand guidelines - no past examples or web data
prompt_basic = prompt_builder.build_user_message(
    collection_name=collection_name,
    template=LINKEDIN_POST_ZERO_SHOT,
    topic=topic,
    brand=brand,
    brand_config=brand_config,
    include_rag=False,              # Don't retrieve past post examples
    max_distance=0.50,              # Distance threshold for RAG similarity (not used here)
    include_search=False,           # Don't add web search results
    search_depth='basic',           # Search quality: 'basic' or 'advanced' (not used here)
    search_type='general',          # Use 'general' search for no domain filters. Use 'technical', 'industry', 'news', 'documentation' to apply domain filters (see tavily_client.py)
    llm_client=completion_client    # LLM used for query generation (not used here)
)

rprint(f"\n[cyan]Zero-shot prompt:[/cyan] {len(prompt_basic)} chars")
rprint(prompt_basic)

In [None]:
# Test 2: With RAG context
# This adds similar past posts from the vector database to guide style and content
prompt_with_rag = prompt_builder.build_user_message(
    collection_name=collection_name,
    template=LINKEDIN_POST_ZERO_SHOT,
    topic=topic,
    brand=brand,
    brand_config=brand_config,
    include_rag=True,               # Retrieve similar past posts as examples
    max_distance=0.50,              # Only use posts with similarity distance < 0.50 (closer = more similar)
    include_search=False,
    search_depth='basic',
    search_type='general',
    llm_client=completion_client
)

rprint(f"\n[cyan]Prompt with RAG:[/cyan] {len(prompt_with_rag)} chars")
rprint(prompt_with_rag)

In [None]:
# Test 3: With RAG + web search
# This adds both past posts AND current web search results for up-to-date information
prompt_full = prompt_builder.build_user_message(
    collection_name=collection_name,
    template=LINKEDIN_POST_ZERO_SHOT,
    topic=topic,
    brand=brand,
    brand_config=brand_config,
    include_rag=True,               # Include past post examples
    max_distance=0.50,              # Similarity threshold for retrieved posts
    include_search=True,            # Add web search results for current data/trends
    search_depth='advanced',        # Use 'advanced' for deeper web search (more sources, better quality)
    search_type='general',          # Use 'general' search for no domain filters. Use 'technical', 'industry', 'news', 'documentation' to apply domain filters (see tavily_client.py)
    llm_client=completion_client    # LLM generates optimized search queries
)

rprint(f"\n[cyan]Prompt with RAG + Search:[/cyan] {len(prompt_full)} chars")
rprint(prompt_full)

## 4. Content Generation (Manual)

Generate content using prompts directly and track costs.

In [None]:
# Generate LinkedIn post with full context
post = [{"role": "user", "content": prompt_full}]

response = completion_client.get_completion(
    model="gpt-4o-mini", 
    messages=post  # send the full-context prompt, not the quick-test `messages` from Setup
)

rprint("\n" + "="*70)
rprint("[bold green]Generated LinkedIn Post:[/bold green]")
rprint("="*70)
rprint(response.content)
rprint("="*70)
rprint(f"\nCost: €{response.cost:.6f} | Tokens: {response.input_tokens + response.output_tokens}")

## 4. Content Generation with ContentGenerator

Use ContentGenerator to test different configurations and compare results.

In [None]:
# Initialize generator for LevelUp360
generator = ContentGenerator(
    llm_client=completion_client,
    vector_store=vector_store,
    rag_helper=rag_helper,
    brand_config=brand_config,
    collection_name=collection_name,
    search_client=tavily_client
)

rprint("[green]✓ ContentGenerator initialized[/green]")

### A/B Test: Compare RAG vs No RAG vs RAG+Search

Generate the same post with 3 different configurations to compare quality and cost.

**Key Parameters Explained:**
- **`include_rag`**: Whether to retrieve similar past posts from vector DB as examples
- **`rag_max_distance`**: Maximum distance for retrieved posts (typically 0.0-1.0) - lower values keep only closer, more similar posts
- **`include_search`**: Whether to add current web search results
- **`search_depth`**: Search quality - `'basic'` (fast) or `'advanced'` (better quality, more sources)
- **`search_type`**: Search focus - use `'general'` for no domain filters, or specify domain filters: `'technical'`, `'industry'`, `'news'`, or `'documentation'` (see tavily_client.py).
- **`system_message`**: Custom instructions to guide the LLM's style and behavior
- **`temperature`**: Creativity level (0.0 = deterministic, 1.0 = very creative)
- **`model`**: LLM to use - `gpt-4o-mini` (fast/cheap) or `gpt-4o` (best quality)

In [None]:
# Test topic
test_topic = "What are the most effective strategies for scaling enterprise cloud migrations while ensuring security, compliance, and cost control?"

# System message: Instructions that guide the LLM's writing style and behavior
# Customize this to enforce your brand voice and content requirements
system_message = """You are a professional content generator.

Write clear, professional LinkedIn posts that:
- Start with concrete data or specific examples
- Use precise technical language
- Avoid buzzwords and vague phrases
"""

# Generate WITHOUT RAG
# This uses only the brand guidelines - no past examples or web data
rprint("\n[cyan]Generating WITHOUT RAG...[/cyan]")
result_no_rag = generator.generate(
    topic=test_topic,
    content_type="linkedin_post",     # Template type: linkedin_post, linkedin_article, facebook_post
    include_rag=False,                # Don't retrieve past posts
    include_search=False,             # Don't add web search
    search_depth='basic',             # Search quality (not used here)
    search_type='general',            # Search type (not used here)
    model="gpt-4o-mini",              # LLM model (gpt-4o-mini is faster/cheaper, gpt-4o is better quality)
    system_message=system_message,    # Custom instructions for content style
    temperature=0.7                   # Creativity level (0.0=deterministic, 1.0=creative)
)

rprint("\n" + "="*70)
rprint("[bold yellow]WITHOUT RAG:[/bold yellow]")
rprint("="*70)
rprint(result_no_rag['content'])
rprint("="*70)
rprint(f"Cost: €{result_no_rag['metadata']['cost']:.6f} | Latency: {result_no_rag['metadata']['latency']:.2f}s")
rprint(f"Tokens: {result_no_rag['metadata']['input_tokens']} in / {result_no_rag['metadata']['output_tokens']} out")

In [None]:
# Generate WITH RAG
# This retrieves similar past posts from the vector DB to guide style and content
rprint("\n[cyan]Generating WITH RAG...[/cyan]")
result_with_rag = generator.generate(
    topic=test_topic,
    content_type="linkedin_post",
    include_rag=True,                 # Retrieve similar past posts as context
    rag_max_distance=0.50,            # Only use posts with distance < 0.50 (lower = more similar; adjust based on your corpus)
    include_search=False,
    search_depth='basic',
    search_type='general',
    model="gpt-4o-mini",
    system_message=system_message,
    temperature=0.7
)

rprint("\n" + "="*70)
rprint("[bold green]WITH RAG:[/bold green]")
rprint("="*70)
rprint(result_with_rag['content'])
rprint("="*70)
rprint(f"Cost: €{result_with_rag['metadata']['cost']:.6f} | Latency: {result_with_rag['metadata']['latency']:.2f}s")
rprint(f"Tokens: {result_with_rag['metadata']['input_tokens']} in / {result_with_rag['metadata']['output_tokens']} out")

In [None]:
# Generate WITH RAG + SEARCH
# This combines past post examples with current web search for up-to-date, brand-consistent content
rprint("\n[cyan]Generating WITH RAG + SEARCH...[/cyan]")
result_with_search = generator.generate(
    topic=test_topic,
    content_type="linkedin_post",
    include_rag=True,                 # Include past posts for brand voice
    rag_max_distance=0.50,            # Similarity threshold for RAG
    include_search=True,              # Add web search for current data
    search_depth='advanced',          # Use advanced search for better quality results (more sources, deeper analysis)
    search_type='general',            # Use 'general' search for no domain filters. Use 'technical', 'industry', 'news', 'documentation' to apply domain filters (see tavily_client.py)
    model="gpt-4o",                   # Use gpt-4o for best quality (more expensive than gpt-4o-mini)
    system_message=system_message,
    temperature=0.7
)

rprint("\n" + "="*70)
rprint("[bold magenta]WITH RAG + SEARCH:[/bold magenta]")
rprint("="*70)
rprint(result_with_search['content'])
rprint("="*70)
rprint(f"Cost: €{result_with_search['metadata']['cost']:.6f} | Latency: {result_with_search['metadata']['latency']:.2f}s")
rprint(f"Tokens: {result_with_search['metadata']['input_tokens']} in / {result_with_search['metadata']['output_tokens']} out")

rprint("\n[green]✓ A/B test complete - compare the 3 variants above[/green]")
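
To compare the three variants side by side, the result dictionaries can be reduced to one row each. This sketch assumes only the keys already used in the cells above (`content` and `metadata` with `cost`, `latency`, `input_tokens`, `output_tokens`). The helper name `summarize_variants` and the placeholder numbers are illustrative, not measured results.

```python
def summarize_variants(results):
    """Build one comparison row per variant from generate()-style result dicts."""
    rows = []
    for name, result in results.items():
        meta = result["metadata"]
        rows.append({
            "variant": name,
            "cost": meta["cost"],
            "latency": meta["latency"],
            "total_tokens": meta["input_tokens"] + meta["output_tokens"],
            "chars": len(result["content"]),
        })
    # Cheapest variant first, to make the cost/quality trade-off easy to read.
    return sorted(rows, key=lambda row: row["cost"])


# Placeholder values for illustration only; these are not measured results.
example = {
    "rag": {
        "content": "draft B",
        "metadata": {"cost": 0.0004, "latency": 2.0, "input_tokens": 900, "output_tokens": 200},
    },
    "no_rag": {
        "content": "draft A",
        "metadata": {"cost": 0.0002, "latency": 1.5, "input_tokens": 300, "output_tokens": 200},
    },
}
for row in summarize_variants(example):
    print(row)
```

In the notebook, the same helper could be called with `{"no_rag": result_no_rag, "rag": result_with_rag, "rag_search": result_with_search}`.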

### Manual Inspection

Compare the 3 outputs above and note your observations:

**1. WITHOUT RAG:**
- Does it capture brand voice? (Y/N)
- Are facts accurate? (Y/N)
- Does it feel generic? (Y/N)

**2. WITH RAG:**
- Better brand voice than without RAG? (Y/N)
- More specific/less generic? (Y/N)
- Worth the extra cost/latency? (Y/N)

**3. WITH RAG + SEARCH:**
- Does search add value? (Y/N)
- Are web sources relevant? (Y/N)
- Worth the extra cost/latency? (Y/N)

**Initial observations:** [Write your notes here after running above cells]

## 5. Evaluation & Scoring

Use the cells above to generate posts for different topics (one at a time), and evaluate them against the criteria below. 