# Week 4 - Exercise 1: Content Moderation Pipeline

## 📋 Exercise Overview

**Due:** Monday (Week 4)  
**Estimated Time:** 3-4 hours  
**Difficulty:** Intermediate

---

## 🎯 Learning Objectives

In this exercise, you will:
1. Build a multi-stage content moderation system
2. Implement parallel checks for different violation types
3. Use conditional routing for decision-making
4. Create human-in-the-loop approval flows
5. Track moderation decisions and reasons

---

## 📝 Requirements

Your Content Moderation Pipeline must:

### Core Features:
- ✅ **Initial Classification:** Determine content type (text, comment, post)
- ✅ **Parallel Checks:** Run multiple moderation checks simultaneously
  - Toxicity detection
  - Spam detection
  - PII (Personally Identifiable Information) detection
- ✅ **Decision Logic:** Approve, flag for review, or reject based on checks
- ✅ **Human Review:** For flagged content, route to human reviewer
- ✅ **Audit Trail:** Track all decisions and reasons

### Technical Requirements:
- Use TypedDict for state management
- Implement at least 3 parallel moderation checks
- Use conditional edges for routing
- Store moderation metadata (timestamps, scores, reasons)
- Handle edge cases (empty content, very long content)
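The edge cases in the last requirement can be handled before the graph is invoked. The following is a minimal sketch, assuming a hypothetical helper `prepare_content` and an arbitrary 5,000-character limit (neither comes from the exercise). It rejects empty input early and truncates oversized input while recording what it did.

```python
MAX_CHARS = 5_000  # assumed limit; tune it for your model's context window


def prepare_content(content: str, max_chars: int = MAX_CHARS) -> tuple[str, list[str]]:
    """Return (cleaned_content, notes) ready to place in the initial state.

    Hypothetical helper: raises ValueError for empty input and truncates
    input longer than max_chars, noting the truncation for the audit trail.
    """
    cleaned = content.strip()
    if not cleaned:
        raise ValueError("content is empty or whitespace only")
    notes: list[str] = []
    if len(cleaned) > max_chars:
        notes.append(f"truncated from {len(cleaned)} to {max_chars} characters")
        cleaned = cleaned[:max_chars]
    return cleaned, notes


text, notes = prepare_content("  hello world  ")
print(text, notes)
```

Raising on empty input is one option; returning an immediate "reject" or "approve" result would work as well, depending on the policy you choose.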

### Bonus Challenges (Optional):
- 🌟 Add confidence scores to moderation decisions
- 🌟 Implement escalation for multiple violations
- 🌟 Add content category-specific rules
- 🌟 Create a moderation statistics dashboard
- 🌟 Implement an appeal process for rejected content

---

## 💡 Hints

<details>
<summary>Click for Hint 1: Parallel Checks Structure</summary>

```python
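# Create a separate node for each check
workflow.add_node("toxicity_check", check_toxicity)
workflow.add_node("spam_check", check_spam)
workflow.add_node("pii_check", check_pii)

# Fan out: all three checks run in parallel after classification
workflow.add_edge("classify", "toxicity_check")
workflow.add_edge("classify", "spam_check")
workflow.add_edge("classify", "pii_check")

# Fan in: the decision node runs once, after all three checks finish
# ("decision" is a placeholder; use the name you register in Step 7)
workflow.add_edge("toxicity_check", "decision")
workflow.add_edge("spam_check", "decision")
workflow.add_edge("pii_check", "decision")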
```
</details>

<details>
<summary>Click for Hint 2: Decision Logic</summary>

```python
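def make_decision(state):
    # Returns only the label; Step 4 stores it in state["decision"].
    # The thresholds here are examples; Step 4 lists the ones to implement.
    violations = 0
    if state["toxicity_score"] > 0.7:
        violations += 1
    if state["spam_score"] > 0.8:
        violations += 1
    if state.get("pii_detected"):
        violations += 1

    if violations >= 2:
        return "reject"
    elif violations == 1:
        return "review"
    else:
        return "approve"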
```
</details>

<details>
<summary>Click for Hint 3: State with Reducers</summary>

```python
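import operator
from typing import Annotated, TypedDict

class ModerationState(TypedDict):
    content: str
    # operator.add appends each node's list to the existing one,
    # so parallel checks can all report violations
    violations: Annotated[list[str], operator.add]
    toxicity_score: float
    spam_score: float
    decision: str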
```
</details>

---

## 🔧 Setup

In [None]:
# Import required libraries
import os
from dotenv import load_dotenv
from typing import TypedDict, Annotated, Sequence
import operator
from datetime import datetime

# LangChain imports
from langchain_openai import ChatOpenAI
from langchain_core.messages import BaseMessage
from langchain_core.prompts import ChatPromptTemplate

# LangGraph imports
from langgraph.graph import StateGraph, END

# Load environment variables
load_dotenv()

# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

print("✅ Setup complete!")

---

## 📝 Step 1: Define State Schema

Create the state structure for content moderation.
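To see why the `violations` field needs a reducer, it can help to emulate the merge in plain Python. The sketch below is an illustration only: `DemoState` and `merge_updates` are hypothetical names, and LangGraph's real merge logic is more involved. It shows the idea that an annotated field combines updates through its reducer while a plain field is simply overwritten.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    violations: Annotated[list[str], operator.add]  # merged with the reducer
    toxicity_score: float                           # overwritten by the latest value


def merge_updates(state: dict, updates: list[dict]) -> dict:
    """Apply partial updates, using the reducer from the annotation if present."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for update in updates:
        for key, value in update.items():
            # Annotated[...] types carry their extras in __metadata__
            reducer = getattr(hints[key], "__metadata__", (None,))[0]
            merged[key] = reducer(merged[key], value) if reducer else value
    return merged


start = {"violations": [], "toxicity_score": 0.0}
result = merge_updates(
    start,
    [
        {"violations": ["toxicity"], "toxicity_score": 0.9},  # from one check
        {"violations": ["spam"]},                             # from another check
    ],
)
print(result)
```

Both checks contribute to `violations`, and neither overwrites the other's entry.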

In [None]:
# TODO: Define the ModerationState TypedDict
class ModerationState(TypedDict):
    """State for content moderation pipeline."""
    # YOUR CODE HERE
    # Include fields for:
    # - content (str): The content to moderate
    # - content_type (str): Type of content (text/comment/post)
    # - toxicity_score (float): Score from toxicity check
    # - spam_score (float): Score from spam check
    # - pii_detected (bool): Whether PII was found
    # - violations (list with reducer): List of violation reasons
    # - decision (str): Final decision (approve/review/reject)
    # - timestamp (str): When moderation occurred
    # - human_review_required (bool): Whether human review is needed
    pass

print("✅ State schema defined")

---

## 📝 Step 2: Implement Classification Node

Classify the type of content being moderated.
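Model replies are free-form text, so it is worth mapping the reply onto one of the three allowed labels before storing it. A minimal sketch, assuming a hypothetical helper `normalize_content_type` and a fallback of `"text"` when no label is recognized (both are assumptions, not part of the exercise):

```python
import re

# Matches the first allowed label that appears as a whole word
_LABEL = re.compile(r"\b(comment|post|text)\b")


def normalize_content_type(raw_reply: str, default: str = "text") -> str:
    """Map a free-form model reply onto 'comment', 'post', or 'text'."""
    match = _LABEL.search(raw_reply.lower())
    return match.group(1) if match else default


print(normalize_content_type("This looks like a Comment."))
print(normalize_content_type("I am not sure"))
```

The classification node could then return `{"content_type": normalize_content_type(reply)}` once it has the model's reply.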

In [None]:
def classify_content(state: ModerationState) -> ModerationState:
    """
    Classify the content type.
    
    Args:
        state: Current moderation state
        
    Returns:
        Updated state with content_type
    """
    # TODO: Use LLM to classify content type
    # Prompt should ask: Is this a comment, post, or general text?
    # YOUR CODE HERE
    
    pass

---

## 📝 Step 3: Implement Moderation Check Nodes

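Create nodes for different types of moderation checks. Because these nodes run in parallel, each one should return only the keys it updates (its score or flag, plus any new `violations` entries) rather than the whole state. LangGraph typically raises an error when parallel nodes write the same non-reducer key, such as `content`, in one step.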
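One way to structure a score-based check is to separate parsing the model's reply from building the state update. The sketch below assumes hypothetical helpers `parse_score` and `toxicity_update` and an example threshold of 0.7. The model call itself is left out, so `reply` stands in for whatever text your LLM returns.

```python
import re


def parse_score(reply: str, default: float = 0.0) -> float:
    """Extract the first number from a reply and clamp it to [0.0, 1.0]."""
    match = re.search(r"\d*\.?\d+", reply)
    if match is None:
        return default
    return min(1.0, max(0.0, float(match.group())))


def toxicity_update(reply: str, threshold: float = 0.7) -> dict:
    """Build a partial state update containing only the keys this check owns."""
    score = parse_score(reply)
    update = {"toxicity_score": score}
    if score > threshold:
        # A one-element list; the reducer on `violations` appends it
        update["violations"] = [f"toxicity ({score:.2f})"]
    return update


print(toxicity_update("0.9"))
print(toxicity_update("Score: 0.2"))
```

The spam check can follow the same shape with its own key and threshold.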

### 3.1: Toxicity Check

In [None]:
def check_toxicity(state: ModerationState) -> ModerationState:
    """
    Check for toxic/harmful content.
    
    Should detect:
    - Hate speech
    - Harassment
    - Threats
    - Profanity
    """
    # TODO: Implement toxicity detection
    # Use LLM to rate toxicity from 0.0 to 1.0
    # If score > threshold, add to violations list
    # YOUR CODE HERE
    
    pass

### 3.2: Spam Check

In [None]:
def check_spam(state: ModerationState) -> ModerationState:
    """
    Check for spam content.
    
    Should detect:
    - Promotional content
    - Repetitive text
    - Suspicious links
    - Advertisement
    """
    # TODO: Implement spam detection
    # Use LLM to rate spam likelihood from 0.0 to 1.0
    # If score > threshold, add to violations list
    # YOUR CODE HERE
    
    pass

### 3.3: PII Check

In [None]:
def check_pii(state: ModerationState) -> ModerationState:
    """
    Check for Personal Identifiable Information.
    
    Should detect:
    - Email addresses
    - Phone numbers
    - Social Security Numbers
    - Credit card numbers
    - Physical addresses
    """
    # TODO: Implement PII detection
    # Use LLM or regex patterns
    # If PII found, add to violations list
    # YOUR CODE HERE
    
    pass
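For the regex route, a small pattern table is often enough to get started. This is a rough sketch, not a complete detector: the patterns below are simplified assumptions for US-style formats, `find_pii` and `pii_update` are hypothetical names, and physical addresses are not covered.

```python
import re

# Simplified, illustrative patterns; real PII detection needs broader coverage
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}


def find_pii(content: str) -> list[str]:
    """Return the names of all PII patterns that match the content."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(content)]


def pii_update(content: str) -> dict:
    """Build a partial state update for the PII check."""
    found = find_pii(content)
    update = {"pii_detected": bool(found)}
    if found:
        update["violations"] = [f"pii: {', '.join(found)}"]
    return update


sample = "My email is john.doe@email.com and my phone is 555-123-4567"
print(pii_update(sample))
```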

---

## 📝 Step 4: Implement Decision Node

Make the final moderation decision based on the check results.
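The decision rules can be prototyped as a pure function before wiring them into a node. The sketch below uses the thresholds from the docstring in this step and assumes that "multiple severe violations" means two or more entries in `violations`. That reading, and the helper name `decide`, are assumptions.

```python
from datetime import datetime, timezone


def decide(toxicity_score: float, violations: list[str]) -> dict:
    """Return a partial update with the decision, review flag, and timestamp."""
    if toxicity_score > 0.8 or len(violations) >= 2:
        decision = "reject"
    elif toxicity_score > 0.5 or violations:
        decision = "review"
    else:
        decision = "approve"
    return {
        "decision": decision,
        "human_review_required": decision == "review",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


print(decide(0.6, []))
```

Note that the returned dictionary leaves out `violations`. If that field uses `operator.add` as its reducer, returning the existing list again would append it a second time.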

In [None]:
def make_decision(state: ModerationState) -> ModerationState:
    """
    Make final moderation decision.
    
    Decision logic:
    - REJECT: High toxicity (>0.8) OR multiple severe violations
    - REVIEW: Medium toxicity (>0.5) OR any violations present
    - APPROVE: No violations, low scores
    """
    # TODO: Implement decision logic
    # Consider all scores and violations
    # Set decision and human_review_required fields
    # Add timestamp
    # YOUR CODE HERE
    
    pass

---

## 📝 Step 5: Implement Routing Logic

Route content based on decision.
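The value a router returns has to match one of the keys in the mapping you pass to `add_conditional_edges`, so an unexpected or missing decision is worth guarding against. A minimal sketch, assuming a hypothetical `safe_route` that falls back to human review (the fallback policy is an assumption):

```python
VALID_ROUTES = {"approve", "review", "reject"}


def safe_route(state: dict) -> str:
    """Return the stored decision, or 'review' if it is missing or unknown."""
    decision = state.get("decision", "")
    return decision if decision in VALID_ROUTES else "review"


print(safe_route({"decision": "approve"}))
print(safe_route({"decision": "maybe"}))
```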

In [None]:
def route_decision(state: ModerationState) -> str:
    """
    Route based on moderation decision.
    
    Returns:
        - "approve": Content is safe
        - "review": Needs human review
        - "reject": Content is rejected
    """
    # TODO: Implement routing logic
    # Return one of: "approve", "review", "reject"
    # YOUR CODE HERE
    
    pass

---

## 📝 Step 6: Implement Action Nodes

Handle approved, reviewed, and rejected content.
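All three action nodes need to log what happened, so a shared helper keeps the audit trail consistent. The sketch below assumes a hypothetical `audit_entry` function and a JSON-lines record format of my own choosing. Adapt the fields to whatever your state actually contains.

```python
import json
from datetime import datetime, timezone


def audit_entry(state: dict, action: str) -> str:
    """Build one JSON log line describing a moderation outcome."""
    record = {
        "action": action,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "content_preview": state.get("content", "")[:80],  # avoid logging full text
        "toxicity_score": state.get("toxicity_score"),
        "spam_score": state.get("spam_score"),
        "violations": list(state.get("violations", [])),
    }
    return json.dumps(record)


example_state = {"content": "Some text", "toxicity_score": 0.1, "violations": []}
print(audit_entry(example_state, "approved"))
```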

In [None]:
def approve_content(state: ModerationState) -> ModerationState:
    """
    Handle approved content.
    """
    # TODO: Log approval and return final state
    # YOUR CODE HERE
    pass

def flag_for_review(state: ModerationState) -> ModerationState:
    """
    Flag content for human review.
    """
    # TODO: Log review flag and return final state
    # YOUR CODE HERE
    pass

def reject_content(state: ModerationState) -> ModerationState:
    """
    Reject content.
    """
    # TODO: Log rejection and return final state
    # YOUR CODE HERE
    pass

---

## 📝 Step 7: Build the Graph

Assemble all nodes into a complete moderation pipeline.

In [None]:
# TODO: Create the moderation graph
# 1. Initialize StateGraph
# 2. Add all nodes
# 3. Set entry point
# 4. Add edges for parallel checks
# 5. Add conditional edges for routing
# 6. Compile the graph

# YOUR CODE HERE

moderation_graph = None  # YOUR CODE HERE: replace None with the compiled graph

print("✅ Moderation graph created")

---

## 📝 Step 8: Create ModerationPipeline Class

Wrap the graph in a reusable class.
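One detail to watch when updating statistics: the decision labels (`approve`, `review`, `reject`) do not match the counter names in `stats` (`approved`, `flagged`, `rejected`). A small sketch of the mapping, using a hypothetical helper `record_decision`:

```python
# Maps a decision label to the matching counter in the stats dictionary
STAT_KEY = {"approve": "approved", "reject": "rejected", "review": "flagged"}


def record_decision(stats: dict, decision: str) -> None:
    """Increment the total and, for a known decision, its specific counter."""
    stats["total"] += 1
    key = STAT_KEY.get(decision)
    if key is not None:
        stats[key] += 1


stats = {"total": 0, "approved": 0, "rejected": 0, "flagged": 0}
for label in ["approve", "review", "approve"]:
    record_decision(stats, label)
print(stats)
```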

In [None]:
class ModerationPipeline:
    """
    Content moderation pipeline.
    """
    
    def __init__(self):
        """Initialize the moderation pipeline."""
        # TODO: Store the compiled graph
        self.graph = moderation_graph
        
        # Track statistics
        self.stats = {
            "total": 0,
            "approved": 0,
            "rejected": 0,
            "flagged": 0
        }
    
    def moderate(self, content: str) -> dict:
        """
        Moderate a piece of content.
        
        Args:
            content: The content to moderate
            
        Returns:
            Moderation result dictionary
        """
        # TODO: Invoke graph with initial state
        # YOUR CODE HERE
        
        # TODO: Update statistics
        # YOUR CODE HERE
        
        pass
    
    def get_statistics(self) -> dict:
        """
        Get moderation statistics.
        """
        # TODO: Return statistics
        # YOUR CODE HERE
        pass
    
    # BONUS: Implement these methods
    
    def batch_moderate(self, contents: list[str]) -> list[dict]:
        """
        Moderate multiple pieces of content.
        """
        # TODO (BONUS): Batch moderation
        pass
    
    def export_report(self) -> str:
        """
        Export moderation report.
        """
        # TODO (BONUS): Generate report
        pass

---

## ✅ Testing Your Implementation

Run these tests to verify your moderation pipeline works correctly:

### Test 1: Clean Content (Should Approve)

In [None]:
print("Test 1: Clean Content")
print("="*60)

pipeline = ModerationPipeline()

clean_content = "I really enjoyed this product. The quality is excellent and shipping was fast!"

result = pipeline.moderate(clean_content)

print(f"Content: {clean_content}")
print(f"\nDecision: {result['decision']}")
print(f"Toxicity Score: {result.get('toxicity_score', 0):.2f}")
print(f"Spam Score: {result.get('spam_score', 0):.2f}")
print(f"Violations: {result.get('violations', [])}")

# ✅ Should be approved!

### Test 2: Toxic Content (Should Reject)

In [None]:
print("\nTest 2: Toxic Content")
print("="*60)

toxic_content = "You're an idiot and I hate everything about you. This is terrible!"

result = pipeline.moderate(toxic_content)

print(f"Content: {toxic_content}")
print(f"\nDecision: {result['decision']}")
print(f"Toxicity Score: {result.get('toxicity_score', 0):.2f}")
print(f"Violations: {result.get('violations', [])}")

# ✅ Should be rejected!

### Test 3: Spam Content (Should Flag/Reject)

In [None]:
print("\nTest 3: Spam Content")
print("="*60)

spam_content = "BUY NOW!!! Limited time offer!!! Click here: www.sketchy-deals.com Get rich quick!!!"

result = pipeline.moderate(spam_content)

print(f"Content: {spam_content}")
print(f"\nDecision: {result['decision']}")
print(f"Spam Score: {result.get('spam_score', 0):.2f}")
print(f"Violations: {result.get('violations', [])}")

# ✅ Should be rejected or flagged!

### Test 4: PII Content (Should Flag)

In [None]:
print("\nTest 4: PII Content")
print("="*60)

pii_content = "My email is john.doe@email.com and my phone is 555-123-4567"

result = pipeline.moderate(pii_content)

print(f"Content: {pii_content}")
print(f"\nDecision: {result['decision']}")
print(f"PII Detected: {result.get('pii_detected', False)}")
print(f"Violations: {result.get('violations', [])}")

# ✅ Should be flagged for review!

### Test 5: Borderline Content (Should Flag for Review)

In [None]:
print("\nTest 5: Borderline Content")
print("="*60)

borderline_content = "This product is garbage and waste of money, but the customer service was okay."

result = pipeline.moderate(borderline_content)

print(f"Content: {borderline_content}")
print(f"\nDecision: {result['decision']}")
print(f"Toxicity Score: {result.get('toxicity_score', 0):.2f}")
print(f"Human Review Required: {result.get('human_review_required', False)}")

# ✅ Should be flagged for human review!

### Test 6: Statistics

In [None]:
print("\nTest 6: Statistics")
print("="*60)

stats = pipeline.get_statistics()

print("📊 Moderation Statistics:")
for key, value in stats.items():
    print(f"  {key}: {value}")

# ✅ Should show accurate counts!

---

## 🎨 Your Own Tests

Add your own test cases here:

In [None]:
# YOUR TEST CASES HERE


---

## 📊 Self-Assessment

Rate your implementation (1-5):

| Criteria | Rating | Notes |
|----------|--------|-------|
| State Management | /5 | TypedDict used correctly? |
| Parallel Checks | /5 | Multiple checks run? |
| Decision Logic | /5 | Correct routing? |
| Accuracy | /5 | Detects violations? |
| Edge Cases | /5 | Handles edge cases? |
| Code Quality | /5 | Clean, documented? |
| Bonus Features | /5 | Extra features? |
| **Total** | **/35** | |

---

## 🤔 Reflection Questions

Answer these questions in the markdown cell below:

1. How did parallel checks improve the moderation pipeline?
2. What challenges did you face with conditional routing?
3. How would you tune the threshold values for production?
4. What additional checks would improve accuracy?

---

### Your Answers:

**1. Parallel Checks:**
- [Your answer here]

**2. Conditional Routing Challenges:**
- [Your answer here]

**3. Threshold Tuning:**
- [Your answer here]

**4. Additional Checks:**
- [Your answer here]

---

## 📤 Submission

### Before Submitting:

- [ ] All tests pass
- [ ] Parallel checks implemented
- [ ] Conditional routing works
- [ ] Accurate moderation decisions
- [ ] Statistics tracking works
- [ ] Code is well-documented
- [ ] Reflection questions answered
- [ ] Notebook runs from top to bottom

### How to Submit:

1. Save this notebook
2. Stage and commit: `git add <your-notebook>.ipynb`, then `git commit -m "Complete Week 4 Exercise 1"`
3. Push: `git push origin week4-exercise1`
4. Submit repository link

---

**Great work on your moderation pipeline! 🎉**