# Lesson 3: Practical LLM Applications

This notebook explores three fundamental applications of Large Language Models in professional contexts:

1. **Conversational Assistants**: Building context-aware dialogue systems
2. **Document Analysis and Information Extraction**: Extracting structured data from unstructured text
3. **Report Generation**: Automating the creation of structured reports

The notebook is organized in two parts:

**Part 1: Direct LLM Integration**
Working directly with LLM APIs (Groq and Gemini) to understand core concepts

**Part 2: LangChain Agentic Framework**
Leveraging LangChain to build more sophisticated applications with agents and tools

**Prerequisites:**
- Basic Python programming
- Understanding of API concepts
- Familiarity with LLM fundamentals (covered in Lessons 1-2)

**Estimated time:** 90-120 minutes

---
## Initial Setup

### API Configuration

This notebook uses two LLM providers:

**1. Groq API** - Fast inference with open-source models
- Console: https://console.groq.com
- Model: `llama-3.3-70b-versatile`

**2. Google Gemini API** - Google's multimodal models
- Get API key: https://aistudio.google.com/app/apikey
- Model: `gemini-2.0-flash`

### How to Get API Keys

**Groq:**
1. Sign up at https://console.groq.com
2. Go to "API Keys" in the sidebar
3. Click "Create API Key"

**Gemini:**
1. Go to https://aistudio.google.com
2. Sign in with your Google account
3. Click "Get API Key"

In [2]:
# Install required packages
# !pip install --quiet openai google-genai python-dotenv langchain langchain-groq langchain-google-genai

In [None]:
import os
import json
import re
from typing import Optional, List, Dict, Any
from datetime import datetime

from openai import OpenAI
from google import genai
from google.genai import types
from dotenv import load_dotenv

load_dotenv()

# Read keys from environment
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")

print("API Keys configured:", {"groq": bool(GROQ_API_KEY), "gemini": bool(GEMINI_API_KEY)})

API Keys configured: {'groq': True, 'gemini': True}


In [5]:
# Initialize clients
groq_client = None
gemini_client = None

if GROQ_API_KEY:
    groq_client = OpenAI(api_key=GROQ_API_KEY, base_url="https://api.groq.com/openai/v1")
    print("Groq client initialized")

if GEMINI_API_KEY:
    gemini_client = genai.Client(api_key=GEMINI_API_KEY)
    print("Gemini client initialized")

if not any([groq_client, gemini_client]):
    print("Warning: No LLM clients configured. Set GROQ_API_KEY and/or GEMINI_API_KEY.")

Groq client initialized
Gemini client initialized


In [6]:
# LLM wrapper functions

def call_groq(
    prompt: str, 
    system_prompt: Optional[str] = None,
    temperature: float = 0.0, 
    max_tokens: int = 1000, 
    model: str = "llama-3.3-70b-versatile"
) -> str:
    """Call Groq API with optional system prompt."""
    if not groq_client:
        return "Groq client not configured (set GROQ_API_KEY)."
    try:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": prompt})
        
        resp = groq_client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        return resp.choices[0].message.content
    except Exception as e:
        return f"Groq API error: {e}"


def call_gemini(
    prompt: str, 
    system_prompt: Optional[str] = None,
    temperature: float = 0.0, 
    max_tokens: int = 1000, 
    model: str = "gemini-2.0-flash"
) -> str:
    """Call Gemini API with optional system prompt."""
    if not gemini_client:
        return "Gemini client not configured (set GEMINI_API_KEY)."
    try:
        response = gemini_client.models.generate_content(
            contents=prompt, 
            model=model, 
            config=types.GenerateContentConfig(
                # Pass the system prompt as a system instruction instead of
                # prepending it to the user prompt; None means no instruction.
                system_instruction=system_prompt or None,
                temperature=temperature, 
                max_output_tokens=max_tokens
            )
        )
        return response.text
    except Exception as e:
        return f"Gemini API error: {e}"


def call_llm(
    prompt: str, 
    provider: str = "groq", 
    system_prompt: Optional[str] = None,
    temperature: float = 0.0, 
    max_tokens: int = 1000, 
    model: Optional[str] = None
) -> str:
    """Unified wrapper for LLM calls."""
    provider = provider.lower()
    if provider == "groq":
        return call_groq(
            prompt, 
            system_prompt=system_prompt,
            temperature=temperature, 
            max_tokens=max_tokens, 
            model=(model or "llama-3.3-70b-versatile")
        )
    if provider == "gemini":
        return call_gemini(
            prompt, 
            system_prompt=system_prompt,
            temperature=temperature, 
            max_tokens=max_tokens, 
            model=(model or "gemini-2.0-flash")
        )
    return f"Unknown provider: {provider}. Use 'groq' or 'gemini'."


# Select default provider
PROVIDER = "groq"  # Change to "gemini" if preferred
print(f"Default provider: {PROVIDER}")
print("LLM wrapper functions ready")

Default provider: groq
LLM wrapper functions ready


In [7]:
# Quick connectivity test
test_prompt = "What is 2 + 2? Answer with just the number."
print("Testing LLM connection...")
result = call_llm(test_prompt, provider=PROVIDER)
print(f"Response: {result}")

Testing LLM connection...
Response: 4


---
# Part 1: Direct LLM Integration

In this section, we explore the three applications using direct LLM API calls. This approach gives full control over each request and makes the underlying mechanics easier to understand.

---
## Application 1: Conversational Assistants

Conversational assistants are systems that maintain context across multiple exchanges, providing coherent and contextually relevant responses.

### Key Concepts:
- **Conversation History**: Maintaining state across turns
- **System Prompts**: Defining assistant behavior and personality
- **Context Window Management**: Handling token limits
- **Turn-taking**: Managing user/assistant message flow
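
Context window management can be sketched as a trimming step applied to the history before each call. The example below is a minimal sketch, not this notebook's implementation: `estimate_tokens` uses a rough four-characters-per-token heuristic (an assumption; real tokenizers differ by model), and `trim_history` is a hypothetical helper that keeps the most recent exchanges that fit within a budget.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (heuristic assumption)."""
    return max(1, len(text) // 4)


def trim_history(history: List[Dict[str, str]], max_tokens: int) -> List[Dict[str, str]]:
    """Keep the most recent exchanges whose estimated size fits in max_tokens."""
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk backwards so the newest exchanges are considered first.
    for exchange in reversed(history):
        cost = estimate_tokens(exchange["user"]) + estimate_tokens(exchange["assistant"])
        if used + cost > max_tokens:
            break
        kept.append(exchange)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"user": "a" * 400, "assistant": "b" * 400},  # about 200 estimated tokens
    {"user": "c" * 40, "assistant": "d" * 40},    # about 20 estimated tokens
    {"user": "e" * 40, "assistant": "f" * 40},    # about 20 estimated tokens
]
print(len(trim_history(history, max_tokens=50)))
```

Trimming by an estimated token budget rather than by a fixed number of exchanges keeps long messages from crowding out the context window, at the cost of an approximate count.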

### 1.1 Basic Conversational Assistant

In [14]:
class ConversationalAssistant:
    """
    A conversational assistant that maintains context across multiple exchanges.
    """
    
    def __init__(
        self, 
        system_prompt: str,
        provider: str = "groq",
        max_history: int = 10,  # Number of past exchanges included in each prompt; higher values (e.g. 20 or 30) increase latency and cost.
    ):
        """
        Initialize the conversational assistant.
        
        Args:
            system_prompt: Defines the assistant's behavior and role
            provider: LLM provider to use ('groq' or 'gemini')
            max_history: Maximum number of recent exchanges included in each prompt
        """
        self.system_prompt = system_prompt
        self.provider = provider
        self.max_history = max_history
        self.conversation_history: List[Dict[str, str]] = []
    
    def _build_prompt(self, user_message: str) -> str:
        """Build the full prompt including conversation history."""
        history_text = ""
        for exchange in self.conversation_history[-self.max_history:]:
            history_text += f"User: {exchange['user']}\n"
            history_text += f"Assistant: {exchange['assistant']}\n\n"
        
        full_prompt = f"""Previous conversation:
{history_text}

Current user message: {user_message}

Provide a helpful response based on the conversation context."""
        
        return full_prompt
    
    def chat(self, user_message: str) -> str:
        """
        Process a user message and return the assistant's response.
        
        Args:
            user_message: The user's input message
            
        Returns:
            The assistant's response
        """
        prompt = self._build_prompt(user_message)
        
        response = call_llm(
            prompt=prompt,
            provider=self.provider,
            system_prompt=self.system_prompt,
            temperature=0.3
        )
        
        # Store in history
        self.conversation_history.append({
            "user": user_message,
            "assistant": response
        })
        
        return response
    
    def reset(self):
        """Clear the conversation history."""
        self.conversation_history = []
        print("Conversation history cleared.")
    
    def get_history(self) -> List[Dict[str, str]]:
        """Return the conversation history."""
        return self.conversation_history
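
The class above flattens the history into a single user prompt. OpenAI-style chat endpoints, such as the one the Groq client in this notebook talks to, also accept the history as a list of role-tagged messages. The following is a minimal sketch of that alternative; `build_messages` is a hypothetical helper, not part of the notebook, and it assumes the same `{"user": ..., "assistant": ...}` exchange format used by `conversation_history`.

```python
from typing import Dict, List


def build_messages(
    system_prompt: str,
    history: List[Dict[str, str]],
    user_message: str,
    max_history: int = 10,
) -> List[Dict[str, str]]:
    """Convert stored exchanges into an OpenAI-style role-tagged message list."""
    messages = [{"role": "system", "content": system_prompt}]
    # Guard against max_history == 0, where history[-0:] would return everything.
    recent = history[-max_history:] if max_history > 0 else []
    for exchange in recent:
        messages.append({"role": "user", "content": exchange["user"]})
        messages.append({"role": "assistant", "content": exchange["assistant"]})
    messages.append({"role": "user", "content": user_message})
    return messages


demo = build_messages(
    "You are a helpful analyst.",
    [{"user": "Hi", "assistant": "Hello! How can I help?"}],
    "What is reach?",
)
for m in demo:
    print(m["role"], "->", m["content"])
```

A list built this way could be passed as the `messages` argument of `chat.completions.create`, so the model sees real turn boundaries instead of a transcript embedded in one user message.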

In [10]:
# Example: Campaign Analysis Assistant

CAMPAIGN_ASSISTANT_PROMPT = """You are a professional advertising campaign analyst.
Your role is to help marketing professionals understand and optimize their campaigns.

Guidelines:
- Provide data-driven insights when possible
- Ask clarifying questions when information is incomplete
- Use industry-standard terminology
- Be concise but thorough
- When you don't have specific data, clearly state your assumptions"""

# Initialize the assistant
campaign_assistant = ConversationalAssistant(
    system_prompt=CAMPAIGN_ASSISTANT_PROMPT,
    provider=PROVIDER
)

# Simulate a conversation
print("=" * 60)
print("CAMPAIGN ANALYSIS ASSISTANT DEMO")
print("=" * 60)

# First exchange
response1 = campaign_assistant.chat(
    "I have a campaign with 5.2 million impressions and 48% reach. Is this good performance?"
)
print(f"\nUser: I have a campaign with 5.2 million impressions and 48% reach. Is this good performance?")
print(f"\nAssistant: {response1}")


CAMPAIGN ANALYSIS ASSISTANT DEMO

User: I have a campaign with 5.2 million impressions and 48% reach. Is this good performance?

Assistant: To assess the performance of your campaign, we need to consider a few more metrics. The 5.2 million impressions and 48% reach provide a good starting point, but I'd like to know more about the frequency, click-through rate (CTR), conversion rate, and the target audience size.

Can you please provide the following information:
1. What is the average frequency (the number of times a user sees your ad)?
2. What is the CTR for this campaign?
3. What are the campaign's conversion goals and metrics (e.g., sales, sign-ups, downloads)?
4. What is the estimated size of your target audience?

With this information, I can offer a more informed analysis of your campaign's performance and suggest potential areas for optimization.


In [11]:

# Second exchange (referencing previous context)
response2 = campaign_assistant.chat(
    "The target was adults 25-54. How does that change your assessment?"
)
print(f"\n{'='*60}")
print(f"\nUser: The target was adults 25-54. How does that change your assessment?")
print(f"\nAssistant: {response2}")




User: The target was adults 25-54. How does that change your assessment?

Assistant: With the target audience defined as adults 25-54, we can start to piece together a more comprehensive understanding of your campaign's performance. However, I still need to know the average frequency, CTR, conversion goals and metrics, and the estimated size of this target audience to provide a thorough analysis.

Assuming the United States as the target market, the estimated population of adults 25-54 is around 125 million people, according to the US Census Bureau. Given your campaign's 48% reach, this would translate to approximately 60 million people being exposed to your ads. 

Considering your campaign has 5.2 million impressions, this seems relatively low compared to the potential reach. This could indicate a low average frequency, meaning users are not being exposed to your ads multiple times.

To better understand the campaign's effectiveness, I would like to reiterate my previous questions:
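
The figures in this reply are worth checking by hand, because model-generated arithmetic is not guaranteed to be internally consistent. Assuming reach is expressed as a percentage of the target universe, total impressions equal people reached multiplied by average frequency, and average frequency cannot be below 1. A minimal sketch of the check, using the 125 million universe assumed in the reply:

```python
impressions = 5_200_000   # from the user's message
reach_pct = 48.0          # from the user's message
universe = 125_000_000    # audience size assumed in the assistant's reply

people_reached = universe * reach_pct / 100
implied_frequency = impressions / people_reached

print(f"People reached under that assumption: {people_reached:,.0f}")
print(f"Implied average frequency: {implied_frequency:.3f}")

# Average frequency cannot be below 1, so the largest universe consistent
# with 48% reach and 5.2 million impressions is impressions / reach.
max_universe = impressions / (reach_pct / 100)
print(f"Largest consistent universe: {max_universe:,.0f}")
```

The implied frequency is well below 1, so the 125 million assumption cannot hold together with the stated reach and impressions. Under these definitions the universe can be at most roughly 10.8 million people. Checks like this are a cheap way to catch plausible-sounding but inconsistent numeric claims.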


In [12]:

# Third exchange
response3 = campaign_assistant.chat(
    "What frequency would you recommend for this target?"
)
print(f"\n{'='*60}")
print(f"\nUser: What frequency would you recommend for this target?")
print(f"\nAssistant: {response3}")



User: What frequency would you recommend for this target?

Assistant: Based on our conversation, I would recommend an average frequency of 3-5 for your campaign targeting adults 25-54. This range allows for sufficient exposure to your ads without overwhelming your target audience.

In the advertising industry, a frequency of 3-5 is often considered a sweet spot, as it enables users to become familiar with your brand and messaging without feeling bombarded. However, the ideal frequency may vary depending on your campaign's specific goals, ad creative, and targeting strategy.

Given your campaign's current 48% reach and 5.2 million impressions, aiming for an average frequency of 3-5 could help increase the overall impact of your ads. To achieve this, you may need to adjust your ad rotation, bidding strategy, or budget allocation.

To provide a more tailored recommendation, I would still like to know the following:
1. What is the CTR for this campaign?
2. What are the campaign's convers

In [15]:
response_4 = campaign_assistant.chat(
    "Thanks! Can you summarize the key points from our discussion?"
)
print(f"\n{'='*60}")
print(f"\nUser: Thanks! Can you summarize the key points from our discussion?")
print(f"\nAssistant: {response_4}")



User: Thanks! Can you summarize the key points from our discussion?

Assistant: Here's a summary of the key points from our discussion:

1. **Initial Assessment**: We started with your campaign's 5.2 million impressions and 48% reach, but determined that more metrics were needed to assess performance.
2. **Target Audience**: You identified the target audience as adults 25-54, which helped to provide context for the campaign's reach and potential impact.
3. **Estimated Target Audience Size**: Assuming a US target market, we estimated the target audience size to be around 125 million people, with your campaign reaching approximately 60 million people (48% of the target audience).
4. **Low Impressions**: Given the large potential reach, your campaign's 5.2 million impressions seemed relatively low, suggesting a low average frequency.
5. **Recommended Frequency**: I recommended an average frequency of 3-5 for your campaign, which is considered a sweet spot in the advertising industry, al

### 1.2 Specialized Domain Assistant

The following example demonstrates a more specialized assistant with domain-specific knowledge.

In [16]:
# Specialized prompt for TV advertising metrics

TV_METRICS_PROMPT = """You are an expert in television advertising metrics and measurement.

Your knowledge includes:
- Reach and Frequency calculations
- GRP (Gross Rating Points) analysis
- CPM and cost efficiency metrics
- Audience measurement methodologies
- Campaign optimization strategies

When answering questions:
1. Use precise industry terminology
2. Provide formulas when relevant
3. Explain the practical implications of metrics
4. Reference industry benchmarks when appropriate

If the user provides data, perform calculations and explain each step."""

metrics_assistant = ConversationalAssistant(
    system_prompt=TV_METRICS_PROMPT,
    provider=PROVIDER
)

# Demonstrate specialized knowledge
print("=" * 60)
print("TV METRICS EXPERT DEMO")
print("=" * 60)

response = metrics_assistant.chat(
    "I need to calculate the GRP for a campaign with 45% reach and 3.5 average frequency."
)
print(f"\nUser: I need to calculate the GRP for a campaign with 45% reach and 3.5 average frequency.")
print(f"\nAssistant: {response}")

TV METRICS EXPERT DEMO

User: I need to calculate the GRP for a campaign with 45% reach and 3.5 average frequency.

Assistant: To calculate the Gross Rating Points (GRP) for your campaign, we can use the following formula:

GRP = Reach (%) x Average Frequency

Given the data you provided:
Reach (%) = 45%
Average Frequency = 3.5

Plugging in the numbers:
GRP = 45 x 3.5
GRP = 157.5

So, the GRP for your campaign is 157.5.

In practical terms, this means that your campaign has delivered 157.5% of the total potential audience, with the average person seeing your ad 3.5 times. This is a relatively high GRP, indicating a strong campaign presence.

For context, industry benchmarks for GRP vary by category and target audience, but as a general guideline, a GRP of 100-200 is considered moderate to high for a national campaign. However, the optimal GRP will depend on your specific campaign goals and target audience.

Keep in mind that GRP is just one metric, and it's essential to consider other 
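
The formula quoted in the reply, GRP = reach (%) x average frequency, is simple enough to verify in code. The sketch below uses two hypothetical helpers, `grp` and `frequency_from_grp`, which are not part of the notebook.

```python
def grp(reach_pct: float, avg_frequency: float) -> float:
    """Gross Rating Points = reach (%) x average frequency."""
    return reach_pct * avg_frequency


def frequency_from_grp(grp_value: float, reach_pct: float) -> float:
    """Invert the formula to recover average frequency from GRP and reach."""
    if reach_pct <= 0:
        raise ValueError("reach_pct must be positive")
    return grp_value / reach_pct


print(grp(45, 3.5))                    # 157.5
print(frequency_from_grp(157.5, 45))   # 3.5
```

GRPs are a sum of rating points rather than a share of the audience, which is why values above 100 are possible: 157.5 GRPs correspond to impressions equal to 157.5% of the target universe, not to 157.5% of people reached.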

### Exercise 1.1: Build a Custom Conversational Assistant

Create a conversational assistant for a specific use case of your choice.

Requirements:
- Define a clear system prompt with role and guidelines
- Test with at least 3 conversation turns
- Demonstrate context retention across turns

In [17]:
# Exercise 1.1: Your solution here

# TODO: Define your system prompt
CUSTOM_ASSISTANT_PROMPT = """
# Define your assistant's role and guidelines here
"""

# TODO: Initialize the assistant
# custom_assistant = ConversationalAssistant(...)

# TODO: Test with multiple conversation turns
# response1 = custom_assistant.chat("...")
# response2 = custom_assistant.chat("...")
# response3 = custom_assistant.chat("...")

# See solutions.py for reference implementation

---
## Application 2: Document Analysis and Information Extraction

Information extraction involves identifying and extracting structured data from unstructured text documents.

### Key Concepts:
- **Named Entity Recognition**: Identifying entities (dates, amounts, names)
- **Structured Output**: Converting text to JSON/structured formats
- **Schema Validation**: Ensuring extracted data matches expected format
- **Multi-document Analysis**: Processing multiple documents consistently
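
Processing multiple documents consistently mostly comes down to running the same extraction function over each document and keeping per-document failures separate. The sketch below assumes an extractor that returns a dictionary containing an `"error"` key on failure. `extract_batch` and `fake_extract` are hypothetical names, and `fake_extract` is a stand-in so the example runs without an API key.

```python
from typing import Any, Callable, Dict, List


def extract_batch(
    documents: Dict[str, str],
    extract_fn: Callable[[str], Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Apply one extraction function to many documents, separating failures."""
    results: Dict[str, List[Dict[str, Any]]] = {"ok": [], "failed": []}
    for doc_id, text in documents.items():
        extracted = extract_fn(text)
        bucket = "failed" if "error" in extracted else "ok"
        results[bucket].append({"doc_id": doc_id, **extracted})
    return results


def fake_extract(text: str) -> Dict[str, Any]:
    """Stand-in for an LLM-backed extractor (hypothetical, for illustration)."""
    if not text.strip():
        return {"error": "empty document"}
    return {"campaign_name": text.splitlines()[0].strip()}


docs = {"brief_a": "Autumn Launch\nBudget: 100", "brief_b": "   "}
batch = extract_batch(docs, fake_extract)
print(len(batch["ok"]), "ok,", len(batch["failed"]), "failed")
```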

### 2.1 Basic Information Extraction

In [18]:
def extract_campaign_info(document: str, provider: str = PROVIDER) -> Dict[str, Any]:
    """
    Extract structured campaign information from a text document.
    
    Args:
        document: The text document to analyze
        provider: LLM provider to use
        
    Returns:
        Dictionary containing extracted information
    """
    extraction_prompt = f"""Analyze the following document and extract campaign information.

DOCUMENT:
{document}

Extract the following fields if present:
- campaign_name: Name or title of the campaign
- start_date: Campaign start date (format: YYYY-MM-DD)
- end_date: Campaign end date (format: YYYY-MM-DD)
- budget: Campaign budget (numeric value)
- currency: Currency of the budget
- target_audience: Description of target audience
- channels: List of advertising channels used
- objectives: Campaign objectives or goals
- kpis: Key performance indicators mentioned

Return ONLY a valid JSON object with these fields.
Use null for fields that cannot be determined from the document.
Do not include any text outside the JSON object."""

    response = call_llm(
        prompt=extraction_prompt,
        provider=provider,
        temperature=0.0,
        max_tokens=1000
    )
    
    # Parse JSON from response
    try:
        # Try to extract JSON from the response
        json_match = re.search(r'\{.*\}', response, re.DOTALL)
        if json_match:
            return json.loads(json_match.group())
        return {"error": "Could not parse JSON from response", "raw_response": response}
    except json.JSONDecodeError as e:
        return {"error": f"JSON parse error: {e}", "raw_response": response}
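
The greedy pattern `\{.*\}` matches from the first `{` to the last `}` in the response, so it can capture too much when the model adds trailing text that contains braces. One possible alternative, sketched here under that assumption, uses `json.JSONDecoder.raw_decode`, which parses a single JSON value starting at a given index and ignores whatever follows. `parse_first_json_object` is a hypothetical helper, not part of the notebook.

```python
import json
from typing import Any, Dict, Optional


def parse_first_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(text, start)
            if isinstance(value, dict):
                return value
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None


reply = 'Here is the result: {"budget": 250000, "currency": "EUR"} Hope this helps {draft}'
print(parse_first_json_object(reply))
```

With this reply, the greedy regular expression would capture everything up to the closing brace of `{draft}` and `json.loads` would fail, while the decoder-based version stops at the end of the first valid object.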

In [19]:
# Example document for extraction
sample_document = """
CAMPAIGN BRIEF: Autumn 2024 Brand Awareness

Client: TechCorp Italia
Campaign Period: September 15, 2024 - November 30, 2024
Total Budget: EUR 250,000

Target Audience:
Adults aged 25-45, primary focus on professionals in urban areas.
Secondary audience includes tech-savvy young adults 18-24.

Media Channels:
- National TV (RAI, Mediaset)
- Digital Display (programmatic)
- Social Media (Instagram, LinkedIn)
- Out-of-Home in Milan and Rome

Campaign Objectives:
1. Increase brand awareness by 15% among target audience
2. Drive 50,000 qualified website visits
3. Achieve minimum reach of 60% in primary target

Key Performance Indicators:
- Brand awareness lift (pre/post survey)
- Website traffic from campaign sources
- Social media engagement rate
- TV reach and frequency metrics
"""

print("=" * 60)
print("DOCUMENT ANALYSIS DEMO")
print("=" * 60)
print("\nExtracting information from campaign brief...")

extracted_info = extract_campaign_info(sample_document)
print("\nExtracted Information:")
print(json.dumps(extracted_info, indent=2, ensure_ascii=False))

DOCUMENT ANALYSIS DEMO

Extracting information from campaign brief...

Extracted Information:
{
  "campaign_name": "Autumn 2024 Brand Awareness",
  "start_date": "2024-09-15",
  "end_date": "2024-11-30",
  "budget": 250000,
  "currency": "EUR",
  "target_audience": "Adults aged 25-45, primary focus on professionals in urban areas. Secondary audience includes tech-savvy young adults 18-24.",
  "channels": [
    "National TV (RAI, Mediaset)",
    "Digital Display (programmatic)",
    "Social Media (Instagram, LinkedIn)",
    "Out-of-Home in Milan and Rome"
  ],
  "objectives": [
    "Increase brand awareness by 15% among target audience",
    "Drive 50,000 qualified website visits",
    "Achieve minimum reach of 60% in primary target"
  ],
  "kpis": [
    "Brand awareness lift (pre/post survey)",
    "Website traffic from campaign sources",
    "Social media engagement rate",
    "TV reach and frequency metrics"
  ]
}


### 2.2 Multi-Field Extraction with Validation

In [20]:
class DocumentExtractor:
    """
    A document extractor with schema validation and error handling.
    """
    
    def __init__(self, schema: Dict[str, Any], provider: str = "groq"):
        """
        Initialize the extractor with a schema.
        
        Args:
            schema: Dictionary defining expected fields and their types
            provider: LLM provider to use
        """
        self.schema = schema
        self.provider = provider
    
    def _build_schema_description(self) -> str:
        """Build a description of the schema for the prompt."""
        lines = []
        for field, config in self.schema.items():
            field_type = config.get("type", "string")
            description = config.get("description", "")
            required = config.get("required", False)
            req_marker = "(required)" if required else "(optional)"
            lines.append(f"- {field} [{field_type}] {req_marker}: {description}")
        return "\n".join(lines)
    
    def extract(self, document: str) -> Dict[str, Any]:
        """
        Extract information from a document according to the schema.
        
        Args:
            document: The text document to analyze
            
        Returns:
            Dictionary containing extracted and validated information
        """
        schema_desc = self._build_schema_description()
        
        prompt = f"""Analyze the following document and extract information according to the schema.

DOCUMENT:
{document}

EXTRACTION SCHEMA:
{schema_desc}

INSTRUCTIONS:
1. Extract values for each field based on the document content
2. Use null for fields that cannot be determined
3. Ensure data types match the schema specifications
4. For list fields, provide an array of values
5. For numeric fields, extract only the number (without currency symbols)

Return ONLY a valid JSON object matching the schema. No additional text."""

        response = call_llm(
            prompt=prompt,
            provider=self.provider,
            temperature=0.0
        )
        
        # Parse and validate
        try:
            json_match = re.search(r'\{.*\}', response, re.DOTALL)
            if json_match:
                extracted = json.loads(json_match.group())
                validation_result = self._validate(extracted)
                return {
                    "data": extracted,
                    "validation": validation_result
                }
            return {"error": "Could not parse JSON", "raw_response": response}
        except json.JSONDecodeError as e:
            return {"error": f"JSON parse error: {e}", "raw_response": response}
    
    def _validate(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Validate extracted data against the schema."""
        errors = []
        warnings = []
        
        for field, config in self.schema.items():
            required = config.get("required", False)
            field_type = config.get("type", "string")
            
            if field not in data or data[field] is None:
                if required:
                    errors.append(f"Required field '{field}' is missing or null")
                continue
            
            value = data[field]
            
            # Type validation (bool is a subclass of int, so exclude it explicitly)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if field_type == "number" and not is_number:
                errors.append(f"Field '{field}' should be a number, got {type(value).__name__}")
            elif field_type == "list" and not isinstance(value, list):
                errors.append(f"Field '{field}' should be a list, got {type(value).__name__}")
            elif field_type == "string" and not isinstance(value, str):
                # No conversion is performed; the mismatch is only reported
                warnings.append(f"Field '{field}' expected a string, got {type(value).__name__}")
        
        return {
            "is_valid": len(errors) == 0,
            "errors": errors,
            "warnings": warnings
        }

In [21]:
# Define extraction schema
campaign_schema = {
    "campaign_name": {
        "type": "string",
        "description": "Name or title of the campaign",
        "required": True
    },
    "client": {
        "type": "string",
        "description": "Client or brand name",
        "required": True
    },
    "budget_amount": {
        "type": "number",
        "description": "Total budget as a numeric value",
        "required": True
    },
    "budget_currency": {
        "type": "string",
        "description": "Currency code (EUR, USD, etc.)",
        "required": False
    },
    "target_age_min": {
        "type": "number",
        "description": "Minimum age of target audience",
        "required": False
    },
    "target_age_max": {
        "type": "number",
        "description": "Maximum age of target audience",
        "required": False
    },
    "channels": {
        "type": "list",
        "description": "List of advertising channels",
        "required": False
    },
    "objectives": {
        "type": "list",
        "description": "List of campaign objectives",
        "required": False
    }
}

# Create extractor and process document
extractor = DocumentExtractor(schema=campaign_schema, provider=PROVIDER)

print("=" * 60)
print("SCHEMA-BASED EXTRACTION DEMO")
print("=" * 60)

result = extractor.extract(sample_document)
print("\nExtraction Result:")
print(json.dumps(result, indent=2, ensure_ascii=False))

SCHEMA-BASED EXTRACTION DEMO

Extraction Result:
{
  "data": {
    "campaign_name": "Autumn 2024 Brand Awareness",
    "client": "TechCorp Italia",
    "budget_amount": 250000,
    "budget_currency": "EUR",
    "target_age_min": 18,
    "target_age_max": 45,
    "channels": [
      "National TV",
      "Digital Display",
      "Social Media",
      "Out-of-Home"
    ],
    "objectives": [
      "Increase brand awareness by 15% among target audience",
      "Drive 50,000 qualified website visits",
      "Achieve minimum reach of 60% in primary target"
    ]
  },
  "validation": {
    "is_valid": true,
    "errors": [],
    "warnings": []
  }
}


### Exercise 2.1: Create a Custom Document Extractor

Create an extractor for a different document type (e.g., invoice, contract, report).

Requirements:
- Define a custom schema with at least 6 fields
- Include different data types (string, number, list)
- Test with a sample document
- Verify validation works correctly

In [None]:
# Exercise 2.1: Your solution here

# TODO: Define your custom schema
custom_schema = {
    # Add your fields here
}

# TODO: Create sample document text
sample_custom_document = """
# Your sample document here
"""

# TODO: Create extractor and test
# custom_extractor = DocumentExtractor(schema=custom_schema, provider=PROVIDER)
# result = custom_extractor.extract(sample_custom_document)
# print(json.dumps(result, indent=2))

# See solutions.py for reference implementation

---
## Application 3: Report Generation

Automated report generation involves creating structured, professional documents from data and analysis results.

### Key Concepts:
- **Template-based Generation**: Using templates for consistent formatting
- **Data Integration**: Incorporating numerical data and metrics
- **Narrative Generation**: Creating explanatory text from data
- **Multi-section Documents**: Organizing content logically
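
Template-based generation and data integration can be combined so that numbers are computed in code and only the narrative is left to the model. The sketch below uses the standard library's `string.Template`. The template text, the `render_summary` helper, and the example values are illustrative assumptions, and the narrative is hard-coded where an LLM call would normally supply it.

```python
from string import Template

REPORT_TEMPLATE = Template(
    "Campaign: $name\n"
    "Budget spent: $spent of $planned $currency ($spent_pct% of plan)\n"
    "Reach: $reach% (target $target%)\n"
    "Summary: $narrative"
)


def render_summary(data: dict, narrative: str) -> str:
    """Fill fixed slots with computed numbers; only `narrative` is free text."""
    spent_pct = round(100 * data["spent"] / data["planned"], 1)
    return REPORT_TEMPLATE.substitute(
        name=data["name"],
        spent=f"{data['spent']:,}",
        planned=f"{data['planned']:,}",
        currency=data["currency"],
        spent_pct=spent_pct,
        reach=data["reach"],
        target=data["target"],
        narrative=narrative,
    )


example = {"name": "Demo Campaign", "spent": 90000, "planned": 100000,
           "currency": "EUR", "reach": 52.0, "target": 50.0}
print(render_summary(example, "Reach exceeded target."))
```

Keeping figures out of the model's hands avoids transcription or arithmetic slips in the generated text, while the template guarantees the same layout for every report.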

### 3.1 Basic Report Generator

In [22]:
class ReportGenerator:
    """
    Generates structured reports from data using LLM.
    """
    
    def __init__(self, provider: str = "groq"):
        self.provider = provider
    
    def generate_executive_summary(
        self, 
        campaign_data: Dict[str, Any],
        include_recommendations: bool = True
    ) -> str:
        """
        Generate an executive summary for a campaign.
        
        Args:
            campaign_data: Dictionary containing campaign metrics
            include_recommendations: Whether to include recommendations
            
        Returns:
            Generated executive summary text
        """
        prompt = f"""Generate a professional executive summary for the following campaign data.

CAMPAIGN DATA:
{json.dumps(campaign_data, indent=2)}

REQUIREMENTS:
1. Start with a brief overview (2-3 sentences)
2. Highlight key performance metrics
3. Compare against targets if provided
4. {"Include 2-3 actionable recommendations" if include_recommendations else "Do not include recommendations"}
5. Use professional business language
6. Keep the summary concise (maximum 300 words)

Write the executive summary:"""

        return call_llm(
            prompt=prompt,
            provider=self.provider,
            temperature=0.3,
            max_tokens=500
        )
    
    def generate_detailed_report(
        self,
        campaign_data: Dict[str, Any],
        sections: Optional[List[str]] = None
    ) -> Dict[str, str]:
        """
        Generate a detailed multi-section report.
        
        Args:
            campaign_data: Dictionary containing campaign metrics
            sections: List of section names to include
            
        Returns:
            Dictionary mapping section names to generated content
        """
        if sections is None:
            sections = [
                "Overview",
                "Performance Analysis",
                "Audience Insights",
                "Recommendations"
            ]
        
        report = {}
        
        for section in sections:
            prompt = f"""Generate the "{section}" section for a campaign performance report.

CAMPAIGN DATA:
{json.dumps(campaign_data, indent=2)}

SECTION: {section}

GUIDELINES:
- Write 2-4 paragraphs appropriate for this section
- Use data from the campaign metrics
- Maintain professional tone
- Be specific with numbers and percentages

Generate the {section} section:"""

            content = call_llm(
                prompt=prompt,
                provider=self.provider,
                temperature=0.3,
                max_tokens=400
            )
            report[section] = content
        
        return report
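
`generate_detailed_report` returns a dictionary that maps section names to text, so a final assembly step is needed to produce a single document. A minimal sketch of one way to do this, assuming Markdown output; `render_markdown_report` is a hypothetical helper and the section texts are placeholders.

```python
from typing import Dict


def render_markdown_report(title: str, sections: Dict[str, str]) -> str:
    """Join section texts into one Markdown document, preserving dict order."""
    parts = [f"# {title}"]
    for name, content in sections.items():
        parts.append(f"## {name}\n\n{content.strip()}")
    return "\n\n".join(parts) + "\n"


demo_sections = {
    "Overview": "Placeholder overview text.  ",
    "Recommendations": "Placeholder recommendation text.",
}
print(render_markdown_report("Campaign Report", demo_sections))
```

Because each section is generated by an independent call, the sections do not see one another's content. Assembling them in code keeps the ordering and headings deterministic even when the generated text varies.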

In [23]:
# Sample campaign data for report generation
campaign_metrics = {
    "campaign_name": "Q4 2024 Brand Launch",
    "client": "TechCorp Italia",
    "period": {
        "start": "2024-10-01",
        "end": "2024-12-15"
    },
    "budget": {
        "planned": 200000,
        "spent": 187500,
        "currency": "EUR"
    },
    "performance": {
        "reach_percentage": 52.3,
        "reach_target": 50.0,
        "frequency": 4.2,
        "impressions": 6100000,
        "grp": 219.7
    },
    "audience": {
        "primary_target": "Adults 25-44",
        "reach_by_age": {
            "25-34": 58.2,
            "35-44": 49.1
        }
    },
    "channels": {
        "tv": {"spend_pct": 60, "reach": 45.2},
        "digital": {"spend_pct": 30, "reach": 38.5},
        "ooh": {"spend_pct": 10, "reach": 22.1}
    }
}

# Generate reports
report_generator = ReportGenerator(provider=PROVIDER)

print("=" * 60)
print("REPORT GENERATION DEMO")
print("=" * 60)

# Executive Summary
print("\n--- EXECUTIVE SUMMARY ---\n")
summary = report_generator.generate_executive_summary(campaign_metrics)
print(summary)

REPORT GENERATION DEMO

--- EXECUTIVE SUMMARY ---

The Q4 2024 Brand Launch campaign for TechCorp Italia aimed to effectively reach and engage the target audience, with a planned budget of €200,000. The campaign, which ran from October 1, 2024, to December 15, 2024, successfully achieved a reach percentage of 52.3%, exceeding the target of 50.0%. This campaign's performance demonstrates a strong foundation for future brand growth.

Key performance metrics include a frequency of 4.2, 6,100,000 impressions, and a GRP of 219.7, indicating a significant presence across the target audience. Notably, the primary target audience of Adults 25-44 was effectively reached, with 58.2% of 25-34-year-olds and 49.1% of 35-44-year-olds engaged. The campaign's channel distribution, with 60% allocated to TV, 30% to digital, and 10% to Out-of-Home (OOH), contributed to the overall reach, with TV and digital being the most effective channels.

To further optimize future campaigns, we recommend: (1) alloca

In [24]:
# Generate detailed multi-section report
print("\n" + "=" * 60)
print("DETAILED REPORT SECTIONS")
print("=" * 60)

detailed_report = report_generator.generate_detailed_report(
    campaign_metrics,
    sections=["Overview", "Performance Analysis", "Recommendations"]
)

for section, content in detailed_report.items():
    print(f"\n--- {section.upper()} ---\n")
    print(content)
    print()


DETAILED REPORT SECTIONS

--- OVERVIEW ---

The Q4 2024 Brand Launch campaign for TechCorp Italia was a comprehensive marketing effort that took place from October 1, 2024, to December 15, 2024. With a planned budget of €200,000, the campaign aimed to create awareness and reach a significant portion of the target audience. The actual spend was €187,500, which is 93.75% of the planned budget, indicating efficient resource allocation. The campaign's primary target audience was Adults 25-44, with a specific focus on reaching 50% of this demographic.

In terms of performance, the campaign achieved a reach percentage of 52.3%, surpassing the target of 50%. This translates to approximately 6,100,000 impressions, with a frequency of 4.2. The Gross Rating Point (GRP) for the campaign was 219.7, indicating a significant impact on the target audience. The reach by age group was also notable, with 58.2% of the 25-34 age group and 49.1% of the 35-44 age group being reached. These numbers demonstr
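
Generated narratives restate numbers from the input, so it is worth checking the quoted figures against the data. The sketch below recomputes two values that appear in the output above. It assumes the usual definition GRP = reach (%) × average frequency; the variable names are illustrative and not part of `ReportGenerator`.

```python
# Values copied from the campaign_metrics dictionary used in this section
planned, spent = 200_000, 187_500
reach_pct, frequency, reported_grp = 52.3, 4.2, 219.7

# Budget utilization, as quoted in the Overview section
budget_utilization = spent / planned * 100

# Assumption: GRP = reach (%) x average frequency
computed_grp = reach_pct * frequency

print(f"Budget utilization: {budget_utilization:.2f}%")
print(f"Computed GRP: {computed_grp:.1f} (reported: {reported_grp})")
```

If a recomputed value disagrees with the generated text, treat the text as suspect and regenerate or correct it by hand.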

### Exercise 3.1: Create a Custom Report Template

Design and implement a custom report template for a specific use case.

Requirements:
- Create a new template with both static and dynamic sections
- Include both data fields and LLM-generated narrative
- Support at least 5 data fields
- Test with sample data
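
One possible starting point, not the reference solution: the sketch below uses the standard library's `string.Formatter` to list the placeholders in a template, so missing data fields are reported before rendering. The template text, `template_fields`, and `render` are hypothetical names, and the `narrative` value stands in for text an LLM would generate.

```python
from string import Formatter

# Hypothetical template mixing static text with {placeholders}
TEMPLATE = (
    "# {campaign_name} Report\n"
    "Client: {client}\n"
    "Reach: {reach}% (target {reach_target}%)\n"
    "Spend: {spent} {currency}\n\n"
    "## Analysis\n"
    "{narrative}\n"
)


def template_fields(template: str) -> set:
    """Return the placeholder names used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, data: dict) -> str:
    """Fill the template, failing early with a clear message if fields are missing."""
    missing = template_fields(template) - set(data)
    if missing:
        raise KeyError(f"Missing fields: {sorted(missing)}")
    return template.format(**data)


sample_data = {
    "campaign_name": "Q4 2024 Brand Launch",
    "client": "TechCorp Italia",
    "reach": 52.3,
    "reach_target": 50.0,
    "spent": 187500,
    "currency": "EUR",
    "narrative": "(LLM-generated narrative goes here)",
}
print(render(TEMPLATE, sample_data))
```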

In [None]:
# Exercise 3.1: Your solution here

# TODO: Define your custom report template
CUSTOM_TEMPLATE = """
# Your template here
# Use {field_name} for dynamic values
"""

# TODO: Create test data
custom_data = {
    # Your data fields
}

# TODO: Generate the report using ReportGenerator or a custom approach

# See solutions.py for reference implementation

---
# Part 2: LangChain Agentic Framework

LangChain is a framework for developing applications powered by language models. It provides abstractions for:

- **Models**: Unified interface for different LLM providers
- **Prompts**: Templates and management for prompts
- **Chains**: Combining multiple components sequentially
- **Agents**: Autonomous systems that decide which actions to take
- **Tools**: Functions that agents can use to interact with the world

Reference: https://docs.langchain.com/oss/python/langchain/overview
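
To make the idea of a chain concrete, here is a plain-Python sketch of sequential composition with the `|` operator. `Step`, `toy_prompt`, `toy_model`, and `toy_chain` are hypothetical stand-ins written for this illustration; they are not LangChain classes, and the "model" only upper-cases its input.

```python
class Step:
    """A tiny stand-in for a composable component (not a LangChain class)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# "Prompt" fills a template; "model" is a fake LLM that upper-cases the text
toy_prompt = Step(lambda inputs: f"Summarize the campaign: {inputs['name']}")
toy_model = Step(lambda text: text.upper())

toy_chain = toy_prompt | toy_model
print(toy_chain.invoke({"name": "Q4 2024 Brand Launch"}))
```

The same left-to-right reading applies to expressions such as `prompt | llm` later in this notebook: the output of the first component becomes the input of the next.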

In [None]:
# Import LangChain components
from langchain_groq import ChatGroq
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
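# AgentExecutor and create_tool_calling_agent belong to the classic agent API (LangChain 0.x).
# If this import fails on a newer release, check the migration guide for your installed version.
from langchain.agents import create_tool_calling_agent, AgentExecutor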

In [None]:
# Initialize LangChain models
llm_groq = None
llm_gemini = None

if GROQ_API_KEY:
    llm_groq = ChatGroq(
        api_key=GROQ_API_KEY,
        model_name="llama-3.3-70b-versatile",
        temperature=0
    )
    print("LangChain Groq model initialized")

if GEMINI_API_KEY:
    llm_gemini = ChatGoogleGenerativeAI(
        google_api_key=GEMINI_API_KEY,
        model="gemini-2.0-flash",
        temperature=0
    )
    print("LangChain Gemini model initialized")

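# Select default model (Groq takes precedence when both keys are set)
llm = llm_groq or llm_gemini
if llm is None:
    raise RuntimeError("No LLM available: set GROQ_API_KEY or GEMINI_API_KEY in your environment.")
print(f"\nDefault LangChain model: {'Groq' if llm_groq else 'Gemini'}")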

---
## Application 1: Conversational Assistant with LangChain

LangChain provides a more structured approach to building conversational assistants: `RunnableWithMessageHistory` wraps a chain and loads and saves the message history for each session automatically.
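
Conceptually, per-session memory means keeping a list of past messages for each session ID and sending it along with each new input. The plain-Python sketch below shows that mechanism without any LangChain dependency. `histories`, `build_messages`, and `record_turn` are hypothetical names, and the model reply is canned.

```python
from collections import defaultdict

# session_id -> list of (role, text) tuples; a stand-in for a chat history store
histories = defaultdict(list)


def build_messages(session_id: str, user_input: str, system: str) -> list:
    """Assemble the message list sent to the model: system, past turns, new input."""
    return [("system", system), *histories[session_id], ("human", user_input)]


def record_turn(session_id: str, user_input: str, reply: str) -> None:
    """Persist the completed exchange so the next call can see it."""
    histories[session_id].extend([("human", user_input), ("ai", reply)])


# Simulate two turns with a canned reply instead of a real model call
first = build_messages("demo", "What is GRP?", "You are an analyst.")
record_turn("demo", "What is GRP?", "GRP is reach times frequency.")
second = build_messages("demo", "And for 48% reach?", "You are an analyst.")

print(len(first), len(second))
```

The second message list is longer because it carries the first exchange, which is what lets a follow-up question refer back to earlier turns.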

In [None]:
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Create a simple conversational chain with memory
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a professional advertising campaign analyst.
Your role is to help marketing professionals understand and optimize their campaigns.
Be concise, data-driven, and use industry-standard terminology."""),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

# Store for managing session histories
session_histories = {}

def get_session_history(session_id: str):
    """Get or create a session history."""
    if session_id not in session_histories:
        session_histories[session_id] = InMemoryChatMessageHistory()
    return session_histories[session_id]

# Create runnable with history
conversational_chain = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

In [None]:
# Test the conversational chain
print("=" * 60)
print("LANGCHAIN CONVERSATIONAL ASSISTANT")
print("=" * 60)

session_id = "demo_session"

# First message
question1 = "I have a TV campaign with 48% reach and frequency of 3.5. What is the GRP?"
response1 = conversational_chain.invoke(
    {"input": question1},
    config={"configurable": {"session_id": session_id}}
)
print(f"\nUser: {question1}")
print(f"\nAssistant: {response1.content}")

# Follow-up: reuses the same session_id, so the stored history provides context
question2 = "How does that compare to industry benchmarks?"
response2 = conversational_chain.invoke(
    {"input": question2},
    config={"configurable": {"session_id": session_id}}
)
print(f"\n{'='*60}")
print(f"\nUser: {question2}")
print(f"\nAssistant: {response2.content}")

---
## Application 2: Document Analysis Agent with Tools

LangChain agents can use tools to perform specific tasks. This is particularly useful for document analysis where we need structured extraction.
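
The core of tool use is a dispatch step: the model names a tool and its arguments, and the surrounding code looks up and runs the matching function. The sketch below shows that step in plain Python. The tools, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical, and the request that a model would normally emit is hard-coded.

```python
import re


def find_percentages(text: str) -> str:
    """Toy tool: list percentage values found in the text."""
    return ", ".join(re.findall(r"\d+(?:\.\d+)?%", text)) or "none"


def count_words(text: str) -> str:
    """Toy tool: count whitespace-separated words."""
    return str(len(text.split()))


# Registry the "agent" can choose from: tool name -> callable
TOOL_REGISTRY = {"find_percentages": find_percentages, "count_words": count_words}


def run_tool_call(call: dict) -> str:
    """Execute one tool request of the form {'name': ..., 'args': {...}}."""
    tool_fn = TOOL_REGISTRY.get(call["name"])
    if tool_fn is None:
        return f"Unknown tool: {call['name']}"
    return tool_fn(**call["args"])


# In a real agent the model emits this request; here it is hard-coded
request = {"name": "find_percentages", "args": {"text": "Reach 52.3% vs target 50%"}}
print(run_tool_call(request))
```

In an agent loop, the tool result is passed back to the model, which either requests another tool or writes its final answer.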

In [None]:
# Define tools for document analysis

@tool
def extract_dates(text: str) -> str:
    """
    Extract all dates mentioned in the text.
    Use this tool when you need to find campaign periods or deadlines.
    """
    patterns = [
        r'\d{4}-\d{2}-\d{2}',  # YYYY-MM-DD
        r'\d{1,2}/\d{1,2}/\d{4}',  # DD/MM/YYYY or MM/DD/YYYY
        r'(?:January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},?\s+\d{4}',
    ]
    
    dates_found = []
    for pattern in patterns:
        matches = re.findall(pattern, text, re.IGNORECASE)
        dates_found.extend(matches)
    
    if dates_found:
        return f"Dates found: {', '.join(dates_found)}"
    return "No dates found in the text."


@tool
def extract_monetary_values(text: str) -> str:
    """
    Extract monetary values and currencies from the text.
    Use this tool when you need to find budgets, costs, or prices.
    """
    patterns = [
        r'(?:EUR|USD|GBP|\$|\u20ac|\u00a3)\s*[\d,]+(?:\.\d{2})?',
        r'[\d,]+(?:\.\d{2})?\s*(?:EUR|USD|GBP|euros?|dollars?)',
    ]
    
    values_found = []
    for pattern in patterns:
        matches = re.findall(pattern, text, re.IGNORECASE)
        values_found.extend(matches)
    
    if values_found:
        return f"Monetary values found: {', '.join(values_found)}"
    return "No monetary values found in the text."


@tool
def extract_percentages(text: str) -> str:
    """
    Extract percentage values from the text.
    Use this tool when you need to find reach, budget utilization, or other metrics expressed as percentages.
    """
    pattern = r'\d+(?:\.\d+)?%'
    matches = re.findall(pattern, text)
    
    if matches:
        return f"Percentages found: {', '.join(matches)}"
    return "No percentages found in the text."


@tool
def analyze_sentiment(text: str) -> str:
    """
    Analyze the overall sentiment and tone of the document.
    Use this to understand if a report is positive, negative, or neutral.
    """
    positive_words = ['success', 'exceeded', 'growth', 'improved', 'excellent', 'strong', 'achieved']
    negative_words = ['failed', 'declined', 'below', 'poor', 'missed', 'weak', 'challenge']
    
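    text_lower = text.lower()
    # Count how many distinct indicator words appear at least once.
    # Note: `in` does substring matching, so "weak" also matches "tweak"
    # and "challenge" also matches "challenges".
    pos_count = sum(1 for word in positive_words if word in text_lower)
    neg_count = sum(1 for word in negative_words if word in text_lower)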
    
    if pos_count > neg_count:
        sentiment = "positive"
    elif neg_count > pos_count:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    
    return f"Document sentiment: {sentiment} (positive indicators: {pos_count}, negative indicators: {neg_count})"

In [None]:
# Create the document analysis agent
tools = [extract_dates, extract_monetary_values, extract_percentages, analyze_sentiment]

agent_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a document analysis assistant specialized in advertising campaign documents.
Your job is to extract and analyze information from documents using the available tools.

When analyzing a document:
1. Use the appropriate tools to extract specific information
2. Synthesize the extracted information into a coherent summary
3. Highlight the most important findings

Always use tools when relevant information can be extracted."""),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

# Create the agent
agent = create_tool_calling_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [None]:
# Test the document analysis agent
analysis_document = """
CAMPAIGN PERFORMANCE REPORT - Q4 2024

Campaign Period: October 1, 2024 - December 31, 2024
Total Investment: EUR 180,000

Key Results:
- Reach achieved: 52.3% (target was 50%)
- Average frequency: 4.1
- Campaign GRP: 214.4

The campaign showed strong performance, exceeding the reach target by 2.3 percentage points.
Digital channels achieved 38% reach while TV contributed 45% reach with some overlap.

Budget utilization was at 92%, with the remaining 8% to be reallocated to Q1 2025.
Overall, the campaign achieved its primary objectives of brand awareness growth.
"""

print("=" * 60)
print("DOCUMENT ANALYSIS AGENT")
print("=" * 60)

result = agent_executor.invoke({
    "input": f"""Analyze this campaign report and extract all key information:

{analysis_document}

Provide a summary of:
1. All dates found
2. All monetary values
3. All percentages
4. The overall sentiment of the report"""
})

print("\n" + "=" * 60)
print("ANALYSIS RESULT:")
print("=" * 60)
print(result["output"])

### Exercise 2.2: Extend the Document Analysis Agent

Add a new tool to the document analysis agent.

Requirements:
- Create a new @tool function
- The tool should extract or analyze a specific type of information
- Integrate it with the existing agent
- Test with a sample document

In [None]:
# Exercise 2.2: Your solution here

# TODO: Create a new tool
# @tool
# def your_new_tool(text: str) -> str:
#     """
#     Description of what this tool does.
#     """
#     # Your implementation
#     pass

# TODO: Add the tool to the agent and test
# extended_tools = tools + [your_new_tool]
# extended_agent = create_tool_calling_agent(llm, extended_tools, agent_prompt)
# extended_executor = AgentExecutor(agent=extended_agent, tools=extended_tools, verbose=True)

# See solutions.py for reference implementation

---
## Application 3: Report Generation Agent

This agent generates reports by calling tools that retrieve campaign data, calculate KPIs, and compare campaigns, then synthesizing the results into a structured report.
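
Before handing the workflow to an agent, it helps to know what the numbers should look like. The sketch below runs the same calculations directly in plain Python, using figures from the simulated campaign records in this section. The helper names `cpm` and `cost_per_reach_point` are illustrative and separate from the agent's tools.

```python
# Figures copied from the simulated Q4_2024 and Q3_2024 campaign records
q4 = {"spent": 185_000, "impressions": 6_200_000, "reach": 52.3}
q3 = {"spent": 148_500, "impressions": 5_100_000, "reach": 45.2}


def cpm(record: dict) -> float:
    """Cost per thousand impressions."""
    return record["spent"] / record["impressions"] * 1000


def cost_per_reach_point(record: dict) -> float:
    """Spend divided by reach, in percentage points."""
    return record["spent"] / record["reach"]


for label, record in (("Q4_2024", q4), ("Q3_2024", q3)):
    print(
        f"{label}: CPM {cpm(record):.2f}, "
        f"cost per reach point {cost_per_reach_point(record):.2f}"
    )

print(f"Reach difference (Q4 - Q3): {q4['reach'] - q3['reach']:.1f} points")
```

Comparing these values with the agent's final report is a quick way to spot a tool call made with the wrong arguments.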

In [None]:
# Define tools for report generation

# Simulated database of campaign data
CAMPAIGN_DATABASE = {
    "Q4_2024": {
        "name": "Q4 2024 Brand Campaign",
        "client": "TechCorp",
        "period": {"start": "2024-10-01", "end": "2024-12-31"},
        "budget": {"planned": 200000, "spent": 185000, "currency": "EUR"},
        "performance": {
            "reach": 52.3,
            "reach_target": 50.0,
            "frequency": 4.1,
            "impressions": 6200000,
            "grp": 214.4
        }
    },
    "Q3_2024": {
        "name": "Q3 2024 Summer Campaign",
        "client": "TechCorp",
        "period": {"start": "2024-07-01", "end": "2024-09-30"},
        "budget": {"planned": 150000, "spent": 148500, "currency": "EUR"},
        "performance": {
            "reach": 45.2,
            "reach_target": 48.0,
            "frequency": 3.8,
            "impressions": 5100000,
            "grp": 171.8
        }
    }
}


@tool
def get_campaign_data(campaign_id: str) -> str:
    """
    Retrieve campaign data from the database.
    Use this to get performance metrics, budget information, and other campaign details.
    Available campaigns: Q4_2024, Q3_2024
    """
    if campaign_id in CAMPAIGN_DATABASE:
        data = CAMPAIGN_DATABASE[campaign_id]
        return json.dumps(data, indent=2)
    return f"Campaign '{campaign_id}' not found. Available: {list(CAMPAIGN_DATABASE.keys())}"


@tool
def calculate_campaign_kpis(impressions: int, budget_spent: float, reach: float) -> str:
    """
    Calculate key performance indicators for a campaign.
    Provide impressions (total), budget_spent (in currency), and reach (percentage).
    Returns CPM and cost per reach point.
    """
    if impressions <= 0 or budget_spent <= 0 or reach <= 0:
        return "Error: All values must be positive numbers."
    
    cpm = (budget_spent / impressions) * 1000
    cost_per_reach_point = budget_spent / reach
    
    return f"""Campaign KPIs:
- CPM (Cost per Mille): {cpm:.2f}
- Cost per Reach Point: {cost_per_reach_point:.2f}
- Impressions per currency unit: {impressions / budget_spent:.1f}"""


@tool
def compare_campaigns(campaign_id_1: str, campaign_id_2: str) -> str:
    """
    Compare two campaigns and provide a performance comparison.
    Use this when asked to compare campaigns or analyze trends.
    """
    if campaign_id_1 not in CAMPAIGN_DATABASE or campaign_id_2 not in CAMPAIGN_DATABASE:
        return f"One or both campaigns not found. Available: {list(CAMPAIGN_DATABASE.keys())}"
    
    c1 = CAMPAIGN_DATABASE[campaign_id_1]
    c2 = CAMPAIGN_DATABASE[campaign_id_2]
    
    comparison = {
        "campaigns": [campaign_id_1, campaign_id_2],
        "reach_comparison": {
            campaign_id_1: c1["performance"]["reach"],
            campaign_id_2: c2["performance"]["reach"],
            "difference": c1["performance"]["reach"] - c2["performance"]["reach"]
        },
        "frequency_comparison": {
            campaign_id_1: c1["performance"]["frequency"],
            campaign_id_2: c2["performance"]["frequency"],
            "difference": c1["performance"]["frequency"] - c2["performance"]["frequency"]
        },
        "budget_utilization": {
            campaign_id_1: f"{c1['budget']['spent'] / c1['budget']['planned'] * 100:.1f}%",
            campaign_id_2: f"{c2['budget']['spent'] / c2['budget']['planned'] * 100:.1f}%"
        }
    }
    
    return json.dumps(comparison, indent=2)

In [None]:
# Create the report generation agent
report_tools = [
    get_campaign_data, 
    calculate_campaign_kpis, 
    compare_campaigns
]

report_agent_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a professional report generation assistant for advertising campaigns.

Your capabilities:
1. Retrieve campaign data from the database
2. Calculate performance KPIs
3. Compare campaigns

When generating reports:
- Always start by retrieving the relevant campaign data
- Calculate KPIs when performance analysis is needed
- Use compare_campaigns when asked about trends or comparisons
- Structure the output professionally

Be thorough but concise in your analysis."""),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

report_agent = create_tool_calling_agent(llm, report_tools, report_agent_prompt)
report_executor = AgentExecutor(agent=report_agent, tools=report_tools, verbose=True)

In [None]:
# Test the report generation agent
print("=" * 60)
print("REPORT GENERATION AGENT")
print("=" * 60)

result = report_executor.invoke({
    "input": """Generate a performance report for Q4_2024 campaign that includes:
1. Campaign overview
2. Key performance metrics and KPIs
3. Comparison with Q3_2024 campaign
4. Brief recommendations for future campaigns"""
})

print("\n" + "=" * 60)
print("GENERATED REPORT:")
print("=" * 60)
print(result["output"])

### Exercise 3.2: Build a Complete Report Generation System

Create an enhanced report generation agent with additional tools.

Requirements:
- Add at least 2 new tools (e.g., trend analysis, forecast, visualization suggestions)
- Create a comprehensive report for a given campaign
- Include executive summary and detailed analysis sections

In [None]:
# Exercise 3.2: Your solution here

# TODO: Create new tools for enhanced reporting
# @tool
# def analyze_trend(...) -> str:
#     """..."""
#     pass

# @tool
# def generate_forecast(...) -> str:
#     """..."""
#     pass

# TODO: Create enhanced agent with new tools
# enhanced_report_tools = report_tools + [analyze_trend, generate_forecast]
# enhanced_report_agent = create_tool_calling_agent(llm, enhanced_report_tools, report_agent_prompt)
# enhanced_report_executor = AgentExecutor(agent=enhanced_report_agent, tools=enhanced_report_tools, verbose=True)

# TODO: Test with a comprehensive report request

# See solutions.py for reference implementation

---
## Summary

In this notebook, you learned to build three fundamental LLM applications:

### Part 1: Direct LLM Integration
1. **Conversational Assistants**: Managing context and conversation history
2. **Document Analysis**: Extracting structured data with schema validation
3. **Report Generation**: Creating professional documents from data

### Part 2: LangChain Framework
1. **Conversational Chains**: Using LangChain's memory management
2. **Tool-Using Agents**: Creating autonomous agents with specialized tools
3. **Multi-Step Workflows**: Combining tools for complex report generation

### Key Takeaways
- LangChain provides higher-level abstractions that simplify agent development
- Tools extend agent capabilities beyond pure text generation
- Proper prompt engineering is essential for both approaches
- Schema validation ensures reliable structured output
- Template-based approaches provide consistency in generated content
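
As a reminder of what schema validation involves, here is a minimal standard-library sketch that parses model output as JSON and checks required fields and types. The schema, field names, and `validate_output` are hypothetical; the validation approach used in Part 1 may differ.

```python
import json

# Hypothetical schema: field name -> expected Python type(s)
SCHEMA = {"campaign_name": str, "reach_percentage": (int, float), "channels": list}


def validate_output(raw: str, schema: dict) -> dict:
    """Parse model output as JSON and check required fields and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"Missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"Wrong type for {field}: {type(data[field]).__name__}")
    return data


raw_output = (
    '{"campaign_name": "Q4 2024 Brand Launch", '
    '"reach_percentage": 52.3, "channels": ["tv", "digital"]}'
)
print(validate_output(raw_output, SCHEMA)["reach_percentage"])
```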

### Next Steps
- Complete the exercises to reinforce your understanding
- Experiment with different LLM providers and models
- Explore LangChain documentation for additional features
- Consider adding error handling and monitoring for production use
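
For the error-handling point, one common pattern is to retry failed API calls with exponential backoff. The sketch below is a generic wrapper under that assumption. `call_with_retry` and its parameters are hypothetical, and a production version would usually catch only the provider's transient error types and not every exception.

```python
import time


def call_with_retry(func, *args, max_attempts: int = 3, base_delay: float = 1.0, **kwargs):
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)


# Demo with a function that fails twice before succeeding
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


print(call_with_retry(flaky_call, base_delay=0.0))
```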

---
## Solutions

All exercise solutions are available in `solutions.py`. To check your answers:

```python
from solutions import (
    # Exercise 1.1
    exercise_1_1_solution,
    
    # Exercise 2.1
    exercise_2_1_solution,
    
    # Exercise 2.2
    exercise_2_2_solution,
    
    # Exercise 3.1
    exercise_3_1_solution,
    
    # Exercise 3.2
    exercise_3_2_solution
)
```