# Groq + Parallel Web Search: Fast AI Research with Real-Time Data

## What You'll Learn

In this notebook, you'll discover how to combine:
- **Groq's fast AI inference** (1000+ tokens/second, well suited to LLM function calling and tool use)
- **Parallel's web search tool** (live web searches and data via Parallel's Model Context Protocol (MCP) server)

Together, they enable fast and accurate AI research on current events, product launches, and real-time information.

## Why This Approach Works

Traditional LLMs have a knowledge cutoff and can't access current information. Web search tools are often slow and sequential. This demo shows how to get:

- **Speed**: Groq delivers responses in seconds, not minutes  
- **Accuracy**: Live web data ensures up-to-date information  
- **Efficiency**: Parallel searches happen simultaneously  
- **Transparency**: See exactly what sources were used  

## Prerequisites

Before we start, you'll need:
1. **Groq API Key** - Get yours at [console.groq.com](https://console.groq.com)
2. **Parallel API Key** - Get yours at [platform.parallel.ai](https://platform.parallel.ai)

---


## Setup: Import Libraries & Configure API Keys

First, let's import the necessary libraries and set up our API keys.


In [None]:
#!/usr/bin/env python3
"""
Groq + Parallel Web Search Demo

Fast LLM with real-time web search via MCP (Model Context Protocol)
"""

import json
import os
import time
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
import seaborn as sns

# The OpenAI client serves both providers: Groq exposes an OpenAI-compatible endpoint
from openai import OpenAI
from openai.types import responses as openai_responses

# Initialize rich console for pretty printing
console = Console()

# Set up matplotlib style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# API configuration: read keys from environment variables (or paste them here)
GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "")
PARALLEL_API_KEY = os.environ.get("PARALLEL_API_KEY", "")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")  # Optional, for performance comparison

# Check if API keys are set
if not GROQ_API_KEY:
    print("Please set your Groq API key:")
    print("   export GROQ_API_KEY='your_key_here'")
    
if not PARALLEL_API_KEY:
    print("Please set your Parallel API key:")
    print("   export PARALLEL_API_KEY='your_key_here'")

if GROQ_API_KEY and PARALLEL_API_KEY:
    print("API keys configured successfully!")
    
# Model configuration
MODEL = "openai/gpt-oss-120b"  # Using OpenAI's GPT OSS 120B model via Groq API
TEMPERATURE = 0.0

API keys configured successfully!


## Core Function: Web Research with Groq's Responses API 

In [2]:
class ResearchCompany:
    """
    A flexible research company that can work with different LLM clients
    and compare their performance on research tasks.
    """
    
    def __init__(self):
        self.clients = {}
        self.results_history = []
    
    def add_client(self, name, client_config):
        """
        Add a client to the research company
        
        Args:
            name (str): Name for the client (e.g., "groq_local", "openai", "groq_cloud")
            client_config (dict): Configuration for the client including:
                - api_key: API key
                - base_url: Base URL for the API
                - model: Model to use
                - provider: Provider type ("groq" or "openai")
        """
        self.clients[name] = client_config
        print(f"✅ Added {name} client with model {client_config['model']}")
    
    def research_with_client(self, client_name, company, use_mcp=True):
        """
        Research a company using a specific client
        
        Args:
            client_name (str): Name of the client to use
            company (str): Company to research
            use_mcp (bool): Whether to use MCP tools for web search
            
        Returns:
            dict: Research results with timing and performance data
        """
        if client_name not in self.clients:
            raise ValueError(f"Client '{client_name}' not found. Available: {list(self.clients.keys())}")
        
        config = self.clients[client_name]
        
        # Create client based on configuration
        client = OpenAI(
            api_key=config['api_key'],
            base_url=config['base_url']
        )
        
        print(f"🔍 Researching {company} using {client_name}...")
        print(f"   Provider: {config['provider']}")
        print(f"   Model: {config['model']}")
        print(f"   Base URL: {config['base_url']}")
        
        start_time = time.time()
        
        # Configure tools if MCP is enabled
        tools = []
        if use_mcp and PARALLEL_API_KEY:
            tools = [
                openai_responses.tool_param.Mcp(
                    server_label="parallel_web_search",
                    server_url="https://mcp.parallel.ai/v1beta/search_mcp/",
                    headers={"x-api-key": PARALLEL_API_KEY},
                    type="mcp",
                    require_approval="never",
                )
            ]
        
        # Make the request
        try:
            if tools:
                response = client.responses.create(
                    model=config['model'],
                    input=f"You are a research assistant who writes comprehensive answers in Markdown and provides citations whenever possible. Use parallel-web search to find current information, and do only a single search. Focus on recent information and provide specific details with sources.\n\nWhat does {company} do? Also, find recent product launches from them in the past year.",
                    tools=tools,
                    tool_choice="required",
                )
            else:
                # Fallback to basic chat completion if no MCP tools
                response = client.chat.completions.create(
                    model=config['model'],
                    messages=[{
                        "role": "user", 
                        "content": f"What does {company} do? Provide information about recent product launches from them in the past year."
                    }],
                    temperature=TEMPERATURE
                )
        except Exception as e:
            error_time = time.time() - start_time
            return {
                "client_name": client_name,
                "provider": config['provider'],
                "model": config['model'],
                "company": company,
                "success": False,
                "error": str(e),
                "response_time": error_time,
                "timestamp": datetime.now().isoformat()
            }
        
        total_time = time.time() - start_time
        
        # Extract content based on response type
        if hasattr(response, 'output_text'):
            # Responses API format
            content = response.output_text
            executed_tools = []
            
            # Extract MCP calls from response.output
            for output_item in response.output:
                if output_item.type == "mcp_call":
                    executed_tools.append({
                        "type": "mcp",
                        "arguments": output_item.arguments,
                        "output": output_item.output,
                        "name": output_item.name,
                        "server_label": output_item.server_label
                    })
            
            # Try to get token usage from responses API
            tokens_used = getattr(response, 'usage', None)
            if tokens_used:
                prompt_tokens = getattr(tokens_used, 'input_tokens', 0)
                completion_tokens = getattr(tokens_used, 'output_tokens', 0)
                total_tokens = prompt_tokens + completion_tokens
            else:
                # Estimate tokens from content length (rough approximation: 4 chars = 1 token)
                completion_tokens = len(content) // 4 if content else 0
                prompt_tokens = 0
                total_tokens = completion_tokens
        else:
            # Chat completions format
            content = response.choices[0].message.content
            executed_tools = []
            
            # Get token usage from chat completions
            usage = getattr(response, 'usage', None)
            if usage:
                prompt_tokens = usage.prompt_tokens
                completion_tokens = usage.completion_tokens
                total_tokens = usage.total_tokens
            else:
                # Estimate tokens from content length
                completion_tokens = len(content) // 4 if content else 0
                prompt_tokens = 0
                total_tokens = completion_tokens
        
        # Create result object
        result = {
            "client_name": client_name,
            "provider": config['provider'],
            "model": config['model'],
            "company": company,
            "success": True,
            "content": content,
            "response_time": total_time,
            "mcp_calls": len(executed_tools),
            "executed_tools": executed_tools,
            "timestamp": datetime.now().isoformat(),
            "content_length": len(content) if content else 0,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": total_tokens
        }
        
        # Store in history
        self.results_history.append(result)
        
        return result
    
    def compare_clients(self, company, client_names=None, use_mcp=True):
        """
        Compare multiple clients on the same research task
        
        Args:
            company (str): Company to research
            client_names (list): List of client names to compare (None = all clients)
            use_mcp (bool): Whether to use MCP tools
            
        Returns:
            dict: Comparison results with performance metrics
        """
        if client_names is None:
            client_names = list(self.clients.keys())
        
        print(f"⚖️  COMPARING CLIENTS: {', '.join(client_names)}")
        print("=" * 60)
        
        results = []
        
        for client_name in client_names:
            console.print(f"\n🧪 Testing [bold cyan]{client_name}[/bold cyan]...")
            result = self.research_with_client(client_name, company, use_mcp)
            results.append(result)
            
            if result['success']:
                console.print(f"   ✅ Success in [green]{result['response_time']:.2f}s[/green]")
                console.print(f"   📊 Content: {result['content_length']} chars")
                console.print(f"   🔍 MCP calls: {result['mcp_calls']}")
            else:
                console.print(f"   ❌ Failed: {result['error']}")
        
        # Calculate comparison metrics
        successful_results = [r for r in results if r['success']]
        
        if successful_results:
            fastest = min(successful_results, key=lambda x: x['response_time'])
            slowest = max(successful_results, key=lambda x: x['response_time'])
            
            # Create beautiful comparison table
            self._display_comparison_table(successful_results)
            
            
            if len(successful_results) > 1:
                speed_improvement = (slowest['response_time'] - fastest['response_time']) / slowest['response_time'] * 100
                console.print(f"   📊 Speed improvement: [cyan]{speed_improvement:.1f}%[/cyan]")
        
        return {
            "company": company,
            "results": results,
            "comparison_time": datetime.now().isoformat(),
            "fastest": fastest['client_name'] if successful_results else None,
            "slowest": slowest['client_name'] if successful_results else None
        }
    
    def display_result(self, result):
        """Display a single research result in a formatted way"""
        print("=" * 80)
        print("RESEARCH RESULTS")
        print("=" * 80)
        print(f"Client: {result['client_name']} ({result['provider']})")
        print(f"Model: {result['model']}")
        print(f"Company: {result['company']}")
        print(f"Response Time: {result['response_time']:.2f}s")
        print(f"MCP Calls: {result['mcp_calls']}")
        print("=" * 80)
        
        if result['success']:
            print(result['content'])
        else:
            print(f"❌ Error: {result['error']}")
        
        print("=" * 80)
        
        # Show MCP tool details if available
        if result['executed_tools']:
            print(f"\nSEARCH DETAILS: Found {len(result['executed_tools'])} parallel web searches:")
            print("-" * 50)
            
            for i, tool in enumerate(result['executed_tools'], 1):
                print(f"\nSearch #{i}")
                print(f"   Type: {tool['type']}")
                print(f"   Tool Name: {tool['name']}")
                print(f"   Server: {tool['server_label']}")
                try:
                    args = json.loads(tool['arguments'])
                    print(f"   Arguments: {args}")

                    # Print URLs from search results for citation transparency
                    if tool['output']:
                        output_data = json.loads(tool['output'])
                        if "results" in output_data:
                            sources = output_data["results"]
                            print(f"   Sources found: {len(sources)} URLs")
                            for j, result_item in enumerate(sources[:5], 1):  # Show top 5
                                print(f"      {j}. {result_item.get('url', 'No URL')}")
                            # Only report the remainder when a results list actually exists
                            if len(sources) > 5:
                                print(f"      ... and {len(sources) - 5} more sources")
                except Exception as e:
                    print(f"   Could not parse tool data: {e}")
    
    def _display_comparison_table(self, results):
        """Display a beautiful comparison table using Rich"""
        table = Table(title="🏁 Performance Comparison", show_header=True, header_style="bold magenta")
        
        table.add_column("Client", style="cyan", width=15)
        table.add_column("Provider", style="blue", width=10)
        table.add_column("Model", style="green", width=20)
        table.add_column("Response Time", justify="right", style="yellow", width=12)
        table.add_column("Total Tokens", justify="right", style="magenta", width=12)
        table.add_column("MCP Calls", justify="right", style="cyan", width=10)
        
        # Sort by response time (fastest first)
        sorted_results = sorted(results, key=lambda x: x['response_time'])
        
        for i, result in enumerate(sorted_results):
            # Add rank emoji
            rank = "🥇" if i == 0 else "🥈" if i == 1 else "🥉" if i == 2 else f"{i+1}."
            
            client_name = f"{rank} {result['client_name']}"
            response_time = f"{result['response_time']:.2f}s"
            total_tokens = f"{result['total_tokens']:,}"
            mcp_calls = str(result['mcp_calls'])
            
            table.add_row(
                client_name,
                result['provider'],
                result['model'],
                response_time,
                total_tokens,
                mcp_calls
            )
        
        console.print(table)
    
    def _create_performance_charts(self, results, company_name):
        """Create performance visualization charts"""
        if len(results) < 2:
            return
            
        # Prepare data
        client_names = [r['client_name'] for r in results]
        response_times = [r['response_time'] for r in results]
        total_tokens = [r['total_tokens'] for r in results]
        
        # Create subplots
        fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
        fig.suptitle(f'Performance Comparison: {company_name}', fontsize=16, fontweight='bold')
        
        # 1. Response Time Bar Chart
        bars1 = ax1.bar(client_names, response_times, color=sns.color_palette("viridis", len(results)))
        ax1.set_title('Response Time Comparison', fontweight='bold')
        ax1.set_ylabel('Time (seconds)')
        ax1.tick_params(axis='x', rotation=45)
        
        # Add value labels on bars
        for bar, elapsed in zip(bars1, response_times):  # 'elapsed' avoids shadowing the time module
            height = bar.get_height()
            ax1.text(bar.get_x() + bar.get_width()/2., height + 0.1,
                    f'{elapsed:.2f}s', ha='center', va='bottom', fontweight='bold')
        
        # 2. Total Tokens Bar Chart
        bars2 = ax2.bar(client_names, total_tokens, color=sns.color_palette("plasma", len(results)))
        ax2.set_title('Total Tokens Used', fontweight='bold')
        ax2.set_ylabel('Total Tokens')
        ax2.tick_params(axis='x', rotation=45)
        
        # Add value labels on bars
        for bar, tokens in zip(bars2, total_tokens):
            height = bar.get_height()
            ax2.text(bar.get_x() + bar.get_width()/2., height + max(total_tokens)*0.01,
                    f'{tokens:,}', ha='center', va='bottom', fontweight='bold')
        
        # 3. Content Length Bar Chart
        content_lengths = [r['content_length'] for r in results]
        colors = sns.color_palette("crest", len(results))
        bars3 = ax3.bar(client_names, content_lengths, color=colors)
        ax3.set_title('Content Length Comparison', fontweight='bold')
        ax3.set_ylabel('Characters')
        ax3.tick_params(axis='x', rotation=45)
        
        # Add value labels on bars
        for bar, length in zip(bars3, content_lengths):
            height = bar.get_height()
            ax3.text(bar.get_x() + bar.get_width()/2., height + max(content_lengths)*0.01,
                    f'{length:,}', ha='center', va='bottom', fontweight='bold')
        
        # 4. Response Time vs Content Length Scatter Plot
        colors = sns.color_palette("husl", len(results))
        scatter = ax4.scatter(response_times, content_lengths, c=colors, s=200, alpha=0.7)
        
        # Add labels for each point
        for i, (x, y, name) in enumerate(zip(response_times, content_lengths, client_names)):
            ax4.annotate(name, (x, y), xytext=(5, 5), textcoords='offset points', 
                        fontsize=9, fontweight='bold')
        
        ax4.set_title('Response Time vs Content Length', fontweight='bold')
        ax4.set_xlabel('Response Time (seconds)')
        ax4.set_ylabel('Content Length (characters)')
        ax4.grid(True, alpha=0.3)
        
        # Add trend line if we have enough data points
        if len(results) >= 3:
            z = np.polyfit(response_times, content_lengths, 1)
            p = np.poly1d(z)
            ax4.plot(response_times, p(response_times), "r--", alpha=0.8, linewidth=2)
        
        plt.tight_layout()
        plt.show()
        
        # Create a summary panel
        best_overall = min(results, key=lambda x: x['response_time'])
        
        summary_text = f"""
        🏆 [bold green]Best Overall Performance[/bold green]: {best_overall['client_name']}
           • Response Time: {best_overall['response_time']:.2f}s
           • Total Tokens: {best_overall['total_tokens']:,}
           • Content Length: {best_overall['content_length']:,} chars
        """
        
        console.print(Panel(summary_text, title="📊 Performance Insights", border_style="bright_blue"))

# Create a global research company instance
research_company = ResearchCompany()
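
The source-listing logic inside `display_result` can also be pulled out into a standalone helper, which makes it easier to reuse and test. The sketch below is a minimal version under one assumption: that the MCP tool output is a JSON object with a `results` list whose items may carry a `url` field, which is the shape `display_result` already expects. The name `extract_source_urls` and the sample payload are hypothetical and not part of either API.

```python
import json


def extract_source_urls(tool_output, limit=5):
    """Return up to `limit` URLs from an MCP search tool's JSON output.

    Assumes a JSON object with a "results" list whose items may have a "url"
    field (the shape display_result parses). Anything else yields an empty list.
    """
    if not tool_output:
        return []
    try:
        data = json.loads(tool_output)
    except (TypeError, json.JSONDecodeError):
        return []
    results = data.get("results", []) if isinstance(data, dict) else []
    urls = [
        item["url"]
        for item in results
        if isinstance(item, dict) and item.get("url")
    ]
    return urls[:limit]


# Hypothetical payload for illustration only
sample_output = json.dumps({
    "results": [
        {"url": "https://example.com/a", "title": "A"},
        {"title": "entry without a URL"},
        {"url": "https://example.com/b"},
    ]
})
print(extract_source_urls(sample_output))
```

Returning an empty list on malformed input keeps the display code free of nested `try` blocks; whether silent fallback is appropriate depends on how much you need to know about parse failures.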

## Set Up Different Clients for Comparison

Now let's configure different LLM clients that we can use with our research company. This allows us to compare OpenAI vs Groq performance and latency on the same research tasks.


In [3]:
# Configure different clients for comparison

# Groq via cloud API
research_company.add_client("groq_cloud", {
    "api_key": GROQ_API_KEY,
    "base_url": "https://api.groq.com/openai/v1",
    "model": "openai/gpt-oss-120b",  # Popular Groq model
    "provider": "groq"
})

# OpenAI GPT-5 (for comparison)
if OPENAI_API_KEY:
    research_company.add_client("openai_gpt5", {
        "api_key": OPENAI_API_KEY,
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-5",
        "provider": "openai"
    })


print(f"\n🎯 Research company configured with {len(research_company.clients)} clients:")
for name, config in research_company.clients.items():
    print(f"   • {name}: {config['provider']} - {config['model']}")


✅ Added groq_cloud client with model openai/gpt-oss-120b
✅ Added openai_gpt5 client with model gpt-5

🎯 Research company configured with 2 clients:
   • groq_cloud: groq - openai/gpt-oss-120b
   • openai_gpt5: openai - gpt-5


## Demo 1: Research a Company with a Specific Client

Let's test our research company with a specific client to see how it performs.


In [4]:
# Demo 1: Test with the Groq cloud client
print("🚀 DEMO 1: Research with Groq Client")
print("=" * 50)

company_to_research = "Parallel Web Systems"
result = research_company.research_with_client("groq_cloud", company_to_research, use_mcp=True)

# Display the result
research_company.display_result(result)


🚀 DEMO 1: Research with Groq Client
🔍 Researching Parallel Web Systems using groq_cloud...
   Provider: groq
   Model: openai/gpt-oss-120b
   Base URL: https://api.groq.com/openai/v1
RESEARCH RESULTS
Client: groq_cloud (groq)
Model: openai/gpt-oss-120b
Company: Parallel Web Systems
Response Time: 10.48s
MCP Calls: 1
## Parallel Web Systems – What the company does  

Parallel Web Systems (often shortened to **Parallel**) is a Palo‑Alto‑based AI‑infrastructure startup founded in 2023 by former Twitter CEO Parag Agrawal. Its mission is to **re‑engineer the open web for artificial‑intelligence agents** – the “second user” of the internet – by providing **high‑accuracy, enterprise‑grade APIs that let AI models retrieve, rank, reason over, and synthesize web data at scale**【5†L1-L4】【31†L1-L4】.  

Key capabilities  

| Capability | How it works | Why it matters |
|------------|--------------|----------------|
| **Deep Research API** – a structured web‑search and data‑extraction service built 

## Demo 2: Compare OpenAI vs Groq Latency

Now let's compare the latency and performance between different providers on the same research task. This will show you the speed differences between OpenAI and Groq.


In [5]:
# Demo 2: Compare all available clients
print("⚡ DEMO 2: OpenAI vs Groq Latency Comparison")
print("=" * 60)

company_to_research = "Anthropic"

# Compare all clients (or specify specific ones)
comparison_result = research_company.compare_clients(
    company_to_research, 
    client_names=None,  # None = all clients, or specify like ["groq_cloud", "openai_gpt5"]
    use_mcp=True
)

# Show detailed results for each client
print(f"\n📊 DETAILED LATENCY BREAKDOWN:")
print("-" * 40)

successful_results = [r for r in comparison_result['results'] if r['success']]
if successful_results:
    # Sort by response time
    sorted_results = sorted(successful_results, key=lambda x: x['response_time'])
    
    for i, result in enumerate(sorted_results, 1):
        print(f"{i}. {result['client_name']} ({result['provider']})")
        print(f"   Model: {result['model']}")
        print(f"   Response Time: {result['response_time']:.2f}s")
        print(f"   Content Length: {result['content_length']} chars")
        print(f"   MCP Calls: {result['mcp_calls']}")
        
        # Calculate characters per second
        if result['content_length'] > 0:
            chars_per_sec = result['content_length'] / result['response_time']
            print(f"   Chars/sec: {chars_per_sec:.1f}")
        print()

# Save comparison results to JSON for later analysis
comparison_filename = f"latency_comparison_{company_to_research.lower().replace(' ', '_')}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
with open(comparison_filename, 'w') as f:
    json.dump(comparison_result, f, indent=2, default=str)
    
print(f"💾 Detailed comparison saved to {comparison_filename}")

⚡ DEMO 2: OpenAI vs Groq Latency Comparison
⚖️  COMPARING CLIENTS: groq_cloud, openai_gpt5


🔍 Researching Anthropic using groq_cloud...
   Provider: groq
   Model: openai/gpt-oss-120b
   Base URL: https://api.groq.com/openai/v1


🔍 Researching Anthropic using openai_gpt5...
   Provider: openai
   Model: gpt-5
   Base URL: https://api.openai.com/v1



📊 DETAILED LATENCY BREAKDOWN:
----------------------------------------
1. groq_cloud (groq)
   Model: openai/gpt-oss-120b
   Response Time: 11.15s
   Content Length: 5263 chars
   MCP Calls: 1
   Chars/sec: 472.2

2. openai_gpt5 (openai)
   Model: gpt-5
   Response Time: 88.38s
   Content Length: 3711 chars
   MCP Calls: 1
   Chars/sec: 42.0

💾 Detailed comparison saved to latency_comparison_anthropic_20250918_164358.json
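
To put the two response times above into perspective, the following sketch works through the arithmetic. It uses the same percentage formula as `compare_clients` and adds a plain speedup ratio; the helper name `speed_metrics` is hypothetical, and the numbers are the ones printed in this single run, so they should be read as one sample rather than a benchmark.

```python
def speed_metrics(fast_seconds, slow_seconds):
    """Return (speedup factor, percent of time saved) for two response times."""
    speedup = slow_seconds / fast_seconds
    percent_saved = (slow_seconds - fast_seconds) / slow_seconds * 100
    return speedup, percent_saved


# Response times copied from the run shown above
groq_time, openai_time = 11.15, 88.38
speedup, percent_saved = speed_metrics(groq_time, openai_time)
print(f"Speedup: {speedup:.1f}x, time saved: {percent_saved:.1f}%")
# → Speedup: 7.9x, time saved: 87.4%
```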


## Next Steps & Advanced Usage

### Production Integration

```python
# Example: Batch research multiple companies
companies_to_research = ["OpenAI", "Anthropic", "Google", "Microsoft"]
all_results = {}

for company in companies_to_research:
    print(f"Researching {company}...")
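    # research_company is an instance, so call its research method explicitly
    all_results[company] = research_company.research_with_client("groq_cloud", company)

# all_results now maps each company name to its research result dict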
```
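
The batch loop above handles one company at a time. Because each request spends most of its time waiting on the network, a thread pool may reduce total wall-clock time. The sketch below is one way to do that with the standard library; `research_many` and `fake_research` are hypothetical names, and the stand-in function replaces the real API call so the example runs offline. In the notebook you might pass `lambda c: research_company.research_with_client("groq_cloud", c)` instead, keeping provider rate limits in mind when choosing `max_workers`.

```python
from concurrent.futures import ThreadPoolExecutor


def research_many(companies, research_fn, max_workers=4):
    """Run research_fn over companies concurrently; return {company: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each company with its result
        return dict(zip(companies, pool.map(research_fn, companies)))


def fake_research(company):
    """Stand-in for a real research call (assumption: returns a result dict)."""
    return {"company": company, "success": True}


batch = research_many(["OpenAI", "Anthropic", "Google", "Microsoft"], fake_research)
print(list(batch))
```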

### Customization Options

You can modify the research function for different use cases:

- **Financial Analysis**: Ask about quarterly results, stock performance, market position
- **Technology Research**: Focus on patents, R&D, technical capabilities  
- **Competitive Intelligence**: Compare multiple companies side-by-side
- **News Monitoring**: Track recent announcements, press releases, partnerships
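
One way to support these use cases is to build the prompt from a small table of focus areas. The sketch below is illustrative only: `RESEARCH_FOCUS` and `build_research_prompt` are hypothetical names, and `research_with_client` as written hard-codes its prompt, so you would need to adapt it to accept a prompt argument before using something like this.

```python
# Focus areas taken from the list above; keys are hypothetical labels
RESEARCH_FOCUS = {
    "financial": "quarterly results, stock performance, and market position",
    "technology": "patents, R&D, and technical capabilities",
    "competitive": "how it compares with its main competitors",
    "news": "recent announcements, press releases, and partnerships",
}


def build_research_prompt(company, focus="news"):
    """Return a research prompt for `company` tailored to one focus area."""
    if focus not in RESEARCH_FOCUS:
        raise ValueError(
            f"Unknown focus '{focus}'. Available: {sorted(RESEARCH_FOCUS)}"
        )
    return (
        "You are a research assistant who writes comprehensive answers in "
        "Markdown and provides citations whenever possible. "
        f"Research {company}, focusing on {RESEARCH_FOCUS[focus]}."
    )


print(build_research_prompt("Anthropic", focus="financial"))
```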

### Additional Resources

- **Groq Documentation**: [console.groq.com/docs](https://console.groq.com/docs)
- **Parallel API**: [docs.parallel.ai](https://docs.parallel.ai)
- **MCP**: [modelcontextprotocol.io](https://modelcontextprotocol.io/docs/getting-started/intro)

### Pro Tips

1. **Use streaming** (`stream=True`) to display output as it is generated
2. **Batch requests** for multiple companies to maximize efficiency  
3. **Cache results** for repeated queries to save API costs
4. **Customize search objectives** for domain-specific research needs
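
As an illustration of tip 3, the sketch below wraps a research function in a simple in-memory cache keyed by client name and company. `ResearchCache` and `fake_research` are hypothetical names, and the stand-in function replaces the real API call so the example runs offline. It has no expiry, which is a deliberate simplification: for research that depends on fresh data you would likely want to add a time limit on cached entries.

```python
class ResearchCache:
    """Cache successful research results per (client_name, company) pair."""

    def __init__(self, research_fn):
        self.research_fn = research_fn  # callable(client_name, company) -> dict
        self._store = {}
        self.hits = 0

    def get(self, client_name, company):
        # Normalize the company name so "Anthropic" and "anthropic " share a key
        key = (client_name, company.strip().lower())
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self.research_fn(client_name, company)
        # Only cache successes so transient failures are retried next time
        if result.get("success"):
            self._store[key] = result
        return result


calls = []


def fake_research(client_name, company):
    """Stand-in for research_company.research_with_client (assumption)."""
    calls.append(company)
    return {"company": company, "success": True}


cache = ResearchCache(fake_research)
cache.get("groq_cloud", "Anthropic")
cache.get("groq_cloud", "anthropic ")  # served from the cache
print(len(calls), cache.hits)
```

In the notebook, passing `research_company.research_with_client` as `research_fn` should work because it accepts `(client_name, company)` positionally.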

---

## Summary

You've just experienced fast AI-powered research that combines:
- **Fast responses** (about 10-11 seconds per query on Groq in the runs above, versus roughly 88 seconds for the GPT-5 comparison)
- **Real-time web data** with current information
- **Source transparency** with full citation details

This approach enables you to build applications that need both speed and accuracy for real-time research tasks.
