# Search Engine Comparison Test

This notebook tests and compares various search engines available through LangChain.

**Tested Search Engines:**
1. **DuckDuckGo** - Privacy-focused, no API key required
2. **Tavily** - AI-optimized search API (requires API key)
3. **Brave Search** - Privacy-focused with API (requires API key)
4. **Google Search** - Traditional search (requires API key + CSE ID)
5. **Bing Search** - Microsoft search (requires API key)
6. **SearxNG** - Meta search engine (requires a self-hosted or public instance)

**Setup Requirements:**
```bash
pip install langchain-community duckduckgo-search tavily-python google-api-python-client
```

In [None]:
# Import required libraries
import os
import time
from typing import List, Dict, Any
from datetime import datetime
import json

# LangChain search utilities
from langchain_community.utilities import (
    DuckDuckGoSearchAPIWrapper,
    TavilySearchAPIWrapper,
    BraveSearchWrapper,
    GoogleSearchAPIWrapper,
    BingSearchAPIWrapper,
    SearxSearchWrapper
)

print("✓ Libraries imported successfully")

## Configuration

Set up API keys for search engines that require them.
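
As a minimal sketch of one way to read the keys without ever printing them, the hypothetical helper `load_search_keys` below maps each engine to its environment variable and reports only whether a value is present. The helper name and the `KEY_VARS` mapping are illustrative assumptions; the variable names are the ones this notebook reads.

```python
import os
from typing import Dict, Mapping, Optional


def load_search_keys(
    names: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> Dict[str, Optional[str]]:
    """Map engine name -> key value, or None when the variable is unset or blank."""
    source = os.environ if env is None else env
    keys: Dict[str, Optional[str]] = {}
    for engine, var_name in names.items():
        value = source.get(var_name, "").strip()
        keys[engine] = value or None
    return keys


# Hypothetical mapping; the variable names match the ones used in this notebook.
KEY_VARS = {
    "tavily": "TAVILY_API_KEY",
    "brave": "BRAVE_API_KEY",
    "google": "GOOGLE_API_KEY",
    "bing": "BING_SUBSCRIPTION_KEY",
}

available = load_search_keys(KEY_VARS)
for engine, key in available.items():
    # Report only whether a key is present, never the key itself.
    print(f"{engine}: {'set' if key else 'not set'}")
```

Passing `env` explicitly makes the helper easy to exercise with a plain dictionary instead of the real environment.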

In [None]:
# API Keys Configuration
# Set these as environment variables; do not hardcode real keys in the notebook

TAVILY_API_KEY = os.getenv('TAVILY_API_KEY', '')  # Get from your Tavily account
BRAVE_API_KEY = os.getenv('BRAVE_API_KEY', '')  # Get from https://brave.com/search/api/
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY', '')  # Get from Google Cloud Console
GOOGLE_CSE_ID = os.getenv('GOOGLE_CSE_ID', '')  # Custom Search Engine ID
BING_API_KEY = os.getenv('BING_SUBSCRIPTION_KEY', '')  # Get from Azure Portal
SEARX_HOST = os.getenv('SEARX_HOST', 'https://searx.be')  # Public SearxNG instance

# Test query
TEST_QUERY = "What are the latest developments in artificial intelligence 2025?"
MAX_RESULTS = 5

print(f"Test Query: {TEST_QUERY}")
print(f"Max Results: {MAX_RESULTS}")
print(f"\nAPI Keys Status:")
print(f"  Tavily: {'✓ Set' if TAVILY_API_KEY else '✗ Not set'}")
print(f"  Brave: {'✓ Set' if BRAVE_API_KEY else '✗ Not set (optional)'}")
print(f"  Google: {'✓ Set' if GOOGLE_API_KEY and GOOGLE_CSE_ID else '✗ Not set (optional)'}")
print(f"  Bing: {'✓ Set' if BING_API_KEY else '✗ Not set (optional)'}")
print(f"  SearxNG: Using {SEARX_HOST}")

## Helper Functions

In [None]:
def format_results(results: List[Dict], engine_name: str) -> None:
    """Pretty print search results"""
    print(f"\n{'='*80}")
    print(f"Results from {engine_name}")
    print(f"{'='*80}")
    
    if not results:
        print("No results found.")
        return
    
    for i, result in enumerate(results[:MAX_RESULTS], 1):
        print(f"\n{i}. {result.get('title', 'No title')}")
        print(f"   URL: {result.get('link', result.get('url', 'No URL'))}")
        snippet = result.get('snippet', result.get('content', result.get('description', 'No description')))
        print(f"   {snippet[:200]}..." if len(snippet) > 200 else f"   {snippet}")
    
    print(f"\n{'='*80}\n")

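def time_search(func, *args, **kwargs):
    """Time a search call; return (result, elapsed_seconds, error_message_or_None)"""
    # perf_counter is a monotonic clock intended for measuring short durations
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed, None
    except Exception as e:
        elapsed = time.perf_counter() - start
        return None, elapsed, str(e)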

print("✓ Helper functions defined")
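
`format_results` above copes with engines that name the same field differently (`link` vs `url`, `snippet` vs `content` vs `description`). A minimal sketch of pulling that mapping into a reusable step is shown below; `normalize_result` and `normalize_results` are hypothetical helpers, and the key names are simply the ones `format_results` already checks.

```python
from typing import Any, Dict, List


def normalize_result(raw: Dict[str, Any]) -> Dict[str, str]:
    """Map engine-specific keys onto a common title / url / snippet shape."""

    def first_present(*keys: str, default: str) -> str:
        for key in keys:
            value = raw.get(key)
            if value:  # skip missing keys, None, and empty strings
                return str(value)
        return default

    return {
        "title": first_present("title", default="No title"),
        "url": first_present("link", "url", default="No URL"),
        "snippet": first_present(
            "snippet", "content", "description", default="No description"
        ),
    }


def normalize_results(results: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Normalize every result in a list."""
    return [normalize_result(r) for r in results]


# Made-up sample records, not real engine output.
samples = [
    {"title": "A", "link": "https://a.example", "snippet": "first"},
    {"url": "https://b.example", "content": "second"},
]
for row in normalize_results(samples):
    print(row["title"], row["url"], row["snippet"])
```

Normalizing once up front keeps later steps, such as printing or comparing engines, free of per-engine key lookups.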

## 1. DuckDuckGo Search (No API Key Required)

DuckDuckGo is privacy-focused and doesn't require an API key, making it great for testing.

In [None]:
print("Testing DuckDuckGo Search...\n")

ddg_search = DuckDuckGoSearchAPIWrapper(max_results=MAX_RESULTS)
results, elapsed, error = time_search(ddg_search.results, TEST_QUERY, MAX_RESULTS)

if error:
    print(f"❌ DuckDuckGo Error: {error}")
else:
    print(f"✓ Completed in {elapsed:.2f}s")
    format_results(results, "DuckDuckGo")
    
    # Store for comparison
    ddg_results = results

## 2. Tavily Search (AI-Optimized)

Tavily is optimized for AI applications and LLM use cases.

In [None]:
if TAVILY_API_KEY:
    print("Testing Tavily Search...\n")
    
    tavily_search = TavilySearchAPIWrapper(tavily_api_key=TAVILY_API_KEY)
    results, elapsed, error = time_search(tavily_search.results, TEST_QUERY, MAX_RESULTS)
    
    if error:
        print(f"❌ Tavily Error: {error}")
    else:
        print(f"✓ Completed in {elapsed:.2f}s")
        format_results(results, "Tavily")
        
        # Store for comparison
        tavily_results = results
else:
    print("⚠ Tavily API key not set, skipping...")

## 3. Brave Search

Brave offers privacy-focused search with a generous free tier.

In [None]:
if BRAVE_API_KEY:
    print("Testing Brave Search...\n")
    
    brave_search = BraveSearchWrapper(api_key=BRAVE_API_KEY)
    results, elapsed, error = time_search(brave_search.run, TEST_QUERY)
    
    if error:
        print(f"❌ Brave Error: {error}")
    else:
        print(f"✓ Completed in {elapsed:.2f}s")
        # BraveSearchWrapper.run returns one string (JSON-encoded text), not a list of dicts
        print(f"\nBrave Search Results:\n{results[:500]}...")
else:
    print("⚠ Brave API key not set, skipping...")
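
Because the Brave wrapper hands back a string, it cannot be passed straight to `format_results`. The sketch below shows one way to recover structured results, assuming the string is a JSON-encoded list of objects with `title`, `link`, and `snippet` keys; that shape is an assumption to verify against your installed version, and `parse_brave_output` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def parse_brave_output(raw: str) -> List[Dict[str, Any]]:
    """Decode a JSON string into a list of result dicts; return [] if it is not a JSON list."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        # Not valid JSON (or not a string at all): treat as no structured results.
        return []
    if not isinstance(data, list):
        return []
    return [item for item in data if isinstance(item, dict)]


# Made-up sample in the assumed shape, not real Brave output.
sample = json.dumps(
    [{"title": "Example", "link": "https://example.com", "snippet": "An example result."}]
)
parsed = parse_brave_output(sample)
print(len(parsed), parsed[0]["link"])
```

If the decoded list is non-empty, it can be handed to `format_results(parsed, "Brave")` like the output of the other engines.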

## 4. Google Search

Traditional Google search using Custom Search Engine API.

In [None]:
if GOOGLE_API_KEY and GOOGLE_CSE_ID:
    print("Testing Google Search...\n")
    
    google_search = GoogleSearchAPIWrapper(
        google_api_key=GOOGLE_API_KEY,
        google_cse_id=GOOGLE_CSE_ID
    )
    results, elapsed, error = time_search(google_search.results, TEST_QUERY, MAX_RESULTS)
    
    if error:
        print(f"❌ Google Error: {error}")
    else:
        print(f"✓ Completed in {elapsed:.2f}s")
        format_results(results, "Google")
        
        # Store for comparison
        google_results = results
else:
    print("⚠ Google API key or CSE ID not set, skipping...")

## 5. Bing Search

Microsoft Bing search using Azure Cognitive Services.

In [None]:
if BING_API_KEY:
    print("Testing Bing Search...\n")
    
    bing_search = BingSearchAPIWrapper(bing_subscription_key=BING_API_KEY)
    results, elapsed, error = time_search(bing_search.results, TEST_QUERY, MAX_RESULTS)
    
    if error:
        print(f"❌ Bing Error: {error}")
    else:
        print(f"✓ Completed in {elapsed:.2f}s")
        format_results(results, "Bing")
        
        # Store for comparison
        bing_results = results
else:
    print("⚠ Bing API key not set, skipping...")

## 6. SearxNG (Meta Search Engine)

SearxNG aggregates results from multiple search engines.

In [None]:
print("Testing SearxNG Search...\n")

try:
    searx_search = SearxSearchWrapper(searx_host=SEARX_HOST)
    results, elapsed, error = time_search(searx_search.results, TEST_QUERY, MAX_RESULTS)
    
    if error:
        print(f"❌ SearxNG Error: {error}")
    else:
        print(f"✓ Completed in {elapsed:.2f}s")
        format_results(results, "SearxNG")
        
        # Store for comparison
        searx_results = results
except Exception as e:
    print(f"⚠ SearxNG not available: {e}")

## Performance Comparison

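The cell below times each configured engine on the test query and counts the results returned. It measures latency and result count only, not result quality, and covers DuckDuckGo plus Tavily when its key is set.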
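
A single timed call is sensitive to network jitter, so one option is to repeat each search a few times and report the median. The sketch below is one way to do that; `benchmark` and its `runs` and `pause` parameters are hypothetical, and the stand-in search function exists only so the example runs without network access.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(
    search_func: Callable[[], object], runs: int = 3, pause: float = 0.0
) -> Dict[str, float]:
    """Call search_func `runs` times and summarise the wall-clock latencies."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings: List[float] = []
    failures = 0
    for i in range(runs):
        start = time.perf_counter()
        try:
            search_func()
        except Exception:
            failures += 1  # failed calls are counted but not timed
        else:
            timings.append(time.perf_counter() - start)
        if pause and i < runs - 1:
            time.sleep(pause)  # optional gap between calls to ease rate limits
    return {
        "runs": runs,
        "failures": failures,
        "median_s": statistics.median(timings) if timings else float("nan"),
        "min_s": min(timings) if timings else float("nan"),
    }


def fake_search() -> list:
    """Stand-in for a real engine call."""
    time.sleep(0.01)
    return ["result"]


print(benchmark(fake_search, runs=3))
```

With the real wrappers, the zero-argument lambdas stored in `engines_to_test` can be passed as `search_func`; a non-zero `pause` between runs may help stay within an engine's rate limits.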

In [None]:
# Collect metrics from all search engines
comparison_data = []

# Test each engine with timing
engines_to_test = [
    ("DuckDuckGo", lambda: DuckDuckGoSearchAPIWrapper(max_results=MAX_RESULTS).results(TEST_QUERY, MAX_RESULTS)),
]

if TAVILY_API_KEY:
    engines_to_test.append(
        ("Tavily", lambda: TavilySearchAPIWrapper(tavily_api_key=TAVILY_API_KEY).results(TEST_QUERY, MAX_RESULTS))
    )

print("\n" + "="*80)
print("PERFORMANCE COMPARISON")
print("="*80)
print(f"\n{'Engine':<20} {'Time (s)':<12} {'Results':<10} {'Status'}")
print("-" * 80)

for engine_name, search_func in engines_to_test:
    results, elapsed, error = time_search(search_func)
    
    if error:
        print(f"{engine_name:<20} {elapsed:<12.2f} {'0':<10} ❌ {error[:30]}")
        comparison_data.append({
            'engine': engine_name,
            'time': elapsed,
            'results': 0,
            'status': 'error',
            'error': error
        })
    else:
        num_results = len(results) if results else 0
        print(f"{engine_name:<20} {elapsed:<12.2f} {num_results:<10} ✓ Success")
        comparison_data.append({
            'engine': engine_name,
            'time': elapsed,
            'results': num_results,
            'status': 'success'
        })

print("\n" + "="*80)

## Summary and Recommendations

In [None]:
print("\n" + "="*80)
print("SEARCH ENGINE COMPARISON SUMMARY")
print("="*80)

print("\n📊 Performance Metrics:")
successful = [d for d in comparison_data if d['status'] == 'success']
if successful:
    fastest = min(successful, key=lambda x: x['time'])
    print(f"  Fastest: {fastest['engine']} ({fastest['time']:.2f}s)")
    
    most_results = max(successful, key=lambda x: x['results'])
    print(f"  Most Results: {most_results['engine']} ({most_results['results']} results)")

print("\n📝 Recommendations:")
print("\n  1. DuckDuckGo:")
print("     ✓ No API key required")
print("     ✓ Privacy-focused")
print("     ✓ Good for development/testing")
print("     ⚠ Subject to rate limiting")

print("\n  2. Tavily:")
print("     ✓ Optimized for AI/LLM applications")
print("     ✓ Clean, structured results")
print("     ✓ Includes answer generation")
print("     ⚠ Requires API key")

print("\n  3. Brave:")
print("     ✓ Privacy-focused")
print("     ✓ Generous free tier")
print("     ⚠ Requires API key")

print("\n  4. Google/Bing:")
print("     ✓ High quality, comprehensive results")
print("     ⚠ Requires API key + setup")
print("     ⚠ Limited free tier")

print("\n  5. SearxNG:")
print("     ✓ Meta-search (aggregates multiple engines)")
print("     ✓ Self-hosted or use public instances")
print("     ⚠ Requires running instance")

print("\n🎯 Best Choice for Your Use Case:")
print("  • Development/Testing: DuckDuckGo (no key needed)")
print("  • AI Applications: Tavily (optimized for LLMs)")
print("  • Privacy + Quality: Brave Search")
print("  • Enterprise/Production: Google or Bing (high quality)")
print("  • Maximum Coverage: SearxNG (meta-search)")

print("\n" + "="*80)

# Save comparison data
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_file = f"search_comparison_{timestamp}.json"
with open(output_file, 'w') as f:
    json.dump({
        'timestamp': timestamp,
        'test_query': TEST_QUERY,
        'max_results': MAX_RESULTS,
        'results': comparison_data
    }, f, indent=2)

print(f"\n💾 Results saved to: {output_file}")

## Advanced: Custom Search with Multiple Engines

Here's an example of using multiple search engines together for better coverage.

In [None]:
def multi_engine_search(query: str, max_results: int = 5) -> Dict[str, Any]:
    """
    Search using multiple engines and aggregate results
    """
    all_results = {}
    
    # Always try DuckDuckGo (no API key needed)
    try:
        ddg = DuckDuckGoSearchAPIWrapper(max_results=max_results)
        all_results['duckduckgo'] = ddg.results(query, max_results)
    except Exception as e:
        all_results['duckduckgo'] = {'error': str(e)}
    
    # Try Tavily if API key available
    if TAVILY_API_KEY:
        try:
            tavily = TavilySearchAPIWrapper(tavily_api_key=TAVILY_API_KEY)
            all_results['tavily'] = tavily.results(query, max_results)
        except Exception as e:
            all_results['tavily'] = {'error': str(e)}
    
    return all_results

# Test multi-engine search
print("Testing multi-engine search...\n")
multi_results = multi_engine_search(TEST_QUERY, MAX_RESULTS)

print(f"Engines used: {list(multi_results.keys())}")
print(f"Total results: {sum(len(v) if isinstance(v, list) else 0 for v in multi_results.values())}")

for engine, results in multi_results.items():
    if isinstance(results, dict) and 'error' in results:
        print(f"\n{engine}: Error - {results['error']}")
    else:
        print(f"\n{engine}: {len(results)} results")
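
Different engines often return the same page, so the per-engine lists from a function like `multi_engine_search` can be merged and de-duplicated by URL. The following is a rough sketch under simple assumptions: `merge_results` is a hypothetical helper, URLs are compared after stripping a trailing slash only, and the sample data is made up.

```python
from typing import Any, Dict, List


def merge_results(per_engine: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Merge per-engine result lists, keeping one entry per URL in first-seen order."""
    merged: Dict[str, Dict[str, Any]] = {}
    for engine, results in per_engine.items():
        if not isinstance(results, list):
            continue  # skip {'error': ...} entries
        for item in results:
            if not isinstance(item, dict):
                continue
            url = item.get("link") or item.get("url")
            if not url:
                continue
            key = url.rstrip("/")  # crude normalization: trailing slash only
            if key not in merged:
                merged[key] = {**item, "url": url, "engines": [engine]}
            elif engine not in merged[key]["engines"]:
                merged[key]["engines"].append(engine)
    return list(merged.values())


# Made-up sample in the shape multi_engine_search returns.
sample = {
    "duckduckgo": [
        {"title": "A", "link": "https://a.example/"},
        {"title": "B", "link": "https://b.example"},
    ],
    "tavily": [{"title": "A again", "url": "https://a.example"}],
    "brave": {"error": "no key"},
}
for row in merge_results(sample):
    print(row["url"], row["engines"])
```

Recording which engines returned each URL gives a simple agreement signal: a page found by several engines may deserve a higher rank than one found by a single engine.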

## Conclusion

This notebook demonstrated testing various search engines through LangChain.

**Key Takeaways:**
- DuckDuckGo is great for development (no API key)
- Tavily is optimized for AI applications
- Multiple engines can be combined for better coverage
- Each engine has trade-offs between cost, quality, and features

**Next Steps:**
1. Get API keys for the engines you want to use
2. Test with your specific queries
3. Measure performance and quality for your use case
4. Implement fallback logic for production systems
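
The fallback logic mentioned in the last step could look roughly like the sketch below: try engines in priority order and return the first non-empty result list. `search_with_fallback` and the stand-in engine functions are hypothetical; in practice each callable would wrap one of the LangChain wrappers used above.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

SearchFn = Callable[[str], List[Dict[str, Any]]]


def search_with_fallback(
    query: str, engines: Sequence[Tuple[str, SearchFn]]
) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (engine_name, results) from the first engine that yields results."""
    errors: List[str] = []
    for name, search in engines:
        try:
            results = search(query)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue  # move on to the next engine
        if results:
            return name, results
        errors.append(f"{name}: no results")
    raise RuntimeError("All search engines failed: " + "; ".join(errors))


def flaky_engine(query: str) -> List[Dict[str, Any]]:
    """Stand-in for an engine that is currently unavailable."""
    raise ConnectionError("rate limited")


def backup_engine(query: str) -> List[Dict[str, Any]]:
    """Stand-in for a working engine."""
    return [{"title": f"Result for {query}", "link": "https://example.com"}]


used, hits = search_with_fallback(
    "test query", [("primary", flaky_engine), ("backup", backup_engine)]
)
print(used, len(hits))
```

Collecting the per-engine errors before raising keeps the failure message useful when every engine in the chain is down.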