# 04 - Multi-LLM Integration

## Objectives
- Integration with Gemini, Claude, etc.
- LLM-as-judge implementation (LLM2 validates LLM1)
- Cross-model comparison and consensus
- Fallback mechanisms

## Expected Output
Multi-LLM orchestration patterns

In [None]:
import google.generativeai as genai

from src.core.llm_client import (
    get_gemini_api_key,
    setup_multi_llm_clients,
    implement_provider_fallback,
    create_judge_prompts,
    implement_cross_validation,
    implement_dual_llm_consensus,
    create_cost_optimizer,
    export_multi_llm_pipeline,
    run_system_validation
)

## Phase 1: Multi-Provider Setup

In [2]:
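# Load the Gemini key from ../.env and configure the google.generativeai SDK directly.
try:
    gemini_api_key = get_gemini_api_key(env_filename="../.env")
    genai.configure(api_key=gemini_api_key)
    print("[SUCCESS] Gemini API key loaded and configured successfully")
    print(f"[INFO] First 5 characters of API key: {gemini_api_key[:5]}*****")
except Exception as e:
    print(f"[ERROR] Failed to load or configure API key: {e}")
    gemini_api_key = None

# Build the provider clients. Per the log below, this call also loads ../.env and
# configures Gemini itself, so the explicit configuration above is repeated here.
clients = setup_multi_llm_clients()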

[SUCCESS] Gemini API key loaded and configured successfully
[INFO] First 5 characters of API key: AIzaS*****
[INIT] Loading environment variables from: ../.env
[SUCCESS] OpenAI client initialized
[SETUP] Initializing Google Gemini client configuration
[SUCCESS] Gemini API key loaded and configured successfully
[INFO] First 5 characters of API key: AIzaS*****
[SETUP] Google Gemini client initialized successfully
[SUCCESS] Google Gemini client initialized
[COMPLETED] Multi-LLM setup finished with 2 providers
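The setup log reports 2 providers even though the objectives also mention Claude. One plausible way to get that behavior is to register only the providers whose credentials are present. The sketch below assumes that design; `available_providers` and the environment-variable names are hypothetical and are not taken from `src/core/llm_client.py`.

```python
from typing import List, Mapping

# Hypothetical provider -> environment-variable mapping. The variable names are
# assumptions for illustration; check src/core/llm_client.py for the real ones.
# Dict order doubles as the preference (fallback) order.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "google": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",  # e.g. Claude, once a client is wired in
}


def available_providers(env: Mapping[str, str]) -> List[str]:
    """Return providers, in preference order, whose API key is set and non-blank."""
    found = []
    for provider, var in PROVIDER_KEY_VARS.items():
        key = env.get(var, "").strip()
        if key:
            found.append(provider)
            # Mask the key the same way the notebook does: first 5 characters only.
            print(f"[SUCCESS] {provider} key found ({key[:5]}*****)")
        else:
            print(f"[SKIP] {provider}: {var} not set")
    print(f"[COMPLETED] {len(found)} providers available")
    return found
```

Passing `os.environ` (after loading `../.env`) would yield `['openai', 'google']` when only those two keys are set, which lines up with the `[CONFIG] Fallback order` printed in the next cell.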


In [3]:
# Wrap the clients so each request tries OpenAI first and falls back to Google on failure.
fallback_system = implement_provider_fallback(clients)
test_response = fallback_system["make_request"]("Generate a technical interview question about Python data structures.")
print(f"[TEST] Response from {test_response.provider.value}: {test_response.content[:100]}...")

[FALLBACK] Initializing fallback system with primary: openai
[CONFIG] Fallback order: ['openai', 'google']
[REQUEST] Starting request with fallback for prompt length: 69
[ATTEMPT] Provider openai (attempt 1)
[SUCCESS] openai responded in 4.47s
[TEST] Response from openai: **Question:**

You are given a list of integers that may contain duplicates. Write a Python function...
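The log above shows the fallback pattern: providers are tried in a fixed order, each attempt is numbered, and the first success is returned with its latency. A minimal sketch of that loop, assuming each provider is exposed as a plain `prompt -> text` callable, might look like this. `make_request_with_fallback` and `LLMResponse` are hypothetical stand-ins; the notebook's real response object carries a `provider` enum (hence `.provider.value`), while this sketch uses a plain string.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LLMResponse:
    provider: str
    content: str
    latency_s: float


def make_request_with_fallback(
    prompt: str,
    providers: Dict[str, Callable[[str], str]],
    order: List[str],
    max_attempts: int = 2,
) -> LLMResponse:
    """Try each provider in `order`, retrying up to `max_attempts` times before moving on."""
    errors = []
    for name in order:
        call = providers.get(name)
        if call is None:
            continue  # provider listed in the order but not configured
        for attempt in range(1, max_attempts + 1):
            print(f"[ATTEMPT] Provider {name} (attempt {attempt})")
            start = time.perf_counter()
            try:
                content = call(prompt)
            except Exception as exc:  # real code should catch SDK-specific errors
                errors.append(f"{name}#{attempt}: {exc}")
                continue  # retry immediately; no backoff in this sketch
            latency = time.perf_counter() - start
            print(f"[SUCCESS] {name} responded in {latency:.2f}s")
            return LLMResponse(name, content, latency)
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

With a stub `openai` callable that raises `TimeoutError` and a stub `google` callable that returns text, the call makes two `openai` attempts and then returns a response whose `provider` is `"google"`. A production version would likely add exponential backoff between retries and skip retries for non-transient errors such as invalid credentials.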


In [4]:
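# Build the judge prompt templates and have a model score a sample question.
judge_prompts = create_judge_prompts()

test_question = "Explain the difference between a list and a tuple in Python, and provide use cases for each."
evaluation = implement_cross_validation(
    fallback_system,
    judge_prompts,
    test_question,
    "question_quality"
)
# Guard against a missing or None evaluation_result when the judge call fails.
score = (evaluation.get("evaluation_result") or {}).get("overall_score", "Failed")
print(f"[EVALUATION] {evaluation.get('evaluation_type', 'question_quality')}: {score}/10")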

[JUDGE] Creating evaluation prompts for LLM-as-judge system
[COMPLETED] Created 3 judge prompt templates
[CROSS-VALIDATION] Starting question_quality evaluation
[REQUEST] Starting request with fallback for prompt length: 733
[ATTEMPT] Provider openai (attempt 1)
[SUCCESS] openai responded in 4.90s
[SUCCESS] Cross-validation completed by openai
[SCORE] Overall score: 8
[EVALUATION] question_quality: 8/10
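The `[SCORE] Overall score: 8` line depends on turning the judge model's free-text reply into a number. A defensive parser helps here, because judges often wrap JSON in prose or Markdown fences. The sketch below assumes the judge is asked to reply with a JSON object containing an `overall_score` on a 1-10 scale; `parse_judge_score` is hypothetical and not part of `src/core/llm_client.py`.

```python
import json
import re
from typing import Optional


def parse_judge_score(raw: str, key: str = "overall_score") -> Optional[int]:
    """Extract an integer 1-10 score from a judge model's reply, or None if unusable."""
    # Grab everything from the first "{" to the last "}" so that surrounding prose
    # or ```json fences are ignored. Replies holding several objects will fail to parse.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    score = data.get(key)
    # Reject booleans (a subclass of int) and non-numeric values such as "8".
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return None
    score = int(round(score))
    return score if 1 <= score <= 10 else None
```

Returning `None` rather than raising lets the caller decide whether to re-prompt the judge, fall back to another provider, or report the evaluation as failed.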


In [5]:
# Configure per-token cost thresholds (printed in the [OPTIMIZER] line below).
cost_optimizer = create_cost_optimizer()

test_prompt = "Create a challenging system design interview question about distributed databases."
# Generate one answer per provider, then let a judge model pick the better one.
consensus_result = implement_dual_llm_consensus(fallback_system, judge_prompts, test_prompt)
print(f"[RESULT] Winner: {consensus_result.get('selected_provider', 'N/A')} | Cost: ${consensus_result.get('total_cost', 0):.4f}")

[OPTIMIZER] Cost thresholds configured: {'openai_per_token': 0.0001, 'google_per_token': 0.0, 'max_cost_per_request': 0.1}
[CONSENSUS] Starting dual-LLM consensus generation
[GENERATE] Getting response from openai
[SUCCESS] openai generated response (518 tokens)
[GENERATE] Getting response from google
[SUCCESS] google generated response (1190 tokens)
[CONSENSUS] Evaluating responses for best selection
[REQUEST] Starting request with fallback for prompt length: 584
[ATTEMPT] Provider openai (attempt 1)
[SUCCESS] openai responded in 0.65s
[JUDGE] Parsed result: {'winner': 'A', 'confidence': 8}
[CONSENSUS] Selected openai with confidence 8/10
[RESULT] Winner: openai | Cost: $0.0518
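Two details of this consensus run are easy to reproduce by hand. First, the judge answers with a position label (`'A'`), which has to be mapped back to the provider whose response was shown first. Second, the reported cost matches 518 tokens × $0.0001 (OpenAI) + 1190 tokens × $0.0 (Google) = $0.0518 exactly, which suggests that only the generation tokens are counted and the judge call is not. The sketch below assumes both of these readings; `estimate_cost` and `resolve_winner` are hypothetical helpers, and the flat per-token rates are the placeholder thresholds from the `[OPTIMIZER]` line, not real provider pricing.

```python
from typing import Dict, List

# Per-token rates copied from the [OPTIMIZER] log line; real pricing varies by model
# and usually differs for input and output tokens, so treat these as placeholders.
RATES = {"openai": 0.0001, "google": 0.0}


def estimate_cost(token_counts: Dict[str, int], rates: Dict[str, float] = RATES) -> float:
    """Sum tokens x per-token rate over every provider that generated a candidate."""
    return sum(tokens * rates.get(provider, 0.0) for provider, tokens in token_counts.items())


def resolve_winner(verdict: Dict[str, object], candidates: List[str]) -> str:
    """Map the judge's letter label ('A', 'B', ...) back to a provider name."""
    label = str(verdict.get("winner", "")).strip().upper()
    index = ord(label) - ord("A") if len(label) == 1 else -1
    if not 0 <= index < len(candidates):
        raise ValueError(f"Judge returned an unknown label: {label!r}")
    return candidates[index]


candidates = ["openai", "google"]  # order in which responses were shown to the judge
winner = resolve_winner({"winner": "A", "confidence": 8}, candidates)
cost = estimate_cost({"openai": 518, "google": 1190})
print(f"[RESULT] Winner: {winner} | Cost: ${cost:.4f}")  # [RESULT] Winner: openai | Cost: $0.0518
```

Because the OpenAI answer is always labeled `A` here, a judge with position bias would systematically favor it. Both runs visible in this notebook picked `A`. A common mitigation is to judge twice with the candidate order swapped and accept the winner only when the two verdicts agree.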


In [7]:
pipeline_export = export_multi_llm_pipeline()
validation_results = run_system_validation(fallback_system, judge_prompts)

print("\n" + "="*50)
print("MULTI-LLM INTEGRATION COMPLETED")
print("="*50)
print("OpenAI + Google Gemini 2.5 Flash integration")
print("LLM-as-judge validation system")
print("Automatic fallback and cost optimization")
print("Production-ready export package")
print("Ready for src/core/llm_client.py integration")

[EXPORT] Packaging multi-LLM system for production deployment
[SUCCESS] Pipeline exported with 6 components
[READY] System ready for src/ integration
[VALIDATION] Running final system validation
[TEST 1] Generate a Python coding interview quest...
[CONSENSUS] Starting dual-LLM consensus generation
[GENERATE] Getting response from openai
[SUCCESS] openai generated response (304 tokens)
[GENERATE] Getting response from google
[SUCCESS] google generated response (971 tokens)
[CONSENSUS] Evaluating responses for best selection
[REQUEST] Starting request with fallback for prompt length: 584
[ATTEMPT] Provider openai (attempt 1)
[SUCCESS] openai responded in 0.76s
[JUDGE] Parsed result: {'winner': 'A', 'confidence': 8}
[CONSENSUS] Selected openai with confidence 8/10
[RESULT] openai selected (confidence: 8/10)
[TEST 2] Create a system design question about da...
[CONSENSUS] Starting dual-LLM consensus generation
[GENERATE] Getting response from openai
[SUCCESS] openai generated response (514