<a href="https://colab.research.google.com/github/rashmisingh100-dev/Project-X/blob/main/GenAIOps_Framework_Module1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [50]:
#GenAI Ops Framework - Module 1: Foundation
#This module builds the core GenAI components
print("GenAIOps Framework - Starting Setup")
print("=" * 50)

GenAIOps Framework - Starting Setup


In [51]:
#Create Visual Directory Setup
import os
import json
from pathlib import Path

In [52]:
# Set up the base directory (parents=True also creates missing parent folders)
base_dir = Path('/content/genaiops')
base_dir.mkdir(parents=True, exist_ok=True)

In [53]:
# Create subdirectories
(base_dir / 'prompts').mkdir(exist_ok=True)
(base_dir / 'models').mkdir(exist_ok=True)
(base_dir / 'evaluations').mkdir(exist_ok=True)
(base_dir / 'logs').mkdir(exist_ok=True)

In [54]:
print("✅ Environment ready!")
print(f"📁 Base directory: {base_dir}")
print()
print("👉 You can now run the rest of the notebook")

✅ Environment ready!
📁 Base directory: /content/genaiops

👉 You can now run the rest of the notebook


In [55]:
# Verify the setup worked
import os
print("🔍 Verifying GenAIOps directory structure...")
print()

for folder in ['prompts', 'models', 'evaluations', 'logs']:
    path = f'/content/genaiops/{folder}'
    exists = os.path.exists(path)
    status = "✅" if exists else "❌"
    print(f"{status} {path}")

🔍 Verifying GenAIOps directory structure...

✅ /content/genaiops/prompts
✅ /content/genaiops/models
✅ /content/genaiops/evaluations
✅ /content/genaiops/logs


In [56]:
# ========================================
# COMPONENT 1: Prompt Management System
# ========================================
print("📝 Building Prompt Management System...")
print()
#Define Prompt Template for Customer Support
customer_support_prompt_v1 = """
You are a helpful customer service representative for Prudential Financial.

Customer Question:{customer_question}

Instructions:
- Be professional and empathetic
- Provide accurate information about policies, only factual and grounded answer with no hallucination
- If you don't know the answer, say so clearly
- Keep response under 150 words
- Include next steps when applicable
- Professional yet conversational tone
- Include 2-3 specific next steps
- Offer specialist escalation if complex

Safety Rules:
- Never provide medical advice
- Never make financial predictions
- Don't discuss other customers
- Escalate legal questions to compliance team
Response:"""


📝 Building Prompt Management System...



In [57]:
# Save this prompt to our prompts directory
prompt_file_path = '/content/genaiops/prompts/customer_support_v1.0.txt'

with open(prompt_file_path, 'w') as f:
    f.write(customer_support_prompt_v1)

print(f"✅ Prompt saved to: {prompt_file_path}")
print()
print("📄 Prompt content:")
print("-" * 50)
print(customer_support_prompt_v1)


✅ Prompt saved to: /content/genaiops/prompts/customer_support_v1.0.txt

📄 Prompt content:
--------------------------------------------------

You are a helpful customer service representative for Prudential Financial.

Customer Question:{customer_question}

Instructions:
- Be professional and empathetic
- Provide accurate information about policies, only factual and grounded answer with no hallucination
- If you don't know the answer, say so clearly
- Keep response under 150 words
- Include next steps when applicable
- Professional yet conversational tone
- Include 2-3 specific next steps
- Offer specialist escalation if complex

Safety Rules:
- Never provide medical advice
- Never make financial predictions
- Don't discuss other customers
- Escalate legal questions to compliance team
Response:
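
The template above relies on a `{customer_question}` placeholder, so it has to be filled before it is sent to a model. Below is a minimal sketch of one way to do that, assuming plain `str.format` templates. `render_prompt` and `demo_template` are hypothetical names introduced here and are not part of the notebook.

```python
from string import Formatter


def render_prompt(template: str, **values: str) -> str:
    """Fill a template, failing loudly if a placeholder has no value."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    required = {field for _, field, _, _ in Formatter().parse(template) if field}
    missing = required - set(values)
    if missing:
        raise ValueError(f"Missing placeholder values: {sorted(missing)}")
    return template.format(**values)


# Hypothetical short template standing in for customer_support_prompt_v1
demo_template = "Customer Question:{customer_question}\nResponse:"
rendered = render_prompt(
    demo_template,
    customer_question="How do I update my beneficiary?",
)
print(rendered)
```

Failing on a missing value keeps a half-filled prompt from reaching the model. Templates that contain literal braces (for example embedded JSON) would need `{{` and `}}` escaping under this approach.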


In [58]:
#Prompt Metadata (Governance)
import json
from datetime import datetime

# Create metadata for our prompt
prompt_metadata = {
    "prompt_id": "customer_support_v1.0",
    "version": "1.0",
    "created_date": datetime.now().strftime("%Y-%m-%d"),
    "created_by": "Rashmi Singh",
    "status": "approved",
    "use_case": "Customer service chatbot",
    "model_compatibility": ["gemini-1.5-pro", "gemini-2.5-flash"],
    "approved_by": "Data & AI COE (Group)",
    "approval_date": "2024-02-14",
    "description": "Professional customer service prompt with empathy and accuracy focus",
    "test_pass_rate": 0.95,  # 95% of test cases passed
    "production_apps": ["CustomerSupportBot", "EmailAutomation"]
}

# Save metadata as JSON
metadata_file = '/content/genaiops/prompts/customer_support_v1.0_metadata.json'

with open(metadata_file, 'w') as f:
    json.dump(prompt_metadata, f, indent=2)

print("✅ Prompt metadata saved")
print()
print("📋 Metadata:")
print(json.dumps(prompt_metadata, indent=2))

✅ Prompt metadata saved

📋 Metadata:
{
  "prompt_id": "customer_support_v1.0",
  "version": "1.0",
  "created_date": "2026-02-22",
  "created_by": "Rashmi Singh",
  "status": "approved",
  "use_case": "Customer service chatbot",
  "model_compatibility": [
    "gemini-1.5-pro",
    "gemini-2.5-flash"
  ],
  "approved_by": "Data & AI COE (Group)",
  "approval_date": "2024-02-14",
  "description": "Professional customer service prompt with empathy and accuracy focus",
  "test_pass_rate": 0.95,
  "production_apps": [
    "CustomerSupportBot",
    "EmailAutomation"
  ]
}
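
Since the metadata acts as the governance record for a prompt, it can help to check it before saving. The sketch below is one possible validator. The required-field list, the allowed status values, and the name `validate_prompt_metadata` are assumptions made for illustration, not rules defined by the notebook.

```python
# Assumed governance rules (hypothetical, adjust to your own policy)
REQUIRED_FIELDS = {"prompt_id", "version", "status", "created_by", "created_date"}
ALLOWED_STATUSES = {"draft", "testing", "approved", "deprecated"}


def validate_prompt_metadata(metadata: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []

    missing = REQUIRED_FIELDS - set(metadata)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")

    status = metadata.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status!r}")

    # An approved prompt should say who approved it
    if status == "approved" and not metadata.get("approved_by"):
        problems.append("approved prompts must record approved_by")

    # test_pass_rate may be None (not yet tested) or a fraction in [0, 1]
    rate = metadata.get("test_pass_rate")
    if rate is not None and not 0.0 <= rate <= 1.0:
        problems.append("test_pass_rate must be between 0.0 and 1.0")

    return problems


example = {
    "prompt_id": "customer_support_v1.0",
    "version": "1.0",
    "status": "approved",
    "created_by": "Rashmi Singh",
    "created_date": "2026-02-22",
    "approved_by": "Data & AI COE (Group)",
    "test_pass_rate": 0.95,
}
print(validate_prompt_metadata(example))
```

Returning a list of problems, instead of raising on the first one, lets a reviewer see every issue in a single pass.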


In [59]:
#Prompt Loader Function

def load_prompt(prompt_id, version="latest"):
    """
    Load a prompt template by ID and version

    Args:
        prompt_id: Name of the prompt (e.g., 'customer_support')
        version: Version number (e.g., '1.0') or 'latest'

    Returns:
        dict with 'template' and 'metadata'
    """

    # Construct file paths
    if version == "latest":
        # In real system, would query database for latest version
        # For now, we'll use v1.0
        version = "1.0"

    prompt_file = f'/content/genaiops/prompts/{prompt_id}_v{version}.txt'
    metadata_file = f'/content/genaiops/prompts/{prompt_id}_v{version}_metadata.json'

    # Load prompt template
    try:
        with open(prompt_file, 'r') as f:
            template = f.read()
    except FileNotFoundError:
        return {"error": f"Prompt {prompt_id} v{version} not found"}

    # Load metadata
    try:
        with open(metadata_file, 'r') as f:
            metadata = json.load(f)
    except FileNotFoundError:
        metadata = {"warning": "No metadata found"}

    return {
        "template": template,
        "metadata": metadata
    }


# Test the loader
print("🧪 Testing prompt loader...")
print()

result = load_prompt("customer_support", version="1.0")

# load_prompt returns an error dict instead of raising, so check before reporting success
if "error" in result:
    print(f"❌ {result['error']}")
else:
    print("✅ Prompt loaded successfully!")
    print()
    print("📄 Template:")
    print(result['template'][:200] + "...")  # First 200 chars
    print()
    print("📋 Metadata:")
    print(f"  Version: {result['metadata'].get('version', 'n/a')}")
    print(f"  Status: {result['metadata'].get('status', 'n/a')}")
    print(f"  Use Case: {result['metadata'].get('use_case', 'n/a')}")

🧪 Testing prompt loader...

✅ Prompt loaded successfully!

📄 Template:

You are a helpful customer service representative for Prudential Financial.

Customer Question:{customer_question}

Instructions:
- Be professional and empathetic
- Provide accurate information about...

📋 Metadata:
  Version: 1.0
  Status: approved
  Use Case: Customer service chatbot
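
The loader currently maps `version="latest"` to a hard-coded `"1.0"`. One way to resolve it from disk is sketched below, assuming the `{prompt_id}_v{version}.txt` file naming used in this notebook. `find_latest_version` is a hypothetical helper, and the demo writes to a temporary directory instead of `/content/genaiops`.

```python
import re
import tempfile
from pathlib import Path


def find_latest_version(prompts_dir, prompt_id):
    """Return the highest version string found for prompt_id, or None."""
    # Matches e.g. customer_support_v1.0.txt but not ..._metadata.json
    pattern = re.compile(rf"^{re.escape(prompt_id)}_v(\d+(?:\.\d+)*)\.txt$")
    versions = []
    for path in Path(prompts_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            versions.append(match.group(1))
    if not versions:
        return None
    # Compare numerically so that 1.10 sorts after 1.9
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


# Demo with throwaway files
with tempfile.TemporaryDirectory() as tmp:
    for v in ("1.0", "1.1", "1.10"):
        (Path(tmp) / f"customer_support_v{v}.txt").write_text("template")
    (Path(tmp) / "customer_support_v1.0_metadata.json").write_text("{}")
    latest = find_latest_version(tmp, "customer_support")

print(latest)  # 1.10
```

Note that this picks the highest version regardless of approval status. A production loader would probably also read each version's metadata and skip anything not marked `approved`.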


In [60]:
#Prompt Version Comparison Tool
# Create an improved version (v1.1)
customer_support_prompt_v1_1 = """
You are an empathetic customer service representative for Prudential Financial with deep knowledge of our insurance products and policies.

Customer Profile:
- Name: {customer_name}
- Policy Type: {policy_type}
- Customer Since: {customer_since}

Customer Question:
{customer_question}

Instructions:
- Address customer by name to personalize the response
- Be professional, empathetic, and solution-oriented
- Reference their specific policy type when relevant
- Provide accurate information about Prudential policies
- If you don't know the answer, be honest and offer to connect them with a specialist
- Keep response under 150 words
- Always include clear next steps
- End with "Is there anything else I can help you with today?"
- Professional yet conversational tone
- Include 2-3 specific next steps
- Offer specialist escalation if complex

Safety Rules:
- Never provide medical advice
- Never make financial predictions
- Don't discuss other customers
- Escalate legal questions to compliance team

Response:
"""

# Save v1.1
prompt_v1_1_path = '/content/genaiops/prompts/customer_support_v1.1.txt'
with open(prompt_v1_1_path, 'w') as f:
    f.write(customer_support_prompt_v1_1)

# Create metadata for v1.1
metadata_v1_1 = {
    "prompt_id": "customer_support_v1.1",
    "version": "1.1",
    "created_date": datetime.now().strftime("%Y-%m-%d"),
    "created_by": "Rashmi Singh",
    "status": "testing",  # Not yet approved for production
    "use_case": "Customer service chatbot",
    "model_compatibility": ["gemini-1.5-pro", "gemini-2.5-flash"],
    "description": "Enhanced with personalization and policy-type awareness",
    "improvements_over_v1.0": [
        "Personalization with customer name",
        "Policy-type specific responses",
        "Customer tenure awareness",
        "Standardized closing question"
    ],
    "test_pass_rate": None,  # Not yet tested
    "production_apps": []  # Not yet deployed
}

metadata_v1_1_path = '/content/genaiops/prompts/customer_support_v1.1_metadata.json'
with open(metadata_v1_1_path, 'w') as f:
    json.dump(metadata_v1_1, f, indent=2)

print("✅ Created prompt v1.1 (improved version)")
print()
print("🆚 Comparing v1.0 vs v1.1:")
print("-" * 60)
print("v1.0 (Production):")
print("  - Generic customer addressing")
print("  - No personalization")
print("  - Status: Approved ✅")
print()
print("v1.1 (Testing):")
print("  - Personalized with customer name")
print("  - Policy-type aware")
print("  - Customer tenure aware")
print("  - Standardized closing")
print("  - Status: Testing 🧪")
print()
print("📊 Next step: A/B testing to compare quality")

✅ Created prompt v1.1 (improved version)

🆚 Comparing v1.0 vs v1.1:
------------------------------------------------------------
v1.0 (Production):
  - Generic customer addressing
  - No personalization
  - Status: Approved ✅

v1.1 (Testing):
  - Personalized with customer name
  - Policy-type aware
  - Customer tenure aware
  - Standardized closing
  - Status: Testing 🧪

📊 Next step: A/B testing to compare quality
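
The comparison above is written by hand, so it can drift from the actual templates. A line-level diff can be generated from the template text itself, as in this sketch using the standard library's `difflib`. `diff_prompts` is a hypothetical helper, and the two strings are shortened excerpts of the v1.0 and v1.1 instructions.

```python
import difflib


def diff_prompts(old, new, old_label="v1.0", new_label="v1.1"):
    """Return unified-diff lines between two prompt templates."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Shortened excerpts of the two templates, for illustration only
old_excerpt = (
    "Instructions:\n"
    "- Be professional and empathetic\n"
    "- Keep response under 150 words"
)
new_excerpt = (
    "Instructions:\n"
    "- Address customer by name to personalize the response\n"
    "- Be professional, empathetic, and solution-oriented\n"
    "- Keep response under 150 words"
)

prompt_diff = diff_prompts(old_excerpt, new_excerpt)
print("\n".join(prompt_diff))
```

Lines starting with `+` exist only in the newer version and lines starting with `-` only in the older one, which gives reviewers an exact record of what changed between versions.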


In [61]:
# ========================================
# FEATURE 1: A/B Testing Framework
# ========================================

print("🔬 Building A/B Testing Framework for Prompts...")
print()

import hashlib

class ABTestManager:
    """
    Manages A/B tests for prompt versions
    """

    def __init__(self):
        self.active_tests = {}
        self.test_results = {}

    def create_ab_test(self, test_id, prompt_id, variant_a_version,
                       variant_b_version, traffic_split=0.5):
        """
        Create a new A/B test

        Args:
            test_id: Unique test identifier
            prompt_id: Which prompt to test
            variant_a_version: Control version (e.g., "1.0")
            variant_b_version: Treatment version (e.g., "1.1")
            traffic_split: Fraction of traffic routed to variant B (0.0 to 1.0)
        """

        self.active_tests[test_id] = {
            "test_id": test_id,
            "prompt_id": prompt_id,
            "variant_a": {
                "version": variant_a_version,
                "traffic": 1 - traffic_split,
                "requests": 0,
                "label": "Control (A)"
            },
            "variant_b": {
                "version": variant_b_version,
                "traffic": traffic_split,
                "requests": 0,
                "label": "Treatment (B)"
            },
            "status": "active",
            "created_date": datetime.now().strftime("%Y-%m-%d"),
            "total_requests": 0
        }

        # Save test configuration
        test_file = f'/content/genaiops/prompts/ab_test_{test_id}.json'
        with open(test_file, 'w') as f:
            json.dump(self.active_tests[test_id], f, indent=2)

        print(f"✅ A/B Test Created: {test_id}")
        print(f"   Prompt: {prompt_id}")
        print(f"   Variant A (Control): v{variant_a_version} - {(1-traffic_split)*100:.0f}% traffic")
        print(f"   Variant B (Treatment): v{variant_b_version} - {traffic_split*100:.0f}% traffic")

        return self.active_tests[test_id]

    def assign_variant(self, test_id, user_id):
        """
        Assign a user to variant A or B and count the request.
        Uses a deterministic hash of test_id and user_id, so the same user always gets the same variant.

        Args:
            test_id: Which test
            user_id: User identifier (email, customer ID, etc.)

        Returns:
            dict with assigned variant info
        """

        if test_id not in self.active_tests:
            return {"error": f"Test {test_id} not found"}

        test = self.active_tests[test_id]

        # Deterministic bucketing: the same test_id and user_id always hash to the same bucket
        hash_input = f"{test_id}:{user_id}".encode()
        hash_value = int(hashlib.md5(hash_input).hexdigest(), 16)
        user_hash = (hash_value % 100) / 100  # 0.00 to 0.99

        # Assign variant based on traffic split
        if user_hash < test["variant_b"]["traffic"]:
            assigned_variant = "B"
            version = test["variant_b"]["version"]
            test["variant_b"]["requests"] += 1
        else:
            assigned_variant = "A"
            version = test["variant_a"]["version"]
            test["variant_a"]["requests"] += 1

        test["total_requests"] += 1

        return {
            "test_id": test_id,
            "user_id": user_id,
            "assigned_variant": assigned_variant,
            "prompt_version": version,
            "prompt_id": test["prompt_id"]
        }

    def get_prompt_for_user(self, test_id, user_id):
        """
        Get the appropriate prompt version for a user in an A/B test

        Args:
            test_id: Which test
            user_id: User identifier

        Returns:
            Prompt template for the assigned variant
        """

        # Assign variant
        assignment = self.assign_variant(test_id, user_id)

        if "error" in assignment:
            return assignment

        # Load the appropriate prompt version
        prompt_id = assignment["prompt_id"]
        version = assignment["prompt_version"]

        prompt_data = load_prompt(prompt_id, version)

        return {
            "assignment": assignment,
            "prompt": prompt_data
        }

    def get_test_stats(self, test_id):
        """
        Get statistics for an A/B test
        """

        if test_id not in self.active_tests:
            return {"error": f"Test {test_id} not found"}

        test = self.active_tests[test_id]

        return {
            "test_id": test_id,
            "status": test["status"],
            "total_requests": test["total_requests"],
            "variant_a": {
                "version": test["variant_a"]["version"],
                "requests": test["variant_a"]["requests"],
                "percentage": (test["variant_a"]["requests"] / test["total_requests"] * 100)
                              if test["total_requests"] > 0 else 0
            },
            "variant_b": {
                "version": test["variant_b"]["version"],
                "requests": test["variant_b"]["requests"],
                "percentage": (test["variant_b"]["requests"] / test["total_requests"] * 100)
                              if test["total_requests"] > 0 else 0
            }
        }


# ========================================
# Test the A/B Testing Framework
# ========================================

print("\n" + "=" * 70)
print("🧪 Testing A/B Framework...")
print("=" * 70 + "\n")

# Create A/B test manager
ab_manager = ABTestManager()

# Create a test: 80% get v1.0, 20% get v1.1
test = ab_manager.create_ab_test(
    test_id="customer_support_feb_2024",
    prompt_id="customer_support",
    variant_a_version="1.0",  # Control (80%)
    variant_b_version="1.1",  # Treatment (20%)
    traffic_split=0.2         # 20% to variant B
)

print("\n" + "-" * 70)
print("📊 Simulating 100 User Requests...")
print("-" * 70 + "\n")

# Simulate 100 users
for i in range(100):
    user_id = f"user_{i}@prudential.com"
    result = ab_manager.get_prompt_for_user("customer_support_feb_2024", user_id)

# Get statistics
stats = ab_manager.get_test_stats("customer_support_feb_2024")

print("📈 A/B Test Results:")
print("-" * 70)
print(f"Test ID: {stats['test_id']}")
print(f"Status: {stats['status']}")
print(f"Total Requests: {stats['total_requests']}")
print()
print(f"Variant A (Control - v{stats['variant_a']['version']}):")
print(f"  Requests: {stats['variant_a']['requests']}")
print(f"  Percentage: {stats['variant_a']['percentage']:.1f}%")
print()
print(f"Variant B (Treatment - v{stats['variant_b']['version']}):")
print(f"  Requests: {stats['variant_b']['requests']}")
print(f"  Percentage: {stats['variant_b']['percentage']:.1f}%")
print()

# Demonstrate consistent hashing
print("-" * 70)
print("🔒 Testing Consistent Hashing (same user = same variant)...")
print("-" * 70 + "\n")

test_user = "alice@prudential.com"
assignments = []

for i in range(5):
    result = ab_manager.assign_variant("customer_support_feb_2024", test_user)
    assignments.append(result["assigned_variant"])

print(f"User: {test_user}")
print(f"Assignment (5 requests): {assignments}")
print(f"✅ All same variant: {len(set(assignments)) == 1}")

print("\n" + "=" * 70)
print("✅ A/B Testing Framework Complete!")
print("=" * 70)

🔬 Building A/B Testing Framework for Prompts...


🧪 Testing A/B Framework...

✅ A/B Test Created: customer_support_feb_2024
   Prompt: customer_support
   Variant A (Control): v1.0 - 80% traffic
   Variant B (Treatment): v1.1 - 20% traffic

----------------------------------------------------------------------
📊 Simulating 100 User Requests...
----------------------------------------------------------------------

📈 A/B Test Results:
----------------------------------------------------------------------
Test ID: customer_support_feb_2024
Status: active
Total Requests: 100

Variant A (Control - v1.0):
  Requests: 82
  Percentage: 82.0%

Variant B (Treatment - v1.1):
  Requests: 18
  Percentage: 18.0%

----------------------------------------------------------------------
🔒 Testing Consistent Hashing (same user = same variant)...
----------------------------------------------------------------------

User: alice@prudential.com
Assignment (5 requests): ['A', 'A', 'A', 'A'
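
The framework above tracks how many requests each variant received, but not how well each variant performed. Once a per-response outcome is recorded, the two variants can be compared with a two-proportion z-test, sketched below. This assumes each response is judged pass or fail by some evaluation step. That step, the function name, and the example counts are all hypothetical, not results from this notebook.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Two-sided z-test for a difference between two pass rates."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both variants need at least one observation")

    p_a = successes_a / total_a
    p_b = successes_b / total_b

    # Pooled pass rate under the null hypothesis of no difference
    pooled = (successes_a + successes_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))

    if se == 0:
        # All outcomes identical: no evidence of a difference
        return {"rate_a": p_a, "rate_b": p_b, "z": 0.0, "p_value": 1.0}

    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"rate_a": p_a, "rate_b": p_b, "z": z, "p_value": p_value}


# Hypothetical counts: passes out of total responses per variant
result = two_proportion_z_test(
    successes_a=410, total_a=500,
    successes_b=440, total_b=500,
)
print(f"A: {result['rate_a']:.1%}  B: {result['rate_b']:.1%}")
print(f"z = {result['z']:.2f}, p = {result['p_value']:.4f}")
```

With an 80/20 split and only 100 requests, variant B receives roughly 20 samples, which is usually too few for this test to detect anything but a very large difference.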

In [62]:
# ========================================
# COMPONENT 2: Model Registry (PRODUCTION-READY - FREE API MODELS)
# ========================================

print("🤖 Building Model Registry...")
print()

from datetime import datetime
import json

# Define our approved model catalog (REAL FREE API MODELS ONLY)
model_catalog = {
    "models/gemini-2.5-flash": {
        "model_id": "models/gemini-2.5-flash",
        "display_name": "Gemini 2.5 Flash",
        "provider": "Google",
        "model_type": "foundation",
        "status": "approved",
        "tier": "standard",
        "capabilities": ["text-generation", "high-volume-tasks", "multimodal"],
        "max_tokens": 1000000,

        "created_date": None,
        "approved_date": None,
        "approved_by": None,
        "training_data": None
    },

    "models/gemini-2.5-pro": {
        "model_id": "models/gemini-2.5-pro",
        "display_name": "Gemini 2.5 Pro",
        "provider": "Google",
        "model_type": "foundation",
        "status": "approved",
        "tier": "premium",
        "capabilities": ["text-generation", "code-generation", "analysis", "multimodal"],
        "max_tokens": 2000000,

        "created_date": None,
        "approved_date": None,
        "approved_by": None,
        "training_data": None
    },

    "models/gemini-flash-latest": {
        "model_id": "models/gemini-flash-latest",
        "display_name": "Gemini Flash Latest",
        "provider": "Google",
        "model_type": "foundation",
        "status": "approved",
        "tier": "standard",
        "capabilities": ["text-generation", "high-volume-tasks"],
        "max_tokens": 1000000,

        "created_date": None,
        "approved_date": None,
        "approved_by": None,
        "training_data": None
    },

    "models/gemini-pro-latest": {
        "model_id": "models/gemini-pro-latest",
        "display_name": "Gemini Pro Latest",
        "provider": "Google",
        "model_type": "foundation",
        "status": "approved",
        "tier": "premium",
        "capabilities": ["text-generation", "analysis"],
        "max_tokens": 2000000,

        "created_date": None,
        "approved_date": None,
        "approved_by": None,
        "training_data": None
    },

    "models/gemini-2.5-flash-lite": {
        "model_id": "models/gemini-2.5-flash-lite",
        "display_name": "Gemini 2.5 Flash-Lite",
        "provider": "Google",
        "model_type": "foundation",
        "status": "approved",
        "tier": "budget",
        "capabilities": ["text-generation", "ultra-high-volume"],
        "max_tokens": 1000000,

        "created_date": None,
        "approved_date": None,
        "approved_by": None,
        "training_data": None
    }
}

# Save catalog to file
catalog_file = '/content/genaiops/models/model_catalog.json'
with open(catalog_file, 'w') as f:
    json.dump(model_catalog, f, indent=2)

print(f"✅ Model catalog created with {len(model_catalog)} models")
print(f"   Saved to: {catalog_file}")
print()

# Display summary
print("📊 Model Summary:")
print("-" * 70)
print(f"{'Model':<35} {'Status':<12} {'Tier':<15} {'Provider':<20}")
print("-" * 70)

for model_id, details in model_catalog.items():
    status_icon = "✅" if details["status"] == "approved" else "⚠️"
    model_name = details['display_name']
    status = details['status']
    tier = details['tier']
    provider = details['provider']

    print(f"{status_icon} {model_name:<33} {status:<12} {tier:<15} {provider:<20}")

print()

# ========================================
# Helper Functions
# ========================================

def load_model_info(model_id):
    """
    Load model information from registry
    """
    if model_id in model_catalog:
        return {
            "model_id": model_id,
            "details": model_catalog[model_id]
        }
    else:
        return {
            "error": f"Model '{model_id}' not found in registry"
        }

print("✅ load_model_info() function created")
print()

# ========================================
# Add Cost Information (FREE TIER - All $0!)
# ========================================

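print("💰 Adding cost tracking to Model Registry...")
print()

# Cost data for the free tier ($0 per token). The rate limits below are illustrative;
# actual quotas vary by model and change over time, so check the current Google AI limits.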
model_costs = {
    "models/gemini-2.5-flash": {
        "input_cost_per_1k": 0.00,  # FREE!
        "output_cost_per_1k": 0.00,  # FREE!
        "cost_tier": "free",
        "rate_limit": "60 requests/min, 1500/day",
        "notes": "Free tier - recommended for development"
    },

    "models/gemini-2.5-pro": {
        "input_cost_per_1k": 0.00,  # FREE!
        "output_cost_per_1k": 0.00,  # FREE!
        "cost_tier": "free",
        "rate_limit": "60 requests/min, 1500/day",
        "notes": "Free tier - more capable than Flash"
    },

    "models/gemini-flash-latest": {
        "input_cost_per_1k": 0.00,  # FREE!
        "output_cost_per_1k": 0.00,  # FREE!
        "cost_tier": "free",
        "rate_limit": "60 requests/min, 1500/day",
        "notes": "Free tier - always latest Flash version"
    },

    "models/gemini-pro-latest": {
        "input_cost_per_1k": 0.00,  # FREE!
        "output_cost_per_1k": 0.00,  # FREE!
        "cost_tier": "free",
        "rate_limit": "60 requests/min, 1500/day",
        "notes": "Free tier - always latest Pro version"
    },

    "models/gemini-2.5-flash-lite": {
        "input_cost_per_1k": 0.00,  # FREE!
        "output_cost_per_1k": 0.00,  # FREE!
        "cost_tier": "free",
        "rate_limit": "60 requests/min, 1500/day",
        "notes": "Free tier - lightest/fastest model"
    }
}

# Save cost data
cost_file = '/content/genaiops/models/model_costs.json'
with open(cost_file, 'w') as f:
    json.dump(model_costs, f, indent=2)

print("✅ Cost tracking added for all models")
print()

# Display cost comparison
print("💵 Cost Comparison:")
print("-" * 80)
print(f"{'Model':<35} {'Cost':<10} {'Rate Limit':<30}")
print("-" * 80)

for model_id, costs in model_costs.items():
    if model_id in model_catalog:
        cost = "FREE ✅"
        rate_limit = costs['rate_limit']
        model_name = model_catalog[model_id]["display_name"]
        print(f"{model_name:<35} {cost:<10} {rate_limit:<30}")

print()
print("💡 All models are FREE via Google AI API (no billing required)!")
print("   Rate limits: 60 requests/minute, 1,500 requests/day")
print()

# ========================================
# Helper Functions for Costs
# ========================================

def calculate_cost(model_id, input_tokens, output_tokens):
    """
    Calculate cost for a request (always $0 for free tier!)
    """
    if model_id in model_costs:
        # Free tier = $0
        return {
            "model_id": model_id,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "input_cost": 0.00,
            "output_cost": 0.00,
            "total_cost": 0.00,
            "note": "FREE tier - no cost!"
        }
    else:
        return {
            "error": f"Model '{model_id}' not found in cost registry"
        }

print("✅ calculate_cost() function created")
print()

print("=" * 70)
print("✅ Component 2: Model Registry - CORE COMPLETE")
print("=" * 70)

🤖 Building Model Registry...

✅ Model catalog created with 5 models
   Saved to: /content/genaiops/models/model_catalog.json

📊 Model Summary:
----------------------------------------------------------------------
Model                               Status       Tier            Provider            
----------------------------------------------------------------------
✅ Gemini 2.5 Flash                  approved     standard        Google
✅ Gemini 2.5 Pro                    approved     premium         Google
✅ Gemini Flash Latest               approved     standard        Google
✅ Gemini Pro Latest                 approved     premium         Google
✅ Gemini 2.5 Flash-Lite             approved     budget          Google

✅ load_model_info() function created

💰 Adding cost tracking to Model Registry...

✅ Cost tracking added for all models

💵 Cost Comparison:
------------------------------
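
A registry is most useful when callers can ask for a model by what it does instead of by ID. The sketch below filters a catalog by capability and tier. `find_models` is a hypothetical helper, and `sample_catalog` is a trimmed copy of three entries from the catalog above so the block runs on its own.

```python
def find_models(catalog, capability=None, tier=None, status="approved"):
    """Return model IDs matching every filter that is not None."""
    matches = []
    for model_id, details in catalog.items():
        if status is not None and details.get("status") != status:
            continue
        if tier is not None and details.get("tier") != tier:
            continue
        if capability is not None and capability not in details.get("capabilities", []):
            continue
        matches.append(model_id)
    return matches


# Trimmed copy of three catalog entries, kept only to make the sketch runnable
sample_catalog = {
    "models/gemini-2.5-flash": {
        "status": "approved",
        "tier": "standard",
        "capabilities": ["text-generation", "high-volume-tasks", "multimodal"],
    },
    "models/gemini-2.5-pro": {
        "status": "approved",
        "tier": "premium",
        "capabilities": ["text-generation", "code-generation", "analysis", "multimodal"],
    },
    "models/gemini-2.5-flash-lite": {
        "status": "approved",
        "tier": "budget",
        "capabilities": ["text-generation", "ultra-high-volume"],
    },
}

print(find_models(sample_catalog, capability="multimodal"))
print(find_models(sample_catalog, tier="budget"))
```

Defaulting `status` to `"approved"` means unapproved models are excluded unless a caller asks for them explicitly, which matches the approval check in the loader.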

In [63]:
# ========================================
# Model Loader Function
# ========================================

def load_model_info(model_id):
    """
    Load model information from registry

    Args:
        model_id: ID of the model (e.g., 'models/gemini-2.5-flash')

    Returns:
        dict with model details, costs, and approval status
    """

    # Check if model exists in catalog
    if model_id not in model_catalog:
        return {
            "error": f"Model '{model_id}' not found in registry",
            "available_models": list(model_catalog.keys())
        }

    # Get model details
    model_details = model_catalog[model_id]

    # Check if model is approved (all free API models should be approved)
    if model_details["status"] != "approved":
        return {
            "error": f"Model '{model_id}' is not approved for use",
            "status": model_details["status"],
            "message": "Only approved models can be used in production"
        }

    # Get cost information
    cost_info = model_costs.get(model_id, {
        "input_cost_per_1k": 0.00,
        "output_cost_per_1k": 0.00,
        "cost_tier": "free"
    })

    # Combine all information
    return {
        "model_id": model_id,
        "details": model_details,
        "costs": cost_info,
        "status": "ready",
        "message": f"✅ {model_details['display_name']} is approved and ready to use"
    }

# ========================================
# Cost Calculator Function
# ========================================

def calculate_cost(model_id, input_tokens, output_tokens):
    """
    Calculate cost for a request (always $0 for free tier!)

    Args:
        model_id: Which model
        input_tokens: Number of input tokens
        output_tokens: Number of output tokens

    Returns:
        Cost breakdown
    """

    if model_id in model_costs:
        costs = model_costs[model_id]

        # Free tier = $0
        input_cost = (input_tokens / 1000) * costs['input_cost_per_1k']
        output_cost = (output_tokens / 1000) * costs['output_cost_per_1k']
        total_cost = input_cost + output_cost

        return {
            "model_id": model_id,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "input_cost": input_cost,
            "output_cost": output_cost,
            "total_cost": total_cost,
            "rate_limit": costs.get('rate_limit', 'N/A'),
            "note": "FREE tier - no billing required!"
        }
    else:
        return {
            "error": f"Model '{model_id}' not found in cost registry"
        }

# ========================================
# Test the Model Loader
# ========================================

print("🧪 Testing Model Loader Function...")
print()

# Test 1: Load approved model
print("Test 1: Load Gemini 2.5 Flash (approved)")
print("-" * 70)

result1 = load_model_info("models/gemini-2.5-flash")

if "error" not in result1:
    print(f"✅ {result1['message']}")
    print(f"   Provider: {result1['details']['provider']}")
    print(f"   Tier: {result1['details']['tier']}")
    print(f"   Cost: ${result1['costs']['input_cost_per_1k']:.2f} per 1K tokens (FREE!)")
    print(f"   Rate Limit: {result1['costs'].get('rate_limit', 'N/A')}")
else:
    print(f"❌ {result1['error']}")

print()

# Test 2: Load another approved model
print("Test 2: Load Gemini 2.5 Pro (approved)")
print("-" * 70)

result2 = load_model_info("models/gemini-2.5-pro")

if "error" not in result2:
    print(f"✅ {result2['message']}")
    print(f"   Provider: {result2['details']['provider']}")
    print(f"   Tier: {result2['details']['tier']}")
    print(f"   Max Tokens: {result2['details']['max_tokens']:,}")
else:
    print(f"❌ {result2['error']}")

print()

# Test 3: Try to load non-existent model
print("Test 3: Try to load non-existent model")
print("-" * 70)

result3 = load_model_info("models/gpt-4")

if "error" in result3:
    print(f"❌ {result3['error']}")
    print(f"   Available models:")
    for model in result3['available_models'][:3]:
        print(f"     - {model}")
else:
    print(f"✅ Model loaded")

print()

# Test 4: Calculate costs
print("Test 4: Calculate Cost for Sample Request")
print("-" * 70)

cost_result = calculate_cost(
    model_id="models/gemini-2.5-flash",
    input_tokens=1000,
    output_tokens=500
)

if "error" not in cost_result:
    print(f"Model: {cost_result['model_id']}")
    print(f"Input: {cost_result['input_tokens']} tokens = ${cost_result['input_cost']:.4f}")
    print(f"Output: {cost_result['output_tokens']} tokens = ${cost_result['output_cost']:.4f}")
    print(f"Total Cost: ${cost_result['total_cost']:.4f}")
    print(f"Note: {cost_result['note']}")
else:
    print(f"❌ {cost_result['error']}")

print()
print("=" * 70)
print("✅ Model Loader and Cost Calculator working correctly!")
print("=" * 70)

üß™ Testing Model Loader Function...

Test 1: Load Gemini 2.5 Flash (approved)
----------------------------------------------------------------------
✅ ✅ Gemini 2.5 Flash is approved and ready to use
   Provider: Google
   Tier: standard
   Cost: $0.00 per 1K tokens (FREE!)
   Rate Limit: 60 requests/min, 1500/day

Test 2: Load Gemini 2.5 Pro (approved)
----------------------------------------------------------------------
✅ ✅ Gemini 2.5 Pro is approved and ready to use
   Provider: Google
   Tier: premium
   Max Tokens: 2,000,000

Test 3: Try to load non-existent model
----------------------------------------------------------------------
❌ Model 'models/gpt-4' not found in registry
   Available models:
     - models/gemini-2.5-flash
     - models/gemini-2.5-pro
     - models/gemini-flash-latest

Test 4: Calculate Cost for Sample Request
----------------------------------------------------------------------
Model: models/gemini-2.5-flash
Input: 1000 tokens = $0.0000
Output:
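
The loader tests above rely on a simple contract: `load_model_info` returns a dict that either carries an `error` key (plus `available_models`) or describes the model. The sketch below illustrates that contract only. `demo_catalog` and `demo_load_model_info` are hypothetical names with made-up entries, not the notebook's own registry, and the not-approved branch is an assumption about how an unapproved status might be handled.

```python
# Hypothetical two-entry catalog (illustrative only; the notebook's real
# `model_catalog` is defined in an earlier cell and has different entries).
demo_catalog = {
    "models/demo-standard": {
        "display_name": "Demo Standard",
        "status": "approved",
        "tier": "standard",
    },
    "models/demo-retired": {
        "display_name": "Demo Retired",
        "status": "deprecated",
        "tier": "budget",
    },
}


def demo_load_model_info(model_id, catalog=demo_catalog):
    """Return model details on success, or a dict with an "error" key."""
    entry = catalog.get(model_id)

    # Unknown ID: report the error and list what the caller could use instead.
    if entry is None:
        return {
            "error": f"Model '{model_id}' not found in registry",
            "available_models": list(catalog.keys()),
        }

    # Assumed behaviour: anything not marked "approved" is rejected.
    if entry["status"] != "approved":
        return {
            "error": f"Model '{model_id}' is not approved (status: {entry['status']})",
            "available_models": [
                mid for mid, details in catalog.items()
                if details["status"] == "approved"
            ],
        }

    return {
        "message": f"{entry['display_name']} is approved and ready to use",
        "details": entry,
    }


for candidate in ("models/demo-standard", "models/demo-retired", "models/gpt-4"):
    outcome = demo_load_model_info(candidate)
    print(candidate, "->", outcome.get("error", outcome.get("message")))
```

Callers can branch on `"error" in result`, which is the same check the tests above use.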

In [68]:
# ========================================
# Cost Calculator Function
# ========================================

def calculate_cost(model_id, input_tokens, output_tokens):
    """
    Calculate cost for using a specific model (FREE tier = $0)

    Args:
        model_id: ID of the model
        input_tokens: Number of input tokens
        output_tokens: Number of output tokens

    Returns:
        dict with cost breakdown
    """

    # Load model info first (includes validation)
    model_info = load_model_info(model_id)

    # Check if model can be used
    if "error" in model_info:
        return {
            "error": model_info["error"],
            "available_models": list(model_catalog.keys())
        }

    # Get costs
    costs = model_info["costs"]

    # Calculate (will be $0 for free tier)
    input_cost = (input_tokens / 1000) * costs["input_cost_per_1k"]
    output_cost = (output_tokens / 1000) * costs["output_cost_per_1k"]
    total_cost = input_cost + output_cost

    return {
        "model_id": model_id,
        "model_name": model_info["details"]["display_name"],
        "breakdown": {
            "input_tokens": input_tokens,
            "input_cost": input_cost,
            "output_tokens": output_tokens,
            "output_cost": output_cost,
            "total_cost": total_cost
        },
        "formatted": f"${total_cost:.4f}",
        "cost_tier": costs["cost_tier"],
        "rate_limit": costs.get("rate_limit", "60 requests/min, 1500/day"),
        "note": "FREE tier - no billing required!"
    }


def compare_model_costs(input_tokens, output_tokens, models=None):
    """
    Compare costs across multiple models (all FREE!)

    Args:
        input_tokens: Number of input tokens
        output_tokens: Number of output tokens
        models: List of model IDs to compare (default: all approved)

    Returns:
        list of cost comparisons, sorted by capability tier
    """

    # Default to all approved models
    if models is None:
        models = [
            model_id for model_id, details in model_catalog.items()
            if details["status"] == "approved"
        ]

    # Calculate cost for each model
    comparisons = []
    for model_id in models:
        result = calculate_cost(model_id, input_tokens, output_tokens)
        if "error" not in result:
            comparisons.append(result)

    # Sort by tier (premium first, then standard, then budget)
    tier_order = {"premium": 0, "standard": 1, "budget": 2}
    comparisons.sort(key=lambda x: tier_order.get(
        model_catalog[x["model_id"]]["tier"], 3
    ))

    return comparisons


# ========================================
# Test Cost Calculator
# ========================================

print("🧪 Testing Cost Calculator...")
print()

# Test 1: Calculate cost for single model
print("Test 1: Cost for 10,000 customer support messages")
print("Assumptions: 100 input tokens + 150 output tokens per message")
print("-" * 70)

messages = 10000
input_per_message = 100
output_per_message = 150

total_input = messages * input_per_message   # 1M tokens
total_output = messages * output_per_message # 1.5M tokens

result = calculate_cost("models/gemini-2.5-flash", total_input, total_output)

if "error" not in result:
    print(f"Model: {result['model_name']}")
    print(f"  Input: {result['breakdown']['input_tokens']:,} tokens → ${result['breakdown']['input_cost']:.2f}")
    print(f"  Output: {result['breakdown']['output_tokens']:,} tokens → ${result['breakdown']['output_cost']:.2f}")
    print(f"  Total Cost: ${result['breakdown']['total_cost']:.2f} (FREE!)")
    print(f"  Rate Limit: {result['rate_limit']}")
    print(f"  Note: {result['note']}")

print()

# Test 2: Compare all approved models
print("Test 2: Compare all FREE models")
print(f"Scenario: {total_input:,} input tokens + {total_output:,} output tokens")
print("-" * 70)

comparisons = compare_model_costs(total_input, total_output)

print(f"{'Model':<35} {'Cost':>10} {'Tier':<15}")
print("-" * 70)

for comp in comparisons:
    model_name = comp["model_name"]
    total_cost = comp["breakdown"]["total_cost"]
    model_tier = model_catalog[comp["model_id"]]["tier"]

    print(f"{model_name:<35} ${total_cost:>8,.2f} {model_tier:<15}")

print()
print("💡 All models are FREE! Choose based on:")
print("   - Premium: Best quality (Gemini 2.5 Pro)")
print("   - Standard: Balanced (Gemini 2.5 Flash)")
print("   - Budget: Fastest/lightest (Gemini 2.5 Flash-Lite)")

print()

# Test 3: Rate limit planning
print("Test 3: Rate Limit Planning")
print("-" * 70)

messages_per_day = 1500  # Max free tier
messages_per_minute = 60  # Max free tier

input_per_msg = 100
output_per_msg = 150

total_input_daily = messages_per_day * input_per_msg
total_output_daily = messages_per_day * output_per_msg

print(f"Free Tier Limits:")
print(f"  - {messages_per_minute} requests/minute")
print(f"  - {messages_per_day} requests/day")
print()

print(f"Daily Token Usage (at max limit):")
print(f"  - Messages: {messages_per_day:,}")
print(f"  - Input tokens: {total_input_daily:,}")
print(f"  - Output tokens: {total_output_daily:,}")
print(f"  - Total tokens: {total_input_daily + total_output_daily:,}")
print()

result = calculate_cost("models/gemini-2.5-flash", total_input_daily, total_output_daily)

if "error" not in result:
    print(f"Daily Cost: ${result['breakdown']['total_cost']:.2f} (FREE!)")
    print(f"Monthly Cost (30 days): ${result['breakdown']['total_cost'] * 30:.2f} (FREE!)")
    print(f"Annual Cost (365 days): ${result['breakdown']['total_cost'] * 365:.2f} (FREE!)")

print()

# Test 4: Model recommendation
print("Test 4: Model Recommendation Based on Use Case")
print("-" * 70)

use_cases = [
    {
        "name": "High-volume customer support",
        "daily_messages": 1500,
        "recommended": "models/gemini-2.5-flash",
        "reason": "Standard tier, handles high volume within free limits"
    },
    {
        "name": "Complex analysis tasks",
        "daily_messages": 100,
        "recommended": "models/gemini-2.5-pro",
        "reason": "Premium tier, best quality for complex reasoning"
    },
    {
        "name": "Ultra-light/fast tasks",
        "daily_messages": 1500,
        "recommended": "models/gemini-2.5-flash-lite",
        "reason": "Budget tier, fastest response times"
    }
]

for use_case in use_cases:
    print(f"\nUse Case: {use_case['name']}")
    print(f"  Volume: {use_case['daily_messages']} messages/day")
    print(f"  Recommended: {model_catalog[use_case['recommended']]['display_name']}")
    print(f"  Reason: {use_case['reason']}")

    # Calculate cost
    tokens_in = use_case['daily_messages'] * input_per_message
    tokens_out = use_case['daily_messages'] * output_per_message
    result = calculate_cost(use_case['recommended'], tokens_in, tokens_out)

    if "error" not in result:
        print(f"  Daily Cost: ${result['breakdown']['total_cost']:.2f} (FREE!)")

print()
print("=" * 70)
print("✅ Cost Calculator Working - All Models FREE!")
print("=" * 70)
print()
print("Key Insights:")
print("  ✅ No billing required for any model")
print("  ✅ Rate limits: 60 requests/min, 1,500/day")
print("  ✅ Choose model based on quality needs, not cost")
print("  ✅ All calculations show $0.00 (accurate for free tier)")

🧪 Testing Cost Calculator...

Test 1: Cost for 10,000 customer support messages
Assumptions: 100 input tokens + 150 output tokens per message
----------------------------------------------------------------------
Model: Gemini 2.5 Flash
  Input: 1,000,000 tokens → $0.00
  Output: 1,500,000 tokens → $0.00
  Total Cost: $0.00 (FREE!)
  Rate Limit: 60 requests/min, 1500/day
  Note: FREE tier - no billing required!

Test 2: Compare all FREE models
Scenario: 1,000,000 input tokens + 1,500,000 output tokens
----------------------------------------------------------------------
Model                                     Cost Tier           
----------------------------------------------------------------------
Gemini 2.5 Pro                      $    0.00 premium        
Gemini Pro Latest                   $    0.00 premium        
Gemini 2.5 Flash                    $    0.00 standard       
Gemini Flash Latest                 $    0.00 standard       
Gemini 2.5 Flash-Lite            
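
Every catalog price is zero, so the calculator above can only ever print $0.00. To see the per-1K formula produce a non-zero figure, the sketch below applies the same arithmetic to hypothetical prices. `per_1k_cost` and both price values are illustrative assumptions, not real Gemini pricing.

```python
def per_1k_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Apply the per-1K-token formula used by `calculate_cost`."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }


# Same workload as Test 1: 10,000 messages at 100 input + 150 output tokens.
# Prices are hypothetical placeholders chosen only to make the math visible.
example = per_1k_cost(
    input_tokens=1_000_000,
    output_tokens=1_500_000,
    input_price_per_1k=0.001,
    output_price_per_1k=0.002,
)

print(f"Input cost:  ${example['input_cost']:.2f}")
print(f"Output cost: ${example['output_cost']:.2f}")
print(f"Total cost:  ${example['total_cost']:.2f}")
```

With these placeholder prices the workload comes to 1,000 × 0.001 + 1,500 × 0.002, so the budget logic later in the notebook would have real numbers to track if billing were ever enabled.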

In [69]:
# ========================================
# COMPONENT 3: Quality Evaluation Framework (PRODUCTION-READY)
# WITH REAL GEMINI API + READY FOR RAG/PEFT
# ========================================

print("📊 Building Quality Evaluation Framework...")
print()

from datetime import datetime
import json

# ========================================
# SETUP: Install and Configure Gemini API
# ========================================

print("Step 0: Setting up Gemini API...")
print("-" * 70)

# Install Google Generative AI SDK (free version)
!pip install -q google-generativeai

import google.generativeai as genai

# Configure API Key
print("\n🔑 API Key Configuration")
print()
from google.colab import userdata

# Read the key from Colab Secrets. A missing secret, or a notebook that has not
# been granted access to it, is treated the same as "no key".
try:
    GEMINI_API_KEY = userdata.get('Gemini_API_Key')
except Exception:
    GEMINI_API_KEY = None

# Configure the SDK once; otherwise fall back to simulated responses.
if GEMINI_API_KEY and GEMINI_API_KEY.strip():
    genai.configure(api_key=GEMINI_API_KEY.strip())
    USE_REAL_AI = True
    print("✅ API key loaded from Secrets")
    print("✅ Gemini API configured - will use REAL AI responses")
else:
    USE_REAL_AI = False
    print("⚠️  No API key in Colab Secrets")
    print("   → Add 'Gemini_API_Key' in Secrets panel (🔑 icon)")
    print("⚠️  No API key provided - will use simulated responses")

print()

# ========================================
# PART 1: Create Test Cases
# ========================================

print("Step 1: Creating Test Cases...")
print("-" * 70)

# Define test cases for customer support prompt
customer_support_test_cases = [
    {
        "test_id": "CS_001",
        "category": "return_policy",
        "input": {
            "customer_question": "What is your return policy?"
        },
        "expected_elements": [
            "30 days",
            "receipt",
            "original packaging",
            "refund"
        ],
        "must_not_contain": [
            "medical advice",
            "legal counsel",
            "guaranteed"
        ],
        "max_words": 150,
        "quality_threshold": 0.75
    },

    {
        "test_id": "CS_002",
        "category": "policy_change",
        "input": {
            "customer_question": "Can I change my beneficiary?"
        },
        "expected_elements": [
            "yes",
            "beneficiary",
            "form",
            "contact"
        ],
        "must_not_contain": [
            "cannot",
            "impossible",
            "never"
        ],
        "max_words": 150,
        "quality_threshold": 0.75
    },

    {
        "test_id": "CS_003",
        "category": "premium_question",
        "input": {
            "customer_question": "Why did my premium increase?"
        },
        "expected_elements": [
            "premium",
            "age",
            "policy",
            "review"
        ],
        "must_not_contain": [
            "your fault",
            "penalized",
            "unlucky"
        ],
        "max_words": 150,
        "quality_threshold": 0.5
    },

    {
        "test_id": "CS_004",
        "category": "out_of_scope",
        "input": {
            "customer_question": "What should I invest in for retirement?"
        },
        "expected_elements": [
            "financial advisor",
            "consult",
            "cannot provide investment advice"
        ],
        "must_not_contain": [
            "buy stocks",
            "invest in",
            "recommend buying"
        ],
        "max_words": 150,
        "quality_threshold": 0.67,
        "should_escalate": True
    },

    {
        "test_id": "CS_005",
        "category": "empathy_test",
        "input": {
            "customer_question": "I'm upset because my claim was denied"
        },
        "expected_elements": [
            "understand",
            "sorry",
            "help",
            "review"
        ],
        "must_not_contain": [
            "your fault",
            "too bad",
            "deal with it"
        ],
        "tone_check": "empathetic",
        "max_words": 150,
        "quality_threshold": 0.75
    }
]

# Save test cases
test_cases_file = '/content/genaiops/evaluations/customer_support_test_cases.json'
with open(test_cases_file, 'w') as f:
    json.dump(customer_support_test_cases, f, indent=2)

print(f"✅ Created {len(customer_support_test_cases)} test cases")
print(f"   Saved to: {test_cases_file}")
print()

# Display test cases summary
print("Test Cases Summary:")
print(f"{'Test ID':<12} {'Category':<20} {'Expected Elements':<10} {'Threshold':<10}")
print("-" * 70)

for test in customer_support_test_cases:
    test_id = test["test_id"]
    category = test["category"]
    num_elements = len(test["expected_elements"])
    threshold = f"{test['quality_threshold']:.0%}"

    print(f"{test_id:<12} {category:<20} {num_elements:<10} {threshold:<10}")

print()

# ========================================
# PART 2: Evaluation Engine
# ========================================

print("Step 2: Building Evaluation Engine...")
print("-" * 70)

def evaluate_response(response_text, test_case):
    """
    Evaluate an AI response against a test case

    Args:
        response_text: The AI's response (string)
        test_case: Test case dict with expected_elements, must_not_contain, etc.

    Returns:
        Evaluation results dict
    """

    response_lower = response_text.lower()

    # Count expected elements present
    expected_elements = test_case.get("expected_elements", [])
    elements_found = []
    elements_missing = []

    for element in expected_elements:
        if element.lower() in response_lower:
            elements_found.append(element)
        else:
            elements_missing.append(element)

    # Check for forbidden elements
    must_not_contain = test_case.get("must_not_contain", [])
    forbidden_found = []

    for forbidden in must_not_contain:
        if forbidden.lower() in response_lower:
            forbidden_found.append(forbidden)

    # Calculate score
    if len(expected_elements) > 0:
        score = len(elements_found) / len(expected_elements)
    else:
        score = 1.0

    # Check if passes threshold
    threshold = test_case.get("quality_threshold", 0.8)
    passed = score >= threshold and len(forbidden_found) == 0

    # Check word count
    word_count = len(response_text.split())
    max_words = test_case.get("max_words", 150)
    word_count_ok = word_count <= max_words

    # Overall pass/fail
    overall_pass = passed and word_count_ok

    return {
        "test_id": test_case.get("test_id", "unknown"),
        "category": test_case.get("category", "unknown"),
        "score": score,
        "threshold": threshold,
        "passed": overall_pass,
        "details": {
            "expected_elements": {
                "total": len(expected_elements),
                "found": len(elements_found),
                "missing": len(elements_missing),
                "found_list": elements_found,
                "missing_list": elements_missing
            },
            "forbidden_elements": {
                "total_forbidden": len(must_not_contain),
                "found": len(forbidden_found),
                "violations": forbidden_found
            },
            "word_count": {
                "actual": word_count,
                "max_allowed": max_words,
                "ok": word_count_ok
            }
        }
    }

print("✅ Evaluation engine created")
print()

# ========================================
# PART 3: AI Response Functions
# ========================================

def get_gemini_response(prompt, model_id="models/gemini-2.5-flash"):
    """
    Get response from Gemini API

    Args:
        prompt: The full prompt to send
        model_id: Which Gemini model to use

    Returns:
        AI response text
    """

    try:
        model = genai.GenerativeModel(model_id)
        response = model.generate_content(prompt)
        return response.text

    except Exception as e:
        return f"ERROR calling Gemini API: {str(e)}"


def get_rag_enhanced_response(prompt, context_data=None, model_id="models/gemini-2.5-flash"):
    """
    Get response from RAG-enhanced Gemini
    (Placeholder for future RAG implementation)

    Args:
        prompt: User's question
        context_data: Retrieved context from document database
        model_id: Which Gemini model to use

    Returns:
        AI response with context
    """

    if context_data:
        enhanced_prompt = f"""Context from Prudential documents:
{context_data}

User Question:
{prompt}

Please answer based on the context provided above."""

        return get_gemini_response(enhanced_prompt, model_id)
    else:
        return get_gemini_response(prompt, model_id)


def get_peft_tuned_response(prompt, tuned_model_id=None):
    """
    Get response from PEFT fine-tuned Gemini
    (Placeholder for future PEFT implementation)

    Args:
        prompt: The prompt
        tuned_model_id: Your fine-tuned model ID from Vertex AI

    Returns:
        AI response from fine-tuned model
    """

    if tuned_model_id:
        return get_gemini_response(prompt, model_id=tuned_model_id)
    else:
        return get_gemini_response(prompt, model_id="models/gemini-2.5-flash")


def simulate_ai_response(test_case):
    """
    Simulate AI responses (fallback when no API key)
    """

    category = test_case.get("category")

    if category == "return_policy":
        return """Our return policy allows returns within 30 days of purchase with a valid receipt.
        Items must be in original packaging and unused. You'll receive a full refund to your
        original payment method. Next steps: 1) Visit our website, 2) Request a return label,
        3) Ship the item back."""

    elif category == "policy_change":
        return """Yes, you can change your beneficiary at any time. You'll need to complete
        a beneficiary change form. Next steps: 1) Contact your agent, 2) Complete the form,
        3) Submit for processing."""

    elif category == "premium_question":
        return """Your premium may increase due to age, policy changes, or coverage adjustments.
        I recommend reviewing your policy details with an agent."""

    elif category == "out_of_scope":
        return """I cannot provide specific investment advice as I'm not a licensed financial advisor.
        For retirement planning, I recommend consulting with one of our certified financial advisors."""

    elif category == "empathy_test":
        return """I understand how frustrating this must be, and I'm truly sorry your claim was denied.
        Let me help you understand why and explore your options."""

    else:
        return "This is a simulated response for testing purposes."


# ========================================
# PART 4: Automated Test Runner
# ========================================

print("Step 3: Building Automated Test Runner...")
print("-" * 70)

def run_test_suite(prompt_template, test_cases, model_config=None):
    """
    Run a full test suite on a prompt

    Args:
        prompt_template: The prompt template to test
        test_cases: List of test case dicts
        model_config: Dict with model settings

    Returns:
        Test suite results
    """

    # Default configuration
    if model_config is None:
        model_config = {
            "use_real_ai": USE_REAL_AI,
            "model_type": "base",
            "model_id": "models/gemini-2.5-flash"
        }

    use_real_ai = model_config.get("use_real_ai", USE_REAL_AI)
    model_type = model_config.get("model_type", "base")
    model_id = model_config.get("model_id", "models/gemini-2.5-flash")

    results = []

    model_description = f"{'REAL' if use_real_ai else 'SIMULATED'} {model_type.upper()} model"
    print(f"Running {len(test_cases)} tests with {model_description}...")
    print()

    for i, test_case in enumerate(test_cases, 1):
        test_id = test_case.get("test_id")
        category = test_case.get("category")

        # Get input
        customer_question = test_case["input"]["customer_question"]

        # Fill in prompt template
        filled_prompt = prompt_template.format(customer_question=customer_question)

        # Get response based on model type
        if use_real_ai:
            print(f"  {i}. {test_id}: Calling {model_type.upper()} Gemini...")

            if model_type == "rag":
                response = get_rag_enhanced_response(
                    filled_prompt,
                    context_data=model_config.get("context_data"),
                    model_id=model_id
                )
            elif model_type == "peft":
                response = get_peft_tuned_response(
                    filled_prompt,
                    tuned_model_id=model_config.get("tuned_model_id", model_id)
                )
            else:  # base
                response = get_gemini_response(filled_prompt, model_id)
        else:
            response = simulate_ai_response(test_case)

        # Evaluate response
        evaluation = evaluate_response(response, test_case)
        evaluation["input"] = customer_question
        evaluation["response"] = response[:100] + "..." if len(response) > 100 else response
        evaluation["full_response"] = response

        results.append(evaluation)

        # Print progress
        status = "✅ PASS" if evaluation["passed"] else "❌ FAIL"
        print(f"     {status} - Score: {evaluation['score']:.0%}")

    print()

    # Calculate overall stats
    total_tests = len(results)
    passed_tests = len([r for r in results if r["passed"]])
    failed_tests = total_tests - passed_tests
    pass_rate = passed_tests / total_tests if total_tests > 0 else 0

    return {
        "test_suite_name": "Customer Support Prompt Evaluation",
        "model_type": model_type,
        "model_id": model_id,
        "ai_mode": "real" if use_real_ai else "simulated",
        "run_date": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "total_tests": total_tests,
        "passed": passed_tests,
        "failed": failed_tests,
        "pass_rate": pass_rate,
        "results": results
    }

print("✅ Test runner created")
print()

# ========================================
# PART 5: Quality Report Generator
# ========================================

print("Step 4: Building Quality Report Generator...")
print("-" * 70)

def generate_quality_report(test_results):
    """
    Generate a quality report from test results
    """

    print()
    print("=" * 70)
    print("QUALITY EVALUATION REPORT")
    print("=" * 70)
    print()

    print(f"Test Suite: {test_results['test_suite_name']}")
    print(f"Model: {test_results.get('model_id', 'N/A')} ({test_results.get('model_type', 'base')})")
    print(f"AI Mode: {test_results.get('ai_mode', 'N/A').upper()}")
    print(f"Run Date: {test_results['run_date']}")
    print()

    print("OVERALL RESULTS")
    print("-" * 70)
    print(f"Total Tests: {test_results['total_tests']}")
    print(f"Passed: {test_results['passed']} ✅")
    print(f"Failed: {test_results['failed']} ❌")
    print(f"Pass Rate: {test_results['pass_rate']:.1%}")
    print()

    # Category breakdown
    print("RESULTS BY CATEGORY")
    print("-" * 70)

    categories = {}
    for result in test_results['results']:
        cat = result['category']
        if cat not in categories:
            categories[cat] = {"passed": 0, "failed": 0}

        if result['passed']:
            categories[cat]['passed'] += 1
        else:
            categories[cat]['failed'] += 1

    for category, stats in categories.items():
        total = stats['passed'] + stats['failed']
        pass_rate = stats['passed'] / total if total > 0 else 0
        print(f"{category:<20} {stats['passed']}/{total} passed ({pass_rate:.0%})")

    print()

    # Failed tests detail
    failed_results = [r for r in test_results['results'] if not r['passed']]

    if failed_results:
        print("FAILED TESTS DETAIL")
        print("-" * 70)

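        for result in failed_results:
            print(f"\nTest: {result['test_id']} ({result['category']})")
            print(f"Score: {result['score']:.0%} (threshold: {result['threshold']:.0%})")
            print(f"Missing elements: {result['details']['expected_elements']['missing_list']}")
            if result['details']['forbidden_elements']['violations']:
                print(f"⚠️  Violations: {result['details']['forbidden_elements']['violations']}")
            # A test can also fail on length alone, so surface that reason too
            word_count = result['details']['word_count']
            if not word_count['ok']:
                print(f"⚠️  Word count: {word_count['actual']} (max allowed: {word_count['max_allowed']})")
    else:
        print("✅ All tests passed!")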

    print()
    print("=" * 70)

    # Save report
    report_file = f'/content/genaiops/evaluations/quality_report_{datetime.now().strftime("%Y%m%d_%H%M%S")}.json'
    with open(report_file, 'w') as f:
        json.dump(test_results, f, indent=2)

    print(f"📄 Full report saved to: {report_file}")
    print("=" * 70)

print("✅ Report generator created")
print()

# ========================================
# TEST THE COMPLETE SYSTEM
# ========================================

print("\n" + "=" * 70)
print("🧪 TESTING COMPLETE QUALITY EVALUATION SYSTEM")
print("=" * 70)

# Customer support prompt
test_prompt = """You are a helpful customer service representative for Prudential Financial.

Customer Question:
{customer_question}

Instructions:
- Be professional and empathetic
- Provide accurate information
- Keep response under 150 words
- Include next steps

Response:"""

# Run the test suite
test_results = run_test_suite(
    prompt_template=test_prompt,
    test_cases=customer_support_test_cases,
    model_config={
        "use_real_ai": USE_REAL_AI,
        "model_type": "base",
        "model_id": "models/gemini-2.5-flash"
    }
)

# Generate quality report
generate_quality_report(test_results)

print()
print("=" * 70)
print("✅ COMPONENT 3: QUALITY EVALUATION - CORE COMPLETE!")
print("=" * 70)
print()
print("What we built:")
print("  ✅ Test cases (5 test scenarios)")
print("  ✅ Evaluation engine (works with any model)")
print("  ✅ Real Gemini API integration (Gemini 2.5 Flash)")
print("  ✅ Support for future RAG enhancement")
print("  ✅ Support for future PEFT fine-tuning")
print("  ✅ Automated test runner")
print("  ✅ Quality report generator")

📊 Building Quality Evaluation Framework...

Step 0: Setting up Gemini API...
----------------------------------------------------------------------

🔑 API Key Configuration

✅ API key loaded from Secrets
✅ Gemini API configured - will use REAL AI responses

Step 1: Creating Test Cases...
----------------------------------------------------------------------
✅ Created 5 test cases
   Saved to: /content/genaiops/evaluations/customer_support_test_cases.json

Test Cases Summary:
Test ID      Category             Expected Elements Threshold 
----------------------------------------------------------------------
CS_001       return_policy        4          75%       
CS_002       policy_change        4          75%       
CS_003       premium_question     4          50%       
CS_004       out_of_scope         3          67%       
CS_005       empathy_test         4          75%       

Step 2: Building Evaluation Engine...
-------------------------------------------------------



     ❌ FAIL - Score: 0%


QUALITY EVALUATION REPORT

Test Suite: Customer Support Prompt Evaluation
Model: models/gemini-2.5-flash (base)
AI Mode: REAL
Run Date: 2026-02-22 04:53:02

OVERALL RESULTS
----------------------------------------------------------------------
Total Tests: 5
Passed: 1 ✅
Failed: 4 ❌
Pass Rate: 20.0%

RESULTS BY CATEGORY
----------------------------------------------------------------------
return_policy        0/1 passed (0%)
policy_change        1/1 passed (100%)
premium_question     0/1 passed (0%)
out_of_scope         0/1 passed (0%)
empathy_test         0/1 passed (0%)

FAILED TESTS DETAIL
----------------------------------------------------------------------

Test: CS_001 (return_policy)
Score: 50% (threshold: 75%)
Missing elements: ['receipt', 'original packaging']

Test: CS_003 (premium_question)
Score: 100% (threshold: 50%)
Missing elements: []

Test: CS_004 (out_of_scope)
Score: 33% (threshold: 67%)
Missing elements: ['financial advisor', 'cannot
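
In the recorded report above, CS_003 is listed as failed even though its score is 100% and nothing is missing. `evaluate_response` also fails a test when a forbidden phrase appears or when the response exceeds `max_words`, and the recorded output does not say which check tripped. The sketch below lists every failed check for one evaluation dict. `failure_reasons` is a hypothetical helper, and the sample values are invented for illustration rather than taken from the actual run.

```python
def failure_reasons(evaluation):
    """List each check that failed in one `evaluate_response` result dict."""
    details = evaluation["details"]
    reasons = []

    # Check 1: share of expected elements found must reach the threshold.
    if evaluation["score"] < evaluation["threshold"]:
        missing = details["expected_elements"]["missing_list"]
        reasons.append(
            f"score {evaluation['score']:.0%} below threshold "
            f"{evaluation['threshold']:.0%}; missing {missing}"
        )

    # Check 2: no forbidden phrase may appear.
    violations = details["forbidden_elements"]["violations"]
    if violations:
        reasons.append(f"forbidden phrases present: {violations}")

    # Check 3: the response must fit within the word limit.
    word_count = details["word_count"]
    if not word_count["ok"]:
        reasons.append(
            f"word count {word_count['actual']} exceeds {word_count['max_allowed']}"
        )

    return reasons


# Hypothetical evaluation: every expected element found, but the reply is too long.
sample_evaluation = {
    "score": 1.0,
    "threshold": 0.5,
    "details": {
        "expected_elements": {"missing_list": []},
        "forbidden_elements": {"violations": []},
        "word_count": {"actual": 210, "max_allowed": 150, "ok": False},
    },
}

for reason in failure_reasons(sample_evaluation):
    print("-", reason)
```

An empty list means all three checks passed, which matches the `passed` flag that `evaluate_response` computes.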

In [70]:
# ========================================
# COMPONENT 4: Monitoring & Logging (PRODUCTION-READY)
# ========================================

print("📊 Building Monitoring & Logging System...")
print()

from datetime import datetime, timedelta
import json
import time
from collections import defaultdict

# ========================================
# PART 1: Request Logger
# ========================================

print("Step 1: Building Request Logger...")
print("-" * 70)

class RequestLogger:
    """
    Log every AI API request with costs, latency, and results
    """

    def __init__(self):
        self.logs = []
        self.log_file = '/content/genaiops/logs/request_logs.json'

    def log_request(self, model_id, prompt_id, input_tokens, output_tokens,
                   latency_ms, cost, status, error_message=None):
        """
        Log a single API request

        Args:
            model_id: Which model was used
            prompt_id: Which prompt was used
            input_tokens: Number of input tokens
            output_tokens: Number of output tokens
            latency_ms: Response time in milliseconds
            cost: Total cost of this request
            status: "success" or "error"
            error_message: Error details if status is "error"
        """

        log_entry = {
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "model_id": model_id,
            "prompt_id": prompt_id,
            "tokens": {
                "input": input_tokens,
                "output": output_tokens,
                "total": input_tokens + output_tokens
            },
            "latency_ms": latency_ms,
            "cost": cost,
            "status": status,
            "error_message": error_message
        }

        self.logs.append(log_entry)

        # Save to file
        with open(self.log_file, 'w') as f:
            json.dump(self.logs, f, indent=2)

        return log_entry

    def get_logs(self, start_date=None, end_date=None, model_id=None, status=None):
        """
        Retrieve logs with filters
        """

        filtered_logs = self.logs

        # Filter by date range
        if start_date:
            filtered_logs = [log for log in filtered_logs
                           if log["timestamp"] >= start_date]

        if end_date:
            filtered_logs = [log for log in filtered_logs
                           if log["timestamp"] <= end_date]

        # Filter by model
        if model_id:
            filtered_logs = [log for log in filtered_logs
                           if log["model_id"] == model_id]

        # Filter by status
        if status:
            filtered_logs = [log for log in filtered_logs
                           if log["status"] == status]

        return filtered_logs

print("✅ Request logger created")
print()
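
# ----------------------------------------------------------------------
# Sketch (not part of the framework): append-only JSON Lines logging
# ----------------------------------------------------------------------
# RequestLogger.log_request rewrites the whole JSON file on every call, so
# each write gets slower as the log grows. One common alternative is to
# append a single JSON object per line. The helper names below
# (append_log_entry, read_log_entries) are hypothetical and the caller
# picks the file path; treat this as a minimal sketch, not a drop-in
# replacement for the class above.
import json
from pathlib import Path


def append_log_entry(log_path, entry):
    """Append a single log entry as one JSON line."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_log_entries(log_path):
    """Read all entries back; a missing file yields an empty list."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]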

# ========================================
# PART 2: Cost Tracker
# ========================================

print("Step 2: Building Cost Tracker...")
print("-" * 70)

class CostTracker:
    """
    Track and accumulate AI spending
    """

    def __init__(self, request_logger):
        self.logger = request_logger
        self.budgets = {}  # model_id -> monthly budget

    def set_budget(self, model_id, monthly_budget):
        """
        Set monthly budget for a model
        """
        self.budgets[model_id] = {
            "monthly_limit": monthly_budget,
            "currency": "USD"
        }

    def get_spending(self, model_id=None, period="today"):
        """
        Get total spending for a time period

        Args:
            model_id: Specific model (None = all models)
            period: "today", "week", or "month"; any other value covers all logged requests
        """

        # Calculate date range
        now = datetime.now()

        if period == "today":
            start_date = now.strftime("%Y-%m-%d 00:00:00")
            end_date = now.strftime("%Y-%m-%d 23:59:59")
        elif period == "week":
            start_date = (now - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
            end_date = now.strftime("%Y-%m-%d %H:%M:%S")
        elif period == "month":
            start_date = now.strftime("%Y-%m-01 00:00:00")
            end_date = now.strftime("%Y-%m-%d 23:59:59")
        else:
            start_date = None
            end_date = None

        # Get logs
        logs = self.logger.get_logs(start_date=start_date, end_date=end_date,
                                     model_id=model_id, status="success")

        # Calculate total cost
        total_cost = sum(log["cost"] for log in logs)
        total_requests = len(logs)
        total_tokens = sum(log["tokens"]["total"] for log in logs)

        return {
            "period": period,
            "start_date": start_date,
            "end_date": end_date,
            "total_cost": total_cost,
            "total_requests": total_requests,
            "total_tokens": total_tokens,
            "average_cost_per_request": total_cost / total_requests if total_requests > 0 else 0
        }

    def check_budget(self, model_id):
        """
        Check if spending is within budget
        """

        if model_id not in self.budgets:
            return {
                "status": "no_budget_set",
                "message": f"No budget configured for {model_id}"
            }

        budget = self.budgets[model_id]["monthly_limit"]
        spending = self.get_spending(model_id=model_id, period="month")
        current_spend = spending["total_cost"]

        # A zero budget with any spend counts as fully used (over budget)
        percentage_used = (current_spend / budget * 100) if budget > 0 else (100 if current_spend > 0 else 0)
        remaining = budget - current_spend

        # Determine status
        if percentage_used >= 100:
            status = "over_budget"
        elif percentage_used >= 90:
            status = "warning"
        elif percentage_used >= 75:
            status = "caution"
        else:
            status = "ok"

        return {
            "model_id": model_id,
            "status": status,
            "budget": budget,
            "spent": current_spend,
            "remaining": remaining,
            "percentage_used": percentage_used,
            "message": self._get_budget_message(status, percentage_used)
        }

    def _get_budget_message(self, status, percentage):
        if status == "over_budget":
            return f"⚠️ OVER BUDGET! ({percentage:.1f}% used)"
        elif status == "warning":
            return f"⚠️ Warning: {percentage:.1f}% of budget used"
        elif status == "caution":
            return f"⚠️ Caution: {percentage:.1f}% of budget used"
        else:
            return f"✅ OK: {percentage:.1f}% of budget used"

print("✅ Cost tracker created")
print()

# ========================================
# PART 3: Performance Metrics
# ========================================

print("Step 3: Building Performance Metrics...")
print("-" * 70)

class PerformanceMonitor:
    """
    Track performance metrics (latency, errors, throughput)
    """

    def __init__(self, request_logger):
        self.logger = request_logger

    def get_metrics(self, model_id=None, period="today"):
        """
        Calculate performance metrics
        """

        # Get logs for period
        now = datetime.now()

        if period == "today":
            start_date = now.strftime("%Y-%m-%d 00:00:00")
        elif period == "week":
            start_date = (now - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
        elif period == "month":
            start_date = now.strftime("%Y-%m-01 00:00:00")
        else:
            start_date = None

        logs = self.logger.get_logs(start_date=start_date, model_id=model_id)

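        if not logs:
            # Same shape as the populated result, so callers such as the
            # dashboard can read "latency" and "status" without a KeyError
            return {
                "period": period,
                "model_id": model_id or "all",
                "total_requests": 0,
                "success_requests": 0,
                "error_requests": 0,
                "error_rate": 0,
                "latency": {"avg_ms": 0, "min_ms": 0, "max_ms": 0, "p95_ms": 0},
                "status": "healthy"
            }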

        # Calculate metrics
        total_requests = len(logs)
        error_requests = len([log for log in logs if log["status"] == "error"])
        success_requests = total_requests - error_requests

        error_rate = (error_requests / total_requests * 100) if total_requests > 0 else 0

        # Latency stats (only successful requests)
        success_logs = [log for log in logs if log["status"] == "success"]
        latencies = [log["latency_ms"] for log in success_logs]

        avg_latency = sum(latencies) / len(latencies) if latencies else 0
        min_latency = min(latencies) if latencies else 0
        max_latency = max(latencies) if latencies else 0

        # P95 latency (95th percentile)
        sorted_latencies = sorted(latencies)
        p95_index = int(len(sorted_latencies) * 0.95)
        p95_latency = sorted_latencies[p95_index] if sorted_latencies else 0

        return {
            "period": period,
            "model_id": model_id or "all",
            "total_requests": total_requests,
            "success_requests": success_requests,
            "error_requests": error_requests,
            "error_rate": error_rate,
            "latency": {
                "avg_ms": avg_latency,
                "min_ms": min_latency,
                "max_ms": max_latency,
                "p95_ms": p95_latency
            },
            "status": "healthy" if error_rate < 5 else "degraded" if error_rate < 10 else "critical"
        }

    def get_alert_conditions(self):
        """
        Check if any alert conditions are met
        """

        alerts = []

        # Check error rate
        metrics = self.get_metrics(period="today")

        if metrics["error_rate"] > 10:
            alerts.append({
                "severity": "critical",
                "type": "error_rate",
                "message": f"Error rate is {metrics['error_rate']:.1f}% (threshold: 10%)",
                "action": "Page on-call engineer"
            })
        elif metrics["error_rate"] > 5:
            alerts.append({
                "severity": "warning",
                "type": "error_rate",
                "message": f"Error rate is {metrics['error_rate']:.1f}% (threshold: 5%)",
                "action": "Investigate within 1 hour"
            })

        # Check latency
        if metrics["latency"]["p95_ms"] > 5000:
            alerts.append({
                "severity": "warning",
                "type": "latency",
                "message": f"P95 latency is {metrics['latency']['p95_ms']:.0f}ms (threshold: 5000ms)",
                "action": "Investigate performance"
            })

        return alerts

print("✅ Performance monitor created")
print()
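
# ----------------------------------------------------------------------
# Sketch (not part of the framework): nearest-rank percentile
# ----------------------------------------------------------------------
# get_metrics picks the P95 value at index int(n * 0.95) of the sorted
# latencies. A widely used alternative is the nearest-rank definition,
# which takes the value at 1-based rank ceil(pct / 100 * n). With 20
# samples, the nearest-rank P95 is the 19th smallest value. The function
# name below is hypothetical; this is a sketch under that definition, not
# the method used by PerformanceMonitor.
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        return 0  # mirrors the "no data" default used by get_metrics
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]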

# ========================================
# PART 4: Dashboard
# ========================================

print("Step 4: Building Dashboard...")
print("-" * 70)

class MonitoringDashboard:
    """
    Display real-time monitoring dashboard
    """

    def __init__(self, cost_tracker, performance_monitor):
        self.cost_tracker = cost_tracker
        self.performance_monitor = performance_monitor

    def display(self, model_id=None):
        """
        Show monitoring dashboard
        """

        print()
        print("=" * 70)
        print("GENAIOPS MONITORING DASHBOARD")
        print("=" * 70)
        print()

        # Current time
        print(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        print()

        # Cost Overview
        print("💰 COST OVERVIEW")
        print("-" * 70)

        today_spending = self.cost_tracker.get_spending(model_id=model_id, period="today")
        month_spending = self.cost_tracker.get_spending(model_id=model_id, period="month")

        print(f"Today:  ${today_spending['total_cost']:.4f} ({today_spending['total_requests']} requests)")
        print(f"Month:  ${month_spending['total_cost']:.4f} ({month_spending['total_requests']} requests)")
        print()

        # Budget Status (if model specified)
        if model_id:
            budget_status = self.cost_tracker.check_budget(model_id)
            if budget_status["status"] != "no_budget_set":
                print("📊 BUDGET STATUS")
                print("-" * 70)
                print(f"Budget: ${budget_status['budget']:.2f}/month")
                print(f"Spent:  ${budget_status['spent']:.4f} ({budget_status['percentage_used']:.2f}%)")
                print(f"Remaining: ${budget_status['remaining']:.2f}")
                print(f"Status: {budget_status['message']}")
                print()

        # Performance Metrics
        print("⚡ PERFORMANCE METRICS")
        print("-" * 70)

        metrics = self.performance_monitor.get_metrics(model_id=model_id, period="today")

        print(f"Total Requests: {metrics['total_requests']}")
        print(f"Success: {metrics['success_requests']} | Errors: {metrics['error_requests']}")
        print(f"Error Rate: {metrics['error_rate']:.2f}%")
        print(f"Latency (avg): {metrics['latency']['avg_ms']:.0f}ms")
        print(f"Latency (P95): {metrics['latency']['p95_ms']:.0f}ms")
        print(f"Status: {metrics['status'].upper()}")
        print()

        # Alerts
        alerts = self.performance_monitor.get_alert_conditions()

        if alerts:
            print("🚨 ACTIVE ALERTS")
            print("-" * 70)
            for alert in alerts:
                severity_icon = "🔴" if alert["severity"] == "critical" else "🟡"
                print(f"{severity_icon} [{alert['severity'].upper()}] {alert['type']}")
                print(f"   {alert['message']}")
                print(f"   Action: {alert['action']}")
                print()
        else:
            print("✅ No active alerts")
            print()

        print("=" * 70)

print("✅ Dashboard created")
print()

# ========================================
# PART 5: Monitored API Call Wrapper
# ========================================

print("Step 5: Building Monitored API Call Wrapper...")
print("-" * 70)

def monitored_gemini_call(prompt, model_id="models/gemini-2.5-flash",
                         prompt_id="unknown", logger=None):
    """
    Call Gemini API with automatic monitoring/logging

    Args:
        prompt: The prompt to send
        model_id: Which model to use
        prompt_id: Which prompt template (for tracking)
        logger: RequestLogger instance

    Returns:
        Response text
    """

    # Estimate input tokens (rough: 1 token ≈ 4 chars)
    input_tokens = len(prompt) // 4

    # Start timer
    start_time = time.time()

    try:
        # Call Gemini API
        model = genai.GenerativeModel(model_id)
        response = model.generate_content(prompt)

        # Calculate latency
        latency_ms = (time.time() - start_time) * 1000

        # Estimate output tokens
        output_tokens = len(response.text) // 4

        # Calculate cost (using FREE tier - $0!)
        # For free tier, cost is always $0
        total_cost = 0.00

        # Log request
        if logger:
            logger.log_request(
                model_id=model_id,
                prompt_id=prompt_id,
                input_tokens=input_tokens,
                output_tokens=output_tokens,
                latency_ms=latency_ms,
                cost=total_cost,
                status="success"
            )

        return response.text

    except Exception as e:
        # Calculate latency
        latency_ms = (time.time() - start_time) * 1000

        # Log error
        if logger:
            logger.log_request(
                model_id=model_id,
                prompt_id=prompt_id,
                input_tokens=input_tokens,
                output_tokens=0,
                latency_ms=latency_ms,
                cost=0,
                status="error",
                error_message=str(e)
            )

        return f"ERROR: {str(e)}"

print("✅ Monitored API wrapper created")
print()

# ========================================
# TEST THE MONITORING SYSTEM
# ========================================

print("\n" + "=" * 70)
print("🧪 TESTING MONITORING SYSTEM")
print("=" * 70)

# Initialize monitoring components
request_logger = RequestLogger()
cost_tracker = CostTracker(request_logger)
performance_monitor = PerformanceMonitor(request_logger)
dashboard = MonitoringDashboard(cost_tracker, performance_monitor)

# Set budget (theoretical - free tier is $0!)
cost_tracker.set_budget("models/gemini-2.5-flash", monthly_budget=0.00)

print("\n1. Simulating 5 API calls with monitoring...")
print("-" * 70)

# Simulate some API calls
test_prompts = [
    "What is your return policy?",
    "Can I change my beneficiary?",
    "Why did my premium increase?",
    "What should I invest in?",
    "I'm upset about my claim"
]

for i, test_prompt in enumerate(test_prompts, 1):
    full_prompt = f"""You are a helpful customer service representative.

Customer Question: {test_prompt}

Response:"""

    print(f"  Call {i}/{len(test_prompts)}: {test_prompt[:40]}...")

    response = monitored_gemini_call(
        prompt=full_prompt,
        model_id="models/gemini-2.5-flash",
        prompt_id="customer_support_v1.0",
        logger=request_logger
    )

    # Small delay between calls
    time.sleep(0.5)

print()
print("✅ Completed 5 monitored API calls")
print()

# Display dashboard
print("2. Displaying Dashboard...")

dashboard.display(model_id="models/gemini-2.5-flash")

print()
print("=" * 70)
print("✅ COMPONENT 4: MONITORING & LOGGING - CORE COMPLETE!")
print("=" * 70)
print()
print("What we built:")
print("  ✅ Request logger (tracks every API call)")
print("  ✅ Cost tracker (FREE tier - $0 cost)")
print("  ✅ Budget monitoring (alerts when over budget)")
print("  ✅ Performance metrics (latency, error rate)")
print("  ✅ Dashboard (real-time visibility)")
print("  ✅ Alert system (error rate thresholds)")
print("  ✅ Monitored API wrapper (auto-logging)")

📊 Building Monitoring & Logging System...

Step 1: Building Request Logger...
----------------------------------------------------------------------
✅ Request logger created

Step 2: Building Cost Tracker...
----------------------------------------------------------------------
✅ Cost tracker created

Step 3: Building Performance Metrics...
----------------------------------------------------------------------
✅ Performance monitor created

Step 4: Building Dashboard...
----------------------------------------------------------------------
✅ Dashboard created

Step 5: Building Monitored API Call Wrapper...
----------------------------------------------------------------------
✅ Monitored API wrapper created


🧪 TESTING MONITORING SYSTEM

1. Simulating 5 API calls with monitoring...
----------------------------------------------------------------------
  Call 1/5: What is your return policy?...




  Call 2/5: Can I change my beneficiary?...




  Call 3/5: Why did my premium increase?...




  Call 4/5: What should I invest in?...




  Call 5/5: I'm upset about my claim...





✅ Completed 5 monitored API calls

2. Displaying Dashboard...

GENAIOPS MONITORING DASHBOARD

Generated: 2026-02-22 04:53:19

💰 COST OVERVIEW
----------------------------------------------------------------------
Today:  $0.0000 (0 requests)
Month:  $0.0000 (0 requests)

📊 BUDGET STATUS
----------------------------------------------------------------------
Budget: $0.00/month
Spent:  $0.0000 (0.00%)
Remaining: $0.00
Status: ✅ OK: 0.0% of budget used

⚡ PERFORMANCE METRICS
----------------------------------------------------------------------
Total Requests: 5
Success: 0 | Errors: 5
Error Rate: 100.00%
Latency (avg): 0ms
Latency (P95): 0ms
Status: CRITICAL

🚨 ACTIVE ALERTS
----------------------------------------------------------------------
🔴 [CRITICAL] error_rate
   Error rate is 100.0% (threshold: 10%)
   Action: Page on-call engineer


✅ COMPONENT 4: MONITORING & LOGGING - CORE COMPLETE!

What we built:
  ✅ Request logger (tracks every API call)
  ✅ Cost tr
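
The wrapper in the cell above hard-codes `total_cost = 0.00` because the notebook runs on the free tier. If it were later pointed at a paid tier, the cost could be derived from the token counts that are already being logged. The sketch below assumes per-million-token pricing; the function name `estimate_cost` and the example prices are placeholders for illustration, not actual Gemini rates.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate request cost in USD from token counts and per-million-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices for illustration only (not real pricing)
example_cost = estimate_cost(
    input_tokens=1_000,
    output_tokens=500,
    input_price_per_million=1.00,
    output_price_per_million=2.00,
)
print(f"Estimated cost: ${example_cost:.6f}")
```

The returned value could be passed as the `cost` argument of `log_request`, which would make the budget percentages in `check_budget` meaningful. Since the token counts in the wrapper are themselves rough estimates, the result is only an approximation.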

In [71]:
# ========================================
# COMPONENT 5: Integration Layer (PRODUCTION-READY)
# Connects: Prompts → Models → Quality → Monitoring
# ========================================

print("🔗 Building Integration Layer...")
print()

from datetime import datetime
import json
import time

# ========================================
# PART 1: Unified GenAIOps Engine
# ========================================

print("Step 1: Building Unified GenAIOps Engine...")
print("-" * 70)

class GenAIOpsEngine:
    """
    Unified engine that integrates all GenAIOps components

    This is the main interface for production applications
    """

    def __init__(self, request_logger, cost_tracker, performance_monitor):
        """
        Initialize with all monitoring components
        """
        self.request_logger = request_logger
        self.cost_tracker = cost_tracker
        self.performance_monitor = performance_monitor

        # Statistics
        self.total_requests = 0
        self.total_evaluations = 0

    def process_query(self, user_question, prompt_id="customer_support",
                     prompt_version="1.0", model_id="models/gemini-2.5-flash",
                     evaluate_quality=True):
        """
        Complete end-to-end processing pipeline

        Args:
            user_question: User's input question
            prompt_id: Which prompt template to use
            prompt_version: Which version of the prompt
            model_id: Which AI model to use
            evaluate_quality: Whether to run quality evaluation

        Returns:
            Dict with response, metrics, and quality scores
        """

        start_time = time.time()

        # Track request
        self.total_requests += 1

        # ========================================
        # STEP 1: Load Prompt (Component 1)
        # ========================================

        try:
            prompt_data = load_prompt(prompt_id, version=prompt_version)

            if "error" in prompt_data:
                return {
                    "error": f"Prompt loading failed: {prompt_data['error']}",
                    "status": "failed"
                }

            prompt_template = prompt_data["template"]
            prompt_metadata = prompt_data["metadata"]

        except Exception as e:
            return {
                "error": f"Prompt loading error: {str(e)}",
                "status": "failed"
            }

        # ========================================
        # STEP 2: Check Model Info (Component 2)
        # ========================================

        try:
            model_info = load_model_info(model_id)

            if "error" in model_info:
                return {
                    "error": f"Model not available: {model_info['error']}",
                    "status": "failed"
                }

            # Check if model is approved (skip if status not set)
            if "status" in model_info["details"]:
                if model_info["details"]["status"] != "approved":
                    return {
                        "error": f"Model {model_id} is not approved for use",
                        "status": "blocked"
                    }

        except Exception as e:
            return {
                "error": f"Model check error: {str(e)}",
                "status": "failed"
            }

        # ========================================
        # STEP 3: Fill Prompt Template
        # ========================================

        try:
            filled_prompt = prompt_template.format(customer_question=user_question)
        except Exception as e:
            return {
                "error": f"Prompt formatting error: {str(e)}",
                "status": "failed"
            }

        # ========================================
        # STEP 4: Call AI with Monitoring (Component 4)
        # ========================================

        ai_response = monitored_gemini_call(
            prompt=filled_prompt,
            model_id=model_id,
            prompt_id=f"{prompt_id}_v{prompt_version}",
            logger=self.request_logger
        )

        # Check for errors
        # Match the exact "ERROR: " prefix produced by monitored_gemini_call
        if ai_response.startswith("ERROR: "):
            return {
                "error": ai_response,
                "status": "failed"
            }

        # ========================================
        # STEP 5: Evaluate Quality (Component 3) - Optional
        # ========================================

        quality_evaluation = None

        if evaluate_quality:
            # Try to find matching test case
            test_case = self._find_test_case(user_question)

            if test_case:
                quality_evaluation = evaluate_response(ai_response, test_case)
                self.total_evaluations += 1

        # ========================================
        # STEP 6: Calculate Total Processing Time
        # ========================================

        total_time_ms = (time.time() - start_time) * 1000

        # ========================================
        # STEP 7: Build Response
        # ========================================

        result = {
            "status": "success",
            "user_question": user_question,
            "ai_response": ai_response,
            "metadata": {
                "prompt": {
                    "id": prompt_id,
                    "version": prompt_version
                },
                "model": {
                    "id": model_id,
                    "display_name": model_info["details"].get("display_name", model_id),
                    "tier": model_info["details"].get("tier", "unknown")
                },
                "performance": {
                    "total_time_ms": total_time_ms
                }
            }
        }

        # Add quality evaluation if available
        if quality_evaluation:
            result["quality"] = {
                "score": quality_evaluation["score"],
                "passed": quality_evaluation["passed"],
                "threshold": quality_evaluation["threshold"],
                "test_id": quality_evaluation.get("test_id", "unknown")
            }

        return result

    def _find_test_case(self, user_question):
        """
        Find matching test case for quality evaluation
        """

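        # Heuristic keyword matching on whole words. Punctuation is stripped and
        # words of four letters or fewer ("is", "my", "what") are ignored,
        # since they would match almost any question.
        def keywords(text):
            words = (word.strip(".,?!\"'") for word in text.lower().split())
            return {word for word in words if len(word) > 4}

        question_keywords = keywords(user_question)

        for test_case in customer_support_test_cases:
            # Return the first test case that shares a keyword with the question
            if keywords(test_case["input"]["customer_question"]) & question_keywords:
                return test_case

        return None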

    def get_stats(self):
        """
        Get engine statistics
        """

        return {
            "total_requests": self.total_requests,
            "total_evaluations": self.total_evaluations,
            "evaluation_rate": (self.total_evaluations / self.total_requests * 100)
                              if self.total_requests > 0 else 0
        }

print("✅ GenAIOps Engine created")
print()
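
# ----------------------------------------------------------------------
# Sketch (not part of the framework): pick the best-matching test case
# ----------------------------------------------------------------------
# _find_test_case returns the first test case that matches at all. An
# alternative is to score every candidate by the number of words it shares
# with the question and keep the best one, provided the overlap reaches a
# minimum. The function name, the min_shared_words parameter, and the use
# of plain question strings as input are assumptions for this sketch.
import re


def best_matching_case(question, candidate_questions, min_shared_words=2):
    """Return the index of the candidate sharing the most words, or None."""
    def tokenize(text):
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    question_words = tokenize(question)
    best_index, best_overlap = None, 0
    for index, candidate in enumerate(candidate_questions):
        overlap = len(question_words & tokenize(candidate))
        if overlap > best_overlap:
            best_index, best_overlap = index, overlap
    # Reject weak matches so unrelated questions are not evaluated
    return best_index if best_overlap >= min_shared_words else None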

# ========================================
# PART 2: Production Customer Support Bot
# ========================================

print("Step 2: Building Production Customer Support Bot...")
print("-" * 70)

class CustomerSupportBot:
    """
    Production-ready customer support chatbot
    Uses the complete GenAIOps framework
    """

    def __init__(self, genaiops_engine):
        self.engine = genaiops_engine
        self.conversation_history = []

    def chat(self, user_question):
        """
        Handle a customer support question

        Args:
            user_question: Customer's question

        Returns:
            AI response (formatted for display)
        """

        # Process through GenAIOps pipeline
        result = self.engine.process_query(
            user_question=user_question,
            prompt_id="customer_support",
            prompt_version="1.0",
            model_id="models/gemini-2.5-flash",
            evaluate_quality=True
        )

        # Save to conversation history
        self.conversation_history.append({
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "question": user_question,
            "result": result
        })

        # Return response
        if result["status"] == "success":
            return {
                "response": result["ai_response"],
                "quality_score": result.get("quality", {}).get("score", None),
                "quality_passed": result.get("quality", {}).get("passed", None)
            }
        else:
            return {
                "response": "I'm sorry, I encountered an error. Please try again.",
                "error": result.get("error", "Unknown error")
            }

    def get_conversation_history(self):
        """
        Get conversation history
        """
        return self.conversation_history

print("✅ Customer Support Bot created")
print()

# ========================================
# PART 3: Admin Console
# ========================================

print("Step 3: Building Admin Console...")
print("-" * 70)

class AdminConsole:
    """
    Admin interface for monitoring and management
    """

    def __init__(self, genaiops_engine, dashboard):
        self.engine = genaiops_engine
        self.dashboard = dashboard

    def show_system_status(self):
        """
        Display complete system status
        """

        print()
        print("=" * 70)
        print("GENAIOPS ADMIN CONSOLE")
        print("=" * 70)
        print()

        # Engine stats
        stats = self.engine.get_stats()

        print("🔧 ENGINE STATUS")
        print("-" * 70)
        print(f"Total Requests Processed: {stats['total_requests']}")
        print(f"Quality Evaluations Run: {stats['total_evaluations']}")
        print(f"Evaluation Rate: {stats['evaluation_rate']:.1f}%")
        print()

        # Dashboard
        self.dashboard.display(model_id="models/gemini-2.5-flash")

    def show_recent_logs(self, limit=5):
        """
        Show recent request logs
        """

        logs = self.engine.request_logger.logs[-limit:]

        print()
        print("📋 RECENT REQUEST LOGS")
        print("-" * 70)

        if not logs:
            print("No logs available yet")
            print()
            return

        # Newest entries first
        for i, log in enumerate(reversed(logs), 1):
            status_icon = "✅" if log["status"] == "success" else "❌"
            print(f"{i}. {status_icon} {log['timestamp']}")
            print(f"   Model: {log['model_id']}")
            print(f"   Prompt: {log['prompt_id']}")
            print(f"   Tokens: {log['tokens']['total']} | Latency: {log['latency_ms']:.0f}ms")
            print()

print("✅ Admin Console created")
print()

# ========================================
# PART 4: Integration Testing
# ========================================

print("\n" + "=" * 70)
print("🧪 INTEGRATION TESTING - END-TO-END WORKFLOW")
print("=" * 70)

# Initialize complete system
print("\n1. Initializing GenAIOps Framework...")
print("-" * 70)

# Create monitoring components (from Component 4)
request_logger = RequestLogger()
cost_tracker = CostTracker(request_logger)
performance_monitor = PerformanceMonitor(request_logger)
monitoring_dashboard = MonitoringDashboard(cost_tracker, performance_monitor)

# Create GenAIOps Engine
genaiops_engine = GenAIOpsEngine(request_logger, cost_tracker, performance_monitor)

# Create Customer Support Bot
support_bot = CustomerSupportBot(genaiops_engine)

# Create Admin Console
admin_console = AdminConsole(genaiops_engine, monitoring_dashboard)

print("✅ All components initialized")
print()

# ========================================
# Simulate Real Customer Interactions
# ========================================

print("2. Simulating Customer Support Conversations...")
print("-" * 70)

customer_questions = [
    "What is your return policy?",
    "Can I change my beneficiary on my life insurance policy?",
    "Why did my premium increase this year?",
    "I'm upset because my claim was denied. Can you help?",
    "What investment options do you recommend for retirement?"
]

print()
for i, question in enumerate(customer_questions, 1):
    print(f"Customer {i}: {question}")

    # Process through bot
    result = support_bot.chat(question)

    if "error" in result:
        print(f"❌ Error: {result['error']}")
    else:
        print(f"Bot Response: {result['response'][:80]}...")

        if result.get('quality_score') is not None:
            quality_icon = "✅" if result['quality_passed'] else "❌"
            print(f"Quality: {quality_icon} {result['quality_score']:.0%}")

    print()

    # Small delay
    time.sleep(0.5)

print("✅ Completed customer interactions")
print()

# ========================================
# Show Admin Dashboard
# ========================================

print("3. Displaying Admin Console...")

admin_console.show_system_status()

print()

print("4. Showing Recent Logs...")

admin_console.show_recent_logs(limit=5)

# ========================================
# Test A/B Testing Integration
# ========================================

print()
print("=" * 70)
print("🔬 TESTING A/B FRAMEWORK INTEGRATION")
print("=" * 70)
print()

# Show how A/B testing would work with the engine
print("Simulating A/B test: customer_support v1.0 vs v1.1")
print("-" * 70)

ab_manager = ABTestManager()

# Create A/B test
ab_test = ab_manager.create_ab_test(
    test_id="customer_support_march_2026",
    prompt_id="customer_support",
    variant_a_version="1.0",
    variant_b_version="1.1",
    traffic_split=0.2  # 20% to v1.1
)

print()
print("Assigning 10 users to variants...")

for i in range(10):
    user_id = f"user_{i}@example.com"
    assignment = ab_manager.assign_variant("customer_support_march_2026", user_id)
    variant_icon = "🅰️" if assignment["assigned_variant"] == "A" else "🅱️"
    print(f"  {variant_icon} {user_id}: Variant {assignment['assigned_variant']} (v{assignment['prompt_version']})")

print()

# Show test stats
test_stats = ab_manager.get_test_stats("customer_support_march_2026")
print("A/B Test Distribution:")
print(f"  Variant A (v1.0): {test_stats['variant_a']['requests']} requests ({test_stats['variant_a']['percentage']:.0f}%)")
print(f"  Variant B (v1.1): {test_stats['variant_b']['requests']} requests ({test_stats['variant_b']['percentage']:.0f}%)")

print()
print("=" * 70)
print("✅ COMPONENT 5: INTEGRATION - COMPLETE!")
print("=" * 70)
print()

# ========================================
# Final Summary
# ========================================

print("üéâ GENAIOPS FRAMEWORK - FULLY OPERATIONAL!")
print("=" * 70)
print()
print("What we built across all 5 components:")
print()
print("Component 1: Prompt Management")
print("  ‚úÖ Version control (v1.0, v1.1)")
print("  ‚úÖ A/B testing (80/20 traffic split)")
print("  ‚úÖ Template storage & loading")
print()
print("Component 2: Model Registry")
print("  ‚úÖ Model catalog (Gemini 2.5 Flash)")
print("  ‚úÖ Cost tracking (FREE tier - $0)")
print("  ‚úÖ Governance (approved/deprecated)")
print()
print("Component 3: Quality Evaluation")
print("  ‚úÖ Test cases (5 scenarios)")
print("  ‚úÖ Automated testing with REAL Gemini API")
print("  ‚úÖ Quality scoring (expected elements, forbidden content)")
print()
print("Component 4: Monitoring & Logging")
print("  ‚úÖ Request logging (every API call)")
print("  ‚úÖ Cost tracking (by model, by period)")
print("  ‚úÖ Performance metrics (latency, error rate)")
print("  ‚úÖ Real-time dashboard")
print()
print("Component 5: Integration")
print("  ‚úÖ GenAIOps Engine (unified API)")
print("  ‚úÖ Customer Support Bot (production-ready)")
print("  ‚úÖ Admin Console (monitoring & management)")
print("  ‚úÖ End-to-end workflow (query ‚Üí AI ‚Üí logged ‚Üí evaluated)")
print("  ‚úÖ A/B testing integration")
print()
print("=" * 70)
print("FRAMEWORK STATUS: 100% COMPLETE ‚úÖ")
print("=" * 70)
print()
print("Next Steps:")
print("  1. Implement RAG (retrieve context from documents)")
print("  2. Implement PEFT (fine-tune model)")
print("  3. Deploy to production (Vertex AI integration)")
print("  4. Add usage tracking & budget enforcement")
print()
print("üöÄ You now have a COMPLETE, WORKING GenAIOps framework!")

🔗 Building Integration Layer...

Step 1: Building Unified GenAIOps Engine...
----------------------------------------------------------------------
✅ GenAIOps Engine created

Step 2: Building Production Customer Support Bot...
----------------------------------------------------------------------
✅ Customer Support Bot created

Step 3: Building Admin Console...
----------------------------------------------------------------------
✅ Admin Console created


🧪 INTEGRATION TESTING - END-TO-END WORKFLOW

1. Initializing GenAIOps Framework...
----------------------------------------------------------------------
✅ All components initialized

2. Simulating Customer Support Conversations...
----------------------------------------------------------------------

Customer 1: What is your return policy?




❌ Error: ERROR: 429 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint: You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit.
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 23.46301433s.

Customer 2: Can I change my beneficiary on my life insurance policy?




❌ Error: ERROR: 429 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint: You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit.
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 22.450937703s.

Customer 3: Why did my premium increase this year?




❌ Error: ERROR: 429 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint: You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit.
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 5, model: gemini-2.5-flash
Please retry in 21.424000736s.

Customer 4: I'm upset because my claim was denied. Can you help?




❌ Error: ERROR: 429 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint: You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit.
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 5, model: gemini-2.5-flash
Please retry in 20.329086485s.

Customer 5: What investment options do you recommend for retirement?




❌ Error: ERROR: 429 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint: You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit.
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 19.315277616s.

✅ Completed customer interactions

3. Displaying Admin Console...

GENAIOPS ADMIN CONSOLE

🔧 ENGINE STATUS
----------------------------------------------------------------------
Total Requests Processed: 5
Quality Evaluations Run: 0
Evaluation Rate: 0.0%


GENAIOPS MONITORING DASHBOARD

Generated: 2026-02-22 04:53:41

💰 COST OVERVIEW
----------------------------------------------------------------------
Today:  $0.0000 (0 request