# 🦉 Trustwise SDK Demo

This notebook provides a comprehensive demonstration of the Trustwise SDK's capabilities for evaluating AI-generated content. We'll cover everything from basic setup to advanced features and best practices.

## 1. Installation and Environment Setup

First, let's set up our environment and install the necessary packages.

### Installation

In [1]:
# Install the latest version of Trustwise SDK
# !pip install trustwise

### Environment Setup

In [2]:
# Import required packages
import os
from dotenv import load_dotenv
from trustwise.sdk import TrustwiseSDK
from trustwise.sdk.config import TrustwiseConfig

# Load environment variables from .env file
load_dotenv()

# Verify API key is set
api_key = os.environ.get("TW_API_KEY")
assert api_key is not None, "TW_API_KEY is not set in environment variables"
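
The `assert` above is fine for a notebook, but assertions are skipped when Python runs with `-O`. A minimal sketch of an explicit check follows; `require_env` is a hypothetical helper for illustration, not part of the Trustwise SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of the Trustwise SDK.
    """
    value = os.environ.get(name)
    if not value:
        # Unlike `assert`, this check still runs under `python -O`.
        raise RuntimeError(f"{name} is not set in environment variables")
    return value
```

In the cell above, `api_key = require_env("TW_API_KEY")` would replace the `os.environ.get` call and the `assert`.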

## 2. SDK Configuration and Initialization

The Trustwise SDK offers flexible configuration options. Let's explore different ways to initialize it.

In [3]:
# Method 1: Using environment variable (recommended)
config = TrustwiseConfig()  # Automatically uses TW_API_KEY from environment
trustwise = TrustwiseSDK(config)

# Method 2: Direct initialization with API key
config_direct = TrustwiseConfig(api_key=os.environ["TW_API_KEY"])
trustwise_direct = TrustwiseSDK(config_direct)

# Method 3: Custom configuration with specific base URL
config_custom = TrustwiseConfig(
    api_key=os.environ["TW_API_KEY"],
    base_url="https://api.trustwise.ai"
)
trustwise_custom = TrustwiseSDK(config_custom)

## 3. Understanding API Versioning

The SDK uses a path-based versioning system that makes it easy to work with different API versions. Let's explore this feature.

In [None]:
# Example context for our evaluations
context = [{
    "node_text": "Paris is the capital of France. It is known for the Eiffel Tower and the Louvre Museum.",
    "node_score": 0.95,
    "node_id": "doc:idx:1"
}]

# Using explicit version path (v3) - Recommended
result_v3 = trustwise.safety.v3.faithfulness.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=context
)
print("V3 Result:", result_v3)

# Using default version (backward compatibility)
result_default = trustwise.safety.faithfulness.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=context
)
print("Default Version Result:", result_default)
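
To make the two call styles concrete, here is a rough sketch of how a namespace can expose an explicit version path and also fall back to a default version. All class names below are hypothetical stand-ins; this is one way such routing could be modeled, not the SDK's actual implementation.

```python
class FaithfulnessV3:
    """Hypothetical stand-in for a versioned metric; not the SDK's class."""

    def evaluate(self, query: str, response: str) -> str:
        return f"v3 evaluated: {response}"


class V3Namespace:
    """Groups the metrics that belong to one API version."""

    def __init__(self) -> None:
        self.faithfulness = FaithfulnessV3()


class SafetyNamespace:
    """Exposes explicit versions and falls back to a default one."""

    def __init__(self) -> None:
        self.v3 = V3Namespace()
        self._default = self.v3

    def __getattr__(self, name: str):
        # Called only when normal lookup fails, e.g. `safety.faithfulness`.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._default, name)


safety = SafetyNamespace()
explicit = safety.v3.faithfulness.evaluate("q", "Paris")
default = safety.faithfulness.evaluate("q", "Paris")
print(explicit == default)
```

Pinning the explicit path keeps results stable if the default version changes later, which is presumably why the cell above recommends it.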

## 4. Safety Metrics

Let's explore the comprehensive safety metrics available in the SDK.

In [None]:
# 4.1 Faithfulness Evaluation
faithfulness = trustwise.safety.v3.faithfulness.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=context
)
print("Faithfulness Score:", faithfulness.score)
print("Facts:", faithfulness.facts)

# 4.2 PII Detection
pii_text = "My email is john@example.com and my phone is 123-456-7890"
pii_result = trustwise.safety.v3.pii.evaluate(
    text=pii_text,
    allowlist=["john@example.com"],  # Allowed PII patterns
    blocklist=["123-456-7890"]      # Blocked PII patterns
)
print("\nPII Detection Results:")
print("Identified PII:", pii_result.identified_pii)

# 4.3 Answer Relevancy
relevancy = trustwise.safety.v3.answer_relevancy.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=context
)
print("\nAnswer Relevancy Score:", relevancy.score)
print("Generated Question:", relevancy.generated_question)
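
The `context` argument used above is a list of dictionaries with `node_text`, `node_score`, and `node_id` keys. A small sketch for building that structure from retrieved chunks follows; `build_context` and its `prefix` parameter are hypothetical helpers, and the `doc:idx:N` ID pattern simply mirrors the example earlier in this notebook.

```python
from typing import Dict, List, Sequence, Tuple, Union

ContextNode = Dict[str, Union[str, float]]


def build_context(
    chunks: Sequence[Tuple[str, float]],
    prefix: str = "doc:idx",
) -> List[ContextNode]:
    """Turn (text, score) pairs into context nodes shaped like the example.

    Hypothetical helper; the ID scheme is an assumption for illustration.
    """
    context_nodes: List[ContextNode] = []
    for index, (text, score) in enumerate(chunks, start=1):
        context_nodes.append(
            {
                "node_text": text,
                "node_score": score,
                "node_id": f"{prefix}:{index}",
            }
        )
    return context_nodes
```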

## 5. Alignment Metrics

Now let's look at the alignment metrics that help evaluate the quality and appropriateness of responses.

In [None]:
# 5.1 Clarity Evaluation
clarity = trustwise.alignment.v1.clarity.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris."
)
print("Clarity Score:", clarity.score)

# 5.2 Tone Analysis
tone = trustwise.alignment.v1.tone.evaluate(
    response="The capital of France is Paris."
)
print("\nTone Analysis:")
for label, score in zip(tone.labels, tone.scores):
    print(f"{label}: {score:.2f}%")

# 5.3 Formality Check
formality = trustwise.alignment.v1.formality.evaluate(
    response="The capital of France is Paris."
)
print("\nFormality Score:", formality.score)
print("Sentence Scores:", dict(zip(formality.sentences, formality.scores)))
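
Both the tone and formality results above come back as parallel lists. If only the strongest entry matters, a sketch like the following picks it out; `top_label` is a hypothetical helper, and the sample labels and scores in the usage note are made up for illustration.

```python
from typing import Sequence, Tuple


def top_label(labels: Sequence[str], scores: Sequence[float]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score.

    Hypothetical helper for parallel label/score lists.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if not labels:
        raise ValueError("labels and scores must not be empty")
    # zip pairs each label with its score; max compares on the score only.
    return max(zip(labels, scores), key=lambda pair: pair[1])
```

For example, `top_label(["neutral", "happiness"], [80.0, 20.0])` returns `("neutral", 80.0)`.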

## 6. Performance Metrics

Let's explore the performance metrics for monitoring costs and environmental impact.

In [None]:
# 6.1 Cost Evaluation
cost_result = trustwise.performance.v1.cost.evaluate(
    model_name="gpt-3.5-turbo",
    model_type="LLM",
    model_provider="OpenAI",
    number_of_queries=5,
    total_prompt_tokens=950,
    total_completion_tokens=50
)
print("Cost Analysis:")
print(f"Cost per run: ${cost_result.cost_estimate_per_run:.4f}")
print(f"Total project cost: ${cost_result.total_project_cost_estimate:.4f}")

# 6.2 Carbon Emissions
carbon_result = trustwise.performance.v1.carbon.evaluate(
    processor_name="AMD A10-9700",
    provider_name="aws",
    provider_region="us-east-1",
    instance_type="p4d.24xlarge",
    average_latency=100
)
print("\nCarbon Emissions:")
print(f"Carbon emissions: {carbon_result.carbon_emitted:.4f} kg CO2e")

### 6.3 Types, JSON, and Auto-complete Support

The Trustwise SDK returns typed response objects that can also be serialized to JSON, so results work with editor auto-complete as well as with JSON-based tooling.

In [None]:
print("Carbon Result:", type(carbon_result), carbon_result)
print("Carbon Emitted:", carbon_result.carbon_emitted)
print("Carbon Result JSON:", carbon_result.to_json())
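
As a rough illustration of the pattern, a typed result object with a `to_json()` method can be modeled with a dataclass. `CarbonResultSketch` is a hypothetical stand-in, not the SDK's response class; the cell above only shows that the real object exposes `carbon_emitted` and `to_json()`.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CarbonResultSketch:
    """Hypothetical stand-in for a typed response; not the SDK's class."""

    carbon_emitted: float

    def to_json(self) -> str:
        # asdict converts the dataclass fields into a plain dictionary.
        return json.dumps(asdict(self))


sketch = CarbonResultSketch(carbon_emitted=0.5)
print(sketch.carbon_emitted)
print(sketch.to_json())
```

Typed attributes give editors something to auto-complete, while the JSON form is convenient for logging or passing results to other services.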

## 7. Guardrails (Experimental) and Validation

Let's implement guardrails to automatically validate responses against multiple metrics.

In [None]:
# Create a multi-metric guardrail
guardrail = trustwise.guardrails(
    thresholds={
        "faithfulness": 80,
        "answer_relevancy": 70,
        "clarity": 70
    },
    block_on_failure=True
)

# Test the guardrail with a good response
good_response = "The capital of France is Paris."
good_evaluation = guardrail.evaluate(
    query="What is the capital of France?",
    response=good_response,
    context=context
)
print("Good Response Evaluation:")
print(good_evaluation.to_json())

# Test the guardrail with a poor response
poor_response = "I don't know the answer to that question."
poor_evaluation = guardrail.evaluate(
    query="What is the capital of France?",
    response=poor_response,
    context=context
)
print("\nPoor Response Evaluation:")
print(poor_evaluation.to_json())
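
Conceptually, a multi-metric guardrail compares each metric's score with its threshold and blocks the response if any check fails. The sketch below shows that logic on plain dictionaries; `check_thresholds` and the shape of its return value are assumptions for illustration, not the SDK's guardrail implementation or output format.

```python
from typing import Any, Dict


def check_thresholds(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    block_on_failure: bool = True,
) -> Dict[str, Any]:
    """Compare scores with minimum thresholds, metric by metric.

    Hypothetical sketch; a metric with no score counts as failed.
    """
    results: Dict[str, Any] = {}
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        results[metric] = {
            "score": score,
            "threshold": minimum,
            "passed": score is not None and score >= minimum,
        }
    passed = all(entry["passed"] for entry in results.values())
    return {
        "passed": passed,
        "blocked": block_on_failure and not passed,
        "results": results,
    }
```

With the thresholds used above, made-up scores of `{"faithfulness": 90, "answer_relevancy": 60, "clarity": 75}` would fail on `answer_relevancy` and be blocked.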

## 8. Error Handling and Best Practices

Let's explore different types of errors you might encounter when using the SDK and how to handle them properly.

### 8.1 SDK Validation Errors

The SDK uses Pydantic for input validation. Let's see how it handles invalid inputs:

In [None]:
# Try to evaluate with invalid input (missing required field)
result = trustwise.safety.v3.faithfulness.evaluate(
    query="What is the capital of France?",
    # Missing 'response' parameter
    context=context
)

In [None]:
# Try to evaluate with invalid input type
result = trustwise.safety.v3.faithfulness.evaluate(
    query=123,  # Invalid type: should be string
    response="The capital of France is Paris.",
    context=context
)

### 8.2 Backend API Errors

When the backend API returns a non-200 response, the SDK will raise a `TrustwiseError` with the backend's error message. Let's see some examples:

In [None]:

# Try to evaluate with invalid API key
invalid_config = TrustwiseConfig(api_key="invalid_key")
invalid_sdk = TrustwiseSDK(invalid_config)
result = invalid_sdk.safety.v3.faithfulness.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=[]
)

In [None]:
# Try to evaluate with invalid context format
invalid_context = [{
    "invalid_field": "This is not a valid context format"
}]
result = trustwise.safety.v3.faithfulness.evaluate(
    query="What is the capital of France?",
    response="The capital of France is Paris.",
    context=invalid_context
)

### 8.3 Comprehensive Error Handling

Here's a practical example for a production environment. It handles SDK validation errors explicitly and uses a generic fallback that also catches backend API errors and anything else unexpected:

In [None]:
from typing import Optional, Dict, Any
from trustwise.sdk.exceptions import TrustwiseValidationError

def safe_evaluate_with_error_handling(
    query: str,
    response: str,
    context: list,
    metric: str = "faithfulness"
) -> Dict[str, Any]:
    """
    Safely evaluate a response with comprehensive error handling.

    Args:
        query: The user's query
        response: The AI's response
        context: The context used for evaluation
        metric: The metric to evaluate (default: faithfulness)

    Returns:
        Dict[str, Any]: Evaluation results on success, or error details on failure
    """
    try:
        # Get the appropriate evaluator based on the metric
        evaluator = getattr(trustwise.safety.v3, metric)
        
        # Perform the evaluation
        result = evaluator.evaluate(
            query=query,
            response=response,
            context=context
        )
        
        return {
            "success": True,
            "score": result.score,
            "details": result.to_json()
        }
        
    except TrustwiseValidationError as e:
        print(f"Trustwise Validation Error: {str(e)}")
        return {
            "success": False,
            "error_type": "trustwise_validation_error",
            "error_message": str(e)
        }
    except Exception as e:
        print(f"Unexpected Error: {str(e)}")
        return {
            "success": False,
            "error_type": "unexpected_error",
            "error_message": str(e)
        }

# Test the error handling with various scenarios
test_cases = [
    # Valid case
    {
        "query": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "context": context,
        "description": "Valid input"
    },
    # Invalid query type
    {
        "query": 123,  # Invalid type
        "response": "The capital of France is Paris.",
        "context": context,
        "description": "Invalid query type"
    },
    # Invalid context format
    {
        "query": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "context": [{"invalid": "format"}],
        "description": "Invalid context format"
    }
]

for case in test_cases:
    print(f"\nTesting: {case['description']}")
    result = safe_evaluate_with_error_handling(
        query=case["query"],
        response=case["response"],
        context=case["context"]
    )
    print("Result:", result)
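
One detail worth noting in the function above: `getattr(trustwise.safety.v3, metric)` accepts any string, so a typo in `metric` only surfaces as an `AttributeError` in the generic handler. A possible refinement is to check the name against an allowlist first. The sketch below uses a `SimpleNamespace` stand-in so it runs without the SDK; `ALLOWED_METRICS` and `get_evaluator` are hypothetical names.

```python
from types import SimpleNamespace
from typing import Any

# Assumed allowlist for illustration; adjust it to the metrics you actually use.
ALLOWED_METRICS = {"faithfulness", "answer_relevancy"}


def get_evaluator(namespace: Any, metric: str) -> Any:
    """Look up a metric on a namespace, rejecting unknown names early."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"Unsupported metric: {metric!r}")
    return getattr(namespace, metric)


# Stand-in for `trustwise.safety.v3`, so the sketch runs without the SDK.
fake_v3 = SimpleNamespace(faithfulness="faithfulness-evaluator")
print(get_evaluator(fake_v3, "faithfulness"))
```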

## 9. Features

This notebook has demonstrated the key features and capabilities of the Trustwise SDK:

1. Flexible configuration and initialization
2. Path-based API versioning
3. Full test coverage for the SDK and its installation
4. Automated documentation support
5. Guardrails (Experimental) and validation
6. Structured error handling
7. Extensibility for future features such as `.explain()` and `.batch_evaluate()`