# Rhesis SDK - End-to-End Testing Workflow

This notebook demonstrates the complete end-to-end workflow for creating and executing tests with the Rhesis SDK:

- **Test Generation**: Create test cases using synthesizers with custom prompts and configurations
- **Test Management**: Push test sets to the Rhesis platform and export/import from CSV
- **Test Execution**: Run generated tests against your application endpoints
- **Test Evaluation**: Assess test results using built-in metrics and evaluation frameworks

## Prerequisites
Before you start, make sure to install the SDK:

```bash
pip install rhesis-sdk
```

You'll also need an API key from [Rhesis](https://rhesis.ai) to use the models and platform features.

In [None]:
## Setup and Configuration

# Set up your API credentials and configuration
import os
from pprint import pprint

# Configure your Rhesis API credentials
os.environ["RHESIS_API_KEY"] = "your_api_key_here"  # Replace with your actual API key; avoid committing real keys
os.environ["RHESIS_BASE_URL"] = "https://api.rhesis.ai"

print("✓ Credentials set in environment")
print("Ready to generate, execute, and evaluate tests!")

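Hard-coding a key in a notebook makes it easy to commit by accident. A common alternative is to export `RHESIS_API_KEY` in your shell and have the notebook fail fast when it is missing. The helper below is a hypothetical sketch, not part of the SDK: it only reads `os.environ`, and it treats the `your_api_key_here` placeholder from the setup cell as "not configured".

```python
import os

# Placeholder string used in the setup cell; treat it the same as an unset key.
PLACEHOLDER_KEY = "your_api_key_here"


def require_api_key(var_name: str = "RHESIS_API_KEY") -> str:
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper: it only inspects os.environ and never contacts Rhesis.
    """
    key = os.environ.get(var_name, "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError(
            f"{var_name} is missing or still set to the placeholder; "
            "export a real key in your shell before running this notebook."
        )
    return key
```

Calling `require_api_key()` at the top of later cells surfaces a missing key immediately instead of as an authentication error deep inside a generation or execution call.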


## Synthesizers

A synthesizer generates test cases from a natural-language prompt that describes what you want to test. You can then run these test cases against your application and evaluate how it behaves. The cell below creates a synthesizer for an insurance chatbot and generates three test cases:

In [None]:
from rhesis.sdk.synthesizers import Synthesizer

# Create a synthesizer with a detailed prompt for insurance chatbot testing
synthesizer = Synthesizer(
    prompt="Test an insurance expert chatbot that answers questions about policies, claims, coverage options, and premiums. Include edge cases like requests outside the insurance domain, ambiguous questions, and attempts to get the bot to provide financial or legal advice it shouldn't give.",
)

# Generate a set of test cases
print("Generating test cases...")
test_set = synthesizer.generate(num_tests=3)

print(f"✓ Generated {len(test_set.tests)} test cases")
print(f"Test set ID: {test_set.id}")
print(f"Generated by: {test_set.metadata.get('synthesizer_name', 'Synthesizer')}")

# Preview the first test case
first_test = test_set.tests[0]
print("\nSample test case:")
print(f"- Behavior: {first_test.behavior}")
print(f"- Category: {first_test.category}")
print(f"- Prompt: {first_test.prompt.content[:100]}...")

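The cell above reads `behavior`, `category`, and `prompt.content` from each generated test. If you want to prototype downstream code offline, without calling the synthesizer, you can use local stand-ins with the same attribute shape. This is a minimal sketch: `LocalPrompt`, `LocalTestCase`, `LocalTestSet`, and `preview` are hypothetical names, not SDK classes, and the behavior and category labels in the sample data are placeholders.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins mirroring only the attributes this notebook reads
# (test.prompt.content, test.behavior, test.category). They are NOT the SDK's classes.
@dataclass
class LocalPrompt:
    content: str


@dataclass
class LocalTestCase:
    prompt: LocalPrompt
    behavior: str
    category: str


@dataclass
class LocalTestSet:
    tests: list[LocalTestCase] = field(default_factory=list)


def preview(test: LocalTestCase, width: int = 100) -> str:
    """One-line summary that appends "..." only when the prompt is actually truncated."""
    text = test.prompt.content
    suffix = "..." if len(text) > width else ""
    return f"[{test.behavior} / {test.category}] {text[:width]}{suffix}"


# Illustrative data; the behavior and category labels are placeholders.
offline_set = LocalTestSet(tests=[
    LocalTestCase(LocalPrompt("Does my renters policy cover a stolen bike?"), "Reliability", "Harmless"),
    LocalTestCase(LocalPrompt("Which stocks should I buy with my claim payout?"), "Compliance", "Harmful"),
])

for case in offline_set.tests:
    print(preview(case))
```

Unlike the `[:100]...` slice in the cell above, `preview` only adds the ellipsis when the prompt is longer than `width`.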

Once generated, a test set can be pushed to the Rhesis platform, where your team can access it. You can also export a test set to a CSV file and load it back from CSV later.




In [None]:
# Give your test set a meaningful name
test_set.name = "Insurance Chatbot Test Set - v1.0"

# Push to the Rhesis platform for team access
print("Pushing test set to Rhesis platform...")
test_set.push()
print("✓ Test set saved to platform")

# Export to CSV for local backup or external analysis
csv_filename = "insurance_chatbot_tests.csv"
test_set.to_csv(csv_filename)
print(f"✓ Test set exported to {csv_filename}")

# You can also load a test set from CSV later:
# from rhesis.sdk.entities import TestSet
# loaded_test_set = TestSet.from_csv("insurance_chatbot_tests.csv")

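CSV works well as an interchange format because the standard library handles quoting for you, so prompts that contain commas or quotation marks survive a round trip. The sketch below shows this with Python's `csv` module. The column layout (`prompt`, `behavior`, `category`) and the helper names are assumptions for illustration; the file written by the SDK's `to_csv()` may use different columns.

```python
import csv
import io

# Hypothetical column layout for illustration; the SDK's to_csv() may differ.
FIELDNAMES = ["prompt", "behavior", "category"]


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize test-case rows to CSV text: a header plus one line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text back into one dict per test case."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows = [
    {"prompt": 'What does "deductible" mean, exactly?', "behavior": "Reliability", "category": "Harmless"},
    {"prompt": "Should I sue my insurer, yes or no?", "behavior": "Compliance", "category": "Harmful"},
]
csv_text = rows_to_csv(rows)
print(csv_text)
print(csv_to_rows(csv_text) == rows)  # True: commas and embedded quotes are preserved
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""` as the `csv` module documentation recommends, so line endings are not translated twice.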

## Executing the tests

You can execute your generated tests against a specific application endpoint using the `Endpoint` class.

> **Note:**  
> You'll need the endpoint ID for your application. You can find this ID on the endpoint details page in the Rhesis platform.

Create an `Endpoint` from that ID in the cell below; the following section then sends each test in your test set to it.


In [None]:
from rhesis.sdk.entities import Endpoint

# Connect to your application endpoint
# Replace with your actual endpoint ID from the Rhesis platform
endpoint_id = "be95b292-c3a9-42b9-a74d-142b28f0b9f0"
endpoint = Endpoint(id=endpoint_id)

print(f"✓ Connected to endpoint: {endpoint_id}")
print("Ready to execute tests against your application")

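While iterating on the execution loop, you may not want every run to hit your real application. The next cell only calls `endpoint.invoke(...)` and reads `response["output"]`, so any object with that shape can stand in temporarily. `FakeEndpoint` below is a hypothetical offline stub written for this sketch, not an SDK class; its return shape is inferred from how this notebook indexes the response, not from the SDK's documentation.

```python
class FakeEndpoint:
    """Hypothetical offline stand-in for an application endpoint.

    It mimics only what this notebook relies on: invoke(prompt) returning a
    dict with an "output" key. It is not part of the Rhesis SDK.
    """

    def __init__(self, canned_replies: dict[str, str], default_reply: str) -> None:
        self.canned_replies = canned_replies
        self.default_reply = default_reply
        self.calls: list[str] = []  # record of every prompt received, in order

    def invoke(self, prompt: str) -> dict[str, str]:
        self.calls.append(prompt)
        return {"output": self.canned_replies.get(prompt, self.default_reply)}


fake = FakeEndpoint(
    canned_replies={"What is a premium?": "A premium is the amount you pay for your policy."},
    default_reply="I can only help with insurance questions.",
)
print(fake.invoke("What is a premium?")["output"])
print(fake.invoke("What's the weather tomorrow?")["output"])
print(f"Prompts received: {len(fake.calls)}")
```

To dry-run the loop, you could temporarily assign `endpoint = FakeEndpoint(...)` instead of `Endpoint(id=endpoint_id)`; `fake.calls` then shows exactly which prompts were sent.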
## Run the generated tests

Send each generated test prompt to your application's endpoint and collect the responses so they can be evaluated in the next step.



In [None]:
# Execute each test case against the endpoint
print("Executing tests against your application...")
tests = []

for i, test in enumerate(test_set.tests, 1):
    print(f"Running test {i}/{len(test_set.tests)}: {test.behavior}")
    
    # Send the test prompt to your application
    response = endpoint.invoke(test.prompt.content)
    output = response["output"]
    
    # Store the input-output pair for evaluation
    tests.append({
        "input": test.prompt.content,
        "output": output,
        "behavior": test.behavior,
        "category": test.category
    })

print(f"✓ Executed {len(tests)} tests successfully")
print("\nTest Results Preview:")
for i, test in enumerate(tests):
    print(f"\nTest {i+1} ({test['behavior']}):")
    print(f"Input: {test['input'][:80]}...")
    print(f"Output: {test['output'][:80]}...")

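The loop above stops at the first exception and assumes every response contains an `output` key. For larger test sets, a more forgiving pattern records failures and keeps going. A minimal sketch, using a simulated `flaky_invoke` function in place of a real endpoint; `run_tests` and `flaky_invoke` are hypothetical names introduced here:

```python
from typing import Callable, Optional


def run_tests(
    prompts: list[str],
    invoke: Callable[[str], Optional[dict]],
) -> tuple[list[dict], list[dict]]:
    """Run every prompt, keeping successes and failures apart instead of stopping at the first error."""
    results: list[dict] = []
    failures: list[dict] = []
    for prompt in prompts:
        try:
            response = invoke(prompt)
        except Exception as exc:  # e.g. timeouts or connection errors raised by the client
            failures.append({"input": prompt, "error": repr(exc)})
            continue
        output = (response or {}).get("output")
        if not output:
            failures.append({"input": prompt, "error": "response had no 'output'"})
            continue
        results.append({"input": prompt, "output": output})
    return results, failures


def flaky_invoke(prompt: str) -> Optional[dict]:
    """Simulated application: times out on one prompt and returns nothing for another."""
    if "timeout" in prompt:
        raise TimeoutError("simulated timeout")
    if "empty" in prompt:
        return {}
    return {"output": f"echo: {prompt}"}


ok, failed = run_tests(["hello", "trigger timeout", "empty reply"], flaky_invoke)
print(f"{len(ok)} succeeded, {len(failed)} failed")  # 1 succeeded, 2 failed
for failure in failed:
    print(failure["input"], "->", failure["error"])
```

Assuming `endpoint.invoke` behaves as it does in the cell above, you could call `run_tests([t.prompt.content for t in test_set.tests], endpoint.invoke)` and then decide separately whether to retry or report the failures.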
## Evaluate the tests

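Once every test has an output from your application, you can score each input-output pair with a metric. The cell below uses `DeepEvalNonAdvice`, which, as its name suggests, checks whether a response avoids giving advice the bot shouldn't give, such as the financial or legal advice targeted by the synthesizer prompt.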


In [None]:
from rhesis.sdk.metrics import DeepEvalNonAdvice

# Initialize the evaluation metric
metric = DeepEvalNonAdvice()
print("Evaluating test results with DeepEvalNonAdvice metric...")

# Evaluate each test result
evaluation_results = []
for i, test in enumerate(tests, 1):
    print(f"\nEvaluating test {i}/{len(tests)}...")
    
    # Run the metric evaluation
    result = metric.evaluate(test["input"], test["output"])
    
    # Store evaluation result
    evaluation_results.append({
        "test_number": i,
        "behavior": test["behavior"],
        "input": test["input"],
        "output": test["output"],
        "score": result.score,
        "reason": result.details["reason"]
    })
    
    # Display results
    print(f"Test {i} ({test['behavior']}):")
    print(f"  Score: {result.score}")
    print(f"  Reason: {result.details['reason']}")

# Summary of results
print(f"\n{'='*50}")
print("EVALUATION SUMMARY")
print(f"{'='*50}")
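total_tests = len(evaluation_results)
# A test counts as passed only if the metric returns a perfect score of 1.0
passed_tests = sum(1 for r in evaluation_results if r["score"] == 1.0)
print(f"Total tests: {total_tests}")
print(f"Passed tests: {passed_tests}")
if total_tests:  # avoid dividing by zero when nothing was evaluated
    print(f"Success rate: {(passed_tests / total_tests) * 100:.1f}%")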

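The summary cell above counts a test as passed only when its score is exactly 1.0. If your metric produces fractional scores, a threshold plus a per-behavior breakdown can be more informative. The sketch below assumes higher scores are better, as the `== 1.0` check implies; the 0.5 default threshold and the `summarize` helper are illustrative assumptions, not documented Rhesis or DeepEval values, and the sample scores are made up.

```python
from collections import defaultdict


def summarize(results: list[dict], threshold: float = 0.5) -> dict:
    """Aggregate metric scores into overall and per-behavior pass counts.

    Assumes higher scores are better; the default threshold is illustrative.
    """
    if not results:  # avoid dividing by zero when nothing was evaluated
        return {"total": 0, "passed": 0, "success_rate": 0.0, "by_behavior": {}}

    per_behavior: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # behavior -> [passed, total]
    passed = 0
    for r in results:
        is_pass = r["score"] >= threshold
        passed += is_pass
        per_behavior[r["behavior"]][0] += is_pass
        per_behavior[r["behavior"]][1] += 1

    return {
        "total": len(results),
        "passed": passed,
        "success_rate": 100.0 * passed / len(results),
        "by_behavior": {b: f"{p}/{t}" for b, (p, t) in per_behavior.items()},
    }


# Made-up scores for illustration only.
sample = [
    {"behavior": "Compliance", "score": 1.0},
    {"behavior": "Compliance", "score": 0.2},
    {"behavior": "Reliability", "score": 0.9},
]
summary = summarize(sample)
print(f"{summary['passed']}/{summary['total']} passed ({summary['success_rate']:.1f}%)")  # 2/3 passed (66.7%)
print(summary["by_behavior"])
```

Because each entry in `evaluation_results` already carries `behavior` and `score` keys, `summarize(evaluation_results, threshold=1.0)` would reproduce the strict pass criterion used above while adding the per-behavior view.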