## ConfigSynthesizer Example

This notebook shows how to use `ConfigSynthesizer` for structured test case generation. The synthesizer:
- Takes a `GenerationConfig` that groups the test generation parameters
- Accepts any combination of behaviors, categories, and topics
- Works with or without external sources
- Returns tests with `behavior`, `category`, and `topic` fields plus metadata

The `ConfigSynthesizer` is designed for:
- **Structured Generation**: Organize test parameters using `GenerationConfig`
- **Flexible Prompting**: Combine generation prompts with structured parameters
- **Source Integration**: Optionally incorporate external content sources
- **Consistent Output**: Generate tests with predictable structure and metadata

Key Configuration Parameters:
- `config`: `GenerationConfig` object containing:
  - `generation_prompt`: Main prompt for test generation (optional)
  - `behaviors`: List of behaviors to test (optional)
  - `categories`: List of test categories (optional)
  - `topics`: List of topics to cover (optional)
  - `additional_context`: Extra context information (optional)
- `batch_size`: Maximum tests per LLM call (default: 20)
- `model`: The model to use for generation (optional)

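To make the `batch_size` parameter concrete, here is a minimal sketch of how a request for a given number of tests could be split into calls of at most `batch_size` tests each. The helper `plan_batches` is hypothetical and not part of the Rhesis SDK; the synthesizer's internal batching may differ.

```python
def plan_batches(num_tests: int, batch_size: int = 20) -> list[int]:
    """Split num_tests into chunk sizes no larger than batch_size.

    Hypothetical helper for illustration only; not part of the Rhesis SDK.
    """
    if num_tests < 0 or batch_size <= 0:
        raise ValueError("num_tests must be >= 0 and batch_size must be > 0")
    full, remainder = divmod(num_tests, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


# Under this assumption, 8 tests with batch_size=5 need one call of 5 and one of 3.
print(plan_batches(8, batch_size=5))
```

A smaller `batch_size` means more calls with fewer tests in each, which can help when individual responses would otherwise grow too long.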
### Setup

In [None]:
# Set up your API credentials and configuration
import os
from rhesis.sdk.synthesizers import ConfigSynthesizer, GenerationConfig

# Configure your Rhesis API credentials
os.environ["RHESIS_API_KEY"] = "replace-with-your-api-key"  # Replace with your actual API key

print("✓ SDK configured successfully")
print("Ready to generate test cases with ConfigSynthesizer!")


### Example 1: Basic ConfigSynthesizer Usage


In [None]:
# Create a basic generation config
config = GenerationConfig(
    generation_prompt="Generate test cases for a customer service chatbot",
    behaviors=["helpful_responses", "error_handling", "escalation"],
    categories=["customer_support", "technical_issues"],
    topics=["billing", "account_management", "troubleshooting"],
    additional_context="Focus on common customer pain points and edge cases"
)

# Create synthesizer with just the config (no sources)
synthesizer = ConfigSynthesizer(config=config, batch_size=5)

# Generate test cases
result = synthesizer.generate(num_tests=8)

print(f"Generated {len(result.tests)} tests")
print(f"Test set metadata keys: {list(result.metadata.keys())}")
print(f"Synthesizer used: {result.metadata.get('synthesizer_name')}")


In [None]:
# Inspect the structure of generated tests
first_test = result.tests[0]

print("Test Structure:")
print(f"- Prompt: {first_test.prompt.content[:100]}...")
print(f"- Behavior: {first_test.behavior}")
print(f"- Category: {first_test.category}")
print(f"- Topic: {first_test.topic}")
print(f"- Test Type: {first_test.test_type}")

print(f"\nMetadata keys: {list(first_test.metadata.keys())}")
print(f"Generated by: {first_test.metadata['generated_by']}")

# Show how the config parameters are reflected in the tests
behaviors = [test.behavior for test in result.tests]
categories = [test.category for test in result.tests]
topics = [test.topic for test in result.tests]

print(f"\nGenerated behaviors: {set(behaviors)}")
print(f"Generated categories: {set(categories)}")
print(f"Generated topics: {set(topics)}")

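The sets printed above show which values appear but not how often. A minimal sketch of a frequency summary follows; it uses a stand-in `SampleTest` dataclass (hypothetical, carrying only the `behavior`, `category`, and `topic` attributes used in the cell above) so it runs without an API key.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SampleTest:
    """Stand-in with the same attribute names as the tests inspected above."""
    behavior: str
    category: str
    topic: str


def summarize(tests, field):
    """Count how many tests carry each value of the given attribute."""
    return Counter(getattr(test, field) for test in tests)


sample = [
    SampleTest("error_handling", "technical_issues", "billing"),
    SampleTest("error_handling", "customer_support", "troubleshooting"),
    SampleTest("escalation", "customer_support", "billing"),
]

for field in ("behavior", "category", "topic"):
    print(field, dict(summarize(sample, field)))
```

With a real run, `summarize(result.tests, "behavior")` should work the same way, assuming the generated test objects expose these attributes as shown in the inspection cell.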
### Example 2: ConfigSynthesizer with Different Config Patterns

In [None]:
# Pattern 1: Minimal config with just a prompt
minimal_config = GenerationConfig(
    generation_prompt="Generate security test cases for web applications"
)

minimal_synthesizer = ConfigSynthesizer(config=minimal_config)
minimal_result = minimal_synthesizer.generate(num_tests=3)

print("=== Minimal Config Results ===")
print(f"Generated {len(minimal_result.tests)} tests")
for i, test in enumerate(minimal_result.tests):
    print(f"Test {i+1}: {test.behavior} | {test.category} | {test.topic}")

print("\n" + "="*50 + "\n")

# Pattern 2: Config with only structured parameters (no prompt)
structured_config = GenerationConfig(
    behaviors=["input_validation", "authentication", "authorization"],
    categories=["security", "performance", "usability"],
    topics=["login_system", "data_access", "user_interface"],
    additional_context="Focus on edge cases and potential vulnerabilities"
)

structured_synthesizer = ConfigSynthesizer(config=structured_config)
structured_result = structured_synthesizer.generate(num_tests=4)

print("=== Structured Config Results (No Prompt) ===")
print(f"Generated {len(structured_result.tests)} tests")
for i, test in enumerate(structured_result.tests):
    print(f"Test {i+1}: {test.behavior} | {test.category} | {test.topic}")
    print(f"  Prompt preview: {test.prompt.content[:80]}...")

print("\n" + "="*50 + "\n")

# Pattern 3: Combined config with both prompt and structured parameters
combined_config = GenerationConfig(
    generation_prompt="Generate comprehensive API testing scenarios",
    behaviors=["error_handling", "rate_limiting", "data_validation"],
    categories=["api_testing", "integration"],
    topics=["rest_endpoints", "authentication", "response_handling"]
)

combined_synthesizer = ConfigSynthesizer(config=combined_config, batch_size=3)
combined_result = combined_synthesizer.generate(num_tests=3)

print("=== Combined Config Results ===")
print(f"Generated {len(combined_result.tests)} tests")
for i, test in enumerate(combined_result.tests):
    print(f"Test {i+1}: {test.behavior} | {test.category} | {test.topic}")
    print(f"  Prompt preview: {test.prompt.content[:80]}...")

### Example 3: ConfigSynthesizer with Sources

In [None]:
from rhesis.sdk.services.extractor import SourceSpecification, SourceType
from rhesis.sdk.services.chunker import SemanticChunker

# Create a config for document-based test generation
config = GenerationConfig(
    generation_prompt="Generate test cases for insurance claims processing",
    categories=["claims_processing", "policy_validation"],
    behaviors=["fraud_detection", "documentation_review", "compliance_check"]
)

# Create sources with different types of content
sources = [
    # Text source with policy information
    SourceSpecification(
        type=SourceType.TEXT,
        name="policy_terms.md",
        description="Insurance policy terms and coverage",
        metadata={
            "content": """
# Insurance Policy Terms

## Coverage
- Medical emergencies up to $50,000
- Theft and loss up to $10,000
- Natural disasters (excluding floods)

## Exclusions
- Intentional damage or fraud
- Pre-existing medical conditions
- War and terrorism

## Claims Process
1. Report incident within 48 hours
2. Provide all required documentation
3. Await assessment by claims adjuster
4. Receive decision within 14 business days
            """
        }
    ),
    # Another text source with processing guidelines
    SourceSpecification(
        type=SourceType.TEXT,
        name="claims_guidelines.md",
        description="Internal guidelines for processing claims",
        metadata={
            "content": """
# Claims Processing Guidelines

## Standard Processing Time
- Simple claims: 5-7 business days
- Complex claims: 10-14 business days
- Disputed claims: 21-30 business days

## Fraud Indicators
- Inconsistent incident dates
- Missing or altered documentation
- Unusually high claim amounts
- Multiple claims in short timeframe

## Required Documentation
- Incident report
- Police report (if applicable)
- Medical records (for health claims)
- Receipts and proof of ownership
            """
        }
    )
]

# Create synthesizer with sources and custom chunking
synthesizer = ConfigSynthesizer(
    config=config,
    sources=sources,
    chunking_strategy=SemanticChunker(max_tokens_per_chunk=800),
    batch_size=5
)

# Generate tests based on the source content
result = synthesizer.generate(num_tests=6)

print(f"Generated {len(result.tests)} tests from {len(sources)} sources")
print(f"Documents used: {result.metadata.get('documents_used', [])}")
print(f"Coverage: {result.metadata.get('coverage_percent', 0):.1f}%")  # assumes coverage_percent is on a 0-100 scale
print(f"Contexts used: {result.metadata.get('contexts_used', 0)}/{result.metadata.get('contexts_total', 0)}")

# Show how source content influences test generation
print("\n=== Sample Tests with Source Context ===")
for i, test in enumerate(result.tests[:2]):
    print(f"\nTest {i+1}:")
    print(f"  Behavior: {test.behavior}")
    print(f"  Category: {test.category}")
    print(f"  Prompt: {test.prompt.content[:120]}...")
    
    # Show source information
    if test.metadata.get('sources'):  # skip tests with a missing or empty source list
        source_info = test.metadata['sources'][0]
        print(f"  Source: {source_info['name']}")
        print(f"  Context preview: {source_info['content'][:100]}...")


### Example with file-based sources (update paths as needed)

In [None]:
# Document source example
file_sources = [
    SourceSpecification(
        type=SourceType.DOCUMENT,
        name="policy_manual.pdf",
        description="Company policy manual",
        metadata={"path": "/path/to/your/policy_manual.pdf"}
    ),
    SourceSpecification(
        type=SourceType.WEBSITE,
        name="company_website",
        description="Company privacy policy page",
        metadata={"url": "https://yourcompany.com/privacy"}
    )
]

file_config = GenerationConfig(
    generation_prompt="Generate compliance test cases based on company policies",
    categories=["compliance", "policy_adherence"],
    behaviors=["policy_validation", "procedure_checking"]
)

file_synthesizer = ConfigSynthesizer(
    config=file_config,
    sources=file_sources,
    chunking_strategy=SemanticChunker(max_tokens_per_chunk=1200)
)

file_result = file_synthesizer.generate(num_tests=5)
print(f"Generated {len(file_result.tests)} tests from file sources")


# Note: the file path and URL above are placeholders.
# Update them to point at real documents before running this cell.

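Because the file path and URL in this cell are placeholders, it can help to check them before building the sources. The following is a minimal sketch using only the standard library; `check_path` and `check_url` are hypothetical helpers, not part of the Rhesis SDK, and the URL check only validates the form of the address, not whether the page is reachable.

```python
from pathlib import Path
from urllib.parse import urlparse


def check_path(path: str) -> bool:
    """Return True if the path points at an existing regular file."""
    return Path(path).is_file()


def check_url(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a host part."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Placeholder values copied from the cell above.
candidates = {
    "policy_manual.pdf": check_path("/path/to/your/policy_manual.pdf"),
    "company_website": check_url("https://yourcompany.com/privacy"),
}

for name, ok in candidates.items():
    print(f"{name}: {'ok' if ok else 'needs updating'}")
```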