# Collinear Red Team - Custom Target Model

This notebook shows how to test your own custom model with red-team evaluation.

Red-teaming tests whether LLMs can be manipulated into violating safety policies through adversarial prompting.

**Note:** The attack plan is loaded automatically on the server, so you only need to specify which model to test.

## Install SDK

In [None]:
!pip install collinear

## Setup

Import the client and set your API keys:

In [None]:
import os
from collinear.client import Client

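# Set your API keys (replace the placeholders; avoid committing real keys with the notebook)
os.environ["OPENAI_API_KEY"] = "your-openai-key-here"
os.environ["COLLINEAR_API_KEY"] = "your-collinear-key-here"
# Backend the SDK talks to; this notebook uses the stage.collinear.ai host
os.environ["COLLINEAR_BACKEND_URL"] = "https://stage.collinear.ai"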
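
A common slip is running the notebook with the placeholder strings still in place. The sketch below is one way to catch that before any request is sent. The helper name `require_env` and the `PLACEHOLDERS` set are illustrative and not part of the Collinear SDK.

```python
import os

# Placeholder strings used in this notebook's setup cell; treated as "not set".
PLACEHOLDERS = {"your-openai-key-here", "your-collinear-key-here"}


def require_env(name: str) -> str:
    """Return an environment variable, raising if it is empty or still a placeholder."""
    value = os.environ.get(name, "").strip()
    if not value or value in PLACEHOLDERS:
        raise RuntimeError(f"Environment variable {name} is not set to a real value")
    return value
```

For example, `require_env("COLLINEAR_API_KEY")` could replace the direct `os.environ[...]` lookup when building the client.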

## Initialize Client

Initialize the client with your default model endpoint and API credentials. These are used for the attacker and evaluator models, which run on Collinear's side:

In [None]:
client = Client(
    assistant_model_url="https://api.openai.com/v1",
    assistant_model_api_key=os.environ["OPENAI_API_KEY"],
    assistant_model_name="gpt-4o-mini",
    collinear_api_key=os.environ["COLLINEAR_API_KEY"],
)

print("✓ Client initialized")

## Test Your Custom Model

Specify the model you want to test using the `target_model` parameter.

The target model will use the same API endpoint and credentials from the client initialization above:

In [None]:
# Name of the model to test; replace with your model name
target_model_name = "gpt-4o"

# Start evaluation with your custom target model
evaluation = client.redteam(
    target_model=target_model_name,
    max_turns=10,
)

print(f"✓ Started evaluation: {evaluation.id}")
print(f"  Testing model: {target_model_name}")
print("  Attack plan loaded automatically on the server")
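
To compare several models served from the same endpoint, the call above can be repeated once per model name. This is a minimal sketch that uses only the `client.redteam(target_model=..., max_turns=...)` call shown in this notebook. The helper name `start_evaluations` is hypothetical, and whether the service accepts several evaluations at once is an assumption to verify.

```python
from typing import Any, Dict, Iterable


def start_evaluations(client: Any, model_names: Iterable[str], max_turns: int = 10) -> Dict[str, Any]:
    """Start one red-team evaluation per model name and return them keyed by name."""
    evaluations: Dict[str, Any] = {}
    for name in model_names:
        # Same call as in the cell above, with only the target model changing.
        evaluations[name] = client.redteam(target_model=name, max_turns=max_turns)
    return evaluations
```

Each returned evaluation can then be polled in turn, as in the polling section of this notebook.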

## Alternative: Use a Different API Endpoint for Your Target Model

If your target model is hosted at a different endpoint (e.g., Azure OpenAI, a custom deployment, or a different provider), use `ModelConfig`:

In [None]:
from collinear.redteam import ModelConfig

# Define your custom target model configuration
my_target = ModelConfig(
    provider="openai_compat",
    model="gpt-4o",  # Replace with your model name
    base_url="https://api.openai.com/v1",  # Replace with your API endpoint
    api_key="your-api-key-here",  # Replace with your API key
    temperature=0.0,
    max_retries=10,
)

# Start evaluation with custom target configuration
evaluation = client.redteam(
    target_config=my_target,
    max_turns=10,
)

print(f"✓ Started evaluation: {evaluation.id}")
print("  Testing custom model configuration")
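
To keep the target's API key out of the notebook, the `ModelConfig` arguments can be assembled from environment variables instead. The sketch below only builds a plain dictionary with the same fields used above. The variable names `TARGET_MODEL`, `TARGET_BASE_URL`, and `TARGET_API_KEY` and the helper `target_settings_from_env` are hypothetical choices, not SDK conventions.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Hypothetical environment variable names for the target model settings.
REQUIRED_KEYS = ("TARGET_MODEL", "TARGET_BASE_URL", "TARGET_API_KEY")


def target_settings_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Collect target model settings from the environment as keyword arguments."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"Missing target settings: {', '.join(missing)}")
    return {
        "provider": "openai_compat",
        "model": env["TARGET_MODEL"],
        "base_url": env["TARGET_BASE_URL"],
        "api_key": env["TARGET_API_KEY"],
        "temperature": 0.0,
        "max_retries": 10,
    }
```

The result could then be passed as `ModelConfig(**target_settings_from_env())`, assuming `ModelConfig` accepts exactly the keyword arguments shown in the cell above.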

## Poll for Results

Wait for the evaluation to complete (may take several minutes):

In [None]:
# Wait for completion (up to 10 minutes, checking every 5 seconds)
result = evaluation.poll(timeout=600.0, interval=5.0)

# View summary
summary = evaluation.summary()
print(f"\nStatus: {summary['status']}")
print(f"Total behaviors tested: {summary['total_behaviors']}")
print(f"Successful: {summary['successful']}")
print(f"Failed: {summary['failed']}")

if summary['errors_by_type']:
    print(f"\nErrors: {summary['errors_by_type']}")
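
If the summary is logged or compared across runs, a single-line report can be easier to scan. The sketch below assumes the summary is a dictionary with the keys printed above and that `errors_by_type` maps error names to counts. It tolerates missing keys and makes no claim about what `successful` means beyond the count the SDK reports. The name `format_summary` is illustrative.

```python
from typing import Any, Mapping


def format_summary(summary: Mapping[str, Any]) -> str:
    """Build a one-line report from the summary keys, tolerating missing ones."""
    total = summary.get("total_behaviors", 0) or 0
    successful = summary.get("successful", 0) or 0
    failed = summary.get("failed", 0) or 0
    # Avoid dividing by zero when no behaviors were tested.
    share = f"{successful / total:.0%}" if total else "n/a"
    status = summary.get("status", "unknown")
    line = f"{status}: {successful}/{total} successful ({share}), {failed} failed"
    errors = summary.get("errors_by_type") or {}
    if errors:
        line += "; errors: " + ", ".join(f"{k}={v}" for k, v in sorted(errors.items()))
    return line
```

Calling `print(format_summary(summary))` after the cell above would give a compact line suitable for logs.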

## View Full Results

The result contains all attack transcripts, judge scores, and evaluation details:

In [None]:
import json

# default=str keeps the dump from failing on values that are not JSON-serializable
print(json.dumps(result, indent=2, default=str))
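
Full results can be long, so writing them to a file is often more practical than reading them in the cell output. This is a minimal sketch, assuming `result` is a JSON-like structure. The helper name `save_result` and the file name in the usage note are illustrative.

```python
import json
from pathlib import Path
from typing import Any


def save_result(result: Any, path: str) -> Path:
    """Write the evaluation result to a JSON file and return its path."""
    out = Path(path)
    # default=str turns values json cannot serialize (e.g. datetimes) into strings.
    out.write_text(json.dumps(result, indent=2, default=str), encoding="utf-8")
    return out
```

For example, `save_result(result, "redteam_result.json")` stores the transcripts and scores for later review.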