# Collinear Red Team - Custom Target Model

This notebook shows how to run a red-team evaluation against a target model of your choice.

Red-teaming tests whether LLMs can be manipulated into violating safety policies through adversarial prompting.

**Note:** The attack plan is loaded automatically on the server. You only need to specify which model you want to test!

## Install SDK

In [None]:
!pip install collinear

## Setup

Import the client, then set your API keys and the Collinear backend URL:

In [None]:
import os
from collinear.client import Client

# Replace the placeholders with your own keys (avoid committing real keys)
os.environ["OPENAI_API_KEY"] = "your-openai-key-here"
os.environ["COLLINEAR_API_KEY"] = "your-collinear-key-here"
os.environ["COLLINEAR_BACKEND_URL"] = "https://stage.collinear.ai"
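
Hard-coded keys are easy to leak when a notebook is shared. As a minimal sketch, assuming you would rather be prompted for any key that is not already set, the helper below fills in a missing environment variable interactively. `ensure_env` is a hypothetical helper written for this notebook, not part of the Collinear SDK.

```python
import os
from getpass import getpass


def ensure_env(name, prompt=getpass):
    """Return the value of env var `name`, prompting for it if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # getpass hides the typed value, so the key never appears in cell output
        value = prompt(f"Enter {name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return value
```

Calling `ensure_env("OPENAI_API_KEY")` and `ensure_env("COLLINEAR_API_KEY")` before creating the client would replace the two placeholder assignments above.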

## Initialize Client

The client needs your default API credentials. These will be used for the attacker and evaluator models (running on Collinear's side):

In [None]:
client = Client(
    assistant_model_url="https://api.openai.com/v1",
    assistant_model_api_key=os.environ["OPENAI_API_KEY"],
    assistant_model_name="gpt-4o-mini",
    collinear_api_key=os.environ["COLLINEAR_API_KEY"],
)

print("✓ Client initialized")

## Test Your Custom Model

Specify the model you want to test using the `target_model` parameter.

The target model will use the same API endpoint and credentials from the client initialization above:

In [None]:
max_turns = 3
max_workers = 2
target_model = "gpt-4o"

evaluation = client.redteam(
    target_model=target_model,
    max_turns=max_turns,
    max_workers=max_workers,
)

print(f"✓ Started: {evaluation.id}")
print("\n   Target Model:")
print(f"      Model:    {target_model}")
print("      Endpoint: https://api.openai.com/v1")
print("\n   Attacker Model:")
print(f"      Model:    gpt-4o-mini")
print(f"      Endpoint: https://api.openai.com/v1")
print(f"\n   Max turns: {max_turns} | Workers: {max_workers}")

## Alternative: Use a Different API Endpoint for Your Target Model

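If your target model is hosted at a different endpoint (e.g., Azure OpenAI, a custom deployment, or a different provider), pass a `ModelConfig` as `target_config` instead of `target_model`. The example below reuses the OpenAI endpoint as a placeholder; replace `model`, `base_url`, and `api_key` with the values for your own deployment: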

In [None]:
from collinear.redteam import ModelConfig

max_turns = 3
max_workers = 2

my_target = ModelConfig(
    provider="openai_compat",
    model="gpt-4o",
    base_url="https://api.openai.com/v1",
    api_key="your-api-key-here",
    temperature=0.0,
    max_retries=10,
)

evaluation = client.redteam(
    target_config=my_target,
    max_turns=max_turns,
    max_workers=max_workers,
)

print(f"✓ Started: {evaluation.id}")
print("\n   Target Model:")
print(f"      Model:    {my_target.model}")
print(f"      Endpoint: {my_target.base_url}")
print("\n   Attacker Model:")
print(f"      Model:    gpt-4o-mini")
print(f"      Endpoint: https://api.openai.com/v1")
print(f"\n   Max turns: {max_turns} | Workers: {max_workers}")
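
To keep the target's endpoint and key out of the notebook, the `ModelConfig` arguments can be assembled from environment variables. This is a rough sketch under stated assumptions: the `TARGET_BASE_URL`, `TARGET_MODEL`, and `TARGET_API_KEY` variable names and the `target_kwargs_from_env` helper are hypothetical, and only the field names shown in the cell above are used.

```python
import os
from urllib.parse import urlparse


def target_kwargs_from_env(env=os.environ):
    """Build keyword arguments for ModelConfig from hypothetical TARGET_* variables."""
    base_url = env["TARGET_BASE_URL"]
    parsed = urlparse(base_url)
    # Catch an obviously malformed endpoint before an evaluation is started
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        raise ValueError(f"TARGET_BASE_URL is not a valid http(s) URL: {base_url!r}")
    return {
        "provider": "openai_compat",
        "model": env["TARGET_MODEL"],
        "base_url": base_url,
        "api_key": env["TARGET_API_KEY"],
        "temperature": 0.0,
        "max_retries": 10,
    }
```

With those variables set, `my_target = ModelConfig(**target_kwargs_from_env())` would stand in for the literal values above.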

## Poll for Results with Real-time Updates

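Poll every 15 seconds until the evaluation reaches `COMPLETED` or `FAILED`, printing the status whenever it changes and at least every 30 seconds, then print a summary: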

In [None]:
import time

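start = time.time()
last_status = None
last_print = 0  # elapsed seconds at the time of the last printed line

while True:
    # Ask for the current status; treat a missing field as PENDING
    result = evaluation.status(refresh=True)
    current_status = result.get("status", "PENDING")
    elapsed = time.time() - start

    # Print on every status change, and otherwise at most once per 30 seconds
    if current_status != last_status or elapsed - last_print >= 30:
        elapsed_mins = int(elapsed / 60)
        print(f"[{elapsed_mins:3d}m] {current_status}")
        last_status = current_status
        last_print = elapsed

    # Stop once the evaluation has reached a terminal state
    if current_status in {"COMPLETED", "FAILED"}:
        break

    time.sleep(15)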

summary = evaluation.summary()
print("\nSummary:")
print(f"  Total behaviors: {summary['total_behaviors']}")
print(f"  Successful: {summary['successful']}")
print(f"  Failed: {summary['failed']}")

if summary.get("errors_by_type"):  # tolerate a summary without this key
    print(f"  Errors: {summary['errors_by_type']}")
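
The loop above runs until a terminal status arrives, so a stalled evaluation would keep the cell busy indefinitely. One way to bound the wait is sketched below. It assumes only what the loop already relies on (a callable returning a dict with a `status` field, and the `COMPLETED`/`FAILED` terminal states); `poll_until_done` and its default timeout are hypothetical and not part of the SDK.

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED"}  # the same states the loop above checks


def poll_until_done(get_status, interval=15.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it reports a terminal status or the timeout passes."""
    deadline = clock() + timeout
    while True:
        result = get_status()
        status = result.get("status", "PENDING")
        if status in TERMINAL_STATES:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"evaluation still {status} after {timeout:.0f}s")
        sleep(interval)
```

A call such as `result = poll_until_done(lambda: evaluation.status(refresh=True))` would take the place of the `while True` loop. The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.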

## Save and View Results

Save the last status response from the polling loop (`result`) to a JSON file and display it:

In [None]:
import json

output_file = f"redteam_results_{evaluation.id}.json"
with open(output_file, "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2)

print(f"✓ Results saved to: {output_file}")
print("\nFull results:")
print(json.dumps(result, indent=2))
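
The full dump can be long, and this notebook does not describe the schema of the result. As a schema-agnostic sketch, the hypothetical helper below reloads the saved file and reports only the top-level keys and their types, which gives a quick overview before digging into the details.

```python
import json


def describe_top_level(path):
    """Return {key: type name} for the top-level object stored in a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        # Not an object at the root: report the root type instead
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}
```

For example, `describe_top_level(output_file)` would list the fields present in the file written above.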