# EvalHub Client SDK Usage Examples

This notebook demonstrates how to use the EvalHub client SDK for interacting with the EvalHub evaluation service.

The SDK provides separate client classes for async and sync operations:
- `AsyncEvalHubClient` - For asynchronous operations (recommended for I/O-bound workloads)
- `SyncEvalHubClient` - For synchronous operations

Both use a **nested resource structure**:
- `client.providers` - Provider operations
- `client.benchmarks` - Benchmark operations
- `client.collections` - Collection operations
- `client.jobs` - Evaluation job operations

## Prerequisites: Set Environment Variables

Before running this notebook, set the required environment variables in your shell:

```bash
# Get your OpenShift authentication token
export EVALHUB_TOKEN=$(oc whoami -t)

# Get the EvalHub route URL from your namespace (e.g., 'test')
export EVALHUB_URL=https://$(oc get route evalhub -n test -o jsonpath='{.spec.host}')

# Verify the variables are set
echo "Token: ${EVALHUB_TOKEN:0:20}..."
echo "URL: $EVALHUB_URL"
```

Then start Jupyter from the same shell so that the notebook kernel inherits these variables:
```bash
jupyter notebook client_usage.ipynb
```

## OpenShift Authentication

The SDK has built-in support for OpenShift authentication with three methods:

1. **ServiceAccount Token** (`auth_token_path`): For pods running inside OpenShift, use the automounted token at `/var/run/secrets/kubernetes.io/serviceaccount/token`.
2. **User Token** (`auth_token`): For local development, use the token from `oc whoami -t` (shown above).
3. **Environment Variable** (`auth_token`): For production, read the token from an environment variable or secret (used in the examples below).
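
One way to choose among these methods at runtime is sketched below. This is a minimal sketch, not part of the SDK: `resolve_auth_kwargs` is a hypothetical helper name, and it only builds the keyword arguments (`auth_token_path` or `auth_token`) named in the list above.

```python
import os
from pathlib import Path

# Automounted ServiceAccount token location inside a pod.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def resolve_auth_kwargs(sa_token_path: str = SA_TOKEN_PATH,
                        env_var: str = "EVALHUB_TOKEN") -> dict:
    """Hypothetical helper: pick authentication keyword arguments for the client.

    Prefers the ServiceAccount token file when it exists (in-cluster),
    otherwise falls back to a token held in an environment variable.
    """
    if Path(sa_token_path).is_file():
        return {"auth_token_path": sa_token_path}
    token = os.getenv(env_var)
    if token:
        return {"auth_token": token}
    raise ValueError(f"No ServiceAccount token file found and {env_var} is not set.")
```

The result could then be unpacked into the constructor, for example `SyncEvalHubClient(base_url=os.getenv("EVALHUB_URL"), **resolve_auth_kwargs())`.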

## Setup: Import Required Modules

In [1]:
from evalhub import (
    AsyncEvalHubClient,
    SyncEvalHubClient,
    JobSubmissionRequest,
    BenchmarkConfig,
    ModelConfig,
)
import os

# Verify environment variables are set
if not os.getenv("EVALHUB_TOKEN"):
    raise ValueError("EVALHUB_TOKEN environment variable is not set. See Prerequisites section above.")
if not os.getenv("EVALHUB_URL"):
    raise ValueError("EVALHUB_URL environment variable is not set. See Prerequisites section above.")

## Example: Basic Client Usage

Connect to EvalHub and check health status.

In [2]:
# Create synchronous client with environment variable authentication
with SyncEvalHubClient(
    base_url=os.getenv("EVALHUB_URL"),
    auth_token=os.getenv("EVALHUB_TOKEN"),
    insecure=True,  # For self-signed certs (use ca_bundle_path in production)
) as client:
    try:
        # Check health
        health = client.health()
        print(f"✓ EvalHub is healthy: {health['status']}")
        print(f"  Version: {health.get('version', 'unknown')}")
        print(f"  Uptime: {health.get('uptime_seconds', 0):.1f}s")
    except Exception as e:
        print(f"✗ Failed to connect: {e}")

TLS verification disabled - skipping CA bundle detection
TLS verification disabled (insecure mode)


✓ EvalHub is healthy: healthy
  Version: unknown
  Uptime: 0.0s


## Example: List Providers and Benchmarks

In [3]:
# List available providers using nested resource structure
with SyncEvalHubClient(
    base_url=os.getenv("EVALHUB_URL"),
    auth_token=os.getenv("EVALHUB_TOKEN"),
    insecure=True,
) as client:
    try:
        # List all providers
        providers = client.providers.list()
        print(f"✓ Found {len(providers)} providers:")
        for provider in providers:
            print(f"  - {provider.id}: {provider.name}")
        
        # List benchmarks from a specific provider
        if providers:
            provider_id = providers[0].id
            benchmarks = client.benchmarks.list(provider_id=provider_id)
            print(f"\n✓ Found {len(benchmarks)} benchmarks for {provider_id}:")
            for benchmark in benchmarks[:5]:  # Show first 5
                print(f"  - {benchmark.id}: {benchmark.name}")
    
    except Exception as e:
        print(f"✗ Error: {e}")

TLS verification disabled - skipping CA bundle detection
TLS verification disabled (insecure mode)


✓ Found 5 providers:
  - ragas: RAGAS
  - garak: Garak
  - guidellm: GuideLLM
  - lighteval: Lighteval
  - lm_evaluation_harness: LM Evaluation Harness

✓ Found 4 benchmarks for ragas:
  - faithfulness: Faithfulness
  - answer_relevancy: Answer Relevancy
  - context_precision: Context Precision
  - context_recall: Context Recall


## Example: Submit an Evaluation Job

**Important:** The `model.url` should be the **base URL only** (host:port), not including the `/v1/completions` endpoint path. The endpoint is added automatically by the evaluation framework.

To find available services in your namespace:
```bash
# List services in your namespace
oc get svc -n test

# Example: If you have a vLLM service named 'vllm-server'
# Correct URL format: http://vllm-server.test.svc.cluster.local:8000
# WRONG format: http://vllm-server.test.svc.cluster.local:8000/v1/completions
```
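
If a model URL might arrive with an endpoint path already attached, it can be normalized before building `ModelConfig`. The helper below is a minimal sketch using only the standard library; `to_base_url` is a hypothetical name, not an SDK function.

```python
from urllib.parse import urlsplit, urlunsplit


def to_base_url(url: str) -> str:
    """Hypothetical helper: keep only scheme and host:port, dropping any path."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Expected an absolute URL such as http://host:port, got {url!r}")
    # Rebuild the URL with empty path, query, and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


print(to_base_url("http://vllm-server.test.svc.cluster.local:8000/v1/completions"))
```

The returned string can be passed as `url=` when constructing the model configuration.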

In [4]:
# Submit an evaluation job
with SyncEvalHubClient(
    base_url=os.getenv("EVALHUB_URL"),
    auth_token=os.getenv("EVALHUB_TOKEN"),
    insecure=True,
) as client:
    try:
        # Create model configuration
        # IMPORTANT: URL should be base URL only (host:port), NOT including /v1/completions
        # The /v1/completions endpoint is added automatically by lm-eval
        model = ModelConfig(
            url="http://vllm-server.test.svc.cluster.local:8000",
            name="tinyllama"
        )
        
        # Create benchmark configuration
        benchmark = BenchmarkConfig(
            id="arc_easy",
            provider_id="lm_evaluation_harness",
            parameters={
                "limit": 5,  # Number of examples to evaluate (use small number for testing)
                "tokenizer": "google/flan-t5-small"  # HuggingFace tokenizer for the model
            }
        )
        
        # Create job submission request
        job_request = JobSubmissionRequest(
            model=model,
            benchmarks=[benchmark],
            timeout_minutes=30,
            retry_attempts=1
        )
        
        # Submit job using nested resource
        job = client.jobs.submit(job_request)
        
        # Store job ID for later use
        submitted_job_id = job.id
        
        print(f"✓ Job submitted successfully")
        print(f"  Job ID: {submitted_job_id}")
        print(f"  State: {job.state}")
        print(f"  Created: {job.resource.created_at}")
        
        # Check status
        updated_job = client.jobs.get(submitted_job_id)
        print(f"\n✓ Current job state: {updated_job.state}")
        if updated_job.status and updated_job.status.message:
            print(f"  Message: {updated_job.status.message.message}")

    except Exception as e:
        print(f"✗ Error: {e}")

TLS verification disabled - skipping CA bundle detection
TLS verification disabled (insecure mode)


✓ Job submitted successfully
  Job ID: 30184068-b829-4827-8446-ee3a4359f62d
  State: unknown
  Created: 2026-02-08 17:52:08.086035+00:00

✓ Current job state: pending
  Message: Evaluation job created


## Example: Async Client Usage

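The asynchronous client suits I/O-bound workloads, where requests can overlap instead of blocking one another.

**Note:** The async client exposes the same method names as the sync client; just `await` them. The async example cell (`In [6]`) appears after the job-completion example below.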

## Example: Wait for Job Completion and Retrieve Results

Monitor a job until completion and retrieve the results.
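
Conceptually, waiting for completion is a poll loop with a deadline. The sketch below illustrates that pattern in isolation; it is not the SDK's implementation. The function name `wait_until_terminal`, the injected `fetch_state` callable, and the set of terminal states are all assumptions made for illustration (only `completed` appears in this notebook's output).

```python
import time
from typing import Callable

# Assumed terminal states; only "completed" is confirmed by this notebook.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_until_terminal(fetch_state: Callable[[], str],
                        timeout: float = 600.0,
                        poll_interval: float = 5.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_state() until it returns a terminal state or the deadline passes."""
    deadline = clock() + timeout
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Stop if another sleep would carry us past the deadline.
        if clock() + poll_interval > deadline:
            raise TimeoutError(f"Still {state!r} after {timeout} seconds")
        sleep(poll_interval)
```

With a real client, `fetch_state` could be something like `lambda: client.jobs.get(job_id).state`. The cell below uses the SDK's own `wait_for_completion` instead.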

In [5]:
# Wait for job completion and retrieve results
# This uses the submitted_job_id from the previous cell
with SyncEvalHubClient(
    base_url=os.getenv("EVALHUB_URL"),
    auth_token=os.getenv("EVALHUB_TOKEN"),
    insecure=True,
) as client:
    try:
        # Use the job ID from the previous submission
        # Make sure you ran the job submission cell first!
        print(f"Waiting for job {submitted_job_id} to complete...")
        
        completed_job = client.jobs.wait_for_completion(
            job_id=submitted_job_id,
            timeout=600,  # Wait up to 10 minutes
            poll_interval=5.0  # Check every 5 seconds
        )
        
        print(f"✓ Job completed with state: {completed_job.state}")
        
        # Display results if available
        if completed_job.results:
            print(f"\n✓ Evaluation Results:")
            print(f"  Total evaluations: {completed_job.results.total_evaluations}")
            print(f"  Completed: {completed_job.results.completed_evaluations}")
            print(f"  Failed: {completed_job.results.failed_evaluations}")
            
            # Show benchmark results
            if completed_job.results.benchmarks:
                print(f"\n✓ Benchmark Results:")
                for bench_result in completed_job.results.benchmarks:
                    print(f"\n  Benchmark: {bench_result.id} (Provider: {bench_result.provider_id})")
                    
                    # Display metrics
                    if bench_result.metrics:
                        print(f"    Metrics:")
                        for metric_name, metric_value in bench_result.metrics.items():
                            print(f"      - {metric_name}: {metric_value}")
                    
                    # Display artifacts info if available
                    if bench_result.artifacts:
                        print(f"    Artifacts: {len(bench_result.artifacts)} items")
                    
                    # Display MLFlow info if available
                    if bench_result.mlflow_run_id:
                        print(f"    MLFlow Run ID: {bench_result.mlflow_run_id}")
                    if bench_result.logs_path:
                        print(f"    Logs: {bench_result.logs_path}")
            
            # Display MLFlow experiment URL if available
            if completed_job.results.mlflow_experiment_url:
                print(f"\n  MLFlow Experiment: {completed_job.results.mlflow_experiment_url}")
        else:
            print(f"\n✗ No results available yet")
        
        # Show benchmark execution status
        if completed_job.status and completed_job.status.benchmarks:
            print(f"\n✓ Benchmark Execution Status:")
            for bench in completed_job.status.benchmarks:
                print(f"  - {bench.id}: {bench.state}")
                if bench.message:
                    print(f"    Message: {bench.message.message}")
    
    except NameError:
        print(f"✗ Error: submitted_job_id not found. Please run the job submission cell first!")
    except TimeoutError as e:
        print(f"✗ Job did not complete within timeout: {e}")
    except Exception as e:
        print(f"✗ Error: {e}")

TLS verification disabled - skipping CA bundle detection
TLS verification disabled (insecure mode)


Waiting for job 30184068-b829-4827-8446-ee3a4359f62d to complete...
✓ Job completed with state: completed

✓ Evaluation Results:
  Total evaluations: 0
  Completed: 0
  Failed: 0

✓ Benchmark Results:

  Benchmark: arc_easy (Provider: )
    Metrics:
      - acc: 0.4
      - acc_norm: 0.6
      - acc_norm_stderr: 0.24494897427831783
      - acc_stderr: 0.24494897427831783
    Artifacts: 3 items

✓ Benchmark Execution Status:
  - arc_easy: completed
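
The standard errors in this run are large because only five examples were evaluated (`limit: 5`). As a sanity check, and assuming the reported value is the sample standard error of a mean over 0/1 scores (an assumption; the notebook does not state how it is computed), the figure can be reproduced by hand as `sqrt(p * (1 - p) / (n - 1))`:

```python
import math


def binary_mean_stderr(p: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 scores whose mean is p."""
    return math.sqrt(p * (1 - p) / (n - 1))


print(binary_mean_stderr(0.4, 5))  # acc: about 0.2449
print(binary_mean_stderr(0.6, 5))  # acc_norm: about 0.2449
```

Both values agree with the reported `acc_stderr` and `acc_norm_stderr` to the displayed precision, so accuracy from such a small sample should be treated as a smoke test rather than a measurement.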


In [6]:
import asyncio

async def async_example():
    """Demonstrate async client usage."""
    async with AsyncEvalHubClient(
        base_url=os.getenv("EVALHUB_URL"),
        auth_token=os.getenv("EVALHUB_TOKEN"),
        insecure=True,
    ) as client:
        try:
            # Async health check - same method name!
            health = await client.health()
            print(f"✓ Async health check: {health['status']}")
            
            # Async provider list - same method name!
            providers = await client.providers.list()
            print(f"✓ Found {len(providers)} providers (async)")
            
            # Async benchmark list - same method name!
            benchmarks = await client.benchmarks.list(provider_id="lighteval")
            print(f"✓ Found {len(benchmarks)} lighteval benchmarks (async)")
        
        except Exception as e:
            print(f"✗ Async operation failed: {e}")

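# Run the async example. Top-level `await` works in Jupyter;
# in a plain script, use asyncio.run(async_example()) instead.
await async_example()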

TLS verification disabled - skipping CA bundle detection
TLS verification disabled (insecure mode)


✓ Async health check: healthy
✓ Found 5 providers (async)
✓ Found 23 lighteval benchmarks (async)
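
The cell above awaits each call in turn. The benefit of an async client for I/O-bound work comes from overlapping independent requests, which `asyncio.gather` makes possible. The sketch below shows the pattern with stand-in coroutines; `fake_call` is a hypothetical placeholder for awaited client calls, and whether a single `AsyncEvalHubClient` instance supports concurrent calls is an assumption not confirmed by this notebook.

```python
import asyncio


async def fake_call(name: str, delay: float) -> str:
    """Stand-in for an awaited client call such as `client.providers.list()`."""
    await asyncio.sleep(delay)
    return name


async def fetch_concurrently() -> list:
    # gather() runs the awaitables concurrently and returns results in argument order.
    return await asyncio.gather(
        fake_call("health", 0.05),
        fake_call("providers", 0.05),
        fake_call("benchmarks", 0.05),
    )
```

In a notebook, run it with `await fetch_concurrently()`; in a script, use `asyncio.run(fetch_concurrently())`. The three stand-in calls finish in roughly the time of one, not the sum of all three.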


## Summary: Authentication Methods for OpenShift

The SDK supports three authentication methods:

### 1. ServiceAccount Token (For Pods)
Use `auth_token_path="/var/run/secrets/kubernetes.io/serviceaccount/token"` when running inside OpenShift pods.

### 2. User Token (For Development)
Get the token from `oc whoami -t` and pass it as `auth_token=token` for local development.

### 3. Environment Variable (Production - Used in Examples Above)
Store the token in an environment variable and pass it as `auth_token=os.getenv("EVALHUB_TOKEN")`.

**Example shown above:**
```python
with SyncEvalHubClient(
    base_url=os.getenv("EVALHUB_URL"),
    auth_token=os.getenv("EVALHUB_TOKEN"),
    insecure=True,  # or ca_bundle_path="/path/to/ca.crt"
) as client:
    providers = client.providers.list()
```

All three methods authenticate the client in the same way; choose based on your deployment environment.

## Retry Logic and Error Handling

The client includes automatic retry logic with exponential backoff for handling transient failures:

- **Retried errors**: Connection timeouts, server errors (5xx), rate limits (429)
- **Retry strategy**: Exponential backoff with jitter
- **Default retries**: Up to 3 attempts with increasing delays
- **Timeout**: Configurable per-request timeout (default: 30 seconds)

The retry logic is built into the HTTP client and handles:
- Network connection failures
- Temporary server unavailability
- Rate limiting responses
- Gateway timeouts

No configuration is required; retries are applied automatically.
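
To make the strategy concrete, here is a minimal sketch of exponential backoff with jitter and of the retry condition described above. It is illustrative only and not the SDK's code: the function names and the `base` and `cap` values are assumptions, since this notebook does not state the SDK's actual delays.

```python
import random
from typing import List, Optional


def is_retryable(status_code: int) -> bool:
    """Retry on rate limiting (429) and server errors (5xx)."""
    return status_code == 429 or 500 <= status_code < 600


def backoff_delays(attempts: int = 3, base: float = 0.5, cap: float = 8.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter backoff: delay i is uniform in [0, min(cap, base * 2**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** i)) for i in range(attempts)]


print(backoff_delays(rng=random.Random(0)))
```

The upper bound doubles with each attempt, while the random draw spreads out clients that would otherwise retry at the same instant.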

## Advanced: Custom Timeout and SSL Configuration

Configure custom timeouts or SSL certificate validation.

In [7]:
# Example: Custom timeout and SSL configuration
with SyncEvalHubClient(
    base_url=os.getenv("EVALHUB_URL"),
    auth_token=os.getenv("EVALHUB_TOKEN"),
    
    # Option 1: Disable SSL verification (development only)
    insecure=True,
    
    # Option 2: Provide CA bundle path (production)
    # ca_bundle_path="/path/to/ca-bundle.crt",
    
    # Custom timeout (in seconds)
    timeout=60.0,
) as client:
    try:
        providers = client.providers.list()
        print(f"✓ Connected with custom configuration")
        print(f"  Found {len(providers)} providers")
    except Exception as e:
        print(f"✗ Error: {e}")

TLS verification disabled - skipping CA bundle detection
TLS verification disabled (insecure mode)


✓ Connected with custom configuration
  Found 5 providers
