# EvalHub Client SDK Usage Examples

This notebook demonstrates how to use the EvalHub client SDK for interacting with the EvalHub evaluation service.

The SDK provides separate client classes for async and sync operations:
- `AsyncEvalHubClient` - For asynchronous operations (recommended for I/O-bound workloads)
- `SyncEvalHubClient` - For synchronous operations

Both use a **nested resource structure** similar to Llama Stack Client:
- `client.providers` - Provider operations
- `client.benchmarks` - Benchmark operations
- `client.collections` - Collection operations
- `client.jobs` - Evaluation job operations
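
To make the nested resource idea concrete, here is a minimal sketch of the pattern itself. It is not the EvalHub SDK's implementation: `ToyClient`, the resource classes, the `fake_transport` function, and the URL paths are all hypothetical names chosen for illustration.

```python
class _Resource:
    """Base class for a group of related operations sharing one transport."""

    def __init__(self, transport):
        self._transport = transport


class ProvidersResource(_Resource):
    def list(self):
        # Hypothetical path, used only to show where the call is routed.
        return self._transport("GET", "/providers")


class JobsResource(_Resource):
    def get(self, job_id):
        return self._transport("GET", f"/jobs/{job_id}")


class ToyClient:
    """Hypothetical client: each attribute groups the calls for one resource."""

    def __init__(self, transport):
        self.providers = ProvidersResource(transport)
        self.jobs = JobsResource(transport)


# A fake transport that records calls instead of sending HTTP requests.
calls = []


def fake_transport(method, path):
    calls.append((method, path))
    return []


client = ToyClient(fake_transport)
client.providers.list()
client.jobs.get("job-123")
print(calls)  # [('GET', '/providers'), ('GET', '/jobs/job-123')]
```

Grouping operations under attributes keeps method names short (`list`, `get`, `submit`) while the attribute name says which resource they act on.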

## Setup: Import Required Modules

In [None]:
import asyncio

from evalhub import (
    AsyncEvalHubClient,
    EvaluationRequest,
    ModelConfig,
    SyncEvalHubClient,
)

## Example 1: Synchronous Client Usage

Basic usage with the synchronous client. This is the simplest way to get started.

In [None]:
# Create synchronous client (defaults to http://localhost:8080)
with SyncEvalHubClient() as client:
    # Check health
    try:
        health = client.health()
        print(f"✓ EvalHub is healthy: {health}")
    except Exception as e:
        print(f"✗ Failed to connect to local EvalHub: {e}")

    # List available providers using nested resource
    try:
        providers = client.providers.list()
        print(f"\n✓ Found {len(providers)} providers")
        for provider in providers[:3]:  # Show first 3
            print(f"  - {provider.id}")
    except Exception as e:
        print(f"✗ Failed to list providers: {e}")

    # List available benchmarks using nested resource
    try:
        benchmarks = client.benchmarks.list(category="math")
        print(f"\n✓ Found {len(benchmarks)} math benchmarks")
        for benchmark in benchmarks[:3]:  # Show first 3
            print(f"  - {benchmark.benchmark_id}: {benchmark.name}")
    except Exception as e:
        print(f"✗ Failed to list benchmarks: {e}")

## Example 2: Remote EvalHub Connection

Connect to a remote EvalHub instance with authentication.

In [None]:
# Create client for remote instance with authentication
with SyncEvalHubClient(
    base_url="https://evalhub.example.com",
    auth_token="your-api-token-here",
    timeout=60.0,
) as client:
    print("✓ Remote client created (connection would be tested on first API call)")
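
In practice you will probably not want to hard-code the token in a notebook. One possible approach, sketched below, is to read the connection settings from environment variables. The variable names `EVALHUB_URL` and `EVALHUB_TOKEN` and the helper `remote_client_kwargs` are assumptions made for this example, not names the SDK defines.

```python
import os


def remote_client_kwargs(env=os.environ):
    """Build keyword arguments for a remote client from environment variables."""
    token = env.get("EVALHUB_TOKEN")  # hypothetical variable name
    if not token:
        raise RuntimeError("Set EVALHUB_TOKEN before connecting to a remote EvalHub")
    return {
        # Hypothetical variable name; falls back to the example URL used above.
        "base_url": env.get("EVALHUB_URL", "https://evalhub.example.com"),
        "auth_token": token,
        "timeout": 60.0,
    }
```

The result can then be unpacked into the constructor shown above, for example `SyncEvalHubClient(**remote_client_kwargs())`.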

## Example 3: Submit an Evaluation Job

Submit an evaluation job and check its status.

In [None]:
with SyncEvalHubClient() as client:
    # Create evaluation request
    # Using a vLLM endpoint deployed on OpenShift
    request = EvaluationRequest(
        benchmark_id="gsm8k",
        model=ModelConfig(
            url="http://vllm-service.my-namespace.svc.cluster.local:8000/v1",
            name="meta-llama/Llama-2-7b-chat-hf",
        ),
        num_few_shot=5,
        experiment_name="GSM8K Evaluation",
        tags={"environment": "dev", "version": "v1"},
    )

    try:
        # Submit job using nested resource
        job = client.jobs.submit(request)
        print(f"✓ Job submitted: {job.id}")
        print(f"  Status: {job.status}")

        # Check status using nested resource
        updated_job = client.jobs.get(job.id)
        print(f"✓ Job status updated: {updated_job.status}")

        # Wait for completion (polling)
        # Uncomment to wait for job completion:
        # final_job = client.jobs.wait_for_completion(job.id, timeout=300)
        # if final_job.status == "completed":
        #     # Results are embedded in the job resource
        #     print("✓ Job completed with results")

    except NotImplementedError:
        print("✗ Job submission not yet implemented (skeleton only)")
    except Exception as e:
        print(f"✗ Failed to submit job: {e}")
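
Waiting for completion amounts to polling the job status until it reaches a terminal state or a deadline passes. The sketch below shows that loop in isolation. It is not the SDK's `wait_for_completion` implementation: `poll_until_done` is a hypothetical helper, and apart from `"completed"` the terminal status names are assumptions.

```python
import time

# "completed" appears in the example above; the other names are assumed.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_done(get_status, timeout=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly until it returns a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)
```

With the client from this example, `get_status` could be `lambda: client.jobs.get(job.id).status`.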

## Example 4: Async Client Usage

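Use the asynchronous client when your code makes many I/O-bound calls: while one request waits on the network, the event loop can run other work.

**Note:** The async client uses the same method names as the sync client; you just `await` each call.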

In [None]:
async def async_example():
    """Demonstrate async client usage."""
    async with AsyncEvalHubClient() as client:
        try:
            # Async health check - same method name!
            health = await client.health()
            print(f"✓ Async health check: {health}")

            # Async provider list - same method name!
            providers = await client.providers.list()
            print(f"✓ Found {len(providers)} providers (async)")

            # Async job submission - same method name!
            # Using a vLLM endpoint deployed on OpenShift
            request = EvaluationRequest(
                benchmark_id="mmlu",
                model=ModelConfig(
                    url="http://vllm-service.my-namespace.svc.cluster.local:8000/v1",
                    name="meta-llama/Llama-2-7b-chat-hf",
                ),
            )
            # Uncomment to submit job:
            # job = await client.jobs.submit(request)
            # print(f"✓ Async job submitted: {job.id}")
            
            # You can also wait for completion asynchronously
            # final_job = await client.jobs.wait_for_completion(job.id, timeout=300)
            # if final_job.status == "completed":
            #     # Results are embedded in the job resource
            #     print("✓ Job completed with results")

        except NotImplementedError:
            print("✗ Some async operations not yet implemented (skeleton only)")
        except Exception as e:
            print(f"✗ Async operation failed: {e}")

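# Run the async example. Top-level `await` works in Jupyter/IPython cells;
# in a plain Python script, call asyncio.run(async_example()) instead.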
try:
    await async_example()
except Exception as e:
    print(f"✗ Failed to run async example: {e}")
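
The benefit of an async client shows up when several awaits overlap. The sketch below illustrates this with `asyncio.gather`, using a stand-in coroutine (`fake_request`, a hypothetical name) in place of real SDK calls. Whether a single `AsyncEvalHubClient` instance is safe to share across concurrent calls is not stated here, so treat this as an illustration of the asyncio pattern only.

```python
import asyncio
import time


async def fake_request(name, delay):
    """Stand-in for an awaited SDK call such as `await client.providers.list()`."""
    await asyncio.sleep(delay)
    return name


async def fetch_all():
    # The three waits overlap, so the total time is close to the longest
    # single delay rather than the sum of all three.
    return await asyncio.gather(
        fake_request("health", 0.1),
        fake_request("providers", 0.1),
        fake_request("benchmarks", 0.1),
    )


start = time.perf_counter()
# In a notebook cell, use `results = await fetch_all()` instead of asyncio.run.
results = asyncio.run(fetch_all())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which one finishes first.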

## Example 5: Client Class Comparison

### Sync vs Async - Same method names!

**Synchronous (SyncEvalHubClient):**
```python
with SyncEvalHubClient() as client:
    providers = client.providers.list()        # No await needed
    job = client.jobs.submit(request)          # No await needed
    updated_job = client.jobs.get(job.id)      # No await needed
```

**Asynchronous (AsyncEvalHubClient):**
```python
async with AsyncEvalHubClient() as client:
    providers = await client.providers.list()    # Await needed
    job = await client.jobs.submit(request)      # Await needed
    updated_job = await client.jobs.get(job.id)  # Await needed
```

## Additional Features

### List Collections

Collections are pre-defined sets of benchmarks for specific use cases.

In [None]:
with SyncEvalHubClient() as client:
    try:
        collections = client.collections.list()
        print(f"Available collections: {len(collections)}")
        for collection in collections:
            print(f"  - {collection.name}")
    except Exception as e:
        print(f"Failed to list collections: {e}")

### Configure Retry Logic with Exponential Backoff

The client includes automatic retry logic with exponential backoff for handling transient failures (timeouts, server errors, connection issues).

In [None]:
# Default retry configuration (retries up to 3 times with exponential backoff)
with SyncEvalHubClient() as client:
    print("✓ Default retry: up to 3 retries with 1s → 2s → 4s delays")

# Custom retry configuration for unreliable networks
with SyncEvalHubClient(
    max_retries=5,                      # Retry up to 5 times
    retry_initial_delay=2.0,            # Start with 2 second delay
    retry_backoff_factor=2.0,           # Double delay each time (2s → 4s → 8s)
    retry_max_delay=60.0,               # Cap delays at 60 seconds
    retry_randomization=True,           # Add random jitter to each delay
) as client:
    print("✓ Custom retry: up to 5 retries with exponential backoff")
    
# Fast-fail for interactive applications
with SyncEvalHubClient(
    max_retries=1,                      # Only retry once
    retry_initial_delay=0.5,            # Quick retry
    timeout=10.0,                       # Short timeout
) as client:
    print("✓ Fast-fail: minimal retries for quick feedback")

# Disable retries (not recommended for production)
with SyncEvalHubClient(max_retries=0) as client:
    print("✓ No retries: fail immediately on any error")
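
To see how the retry parameters interact, the sketch below computes the delay schedule they would produce. It is an illustration of exponential backoff in general, not the client's internal code: `backoff_delays` is a hypothetical helper, the 60-second default cap is an assumption, and the jitter scheme (scaling each delay by a random factor between 0.5 and 1.5) is just one common choice.

```python
import random


def backoff_delays(max_retries=3, initial_delay=1.0, backoff_factor=2.0,
                   max_delay=60.0, randomization=False, rng=random.random):
    """Return the list of waits (in seconds) before each retry attempt."""
    delays = []
    for attempt in range(max_retries):
        delay = initial_delay * backoff_factor ** attempt
        if randomization:
            # Assumed jitter scheme: scale by a factor in [0.5, 1.5).
            delay *= 0.5 + rng()
        delays.append(min(delay, max_delay))  # never wait longer than the cap
    return delays


print(backoff_delays())                                  # [1.0, 2.0, 4.0]
print(backoff_delays(max_retries=5, initial_delay=2.0))  # [2.0, 4.0, 8.0, 16.0, 32.0]
print(backoff_delays(max_retries=0))                     # []
```

Jitter spreads retries from many clients over time so they do not all hit a recovering server at the same instant.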

### Filter Benchmarks by Provider

In [None]:
with SyncEvalHubClient() as client:
    try:
        # Get benchmarks from a specific provider
        benchmarks = client.benchmarks.list(
            provider_id="lm_evaluation_harness",
            limit=5
        )
        print(f"Found {len(benchmarks)} benchmarks")
        for benchmark in benchmarks:
            print(f"  - {benchmark.benchmark_id}: {benchmark.name}")
    except Exception as e:
        print(f"Failed to filter benchmarks: {e}")