# TrustyAI CLI Usage Demo

This notebook shows how to use the TrustyAI CLI. It focuses on model evaluation with `trustyai eval`; the CLI also provides `info`, `metrics` (fairness and performance), and `model` command groups.

## Prerequisites

Install TrustyAI with CLI support from the root of a source checkout:

```bash
pip install .
```

For full evaluation support:

```bash
pip install ".[eval]"  # quoted so shells such as zsh do not expand the brackets
```

Or for all features:

```bash
pip install ".[all]"
```

The CLI tool `trustyai` should be available after installation.
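
As a quick sanity check before running the cells below, the following minimal sketch tests whether an executable named `trustyai` is on `PATH`. The helper name `cli_available` is hypothetical and not part of TrustyAI; it only wraps the standard-library `shutil.which`.

```python
import shutil


def cli_available(name: str = "trustyai") -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if not cli_available():
    print("trustyai not found on PATH; check that the install step succeeded")
```

If the check fails inside a notebook, the kernel may be using a different environment from the one where the package was installed.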


In [1]:
# List available evaluation providers (expect a single unified "lm-eval-harness" entry rather than separate providers)
!trustyai eval list-providers


                         Available Evaluation Providers
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Provider Name   ┃ Description                 ┃ Local Mode ┃ Kubernetes Mode ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ lm-eval-harness │ LM Evaluation Harness for   │ ✓          │ ✓               │
│                 │ language model evaluation.  │            │                 │
│                 │ Automatically delegates to  │            │                 │
│                 │ local or Kubernetes         │            │                 │

In [2]:
# Example: Local evaluation with the unified provider name
!trustyai eval execute \
  --provider lm-eval-harness \
  --execution-mode local \
  --model "google/flan-t5-base" \
  --tasks "arc_easy" \
  --limit 3 \
  --dry-run


Provider: lm-eval-harness
Execution mode: local
Model: google/flan-t5-base
Tasks: arc_easy
Limit: 3

--- Dry Run Mode - Validation Only ---
✅ Configuration validated successfully
Use without --dry-run to execute the evaluation


## 1. Basic CLI Information

Let's start by exploring the basic CLI functionality and getting help information.


In [3]:
# Display the main CLI help
!trustyai --help


Usage: trustyai [OPTIONS] COMMAND [ARGS]...

  TrustyAI CLI tool for Trustworthy AI operations.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  eval     Model evaluation commands.
  info     Display information about TrustyAI.
  metrics  Fairness and performance metrics commands.
  model    Model management commands.


In [4]:
# Check the version
!trustyai --version


trustyai, version 2.0.0a1


In [5]:
# Get general information about TrustyAI
!trustyai info


TrustyAI SDK version 2.0.0a1
A Python SDK for trustworthy AI


In [6]:
# Get verbose information including available providers
!trustyai info --verbose


TrustyAI SDK version 2.0.0a1
A Python SDK for trustworthy AI

Additional Information:
  - Python package for explainable and fair AI
  - Supports model explanations and fairness metrics
   Available Providers
┏━━━━━━━━━━━━━━━━━┳━━━━━━┓
┃ Provider Name   ┃ Type ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━┩
│ lm-eval-harness │ eval │
└─────────────────┴──────┘


## 2. Model Evaluation Commands

The `trustyai eval` command group runs model evaluations through a named provider, in either local or Kubernetes execution mode. It also lists the providers, datasets, and metrics that are available.


In [7]:
# Get help for evaluation commands
!trustyai eval --help


Usage: trustyai eval [OPTIONS] COMMAND [ARGS]...

  Model evaluation commands.

Options:
  --help  Show this message and exit.

Commands:
  execute         Execute model evaluation with specified provider and...
  list-datasets   List available evaluation datasets for a provider.
  list-metrics    List available evaluation metrics for a provider.
  list-providers  List available evaluation providers.


In [8]:
# List available evaluation providers
!trustyai eval list-providers


                         Available Evaluation Providers
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Provider Name   ┃ Description                 ┃ Local Mode ┃ Kubernetes Mode ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ lm-eval-harness │ LM Evaluation Harness for   │ ✓          │ ✓               │
│                 │ language model evaluation.  │            │                 │
│                 │ Automatically delegates to  │            │                 │
│                 │ local or Kubernetes         │            │                 │

In [9]:
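# List available datasets for a specific provider
# Note: this run passes "lm-evaluation-harness", which does not match the registered
# name "lm-eval-harness"; that mismatch likely explains the message below.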
!trustyai eval list-datasets --provider lm-evaluation-harness


No evaluation provider available.
Try installing optional dependencies: pip install trustyai[eval]


In [13]:
# List available metrics for a specific provider (in this recorded run on 2.0.0a1 the call fails with the error below)
!trustyai eval list-metrics --provider lm-eval-harness


Error: LMEvalProvider.__init__() takes 1 positional argument but 2 were given


### Unified Evaluation Command

The `trustyai eval execute` command is the main evaluation command. It runs an evaluation in either local or Kubernetes mode, selected with `--execution-mode`.


In [11]:
# Get help for the execute command
!trustyai eval execute --help


Usage: trustyai eval execute [OPTIONS]

  Execute model evaluation with specified provider and execution mode.

  This unified command supports both local and Kubernetes execution modes.

  Examples:   # Local execution   trustyai eval execute --provider lm-eval-
  harness --execution-mode local \     --model "hf/microsoft/DialoGPT-medium"
  --tasks "hellaswag,arc_easy" --limit 10      # Kubernetes execution
  trustyai eval execute --provider lm-eval-harness --execution-mode kubernetes
  \     --model "hf/microsoft/DialoGPT-medium" --tasks "hellaswag,arc_easy" \
  --namespace trustyai-eval --cpu 4 --memory 8Gi        # RAGAS evaluation
  with external dataset   trustyai eval execute --provider ragas --execution-
  mode local \     --model "openai/gpt-4" --tasks
  "faithfulness,answer_relevancy" \     --dataset "data/rag_evaluation.json"

Options:
  -p, --provider TEXT             Name of the evaluation provider  [required]
  --execution-mode [local|kubernetes]
                         

### Example 1: Local Evaluation with LM-Evaluation-Harness

This example shows how to run a local evaluation using the LM-Evaluation-Harness provider.


In [14]:
# Example: Local evaluation with a small model and limited examples
!trustyai eval execute \
  --provider lm-eval-harness \
  --execution-mode local \
  --model "google/flan-t5-base" \
  --tasks "arc_easy,hellaswag" \
  --limit 5 \
  --batch-size 1 \
  --output local_eval_results.json \
  --format json


Provider: lm-eval-harness
Execution mode: local
Model: google/flan-t5-base
Tasks: arc_easy, hellaswag
Limit: 5
Batch size: 1

🚀 Starting evaluation...
[DEBUG - _parse_args_to_config] Args=1: has namespace? False
Using device: cuda for model evaluation
2025-06-28:00:09:55 INFO     [models.huggingface:137] Using device 'cuda'
2025-06-28:00:09:56 INFO     [models.huggingface:382] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda'}
2025-06-28:00:09:59 INFO     [evaluator:189] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-06-28:00:09:59 INFO     [evaluator:243] Using pre-initialized model
2025-06-28:00:10:08 INFO     [api.task:434] Building contexts for hellaswag on rank 0...
100%|███████████████████████████████████████████| 5/5 [00:00<00:00, 2215.69it/s]
2025-06-28:00:10:08 INFO     [api.task:434] Building contexts for arc_easy on rank 0...
100%|███████████████

### Example 2: Dry Run Mode

Use the `--dry-run` flag to validate your configuration without actually running the evaluation.


In [20]:
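# Dry run to validate configuration
# Note: "lm-evaluation-harness" is not a registered provider name, so validation fails
# as shown below; the registered name is "lm-eval-harness".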
!trustyai eval execute \
  --provider lm-evaluation-harness \
  --execution-mode local \
  --model "microsoft/DialoGPT-medium" \
  --tasks "hellaswag,arc_easy" \
  --limit 10 \
  --dry-run


Error: Evaluation provider 'lm-evaluation-harness' not found.
Try installing optional dependencies: pip install trustyai[eval]
Use 'trustyai eval list-providers' to see available providers.
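
A provider-name typo like the one above can be caught before the CLI is invoked. The following is a minimal sketch using the standard-library `difflib`; the helper `suggest_provider` and the `KNOWN_PROVIDERS` list are hypothetical, with the list copied by hand from the `list-providers` output in this notebook rather than queried from TrustyAI.

```python
from difflib import get_close_matches

# Provider names as reported by `trustyai eval list-providers` in this notebook.
KNOWN_PROVIDERS = ["lm-eval-harness"]


def suggest_provider(name, known=KNOWN_PROVIDERS):
    """Return `name` if it is registered, else the closest registered name (or None)."""
    if name in known:
        return name
    matches = get_close_matches(name, known, n=1)
    return matches[0] if matches else None


print(suggest_provider("lm-evaluation-harness"))
```

A name with no close match returns `None`, in which case running `trustyai eval list-providers` is the more reliable check.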


### Example 3: Kubernetes Deployment

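This example shows how to configure an evaluation job for Kubernetes. Because the cell uses `--dry-run`, it only validates the configuration; remove the flag to deploy the job.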

In [15]:
# Example: Kubernetes evaluation deployment
!trustyai eval execute \
  --provider lm-eval-harness \
  --execution-mode kubernetes \
  --model "microsoft/DialoGPT-medium" \
  --tasks "hellaswag,arc_easy" \
  --namespace trustyai-eval \
  --cpu 4 \
  --memory 8Gi \
  --limit 50 \
  --dry-run


Provider: lm-eval-harness
Execution mode: kubernetes
Model: microsoft/DialoGPT-medium
Tasks: hellaswag, arc_easy
Limit: 50
Namespace: trustyai-eval
CPU: 4
Memory: 8Gi

--- Dry Run Mode - Validation Only ---
✅ Configuration validated successfully
Use without --dry-run to execute the evaluation


### Example 4: Using Additional Parameters

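You can pass additional provider-specific parameters as a JSON object string with the `--parameters` flag. Wrap the JSON in single quotes so the shell leaves the inner double quotes intact. The dry-run summary in the recorded output does not echo these parameters.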

In [17]:
# Example: Using additional parameters
!trustyai eval execute \
  --provider lm-eval-harness \
  --execution-mode local \
  --model "google/flan-t5-base" \
  --tasks "arc_easy" \
  --limit 3 \
  --parameters '{"num_fewshot": 0, "device": "cpu", "use_cache": false}' \
  --dry-run


Provider: lm-eval-harness
Execution mode: local
Model: google/flan-t5-base
Tasks: arc_easy
Limit: 3

--- Dry Run Mode - Validation Only ---
✅ Configuration validated successfully
Use without --dry-run to execute the evaluation
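
Hand-writing JSON inside shell quotes is error-prone, so the value for `--parameters` can instead be generated from a Python dict. This is a minimal sketch using only the standard library; `parameters_flag` is a hypothetical helper, and the keys are the ones used in the cell above.

```python
import json
import shlex

# Same provider parameters as in the cell above.
params = {"num_fewshot": 0, "device": "cpu", "use_cache": False}


def parameters_flag(values: dict) -> str:
    """Render a dict as a shell-safe `--parameters '<json>'` fragment."""
    return "--parameters " + shlex.quote(json.dumps(values))


print(parameters_flag(params))
# --parameters '{"num_fewshot": 0, "device": "cpu", "use_cache": false}'
```

`json.dumps` turns Python's `False` into JSON's `false`, and `shlex.quote` adds the single quotes the shell needs.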


### Example 5: CSV Output Format

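Set `--format csv` together with an `--output` path to save results as CSV for easier analysis. The cell below uses `--dry-run`, so it only validates these options.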


In [18]:
# Example: Save results in CSV format
!trustyai eval execute \
  --provider lm-eval-harness \
  --execution-mode local \
  --model "google/flan-t5-base" \
  --tasks "arc_easy" \
  --limit 3 \
  --output eval_results.csv \
  --format csv \
  --dry-run


Provider: lm-eval-harness
Execution mode: local
Model: google/flan-t5-base
Tasks: arc_easy
Limit: 3

--- Dry Run Mode - Validation Only ---
✅ Configuration validated successfully
Use without --dry-run to execute the evaluation
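
Once a non-dry-run evaluation has written `eval_results.csv`, the file can be inspected with the standard-library `csv` module. This notebook does not show the CSV column names, so this minimal sketch makes no assumption about them and only reports the header and row count; `read_results` is a hypothetical helper.

```python
import csv
from pathlib import Path


def read_results(path):
    """Return (column_names, rows) from a CSV results file."""
    with Path(path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames or [], rows


results_path = Path("eval_results.csv")
if results_path.exists():  # only present after a real (non-dry-run) evaluation
    columns, rows = read_results(results_path)
    print(columns, len(rows))
```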


In [24]:
# First, validate with dry run ("your-model" and "task1,task2" are placeholders)
!trustyai eval execute --provider lm-eval-harness --model "your-model" --tasks "task1,task2" --dry-run

# Then run the actual evaluation
!trustyai eval execute --provider lm-eval-harness --model "your-model" --tasks "task1,task2"

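
The validate-then-run pattern can also be scripted, so that both invocations share one set of arguments. The sketch below builds the argument list using only flags that appear in this notebook; `build_execute_args` is a hypothetical helper, not part of TrustyAI. It omits `--execution-mode`, on the assumption that the CLI defaults to local mode, as the recorded outputs in this notebook suggest.

```python
import shlex


def build_execute_args(provider, model, tasks, limit=None, dry_run=False):
    """Build the argv list for `trustyai eval execute` from flags shown in this notebook."""
    args = [
        "trustyai", "eval", "execute",
        "--provider", provider,
        "--model", model,
        "--tasks", ",".join(tasks),
    ]
    if limit is not None:
        args += ["--limit", str(limit)]
    if dry_run:
        args.append("--dry-run")
    return args


validate = build_execute_args("lm-eval-harness", "google/flan-t5-base", ["arc_easy"], limit=3, dry_run=True)
run = build_execute_args("lm-eval-harness", "google/flan-t5-base", ["arc_easy"], limit=3)
print(shlex.join(validate))
print(shlex.join(run))
```

Either list can be passed to `subprocess.run` as is, which avoids shell quoting issues.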

In [20]:
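# Test with small limit first
# Note: "new-model" is a placeholder, not a real Hugging Face model ID, so both recorded runs fail at model loading.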
!trustyai eval execute --provider lm-eval-harness --model "new-model" --tasks "hellaswag" --limit 5

# Scale up after confirming it works
!trustyai eval execute --provider lm-eval-harness --model "new-model" --tasks "hellaswag" --limit 100


Provider: lm-eval-harness
Execution mode: local
Model: new-model
Tasks: hellaswag
Limit: 5

🚀 Starting evaluation...
[DEBUG - _parse_args_to_config] Args=1: has namespace? False
Using device: cuda for model evaluation
2025-06-28:00:13:15 INFO     [models.huggingface:137] Using device 'cuda'
Error: Evaluation failed: new-model is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
Provider: lm-eval-harness
Execution mode: local
Model: new-model
Tasks: hellaswag
Limit: 100

🚀 Starting evaluation...
[DEBUG - _parse_args_to_config] Args=1: has namespace? False
Using device: cuda for model evaluation
2025-06-28:00:13:18 INFO     [models.huggingface:137] Using device 'cuda'
Error: Evaluation failed: new-model is not a local folder and is not a valid model identifier listed on
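
In the recorded output above, the second, larger run started even though the first one had failed. A small driver can stop as soon as a run fails, as in the minimal sketch below. `run_staged` and `fake_runner` are hypothetical names. A real runner would call the CLI, for example through `subprocess.run(...).returncode`, and the sketch assumes the CLI exits with a non-zero code on failure, which this notebook does not confirm.

```python
from typing import Callable, Iterable, List, Tuple


def run_staged(limits: Iterable[int], runner: Callable[[int], int]) -> List[Tuple[int, int]]:
    """Call runner(limit) for each limit in order; stop after the first non-zero exit code."""
    history = []
    for limit in limits:
        code = runner(limit)
        history.append((limit, code))
        if code != 0:
            break  # do not scale up after a failed run
    return history


def fake_runner(limit: int) -> int:
    """Stand-in for a real CLI call so the sketch runs without TrustyAI installed."""
    return 1  # simulate a failing run


print(run_staged([5, 100], fake_runner))
# [(5, 1)]
```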

### 3. Use Kubernetes for Long-Running Evaluations

For production evaluations or large-scale tasks, use Kubernetes mode:


In [None]:
# For large evaluations, use Kubernetes ("large-model" and "comprehensive_task_suite" are placeholders)
!trustyai eval execute \
  --provider lm-eval-harness \
  --execution-mode kubernetes \
  --model "large-model" \
  --tasks "comprehensive_task_suite" \
  --namespace trustyai-production \
  --cpu 8 \
  --memory 32Gi \
  --watch  # Monitor progress
