# Generating global explanations of LLM-as-a-Judge using the GloVE algorithm

This notebook shows how you might run the full pipeline and generate a global summary given a dataset and an LLM-as-a-Judge.

In [1]:
import os
from risk_policy_distillation.datasets.prompt_response_dataset import (
    PromptResponseDataset,
)
from risk_policy_distillation.datasets.abs_dataset import AbstractDataset
from risk_policy_distillation.models.explainers.local_explainers.lime import LIME
from risk_policy_distillation.models.explainers.local_explainers.shap_vals import SHAP
from risk_policy_distillation.models.guardians.guardian import Guardian
from risk_policy_distillation.pipeline.clusterer import Clusterer
from risk_policy_distillation.pipeline.concept_extractor import Extractor
from risk_policy_distillation.pipeline.pipeline import Pipeline

# use AI atlas nexus for the inference tasks
from ai_atlas_nexus.blocks.inference import (
    InferenceEngine,
    RITSInferenceEngine,
    WMLInferenceEngine,
    OllamaInferenceEngine,
    VLLMInferenceEngine,
)
from ai_atlas_nexus.blocks.inference.params import (
    InferenceEngineCredentials,
    RITSInferenceEngineParams,
    WMLInferenceEngineParams,
    OllamaInferenceEngineParams,
    VLLMInferenceEngineParams,
)

from ai_atlas_nexus.library import AIAtlasNexus

from datasets import load_dataset
from pathlib import Path
from typing import Literal



INFO 12-02 11:16:29 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.


[2025-12-02 11:16:30] INFO loader.py:156: Loading faiss.
[2025-12-02 11:16:30] INFO loader.py:158: Successfully loaded faiss.


## Task
Explain the output of an LLM-as-a-Judge.

## Create a dataset
To explain the LLM-as-a-Judge, you must provide a dataset. The [AbstractDataset](../../src/risk_policy_distillation/datasets/abs_dataset.py) class wraps the dataframe you want to explain. Use [PromptDataset](../../src/risk_policy_distillation/datasets/prompt_dataset.py) or [PromptResponseDataset](../../src/risk_policy_distillation/datasets/prompt_response_dataset.py) depending on whether your dataframe contains only prompts or prompt-response pairs. You can also create a custom dataset by inheriting from the AbstractDataset class.

### Setup dataset configuration
The cell below shows an example configuration that maps the dataframe's column names to the fields the dataset expects.

A small sample from the [PKU-Alignment/BeaverTails](https://github.com/PKU-Alignment/beavertails) dataset illustrates the example. BeaverTails was developed to support research on safety alignment in large language models (LLMs) and consists of 300k+ human-labeled question-answering (QA) pairs, each associated with specific harm categories. The sample and the configuration are used to create a PromptResponseDataset.

Additional parameters: 

_flip_labels_ indicates whether the dataframe's labels should be flipped during preprocessing (e.g. for BeaverTails, where the label indicates that the content is safe rather than harmful);

_split_ indicates whether a train-val-test split should be performed during preprocessing.
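
To make the two options concrete, the sketch below shows what label flipping and a train-val-test split could look like on plain Python records. The helper names `flip_labels` and `train_val_test_split`, the 80/10/10 ratios and the seed are illustrative assumptions; this is not the library's implementation.

```python
import random


def flip_labels(rows, label_col):
    """Return copies of the rows with the binary label inverted (safe -> harmful)."""
    return [{**row, label_col: 1 - int(row[label_col])} for row in rows]


def train_val_test_split(rows, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle deterministically and cut the rows into three disjoint parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical records that mimic the BeaverTails columns used in this notebook.
rows = [
    {"prompt": f"p{i}", "response": f"r{i}", "is_safe": i % 2 == 0}
    for i in range(10)
]
flipped = flip_labels(rows, "is_safe")  # is_safe=True becomes label 0 (harmless)
train, val, test = train_val_test_split(flipped)
```

After flipping, a value of 1 marks harmful content, which matches the `labels` and `label_names` order used in the guardian configuration later in this notebook.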


In [2]:
# get the dataset
ds = load_dataset("PKU-Alignment/BeaverTails")
dataframe = ds["330k_train"].to_pandas()

# sample for testing purposes
dataframe = dataframe.sample(3)

# provide some information in the data config
data_config = {
    "dataset_name": "BeaverTails",
    "index_col": "",  # if there is no index column it will be automatically populated
    "prompt_col": "prompt",
    "response_col": "response",
    "label_col": "is_safe",
}

dataset = PromptResponseDataset(dataframe=dataframe, config=data_config)

### Guardian model configuration
Policy Distillation uses [Granite Guardian](https://www.ibm.com/granite/docs/models/guardian) as an LLM-as-a-Judge and queries other Large Language Models (LLMs) at various stages of the pipeline. It therefore needs access to an inference engine to call these models.

The cell below shows an example configuration for the guardian model.


In [3]:
# Guardian model configuration
guardian_config = {
    "task": "Harm detection",
    "criterion": "Harm",
    "criterion_definition": "The message is harmful by common-sense.",
    "labels": [0, 1],
    "label_names": ["harmless", "harmful"],
    "output_labels": ["no", "yes"],
}
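
The three list-valued fields appear to be index-aligned: position 0 pairs label `0` with the name `harmless` and the judge output `no`, and position 1 pairs `1`, `harmful` and `yes`. The sketch below illustrates that reading. The helper `decode_judge_output` is hypothetical and the alignment is an assumption drawn from the configuration above, not a documented guarantee of the Guardian class.

```python
# A reduced copy of the configuration, kept separate from the notebook's own variable.
example_config = {
    "labels": [0, 1],
    "label_names": ["harmless", "harmful"],
    "output_labels": ["no", "yes"],
}


def decode_judge_output(raw_output, config):
    """Map a raw judge answer such as 'Yes' to a (label, label_name) pair."""
    answer = raw_output.strip().lower()
    if answer not in config["output_labels"]:
        raise ValueError(f"Unexpected judge output: {raw_output!r}")
    idx = config["output_labels"].index(answer)
    return config["labels"][idx], config["label_names"][idx]


label, name = decode_judge_output(" Yes ", example_config)
```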

**Available Inference Engines**: WML, Ollama, vLLM, RITS. Please follow the [Inference APIs](https://github.com/IBM/risk-atlas-nexus?tab=readme-ov-file#install-for-inference-apis) guide before going ahead.

_Note:_ RITS is intended solely for internal IBM use and requires TUNNELALL VPN for access.

Uncomment the section that is relevant for your use.

In [4]:
#############

# WML

# guardian_judge = WMLInferenceEngine(
#     model_name_or_path="ibm/granite-guardian-3-8b",
#     credentials={
#         "api_key": os.environ["WML_API_KEY"],
#         "api_url": os.environ["WML_API_URL"],
#         "project_id": os.environ["WML_PROJECT_ID"],
#     },
#     parameters=WMLInferenceEngineParams(logprobs=True, top_logprobs=10, temperature=0),
# )

# llm_component = WMLInferenceEngine(
#     model_name_or_path="meta-llama/llama-3-3-70b-instruct", 
#     credentials={
#         "api_key": os.environ["WML_API_KEY"],
#         "api_url": os.environ["WML_API_URL"],
#         "project_id": os.environ["WML_PROJECT_ID"],
#     },
# )

#############

# VLLM

# To serve the models on OpenAI-compatible vLLM servers, run each command on its own port:
# vllm serve ibm-granite/granite-guardian-3.3-8b --max_model_len 2048 --host localhost --port 8000 --api-key <YOUR KEY>
# vllm serve meta-llama/Llama-3.3-70B-Instruct --max_model_len 2048 --host localhost --port 8001 --api-key <YOUR KEY>

# guardian_judge = VLLMInferenceEngine(
#     model_name_or_path="ibm-granite/granite-guardian-3.3-8b",
#     credentials=InferenceEngineCredentials(
#         api_url=os.environ["VLLM_API_URL"], api_key=os.environ["VLLM_API_KEY"]
#     ),
#     parameters=VLLMInferenceEngineParams(logprobs=True, temperature=0),
# )
# llm_component = VLLMInferenceEngine(
#     model_name_or_path="meta-llama/Llama-3.3-70B-Instruct",  # gated model
#     credentials=InferenceEngineCredentials(
#         api_url=os.environ["VLLM_API_URL_LLM"], api_key=os.environ["VLLM_API_KEY_LLM"]
#     ),
# )

#############

# RITS (IBM internal only, VPN required)

guardian_judge = RITSInferenceEngine(
    model_name_or_path="ibm-granite/granite-guardian-3.3-8b",
    credentials={
        "api_key": os.environ["RITS_API_KEY"],
        "api_url": os.environ["RITS_API_URL"],
    },
    parameters=RITSInferenceEngineParams(
        logprobs=True, top_logprobs=10, temperature=0.0
    ),
)

llm_component = RITSInferenceEngine(
    model_name_or_path="meta-llama/llama-3-3-70b-instruct",
    credentials={
        "api_key": os.environ["RITS_API_KEY"],
        "api_url": os.environ["RITS_API_URL"],
    },
)


[2025-12-02 11:16:35:450] - INFO - AIAtlasNexus - Created RITS inference engine.
[2025-12-02 11:16:35:869] - INFO - AIAtlasNexus - Created RITS inference engine.
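
The judge engines above are created with `logprobs=True`, so token log-probabilities come back alongside the answer. One common use of such values is to turn the log-probabilities of the `yes` and `no` tokens into a confidence score. The sketch below shows that calculation under the assumption that the top log-probabilities are available as a token-to-logprob mapping. The function `harm_probability` is hypothetical, and this notebook does not show how the Guardian class actually consumes log-probabilities.

```python
import math


def harm_probability(top_logprobs, yes_token="yes", no_token="no"):
    """Renormalise the probability mass on the yes/no tokens into P(yes)."""
    yes_mass = 0.0
    no_mass = 0.0
    for token, logprob in top_logprobs.items():
        normalised = token.strip().lower()  # treat "Yes", " yes" and "yes" alike
        if normalised == yes_token:
            yes_mass += math.exp(logprob)
        elif normalised == no_token:
            no_mass += math.exp(logprob)
    total = yes_mass + no_mass
    if total == 0.0:
        raise ValueError("Neither answer token appears in the top log-probabilities")
    return yes_mass / total


# Made-up values for illustration only.
example = {
    "Yes": math.log(0.6),
    "No": math.log(0.2),
    " yes": math.log(0.1),
    "maybe": math.log(0.1),
}
p_harm = harm_probability(example)  # (0.6 + 0.1) / (0.6 + 0.2 + 0.1)
```

Tokens other than the two answers are ignored, so the score stays in [0, 1] even when the model spreads some probability over unrelated tokens.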


### Create and run the explanation generation pipeline

The pipeline streamlines the generation of local and global explanations. The Extractor runs the CLoVE algorithm and generates a set of local explanations; the Clusterer runs the GloVE algorithm and merges the local explanations into a global one.
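
As a rough intuition for the merging step, the toy sketch below groups local concepts that predict the same label and are worded similarly. It is not the GloVE algorithm: it uses word overlap (Jaccard similarity) as a stand-in for the sentence embeddings that the run log reports, and the function names, the threshold and two of the example phrases are made up for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity between two phrases, in [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def merge_local_concepts(local_explanations, threshold=0.5):
    """Greedily group (concept, prediction) pairs with the same label and similar wording."""
    clusters = []  # each cluster: {"prediction": int, "concepts": [str, ...]}
    for concept, prediction in local_explanations:
        for cluster in clusters:
            same_label = cluster["prediction"] == prediction
            # Compare against the first concept of the cluster as its representative.
            if same_label and jaccard(concept, cluster["concepts"][0]) >= threshold:
                cluster["concepts"].append(concept)
                break
        else:
            clusters.append({"prediction": prediction, "concepts": [concept]})
    return clusters


local_explanations = [
    ("uses disrespectful language", 1),
    ("uses very disrespectful language", 1),  # made-up near-duplicate
    ("derogatory terms used", 1),
    ("polite factual answer", 0),  # made-up harmless concept
]
global_clusters = merge_local_concepts(local_explanations)
```

Each resulting cluster corresponds to one candidate rule in the global summary, with the first concept acting as its representative wording.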

Pass `lime=False` when creating the pipeline to skip local word-based verification. Similarly, pass `fr=False` if FactReasoner should not be used to verify global explanations.

The resulting local and global explanations are saved under the `results_path` folder passed to the `pipeline.run()` call.

The execution logs can be found in the logs folder.
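
The run in this notebook logs its results directory as `results/BeaverTails/local/local_expl.csv` and `results/BeaverTails/global/global_expl.pkl`. Assuming that layout holds for other datasets and output folders, the hypothetical helper below rebuilds the two paths so they can be checked or opened after a run.

```python
from pathlib import Path


def expected_result_paths(results_path, dataset_name):
    """Build the output locations following the layout seen in this notebook's log."""
    base = Path(results_path) / dataset_name
    return {
        "local": base / "local" / "local_expl.csv",
        "global": base / "global" / "global_expl.pkl",
    }


paths = expected_result_paths("results", "BeaverTails")
for kind, path in paths.items():
    # exists() is False until the pipeline has been run with this results_path.
    print(f"{kind}: {path} (exists: {path.exists()})")
```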


In [5]:
def generate_policy_rules(
    guardian_config: dict,
    dataset: AbstractDataset,
    guardian_judge: InferenceEngine,
    llm_component: InferenceEngine,
    local_expl: Literal["LIME", "SHAP"] = "LIME",
    results_path: Path = Path("results"),
):
    """Generate the policy rules.

    Args:
        guardian_config (dict): Guardian configuration (task, criterion, labels).
        dataset (AbstractDataset): Dataset to run the pipeline on.
        guardian_judge (InferenceEngine): Inference engine instance for the Granite Guardian judge.
        llm_component (InferenceEngine): Inference engine instance used for all other steps of the policy distillation pipeline.
        local_expl (Literal["LIME", "SHAP"], optional): Local explanation model; only LIME and SHAP are supported. Defaults to "LIME".
        results_path (Path, optional): Output directory path. Defaults to Path("results").

    Returns:
        The global explanation produced by the pipeline, containing the policy rules.
    """

    # Create an instance of the guardian model
    guardian = Guardian(
        inference_engine=guardian_judge,
        config=guardian_config,
    )

    # local explanation model -- only LIME and SHAP are supported
    if local_expl == "LIME":
        local_explainer = LIME(
            dataset.dataset_name, guardian_config["label_names"], n_samples=100
        )
    elif local_expl == "SHAP":
        local_explainer = SHAP(
            dataset.dataset_name, guardian_config["label_names"], n_samples=100
        )
    else:
        raise ValueError("Only LIME and SHAP are supported")

    # Create pipeline
    pipeline = Pipeline(
        extractor=Extractor(
            guardian,
            llm_component,
            guardian_config["criterion"],
            guardian_config["criterion_definition"],
            local_explainer,
        ),
        clusterer=Clusterer(
            llm_component,
            guardian_config["criterion_definition"],
            guardian_config["label_names"],
            n_iter=10,
        ),
        lime=True,
        fr=True,
        verbose=False
    )

    # Run pipeline
    expl = pipeline.run(dataset, results_path=results_path)
    return expl


In [6]:
# generate the rules
expl = generate_policy_rules(
    guardian_config, dataset, guardian_judge, llm_component, local_expl="LIME"
)

# print the output
expl.print()

[2025-12-02 11:16:35] INFO pipeline.py:50: Built pipeline.
[2025-12-02 11:16:35] INFO pipeline.py:51: Using LIME = True
[2025-12-02 11:16:35] INFO pipeline.py:52: Using FactReasoner = True
[2025-12-02 11:16:35] INFO pipeline.py:69: Results directory for BeaverTails: [results/BeaverTails/local/local_expl.csv, results/BeaverTails/global/global_expl.pkl]
[2025-12-02 11:16:35] INFO concept_extractor.py:71: Generating local explanations...
Inferring with RITS: 100%|██████████| 1/1 [00:00<00:00,  2.46it/s]
Inferring with RITS: 100%|██████████| 100/100 [00:05<00:00, 19.10it/s]
Inferring with RITS: 100%|██████████| 1/1 [00:01<00:00,  1.35s/it]
Inferring with RITS: 100%|██████████| 1/1 [00:00<00:00,  1.70it/s]
Inferring with RITS: 100%|██████████| 1/1 [00:00<00:00,  1.96it/s]
Inferring with RITS: 100%|██████████| 1/1 [00:00<00:00,  1.99it/s]
Inferring with RITS: 100%|██████████| 1/1 [00:00<00:00,  1.99it/s]
Inferring with RITS: 100%|██████████| 1/1 [00:01<00:00,  1.12s/it]
Inferring with RITS: 

RITS = True Model = llama-3.1-8b-instruct
[NLIExtractor] Using LLM on RITS: llama-3.1-8b-instruct
[NLIExtractor] Prompt version: v1
[NLIExtractor] Prompts created: 2


[2025-12-02 11:18:21] INFO utils.py:1308: Wrapper: Completed Call, calling success_handler
[2025-12-02 11:18:21] INFO utils.py:1308: Wrapper: Completed Call, calling success_handler
NLI: 100%|██████████| 2/2 [00:00<00:00, 40920.04prompts/s]
Inferring with RITS: 100%|██████████| 1/1 [00:00<00:00,  2.37it/s]
[2025-12-02 11:18:21] INFO utils.py:3419: 
LiteLLM completion() model= meta-llama/Llama-3.1-8B-Instruct; provider = openai

[2025-12-02 11:18:27] INFO clusterer.py:42: Clustering 4 instances
[2025-12-02 11:18:27] INFO SentenceTransformer.py:219: Use pytorch device_name: mps
[2025-12-02 11:18:27] INFO SentenceTransformer.py:227: Load pretrained SentenceTransformer: all-MiniLM-L6-v2
Batches: 100%|██████████| 1/1 [00:00<00:00, 10.07it/s]
[2025-12-02 11:18:30] INFO clusterer.py:69: Cleaned up 0 clusters with 0 total concepts
[2025-12-02 11:18:30] INFO clusterer.py:42: Clustering 3 instances
[2025-12-02 11:18:30] INFO SentenceTransformer.py:219: Use pytorch device_name: mps
[2025-12-02 11:18:47] INFO bipartite_graph.py:119: 
			Merging 2 nodes on 1 side.
[2025-12-02 11:18:47] INFO bipartite_graph.py:134: 		Added a node: id = label = 12, probability = uses disrespectful language, num of subnodes = 1.0
[2025-12-02 11:18:47] INFO global_expl.py:34: Loaded 2 rules
[2025-12-02 11:18:47] INFO global_expl.py:134: Stored global explanation to results/BeaverTails/global/global_expl.pkl


{'rules': [{'prediction': 1,
   'if_clause': 'uses disrespectful language',
   'despite_clauses': 'none'},
  {'prediction': 1,
   'if_clause': 'derogatory terms used',
   'despite_clauses': 'none'}]}
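
Each rule pairs a predicted label with an `if_clause` and optional `despite_clauses`. The sketch below renders a dictionary with that structure as readable sentences, using the `label_names` from the guardian configuration. The helper `render_rules` is hypothetical, and it assumes the explanation is available as the plain dictionary printed above; how to obtain that dictionary from the returned object is not shown in this notebook.

```python
def render_rules(explanation, label_names):
    """Turn rule dictionaries into one readable sentence per rule."""
    sentences = []
    for rule in explanation["rules"]:
        label = label_names[rule["prediction"]]
        sentence = f"IF [{rule['if_clause']}] THEN '{label}'"
        # The printed output uses the string 'none' when there is no despite clause.
        if rule["despite_clauses"] != "none":
            sentence += f" DESPITE [{rule['despite_clauses']}]"
        sentences.append(sentence)
    return sentences


# The structure and values below are copied from the output of this run.
explanation = {
    "rules": [
        {"prediction": 1, "if_clause": "uses disrespectful language", "despite_clauses": "none"},
        {"prediction": 1, "if_clause": "derogatory terms used", "despite_clauses": "none"},
    ]
}
rendered = render_rules(explanation, ["harmless", "harmful"])
for line in rendered:
    print(line)
```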