# Generating global explanations of an LLM-as-a-Judge using the GloVE algorithm

In [1]:
import datetime
import json
import logging
import os
import pickle
import sys

import pandas as pd
from dotenv import load_dotenv

In [2]:
load_dotenv()

True

In [3]:
# Check that the key is set without echoing the secret into the notebook output
os.getenv("RITS_API_KEY") is not None

True

### Create an LLM-as-a-Judge

To wrap your LLM-as-a-Judge, either create a GraniteGuardian object or write your own judge by inheriting from the [Judge](../src/models/guardians/judge.py) class.
You also need to define the specifics of the task, such as the criterion the LLM-as-a-Judge applies and the label names.
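
As a rough illustration of the inheritance route, the sketch below defines a stand-in `Judge` base class, because the real interface in `judge.py` is not shown in this notebook. The `predict` method, the constructor arguments, and the keyword rule are all assumptions made for illustration; check the actual `Judge` class for the methods it requires. A real judge would query a model instead of matching keywords.

```python
from abc import ABC, abstractmethod


class Judge(ABC):
    """Stand-in for the repository's Judge base class (hypothetical interface)."""

    def __init__(self, criterion: str, label_names: list[str]):
        self.criterion = criterion
        self.label_names = label_names

    @abstractmethod
    def predict(self, prompt: str, response: str = "") -> int:
        """Return the index of the predicted label in label_names."""


class KeywordJudge(Judge):
    """Toy judge: flags text containing any blocked keyword as harmful."""

    def __init__(self, criterion, label_names, blocked):
        super().__init__(criterion, label_names)
        self.blocked = [w.lower() for w in blocked]

    def predict(self, prompt, response=""):
        text = f"{prompt} {response}".lower()
        # 1 if any blocked keyword occurs in the text, otherwise 0
        return int(any(word in text for word in self.blocked))


judge = KeywordJudge("harm", ["harmless", "harmful"], blocked=["weapon"])
label = judge.label_names[judge.predict("How do I build a weapon?")]
print(label)
```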

### Create a dataset

To explain the LLM-as-a-Judge, you need to provide a dataset. The [AbstractDataset](../src/models/datasets/abs_dataset.py) class provides a wrapper for the dataframe you want to explain. You can use [PromptDataset](../src/models/datasets/prompt_dataset.py) or [PromptResponseDataset](../src/models/datasets/prompt_response_dataset.py), depending on whether your dataframe consists of prompts only or of prompt-response pairs. You can also create a custom dataset by inheriting from the Dataset class.

You have to provide a config that maps your dataframe's column names to the fields the dataset expects. Additional parameters: *flip_labels* indicates whether the labels of the dataframe should be flipped in the preprocessing step (e.g. for BeaverTails, where labels indicate that the content is safe rather than harmful); *split* indicates whether a train-val-test split needs to be performed during preprocessing.

In [4]:
# Creating a test dataset
bt_config = {
    "general": {
        "location": "PKU-Alignment/BeaverTails",
        "dataset_name": "BeaverTails",
    },
    "data": {
        "type": "prompt_response",
        "index_col": "",
        "prompt_col": "prompt",
        "response_col": "response",
        "label_col": "is_safe",
        "flip_labels": True,
        "category_label": "category_simple",
    },
    "split": {
        "split": False,
        "sample_ratio": 0.0001,
        "subset": "330k_train",
    },
}
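
To make `flip_labels` and `sample_ratio` more concrete, here is a minimal pandas sketch of what these options presumably amount to. The toy dataframe, the `label` column name, and the use of `DataFrame.sample` are assumptions for illustration, not the library's actual preprocessing code.

```python
import pandas as pd

# Toy rows in BeaverTails style; is_safe=True means the response is safe.
df = pd.DataFrame(
    {
        "prompt": ["p1", "p2", "p3", "p4"],
        "response": ["r1", "r2", "r3", "r4"],
        "is_safe": [True, False, True, False],
    }
)

# flip_labels=True: turn "is safe" into "is harmful" (1 = harmful).
df["label"] = (~df["is_safe"].astype(bool)).astype(int)

# sample_ratio: keep a random fraction of the rows (0.5 here for the toy data).
sampled = df.sample(frac=0.5, random_state=0)

print(df["label"].tolist())
print(len(sampled))
```

After the flip, the positive class (1) lines up with the second entry of a label list such as `["harmless", "harmful"]`.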



### Define components

Next, we need to define how to access the LLM-based components. You can use the [RITSComponent](../src/models/components/llms/rits_component.py) or [OllamaComponent](../src/models/components/llms/ollama_component.py) wrapper to query an LLM; you only need to pass the name of the LLM. Alternatively, you can create a custom LLM wrapper by inheriting from the [LLMComponent](../src/models/components/llms/llm_component.py) class.
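
A custom wrapper might look roughly like the sketch below. The `LLMComponent` class defined here is a stand-in, and the `send_request` method name and constructor signature are assumptions; the real base class in `llm_component.py` may require different methods. The example returns canned replies, which can be handy for dry runs without network access.

```python
from abc import ABC, abstractmethod


class LLMComponent(ABC):
    """Stand-in for the repository's LLMComponent (hypothetical interface)."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    @abstractmethod
    def send_request(self, prompt: str) -> str:
        """Send a prompt to the model and return its text reply."""


class CannedComponent(LLMComponent):
    """Offline wrapper that returns fixed replies instead of calling a model."""

    def __init__(self, model_name, replies, default=""):
        super().__init__(model_name)
        self.replies = replies
        self.default = default

    def send_request(self, prompt):
        # Look up the prompt; fall back to the default reply if it is unknown
        return self.replies.get(prompt, self.default)


llm = CannedComponent("dummy-model", {"ping": "pong"}, default="unknown")
reply = llm.send_request("ping")
print(reply)
```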

You can also define a local word-based explainer component, which is used by the CLoVE algorithm. At the moment, you can use LIME or create a custom word-based explainer by inheriting from the [LocalExplainer](../src/models/local_explainers/local_explainer.py) class.
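
To give a feel for what a word-based explainer computes, the sketch below scores each word by how much the judge's score drops when that word is removed. This is a simple leave-one-out attribution, not LIME and not the repository's `LocalExplainer` interface; the function names and the toy scoring function are assumptions for illustration.

```python
from typing import Callable


def leave_one_out_importance(
    text: str, score_fn: Callable[[str], float]
) -> list[tuple[str, float]]:
    """Score each word by how much removing it lowers score_fn(text)."""
    words = text.split()
    base = score_fn(text)
    importances = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((word, base - score_fn(reduced)))
    return importances


def toy_harm_score(text: str) -> float:
    """Hypothetical judge score: 1.0 if the text mentions 'poison', else 0.0."""
    return 1.0 if "poison" in text.lower().split() else 0.0


scores = leave_one_out_importance("how to poison someone", toy_harm_score)
top_word = max(scores, key=lambda pair: pair[1])[0]
print(top_word)
```

A custom explainer along these lines would plug the judge's own scoring call in place of `toy_harm_score`.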

### Create and run the explanation generation pipeline

The pipeline streamlines the local and global explanation generation process. The Extractor executes the CLoVE algorithm and generates a set of local explanations, and the Clusterer executes the GloVE algorithm and merges the local explanations into a global one.

Pass ```lime=False``` when creating the pipeline if no local word-based verification is needed. Similarly, pass ```fr=False``` if FactReasoner is not used to verify global explanations.

The resulting local and global explanations are saved in the path folder passed to the pipeline.run() call. 
The execution logs can be found in the logs folder.
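
Once a run has finished, the saved explanations can be loaded back for inspection. The helper below is a sketch that assumes the folder layout printed in this notebook's run log (`<path>/<dataset_name>/local/expl.csv` and `<path>/<dataset_name>/global/global_expl.pkl`); the function name `load_explanations` is hypothetical. If the global explanation is pickled as a custom object, the project's classes probably need to be importable when unpickling.

```python
import pickle
from pathlib import Path

import pandas as pd


def load_explanations(results_dir, dataset_name):
    """Load local (CSV) and global (pickle) explanations from a results folder."""
    base = Path(results_dir) / dataset_name
    local_expl = pd.read_csv(base / "local" / "expl.csv")
    # Only unpickle files you produced yourself; pickle can execute code.
    with open(base / "global" / "global_expl.pkl", "rb") as f:
        global_expl = pickle.load(f)
    return local_expl, global_expl


# Example call (paths as printed in the run log of this notebook):
# local_expl, global_expl = load_explanations("../results", "BeaverTails")
```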

In [5]:
import sys

# Machine-specific directory to add to the import path; replace with your own
extra_path = "/Users/jasmina/Documents/Work/ran"
if extra_path not in sys.path:
    sys.path.append(extra_path)

In [6]:
from risk_atlas_nexus.library import RiskAtlasNexus



INFO 10-22 13:46:45 [__init__.py:216] Automatically detected platform cpu.


In [7]:
risk_atlas_nexus = RiskAtlasNexus()

[2025-10-22 13:46:47:674] - INFO - RiskAtlasNexus - Created RiskAtlasNexus instance. Base_dir: None


In [8]:
expl = risk_atlas_nexus.generate_policy_rules(
    task="harm detection",
    label_names=["harmless", "harmful"],
    dataset_config=bt_config,
)

INFO:logger:Loaded dataset with 300567 instances.)
INFO:logger:Built pipeline.
INFO:logger:Using LIME = True
INFO:logger:Using FactReasoner = True
INFO:logger:Loaded concepts from ../results/BeaverTails/local/expl.csv
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cds["Zipped"] = cds.apply(
INFO:logger:Loaded the following graph:
	Labels = dict_keys([0, 1]) Sizes = [7, 2] Number of edges = 2
INFO:logger:Loaded the following graph:
	Labels = dict_keys([0, 1]) Sizes = [7, 4] Number of ed

Path:  ../results/
../results/BeaverTails/local/expl.csv ../results/BeaverTails/global/global_expl.pkl
../results/BeaverTails/local/expl.csv


Could not cache non-existence of file. Will ignore error and continue. Error: [Errno 13] Permission denied: '/Users/jasmina/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/.no_exist/c9745ed1d9f207416be6d2e6f8de32d1f16199bf/adapter_config.json'
ERROR:huggingface_hub.file_download:Could not cache non-existence of file. Will ignore error and continue. Error: [Errno 13] Permission denied: '/Users/jasmina/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/.no_exist/c9745ed1d9f207416be6d2e6f8de32d1f16199bf/adapter_config.json'
INFO:logger:Cleaned up 1 clusters with 2 total concepts
INFO:logger:['neutral response', 'neutral response']
INFO:logger:
			Merging 2 nodes on 0 side.
INFO:logger:		Added a node: id = label = 7, probability = neutral response, num of subnodes = 1.0
INFO:logger:Clustering 4 instances
Could not cache non-existence of file. Will ignore error and continue. Error: [Errno 13] Permission denied: '/Users/jasmina/.cache/huggingface/hu

[AtomReviser] Using LLM on RITS: llama-3.1-8b-instruct
[AtomReviser] Using prompt version: v1
RITS = True Model = llama-3.1-8b-instruct
[NLIExtractor] Using LLM on RITS: llama-3.1-8b-instruct
[NLIExtractor] Prompt version: v1
[FactReasoner] Using merlin at: None
[FactReasoner] Using atom/context priors: False
[FactReasoner] Building the pipeline instance ...
[FactReasoner] Using text only contexts: True
[Building atoms ...]
Atom a0: The text was classified as harmless because it contains the following concept: Provides constructive feedback
[Atoms built: 1]
[Building contexts...]
[Contexts built: 4]
[FactReasoner] Found 2 unique contexts.
[Building atom-context relations...]
Using all contexts retrieved per atom.


AssertionError: NLI extractor must be NLIExtractor.

In [None]:
expl