# Generating global explanations of LLM-as-a-Judge using the GloVE algorithm

### Installation instructions 

1. All required packages are listed in `requirements.txt`. To install them in a fresh conda environment:

```{bash}
conda create -n glove python=3.9
conda activate glove
pip install -r requirements.txt
```

2. Follow the instructions [here](../fm_factual/README.md) to install FactReasoner.

3. Create a `.env` file at the repository root with the following contents:
```{bash}
RITS_API_KEY= ...
LOCAL_ROOT= {absolute path to this repository}
CACHE_DIR = {directory where Merlin cache files should be stored}
MERLIN_PATH = {directory where you installed Merlin earlier}
```

4. Merlin requires sudo privileges to run. If you use JupyterLab, start it with sudo:

```{bash}
sudo jupyter lab
```
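The `.env` file from step 3 can be sanity-checked before running the notebook. The following is a minimal sketch, not part of the repository: the helper name `missing_env_vars` is hypothetical, and the variable names are taken from the template above. It only inspects a mapping such as `os.environ`, so in the notebook you would call `load_dotenv()` first.

```python
import os

# Variable names copied from the .env template in step 3.
REQUIRED_VARS = ("RITS_API_KEY", "LOCAL_ROOT", "CACHE_DIR", "MERLIN_PATH")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


# Hypothetical usage: run after load_dotenv() has populated os.environ.
missing = missing_env_vars()
if missing:
    print("Missing .env entries:", ", ".join(missing))
else:
    print("All required .env entries are set.")
```

Reporting all missing entries at once avoids fixing the file one failed run at a time.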

In [1]:
import datetime
import json
import logging
import os
import pickle
import sys

import pandas as pd
from dotenv import load_dotenv

In [2]:
# # Setup logging
# logger = logging.getLogger('logger')
# logger.setLevel(logging.INFO)
# fh = logging.FileHandler(f'../logs/{datetime.datetime.now().strftime("%m_%d__%H_%M")}.log')
# fh.setLevel(logging.INFO)
# logger.addHandler(fh)

In [3]:
# # Makes this repository importable from Jupyter; may not be needed in every setup
# load_dotenv()
# local_root = os.getenv('LOCAL_ROOT')
# sys.path.append(local_root)

In [4]:
from risk_policy_distillation.pipeline.clusterer import Clusterer
from risk_policy_distillation.pipeline.concept_extractor import Extractor
from risk_policy_distillation.evaluation.evaluate import Evaluator
from risk_policy_distillation.pipeline.pipeline import Pipeline
from risk_policy_distillation.models.explainers.local_explainers.lime import LIME
from risk_policy_distillation.utils.data_util import load_ds
from risk_policy_distillation.models.guardians.granite_guardian import GraniteGuardian
from risk_policy_distillation.models.guardians.granite_guardian_batch import GGRits
from risk_policy_distillation.llms.rits_component import RITSComponent
from risk_policy_distillation.datasets.prompt_dataset import PromptDataset





### Create an LLM-as-a-Judge

To wrap your LLM-as-a-Judge, either create a GraniteGuardian object or write your own wrapper by inheriting from the [Judge](../src/models/guardians/judge.py) class.
You also need to define the specifics of the task, such as the criterion the LLM-as-a-Judge applies and the label names.

In [7]:
# Define a guardian model and the task
guardian_config = {
    'task': 'harm detection',
    'criterion': 'harm', 
    'criterion_definition': 'Content is harmful by common-sense understanding',
    'labels': [0, 1],
    'label_names': ['harmless', 'harmful'],
    'output_labels': [0, 1]
}

guardian_model = 'ibm-granite/granite-guardian-3.2-5b'
guardian = GGRits(guardian_model, guardian_config)

TypeError: RITSBatchModel.__init__() missing 1 required positional argument: 'output_labels'

### Create a dataset

To explain the LLM-as-a-Judge, you need to provide a dataset. The [AbstractDataset](../src/models/datasets/abs_dataset.py) class wraps the dataframe you want to explain. Use [PromptDataset](../src/models/datasets/prompt_dataset.py) or [PromptResponseDataset](../src/models/datasets/prompt_response_dataset.py) depending on whether your dataframe contains only prompts or prompt-response pairs. You can also create a custom dataset by inheriting from the AbstractDataset class.

You also have to provide a config that maps your dataframe's column names. Two additional parameters control preprocessing: *flip_labels* indicates whether the dataframe's labels should be flipped (e.g. for BeaverTails, where labels mark content as safe rather than harmful); *split* indicates whether a train-val-test split should be performed.

In [None]:
# Creating a test dataset
dataframe = pd.DataFrame([[0, 'How can I kill some time?', 0], 
                          [1, 'How can I shoot my shot?', 0], 
                          [2, 'How can I bang my head to a song?', 1]], 
                        columns = ['Index', 'prompt', 'label'])

data_config = {
    'general': {
        'dataset_name': 'test'
    },
    'data': {
        'index_col': 'Index',    # index column name
        'prompt_col': 'prompt',  # prompt column name
        'label_col': 'label',    # true label column name
        'flip_labels': False,    # whether to flip labels (e.g. BeaverTails has is_safe instead of is_harmful)
    },
    'split': {
        'split': False           # whether to split the dataset into train, val, test
    }
}

# Wrap the dataframe 
dataset = PromptDataset(data_config, dataframe)
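The effect of *flip_labels* can be illustrated without the library. This is a minimal sketch assuming binary 0/1 labels; the helper `flip_binary_labels` is hypothetical and the repository's actual preprocessing may differ in detail.

```python
def flip_binary_labels(labels):
    """Swap 0 and 1 so that 1 consistently means 'harmful'."""
    return [1 - label for label in labels]


# BeaverTails-style annotation: 1 means the content is safe.
is_safe = [1, 1, 0]

# After flipping, 1 means the content is harmful.
is_harmful = flip_binary_labels(is_safe)
print(is_harmful)  # [0, 0, 1]
```

Flipping matters because the guardian config above maps label 1 to `'harmful'`; a dataset that encodes safety as 1 would otherwise be compared against inverted ground truth.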

### Define components

Next, define how the LLM-based components are accessed. You can use the [RITSComponent](../src/models/components/llms/rits_component.py) or [OllamaComponent](../src/models/components/llms/ollama_component.py) wrapper to query an LLM; you only need to pass the name of the LLM. Alternatively, you can create a custom LLM wrapper by inheriting from the [LLMComponent](../src/models/components/llms/llm_component.py) class.

You can also define a local word-based explainer component, which is used by the CLoVE algorithm. At the moment, you can use LIME or create a custom word-based explainer by inheriting from the [LocalExplainer](../src/models/local_explainers/local_explainer.py) class.

In [None]:
# This is an LLM that is used for generating concepts and labels 
llm_component = RITSComponent('llama-3-3-70b-instruct', 'meta-llama/llama-3-3-70b-instruct')
local_explainer = LIME(data_config['general']['dataset_name'], guardian_config['label_names'], n_samples=100)
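To give a feel for what a word-based explainer produces, the sketch below computes leave-one-out word importances. This is deliberately simpler than LIME, which samples many perturbed texts and fits a weighted linear surrogate. It does not use the repository's `LocalExplainer` interface, and `toy_judge` is a hypothetical stand-in for a real LLM-as-a-Judge.

```python
def leave_one_out_importance(text, score_fn):
    """Score each word by how much removing it lowers score_fn(text)."""
    words = text.split()
    base = score_fn(text)
    importances = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((word, base - score_fn(reduced)))
    return importances


def toy_judge(text):
    """Hypothetical judge: returns 1.0 if a flagged word occurs, else 0.0."""
    flagged = {"kill", "shoot"}
    tokens = (w.strip("?.,!").lower() for w in text.split())
    return 1.0 if any(t in flagged for t in tokens) else 0.0


for word, weight in leave_one_out_importance("How can I kill some time?", toy_judge):
    print(f"{word:>6}: {weight:+.1f}")
```

Only the word whose removal changes the judge's score receives a non-zero weight, which is the kind of signal a word-based explainer contributes when local explanations are verified.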

### Create and run the explanation generation pipeline

The Pipeline streamlines the generation of local and global explanations. The Extractor executes the CLoVE algorithm and generates a set of local explanations, and the Clusterer executes the GloVE algorithm and merges the local explanations into a global one.

Pass `lime=False` when creating the pipeline if no local word-based verification is performed. Similarly, pass `fr=False` if FactReasoner is not used to verify global explanations.

The resulting local and global explanations are saved in the folder passed as `path` to the `pipeline.run()` call.
The execution logs can be found in the logs folder.

In [None]:
extractor = Extractor(guardian, llm_component, guardian_config['criterion'],
                      guardian_config['criterion_definition'], local_explainer)
clusterer = Clusterer(llm_component, guardian_config['criterion_definition'],
                      guardian_config['label_names'], n_iter=10)
pipeline = Pipeline(extractor=extractor, clusterer=clusterer,
                    lime=True,  # verify local explanations with the word-based explainer
                    fr=True)    # verify global explanations with FactReasoner

In [None]:
expl = pipeline.run(dataset, 
                    path='../results/')

### Printing the global explanation

In [None]:
# Print each rule as "<label> IF <argument> [DESPITE <exceptions>]"
for i, argument in enumerate(expl.rules):
    decision = guardian.label_names[expl.predictions[i]]
    rule = '{} IF {}'.format(decision, argument)

    if expl.despites[i] != 'none':
        rule += ' DESPITE '

        # Align every further exception under the first one
        indent = ' ' * len(rule)
        rule += ('\n' + indent).join(expl.despites[i])

    # Leave a blank line between rules
    print(rule + '\n')
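The rule layout can be previewed without running the pipeline. The sketch below is a dry run under stated assumptions: the helper `format_rule` is hypothetical, the explanation object is a made-up stand-in that only mirrors the `rules`, `predictions`, and `despites` attributes used above, and its contents are illustrative rather than real results.

```python
from types import SimpleNamespace


def format_rule(decision, argument, despites):
    """Render one rule as '<decision> IF <argument> DESPITE <exceptions>'."""
    rule = f"{decision} IF {argument}"
    if despites == "none":
        return rule
    rule += " DESPITE "
    # Align every further exception under the first one.
    indent = " " * len(rule)
    return rule + ("\n" + indent).join(despites)


# Made-up explanation object, used only to show the layout.
expl = SimpleNamespace(
    rules=["the prompt asks for help injuring someone",
           "the prompt uses a violent idiom"],
    predictions=[1, 0],
    despites=["none", ["violent wording", "mention of weapons"]],
)
label_names = ["harmless", "harmful"]

for i, argument in enumerate(expl.rules):
    print(format_rule(label_names[expl.predictions[i]], argument, expl.despites[i]))
    print()
```

Keeping the formatting in a small function makes it easy to reuse when writing the rules to a file instead of printing them.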