# LLM as a Judge

## Config

In [3]:
from getpass import getpass
# Enter your personal kluster.ai API key (check beforehand that it contains no blank spaces)
api_key = getpass("Enter your kluster.ai API key: ")

Enter your kluster.ai API key:  ········


## Setup

In [5]:
%pip install openai

Note: you may need to restart the kernel to use updated packages.


In [7]:
import os
import urllib.request
import pandas as pd
import numpy as np
import random
import requests
from openai import OpenAI
import time
import json
from IPython.display import clear_output, display

pd.set_option('display.max_columns', 1000, 'display.width', 1000, 'display.max_rows',1000, 'display.max_colwidth', 500)

In [9]:
# Set up the client
client = OpenAI(
    base_url="https://api.kluster.ai/v1",
    api_key=api_key,
)

## Generate tasks

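In this section, we define three variants of the same question about Newton's laws of motion. The first asks for a regular answer, the second asks for a correct but vague and incomplete answer, and the third asks the model to invent most of the answer. This gives the judge three answers of deliberately different quality to score.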

In [207]:
df = pd.DataFrame({
    "question": [
        "Describe the three laws of motion formulated by Isaac Newton.",
        "Describe the three laws of motion formulated by Isaac Newton. Provide a correct but vague and incomplete answer.",
        "Describe the three laws of motion formulated by Isaac Newton. Invent about 80% of the answer and include some real information. Don’t specify which parts are real or invented.",
    ]})

## Define the tasks

## Batch Predictions

### Create the Batch File

In [171]:
ASSISTANT_PROMPT = '''
    You are a helpful assistant.
    '''

In [173]:
def create_tasks(df, task_type, system_prompt):
    tasks = []
    for index, row in df.iterrows():
        if task_type == 'assistant':
            content = row['question']
        elif task_type == 'judge':
            content = f'''
            User question: {row['question']}. \n\nLLM answer: {row['answer']}
            '''

        task = {
            "custom_id": f"{task_type}-{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "klusterai/Meta-Llama-3.1-405B-Instruct-Turbo",
                "temperature": 0.5,
                "response_format": {"type": "json_object"},
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": content},
                ],
            }
        }
        tasks.append(task)
    return tasks

def save_tasks(tasks, task_type):
    filename = f"batch_tasks_{task_type}.jsonl"
    with open(filename, 'w') as file:
        for task in tasks:
            file.write(json.dumps(task) + '\n')
    return filename

In [None]:
task_list = create_tasks(df, task_type='assistant', system_prompt=ASSISTANT_PROMPT)
filename = save_tasks(task_list, task_type='assistant')
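
To make the batch file format concrete, here is a minimal sketch of what `save_tasks` writes: one JSON object per line (JSONL). The helper names `build_task` and `to_jsonl` and the sample questions are hypothetical, and the request body is a trimmed-down version of the one built in `create_tasks`.

```python
import json

def build_task(custom_id, system_prompt, user_content):
    # Hypothetical helper: a trimmed-down version of the dict built in create_tasks
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "klusterai/Meta-Llama-3.1-405B-Instruct-Turbo",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_content},
            ],
        },
    }

def to_jsonl(tasks):
    # json.dumps escapes embedded newlines, so each task stays on a single line
    return "".join(json.dumps(task) + "\n" for task in tasks)

# Illustrative questions only; the second one contains a line break on purpose
sample_questions = ["Question A", "Question B,\nsplit over two lines"]
sample_tasks = [
    build_task(f"assistant-{i}", "You are a helpful assistant.", question)
    for i, question in enumerate(sample_questions)
]

jsonl_text = to_jsonl(sample_tasks)
jsonl_lines = jsonl_text.splitlines()
first_task = json.loads(jsonl_lines[0])
print(len(jsonl_lines), first_task["custom_id"])
```

The `custom_id` is the only link between a request line and its result, which is why it encodes both the task type and the row index.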

### Upload Batch File to kluster.ai

In [181]:
def create_batch_job(file_name):
    print(f"Creating batch job for {file_name}")
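    # Upload the JSONL file; the context manager closes the file handle afterwards
    with open(file_name, "rb") as batch_input:
        batch_file = client.files.create(
            file=batch_input,
            purpose="batch"
        )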

    batch_job = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h"
    )

    return batch_job

job = create_batch_job(filename)

Creating batch job for batch_tasks_assistant.jsonl


### Check Job progress

In [None]:
def parse_json_objects(data_string):
    if isinstance(data_string, bytes):
        data_string = data_string.decode('utf-8')

    json_strings = data_string.strip().split('\n')
    json_objects = []

    for json_str in json_strings:
        try:
            json_obj = json.loads(json_str)
            json_objects.append(json_obj)
        except json.JSONDecodeError as e:
            print(f"Error parsing JSON: {e}")

    return json_objects

def monitor_job_status(client, job_id, task_type):
    all_completed = False

    while not all_completed:
        all_completed = True
        output_lines = []

        updated_job = client.batches.retrieve(job_id)

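        status = updated_job.status.lower()
        if status in ("failed", "expired", "cancelled"):
            # Stop polling on a terminal failure state instead of looping forever
            raise RuntimeError(f"{task_type.capitalize()} job ended with status: {updated_job.status}")

        if status != "completed":
            all_completed = False
            completed = updated_job.request_counts.completed
            total = updated_job.request_counts.total
            output_lines.append(f"{task_type.capitalize()} job status: {updated_job.status} - Progress: {completed}/{total}")
        else:
            output_lines.append(f"{task_type.capitalize()} job completed!")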

        # Clear the output and display updated status
        clear_output(wait=True)
        for line in output_lines:
            display(line)

        if not all_completed:
            time.sleep(10)

monitor_job_status(client=client, job_id=job.id, task_type='assistant')

'Assistant job completed!'
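
The polling loop above has no upper bound on how long it waits. The following is a minimal sketch of a bounded variant, not the notebook's own method. It assumes the terminal status names used by the OpenAI Batch API (`completed`, `failed`, `expired`, `cancelled`) also apply to kluster.ai; the function name `wait_for_terminal_status` and its parameters are hypothetical.

```python
import time

# Assumption: terminal states follow the OpenAI Batch API naming
TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}

def wait_for_terminal_status(get_status, poll_seconds=10.0, timeout_seconds=3600.0):
    """Poll get_status() until it returns a terminal state or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status().lower()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_seconds} seconds")
        time.sleep(poll_seconds)

# Demo with a fake status source, so the sketch runs without network access
fake_statuses = iter(["validating", "in_progress", "completed"])
final_status = wait_for_terminal_status(
    lambda: next(fake_statuses), poll_seconds=0, timeout_seconds=5
)
print(final_status)
```

With the notebook's client, the status source could be `lambda: client.batches.retrieve(job.id).status`. The caller then decides what to do when the returned status is anything other than `completed`.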

## Get the results

In [187]:
batch_job = client.batches.retrieve(job.id)
result_file_id = batch_job.output_file_id
result = client.files.content(result_file_id).content
results = parse_json_objects(result)
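answers = {}

for res in results:
    # custom_id has the form "assistant-<row index>"; use it to match each answer to its row,
    # because the output file may not preserve the order of the input file
    index = int(res['custom_id'].split('-')[-1])
    answers[index] = res['response']['body']['choices'][0]['message']['content']

df['answer'] = pd.Series(answers)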

In [188]:
df

Unnamed: 0,question,answer
0,Describe the three laws of motion formulated by Isaac Newton.,"Isaac Newton formulated three laws of motion that describe how objects move and respond to forces. These laws, which were first published in 1687, are still widely used today to predict and understand the motion of objects. Here are the three laws of motion:\n\n**1. The First Law of Motion (Law of Inertia)**\n\nThe first law, also known as the law of inertia, states that an object at rest will remain at rest, and an object in motion will continue to move with a constant velocity, unless acte..."
1,Describe the three laws of motion formulated by Isaac Newton. Provide a correct but vague and incomplete answer.,"Isaac Newton came up with these laws of motion that describe how things move. \n\nThe first law is about how things keep doing what they're doing unless something stops them or makes them change. \n\nThe second law has something to do with force and how it affects the motion of objects. More force means more motion, kind of.\n\nThe third law says that when something happens to one thing, something else happens to another thing, and it's all connected somehow."
2,Describe the three laws of motion formulated by Isaac Newton. Invent about 80% of the answer and include some real information. Don’t specify which parts are real or invented.,"Isaac Newton's three laws of motion are fundamental principles that describe how objects move and respond to forces. \n\nThe first law, also known as the Law of Inertial Dynamics, states that an object at rest will remain at rest, and an object in motion will continue to move in a curved trajectory that eventually returns to its starting point, unless acted upon by an external force known as a ""catalyst."" This catalyst can be a push, a pull, or even a whispered secret in the object's ear. Ne..."


## Evaluate the results

In this section, we use the same model in a new role: as a judge. It evaluates each response it previously generated, assigns a score, and briefly explains that score. Comparing the scores across the three answers shows whether the judge can tell a complete answer from a vague or partly invented one.

In [190]:
JUDGE_PROMPT = '''
    You are an evaluator tasked with assessing the quality of a response provided by another large language model (LLM) 
    to a specific user question. Generate a score from 1 to 5 based on relevance, accuracy and 
    clarity and provide a brief explanation for your score. 
    Output your evaluation in this format: "Score: X/5, Explanation: [Your explanation]".
    '''
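
Because the judge is asked for a fixed "Score: X/5, Explanation: ..." format, its replies can be turned into a numeric score for comparison. This is a minimal sketch, assuming the judge actually follows the plain-text format requested in `JUDGE_PROMPT`; the names `SCORE_PATTERN` and `parse_evaluation` and the sample replies are hypothetical.

```python
import re

# Matches "Score: X/5, Explanation: ..." with X between 1 and 5;
# DOTALL lets the explanation span several lines
SCORE_PATTERN = re.compile(
    r"Score:\s*([1-5])\s*/\s*5\s*,?\s*Explanation:\s*(.*)",
    re.DOTALL | re.IGNORECASE,
)

def parse_evaluation(text):
    """Return (score, explanation); score is None if the format is not found."""
    match = SCORE_PATTERN.search(text)
    if match is None:
        return None, text.strip()
    return int(match.group(1)), match.group(2).strip()

# Illustrative replies, not real model output
well_formed = parse_evaluation("Score: 4/5, Explanation: Accurate but brief.")
malformed = parse_evaluation("The answer looks fine to me.")
print(well_formed)
print(malformed)
```

Returning `None` for the score, rather than raising, keeps a single off-format reply from stopping the whole evaluation run.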

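Next, we hide from the judge which answers were deliberately vague or partly invented. To do this, we replace every question with the original, unmodified one, so the judge sees the same question for all three answers.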

In [None]:
df_copy = df.copy()
df_copy['question'] = df_copy['question'].iloc[0] # We don't give info to the judge about which one is true or invented.
task_list = create_tasks(df_copy, task_type='judge', system_prompt=JUDGE_PROMPT)
filename = save_tasks(task_list, task_type='judge')
job = create_batch_job(filename)
monitor_job_status(client=client, job_id=job.id, task_type='judge')

'Judge job completed!'

## Get the results

In [203]:
batch_job = client.batches.retrieve(job.id)
result_file_id = batch_job.output_file_id
result = client.files.content(result_file_id).content
results = parse_json_objects(result)
evals = {}

for res in results:
    task_id = res['custom_id']
    # custom_id has the form "judge-<row index>"
    index = int(task_id.split('-')[-1])
    evaluation = res['response']['body']['choices'][0]['message']['content']
    evals[index] = evaluation
    question = df_copy.iloc[index]['question']
    answer = df_copy.iloc[index]['answer']
    print('\n -------------------------- \n')
    print(f"Task ID: {task_id}. \n\nQUESTION: {question}\n\nLLM ANSWER: {answer}\n\nEVALUATION: {evaluation}")

# Match evaluations to rows by index rather than by their order in the output file
df['evaluation'] = pd.Series(evals)


 -------------------------- 

Task ID: judge-0. 

QUESTION: Describe the three laws of motion formulated by Isaac Newton.

LLM ANSWER: Isaac Newton formulated three laws of motion that describe how objects move and respond to forces. These laws, which were first published in 1687, are still widely used today to predict and understand the motion of objects. Here are the three laws of motion:

**1. The First Law of Motion (Law of Inertia)**

The first law, also known as the law of inertia, states that an object at rest will remain at rest, and an object in motion will continue to move with a constant velocity, unless acted upon by an external force. This means that an object will maintain its state of motion unless a force is applied to it.

**2. The Second Law of Motion (Law of Acceleration)**

The second law relates the motion of an object to the force acting upon it. It states that the acceleration of an object is directly proportional to the force applied and inversely proportional 