# Scoring Review: Retrieval Augmented Generation (RAG)

This notebook demonstrates how reviewer agents can use retrieval-augmented generation (RAG) in their reviews: for each item, the reviewer retrieves the relevant external context and consults it before scoring.

## Setting up the notebook

High-level configs

In [1]:
%reload_ext autoreload
%autoreload 2

from dotenv import load_dotenv

# Load environment variables from .env file. Adjust the path to the .env file as needed.
load_dotenv(dotenv_path='../.env')

# Enable asyncio in Jupyter
import asyncio
import nest_asyncio

nest_asyncio.apply()

# Add the package to the path (required if you are running this notebook from the examples folder)
import sys
sys.path.append('../../')


Import required packages

In [8]:
import json
import numpy as np
from openai import AsyncOpenAI
import pandas as pd
from pydantic import BaseModel
from tqdm.auto import tqdm

from lattereview.providers import OpenAIProvider
from lattereview.providers import LiteLLMProvider
from lattereview.agents import ScoringReviewer
from lattereview.workflows import ReviewWorkflow

## Data

Build five example stories and three TRUE/FALSE question-answer pairs for each story:

In [3]:
class BuildStoryOutput(BaseModel):
    story: str
    questions: list[str]
    answers: list[bool]

async def build_story():
    prompt = """
    Write a one-paragraph story with whatever realistic or imaginary theme you like,  
    then create three TRUE/FALSE questions based on your story. 
    Ensure that only readers of your story can determine whether the statements are true or false. 
    Do not reveal the answers to your questions.
    Return your story, a Python list of three questions, and another Python list of three boolean responses to the questions as your output.
    """
    provider = OpenAIProvider(model="gpt-4o", response_format_class=BuildStoryOutput)
    return await provider.get_json_response(prompt, temperature=0.9)

def run_build_story():
    response =  asyncio.run(build_story())[0]
    return response

data = {
    "question": [],
    "answer": [],
    "story": []
}
for i in tqdm(range(5)):
    out = json.loads(run_build_story())
    for j in range(3):
        data["question"].append(out["questions"][j])
        data["answer"].append(out["answers"][j])
        data["story"].append(out["story"])


data = pd.DataFrame(data)
data.to_csv("data.csv", index=False)
data


| | question | answer | story |
|---|---|---|---|
| 0 | Zylka sang a melody that made the flowers bloom. | True | In a hidden valley between misty mountains, th... |
| 1 | Mira was a young boy who found Zylka. | False | In a hidden valley between misty mountains, th... |
| 2 | Zylka had the wings of a butterfly. | True | In a hidden valley between misty mountains, th... |
| 3 | Did Elsbeth create an enchanted device to capt... | False | In a mystical village hidden deep within the W... |
| 4 | Did the villagers declare a new holiday becaus... | True | In a mystical village hidden deep within the W... |
| 5 | Was the enchanted device made by Elsbeth a mag... | True | In a mystical village hidden deep within the W... |
| 6 | The story takes place in the village of Nymros. | True | In the enchanted village of Nymros, there was ... |
| 7 | Master Turtelini was a wise old tortoise. | False | In the enchanted village of Nymros, there was ... |
| 8 | The mysterious traveler wanted a Dream Loaf th... | False | In the enchanted village of Nymros, there was ... |
| 9 | The Luminafox is known for its enchanting abil... | False | In the heart of the Misty Forest, a peculiar c... |


Embedding the stories to build a vector base:

In [4]:
data = pd.read_csv("data.csv")

async def get_embedding(text):
    client = AsyncOpenAI()
    if isinstance(text, str):
        text = [text]
    text = [x.replace("\n", " ") for x in text]         
    out = await client.embeddings.create(
        model="text-embedding-ada-002",
        input=text,
        encoding_format="float"
    )
    out = [np.array(x.embedding) for x in out.data]
    return out if len(out) > 1 else out[0]

stories = {story: None for story in set(data["story"].tolist())}

# Create async tasks for all embeddings
async def process_embeddings():
    tasks = [get_embedding(story) for story in stories.keys()]
    embeddings = await asyncio.gather(*tasks)
    return list(zip(embeddings, stories.keys()))

# Run the async code and get results
vector_story_pairs = await process_embeddings()
vector_base = np.array([x[0] for x in vector_story_pairs])
vector_base

array([[ 0.01816475, -0.0308483 ,  0.00664629, ...,  0.02177917,
        -0.00440217, -0.01711882],
       [ 0.02523177, -0.00680952, -0.01306099, ..., -0.00648731,
        -0.00514866, -0.02701221],
       [ 0.00867207, -0.00122565, -0.00118308, ..., -0.00458821,
         0.00842317, -0.02482412],
       [ 0.02891313,  0.00376639, -0.01399321, ..., -0.00439191,
         0.00073102, -0.03680335],
       [ 0.00784938, -0.01177733, -0.00066349, ...,  0.02286303,
        -0.01522897, -0.03685228]])

## Retrieval
Here, we define a simple retrieval function. It takes an input text (in our case, a statement), embeds it, and returns the reference text (in our case, a story) whose embedding has the highest cosine similarity to the statement's embedding. The reviewer agent can then use this output as additional context for each item it reviews.

In [5]:
async def find_relevant_story(statement):
    s_embeddings = await get_embedding(statement)
    dot_product = np.dot(vector_base, s_embeddings)
    base_norms = np.linalg.norm(vector_base, axis=1)
    query_norm = np.linalg.norm(s_embeddings)
    cosine_similarities = dot_product / (base_norms * query_norm)
    retrieved_index = np.argmax(cosine_similarities)
    retrieved_story = vector_story_pairs[retrieved_index][1]
    return retrieved_story

input_index = 11
statement = data.iloc[input_index]["question"]
retrieved_story = await find_relevant_story(statement)

print(f"=== The question was chosen from row {input_index} ===\n{statement}")
print(f"=== The related story to the question ===\n{data.iloc[input_index]['story']}")
print(f"=== The retrieved Story ===\n{retrieved_story}")

=== The question was chosen from row 11 ===
Many villagers have seen the den of the Luminafox.
=== The related story to the question ===
In the heart of the Misty Forest, a peculiar creature known as a Luminafox roamed the shadows. With fur like woven moonlight, it shimmered gently as it moved. The Luminafox was known to the villagers for its enchanting ability to guide lost travelers to safety, using the soft glow emanating from its tail. Many had sought its help during foggy nights, and tales of its kindness were whispered in the tavern by the hearth. However, few had ever seen its den, a place of legend said to be nestled beneath the roots of the oldest tree in the forest. It was there, beneath the sprawling branches, that the Luminafox found solace and where no other creature dared to venture.
=== The retrieved Story ===
In the heart of the Misty Forest, a peculiar creature known as a Luminafox roamed the shadows. With fur like woven moonlight, it shimmered gently as it moved. The 
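
To make the cosine-similarity step easier to follow without calling the embeddings API, here is a minimal, dependency-free sketch of the same idea on toy data. The three-dimensional vectors, the story labels, and the helper names `cosine_similarity` and `retrieve` are hypothetical and chosen only for illustration; the pairs follow the same `(vector, text)` ordering as `vector_story_pairs`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, vector_text_pairs):
    """Return (text, similarity) for the stored vector closest to the query."""
    scored = [
        (text, cosine_similarity(query_vec, vec))
        for vec, text in vector_text_pairs
    ]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical 3-dimensional "embeddings"; real embeddings have far more dimensions.
toy_pairs = [
    ([1.0, 0.0, 0.0], "story about a fox"),
    ([0.0, 1.0, 0.0], "story about a baker"),
    ([0.0, 0.0, 1.0], "story about a fairy"),
]
toy_query = [0.1, 0.9, 0.2]  # points mostly along the second axis

best_text, best_score = retrieve(toy_query, toy_pairs)
print(best_text, round(best_score, 3))
```

Because cosine similarity divides by both norms, only the direction of each vector matters, not its length. The query above points mostly along the second axis, so the second text is retrieved.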

## Scoring with Retrieval Augmented Generation

Here, we create a simple reviewer agent that processes each input statement. The key argument is `additional_context`: we pass it the `find_relevant_story` function, which the reviewer applies to each input item. This lets the agent retrieve the most relevant context for each statement before conducting its review. The reviewer then decides whether the statement is true or false based on the retrieved context, which here is the corresponding story.
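
As a rough illustration of the general pattern (not LatteReview's actual implementation, which this notebook does not show), the sketch below applies an async context function to every item while capping the number of concurrent lookups. The names `fake_context` and `build_prompts`, and the prompt layout, are hypothetical.

```python
import asyncio

async def fake_context(item: str) -> str:
    """Hypothetical stand-in for an async retriever such as find_relevant_story."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"context for: {item}"

async def build_prompts(items, context_fn, max_concurrent=2):
    """Attach per-item context, running at most max_concurrent lookups at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(item):
        async with semaphore:
            context = await context_fn(item)
        return f"Context: {context}\nItem: {item}"

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(one(item) for item in items))

prompts = asyncio.run(build_prompts(["statement A", "statement B"], fake_context))
print(prompts[0])
```

The point of the pattern is that each item gets its own context, looked up at review time, rather than one shared block of context for the whole dataset.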

In [6]:
reviewer = ScoringReviewer(
    provider=LiteLLMProvider(model="gpt-4o-mini"),
    name="reviewer",
    max_concurrent_requests=20, 
    backstory="A frequent book reader",
    input_description="TRUE/FALSE questions about stories",
    model_args={"max_tokens": 200, "temperature": 0.1},
    reasoning = "brief",
    scoring_task="Decide if the input statement is True or False given the provided story in the provided context",
    scoring_set=[1, 2],
    scoring_rules='Score 1 if the statement is TRUE and 2 if the statement is FALSE.',
    additional_context = find_relevant_story
)

review = ReviewWorkflow(
    workflow_schema=[
        {
            "round": 'A',
            "reviewers": [reviewer],
            "text_inputs": ["question"]
        }
    ]
)

updated_data = asyncio.run(review(data))
updated_data



Processing 15 eligible rows


['round: A', 'reviewer_name: reviewer'] -                     2024-12-31 22:10:38: 100%|██████████| 15/15 [00:02<00:00,  6.76it/s]

The following columns are present in the dataframe at the end of reviewer's reivew in round A: ['question', 'answer', 'story', 'round-A_reviewer_output', 'round-A_reviewer_reasoning', 'round-A_reviewer_score', 'round-A_reviewer_certainty']





| | question | answer | story | round-A_reviewer_output | round-A_reviewer_reasoning | round-A_reviewer_score | round-A_reviewer_certainty |
|---|---|---|---|---|---|---|---|
| 0 | Zylka sang a melody that made the flowers bloom. | True | In a hidden valley between misty mountains, th... | {'reasoning': 'The statement is TRUE because t... | The statement is TRUE because the story explic... | 1 | 95 |
| 1 | Mira was a young boy who found Zylka. | False | In a hidden valley between misty mountains, th... | {'reasoning': 'The statement is false because ... | The statement is false because Mira is describ... | 2 | 95 |
| 2 | Zylka had the wings of a butterfly. | True | In a hidden valley between misty mountains, th... | {'reasoning': 'The statement is true because t... | The statement is true because the provided sto... | 1 | 100 |
| 3 | Did Elsbeth create an enchanted device to capt... | False | In a mystical village hidden deep within the W... | {'reasoning': 'The statement is FALSE because ... | The statement is FALSE because Elsbeth created... | 2 | 90 |
| 4 | Did the villagers declare a new holiday becaus... | True | In a mystical village hidden deep within the W... | {'reasoning': 'The villagers did declare a new... | The villagers did declare a new holiday to cel... | 1 | 95 |
| 5 | Was the enchanted device made by Elsbeth a mag... | True | In a mystical village hidden deep within the W... | {'reasoning': 'The statement is TRUE because t... | The statement is TRUE because the story explic... | 1 | 95 |
| 6 | The story takes place in the village of Nymros. | True | In the enchanted village of Nymros, there was ... | {'reasoning': 'The statement is TRUE because t... | The statement is TRUE because the story explic... | 1 | 90 |
| 7 | Master Turtelini was a wise old tortoise. | False | In the enchanted village of Nymros, there was ... | {'reasoning': 'The statement 'Master Turtelini... | The statement 'Master Turtelini was a wise old... | 1 | 90 |
| 8 | The mysterious traveler wanted a Dream Loaf th... | False | In the enchanted village of Nymros, there was ... | {'reasoning': 'The statement is FALSE because ... | The statement is FALSE because the traveler as... | 2 | 90 |
| 9 | The Luminafox is known for its enchanting abil... | False | In the heart of the Misty Forest, a peculiar c... | {'reasoning': 'The statement claims that the L... | The statement claims that the Luminafox is kno... | 2 | 90 |
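
Since the scoring rules map 1 to TRUE and 2 to FALSE, the reviewer's scores can be compared against the `answer` column. The following is a minimal sketch using only the ten rows displayed above; the helper `accuracy` and the `SCORE_TO_LABEL` mapping are hypothetical names, not part of LatteReview.

```python
# Mapping taken from the scoring rules: 1 means TRUE, 2 means FALSE.
SCORE_TO_LABEL = {1: True, 2: False}

def accuracy(answers, scores):
    """Fraction of items whose mapped score matches the ground-truth answer."""
    predictions = [SCORE_TO_LABEL[int(score)] for score in scores]
    correct = sum(answer == pred for answer, pred in zip(answers, predictions))
    return correct / len(predictions)

# Values copied from the `answer` and `round-A_reviewer_score` columns (rows 0-9).
shown_answers = [True, False, True, False, True, True, True, False, False, False]
shown_scores = [1, 2, 1, 2, 1, 1, 1, 1, 2, 2]

print(accuracy(shown_answers, shown_scores))  # 0.9 (row 7 is the only mismatch)
```

On the full dataframe, the same helper could be called as `accuracy(updated_data["answer"], updated_data["round-A_reviewer_score"])`, assuming the `answer` column holds booleans, which is how `pd.read_csv` typically parses `True`/`False` values.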


Finally, you can inspect the agent's memory for any item it has reviewed.

In [7]:
# An example review item (pass the index for a row in the input dataframe)

reviewer.memory[11]

{'system_prompt': "Your name is: <<reviewer>> Your backstory is: <<A frequent book reader>>. Your task is to review input itmes with the following description: <<TRUE/FALSE questions about stories>>. Your final output should have the following keys: reasoning (<class 'str'>), score (<class 'int'>), certainty (<class 'int'>).",
 'model_args': {'max_tokens': 200, 'temperature': 0.1},
 'input_prompt': '**Review the input item below and complete the scoring task as instructed:** --- **Input item:** <<Review Task ID: A-11 === question === Many villagers have seen the den of the Luminafox.>> **Scoring task:** <<Decide if the input statement is True or False given the provided story in the provided context>> --- **Instructions:** 1. **Score** the input item using only the values in this set: [1, 2]. 2. Follow these rules when determining your score: <<Score 1 if the statement is TRUE and 2 if the statement is FALSE.>>. 3. After assigning a score, report your certainty level as a value between