* https://huggingface.co/learn/cookbook/en/rag_evaluation
* https://sap-my.sharepoint.com/:x:/r/personal/sabine_loss_sap_com/_layouts/15/Doc.aspx?sourcedoc=%7B2F78859D-06EF-413D-9E7A-250936C7B556%7D&file=GoldenDataSet_RAG.xlsx&action=default&mobileredirect=true

* use Ollama models and Hugging Face embeddings (two LLMs are needed: a generator and a critic)
* use the help docs from the first 10 rows of the golden dataset; try scraping them with BeautifulSoup, otherwise copy and paste the text into a .txt file
* set up the generator and critic LLMs as in the tutorial
* generate Q&A pairs
* evaluate the Q&A pairs and keep those with good scores
* human evaluation
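
The "evaluate and filter" step in the plan above could look roughly like the sketch below. The score names (`groundedness`, `relevance`, `standalone`), the 1-5 scale and the cut-off of 4 are assumptions for illustration; the notes do not fix them, and the sample rows are made up.

```python
# Hypothetical critic scores on an assumed 1-5 scale; the rows are made up.
scored_samples = [
    {"question": "q1", "groundedness": 5, "relevance": 4, "standalone": 5},
    {"question": "q2", "groundedness": 2, "relevance": 5, "standalone": 5},
    {"question": "q3", "groundedness": 4, "relevance": 4, "standalone": 3},
]

MIN_SCORE = 4  # assumed cut-off, tune after looking at real scores
SCORE_KEYS = ("groundedness", "relevance", "standalone")  # assumed criteria


def passes_all_criteria(sample, min_score=MIN_SCORE):
    """Return True only if every critic score reaches the cut-off."""
    return all(sample[key] >= min_score for key in SCORE_KEYS)


kept = [s for s in scored_samples if passes_all_criteria(s)]
print([s["question"] for s in kept])
```

Requiring every score to pass (rather than averaging) keeps a sample with one very weak criterion from slipping through.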

# setup llms, embedding_model, and process pdfs

In [4]:
# setup ollama model

from langchain_ollama import ChatOllama

llm_model_name = "llama3.1"

generator_llm = ChatOllama(
    model=llm_model_name,
    temperature=0 # increase temp for more creative answers
) 

critic_llm = ChatOllama(
    model=llm_model_name,
    temperature=0 # increase temp for more creative answers
) 

# test
response = generator_llm.invoke("what is pythagoras theorem")
print(response)

response = critic_llm.invoke("what is pythagoras theorem")
print(response)



content="Pythagoras' Theorem, also known as the Pythagorean Theorem, is a fundamental concept in geometry that describes the relationship between the lengths of the sides of a right-angled triangle. It states:\n\n**a² + b² = c²**\n\nwhere:\n\n* **a** and **b** are the lengths of the two sides (called legs) that form the right angle.\n* **c** is the length of the hypotenuse (the side opposite the right angle).\n\nIn other words, if you square the lengths of the two shorter sides of a right-angled triangle and add them together, the result is equal to the square of the length of the longest side (the hypotenuse).\n\nHere's an example:\n\nSuppose we have a right-angled triangle with one leg that's 3 inches long and another leg that's 4 inches long. Using Pythagoras' Theorem, we can calculate the length of the hypotenuse as follows:\n\n**a² + b² = c²**\n**(3)² + (4)² = c²**\n**9 + 16 = c²**\n**25 = c²**\n\nTo find **c**, we take the square root of both sides:\n\n**c = √25**\n**c = 5 inches

In [14]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}

hf_embedding_model = HuggingFaceEmbeddings(
    model_name=embedding_model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)



In [25]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from tqdm import tqdm

def load_pdfs(file_paths):
    """
    Load each PDF with PyPDFLoader. Every entry in file_paths must end with .pdf.
    PyPDFLoader splits a PDF into pages and returns one Document object per page.
    Note that the page split is not exact: the recorded page number can be off by 1-2 pages.

    Returns a dict mapping each file_path to its list of Document objects.
    """
    documents_dict = {}   
    for f in tqdm(file_paths):
        loader = PyPDFLoader(file_path = f)
        documents = loader.load()
        documents_dict[f] = documents
    return documents_dict


def chunk_list_of_documents(documents):
    """
    input a list of documents as Document objects

    output a list of chunks as Document objects
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size = 500,
        chunk_overlap = 100, # using 20% is a good start
        length_function=len,
        is_separator_regex=False,
        add_start_index=True
    )

    chunks = text_splitter.split_documents(documents)    
    return chunks
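
To make the effect of `chunk_size=500` and `chunk_overlap=100` concrete, here is a simplified stand-in that cuts fixed character windows. It is not what `RecursiveCharacterTextSplitter` does internally (that splitter prefers to break on separators such as paragraphs and newlines); the function name `sliding_window_chunks` is hypothetical and only illustrates how overlap works.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the last
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return windows


# 1200 characters of dummy text made of repeating digits
sample_text = "".join(str(i % 10) for i in range(1200))
pieces = sliding_window_chunks(sample_text)
print([len(p) for p in pieces])
```

With a step of 400 characters, the last 100 characters of each window are repeated at the start of the next one, so a sentence cut at a window boundary still appears whole in one of the two chunks.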


In [29]:
file = "product-allocation.pdf"

documents_dict = load_pdfs([file])

100%|██████████| 1/1 [00:00<00:00,  4.16it/s]


In [35]:
documents_dict.keys() 
# values are the Document objects containing content of each page of the document, ~ 1 page per document object

dict_keys(['product-allocation.pdf'])

In [38]:
chunks = chunk_list_of_documents(documents=documents_dict['product-allocation.pdf'])

In [40]:
len(chunks)

31

In [42]:
chunks[0]



# setup prompt and llm for generator-llm

Now let’s generate our QA couples. For this example, we generate only 10 QA couples and will load the rest from the Hub.

For your own knowledge base, aim for at least ~100 test samples. Since the critique agents will later filter out around half of them, you should generate considerably more: upwards of 200 samples.
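
The sizing argument above is simple arithmetic. A small sketch, assuming the pass rate of the critique step is roughly known in advance (the helper name `samples_to_generate` is hypothetical):

```python
import math


def samples_to_generate(target_kept, expected_pass_rate):
    """Estimate how many QA couples to generate so that about target_kept survive filtering."""
    if not 0 < expected_pass_rate <= 1:
        raise ValueError("expected_pass_rate must be in (0, 1]")
    return math.ceil(target_kept / expected_pass_rate)


# 100 usable samples wanted, about half expected to pass the critique step
print(samples_to_generate(100, 0.5))
```

This gives the minimum; generating somewhat more leaves a margin in case the real pass rate turns out lower than expected.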

In [78]:
# sample call for langchain_ollama

# Define the prompt template
sample_prompt = """
write a short story about this character. {name} has trait {trait} and lives in {place}.
keep to 40 words only.
"""

# Define the name, trait and place for the character
name = "cheeky_fella"
trait = "bravery"
place = "a small village in the mountains"

# Format the prompt with the name, trait and place
formatted_prompt = sample_prompt.format(name=name, trait=trait, place=place)
print(formatted_prompt)
print()

# Call the LLM with the formatted prompt
resp = generator_llm.invoke(
    input=formatted_prompt  # Pass the formatted prompt to the LLM
)

print(resp)


write a short story about this character. cheeky_fella has trait bravery and lives in a small village in the mountains.
keep to 40 words only.


content='In the mountain village of Brindlemark, Cheeky Fella stood tall, his bright smile a beacon of courage. When a fierce storm threatened to destroy the village, he rallied the townsfolk, leading them to safety with bravery and wit, earning their eternal gratitude.' response_metadata={'model': 'llama3.1', 'created_at': '2024-09-23T07:54:31.951553Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 2295223000, 'load_duration': 29363667, 'prompt_eval_count': 42, 'prompt_eval_duration': 185780000, 'eval_count': 57, 'eval_duration': 2078914000} id='run-b5f4155b-6237-4863-aa58-77d84e5f0960-0' usage_metadata={'input_tokens': 42, 'output_tokens': 57, 'total_tokens': 99}


In [82]:
resp

AIMessage(content='In the mountain village of Brindlemark, Cheeky Fella stood tall, his bright smile a beacon of courage. When a fierce storm threatened to destroy the village, he rallied the townsfolk, leading them to safety with bravery and wit, earning their eternal gratitude.', response_metadata={'model': 'llama3.1', 'created_at': '2024-09-23T07:54:31.951553Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 2295223000, 'load_duration': 29363667, 'prompt_eval_count': 42, 'prompt_eval_duration': 185780000, 'eval_count': 57, 'eval_duration': 2078914000}, id='run-b5f4155b-6237-4863-aa58-77d84e5f0960-0', usage_metadata={'input_tokens': 42, 'output_tokens': 57, 'total_tokens': 99})

In [84]:
resp.content

'In the mountain village of Brindlemark, Cheeky Fella stood tall, his bright smile a beacon of courage. When a fierce storm threatened to destroy the village, he rallied the townsfolk, leading them to safety with bravery and wit, earning their eternal gratitude.'

In [88]:
QA_generation_prompt = """
Your task is to write a factoid question and an answer given a context.
Your factoid question should be answerable with a specific, concise piece of factual information from the context.
Your factoid question should be formulated in the same style as questions users could ask in a search engine.
This means that your factoid question MUST NOT mention something like "according to the passage" or "context".
Keep your answer under 300 words.
Provide your answer as follows:

Output:::
Factoid question: (your factoid question)
Answer: (your answer to the factoid question)

Now here is the context.

Context: {context}\n
Output:::"""

In [96]:
# create function to call llm

def get_generated_qa(llm,prompt,context):
    """
    prompt must contain the input {context}
    """
    # add the context and format the prompt
    formatted_prompt = prompt.format(context=context)
    
    # Call the LLM with the formatted prompt
    resp = llm.invoke(
        input=formatted_prompt  # Pass the formatted prompt to the LLM
    )
    
    return resp

In [98]:
chunks[0]



In [113]:
import random

N_GENERATIONS = 10  

print(f"Generating {N_GENERATIONS} QA couples...")

outputs = []
for sampled_context in tqdm(random.sample(chunks, N_GENERATIONS)):

    # get QA couple; pass only the chunk text so the Document metadata does not leak into the prompt
    qa_couple = get_generated_qa(generator_llm, QA_generation_prompt, sampled_context.page_content)

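    # parse the question and answer, and skip the sample if the answer is too long
    # (note: the assert counts characters, while the prompt asks for under 300 words)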
    try:
        question = qa_couple.content.split("Factoid question: ")[-1].split("Answer: ")[0]
        answer = qa_couple.content.split("Answer: ")[-1]
        assert len(answer) < 300, "Answer is too long"
        outputs.append(
            {
                "context": sampled_context.page_content,
                "question": question,
                "answer": answer,
                "source_doc": sampled_context.metadata["source"],
            }
        )
    except Exception:
        # skip this sample and move on to the next chunk
        continue

Generating 10 QA couples...


100%|██████████| 10/10 [00:16<00:00,  1.67s/it]
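
The `split`-based parsing in the loop above silently returns the whole response when a marker is missing. A slightly stricter alternative, sketched here with the standard `re` module, returns `None` for malformed output. The helper name `parse_qa` is hypothetical, and the sample response is made up.

```python
import re

# Matches the "Factoid question: ... Answer: ..." layout requested in the prompt.
QA_PATTERN = re.compile(
    r"Factoid question:\s*(?P<question>.+?)\s*Answer:\s*(?P<answer>.+)",
    re.DOTALL,
)


def parse_qa(text):
    """Return (question, answer), or None if either marker is missing."""
    match = QA_PATTERN.search(text)
    if match is None:
        return None
    return match.group("question").strip(), match.group("answer").strip()


# Made-up response text in the requested layout
sample_response = "Output:::\nFactoid question: What is X?\nAnswer: X is Y.\n"
print(parse_qa(sample_response))
print(parse_qa("The model ignored the requested format."))
```

In the generation loop, a `None` result would simply be skipped, which also removes the trailing newline seen on some questions in the table.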


In [117]:
import pandas as pd

qna_df = pd.DataFrame(outputs)
qna_df

Unnamed: 0,context,question,answer,source_doc
0,allocation sequence and a given product alloca...,What is the technical name of a product alloca...,A_ProdAllocationSequence,product-allocation.pdf
1,9/23/2024\n2 This is custom documentation. For...,What is the technical name of the Product Allo...,API_PRODUCT_ALLOC_SEQUENCE_SRV,product-allocation.pdf
2,9/23/2024\n1 This is custom documentation. For...,What is the date when this documentation was g...,"September 23, 2024.",product-allocation.pdf
3,Accelerator Hub.\nService Structure\nEntities\...,"What is the necessity of the ""Product Allocati...",Mandatory.,product-allocation.pdf
4,a large volume of data to be maintained in the...,What is enabled by this service?\n,The service enables reading of header data for...,product-allocation.pdf
5,CRUD\nCreate Read Update Delete\nX\nProperties...,What operations are supported for a Product Al...,Read the description of a specific product all...,product-allocation.pdf
6,Read the description of a specific product allo...,What is the technical name of this entity?\n,A_ProdAllocSqncAssgmt,product-allocation.pdf
7,POST <host>/sap/opu/odata/SAP/API_PRODUCT_ALLO...,What is the prefix used to enclose a UUID valu...,guid,product-allocation.pdf
8,ValidityStartUTCDateTime Validity start time\n...,What are the supported operations for a produc...,Read product-location assignments of a specifi...,product-allocation.pdf


We can see that all the generated questions are simple "What ..." questions.
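
One quick way to check this kind of observation on a larger batch is to count the first word of each question. A minimal sketch; the questions listed here are illustrative stand-ins rather than the actual generated ones, and on the real data the input would be `qna_df["question"]`.

```python
from collections import Counter

# Illustrative stand-in questions, not the real generated data
questions = [
    "What is the technical name of the service?",
    "What operations are supported?\n",
    "what is enabled by this service?",
    "How is a UUID value enclosed?",
]


def leading_word(question):
    """Return the lower-cased first word of a question, or '' if it is empty."""
    words = question.strip().split()
    return words[0].lower() if words else ""


counts = Counter(leading_word(q) for q in questions)
print(counts.most_common())
```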

to do
* validate: check whether the Q&A generation is good
* see whether providing examples from the human-written dataset improves the LLM generation
    * passing them as conversation history is probably unnecessary; just put all the examples in one long prompt
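
The "all examples in one long prompt" idea could be sketched as follows. This is only one possible layout: the helper name `build_few_shot_prompt`, the shortened template and the example row are all hypothetical, and real examples would come from the human-written dataset.

```python
def build_few_shot_prompt(base_prompt, examples, marker="Now here is the context."):
    """Insert worked examples ahead of `marker` in a prompt template."""
    if marker not in base_prompt:
        raise ValueError("marker not found in base_prompt")
    blocks = []
    for ex in examples:
        block = (
            f"Context: {ex['context']}\n"
            "Output:::\n"
            f"Factoid question: {ex['question']}\n"
            f"Answer: {ex['answer']}\n"
        )
        # Escape braces so a later str.format(context=...) leaves the examples untouched.
        blocks.append(block.replace("{", "{{").replace("}", "}}"))
    examples_text = "Here are some examples.\n\n" + "\n".join(blocks) + "\n"
    head, tail = base_prompt.split(marker, 1)
    return head + examples_text + marker + tail


# Shortened stand-in for the QA generation template
base = "Write a factoid question.\n\nNow here is the context.\n\nContext: {context}\nOutput:::"
# Made-up example row; note the braces in the context
examples = [{"context": "Call GET /items/{id}.", "question": "Which path reads an item?", "answer": "/items/{id}"}]

few_shot_prompt = build_few_shot_prompt(base, examples)
print(few_shot_prompt.format(context="some chunk text"))
```

Because the result still contains the `{context}` placeholder, it can be passed to `get_generated_qa` in place of the original prompt.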

# setup prompt and llm for critic-llm