#RAG and TruLens Evaluation - advanced techniques with auto-merging retrieval

RAG (retrieval-augmented generation) is an antidote to the hallucinations of contemporary LLMs. However, naive RAG is still prone to making up facts and to underusing the knowledge base it has access to. In this notebook we present techniques for managing the available knowledge efficiently, together with advanced retrieval and reranking. We base our design on:
 - Llama_index's retrieval module, using the auto-merging retrieval technique for more efficient context retrieval,
 - Llama_index's default VectorStoreIndex, which uses an in-memory SimpleVectorStore that's initialized as part of the default storage context,
 - Llama2-13B as the LLM answer generator,
 - one of MTEB's leading sentence-embedding models (BAAI/bge-base-en-v1.5),
 - a SentenceTransformers-based reranker, "BAAI/bge-reranker-base".

**Auto-merging retrieval** arranges the chunks of tokens in a parent-child hierarchy (the number of levels is controlled by a parameter). During retrieval, if the majority of the chunks under a given parent are relevant to the query, the parent chunk is returned instead, so that a more extended context is fed to the LLM.
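The merging rule described above can be illustrated with a minimal, library-free sketch. The function `auto_merge`, the `ratio_threshold` parameter and the toy hierarchy are hypothetical names chosen for illustration; this is not llama_index's implementation, and it merges only one level, whereas a real retriever may repeat the step up the tree.

```python
from collections import defaultdict


def auto_merge(retrieved_leaf_ids, parent_of, children_of, ratio_threshold=0.5):
    """Replace retrieved leaves by their parent when enough siblings were retrieved."""
    # Group the retrieved leaves by their parent chunk.
    hits_by_parent = defaultdict(list)
    for leaf_id in retrieved_leaf_ids:
        hits_by_parent[parent_of[leaf_id]].append(leaf_id)

    merged = []
    for parent_id, hits in hits_by_parent.items():
        ratio = len(hits) / len(children_of[parent_id])
        if ratio > ratio_threshold:
            merged.append(parent_id)  # majority of children matched: return the parent
        else:
            merged.extend(hits)       # otherwise keep the individual leaves
    return merged


# Toy hierarchy (assumed for illustration): two parents with four leaf chunks each.
children_of = {"P1": ["a", "b", "c", "d"], "P2": ["e", "f", "g", "h"]}
parent_of = {leaf: parent for parent, leaves in children_of.items() for leaf in leaves}

result = auto_merge(["a", "b", "c", "e"], parent_of, children_of)
print(result)
```

Here three of the four children of `P1` were retrieved, so they are replaced by `P1`, while the single hit under `P2` stays a leaf.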

Furthermore, we evaluate different RAG setups with the TruLens RAG benchmark, which uses OpenAI as the reasoning engine to score generated outputs on three main components (the so-called RAG Triad):
- Answer Relevance
- Context Relevance
- Groundedness

For this exercise I will use the 290-page PDF book "Building Knowledge Graphs - A Practitioner's Guide" by J. Barrasa and J. Webber (compliments of Neo4j).


In [None]:
!pip install -qU \
  transformers \
  sentence-transformers \
  accelerate \
  langchain \
  bitsandbytes \
  llama_index \
  trulens_eval \
  pypdf

#1. Get the Knowledge Graph Practitioner's Guide


In [None]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=["/content/drive/MyDrive/ML/data/RAG/Building-Knowledge-Graphs-Practitioner's-Guide-OReilly-book.pdf"]
).load_data()

from llama_index import Document

document = Document(text="\n\n".join([doc.text for doc in documents]))

In [None]:
document.text[10000:20000]

'                   130\nMetadata Graph Example                                                                                             130\nQuerying the Metadata Graph Model                                                                         131\nUsing Relationships to Connect Data and Metadata                                                133\nSummary                                                                                                                         134\n9.Identity Knowledge Graphs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  135\nKnowing Y our Customer                                                                                              135\nWhen Does the Problem Appear?                                                                               136\nGraph-Based Entity Resolution Step by Step                                                             137\nData Preparation                     

#2. Set up the auto-merging retrieval and test

In [None]:
from llama_index.node_parser import HierarchicalNodeParser
from llama_index.node_parser import get_leaf_nodes

# create the hierarchical node parser w/ default settings
node_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]
)
nodes = node_parser.get_nodes_from_documents([document])

leaf_nodes = get_leaf_nodes(nodes)
print(leaf_nodes[300].text)

Some, but not all, of those relationships also have since  properties on them repre‐
senting the date when the person began living in the place.


In [None]:
nodes_by_id = {node.node_id: node for node in nodes}

parent_node = nodes_by_id[leaf_nodes[300].parent_node.node_id]
print(parent_node.text)

Example 4-18. places_header.csv  contains the signature for Place  nodes
:ID(Place), city, country
Example 4-19. places1.csv  contains data for Place  nodes
143,Berlin,Germany
Example 4-20. places2.csv  contains data for Place  nodes
244,London,UK
To connect people and places, you need to have something like Example 4-21 , which
contains relationships starting with Person  nodes and ending with Place  nodes.
Some, but not all, of those relationships also have since  properties on them repre‐
senting the date when the person began living in the place.
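To make the parent-child layout concrete, here is a toy sketch of how nested chunk sizes produce a tree. It assumes fixed-size splitting over a flat token list; `build_hierarchy` and its node dictionaries are hypothetical, and the real `HierarchicalNodeParser` respects text boundaries and overlap, so its chunks will not be this regular.

```python
def build_hierarchy(tokens, chunk_sizes):
    """Split tokens into nested fixed-size chunks and return a flat list of nodes."""
    nodes = []

    def split(start, end, level, parent_id):
        size = chunk_sizes[level]
        for lo in range(start, end, size):
            hi = min(lo + size, end)
            node_id = len(nodes)
            nodes.append({"id": node_id, "parent": parent_id, "level": level, "span": (lo, hi)})
            # Recurse into the next, smaller chunk size if there is one.
            if level + 1 < len(chunk_sizes):
                split(lo, hi, level + 1, node_id)

    split(0, len(tokens), 0, None)
    return nodes


tokens = ["tok"] * 4096  # stand-in for a tokenized document (assumption)
nodes = build_hierarchy(tokens, [2048, 512, 128])
leaves = [n for n in nodes if n["level"] == 2]
print(len(nodes), len(leaves))
```

With chunk sizes of 2048, 512 and 128, each parent in this idealized layout has four children, which is why a leaf such as the one printed above sits inside a noticeably longer parent passage.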


#3. Get Llama2-13B and Embeddings

##3.1. Llama2-13B and embeddings: set up the service context

We load Llama2-13B and fit it on a single GPU with 15 GB of memory using bitsandbytes 4-bit quantization.
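A back-of-the-envelope calculation shows why quantization is needed here. The sketch below counts the weights only, assuming a nominal 13 billion parameters; it ignores quantization constants, activations and the KV cache, so actual usage will be higher. The helper `weight_memory_gb` is a hypothetical name.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 13e9  # nominal parameter count implied by the "13B" model name
fp16_gb = weight_memory_gb(n_params, 16)
nf4_gb = weight_memory_gb(n_params, 4)
print(f"16-bit: {fp16_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")
```

At 16 bits per weight the model alone would need roughly 26 GB, whereas at 4 bits it needs about 6.5 GB, which leaves headroom on a 15 GB device.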

In [None]:
%%time
import os  # needed for os.environ below

import torch
import transformers
from torch import bfloat16


model_name = "meta-llama/Llama-2-13b-chat-hf"
# set quantization configuration using bitsandbytes lib
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)


hf_auth = os.environ.get('HF_API_KEY')
model_config = transformers.AutoConfig.from_pretrained(
    model_name,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()


Let's select a decent embedding model based on [HF's leaderboard](https://huggingface.co/spaces/mteb/leaderboard), and wrap the loaded Llama2 model for use with llama-index.

In [None]:
%%time
import torch
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate

# This will wrap the default prompts that are internal to llama-index
# taken from https://huggingface.co/Writer/camel-5b-hf
query_wrapper_prompt = PromptTemplate(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)

model_name = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_name,
    use_auth_token=hf_auth
)

llm = HuggingFaceLLM(
    model=model,
    tokenizer=tokenizer,
    context_window=2048,
    max_new_tokens=512,
    generate_kwargs={
        "temperature": 0.01,
        "repetition_penalty": 1.1 },
    query_wrapper_prompt=query_wrapper_prompt
)

from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name='BAAI/bge-base-en-v1.5')



CPU times: user 1.1 s, sys: 1.1 s, total: 2.2 s
Wall time: 3.86 s


##3.2. Build the index

In [None]:
from llama_index import VectorStoreIndex
from llama_index import ServiceContext, StorageContext

auto_merging_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    node_parser=node_parser
)

storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

automerging_index = VectorStoreIndex(
    leaf_nodes, storage_context=storage_context, service_context=auto_merging_context
)

automerging_index.storage_context.persist(persist_dir="/content/drive/MyDrive/ML/data/RAG//merging_index")

In [None]:
# Optional: reuse a persisted index if one exists;
# otherwise build the index and persist it.

import os
from llama_index import VectorStoreIndex, StorageContext, load_index_from_storage

index_dir = "/content/drive/MyDrive/ML/data/RAG/merging_index"

if not os.path.exists(index_dir):
    automerging_index = VectorStoreIndex(
        leaf_nodes,
        storage_context=storage_context,
        service_context=auto_merging_context
    )
    automerging_index.storage_context.persist(persist_dir=index_dir)
else:
    automerging_index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir=index_dir),
        service_context=auto_merging_context
    )

##3.3. Define the retriever and run the query engine


In [None]:
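from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.retrievers import AutoMergingRetriever
from llama_index.query_engine import RetrieverQueryEngine

# base retriever: plain vector search over the leaf nodes
base_retriever = automerging_index.as_retriever(
    similarity_top_k=12
)

# wrap it so that retrieved leaves are merged into their parents where possible
retriever = AutoMergingRetriever(
    base_retriever,
    automerging_index.storage_context,
    verbose=True
)

rerank = SentenceTransformerRerank(top_n=6, model="BAAI/bge-reranker-base")

# pass the auto-merging retriever (not the base one) and the service context,
# so that the local Llama2 model generates the answers
auto_merging_engine = RetrieverQueryEngine.from_args(
    retriever,
    node_postprocessors=[rerank],
    service_context=auto_merging_context
)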

In [None]:
auto_merging_response = auto_merging_engine.query(
    "How did Meredith Corporation consolidate their client profiles and by what number did they reduce duplication among them?"
)
from llama_index.response.notebook_utils import display_response

display_response(auto_merging_response)

**`Final Response:`** Meredith Corporation consolidated their client profiles by using the WCC algorithm to identify unique sub-graphs within the larger graph. This allowed them to connect and analyze data from different sources. By implementing this algorithm, they were able to reduce duplication among client profiles. The exact number by which they reduced duplication is not mentioned in the given context.

#4. Testing the Auto-merging retriever setup


In [None]:
window_response = auto_merging_engine.query(
    "How can knowledge graphs be used in entity resolution problems where there are no strong identifiers?"
)
display_response(window_response)

**`Final Response:`** Knowledge graphs can be used in entity resolution problems where there are no strong identifiers by leveraging the power of graph algorithms and techniques. By using knowledge graphs, a set of weak identifiers can be aggregated into a strong identifier. This means that even if there are no strong identifiers available, the relationships and connections within the knowledge graph can be used to determine if two records represent the same real-world thing. The graph algorithms can analyze the relationships, similarities, and patterns within the knowledge graph to make informed decisions about entity resolution. This approach allows for the integration of data from different systems and helps in reasoning whether a record in one system has a counterpart in another system.

In [None]:
window_response = auto_merging_engine.query(
    "Explain how identity graph improved consumer insight for Meredith Corporation?"
)
display_response(window_response)

**`Final Response:`** The identity graph improved consumer insight for Meredith Corporation by allowing them to better understand their customers. By analyzing data over time and connecting it, rather than just looking at individual cookies, Meredith was able to increase their understanding of a customer by 20 to 30%. This deeper understanding translated into significant revenue gains and better-served consumers. The average length of touch points also increased from 14 days with a cookie to 241 days with user profiles, and average visits increased from 4 per cookie to 23.8 per profile. Overall, the identity graph provided Meredith with a 360-degree view of their users, even for anonymous users, which helped them personalize content and improve the user experience.

In [None]:
window_response = auto_merging_engine.query(
    "What is a fraud-ring pattern and what are ways to detect it?"
)
display_response(window_response)


**`Final Response:`** A fraud-ring pattern refers to a network of individuals who collaborate to commit fraudulent activities, such as identity theft or financial fraud. These individuals often create synthetic identities by using shared phone numbers and addresses to make their fraudulent activities harder to detect.

One way to detect a fraud-ring pattern is by using a knowledge graph. By analyzing the relationships between individuals, phone numbers, and addresses, patterns can be identified. For example, if multiple individuals share the same phone number and address, it could indicate a potential fraud ring. Additionally, analyzing the path length between individuals and identifying repeated patterns can help in detecting fraud rings.

It is important to note that not all linked identities may be fraudsters, as legitimate individuals can also share addresses and phone numbers. Therefore, it requires a combination of human expertise and machine learning to accurately identify and detect fraud-ring patterns.

Let's create functions to encapsulate all of the above steps:

In [None]:
import os
from torch import cuda, bfloat16
import transformers
import torch
from llama_index.node_parser import HierarchicalNodeParser
from llama_index.node_parser import get_leaf_nodes
from llama_index import ServiceContext, VectorStoreIndex, StorageContext
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index import load_index_from_storage
from llama_index.prompts import PromptTemplate
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import HuggingFaceLLM

def init_rag_llms(model_name = "meta-llama/Llama-2-13b-chat-hf", embed_model_name='BAAI/bge-base-en-v1.5'):

    # set quantization configuration using bitsandbytes lib
    bnb_config = transformers.BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=bfloat16
    )


    hf_auth = os.environ.get('HF_API_KEY')
    model_config = transformers.AutoConfig.from_pretrained(
        model_name,
        use_auth_token=hf_auth
    )

    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        config=model_config,
        quantization_config=bnb_config,
        device_map='auto',
        use_auth_token=hf_auth
    )
    model.eval()

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_name,
        use_auth_token=hf_auth
    )

    # This will wrap the default prompts that are internal to llama-index
    # taken from https://huggingface.co/Writer/camel-5b-hf
    query_wrapper_prompt = PromptTemplate(
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{query_str}\n\n### Response:"
    )


    llm = HuggingFaceLLM(
        model=model,
        tokenizer=tokenizer,
        context_window=2048,
        max_new_tokens=512,
        generate_kwargs={
            "temperature": 0.01,
            "repetition_penalty": 1.1 },
        query_wrapper_prompt=query_wrapper_prompt
    )

    # loads the embedding model (BAAI/bge-base-en-v1.5 by default)
    embed_model = HuggingFaceEmbedding(model_name=embed_model_name)
    return llm, embed_model


import os

from llama_index import (
    ServiceContext,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.node_parser import HierarchicalNodeParser
from llama_index.node_parser import get_leaf_nodes
from llama_index import StorageContext, load_index_from_storage
from llama_index.retrievers import AutoMergingRetriever
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.query_engine import RetrieverQueryEngine


def build_automerging_index(
    documents,
    llm,
    embed_model,
    save_dir="merging_index",
    chunk_sizes=None,
):
    chunk_sizes = chunk_sizes or [2048, 512, 128]
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
    nodes = node_parser.get_nodes_from_documents(documents)
    leaf_nodes = get_leaf_nodes(nodes)
    merging_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
    )
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    if not os.path.exists(save_dir):
        automerging_index = VectorStoreIndex(
            leaf_nodes, storage_context=storage_context, service_context=merging_context
        )
        automerging_index.storage_context.persist(persist_dir=save_dir)
    else:
        automerging_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=merging_context,
        )
    return automerging_index


def get_automerging_query_engine(
    automerging_index,
    similarity_top_k=12,
    rerank_top_n=4,
):
    base_retriever = automerging_index.as_retriever(similarity_top_k=similarity_top_k)
    retriever = AutoMergingRetriever(
        base_retriever, automerging_index.storage_context, verbose=True
    )
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )
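    auto_merging_engine = RetrieverQueryEngine.from_args(
        retriever,
        node_postprocessors=[rerank],
        # reuse the index's service context so its LLM generates the answers
        service_context=automerging_index.service_context,
    )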
    return auto_merging_engine

#5. TruLens Evaluation

We'll run TruLens evaluation based on the RAG triad evaluation criteria:
- Context Relevance: how relevant is retrieved context to the query?
- Answer Relevance: how relevant is generated answer to the query?
- Groundedness: to what extent is the output grounded in retrieved context?

We'll be testing two setups for auto-merging and comparing the TruLens recorder's scores.



In [None]:
eval_questions = ["Explain how identity graph improved consumer insight for Meredith Corporation",
"What is the claim meredith corporation is making wrt the identy graph methods they employed?",
"Explain how can knowledge graphs be employed to solve the entity matching problem",
"In entity resolution problems what are strong and weak identifier? Explain with examples from the text.",
"How can knowledge graphs be used in entity resolution problems where there are no strong identifiers",
"List steps to designing a record deduplication solution using identity graphs",
"What are the challenges of working with unstructured data in entity resolution problem and how can they be addressed with graph-based solution",
"What are the common use cases where deduplication of data appear?",
"How can you employ graph-based solution for fraud detection?",
"What is a fraud-ring pattern and what are ways to detect it?",
"What are some pitfalls when detecting fraud ring and how to avoid them?",
"Explain how can knowledge graphs help to match skillsets of employers in an organization with particular project's needs"]


In [None]:
llm, embed_model = init_rag_llms(embed_model_name='BAAI/bge-base-en-v1.5')

We'll use tenacity to retry failed calls with exponential backoff, in order to avoid rate-limit errors from the OpenAI API.
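As a rough illustration of what a clamped exponential backoff looks like, the sketch below computes a wait schedule with the same multiplier, minimum and maximum used in the retry decorator in this notebook. `backoff_schedule` is a hypothetical helper, and the formula is an assumption about the general technique; tenacity's exact computation may differ between versions.

```python
def backoff_schedule(attempts, multiplier=1.0, min_wait=4.0, max_wait=10.0, base=2.0):
    """Return the wait in seconds before each retry, clamped to [min_wait, max_wait]."""
    waits = []
    for attempt in range(1, attempts + 1):
        raw = multiplier * base ** (attempt - 1)  # 1, 2, 4, 8, ...
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


schedule = backoff_schedule(6)
print(schedule)
```

Under these assumptions the early retries wait the minimum of 4 seconds, and later ones are capped at 10 seconds, so ten attempts stay well within a couple of minutes in total.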

In [None]:
!pip install tenacity

In [None]:
import numpy as np  # used by the Context Relevance aggregation below
from tenacity import retry, stop_after_attempt, wait_exponential

from llama_index.response.notebook_utils import display_response
from trulens_eval import (
    Feedback,
    Tru,
    TruLlama,
    OpenAI
)
from trulens_eval.feedback import Groundedness

@retry(stop=stop_after_attempt(10), wait=wait_exponential(multiplier=1, min=4, max=10))
def call_tru_query_engine(query_engine, prompt):
    return query_engine.query(prompt)


def run_evals(eval_questions, tru_recorder, query_engine):
    for question in eval_questions:
        with tru_recorder as recording:
            response = call_tru_query_engine(query_engine, question)  #query_engine.query(question)


def get_prebuilt_trulens_recorder(query_engine, app_id):
    openai = OpenAI()

    qa_relevance = (
        Feedback(openai.relevance_with_cot_reasons, name="Answer Relevance")
        .on_input_output()
    )

    qs_relevance = (
        Feedback(openai.relevance_with_cot_reasons, name = "Context Relevance")
        .on_input()
        .on(TruLlama.select_source_nodes().node.text)
        .aggregate(np.mean)
    )

    grounded = Groundedness(groundedness_provider=openai)

    groundedness = (
        Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
            .on(TruLlama.select_source_nodes().node.text)
            .on_output()
            .aggregate(grounded.grounded_statements_aggregator)
    )

    feedbacks = [qa_relevance, qs_relevance, groundedness]
    tru_recorder = TruLlama(
        query_engine,
        app_id=app_id,
        feedbacks=feedbacks
    )
    return tru_recorder

In [None]:
from trulens_eval import Tru
import numpy as np

Tru().reset_database()

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


##**5.1. Using a two-layer tree**

In [None]:
auto_merging_index_2 = build_automerging_index(
    [document],
    llm=llm,
    embed_model=embed_model,
    chunk_sizes=[2048,512],
    save_dir="/content/drive/MyDrive/ML/data/RAG/merging_index_2",
)
auto_merging_engine_2 = get_automerging_query_engine(
    auto_merging_index_2,
    similarity_top_k=12,
    rerank_top_n=6,
)

In [None]:
tru_recorder_2 = get_prebuilt_trulens_recorder(
    auto_merging_engine_2,
    app_id ='app_2'
)

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .


In [None]:
run_evals(eval_questions, tru_recorder_2, auto_merging_engine_2)

In [None]:
# Tru().run_dashboard()

In [None]:
Tru().get_records_and_feedback(app_ids=[])[0]

| | app_id | input | output | Answer Relevance | Context Relevance | Groundedness | latency | total_tokens | total_cost |
|---|---|---|---|---|---|---|---|---|---|
| 0 | app_2 | Explain how identity graph improved consumer ... | The identity graph improved consumer insight ... | 1.0 | 0.0 | 0.82 | 10 | 5093 | 0.007757 |
| 1 | app_2 | What is the claim meredith corporation is mak... | Meredith Corporation asserts that they employ... | 1.0 | 0.0 | 1.0 | 7 | 5021 | 0.007609 |
| 2 | app_2 | Explain how can knowledge graphs be employed ... | Knowledge graphs can be employed to solve the... | 1.0 | 0.033333 | 1.0 | 19 | 5746 | 0.008876 |
| 3 | app_2 | In entity resolution problems what are strong... | In entity resolution problems, strong identif... | 1.0 | 0.6 | 1.0 | 8 | 2503 | 0.003852 |
| 4 | app_2 | How can knowledge graphs be used in entity re... | Knowledge graphs can be used in entity resolu... | 1.0 | 0.233333 | 1.0 | 15 | 6845 | 0.010423 |
| 5 | app_2 | List steps to designing a record deduplicatio... | The steps to designing a record deduplication... | 1.0 | 0.166667 | 0.833333 | 9 | 4039 | 0.006167 |
| 6 | app_2 | What are the challenges of working with unstr... | The challenges of working with unstructured d... | 1.0 | 0.25 | 0.8 | 6 | 3871 | 0.005866 |
| 7 | app_2 | What are the common use cases where deduplica... | Data deduplication commonly occurs in scenari... | 1.0 | 0.15 | 2.3 | 5 | 3433 | 0.005201 |
| 8 | app_2 | How can you employ graph-based solution for f... | A graph-based solution for fraud detection ca... | 1.0 | 0.75 | 1.0 | 10 | 4797 | 0.007304 |
| 9 | app_2 | What is a fraud-ring pattern and what are way... | A fraud-ring pattern refers to a network of i... | 1.0 | 0.283333 | NaN | 16 | 4939 | 0.007621 |

(Columns app_json, type, record_id, tags, record_json, cost_json, perf_json, ts and the three *_calls columns hold identifiers, timestamps or truncated JSON and are not shown.)


In [None]:
Tru().get_records_and_feedback(app_ids=[])[0].to_csv("/content/drive/MyDrive/ML/data/RAG/automargin_2.csv")

In [None]:
import pandas as pd
automargin_2 = pd.read_csv("/content/drive/MyDrive/ML/data/RAG/automargin_2.csv")


In [None]:
# get average Answer Relevance, Context Relevance and Groundedness
avg_ar = automargin_2['Answer Relevance'].mean()
avg_cr = automargin_2['Context Relevance'].mean()
avg_grd = automargin_2['Groundedness'].mean()
(avg_ar, avg_cr, avg_grd)

(1.0, 0.24444444444444438, 1.0535185185185185)

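Average Answer Relevance is at its maximum (1.0), while average Context Relevance is low (about 0.24). Average Groundedness looks high, but a mean above 1.0 lies outside the expected 0-1 range: at least one record (row 7, with a score of 2.3) is out of range and one score is missing, so this figure should be read with caution.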
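Feedback scores are expected to fall in the range 0 to 1, so it can help to check for out-of-range or missing values before averaging. The sketch below does this for the Groundedness values of the ten records displayed above. The helper `summarize` is hypothetical, and clipping is just one possible way to handle outliers, not something TruLens does for you. Because only the displayed records are used, the means will not match the figure computed from the full CSV.

```python
import math

# Groundedness values copied from the displayed records (None marks the missing score).
groundedness = [0.82, 1.0, 1.0, 1.0, 1.0, 0.833333, 0.8, 2.3, 1.0, None]


def summarize(scores, lo=0.0, hi=1.0):
    """Report missing and out-of-range scores alongside raw and clipped means."""
    valid = [s for s in scores if s is not None and not math.isnan(s)]
    out_of_range = [s for s in valid if not lo <= s <= hi]
    clipped = [min(max(s, lo), hi) for s in valid]
    return {
        "n_valid": len(valid),
        "out_of_range": out_of_range,
        "raw_mean": sum(valid) / len(valid),
        "clipped_mean": sum(clipped) / len(clipped),
    }


summary = summarize(groundedness)
print(summary)
```

Comparing the raw and clipped means shows how much a single out-of-range record can shift the average.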


##**5.2. Using a three-layer tree**

In [None]:
auto_merging_index_3 = build_automerging_index(
    [document],
    llm=llm,
    embed_model=embed_model,
    save_dir="merging_index_3",
    chunk_sizes=[2048,512,128],
)

auto_merging_engine_3 = get_automerging_query_engine(
    auto_merging_index_3,
    similarity_top_k=12,
    rerank_top_n=6,
)


In [None]:
tru_recorder = get_prebuilt_trulens_recorder(
    auto_merging_engine_3,
    app_id ='app_3'
)

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input response will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .


Test on one question:


In [None]:
window_response = auto_merging_engine_3.query(
    # "What is a fraud-ring pattern and what are ways to detect it?"
    "Explain how identity graph improved consumer insight for Meredith Corporation"
)
display_response(window_response)

> Merging 5 nodes into parent node.
> Parent node id: dc3b6791-03bb-4f20-90f7-13c838507a3e.
> Parent node text: From here on, the algorithm will continue to build the graph where the connected
components yield...

> Merging 1 nodes into parent node.
> Parent node id: 49db924e-5453-4b28-9012-d98998e0bcc8.
> Parent node text: It created a virtuous cycle.
Meredith commented, “We basically have increased our understanding o...

> Merging 2 nodes into parent node.
> Parent node id: 63370965-cc4e-402e-9c87-6911d4c53751.
> Parent node text: From here on, the algorithm will continue to build the graph where the connected
components yield...



**`Final Response:`** The identity graph implemented by Meredith Corporation improved consumer insight by consolidating and analyzing user data from various sources. By using the WCC algorithm to identify unique sub-graphs within the larger graph, Meredith was able to create more accurate and comprehensive user profiles. This allowed them to understand their customers better and personalize content based on their interests and preferences. As a result, the average length of touch points increased significantly, from 14 days with a cookie to 241 days with user profiles. The number of average visits also increased, indicating that users were returning more frequently. By gaining a high-definition view of user interests and preferences, Meredith was able to develop stronger models and deliver more relevant content, leading to increased user engagement and revenue gains.
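The `> Merging N nodes into parent node.` lines in the log above come from the auto-merging step. The sketch below shows the general idea in plain Python, not the library's actual implementation: the function `auto_merge`, the node ids, and the `threshold=0.5` default are all assumptions made for illustration.

```python
def auto_merge(retrieved, children_of, threshold=0.5):
    """Replace retrieved children by their parent when enough siblings were hit.

    retrieved:   set of retrieved node ids
    children_of: dict mapping parent id -> list of child ids,
                 listed bottom-up so merges can cascade to higher levels
    threshold:   assumed merge ratio (hypothetical default)
    """
    result = set(retrieved)
    for parent, children in children_of.items():
        hit = [c for c in children if c in result]
        # Merge only when the share of retrieved children exceeds the threshold.
        if children and len(hit) / len(children) > threshold:
            result -= set(hit)
            result.add(parent)
    return result


# Hypothetical three-level hierarchy: root -> mid_* -> leaf_*
children_of = {
    "mid_A": ["leaf_1", "leaf_2", "leaf_3", "leaf_4"],
    "mid_B": ["leaf_5", "leaf_6", "leaf_7", "leaf_8"],
    "root": ["mid_A", "mid_B"],
}

retrieved = {"leaf_1", "leaf_2", "leaf_3", "leaf_5"}
merged = auto_merge(retrieved, children_of)
print(sorted(merged))  # ['leaf_5', 'mid_A']
```

Here three of the four leaves under `mid_A` were retrieved, so they are swapped for their parent, while the single hit under `mid_B` stays a leaf. With three levels, a merge can cascade further up when enough intermediate nodes are themselves merged.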

In [None]:
response = auto_merging_engine_3.query(
    # "What is a fraud-ring pattern and what are ways to detect it?"
    "How did Meredith Corporation consolidate their client profiles and by what number did they reduce duplication among them?"
)
display_response(response)

> Merging 5 nodes into parent node.
> Parent node id: dc3b6791-03bb-4f20-90f7-13c838507a3e.
> Parent node text: From here on, the algorithm will continue to build the graph where the connected
components yield...

> Merging 1 nodes into parent node.
> Parent node id: 49db924e-5453-4b28-9012-d98998e0bcc8.
> Parent node text: It created a virtuous cycle.
Meredith commented, “We basically have increased our understanding o...

> Merging 2 nodes into parent node.
> Parent node id: 63370965-cc4e-402e-9c87-6911d4c53751.
> Parent node text: From here on, the algorithm will continue to build the graph where the connected
components yield...



**`Final Response:`** Meredith Corporation consolidated their client profiles by identifying unique sub-graphs within the larger graph using the WCC algorithm. They incorporated more than 20 months of user data from both first- and third-party sources, resulting in nearly 350 million profiles being consolidated into 163 million richer and more accurate profiles. This consolidation reduced duplication among the client profiles. The exact number by which duplication was reduced is not mentioned in the given context.

In [None]:
run_evals(eval_questions, tru_recorder, auto_merging_engine_3)

> Merging 5 nodes into parent node.
> Parent node id: dc3b6791-03bb-4f20-90f7-13c838507a3e.
> Parent node text: From here on, the algorithm will continue to build the graph where the connected
components yield...

> Merging 1 nodes into parent node.
> Parent node id: 49db924e-5453-4b28-9012-d98998e0bcc8.
> Parent node text: It created a virtuous cycle.
Meredith commented, “We basically have increased our understanding o...

> Merging 2 nodes into parent node.
> Parent node id: 63370965-cc4e-402e-9c87-6911d4c53751.
> Parent node text: From here on, the algorithm will continue to build the graph where the connected
components yield...

> Merging 5 nodes into parent node.
> Parent node id: dc3b6791-03bb-4f20-90f7-13c838507a3e.
> Parent node text: From here on, the algorithm will continue to build the graph where the connected
components yield...

> Merging 1 nodes into parent node.
> Parent node id: 49db924e-5453-4b28-9012-d98998e0bcc8.
> Parent node text: It created a virtuous cycle.
Mer

In [None]:
Tru().get_records_and_feedback(app_ids=[])[0]

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,Answer Relevance,Context Relevance,Groundedness,Answer Relevance_calls,Context Relevance_calls,Groundedness_calls,latency,total_tokens,total_cost
0,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_2e4e3c007cda3b8695094a5691666e7a,"""Explain how identity graph improved consumer ...","""The identity graph implemented by Meredith Co...",-,"{""record_id"": ""record_hash_2e4e3c007cda3b86950...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:31:24.370345"", ""...",2023-12-22T09:31:31.295906,0.9,0.166667,0.966667,[{'args': {'prompt': 'Explain how identity gra...,[{'args': {'prompt': 'Explain how identity gra...,"[{'args': {'source': 'From here on, the algori...",6,1375,0.002136
1,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_b913781213daa2c263bb9de256a8171f,"""What is the claim meredith corporation is mak...","""Meredith Corporation claims that by using gra...",-,"{""record_id"": ""record_hash_b913781213daa2c263b...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:31:32.093662"", ""...",2023-12-22T09:31:36.981355,1.0,0.3,1.0,[{'args': {'prompt': 'What is the claim meredi...,[{'args': {'prompt': 'What is the claim meredi...,"[{'args': {'source': 'From here on, the algori...",4,1292,0.001976
2,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_5aeedc10e93f2098c510832831fd5ffa,"""Explain how can knowledge graphs be employed ...","""Knowledge graphs can be employed to solve the...",-,"{""record_id"": ""record_hash_5aeedc10e93f2098c51...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:31:37.661979"", ""...",2023-12-22T09:31:43.305511,1.0,0.35,1.0,[{'args': {'prompt': 'Explain how can knowledg...,[{'args': {'prompt': 'Explain how can knowledg...,[{'args': {'source': 'CHAPTER 13 Talking to Yo...,5,890,0.001383
3,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_748a1743161737aaf53c1082ecac5e19,"""In entity resolution problems what are strong...","""In entity resolution problems, strong identif...",-,"{""record_id"": ""record_hash_748a1743161737aaf53...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:31:44.066420"", ""...",2023-12-22T09:31:51.838702,1.0,0.316667,0.811111,[{'args': {'prompt': 'In entity resolution pro...,[{'args': {'prompt': 'In entity resolution pro...,"[{'args': {'source': 'The higher the score, th...",7,963,0.001533
4,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_fd1f5f6b50b4f08771576cff4ca00d46,"""How can knowledge graphs be used in entity re...","""Knowledge graphs can be used in entity resolu...",-,"{""record_id"": ""record_hash_fd1f5f6b50b4f087715...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:31:52.655862"", ""...",2023-12-22T09:31:58.249769,1.0,0.7,1.0,[{'args': {'prompt': 'How can knowledge graphs...,[{'args': {'prompt': 'How can knowledge graphs...,[{'args': {'source': 'This problem commonly ar...,5,895,0.001392
5,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_3ed68d43d35c89e8879af89113424372,"""List steps to designing a record deduplicatio...","""The steps to designing a record deduplication...",-,"{""record_id"": ""record_hash_3ed68d43d35c89e8879...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:31:59.021711"", ""...",2023-12-22T09:32:08.306549,1.0,0.3,1.0,[{'args': {'prompt': 'List steps to designing ...,[{'args': {'prompt': 'List steps to designing ...,[{'args': {'source': 'Graph-Based Entity Resol...,9,1061,0.00171
6,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_6f75e2348ff4c21141ea9dffa919054f,"""What are the challenges of working with unstr...","""The challenges of working with unstructured d...",-,"{""record_id"": ""record_hash_6f75e2348ff4c21141e...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:32:08.987014"", ""...",2023-12-22T09:32:18.891575,1.0,0.416667,1.0,[{'args': {'prompt': 'What are the challenges ...,[{'args': {'prompt': 'What are the challenges ...,[{'args': {'source': 'This problem commonly ar...,9,927,0.00146
7,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_9e426a4679be5c2e0fbd11dfa6049b53,"""What are the common use cases where deduplica...","""Some of the common use cases where deduplicat...",-,"{""record_id"": ""record_hash_9e426a4679be5c2e0fb...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:32:19.605232"", ""...",2023-12-22T09:32:22.992541,1.0,0.433333,1.0,[{'args': {'prompt': 'What are the common use ...,[{'args': {'prompt': 'What are the common use ...,[{'args': {'source': 'The process is rarely ...,3,916,0.00139
8,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_41e2911e5e11ee5075abdcd3bc56700d,"""How can you employ graph-based solution for f...","""Graph-based solutions for fraud detection can...",-,"{""record_id"": ""record_hash_41e2911e5e11ee5075a...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:32:23.819023"", ""...",2023-12-22T09:32:29.641650,0.9,0.583333,1.0,[{'args': {'prompt': 'How can you employ graph...,[{'args': {'prompt': 'How can you employ graph...,[{'args': {'source': 'Using both subgraph-loca...,5,857,0.001339
9,app_3,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.query_engine....,record_hash_c091eec9d98a6d5ef902ac7bd4d0d997,"""What is a fraud-ring pattern and what are way...","""A fraud-ring pattern refers to a network of i...",-,"{""record_id"": ""record_hash_c091eec9d98a6d5ef90...","{""n_requests"": 1, ""n_successful_requests"": 1, ...","{""start_time"": ""2023-12-22T09:32:30.413829"", ""...",2023-12-22T09:32:38.065697,0.9,0.3,1.0,[{'args': {'prompt': 'What is a fraud-ring pat...,[{'args': {'prompt': 'What is a fraud-ring pat...,[{'args': {'source': 'compelling fake identiti...,7,1270,0.001994


In [None]:
Tru().get_records_and_feedback(app_ids=[])[0].to_csv("/content/drive/MyDrive/ML/data/RAG/automerging_3.csv")

In [None]:
# get average Answer Relevance, Context Relevance and Groundedness
import pandas as pd
automerging_3_pd = pd.read_csv("/content/drive/MyDrive/ML/data/RAG/automerging_3.csv")

avg_ar = automerging_3_pd['Answer Relevance'].mean()
avg_cr = automerging_3_pd['Context Relevance'].mean()
avg_grd = automerging_3_pd['Groundedness'].mean()
(avg_ar, avg_cr, avg_grd)

(0.9833333333333334, 0.39444444444444443, 0.9074570105820104)

For the setup with a 3-layer node hierarchy, RAG performs better, as reflected in the TruLens RAG Triad metrics: mean Answer Relevance, Context Relevance and Groundedness.

**Summary**
- The setup with 3 levels of nodes gives better granularity when selecting context.
- Error analysis: inspecting the weakest metric (Context Relevance) showed that a low score often did not mean the RAG performed poorly. Rather, several pieces of context were retrieved and only one of them was relevant to the question (rerank_top_n=6 means that 6 pieces of context were assessed). The mismatched chunks scored 0, which dragged down the average Context Relevance, but the LLM still relied only on the relevant piece and produced a correct answer that was well grounded in the context provided.
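The averaging effect described in the error analysis can be sketched with a few lines of arithmetic. The per-chunk scores below are hypothetical (the actual per-chunk values are not shown in this notebook); they only illustrate how one highly relevant chunk among six yields a low mean.

```python
from statistics import mean

# Hypothetical per-chunk relevance scores for one question (rerank_top_n=6):
# one chunk is fully relevant, the other five are not.
chunk_scores = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

context_relevance = mean(chunk_scores)  # what an averaged metric reports
best_chunk = max(chunk_scores)          # what the LLM can actually draw on

print(round(context_relevance, 6))  # 0.166667
print(best_chunk)                   # 1.0
```

A mean of about 0.17 looks poor, yet the answer can still be correct and well grounded because the single relevant chunk is present. When Context Relevance is low but Answer Relevance and Groundedness are high, looking at the best chunk score alongside the mean may give a fairer picture.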