## Advanced RAG Methods

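Small-to-Big Retrieval: match the query against small, precise chunks, then pass the larger surrounding context to the LLM. Three common variants: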
1. Parent Document Retrieval
2. Auto-Merging Retrieval
3. Sentence-Window Retrieval
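The small-to-big idea can be sketched in plain Python. This is a toy illustration, not LlamaIndex code: word overlap stands in for embedding similarity, and `Chunk`, `overlap_score`, and `retrieve_small_to_big` are hypothetical names. It follows the parent-document variant, where each small chunk points to the larger chunk it was cut from.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: str  # id of the larger parent chunk this small chunk was cut from


def words(text: str) -> Set[str]:
    """Lower-cased word set; keeps hyphenated tokens such as 'covid-19'."""
    return set(re.findall(r"[\w-]+", text.lower()))


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score standing in for embedding similarity."""
    q = words(query)
    return len(q & words(text)) / len(q) if q else 0.0


def retrieve_small_to_big(query: str, small_chunks: List[Chunk],
                          parents: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank the small chunks, then return the distinct parents of the top_k hits."""
    ranked = sorted(small_chunks, key=lambda c: overlap_score(query, c.text), reverse=True)
    parent_ids: List[str] = []
    for chunk in ranked[:top_k]:
        if chunk.parent_id not in parent_ids:  # two hits in one parent -> one result
            parent_ids.append(chunk.parent_id)
    return [parents[pid] for pid in parent_ids]


parents = {
    "p1": "COVID-19 need no longer control our lives. We will stay on guard.",
    "p2": "The pandemic has been punishing. Families are living paycheck to paycheck.",
}
small_chunks = [
    Chunk("c1", "COVID-19 need no longer control our lives.", "p1"),
    Chunk("c2", "We will stay on guard.", "p1"),
    Chunk("c3", "The pandemic has been punishing.", "p2"),
]
print(retrieve_small_to_big("control covid-19 lives", small_chunks, parents, top_k=2))
```

Matching on small chunks keeps the similarity signal focused, while returning the parents gives the LLM enough surrounding text; two hits inside the same parent collapse into a single result.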

In [39]:
!pip install python-dotenv llama-index trulens-eval torch sentence-transformers


In [3]:
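import os
from google.colab import userdata  # Colab-only helper for reading notebook secrets
from tqdm.autonotebook import tqdm

# read the OpenAI key from Colab secrets and expose it to the OpenAI clients
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')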

openai_api_key = os.environ["OPENAI_API_KEY"]

## Baseline RAG

In [5]:
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index import ServiceContext, StorageContext
from llama_index.embeddings import OpenAIEmbedding, HuggingFaceEmbedding
from llama_index.llms import OpenAI
from llama_index.query_engine import RetrieverQueryEngine
from llama_index import load_index_from_storage

from typing import List
from pathlib import Path

In [6]:
## Create LLM and Embedding Model
embed_model = OpenAIEmbedding() # default embedding model ada
llm = OpenAI(api_key=openai_api_key, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
    embed_model=embed_model, llm=llm
)

# check whether a persisted index already exists
if not os.path.exists("./storage"):
    # load data
    documents = SimpleDirectoryReader(input_dir="dataFiles").load_data(show_progress=True)

    # create node parser (chunks of up to 1024 tokens)
    node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)

    # split into nodes
    base_nodes = node_parser.get_nodes_from_documents(documents=documents)

    # create index
    index = VectorStoreIndex(nodes=base_nodes, service_context=service_context)

    # store index
    index.storage_context.persist(persist_dir="./storage")
else:
    # load existing index with the same LLM and embedding settings
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(
        storage_context=storage_context, service_context=service_context
    )


# create retriever
retriever = index.as_retriever(similarity_top_k=2)

# query retriever engine
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    service_context=service_context
)

Loading files: 100%|██████████| 1/1 [00:00<00:00, 323.86file/s]
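`index.as_retriever(similarity_top_k=2)` returns the two chunks whose embeddings are closest to the query embedding. The ranking step can be sketched in plain Python. This assumes cosine similarity (which we believe is the default for LlamaIndex's in-memory vector store) and uses made-up 3-dimensional vectors instead of real embeddings; `cosine` and `top_k` are illustrative helpers, not LlamaIndex functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]],
          k: int = 2) -> List[Tuple[int, float]]:
    """(chunk index, similarity) for the k chunks most similar to the query, best first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


chunk_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]  # toy "embeddings"
query_vec = [1.0, 0.1, 0.0]
print(top_k(query_vec, chunk_vecs, k=2))  # chunk 0 first, then chunk 1
```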


In [7]:
# test response
response = query_engine.query("What did the president say about covid-19")

print(response)

The president stated that COVID-19 need no longer control our lives and that we will never just accept living with COVID-19. He emphasized the importance of continuing to combat the virus and staying on guard, as it is a virus that mutates and spreads. He also mentioned the effectiveness of vaccines and treatments in providing protection against COVID-19 and expressed the commitment to vaccinating more Americans. Additionally, he acknowledged the eagerness of parents with children under 5 to see a vaccine authorized for their children.


## RAG Pipeline Evaluation

The app is scored on three metrics (the TruLens "RAG triad"):

- Answer Relevance: How relevant is the answer to the user's question?

- Context Relevance: How relevant is the retrieved context to the user's question?

- Groundedness: How well is the response supported by the retrieved context?
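TruLens scores these three metrics with an LLM judge (the `fopenai` provider in the next cell). Purely to make the definitions concrete, here is a crude word-overlap version of each. These proxy functions are hypothetical and are not how TruLens computes its scores; the per-chunk averaging in `context_relevance` mirrors the `.aggregate(np.mean)` used for context relevance in the TruLens setup.

```python
import re
from statistics import mean
from typing import List, Set


def words(text: str) -> Set[str]:
    return set(re.findall(r"[\w-]+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Fraction of the words of `a` that also occur in `b`."""
    wa = words(a)
    return len(wa & words(b)) / len(wa) if wa else 0.0


def answer_relevance(question: str, answer: str) -> float:
    """Proxy: how much of the question is addressed by the answer."""
    return overlap(question, answer)


def context_relevance(question: str, contexts: List[str]) -> float:
    """Proxy: score each retrieved chunk against the question, then average."""
    return mean([overlap(question, c) for c in contexts]) if contexts else 0.0


def groundedness(answer: str, contexts: List[str], threshold: float = 0.5) -> float:
    """Proxy: fraction of answer sentences mostly covered by at least one chunk."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences or not contexts:
        return 0.0
    supported = sum(
        1 for s in sentences if max(overlap(s, c) for c in contexts) >= threshold
    )
    return supported / len(sentences)


question = "What did the president say about COVID-19?"
contexts = ["COVID-19 need no longer control our lives.", "We will stay on guard."]
answer = "He said COVID-19 need no longer control our lives. He also praised the weather."
print(answer_relevance(question, answer), context_relevance(question, contexts),
      groundedness(answer, contexts))
```

The second answer sentence is not backed by any retrieved chunk, so the groundedness proxy drops to 0.5 even though the first sentence is fully supported.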

In [44]:
from trulens_eval import Feedback, Tru, TruLlama, Select
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI as OpenAITruLens

import numpy as np

benchmark_result_db = 'benchmark.sqlite'

tru = Tru(database_file=benchmark_result_db)

fopenai = OpenAITruLens()  # evaluator LLM (default: gpt-3.5-turbo)

grounded = Groundedness(groundedness_provider=fopenai)

# Groundedness: is each statement in the answer supported by the retrieved chunks?
f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons)
    .on(TruLlama.select_source_nodes().node.text)
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)

# Answer relevance: how relevant is the final answer to the question?
f_qa_relevance = Feedback(fopenai.relevance).on_input_output()

# Context relevance: score the question against each retrieved chunk, then average.
f_context_relevance = (
    Feedback(fopenai.qs_relevance)
    .on_input()
    .on(TruLlama.select_source_nodes().node.text)
    .aggregate(np.mean)
)



✅ In groundedness_measure_with_cot_reasons, input source will be set to __record__.app.query.rets.source_nodes[:].node.text .
✅ In groundedness_measure_with_cot_reasons, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In qs_relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In qs_relevance, input statement will be set to __record__.app.query.rets.source_nodes[:].node.text .


In [8]:
tru_query_engine_recorder = TruLlama(query_engine,
    app_id='RAG_Baseline_V0',
    feedbacks=[f_groundedness, f_qa_relevance, f_context_relevance])

eval_questions = []

eval_questions_file = 'eval_questions.txt'

tru = Tru(database_file=benchmark_result_db)


🦑 Tru initialized with db url sqlite:///benchmark.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


In [9]:
with open(eval_questions_file, "r") as eval_qn:
    for qn in eval_qn:
        qn_stripped = qn.strip()
        if qn_stripped:  # skip blank lines
            eval_questions.append(qn_stripped)


def run_eval(eval_questions: List[str]):
    for qn in eval_questions:
        # record each query (input, retrieved context, output) with TruLens
        with tru_query_engine_recorder as recording:
            query_engine.query(qn)


run_eval(eval_questions=eval_questions)

# run dashboard
tru.run_dashboard()

Starting dashboard ...

### Baseline Comment

Judging from the TruLens dashboard, the baseline RAG app appears to struggle to retrieve the most relevant context for the evaluation questions.

## Auto-Merging Retrieval

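`HierarchicalNodeParser` outputs a hierarchy of nodes, from top-level nodes with bigger chunk sizes down to child nodes with smaller chunk sizes, where each child node has a parent node with a bigger chunk size. Only the leaf nodes are embedded and searched. At query time, `AutoMergingRetriever` looks at which leaves were retrieved: when more than a threshold fraction of a parent's children are retrieved, it replaces those children with the parent, so the LLM receives one coherent larger chunk instead of several fragments.

By default, the hierarchy is (chunk sizes in tokens):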

- 1st level: chunk size 2048
- 2nd level: chunk size 512
- 3rd level: chunk size 128
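The parent/child bookkeeping behind this hierarchy can be sketched without LlamaIndex. Assumptions: sizes are counted in whitespace-separated words instead of tokens, chunks do not overlap, and `Node`, `build_hierarchy`, `get_leaves`, and `get_roots` are hypothetical names (the real `HierarchicalNodeParser` uses a sentence-aware token splitter).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Node:
    node_id: str
    words: List[str]
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


def build_hierarchy(text: str, chunk_sizes: Sequence[int] = (2048, 512, 128)) -> List[Node]:
    """Split text into nested chunks; each level re-splits the chunks of the level above."""
    all_nodes: List[Node] = []
    # a temporary node holding the whole text; it is split but never returned
    level = [Node("doc", text.split())]
    for depth, size in enumerate(chunk_sizes):
        next_level: List[Node] = []
        for parent in level:
            for start in range(0, len(parent.words), size):
                node = Node(f"L{depth}-{len(all_nodes)}", parent.words[start:start + size])
                if depth > 0:  # top-level chunks have no parent
                    node.parent_id = parent.node_id
                    parent.child_ids.append(node.node_id)
                next_level.append(node)
                all_nodes.append(node)
        level = next_level
    return all_nodes


def get_leaves(nodes: List[Node]) -> List[Node]:
    """Nodes without children: these are the ones that get embedded."""
    return [n for n in nodes if not n.child_ids]


def get_roots(nodes: List[Node]) -> List[Node]:
    """Top-level nodes without a parent."""
    return [n for n in nodes if n.parent_id is None]


demo_nodes = build_hierarchy(" ".join(["word"] * 5000))
print(len(get_roots(demo_nodes)), len(get_leaves(demo_nodes)))
```

With 5000 words this yields 3 top-level chunks (2048 + 2048 + 904), 10 mid-level chunks, and 40 leaves; the uneven tail chunks at each level are why the counts are not exact multiples of 4.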

In [23]:
from typing import List
from llama_index import (
    Document,
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
from llama_index.retrievers import RecursiveRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.node_parser import SimpleNodeParser
from llama_index.embeddings import OpenAIEmbedding
from llama_index.schema import IndexNode
from llama_index.llms import OpenAI

import os

import numpy as np

## Create LLM and Embedding Model
embed_model = OpenAIEmbedding() # default embedding model ada
llm = OpenAI(api_key=openai_api_key, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
    embed_model=embed_model, llm=llm
)

# load data
documents = SimpleDirectoryReader(input_dir="dataFiles").load_data(show_progress=True)

doc_text = "\n\n".join([d.get_content() for d in documents])
docs = [Document(text=doc_text)]

Loading files: 100%|██████████| 1/1 [00:00<00:00, 560.29file/s]


In [24]:
from llama_index.node_parser import HierarchicalNodeParser, SentenceSplitter

node_parser = HierarchicalNodeParser.from_defaults()

nodes = node_parser.get_nodes_from_documents(documents = docs)

Get the "leaf" nodes from the node list, i.e. the nodes that have no children of their own. The root nodes, retrieved next, are the top-level nodes that have no parent.

In [25]:
from llama_index.node_parser import get_leaf_nodes, get_root_nodes

leaf_nodes = get_leaf_nodes(nodes)
len(leaf_nodes)

87

In [26]:
root_nodes = get_root_nodes(nodes)
len(root_nodes)

5

### Store

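- Docstore: holds all nodes (every level of the hierarchy), so parent nodes can be looked up when merging.
- VectorStoreIndex: contains only the leaf-level nodes, which are the ones embedded and searched.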

In [28]:
# storage context
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage import StorageContext
from llama_index import ServiceContext
from llama_index import VectorStoreIndex

In [29]:
docstore = SimpleDocumentStore()

docstore.add_documents(nodes) # all nodes

# define storage context (will include vector store by default too)
storage_context = StorageContext.from_defaults(docstore=docstore)

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo")
)

base_index = VectorStoreIndex(
    leaf_nodes,
    storage_context=storage_context,
    service_context=service_context,
)

In [30]:
from llama_index.retrievers.auto_merging_retriever import AutoMergingRetriever

base_retriever = base_index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)

query_str = (
    "What did the president say about covid-19"
)

nodes = retriever.retrieve(query_str)
base_nodes = base_retriever.retrieve(query_str)
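The merge rule applied by `AutoMergingRetriever` can be sketched as a single pass over the retrieved leaf ids. Assumptions: plain dicts stand in for the docstore, scores are ignored, and `ratio_thresh=0.5` mirrors what we understand to be the retriever's default `simple_ratio_thresh`; the real retriever works on scored nodes and repeats merging level by level. `auto_merge` is a hypothetical name.

```python
from typing import Dict, List


def auto_merge(retrieved: List[str], parent_of: Dict[str, str],
               children_of: Dict[str, List[str]], ratio_thresh: float = 0.5) -> List[str]:
    """One merge pass: swap sibling leaves for their parent when enough siblings were hit."""
    hits: Dict[str, List[str]] = {}  # parent id -> retrieved children, in retrieval order
    for node_id in retrieved:
        parent = parent_of.get(node_id)
        if parent is not None:
            hits.setdefault(parent, []).append(node_id)

    merged_parents: List[str] = []
    absorbed = set()
    for parent, kids in hits.items():
        if len(kids) / len(children_of[parent]) > ratio_thresh:
            merged_parents.append(parent)
            absorbed.update(kids)

    # unmerged nodes keep their retrieval order; merged parents are appended
    return [n for n in retrieved if n not in absorbed] + merged_parents


children_of = {"P1": ["c1", "c2", "c3", "c4"], "P2": ["c5", "c6", "c7", "c8"]}
parent_of = {child: parent for parent, kids in children_of.items() for child in kids}

# 3 of P1's 4 children (0.75) -> merged; 2 of P2's 4 children (0.5) -> kept as-is
print(auto_merge(["c1", "c5", "c2", "c6", "c3"], parent_of, children_of))
```

A merge usually shrinks the result list, since several leaves become one parent. In the run below, `retriever` and `base_retriever` both return 6 nodes, which suggests no merge was triggered for this query.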

In [32]:
len(nodes)

6

In [33]:
len(base_nodes)

6

In [34]:
from llama_index.response.notebook_utils import display_source_node

for node in nodes:
    display_source_node(node, source_length=10000)

**Node ID:** 853a9565-73a0-4125-880d-c8c3a0c109b1 | **Similarity:** 0.8207400125552234

**Text:** Under these new guidelines, most Americans in most of the country can now be mask free.

And based on the projections, more of the country will reach that point across the next couple of weeks.

Thanks to the progress we have made this past year, COVID-19 need no longer control our lives.

I know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19.

We will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard.

**Node ID:** 0d6a9f53-7918-4d09-8349-78b08f46f019 | **Similarity:** 0.8159280428328318

**Text:** Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.

Last year COVID-19 kept us apart. This year we are finally together again.

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

And with an unwavering resolve that freedom will always triumph over tyranny.

**Node ID:** aa2b70e4-0a8b-4f7b-a2ec-926a20ae4939 | **Similarity:** 0.8149870297493353

**Text:** Time with one another. And worst of all, so much loss of life.

Let’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.

Let’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.

We can’t change how divided we’ve been. But we can change how we move forward – on COVID-19 and other issues we must face together.

**Node ID:** b50609f7-1118-4239-851f-32e1a3810970 | **Similarity:** 0.8119718233515406

**Text:** The pandemic has been punishing.

And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more.

I understand.

I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it.

That’s why one of the first things I did as President was fight to pass the American Rescue Plan.

Because people were hurting. We needed to act, and we did.

**Node ID:** b9e94483-7e70-47e2-89ac-56ee010fe39f | **Similarity:** 0.8054011110664735

**Text:** And, if Congress provides the funds we need, we’ll have new stockpiles of tests, masks, and pills ready if needed.

I cannot promise a new variant won’t come. But I can promise you we’ll do everything within our power to be ready if it does.

Third – we can end the shutdown of schools and businesses. We have the tools we need.

It’s time for Americans to get back to work and fill our great downtowns again. People working from home can feel safe to begin to return to the office.

We’re doing that here in the federal government.

**Node ID:** 99361265-915d-4d73-9f84-9ef57b4ce6fe | **Similarity:** 0.8049653121329142

**Text:** And I know you’re tired, frustrated, and exhausted.

But I also know this.

Because of the progress we’ve made, because of your resilience and the tools we have, tonight I can say we are moving forward safely, back to more normal routines.

We’ve reached a new moment in the fight against COVID-19, with severe cases down to a level not seen since last July.

Just a few days ago, the Centers for Disease Control and Prevention – the CDC – issued new mask guidelines.

Under these new guidelines, most Americans in most of the country can now be mask free.

In [None]:
from trulens_eval import Feedback, Tru, TruLlama
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI as OpenAITruLens

## Sentence Window Retrieval
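Sentence-window retrieval embeds individual sentences but gives the LLM a wider window of text around each match. In the next cell, `SentenceWindowNodeParser` stores that window under the `window` metadata key and `MetadataReplacementPostProcessor` swaps it in after retrieval. A plain-Python sketch of both steps, assuming a naive regex sentence splitter and a window of `window_size` sentences on each side (check the parser source for its exact bounds); `split_sentences`, `build_sentence_nodes`, and `replace_with_window` are hypothetical names.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_sentence_nodes(text: str, window_size: int = 3) -> List[Dict[str, str]]:
    """One node per sentence; metadata keeps the surrounding window of sentences."""
    sentences = split_sentences(text)
    nodes: List[Dict[str, str]] = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,                      # what gets embedded and matched
            "original_text": sentence,
            "window": " ".join(sentences[lo:hi]),  # what the LLM will see
        })
    return nodes


def replace_with_window(retrieved: List[Dict[str, str]]) -> List[str]:
    """Post-processing step: swap each retrieved sentence for its window."""
    return [node["window"] for node in retrieved]


demo_nodes = build_sentence_nodes("A. B. C. D. E. F. G.", window_size=1)
print(demo_nodes[3]["text"], "->", replace_with_window([demo_nodes[3]])[0])
```

A larger `window_size` gives the LLM more context per hit at the cost of a longer prompt, which is the trade-off the window-size-3 vs. window-size-6 comparison below is probing.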

In [43]:
import os
from llama_index import (
    SimpleDirectoryReader,
    Document,
    StorageContext,
    load_index_from_storage
)

from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding
from llama_index import ServiceContext
from llama_index import VectorStoreIndex
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.indices.postprocessor import SentenceTransformerRerank


# load data
documents = SimpleDirectoryReader(input_dir="dataFiles").load_data(show_progress=True)


# merge pages into one
document = Document(text="\n\n".join([doc.text for doc in documents]))

embed_model = OpenAIEmbedding() # default embedding model ada
llm = OpenAI(api_key=openai_api_key, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
    embed_model=embed_model, llm=llm
)


def create_indexes(
    documents: List[Document],
    index_save_dir: str,
    window_size: int = 4,
    llm_model: str = "gpt-3.5-turbo",
    temperature: float = 0.1
):
    # each node is one sentence; its neighbours are kept in the "window" metadata
    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=window_size,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )

    # creating the service context from the llm_model / temperature arguments
    sentence_context = ServiceContext.from_defaults(
        llm=OpenAI(model=llm_model, temperature=temperature),
        embed_model=embed_model,
        node_parser=node_parser,
    )

    if not os.path.exists(index_save_dir):
        # creating the vector store index from the documents passed in
        index = VectorStoreIndex.from_documents(
            documents, service_context=sentence_context
        )

        # make vector store persistent
        index.storage_context.persist(persist_dir=index_save_dir)
    else:
        # load the persisted index if it already exists
        index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=index_save_dir),
            service_context=sentence_context
        )

    return index


def create_query_engine(
    sentence_index: VectorStoreIndex,
    similarity_top_k: int = 6,
    rerank_top_n: int = 5,
    rerank_model: str = "BAAI/bge-reranker-base",
):
    # replace each retrieved sentence with its surrounding window ("window" metadata)
    postproc = MetadataReplacementPostProcessor(
        target_metadata_key="window"
    )

    # link: https://huggingface.co/BAAI/bge-reranker-base
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n,
        model=rerank_model
    )

    sentence_window_engine = sentence_index.as_query_engine(
        similarity_top_k=similarity_top_k,
        node_postprocessors=[postproc, rerank]
    )

    return sentence_window_engine


# each index needs its own directory (not the baseline "storage" dir);
# delete the directory to rebuild after changing window_size

# create index with window size 3
index_window_3 = create_indexes(
    documents=[document],
    index_save_dir="sentence_window_size_3_index",
    window_size=3,
    llm_model="gpt-3.5-turbo",
    temperature=0.1
)

# create index with window size 6
index_window_6 = create_indexes(
    documents=[document],
    index_save_dir="sentence_window_size_6_index",
    window_size=6,
    llm_model="gpt-3.5-turbo",
    temperature=0.1
)

# create query engine
sentence_window_engine_window_zize3 = create_query_engine(
    sentence_index=index_window_3,
    similarity_top_k=5,
    rerank_top_n=2,
)

sentence_window_engine_window_zize6 = create_query_engine(
    sentence_index=index_window_6,
    similarity_top_k=5,
    rerank_top_n=2,
)

response = sentence_window_engine_window_zize3.query(
    "What did the president say about covid-19?"
)

print(response)

Loading files: 100%|██████████| 1/1 [00:00<00:00, 260.37file/s]


The president acknowledged that COVID-19 has impacted every decision in our lives and the life of the nation for more than two years. The president also recognized that people are tired, frustrated, and exhausted due to the ongoing pandemic.
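`create_query_engine` is a two-stage pipeline: the vector index returns `similarity_top_k` sentence nodes, the metadata postprocessor expands them to their windows, and the `BAAI/bge-reranker-base` cross-encoder rescores each (query, window) pair and keeps the best `rerank_top_n`. The sketch below shows only that retrieve-then-rerank control flow; `score_pair` is a hypothetical word-overlap stand-in for the cross-encoder, not the model itself, and `rerank_candidates` is likewise an illustrative name.

```python
import re
from typing import Callable, List, Tuple


def score_pair(query: str, passage: str) -> float:
    """Stand-in for a cross-encoder: counts query words present in the passage."""
    q = set(re.findall(r"[\w-]+", query.lower()))
    p = set(re.findall(r"[\w-]+", passage.lower()))
    return float(len(q & p))


def rerank_candidates(query: str, candidates: List[str], top_n: int,
                      scorer: Callable[[str, str], float] = score_pair
                      ) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_n, best first."""
    scored = [(c, scorer(query, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:top_n]


candidates = [  # e.g. the similarity_top_k=5 windows from the first stage
    "Last year COVID-19 kept us apart.",
    "The pandemic has been punishing.",
    "COVID-19 need no longer control our lives.",
    "We will stay on guard against COVID-19 variants.",
    "It is time for Americans to get back to work.",
]
query = "what did the president say about covid-19 and our lives"
for passage, score in rerank_candidates(query, candidates, top_n=2):
    print(score, passage)
```

A cross-encoder reads the query and passage together, which tends to be more accurate than comparing two independent embeddings but is much slower, so it is applied only to the handful of first-stage candidates.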


In [41]:
### Eval window size 3
tru_query_engine_recorder = TruLlama(sentence_window_engine_window_zize3,
    app_id='RAG_sentence_window_size_3',
    feedbacks=[f_groundedness, f_qa_relevance, f_context_relevance])

eval_questions = []

eval_questions_file = 'eval_questions.txt'

with open(eval_questions_file, "r") as eval_qn:
    for qn in eval_qn:
        qn_stripped = qn.strip()
        eval_questions.append(qn_stripped)


def run_eval(eval_questions: List[str]):
    for qn in eval_questions:
        # eval using context window
        with tru_query_engine_recorder as recording:
            sentence_window_engine_window_zize3.query(qn)


run_eval(eval_questions=eval_questions)

# run dashboard
tru.run_dashboard()





In [45]:
### Eval window size 6
tru_query_engine_recorder = TruLlama(sentence_window_engine_window_zize6,
    app_id='RAG_sentence_window_size_6',
    feedbacks=[f_groundedness, f_qa_relevance, f_context_relevance])

eval_questions = []

eval_questions_file = 'eval_questions.txt'

with open(eval_questions_file, "r") as eval_qn:
    for qn in eval_qn:
        qn_stripped = qn.strip()
        eval_questions.append(qn_stripped)


def run_eval(eval_questions: List[str]):
    for qn in eval_questions:
        # eval using context window
        with tru_query_engine_recorder as recording:
            sentence_window_engine_window_zize6.query(qn)


run_eval(eval_questions=eval_questions)

# run dashboard
tru.run_dashboard()

Starting dashboard ...