# Lesson 4: Auto-merging Retrieval

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()

In [3]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])

<class 'list'> 

41 

<class 'llama_index.core.schema.Document'>
Doc ID: ec8cae55-9b02-472a-a061-49a5205c259d
Text: PAGE 1 Founder, DeepLearning.AI Collected Insights from Andrew
Ng How to  Build Your Career in AI A Simple Guide


## Auto-merging retrieval setup

In [4]:
from llama_index.core import Document

document = Document(text="\n\n".join([doc.text for doc in documents]))

In [5]:
from llama_index.core.node_parser import HierarchicalNodeParser

# Create the hierarchical node parser; chunk sizes run from the largest (root) level to the smallest (leaf) level
node_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]
)

In [6]:
nodes = node_parser.get_nodes_from_documents([document])
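
To make the parent/child structure concrete, here is a toy sketch of hierarchical chunking. It is not LlamaIndex's implementation: it splits on raw characters rather than tokens or sentence boundaries, and `ToyNode` and `toy_hierarchical_split` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional


@dataclass
class ToyNode:
    node_id: int
    text: str
    parent_id: Optional[int] = None
    child_ids: List[int] = field(default_factory=list)


def toy_hierarchical_split(text, chunk_sizes):
    """Split text into nested fixed-width chunks (characters, not tokens)."""
    ids = count()
    nodes = []

    def split(segment, level, parent):
        size = chunk_sizes[level]
        for start in range(0, len(segment), size):
            node = ToyNode(
                node_id=next(ids),
                text=segment[start:start + size],
                parent_id=parent.node_id if parent else None,
            )
            nodes.append(node)
            if parent is not None:
                parent.child_ids.append(node.node_id)
            # Recurse until the smallest chunk size has been applied
            if level + 1 < len(chunk_sizes):
                split(node.text, level + 1, node)

    split(text, 0, None)
    return nodes


toy_nodes = toy_hierarchical_split("x" * 1000, chunk_sizes=[400, 100])
toy_roots = [n for n in toy_nodes if n.parent_id is None]
toy_leaves = [n for n in toy_nodes if not n.child_ids]
print(len(toy_roots), len(toy_leaves))
```

Only the leaves are embedded later on; the larger chunks stay in the docstore so that a retriever can swap a group of sibling leaves for their parent.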

In [7]:
from llama_index.core.node_parser import get_leaf_nodes, get_root_nodes

leaf_nodes = get_leaf_nodes(nodes)
print(leaf_nodes[30].text)

Of course, I also encourage learning driven by curiosity. If something interests you, go ahead 
and learn it regardless of how useful it might turn out to be!  Maybe this will lead to a creative 
spark or technical breakthrough.
How much math do you need to know to be a machine learning engineer?


In [8]:
nodes_by_id = {node.node_id: node for node in nodes}

parent_node = nodes_by_id[leaf_nodes[30].parent_node.node_id]
print(parent_node.text)

On some days, maybe you’ll end up studying for an 
hour or longer.

PAGE 12
Should You 
Learn Math to 
Get a Job in AI? 
CHAPTER 3
LEARNING

PAGE 13
Should you Learn Math to Get a Job in AI? CHAPTER 3
Is math a foundational skill for AI? It’s always nice to know more math! But there’s so much to 
learn that, realistically, it’s necessary to prioritize. Here’s how you might go about strengthening 
your math background.
To figure out what’s important to know, I find it useful to ask what you need to know to make 
the decisions required for the work you want to do. At DeepLearning.AI, we frequently ask, 
“What does someone need to know to accomplish their goals?” The goal might be building a 
machine learning model, architecting a system, or passing a job interview.
Understanding the math behind algorithms you use is often helpful, since it enables you to 
debug them. But the depth of knowledge that’s useful changes over time. As machine learning 
techniques mature and become more reliabl

### Building the index

In [9]:
import getpass
import os

INFERENCE_SERVER_URL = "http://localhost:8989"
MODEL_NAME = "ibm-granite/granite-3.3-2b-instruct"
API_KEY = "alanliuxiang"

from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model=MODEL_NAME,
    api_key=API_KEY,
    api_base=f"{INFERENCE_SERVER_URL}/v1",
    context_window=1234,
    is_chat_model=True,  # supports chat completions
    is_function_calling_model=True,  # supports tools/functions in the API
)


In [10]:
from trulens.providers.openai import OpenAI
from trulens_eval.feedback.provider.endpoint.openai import OpenAIClient
from trulens_eval.utils.pyschema import Class
import openai as oai

# Define the client class and client kwargs, reusing the constants from the previous cell
client_cls = Class.of_class(oai.OpenAI)
client_kwargs = {
    "api_key": API_KEY,
    "base_url": f"{INFERENCE_SERVER_URL}/v1",
}

# Initialize the OpenAIClient with the custom base URL
client = OpenAIClient(client_cls=client_cls, client_kwargs=client_kwargs)

provider = OpenAI(model_engine=MODEL_NAME,
                  client=client,
)

OpenAI parameter api_key is not serialized for DEFERRED feedback mode. If you are not using DEFERRED, you do not need to do anything. If you are using DEFERRED, try to specify this parameter through env variable or another mechanism.


In [11]:
data = {
    "query": ["what is AI?"],
    "query_id": ["1"],
    "expected_response": ["Artificial Intelligence"],
    "expected_chunks": [
        [
            {
                "text": "AI is the simulation of human intelligence processes by machines, especially computer systems.",
                "title": "AI is not a bubble :(",
                "expected_score": 0.9,
            },
            {
                "text": "AI is the evil overlord that's going to rule over all human beings.",
                "title": "AI should be feared",
                "expected_score": 0.4,
            },
            {
                "text": "AI is the future of humanity.",
                "title": "AI is the future",
                "expected_score": 0.5,
            },
        ],
    ],
}

In [12]:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter

# embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
embed_model = HuggingFaceEmbedding()

Settings.llm = llm
Settings.embed_model = embed_model
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
Settings.num_output = 512
Settings.context_window = 4096

In [13]:
from llama_index.core import VectorStoreIndex, StorageContext

storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

automerging_index = VectorStoreIndex(
    leaf_nodes, storage_context=storage_context
)

automerging_index.storage_context.persist(persist_dir="./merging_index")

In [14]:
# Optional: if a persisted index already exists in ./merging_index, load it;
# otherwise build the index from the leaf nodes and persist it.

import os
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage

if not os.path.exists("./merging_index"):
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    automerging_index = VectorStoreIndex(
            leaf_nodes,
            storage_context=storage_context,
        )

    automerging_index.storage_context.persist(persist_dir="./merging_index")
else:
    automerging_index = load_index_from_storage(
        StorageContext.from_defaults(persist_dir="./merging_index"),
    )


Loading llama_index.core.storage.kvstore.simple_kvstore from ./merging_index/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./merging_index/index_store.json.
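
The load-or-build pattern above has one caveat worth seeing in isolation: once a directory exists, whatever was saved there is returned, even if the build parameters have changed since. The sketch below shows this with a plain JSON file instead of a real index; `load_or_build` and the file name are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile


def load_or_build(path, build):
    """Return (value, how), loading from disk when the file already exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f), "loaded"
    value = build()
    with open(path, "w") as f:
        json.dump(value, f)
    return value, "built"


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "toy_index.json")
    # First call: nothing on disk yet, so the builder runs
    first = load_or_build(cache_path, lambda: {"chunk_sizes": [2048, 512, 128]})
    # Second call asks for different settings but still gets the saved value
    second = load_or_build(cache_path, lambda: {"chunk_sizes": [2048, 512]})

print(first[1], second[1], second[0])
```

This is why the experiments later in the notebook use a separate `save_dir` for each chunk-size configuration.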


### Defining the retriever and running the query engine

In [15]:
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.core.query_engine import RetrieverQueryEngine

automerging_retriever = automerging_index.as_retriever(
    similarity_top_k=12
)

retriever = AutoMergingRetriever(
    automerging_retriever, 
    automerging_index.storage_context, 
    verbose=True
)

rerank = SentenceTransformerRerank(top_n=6, model="BAAI/bge-reranker-base")

# Pass the AutoMergingRetriever (not the base retriever) so that merging is actually applied
auto_merging_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[rerank]
)
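
The merge step that `AutoMergingRetriever` layers on top of a base retriever can be pictured roughly as follows. This is a simplified single-level sketch, not the library's code: it assumes a parent replaces its children once more than half of them were retrieved, and `toy_auto_merge`, `parent_of`, `children_of`, and `ratio_thresh` are hypothetical names.

```python
from collections import defaultdict


def toy_auto_merge(retrieved_leaf_ids, parent_of, children_of, ratio_thresh=0.5):
    """Replace groups of retrieved siblings with their parent (one level only)."""
    hits = defaultdict(list)
    for leaf_id in retrieved_leaf_ids:
        hits[parent_of[leaf_id]].append(leaf_id)

    merged, seen_parents = [], set()
    for leaf_id in retrieved_leaf_ids:
        parent = parent_of[leaf_id]
        ratio = len(hits[parent]) / len(children_of[parent])
        if ratio > ratio_thresh:
            # Enough siblings were retrieved: emit the parent once
            if parent not in seen_parents:
                merged.append(parent)
                seen_parents.add(parent)
        else:
            merged.append(leaf_id)
    return merged


children_of = {"P1": ["a", "b", "c", "d"], "P2": ["e", "f", "g", "h"]}
parent_of = {child: p for p, kids in children_of.items() for child in kids}

toy_merged = toy_auto_merge(["a", "b", "c", "e"], parent_of, children_of)
print(toy_merged)  # ['P1', 'e']
```

Three of the four children of `P1` were retrieved, so they collapse into `P1`; only one child of `P2` was retrieved, so `e` stays a leaf. The real retriever works on the full node hierarchy and may repeat the merge at higher levels.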

In [16]:
auto_merging_response = auto_merging_engine.query(
    "What is the importance of networking in AI?"
)

In [17]:
from llama_index.core.response.notebook_utils import display_response

display_response(auto_merging_response)

**`Final Response:`** Networking plays a significant role in the AI field, as it can provide valuable support and opportunities for collaboration. While some individuals may find networking intimidating, such as those who prefer solitary activities, the benefits of a strong professional network in AI are substantial. This network can offer help and advice during challenging times, and the influence of colleagues can positively impact one's work ethic and approach to AI development. Furthermore, being part of a supportive community can foster a sense of belonging and encourage continuous learning and improvement, which are crucial in the rapidly evolving AI landscape.

## Putting it all Together

In [18]:
def build_automerging_index(
    documents,
    llm,
    embed_model=HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5"),
    save_dir="merging_index",
    chunk_sizes=None,
):
    # Note: `llm` and `embed_model` are accepted but not used in this body;
    # the index relies on the global `Settings.llm` and `Settings.embed_model`.
    if not os.path.exists(save_dir):
        # Parse the node hierarchy only when no persisted index exists
        chunk_sizes = chunk_sizes or [2048, 512, 128]
        node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
        nodes = node_parser.get_nodes_from_documents(documents)
        leaf_nodes = get_leaf_nodes(nodes)

        # The docstore holds every level; only the leaves are embedded
        storage_context = StorageContext.from_defaults()
        storage_context.docstore.add_documents(nodes)

        automerging_index = VectorStoreIndex(
            leaf_nodes,
            storage_context=storage_context
        )
        automerging_index.storage_context.persist(persist_dir=save_dir)
    else:
        # An existing directory is loaded as-is, whatever `chunk_sizes` was passed
        automerging_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
        )
    return automerging_index


def get_automerging_query_engine(
    automerging_index,
    similarity_top_k=12,
    rerank_top_n=6,
):
    base_retriever = automerging_index.as_retriever(similarity_top_k=similarity_top_k)
    retriever = AutoMergingRetriever(
        base_retriever, automerging_index.storage_context, verbose=True
    )
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )
    auto_merging_engine = RetrieverQueryEngine.from_args(
        retriever, node_postprocessors=[rerank]
    )
    return auto_merging_engine
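
The relationship between `similarity_top_k` and `rerank_top_n` is a two-stage selection: a cheap first pass keeps the top k candidates, then a second scorer reorders them and keeps the top n. The sketch below illustrates only that shape. The two scoring functions are crude stand-ins for embedding similarity and a cross-encoder reranker, and every name in it is hypothetical.

```python
def toy_retrieve_then_rerank(query, passages, coarse_score, fine_score, top_k, top_n):
    """Keep the top_k passages by coarse_score, then the top_n of those by fine_score."""
    candidates = sorted(
        passages, key=lambda p: coarse_score(query, p), reverse=True
    )[:top_k]
    return sorted(
        candidates, key=lambda p: fine_score(query, p), reverse=True
    )[:top_n]


def word_overlap(query, passage):
    # Stand-in for embedding similarity: number of shared words
    return len(set(query.lower().split()) & set(passage.lower().split()))


def overlap_density(query, passage):
    # Stand-in for a reranker: shared words relative to passage length
    return word_overlap(query, passage) / max(len(passage.split()), 1)


toy_passages = [
    "networking helps your career in ai",
    "networking is useful",
    "math is a foundational skill",
    "a strong network in ai provides support and advice over a long career",
]

toy_top = toy_retrieve_then_rerank(
    "networking in ai", toy_passages, word_overlap, overlap_density, top_k=3, top_n=2
)
print(toy_top)
```

In the real engine the auto-merging step sits between the two stages, so the reranker scores merged parent chunks as well as unmerged leaves.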

In [19]:
index = build_automerging_index(
    [document],
    llm=llm,
    save_dir="./merging_index",
)


Loading llama_index.core.storage.kvstore.simple_kvstore from ./merging_index/docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from ./merging_index/index_store.json.


In [20]:
query_engine = get_automerging_query_engine(index, similarity_top_k=6)

## TruLens Evaluation

In [21]:
from trulens_eval import Tru

Tru().reset_database()

🦑 Initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `TruSession` to prevent this.


Updating app_name and app_version in apps table: 0it [00:00, ?it/s]
Updating app_id in records table: 0it [00:00, ?it/s]
Updating app_json in apps table: 0it [00:00, ?it/s]


### Two layers

In [22]:
auto_merging_index_0 = build_automerging_index(
    documents,
    llm=llm,
    embed_model=embed_model,
    save_dir="merging_index_0",
    chunk_sizes=[2048,512],
)

In [23]:
auto_merging_engine_0 = get_automerging_query_engine(
    auto_merging_index_0,
    similarity_top_k=12,
    rerank_top_n=6,
)

In [24]:
from utils import get_prebuilt_trulens_recorder

tru_recorder = get_prebuilt_trulens_recorder(
    auto_merging_engine_0,
    app_id ='app_0',
    provider=provider,
    data=data
)

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input args will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input kwargs will be set to __record__.calls[-1].rets.source_nodes[:].node.text .


Updating app_name and app_version in apps table: 0it [00:00, ?it/s]
Updating app_id in records table: 0it [00:00, ?it/s]
Updating app_json in apps table: 0it [00:00, ?it/s]


✅ In Groundedness, input prompt will be set to __record__.calls[-1].rets.source_nodes[:].node.text .
✅ In Groundedness, input response will be set to __record__.main_output or `Select.RecordOutput` .
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.embeddings.multi_modal_base.MultiModalEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.base.embeddings.base.BaseEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.TransformComponent'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.BaseComponent'>
instrumenting <class 

In [25]:
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip surrounding whitespace, including the trailing newline
        item = line.strip()
        eval_questions.append(item)

In [26]:
def run_evals(eval_questions, tru_recorder, query_engine):
    for question in eval_questions:
        with tru_recorder as recording:
            response = query_engine.query(question)

In [27]:
run_evals(eval_questions, tru_recorder, auto_merging_engine_0)

> Merging 2 nodes into parent node.
> Parent node id: f9d61ff4-54fc-405c-abfa-847633fbdbf3.
> Parent node text: PAGE 20
Working on projects requires making tough choices about what to build and how to go 
abou...

> Merging 1 nodes into parent node.
> Parent node id: c49f5a9b-a51a-4227-a93c-6c84132ed7a1.
> Parent node text: PAGE 15
One of the most important skills of an AI architect is the ability to identify ideas that...

> Merging 1 nodes into parent node.
> Parent node id: 71ae69ad-4185-408c-9941-ff3c93e5b934.
> Parent node text: PAGE 16
Determine milestones. Once you’ve deemed a project sufficiently 
valuable, the next step ...

> Merging 1 nodes into parent node.
> Parent node id: f5093645-b393-4903-9229-7bf6da71e4f1.
> Parent node text: PAGE 23
Each project is only one step on a longer journey, hopefully one that has a positive impa...

> Merging 1 nodes into parent node.
> Parent node id: 509d2860-7675-490c-82eb-ccf2076b1bb1.
> Parent node text: PAGE 22
Over the course of a car

In [28]:
from trulens_eval import Tru

Tru().get_leaderboard(app_ids=[])

| app_name | app_version | latency | total_cost |
|---|---|---|---|
| App_1 | base | 4.739812 | 0.0 |


In [29]:
Tru().run_dashboard()

Starting dashboard ...


Dashboard started at http://localhost:53599 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

### Three layers

In [30]:
auto_merging_index_1 = build_automerging_index(
    documents,
    llm=llm,
    embed_model=embed_model,
    save_dir="merging_index_1",
    chunk_sizes=[2048,512,128],
)

In [31]:
auto_merging_engine_1 = get_automerging_query_engine(
    auto_merging_index_1,
    similarity_top_k=12,
    rerank_top_n=6,
)


In [32]:
tru_recorder = get_prebuilt_trulens_recorder(
    auto_merging_engine_1,
    app_id ='app_1',
    provider=provider,
    data=data
)

✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input args will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input kwargs will be set to __record__.calls[-1].rets.source_nodes[:].node.text .


Updating app_name and app_version in apps table: 0it [00:00, ?it/s]
Updating app_id in records table: 0it [00:00, ?it/s]
Updating app_json in apps table: 0it [00:00, ?it/s]


✅ In Groundedness, input prompt will be set to __record__.calls[-1].rets.source_nodes[:].node.text .
✅ In Groundedness, input response will be set to __record__.main_output or `Select.RecordOutput` .
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.embeddings.multi_modal_base.MultiModalEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.base.embeddings.base.BaseEmbedding'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.TransformComponent'>
instrumenting <class 'llama_index.embeddings.huggingface.base.HuggingFaceEmbedding'> for base <class 'llama_index.core.schema.BaseComponent'>
instrumenting <class 

In [33]:
run_evals(eval_questions, tru_recorder, auto_merging_engine_1)

> Merging 4 nodes into parent node.
> Parent node id: b5960e4f-aa98-401e-945a-ddfe2420cafd.
> Parent node text: PAGE 20
Working on projects requires making tough choices about what to build and how to go 
abou...

> Merging 2 nodes into parent node.
> Parent node id: 9dbc9bbf-d6b8-4ca4-a2a6-9ea44affca23.
> Parent node text: But when committing to a direction means making a costly investment or entering a one-
way door (...

> Merging 2 nodes into parent node.
> Parent node id: 123d34c5-9cc3-4568-8cc8-13013aa9c2ae.
> Parent node text: PAGE 20
Working on projects requires making tough choices about what to build and how to go 
abou...



In [34]:
from trulens_eval import Tru

Tru().get_leaderboard(app_ids=[])

| app_name | app_version | latency | total_cost |
|---|---|---|---|
| App_1 | base | 4.031855 | 0.0 |


In [35]:
Tru().run_dashboard()

Starting dashboard ...
Dashboard already running at path:   Local URL: http://localhost:53599



<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>