<a href="https://colab.research.google.com/github/sanjayakanungo/RAG/blob/main/docs/examples/retrievers/auto_merging_retriever.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Auto Merging Retriever

In this notebook, we showcase our `AutoMergingRetriever`, which looks at a set of retrieved leaf nodes and recursively "merges" subsets of leaf nodes that reference the same parent node: when the fraction of a parent's children that were retrieved exceeds a given threshold, those children are replaced by the parent. This allows us to consolidate potentially disparate, smaller contexts into a larger context that might help synthesis.

You can define this hierarchy yourself over a set of documents, or you can use our `HierarchicalNodeParser`, a text parser that takes in a candidate set of documents and outputs an entire hierarchy of nodes, from coarse to fine.
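
To make the merging idea concrete, here is a minimal pure-Python sketch that does not use the library. The toy hierarchy, the function names, the `ratio_thresh` parameter, and the strict `>` comparison are all assumptions made for illustration; the actual retriever may differ in its details.

```python
from collections import defaultdict


def merge_once(retrieved_ids, parent_of, children_of, ratio_thresh=0.5):
    """One pass: swap retrieved children for their parent when enough were hit."""
    by_parent = defaultdict(list)
    for node_id in retrieved_ids:
        parent = parent_of.get(node_id)
        if parent is not None:
            by_parent[parent].append(node_id)

    merged = list(retrieved_ids)
    changed = False
    for parent, kids in by_parent.items():
        # Fraction of this parent's children that were retrieved.
        ratio = len(kids) / len(children_of[parent])
        if ratio > ratio_thresh:  # assumed strict comparison
            merged = [n for n in merged if n not in kids]
            merged.append(parent)
            changed = True
    return merged, changed


def auto_merge(retrieved_ids, parent_of, children_of, ratio_thresh=0.5):
    """Repeat merge passes until no parent crosses the threshold."""
    merged, changed = merge_once(retrieved_ids, parent_of, children_of, ratio_thresh)
    while changed:
        merged, changed = merge_once(merged, parent_of, children_of, ratio_thresh)
    return merged


# Hypothetical three-level hierarchy: R -> (A, B) -> four leaves each.
children_of = {
    "R": ["A", "B"],
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3", "b4"],
}
parent_of = {child: parent for parent, kids in children_of.items() for child in kids}

partial = auto_merge(["a1", "a2", "a3", "b1"], parent_of, children_of)
full = auto_merge(["a1", "a2", "a3", "b1", "b2", "b3"], parent_of, children_of)
print(partial)  # ['b1', 'A']
print(full)     # ['R']
```

In the first call only `A` has enough retrieved children to be merged. In the second call both `A` and `B` are merged, which in turn lets the next pass merge them into `R`.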

In [1]:
%pip install llama-index-llms-openai
%pip install llama-index-readers-file

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
!pip install llama-index

In [5]:
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


By default, the PDF reader creates a separate doc for each page.
For the sake of this notebook, we stitch docs together into one doc.
This will help us better highlight auto-merging capabilities that "stitch" chunks together later on.

In [8]:
!pip install pypdf
from llama_index.core import SimpleDirectoryReader
docs0 = SimpleDirectoryReader("/content/drive/MyDrive/GENAI-Pinnacle/VCFdataset").load_data()



In [9]:
len(docs0)

771

In [10]:
from llama_index.core import Document

doc_text = "\n\n".join([d.get_content() for d in docs0])
docs = [Document(text=doc_text)]

## Parse Chunk Hierarchy from Text, Load into Storage

In this section we make use of the `HierarchicalNodeParser`. This will output a hierarchy of nodes, from top-level nodes with bigger chunk sizes to child nodes with smaller chunk sizes, where each child node has a parent node with a bigger chunk size.

By default, the hierarchy is:
- 1st level: chunk size 2048
- 2nd level: chunk size 512
- 3rd level: chunk size 128


We then load these nodes into storage. The leaf nodes are indexed and retrieved via a vector store - these are the nodes that will first be directly retrieved via similarity search. The other nodes will be retrieved from a docstore.
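
The coarse-to-fine structure described above can be sketched without the library. This is only an illustration under stated assumptions: it splits on whitespace-separated words rather than tokens, uses toy chunk sizes of 8, 4, and 2 instead of 2048, 512, and 128, and `ToyNode`, `split_words`, and `build_hierarchy` are hypothetical names rather than library classes.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional


@dataclass
class ToyNode:
    node_id: int
    text: str
    level: int
    parent_id: Optional[int] = None
    child_ids: List[int] = field(default_factory=list)


def split_words(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i : i + size]) for i in range(0, len(words), size)]


def build_hierarchy(text: str, chunk_sizes: List[int]) -> List[ToyNode]:
    """Chunk the text at each level, linking every chunk to its parent chunk."""
    ids = count()
    nodes: List[ToyNode] = []

    def recurse(chunk_text: str, level: int, parent: Optional[ToyNode]) -> None:
        for piece in split_words(chunk_text, chunk_sizes[level]):
            parent_id = parent.node_id if parent is not None else None
            node = ToyNode(next(ids), piece, level, parent_id)
            nodes.append(node)
            if parent is not None:
                parent.child_ids.append(node.node_id)
            # Split this chunk again with the next, smaller chunk size.
            if level + 1 < len(chunk_sizes):
                recurse(piece, level + 1, node)

    recurse(text, 0, None)
    return nodes


toy_text = " ".join(f"w{i}" for i in range(16))
toy_nodes = build_hierarchy(toy_text, chunk_sizes=[8, 4, 2])
per_level = [sum(1 for n in toy_nodes if n.level == lvl) for lvl in range(3)]
print(per_level)  # [2, 4, 8]
```

Every node below the top level carries a `parent_id`, which is the link a merging retriever would follow to swap several small chunks for the larger chunk that contains them.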

In [11]:
from llama_index.core.node_parser import (
    HierarchicalNodeParser,
    SentenceSplitter,
)

In [12]:
node_parser = HierarchicalNodeParser.from_defaults()

In [13]:
nodes = node_parser.get_nodes_from_documents(docs)

In [14]:
len(nodes)

3832

Here we import simple helper functions for fetching the "leaf" and "root" nodes within a node list.
Leaf nodes have no children of their own; root nodes have no parent.

In [15]:
from llama_index.core.node_parser import get_leaf_nodes, get_root_nodes

In [16]:
leaf_nodes = get_leaf_nodes(nodes)

In [17]:
len(leaf_nodes)

2997

In [18]:
root_nodes = get_root_nodes(nodes)

In [19]:
len(root_nodes)

142
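
Conceptually, the two helpers used above act as filters over parent and child relationships. The sketch below shows that idea on plain dictionaries; the `toy_leaf_nodes` and `toy_root_nodes` functions and the dictionary layout are hypothetical stand-ins, not the library's implementation.

```python
def toy_leaf_nodes(nodes):
    """Keep nodes that have no children."""
    return [n for n in nodes if not n["children"]]


def toy_root_nodes(nodes):
    """Keep nodes that have no parent."""
    return [n for n in nodes if n["parent"] is None]


toy = [
    {"id": "root", "parent": None, "children": ["mid"]},
    {"id": "mid", "parent": "root", "children": ["leaf-1", "leaf-2"]},
    {"id": "leaf-1", "parent": "mid", "children": []},
    {"id": "leaf-2", "parent": "mid", "children": []},
]

leaf_ids = [n["id"] for n in toy_leaf_nodes(toy)]
root_ids = [n["id"] for n in toy_root_nodes(toy)]
print(leaf_ids)  # ['leaf-1', 'leaf-2']
print(root_ids)  # ['root']
```

Applied to the counts reported above, and assuming the default three-level hierarchy, the nodes that are neither leaves nor roots number 3832 - 2997 - 142 = 693.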

### Load into Storage

We define a docstore, which we load all nodes into.

We then define a `VectorStoreIndex` containing just the leaf-level nodes.

In [20]:
import os
import openai
os.environ["OPENAI_API_KEY"] = ""

In [21]:
# define storage context
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core import StorageContext
from llama_index.llms.openai import OpenAI

docstore = SimpleDocumentStore()

# insert nodes into docstore
docstore.add_documents(nodes)

# define storage context (will include vector store by default too)
storage_context = StorageContext.from_defaults(docstore=docstore)

llm = OpenAI(model="gpt-3.5-turbo")

In [22]:
# load the leaf nodes into a vector index
from llama_index.core import VectorStoreIndex

base_index = VectorStoreIndex(
    leaf_nodes,
    storage_context=storage_context,
)

## Define Retriever

In [23]:
from llama_index.core.retrievers import AutoMergingRetriever

In [25]:
base_retriever = base_index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)

In [48]:
query_str = (
    "what are deployment options for VMware Cloud Foundation? "
    "How does it impact licensing?"
)
nodes = retriever.retrieve(query_str)
base_nodes = base_retriever.retrieve(query_str)

In [49]:
len(nodes)

6

In [50]:
len(base_nodes)

6

In [51]:
from llama_index.core.response.notebook_utils import display_source_node

for node in nodes:
    display_source_node(node, source_length=10000)

**Node ID:** e95113f2-1003-4728-bc7f-84f9557c8a4d<br>**Similarity:** 0.85902710475225<br>**Text:** nLicensing  (only for perpetual licensing) allows you to 
manage VMware product licenses. You can also add 
licenses for the component products in your VMware 
Cloud Foundation  deployment.
nSubscription  (only for subscription licensing) provides 
details about completing the onboarding process 
for VMware Cloud Foundation+. Once onboarding is 
complete, it provides a link to review subscription 
usage in the VMware Cloud console.
nSingle Sign On  allows you to manage VMware 
Cloud Foundation  users and groups, including adding 
users and groups and assigning roles. You can 
also configure identity providers for VMware Cloud 
Foundation .<br>

**Node ID:** a999280b-7f90-4766-b8a5-fc318ed42c9d<br>**Similarity:** 0.8586523012599151<br>**Text:** Keyless licensing mode for a cloud-connected subscription
To deploy in keyless licensing mode, select Yes for Use Keyless Licensing . You do not have to 
enter any license keys.
Important    If you deploy VMware Cloud Foundation  in keyless licensing mode, you cannot switch 
to keyed licensing mode without doing a full bring-up rebuild.VMware Cloud Foundation Deployment Guide
VMware, Inc. 29

See Using VMware Cloud Foundation with VMware Cloud  for more information about cloud-
connected subscription.<br>

**Node ID:** b56bd21d-c82a-453f-9488-f30f4d87571b<br>**Similarity:** 0.8580576160999389<br>**Text:** Note    This option is only available for new VMware Cloud Foundation  installations and the 
setting you apply during bring-up will be used for future upgrades. You cannot change the 
FIPS security mode setting after bring-up.
Deployment Parameters Worksheet: License Keys
Use the License Keys section of the deployment parameters worksheet to choose keyed or 
keyless licensing mode.VMware Cloud Foundation Deployment Guide
VMware, Inc. 28

Keyed licensing mode
To deploy in keyed licensing mode, select No for Use Keyless Licensing .<br>

**Node ID:** fcd93336-e1ce-45dc-9d6c-d4edc603da90<br>**Similarity:** 0.8553990348120664<br>**Text:** Table 1-3. Best Practices for License Operations in VMware Cloud Foundation
Operation Licensing Model When or How Often Description
Add licenses. Key-based Insufficient license capacity 
for expanding an 
environment.To add license keys 
manually, use the SDDC 
Manager UI. See Managing 
License Keys  in the 
VMware Cloud Foundation 
Administration Guide .
You can automate adding 
license keys by using the 
VMware Cloud Foundation 
API. See License Keys 
in the VMware Cloud 
Foundation  API reference 
documentation.<br>

**Node ID:** ff193b3e-7ddd-46f2-bd06-2f8df998fe11<br>**Similarity:** 0.8535818249000939<br>**Text:** 2Prepare ESXi Hosts for VMware Cloud Foundation
3Deploy the Management Domain Using VMware Cloud Builder
The VMware Cloud Foundation  deployment process is referred to as bring-up. You specify 
deployment information specific to your environment such as networks, hosts, license keys, 
and other information in the deployment parameter workbook and upload the file to the 
VMware Cloud Builder appliance  to initiate bring-up of the management domain.
VMware, Inc.<br>

**Node ID:** 596bf4ce-9b7c-4e99-aa62-9d01077cc331<br>**Similarity:** 0.853185786839447<br>**Text:** Copyright and trademark information.VMware Cloud Foundation Deployment Guide
VMware, Inc. 2

Contents
About the VMware Cloud Foundation  Deployment Guide 4
1Preparing your Environment for VMware Cloud Foundation 5
2Deploying VMware Cloud Foundation 6
Deploy VMware Cloud Builder Appliance 7
Prepare ESXi Hosts for VMware Cloud Foundation 9
Create a Custom ISO Image for ESXi 10
Create a Custom ESXi ISO Image Using VMware PowerCLI 10
Create a Custom ESXi ISO Image Using vSphere Lifecycle Manager 12
Install ESXi Interactively and Configure Hosts for VMware Cloud Foundation<br>

In [52]:
for node in base_nodes:
    display_source_node(node, source_length=10000)

**Node ID:** e95113f2-1003-4728-bc7f-84f9557c8a4d<br>**Similarity:** 0.85902710475225<br>**Text:** nLicensing  (only for perpetual licensing) allows you to 
manage VMware product licenses. You can also add 
licenses for the component products in your VMware 
Cloud Foundation  deployment.
nSubscription  (only for subscription licensing) provides 
details about completing the onboarding process 
for VMware Cloud Foundation+. Once onboarding is 
complete, it provides a link to review subscription 
usage in the VMware Cloud console.
nSingle Sign On  allows you to manage VMware 
Cloud Foundation  users and groups, including adding 
users and groups and assigning roles. You can 
also configure identity providers for VMware Cloud 
Foundation .<br>

**Node ID:** a999280b-7f90-4766-b8a5-fc318ed42c9d<br>**Similarity:** 0.8586523012599151<br>**Text:** Keyless licensing mode for a cloud-connected subscription
To deploy in keyless licensing mode, select Yes for Use Keyless Licensing . You do not have to 
enter any license keys.
Important    If you deploy VMware Cloud Foundation  in keyless licensing mode, you cannot switch 
to keyed licensing mode without doing a full bring-up rebuild.VMware Cloud Foundation Deployment Guide
VMware, Inc. 29

See Using VMware Cloud Foundation with VMware Cloud  for more information about cloud-
connected subscription.<br>

**Node ID:** b56bd21d-c82a-453f-9488-f30f4d87571b<br>**Similarity:** 0.8580576160999389<br>**Text:** Note    This option is only available for new VMware Cloud Foundation  installations and the 
setting you apply during bring-up will be used for future upgrades. You cannot change the 
FIPS security mode setting after bring-up.
Deployment Parameters Worksheet: License Keys
Use the License Keys section of the deployment parameters worksheet to choose keyed or 
keyless licensing mode.VMware Cloud Foundation Deployment Guide
VMware, Inc. 28

Keyed licensing mode
To deploy in keyed licensing mode, select No for Use Keyless Licensing .<br>

**Node ID:** fcd93336-e1ce-45dc-9d6c-d4edc603da90<br>**Similarity:** 0.8553990348120664<br>**Text:** Table 1-3. Best Practices for License Operations in VMware Cloud Foundation
Operation Licensing Model When or How Often Description
Add licenses. Key-based Insufficient license capacity 
for expanding an 
environment.To add license keys 
manually, use the SDDC 
Manager UI. See Managing 
License Keys  in the 
VMware Cloud Foundation 
Administration Guide .
You can automate adding 
license keys by using the 
VMware Cloud Foundation 
API. See License Keys 
in the VMware Cloud 
Foundation  API reference 
documentation.<br>

**Node ID:** ff193b3e-7ddd-46f2-bd06-2f8df998fe11<br>**Similarity:** 0.8535818249000939<br>**Text:** 2Prepare ESXi Hosts for VMware Cloud Foundation
3Deploy the Management Domain Using VMware Cloud Builder
The VMware Cloud Foundation  deployment process is referred to as bring-up. You specify 
deployment information specific to your environment such as networks, hosts, license keys, 
and other information in the deployment parameter workbook and upload the file to the 
VMware Cloud Builder appliance  to initiate bring-up of the management domain.
VMware, Inc.<br>

**Node ID:** 596bf4ce-9b7c-4e99-aa62-9d01077cc331<br>**Similarity:** 0.853185786839447<br>**Text:** Copyright and trademark information.VMware Cloud Foundation Deployment Guide
VMware, Inc. 2

Contents
About the VMware Cloud Foundation  Deployment Guide 4
1Preparing your Environment for VMware Cloud Foundation 5
2Deploying VMware Cloud Foundation 6
Deploy VMware Cloud Builder Appliance 7
Prepare ESXi Hosts for VMware Cloud Foundation 9
Create a Custom ISO Image for ESXi 10
Create a Custom ESXi ISO Image Using VMware PowerCLI 10
Create a Custom ESXi ISO Image Using vSphere Lifecycle Manager 12
Install ESXi Interactively and Configure Hosts for VMware Cloud Foundation<br>
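
In the run above, both retrievers return the same six node IDs, which suggests that no parent crossed the merge threshold for this query. A quick way to check this is to compare the two ID sets. The helper below is a hypothetical sketch that works on plain ID strings; with the real results you would first collect the ID of each returned node.

```python
def compare_retrievals(merged_ids, base_ids):
    """Report which node IDs differ between the two retrieval results."""
    merged_set, base_set = set(merged_ids), set(base_ids)
    return {
        # Parents that replaced leaf nodes would show up here.
        "only_in_merged": sorted(merged_set - base_set),
        # Leaf nodes that were folded into a parent would show up here.
        "only_in_base": sorted(base_set - merged_set),
        "merging_happened": merged_set != base_set,
    }


# Two of the node IDs shown in the outputs above, identical in both result lists.
shown_ids = [
    "e95113f2-1003-4728-bc7f-84f9557c8a4d",
    "a999280b-7f90-4766-b8a5-fc318ed42c9d",
]
report = compare_retrievals(shown_ids, shown_ids)
print(report["merging_happened"])  # False
```

When merging does occur, `only_in_merged` would list the parent nodes that were introduced and `only_in_base` the leaf nodes they replaced.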

## Plug it into Query Engine

In [53]:
from llama_index.core.query_engine import RetrieverQueryEngine

In [54]:
query_engine = RetrieverQueryEngine.from_args(retriever)
base_query_engine = RetrieverQueryEngine.from_args(base_retriever)

In [55]:
response = query_engine.query(query_str)

In [57]:
print(str(response))

The deployment options for VMware Cloud Foundation are keyed licensing mode and keyless licensing mode. Keyed licensing mode requires entering license keys manually, while keyless licensing mode does not require entering any license keys. The choice between these two modes impacts how licenses are managed and applied during the deployment process.


In [58]:
base_response = base_query_engine.query(query_str)

In [59]:
print(str(base_response))

The deployment options for VMware Cloud Foundation are keyless licensing mode and keyed licensing mode. In keyless licensing mode, users do not need to enter any license keys during deployment, while in keyed licensing mode, users need to manually add license keys. The choice between these deployment options impacts how licenses are managed and applied within the VMware Cloud Foundation environment.
