# Complex Query Resolution through LlamaIndex Utilizing Recursive Retrieval, Document Agents, and Sub Question Query Decomposition

In this notebook, we use LlamaIndex to resolve complex queries by combining Recursive Retrieval, Document Agents, and the Sub Question Query Engine. Each Document Agent uses three query engines as tools (a Vector Index, a Summary Index, and a Knowledge Graph Index), so the system can handle multifaceted questions and synthesize coherent, context-rich answers from multiple documents and data sources.

![llamaindex-rr-da-sqe.png](../assets/img/llamaindex-rr-da-sqe.png)

Let's start by installing the dependencies and importing the necessary libraries.

In [None]:
%pip install llama-index pinecone-client transformers neo4j python-dotenv

In [2]:
import nest_asyncio

nest_asyncio.apply()

In [3]:
import os
import torch
import pinecone
from transformers import pipeline
from llama_index import (
    VectorStoreIndex,
    SummaryIndex,
    KnowledgeGraphIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext,
    StorageContext
)
from dotenv import load_dotenv
from llama_index.schema import IndexNode
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.llms import OpenAI
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.retrievers import RecursiveRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import get_response_synthesizer
from llama_index.vector_stores import PineconeVectorStore
from llama_index.graph_stores import Neo4jGraphStore



## Data Preparation

We work with three Wikipedia pages: we fetch each page's text, save it to disk, and then load it for further processing.

In [4]:
wiki_titles = ["Seattle", "Boston", "Chicago"]

In [5]:
from pathlib import Path

import requests

for title in wiki_titles:
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

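    # create the data directory if needed (no error if it already exists)
    data_path = Path("data")
    data_path.mkdir(exist_ok=True)

    # write as UTF-8 so non-ASCII characters in the article are preserved
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)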

In [6]:
# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()

In [7]:
load_dotenv()
os.environ["OPENAI_API_KEY"] = os.getenv('OPENAI_API_KEY')

In [8]:
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


## Build Document Agent for each Document

Next, we define a document agent for each document.
First, we build three indexes per document: a vector index (for semantic search), a summary index (for summarization), and a graph index (for structural semantic search). The query engines over these three indexes are then converted into tools and passed to an OpenAI function-calling agent.
The document agent can dynamically choose between semantic search over the vector index or the graph index and summarization within a given document.
We create a separate document agent for each city.
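
To make the tool-selection idea concrete, the sketch below builds OpenAI-style function specs from a tool's name and description. It is only an illustration of the general shape a function-calling model sees, not the exact payload LlamaIndex sends. The helper `tool_to_function_spec` is hypothetical, and the single string argument named `input` is an assumption based on the arguments visible in the agent logs at the end of this notebook.

```python
import json


def tool_to_function_spec(name: str, description: str) -> dict:
    """Build an OpenAI-style function spec for a query-engine tool.

    Assumption: the tool takes one string argument called "input".
    """
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {"input": {"type": "string"}},
            "required": ["input"],
        },
    }


city = "Boston"
specs = [
    tool_to_function_spec(
        "vector_tool", f"Useful for retrieving specific context from {city}"
    ),
    tool_to_function_spec(
        "summary_tool", f"Useful for summarization questions related to {city}"
    ),
    tool_to_function_spec(
        "graph_tool",
        f"Useful for retrieving structural, interconnected and relational knowledge related to {city}",
    ),
]

# The model picks a tool by matching the question against these descriptions.
print(json.dumps(specs[0], indent=2))
```

Because the model chooses a tool from the descriptions alone, keeping each description specific to one kind of question is what makes the routing reliable.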

### Vector Storage Context

We create a Pinecone vector index with specified parameters such as dimension and metric. We then set up a vector storage context backed by Pinecone's vector store, which handles storage and retrieval of the vector index data within the LlamaIndex framework.
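
As a side note on the metric parameter, the sketch below compares Euclidean distance with cosine similarity on a few small, made-up vectors. For unit-length vectors the two give the same ranking, because the squared Euclidean distance equals 2 minus twice the cosine similarity. Whether this applies here depends on the embeddings being normalized, which is an assumption to verify for the embedding model in use; the vectors and names below are purely illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Toy 3-dimensional vectors standing in for real embeddings.
query = normalize([1.0, 2.0, 0.5])
candidates = {
    "a": normalize([1.0, 2.1, 0.4]),
    "b": normalize([-1.0, 0.5, 2.0]),
    "c": normalize([0.9, 1.0, 1.0]),
}

# Smallest distance first vs. largest similarity first.
by_euclidean = sorted(candidates, key=lambda k: euclidean(query, candidates[k]))
by_cosine = sorted(candidates, key=lambda k: -cosine_similarity(query, candidates[k]))
print(by_euclidean, by_cosine)
```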

In [9]:
# init pinecone
os.environ["PINECONE_API_KEY"] = os.getenv('PINECONE_API_KEY')
os.environ["PINECONE_ENVIRONMENT"] = os.getenv('PINECONE_ENVIRONMENT')
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment=os.environ["PINECONE_ENVIRONMENT"])
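# create the index only if it does not exist yet, so this cell can be re-run
if "vector-index" not in pinecone.list_indexes():
    pinecone.create_index("vector-index", dimension=1536, metric="euclidean", pod_type="p1")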

# construct vector store and customize storage context
vector_storage_context = StorageContext.from_defaults(
    vector_store=PineconeVectorStore(pinecone.Index("vector-index"))
)

### Graph Storage Context

In this section, we build a knowledge graph from scratch using Relation Extraction By End-to-end Language generation (REBEL), LlamaIndex, and Neo4j. REBEL is a relation extraction model that uses a BART model to convert raw sentences into relation triplets. In effect, we construct a knowledge graph from unstructured data for efficient, granular knowledge retrieval. Lastly, we use Neo4j's graph store to handle storage and retrieval of the graph data within the LlamaIndex framework.

![p4-kg.png](../assets/img/p4-kg.png)

In [10]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

triplet_extractor = pipeline(
    'text2text-generation',
    model='Babelscape/rebel-large',
    tokenizer='Babelscape/rebel-large',
    device=device)

cpu


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [11]:
import re

def clean_triplets(input_text, triplets):
    """Sometimes the model hallucinates, so we filter out triplets whose
       entities are not present in the text"""
    text = input_text.lower()
    kept = []  # named so the local list does not shadow the function name
    for triplet in triplets:

        # skip self-referential triplets
        if (triplet["head"] == triplet["tail"]):
            continue

        # locate the head entity: whole-word match first, substring fallback
        head_match = re.search(
            r'\b' + re.escape(triplet["head"].lower()) + r'\b', text)
        if head_match:
            head_index = head_match.start()
        else:
            head_index = text.find(triplet["head"].lower())

        # locate the tail entity the same way
        tail_match = re.search(
            r'\b' + re.escape(triplet["tail"].lower()) + r'\b', text)
        if tail_match:
            tail_index = tail_match.start()
        else:
            tail_index = text.find(triplet["tail"].lower())

        # drop the triplet if either entity is missing from the text
        if ((head_index == -1) or (tail_index == -1)):
            continue

        kept.append((triplet["head"], triplet["type"], triplet["tail"]))

    return kept

def extract_triplets(input_text):
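    # run the model, then decode the generated token ids back to text,
    # keeping the special <triplet>/<subj>/<obj> markers
    generated = triplet_extractor(input_text, return_tensors=True, return_text=False)
    text = triplet_extractor.tokenizer.batch_decode([generated[0]["generated_token_ids"]])[0]

    triplets = []
    subject, relation, object_ = '', '', ''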
    text = text.strip()
    current = 'x'
    for token in text.replace("<s>", "").replace("<pad>", "").replace("</s>", "").split():
        if token == "<triplet>":
            current = 't'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
                relation = ''
            subject = ''
        elif token == "<subj>":
            current = 's'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
            object_ = ''
        elif token == "<obj>":
            current = 'o'
            relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token

    if subject != '' and relation != '' and object_ != '':
        triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail':object_.strip()})
    clean = clean_triplets(input_text, triplets)
    return clean
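
The parsing loop in `extract_triplets` is easier to follow on a concrete string. The sketch below applies the same state machine to a hand-written example of REBEL's linearized output, where each triplet is laid out as `<triplet> head <subj> tail <obj> relation`. The sample string and the helper name `parse_rebel_output` are made up for illustration; no model is loaded, so this runs without `transformers`.

```python
def parse_rebel_output(decoded: str):
    """Parse REBEL's linearized output into (head, relation, tail) tuples."""
    triplets = []
    subject, relation, object_ = "", "", ""
    current = None  # which field the next plain token belongs to

    cleaned = decoded.replace("<s>", "").replace("<pad>", "").replace("</s>", "")
    for token in cleaned.split():
        if token == "<triplet>":
            # a new head starts; flush the previous triplet if one is complete
            current = "subject"
            if relation:
                triplets.append((subject.strip(), relation.strip(), object_.strip()))
                relation = ""
            subject = ""
        elif token == "<subj>":
            # the tail entity follows the <subj> marker
            current = "object"
            if relation:
                triplets.append((subject.strip(), relation.strip(), object_.strip()))
            object_ = ""
        elif token == "<obj>":
            # the relation label follows the <obj> marker
            current = "relation"
            relation = ""
        elif current == "subject":
            subject += " " + token
        elif current == "object":
            object_ += " " + token
        elif current == "relation":
            relation += " " + token

    # flush the last triplet
    if subject and relation and object_:
        triplets.append((subject.strip(), relation.strip(), object_.strip()))
    return triplets


# Hand-written sample in REBEL's output format (not actual model output).
sample = (
    "<s><triplet> Seattle <subj> Washington <obj> located in the administrative "
    "territorial entity <triplet> Space Needle <subj> Seattle <obj> located in "
    "the administrative territorial entity</s>"
)
parsed = parse_rebel_output(sample)
for head, rel, tail in parsed:
    print(f"({head}) -[{rel}]-> ({tail})")
```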

In [12]:
os.environ["NEO4J_URI"] = os.getenv('NEO4J_URI')
os.environ["NEO4J_USERNAME"] = os.getenv('NEO4J_USERNAME')
os.environ["NEO4J_PASSWORD"] = os.getenv('NEO4J_PASSWORD')
os.environ["NEO4J_DB"] = os.getenv('NEO4J_DB')

graph_store = Neo4jGraphStore(
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    url=os.environ["NEO4J_URI"],
    database=os.environ["NEO4J_DB"],
)

graph_storage_context = StorageContext.from_defaults(graph_store=graph_store)

In [13]:
from llama_index.agent import OpenAIAgent

# Build agents dictionary
agents = {}

for wiki_title in wiki_titles:
    # build vector index
    vector_index = VectorStoreIndex.from_documents(
        city_docs[wiki_title], service_context=service_context, storage_context=vector_storage_context
    )
    # build summary index
    summary_index = SummaryIndex.from_documents(
        city_docs[wiki_title], service_context=service_context
    )
    # build knowledge graph index, using REBEL for triplet extraction
    graph_index = KnowledgeGraphIndex.from_documents(
        city_docs[wiki_title],
        storage_context=graph_storage_context,
        kg_triplet_extract_fn=extract_triplets,
        service_context=ServiceContext.from_defaults(llm=llm, chunk_size=256)
    )
    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    list_query_engine = summary_index.as_query_engine()
    graph_query_engine = graph_index.as_query_engine()

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=f"Useful for retrieving specific context from {wiki_title}",
            ),
        ),
        QueryEngineTool(
            query_engine=list_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=f"Useful for summarization questions related to {wiki_title}",
            ),
        ),
        QueryEngineTool(
            query_engine=graph_query_engine,
            metadata=ToolMetadata(
                name="graph_tool",
                description=f"Useful for retrieving structural, interconnected and relational knowledge related to {wiki_title}",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-3.5-turbo-0613")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
    )

    agents[wiki_title] = agent

Upserted vectors:   0%|          | 0/17 [00:00<?, ?it/s]

Upserted vectors:   0%|          | 0/17 [00:00<?, ?it/s]

Upserted vectors:   0%|          | 0/23 [00:00<?, ?it/s]

## Build Recursive Retriever over these Agents

We define a set of summary nodes, one per Wikipedia city article. We then place a RecursiveRetriever in front of these nodes to route each query to the most relevant node, which in turn directs the query to the corresponding document agent.
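
The routing idea can be sketched without any library. The toy below is not how LlamaIndex implements it: a simple word-overlap score stands in for embedding similarity, and plain functions stand in for the document agents. All names (`toy_summaries`, `toy_agents`, `recursive_retrieve`) are hypothetical.

```python
import re


def keyword_overlap(query: str, text: str) -> int:
    """Toy relevance score: number of lowercase words shared by query and text."""
    query_words = set(re.findall(r"\w+", query.lower()))
    text_words = set(re.findall(r"\w+", text.lower()))
    return len(query_words & text_words)


titles = ["Seattle", "Boston", "Chicago"]

# One summary node per city, keyed by the id of the agent it points to.
toy_summaries = {
    title: f"This content contains Wikipedia articles about {title}."
    for title in titles
}

# Stand-ins for the per-city document agents.
toy_agents = {
    title: (lambda question, title=title: f"[{title} agent] answering: {question}")
    for title in titles
}


def recursive_retrieve(question: str):
    """Pick the best-matching summary node, then hand the question to its agent."""
    best = max(toy_summaries, key=lambda t: keyword_overlap(question, toy_summaries[t]))
    return best, toy_agents[best](question)


routed_to, answer = recursive_retrieve("What are the sports teams in Boston?")
print(routed_to, "|", answer)
```

Retrieving only the single best node mirrors `similarity_top_k=1` in the notebook, which is why each summary says not to use the index for multi-city questions.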

In [14]:
# define top-level nodes
nodes = []
for wiki_title in wiki_titles:
    wiki_summary = (
        f"This content contains Wikipedia articles about {wiki_title}. "
        f"Use this index if you need to lookup specific facts about {wiki_title}.\n"
        "Do not use this index if you want to analyze multiple cities."
    )
    node = IndexNode(text=wiki_summary, index_id=wiki_title)
    nodes.append(node)

In [15]:
# define top-level retriever
top_vector_index = VectorStoreIndex(nodes)
vector_retriever = top_vector_index.as_retriever(similarity_top_k=1)

In [16]:
# define recursive retriever
recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever},
    query_engine_dict=agents,
    verbose=True,
)

response_synthesizer = get_response_synthesizer(
    response_mode="compact",
)
retriever_query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever,
    response_synthesizer=response_synthesizer,
    service_context=service_context,
)

## Setup sub question query engine

In this step, we wrap the recursive retriever, which encapsulates the document agents and their query engines, as a tool and pass it to the SubQuestionQueryEngine. The engine decomposes a complex, multi-faceted query into sub-questions, answers each one through the recursive retriever tool, and synthesizes the results into a single response that draws on multiple documents.
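
The decompose-then-answer flow can be sketched as follows. This is a toy under loud assumptions: in the real engine an LLM generates the sub-questions, whereas the hypothetical `decompose` below just splits on " and ", and the hypothetical `answer` coroutine stands in for a tool call. The concurrent `asyncio.gather` step is meant to illustrate what `use_async=True` makes possible, not the engine's actual internals.

```python
import asyncio
from typing import List, Tuple


def decompose(query: str) -> List[str]:
    """Hypothetical stand-in for the LLM question generator: split on ' and '."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


async def answer(sub_question: str) -> str:
    """Hypothetical stand-in for a tool call that answers one sub-question."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"answer to: {sub_question}"


async def run_sub_questions(query: str) -> List[Tuple[str, str]]:
    """Answer all sub-questions concurrently and pair each with its answer."""
    sub_questions = decompose(query)
    answers = await asyncio.gather(*(answer(q) for q in sub_questions))
    return list(zip(sub_questions, answers))


# In a plain script asyncio.run works directly; inside a notebook it relies on
# nest_asyncio.apply(), which this notebook calls at the top.
pairs = asyncio.run(
    run_sub_questions(
        "Tell me about the sports teams in Boston and the positive aspects of Seattle"
    )
)
for sub_question, sub_answer in pairs:
    print(sub_question, "->", sub_answer)
```

A final synthesis step would then combine the question-answer pairs into one response.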

In [17]:
# convert the recursive retriever into a tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=retriever_query_engine,
        metadata=ToolMetadata(
            name="recursive_retriever",
            description="Recursive retriever for accessing documents"
        ),
    ),
]

In [18]:
# setup sub question query engine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
    use_async=True,
)

In [20]:
response = query_engine.query(
    "Tell me about the sports teams in Boston and the positive aspects of Seattle"
)
print(response)

Generated 2 sub questions.
[recursive_retriever] Q: What are the sports teams in Boston?
Retrieving with query id None: What are the sports teams in Boston?
Retrieved node with id, entering: Boston
Retrieving with query id Boston: What are the sports teams in Boston?
=== Calling Function ===
Calling function: vector_tool with args: {
  "input": "sports teams in Boston"
}
Got output: Boston has teams in the four major North American men's professional sports leagues, which are Major League Baseball, the National Football League, the National Basketball Association, and the National Hockey League. Additionally, Boston has a team in Major League Soccer.
Got response: The sports teams in Boston include:

1. Boston Red Sox: The Boston Red Sox are a professional baseball team and a member of Major League Baseball (MLB). They play their home games at Fenway Park, which is the oldest ballpark in MLB.

2. New Englan