# Composable Objects

In this notebook, we show how you can combine multiple objects into a single top-level index.

This approach works by setting up `IndexNode` objects, each with an `obj` field that points to one of the following:
- a query engine
- a retriever
- a query pipeline
- another node

```python
# Named `obj_node` rather than `object` to avoid shadowing the Python built-in.
obj_node = IndexNode(index_id="my_object", obj=query_engine, text="some text about this object")
```
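To make the idea of an `obj` pointer concrete, here is a minimal sketch of how such pointers could be followed at query time. It does not use LlamaIndex and is not its implementation: `ToyNode`, `ToyIndexNode`, `ToyRetriever`, `ToyQueryEngine`, and `resolve` are hypothetical stand-ins invented for illustration, and query pipelines are left out for brevity.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a plain text node."""
    text: str


@dataclass
class ToyIndexNode:
    """Hypothetical stand-in for IndexNode: text plus a pointer to another object."""
    index_id: str
    text: str
    obj: Any = None


class ToyRetriever:
    """Hypothetical retriever that returns canned nodes."""

    def __init__(self, texts: List[str]) -> None:
        self._texts = texts

    def retrieve(self, query: str) -> List[ToyNode]:
        return [ToyNode(text=t) for t in self._texts]


class ToyQueryEngine:
    """Hypothetical query engine that returns a canned answer string."""

    def query(self, query: str) -> str:
        return f"answer to: {query}"


def resolve(node: Any, query: str) -> List[Any]:
    """Follow `obj` pointers until plain nodes come out."""
    if isinstance(node, ToyIndexNode) and node.obj is not None:
        return resolve(node.obj, query)  # node -> whatever it points to
    if hasattr(node, "retrieve"):
        results: List[Any] = []
        for child in node.retrieve(query):
            results.extend(resolve(child, query))  # retrieved nodes may point further
        return results
    if hasattr(node, "query"):
        return [ToyNode(text=node.query(query))]  # wrap the answer as a node
    return [node]  # a plain node: nothing left to follow


inner = ToyIndexNode("inner", "wraps a retriever", obj=ToyRetriever(["chunk A", "chunk B"]))
outer = ToyIndexNode("outer", "wraps another node", obj=inner)
engine_node = ToyIndexNode("engine", "wraps a query engine", obj=ToyQueryEngine())

print([n.text for n in resolve(outer, "q")])        # ['chunk A', 'chunk B']
print([n.text for n in resolve(engine_node, "q")])  # ['answer to: q']
```

The `outer` node shows the "another node" case: it points to `inner`, which in turn points to a retriever, and resolution keeps going until plain nodes come back.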

## Data Setup

In [None]:
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "./llama2.pdf"
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/1706.03762.pdf" -O "./attention.pdf"

In [None]:
from llama_index import download_loader

PyMuPDFReader = download_loader("PyMuPDFReader")

llama2_docs = PyMuPDFReader().load_data(
    file_path="./llama2.pdf", metadata=True
)
attention_docs = PyMuPDFReader().load_data(
    file_path="./attention.pdf", metadata=True
)

## Retriever Setup

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

In [None]:
from llama_index.node_parser import TokenTextSplitter

nodes = TokenTextSplitter(
    chunk_size=1024, chunk_overlap=128
).get_nodes_from_documents(llama2_docs + attention_docs)
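The `chunk_size` and `chunk_overlap` settings above can be pictured as a sliding window over tokens. The sketch below illustrates only that idea: `split_with_overlap` is a hypothetical helper, the tokens are placeholder strings rather than real tokenizer output, and the actual `TokenTextSplitter` may handle chunk boundaries differently.

```python
from typing import List


def split_with_overlap(
    tokens: List[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Slide a window of `chunk_size` tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Placeholder tokens (an assumption; a real splitter would use a tokenizer).
tokens = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
# ['t0', 't1', 't2', 't3']
# ['t3', 't4', 't5', 't6']
# ['t6', 't7', 't8', 't9']
```

Each chunk repeats the last token of the previous one, so text that straddles a boundary appears in both neighbors. With `chunk_size=1024` and `chunk_overlap=128`, the window would advance 896 tokens at a time under this scheme.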

In [None]:
from llama_index import VectorStoreIndex
from llama_index.retrievers import BM25Retriever

index = VectorStoreIndex(nodes=nodes)
vector_retriever = index.as_retriever(similarity_top_k=2)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=2)

## Composing Objects

Here, we construct the `IndexNode` objects. Note that the `text` of each node is what the top-level index uses to index it.

For a vector index, the text is embedded; for a keyword index, the text is used for keywords.

In this example, the `SummaryIndex` is used, which does not technically need the text for retrieval, since it always retrieves all nodes.
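The difference between an index that uses the text and one that ignores it can be sketched in plain Python. This is a toy illustration, not LlamaIndex code: `summary_select` and `keyword_select` are hypothetical functions, and the keyword match is a naive word overlap chosen only to show the contrast.

```python
from typing import Dict, List

# Hypothetical object descriptions, mirroring the `text` of each IndexNode.
object_texts: Dict[str, str] = {
    "vector": "Vector Retriever",
    "bm25": "BM25 Retriever",
}


def summary_select(query: str, texts: Dict[str, str]) -> List[str]:
    """Summary-style selection: ignore the text and return every object."""
    return list(texts)


def keyword_select(query: str, texts: Dict[str, str]) -> List[str]:
    """Keyword-style selection: keep objects whose text shares a word with the query."""
    query_words = set(query.lower().split())
    return [
        index_id
        for index_id, text in texts.items()
        if query_words & set(text.lower().split())
    ]


query = "How does bm25 rank documents?"
print(summary_select(query, object_texts))  # ['vector', 'bm25']
print(keyword_select(query, object_texts))  # ['bm25']
```

With a text-sensitive top-level index, short labels such as "Vector Retriever" give little to match on, so more descriptive text would likely matter there.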

In [None]:
from llama_index.schema import IndexNode

vector_obj = IndexNode(
    index_id="vector", obj=vector_retriever, text="Vector Retriever"
)
bm25_obj = IndexNode(
    index_id="bm25", obj=bm25_retriever, text="BM25 Retriever"
)

In [None]:
from llama_index import SummaryIndex

summary_index = SummaryIndex(objects=[vector_obj, bm25_obj])

## Querying

When we query, every object is retrieved and run, and the nodes they return are used to synthesize the final answer.

Using the `tree_summarize` response mode with `aquery()` lets the underlying calls run concurrently, which typically gives faster responses.
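The benefit of concurrent execution can be seen with a small `asyncio` sketch. `fake_retrieve` is a hypothetical stand-in for an async retriever or LLM call, and the 0.2-second delays are arbitrary. The sketch is written as a standalone script using `asyncio.run`; inside a notebook, where an event loop is already running, you would `await` the coroutines directly, as the cells in this section do.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple


async def fake_retrieve(name: str, delay: float) -> str:
    """Hypothetical stand-in for an async retriever or LLM call."""
    await asyncio.sleep(delay)  # simulates network latency
    return f"{name} done"


async def run_sequential() -> List[str]:
    # Each call waits for the previous one to finish.
    return [
        await fake_retrieve("vector", 0.2),
        await fake_retrieve("bm25", 0.2),
    ]


async def run_concurrent() -> List[str]:
    # gather runs both coroutines concurrently and preserves argument order.
    results = await asyncio.gather(
        fake_retrieve("vector", 0.2),
        fake_retrieve("bm25", 0.2),
    )
    return list(results)


def timed(coro_fn: Callable[[], Awaitable[List[str]]]) -> Tuple[List[str], float]:
    """Run a coroutine function to completion and report wall-clock time."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


sequential_result, sequential_time = timed(run_sequential)
concurrent_result, concurrent_time = timed(run_concurrent)
print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Both versions return the same results in the same order; the concurrent one takes roughly as long as the slowest call rather than the sum of all calls.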

In [None]:
query_engine = summary_index.as_query_engine(response_mode="tree_summarize")

In [None]:
response = await query_engine.aquery(
    "How does attention work in transformers?"
)

In [None]:
print(str(response))

Attention in transformers works by mapping a query and a set of key-value pairs to an output. The output is computed as a weighted sum of the values, where the weights are determined by the similarity between the query and the keys. In the transformer model, attention is used in three different ways: 

1. Encoder-decoder attention: The queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence.

2. Self-attention in the encoder: Each position in the encoder can attend to all positions in the previous layer of the encoder.

3. Self-attention in the decoder: Each position in the decoder can attend to all positions in the decoder up to and including that position. To preserve the auto-regressive property, leftward information flow in the decoder is prevented by masking out illegal connections.

By using multi-head attention, the transformer mo

In [None]:
response = await query_engine.aquery(
    "What is the architecture of Llama2 based on?"
)

In [None]:
print(str(response))

The architecture of Llama 2 is based on the transformer model.


In [None]:
response = await query_engine.aquery(
    "What was used before attention in transformers?"
)

In [None]:
print(str(response))

Recurrent neural networks, such as long short-term memory (LSTM) and gated recurrent neural networks (GRU), were commonly used before attention in transformers. These models were widely used for sequence modeling and transduction tasks like language modeling and machine translation.
