# Dense-X-Retrieval Pack

This notebook walks through using the `DenseXRetrievalPack`, which parses documents into nodes and then generates propositions from each node to assist with retrieval.

This follows the idea from the paper [Dense X Retrieval: What Retrieval Granularity Should We Use?](https://arxiv.org/abs/2312.06648).

From the paper, a proposition is described as:

```
Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid and presented in a concise, self-contained natural language format.
```

We use the provided OpenAI prompt from their paper to generate propositions, which are then embedded and used to retrieve their parent node chunks.
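
To make the proposition-to-parent mapping concrete, here is a minimal, self-contained sketch. It is not the pack's implementation: the sample chunks, the propositions, and the function names are hypothetical, and a simple token-overlap count stands in for embedding similarity. It only illustrates the idea of ranking small propositions and then returning the larger chunks they came from.

```python
import re
from collections import Counter

# Hypothetical toy data: each proposition remembers the chunk it was derived from.
PARENT_CHUNKS = {
    "chunk-1": "The Transformer relies entirely on attention. "
               "It dispenses with recurrence and convolutions.",
    "chunk-2": "The model was trained on eight GPUs. "
               "Training the base model took twelve hours.",
}
PROPOSITIONS = [
    ("The Transformer relies entirely on attention.", "chunk-1"),
    ("The Transformer dispenses with recurrence and convolutions.", "chunk-1"),
    ("The model was trained on eight GPUs.", "chunk-2"),
    ("Training the base model took twelve hours.", "chunk-2"),
]


def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, text):
    """Stand-in for embedding similarity: number of shared tokens."""
    return sum((tokenize(query) & tokenize(text)).values())


def retrieve_parent_chunks(query, top_k=2):
    """Rank propositions against the query, then return their parent chunks."""
    ranked = sorted(
        PROPOSITIONS,
        key=lambda prop: overlap_score(query, prop[0]),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked[:top_k]:
        # Several propositions can share a parent; keep each chunk once.
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [PARENT_CHUNKS[parent_id] for parent_id in parent_ids]


chunks = retrieve_parent_chunks("Does the Transformer use recurrence or convolutions?")
print(chunks)
```

Both top-ranked propositions in this example point at the same parent, so a single chunk is returned. The query is matched against short, focused statements, while the answer is synthesized from the fuller parent context.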

In [3]:
!pip install python-dotenv llama-index llama-hub


In [9]:
import nest_asyncio

# Allow nested event loops so async code can run inside the notebook
nest_asyncio.apply()

## Setup

In [4]:
from dotenv import load_dotenv

# Load the environment variables
load_dotenv()

True
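
The OpenAI models used later in this notebook read their credentials from the `OPENAI_API_KEY` environment variable, so it can help to confirm that `load_dotenv()` picked it up before making any calls. A small sketch follows; `missing_env_vars` is a hypothetical helper, not part of python-dotenv or LlamaIndex.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```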

For this demo we use a simple `PDFReader` to read and extract the documents. As an alternative, the second cell below uses the more advanced `UnstructuredReader` to extract complete documents from the PDF file.

In [5]:
from pathlib import Path
from llama_index import download_loader

PDFReader = download_loader("PDFReader")

loader = PDFReader()
documents = loader.load_data(file=Path('Attention is all you need.pdf'))

In [None]:
from llama_hub.file.unstructured import UnstructuredReader

documents = UnstructuredReader().load_data("data/Attention is all you need.pdf")

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\edumu\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\edumu\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


In [6]:
documents

[Document(id_='b1caeae3-4e2e-4f69-bbfc-068e9e6dbe9b', embedding=None, metadata={'page_label': '1', 'file_name': 'Attention is all you need.pdf'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='215a4e4cfd187b7f1951b4796bf528de8e3c7a090794c107967fc03077f7dc5c', text='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propos

## Run the DenseXRetrievalPack

The `DenseXRetrievalPack` creates both a retriever and a query engine.

First, we download the pack.

In [7]:
from llama_index.llama_pack import download_llama_pack

DenseXRetrievalPack = download_llama_pack("DenseXRetrievalPack", "./dense_pack")

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Now we create the retriever and the query engine from the `DenseXRetrievalPack`, using `gpt-3.5-turbo` as the LLM for both proposition extraction and query answering.

In [10]:
from llama_index.llms import OpenAI
from llama_index.text_splitter import SentenceSplitter

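dense_pack = DenseXRetrievalPack(
    documents,
    # LLM that extracts propositions from each node
    proposition_llm=OpenAI(model="gpt-3.5-turbo", max_tokens=750),
    # LLM that synthesizes the final answer
    query_llm=OpenAI(model="gpt-3.5-turbo", max_tokens=256),
    # Splits the documents into the parent node chunks
    text_splitter=SentenceSplitter(chunk_size=1024),
)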
dense_query_engine = dense_pack.query_engine

100%|██████████| 15/15 [00:41<00:00,  2.75s/it]


Generating embeddings:   0%|          | 0/304 [00:00<?, ?it/s]

Let's create a base query engine to compare the results.

In [16]:
from llama_index import VectorStoreIndex

base_index = VectorStoreIndex.from_documents(documents)
base_query_engine = base_index.as_query_engine()
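
When comparing engines, it can be convenient to send the same question to each of them in one call. The sketch below assumes only that every engine exposes a `.query()` method whose result converts to text with `str()`; `ask_all` and `StubEngine` are hypothetical names, and the stub replaces the real engines so the example runs without an API key.

```python
class StubEngine:
    """Hypothetical stand-in that mimics the `.query()` interface."""

    def __init__(self, answer):
        self.answer = answer

    def query(self, question):
        return self.answer


def ask_all(engines, question):
    """Send one question to every engine and return {name: answer text}."""
    return {name: str(engine.query(question)) for name, engine in engines.items()}


answers = ask_all(
    {
        "dense": StubEngine("answer built from propositions"),
        "base": StubEngine("answer built from raw chunks"),
    },
    "How are transformers related to convolutional neural networks?",
)
for name, text in answers.items():
    print(f"[{name}] {text}")
```

In the notebook itself, the dictionary would presumably hold `dense_query_engine` and `base_query_engine` in place of the stubs.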

## Solve a Query

### How are transformers related to convolutional neural networks?

In [13]:
response = dense_query_engine.query("How are transformers related to convolutional neural networks?")

In [15]:
print(response.response)

'Transformers are related to convolutional neural networks (CNNs) in that they both are used in sequence transduction models. However, transformers differ from CNNs in their architecture. While CNNs use convolutional layers to compute hidden representations in parallel for all input and output positions, transformers rely entirely on attention mechanisms to draw global dependencies between input and output. This allows transformers to be more parallelizable and requires less time to train compared to CNN-based models.'

In [17]:
response = base_query_engine.query("How are transformers related to convolutional neural networks?")
print(response.response)

Transformers are related to convolutional neural networks (CNNs) in that they both can be used as building blocks in sequence transduction models. However, transformers differ from CNNs in their architecture. While CNNs use convolutional layers to compute hidden representations in parallel for all input and output positions, transformers rely entirely on an attention mechanism to draw global dependencies between input and output. This allows transformers to achieve more parallelization and reduce the number of operations required to relate signals from different positions.
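
The two answers read similarly, so the retrieved context is often more informative than the generated text. The following sketch lists the chunks behind a response. It assumes the response object carries a `source_nodes` list whose items expose `score` and `node.get_content()`, as LlamaIndex responses did in the 0.9.x releases used here; `summarize_sources` is a hypothetical helper, and a stand-in response is used so the block runs on its own.

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=60):
    """Return (score, snippet) pairs for the chunks behind a response."""
    rows = []
    for source in getattr(response, "source_nodes", []):
        # Collapse newlines so each snippet prints on a single line.
        text = source.node.get_content().replace("\n", " ")
        rows.append((source.score, text[:max_chars]))
    return rows


# Stand-in response (assumption: mirrors the attributes described above).
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            score=0.83,
            node=SimpleNamespace(get_content=lambda: "Attention Is All\nYou Need"),
        )
    ]
)

for score, snippet in summarize_sources(fake_response):
    print(f"{score:.2f}  {snippet}")
```

Calling `summarize_sources(response)` on the dense and base responses in turn should show which chunks each engine relied on, provided those attributes are available in the installed version.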
