# Dense X Retrieval Pack

This notebook walks through using the `DenseXRetrievalPack`, which parses documents into nodes and then generates propositions from each node to assist with retrieval.

This follows the idea from the paper [Dense X Retrieval: What Retrieval Granularity Should We Use?](https://arxiv.org/abs/2312.06648).

From the paper, a proposition is described as:

```
Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid and presented in a concise, self-contained natural language format.
```

We use the OpenAI prompt provided in the paper to generate propositions. The propositions are then embedded, and at query time the matching propositions are used to retrieve their parent node chunks.
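To make the proposition-to-parent lookup concrete, here is a minimal, self-contained sketch. It is not the pack's implementation: the chunks and propositions are hand-written toy data, `embed` is a bag-of-words stand-in for a real embedding model, and `retrieve_parents` is a hypothetical helper name. It only illustrates the idea that small propositions are matched against the query while the larger parent chunks are what gets returned.

```python
import math
import re
from collections import Counter

# Toy parent chunks (stand-ins for the nodes a text splitter would produce).
PARENT_CHUNKS = {
    "node-a": "The bakery opens at 7 am. It sells sourdough bread and croissants.",
    "node-b": "The library closes at 9 pm. Members can borrow up to ten books.",
}

# Hand-written propositions, each linked to the id of its parent chunk.
# In the pack these come from an LLM prompt; here they are assumed.
PROPOSITIONS = [
    ("The bakery opens at 7 am.", "node-a"),
    ("The bakery sells sourdough bread.", "node-a"),
    ("The bakery sells croissants.", "node-a"),
    ("The library closes at 9 pm.", "node-b"),
    ("Library members can borrow up to ten books.", "node-b"),
]


def embed(text):
    """Bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_parents(query, top_k=2):
    """Rank propositions against the query, then return their unique parent chunks."""
    q = embed(query)
    ranked = sorted(PROPOSITIONS, key=lambda p: cosine(q, embed(p[0])), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:top_k]:
        if parent_id not in parent_ids:  # several propositions may share one parent
            parent_ids.append(parent_id)
    return [PARENT_CHUNKS[pid] for pid in parent_ids]


print(retrieve_parents("What does the bakery sell?"))
```

Because several top-ranked propositions can point at the same parent, the helper deduplicates parent ids before returning chunks, so the answer-synthesis step would see each chunk only once.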

## Setup

In [1]:
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

import nest_asyncio
nest_asyncio.apply()

In [1]:
!mkdir -p 'data/'
!curl 'https://arxiv.org/pdf/2307.09288.pdf' -o 'data/llama2.pdf'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.0M  100 13.0M    0     0  1160k      0  0:00:11  0:00:11 --:--:-- 1574k


In [2]:
from llama_hub.file.unstructured import UnstructuredReader

documents = UnstructuredReader().load_data("data/llama2.pdf")

[nltk_data] Downloading package punkt to /home/loganm/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/loganm/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


## Run the DenseXRetrievalPack

The `DenseXRetrievalPack` takes the loaded documents, splits them into nodes, generates propositions for each node, and embeds those propositions for retrieval.

In [None]:
# Optional: download the pack's source into ./dense_pack instead of importing it from llama_hub
# from llama_index.llama_pack import download_llama_pack

# DenseXRetrievalPack = download_llama_pack("DenseXRetrievalPack", "./dense_pack")

In [3]:
from llama_hub.llama_packs.dense_x_retrieval.base import DenseXRetrievalPack
from llama_index.text_splitter import SentenceSplitter

In [4]:
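# Build the pack: split the documents into chunks of up to 1024 tokens,
# then generate and embed propositions for each chunk
dense_pack = DenseXRetrievalPack(
    documents, text_splitter=SentenceSplitter(chunk_size=1024)
)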

100%|██████████| 90/90 [03:00<00:00,  2.01s/it]
Generating embeddings: 100%|██████████| 2190/2190 [00:14<00:00, 148.07it/s]


In [5]:
response = dense_pack.run("How was Llama2 pretrained?")

In [6]:
# display the response as markdown
from IPython.display import Markdown, display

display(Markdown(str(response)))

Llama 2 was pretrained using an optimized auto-regressive transformer. The training process involved several improvements to enhance performance, such as more robust data cleaning, updated data mixes, training on 40% more total tokens, doubling the context length, and utilizing grouped-query attention (GQA) to improve inference scalability for larger models.