# Auto Merging Retriever

In this notebook, we showcase our `AutoMergingRetriever`, which takes a set of retrieved leaf nodes and recursively "merges" subsets of them into their parent node whenever the share of that parent's children that were retrieved exceeds a given threshold. This allows us to consolidate potentially disparate, smaller contexts into a larger context that might help synthesis.
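
The merging rule can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the `parent_of` and `children_of` mappings, the helper names `merge_once` and `auto_merge`, and the `ratio_thresh` value of 0.5 are all assumptions made for this example.

```python
from collections import defaultdict


def merge_once(retrieved, parent_of, children_of, ratio_thresh=0.5):
    """Replace groups of retrieved ids with their parent id where enough siblings were hit.

    Hypothetical helper: `retrieved` is a list of unique node ids, `parent_of` maps a
    node id to its parent id, and `children_of` maps a parent id to all of its child ids.
    Returns the new id list and a flag saying whether anything was merged.
    """
    # Group the retrieved ids by their parent.
    by_parent = defaultdict(list)
    for node_id in retrieved:
        parent = parent_of.get(node_id)
        if parent is not None:
            by_parent[parent].append(node_id)

    # Parents whose retrieved share of children exceeds the threshold.
    to_merge = {
        parent
        for parent, kids in by_parent.items()
        if len(kids) / len(children_of[parent]) > ratio_thresh
    }

    # Swap merged children for their parent, keeping order and dropping duplicates.
    merged, seen = [], set()
    for node_id in retrieved:
        parent = parent_of.get(node_id)
        target = parent if parent in to_merge else node_id
        if target not in seen:
            seen.add(target)
            merged.append(target)
    return merged, bool(to_merge)


def auto_merge(retrieved, parent_of, children_of, ratio_thresh=0.5):
    """Apply merge_once repeatedly until a pass merges nothing."""
    changed = True
    while changed:
        retrieved, changed = merge_once(retrieved, parent_of, children_of, ratio_thresh)
    return retrieved


# Toy hierarchy: root -> A, B; A -> a1, a2, a3; B -> b1, b2, b3
parent_of = {
    "a1": "A", "a2": "A", "a3": "A",
    "b1": "B", "b2": "B", "b3": "B",
    "A": "root", "B": "root",
}
children_of = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "root": ["A", "B"]}

print(auto_merge(["a1", "a2", "b1"], parent_of, children_of))  # ['A', 'b1']
```

In the toy run, two of A's three children were retrieved (2/3 > 0.5), so they collapse into `A`, while the single hit under `B` stays a leaf. `A` alone covers only half of the root's children, which does not exceed the threshold, so merging stops there.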

You can define this hierarchy yourself over a set of documents, or you can use our new text parser, `RecursiveNodeParser`, which takes a candidate set of documents and outputs an entire hierarchy of nodes, from coarse to fine.
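
To make "coarse to fine" concrete, here is a small sketch of how such a hierarchy could be built by hand. It is an assumption-laden toy, not the parser's actual algorithm: `ToyNode` and `build_hierarchy` are hypothetical names, and the text is split by fixed character counts purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a parsed node; not a llama_index class."""

    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


def build_hierarchy(text: str, chunk_sizes: List[int]) -> List[ToyNode]:
    """Split `text` into nested character chunks, coarsest level first."""
    nodes: List[ToyNode] = []

    def split(chunk: str, level: int, parent: Optional[ToyNode]) -> None:
        size = chunk_sizes[level]
        for start in range(0, len(chunk), size):
            piece = chunk[start : start + size]
            node = ToyNode(node_id=f"n{len(nodes)}", text=piece)
            if parent is not None:
                # Record the parent/child link in both directions.
                node.parent_id = parent.node_id
                parent.child_ids.append(node.node_id)
            nodes.append(node)
            # Recurse into finer levels until the last chunk size is reached.
            if level + 1 < len(chunk_sizes):
                split(piece, level + 1, node)

    split(text, 0, None)
    return nodes


toy_nodes = build_hierarchy("abcdefghijkl", chunk_sizes=[8, 4])
toy_leaves = [n for n in toy_nodes if not n.child_ids]
print(len(toy_nodes), len(toy_leaves))  # 5 3
```

Every node in the returned list knows its parent and children, so a retriever can index only the leaves and still walk up to a coarser chunk when needed.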

## Load Data

In [4]:
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

--2023-08-26 20:54:41--  https://arxiv.org/pdf/2307.09288.pdf
Resolving arxiv.org (arxiv.org)... 128.84.21.199
Connecting to arxiv.org (arxiv.org)|128.84.21.199|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13661300 (13M) [application/pdf]
Saving to: ‘data/llama2.pdf’


2023-08-26 20:59:27 (47.0 KB/s) - ‘data/llama2.pdf’ saved [13661300/13661300]



In [1]:
from pathlib import Path
from llama_hub.file.pdf.base import PDFReader
from llama_hub.file.unstructured.base import UnstructuredReader
from llama_hub.file.deepdoctection.base import DeepDoctectionReader
from llama_hub.file.pdf_miner.base import PDFMinerReader
from llama_index import SimpleDirectoryReader

In [52]:
!pip install deepdoctection timm pdfminer


In [57]:
# PDFReader = download_loader("PDFReader")
loader = PDFReader()
docs0 = loader.load_data(file=Path('./data/llama2.pdf'))

# loader = UnstructuredReader()
# docs = loader.load_data(file=Path('./data/llama2.pdf'))

# loader = DeepDoctectionReader()
# docs = loader.load_data(file=Path('./data/llama2.pdf'))

# loader = PDFMinerReader()
# docs = loader.load_data(file=Path('./data/llama2.pdf'))

In [58]:
# stitch docs together
from llama_index import Document

doc_text = "\n\n".join([d.get_content() for d in docs0])
docs = [Document(text=doc_text)]

In [60]:
# reader = SimpleDirectoryReader(input_files=["../data/paul_graham/paul_graham_essay.txt"])
# docs = reader.load_data()

## Parse Chunk Hierarchy from Text, Load into Storage

In [61]:
from llama_index.node_parser import RecursiveNodeParser, SimpleNodeParser

In [62]:
# how to parse nodes

node_parser = RecursiveNodeParser.from_defaults()

In [63]:
nodes = node_parser.get_nodes_from_documents(docs)

In [64]:
len(nodes)

999

In [65]:
from typing import List

from llama_index.schema import NodeRelationship


def get_leaf_nodes(nodes: List) -> List:
    """Get leaf nodes."""
    leaf_nodes = []
    for node in nodes:
        if NodeRelationship.CHILD not in node.relationships:
            leaf_nodes.append(node)
    return leaf_nodes


def get_non_leaf_nodes(nodes: List) -> List:
    """Get non_leaf nodes."""
    leaf_nodes = get_leaf_nodes(nodes)
    leaf_node_ids = {n.node_id for n in leaf_nodes}

    non_leaf_nodes = []
    for node in nodes:
        if node.node_id not in leaf_node_ids:
            non_leaf_nodes.append(node)
    return non_leaf_nodes

In [66]:
leaf_nodes = get_leaf_nodes(nodes)

In [67]:
len(leaf_nodes)

783

In [68]:
# for i in range(2):
#     print(leaf_nodes[i].get_content())

### Load into Storage

In [69]:
# define storage context
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage import StorageContext

docstore = SimpleDocumentStore()

# insert nodes into docstore
docstore.add_documents(nodes)

# define storage context (will include vector store by default too)
storage_context = StorageContext.from_defaults(
    docstore=docstore
)

In [70]:
# load leaf nodes into a vector index
from llama_index import VectorStoreIndex

base_index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)

## Define Retriever

In [71]:
from llama_index.retrievers.auto_merging_retriever import AutoMergingRetriever

In [89]:
base_retriever = base_index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(
    base_retriever,
    storage_context,
    verbose=True
)

In [117]:
# query_str = "What were some lessons learned from red-teaming?"
query_str = "Can you tell me about the key concepts for safety finetuning"
# query_str = "Give me details on how safety violations were measured for Llama 2"

nodes = retriever.retrieve(query_str)
base_nodes = base_retriever.retrieve(query_str)

> Filling in node. Node id: 3b256786-958e-4626-9a02-ca9d00aace53> Node text: andusethedataforsupervisedfine-tuninginthesame
manner as described in Section 3.1.An example can ...

> Merging 3 nodes into parent node.
> Parent node id: 7d852fbe-3928-4ff1-9be9-8f80075d0102.
> Parent node text: We also ask the annotators to avoid negative user experience
categories (see Appendix A.5.2).The ...



In [118]:
# node = docstore.get_document('fb24e04e-9544-4659-bf82-1c90d5540b7e')
# print(node.get_content())

In [119]:
len(nodes)

5

In [120]:
from llama_index.response.notebook_utils import display_source_node

for node in nodes:
    display_source_node(node, source_length=10000)

**Node ID:** dec3035d-a5b6-4edd-acd1-5745499380c1<br>**Similarity:** 0.8706509346624604<br>**Text:** we describe our approach to safety fine-tuning, including safety categories, annotation
guidelines,andthetechniquesweusetomitigatesafetyrisks.Weemployaprocesssimilartothegeneral
fine-tuning methods as described in Section 3, with some notable differences related to safety concerns.Specifically,<br>

**Node ID:** 05d93601-1b11-4892-be36-f8507d8eb600<br>**Similarity:** 0.862506098281326<br>**Text:** we use the following techniques in safety fine-tuning:
1.Supervised Safety Fine-Tuning : We initialize by gathering adversarial prompts and safe demonstra-
tions that are then included in the general supervised fine-tuning process (Section 3.1).This teaches
themodeltoalignwithoursafetyguidelinesevenbeforeRLHF,andthuslaysthefoundationfor
high-quality human preference data annotation.2.Safety RLHF : Subsequently,<br>

**Node ID:** cbc408f8-1fa2-4653-81d3-7e66d1b688f2<br>**Similarity:** 0.8459963570749429<br>**Text:** 7
3 Fine-tuning 8
3.1 Supervised Fine-Tuning (SFT) .9
3.2 Reinforcement Learning with Human Feedback (RLHF) .9
3.3 System Message for Multi-Turn Consistency .16
3.4 RLHF Results .17
4 Safety 20
4.1 Safety in Pretraining .20
4.2 Safety Fine-Tuning .23
4.3 Red Teaming .28
4.<br>

**Node ID:** 2fd5840f-0c57-46c5-9ba5-ade16d3c7322<br>**Similarity:** 0.831324881683882<br>**Text:** thispapercontributesathoroughdescriptionofourfine-tuningmethodologyandapproachtoimproving
LLM safety.We hope that this openness will enable the community to reproduce fine-tuned LLMs and
continue to improve the safety of those models, paving the way for more responsible development of LLMs.<br>

**Node ID:** 7d852fbe-3928-4ff1-9be9-8f80075d0102<br>**Similarity:** 0.8248377708758811<br>**Text:** We also ask the annotators to avoid negative user experience
categories (see Appendix A.5.2).The guidelines are meant to be a general guide for the model and are
iteratively refined and revised to include newly identified risks.4.2.2 Safety Supervised Fine-Tuning
InaccordancewiththeestablishedguidelinesfromSection4.2.1,wegatherpromptsanddemonstrations
ofsafemodelresponsesfromtrainedannotators,andusethedataforsupervisedfine-tuninginthesame
manner as described in Section 3.1.An example can be found in Table 5.The annotators are instructed to initially come up with prompts that they think could potentially induce
themodel toexhibit unsafebehavior, i.e.perform redteaming, asdefined bythe guidelines.Subsequently,
annotators are tasked with crafting a safe and helpful response that the model should produce.4.2.3 Safety RLHF
Weobserveearlyinthedevelopmentof Llama 2-Chat thatitisabletogeneralizefromthesafedemonstrations
insupervisedfine-tuning.Themodelquicklylearnstowritedetailedsaferesponses,addresssafetyconcerns,
explainwhythetopicmightbesensitive,andprovideadditionalhelpfulinformation.Inparticular,when
the model outputs safe responses, they are often more detailed than what the average annotator writes.Therefore, after gathering only a few thousand supervised demonstrations, we switched entirely to RLHF to
teachthemodelhowtowritemorenuancedresponses.ComprehensivetuningwithRLHFhastheadded
benefit that it may make the model more robust to jailbreak attempts (Bai et al. 2022a).WeconductRLHFbyfirstcollectinghumanpreferencedataforsafetysimilartoSection3.2.2: annotators
writeapromptthattheybelievecanelicitunsafebehavior,andthencomparemultiplemodelresponsesto
theprompts,selectingtheresponsethatissafestaccordingtoasetofguidelines.Wethenusethehuman
preference data to train a safety reward model (see Section 3.2.2),<br>

In [121]:
for node in base_nodes:
    display_source_node(node, source_length=10000)

**Node ID:** dec3035d-a5b6-4edd-acd1-5745499380c1<br>**Similarity:** 0.8706509346624604<br>**Text:** we describe our approach to safety fine-tuning, including safety categories, annotation
guidelines,andthetechniquesweusetomitigatesafetyrisks.Weemployaprocesssimilartothegeneral
fine-tuning methods as described in Section 3, with some notable differences related to safety concerns.Specifically,<br>

**Node ID:** 05d93601-1b11-4892-be36-f8507d8eb600<br>**Similarity:** 0.862506098281326<br>**Text:** we use the following techniques in safety fine-tuning:
1.Supervised Safety Fine-Tuning : We initialize by gathering adversarial prompts and safe demonstra-
tions that are then included in the general supervised fine-tuning process (Section 3.1).This teaches
themodeltoalignwithoursafetyguidelinesevenbeforeRLHF,andthuslaysthefoundationfor
high-quality human preference data annotation.2.Safety RLHF : Subsequently,<br>

**Node ID:** cbc408f8-1fa2-4653-81d3-7e66d1b688f2<br>**Similarity:** 0.8459963570749429<br>**Text:** 7
3 Fine-tuning 8
3.1 Supervised Fine-Tuning (SFT) .9
3.2 Reinforcement Learning with Human Feedback (RLHF) .9
3.3 System Message for Multi-Turn Consistency .16
3.4 RLHF Results .17
4 Safety 20
4.1 Safety in Pretraining .20
4.2 Safety Fine-Tuning .23
4.3 Red Teaming .28
4.<br>

**Node ID:** 2fd5840f-0c57-46c5-9ba5-ade16d3c7322<br>**Similarity:** 0.831324881683882<br>**Text:** thispapercontributesathoroughdescriptionofourfine-tuningmethodologyandapproachtoimproving
LLM safety.We hope that this openness will enable the community to reproduce fine-tuned LLMs and
continue to improve the safety of those models, paving the way for more responsible development of LLMs.<br>

**Node ID:** 1f0dac7d-1b36-4a84-aa00-f172415b8460<br>**Similarity:** 0.8265816530219178<br>**Text:** We also ask the annotators to avoid negative user experience
categories (see Appendix A.5.2).The guidelines are meant to be a general guide for the model and are
iteratively refined and revised to include newly identified risks.4.2.2 Safety Supervised Fine-Tuning
InaccordancewiththeestablishedguidelinesfromSection4.2.1,wegatherpromptsanddemonstrations
ofsafemodelresponsesfromtrainedannotators,<br>

**Node ID:** eaa3d730-968c-4bd6-9521-1452ddcac01f<br>**Similarity:** 0.8230938887298445<br>**Text:** 3 Safety RLHF
Weobserveearlyinthedevelopmentof Llama 2-Chat thatitisabletogeneralizefromthesafedemonstrations
insupervisedfine-tuning.Themodelquicklylearnstowritedetailedsaferesponses,addresssafetyconcerns,
explainwhythetopicmightbesensitive,andprovideadditionalhelpfulinformation.Inparticular,when
the model outputs safe responses,<br>
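
A quick way to see what merging changed is to compare the node IDs in the two result lists. The sketch below hard-codes the IDs printed in the outputs above so that it runs on its own; in a live session they could presumably be collected from the retrieved results instead (for example via `n.node.node_id`, which is an assumption about the result objects).

```python
# Node IDs copied from the two outputs above.
merged_ids = [
    "dec3035d-a5b6-4edd-acd1-5745499380c1",
    "05d93601-1b11-4892-be36-f8507d8eb600",
    "cbc408f8-1fa2-4653-81d3-7e66d1b688f2",
    "2fd5840f-0c57-46c5-9ba5-ade16d3c7322",
    "7d852fbe-3928-4ff1-9be9-8f80075d0102",
]
base_ids = [
    "dec3035d-a5b6-4edd-acd1-5745499380c1",
    "05d93601-1b11-4892-be36-f8507d8eb600",
    "cbc408f8-1fa2-4653-81d3-7e66d1b688f2",
    "2fd5840f-0c57-46c5-9ba5-ade16d3c7322",
    "1f0dac7d-1b36-4a84-aa00-f172415b8460",
    "eaa3d730-968c-4bd6-9521-1452ddcac01f",
]

# Nodes that appear in both lists were left untouched by the merge.
shared = [i for i in base_ids if i in merged_ids]
# Leaf nodes that were absorbed into a parent.
only_in_base = [i for i in base_ids if i not in merged_ids]
# Parent nodes that the auto-merging retriever introduced.
only_in_merged = [i for i in merged_ids if i not in base_ids]

print(f"unchanged: {len(shared)}")
print(f"replaced leaves: {only_in_base}")
print(f"new parent nodes: {only_in_merged}")
```

Four nodes are identical in both lists. The two leaf chunks `1f0dac7d-...` and `eaa3d730-...` from the base retriever are replaced by the single parent `7d852fbe-...`, which is the parent ID reported in the verbose log. The log mentions three merged nodes, presumably these two leaves plus the filled-in node `3b256786-...`.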

## Plug it into Query Engine

In [122]:
from llama_index.query_engine import RetrieverQueryEngine

In [123]:
query_engine = RetrieverQueryEngine.from_args(retriever)
base_query_engine = RetrieverQueryEngine.from_args(base_retriever)

In [124]:
response = query_engine.query(query_str)

> Filling in node. Node id: 3b256786-958e-4626-9a02-ca9d00aace53> Node text: andusethedataforsupervisedfine-tuninginthesame
manner as described in Section 3.1.An example can ...

> Merging 3 nodes into parent node.
> Parent node id: 7d852fbe-3928-4ff1-9be9-8f80075d0102.
> Parent node text: We also ask the annotators to avoid negative user experience
categories (see Appendix A.5.2).The ...



In [125]:
print(str(response))

The key concepts for safety fine-tuning include supervised safety fine-tuning and safety RLHF (Reinforcement Learning with Human Feedback). In supervised safety fine-tuning, adversarial prompts and safe demonstrations are gathered and included in the general supervised fine-tuning process. This helps the model align with safety guidelines even before RLHF. Safety RLHF involves collecting human preference data by having annotators write prompts that they believe can elicit unsafe behavior. Multiple model responses are compared, and the safest response is selected according to a set of guidelines. This data is then used to train a safety reward model. These concepts are aimed at mitigating safety risks and improving the safety of language models.


In [126]:
base_response = base_query_engine.query(query_str)

In [127]:
print(str(base_response))

The key concepts for safety fine-tuning include supervised safety fine-tuning and safety RLHF (Reinforcement Learning with Human Feedback). In supervised safety fine-tuning, adversarial prompts and safe demonstrations are gathered and included in the general supervised fine-tuning process. This helps the model align with safety guidelines even before RLHF. Safety RLHF involves observing and generalizing from safe demonstrations in supervised fine-tuning. The model learns to write detailed safe responses, address safety concerns, explain sensitive topics, and provide additional helpful information. These concepts aim to mitigate safety risks and improve the safety of language models.


In [None]:
# autogenerate some questions
from llama_index.evaluation import DatasetGenerator

question_gen_query = (
    "You are a Teacher/ Professor. Your task is to setup "
    "a quiz/examination. Using the provided context from a "
    "report on the new Llama 2 model, formulate "
    "a single question that captures an important fact from the "
    "context. Restrict the question to the context information provided."
)
dataset_generator = DatasetGenerator.from_documents(
    docs,
    question_gen_query=question_gen_query,
)

In [None]:
# generate up to 40 questions
questions = dataset_generator.generate_questions_from_nodes(num=40)

In [None]:
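from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# NOTE: `answers` and `contexts` were not defined in the original notebook.
# As an assumption, build them from the auto-merging query engine's response above.
answers = [str(response)]
contexts = [[n.node.get_content() for n in response.source_nodes]]

ds = Dataset.from_dict(
    {
        "question": [query_str],
        "answer": answers,
        "contexts": contexts,
    }
)

result = evaluate(ds, [answer_relevancy, faithfulness])
print(result)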