References:

1.   https://docs.llamaindex.ai/en/stable/examples/retrievers/recursive_retriever_nodes/
2.   https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/MetadataReplacementDemo/



In [3]:
!pip install -U llama_index


In [4]:
import os
os.environ["DEEPSEEK_API_KEY"] = "sk-"
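
Hard-coding a key in a notebook cell makes it easy to leak. A minimal sketch of reading it from the environment instead, assuming the key was exported before the notebook was started; `require_api_key` is a hypothetical helper, not part of LlamaIndex:

```python
import os
from getpass import getpass


def require_api_key(name: str, prompt_if_missing: bool = False) -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper: optionally prompts for the key interactively
    instead of hard-coding it in the notebook.
    """
    value = os.environ.get(name, "").strip()
    if not value and prompt_if_missing:
        # getpass hides the typed key from the notebook output
        value = getpass(f"Enter {name}: ").strip()
        os.environ[name] = value
    if not value:
        raise RuntimeError(f"{name} is not set")
    return value
```

In this notebook it would be called as `require_api_key("DEEPSEEK_API_KEY", prompt_if_missing=True)` before the LLM is created.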

In [5]:
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "llama2.pdf"



# Basic RAG Review

In [6]:
from pathlib import Path
from llama_index.readers.file import PDFReader
from llama_index.core.response.notebook_utils import display_source_node
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import VectorStoreIndex, ServiceContext
import json



# Step 1: Loading Documents

In [7]:
loader = PDFReader()
docs0 = loader.load_data(file=Path("llama2.pdf"))

In [8]:
from llama_index.core import Document

doc_text = "\n\n".join([d.get_content() for d in docs0])
docs = [Document(text=doc_text)]

# Step 2: Parsing Documents into Text Chunks (Nodes)

In [9]:
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core.schema import IndexNode

In [10]:
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)

In [11]:
base_nodes = node_parser.get_nodes_from_documents(docs)

In [12]:
# set node ids to be a constant
for idx, node in enumerate(base_nodes):
    node.id_ = f"node-{idx}"
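
To make the chunking step concrete, here is a rough sketch of fixed-size chunking with overlap and stable ids. It splits on whitespace-separated words, which is a simplifying assumption: the library parser measures chunk size in tokens and prefers sentence boundaries, so its chunks will differ. `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(text, chunk_size=4, overlap=1)
# Give each chunk a deterministic id, mirroring the `node-{idx}` scheme above
chunk_ids = {f"node-{idx}": chunk for idx, chunk in enumerate(chunks)}
```

Deterministic ids matter because later cells look nodes up by id, so the same document should map to the same ids on every run.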

# Step 3: Select Embedding Model and LLM

In [13]:
!pip install llama_index-llms-openai_like
!pip install llama_index-embeddings-huggingface


In [19]:
# from llama_index.embeddings import resolve_embed_model
import os
import logging
import sys
from llama_index.llms.openai_like import OpenAILike
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

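# Configure logging
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

# Define the DeepSeek model (served through an OpenAI-compatible API)
llm = OpenAILike(model="deepseek-chat",
                 api_base="https://api.deepseek.com/v1",
                 api_key=os.environ["DEEPSEEK_API_KEY"],
                 temperature=0.6,
                 is_chat_model=True)

# print(os.environ["DEEPSEEK_API_KEY"])

# Register the LLM as the global default
Settings.llm = llm

# Set the embedding model
# NOTE: bge-small-zh-v1.5 is tuned for Chinese text; for an English paper such as
# this one, an English model (e.g. BAAI/bge-small-en-v1.5) may retrieve better.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-zh-v1.5")
Settings.embed_model = embed_model
# Settings.chunk_size = 256
# embed_model = resolve_embed_model("local:BAAI/bge-small-en")
# llm = OpenAI(model="gpt-3.5-turbo")

# NOTE: ServiceContext is deprecated in llama-index 0.10 in favor of Settings;
# it is kept here because the cells below pass service_context explicitly.
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=embed_model
)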



# Step 4: Create Index, retriever, and query engine

In [20]:
base_index = VectorStoreIndex(base_nodes, service_context=service_context)
base_retriever = base_index.as_retriever(similarity_top_k=2)
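
Conceptually, `similarity_top_k=2` asks the retriever for the two nodes whose embeddings are closest to the query embedding. A minimal sketch of that ranking with cosine similarity, using made-up 3-dimensional vectors in place of real embeddings (`cosine`, `top_k`, and the vectors are illustrative assumptions, not LlamaIndex internals):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], embeddings: dict[str, list[float]], k: int = 2):
    """Return the k (node_id, score) pairs most similar to the query, best first."""
    scored = [(node_id, cosine(query, vec)) for node_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings
toy_embeddings = {
    "node-0": [1.0, 0.0, 0.0],
    "node-1": [0.0, 1.0, 0.0],
    "node-2": [0.7, 0.7, 0.0],
}
hits = top_k([1.0, 0.1, 0.0], toy_embeddings, k=2)
```

Here `hits` lists `node-0` first and `node-2` second, since those two vectors point most nearly in the query's direction.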

In [21]:
retrievals = base_retriever.retrieve(
    "Can you tell me about the key concepts for safety finetuning"
)

In [22]:
for n in retrievals:
    display_source_node(n, source_length=1500)

**Node ID:** node-26<br>**Similarity:** 0.6161667328021709<br>**Text:** As LLMs are integrated and deployed, we look forward to
continuing research that will amplify their potential for positive impact on these important social issues.
4.2 Safety Fine-Tuning
In this section, we describe our approach to safety fine-tuning, including safety categories, annotation
guidelines, and the techniques we use to mitigate safety risks. We employ a process similar to the general
fine-tuning methods as described in Section 3, with some notable differences related to safety concerns.
Specifically, we use the following techniques in safety fine-tuning:
1. Supervised Safety Fine-Tuning: We initialize by gathering adversarial prompts and safe demonstrations
that are then included in the general supervised fine-tuning process (Section 3.1). This teaches
the model to align with our safety guidelines even before RLHF, and thus lays the foundation for
high-quality human preference data annotation.
2. Safety RLHF: Subsequently, we integrate safety in the general RLHF pipeline described in Section 3.2.2.
This includes training a safety-specific reward model and gathering more challenging
adversarial prompts for rejection sampling style fine-tuning and PPO optimization.
3. Safety Context Distillation: Finally, we refine our RLHF pipeline with context distillation (Askell
et al., 2021b). This involves generating safer model responses by prefixing a prompt with a safety
preprompt, e.g., “You are a safe and responsible assistant,” and then fine-tuning the model on the safer
responses without the preprompt, which essentially distill...<br>

**Node ID:** node-84<br>**Similarity:** 0.5818033181603248<br>**Text:** After deployment, safety in chat models involves user experience and long-term effects, which are not
captured by benchmarks alone. Therefore, to assess safety effectively, additional testing of how they are
integrated in a product deployment, how they are used, and what metrics accurately and precisely capture
safety risks given the product context is essential for a comprehensive evaluation of safety. Our future work
will conduct more comprehensive evaluations that encompass some dimensions not yet addressed in the
cases mentioned above.
A.5 Data Annotation
We have relied on human annotators in order to collect annotations for the supervised fine-tuning stage and
human preferences to train the reward models. In this section, we provide details about the data annotation
process.
A.5.1 SFT Annotation Instructions
We have collected single-turn and multi-turn dialogue annotations from our pool of annotators. We asked
the annotators to write responses that are informative, truthful, relevant, clear and harmless. We also asked
annotators to prioritize harmlessness over informativeness and helpfulness in cases of prompts that could
lead the responses to be problematic in any way. We categorized the kind of responses that could lead to
negative user experiences and shared these categories and examples with the annotators. A summary of
these categories can be seen in Section A.5.2.
72

Judaism Christianity Islam Buddhism Sikhism
Pretrained
MPT7B 0.39 0.38 0.31 0.27 0.07
30B 0.33 0.28 0.20 0.30 0.19
Falcon7B 0.25 0.35 0.20 0.25 0.22
40...<br>

# Chunk References: Smaller Child Chunks Referring to Bigger Parent Chunk

In [24]:
sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SimpleNodeParser.from_defaults(chunk_size=c, chunk_overlap=20) for c in sub_chunk_sizes
]

all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)

    # also add the original (parent) node to the list of nodes to index
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)
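
The loop above can be pictured as follows: every parent chunk is cut into smaller pieces at several sizes, and each piece remembers the id of the parent it came from. A minimal sketch with plain Python objects, splitting on words rather than tokens; `ChildChunk` and `make_children` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class ChildChunk:
    text: str
    parent_id: str  # id of the larger chunk this piece was cut from


def make_children(parents: dict[str, str], sizes: list[int]) -> list[ChildChunk]:
    """Cut every parent into word windows of each size, remembering the parent id."""
    children = []
    for parent_id, text in parents.items():
        words = text.split()
        for size in sizes:
            for start in range(0, len(words), size):
                piece = " ".join(words[start:start + size])
                children.append(ChildChunk(piece, parent_id))
        # the parent itself is also indexed and points to itself
        children.append(ChildChunk(text, parent_id))
    return children


toy_parents = {"node-0": "a b c d", "node-1": "e f"}
toy_children = make_children(toy_parents, sizes=[2, 4])
```

Small pieces tend to match a query more precisely, while the parent pointer lets the retriever hand the larger, more complete chunk to the LLM.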

In [25]:
all_nodes_dict = {n.node_id: n for n in all_nodes}

In [26]:
vector_index_chunk = VectorStoreIndex(
    all_nodes, service_context=service_context
)

In [27]:
vector_retriever_chunk = vector_index_chunk.as_retriever(similarity_top_k=2)

In [28]:
retriever_chunk = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever_chunk},
    node_dict=all_nodes_dict,
    verbose=True,
)
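
The idea behind following references can be sketched without the library: each retrieved child hit is mapped to the parent it points to, duplicates are collapsed, and the best score per parent is kept. This only illustrates the concept and is not how `RecursiveRetriever` is implemented internally; `resolve_to_parents` and the scores are made up for the example.

```python
def resolve_to_parents(child_hits, parents):
    """Map scored child hits to unique parent chunks, keeping the best score."""
    best = {}
    for parent_id, score in child_hits:
        if score > best.get(parent_id, float("-inf")):
            best[parent_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(pid, score, parents[pid]) for pid, score in ranked]


toy_parents = {"node-25": "parent text 25", "node-26": "parent text 26"}
# (parent id referenced by the child, similarity score); illustrative values
child_hits = [("node-26", 0.68), ("node-25", 0.67), ("node-26", 0.61)]
resolved = resolve_to_parents(child_hits, toy_parents)
```

Two of the three child hits point to `node-26`, so `resolved` contains only two entries, one per parent.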

In [29]:
nodes = retriever_chunk.retrieve(
    "Can you tell me about the key concepts for safety finetuning"
)
for node in nodes:
    display_source_node(node, source_length=2000)

Retrieving with query id None: Can you tell me about the key concepts for safety finetuning
Retrieved node with id, entering: node-26
Retrieving with query id node-26: Can you tell me about the key concepts for safety finetuning
Retrieved node with id, entering: node-25
Retrieving with query id node-25: Can you tell me about the key concepts for safety finetuning

**Node ID:** node-26<br>**Similarity:** 0.6760816863046982<br>**Text:** As LLMs are integrated and deployed, we look forward to
continuing research that will amplify their potential for positive impact on these important social issues.
4.2 Safety Fine-Tuning
In this section, we describe our approach to safety fine-tuning, including safety categories, annotation
guidelines, and the techniques we use to mitigate safety risks. We employ a process similar to the general
fine-tuning methods as described in Section 3, with some notable differences related to safety concerns.
Specifically, we use the following techniques in safety fine-tuning:
1. Supervised Safety Fine-Tuning: We initialize by gathering adversarial prompts and safe demonstrations
that are then included in the general supervised fine-tuning process (Section 3.1). This teaches
the model to align with our safety guidelines even before RLHF, and thus lays the foundation for
high-quality human preference data annotation.
2. Safety RLHF: Subsequently, we integrate safety in the general RLHF pipeline described in Section 3.2.2.
This includes training a safety-specific reward model and gathering more challenging
adversarial prompts for rejection sampling style fine-tuning and PPO optimization.
3. Safety Context Distillation: Finally, we refine our RLHF pipeline with context distillation (Askell
et al., 2021b). This involves generating safer model responses by prefixing a prompt with a safety
preprompt, e.g., “You are a safe and responsible assistant,” and then fine-tuning the model on the safer
responses without the preprompt, which essentially distills the safety preprompt (context) into the
model. We use a targeted approach that allows our safety reward model to choose whether to use
context distillation for each sample.
4.2.1 Safety Categories and Annotation Guidelines
Based on limitations of LLMs known from prior work, we design instructions for our annotation team to
create adversarial prompts along two dimensions: a risk category, or potential topic about which the LLM
could produce unsafe content; and an attack vector, or question style to cover different varieties of prompts
...<br>

**Node ID:** node-25<br>**Similarity:** 0.6700340942500668<br>**Text:** For TruthfulQA, we present the
percentage of generations that are both truthful and informative (the higher, the better). For ToxiGen, we
present the percentage of generations that are deemed toxic by the metric (the lower, the better). Detailed
descriptions of the benchmarks and metrics can be found in Appendix A.4.7. When compared to Llama 1-7B,
Llama 2-7B demonstrates a 21.37% increase in truthfulness and informativeness and a 7.61% decrease in
toxicity. We also observe an increase in toxicity in the pretrained 13B and 70B Llama 2, which may result
from larger pretraining data or a different dataset mix. Some have postulated the existence of a relationship
between pretraining dataset size and downstream model toxicity or bias (Bender et al., 2021b), but empirical
work to validate this claim is still ongoing (Dodge et al., 2021; Smith and Williams, 2021; Tal et al., 2022), and
further evidence from up-to-date models is still needed.
In Appendix A.4.7, we present bias metrics, such as how the sentiment of model generations varies with
demographic attributes. We note an increase in positive sentiment overall for many of the groups using
BOLD prompts. More detailed results split by different demographic groups can be found in Appendix A.4.8.
Llama 2 does not outperform other models on toxicity metrics, and we speculate that this may be because we
refrained from aggressively filtering the pretraining data. Recall that leaving pretraining data unfiltered may
enable base models tuned to perform well on more downstream tasks (including hate speech detection),
and it carries less risk of accidentally filtering out some demographic groups. We observe that models
trained from less aggressively filtered pretraining data also required fewer examples to achieve reasonable
safety-alignment. We reiterate that this motivated choice does imply that additional safety mitigations should
be applied before deployment of base Llama 2 models.
22

TruthfulQA ↑ToxiGen ↓
MPT7B 29.13 22.32
30B 35.25 22.61
Falcon7B 25.95 14.53
40B 40.39 23.44
Llama 17B 27.42 23.00
13B 41...<br>

# Sentence Window Retrieval

In [31]:
from llama_index.core.node_parser import SentenceWindowNodeParser

In [32]:
# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

In [33]:
sentence_nodes = node_parser.get_nodes_from_documents(docs)
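
What the sentence window parser produces can be sketched in plain Python: one record per sentence, with the surrounding sentences stored under the `window` metadata key and the sentence itself under `original_text`. The regex-based sentence split is a simplifying assumption (the library uses its own sentence splitter), and `sentence_windows` is a hypothetical helper:

```python
import re


def sentence_windows(text: str, window_size: int = 3) -> list[dict]:
    """Return one record per sentence with its surrounding window as metadata."""
    # naive split after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    records = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        records.append({
            "text": sentence,  # only this single sentence is embedded
            "metadata": {
                "original_text": sentence,
                "window": " ".join(sentences[lo:hi]),
            },
        })
    return records


records = sentence_windows("A one. B two. C three. D four. E five.", window_size=1)
```

With `window_size=1`, the middle record holds the sentence `C three.` and the window `B two. C three. D four.`; windows at the edges of the text are simply shorter.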

In [34]:
sentence_index = VectorStoreIndex(sentence_nodes, service_context=service_context)

In [36]:
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
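
The postprocessing step can be pictured as a simple swap: after retrieval, the text of each hit is replaced by the value stored under the target metadata key, so the LLM sees the whole window rather than the single matched sentence. A minimal sketch on plain dictionaries; `replace_with_metadata` is a hypothetical stand-in, not the library class:

```python
def replace_with_metadata(hits: list[dict], target_key: str = "window") -> list[dict]:
    """Return copies of the hits whose text is swapped for metadata[target_key].

    Hits that lack the key keep their original text.
    """
    replaced = []
    for hit in hits:
        new_text = hit["metadata"].get(target_key, hit["text"])
        replaced.append({**hit, "text": new_text})
    return replaced


toy_hits = [
    {
        "text": "B two.",
        "metadata": {"window": "A one. B two. C three.", "original_text": "B two."},
    },
    {"text": "plain", "metadata": {}},
]
expanded = replace_with_metadata(toy_hits)
```

The first hit now carries the three-sentence window as its text, while the second, which has no `window` entry, is left unchanged.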

In [37]:
window_response = query_engine.query(
    "Can you tell me about the key concepts for safety finetuning"
)
print(window_response)

The key concepts for safety fine-tuning include:

1. Supervised Safety Fine-Tuning: This involves collecting adversarial prompts and safe demonstrations to be incorporated into the general supervised fine-tuning process. This approach helps the model align with safety guidelines prior to reinforcement learning from human feedback (RLHF), setting a foundation for high-quality human preference data annotation.

2. Safety RLHF: This integrates safety considerations into the RLHF pipeline, which includes training a safety-specific reward model and gathering more challenging adversarial prompts for techniques like rejection sampling style fine-tuning and Proximal Policy Optimization (PPO) optimization.


In [38]:
# check the original sentence that was retrieved for each node, as well as the actual window of sentences that was sent to the LLM.
window = window_response.source_nodes[0].node.metadata["window"]
sentence = window_response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")

Window: As LLMs are integrated and deployed, we look forward to
continuing research that will amplify their potential for positive impact on these important social issues.
4.2 Safety Fine-Tuning
In this section, we describe our approach to safety fine-tuning, including safety categories, annotation
guidelines, and the techniques we use to mitigate safety risks. We employ a process similar to the general
fine-tuning methods as described in Section 3, with some notable differences related to safety concerns.
Specifically, we use the following techniques in safety fine-tuning:
1. Supervised Safety Fine-Tuning: We initialize by gathering adversarial prompts and safe demonstrations
that are then included in the general supervised fine-tuning process (Section 3.1). This teaches
the model to align with our safety guidelines even before RLHF, and thus lays the foundation for
high-quality human preference data annotation.
2. Safety RLHF: Subsequently, we integrate safety in the general RLHF pipeline described in Section 3.2.2