# Prompt Engineering for RAG


In this notebook we show various prompting techniques you can use to customize your LlamaIndex RAG pipeline.

* Getting and setting prompts for query engines, etc.
* Defining template variable mappings (e.g. when you already have an existing QA prompt)
* Adding few-shot examples and performing query transformations/rewriting.

In [17]:
%pip install llama-index-llms-ollama -q
%pip install llama-index-embeddings-ollama -q
%pip install llama-index-llms-langchain -q
%pip install llama-index-readers-file pymupdf -q
%pip install llama-index -q

Note: you may need to restart the kernel to use updated packages.


# Setup

In [2]:

from IPython.display import Markdown, display

from llama_index.core import PromptTemplate, Settings, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Optionally set a global default LLM; below we pass LLMs explicitly instead.
# Settings.llm = Ollama(model="llama3.2:1b")

# Use a local Ollama embedding model for indexing and retrieval.
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text:latest")


In [3]:
import os
import urllib.request

# Create data directory if it doesn't exist
os.makedirs('data', exist_ok=True)

# Download the PDF file
url = "https://arxiv.org/pdf/2307.09288.pdf"
headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req) as response, open('data/llama2.pdf', 'wb') as out_file:
    out_file.write(response.read())

In [4]:
from llama_index.readers.file import PyMuPDFReader

# Parse the downloaded PDF into LlamaIndex Document objects.
loader = PyMuPDFReader()
documents = loader.load(file_path="./data/llama2.pdf")

# Load into Vector Store


In [5]:
from llama_index.core import VectorStoreIndex
from llama_index.llms.ollama import Ollama

index = VectorStoreIndex.from_documents(documents)

In [6]:
llama1b_llm = Ollama(model="llama3.2:1b")
llama3b_llm = Ollama(model="llama3.2:3b")

# Setup Query Engine / Retriever


In [7]:
query_str = "What are the potential risks associated with the use of Llama 2 as mentioned in the context?"

In [8]:
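# Query engine: retrieves the top 2 chunks and synthesizes an answer with the 1B model.
query_engine = index.as_query_engine(similarity_top_k=2, llm=llama1b_llm)
# Standalone retriever, handy for inspecting which chunks are retrieved.
vector_retriever = index.as_retriever(similarity_top_k=2)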
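
For intuition, `similarity_top_k=2` asks the retriever for the two chunks whose embeddings are closest to the query embedding. The plain-Python sketch below illustrates that idea, assuming cosine similarity as the metric. It is not LlamaIndex's implementation: `cosine_similarity`, `top_k`, and the toy 2-d vectors are hypothetical names and values used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query."""
    scored = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]


# Toy 2-d "embeddings" (hypothetical); real ones come from the embedding model.
toy_chunks = {
    "safety": [0.9, 0.1],
    "pretraining": [0.1, 0.9],
    "risks": [0.8, 0.3],
}
top_ids = top_k([1.0, 0.2], toy_chunks, k=2)
print(top_ids)
```

Raising `k` gives the synthesizer more context to work with, at the cost of longer prompts.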

In [9]:
response = query_engine.query(query_str)
print(str(response))

The potential risks associated with the use of Llama 2 include:

1. Inaccurate or objectionable responses to user prompts due to its language model nature and lack of human oversight.
2. Limited understanding of certain cultural, social, or domain-specific nuances that may be present in user input.
3. Potential for Llama 2 to generate content that is misleading, defamatory, or otherwise harmful to individuals, organizations, or communities.
4. Dependence on high-quality training data and fine-tuning with human feedback to mitigate risks, but potential gaps in coverage or updated models if not regularly updated.

These risks are highlighted in the context as a result of testing Llama 2 in English only, which may not cover all scenarios and requires developers to perform safety testing and tuning tailored to specific applications.


# Viewing/Customizing Prompts


In [10]:
# define prompt viewing function
def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}" f"**Text:** "
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown(""))

# Load Prompt Templates

In [11]:
prompts_dict = query_engine.get_prompts()

# Display Prompt Templates

In [12]:
display_prompt_dict(prompts_dict)

**Prompt Key**: response_synthesizer:text_qa_template**Text:** 

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 




**Prompt Key**: response_synthesizer:refine_template**Text:** 

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 
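
The two templates shown above are meant to be used in sequence: the QA template produces a first answer from the first piece of context, and the refine template is applied to each further piece. The sketch below mimics that flow in plain Python with a stub in place of a real LLM. It is a simplified illustration under that assumption, not the library's synthesizer; `answer_with_refine` and `fake_llm` are hypothetical names.

```python
QA_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
REFINE_TMPL = (
    "The original query is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer (only if needed) "
    "with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better answer the query. "
    "If the context isn't useful, return the original answer.\n"
    "Refined Answer: "
)


def answer_with_refine(llm, query_str, chunks):
    """Answer from the first chunk, then refine once per remaining chunk.

    Assumes `chunks` contains at least one element.
    """
    answer = llm(QA_TMPL.format(context_str=chunks[0], query_str=query_str))
    for chunk in chunks[1:]:
        answer = llm(
            REFINE_TMPL.format(
                query_str=query_str, existing_answer=answer, context_msg=chunk
            )
        )
    return answer


# Stub LLM (hypothetical): records each prompt and returns a numbered answer.
seen_prompts = []


def fake_llm(prompt):
    seen_prompts.append(prompt)
    return f"answer-{len(seen_prompts)}"


final = answer_with_refine(fake_llm, "What are the risks?", ["chunk A", "chunk B"])
print(final)
```

This is why both keys appear in the prompt dictionary: customizing only the QA template leaves the refine step on its default wording.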




# Customize Prompts

What if we want something other than the standard question-answering prompts?

Let's try out the RAG prompt from LangChain Hub.

In [13]:
# Pulling from the hub returns a LangChain prompt object, not a LlamaIndex one.

from langchain import hub

langchain_prompt = hub.pull("rlm/rag-prompt")



One catch is that the template variables in this prompt differ from what the synthesizer in our query engine expects:

* the prompt uses `context` and `question`,
* we expect `context_str` and `query_str`.

This is not a problem. We wrap the LangChain prompt in a `LangchainPromptTemplate` and pass template variable mappings that translate our variable names into the ones the prompt uses.
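
Conceptually, a template variable mapping is just a rename applied to the keyword arguments before the template is formatted. The sketch below shows that idea in plain Python. It is an illustration only, not the library's code; `format_with_mappings` is a hypothetical helper, and the template string is the text portion of the hub prompt.

```python
def format_with_mappings(template, var_mappings, **kwargs):
    """Rename our variable names to the template's names, then format."""
    renamed = {var_mappings.get(name, name): value for name, value in kwargs.items()}
    return template.format(**renamed)


# The external prompt expects {question} and {context}.
external_template = "Question: {question} \nContext: {context} \nAnswer:"
# Keys are the names we pass in; values are the names the template expects.
mappings = {"query_str": "question", "context_str": "context"}

prompt = format_with_mappings(
    external_template,
    mappings,
    query_str="What is Llama 2?",
    context_str="Llama 2 is a family of LLMs.",
)
print(prompt)
```

Note the direction of the mapping: our name on the left, the external prompt's name on the right.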

In [15]:
from llama_index.core.prompts import LangchainPromptTemplate

lc_prompt_tmpl = LangchainPromptTemplate(
    template=langchain_prompt,
    template_var_mappings={"query_str": "question", "context_str": "context"},
)

# Replace the default QA template with the wrapped LangChain prompt.
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": lc_prompt_tmpl}
)

In [18]:
prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)

**Prompt Key**: response_synthesizer:text_qa_template**Text:** 

input_variables=['context', 'question'] input_types={} partial_variables={} metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})]




**Prompt Key**: response_synthesizer:refine_template**Text:** 

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 




# Try It Out


In [19]:
response = query_engine.query(query_str)
print(str(response))

The potential risks associated with the use of Llama 2 include:

* Inaccurate or objectionable responses to user prompts
* Potential for generating text that is offensive, discriminatory, or violates applicable laws and regulations.

Specifically, testing conducted on English has not covered all scenarios, and developers should perform safety testing and tuning tailored to their specific applications.


# Adding Few-Shot Examples

In [21]:
from llama_index.core.schema import TextNode

# Each non-empty line of the JSONL file becomes one retrievable few-shot example.
few_shot_nodes = []
with open("llama2_qa_citation_events.jsonl", "r") as f:
    for line in f:
        if line.strip():
            few_shot_nodes.append(TextNode(text=line))

few_shot_index = VectorStoreIndex(few_shot_nodes)
few_shot_retriever = few_shot_index.as_retriever(similarity_top_k=2)

In [22]:
import json


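def few_shot_examples_fn(**kwargs):
    """Retrieve the few-shot examples most similar to the query and format them."""
    query_str = kwargs["query_str"]
    retrieved_nodes = few_shot_retriever.retrieve(query_str)
    # Each node holds one JSON line; its "response" field is itself a JSON string.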

    result_strs = []
    for n in retrieved_nodes:
        raw_dict = json.loads(n.get_content())
        query = raw_dict["query"]
        response_dict = json.loads(raw_dict["response"])
        result_str = f"""\
Query: {query}
Response: {response_dict}"""
        result_strs.append(result_str)
    return "\n\n".join(result_strs)
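
The parsing above implies a particular shape for each line of the JSONL file: a JSON object with a `query` string and a `response` field that is itself a JSON-encoded string, which is why `json.loads` is called twice. The sketch below builds one such line and decodes it the same way. The record is a hypothetical example inferred from the parsing code, not a line copied from the actual file.

```python
import json

# Hypothetical record in the shape the parsing code expects.
record = {
    "query": "Which citations are mentioned in the section on RLHF Results?",
    # "response" is stored as a JSON string, not as a nested object.
    "response": json.dumps(
        {"citations": [{"author": "Gilardi et al.", "year": 2023, "desc": ""}]}
    ),
}
line = json.dumps(record)  # one line of the JSONL file

# Decode the outer object first, then the inner "response" string.
raw_dict = json.loads(line)
response_dict = json.loads(raw_dict["response"])

example = f"Query: {raw_dict['query']}\nResponse: {response_dict}"
print(example)
```

Because `response_dict` is interpolated as a Python dict, the few-shot examples show single-quoted keys rather than strict JSON; wrapping it in `json.dumps` would be one way to present strict JSON to the model.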

# Write Prompt Template with Functions


In [23]:
# write prompt template with functions

qa_prompt_tmpl_str = """\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, \
answer the query asking about citations over different topics.
Please provide your answer in the form of a structured JSON format containing \
a list of authors as the citations. Some examples are given below.

{few_shot_examples}

Query: {query_str}
Answer: \
"""

qa_prompt_tmpl = PromptTemplate(
    qa_prompt_tmpl_str,
    function_mappings={"few_shot_examples": few_shot_examples_fn},
)
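
To make the role of `function_mappings` concrete: the caller never supplies `few_shot_examples` directly. Instead, the value is computed at format time by calling the mapped function with the same keyword arguments. The plain-Python sketch below assumes that behaviour and is not the library's implementation; `format_with_functions` and `shout_query` are hypothetical names.

```python
def format_with_functions(template, function_mappings, **kwargs):
    """Fill function-mapped variables by calling each function with kwargs."""
    computed = {name: fn(**kwargs) for name, fn in function_mappings.items()}
    return template.format(**kwargs, **computed)


def shout_query(**kwargs):
    """Hypothetical stand-in for a function that builds few-shot examples."""
    return kwargs["query_str"].upper()


tmpl = "Examples: {examples}\nQuery: {query_str}"
rendered = format_with_functions(tmpl, {"examples": shout_query}, query_str="hello")
print(rendered)
```

The practical benefit is that the examples can depend on the query itself, which is what lets the few-shot retriever pick examples similar to each incoming question.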

In [24]:
citation_query_str = (
    "Which citations are mentioned in the section on Safety RLHF?"
)

In [25]:
print(
    qa_prompt_tmpl.format(
        query_str=citation_query_str, context_str="test_context"
    )
)

Context information is below.
---------------------
test_context
---------------------
Given the context information and not prior knowledge, answer the query asking about citations over different topics.
Please provide your answer in the form of a structured JSON format containing a list of authors as the citations. Some examples are given below.

Query: Which citations are mentioned in the section on RLHF Results?
Response: {'citations': [{'author': 'Gilardi et al.', 'year': 2023, 'desc': ''}, {'author': 'Huang et al.', 'year': 2023, 'desc': ''}]}

Query: Which citations are related to the progression of SFT and RLHF versions?
Response: {'citations': [{'author': 'Gilardi et al.', 'year': 2023, 'desc': 'Documented the superior writing abilities of LLMs, as manifested in surpassing human annotators in certain tasks, are fundamentally driven by RLHF'}, {'author': 'Huang et al.', 'year': 2023, 'desc': 'Supported the findings of Gilardi et al. on the effectiveness of RLHF in driving the s

# Update Prompts

In [28]:
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)
display_prompt_dict(query_engine.get_prompts())

**Prompt Key**: response_synthesizer:text_qa_template**Text:** 

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query asking about citations over different topics.
Please provide your answer in the form of a structured JSON format containing a list of authors as the citations. Some examples are given below.

{few_shot_examples}

Query: {query_str}
Answer: 




**Prompt Key**: response_synthesizer:refine_template**Text:** 

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 




In [29]:
response = query_engine.query(citation_query_str)
print(str(response))

{
 "citations": [
  {
   "author": "Gilardi et al.",
   "year": 2023,
   "desc": "Documented the superior writing abilities of LLMs, as manifested in surpassing human annotators in certain tasks, are fundamentally driven by RLHF"
  },
  {
   "author": "Huang et al.",
   "year": 2023,
   "desc": "Supported the findings of Gilardi et al. on the effectiveness of RLHF in driving the superior writing abilities of LLMs"
  }
 ]
