# Step-back prompting in Retrieval Augmented Generation

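Step-back prompting is a technique in which an LLM is first asked to abstract the original question into a more generic, higher-level one, and that abstraction is then used to guide reasoning about the original question. It was introduced by Google DeepMind researchers in "Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models" (Zheng et al., 2023). The authors report improved performance on reasoning-intensive benchmarks such as MMLU physics and chemistry, TimeQA and MuSiQue.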

![](step-back-prompting.png)

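It turns out that step-back prompting can also be used with RAG: besides retrieving documents for the original question, we also retrieve documents for its step-back version. Let's implement it in LangChain, using Cohere embeddings, an OpenAI chat model and the Qdrant vector store.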

```python
!pip install qdrant-client langchain datasets cohere openai
```

The install log of the original run shows the package versions this notebook was executed with: qdrant-client 1.7.0, langchain 0.0.349, datasets 2.15.0, cohere 4.37 and openai 1.3.8. Later LangChain releases moved or deprecated several of the imports used below, so pinning these versions is likely the easiest way to reproduce the results.

## Dataset indexing

We are going to use the [mugithi/ubuntu_question_answer](https://huggingface.co/datasets/mugithi/ubuntu_question_answer) dataset, a set of Ubuntu-related questions and their answers. It will act as our knowledge base, so we need to index it into a vector store. Let's download it first.

```python
from datasets import load_dataset

dataset = load_dataset("mugithi/ubuntu_question_answer")
dataset
```


```
DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 12024
    })
    test: Dataset({
        features: ['question', 'answer'],
        num_rows: 5154
    })
})
```

```python
import pandas as pd

train_df = pd.DataFrame(dataset["train"])
train_df.head(n=10)
```

| | question | answer |
|---|---|---|
| 0 | hi all. long time suse user here. is there an ... | .org |
| 1 | hi... i have a 6.06 lts obtained from shipip. ... | i would say no ... but then i also don't know ... |
| 2 | what is the best way to remotely back up a sys... | rsnapshot |
| 3 | what should i use to format a disk i need to b... | gparted. |
| 4 | hi, is there a way to tell where a package has... | dpkg -l package |
| 5 | does ubuntu 11.04 use wayland for graphics? | i don't think 11.10 will use it |
| 6 | has anyone gotten 'g' wireless to work at all? | sure, it works here. |
| 7 | in last time doky began to crash many times. m... | began to crash ... how did it crash? did it go... |
| 8 | hi, can someone help me, i have just installed... | system -> preferences -> power management |
| 9 | musik: how do i get the network manager to com... | alt+f2, nm-applet |


We will embed each question together with its answer. The question and the answer are also stored separately in the document metadata, so we can use them later if needed. Let's define a text template and process the dataset into a list of texts and a matching list of metadata dictionaries.

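```python
# Template combining a question with its answer into a single text to embed
text_pattern = """
Example question: {question}
Example answer: {answer}
"""

texts, metadatas = [], []
for entry in train_df.itertuples():
    text = text_pattern.format(question=entry.question, answer=entry.answer)

    # Strip the newlines surrounding the triple-quoted template
    texts.append(text.strip())
    # Keep the raw question and answer as metadata for later use
    metadatas.append({"question": entry.question, "answer": entry.answer})
```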
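
To see exactly what gets embedded, here is the same template applied by hand to row 3 of the table above, whose full question text appears in the search results further down. This is a minimal standalone sketch with the row hard-coded, so it does not need pandas or the downloaded dataset.

```python
text_pattern = """
Example question: {question}
Example answer: {answer}
"""

# Row 3 of the training split, copied verbatim from the search results below
question = "what should i use to format a disk i need to boot from usb and wipe the disk"
answer = "gparted."

# .strip() removes the leading and trailing newlines of the triple-quoted template
text = text_pattern.format(question=question, answer=answer).strip()
metadata = {"question": question, "answer": answer}
print(text)
```

The resulting string matches the `page_content` of the first search result shown later, which confirms that questions and answers are embedded together.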

The dataset is ready, so we can index it into a vector store using the selected embedding model. We use Qdrant together with Cohere's multilingual embeddings (`embed-multilingual-v3.0`), so questions can later be asked in multiple languages.

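```python
from google.colab import userdata  # Colab secrets; outside Colab, read these values from environment variables
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.vectorstores import Qdrant

# Requires a Cohere API key, e.g. in the COHERE_API_KEY environment variable
embeddings = CohereEmbeddings(model="embed-multilingual-v3.0")
# Embed all texts and upload them, together with their metadata, to a Qdrant collection
facts_store = Qdrant.from_texts(
    texts, embeddings, metadatas,
    location=userdata.get("QDRANT_URL"),
    api_key=userdata.get("QDRANT_API_KEY"),
    collection_name="facts",
    force_recreate=True,  # drop and rebuild the collection if it already exists
)
```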
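
Under the hood, a similarity search embeds the query and returns the stored texts whose vectors are closest to it; LangChain's Qdrant wrapper uses cosine similarity by default. The sketch below illustrates only that idea in plain Python, assuming made-up three-dimensional vectors in place of real Cohere embeddings (`embed-multilingual-v3.0` produces 1024-dimensional ones). The helper names are hypothetical, and Qdrant itself uses an approximate HNSW index rather than this brute-force scan.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector: list[float], stored: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours: score every stored vector and keep the k best."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy "embeddings", made up for illustration only
stored = [
    ("how do i format a disk?", [0.9, 0.1, 0.0]),
    ("which tool wipes a usb drive?", [0.7, 0.3, 0.1]),
    ("does ubuntu use wayland?", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.0]
print(top_k(query, stored))
```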

With the knowledge base built, we can now query it. Let's try it out.

```python
facts_store.similarity_search("How do I format the disk?")
```

```
[Document(page_content='Example question: what should i use to format a disk i need to boot from usb and wipe the disk\nExample answer: gparted.', metadata={'answer': 'gparted.', 'question': 'what should i use to format a disk i need to boot from usb and wipe the disk'}),
 Document(page_content='Example question: hi all. what is the standard ubuntu way to format a disk?\nExample answer: but those tools only alter partition table, they dont format', metadata={'answer': 'but those tools only alter partition table, they dont format', 'question': 'hi all. what is the standard ubuntu way to format a disk?'}),
 Document(page_content='Example question: i need to format a drive partition (/media/sda3) what the command to use?\nExample answer: i think you have to unmount and give it the /dev', metadata={'answer': 'i think you have to unmount and give it the /dev', 'question': 'i need to format a drive partition (/media/sda3) what the command to use?'}),
 Document(page_content='Example question:
```
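
Because the question and the answer were also stored as metadata, they can be read back from each returned document without parsing `page_content`. In this sketch, plain dictionaries stand in for LangChain `Document` objects (real ones expose the same data through the `.metadata` attribute), filled with the first two results shown above.

```python
# Plain-dict stand-ins for the Document objects returned by similarity_search;
# with real results you would read doc.metadata instead of doc["metadata"].
results = [
    {"metadata": {"question": "what should i use to format a disk i need to boot from usb and wipe the disk",
                  "answer": "gparted."}},
    {"metadata": {"question": "hi all. what is the standard ubuntu way to format a disk?",
                  "answer": "but those tools only alter partition table, they dont format"}},
]

# Collect just the stored answers of the retrieved documents
answers = [doc["metadata"]["answer"] for doc in results]
print(answers)
```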

## Step-back prompting

Our implementation of step-back prompting relies on few-shot prompting: we prepend a made-up interaction history in which specific questions are turned into more abstract ones, which steers the LLM to produce a step-back question for the user's question. For that, we need a set of examples pairing an original question with its step-back question.

```python
# All examples come from the original paper on step-back prompting
# Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
# See: https://arxiv.org/abs/2310.06117

examples = [
    {
        "input": "Estella Leopold went to which school between Aug 1954 and Nov 1954?",
        "output": "What was Estella Leopold's history?",
    },
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "What can the members of The Police do?",
    },
    {
        "input": "What year saw the creation of the region where the county of Hertfordshire is located?",
        "output": "Which region is the county of Hertfordshire located in?",
    },
]
```

These examples will be included in the prompts we send to the LLM. Let's create a prompt template whose goal is to produce the step-back question for a given original question.

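```python
from langchain.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# How a single example is rendered: the question as a human turn,
# the step-back question as an AI turn
single_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)

# Renders every example with single_prompt, producing a fake conversation history
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=single_prompt,
    examples=examples,
)

# Final prompt: system instruction, few-shot examples, then the actual user question
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are simplifying the user questions, so they are more general and easier to answer. Use the following examples:"),
    few_shot_prompt,
    ("user", "{input}"),
])
```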
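
`FewShotChatMessagePromptTemplate` renders every example as a human message followed by an AI message, so the final prompt is a short fake conversation that ends with the real question. The plain-Python sketch below mimics that expansion with `(role, content)` tuples to show the resulting message order. `build_step_back_messages` is a hypothetical helper, not part of LangChain, and the example list is shortened to a single entry.

```python
SYSTEM = ("You are simplifying the user questions, so they are more general "
          "and easier to answer. Use the following examples:")

examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "What can the members of The Police do?",
    },
]

def build_step_back_messages(question: str) -> list[tuple[str, str]]:
    """Mimic the message list produced by the few-shot chat prompt."""
    messages = [("system", SYSTEM)]
    for example in examples:
        # Each example becomes one human turn followed by one AI turn
        messages.append(("human", example["input"]))
        messages.append(("ai", example["output"]))
    # The real question goes last, so the model continues the pattern
    messages.append(("human", question))
    return messages

for role, content in build_step_back_messages("How do I format the disk?"):
    print(f"{role}: {content}")
```

With the three examples from the notebook, the same structure would hold eight messages: one system message, three human/AI pairs and the final question.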


The prompt template is now ready to be combined with the LLM. Let's compose a runnable pipeline and run it on the same question as before.

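```python
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

# Requires an OpenAI API key, e.g. in the OPENAI_API_KEY environment variable.
# temperature=0 makes the generated step-back questions more deterministic.
chat_model = ChatOpenAI(temperature=0)
# prompt -> chat model -> plain string
question_generator = prompt | chat_model | StrOutputParser()
```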
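
The `|` operator chains runnables so that the output of one step becomes the input of the next. The toy class below reproduces only that composition idea in plain Python; `Step` is a hypothetical name, and LangChain's real `Runnable` interface does much more (batching, streaming, async). The fake model returns a fixed message instead of calling OpenAI.

```python
from typing import Any, Callable

class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose: run self first, then feed its output into other
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)

# Fake stages mirroring prompt | chat_model | StrOutputParser()
fake_prompt = Step(lambda d: f"Simplify: {d['input']}")
fake_model = Step(lambda text: {"content": "What is the process for formatting a disk?"})
fake_parser = Step(lambda message: message["content"])

pipeline = fake_prompt | fake_model | fake_parser
print(pipeline.invoke({"input": "How do I format the disk?"}))
```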

```python
question_generator.invoke({"input": "How do I format the disk?"})
```

```
'What is the process for formatting a disk?'
```

## Step-back Retrieval Augmented Generation

Step-back prompting is just another prompt engineering strategy, so it can also be integrated into RAG. Each prompt then effectively gets two contexts: documents retrieved for the original question and documents retrieved for the step-back question. Let's build another prompt template, parametrized with the original question, the context and the step-back context.

```python
from langchain_core.runnables import RunnableLambda

facts_retriever = facts_store.as_retriever()

rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the provided context and step-back context. Do not make up the answer if it's not given, but answer "I don't know".
Context, step-back context and question are enclosed with HTML-like tags.

<context>
{context}
</context>

<step-back-context>
{step_back_context}
</step-back-context>

<question>{input}</question>
""")

# Pull the original question out of the input dict
extract_input = RunnableLambda(lambda x: x["input"])
# The dict runs all three branches on the same input:
# - context: documents retrieved for the original question
# - step_back_context: documents retrieved for the generated step-back question
# - input: the original question, passed through unchanged
step_back_rag = (
    {
        "context": extract_input | facts_retriever,
        "step_back_context": question_generator | facts_retriever,
        "input": extract_input,
    }
    | rag_prompt
    | chat_model
    | StrOutputParser()
)
```
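
When a dict starts a chain, LangChain runs each of its values on the same input and collects the results under the same keys. The plain-Python sketch below mirrors that step of `step_back_rag` using a fake step-back question generator and a fake keyword retriever (both hypothetical), and it also joins the retrieved texts into one string with a hypothetical `format_docs` helper. That joining is an optional tweak: in the chain above, the retriever's list of `Document` objects is inserted into the prompt as the string form of the list, which works but adds metadata noise.

```python
# Tiny in-memory knowledge base reusing two texts from the dataset above
knowledge_base = [
    "Example question: what should i use to format a disk i need to boot from usb and wipe the disk\n"
    "Example answer: gparted.",
    "Example question: does ubuntu 11.04 use wayland for graphics?\n"
    "Example answer: i don't think 11.10 will use it",
]

def keywords(text: str) -> set[str]:
    """Lower-cased words of 5+ characters, with surrounding punctuation removed."""
    words = (w.strip("?.,:!") for w in text.lower().split())
    return {w for w in words if len(w) >= 5}

def fake_retriever(query: str) -> list[str]:
    """Hypothetical keyword retriever: texts sharing at least one keyword with the query."""
    query_words = keywords(query)
    return [text for text in knowledge_base if query_words & keywords(text)]

def fake_question_generator(inputs: dict) -> str:
    """Hypothetical stand-in for the LLM-based step-back question generator."""
    return "What is Wayland?"

def format_docs(docs: list[str]) -> str:
    """Join retrieved texts into a single block separated by blank lines."""
    return "\n\n".join(docs)

def build_prompt_inputs(inputs: dict) -> dict:
    """Mirror the dict step of step_back_rag: every branch receives the same input."""
    question = inputs["input"]
    return {
        "context": format_docs(fake_retriever(question)),
        "step_back_context": format_docs(fake_retriever(fake_question_generator(inputs))),
        "input": question,
    }

result = build_prompt_inputs({"input": "What is wayland used for?"})
print(result["context"])
```

In the real chain, the same formatting could be added by piping each retriever into `RunnableLambda(format_docs)`, with `format_docs` joining the `page_content` of the returned `Document` objects.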

```python
step_back_rag.invoke({"input": "What is wayland used for?"})
```

```
'Wayland is used for graphics.'
```

The resulting pipeline integrates step-back prompting into RAG. Since it is just another prompt engineering strategy, it can be combined with other strategies to improve performance even further. Remember, though, that **prompt engineering does not fix the retrieval process**. Choosing the right embedding model and making sure it works properly is still the key to success.
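
One simple way to check that retrieval works properly is to measure how often the retriever returns the expected text among its top results. The sketch below computes a hit rate at k over (query, expected text) pairs. The retriever here is a hypothetical stand-in so the code runs offline; with the notebook's store, one option would be to pass `lambda q, k: [d.page_content for d in facts_store.similarity_search(q, k=k)]` instead.

```python
from typing import Callable

def hit_rate_at_k(
    pairs: list[tuple[str, str]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 4,
) -> float:
    """Fraction of queries whose expected text is among the top-k retrieved texts."""
    hits = sum(1 for query, expected in pairs if expected in retrieve(query, k))
    return hits / len(pairs)

# Hypothetical retriever that always returns the same two texts, for illustration only
def dummy_retrieve(query: str, k: int) -> list[str]:
    return ["text about gparted", "text about wayland"][:k]

pairs = [
    ("How do I format the disk?", "text about gparted"),
    ("What is wayland used for?", "text about wayland"),
    ("How do I back up my system?", "text about rsnapshot"),
]
print(hit_rate_at_k(pairs, dummy_retrieve, k=4))
```

A low hit rate points at the embedding model or the indexed text format rather than at the prompts, which is exactly the part step-back prompting cannot fix.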