# DEPRECATION WARNING

This integration with LlamaIndex is no longer supported.


----

# Building optimized RAG with LlamaIndex + DSPy

This notebook provides a comprehensive overview of LlamaIndex + DSPy integrations.

We show **three** core integrations:
1. **Build and optimize Query Pipelines with DSPy predictors**: The first section shows how to write DSPy signatures for LLM inputs/outputs, port these components into LlamaIndex query pipelines, and then optimize the entire system end to end.

2. **Build and optimize Query Pipelines with Existing Prompts**: Instead of writing DSPy signatures, you can define a LlamaIndex prompt template, and our converter will turn it into a DSPy component that can be optimized for you.

3. **Port over DSPy-Optimized Prompts to any LlamaIndex Module**: Our `DSPyPromptTemplate` translates a prompt optimized through DSPy into a form usable by any LlamaIndex module that requires prompts.

In [None]:
!pip install llama-index==0.10.44

## Setup

Configure the LLM for DSPy (note: this is separate from the LLM settings used by LlamaIndex) and define the answer signature.

In [12]:
import dspy

turbo = dspy.OpenAI(model='gpt-3.5-turbo')
dspy.settings.configure(lm=turbo)

In [13]:
import dspy

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context_str = dspy.InputField(desc="contains relevant facts")
    query_str = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
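
To make the role of the signature concrete, here is a minimal sketch in plain Python (not DSPy's actual implementation) of how a list of named input/output fields could be rendered into a prompt skeleton. The `Field` dataclass and the `render_format` helper are hypothetical names introduced for illustration; the layout mirrors the format section of the prompt printed in Part 3 of this notebook.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for a signature field (not a DSPy class)."""
    name: str
    desc: str = ""


def render_format(instructions: str, fields: list[Field]) -> str:
    """Render the instructions plus one 'Prefix: description' line per field."""
    lines = [instructions, "", "---", "", "Follow the following format.", ""]
    for f in fields:
        # "context_str" becomes the prefix "Context Str"
        prefix = f.name.replace("_", " ").title()
        # Fields without a description fall back to a ${name} placeholder.
        desc = f.desc if f.desc else "${" + f.name + "}"
        lines.append(f"{prefix}: {desc}")
    return "\n".join(lines)


skeleton = render_format(
    "Answer questions with short factoid answers.",
    [
        Field("context_str", "contains relevant facts"),
        Field("query_str"),
        Field("answer", "often between 1 and 5 words"),
    ],
)
print(skeleton)
```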

## [Part 1] Build and Optimize a Query Pipeline with DSPy Modules

Use our DSPy query components to plug in DSPy prompts/LLMs, and stitch them together with our query pipeline abstraction.

Any query pipeline can be wrapped in our `LlamaIndexModule`. DSPy can then optimize the entire pipeline end to end.

#### Load Data, Build Index

In [14]:
# download the example data (Paul Graham essay) to build an index over

!wget https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt -O paul_graham_essay.txt

--2024-06-17 23:54:09--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8002::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘paul_graham_essay.txt’


2024-06-17 23:54:10 (7.48 MB/s) - ‘paul_graham_essay.txt’ saved [75042/75042]



In [15]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

reader = SimpleDirectoryReader(input_files=["paul_graham_essay.txt"])
docs = reader.load_data()

index = VectorStoreIndex.from_documents(docs)

In [16]:
retriever = index.as_retriever(similarity_top_k=2)

#### Build Query Pipeline

Replace the synthesis step with the DSPy component (make sure the input/output fields of `GenerateAnswer` match the keys used in the pipeline links).

In [17]:
from llama_index.core.query_pipeline import QueryPipeline as QP, InputComponent, FnComponent
from dspy.predict.llamaindex import DSPyComponent, LlamaIndexModule

dspy_component = DSPyComponent(
    dspy.ChainOfThought(GenerateAnswer)
)

retriever_post = FnComponent(
    lambda contexts: "\n\n".join([n.get_content() for n in contexts])
)


p = QP(verbose=True)
p.add_modules(
    {
        "input": InputComponent(),
        "retriever": retriever,
        "retriever_post": retriever_post,
        "synthesizer": dspy_component,
    }
)
p.add_link("input", "retriever")
p.add_link("retriever", "retriever_post")
p.add_link("input", "synthesizer", dest_key="query_str")
p.add_link("retriever_post", "synthesizer", dest_key="context_str")


dspy_qp = LlamaIndexModule(p)
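
The wiring above can be sketched without any library: each link feeds one module's output into the next, and `dest_key` names the argument that receives it. Everything below is a stand-in (`fake_retriever`, `fake_synthesizer`, and `run_pipeline` are hypothetical), meant only to show the data flow, not how `QueryPipeline` executes modules.

```python
def fake_retriever(query: str) -> list[str]:
    """Stand-in for the retriever: return passages sharing a word with the query."""
    corpus = [
        "YC funded a new batch of startups every six months.",
        "Before college I worked on writing and programming.",
    ]
    words = set(query.lower().split())
    return [p for p in corpus if words & set(p.lower().rstrip(".").split())]


def join_contexts(contexts: list[str]) -> str:
    """Same role as retriever_post: merge passages into one context string."""
    return "\n\n".join(contexts)


def fake_synthesizer(query_str: str, context_str: str) -> dict:
    """Stand-in for the DSPy component; argument names match the signature inputs."""
    return {"answer": f"[{len(context_str)} chars of context for: {query_str}]"}


def run_pipeline(query_str: str) -> dict:
    contexts = fake_retriever(query_str)    # link: input -> retriever
    context_str = join_contexts(contexts)   # link: retriever -> retriever_post
    # links with dest_key: input -> query_str, retriever_post -> context_str
    return fake_synthesizer(query_str=query_str, context_str=context_str)


result = run_pipeline("what did YC do")
print(result["answer"])
```

If a `dest_key` does not match an input field of the signature, the equivalent call here would fail with an unexpected keyword argument, which is why the names have to line up.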

In [18]:
output = dspy_qp(query_str="what did the author do in YC")

> Running module input with input:
query_str: what did the author do in YC

> Running module retriever with input:
input: what did the author do in YC

> Running module retriever_post with input:
contexts: [NodeWithScore(node=TextNode(id_='49a290f5-5f29-413c-97e7-9fdf15169cf4', embedding=None, metadata={'file_path': 'paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain'...

> Running module synthesizer with input:
query_str: what did the author do in YC
context_str: YC was different from other kinds of work I've done. Instead of deciding for myself what to work on, the problems came to me. Every 6 months there was a new batch of startups, and their problems, what...

In [19]:
output

Prediction(
    answer='Worked with startups, funded them.'
)

#### Optimize Query Pipeline

Let's try optimizing the query pipeline with few-shot examples.

We define a toy dataset with two examples, then use LlamaIndex's `SemanticSimilarityEvaluator` to define a custom evaluation function to pass to the DSPy teleprompter.
- Because the passing threshold is set very low, every example should pass with a reasonable LLM.
- In practice, this means that all training examples will be added to the prompt as few-shot examples.
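
The two bullets above can be illustrated with a toy version of the metric and the bootstrapping step. This is a sketch under stated assumptions: a bag-of-words cosine similarity stands in for the embedding-based `SemanticSimilarityEvaluator`, `toy_program` stands in for the query pipeline, and `bootstrap_demos` is a hypothetical simplification of few-shot bootstrapping (run the program on each training example and keep the runs that pass the metric).

```python
import math
import re
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a crude stand-in for embedding similarity."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def metric(example: dict, pred: dict, threshold: float = 0.5) -> bool:
    """Pass when the predicted answer is similar enough to the reference answer."""
    return cosine_similarity(pred["answer"], example["answer"]) >= threshold


def bootstrap_demos(program, trainset: list[dict]) -> list[dict]:
    """Keep every (input, prediction) pair whose prediction passes the metric."""
    demos = []
    for example in trainset:
        pred = program(example["query_str"])
        if metric(example, pred):
            demos.append({"query_str": example["query_str"], **pred})
    return demos


def toy_program(query_str: str) -> dict:
    """Hypothetical program that answers both training questions reasonably well."""
    answers = {
        "What did the author do growing up?": "The author wrote short stories and did programming.",
        "What did the author do at YC?": "The author funded startups and wrote essays.",
    }
    return {"answer": answers[query_str]}


trainset = [
    {"query_str": "What did the author do growing up?",
     "answer": "The author wrote short stories and also worked on programming."},
    {"query_str": "What did the author do at YC?",
     "answer": "The author funded startups, wrote essays, and created Hacker News."},
]
demos = bootstrap_demos(toy_program, trainset)
print(len(demos), "of", len(trainset), "examples became few-shot demos")
```

Raising `threshold` makes the metric stricter, so fewer runs would survive as demos.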

In [20]:
from dspy import Example

train_examples = [
    Example(query_str="What did the author do growing up?", answer="The author wrote short stories and also worked on programming."),
    Example(query_str="What did the author do during his time at YC?", answer="organizing a Summer Founders Program, funding startups, writing essays, working on a new version of Arc, creating Hacker News, and developing internal software for YC")
]

train_examples = [t.with_inputs("query_str") for t in train_examples]

In [21]:
import nest_asyncio
nest_asyncio.apply()

In [22]:
from dspy.teleprompt import BootstrapFewShot
from llama_index.core.evaluation import SemanticSimilarityEvaluator

evaluator = SemanticSimilarityEvaluator(similarity_threshold=0.5)

# Validation logic: check that the predicted answer is semantically close to the reference answer.
# (The retrieved context itself is not checked here.)
def validate_context_and_answer(example, pred, trace=None):
    result = evaluator.evaluate(response=pred.answer, reference=example.answer)
    return result.passing

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(max_labeled_demos=0, metric=validate_context_and_answer)

# Compile!
compiled_dspy_qp = teleprompter.compile(dspy_qp, trainset=train_examples)

  0%|                                                                        | 0/2 [00:00<?, ?it/s]

> Running module input with input:
query_str: What did the author do growing up?

> Running module retriever with input:
input: What did the author do growing up?

> Running module retriever_post with input:
contexts: [NodeWithScore(node=TextNode(id_='20a73a69-f604-450e-b07d-cace28b471a5', embedding=None, metadata={'file_path': 'paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain'...

> Running module synthesizer with input:
query_str: What did the author do growing up?
context_str: What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to...

 50%|████████████████████████████████                                | 1/2 [00:00<00:00,  1.85it/s]

> Running module input with input:
query_str: What did the author do during his time at YC?

> Running module retriever with input:
input: What did the author do during his time at YC?

> Running module retriever_post with input:
contexts: [NodeWithScore(node=TextNode(id_='49a290f5-5f29-413c-97e7-9fdf15169cf4', embedding=None, metadata={'file_path': 'paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain'...

> Running module synthesizer with input:
query_str: What did the author do during his time at YC?
context_str: YC was different from other kinds of work I've done. Instead of deciding for myself what to work on, the problems came to me. Every 6 months there was a new batch of startups, and their problems, what...

100%|████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.07it/s]


In [23]:
# test this out 
compiled_dspy_qp(query_str="How did PG meet Jessica Livingston?")

> Running module input with input:
query_str: How did PG meet Jessica Livingston?

> Running module retriever with input:
input: How did PG meet Jessica Livingston?

> Running module retriever_post with input:
contexts: [NodeWithScore(node=TextNode(id_='465a8143-6740-4bbd-8e6c-bb5eba4bb5a3', embedding=None, metadata={'file_path': 'paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain'...

> Running module synthesizer with input:
query_str: How did PG meet Jessica Livingston?
context_str: Over the next several years I wrote lots of essays about all kinds of different topics. O'Reilly reprinted a collection of them as a book, called Hackers & Painters after one of the essays in it. I al...

Prediction(
    answer='Met at a party in 2003.'
)

In [None]:
# [optional]: inspect history
turbo.inspect_history(n=1)

## [Part 2] Build and Optimize Query Pipelines with Existing Prompts

Build a query pipeline similar to the one in the previous section. Instead of using DSPy signatures/predictors directly, we build `DSPyComponent` modules from LlamaIndex prompts.

This lets you write any LlamaIndex prompt and have it optimized through DSPy.
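
One way to see what a prompt-to-signature converter has to work with: the placeholders in a prompt template already name the input fields. The sketch below uses only the standard library to list them. `template_variables` is a hypothetical helper, and how `DSPyComponent.from_prompt` actually derives the signature (including the instructions and the name of the output field) is not shown here.

```python
from string import Formatter

QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "\n"
    "Query: {query_str}\n"
    "Answer: "
)


def template_variables(template: str) -> list[str]:
    """Return placeholder names in order of first appearance."""
    names = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


print(template_variables(QA_TEMPLATE))
```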

In [26]:
from llama_index.core.prompts import PromptTemplate

# let's try a fun prompt that writes in Shakespeare! 
qa_prompt_template = PromptTemplate("""\
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, \
answer the query.

Write in the style of a Shakespearean sonnet.

Query: {query_str}
Answer: 
""")

In [28]:
from llama_index.core.query_pipeline import QueryPipeline as QP, InputComponent, FnComponent
from dspy.predict.llamaindex import DSPyComponent, LlamaIndexModule

dspy_component = DSPyComponent.from_prompt(qa_prompt_template)

retriever_post = FnComponent(
    lambda contexts: "\n\n".join([n.get_content() for n in contexts])
)


p = QP(verbose=True)
p.add_modules(
    {
        "input": InputComponent(),
        "retriever": retriever,
        "retriever_post": retriever_post,
        "synthesizer": dspy_component,
    }
)
p.add_link("input", "retriever")
p.add_link("retriever", "retriever_post")
p.add_link("input", "synthesizer", dest_key="query_str")
p.add_link("retriever_post", "synthesizer", dest_key="context_str")


dspy_qp = LlamaIndexModule(p)

In [38]:
# check the inferred signature
dspy_component.predict_module.signature

StringSignature(context_str, query_str -> sonnet_answer
    instructions='Essential Instructions: Provide an answer to the query based solely on the context information provided. The response should be written in the style of a Shakespearean sonnet, which typically consists of 14 lines written in iambic pentameter, with a rhyme scheme of ABABCDCDEFEFGG.'
    context_str = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Context Str:', 'desc': '${context_str}'})
    query_str = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Query Str:', 'desc': '${query_str}'})
    sonnet_answer = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Sonnet Answer:', 'desc': '${sonnet_answer}'})
)

In [41]:
from dspy.teleprompt import BootstrapFewShot
from llama_index.core.evaluation import SemanticSimilarityEvaluator
from dspy import Example

output_key = "sonnet_answer"
train_example_dicts = [
    {"query_str": "What did the author do growing up?", output_key: "The author wrote short stories and also worked on programming."},
    {"query_str": "What did the author do during his time at YC?", output_key: "organizing a Summer Founders Program, funding startups, writing essays, working on a new version of Arc, creating Hacker News, and developing internal software for YC"}
]
train_examples = [Example(**t).with_inputs("query_str") for t in train_example_dicts]

evaluator = SemanticSimilarityEvaluator(similarity_threshold=0.5)
# Validation logic: check that the predicted answer is semantically close to the reference answer.
# (The retrieved context itself is not checked here.)
def validate_context_and_answer(example, pred, trace=None):
    result = evaluator.evaluate(response=getattr(pred, output_key), reference=getattr(example, output_key))
    return result.passing

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(max_labeled_demos=0, metric=validate_context_and_answer)

# Compile!
compiled_dspy_qp = teleprompter.compile(dspy_qp, trainset=train_examples)

  0%|                                                                        | 0/2 [00:00<?, ?it/s]

> Running module input with input:
query_str: What did the author do growing up?

> Running module retriever with input:
input: What did the author do growing up?

> Running module retriever_post with input:
contexts: [NodeWithScore(node=TextNode(id_='20a73a69-f604-450e-b07d-cace28b471a5', embedding=None, metadata={'file_path': 'paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain'...

> Running module synthesizer with input:
query_str: What did the author do growing up?
context_str: What I Worked On

February 2021

Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to...

 50%|████████████████████████████████                                | 1/2 [00:00<00:00,  1.94it/s]

> Running module input with input:
query_str: What did the author do during his time at YC?

> Running module retriever with input:
input: What did the author do during his time at YC?

> Running module retriever_post with input:
contexts: [NodeWithScore(node=TextNode(id_='49a290f5-5f29-413c-97e7-9fdf15169cf4', embedding=None, metadata={'file_path': 'paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain'...

> Running module synthesizer with input:
query_str: What did the author do during his time at YC?
context_str: YC was different from other kinds of work I've done. Instead of deciding for myself what to work on, the problems came to me. Every 6 months there was a new batch of startups, and their problems, what...

100%|████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.95it/s]


In [42]:
# test this out 
compiled_dspy_qp(query_str="How did PG meet Jessica Livingston?")

> Running module input with input:
query_str: How did PG meet Jessica Livingston?

> Running module retriever with input:
input: How did PG meet Jessica Livingston?

> Running module retriever_post with input:
contexts: [NodeWithScore(node=TextNode(id_='465a8143-6740-4bbd-8e6c-bb5eba4bb5a3', embedding=None, metadata={'file_path': 'paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain'...

> Running module synthesizer with input:
query_str: How did PG meet Jessica Livingston?
context_str: Over the next several years I wrote lots of essays about all kinds of different topics. O'Reilly reprinted a collection of them as a book, called Hackers & Painters after one of the essays in it. I al...

Prediction(
    sonnet_answer="In the midst of a party, bright and gay,\nA clever scheme brought guests together, true.\nAmong them, Jessica, in a charming way,\nCaught the author's eye, a friendship grew.\n\nShe, a marketer in a bank of old,\nDiscovered startup tales, colorful and bold.\nAs the bank faced troubles, she sought anew,\nVenture capital's flaws came into view.\n\nTheir paths converged on a fateful night,\nAt the corner of Garden and Walker streets.\nA decision made, a future bright,\nTo start an investment firm, their feats.\n\nThus, through ignorance and boldness, they began,\nA journey in angel investing, a novel plan."
)

In [None]:
# [optional]: inspect the optimized prompt 
turbo.inspect_history(n=1)

## [Part 3] Port over Optimized Prompts to LlamaIndex using the DSPy Prompt Template

Extract a prompt from an existing compiled DSPy module, then port it over to any LlamaIndex pipeline!

In the example below, we use our `DSPyPromptTemplate` to extract the compiled few-shot prompt from the optimized query pipeline.

We then plug it into a separate query engine over the PG essay.
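
Conceptually, the extracted template is the original instructions plus the demos collected during compilation, with slots left for a new query and context. The class below is a hypothetical plain-Python illustration of that idea; `FewShotTemplate` is not part of DSPy or LlamaIndex, and the exact layout that `DSPyPromptTemplate` produces may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FewShotTemplate:
    """Hypothetical template: instructions, fixed demos, then the live question."""
    instructions: str
    demos: list[dict] = field(default_factory=list)

    def format(self, query_str: str, context_str: str) -> str:
        blocks = [self.instructions]
        # Each demo uses the same layout as the final, unanswered block.
        for demo in self.demos:
            blocks.append(
                f"Context Str: {demo['context_str']}\n"
                f"Query Str: {demo['query_str']}\n"
                f"Answer: {demo['answer']}"
            )
        # The last block leaves the answer open for the LLM to complete.
        blocks.append(f"Context Str: {context_str}\nQuery Str: {query_str}\nAnswer:")
        return "\n\n---\n\n".join(blocks)


template = FewShotTemplate(
    instructions="Answer questions with short factoid answers.",
    demos=[{
        "context_str": "Before college I worked on writing and programming.",
        "query_str": "What did the author do growing up?",
        "answer": "Writing and programming.",
    }],
)
prompt = template.format(query_str="hello?", context_str="this is my context")
print(prompt)
```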

In [23]:
from dspy.predict.llamaindex import DSPyPromptTemplate

# NOTE: DSPyPromptTemplate(dspy_component.predict_module) will not work here: the predict_module is replaced
# during compilation, so read it from the compiled query pipeline instead.
qa_prompt_tmpl = DSPyPromptTemplate(compiled_dspy_qp.query_pipeline.module_dict["synthesizer"].predict_module)

In [24]:
print(qa_prompt_tmpl.format(query_str="hello?", context_str="this is my context"))

Answer questions with short factoid answers.

---

Follow the following format.

Context Str: contains relevant facts
Query Str: ${query_str}
Answer: often between 1 and 5 words

---

Context Str: What I Worked On February 2021 Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep. The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, 

In [25]:
query_engine = index.as_query_engine(
    text_qa_template=qa_prompt_tmpl
)

In [30]:
response = query_engine.query("what did the author do at RISD?")

In [31]:
print(str(response))

What did the author do after dropping out of RISD?
Answer: Moved to New York, painted, and wrote a book on Lisp.
