## 🎓 𝗗𝗦𝗣 Compiler

If this is your first time working with the Demonstrate–Search–Predict (𝗗𝗦𝗣) framework, we recommend starting with the [introduction notebook](https://github.com/stanfordnlp/dsp/blob/main/intro.ipynb).

This notebook introduces **v0.1** of the **compiler** for **DSP** programs. The compiler takes a DSP program (alongside a number of _unlabeled_ example inputs) and returns a (much) more computationally efficient version of the program.

Concretely, the DSP compiler has the following interface.

```python
cheap_dsp_program = dsp.compile(expensive_dsp_program, unlabeled_input_examples)
```

We illustrate the usage of the compiler on the multi-hop question answering (QA) task. To do so, we build on a variant of the fifth program introduced in the intro notebook, which also gives a more accessible summary of what the Demonstrate–Search–Predict (𝗗𝗦𝗣) framework is.

Using the compiler, we'll obtain a (much) cheaper version of the Search stage of the program. In particular, the compiled program will only need the smallest version of GPT-3 (i.e., `ada`, at roughly 350M parameters) instead of the largest version of GPT-3.5 (i.e., `text-davinci-*`, which is typically estimated to have 175B parameters) to conduct multi-hop search. As a result, the compiled program conducts search at a much lower cost, yet it roughly preserves the quality of the expensive variant.
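As a rough sense of scale, the snippet below divides the two parameter counts quoted above. Both figures are public estimates, not official numbers. Parameter count is also only a loose proxy for per-token price or latency, so treat the ratio as illustrative.

```python
# Parameter counts as quoted above; both are public estimates, not official figures.
ADA_PARAMS = 350e6        # `ada`
DAVINCI_PARAMS = 175e9    # `text-davinci-*`

ratio = DAVINCI_PARAMS / ADA_PARAMS
print(f"The search LM shrinks by roughly {ratio:.0f}x in parameter count.")
```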

### Installation

If you haven't installed **DSP** already, let's do that.

```python
try:  # On Google Colab, clone the repository so that we also download the cache.
    import google.colab
    !git -C dsp/ pull || git clone https://github.com/stanfordnlp/dsp
    !pip install dsp-ml
    import sys; sys.path.insert(0, 'dsp/')
    %set_env DSP_NOTEBOOK_CACHEDIR dsp/cache

except ImportError:  # Not on Colab: set up the cache for the examples below.
    !pip install dsp-ml
    %set_env DSP_NOTEBOOK_CACHEDIR cache
```

```
env: DSP_NOTEBOOK_CACHEDIR=cache
```


### Setting Up

We'll start by setting up the language model (LM) and retrieval model (RM). Refer to the intro notebook for more details.

To make things easy, we've set up a cache in this repository. _If you want to run this notebook without changing the code or examples, you don't need an API key. All examples are cached._

```python
%load_ext autoreload
%autoreload 2

import dsp
import ujson

colbert_server = 'http://ec2-44-228-128-229.us-west-2.compute.amazonaws.com:8893/api/search'
davinci = dsp.GPT3(model='text-davinci-002')
colbert = dsp.ColBERTv2(url=colbert_server)

dsp.settings.configure(lm=davinci, rm=colbert)
```

```
cache/compiler
```


### Task Examples

We have six training examples (`train`), which we'll feed into the programs. These will help define the task.

Notice that our examples only have input (`question`) and output (`answer`) fields. When our advanced programs build sophisticated pipelines, training "demonstrations" for other fields will be constructed automatically.

```python
train = [
('Who produced the album that included a re-recording of "Lithium"?', ['Butch Vig']),
('Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?', ['Kevin Greutert']),
('The heir to the Du Pont family fortune sponsored what wrestling team?', ['Foxcatcher', 'Team Foxcatcher', 'Foxcatcher Team']),
('In what year was the star of To Hell and Back born?', ['1925']),
('Which award did the first book of Gary Zukav receive?', ['U.S. National Book Award', 'National Book Award']),
('What city was the victim of Joseph Druces working in?', ['Boston, Massachusetts', 'Boston']),
]

train = [dsp.Example(question=question, answer=answer) for question, answer in train]
```
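If you have not used `dsp.Example` before, the toy class below mimics only the behavior this notebook relies on: keyword construction, attribute reads and writes, and `copy(**updates)` returning a modified copy (as in `example.copy(answer=...)` and `d.copy(**x)` later on). `ToyExample` is a hypothetical stand-in, not DSP's actual class.

```python
class ToyExample(dict):
    """Hypothetical stand-in for dsp.Example: a dict with attribute access."""

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails; fall back to dict keys.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # `example.context = [...]` stores a field rather than an attribute.
        self[key] = value

    def copy(self, **updates):
        # New example with some fields overridden; the original is unchanged.
        return ToyExample({**self, **updates})


raw = [('In what year was the star of To Hell and Back born?', ['1925'])]
examples = [ToyExample(question=q, answer=a) for q, a in raw]

ex = examples[0]
pred = ex.copy(answer='1925')                        # like example.copy(answer=...)
merged = ex.copy(**ToyExample(context=['passage']))  # like d.copy(**x)

print(pred.answer, ex.answer)  # 1925 ['1925']
print(sorted(merged))          # ['answer', 'context', 'question']
```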

The development examples (`dev`) will be used to assess the behavior of each program we build. Of course, this tiny set is not meant to be a reliable benchmark! That said, it'll be instructive to use it for illustration.

```python
dev = [
('How many seasons did the man who runs KnowledgeWare play in the NFL ?', ['18', '18 seasons']),
('Who co-wrote a memoir with an American retired airline captain celebrated for the January 15, 2009 water landing of US Airways Flight 1549?', ['Jeffrey Zaslow']),
('Who was born earlier, Habash al-Marwazi or Mostafa El-Sayed?', ["Habash al-Marwazi"]),
('What is the oldest section from where Egyptologist Robert Hay was born?', ['Norman Keep','Pele Tower', 'Norman Keep or Pele Tower']),
('Which American physician was a co-founder alongside Charles Mayo?', ['Augustus Stinchfield', 'Augustus W. Stinchfield']),
('Simon Lindberg has been the leader of the group that was disbanded for inactivity in what nation in 2016?', ['Denmark']),
('Which actor from Trainspotting also played Obi-Wan Kenobi?', ['Ewan McGregor']),
('What is the publisher of the female superhero created by J. H. Williams III?', ['DC Comics']),
('In what year did the team that built the Lotus E21 begin to compete under the Lotus name?', ['2012']),
('What ranking did the man who Jonathan Brookins defeated on December 4, 2010 have in The Ultimate Fighter?', ['runner-up']),
('Which of the lower keys is served by the bridge that crosses over the Pigeon Key Historic District?', ['Little Duck Key']),
('The city that hosted the 1996 Summer Paralympics is the seat of what county?', ['Fulton County']),
('What actor who was in Sixteen candles is also in the romantic comedy Results?', ['Anthony Michael Hall']),
('Did Scott Draper or Kevin Curren play more sports?', ['Scott Draper']),
('What role did Denis Lawson play in the 1983 British comedy-drama film written and directed by Bill Forsyth?', ['Gordon Urquhart']),
('Which band is from a country closer to Canada, Local H or Bodyjar?', ['Local H']),
('What animated show on Cartoon Network did Wil Wheaton have a role in?', ['Ben 10']),
]

dev = [dsp.Example(question=question, answer=answer) for question, answer in dev]
```

Next, let's load 800 questions (without answers). These will be used by the compiler.

```python
with open('data/multihop_800_unlabeled_questions.json') as f:
    multihop_800_unlabeled_questions = ujson.loads(f.readline())

multihop_800_unlabeled_questions = [example['question'] for example in multihop_800_unlabeled_questions]

multihop_800_unlabeled_questions[:3]
```

```
['Which year was the 49th NHL Entry Draft where Emil Molin selected by the Dallas Stars in the 4th round?',
 'When did the man overthrown in the Libyan Crisis become the leader of Libya?',
 'Are Cynara and Piptanthus both flowering plants?']
```

### Predict Transformation

Let's define the question answering (QA) transformation used in our programs. This will represent the **Predict** stage of **DSP**. Refer to the intro notebook for more details on these.

```python
Question = dsp.Type(prefix="Question:", desc="${the question to be answered}")
Answer = dsp.Type(prefix="Answer:", desc="${a short factoid answer, often between 1 and 5 words}", format=dsp.format_answers)
Context = dsp.Type(prefix="Context:\n", desc="${sources that may contain relevant content}", format=dsp.passages2text)
Rationale = dsp.Type(prefix="Rationale: Let's think step by step.", desc="${a step-by-step deduction that identifies the correct response, which will be provided below}")

qa_template = dsp.Template(instructions="Answer questions with short factoid answers.", question=Question(), answer=Answer())
qa_template_with_CoT = dsp.Template(instructions=qa_template.instructions, context=Context(), question=Question(), rationale=Rationale(), answer=Answer())
```
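The templates above determine the layout of the prompts printed by `inspect_history` later in this notebook. The function below is a hypothetical re-creation of that layout for the simpler `qa_template` fields, inferred only from those printed prompts. The layout has four parts:

- the instructions;
- a format guide built from each field's prefix and description;
- one block per demonstration;
- a final block that ends at the prefix of the first missing field, so the LM's continuation fills it in.

`Field` and `render_prompt` are illustrative names, not DSP APIs, and DSP's actual rendering may differ in details.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str    # key on the example dict, e.g. "question"
    prefix: str  # label printed in the prompt, e.g. "Question:"
    desc: str    # placeholder printed in the format guide


def render_prompt(instructions, fields, demos, example):
    """Hypothetical re-creation of the layout printed by inspect_history."""
    guide = "\n".join(f"{f.prefix} {f.desc}" for f in fields)
    blocks = [instructions, "Follow the following format.\n\n" + guide]
    # Each demonstration shows every field filled in.
    for demo in demos:
        blocks.append("\n".join(f"{f.prefix} {demo[f.name]}" for f in fields))
    # The final block shows the known fields, then stops at the prefix of the
    # first missing field so that the LM's continuation fills it in.
    lines = []
    for f in fields:
        if f.name in example:
            lines.append(f"{f.prefix} {example[f.name]}")
        else:
            lines.append(f.prefix)
            break
    blocks.append("\n".join(lines))
    return "\n\n---\n\n".join(blocks)


fields = [
    Field("question", "Question:", "${the question to be answered}"),
    Field("answer", "Answer:", "${a short factoid answer, often between 1 and 5 words}"),
]
demos = [{"question": "In what year was the star of To Hell and Back born?", "answer": "1925"}]
prompt = render_prompt("Answer questions with short factoid answers.", fields, demos,
                       {"question": "Which actor from Trainspotting also played Obi-Wan Kenobi?"})
print(prompt)
```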

```python
@dsp.transformation
def QA_predict(example: dsp.Example, n=20):
    if n == 0:
        return example.copy(answer='')

    if n > 1:
        example, completions = dsp.generate(qa_template_with_CoT, n=n, temperature=0.7)(example, stage='qa')
        completions = dsp.majority(completions)
    else:
        example, completions = dsp.generate(qa_template_with_CoT)(example, stage='qa')

    return example.copy(answer=completions.answer)
```
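When `n > 1`, `QA_predict` samples several chain-of-thought completions at `temperature=0.7` and keeps the majority answer. The sketch below shows one simple way to take that vote. `majority_answer` is hypothetical, and the normalization that `dsp.majority` applies (if any) may differ.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer after light normalization.

    Ties go to the answer seen first (Counter.most_common orders equal counts
    by first occurrence), and the first original spelling is returned.
    """
    normalized = [a.strip().lower().rstrip(".") for a in answers]
    winner, _ = Counter(normalized).most_common(1)[0]
    return answers[normalized.index(winner)]


samples = ["Ewan McGregor", "ewan mcgregor.", "Alec Guinness", "Ewan McGregor"]
print(majority_answer(samples))  # Ewan McGregor
```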

### Search Transformation

Similarly, we define the **Search** stage for our program below. Note that we use the decorator `@dsp.compiled` for the `multihop_search` transformation. This tells the compiler that we'd like it to optimize this function for us.
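To build intuition for what such a marker can do, here is a toy decorator. It is not DSP's implementation, and `compiled`, `COMPILED_REGISTRY`, and `SETTINGS` are hypothetical names. The decorator records the marked function and, once a compiled (smaller) LM is set, routes calls to that LM instead of the default one. The notebook configures its LM globally via `dsp.settings.configure`; this sketch instead passes the LM to the function explicitly so the dispatch is visible.

```python
import functools

COMPILED_REGISTRY = {}  # hypothetical: functions marked for compilation
SETTINGS = {"lm": "large-lm", "compiled_lm": None}  # hypothetical LM settings


def compiled(fn):
    """Register fn; call it with the compiled LM once one is available."""
    COMPILED_REGISTRY[fn.__name__] = fn

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        lm = SETTINGS["compiled_lm"] or SETTINGS["lm"]  # prefer the small LM
        return fn(*args, lm=lm, **kwargs)

    return wrapper


@compiled
def toy_search(question, lm):
    return f"[{lm}] query for: {question}"


print(toy_search("Who runs KnowledgeWare?"))  # [large-lm] query for: Who runs KnowledgeWare?
SETTINGS["compiled_lm"] = "small-finetuned-lm"  # e.g., after compilation finishes
print(toy_search("Who runs KnowledgeWare?"))  # [small-finetuned-lm] query for: Who runs KnowledgeWare?
```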

```python
SearchQuery = dsp.Type(prefix="Search Query:", desc="${a simple question for seeking the missing information}")
SearchRationale = dsp.Type(prefix="Rationale: Let's think step by step. To answer this question, we first need to find out", desc="${the missing information}")
CondenseRationale = dsp.Type(prefix="Rationale: Let's think step by step. Based on the context, we have learned the following.", desc="${information from the context that provides useful clues}")

rewrite_template = dsp.Template(instructions="Write a search query that will help answer a complex question.", question=Question(), rationale=SearchRationale(), query=SearchQuery())
hop_template = dsp.Template(instructions=rewrite_template.instructions, context=Context(), question=Question(), rationale=CondenseRationale(), query=SearchQuery())
```

```python
from dsp.utils import deduplicate

@dsp.compiled
@dsp.transformation
def multihop_search(example: dsp.Example, max_hops=2, num_queries=10, k=7) -> dsp.Example:
    example.background = []
    example.context = []

    for hop in range(max_hops):
        # Generate queries
        template = rewrite_template if hop == 0 else hop_template

        if num_queries == 1:
            example, completions = dsp.generate(template)(example, stage=f'h{hop}')
            passages = dsp.retrieve(completions.query, k=k)

        else:
            num_queries = int(num_queries)
            example, completions = dsp.generate(template, n=num_queries, temperature=0.7)(example, stage=f'h{hop}')
            queries = [c.query for c in completions] + [example.question]
            passages = dsp.retrieveEnsemble(queries, k=k)

        # Arrange the passages for the next hop
        if hop == 0:
            example.context = passages
        else:
            example.context = [completions[0].rationale] + example.background + passages

        example.context = deduplicate(example.context)[:k]
        example.background = deduplicate(example.background + passages)[:hop+1]

    return example
```
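The bookkeeping at the end of each hop is easy to misread, so here it is in isolation, with plain strings standing in for passages. `toy_deduplicate` is our own order-preserving stand-in for `dsp.utils.deduplicate`, on the assumption that the real function keeps first occurrences. `arrange_context` is a hypothetical helper that repeats the two assignments from `multihop_search`. Notice that `background` grows by at most one passage per hop, and that for later hops `context` leads with the model's rationale.

```python
def toy_deduplicate(items):
    """Order-preserving de-duplication (our stand-in for dsp.utils.deduplicate)."""
    return list(dict.fromkeys(items))


def arrange_context(hop, rationale, background, passages, k):
    """Hypothetical helper repeating the end-of-hop assignments in multihop_search."""
    if hop == 0:
        context = passages
    else:
        context = [rationale] + background + passages
    context = toy_deduplicate(context)[:k]
    background = toy_deduplicate(background + passages)[:hop + 1]
    return context, background


# Hop 0: the context is just the retrieved passages; background keeps one.
context, background = arrange_context(0, None, [], ["p1", "p2", "p3"], k=3)
print(context, background)  # ['p1', 'p2', 'p3'] ['p1']

# Hop 1: the rationale leads, repeats are dropped, and background grows by one.
context, background = arrange_context(1, "r", background, ["p1", "p4", "p5"], k=3)
print(context, background)  # ['r', 'p1', 'p4'] ['p1', 'p4']
```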

### Demonstrate Transformation

... and the **Demonstrate** stage, too.

Here, we annotate a single example automatically (i.e., calling `dsp.annotate` with `k=1`), which will be used in the prompts. Thus, the program bootstraps itself with one-shot prompts, creating annotations for all intermediate transformations (e.g., search queries). This number can easily be increased (e.g., `k=3`), potentially leading to a more reliable program, albeit at a higher cost.

```python
@dsp.transformation
def multihop_attempt(d: dsp.Example) -> dsp.Example:
    x = dsp.Example(question=d.question, demos=dsp.all_but(train, d))
    x = multihop_search(x, num_queries=1, k=4)

    if not dsp.passage_match(x.context, d.answer): return None
    x = QA_predict(x, n=1)

    if not dsp.answer_match(x.answer, d.answer): return None
    return d.copy(**x)

@dsp.transformation
def multihop_demonstrate(x: dsp.Example) -> dsp.Example:
    demos = dsp.sample(train, k=16)
    x.demos = dsp.annotate(multihop_attempt)(demos, k=1, return_all=True)
    return x
```
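Conceptually, `dsp.annotate(multihop_attempt)(demos, k=1)` runs the attempt function over candidate training examples and keeps the successful traces as demonstrations. The sketch below captures that idea with hypothetical names: `annotate`, `toy_attempt`, and a toy passage lookup that stands in for retrieval. It does not model DSP's `return_all` option or any other internals.

```python
import random


def annotate(attempt, candidates, k):
    """Keep the first k non-None results of running `attempt` on candidates."""
    demos = []
    for candidate in candidates:
        result = attempt(candidate)
        if result is not None:  # None means a check failed; skip this example
            demos.append(result)
            if len(demos) == k:
                break
    return demos


toy_train = [
    {"question": "In what year was the star of To Hell and Back born?", "answer": "1925"},
    {"question": 'Who produced the album that included a re-recording of "Lithium"?', "answer": "Butch Vig"},
]

# Toy "retrieval": only the second question gets a passage containing its answer.
toy_passages = {toy_train[1]["question"]: "Toy passage: the album was produced by Butch Vig."}


def toy_attempt(example):
    passage = toy_passages.get(example["question"], "")
    if example["answer"] not in passage:
        return None  # like multihop_attempt, reject traces that miss the answer
    return {**example, "context": [passage]}


candidates = random.Random(0).sample(toy_train, k=len(toy_train))  # shuffled copy
toy_demos = annotate(toy_attempt, candidates, k=1)
print(toy_demos[0]["answer"], toy_demos[0]["context"])
```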

### Multi-Hop QA Program

We'll make the program parametric:

- `num_queries` is the number of queries it generates in each hop (defaults to one)
- `num_preds` is the number of answers it generates before selecting the majority vote (defaults to zero, which means skip the Predict stage)

```python
def multihop_QA(question: str, num_queries=1, num_preds=0):
    x = dsp.Example(question=question)
    x = multihop_demonstrate(x)
    x = multihop_search(x, num_queries=num_queries)
    x = QA_predict(x, n=num_preds)
    return x
```

So far, this is a very typical DSP program. Let's try to run it on the first example in our development set.

```python
question = dev[0].question
answer = multihop_QA(question, num_queries=10, num_preds=1).answer

print(question)
print(answer)
```

```
How many seasons did the man who runs KnowledgeWare play in the NFL ?
18
```


As usual, we can inspect the inputs and outputs of the language model at each of the stages of the program. We can see the bootstrapped example in each of the prompts.

```python
davinci.inspect_history(n=3)
```

```
Write a search query that will help answer a complex question.

---

Follow the following format.

Question: ${the question to be answered}
Rationale: Let's think step by step. To answer this question, we first need to find out ${the missing information}
Search Query: ${a simple question for seeking the missing information}

---

Question: Which award did the first book of Gary Zukav receive?
Rationale: Let's think step by step. To answer this question, we first need to find out the name of the first book of Gary Zukav.
Search Query: "What is the name of Gary Zukav's first book?"

---

Question: How many seasons did the man who runs KnowledgeWare play in the NFL ?
Rationale: Let's think step by step. To answer this question, we first need to find out the name of the man who runs KnowledgeWare.
Search Query: "Who runs KnowledgeWare?"   (and 9 other completions)



Write a search query that will help answer a complex question.

---

Follow the following format.
```

### Compile

Now, let's compile the program into its more efficient version. When doing this, the original DSP program will be traced by the DSP runtime. Calls to the LM (and their outputs) will be intercepted, and an appropriately simplified version of them will become training data for fine-tuning a much smaller LM. The fine-tuned smaller LM will then be invoked automatically, instead of the larger LM, for the transformations decorated with `@dsp.compiled` (i.e., the **Search** stage).
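The tracing step can be pictured with a toy sketch. This is not DSP's compiler, and several parts are assumptions:

- `traced`, `simplify`, and `toy_teacher` are hypothetical names, and the teacher is a string function standing in for `text-davinci-002`.
- The prompt/completion JSONL layout is the format OpenAI's legacy fine-tuning endpoint accepted for models such as `ada`. We do not know exactly what DSP writes.
- `simplify` keeps only the instructions and the final block, mirroring the short prompts of the compiled program printed later in this notebook.

```python
import json

SEP = "\n\n---\n\n"  # block separator seen in the printed prompts
traced_calls = []    # (prompt, completion) pairs captured while the program runs


def traced(lm):
    """Wrap an LM callable so that every prompt and completion is recorded."""
    def wrapper(prompt):
        completion = lm(prompt)
        traced_calls.append((prompt, completion))
        return completion
    return wrapper


def simplify(prompt):
    """Keep the instructions and the final block; drop the format guide and demos."""
    blocks = prompt.split(SEP)
    return blocks[0] + SEP + blocks[-1]


def toy_teacher(prompt):
    # Hypothetical stand-in for the large LM: turn the last question into a query.
    last_question = prompt.split("Question: ")[-1].split("\n")[0]
    return f' "Search for: {last_question}"'


teacher = traced(toy_teacher)
for q in ["Who runs KnowledgeWare?", "Are Cynara and Piptanthus both flowering plants?"]:
    long_prompt = SEP.join([
        "Write a search query that will help answer a complex question.",
        "Follow the following format.\n\nQuestion: ${...}\nSearch Query: ${...}",
        "Question: Which award did the first book of Gary Zukav receive?\nSearch Query: ...",
        f"Question: {q}\nSearch Query:",
    ])
    teacher(long_prompt)

# One JSON object per line: the short prompt paired with the large LM's output.
jsonl = "\n".join(
    json.dumps({"prompt": simplify(p), "completion": c}) for p, c in traced_calls
)
print(jsonl)
```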

To make it easy for anyone to use this notebook, we set `force_reuse_cached_compilation=True` below, which skips the check that you can access the compiled version of the LM. (If you'd like to try examples NOT in this notebook, you can easily turn that off and re-compile. Compilation may take an hour or two in practice, depending on the OpenAI server load.)
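The reuse logic described above can be sketched as a cache keyed by the program and its inputs. Everything here is hypothetical: `cache_key`, `get_compiled_model`, `fake_compile`, and the SHA-256-based key are our own choices, not DSP's. The `force_reuse` flag mirrors the behavior described in the text: a cached model is trusted without checking that it is accessible, whereas with the flag off an inaccessible model triggers re-compilation.

```python
import hashlib
import json


def cache_key(program_name, examples):
    """Derive a short, deterministic key from a program name and its inputs."""
    payload = json.dumps({"program": program_name, "examples": examples}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def get_compiled_model(key, cache, compile_fn, can_access, force_reuse=False):
    """Return the fine-tuned model id for `key`, compiling only when needed.

    With force_reuse=True, a cached model id is trusted without calling
    can_access; otherwise an inaccessible model is recompiled.
    """
    if key in cache and (force_reuse or can_access(cache[key])):
        return cache[key]
    cache[key] = compile_fn()
    return cache[key]


cache = {}
compilations = []


def fake_compile():
    # Hypothetical stand-in for a fine-tuning job that takes an hour or two.
    compilations.append(1)
    return f"ada:ft-example-{len(compilations)}"


key = cache_key("multihop_QA", ["Are Cynara and Piptanthus both flowering plants?"])
first = get_compiled_model(key, cache, fake_compile, can_access=lambda m: False)
reused = get_compiled_model(key, cache, fake_compile, can_access=lambda m: False, force_reuse=True)
print(first, reused, len(compilations))  # ada:ft-example-1 ada:ft-example-1 1
```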

```python
dsp.settings.configure(force_reuse_cached_compilation=True)
```

Let's run compilation! We'll get back the more efficient version of the program, which we call `compiled_QA`.

```python
compiled_QA = dsp.compile(program=multihop_QA, examples=multihop_800_unlabeled_questions)
```

```
100%|██████████| 798/798 [00:10<00:00, 73.12it/s]

52ed1b7eeba61e06
ada:ft-stanfordpraglab-2023-02-06-21-06-50
```


We can try this out on the example question.

```python
question = dev[0].question
answer = compiled_QA(question, num_queries=10, num_preds=1).answer

print(question)
print(answer)
```

```
How many seasons did the man who runs KnowledgeWare play in the NFL ?
18
```


And we can inspect the small LM's outputs for the **Search** stage as well as the large LM's outputs for the **Predict** stage.

```python
compiled_QA.lm.inspect_history(n=2)
davinci.inspect_history(n=1)
```

```
Write a search query that will help answer a complex question.

---

Question: How many seasons did the man who runs KnowledgeWare play in the NFL ?
Rationale: Let's think step by step. To answer this question, we first need to find out the name of the man who runs KnowledgeWare.

Search Query: "Who runs KnowledgeWare?"   (and 9 other completions)



Write a search query that will help answer a complex question.

---

Context:
[1] «KnowledgeWare | KnowledgeWare KnowledgeWare was a software company headquartered in Atlanta, Georgia co-founded by James Martin and run by Fran Tarkenton. It produced a Computer Aided Software Engineering (CASE) tool called IEW (Information Engineering Workbench). KnowledgeWare was sold to Sterling Software in 1994, which was in its turn acquired by Computer Associates. Tarkenton is credited with having coined, "A fool with a tool is a faster fool" while offering classes at their Peach Tree headquarters. Tarkenton, Don Addington and
```

### Evaluate

Below, we run our programs and evaluate the quality of retrieval (i.e., whether the top-7 passages in the context contain the answer string) and of the output (i.e., whether the final answer matches the ground-truth label).
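For concreteness, here is how such checks are commonly implemented. We assume that `dsp.utils.EM` and the retrieval evaluation behave roughly like this SQuAD-style normalization and token matching, but the exact rules in DSP may differ. `normalize`, `exact_match`, `contains_tokens`, and `retrieval_hit` are hypothetical names.

```python
import re
import string


def normalize(text):
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the prediction equals any gold answer after normalization."""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)


def contains_tokens(passage, answer):
    """True if the answer's tokens appear contiguously in the passage."""
    p, a = normalize(passage).split(), normalize(answer).split()
    return any(p[i:i + len(a)] == a for i in range(len(p) - len(a) + 1))


def retrieval_hit(passages, gold_answers, k=7):
    """True if any of the top-k passages contains any gold answer."""
    return any(contains_tokens(p, g) for p in passages[:k] for g in gold_answers)


gold = ['Norman Keep', 'Pele Tower', 'Norman Keep or Pele Tower']
print(exact_match("1320", gold))                        # False
print(exact_match("Norman Keep or Pele Tower.", gold))  # True
print(retrieval_hit(["Tarkenton played 18 seasons in the NFL."], ["18", "18 seasons"]))  # True
print(retrieval_hit(["The 1820 census of Atlanta."], ["18"]))  # False
```

Matching whole tokens rather than raw substrings keeps a gold answer like `18` from matching inside `1820`.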

We will evaluate the original program `multihop_QA` (which uses long prompts for `text-davinci-002`) and the efficient one `compiled_QA` (which uses short prompts for the much smaller model `ada` for **Search**).

```python
from dsp.evaluation.utils import evaluateRetrieval, evaluateAnswer

evaluateRetrieval(lambda question: multihop_QA(question, num_queries=10, num_preds=1), dev)
```

```
100%|██████████| 17/17 [00:01<00:00, 12.27it/s]

Answered 17 / 17 (100.0%) correctly.
```

| # | question | answer | prediction | correct |
|---|---|---|---|---|
| 0 | How many seasons did the man who runs KnowledgeWare play in the NFL ? | ['18', '18 seasons'] | 18 | ✔️ |
| 1 | Who co-wrote a memoir with an American retired airline captain celebrated for the January 15, 2009 water landing of US Airways Flight 1549? | ['Jeffrey Zaslow'] | Jeffrey Zaslow | ✔️ |
| 2 | Who was born earlier, Habash al-Marwazi or Mostafa El-Sayed? | ['Habash al-Marwazi'] | Habash al-Marwazi | ✔️ |
| 3 | What is the oldest section from where Egyptologist Robert Hay was born? | ['Norman Keep', 'Pele Tower', 'Norman Keep or Pele Tower'] | 1320 | ✔️ |
| 4 | Which American physician was a co-founder alongside Charles Mayo? | ['Augustus Stinchfield', 'Augustus W. Stinchfield'] | Augustus Stinchfield | ✔️ |
| 5 | Simon Lindberg has been the leader of the group that was disbanded for inactivity in what nation in 2016? | ['Denmark'] | Sweden | ✔️ |
| 6 | Which actor from Trainspotting also played Obi-Wan Kenobi? | ['Ewan McGregor'] | Ewan McGregor | ✔️ |
| 7 | What is the publisher of the female superhero created by J. H. Williams III? | ['DC Comics'] | DC Comics | ✔️ |
| 8 | In what year did the team that built the Lotus E21 begin to compete under the Lotus name? | ['2012'] | 2012 | ✔️ |
| 9 | What ranking did the man who Jonathan Brookins defeated on December 4, 2010 have in The Ultimate Fighter? | ['runner-up'] | runner-up | ✔️ |


```python
evaluateRetrieval(lambda question: compiled_QA(question, num_queries=10, num_preds=1), dev)
```

```
100%|██████████| 17/17 [00:01<00:00, 10.10it/s]

Answered 16 / 17 (94.1%) correctly.
```

| # | question | answer | prediction | correct |
|---|---|---|---|---|
| 0 | How many seasons did the man who runs KnowledgeWare play in the NFL ? | ['18', '18 seasons'] | 18 | ✔️ |
| 1 | Who co-wrote a memoir with an American retired airline captain celebrated for the January 15, 2009 water landing of US Airways Flight 1549? | ['Jeffrey Zaslow'] | Jeffrey Zaslow | ✔️ |
| 2 | Who was born earlier, Habash al-Marwazi or Mostafa El-Sayed? | ['Habash al-Marwazi'] | Habash al-Marwazi | ✔️ |
| 3 | What is the oldest section from where Egyptologist Robert Hay was born? | ['Norman Keep', 'Pele Tower', 'Norman Keep or Pele Tower'] | Norman Keep or Pele Tower | ✔️ |
| 4 | Which American physician was a co-founder alongside Charles Mayo? | ['Augustus Stinchfield', 'Augustus W. Stinchfield'] | Augustus Stinchfield | ✔️ |
| 5 | Simon Lindberg has been the leader of the group that was disbanded for inactivity in what nation in 2016? | ['Denmark'] | Sweden | ❌ |
| 6 | Which actor from Trainspotting also played Obi-Wan Kenobi? | ['Ewan McGregor'] | Ewan McGregor | ✔️ |
| 7 | What is the publisher of the female superhero created by J. H. Williams III? | ['DC Comics'] | DC Comics | ✔️ |
| 8 | In what year did the team that built the Lotus E21 begin to compete under the Lotus name? | ['2012'] | 2012 | ✔️ |
| 9 | What ranking did the man who Jonathan Brookins defeated on December 4, 2010 have in The Ultimate Fighter? | ['runner-up'] | runner-up | ✔️ |


```python
from dsp.utils import EM, F1

evaluateAnswer(lambda question: multihop_QA(question, num_queries=10, num_preds=1), dev, metric=EM)
```

```
100%|██████████| 17/17 [00:00<00:00, 466.01it/s]

Answered 14 / 17 (82.4%) correctly.
```

| # | question | answer | prediction | correct |
|---|---|---|---|---|
| 0 | How many seasons did the man who runs KnowledgeWare play in the NFL ? | ['18', '18 seasons'] | 18 | ✔️ |
| 1 | Who co-wrote a memoir with an American retired airline captain celebrated for the January 15, 2009 water landing of US Airways Flight 1549? | ['Jeffrey Zaslow'] | Jeffrey Zaslow | ✔️ |
| 2 | Who was born earlier, Habash al-Marwazi or Mostafa El-Sayed? | ['Habash al-Marwazi'] | Habash al-Marwazi | ✔️ |
| 3 | What is the oldest section from where Egyptologist Robert Hay was born? | ['Norman Keep', 'Pele Tower', 'Norman Keep or Pele Tower'] | 1320 | ❌ |
| 4 | Which American physician was a co-founder alongside Charles Mayo? | ['Augustus Stinchfield', 'Augustus W. Stinchfield'] | Augustus Stinchfield | ✔️ |
| 5 | Simon Lindberg has been the leader of the group that was disbanded for inactivity in what nation in 2016? | ['Denmark'] | Sweden | ❌ |
| 6 | Which actor from Trainspotting also played Obi-Wan Kenobi? | ['Ewan McGregor'] | Ewan McGregor | ✔️ |
| 7 | What is the publisher of the female superhero created by J. H. Williams III? | ['DC Comics'] | DC Comics | ✔️ |
| 8 | In what year did the team that built the Lotus E21 begin to compete under the Lotus name? | ['2012'] | 2012 | ✔️ |
| 9 | What ranking did the man who Jonathan Brookins defeated on December 4, 2010 have in The Ultimate Fighter? | ['runner-up'] | runner-up | ✔️ |


```python
evaluateAnswer(lambda question: compiled_QA(question, num_queries=10, num_preds=1), dev, metric=EM) # watch cost increase from $13 to ?? for 100 examples
```

```
100%|██████████| 17/17 [00:00<00:00, 449.04it/s]

Answered 13 / 17 (76.5%) correctly.
```

| # | question | answer | prediction | correct |
|---|---|---|---|---|
| 0 | How many seasons did the man who runs KnowledgeWare play in the NFL ? | ['18', '18 seasons'] | 18 | ✔️ |
| 1 | Who co-wrote a memoir with an American retired airline captain celebrated for the January 15, 2009 water landing of US Airways Flight 1549? | ['Jeffrey Zaslow'] | Jeffrey Zaslow | ✔️ |
| 2 | Who was born earlier, Habash al-Marwazi or Mostafa El-Sayed? | ['Habash al-Marwazi'] | Habash al-Marwazi | ✔️ |
| 3 | What is the oldest section from where Egyptologist Robert Hay was born? | ['Norman Keep', 'Pele Tower', 'Norman Keep or Pele Tower'] | Norman Keep or Pele Tower | ✔️ |
| 4 | Which American physician was a co-founder alongside Charles Mayo? | ['Augustus Stinchfield', 'Augustus W. Stinchfield'] | Augustus Stinchfield | ✔️ |
| 5 | Simon Lindberg has been the leader of the group that was disbanded for inactivity in what nation in 2016? | ['Denmark'] | Sweden | ❌ |
| 6 | Which actor from Trainspotting also played Obi-Wan Kenobi? | ['Ewan McGregor'] | Ewan McGregor | ✔️ |
| 7 | What is the publisher of the female superhero created by J. H. Williams III? | ['DC Comics'] | DC Comics | ✔️ |
| 8 | In what year did the team that built the Lotus E21 begin to compete under the Lotus name? | ['2012'] | 2012 | ✔️ |
| 9 | What ranking did the man who Jonathan Brookins defeated on December 4, 2010 have in The Ultimate Fighter? | ['runner-up'] | runner-up | ✔️ |
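With only 17 dev questions, each question is worth about 5.9 percentage points, so the gaps between the original and compiled programs amount to a single question per metric. The snippet below recomputes the reported percentages from the counts printed in the four evaluation runs. It uses no new data.

```python
N = 17  # questions in the dev set

# (program, metric) -> number of questions judged correct, as printed above
results = {
    ("multihop_QA", "retrieval"): 17,
    ("compiled_QA", "retrieval"): 16,
    ("multihop_QA", "EM"): 14,
    ("compiled_QA", "EM"): 13,
}

for (program, metric), correct in results.items():
    print(f"{program} {metric}: {correct}/{N} = {100 * correct / N:.1f}%")

print(f"one question = {100 / N:.1f} points")
for metric in ("retrieval", "EM"):
    gap = results[("multihop_QA", metric)] - results[("compiled_QA", metric)]
    print(f"{metric}: compiled program is {gap} question(s) behind")
```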


### Citation

If you use DSP, please cite:
```
@article{khattab2022demonstrate,
  title={Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive {NLP}},
  author={Khattab, Omar and Santhanam, Keshav and Li, Xiang Lisa and Hall, David and Liang, Percy and Potts, Christopher and Zaharia, Matei},
  journal={arXiv preprint arXiv:2212.14024},
  year={2022}
}
```


If you found this notebook helpful, please cite:
```
@misc{khattab2023dspcompiler,
	author = {},
	title = {DSP compiler v0.1},
	howpublished = {\url{https://github.com/stanfordnlp/dsp/blob/main/compiler.ipynb}},
	year = {},
	note = {[Accessed 10-Feb-2023]},
}

```