# Multi-hop QA Program 5: Multi-hop Condensed Retrieval w/ Automatic Demos and Query Fusion

This notebook is a stand-alone version of Program 5 from the intro notebook.

### Installation

If you haven't installed **DSP** already, let's do that.

Note: If you're running this from a cloned copy of the repo, then you can skip this block.

In [None]:
try:  # On Google Colab, clone (or update) the repository so that we download the cache.
    import google.colab
    !git -C dsp/ pull || git clone https://github.com/stanfordnlp/dsp
except ImportError:  # Not running on Colab; nothing to clone.
    pass

!pip install -U pip dsp-ml

### Setting Up

We'll start by setting up the language model (LM) and retrieval model (RM).

We will work with the **GPT-3.5** LM (`text-davinci-002`) and the **ColBERTv2** RM.

To use GPT-3.5, you'll need an OpenAI key. For ColBERTv2, we've set up a server hosting a Wikipedia (Dec 2018) search index, so you don't need to worry about setting one up!

To make things easy, we've set up a cache in this repository. _If you want to run this notebook without changing the code or examples, you don't need an API key. All examples are cached._

In [1]:
%load_ext autoreload
%autoreload 2

try: import google.colab; root_path = 'dsp'
# The root path is ../ if you're running this from the demo folder of the cloned repository
except: root_path = '../'

import os
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(root_path, 'cache')

# Add ../ to the path to import dsp if you're running this directly from the cloned copy of the repo (without pip installing dsp)
import sys
sys.path.insert(0, '../')

import dsp

openai_key = os.getenv('OPENAI_API_KEY')  # or replace with your API key (optional)
colbert_server = 'http://ec2-44-228-128-229.us-west-2.compute.amazonaws.com:8893/api/search'

lm = dsp.GPT3(model='text-davinci-002', api_key=openai_key)
rm = dsp.ColBERTv2(url=colbert_server)

dsp.settings.configure(lm=lm, rm=rm)

Not loading Cohere because it is not installed.


### Task Examples

Next, let's look at a few examples of the task. Each example consists of a question and one or more gold answers.

We have six training examples (`train`), which we'll feed into the programs. These will help define the task.

Notice that our examples only have input (`question`) and output (`answer`) fields. When our advanced programs build sophisticated pipelines, training "demonstrations" for other fields will be constructed automatically.

In [2]:
train = [('Who produced the album that included a re-recording of "Lithium"?', ['Butch Vig']),
         ('Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?', ['Kevin Greutert']),
         ('The heir to the Du Pont family fortune sponsored what wrestling team?', ['Foxcatcher', 'Team Foxcatcher', 'Foxcatcher Team']),
         ('In what year was the star of To Hell and Back born?', ['1925']),
         ('Which award did the first book of Gary Zukav receive?', ['U.S. National Book Award', 'National Book Award']),
         ('What city was the victim of Joseph Druces working in?', ['Boston, Massachusetts', 'Boston']),]

train = [dsp.Example(question=question, answer=answer) for question, answer in train]

The development examples (`dev`) will be used to assess the behavior of each program we build. Of course, this tiny set is not meant to be a reliable benchmark, but it'll be instructive to use it for illustration.

In [3]:
dev = [('Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin?', ['E. L. Doctorow', 'E.L. Doctorow', 'Doctorow']),
       ('What documentary about the Gilgo Beach Killer debuted on A&E?', ['The Killing Season']),
       ('Right Back At It Again contains lyrics co-written by the singer born in what city?', ['Gainesville, Florida', 'Gainesville']),
       ('What year was the party of the winner of the 1971 San Francisco mayoral election founded?', ['1828']),
       ('Which author is English: John Braine or Studs Terkel?', ['John Braine']),
       ('Anthony Dirrell is the brother of which super middleweight title holder?', ['Andre Dirrell']),
       ('In which city is the sports nutrition business established by Oliver Cookson based ?', ['Cheshire', 'Cheshire, UK']),
       ('Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant.', ['February 13, 1980']),
       ('Kyle Moran was born in the town on what river?', ['Castletown', 'Castletown River']),
       ("What is the name of one branch of Robert D. Braun's speciality?", ['aeronautical engineering', 'astronautical engineering', 'aeronautics', 'astronautics']),
       ("Where was the actress who played the niece in the Priest film born?", ['Surrey', 'Guildford, Surrey']),
       ('Name the movie in which the daughter of Noel Harrison plays Violet Trefusis.', ['Portrait of a Marriage']),
       ('What year was the father of the Princes in the Tower born?', ['1442'])]

dev = [dsp.Example(question=question, answer=answer) for question, answer in dev]
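Because each example can carry several gold answers, a prediction is normally counted as correct if it matches any one of them. The sketch below shows one common way to do that check, using normalized exact match. The helper names `normalize_text` and `matches_any_gold` are hypothetical, and the normalization rules are an assumption; `dsp.answer_match` may behave differently.

```python
import re
import string
from typing import Sequence


def normalize_text(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def matches_any_gold(prediction: str, gold_answers: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    normalized_prediction = normalize_text(prediction)
    return any(normalized_prediction == normalize_text(gold) for gold in gold_answers)


# Gold answers copied from the first dev example above.
gold = ['E. L. Doctorow', 'E.L. Doctorow', 'Doctorow']
print(matches_any_gold('e. l. doctorow', gold))  # True
print(matches_any_gold('Julia Peterkin', gold))  # False
```

Listing several surface forms per answer, as the `dev` set does, keeps this kind of strict matching from penalizing harmless formatting differences.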

### Program Definition

Through Program 4, we've begun to explore some of the power of the DSP abstraction. However, if you look closely, you will see a few downsides of the previous approach:

1. The search transformations invoke the LM without any demonstrations in the prompt. That is because we only have training data for the question–answer pairs and not for the intermediate labels (e.g., search queries).
2. The QA prompt uses passages (`context`) and a Chain-of-Thought (`rationale`) for the question to be answered. However, the training demonstrations include neither context nor CoT because they aren't available in our labels.
3. The search transformations commit to a single query per hop, which may single out an unproductive chain of passages and hence fail to uncover relevant information.

In [4]:
from dsp.utils import deduplicate

Question = dsp.Type(prefix="Question:", desc="${the question to be answered}")
Answer = dsp.Type(prefix="Answer:", desc="${a short factoid answer, often between 1 and 5 words}", format=dsp.format_answers)

Context = dsp.Type(
    prefix="Context:\n",
    desc="${sources that may contain relevant content}",
    format=dsp.passages2text
)

Rationale = dsp.Type(
    prefix="Rationale: Let's think step by step.",
    desc="${a step-by-step deduction that identifies the correct response, which will be provided below}"
)

qa_template_with_CoT = dsp.Template(
    instructions="Answer questions with short factoid answers.",
    context=Context(), question=Question(), rationale=Rationale(), answer=Answer()
)

SearchRationale = dsp.Type(
    prefix="Rationale: Let's think step by step. To answer this question, we first need to find out",
    desc="${the missing information}"
)

SearchQuery = dsp.Type(
    prefix="Search Query:",
    desc="${a simple question for seeking the missing information}"
)

CondenseRationale = dsp.Type(
    prefix="Rationale: Let's think step by step. Based on the context, we have learned the following.",
    desc="${information from the context that provides useful clues}"
)

rewrite_template = dsp.Template(
    instructions="Write a search query that will help answer a complex question.",
    question=Question(), rationale=SearchRationale(), query=SearchQuery()
)

hop_template = dsp.Template(
    instructions=rewrite_template.instructions,
    context=Context(), question=Question(), rationale=CondenseRationale(), query=SearchQuery()
)

@dsp.transformation
def QA_predict(example: dsp.Example, sc=True):
    if sc:
        example, completions = dsp.generate(qa_template_with_CoT, n=20, temperature=0.7)(example, stage='qa')
        completions = dsp.majority(completions)
    else:
        example, completions = dsp.generate(qa_template_with_CoT)(example, stage='qa')
    
    return example.copy(answer=completions.answer)

@dsp.transformation
def multihop_search_v1(example: dsp.Example, max_hops=2, k=2) -> dsp.Example:
    example.context = []
    
    for hop in range(max_hops):
        # Generate a query from the question (plus the accumulated context after the first hop)
        template = rewrite_template if hop == 0 else hop_template
        example, completions = dsp.generate(template)(example, stage=f'h{hop}')

        # Retrieve k results based on the query generated
        passages = dsp.retrieve(completions.query, k=k)

        # Update the context by concatenating old and new passages
        example.context = deduplicate(example.context + passages)

    return example
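When `sc=True`, `QA_predict` samples 20 completions at temperature 0.7 and keeps the majority answer, a strategy usually called self-consistency. As a rough illustration of the voting step only, here is a minimal sketch. The function `majority_answer` and the sampled values are hypothetical, and `dsp.majority` may normalize or break ties differently.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most frequent non-empty answer; ties go to the earliest seen."""
    cleaned = [a.strip() for a in answers if a and a.strip()]
    if not cleaned:
        return None
    counts = Counter(cleaned)
    # most_common orders equal counts by first occurrence (Python 3.7+).
    return counts.most_common(1)[0][0]


# Made-up sampled answers, for illustration only.
sampled = ["1828", "1854", "1828", "", "1828", "1854"]
print(majority_answer(sampled))  # 1828
```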

We address these problems automatically in the program below.

In it, we begin by defining `multihop_demonstrate` (which uses `multihop_attempt`) to automatically **annotate** demonstrations for the complex multi-hop pipeline. These demonstrations will be provided to the LM when it's invoked for each transformation.

In [5]:
@dsp.transformation
def multihop_attempt(d: dsp.Example) -> dsp.Example:
    # Prepare unaugmented demonstrations for the example.
    x = dsp.Example(question=d.question, demos=dsp.all_but(train, d))
    
    # Search. And skip examples where search fails.
    # Annotate demonstrations for multihop_search_v2 with the simpler multihop_search_v1 pipeline.
    x = multihop_search_v1(x)
    if not dsp.passage_match(x.context, d.answer): return None
    
    # Predict. And skip examples where predict fails.
    x = QA_predict(x, sc=False)
    if not dsp.answer_match(x.answer, d.answer): return None
    
    return d.copy(**x)

@dsp.transformation
def multihop_demonstrate(x: dsp.Example) -> dsp.Example:
    demos = dsp.sample(train, k=7)
    x.demos = dsp.annotate(multihop_attempt)(demos, k=3, return_all=True)
    return x
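The idea behind this annotation step can be shown without the library: run an attempt function over candidate training examples, discard the ones where it returns `None`, and keep the successful traces as demonstrations. The sketch below is a simplified stand-in under that assumption. `bootstrap_demos`, `toy_attempt`, and `TOY_KNOWLEDGE` are hypothetical names, and the sketch does not model `dsp.annotate` options such as `return_all`.

```python
from typing import Callable, Dict, List, Optional

Demo = Dict[str, object]


def bootstrap_demos(attempt: Callable[[Demo], Optional[Demo]],
                    candidates: List[Demo], k: int) -> List[Demo]:
    """Keep up to k candidates for which `attempt` succeeds (returns non-None)."""
    kept: List[Demo] = []
    for candidate in candidates:
        if len(kept) >= k:
            break
        annotated = attempt(candidate)
        if annotated is not None:
            kept.append(annotated)
    return kept


# Toy "pipeline": a lookup table whose second entry is deliberately wrong.
TOY_KNOWLEDGE = {"capital of France?": "Paris", "2 + 2?": "5"}


def toy_attempt(example: Demo) -> Optional[Demo]:
    prediction = TOY_KNOWLEDGE.get(example["question"])
    if prediction != example["answer"]:
        return None  # failed trace: do not use it as a demonstration
    # Successful trace: attach the intermediate field produced along the way.
    return {**example, "rationale": "looked up the question"}


candidates = [
    {"question": "capital of France?", "answer": "Paris"},
    {"question": "2 + 2?", "answer": "4"},
]
demos = bootstrap_demos(toy_attempt, candidates, k=3)
print([d["question"] for d in demos])  # ['capital of France?']
```

Filtering on the final answer is what lets the intermediate fields (queries, context, rationales) serve as demonstrations even though nobody labeled them by hand.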

We now implement `multihop_search_v2` as part of `multihop_QA_v2`.

In addition to the changes mentioned earlier, this program simulates the Baleen system (Khattab et al., 2021) in a few lines of code.

In each retrieval hop (after the very first hop), the summary of the previous hop(s) is included in the prompt. This allows us to efficiently read a larger number of passages from the current hop.

In [6]:
@dsp.transformation
def multihop_search_v2(example: dsp.Example, max_hops=2, k=5) -> dsp.Example:
    example.context = []

    for hop in range(max_hops):
        # Generate queries
        template = rewrite_template if hop == 0 else hop_template
        example, completions = dsp.generate(template, n=10, temperature=0.7)(example, stage=f'h{hop}')
        
        # Collect the queries and search with result fusion
        queries = [c.query for c in completions] + [example.question]
        example.context = dsp.retrieveEnsemble(queries, k=k)

        # From the second hop on, put the condensed rationale (a summary of what was learned) ahead of the retrieved passages
        if hop > 0:
            example.context = [completions[0].rationale] + example.context
    
    return example

def multihop_QA_v2(question: str) -> str:
    x = dsp.Example(question=question)
    x = multihop_demonstrate(x)
    x = multihop_search_v2(x)
    x = QA_predict(x)
    return x.answer
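The fusion step retrieves passages for several queries at once and merges them into a single ranked list. One simple way to do that, sketched below, is to sum each passage's scores across queries and keep the top `k`. This is an illustration under that assumption, not necessarily how `dsp.retrieveEnsemble` is implemented. `fuse_results`, `toy_search`, and `TOY_INDEX` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

SearchFn = Callable[[str], List[Tuple[str, float]]]


def fuse_results(queries: List[str], search: SearchFn, k: int) -> List[str]:
    """Sum each passage's scores across all queries and return the k best passages."""
    totals: Dict[str, float] = defaultdict(float)
    for query in queries:
        for passage, score in search(query):
            totals[passage] += score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [passage for passage, _ in ranked[:k]]


# Toy index with made-up scores: passage "B" is retrieved by both queries.
TOY_INDEX = {
    "q1": [("A", 0.9), ("B", 0.5)],
    "q2": [("B", 0.6), ("C", 0.4)],
}


def toy_search(query: str) -> List[Tuple[str, float]]:
    return TOY_INDEX.get(query, [])


print(fuse_results(["q1", "q2"], toy_search, k=2))  # ['B', 'A']
```

A passage that several queries agree on can outrank one that a single query scores highly, which is the motivation for sampling many queries per hop instead of committing to one.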

In [7]:
multihop_QA_v2(dev[3].question), lm.inspect_history(n=3)





Write a search query that will help answer a complex question.

---

Follow the following format.

Question: ${the question to be answered}
Rationale: Let's think step by step. To answer this question, we first need to find out ${the missing information}
Search Query: ${a simple question for seeking the missing information}

---

Question: Which award did the first book of Gary Zukav receive?
Rationale: Let's think step by step. To answer this question, we first need to find out the name of the first book of Gary Zukav.
Search Query: "What is the name of Gary Zukav's first book?"

---

Question: The heir to the Du Pont family fortune sponsored what wrestling team?
Rationale: Let's think step by step. To answer this question, we first need to find out who the heir to the Du Pont family fortune is.
Search Query: "Heir to the Du Pont family fortune"

---

Question: Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?
Rationale: Let's think step by step

('1828', None)

In [8]:
from dsp.evaluation.utils import evaluate

evaluate(multihop_QA_v2, dev)

100%|██████████| 13/13 [00:02<00:00,  6.48it/s]

Answered 11 / 13 (84.6%) correctly.





| | question | answer | prediction | correct |
|---|---|---|---|---|
| 0 | Who has a broader scope of profession: E. L. Doctorow or Julia Peterkin? | ['E. L. Doctorow', 'E.L. Doctorow', 'Doctorow'] | E. L. Doctorow | ✔️ |
| 1 | What documentary about the Gilgo Beach Killer debuted on A&E? | ['The Killing Season'] | The Killing Season | ✔️ |
| 2 | Right Back At It Again contains lyrics co-written by the singer born in what city? | ['Gainesville, Florida', 'Gainesville'] | Gainesville, Florida | ✔️ |
| 3 | What year was the party of the winner of the 1971 San Francisco mayoral election founded? | ['1828'] | 1828 | ✔️ |
| 4 | Which author is English: John Braine or Studs Terkel? | ['John Braine'] | John Braine | ✔️ |
| 5 | Anthony Dirrell is the brother of which super middleweight title holder? | ['Andre Dirrell'] | Andre Dirrell | ✔️ |
| 6 | In which city is the sports nutrition business established by Oliver Cookson based ? | ['Cheshire', 'Cheshire, UK'] | Cheshire, UK | ✔️ |
| 7 | Find the birth date of the actor who played roles in First Wives Club and Searching for the Elephant. | ['February 13, 1980'] | February 13, 1980 | ✔️ |
| 8 | Kyle Moran was born in the town on what river? | ['Castletown', 'Castletown River'] | Dundalk | ❌ |
| 9 | What is the name of one branch of Robert D. Braun's speciality? | ['aeronautical engineering', 'astronautical engineering', 'aeronautics', 'astronautics'] | Aerospace engineering | ❌ |


84.6
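The reported score is the share of dev questions answered correctly, expressed as a percentage: 11 of 13 gives 84.6. A minimal sketch of such an evaluation loop follows. `evaluate_program` and the toy data are hypothetical, and the matching rule passed in is a placeholder; this is not the code of `dsp.evaluation.utils.evaluate`.

```python
def evaluate_program(program, examples, is_correct):
    """Return the percentage of examples the program answers correctly."""
    num_correct = 0
    for question, gold_answers in examples:
        prediction = program(question)
        if is_correct(prediction, gold_answers):
            num_correct += 1
    return round(100.0 * num_correct / len(examples), 1)


# Toy program (a lookup table) and toy data, for illustration only.
toy_answers = {"q1": "x", "q2": "y", "q3": "wrong"}
toy_examples = [("q1", ["x"]), ("q2", ["y", "Y"]), ("q3", ["z"])]

score = evaluate_program(toy_answers.get, toy_examples,
                         lambda prediction, gold: prediction in gold)
print(score)                       # 66.7
print(round(100.0 * 11 / 13, 1))   # 84.6, the figure reported above
```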