# Chain-of-Verification Recipe - Prompt Engineering
_Authored by: [Ankush Pala](https://github.com/Ankush-lastmile)_

Chain-of-Verification (CoVe) is a **prompt engineering technique for reducing hallucinations**. An LLM first generates a baseline response to a user query, which may contain errors. CoVe then plans a set of verification questions, answers them to validate the information in the baseline response, and revises the final answer based on those validations. The paper reports that the revised answer is more accurate than the initial response. **[Link to Paper](https://arxiv.org/pdf/2309.11495.pdf)**
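
The four steps can be sketched in plain Python before bringing in any tooling. This is a minimal illustration under stated assumptions, not the implementation used in this notebook: `cove`, `fake_llm`, and the prompt strings are hypothetical, and `llm` stands for any callable that maps a prompt string to a completion string.

```python
from typing import Callable, List


def cove(llm: Callable[[str], str], question: str, verification_template: str) -> str:
    """Run a simplified Chain-of-Verification loop with any prompt -> text callable."""
    # 1. Baseline response: may contain errors.
    baseline = llm(question)

    # 2. Plan verifications: one question per non-empty line of the baseline.
    entities: List[str] = [line.strip() for line in baseline.splitlines() if line.strip()]
    questions = [verification_template.format(entity=e) for e in entities]

    # 3. Execute each verification question on its own.
    answers = [llm(q) for q in questions]

    # 4. Revise: ask the model to cross-check the baseline against the answers.
    final_prompt = (
        f"Question: {question}\n"
        f"Baseline response: {baseline}\n"
        f"Verification data: {' '.join(answers)}\n"
        "Return only the entries that pass verification."
    )
    return llm(final_prompt)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model, with canned answers for the demo."""
    if prompt.startswith("Question:"):
        return "C"
    if prompt.startswith("Where was"):
        return "C: United States." if "C?" in prompt else "Ruby: Japan."
    return "C\nRuby"


result = cove(
    fake_llm,
    "Name languages developed in the US.",
    "Where was this language developed: {entity}?",
)
print(result)
```

Swapping `fake_llm` for a real model call keeps the control flow unchanged. The rest of this recipe expresses the same steps as named prompts in an AIConfig.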

**Check out the open-source tool used here! 🚀 [AIConfig Github Repo](https://github.com/lastmile-ai/aiconfig)**

In [None]:
# Install AIConfig package
!pip install python-aiconfig
!pip install aiconfig-extension-hugging-face

In [None]:
# Import required modules from AIConfig and other dependencies

import json
import pandas as pd
from IPython.display import display, Markdown
from aiconfig import AIConfigRuntime, CallbackManager, InferenceOptions
from aiconfig_extension_hugging_face import HuggingFaceTextGenerationRemoteInference

In [3]:
# Enable the use of Hugging Face models for remote inference
def register_model_parsers() -> None:
    """Register model parsers for HuggingFace models."""
    # Register remote inference client for text generation
    text_generation_remote = HuggingFaceTextGenerationRemoteInference()
    AIConfigRuntime.register_model_parser(
        text_generation_remote, "Text Generation"
    )
register_model_parsers()

**The cell below defines the CoVe prompt template config.**

AIConfig is a data format for organizing prompt templates and model-specific settings. To generate an AIConfig, use the AIConfig Editor VS Code extension, which provides an interface for creating prompt templates for any model and storing them in a config. You can then use the AIConfig SDK to execute the prompts in the config, along with their settings, from your application code.

Alternatively, download the config [here](https://github.com/lastmile-ai/aiconfig/blob/main/cookbooks/Chain-of-Verification/cove_template_config.json) and load it with

`config = AIConfigRuntime.load('cove_template_config.json')`.

Check these links out for more background on AIConfig:

[AIConfig Github](https://github.com/lastmile-ai/aiconfig) 

[AIConfig Vscode Extension](https://marketplace.visualstudio.com/items?itemName=lastmile-ai.vscode-aiconfig)


In [4]:
cove_template_config = {
  "name": "Chain-of-Verification (CoVe) Template",
  "schema_version": "latest",
  "metadata": {
    "parameters": {
      "baseline_prompt": "Name 20 programming languages that were developed in the United States. Include the developer name in parentheses.",
      "verification_question": "Where was {{entity}} born? "
    },
    "models": {
      "gpt-4": {
        "model": "gpt-4",
        "top_p": 1,
        "temperature": 0,
        "presence_penalty": 0,
        "frequency_penalty": 0
      }
    }
  },
  "description": "",
  "prompts": [
    {
      "name": "baseline_response_gen",
      "input": "{{baseline_prompt}}",
      "metadata": {
        "model": {
          "name": "Text Generation",
          "settings": {
            "system_prompt": "",
            "model": "mistralai/Mixtral-8x7B-Instruct-v0.1"
          }
        },
        "parameters": {},
        "remember_chat_context": False
      },
      "outputs": []
    },
    {
      "name": "verification",
      "input": "{{verification_question}}",
      "metadata": {
        "model": {
          "name": "gpt-4",
          "settings": {
            "system_prompt": "{{entity}}"
          }
        },
        "parameters": {
          "entity": "George Pataki"
        },
        "remember_chat_context": False
      },
      "outputs": []
    },
    {
      "name": "final_response_gen",
      "input": "Cross-check the provided list of verification data with the original baseline response that is supposed to accurately answer the baseline prompt. \n\nBaseline prompt: {{baseline_prompt}} \nBaseline response: {{baseline_response_gen.output}}\nVerification data: {{verification_results}}",
      "metadata": {
        "model": {
          "name": "gpt-4",
          "settings": {
            "system_prompt": "For each entity from the baseline response, verify that the entity met the criteria asked for in the baseline prompt based on the verification data. \n\nOutput Format: \n\n### Revised Response \nThis is the revised response after running chain-of-verification. \n(Please output the revised response after the cross-check.)\n\n### Failed Entities \nThese are the entities that failed the cross-check and are no longer included in revised response. \n(List the entities that failed the cross-check with a concise reason why)"
          }
        },
        "parameters": {
          "verification_results": "Theodore Roosevelt was born in New York City, New York on October 27, 1858. Franklin D. Roosevelt was born in Hyde Park, New York on January 30, 1882. Alexander Hamilton was born in Charlestown, Nevis on January 11, 1755. John Jay was born in New York City, New York on December 12, 1745. DeWitt Clinton was born in Little Britain, New York on March 2, 1769. William H. Seward was born in Florida, New York on May 16, 1801. Charles Evans Hughes was born in Glens Falls, New York on April 11, 1862. Nelson Rockefeller was born in Bar Harbor, Maine on July 8, 1908. Robert F. Wagner Jr. was born in Manhattan, New York on April 20, 1910. Bella Abzug was born in New York City, New York on July 24, 1920. Shirley Chisholm was born in Brooklyn, New York on November 30, 1924. Geraldine Ferraro was born in Newburgh, New York on August 26, 1935. Eliot Spitzer was born in The Bronx, New York on June 10, 1959. Michael Bloomberg was born in Boston, Massachusetts on February 14, 1942. Andrew Cuomo was born in New York City, New York on December 6, 1957. Bill de Blasio was born in Manhattan, New York on May 8, 1961. Charles Rangel was born in Harlem, New York City on June 11, 1930. Daniel Patrick Moynihan was born in Tulsa, Oklahoma on March 16, 1927. Jacob Javits was born in New York City, New York on May 18, 1904. Al Smith was born in New York City, New York on December 30, 1873. Rudy Giuliani was born in Brooklyn, New York on May 28, 1944. George Pataki was born in Peekskill, New York on June 24, 1945. Kirsten Gillibrand was born in Albany, New York on December 9, 1966. Chuck Schumer was born in Brooklyn, New York on November 23, 1950. Alexandria Ocasio-Cortez was born in The Bronx, New York City, New York on October 13, 1989."
        },
        "remember_chat_context": False
      },
      "outputs": []
    }
  ],
}
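
The `{{...}}` placeholders in the prompt inputs above, such as `{{baseline_prompt}}` and `{{entity}}`, are filled from parameters when a prompt runs. The sketch below illustrates the idea with plain string substitution. It is an assumption-laden stand-in, not AIConfig's actual resolver: `render_template` is a hypothetical helper, and references to other prompts' outputs (like `{{baseline_response_gen.output}}`) are simply left untouched here.

```python
import re
from typing import Dict


def render_template(template: str, params: Dict[str, str]) -> str:
    """Replace each {{name}} placeholder with params[name]; leave unknown names as-is."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        return params.get(name, match.group(0))

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, template)


question = render_template(
    "Where was this coding language developed: {{entity}}?",
    {"entity": "Clojure"},
)
print(question)
```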

## 1. Baseline Response
Prompt the LLM with a user question that asks for a list. The baseline response may contain inaccuracies, which the following steps verify.

**Prompt: Name 20 programming languages that were developed in the United States.**

In [5]:

config = AIConfigRuntime.create(**cove_template_config) # loads config (see code above)
config.callback_manager = CallbackManager([])

inference_options = InferenceOptions() # setup streaming

In [6]:
baseline_prompt = "Name 20 programming languages that were developed in the United States. Include the developer name in parentheses."

# Run baseline prompt to generate initial response which might contain errors
async def run_baseline_prompt(baseline_prompt):
    config.update_parameter("baseline_prompt", baseline_prompt)
    config.save()

    await config.run("baseline_response_gen", options=inference_options) # run baseline prompt
    return config.get_output_text("baseline_response_gen")

baseline_response = await run_baseline_prompt(baseline_prompt)
print(baseline_response)



1. C (Dennis Ritchie)
2. C++ (Bjarne Stroustrup)
3. Java (James Gosling)
4. Python (Guido van Rossum)
5. JavaScript (Brendan Eich)
6. Swift (Chris Lattner)
7. Go (Robert Griesemer, Rob Pike, Ken Thompson)
8. Rust (Graydon Hoare)
9. PHP (Rasmus Lerdorf)
10. Ruby (Yukihiro Matsumoto)
11. Perl (Larry Wall)
12. Haskell (Simon Peyton Jones, Paul Hudak, John Hughes)
13. Lisp (John McCarthy)
14. Smalltalk (Alan Kay, Dan Ingalls, Adele Goldberg)
15. Ada (Jean Ichbiah)
16. Fortran (John Backus)
17. COBOL (Grace Hopper)
18. Lua (Roberto Ierusalimschy, Luiz Henrique de Figueiredo, Waldemar Celes)
19. Prolog (Alain Colmerauer, Philippe Roussel)
20. ML (Robin Milner)

Note: While some of these languages were developed by individuals who were not American citizens, they were developed while they were working in the United States.



## 2. Set Up and Test Verification Question
Given the query and the baseline response, generate a verification question that helps check whether the original response contains mistakes. We use a single verification question here.

**Verification Prompt: Where was this coding language developed: {{entity}}?**

In [7]:
# verification_question = "Where was {{entity}} born?"
verification_question =  "Where was this coding language developed: {{entity}}?"

# Run verification on a single entity from baseline response to test
async def run_single_verification(verification_question, entity):
    params = {"entity": entity}
    config.update_parameter("verification_question", verification_question)
    config.save()

    verification_completion = await config.run("verification", params, options=inference_options)
    return verification_completion

verification_completion = await run_single_verification(verification_question, "clojure")

Clojure was developed in the United States.

## 3. Execute Verifications
Answer the verification question for each entity from the baseline response, and save the verification results in a single string.

In [8]:
import re

# Extract entity names from the baseline response, one numbered list item per line.
def gen_entities_list(baseline_response):
    entities = []
    for row in baseline_response.split('\n'):
        # Match "<number>. <entity>" and capture the text up to the first comma.
        # Blank lines and non-list lines (such as the trailing note) are skipped.
        match = re.search(r'\d+\.\s([^,]*)', row)
        if match:
            entities.append(match.group(1).strip())

    return entities

# Run the verification question for each entity and concatenate the answers into a single string.
async def gen_verification_results(entities):
    verification_data = ""
    for entity in entities:
        params = {
            "verification_question": verification_question,
            "entity": entity
        }
        await config.run("verification", params, options=inference_options)
        single_verification_text = config.get_output_text("verification")
        verification_data += " " + single_verification_text
        print("\n")  # separate the printed answers

    return verification_data


entities = gen_entities_list(baseline_response)
verification_data = await gen_verification_results(entities)

The C programming language was developed at Bell Labs (Bell Telephone Laboratories Inc.) in the United States.

The C++ coding language was developed at Bell Labs in Murray Hill, New Jersey, USA.

Java was developed at Sun Microsystems.

Python was developed in the Netherlands.

JavaScript was developed at Netscape Communications Corporation.

Swift was developed at Apple Inc.

The Go coding language was developed at Google Inc. in the United States.

The Rust coding language was developed at Mozilla Research.

PHP was developed in Greenland.

The coding language Ruby was developed in Japan.

Perl was developed in the United States.

Haskell was developed at the University of Glasgow, Scotland.

Lisp was developed at the Massachusetts Institute of Technology (MIT).

Smalltalk was developed at Xerox PARC (Palo Alto Research Center) in Palo Alto, California.

The Ada programming language was developed in the United States by the Defense Advanced Research Projects Agency (DARPA) for the U
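
Each entity passed to the verification prompt still carries the developer names, for example `C (Dennis Ritchie)`. Because the regex stops at the first comma, multi-developer entries are cut short, as in `Go (Robert Griesemer`. If only the language name should be verified, a sketch like the following could separate the two parts. `split_entity` and `sample` are hypothetical and not part of the notebook.

```python
import re
from typing import Optional, Tuple


def split_entity(row: str) -> Optional[Tuple[str, str]]:
    """Parse '<n>. <name> (<developers>)' into (name, developers); None if no match."""
    match = re.match(r"\s*\d+\.\s+(.+?)\s*\((.*)\)\s*$", row)
    if match is None:
        return None
    return match.group(1), match.group(2)


sample = """1. C (Dennis Ritchie)
7. Go (Robert Griesemer, Rob Pike, Ken Thompson)

Note: this line is not a list item."""

# Keep only the rows that look like numbered list items.
parsed = [p for p in map(split_entity, sample.split("\n")) if p is not None]
for name, developers in parsed:
    print(f"{name}: {developers}")
```

Passing only the language name keeps the verification question focused on the claim being checked, at the cost of dropping the developer context from the prompt.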

## 4. Generate Revised Response
Given the discovered inconsistencies (if any), generate a revised response incorporating the verification results.

In [9]:
# Generate the revised response using the verification data
params = {"verification_results": verification_data}
revised_response = await config.run("final_response_gen", params)

# Display with Markdown
display(Markdown(config.get_output_text("final_response_gen")))

### Revised Response 
1. C (Dennis Ritchie)
2. C++ (Bjarne Stroustrup)
3. Java (James Gosling)
4. JavaScript (Brendan Eich)
5. Swift (Chris Lattner)
6. Go (Robert Griesemer, Rob Pike, Ken Thompson)
7. Rust (Graydon Hoare)
8. Perl (Larry Wall)
9. Lisp (John McCarthy)
10. Smalltalk (Alan Kay, Dan Ingalls, Adele Goldberg)
11. Ada (Jean Ichbiah)
12. Fortran (John Backus)
13. COBOL (Grace Hopper)

### Failed Entities 
1. Python (Guido van Rossum) - Python was developed in the Netherlands, not the United States.
2. PHP (Rasmus Lerdorf) - PHP was developed in Greenland, not the United States.
3. Ruby (Yukihiro Matsumoto) - Ruby was developed in Japan, not the United States.
4. Haskell (Simon Peyton Jones, Paul Hudak, John Hughes) - Haskell was developed in Scotland, not the United States.
5. Lua (Roberto Ierusalimschy, Luiz Henrique de Figueiredo, Waldemar Celes) - Lua was developed in Brazil, not the United States.
6. Prolog (Alain Colmerauer, Philippe Roussel) - Prolog was developed in France, not the United States.
7. ML (Robin Milner) - ML was developed in Scotland, not the United States.
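
Because the system prompt fixes the output format, the revised response can be split into its two sections for downstream use. The following is a minimal sketch that assumes the model follows the `### ` heading format exactly, which is not guaranteed. `parse_sections` and `sample_output` are hypothetical, and the sample text is a shortened copy of the output above.

```python
from typing import Dict, List, Optional


def parse_sections(markdown_text: str) -> Dict[str, List[str]]:
    """Group non-empty lines under the most recent '### ' heading."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("### "):
            # Start a new section named after the heading text.
            current = stripped[4:].strip()
            sections[current] = []
        elif stripped and current is not None:
            sections[current].append(stripped)
    return sections


sample_output = """### Revised Response
1. C (Dennis Ritchie)
2. C++ (Bjarne Stroustrup)

### Failed Entities
1. Python (Guido van Rossum) - Python was developed in the Netherlands, not the United States.
"""

sections = parse_sections(sample_output)
print(len(sections["Revised Response"]), "kept;", len(sections["Failed Entities"]), "failed")
```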