# Homework and bakeoff: Few-shot OpenQA with DSP

In [117]:
__author__       = "Christopher Potts and Omar Khattab"
__version__      = "CS224u, Stanford, Spring 2023"
__completed_by__ = "Anthony Weng"

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cgpotts/cs224u/blob/master/hw_openqa.ipynb)
[![Open in SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/cgpotts/cs224u/blob/master/hw_openqa.ipynb)

If Colab is opened with this badge, please **save a copy to drive** (from the File menu) before running the notebook.

## Overview

The goal of this homework is to explore retrieval-augmented in-context learning. This is an exciting area that brings together a number of recent task ideas and modeling innovations. We will use the [DSP programming library](https://github.com/stanfordnlp/dsp) to build systems in this new mode.

Our core task is __open-domain question answering (OpenQA)__. In this task, all that is given by the dataset is a question text, and the task is to answer that question. By contrast, in modern QA tasks, the dataset provides a text and a gold passage, usually with a firm guarantee that the answer will be a substring of the passage. 

OpenQA is substantially harder than standard QA. The usual strategy is to use a _retriever_ to find passages in a large collection of texts and train a _reader_ to find answers in those passages. This means we have no guarantee that the retrieved passage will contain the answer we need. If we don't retrieve a passage containing the answer, our reader has no hope of succeeding. Although this is challenging, it is much more realistic and widely applicable than standard QA. After all, with the right retriever, an OpenQA system could be deployed over the entire Web.

The task posed by this homework is harder even than OpenQA. We are calling this task __few-shot OpenQA__. The defining feature of this task is that the reader is simply a frozen, general purpose language model. It accepts string inputs (prompts) and produces text in response. It is not trained to answer questions per se, and nothing about its structure ensures that it will respond with a substring of the prompt corresponding to anything like an answer.

__Few-shot QA__ (but not OpenQA!) is explored in the famous GPT-3 paper ([Brown et al. 2020](https://arxiv.org/abs/2005.14165)). The authors are able to get traction on the problem using GPT-3, an incredible finding. Our task here – __few-shot OpenQA__ – pushes this even further by retrieving passages to use in the prompt rather than assuming that the gold passage can be used in the prompt. If we can make this work, then it should be a major step towards flexibly and easily deploying QA technologies in new domains.

In summary:

| Task             | Passage given | Task-specific reader training |Task-specific retriever training  | 
|-----------------:|:-------------:|:-----------------------------:|:--------------------------------:|
| QA               | yes           | yes                           | n/a                              |
| OpenQA           | no            | yes                           | maybe                            |
| Few-shot QA      | yes           | no                            | n/a                              |
| Few-shot OpenQA  | no            | no                            | maybe                            | 

Just to repeat: your mission is to explore the final line in this table. The core notebook and assignment don't address the issue of training the retriever in a task-specific way, but this is something you could pursue for a final project; [the ColBERT codebase](https://github.com/stanford-futuredata/ColBERT) makes easy.

As usual, this notebook sets up the task and provides starter code. We will be relying on the DSP library, which allows us to define retrieval-augmented in-context learning systems in code. We first provide two fully implemented examples:

* _Few-shot OpenQA_: The given input is a question and the goal is to provide an answer. Some _demonstration_ Q/A pairs are sampled from a train set (in our case, SQuAD).

* _Few-shot QA with context_: The given input is a question with an associated evidence passage, and the goal is to provide an answer. The _demonstrations_ are now Q/A pairs with associated gold evidence passages. These are sampled from a train set (in our case, SQuAD).

The above examples are followed by some assignment questions aimed at helping you to think creatively about the problem. The first of these defines a core system for our target task:

* _Few-shot OpenQA with context_: This is like _few-shot QA with context_ except the passages are now retrieved from a large search index using ColBERT. 

The second question illustrates how to use the powerful DSP `annotate` function to improve the set of demonstrations used by the system.

It is a requirement of the bake-off that a general-purpose language model be used. In particular, trained QA systems cannot be used at all, and no fine-tuning is allowed either. See the original system question at the bottom of this message for guidance on which models are allowed.

Note: the models we are working with here are _big_. This poses a challenge that is increasingly common in NLP: you have to pay one way or another. You can pay to use the GPT-3 API, or you can pay to use an Eleuther model on a heavy-duty cluster computer, or you can pay with time by using an Eleuther model on a more modest computer.  __For now, though, the Cohere models are free to use, so they should be your first choice; see [setup.ipynb](setup.ipynb) if you don't have an account__.

## Set-up

We have sought to make this notebook self-contained and easy to use on a personal computer, on Google Colab, and in Sagemaker Studio. For personal computer use, we assume you have already done everything in [setup.ipynb](setup.ipynb]). For cloud usage, the next few code blocks should handle all set-up steps.

In [118]:
try: 
    # This library is our indicator that the required installs
    # need to be done.
    import datasets
    root_path = '.'
except ModuleNotFoundError:
    !git clone https://github.com/cgpotts/cs224u/
    !pip install -r cs224u/requirements.txt
    root_path = 'dsp'

In [119]:
import cohere
from datasets import load_dataset
import openai
import os
import dsp

In [120]:
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(root_path, 'cache')

openai_key = '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'  # or replace with your API key (optional)

cohere_key = 'SQg9nARczH2C1kAqppkQliLbJJbituSNLbt6LuNm'  # or replace with your API key (optional)

colbert_server = 'http://ec2-44-228-128-229.us-west-2.compute.amazonaws.com:8893/api/search'

Here we establish the Language Model `lm` and Retriever Model `rm` that we will be using. The defaults for `lm` are just for development. You may want to develop using an inexpensive model and then do your final evalautions wih an expensive one.

In [121]:
lm = dsp.GPT3(model='text-davinci-001', api_key=openai_key)

# Options for Cohere: command-medium-nightly, command-xlarge-nightly
# lm = dsp.Cohere(model='command-medium-nightly', api_key=cohere_key)

rm = dsp.ColBERTv2(url=colbert_server)

dsp.settings.configure(lm=lm, rm=rm)

Here's a command you can run to see which OpenAI models are available; OpenAI has entered into an increasingly closed mode where many older models are not available, so there are likely to be some surprises lurking here:

In [122]:
[d["root"] for d in openai.Model.list()["data"]]

['babbage',
 'davinci',
 'text-davinci-edit-001',
 'babbage-code-search-code',
 'text-similarity-babbage-001',
 'code-davinci-edit-001',
 'text-davinci-001',
 'text-davinci-003',
 'ada',
 'babbage-code-search-text',
 'babbage-similarity',
 'gpt-3.5-turbo',
 'code-search-babbage-text-001',
 'text-curie-001',
 'whisper-1',
 'code-search-babbage-code-001',
 'text-ada-001',
 'text-embedding-ada-002',
 'text-similarity-ada-001',
 'gpt-3.5-turbo-0301',
 'curie-instruct-beta',
 'ada-code-search-code',
 'ada-similarity',
 'code-search-ada-text-001',
 'text-search-ada-query-001',
 'davinci-search-document',
 'ada-code-search-text',
 'text-search-ada-doc-001',
 'davinci-instruct-beta',
 'text-similarity-curie-001',
 'code-search-ada-code-001',
 'ada-search-query',
 'text-search-davinci-query-001',
 'curie-search-query',
 'davinci-search-query',
 'babbage-search-document',
 'ada-search-document',
 'text-search-curie-query-001',
 'text-search-babbage-doc-001',
 'curie-search-document',
 'text-sear

## SQuAD

Our core development dataset is [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/). We chose this dataset because it is well-known and widely used, and it is large enough to support lots of meaningful development work, without, though, being so large as to require lots of compute power. It is also useful that it has gold passages supporting the standard QA formulation, so we can see how well our LM performs with an "oracle" retriever that always retrieves the gold passage.

In [125]:
squad = load_dataset("squad")

Found cached dataset squad (C:/Users/ad2we/.cache/huggingface/datasets/squad/plain_text/1.0.0/d6ec3ceb99ca480ce37cdd35555d6cb2511d223b9150cce08a837ef62ffea453)


  0%|          | 0/2 [00:00<?, ?it/s]

The following utility just reads a SQuAD split in as a list of `SquadExample` instances:

In [126]:
def get_squad_split(squad, split="validation"):
    """
    Use `split='train'` for the train split.

    Returns
    -------
    list of SquadExample named tuples with attributes
    id, title, context, question, answers

    """
    data = zip(*[squad[split][field] for field in squad[split].features])
    return [dsp.Example(id=eid, title=title, context=context, question=q, answer=a['text']) 
            for eid, title, context, q, a in data]

### SQuAD train

To build few-shot prompts, we will often sample SQuAD train examples, so we load that split here:

In [127]:
squad_train = get_squad_split(squad, split="train")

### SQuAD dev

In [128]:
squad_dev = get_squad_split(squad)

### SQuAD dev sample

Evaluations are expensive in this new era! Here's a small sample to use for dev assessments:

In [129]:
dev_exs = sorted(squad_dev, key=lambda x: hash(x.id))[: 200]

## Evaluation

Our evaluation protocols are the standard ones for SQuAD and related tasks: exact match of the answer (EM) and token-level F1. We'll reply primarily on DSP for these evaluation utilities; the following is a light modification of `dsp.evaluation.utils.evaluateAnswer`, which is itself built evaluation code from [apple/ml-qrecc](https://github.com/apple/ml-qrecc/blob/main/utils/evaluate_qa.py) repository. It performs very basic string normalization before doing the core comparisons.

In [130]:
from dsp.utils import EM, F1
import tqdm
import pandas as pd

def evaluateAnswer(fn, dev):
    """Evaluate a DSP program on `dev`.

    Parameters
    ----------
    fn : DSP system
    dev : list of `dsp.Example` instances

    Returns
    -------
    dict with keys "df", "em", "f1" storing assessment data
    """
    data = []
    for example in tqdm.tqdm(dev):
        prediction = fn(example)
        d = dict(example)
        pred = prediction.answer
        d['prediction'] = pred
        d['em'] = EM(pred, example.answer)
        d['f1'] = F1(pred, example.answer)
        data.append(d)
    df = pd.DataFrame(data)
    em = round(100.0 * df['em'].sum() / len(dev), 1)
    df['em'] = df['em'].apply(lambda x: '✔️' if x else '❌')
    f1 = df['f1'].mean()
    return {'df': df, 'em': em, 'f1': f1}

## DSP basics

### LM usage

Here's the most basic way to use the LM:

In [178]:
lm("Which U.S. states border no U.S. states?")

['\n\nAlaska and Hawaii are the only U.S. states that border no other U.S. states.']

Keyword arguments to the underlying LM are passed through:

In [173]:
lm("Which U.S. states border no U.S. states?", temperature=0.5, n=3)

['\n\nAlaska and Hawaii are the only U.S. states that border no other U.S. states.',
 '\n\nAlaska and Hawaii are the only U.S. states that border no other U.S. states.',
 '\n\nAlaska and Hawaii are the only U.S. states that border no other U.S. states.']

With `lm.inspect_history`, we can see the most recent language model calls:

In [133]:
lm.inspect_history(n=1)





Which U.S. states border no U.S. states?[32m

If you include Alaska, then the U.S. states that border no other U.S. states are Hawaii, California, Oregon, and Washington.[0m[31m 	 (and 3 other completions)[0m





### Prompt templates

In DSP, the more usual way to call the LM is to define a prompt template. Here we define a generic QA prompt template:

In [134]:
Question = dsp.Type(
    prefix="Question:", 
    desc="${the question to be answered}")

Answer = dsp.Type(
    prefix="Answer:", 
    desc="${a short factoid answer, often between 1 and 5 words}", 
    format=dsp.format_answers)

qa_template = dsp.Template(
    instructions="Answer questions with short factoid answers.", 
    question=Question(), 
    answer=Answer())

And here is a self-contained example that uses our question and template to create a prompt:

In [135]:
states_ex = dsp.Example(
    question="Which U.S. states border no U.S. states?",
    demos=dsp.sample(squad_train, k=2))

print(qa_template(states_ex))

Answer questions with short factoid answers.

---

Follow the following format.

Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Question: Immunoassays are able to detect what type of proteins?
Answer: generated by an infected organism in response to a foreign agent

Question: Which U.S. states border no U.S. states?
Answer:


### Prompt-based generation

We can how put the above pieces together to call the model with our constructed prompt:

In [136]:
states_ex, states_compl = dsp.generate(qa_template)(states_ex, stage='basics')

In [137]:
print(states_compl.answer)

Alaska, Hawaii


And here's precisely what the model saw and did:

In [138]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Question: Immunoassays are able to detect what type of proteins?
Answer: generated by an infected organism in response to a foreign agent

Question: Which U.S. states border no U.S. states?
Answer:[32m Alaska, Hawaii[0m





### Retrieval

The final major component of our systems is retrieval. When we defined `rm`, we connected to a remote ColBERT index and retriever system that we can now use for search.

In [139]:
states_ex.question

'Which U.S. states border no U.S. states?'

The basic `dsp.retrieve` method returns only passages:

In [140]:
passages = dsp.retrieve(states_ex.question, k=1)

In [141]:
passages

['Mexico–United States border | has the shortest. Among the states in Mexico, Chihuahua has the longest border with the United States, while Nuevo León has the shortest. Texas borders four Mexican states—Tamaulipas, Nuevo León, Coahuila, and Chihuahua—the most of any U.S. states. New Mexico and Arizona each borders two Mexican states (Chihuahua and Sonora; Sonora and Baja California, respectively). California borders only Baja California. Three Mexican states border two U.S. states each: Baja California borders California and Arizona; Sonora borders Arizona and New Mexico; and Chihuahua borders New Mexico and Texas. Tamaulipas, Nuevo León, and Coahuila each borders only one U.S. state: Texas. The']

If we need passages with scores and other metadata, we can call `rm` directly:

In [142]:
rm(states_ex.question, k=1)

[{'pid': 6140356,
  'prob': 1.0,
  'rank': 1,
  'score': 22.40652084350586,
  'text': 'Mexico–United States border | has the shortest. Among the states in Mexico, Chihuahua has the longest border with the United States, while Nuevo León has the shortest. Texas borders four Mexican states—Tamaulipas, Nuevo León, Coahuila, and Chihuahua—the most of any U.S. states. New Mexico and Arizona each borders two Mexican states (Chihuahua and Sonora; Sonora and Baja California, respectively). California borders only Baja California. Three Mexican states border two U.S. states each: Baja California borders California and Arizona; Sonora borders Arizona and New Mexico; and Chihuahua borders New Mexico and Texas. Tamaulipas, Nuevo León, and Coahuila each borders only one U.S. state: Texas. The',
  'long_text': 'Mexico–United States border | has the shortest. Among the states in Mexico, Chihuahua has the longest border with the United States, while Nuevo León has the shortest. Texas borders four Mexi

## Few-shot OpenQA

With the above pieces in place, we can define our first DSP system. This one does few-shot OpenQA with no context passages. In essense, our prompts contain

1. A sequences of Q/A demonstrations (no context passages).
2. The target question (no context passage).

Here is the full system; note the use of the decorator `@dsp.transformation` – this will ensure that no `example` instances are modified when the program is used.

In [143]:
@dsp.transformation
def few_shot_openqa(example, train=squad_train, k=2): 
    example.demos = dsp.sample(train, k=k)
    example, completions = dsp.generate(qa_template)(example, stage='qa')
    return completions

There are really just two steps here. Let's go through them individually. Our example:

In [144]:
ex = squad_dev[0].copy()

ex

{'id': '56be4db0acb8001400a502ec',
 'title': 'Super_Bowl_50',
 'context': 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.',
 'question': 'Which NFL team represented the AFC at Super Bowl 50?',
 'answer': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos']}

We add some demonstrations:

In [145]:
ex.demos = dsp.sample(squad_train, k=2)

ex

{'id': '56be4db0acb8001400a502ec',
 'title': 'Super_Bowl_50',
 'context': 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.',
 'question': 'Which NFL team represented the AFC at Super Bowl 50?',
 'answer': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos'],
 'demos': 

And then we call the LM using `qa_template`:

In [146]:
ex, ex_compl = dsp.generate(qa_template)(ex, stage='qa')

Here, `ex_compl` is a `Completions` instance. We will typically use only the `answer` attribute:

In [147]:
print(ex_compl.answer)

Denver Broncos


And, as a final check, we can see precisely what the LM saw:

In [148]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Question: Immunoassays are able to detect what type of proteins?
Answer: generated by an infected organism in response to a foreign agent

Question: Which NFL team represented the AFC at Super Bowl 50?
Answer:[32m Denver Broncos[0m





## Few-shot QA with context

The above system makes no use of evidence passages. As a first step toward bringing in such passages, we define a regular few-shot QA system. For this system, prompts contain:

1. A sequences of Q/A demonstrations, each with a gold context passage.
2. The target question with a gold context passage.

This kind of system is very demanding in terms of data, since we need to have gold evidence passages for every Q/A pair used for demonstations and the Q that is our target. Datasets like SQuAD support this, but it's a rare situation in the world. (Our next system will address this by dropping the need for gold passages).

### Template with context

The first step toward defining this system is a new prompt template that includes context:

In [149]:
Context = dsp.Type(
    prefix="Context:",
    desc="${sources that may contain relevant content}",
    format=dsp.passages2text)

qa_template_with_passages = dsp.Template(
    instructions=qa_template.instructions,
    context=Context(), 
    question=Question(), 
    answer=Answer())

Here's what this does for a SQUaD example:

In [150]:
print(qa_template_with_passages(ex))

Answer questions with short factoid answers.

---

Follow the following format.

Context: ${sources that may contain relevant content}
Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Context: Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".
Question: What album made her a worldwide known art

### The system

And here is the full system; the code is identical to `few_shot_openqa` except we now use `qa_template_with_passages`:

In [151]:
@dsp.transformation
def few_shot_qa_with_context(example, train=squad_train, k=3):
    example.demos = dsp.sample(train, k=k)
    generator = dsp.generate(qa_template_with_passages)
    example, completions = generator(example, stage='qa')
    return completions

In [152]:
print(few_shot_qa_with_context(squad_dev[0]).answer)

Denver Broncos


In [153]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Context: ${sources that may contain relevant content}
Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Context: Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".
Question: What album made her a worldwide known

## Dev evaluations

This quick section shows some full evaluations using `evaluateAnswer` (see [Evaluation](#Evaluation) above). Depending on which model you're using, these evaluations could be expensive, so you might want to run them only sparingly. Here I am running them on just 25 dev examples to further avoid cost run-ups.

In [154]:
tiny_dev = dev_exs[: 25]

In [55]:
few_shot_openqa_results = evaluateAnswer(few_shot_openqa, tiny_dev)

print(few_shot_openqa_results['em'])
print(few_shot_openqa_results['f1'])

100%|██████████| 25/25 [00:19<00:00,  1.28it/s]

12.0
0.251985347985348





You can also see the full set of results:

In [56]:
few_shot_openqa_results['df'].head()

Unnamed: 0,id,title,context,question,answer,prediction,em,f1
0,572825714b864d1900164594,Doctor_Who,TVOntario picked up the show in 1976 beginning...,What science fiction writer introduced the Doc...,"[Judith Merril, Judith Merril, Judith Merril]",Douglas Adams,❌,0.0
1,56dde0ba66d3e219004dad76,Normans,"In the course of the 10th century, the initial...",Who did Rollo sign the treaty of Saint-Clair-s...,"[King Charles III, King Charles III, King Char...",Charles the Simple,❌,0.4
2,5725c7f5271a42140099d1a2,Apollo_program,"Wiesner kept up the pressure, even making the ...",What did Wiesner shout out in front of the pre...,"[""No, that's no good"", ""No, that's no good, No...","He shouted ""Heil Hitler!""",❌,0.0
3,56beb86b3aeaaa14008c92be,Super_Bowl_50,Peyton Manning became the first quarterback ev...,Who previously held the record for being the o...,"[John Elway, John Elway, Elway, Elway]",Tom Brady,❌,0.0
4,57281cb22ca10214002d9e20,Doctor_Who,Six soundtrack releases have been released sin...,What music did the fourth soundtrack feature?,"[music from the 2008–2010 specials, the 2008–2...","""The Battle of the Five Armies""",❌,0.0


## Question 1: Few-shot OpenQA with context [3 points]

Your task here is to define a first instance of our target system: Few-shot OpenQA with context passages. To do this, you simply complete `few_shot_openqa_with_context`:

In [155]:
@dsp.transformation
def few_shot_openqa_with_context(example, train=squad_train, k=3):
    
    # Sample `k` demonstrations from `train`:
    ##### YOUR CODE HERE
    demos = dsp.sample(train=train, k=k)

    # For each demonstration, retrieve one passage and add it
    # as the `context` attribute` so we can use our template
    # `qa_template_with_passages`:
    ##### YOUR CODE HERE
    for demo in demos: 
        passage = dsp.retrieve(demo.question, k=1)
        demo.context = passage

    # Add the list of demonstrations to `example` as the `demos` attribute:
    ##### YOUR CODE HERE
    example.demos = demos

    # Retrieve a context passage for `example` itself and add it
    # as the `context` attribute:
    ##### YOUR CODE HERE
    ex_context = dsp.retrieve(example.question, k=1)
    example.context = ex_context

    # Use `dsp.generate` to call the model on `example` using
    # `qa_template_with_passages`:
    ##### YOUR CODE HERE
    ex, ex_compl = dsp.generate(qa_template_with_passages)(example, stage='qa')

    # Return the Completions instance returned by `dsp.generate`:
    ##### YOUR CODE HERE
    return ex_compl

A quick test you can use:

In [156]:
def test_few_shot_openqa_with_context(func):
    ex = dsp.Example(question="Q0", context="C0", answer=["A0"])
    train = [
        dsp.Example(question="Q1", context=None, answer=["A1"]),
        dsp.Example(question="Q2", context=None, answer=["A2"]),
        dsp.Example(question="Q3", context=None, answer=["A3"])]
    compl = func(ex, train=train, k=2)
    errcount = 0
    # Check the LM was used as expected:
    if len(compl.data) != 1:
        errcount += 1
        print(f"Error for `{func.__name__}`: Unexpected LM output.")
    data = compl.data[0]
    # Check that the right number of demos was used:
    demos = data['demos']
    if len(demos) > 2:
        errcount += 1
        print(f"Error for `{func.__name__}`: "
              f"Unexpected demo count: {len(demos)}")
    # Check that context passages were included in the prompt:
    fields = compl.template.fields
    if not any(f.name == 'Context:' for f in fields):
        errcount += 1
        print(f"Error for `{func.__name__}`: "
              f"No context passages in the prompt.")
    # Check that the context passages were retrieved:
    if data['context'] == "C0":
        errcount += 1
        print(f"Error for `{func.__name__}`: "
              f"No context passage retrieved for the target.")
    for d in demos:
        if d['context'] is None:
            errcount += 1
            print(f"Error for `{func.__name__}`: "
                  f"No context passage retrieved for demo {d}.")
    if errcount == 0:
        print(f"No errors found for `{func.__name__}`")

In [157]:
test_few_shot_openqa_with_context(few_shot_openqa_with_context)

No errors found for `few_shot_openqa_with_context`


In [158]:
print(few_shot_openqa_with_context(dev_exs[0]).answer)

Anthony Coburn


In [159]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Context: ${sources that may contain relevant content}
Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Context: «Janet (album) | worldwide sales of over 14 million copies, it is Janet's best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»
Question: What album made her a worldwide known artist?
Answer: 

Here's an optional evaluation of the system using `tiny_dev`:

In [65]:
few_shot_openqa_with_context_results = evaluateAnswer(
    few_shot_openqa_with_context, tiny_dev)

few_shot_openqa_with_context_results['f1']

100%|██████████| 25/25 [00:27<00:00,  1.09s/it]


0.4861176470588235

## Question 2: Using annotate

This question is designed to give you some experience with DSP's powerful `annotate` method. You can think of this as a generic tool for defining general aspects of your prompt. Here we will use it to filter the set of demonstrations we use.

The overall idea here is that the demonstrations we sample might vary in quality in ways that could impact model performance. For example, if we want to try to push the model to provide extractive answers as in classical QA – answers that are substrings of the evidence passage – then it works against our interests to include demonstrations where the model is unable to do this.

We will do this in two parts to facilitate testing.

### Task 1: Filtering demonstrations 1 [2 points]

This is the heart of the question: complete `filter_demos` so that, given a demonstration `d` and a list of demonstrations `demos`, it keeps `d` if and only if

1. The passage retrieved for `d` contrains `d.answer`, and
2. The model's generation for `d` based on `qa_template_with_passages` contains `d.answer`.

In [160]:
@dsp.transformation
def filter_demos(d):

    # Retrieve a passage for `d.question` and make sure that it
    # contains `d.answer`. Use `dsp.passage_match` for this!
    # return None if there is no match.
    ##### YOUR CODE HERE
    d_context = dsp.retrieve(d.question, k=1)
    if not dsp.passage_match(d_context, d.answer): return None
    d.context = d_context

    # Sample `k=3` demonstrations to help the model assess this
    # potential demonstration:
    ##### YOUR CODE HERE
    squad_demos = dsp.sample(train=squad_train, k=3)
    d.demos = squad_demos

    # Generate an answer based on `qa_template_with_passages`
    # and use `dsp.answer_match` to check that the predicted answer
    # contains `d.answer`. If it does not, return None.
    ##### YOUR CODE HERE
    ex, ex_compl = dsp.generate(qa_template_with_passages)(d, stage='qa')
    if not dsp.answer_match(ex_compl.answer, d.answer): return None 

    # Return d, if you got this far:
    ##### YOUR CODE HERE
    return d

Here's a test; this is not an ideal unit test because we don't know which LM you will be using, but it should clarify our intentions and help you with debugging.

In [161]:
def test_filter_demos(func):
    # This example should be filtered at the retrieval step, since
    # 👽 is not in the index:
    ex1 = dsp.Example(
        question="Who is 👽?", context="C0", answer=["👽"])
    result1 = func(ex1)
    errcount = 0
    if result1 is not None:
        errcount += 1
        print(f"Error for `{func.__name__}`: Expected {None}, got {result1}")
    # This example should not be filtered given our tester LM:
    ex2 = dsp.Example(
        question="Who is Beyoncé?", context="C0", answer=["Beyoncé"])
    # This example should be filtered given our tester LM:
    ex3 = dsp.Example(
        question="Who is Beyoncé?", context="C0", answer=["NO MATCH"])
    class TestLM:
        def __init__(self, **kwargs):
            self.kwargs = kwargs
            self.history = []

        def __call__(self, prompt, **kwargs):
            answer = ["Beyoncé"]
            return answer
    dsp.settings.configure(lm=TestLM(), rm=rm)
    try:
        result2 = func(ex2)
        if result2 is None:
            errcount += 1
            print(f"Error for `{func.__name__}`: "
                  f"Expected example not to be filtered by `answer_match`.")
        result3 = func(ex3)
        if result3 is not None:
            errcount += 1
            print(f"Error for `{func.__name__}`: "
                  f"Expected example to be filtered by `answer_match`.")
    except:
        raise
    finally:
        # Restore the actual model:
        dsp.settings.configure(lm=lm, rm=rm)
    if errcount == 0:
        print(f"No errors detected for `{func.__name__}`")

In [162]:
test_filter_demos(filter_demos)

No errors detected for `filter_demos`


### Task 2: Full filtering program [1 point]

The task is to complete `few_shot_openqa_with_context_and_demo_filtering` as a few-shot OpenQA system like the one from Question 1, but using the filtering mechanism defined by `filter_demos`.

In [163]:
@dsp.transformation
def few_shot_openqa_with_context_and_demo_filtering(example, train=squad_train, k=3):

    # Sample 20 demonstrations:
    ##### YOUR CODE HERE
    demos = dsp.sample(train=train, k=20)

    # Filter the demonstrations using `annotate` and `filter_demos`.
    # The user's `k` should be used to specify the maximum number of
    # demonstrations kept at this stage.
    ##### YOUR CODE HERE
    filtered = dsp.annotate(filter_demos)(demos, k=k)

    # Add the list of filtered demonstrations as the `demos`
    # attribute of `example`:
    ##### YOUR CODE HERE
    example.demos = filtered

    # Retrieve a context passage for `example.question` and add it
    # as the `context` attribute for the example:
    ##### YOUR CODE HERE
    ex_context = dsp.retrieve(example.question, k=1)
    example.context = ex_context

    # Generate a prediction using `qa_template_with_passages` as
    # we did before:
    ##### YOUR CODE HERE
    ex, ex_compl = dsp.generate(qa_template_with_passages)(example, stage='qa')
    
    # Return the generated `Completions` instance:
    ##### YOUR CODE HERE
    return ex_compl

Our previous test should suffice to help with debugging this program:

In [164]:
test_few_shot_openqa_with_context(
    few_shot_openqa_with_context_and_demo_filtering)

No errors found for `few_shot_openqa_with_context_and_demo_filtering`


Quick example:

In [165]:
print(few_shot_openqa_with_context_and_demo_filtering(dev_exs[0]).answer)

Anthony Coburn


In [166]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Context: ${sources that may contain relevant content}
Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Context: «Adult Contemporary (chart) | An article on MTV's website by Corey Moss describes this trend: "In other words, AC stations are where pop songs go to die a very long death. Or, to optimists, to get a second life." One theory states that many adult contemporary stations play less newer music because they also give ample airtime to hits of the past, so the de-emphasis on new songs slows the progression of the AC chart. Also, certain program directors have asserted that AC is a song-based format, as opposed to other radio formats that are infused with singer-based programming, so there is no guarantee»
Question: What type of music are AC stations noted as playing less of versus hits of the past?
Answer: newer music

---

Context: «Mayor

Here is code for an optional initial evaluation with `tiny_dev`:

In [114]:
filtering_results = evaluateAnswer(
    few_shot_openqa_with_context_and_demo_filtering, tiny_dev)

filtering_results['f1']

100%|██████████| 25/25 [01:25<00:00,  3.41s/it]


0.46799999999999997

## Question 3: Your original system [3 points]

This question asks you to design your own few-shot OpenQA system. All of the code above can be used and modified for this, and the requirement is just that you try something new that goes beyond what we've done so far. 

Terms for the bake-off:

* You can make free use of SQuAD and other publicly available data.

* The LM must be an autoregressive language model. No trained QA components can be used. This includes general purpose LMs that have been fine-tuned for QA. (We have obviously waded into some vague territory here. The spirit of this is to make use of frozen, general-purpose models. We welcome questions about exactly how this is defined, since it could be instructive to explore this.)

Here are some ideas for the original system:

* We have so far sampled randomly from the SQuaD train set to create few-shot prompts. One might instead sample passages that have some connection to the target question. See `dsp.knn`, for example.

* There are a lot of parameters to our LMs that we have so far ignored. Exploring different values might lead to better results. The `temperature` parameter is highly impactful for our task.

* We have so far made no use of the scores from the LM or the RM.

* We have so far made no use of DSP's functionality for self-consistency. See the DSP intro notebook for examples.

__Original system instructions__:

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies.

In [204]:
# PLEASE MAKE SURE TO INCLUDE THE FOLLOWING BETWEEN THE START AND STOP COMMENTS:
#   1) Textual description of your system.
#   2) The code for your original system.
# PLEASE MAKE SURE NOT TO DELETE OR EDIT THE START AND STOP COMMENTS

# START COMMENT: Enter your system description in this cell.
"""__system-description__
This bakeoff system consists of two minor changes and one major change. 

Changes are as follows: 
1. [minor] The LM employed by the DSP program has been changed from 'text-davinci-001' 
   to 'gpt-3.5-turbo' to afford the DSP program the general improved performance and 
   expressiveness that comes with this larger, more powerful, and more expensive model. 
   
2. [minor] In the call to instantiate the DSP `lm`, we specify the `temperature` 
   paramater to be set to 0.1 to strongly bias model responses toward those with 
   higher scores and associated certainty. 
   
3. [major] The DSP program now utilizes a summary-based demonstration filtering 
   function which involves cosine similarity computations as a part of the 
   `dsp.annotate` call. The function which generates answers to given dsp.Example 
   questions (few_shot_openqa_with_custom_context_and_demo_filtering) also employs 
   cosine similarity-based filtering of potential context passages before summarizing 
   said passages with the summary being used as the context for question answering. 
   Further details on both components of this system are provided below. 

The inspiration for this system was my personal interpretation of how a 
human agent would answer a generic question without being given any kind 
of reference passage. In such a situation, I envisioned a human agent 
proceeding as follows: 

1. The agent will look up multiple context sources related to the question. 

2. The agent will assess whether each context source is sufficiently related 
   to the question (for some def. of sufficiently related).
   
3. After potentially filtering unrelated sources in the previous step, the agent 
   will summarize information presented across the remaining context sources, with 
   the summarization function acting as a natural language 'intersection' function 
   (in that the information common to all sources will likely be included in the 
   summary as if some factoid is present in all sources, it is likely of high truth 
   value and importance). 
   
4. The agent will review the summary and use it to answer the question.

A human agent may implicitly perform these operations, but for this DSP program, 
I seek to make each step explicit and reproducible. Their translation into code 
is as follows: 

When considering the demonstrations I want to provide to the DSP program for a 
given dsp.Example question, I want the demonstrations to represent successful 
enactments of the worklow described above. To that end, the demonstration 
filtering function (custom_filter_demos) used as a part of `dsp.annotate` to 
curate demonstrations operates as follows: 

1. For a given candidate demonstration d, I retrieve a few context passages for 
   the associated `d.question` and summarize them using an XLNet-based transformer 
   architecture designed for text summarization. 
   
2. If the summary does not contain `d.answer` and/or if the answer generated using 
   the `qa_template_with_passages` template does not contain `d.answer`, the 
   demonstration is discarded.  
   
3. Since both the context summary and generated answer contain `d.answer`, this 
   is a signal that the originally retrieved context passages were 'sufficiently 
   related'to the demonstration's question. To codify this definition in a 
   quantitative way, for all non-discarded demonstrations, I embed the relevant 
   question and context passages using a general-purpose BERT-based sentence 
   embedding transformer architecture, compute the pairwise cosine similarity 
   between question and passage embeddings, and incrementally estimate the avg. 
   cosine similarity for all such pairs. This number, `AVG_COSINE_SIM` will be 
   relevant shortly.
   
With our demonstration filtering function defined, the actual question answering
function, few_shot_openqa_with_custom_context_and_demo_filtering, works as follows: 

1. For a given dsp.Example, we begin by sampling a high-ish number (30) of 
   demonstrations and filter them (using dsp.annotate) according to our 
   previously-described filtering function. A maximum of some number (set 
   by a parameter) of these demonstrations are kept and provided to the 
   dsp.Example as demos. 
   
2. Then, as many as 10 context passages are retrieved for the dsp.Example
   question. The question and each context passage is embedded using the 
   same BERT-based sentence embedder used in the filtering function. 
   Pairwise cosine similarity between the question and context embeddings 
   are computed. 
   
   All context passages whose embeddings share a cosine similarity of at
   least `AVG_COSINE_SIM` (the metric defining 'sufficiently related' as 
   informed by retained demonstrations) are then summarized (using the 
   same XLNet-based summarizer as used in the filtering function) and 
   provided as the context for the dsp.Example question, which can then 
   be passed to the dsp lm for attempted question answering. 

Notes on imported model components: 

1. The summarizer is a version of the XLNet model. This model uses 
   generalized autoregressive pretraining which is not specific to 
   question answering. 
      - Link: https://huggingface.co/xlnet-base-cased.
   
2. The sentence embedder is based on the BERT architecture. It is a 
   fine-tuned version of the pretrained 'MiniLM-L6-H384-uncased' model.
   Fine-tuning is performed with a contrastive learning objective not 
   specific to question answering. 
      - Link: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2.
      
3. The language model used by the DSP program is 'gpt-3.5-turbo' provided 
   by OpenAI. The model is optimized for chat completion which is not 
   specific to quesion answering. 
      - Link: https://platform.openai.com/docs/models/gpt-3-5. 
"""

# additional imports: 
from summarizer             import TransformerSummarizer    # for summarizing context passages. 
from sentence_transformers  import SentenceTransformer      # for embedding sentences.
from scipy.spatial.distance import cosine                   # for computing cosine similarity between text embeddings. 
 
import numpy as np

# instantiate our summarizer and sentence embedding models: 
summarizer  = TransformerSummarizer(transformer_type="XLNet", transformer_model_key="xlnet-base-cased")
sbert_model = SentenceTransformer('all-MiniLM-L6-v2')

# global variables: 
# (1) AVG_COSINE_SIM:
#       the average cosine similarity between a demonstration question 
#       and its relevant context passages if said demonstration question
#       can be answered by a summary of the context passages. 
# (2) NUM_EXS:
#       the number of context passages observed in the demonstration   
#       filtering and annotation process. 
AVG_COSINE_SIM, NUM_EXS = 0, 0  

# change `lm` to more powerful (and expensive) 'gpt-3.5-turbo' model for evaluation.
# when setting `lm`, set the temperature parameter near-ish 0 to bias toward answers
# with higher scores/certainty.
lm = dsp.GPT3(model='gpt-3.5-turbo', api_key=openai_key, temperature=0.1)
dsp.settings.configure(lm=lm, rm=rm)

@dsp.transformation
def custom_filter_demos(d): 
    # retrieve a few context passages for `d.question` and summarize them 
    # using our previously-specified XLNet-based summarizer: 
    d_context = dsp.retrieve(d.question, k=5)
    summary = ''.join(summarizer(' '.join(d_context), min_length = 60)) 
    
    # check if the summary contains `d.answer`. if not, return None;
    # if yes, assign `summary` to `d.context`: 
    if not dsp.passage_match([summary], d.answer): return None 
    d.context = summary 
    
    # as with filter_demos(), sample `k=3` demonstrations to help 
    # the model assess this potential demonstration: 
    squad_demos = dsp.sample(train=squad_train, k=3)
    d.demos = squad_demos 
    
    # generate an answer based on `qa_template_with_passages`
    # and use `dsp.answer_match` to check that the predicted answer 
    # contains `d.answer`. if it doesn't, return None: 
    _, ex_compl = dsp.generate(qa_template_with_passages)(d, stage='qa')
    if not dsp.answer_match(ex_compl.answer, d.answer): return None 
    
    # embed the `d.question` vector using the previously-specified 
    # BERT-based SentenceTransformer model: 
    d_question_vec = sbert_model.encode([d.question])[0]
    
    # emebd each of the context passages using the same model: 
    d_context_vecs = [sbert_model.encode([context_passage])[0] for context_passage in d_context]
    
    # compute the cosine distance between `d_question_vec` and 
    # each context vec. and incrementally update global vars. 
    # `AVG_COSINE_SIM` and `NUM_EXS`: 
    for context_vec in d_context_vecs: 
        question_context_sim = 1 - cosine(d_question_vec, context_vec)
        global AVG_COSINE_SIM; global NUM_EXS
        total_cosine_sim = AVG_COSINE_SIM * NUM_EXS 
        total_cosine_sim += question_context_sim 
        NUM_EXS += 1
        AVG_COSINE_SIM = total_cosine_sim / NUM_EXS
        
    # return the demonstration: 
    return d 

@dsp.transformation
def few_shot_openqa_with_custom_context_and_demo_filtering(example, train=squad_train, k=3): 
    # sample a high-ish number (30) of demonstrations: 
    demos = dsp.sample(train=train, k=30)
    
    # filter the demonstrations using `annotate` and `filter_demos`. 
    # the `k` provided in the function signature specifies the maximum 
    # number of demonstrations kept at this stage: 
    filtered = dsp.annotate(custom_filter_demos)(demos, k=k)
    
    # assign the list of filtered demonstrations to the `demos`
    # attribute of `example`:
    example.demos = filtered 
    
    # retrieve many (10) context passages for `example.question`: 
    ex_context = dsp.retrieve(example.question, k=10)
    
    # embed `example.question` and the context passages using the 
    # BERT-based SentenceTransformer model: 
    q_vec = sbert_model.encode([ex.question])[0]
    context_vecs = [sbert_model.encode([context_passage])[0] for context_passage in ex_context]
    
    # compute the cosine sim. between each context passage vector 
    # and the question vector: 
    cosine_sims = [1 - cosine(context_vec, q_vec) for context_vec in context_vecs]
    
    # determine which context vectors (and associated passages) 
    # have at least AVG_COSINE_SIM similarity with the question: 
    global AVG_COSINE_SIM
    min_cosine_sim_achieved = ['T' if cos_sim >= AVG_COSINE_SIM else 'F' for cos_sim in cosine_sims]

    # if no context vector achieves at least AVG_COSINE_SIM, then 
    # set the `example.context` to be the context passage for the 
    # vector with the highest cosine similarity with the question: 
    if 'T' not in min_cosine_sim_achieved: 
        index_max = np.argmax(cosine_sims)
        example.context = ex_context[index_max]
        
    # otherwise, retrieve all context passages which achieve 
    # at least AVG_COSINE_SIM similarity with the question and 
    # summarize them using the XLNet-based summarizer. Set the 
    # summary as the context for the question. 
    else: 
        context_passages = ""
        for idx, elem in enumerate(min_cosine_sim_achieved): 
            if elem == 'T': 
                context_passages += ex_context[idx]
                
        context_passages_summary = ''.join(summarizer(context_passages, min_length = 60))
        example.context = context_passages_summary

    # generate a prediction using `qa_template_with_passages`
    # as done before:
    _, ex_compl = dsp.generate(qa_template_with_passages)(example, stage='qa')
    
    # return the generated `Completions` instance: 
    return ex_compl

# system evaluation: 
few_shot_openqa_with_custom_context_and_demo_filtering_results = evaluateAnswer(
    few_shot_openqa_with_custom_context_and_demo_filtering, tiny_dev)
em, f1 = few_shot_openqa_with_custom_context_and_demo_filtering_results['em'], few_shot_openqa_with_custom_context_and_demo_filtering_results['f1']
print(f'Number of exact matches :: {em} out of {len(tiny_dev)}, F1 on evaluation set: {f1:.3f}')
# STOP COMMENT: Please do not remove this comment.

Some weights of the model checkpoint at xlnet-base-cased were not used when initializing XLNetModel: ['lm_loss.bias', 'lm_loss.weight']
- This IS expected if you are initializing XLNetModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLNetModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████| 25/25 [07:42<00:00, 18.49s/it]

Number of exact matches :: 20.0 out of 25, F1 on evaluation set: 0.333





## Question 4: Bakeoff entry [1 point]

For the bake-off, you simply need to be able to run your system on the file 

```data/openqa/cs224u-openqa-test-unlabeled.txt```

The following code should download it for you if necessary:

In [116]:
if not os.path.exists(os.path.join("data", "openqa", "cs224u-openqa-test-unlabeled.txt")):
    !mkdir -p data/openqa
    !wget https://web.stanford.edu/class/cs224u/data/cs224u-openqa-test-unlabeled.txt -P data/openqa/

If the above fails, you can just download https://web.stanford.edu/class/cs224u/data/cs224u-openqa-test-unlabeled.txt and place it in `data/openqa`.

This file contains only questions. The starter code below will help you structure this. It writes a file "cs224u-openqa-bakeoff-entry.json" to the current directory. That file should be uploaded as-is. Please do not change its name.

In [202]:
import json

def create_bakeoff_submission(fn):
    """"
    The argument `fn` is a DSP program with the same signature as the 
    ones we wrote above: `dsp.Example` to `dsp.Completions`.
    """

    filename = os.path.join("data", "openqa", "cs224u-openqa-test-unlabeled.txt")

    # This should become a mapping from questions (str) to response
    # dicts from your system.
    gens = {} 

    with open(filename) as f:
        questions = f.read().splitlines()

    questions = [dsp.Example(question=q) for q in questions]

    # `questions` is the list of `dsp.Example` instances you need to 
    # evaluate your system on. 
    #
    # Here we loop over the questions, run the system `fn`, and
    # store its `answer` value as the prediction:
    for question in tqdm.tqdm(questions):
        gens[question.question] = fn(question).answer
        # write results to file incrementally to ensure no lost progress:
        with open("cs224u-openqa-bakeoff-entry.json", "wt") as f:
            json.dump(gens, f, indent=4)
            
    # Quick tests we advise you to run: 
    # 1. Make sure `gens` is a dict with the questions as the keys:
    assert all(q.question in gens for q in questions)
    # 2. Make sure the values are dicts and have the key we will use:
    assert all(isinstance(d, str) for d in gens.values())

Here's what it looks like to evaluate our first program, `few_shot_openqa`, on the bakeoff data:

In [None]:
create_bakeoff_submission(few_shot_openqa_with_context)

And here's a function call to evaluate the original bakeoff system, ```few_shot_openqa_with_custom_context_and_demo_filtering```, on the bakeoff data: 

In [203]:
create_bakeoff_submission(few_shot_openqa_with_custom_context_and_demo_filtering)

  3%|▎         | 12/400 [03:26<1:48:46, 16.82s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  3%|▎         | 13/400 [03:43<1:49:14, 16.94s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  4%|▍         | 17/400 [05:05<2:05:12, 19.62s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  5%|▍         | 19/400 [05:46<2:08:09, 20.18s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  5%|▌         | 20/400 [06:04<2:03:48, 19.55s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  5%|▌         | 21/400 [06:23<2:01:47, 19.28s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  6%|▌         | 22/400 [06:44<2:05:46, 19.96s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  6%|▌         | 23/400 [07:04<2:05:02, 19.90s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  6%|▌         | 24/400 [07:23<2:03:27, 19.70s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  6%|▋         | 25/400 [07:44<2:05:29, 20.08s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  6%|▋         | 26/400 [08:03<2:02:19, 19.63s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  7%|▋         | 28/400 [08:43<2:03:04, 19.85s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  7%|▋         | 29/400 [09:03<2:02:57, 19.89s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.7 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  8%|▊         | 30/400 [09:26<2:07:10, 20.62s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  8%|▊         | 31/400 [09:45<2:03:47, 20.13s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  8%|▊         | 32/400 [10:03<2:01:11, 19.76s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  8%|▊         | 33/400 [10:23<2:00:33, 19.71s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 8.4 seconds after 5 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  9%|▉         | 35/400 [11:07<2:03:37, 20.32s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  9%|▉         | 36/400 [11:23<1:56:28, 19.20s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 7.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


  9%|▉         | 37/400 [11:48<2:06:24, 20.89s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 10%|▉         | 38/400 [12:03<1:54:40, 19.01s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 10%|▉         | 39/400 [12:23<1:56:26, 19.35s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.7 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 10%|█         | 40/400 [12:45<2:01:25, 20.24s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 10%|█         | 41/400 [13:03<1:56:53, 19.54s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 7.2 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 11%|█         | 43/400 [13:44<1:56:53, 19.65s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 12%|█▏        | 49/400 [15:48<1:52:05, 19.16s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 12%|█▎        | 50/400 [16:04<1:45:43, 18.12s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.2 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.3 seconds after 5 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 13%|█▎        | 51/400 [16:26<1:53:08, 19.45s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 13%|█▎        | 52/400 [16:44<1:50:09, 18.99s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 13%|█▎        | 53/400 [17:05<1:53:38, 19.65s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 14%|█▎        | 54/400 [17:23<1:50:06, 19.09s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 14%|█▍        | 55/400 [17:44<1:52:36, 19.58s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 6.5 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 14%|█▍        | 56/400 [18:08<1:59:40, 20.87s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 14%|█▍        | 57/400 [18:23<1:49:50, 19.22s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 6.6 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 14%|█▍        | 58/400 [18:48<1:59:06, 20.90s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 15%|█▍        | 59/400 [19:05<1:52:05, 19.72s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 15%|█▌        | 60/400 [19:23<1:49:47, 19.38s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.2 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 15%|█▌        | 61/400 [19:47<1:56:58, 20.70s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 16%|█▌        | 62/400 [20:06<1:53:35, 20.17s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 16%|█▌        | 63/400 [20:24<1:49:38, 19.52s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.9 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 16%|█▌        | 64/400 [20:47<1:54:03, 20.37s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 16%|█▋        | 65/400 [21:03<1:47:45, 19.30s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 5.9 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 16%|█▋        | 66/400 [21:25<1:50:38, 19.88s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 6.7 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 17%|█▋        | 67/400 [21:48<1:55:34, 20.82s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 17%|█▋        | 68/400 [22:04<1:47:36, 19.45s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 17%|█▋        | 69/400 [22:24<1:49:17, 19.81s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 18%|█▊        | 70/400 [22:43<1:46:52, 19.43s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 18%|█▊        | 71/400 [23:05<1:50:52, 20.22s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 18%|█▊        | 72/400 [23:24<1:47:46, 19.71s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 18%|█▊        | 73/400 [23:44<1:48:26, 19.90s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 7.0 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 19%|█▉        | 77/400 [25:05<1:47:29, 19.97s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 20%|█▉        | 78/400 [25:25<1:46:01, 19.76s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 20%|█▉        | 79/400 [25:45<1:46:22, 19.88s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 20%|██        | 80/400 [26:03<1:43:38, 19.43s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 20%|██        | 81/400 [26:23<1:43:47, 19.52s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 20%|██        | 82/400 [26:45<1:48:13, 20.42s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 21%|██        | 83/400 [27:04<1:44:55, 19.86s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 21%|██        | 84/400 [27:23<1:43:07, 19.58s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 22%|██▏       | 86/400 [28:03<1:42:55, 19.67s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 13.9 seconds after 5 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 22%|██▏       | 89/400 [29:06<1:40:31, 19.39s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 22%|██▎       | 90/400 [29:23<1:36:43, 18.72s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 23%|██▎       | 91/400 [29:43<1:37:50, 19.00s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 23%|██▎       | 92/400 [30:04<1:41:04, 19.69s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 23%|██▎       | 93/400 [30:23<1:39:02, 19.36s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 24%|██▎       | 94/400 [30:45<1:42:31, 20.10s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 24%|██▍       | 95/400 [31:04<1:41:34, 19.98s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 24%|██▍       | 96/400 [31:24<1:40:31, 19.84s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 24%|██▍       | 97/400 [31:43<1:39:06, 19.62s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.5 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 24%|██▍       | 98/400 [32:05<1:42:40, 20.40s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 25%|██▍       | 99/400 [32:23<1:38:21, 19.61s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 25%|██▌       | 100/400 [32:45<1:42:21, 20.47s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 25%|██▌       | 101/400 [33:03<1:37:32, 19.57s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 26%|██▌       | 102/400 [33:23<1:38:47, 19.89s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 26%|██▌       | 103/400 [33:44<1:38:51, 19.97s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 26%|██▌       | 104/400 [34:04<1:38:22, 19.94s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 26%|██▋       | 105/400 [34:24<1:39:02, 20.15s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 26%|██▋       | 106/400 [34:45<1:39:53, 20.38s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 27%|██▋       | 107/400 [35:04<1:36:52, 19.84s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 27%|██▋       | 109/400 [35:43<1:35:08, 19.62s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 28%|██▊       | 110/400 [36:03<1:34:36, 19.57s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 7.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 28%|██▊       | 113/400 [37:06<1:36:13, 20.12s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 29%|██▉       | 116/400 [38:04<1:32:27, 19.53s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 29%|██▉       | 117/400 [38:24<1:32:42, 19.65s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 30%|██▉       | 118/400 [38:44<1:32:59, 19.79s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 30%|███       | 120/400 [39:23<1:32:54, 19.91s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 30%|███       | 122/400 [40:04<1:32:34, 19.98s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 31%|███       | 123/400 [40:24<1:32:14, 19.98s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 31%|███▏      | 125/400 [41:05<1:33:07, 20.32s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 32%|███▏      | 126/400 [41:26<1:33:59, 20.58s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 32%|███▏      | 127/400 [41:43<1:28:53, 19.54s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 32%|███▏      | 128/400 [42:03<1:28:42, 19.57s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 33%|███▎      | 132/400 [43:25<1:28:02, 19.71s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 5.7 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 34%|███▍      | 135/400 [44:25<1:27:58, 19.92s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 34%|███▍      | 136/400 [44:45<1:26:50, 19.74s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 34%|███▍      | 137/400 [45:03<1:24:35, 19.30s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 34%|███▍      | 138/400 [45:23<1:25:23, 19.56s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 35%|███▌      | 140/400 [46:04<1:24:48, 19.57s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 35%|███▌      | 141/400 [46:23<1:24:06, 19.48s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 36%|███▌      | 142/400 [46:45<1:26:51, 20.20s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 36%|███▌      | 143/400 [47:04<1:25:10, 19.88s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 36%|███▌      | 144/400 [47:24<1:25:34, 20.06s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 36%|███▋      | 145/400 [47:44<1:25:04, 20.02s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 37%|███▋      | 149/400 [49:05<1:23:37, 19.99s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 38%|███▊      | 152/400 [50:03<1:19:40, 19.28s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 38%|███▊      | 153/400 [50:24<1:20:46, 19.62s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 38%|███▊      | 154/400 [50:43<1:20:35, 19.66s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 39%|███▉      | 155/400 [51:03<1:20:32, 19.72s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.5 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 39%|███▉      | 156/400 [51:26<1:23:22, 20.50s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 39%|███▉      | 157/400 [51:44<1:20:26, 19.86s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 40%|███▉      | 158/400 [52:04<1:19:54, 19.81s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 40%|███▉      | 159/400 [52:24<1:19:39, 19.83s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 5.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 40%|████      | 161/400 [53:06<1:20:10, 20.13s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 40%|████      | 162/400 [53:24<1:18:20, 19.75s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 41%|████      | 163/400 [53:43<1:16:48, 19.44s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 41%|████      | 164/400 [54:04<1:18:12, 19.88s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 42%|████▏     | 166/400 [54:44<1:16:58, 19.74s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 42%|████▏     | 167/400 [55:04<1:16:41, 19.75s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 42%|████▏     | 168/400 [55:23<1:16:01, 19.66s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 42%|████▏     | 169/400 [55:44<1:16:40, 19.92s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 42%|████▎     | 170/400 [56:04<1:16:52, 20.05s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 43%|████▎     | 171/400 [56:24<1:16:17, 19.99s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 43%|████▎     | 172/400 [56:44<1:15:32, 19.88s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 7.5 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 44%|████▎     | 174/400 [57:28<1:17:19, 20.53s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 44%|████▍     | 175/400 [57:43<1:11:11, 18.98s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 44%|████▍     | 176/400 [58:06<1:14:55, 20.07s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 44%|████▍     | 177/400 [58:24<1:12:39, 19.55s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 45%|████▍     | 179/400 [59:04<1:12:06, 19.58s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.3 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 45%|████▌     | 180/400 [59:25<1:12:36, 19.80s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 45%|████▌     | 181/400 [59:45<1:12:59, 20.00s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 46%|████▌     | 182/400 [1:00:04<1:11:07, 19.58s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 46%|████▌     | 183/400 [1:00:25<1:12:43, 20.11s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 46%|████▌     | 184/400 [1:00:44<1:11:21, 19.82s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 46%|████▋     | 185/400 [1:01:04<1:11:04, 19.84s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 46%|████▋     | 186/400 [1:01:25<1:11:38, 20.09s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 47%|████▋     | 187/400 [1:01:43<1:09:42, 19.64s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 47%|████▋     | 188/400 [1:02:03<1:09:32, 19.68s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.0 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 47%|████▋     | 189/400 [1:02:25<1:12:06, 20.50s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 48%|████▊     | 190/400 [1:02:43<1:09:10, 19.76s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.7 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 48%|████▊     | 192/400 [1:03:25<1:09:13, 19.97s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 48%|████▊     | 193/400 [1:03:45<1:09:47, 20.23s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 48%|████▊     | 194/400 [1:04:04<1:07:36, 19.69s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 49%|████▉     | 195/400 [1:04:26<1:09:23, 20.31s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 49%|████▉     | 196/400 [1:04:43<1:06:21, 19.52s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 5 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 49%|████▉     | 197/400 [1:05:04<1:07:05, 19.83s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 50%|████▉     | 198/400 [1:05:24<1:06:44, 19.82s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 50%|█████     | 200/400 [1:06:03<1:05:09, 19.55s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.4 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 50%|█████     | 202/400 [1:06:44<1:05:03, 19.71s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 7.1 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 51%|█████     | 204/400 [1:07:27<1:06:11, 20.26s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 51%|█████▏    | 205/400 [1:07:46<1:04:59, 20.00s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 52%|█████▏    | 206/400 [1:08:04<1:02:08, 19.22s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 52%|█████▏    | 207/400 [1:08:25<1:03:21, 19.70s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 52%|█████▏    | 208/400 [1:08:43<1:01:50, 19.33s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 6.5 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 52%|█████▎    | 210/400 [1:09:25<1:02:59, 19.89s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 53%|█████▎    | 211/400 [1:09:44<1:01:41, 19.58s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 53%|█████▎    | 212/400 [1:10:05<1:02:41, 20.01s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 53%|█████▎    | 213/400 [1:10:26<1:03:02, 20.23s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 54%|█████▎    | 214/400 [1:10:44<1:00:38, 19.56s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 54%|█████▍    | 215/400 [1:11:03<1:00:01, 19.47s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 54%|█████▍    | 216/400 [1:11:23<1:00:06, 19.60s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 10.7 seconds after 5 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 55%|█████▍    | 219/400 [1:12:24<56:47, 18.83s/it]  

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 55%|█████▌    | 220/400 [1:12:45<58:11, 19.40s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 55%|█████▌    | 221/400 [1:13:05<59:08, 19.82s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 56%|█████▌    | 222/400 [1:13:25<58:17, 19.65s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 56%|█████▌    | 223/400 [1:13:44<57:38, 19.54s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 56%|█████▌    | 224/400 [1:14:03<57:12, 19.50s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.3 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 56%|█████▋    | 225/400 [1:14:24<58:19, 20.00s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 56%|█████▋    | 226/400 [1:14:43<56:57, 19.64s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 57%|█████▋    | 227/400 [1:15:03<56:57, 19.75s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 57%|█████▋    | 228/400 [1:15:23<56:52, 19.84s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.2 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 57%|█████▊    | 230/400 [1:16:04<56:24, 19.91s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 58%|█████▊    | 231/400 [1:16:23<55:17, 19.63s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 58%|█████▊    | 232/400 [1:16:46<57:42, 20.61s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 58%|█████▊    | 233/400 [1:17:03<54:18, 19.51s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 58%|█████▊    | 234/400 [1:17:23<54:34, 19.73s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 59%|█████▉    | 235/400 [1:17:44<54:32, 19.83s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 59%|█████▉    | 236/400 [1:18:03<53:54, 19.72s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 6.7 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 60%|██████    | 242/400 [1:20:04<50:15, 19.09s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 61%|██████    | 244/400 [1:20:43<49:37, 19.09s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.9 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 61%|██████▏   | 245/400 [1:21:05<51:21, 19.88s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 62%|██████▏   | 246/400 [1:21:23<49:42, 19.37s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 62%|██████▏   | 247/400 [1:21:45<51:15, 20.10s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 62%|██████▏   | 248/400 [1:22:03<49:55, 19.71s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 62%|██████▏   | 249/400 [1:22:23<49:42, 19.75s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 63%|██████▎   | 251/400 [1:23:04<49:09, 19.79s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 6.4 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 64%|██████▎   | 254/400 [1:24:03<46:59, 19.31s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 64%|██████▍   | 257/400 [1:25:07<48:01, 20.15s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 64%|██████▍   | 258/400 [1:25:24<45:25, 19.19s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 65%|██████▌   | 260/400 [1:26:04<45:30, 19.50s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 65%|██████▌   | 261/400 [1:26:24<45:22, 19.58s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 66%|██████▌   | 263/400 [1:27:05<45:36, 19.97s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 66%|██████▌   | 264/400 [1:27:24<44:26, 19.61s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 66%|██████▋   | 265/400 [1:27:44<44:44, 19.89s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 66%|██████▋   | 266/400 [1:28:04<44:21, 19.86s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 67%|██████▋   | 267/400 [1:28:23<43:31, 19.64s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.4 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 67%|██████▋   | 269/400 [1:29:03<43:08, 19.76s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.8 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 68%|██████▊   | 271/400 [1:29:44<43:07, 20.06s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 68%|██████▊   | 272/400 [1:30:04<42:14, 19.80s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 68%|██████▊   | 274/400 [1:30:45<41:51, 19.93s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.8 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 14.5 seconds after 5 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 70%|██████▉   | 279/400 [1:32:24<36:39, 18.18s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 70%|███████   | 281/400 [1:33:05<38:15, 19.29s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.6 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 71%|███████   | 283/400 [1:33:43<37:26, 19.20s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.6 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 71%|███████   | 284/400 [1:34:03<37:41, 19.50s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.7 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 72%|███████▏  | 286/400 [1:34:44<37:23, 19.68s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 4.0 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 72%|███████▏  | 288/400 [1:35:23<36:34, 19.59s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.9 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 15.2 seconds after 5 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 73%|███████▎  | 292/400 [1:36:45<32:56, 18.30s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 73%|███████▎  | 293/400 [1:37:03<32:54, 18.46s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.0 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 74%|███████▎  | 294/400 [1:37:23<33:25, 18.92s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 74%|███████▍  | 296/400 [1:38:03<33:14, 19.18s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.8 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 74%|███████▍  | 297/400 [1:38:24<33:40, 19.62s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.2 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.5 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 74%|███████▍  | 298/400 [1:38:45<34:24, 20.24s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 75%|███████▍  | 299/400 [1:39:03<32:50, 19.51s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.7 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 75%|███████▌  | 300/400 [1:39:24<33:25, 20.05s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.6 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 75%|███████▌  | 301/400 [1:39:44<32:38, 19.78s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.9 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 3.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


 76%|███████▌  | 303/400 [1:40:24<31:47, 19.66s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 1.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.1 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 5.2 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


100%|█████████▉| 399/400 [2:12:51<00:16, 16.05s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 0.1 seconds after 2 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 2.3 seconds after 3 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}
Backing off 6.7 seconds after 4 tries calling function <function GPT3.request at 0x000001643308F040> with kwargs {}


100%|██████████| 400/400 [2:13:16<00:00, 19.99s/it]
