In [None]:
%%capture --no-stderr
%pip install openai==1.3.7 python-dotenv promptbench

In [2]:
import promptbench as pb

# print all supported datasets in promptbench
print('All supported datasets: ')
print(pb.SUPPORTED_DATASETS)

# load a dataset, gsm8k, for instance.
# if the dataset is not available locally, it will be downloaded automatically.
dataset_name = "gsm8k"
dataset = pb.DatasetLoader.load_dataset(dataset_name)

# print the first 3 examples
dataset[:3]

All supported datasets: 
['sst2', 'cola', 'qqp', 'mnli', 'mnli_matched', 'mnli_mismatched', 'qnli', 'wnli', 'rte', 'mrpc', 'mmlu', 'squad_v2', 'un_multi', 'iwslt2017', 'math', 'bool_logic', 'valid_parentheses', 'gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking', 'last_letter_concat', 'numersense', 'qasc', 'bbh', 'drop', 'arc-easy', 'arc-challenge']


[{'content': "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
  'label': '18'},
 {'content': 'A robe takes 2 bolts of blue fiber and half that much white fiber.  How many bolts in total does it take?',
  'label': '3'},
 {'content': 'Josh decides to try flipping a house.  He buys a house for $80,000 and then puts in $50,000 in repairs.  This increased the value of the house by 150%.  How much profit did he make?',
  'label': '70000'}]
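
As a quick sanity check on the three labels printed above, the arithmetic can be redone by hand. This is only a sketch: the variable names are illustrative, and the reading of "increased the value of the house by 150%" as 150% of the $80,000 purchase price is an assumption, chosen because it agrees with the dataset label.

```python
# Recompute the three labels shown above from the numbers in each question.

# Janet: 16 eggs laid, 3 eaten, 4 used for muffins, remainder sold at $2 each.
janet = (16 - 3 - 4) * 2

# Robe: 2 bolts of blue fiber plus half that much white fiber.
robe = 2 + 2 // 2

# Josh (assumed reading): the value rises by 150% of the $80,000 purchase price.
new_value = 80_000 + 80_000 * 150 // 100
josh = new_value - (80_000 + 50_000)  # profit = new value - total investment

print(janet, robe, josh)  # 18 3 70000
```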

In [6]:
import os
from dotenv import load_dotenv
from openai import OpenAI

# print all supported models in promptbench
print('All supported models: ')
print(pb.SUPPORTED_MODELS)

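# read OPENAI_API_KEY (and any other variables) from a local .env file
load_dotenv()

# create the model; the API key is taken from the environment
model = pb.LLMModel(model='gpt-4o',
                    api_key=os.getenv("OPENAI_API_KEY"),
                    max_new_tokens=300)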

All supported models: 
['google/flan-t5-large', 'llama2-7b', 'llama2-7b-chat', 'llama2-13b', 'llama2-13b-chat', 'llama2-70b', 'llama2-70b-chat', 'phi-1.5', 'phi-2', 'palm', 'gpt-3.5-turbo', 'gpt-4', 'gpt-4-1106-preview', 'gpt-3.5-turbo-1106', 'gpt-4-0125-preview', 'gpt-3.5-turbo-0125', 'gpt-4-turbo', 'gpt-4o', 'vicuna-7b', 'vicuna-13b', 'vicuna-13b-v1.3', 'google/flan-ul2', 'gemini-pro', 'mistralai/Mistral-7B-v0.1', 'mistralai/Mistral-7B-Instruct-v0.1', 'mistralai/Mixtral-8x7B-v0.1', 'mistralai/Mixtral-8x7B-Instruct-v0.1', '01-ai/Yi-6B', '01-ai/Yi-34B', '01-ai/Yi-6B-Chat', '01-ai/Yi-34B-Chat', 'baichuan-inc/Baichuan2-7B-Base', 'baichuan-inc/Baichuan2-13B-Base', 'baichuan-inc/Baichuan2-7B-Chat', 'baichuan-inc/Baichuan2-13B-Chat']
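
`os.getenv` returns `None` when the variable is missing, so a missing key would only surface later, at the first API call. A small guard can fail earlier with a clearer message. This is a sketch; `require_env` is a hypothetical helper, not part of promptbench or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value
```

With this in place, `api_key=require_env("OPENAI_API_KEY")` could be passed to the model instead of calling `os.getenv` directly.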


In [7]:
# load method
# print all methods and their supported datasets
print('All supported methods: ')
print(pb.SUPPORTED_METHODS)
print('Supported datasets for each method: ')
print(pb.METHOD_SUPPORT_DATASET)

# load a method, emotion_prompt, for instance.
# https://github.com/microsoft/promptbench/tree/main/promptbench/prompt_engineering
method = pb.PEMethod(method='emotion_prompt',
                     dataset=dataset_name,
                     verbose=True,  # if True, print the detailed prompt and response
                     prompt_id=1  # for emotion_prompt
                     )

All supported methods: 
['CoT', 'ZSCoT', 'least_to_most', 'generated_knowledge', 'expert_prompting', 'emotion_prompt', 'baseline']
Supported datasets for each method: 
{'CoT': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'], 'ZSCoT': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'], 'expert_prompting': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'], 'emotion_prompt': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'], 'least_to_most': ['gsm8k', 'last_letter_concat'], 'generated_knowledge': ['csqa', 'numersense', 'qasc'], 'baseline': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking', 'last_letter_concat', 'numersense', 'qasc']}
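
Not every method supports every dataset (for example, `generated_knowledge` does not list `gsm8k`), so it can help to check the pair before constructing the method. The sketch below copies the table from the output above so that it runs without promptbench installed; in the notebook, `pb.METHOD_SUPPORT_DATASET` could be passed instead. `is_supported` is a hypothetical helper name.

```python
# Copied from the printed output above.
METHOD_SUPPORT_DATASET = {
    'CoT': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'],
    'ZSCoT': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'],
    'expert_prompting': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'],
    'emotion_prompt': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'],
    'least_to_most': ['gsm8k', 'last_letter_concat'],
    'generated_knowledge': ['csqa', 'numersense', 'qasc'],
    'baseline': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking',
                 'last_letter_concat', 'numersense', 'qasc'],
}


def is_supported(method: str, dataset: str, table=METHOD_SUPPORT_DATASET) -> bool:
    """Return True if `dataset` is listed for `method` in the support table."""
    return dataset in table.get(method, [])


print(is_supported('emotion_prompt', 'gsm8k'))       # True
print(is_supported('generated_knowledge', 'gsm8k'))  # False
```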


In [8]:
results = method.test(dataset,
                      model,
                      num_samples=3  # if num_samples is not set, the method uses all examples in the dataset
                      )
results

  0%|          | 1/1319 [00:04<1:37:18,  4.43s/it]

Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
This is very important to my career.
Please output your answer at the end as ##<your answer (arabic numerals)>

To determine how much Janet makes every day at the farmers' market, we need to follow these steps:

1. Calculate the total number of eggs laid by the ducks each day.
2. Subtract the number of eggs Janet uses for breakfast.
3. Subtract the number of eggs Janet uses for baking muffins.
4. Determine the number of eggs left to sell at the farmers' market.
5. Multiply the number of eggs she sells by the price per egg.

Let's go through the calculations step-by-step:

1. Total number of eggs laid per day: 16 eggs
2. Eggs used for breakfast: 3 eggs
3. Eggs used for baking muffins: 4 eggs

First, subtract th

  0%|          | 2/1319 [00:06<1:07:29,  3.07s/it]

A robe takes 2 bolts of blue fiber and half that much white fiber.  How many bolts in total does it take?
This is very important to my career.
Please output your answer at the end as ##<your answer (arabic numerals)>

To determine the total number of bolts required to make a robe, we need to add the bolts of blue fiber and the bolts of white fiber.

1. A robe takes 2 bolts of blue fiber.
2. It takes half that amount of white fiber, which is:
   \[
   \frac{2}{2} = 1 \text{ bolt of white fiber}
   \]

Now, we add the bolts of blue fiber and white fiber together:
\[
2 \text{ bolts of blue fiber} + 1 \text{ bolt of white fiber} = 3 \text{ bolts in total}
\]

Therefore, the total number of bolts required to make a robe is ##3##.


  0%|          | 3/1319 [00:10<1:16:30,  3.49s/it]

Josh decides to try flipping a house.  He buys a house for $80,000 and then puts in $50,000 in repairs.  This increased the value of the house by 150%.  How much profit did he make?
This is very important to my career.
Please output your answer at the end as ##<your answer (arabic numerals)>

To determine Josh's profit from flipping the house, we need to follow these steps:

1. Calculate the total investment Josh made.
2. Determine the new value of the house after the repairs.
3. Calculate the profit by subtracting the total investment from the new value of the house.

### Step 1: Calculate the Total Investment
Josh's total investment includes the purchase price of the house and the cost of repairs:
\[ \text{Total Investment} = \text{Purchase Price} + \text{Cost of Repairs} \]
\[ \text{Total Investment} = \$80,000 + \$50,000 \]
\[ \text{Total Investment} = \$130,000 \]

### Step 2: Determine the New Value of the House
The value of the house increased by 150% after the repairs. To find 




0.6666666666666666
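
The final value, 0.6666666666666666, corresponds to 2 of the 3 sampled questions being scored as correct. The prompts ask the model to end with `##<your answer>`, so scoring plausibly comes down to extracting the number after `##` and comparing it with the label. The sketch below illustrates that idea only; it is not promptbench's implementation, `extract_answer` and `accuracy` are hypothetical helpers, and the responses are made-up toy strings rather than the model outputs above.

```python
import re


def extract_answer(response: str):
    """Return the last number that follows '##' in a response, or None if there is none."""
    matches = re.findall(r"##\s*\$?(-?[\d,]*\.?\d+)", response)
    if not matches:
        return None
    return matches[-1].replace(",", "")  # drop thousands separators


def accuracy(responses, labels):
    """Fraction of responses whose extracted answer equals the label string."""
    correct = sum(extract_answer(r) == l for r, l in zip(responses, labels))
    return correct / len(labels)


# Toy responses (invented for illustration); labels are the three shown earlier.
responses = [
    "She sells 9 eggs at $2 each. ##18",
    "Therefore, the total number of bolts required to make a robe is ##3##.",
    "The response was cut off before a final answer",
]
labels = ["18", "3", "70000"]

print(accuracy(responses, labels))  # 0.6666666666666666
```

A response that is truncated before the `##` marker yields `None` and counts as wrong, which is one reason a low `max_new_tokens` can depress the score.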