In [39]:
import sys

# Add the local promptbench checkout to the Python path
# (adjust this path to wherever the repository is cloned)
sys.path.append('/Users/iwatson/Repos/promptbench')

# promptbench can now be imported by name
import promptbench as pb

### Load Models and Dataset

In [40]:
# print all supported datasets in promptbench
print('All supported datasets: ')
for dataset in pb.SUPPORTED_DATASETS:
    print(f'  {dataset}')

All supported datasets: 
  sst2
  cola
  qqp
  mnli
  mnli_matched
  mnli_mismatched
  qnli
  wnli
  rte
  mrpc
  mmlu
  squad_v2
  un_multi
  iwslt2017
  math
  bool_logic
  valid_parentheses
  gsm8k
  csqa
  bigbench_date
  bigbench_object_tracking
  last_letter_concat
  numersense
  qasc
  bbh
  drop
  arc-easy
  arc-challenge


In [41]:
# print all supported models in promptbench
print('All supported models: ')
for model in pb.SUPPORTED_MODELS:
    print(f'  {model}')

All supported models: 
  google/flan-t5-large
  llama2-7b
  llama2-7b-chat
  llama2-13b
  llama2-13b-chat
  llama2-70b
  llama2-70b-chat
  phi-1.5
  phi-2
  palm
  gpt-3.5-turbo
  gpt-4
  gpt-4-1106-preview
  gpt-3.5-turbo-1106
  gpt-4-0125-preview
  gpt-3.5-turbo-0125
  gpt-4-turbo
  gpt-4o
  vicuna-7b
  vicuna-13b
  vicuna-13b-v1.3
  google/flan-ul2
  gemini-pro
  mistralai/Mistral-7B-v0.1
  mistralai/Mistral-7B-Instruct-v0.1
  mistralai/Mixtral-8x7B-v0.1
  mistralai/Mixtral-8x7B-Instruct-v0.1
  01-ai/Yi-6B
  01-ai/Yi-34B
  01-ai/Yi-6B-Chat
  01-ai/Yi-34B-Chat
  baichuan-inc/Baichuan2-7B-Base
  baichuan-inc/Baichuan2-13B-Base
  baichuan-inc/Baichuan2-7B-Chat
  baichuan-inc/Baichuan2-13B-Chat


In [42]:
dataset_name = "gsm8k"

In [43]:
dataset = pb.DatasetLoader.load_dataset(dataset_name)
dataset[:20]

[{'content': "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
  'label': '18'},
 {'content': 'A robe takes 2 bolts of blue fiber and half that much white fiber.  How many bolts in total does it take?',
  'label': '3'},
 {'content': 'Josh decides to try flipping a house.  He buys a house for $80,000 and then puts in $50,000 in repairs.  This increased the value of the house by 150%.  How much profit did he make?',
  'label': '70000'},
 {'content': 'James decides to run 3 sprints 3 times a week.  He runs 60 meters each sprint.  How many total meters does he run a week?',
  'label': '540'},
 {'content': "Every day, Wendi feeds each of her chickens three cups of mixed chicken feed, containing seeds, mealworms and vegetables to help keep them healthy.  She giv

In [44]:
# Keep the model name separate from the model object instead of rebinding one variable
model_name = "gpt-3.5-turbo"
model = pb.LLMModel(model=model_name, max_new_tokens=4096, temperature=0.0)

### Test Custom Prompts

In [45]:
prompts = pb.Prompt([
'{content}. Please output your answer at the end as ##<your answer (arabic numerals)> .',
'Solve the following math word problem: {content}. Provide your answer at the end in the format ##<your answer (arabic numerals)>.',
'Solve the following math problem: {content}. Output the answer at the end as ##<answer> with no spaces or units.',
"1. Clearly Break Down the Problem: Solve {content} by breaking it into individual steps to show your work.\n2. Format Final Answer: Ensure the final answer is in the format ##<answer> with no spaces, units, or extra characters. Example: For '5 * 3', the output should be ##15.",
'Please output your answer at the end as ##<your answer (arabic numerals)>.\n\nSolve the following math question step-by-step for accuracy:\n\n{content}\n\nThe answer must be formatted exactly as ##<answer> with no spaces or units.\n\nTo ensure clarity and accuracy, display all step-by-step calculations before the final answer. Emphasize the importance of accuracy and clarity in each step, and provide detailed explanations for all intermediate steps. Clearly identify the math question within the given content.',
])
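
Each template above carries a `{content}` placeholder that is filled with one dataset record before the model is called. The sketch below illustrates that substitution with plain `str.format`. It is an assumption about the behaviour, not promptbench's actual `InputProcess.basic_format` implementation, and `fill_prompt` is a hypothetical helper name.

```python
# Illustrative sketch only: fill_prompt is a hypothetical helper, and the use of
# str.format is an assumption about how the {content} placeholder gets filled.
def fill_prompt(template: str, record: dict) -> str:
    """Substitute the record's 'content' field into the {content} placeholder."""
    return template.format(content=record["content"])


# One record copied from the gsm8k sample shown earlier in the notebook
record = {
    "content": "A robe takes 2 bolts of blue fiber and half that much white fiber.  "
               "How many bolts in total does it take?",
    "label": "3",
}
template = ("Solve the following math problem: {content}. "
            "Output the answer at the end as ##<answer> with no spaces or units.")

filled = fill_prompt(template, record)
print(filled)
```

Because the question already ends with a question mark, a template that puts a period right after `{content}` produces `?.` in the final prompt, which is visible when the filled text is printed.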

In [46]:
# Custom mapping function

In [47]:
from tqdm import tqdm

# Score each prompt template on the first 20 GSM8K problems
for prompt in prompts:
    preds = []
    labels = []
    for data in tqdm(dataset[:20]):
        # process input: fill the {content} placeholder with the question
        input_text = pb.InputProcess.basic_format(prompt, data)
        label = data['label']
        raw_pred = model(input_text)
        # process output: extract the digits that follow "##"
        pred = pb.OutputProcess.pattern_re(raw_pred, r"##(\d+)")
        # print(f"Pred: {pred}, Label: {label}")
        preds.append(pred)
        labels.append(label)

    # evaluate: accuracy of the extracted answers against the labels
    score = pb.Eval.compute_cls_accuracy(preds, labels)
    print(f"{score:.3f}, {prompt}")

100%|██████████| 20/20 [00:53<00:00,  2.68s/it]


0.600, {content}. Please output your answer at the end as ##<your answer (arabic numerals)> .


100%|██████████| 20/20 [00:48<00:00,  2.41s/it]


0.750, Solve the following math word problem: {content}. Provide your answer at the end in the format ##<your answer (arabic numerals)>.


100%|██████████| 20/20 [00:51<00:00,  2.56s/it]


0.700, Solve the following math problem: {content}. Output the answer at the end as ##<answer> with no spaces or units.


100%|██████████| 20/20 [00:51<00:00,  2.58s/it]


0.550, 1. Clearly Break Down the Problem: Solve {content} by breaking it into individual steps to show your work.
2. Format Final Answer: Ensure the final answer is in the format ##<answer> with no spaces, units, or extra characters. Example: For '5 * 3', the output should be ##15.


100%|██████████| 20/20 [01:18<00:00,  3.93s/it]

0.550, Please output your answer at the end as ##<your answer (arabic numerals)>.

Solve the following math question step-by-step for accuracy:

{content}

The answer must be formatted exactly as ##<answer> with no spaces or units.

To ensure clarity and accuracy, display all step-by-step calculations before the final answer. Emphasize the importance of accuracy and clarity in each step, and provide detailed explanations for all intermediate steps. Clearly identify the math question within the given content.
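
The scores above depend on how the answer is pulled out of the raw response as much as on the model's arithmetic. The sketch below reproduces the extraction and scoring steps with the standard `re` module, using the same pattern `r"##(\d+)"`. It is a stand-in under stated assumptions, not promptbench's code: `extract_answer` and `accuracy` are hypothetical helpers, returning `None` on a missing match is an assumption, and the sample responses are invented for illustration.

```python
import re


def extract_answer(raw_pred: str, pattern: str = r"##(\d+)"):
    """Return the first captured group, or None when the pattern is absent.

    Assumption: first match wins and a miss yields None; the library's
    pattern_re may behave differently.
    """
    match = re.search(pattern, raw_pred)
    return match.group(1) if match else None


def accuracy(preds, labels):
    """Fraction of predictions that are string-equal to their labels."""
    correct = sum(p == l for p, l in zip(preds, labels))
    return correct / len(labels)


# Invented responses that exercise the pattern's edge cases
responses = [
    "She makes 9 * 2 = 18 dollars. ##18",  # clean match
    "Profit: ##70,000",                    # comma stops \d+ after "70"
    "The answer is 540.",                  # no ## marker at all
    "Thus, the answer is:\n##3##",         # trailing ## is ignored
]
labels = ["18", "70000", "540", "3"]

preds = [extract_answer(r) for r in responses]
print(preds)                    # ['18', '70', None, '3']
print(accuracy(preds, labels))  # 0.5
```

A correct answer written as `##70,000` or omitted from the `##` format counts as wrong under this pattern, so part of the gap between prompts may reflect formatting compliance and not only reasoning quality.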

### Test Pre-defined Prompts

In [40]:
# load method
# print all methods and their supported datasets
print('All supported methods: ')
print(pb.SUPPORTED_METHODS)
print('Supported datasets for each method: ')
print(pb.METHOD_SUPPORT_DATASET)

method = pb.PEMethod(
    method='emotion_prompt',
    dataset=dataset_name,
    verbose=True,  # if True, print the detailed prompt and response
    prompt_id=1,   # for emotion_prompt
)

All supported methods: 
['CoT', 'ZSCoT', 'least_to_most', 'generated_knowledge', 'expert_prompting', 'emotion_prompt', 'baseline']
Supported datasets for each method: 
{'CoT': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'], 'ZSCoT': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'], 'expert_prompting': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'], 'emotion_prompt': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'], 'least_to_most': ['gsm8k', 'last_letter_concat'], 'generated_knowledge': ['csqa', 'numersense', 'qasc'], 'baseline': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking', 'last_letter_concat', 'numersense', 'qasc']}
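
The mapping printed above goes from method to datasets. When choosing a method for a fixed dataset, the inverse lookup is often more convenient. The sketch below copies the printed dictionary into a local variable so that it runs without promptbench installed; `methods_for` is a hypothetical helper, not part of the library.

```python
# Copied from the output above so the sketch runs without promptbench installed
METHOD_SUPPORT_DATASET = {
    'CoT': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'],
    'ZSCoT': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'],
    'expert_prompting': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'],
    'emotion_prompt': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking'],
    'least_to_most': ['gsm8k', 'last_letter_concat'],
    'generated_knowledge': ['csqa', 'numersense', 'qasc'],
    'baseline': ['gsm8k', 'csqa', 'bigbench_date', 'bigbench_object_tracking',
                 'last_letter_concat', 'numersense', 'qasc'],
}


def methods_for(dataset_name, support=METHOD_SUPPORT_DATASET):
    """List the methods whose supported datasets include dataset_name."""
    return [method for method, datasets in support.items()
            if dataset_name in datasets]


print(methods_for('gsm8k'))
# ['CoT', 'ZSCoT', 'expert_prompting', 'emotion_prompt', 'least_to_most', 'baseline']
print(methods_for('qasc'))
# ['generated_knowledge', 'baseline']
```

In the notebook itself the same helper could be called with `pb.METHOD_SUPPORT_DATASET` as the `support` argument.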


In [41]:
results = method.test(dataset, 
                      model, 
                      num_samples=2,
                      )

results

  0%|          | 1/1319 [00:04<1:46:57,  4.87s/it]

Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
This is very important to my career.
Please output your answer at the end as ##<your answer (arabic numerals)>

To determine how much Janet makes daily from selling fresh duck eggs at the farmers' market, we can follow these steps:

1. **Calculate the total number of eggs laid per day:** Janet’s ducks lay 16 eggs per day.
   
2. **Calculate the number of eggs used for breakfast:** Janet eats 3 eggs for breakfast every morning.

3. **Calculate the number of eggs used for baking muffins:** Janet uses 4 eggs to bake muffins every day.

4. **Calculate the number of eggs left to sell:** Subtract the eggs used for breakfast and baking from the total number of eggs laid daily.
   
   \[
   \text{Eggs left to sell} = 

  0%|          | 2/1319 [00:08<1:29:12,  4.06s/it]

A robe takes 2 bolts of blue fiber and half that much white fiber.  How many bolts in total does it take?
This is very important to my career.
Please output your answer at the end as ##<your answer (arabic numerals)>

To solve this problem, we need to find the total number of bolts of fiber required to make one robe. Here's the step-by-step process:

1. Determine the amount of blue fiber needed. According to the problem, each robe takes 2 bolts of blue fiber.

2. Determine the amount of white fiber needed. The problem states that this is half the amount of blue fiber. Therefore, since the blue fiber required is 2 bolts, the white fiber required would be:
   \[
   \frac{2}{2} = 1 \text{ bolt of white fiber}
   \]

3. Calculate the total number of bolts by adding the bolts of blue fiber and white fiber together:
   \[
   2 \text{ bolts of blue fiber} + 1 \text{ bolt of white fiber} = 3 \text{ bolts in total}
   \]

Thus, the answer is:
##3##

1.0