# Anthropic Completion Experiment Example

## Installation

In [None]:
# !pip install --quiet --force-reinstall prompttools

## Setup imports and API keys

First, we'll need to set our API key. If we are in DEBUG mode, we don't need a real Anthropic key, so for now we'll set it to an empty string.

In [1]:
import os

os.environ["DEBUG"] = ""  # Set this to "" to call Anthropic's API, "1" to use debug mode
os.environ["ANTHROPIC_API_KEY"] = ""  # Insert your key here

Then we'll import the relevant `prompttools` modules to set up our experiment.

In [2]:
from prompttools.experiment import AnthropicCompletionExperiment
from anthropic import HUMAN_PROMPT, AI_PROMPT

## Run an experiment

Next, we create our test inputs. We can iterate over models, prompts, and configurations like temperature.

In this case, we test the models `claude-instant-v1` and `claude-2` on two similar but differently worded prompts.
Both prompts ask Claude "Is 17077 a prime number?", but the second prompt encourages the model to say "I don't know" if it is not sure.

This is a common technique for reducing hallucination.

In [10]:
models = ["claude-instant-v1", "claude-2"]

prompts = [
    f"""{HUMAN_PROMPT}Is 17077 a prime number?
    {AI_PROMPT}""",
    f"""{HUMAN_PROMPT}Answer the following question only if you know the answer or can make a well-informed guess; otherwise tell me you don't know it.
    Is 17077 a prime number?
    {AI_PROMPT}""",
]

experiment = AnthropicCompletionExperiment(max_tokens_to_sample=[1000], model=models, prompt=prompts)
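Each argument is passed as a list, and the four result rows in the output suggest that the experiment issues one request per combination of values. A minimal sketch of that expansion, assuming a plain Cartesian product (this illustrates the idea with `itertools.product`; it is not `prompttools`' own code, and the placeholder prompt strings are hypothetical):

```python
from itertools import product

# Same argument lists as in the cell above, with the prompts abbreviated
# to hypothetical placeholder strings.
models = ["claude-instant-v1", "claude-2"]
prompts = ["<plain question>", "<question with 'I don't know' instruction>"]
max_tokens_to_sample = [1000]

# One configuration per combination of the three lists.
combinations = [
    {"model": m, "prompt": p, "max_tokens_to_sample": t}
    for m, p, t in product(models, prompts, max_tokens_to_sample)
]

print(len(combinations))  # 2 models * 2 prompts * 1 token limit = 4
for combo in combinations:
    print(combo["model"], "|", combo["prompt"])
```

Adding a second value to any list (for example, another token limit) would double the number of requests under this assumption, so it is worth keeping the lists short while iterating.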

In [12]:
experiment.run()
experiment.visualize()

Unnamed: 0,prompt,response(s),latency,model
0,\n\nHuman:Is 17077 a prime number?\n \n\nAssistant:,"[ No, 17077 is not a prime number. 17077 can be factorized into 11 * 1553.]",0.923073,claude-instant-v1
1,\n\nHuman:Answer the following question only if you know the answer or can make a well-informed guess; otherwise tell me you don't know it.\n Is 17077 a prime number?\n \n\nAssistant:,[ I do not know if 17077 is a prime number.],0.610506,claude-instant-v1
2,\n\nHuman:Is 17077 a prime number?\n \n\nAssistant:,"[ Okay, let's check if 17077 is a prime number:\n\nFirst, we know prime numbers are only divisible by 1 and themselves.\n\nLet's see if 17077 is divisible by any numbers between 2 and 17076:\n\n- It is not divisible by 2\n- It is not divisible by 3\n- It is not divisible by 4\n- and so on...\n\nAfter checking all possible divisors from 2 up to 17076, we find that 17077 is only divisible by 1 and 17077 itself.\n\nTherefore, since 17077 has no other divisors except 1 and itself, 17077 is a prime number.]",5.648757,claude-2
3,\n\nHuman:Answer the following question only if you know the answer or can make a well-informed guess; otherwise tell me you don't know it.\n Is 17077 a prime number?\n \n\nAssistant:,[ I don't know if 17077 is a prime number. I don't have enough information to determine that definitively without doing some calculations.],1.90701,claude-2
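The responses above disagree, so it helps to establish the ground truth independently of any model. A minimal trial-division sketch (`is_prime` is a hypothetical helper written for this illustration, not part of `prompttools` or `anthropic`):

```python
import math


def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd divisors up to the integer square root need to be checked.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


print(is_prime(17077))  # True
print(11 * 1553)        # 17083, so the factorization in the first response is wrong
```

A deterministic check like this is only available for questions with a computable answer, but where it exists it gives a reference point for judging the model-based evaluation.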


## Auto-Evaluate the model response

To evaluate the model response, we can define an eval method that passes a fact and the previous model response into another LLM to get feedback.

In this case, we are using a built-in evaluation function `autoeval_scoring` provided within `prompttools`.

The evaluation function provides Claude 2 with a fact (the ground truth) and the previous model response. It then asks Claude 2 for a score from 1 to 7: a low score means the answer is factually wrong, a high score means the answer is correct, and a middle score indicates an uncertain answer that is not necessarily wrong.

You can also write your own auto-evaluation function or pick a different model to be the judge.
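As a rough illustration of a custom scorer, the sketch below replaces the LLM judge with simple keyword rules on the same 1 to 7 scale. `rule_based_score` is a hypothetical helper specific to this prime-number question. The callback signature that `experiment.evaluate` expects is not shown in this notebook and may differ between `prompttools` versions, so a thin wrapper would likely be needed before passing it in.

```python
def rule_based_score(response: str) -> int:
    """Score a response about primality on a 1-7 scale using keyword rules.

    Hypothetical helper: 7 = claims the number is prime (correct here),
    1 = claims it is not prime (wrong here), 4 = uncertain or unrecognized.
    """
    text = response.lower()
    # Check uncertainty first, because "I do not know if X is a prime number"
    # also contains the phrase "is a prime".
    if any(marker in text for marker in ("don't know", "do not know", "not sure")):
        return 4
    if "not a prime" in text or "not prime" in text:
        return 1
    if "is a prime" in text or "is prime" in text:
        return 7
    return 4


print(rule_based_score(" No, 17077 is not a prime number."))           # 1
print(rule_based_score(" I do not know if 17077 is a prime number."))  # 4
print(rule_based_score(" Therefore, 17077 is a prime number."))        # 7
```

Keyword rules are brittle compared with an LLM judge, but they are cheap, deterministic, and useful as a sanity check on the judge's scores.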

In [45]:
from prompttools.utils import autoeval_scoring

fact = "17077 is a prime number, because it has no divisor aside from 1 and 17077."
experiment.evaluate("Score", autoeval_scoring.evaluate, expected=[fact] * 4)
experiment.visualize()

Unnamed: 0,prompt,response(s),latency,Score,model
0,\n\nHuman:Is 17077 a prime number?\n \n\nAssistant:,"[ No, 17077 is not a prime number. 17077 can be factorized into 11 * 1553.]",0.923073,3,claude-instant-v1
1,\n\nHuman:Answer the following question only if you know the answer or can make a well-informed guess; otherwise tell me you don't know it.\n Is 17077 a prime number?\n \n\nAssistant:,[ I do not know if 17077 is a prime number.],0.610506,4,claude-instant-v1
2,\n\nHuman:Is 17077 a prime number?\n \n\nAssistant:,"[ Okay, let's check if 17077 is a prime number:\n\nFirst, we know prime numbers are only divisible by 1 and themselves.\n\nLet's see if 17077 is divisible by any numbers between 2 and 17076:\n\n- It is not divisible by 2\n- It is not divisible by 3\n- It is not divisible by 4\n- and so on...\n\nAfter checking all possible divisors from 2 up to 17076, we find that 17077 is only divisible by 1 and 17077 itself.\n\nTherefore, since 17077 has no other divisors except 1 and itself, 17077 is a prime number.]",5.648757,7,claude-2
3,\n\nHuman:Answer the following question only if you know the answer or can make a well-informed guess; otherwise tell me you don't know it.\n Is 17077 a prime number?\n \n\nAssistant:,[ I don't know if 17077 is a prime number. I don't have enough information to determine that definitively without doing some calculations.],1.90701,4,claude-2
