# Run Llama 2 Models in SageMaker JumpStart

---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy a JumpStart model for Text Generation using the Llama 2 pretrained model.

To perform inference on these models, you need to pass `custom_attributes='accept_eula=true'` as part of the request header. Doing so means you have read and accept the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. Requests sent with `custom_attributes='accept_eula=false'` fail. The example cells in this notebook pass `accept_eula=true` explicitly, so run them only after you have read and accepted the EULA.

Note: The `custom_attributes` used to pass the EULA are key/value pairs. The key and value are separated by `=`, and pairs are separated by `;`. If the same key is passed more than once, the last value is kept and passed to the script handler (in this case, where it is used for conditional logic). For example, if `accept_eula=false; accept_eula=true` is passed to the server, then `accept_eula=true` is kept and passed to the script handler.
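
The parsing rule described in the note can be illustrated with a few lines of Python. This is only a sketch of the stated semantics (split on `;`, then on `=`, last value wins); the function name `parse_custom_attributes` is hypothetical and this is not the code that runs on the endpoint.

```python
def parse_custom_attributes(raw: str) -> dict:
    """Parse 'k1=v1; k2=v2' into a dict; a repeated key keeps its last value."""
    attributes = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled separators
        key, _, value = pair.partition("=")
        # Assigning to the same key again overwrites the earlier value
        attributes[key.strip()] = value.strip()
    return attributes


print(parse_custom_attributes("accept_eula=false; accept_eula=true"))
# → {'accept_eula': 'true'}
```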

---

## Setup

***

In [4]:
%pip install --upgrade --quiet sagemaker

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spyder 5.1.5 requires pyqt5<5.13, which is not installed.
spyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.
awscli 1.27.153 requires PyYAML<5.5,>=3.10, but you have pyyaml 6.0.1 which is incompatible.
docker-compose 1.29.2 requires PyYAML<6,>=3.10, but you have pyyaml 6.0.1 which is incompatible.
jupyterlab 3.2.1 requires jupyter-server~=1.4, but you have jupyter-server 2.6.0 which is incompatible.
jupyterlab 3.2.1 requires nbclassic~=0.2, but you have nbclassic 1.0.0 which is incompatible.
jupyterlab-server 2.8.2 requires jupyter-server~=1.4, but you have jupyter-server 2.6.0 which is incompatible.
spyder 5.1.5 requires pylint<2.10.0,>=2.5.0, but you have pylint 3.0.0a6 which is incompatible.

***
You can continue with the default model or choose a different one. This notebook runs with the following model IDs:
- `meta-textgeneration-llama-2-7b`
- `meta-textgeneration-llama-2-13b`
- `meta-textgeneration-llama-2-70b`
***

In [3]:
model_id, model_version = "meta-textgeneration-llama-2-70b", "*"

## Deploy model

***
You can now deploy the model using SageMaker JumpStart.
***

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

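# Pass model_version explicitly so the version selected above is the one deployed
model = JumpStartModel(model_id=model_id, model_version=model_version)
predictor = model.deploy()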



--------------------------------------!

## Invoke the endpoint

***
### Supported Parameters
This model supports the following inference payload parameters:

* **max_new_tokens:** The model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **temperature:** Controls the randomness of the output. A higher temperature makes low-probability words more likely to appear in the output sequence, while a lower temperature favors high-probability words. As `temperature` approaches 0, sampling approaches greedy decoding. If specified, it must be a positive float.
* **top_p:** In each step of text generation, the model samples from the smallest possible set of words whose cumulative probability reaches `top_p`. If specified, it must be a float between 0 and 1.
* **return_full_text:** If True, the input text is included in the generated output. If specified, it must be a boolean. The default value is False.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 
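
The constraints listed above can be checked on the client before a request is sent. The helper below is a minimal sketch, not part of the SageMaker SDK: `build_payload` is a hypothetical name, and treating the `top_p` bounds as inclusive is an assumption, since the text only says "between 0 and 1".

```python
def build_payload(prompt, max_new_tokens=None, temperature=None, top_p=None, return_full_text=None):
    """Build an inference payload, including only the parameters that were given."""
    parameters = {}
    if max_new_tokens is not None:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int) or max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be a positive integer")
        parameters["max_new_tokens"] = max_new_tokens
    if temperature is not None:
        if not temperature > 0:
            raise ValueError("temperature must be a positive float")
        parameters["temperature"] = float(temperature)
    if top_p is not None:
        # Assumption: the bounds 0 and 1 are treated as inclusive
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be a float between 0 and 1")
        parameters["top_p"] = float(top_p)
    if return_full_text is not None:
        if not isinstance(return_full_text, bool):
            raise ValueError("return_full_text must be a boolean")
        parameters["return_full_text"] = return_full_text
    return {"inputs": prompt, "parameters": parameters}


payload = build_payload("I believe the meaning of life is", max_new_tokens=64, top_p=0.9, temperature=0.6)
print(payload)
```

Because every argument is optional, any subset of the parameters can be passed, and omitted ones are left out of the payload.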

***
### Notes
- If `max_new_tokens` is not defined, the model may generate up to the maximum total number of tokens allowed, which is 4K for these models. This may result in endpoint query timeout errors, so we recommend setting `max_new_tokens` whenever possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500 respectively, while keeping the total number of tokens below 4K.
- To support a 4K context length, this model restricts query payloads to a batch size of 1. Payloads with larger batch sizes receive an endpoint error prior to inference.
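
As a small illustration of these notes, the sketch below caps a requested `max_new_tokens` at the recommended value for each model ID and refuses anything other than a single prompt string. The names `RECOMMENDED_MAX_NEW_TOKENS`, `cap_max_new_tokens`, and `single_prompt_payload` are hypothetical. The sketch does not check the 4K total-token limit, because that would require the model's tokenizer.

```python
# Recommended upper bounds for max_new_tokens, taken from the note above
RECOMMENDED_MAX_NEW_TOKENS = {
    "meta-textgeneration-llama-2-7b": 1500,
    "meta-textgeneration-llama-2-13b": 1000,
    "meta-textgeneration-llama-2-70b": 500,
}


def cap_max_new_tokens(model_id, requested):
    """Return `requested`, lowered to the recommended limit for `model_id` if needed."""
    return min(requested, RECOMMENDED_MAX_NEW_TOKENS[model_id])


def single_prompt_payload(prompt, model_id, requested_tokens):
    """Build a payload for exactly one prompt, since the endpoint uses batch size 1."""
    if not isinstance(prompt, str):
        raise TypeError("send one prompt string per request; this endpoint uses batch size 1")
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": cap_max_new_tokens(model_id, requested_tokens)},
    }


print(single_prompt_payload("Hello", "meta-textgeneration-llama-2-70b", 2000))
# → {'inputs': 'Hello', 'parameters': {'max_new_tokens': 500}}
```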

***

In [42]:
def print_response(payload, response):
    """Print the prompt, then the first generation returned by the endpoint."""
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")

### Example 1

In [7]:
%%time

payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

I believe the meaning of life is
> to be happy. Happiness is a choice and it comes from within. It is not dependent on others, or things. Happiness is not something that happens to you, it is something you create.
I believe that everyone has the right to be happy, and that everyone deserves to be happy.


CPU times: user 6.48 ms, sys: 0 ns, total: 6.48 ms
Wall time: 6.87 s


### Example 2

In [9]:
%%time

payload = {
    "inputs": "Simply put, the theory of relativity states that ",
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Simply put, the theory of relativity states that 
> 1) the laws of physics are the same in all inertial reference frames and 2) the speed of light in a vacuum is the same for all observers, regardless of the motion of the light source.
The theory of relativity has two parts: special relativity and general relativity.


CPU times: user 3.16 ms, sys: 2.98 ms, total: 6.14 ms
Wall time: 5.5 s


### Example 3

In [10]:
%%time

payload = {
    "inputs": """A brief message congratulating the team on the launch:

Hi everyone,

I just """,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

A brief message congratulating the team on the launch:

Hi everyone,

I just 
> wanted to take a moment to congratulate everyone on the launch of
the new website.  This is a big step forward for the company, and it
couldn't have happened without everyone's hard work and dedication.

I'm proud to be a part of such a tal


CPU times: user 5.12 ms, sys: 158 µs, total: 5.28 ms
Wall time: 5.46 s


### Example 4

In [11]:
%%time

payload = {
    "inputs": """Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>""",
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>
> fromage
turtle => tortue
frog => grenouille
parrot => perroquet
lion => lion
cat => chat
elephant => éléphant
gorilla => gorille
dog => chien
kangaroo => kangour


CPU times: user 4.66 ms, sys: 1.67 ms, total: 6.33 ms
Wall time: 5.49 s


# Prompt Engineering Guide

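The following sections apply techniques from the Prompt Engineering Guide (https://www.promptingguide.ai/) to the deployed Llama 2 endpoint.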

## Zero-Shot Prompting

In [31]:
%%time

prompt = """Classify the text into "neutral", "negative" or "positive". 
Text: I think the vacation is okay.
Sentiment: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Classify the text into "neutral", "negative" or "positive". 
Text: I think the vacation is okay.
Sentiment: 
> 0.5

Text: I think the vacation is great.
Sentiment: 0.8

Text: I think the vacation is terrible.
Sentiment: 0.2

Text: I think the vacation is awful.
Sentiment: 0


CPU times: user 5.39 ms, sys: 134 µs, total: 5.53 ms
Wall time: 5.56 s


## Few-Shot Prompting

In [32]:
%%time

prompt = """A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses
the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses
the word farduddle is: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses
the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses
the word farduddle is: 
> 
The children were farduddling to try to get the balloon to pop.
To "sneel" means to walk backwards. An example of a sentence that uses the word sneel is:
We were sneeling to get out of the room.
To "chalk


CPU times: user 18.3 ms, sys: 467 µs, total: 18.8 ms
Wall time: 5.69 s


In [41]:
prompt = """This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! // """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 4, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! // 
> ­Negative




In [35]:
prompt = """Positive This is awesome! 
This is bad! Negative
Wow that movie was rad!
Positive
What a horrible show! -- """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 4, "top_p": 0.9, "temperature": 0.1, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Positive This is awesome! 
This is bad! Negative
Wow that movie was rad!
Positive
What a horrible show! -- 
> 
Negative




### Limitations of Few-shot Prompting


In [58]:
sum([n for n in [15, 32, 5, 13, 82, 7, 1] if n % 2 != 0])

41

In [45]:
%%time

prompt = """Does the odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1 ?
Answer: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.2, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Does the odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1 ?
Answer: 
> 15 + 32 + 5 + 13 + 82 + 7 + 1 = 155 = 156 - 1 = 155 = 5 * 31 = even number.
Does the odd numbers in this group add up to an


CPU times: user 5.83 ms, sys: 0 ns, total: 5.83 ms
Wall time: 5.53 s


Let's add some examples to see if few-shot prompting improves the result.

In [60]:
%%time

prompt = """The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 8, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: The answer is True.
The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A: 
> The answer is False.
The odd


CPU times: user 5.52 ms, sys: 587 µs, total: 6.1 ms
Wall time: 1.15 s


## Chain-of-Thought Prompting

In [65]:
%%time

prompt = """The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.
The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 32, "top_p": 0.9, "temperature": 0.3, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.
The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A: 
> Adding all the odd numbers (15, 13, 7) gives 35. The answer is False.
The odd numbers in this


CPU times: user 4.99 ms, sys: 263 µs, total: 5.25 ms
Wall time: 3.3 s


In [66]:
%%time

prompt = """The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:
> Adding all the odd numbers (15, 13, 7, 1) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 12, 11, 5, 16, 2, 1.


CPU times: user 15.7 ms, sys: 214 µs, total: 15.9 ms
Wall time: 5.67 s


In [70]:
%%time

prompt = """I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with? """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 4, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with? 
> 10 apples


CPU times: user 5.97 ms, sys: 0 ns, total: 5.97 ms
Wall time: 530 ms


In [74]:
%%time

prompt = """I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step."""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.8, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step.
> 
So, 10-2-2+5-1=10
Therefore, I remained with 10 apples.
Following the order of the operations, we know that the mathematical operation to be performed first is the subtraction.
So, we are left with:


CPU times: user 4.9 ms, sys: 259 µs, total: 5.16 ms
Wall time: 5.54 s


## Self-Consistency

In [76]:
%%time

prompt = """Question: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
Answer: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Question: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
Answer: 
> 35. When you were 6 your sister was 3. Now you’re 70, she’s 35.
Question: I am a five letter word. I am often used to describe something negative. I become longer when you remove two of my letters. What word am I?


CPU times: user 15.4 ms, sys: 218 µs, total: 15.6 ms
Wall time: 5.61 s


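The output is wrong! When I was 6 my sister was 3, so she is 3 years younger than me. Now that I'm 70, she is 70 - 3 = 67.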

In [83]:
%%time

prompt = """Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.
Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of
lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.
Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does
he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so
in total he has 7 + 2 = 9 toys. The answer is 9.
Q: There were nine computers in the server room. Five more computers were installed each day, from
monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 =
20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers.
The answer is 29.
Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many
golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On
Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.
Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent 5
Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
A: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 128, "top_p": 0.9, "temperature": 0.9, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.
Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to D
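
The cell above sends a single request. Self-consistency is usually described as sampling several reasoning paths for the same prompt and keeping the most frequent final answer. The sketch below shows only the voting step, assuming completions end their reasoning with "The answer is N." as in the exemplars. The names `extract_answer` and `majority_answer` are hypothetical, and the sample completions are written by hand for illustration; they are not model output.

```python
import re
from collections import Counter

# Matches the final-answer phrasing used by the few-shot exemplars
ANSWER_PATTERN = re.compile(r"The answer is\s+(-?\d+)")


def extract_answer(generation):
    """Return the first 'The answer is N' value in a completion, or None."""
    # The first match is used because the model may continue with further Q/A pairs
    match = ANSWER_PATTERN.search(generation)
    return match.group(1) if match else None


def majority_answer(generations):
    """Return the most common extracted answer across sampled completions."""
    answers = [a for a in map(extract_answer, generations) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hand-written stand-ins for sampled completions (not model output)
samples = [
    "When I was 6 my sister was 3, so she is 3 years younger. 70 - 3 = 67. The answer is 67.",
    "Half of 70 is 35. The answer is 35.",
    "The sister is 6 - 3 = 3 years younger. Now she is 70 - 3 = 67. The answer is 67.",
]
print(majority_answer(samples))
# → 67
```

With the endpoint, the list of generations could be collected by calling `predictor.predict(payload, custom_attributes='accept_eula=true')` several times with a non-zero temperature and reading `response[0]['generation']` from each response.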

## Generated Knowledge Prompting

In [84]:
%%time

prompt = """Part of golf is trying to get a higher point total than others. Yes or No?
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 4, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Part of golf is trying to get a higher point total than others. Yes or No?

> It's not


CPU times: user 16.2 ms, sys: 474 µs, total: 16.7 ms
Wall time: 447 ms


First, we generate a few pieces of knowledge:

In [85]:
%%time

prompt = """Input: Greece is larger than mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.
Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath, and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid, forming a film that you see as fog. Your lenses will be relatively cool compared to your breath, especially when the outside air is cold.
Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their cognitive powers match or exceed those of ’higher’ vertebrates including non-human primates. Fish’s long-term memories help them keep track of complex social relationships.
Input: A common effect of smoking lots of cigarettes in one’s lifetime is a higher than normal chance of getting lung cancer.
Knowledge: Those who consistently averaged less than one cigarette per day over their lifetime had nine times the risk of dying from lung cancer than never smokers. Among people who smoked between one and 10 cigarettes per day, the risk of dying from lung cancer was nearly 12 times higher than that of never smokers.
Input: A rock is the same size as a pebble.
Knowledge: A pebble is a clast of rock with a particle size of 4 to 64 millimetres based on the Udden-Wentworth scale of sedimentology. Pebbles are generally considered larger than granules (2 to 4 millimetres diameter) and smaller than cobbles (64 to 256 millimetres diameter).
Input: Part of golf is trying to get a higher point total than others.
Knowledge: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Input: Greece is larger than mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.
Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath, and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid, forming a film that you see as fog. Your lenses will be relatively cool compared to your breath, especially when the outside air is cold.
Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their cognitive powers match or exceed those of ’higher’ vertebrates including non-human primates. Fish’s long-term memories help them keep track of complex social relationships.
Input: A common effect of smoking lots of cigarettes in one’s lifetime is a higher than normal chance of getting lung cancer.
Knowledge: Those who consistently avera

The next step is to integrate the knowledge and get a prediction. I reformatted the question into QA format to guide the answer format.

In [87]:
%%time

# Knowledge statement to integrate into the prompt
knowledge = "18-hole stroke play tournaments are scored so the player with the lowest score wins."
prompt = f"""Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: {knowledge}
Explain and Answer: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 32, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: 18-hole stroke play tournaments are scored so the player with the lowest score wins.
Explain and Answer: 
> 18-hole stroke play tournaments are scored so the player with the lowest score wins.
Knowledge: In a 18-hole stroke


CPU times: user 13.3 ms, sys: 3.97 ms, total: 17.3 ms
Wall time: 2.86 s


## Tree of Thoughts (ToT)

ToT maintains a tree of thoughts, where thoughts represent coherent language sequences that serve as intermediate steps toward solving a problem. 
This approach enables an LM to self-evaluate the progress intermediate thoughts make towards solving a problem through a deliberate reasoning process. 
The LM's ability to generate and evaluate thoughts is then combined with search algorithms (e.g., breadth-first search and depth-first search) to enable systematic exploration of thoughts with lookahead and backtracking.
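
To make the search structure concrete, here is a minimal breadth-first sketch for the Game of 24 task used below. It is not the ToT implementation: plain arithmetic enumeration stands in for LM-proposed thoughts, and an exact one-step lookahead stands in for LM self-evaluation. The names `propose`, `is_promising`, and `solve` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def propose(state):
    """Yield successor states: combine two remaining numbers into a new one."""
    numbers, steps = state
    for i, j in combinations(range(len(numbers)), 2):
        a, b = numbers[i], numbers[j]
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        candidates = [(a + b, f"{a} + {b}"), (a * b, f"{a} * {b}"),
                      (a - b, f"{a} - {b}"), (b - a, f"{b} - {a}")]
        if b != 0:
            candidates.append((a / b, f"{a} / {b}"))
        if a != 0:
            candidates.append((b / a, f"{b} / {a}"))
        for value, expr in candidates:
            yield (tuple(rest + [value]), steps + (f"{expr} = {value}",))


def is_promising(state, target):
    """Stand-in evaluator: exact check with one step of lookahead near the leaves."""
    numbers = state[0]
    if len(numbers) == 1:
        return numbers[0] == target
    if len(numbers) == 2:
        return any(child[0][0] == target for child in propose(state))
    return True  # too early to judge, so keep the thought


def solve(inputs, target=24):
    """Breadth-first search over partial solutions; returns the steps or None."""
    frontier = [(tuple(Fraction(n) for n in inputs), ())]
    while frontier:
        next_frontier = []
        for state in frontier:
            numbers, steps = state
            if len(numbers) == 1 and numbers[0] == target:
                return list(steps)
            # Expand the state and drop thoughts the evaluator rejects
            next_frontier.extend(c for c in propose(state) if is_promising(c, target))
        frontier = next_frontier
    return None


for step in solve([4, 4, 6, 8]):
    print(step)
```

Each state holds the remaining numbers and the steps taken so far, so a rejected branch is simply never expanded. One valid solution for these inputs is (4 + 8) * (6 - 4) = 24, although the search may return a different one.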

In [92]:
%%time

question = """Use input numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number. Input numbers: 4 4 6 8"""

prompt = f"""Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is "{question}".
Answer: """

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 128, "top_p": 0.9, "temperature": 0.7, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is "Use input numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number. Input numbers: 4 4 6 8".
Answer: 
> 4*4+6=24
Expert 1: 4*4+6=24
Expert 2: 4*4-6=24
Expert 3: 4*4-6-8=24
Expert 2: 4*4-6-8=24
Expert 1: 4*4-6-8-8=24
Expert 2: 4*4-6-8-8-8=24
Expert 1: 4*4-6-8-8


CPU times: user 2.15 ms, sys: 3.26 ms, total: 5.41 ms
Wall time: 11 s


## Automatic Prompt Engineer (APE)

In [111]:
prompt = """Here are the responses for the task assigned to the expert:
# Demonstration start
Input: "prove", Output: "disprove"
Input: "in", Output: "out"
Input: "good", Output: "bad"
Input: "on", Output: "off"
# Demonstration end

What are instructions for this task?
"""

In [110]:
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Here are the responses for the task assigned to the expert:
# Demonstration start
Input: "prove", Output: "disprove"
Input: "in", Output: "out"
Input: "good", Output: "bad"
Input: "on", Output: "off"
# Demonstration end

What are instructions for this task?

> 
# Instructions start
You are given a text string. Your task is to output the opposite of the input.

For example, if the input is "good", the output should be "bad".

You can assume that the input will always be a text string.

# Instru


CPU times: user 404 µs, sys: 5.99 ms, total: 6.39 ms
Wall time: 5.6 s


In [112]:
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

Here are the responses for the task assigned to the expert:
# Demonstration start
Input: "prove", Output: "disprove"
Input: "in", Output: "out"
Input: "good", Output: "bad"
Input: "on", Output: "off"
# Demonstration end

What are instructions for this task?

> 
# Instructions start
For each input word, return the corresponding output word.

Input words will be provided as a single string.

# Instructions end

# Tests start
# Test 1
Input: "prove"
Output: "disprove"





The best candidate is:
```
You are given a text string. Your task is to output the opposite of the input.

For example, if the input is "good", the output should be "bad".

You can assume that the input will always be a text string.
```
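
The candidate above was picked by inspection. One way to make the choice less subjective is to score each candidate instruction against the demonstrations and keep the highest scorer. The sketch below assumes a hypothetical `generate` callable that maps a prompt to a completion. Here it is replaced by a toy stub so that the example runs without an endpoint; the stub's behavior is invented for illustration and says nothing about how Llama 2 responds.

```python
DEMONSTRATIONS = [("prove", "disprove"), ("in", "out"), ("good", "bad"), ("on", "off")]


def build_prompt(instruction, word):
    """Format an instruction and an input word the way the test cell later does."""
    return f"{instruction}\nInput: {word}\nOutput: "


def score_instruction(instruction, demonstrations, generate):
    """Fraction of demonstrations whose expected output starts the completion."""
    hits = 0
    for word, expected in demonstrations:
        completion = generate(build_prompt(instruction, word))
        if completion.strip().lower().startswith(expected.lower()):
            hits += 1
    return hits / len(demonstrations)


def best_instruction(candidates, demonstrations, generate):
    """Return the candidate instruction with the highest score."""
    return max(candidates, key=lambda c: score_instruction(c, demonstrations, generate))


def stub_generate(prompt):
    """Toy stand-in for the endpoint: answers correctly only if told to output the opposite."""
    antonyms = dict(DEMONSTRATIONS)
    word = prompt.rsplit("Input: ", 1)[1].split("\n", 1)[0]
    return antonyms[word] if "opposite" in prompt else word


candidates = [
    "You are given a text string. Your task is to output the opposite of the input.",
    "For each input word, return the corresponding output word.",
]
print(best_instruction(candidates, DEMONSTRATIONS, stub_generate))
```

With a deployed endpoint, `generate` could wrap `predictor.predict(...)` and return `response[0]['generation']`. Scoring on held-out word pairs rather than on the demonstrations themselves would give a fairer comparison.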

The kernel died, so the predictor needs to be recreated from the existing endpoint:

In [43]:
import sagemaker
from sagemaker.predictor import Predictor

model_id, model_version = "meta-textgeneration-llama-2-70b", "*"

predictor = Predictor(
    endpoint_name="meta-textgeneration-llama-2-70b-2023-07-31-15-48-33-864", 
    serializer=sagemaker.serializers.retrieve_default(model_id=model_id, model_version=model_version), 
    deserializer=sagemaker.deserializers.retrieve_default(model_id=model_id, model_version=model_version)
)
predictor.content_type = sagemaker.content_types.retrieve_default(model_id=model_id, model_version=model_version)
predictor.accept = sagemaker.accept_types.retrieve_default(model_id=model_id, model_version=model_version)

In [48]:
%%time

prompt = """You are given a text string. Your task is to output the opposite of the input.
For example, if the input is "good", the output should be "bad".
Input: direct
Output: 
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 8, "top_p": 0.9, "temperature": 0.9, "return_full_text": False}
}

response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_response(payload, response)

You are given a text string. Your task is to output the opposite of the input.
For example, if the input is "good", the output should be "bad".
Input: direct
Output: 

> indirect



CPU times: user 5.31 ms, sys: 0 ns, total: 5.31 ms
Wall time: 415 ms


## Clean up the endpoint

In [49]:
# Delete the SageMaker endpoint
predictor.delete_model()
predictor.delete_endpoint()