# Prompt Engineering with Llama 2
Getting started with Llama 2.

In [None]:
# import llama helper function
from modules.utils import llama

In [None]:
# define the prompt
prompt = "Help me write a birthday card for my dear friend Andrew."

In [None]:
# pass prompt to the llama function, store output as 'response' then print
response = llama(prompt)
print(response)

In [None]:
# Set verbose to True to see the full prompt that is passed to the model.
prompt = "Help me write a birthday card for my dear friend Andrew."
response = llama(prompt, model="meta-llama/Llama-2-7b-chat-hf", verbose=True)
print(response)

## Chat vs. base models

Ask the model a simple question to see how chat and base models behave differently.

In [None]:
### chat model
prompt = "What is the capital of France?"
response = llama(prompt, 
                 verbose=True,
                 model="meta-llama/Llama-2-70b-chat-hf")

In [None]:
print(response)

In [None]:
### base model
prompt = "What is the capital of France?"
response = llama(prompt, 
                 verbose=True,
                 add_inst=False,
                 model="meta-llama/Llama-2-7b-hf")

Note that the prompt does not include the `[INST]` and `[/INST]` tags, because `add_inst` was set to `False`.

The tags are omitted because base models were not trained on them, so they do not treat them as instruction markers.
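
To make the effect of `add_inst` concrete, here is a minimal sketch of the wrapping step. The function `format_prompt` is a hypothetical stand-in, not the actual code inside the `llama` helper, and the exact spacing around the tags is an assumption.

```python
def format_prompt(prompt: str, add_inst: bool = True) -> str:
    """Hypothetical stand-in for the prompt formatting done before the model call."""
    if add_inst:
        # Chat models expect the user message wrapped in instruction tags.
        return f"[INST]{prompt}[/INST]"
    # Base models receive the raw text and simply continue it.
    return prompt


chat_prompt = format_prompt("What is the capital of France?")
base_prompt = format_prompt("What is the capital of France?", add_inst=False)

print(chat_prompt)
print(base_prompt)
```

Comparing these two strings with the `verbose=True` output of the cells above should show the same difference: tags present for the chat model, absent for the base model.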

In [None]:
print(response)

## Changing the temperature setting

For consistent responses, set the temperature to 0.

In [None]:
prompt = """
Help me write a birthday card for my dear friend Andrew.
Here are details about my friend:
He likes long walks on the beach and reading in the bookstore.
His hobbies include reading research papers and speaking at conferences.
His favorite color is light blue.
He likes pandas.
"""
response = llama(prompt, temperature=0.0, verbose=True)
print(response)

In [None]:
# Run the code again - the output should be identical
response = llama(prompt, temperature=0.0)
print(response)

In [None]:
# temp = 0.9
prompt = """
Help me write a birthday card for my dear friend Andrew.
Here are details about my friend:
He likes long walks on the beach and reading in the bookstore.
His hobbies include reading research papers and speaking at conferences.
His favorite color is light blue.
He likes pandas.
"""
response = llama(prompt, temperature=0.9)
print(response)

In [None]:
# run the code again - the output should be different
response = llama(prompt, temperature=0.9)
print(response)
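
To see why temperature changes how repeatable the output is, here is a minimal sketch of temperature-scaled sampling probabilities. It uses the standard softmax-with-temperature formulation on made-up logits. Treating a temperature of 0 as a greedy pick is an assumption about the serving backend, and the function name is illustrative.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 means greedy decoding, so all
        # probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
probs_greedy = softmax_with_temperature(logits, 0.0)
probs_warm = softmax_with_temperature(logits, 0.9)

print(probs_greedy)
print(probs_warm)
```

At temperature 0 the same token is always chosen, which is why the two runs above match. At 0.9 every candidate keeps some probability, so repeated runs can diverge.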

## Changing the max tokens setting

The `max_tokens` setting caps the length of the response.

In [None]:
prompt = """
Help me write a birthday card for my dear friend Andrew.
Here are details about my friend:
He likes long walks on the beach and reading in the bookstore.
His hobbies include reading research papers and speaking at conferences.
His favorite color is light blue.
He likes pandas.
"""
response = llama(prompt, max_tokens=20)
print(response)

The next cell reads in the text of the children's book *The Velveteen Rabbit* by Margery Williams and stores it as a string named `text`.

In [None]:
with open("data/TheVelveteenRabbit.txt", "r", encoding="utf-8") as file:
    text = file.read()

In [None]:
prompt = f"""
Give me a summary of the following text in 50 words:\n\n
{text}
"""
response = llama(prompt)

In [None]:
# the response is an error message because the request has too many tokens
print(response)

`inputs` tokens + `max_new_tokens` must be <= 4097

For Llama 2 chat models, the number of input tokens plus the `max_new_tokens` parameter must not exceed 4097.

In [None]:
# sum of input tokens (prompt + Velveteen Rabbit text) and output tokens
3974 + 1024

In [None]:
# calculate tokens available for response after accounting for 3974 input tokens
4097 - 3974
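
The arithmetic in the two cells above can be wrapped in small helpers. This is a sketch only: the function names are hypothetical, while the limit of 4097 and the input count of 3974 come from the error message and the cells above.

```python
CONTEXT_LIMIT = 4097  # taken from the error message above


def remaining_output_tokens(input_tokens, context_limit=CONTEXT_LIMIT):
    """Tokens left for the response once the input is accounted for."""
    return max(context_limit - input_tokens, 0)


def fits(input_tokens, max_tokens, context_limit=CONTEXT_LIMIT):
    """True if input tokens plus requested output tokens stay within the limit."""
    return input_tokens + max_tokens <= context_limit


budget = remaining_output_tokens(3974)
print(budget)             # 123
print(fits(3974, 1024))   # False: 4998 tokens exceeds the limit
print(fits(3974, 123))    # True: exactly 4097 tokens
print(fits(3974, 124))    # False: one token over the limit
```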

In [None]:
# set max_tokens to stay within limit on input + output tokens
prompt = f"""
Give me a summary of the following text in 50 words:\n\n
{text}
"""
response = llama(prompt, max_tokens=123)

In [None]:
print(response)

In [None]:
# increase max_tokens beyond limit on input + output tokens
prompt = f"""
Give me a summary of the following text in 50 words:\n\n
{text}
"""
response = llama(prompt, max_tokens=124)
print(response)