### Prompt Llama 3 like a pro


When engaging with large language models (LLMs), it’s important to pay close attention to how you construct your prompts. More often than not, the quality of your output reflects the quality of your input prompt. LLMs do offer some flexibility in prompting, but for optimal results you should align your prompts with the model’s training and expected syntax. In this lab, we will see how to do that with Llama 3 to maximize our chances of getting a high-quality response.

### Libraries

*   [`ibm-watsonx-ai`](https://pypi.org/project/ibm-watsonx-ai/) allows you to work with IBM watsonx.ai services, which provide access to the **Llama 3** model, among others.
*   [`langchain`](https://www.langchain.com/) is a library used for developing applications powered by large language models (LLMs).
*   [`langchain-ibm`](https://github.com/langchain-ai/langchain) provides integration between `langchain` and `ibm-watsonx-ai`.


In [None]:
!pip install ibm-watsonx-ai==0.2.6 langchain==0.1.16 langchain-ibm==0.1.4


In [None]:
from ibm_watsonx_ai.foundation_models import Model
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

from langchain.prompts.prompt import PromptTemplate
from langchain.chains import LLMChain

from langchain_ibm import WatsonxLLM


### A quick introduction to Llama 3
Llama 3 is a state-of-the-art, open-access large language model released by Meta. It is available for commercial use under a community license and has been released in two sizes: 8B and 70B parameters.

Llama 3 introduces a range of improvements over the previous version, Llama 2. Some of these enhancements include:

 - Training on a dataset that is seven times larger than the dataset used to train Llama 2.
 - Training on a dataset that includes both English and non-English data. Although Llama 3 is optimized primarily for English, it has some ability to understand and generate text in over 30 other languages.
 - A new tokenizer with a larger vocabulary (128K tokens) that covers a broader range of Unicode characters and encodes text more efficiently than the tokenizer used by Llama 2.

Additional details about the Llama 3 model are available at [ai.meta.com](https://ai.meta.com/blog/meta-llama-3/).

----


### Prompting Llama 3

One significant advantage of Llama 3's chat format is the ability to provide a `system` instruction in chat applications. This feature allows you to define the behavior of your chat assistant and to give it a unique personality. The prompt template for starting a conversation with Llama 3 is as follows:

> **<|begin_of_text|><|start_header_id|>system<|end_header_id|>**
>
> **{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>**
>
> **{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>**
>
> **{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>**
>
> **{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>**

Llama 3's tokenizer is built on the [tiktoken](https://github.com/openai/tiktoken) library, and its prompt template relies on several special tokens defined in that tokenizer. Let's break down each of these special tokens:

**`<|begin_of_text|>`**:
> This is equivalent to the BOS (beginning-of-sequence) token and signifies the beginning of the text.

**`<|eot_id|>`**:
> This marks the end of a message within a turn ("end of turn").

**`<|start_header_id|>{role}<|end_header_id|>`**:
> These tokens enclose the role of a particular message. The possible roles are `system`, `user`, and `assistant`. Here, `system` indicates the system message, `user` indicates a prompt from the user, and `assistant` represents a response by the model.

**`<|end_of_text|>`**:
> This is equivalent to the EOS (end-of-sequence) token. Once Llama 3 generates this token, it stops generating further tokens.

**`{{ system_prompt }}`:**
>This is a placeholder for the `system` instruction. This is where you define the behavior of your chat assistant and/or the personality you would like your assistant to have. For example:
> "You are a helpful, respectful, and honest assistant. Always answer as helpfully as possible."

**`{{ user_message }}`:**
> This is a placeholder for the user's input or question. When using the model, it is replaced by the actual message or query from the user. For instance:
> "What's the capital of France?"

**`{{ model_answer }}`:**
>This is a placeholder for a model's response to a user's message. You can supply such responses to remind the model of the ongoing conversation or to provide it with an example of the kind of reply you expect. This placeholder will be used in the exercise at the end of this lab.

A prompt should contain only a single system message, but it can contain multiple alternating user and assistant messages, and it should always end with the last user message followed by the assistant header.
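To make these rules concrete, here is a minimal sketch of a structural check written with Python's standard library only. `message_roles` and `follows_llama3_rules` are hypothetical helpers (not part of `ibm-watsonx-ai` or `langchain`); they inspect only the header tokens, so they are a quick sanity check rather than a full tokenizer-level validation.

```python
import re

# Matches one role header, e.g. <|start_header_id|>user<|end_header_id|>
HEADER_RE = re.compile(r"<\|start_header_id\|>(\w+)<\|end_header_id\|>")
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"


def message_roles(prompt: str) -> list:
    """Return the role named in each header of a Llama 3 prompt, in order."""
    return HEADER_RE.findall(prompt)


def follows_llama3_rules(prompt: str) -> bool:
    """Check the structural rules above using the header tokens only (simplified)."""
    if not prompt.lstrip().startswith("<|begin_of_text|>"):
        return False
    # The prompt must end with an open assistant header
    if not prompt.rstrip().endswith(ASSISTANT_HEADER):
        return False
    roles = message_roles(prompt)
    # At most one system message, and only at the very start
    body = roles[1:] if roles[0] == "system" else roles
    if "system" in body:
        return False
    # The remaining roles must alternate: user, assistant, user, assistant, ...
    return all(role == ("user" if i % 2 == 0 else "assistant") for i, role in enumerate(body))


good = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nBe brief.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
print(message_roles(good))         # ['system', 'user', 'assistant']
print(follows_llama3_rules(good))  # True
```

Leading and trailing whitespace is ignored, so a template that starts or ends with a newline still passes this check.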

In summary, this structure allows you to provide a specific context or behavior instruction to the model (using the system prompt) and then ask a question or make a statement (using the user message), perhaps preceded by a conversation history or a series of examples. The model will then generate a response based on the system prompt, the chat history (if provided), and the (final) user message.
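The structure described above can be sketched as a small Python function that assembles the prompt string from its parts. `build_llama3_prompt` is a hypothetical helper, not a library API. The blank line after each header follows Meta's published Llama 3 format; the templates used later in this lab use fewer newlines, which the model also handles in the outputs shown. A string built this way could be passed to `llm.invoke(...)` like any other prompt.

```python
def build_llama3_prompt(system_prompt, history, user_message):
    """Assemble a Llama 3 chat prompt string.

    history: list of (user_message, model_answer) pairs from earlier turns,
    e.g. the conversation so far or hand-written examples.
    """
    def block(role, content):
        # Role header, a blank line, the message text, then the end-of-turn token
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    parts = ["<|begin_of_text|>", block("system", system_prompt)]
    for past_user, past_answer in history:
        parts.append(block("user", past_user))
        parts.append(block("assistant", past_answer))
    parts.append(block("user", user_message))
    # Finish with an open assistant header so the model writes the next reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt(
    "You are a helpful, respectful, and honest assistant.",
    [("What's the capital of France?", "The capital of France is Paris.")],
    "And what's the capital of Italy?",
)
print(prompt)
```

Keeping the history as a list of pairs guarantees that user and assistant messages alternate and that the prompt always ends with the assistant header.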


First, we will initialize the Llama 3 model. The following code initializes a Llama 3 model on IBM's watsonx.ai platform and then passes that model to the `WatsonxLLM` class from `langchain-ibm`, which integrates the watsonx.ai model into the `langchain` framework:


In [None]:
# Create a dictionary to store credential information
credentials = {
    "url"    : "https://us-south.ml.cloud.ibm.com"
}

# Indicate the model we would like to initialize. In this case, Llama 3 70B.
model_id    = 'meta-llama/llama-3-70b-instruct'

# Initialize some watsonx.ai model parameters
params = {
        GenParams.MAX_NEW_TOKENS: 256, # The maximum number of tokens that the model can generate in a single run.
        GenParams.TEMPERATURE: 0,   # A parameter that controls the randomness of the token generation. A lower value makes the generation more deterministic, while a higher value introduces more randomness.
    }
project_id  = "skills-network" # <--- NOTE: specify "skills-network" as your project_id
space_id    = None
verify      = False

# Launch a watsonx.ai model
model = Model(
    model_id=model_id,
    credentials=credentials,
    params=params,
    project_id=project_id,
    space_id=space_id,
    verify=verify
)

# Integrate the watsonx.ai model with the langchain framework
llm = WatsonxLLM(watsonx_model=model)

Now, we will use Llama 3 to generate a random question about the topic "cat".

The following code prompts the LLM with the prompt `Make a random question about cat`:


In [None]:
llm.invoke("Make a random question about cat")

' behavior.'

Instead of generating a question about the topic "cat", the LLM treated the prompt as an unfinished sentence and simply completed it ("Make a random question about cat behavior."). The prompt `Make a random question about cat` could have been written better, for instance, by ending it with a period so that the model does not read it as an incomplete sentence.

The following code addresses this issue by prompting Llama 3 with `Generate a random question about cat: Question: `, where the trailing `Question: ` signals where the model's output should begin. The code also shows how to use the `PromptTemplate` syntax in `langchain`:


In [None]:
# Create a prompt template
template="Generate a random question about {topic}: Question: "
pt = PromptTemplate(
    input_variables=["topic"],
    template=template)

# Create an LLM chain with the Llama 3 model and the first prompt template
prompt_to_Llama_3 = LLMChain(llm=llm, prompt=pt)

# Run the chain with the input "cat" to generate a random question about cats
result = prompt_to_Llama_3.invoke("cat")
print(result)


{'topic': 'cat', 'text': ' What is the average lifespan of a domestic cat?\nAnswer: The average lifespan of a domestic cat is around 12-15 years, depending on various factors such as breed, diet, lifestyle, and health conditions.'}


First, note that with `PromptTemplate`, we created a reusable template in which only the variable part (`{topic}`) is left to be filled in. With `prompt_to_Llama_3.invoke("cat")`, we simply supplied the topic (`cat`), from which the entire prompt (`Generate a random question about {topic}: Question: `) is generated.
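With its default `"f-string"` template format, `PromptTemplate` fills placeholders essentially the way Python's `str.format` does, so the substitution can be previewed without calling the model. The sketch below uses plain `str.format` as a stand-in; in `langchain`, `pt.format(topic="cat")` on the template above should return the same string.

```python
# The same template text used in the cell above
template = "Generate a random question about {topic}: Question: "

# {topic} is replaced by the supplied value, as PromptTemplate's f-string format does
prompt = template.format(topic="cat")
print(repr(prompt))  # 'Generate a random question about cat: Question: '

# Literal braces must be doubled so they are not mistaken for template variables
print("Use {{braces}} for {topic}".format(topic="cat"))  # Use {braces} for cat
```

Previewing the formatted prompt like this is a cheap way to spot missing spaces or punctuation before sending it to the LLM.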

Second, note that even though the improved prompt did make Llama 3 produce a question (` What is the average lifespan of a domestic cat?`), the model also answered it (`Answer: The average lifespan of a domestic cat is around 12-15 years, depending on various factors such as breed, diet, lifestyle, and health conditions.`). Moreover, the question was preceded by a space, which is typically not what we want, especially if we were to parse or process this response in an automated workflow.
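To see why this matters in an automated workflow, here is a minimal post-processing sketch that recovers just the question from the raw `text` returned above. `extract_question` is a hypothetical, heuristic helper: it assumes the answer, if present, starts with `Answer:`, and it would cut a question that itself contains that word.

```python
import re

# Raw 'text' returned by the chain in the previous cell
raw = (
    " What is the average lifespan of a domestic cat?\nAnswer: The average lifespan of a "
    "domestic cat is around 12-15 years, depending on various factors such as breed, "
    "diet, lifestyle, and health conditions."
)


def extract_question(text: str) -> str:
    """Return only the question: drop any 'Answer:' part, stray spaces, and a 'Question:' prefix."""
    question = re.split(r"\n?Answer:", text, maxsplit=1)[0].strip()
    if question.startswith("Question:"):
        question = question[len("Question:"):].strip()
    return question


print(extract_question(raw))  # What is the average lifespan of a domestic cat?
```

Clean-up code like this is brittle, which is why it is preferable to make the model return the desired format in the first place.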

The LLM failed to generate a question without an answer because, in the absence of additional instructions and context, the model judged a question *and* an answer to be the most likely continuation of our prompt. To avoid this type of behavior, we can use Llama 3's prompt syntax to give the model specific instructions about how we want it to respond:


In [None]:
template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Generate a random question about the user specified topic. Be sure to preface your answer with 'Question: ' before you return your question.
<|eot_id|>
<|start_header_id|>user<|end_header_id|>{topic}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
pt = PromptTemplate(
    input_variables=["topic"],
    template=template)

prompt_to_LLAMA3 = LLMChain(llm=llm, prompt=pt)
result = prompt_to_LLAMA3.invoke("cat")
print(result)

{'topic': 'cat', 'text': 'Question: What is the average lifespan of a domestic cat, and what factors can influence its longevity?'}


Here, Llama 3 was given a specific `system` instruction on how to behave, and it returned just one random question about "cat", without an answer. Moreover, Llama 3 followed the instruction regarding the exact output format: it prefixed the question with `Question: ` rather than starting with a stray space, as happened in the response to the previous prompt.

Using the special tokens in the Llama 3 prompt template, we have instructed the model to respond more effectively to our prompt and to return a response in the exact format we want and expect.

Providing a `system` instruction to the model is just one way to get the model to prepend `Question: ` before generating a question. Another possible way to express this desire is by providing the model with an example or a series of examples. This is called one-shot, few-shot, or multi-shot prompting, depending on how many examples are provided.
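A few-shot template can also be generated from a list of example exchanges. The sketch below is an assumption-laden illustration: `few_shot_template` is a hypothetical helper that leaves a single `{topic}` placeholder open, so its result could be passed to `PromptTemplate(input_variables=["topic"], template=...)` as in the cells of this lab. Literal braces in the examples are doubled so the template formatter does not mistake them for variables.

```python
def few_shot_template(system_prompt, examples, variable="topic"):
    """Build a Llama 3 few-shot template with one {variable} placeholder left open.

    examples: list of (user_message, model_answer) pairs shown to the model.
    """
    def esc(text):
        # Double literal braces so str.format / PromptTemplate leaves them alone
        return text.replace("{", "{{").replace("}", "}}")

    def block(role, content):
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    parts = ["<|begin_of_text|>", block("system", esc(system_prompt))]
    for user_msg, answer in examples:
        parts.append(block("user", esc(user_msg)))
        parts.append(block("assistant", esc(answer)))
    # The real user message stays a placeholder to be filled in later
    parts.append(block("user", "{" + variable + "}"))
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


template = few_shot_template(
    "Generate a random question about the user specified topic.",
    [("coffee", "Question: What is the color of coffee?")],
)
print(template.format(topic="cat"))
```

Adding more `(topic, question)` pairs to the list turns this one-shot prompt into a few-shot prompt without touching the template text by hand.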

In [None]:
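# Zero-shot version (system instruction only, no example), kept for comparison:
# template = """
# <|begin_of_text|>
# <|start_header_id|>system<|end_header_id|>
# Generate a random question about the user specified topic.
# <|eot_id|>
# <|start_header_id|>user<|end_header_id|>{topic}<|eot_id|>
# <|start_header_id|>assistant<|end_header_id|>
# """

# One-shot version: a single example exchange ("coffee") shows the expected "Question: " format
template = """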
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
Generate a random question about the user specified topic.
<|eot_id|>
<|start_header_id|>user<|end_header_id|>coffee<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>Question: What is the color of coffee?<|eot_id|>
<|start_header_id|>user<|end_header_id|>{topic}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
pt = PromptTemplate(
    input_variables=["topic"],
    template=template)

prompt_to_LLAMA3 = LLMChain(llm=llm, prompt=pt)
result = prompt_to_LLAMA3.invoke("cat")
print(result)

{'topic': 'cat', 'text': 'Question: What is the average lifespan of a domestic cat?'}


### Conclusion

So, we have learned how to guide Llama 3 to give the precise response we are looking for, either by using a `system` instruction or by providing a sequence of examples. By adhering to Llama 3's prompt template, we can increase the likelihood of receiving a high-quality response in our preferred format.