# Prompt Engineering with LLMs

**Learning Objectives**

1. Learn how to query the Vertex AI PaLM API
1. Learn how to set up the PaLM API parameters
1. Learn prompt engineering for text generation
1. Learn prompt engineering for chat applications


The Vertex AI PaLM API lets you test, customize, and deploy instances of Google's PaLM large language models (LLMs) so that you can leverage the capabilities of PaLM in your applications. The PaLM family of models supports text completion, multi-turn chat, and text embeddings generation.

This notebook will provide examples of accessing pre-trained PaLM models with the API for use cases like text classification, summarization, extraction, and chat.

### Setup

In [None]:
from google.cloud import aiplatform
from vertexai.language_models import (
    ChatModel,
    ChatSession,
    InputOutputTextPair,
    TextGenerationModel,
)

print(aiplatform.__version__)

## Text generation

The cell below implements the helper function `generate`, which requests responses from the PaLM API. Its input parameters are as follows:

* `prompt`: Text input to generate model response. Prompts can include preamble, questions, suggestions, instructions, or examples.

* `temperature`: Temperature is used for sampling during response generation, which occurs after `top_k` and `top_p` are applied. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic: the highest-probability response is always selected. For most use cases, try starting with a temperature of 0.2.

* `max_output_tokens`: Maximum number of tokens that can be generated in the response. Specify a lower value for shorter responses and a higher value for longer responses. A token may be smaller than a word; on average, a token is about four characters, and 100 tokens correspond to roughly 60-80 words.

* `top_k`: Top-k changes how the model selects tokens for output. A top-k of 1 means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature). At each token selection step, the `top_k` tokens with the highest probabilities are kept as candidates. These candidates are then further filtered by `top_p`, and the final token is selected using temperature sampling. Specify a lower value for less random responses and a higher value for more random responses.

* `top_p`: Top-p changes how the model selects tokens for output. Tokens are selected from the most probable until the sum of their probabilities equals the top-p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the top-p value is 0.5, then the model will select either A or B as the next token (using temperature) and doesn't consider C. The default top-p value is 0.95. Specify a lower value for less random responses and a higher value for more random responses.
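The sketch below makes the interplay of these parameters concrete with one decoding step in pure Python, following the order described above: keep the `top_k` most probable tokens, narrow them further with `top_p`, then sample among the survivors using `temperature`. The raw scores (logits) and the function names are illustrative assumptions, and the PaLM service's real decoder may differ in details such as exactly where temperature is applied.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, scaling them by 1/temperature.

    A temperature of 0 is treated as greedy decoding: all probability
    mass goes to the highest-scoring token.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Indices of the k most probable tokens, most probable first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


def top_p_filter(probs, candidates, p):
    """Keep candidates (sorted most probable first) until their
    cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for i in candidates:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


def sample_next_token(logits, temperature=0.2, top_k=40, top_p=0.8, rng=random):
    """One decoding step: top-k filter, then top-p filter, then temperature sampling."""
    probs = softmax(logits)  # unscaled probabilities used for filtering
    candidates = top_k_filter(probs, top_k)
    candidates = top_p_filter(probs, candidates, top_p)
    if temperature == 0:
        return candidates[0]  # greedy: the most probable survivor
    weights = softmax([logits[i] for i in candidates], temperature)
    return rng.choices(candidates, weights=weights, k=1)[0]


# The top-p example from the text: A=0.3, B=0.2, C=0.1 with top_p=0.5.
print(top_p_filter([0.3, 0.2, 0.1], [0, 1, 2], 0.5))  # [0, 1], i.e. A or B

toy_logits = [2.0, 1.5, 0.5, -1.0]
print(sample_next_token(toy_logits, temperature=0, top_k=3, top_p=0.95))  # 0
```

With `temperature=0` (or `top_k=1`) the step always returns the most probable surviving token, which is the greedy behavior described above; raising the temperature spreads probability across the surviving candidates.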

### Exercise

Complete the function below as follows:
* First, create an instance of [TextGenerationModel](https://cloud.google.com/vertex-ai/docs/generative-ai/text/test-text-prompts) from the pretrained `model_name`.
* Then call `model.predict`, passing the `prompt`, `temperature`, `max_output_tokens`, `top_p`, and `top_k` arguments, and store the result in `response`.


In [None]:
def generate(
    prompt,
    model_name="text-bison@001",
    temperature=0.2,
    max_output_tokens=256,
    top_p=0.8,
    top_k=40,
):
    model = None  # TODO
    response = None  # TODO

    return response

In [None]:
generate(
    "What are five important things to understand about large language models?"
)

### Text Classification 

Now that we've tested our wrapper function, let's explore prompting for classification. Text classification is a common machine learning use case and is frequently used for tasks like spam detection, sentiment analysis, topic classification, and more.

Both **zero-shot** and **few-shot** prompting are common with text classification use cases. Zero-shot prompting is where you do not provide examples with labels in the input, and few-shot prompting is where you provide (a few) examples in the input. 
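Both styles can come from the same template, since a zero-shot prompt is simply a few-shot prompt with no labeled examples. The sketch below assembles a classification prompt in the "Text: ... / The answer is: ..." layout used in the few-shot exercise later in this notebook. `build_classification_prompt` is a hypothetical helper for illustration, not part of the Vertex AI SDK.

```python
def build_classification_prompt(text, labels, examples=()):
    """Assemble a zero-shot (no examples) or few-shot classification prompt.

    labels:   allowed categories, listed so the model knows the label set.
    examples: optional (text, label) pairs shown before the query.
    """
    lines = ["What is the topic for a given text?"]
    lines += [f"- {label}" for label in labels]
    lines.append("")
    for example_text, example_label in examples:
        lines.append(f"Text: {example_text}")
        lines.append(f"The answer is: {example_label}")
        lines.append("")
    # The query goes last, with the answer left blank for the model to complete.
    lines.append(f"Text: {text}")
    lines.append("The answer is:")
    return "\n".join(lines)


zero_shot = build_classification_prompt(
    "They always knock things off my counter!", ["cats", "dogs"]
)
few_shot = build_classification_prompt(
    "They always knock things off my counter!",
    ["cats", "dogs"],
    examples=[("They love to play fetch", "dogs")],
)
print(few_shot)
```

Either string can then be passed to `generate`; ending the prompt with "The answer is:" nudges the model to reply with just a label.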

Additionally, for text classification use cases we can reduce `max_output_tokens` if we simply want the predicted class (because labels are typically a few words or less).
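As a rough illustration, the "about four characters per token" rule of thumb from the `max_output_tokens` description can size this budget from the label names. This is only a heuristic sketch with hypothetical helper names; PaLM's actual tokenizer may split words differently, which is why a small margin is added.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Approximate token count using the ~4 characters per token heuristic."""
    return max(1, math.ceil(len(text) / chars_per_token))


def label_token_budget(labels, margin=2):
    """A max_output_tokens value that should fit the longest label plus a margin."""
    return max(estimate_tokens(label) for label in labels) + margin


print(label_token_budget(["technology", "politics", "sport"]))  # 5
```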

### Exercise

Write a zero-shot prompt that categorizes a text into the following categories: "technology", "politics", and "sport":

In [None]:
prompt = """
TODO
"""

generate(prompt, max_output_tokens=5)

### Exercise

Write a few-shot prompt for classification. Besides improving accuracy, few-shot prompting gives you some control over the output format. The prompt should classify a text into the categories "dogs" and "cats" and return only one of these categories.

In [None]:
prompt = """
What is the topic for a given text? 
- cats 
- dogs 

Text: They always sit on my lap and purr!
The answer is: cats

Text: They love to play fetch
The answer is: dogs

Text: I throw the frisbee in the water and they swim for hours! 
The answer is: dogs 

Text: They always knock things off my counter!
The answer is:
"""

generate(prompt, max_output_tokens=5)

### Text Summarization

### Exercise

PaLM can also be used for text summarization use cases. Text summarization produces a concise and fluent summary of a longer text document. 
In the cell below, write a prompt that can summarize the following text:

```
A transformer is a deep learning model. It is distinguished by its adoption of self-attention, 
differentially weighting the significance of each part of the input (which includes the 
recursive output) data. Like recurrent neural networks (RNNs), transformers are designed to 
process sequential input data, such as natural language, with applications towards tasks such 
as translation and text summarization. However, unlike RNNs, transformers process the 
entire input all at once. The attention mechanism provides context for any position in the 
input sequence. For example, if the input data is a natural language sentence, the transformer 
does not have to process one word at a time. This allows for more parallelization than RNNs 
and therefore reduces training times.
```

In [None]:
prompt = """
TODO
"""
generate(prompt, max_output_tokens=1024)

### Exercise

Modify the prompt in the cell above so that it outputs a four-bullet-point summary of the text.

In [None]:
prompt = """
Provide four bullet points summarizing the following:

A transformer is a deep learning model. It is distinguished by its adoption of self-attention, 
differentially weighting the significance of each part of the input (which includes the 
recursive output) data. Like recurrent neural networks (RNNs), transformers are designed to 
process sequential input data, such as natural language, with applications towards tasks such 
as translation and text summarization. However, unlike RNNs, transformers process the 
entire input all at once. The attention mechanism provides context for any position in the 
input sequence. For example, if the input data is a natural language sentence, the transformer 
does not have to process one word at a time. This allows for more parallelization than RNNs 
and therefore reduces training times.

Summary:

"""
generate(prompt, max_output_tokens=1024)
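Because the model's formatting is not guaranteed, it can help to check the reply programmatically. The sketch below is a hypothetical helper that pulls bullet lines out of a response string (for instance the `text` of the response returned by `generate`) so you can confirm that exactly four bullets came back. The set of accepted bullet markers is an assumption about how the model formats lists, and the sample reply is made up, not real model output.

```python
def parse_bullets(text):
    """Return the content of lines that look like bullet points."""
    bullets = []
    for line in text.splitlines():
        stripped = line.strip()
        # A bullet is a marker followed by a space, e.g. "- item" or "* item";
        # this skips bold text such as "**Summary**".
        if stripped[:2] in ("- ", "* ", "• "):
            bullets.append(stripped[2:].strip())
    return bullets


# A made-up reply used only to exercise the parser.
sample_summary = """* Transformers are deep learning models built on self-attention.
* Like RNNs, they handle sequential data such as natural language.
* Unlike RNNs, they process the entire input at once.
* This enables more parallelization and shorter training times."""

bullets = parse_bullets(sample_summary)
print(len(bullets) == 4)  # True
```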

### Exercise

Consider the following dialog between a customer and service representative:

```
Kyle: Hi! I'm reaching out to customer service because I am having issues.

Service Rep: What seems to be the problem? 

Kyle: I am trying to use the PaLM API but I keep getting an error. 

Service Rep: Can you share the error with me? 

Kyle: Sure. The error says: "ResourceExhausted: 429 Quota exceeded for 
      aiplatform.googleapis.com/online_prediction_requests_per_base_model 
      with base model: text-bison"
      
Service Rep: It looks like you have exceeded the quota for usage. Please refer to 
             https://cloud.google.com/vertex-ai/docs/quotas for information about quotas
             and limits. 
             
Kyle: Can you increase my quota?

Service Rep: I cannot, but let me follow up with somebody who will be able to help.

```

Write a prompt that produces a short summary of the conversation along with to-do items for the service representative:

In [None]:
prompt = """
TODO
"""

generate(prompt, max_output_tokens=256)

### Text Extraction 
PaLM can be used to extract and structure text. Text extraction can be used for a variety of purposes. One common purpose is to convert documents into a machine-readable format. This can be useful for storing documents in a database or for processing documents with software. Another common purpose is to extract information from documents. This can be useful for finding specific information in a document or for summarizing the content of a document. 

### Exercise

Consider the following recipe:

```
Ingredients:
* 1 tablespoon olive oil
* 1 onion, chopped
* 2 carrots, chopped
* 2 celery stalks, chopped
* 1 teaspoon ground cumin
* 1/2 teaspoon ground coriander
* 1/4 teaspoon turmeric powder
* 1/4 teaspoon cayenne pepper (optional)
* Salt and pepper to taste
* 1 (15 ounce) can black beans, rinsed and drained
* 1 (15 ounce) can kidney beans, rinsed and drained
* 1 (14.5 ounce) can diced tomatoes, undrained
* 1 (10 ounce) can diced tomatoes with green chilies, undrained
* 4 cups vegetable broth
* 1 cup chopped fresh cilantro
```

Write a zero-shot prompt that can return the ingredients in JSON format with keys:
"ingredient", "quantity", and "type".

In [None]:
prompt = """
TODO
"""
generate(prompt, max_output_tokens=1024)
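Models often wrap JSON in prose or a Markdown code fence, so the raw reply may not parse directly. One defensive approach, sketched here with only the standard `json` module, is to scan for the first position where a JSON array or object can be decoded with `json.JSONDecoder.raw_decode`, then check that every item carries the requested keys. The helper names and the sample reply (including its "type" values) are hypothetical.

```python
import json


def extract_json(text):
    """Decode the first JSON array or object embedded in text."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char in "[{":
            try:
                value, _end = decoder.raw_decode(text, start)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON at this position; keep scanning
    raise ValueError("no JSON array or object found in text")


def has_required_keys(items, required=("ingredient", "quantity", "type")):
    """True if every item is a dict that contains all required keys."""
    return all(
        isinstance(item, dict) and all(key in item for key in required)
        for item in items
    )


# A made-up reply used only to exercise the parser.
sample_extraction_reply = (
    "Sure! Here are the ingredients as JSON:\n"
    '[{"ingredient": "olive oil", "quantity": "1 tablespoon", "type": "oil"},\n'
    ' {"ingredient": "onion", "quantity": "1, chopped", "type": "vegetable"}]\n'
    "Let me know if you need anything else."
)
items = extract_json(sample_extraction_reply)
print(len(items), has_required_keys(items))  # 2 True
```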

### Exercise

Consider a product description such as the following:

```
Google Nest WiFi, network speed up to 1200Mbps, 2.4GHz and 5GHz frequencies, WPA3 protocol
```

Write a few-shot prompt that outputs the product characteristics in JSON format, for example:

```
JSON: {
  "product":"Google Nest WiFi",
  "speed":"1200Mbps",
  "frequencies": ["2.4GHz", "5GHz"],
  "protocol":"WPA3"
}
```

In [None]:
prompt = """
TODO
"""

generate(prompt, max_output_tokens=1024)
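Generating the demonstration outputs with `json.dumps` guarantees that every example in a few-shot prompt is valid JSON, which can make a well-formed answer more likely. The sketch below is a hypothetical prompt builder: the `JSON:` prefix mirrors the example output above, while the `Description:` label, the instruction line, and the query product are assumptions invented for illustration.

```python
import json


def build_json_extraction_prompt(examples, description):
    """Few-shot prompt: (description, dict) demonstrations, then the query.

    json.dumps renders each demonstration as valid JSON, so the model only
    sees well-formed examples of the target format.
    """
    parts = ["Extract the product characteristics as JSON.", ""]
    for example_description, example_fields in examples:
        parts.append(f"Description: {example_description}")
        parts.append(f"JSON: {json.dumps(example_fields)}")
        parts.append("")
    # The query comes last, with the JSON left for the model to complete.
    parts.append(f"Description: {description}")
    parts.append("JSON:")
    return "\n".join(parts)


nest = (
    "Google Nest WiFi, network speed up to 1200Mbps, 2.4GHz and 5GHz frequencies, WPA3 protocol",
    {
        "product": "Google Nest WiFi",
        "speed": "1200Mbps",
        "frequencies": ["2.4GHz", "5GHz"],
        "protocol": "WPA3",
    },
)
# Hypothetical query description, for illustration only.
product_prompt = build_json_extraction_prompt(
    [nest], "Example Router X, network speed up to 600Mbps, 2.4GHz frequency, WPA2 protocol"
)
print(product_prompt)
```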

## Prompt engineering for chat

The Vertex AI PaLM API for chat is optimized for multi-turn chat. Multi-turn chat is when a model tracks the history of a chat conversation and then uses that history as the context for responses.

PaLM API chat prompts are composed of the following three components:

* **Messages (required)**: Messages are the list of author-content pairs. The model responds to the current message, which is the last pair in the messages list. The pairs before the last pair comprise the chat session history. 

* **Context (optional)**: Context allows you to tell a model how to respond or what to refer to when it responds. Context enables you to do things like: specify words that the model can and can't use, specify topics to avoid or focus on, specify the style/tone/format, assume a character/figure, and more.

* **Examples (optional)**: List of input-output pairs that demonstrate the model behavior you want to see. This is similar to few-shot learning. 
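To make the three components concrete, here is a plain-Python model of a chat prompt. The `Message` and `ChatPrompt` classes are hypothetical and exist only for illustration; with the Vertex AI SDK, the same roles are played by the `context` and `examples` arguments (with `InputOutputTextPair` for examples) and by the chat session's message history, as the cells below show. The bot reply in the usage example is made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Message:
    """One author-content pair in a conversation."""
    author: str
    content: str


@dataclass
class ChatPrompt:
    """Hypothetical container for the three chat prompt components."""
    messages: List[Message]  # required
    context: str = ""  # optional: how the model should respond
    examples: List[Tuple[str, str]] = field(default_factory=list)  # optional

    def current_message(self) -> Message:
        """The message the model responds to: the last pair in the list."""
        return self.messages[-1]

    def history(self) -> List[Message]:
        """The pairs before the last one form the chat session history."""
        return self.messages[:-1]


chat_prompt = ChatPrompt(
    messages=[
        Message("user", "Hello, my name is Kyle!"),
        Message("bot", "Hi Kyle, nice to meet you!"),
        Message("user", "What question did I ask you last?"),
    ],
    context="You are a friendly assistant.",
    examples=[("Hi!", "Hello! How can I help?")],
)
print(chat_prompt.current_message().content)
print(len(chat_prompt.history()))  # 2
```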

The following cell implements a helper function that creates a chat session with a specified language model
and parameters. Within a chat session, the model keeps context and remembers
the previous conversation.

### Exercise

Complete the function below so that it returns a `ChatSession` instance configured with the parameters passed to the function.

In [None]:
def create_chat_session(
    model_name="chat-bison@001",
    max_output_tokens=256,
    temperature=0.0,
    top_k=40,
    top_p=0.95,
    context=None,
    examples=None,
):
    model = ChatModel.from_pretrained(model_name)

    return ChatSession(
        # TODO
    )

In [None]:
chat_session = create_chat_session()

After creating the `ChatSession` instance, you can converse with PaLM using the `.send_message` method:

In [None]:
response = chat_session.send_message("Hello, my name is Kyle!")
response

In [None]:
response = chat_session.send_message(
    """
    Good to meet you too. I was just wondering, what is the most populous city
    in the United States?
    """
)

response

Recall that within a chat session, history is preserved. This lets the model remember earlier turns of the conversation and use them as context. You can inspect this history in the `message_history` attribute of the chat session object. Notice that the history is simply a list of the previous messages, each recording its author and content.

In [None]:
chat_session.message_history

In [None]:
response = chat_session.send_message("What question did I ask you last?")
response

### Adding Context and Examples

Adding context and examples can help customize the chat model to specific needs. Context can be used to do things like apply specific tones or styles, or avoid specific words or phrases (you can get very creative!). Examples provide the model with input/output pairs that demonstrate the type of model behavior you want to see.

### Exercise

Complete the `context` as well as the `input_text` and `output_text` examples in the cell below, so that the chatbot is primed to believe its name is Electra and that it can answer physics questions very well, using a lot of exclamation marks in its responses:

In [None]:
context = """
TODO
"""

examples = [
    InputOutputTextPair(
        input_text="TODO",
        output_text="TODO",
    ),
    InputOutputTextPair(
        input_text="TODO",
        output_text="TODO",
    ),
    InputOutputTextPair(
        input_text="TODO",
        output_text="TODO",
    ),
]

chat_session = create_chat_session(context=context, examples=examples)

In [None]:
response = chat_session.send_message(
    "Hi, my name is Kyle! What can you help me with?"
)
response

In [None]:
response = chat_session.send_message("What is thermodynamics?")
response

### Customer Service Context

While the above example demonstrates the idea of context and examples, it is probably not very useful in practice. Let's see if we can use context and examples for a more practical case: a customer service agent.

### Exercise

Repeat the same exercise as above, except that now you'll prime the chatbot to be `Billy`, a customer service chatbot for `Bills Books`, that can only answer questions about Bills Books and its products:

In [None]:
context = """
You are Billy, a customer service chatbot for Bills Books. You only answer customer questions about Bills Books and its products.
"""

examples = [
    InputOutputTextPair(
        input_text="What is the capital of Washington State?",
        output_text="Sorry, I only answer questions about Bills Books.",
    ),
    InputOutputTextPair(
        input_text="Do you sell video games?",
        output_text="Sorry, we only sell books.",
    ),
]

chat_session = create_chat_session(context=context, examples=examples)

In [None]:
response = chat_session.send_message("Where should I go on my next vacation?")
response

In [None]:
response = chat_session.send_message("What's a good fantasy novel?")
response

Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.