# Prompt Engineering Guide

Submitted by,
> Suhail Chand <br>
> suhail.chand@outlook.com

This notebook demonstrates some Prompt Engineering techniques with the **Llama 2 open-source LLM.** We will try to cover some of the major ideas behind Prompt Engineering, and in the process of examining various examples and their outputs, observe the importance of various prompt elements to the LLM's abilities.


## Importing the LLM

We will use the 5-bit integer quantized version of the Llama2 13b chat model on a single T4 GPU in Google Colab. To load the model, we shall install all the pre-requisites and download the model weights from HuggingFace.

### Enabling NVIDIA cuBLAS

**About the cuBLAS library**

- The cuBLAS library provides highly optimized implementations of common linear algebra routines, such as matrix-matrix multiplication, matrix-vector multiplication, and linear system solvers. These operations are crucial in many scientific and computational tasks, including machine learning, numerical simulations, and data analysis.

- By utilizing the parallel processing power of GPUs, cuBLAS can significantly accelerate linear algebra computations compared to traditional CPU-based implementations. GPUs are designed with many cores and specialized hardware for parallel computation, making them well-suited for performing large-scale matrix operations in parallel.

- ***Applications that involve extensive matrix computations, such as deep learning models, numerical simulations, and scientific computations, can benefit from using the cuBLAS library***. It enables faster and more efficient calculations, reducing the overall computational time and enabling researchers and developers to tackle more complex problems.

In [1]:
# Installation for GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.28  --force-reinstall --upgrade --no-cache-dir -q

[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/9.4 MB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.9/9.4 MB[0m [31m29.2 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.4/9.4 MB[0m [31m134.2 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Installing backend dependencies ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.1/62.1 kB[0m [31m150.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m45.5/45.5 kB[0m [31m248.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m16.9/16.9 MB[0m [31m280.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━

**!CMAKE_ARGS="-DLLAMA_CUBLAS=on"**:
- This line sets an environment variable named CMAKE_ARGS with the value -DLLAMA_CUBLAS=on.
- It indicates that the cuBLAS library should be enabled for GPU-accelerated linear algebra operations performed by the "llama-cpp-python" package, if available.

**FORCE_CMAKE=1**:
- This line sets an environment variable called FORCE_CMAKE with the value 1.
- It instructs the installation process to use the CMake build system.

**pip install llama-cpp-python**: This command uses the pip package manager to install or upgrade the llama-cpp-python package.

**--force-reinstall**: This option forces the reinstallation of the package, even if it is already installed on the system.

**--upgrade**: This option ensures that the installed package is upgraded to the latest version if a newer version is available.

**--no-cache-dir**: This option disables the use of pip's cache directory, which means that the package will be downloaded and installed directly from the source without using any cached files.

**--verbose**: This option enables verbose output during the installation process, providing more detailed information about each step being performed.

### Downloading the LLM

In [2]:
# For downloading the models from Hugging Face
!pip install huggingface_hub



In [3]:
# Llama class from the llama_cpp library
from llama_cpp import Llama

# hf_hub_download function from the Hugging Face Hub library
from huggingface_hub import hf_hub_download

- Llama2 is a collection of pretrained and fine-tuned LLMs ranging from 7 billion to 70 billion parameters.
- GGUF (Generalized Graph Unification Format) is a format introduced by the llama.cpp, an extensible, future-proof format which stores more information about the model as metadata.
- It also includes significantly improved tokenization code, including for the first time full support for special tokens. This has been shown to improve performance, especially with models that use new special tokens and implement custom prompt templates.

In [4]:
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-chat-GGUF",
    filename="llama-2-13b-chat.Q5_K_M.gguf"
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


llama-2-13b-chat.Q5_K_M.gguf:   0%|          | 0.00/9.23G [00:00<?, ?B/s]

**hf_hub_download()**:
- The hf_hub_download() function will connect to the Hugging Face Model Hub, locate the specified model using the provided repo_id, and then download it.
- The downloaded model will be saved locally with the specified filename.
- The model_path variable will contain the path to the downloaded model file on your local file system.

### Initialising the LLM

In [5]:
llama_llm = Llama(
    model_path=model_path,
    n_threads=2,                # CPU cores.
    n_batch=512,                # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=43,            # Change this value based on your model and your GPU VRAM pool.
    n_ctx=4096,                 # Context window
)

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


**model_path**: Specifies the path to the model.

**n_threads**:
- Specifies the number of CPU cores that the Llama instance should use for its operations.
- Modern CPUs often have multiple cores, allowing them to perform multiple tasks simultaneously. By specifying the number of threads, you're controlling how many of these cores the Llama instance will utilize.

**n_batch**:
- It's used to specify the batch size for processing data with the model.
- It should be between 1 and n_ctx (context window), taking into consideration the GPU's VRAM capacity.
- In machine learning, especially with deep learning models, data is often processed in batches rather than one sample at a time. This is because it can lead to more efficient computations, taking advantage of parallel processing capabilities in modern hardware.
- In this case, n_batch=512 means that the Llama instance is set to process data in batches of 512.
- Choosing an appropriate batch size is important for balancing computational efficiency with memory constraints.

**n_gpu_layers**: Specifies the number of GPU layers (adjust based on GPU VRAM).

**n_ctx**:
- This parameter specifies the context window size, which is a crucial aspect of how a language model processes and generates text.
- In the context of language models, a "context window" refers to the range of tokens (words or subwords) that the model considers when generating responses or making predictions.
- When the model generates text, it doesn't consider the entire input sequence but focuses on a limited window of context. This helps the model manage memory and computational resources efficiently.
- A larger context window can provide more context for generating coherent and contextually relevant responses. However, it may also come with increased memory requirements, as the model needs to store and process a larger amount of text.
- The choice of n_ctx should strike a balance between the need for context and the available computational resources (both CPU and GPU memory).

## Prompt Parameters

In [6]:
prompt_template = """<s>[INST]\n <<SYS>> \n {system_message} \n <</SYS>>```{user_message}``` /n [/INST] """

def generate_prompt(system_message, user_input):
    prompt = prompt_template.format(system_message=system_message, user_message=user_input)
    return prompt

In [7]:
system_message = """Respond to the user question based on the user prompt."""

def generate_llama_response(user_input):
    # Generate prompt from user_input and system_message
    prompt=generate_prompt(system_message, user_input)

    # Generate a response from the LLaMA model
    response = llama_llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['INST'],
        echo=False
    )

    # Extract and return the response text
    response_text = response["choices"][0]["text"]

    return response_text

- **max_tokens**: Specifies the maximum number of *tokens* that the model should generate in response to the prompt.
- **temperature**: Controls the *randomness* of the generated response. A higher temperature value will result in a more random response, while a lower temperature value will result in a more predictable response. In this case, it's set to 0, which means the response will be as deterministic as possible.
- **top_p**: Controls the *diversity* of the generated response. A higher value of top_p will result in a more diverse response, while a lower value will result in a less diverse response. In this case, it's set to 0.95, which means the model will try to generate a diverse response.
- **repeat_penalty**: Controls the *penalty for repeating tokens* in the generated response. A higher value of repeat_penalty will result in a lower probability of repeating tokens, while a lower value will result in a higher probability of repeating tokens. In this case, it's set to 1.2, which means the model will try to avoid repeating tokens.
- **top_k**: Controls the *maximum number* of tokens that will be considered when generating the response. In this case, it's set to 50, which means the model will consider up to 50 tokens when generating the response.
- **stop**: List of tokens that should *stop the generation* of response. In this case, it's set to ['INST'], which means the model will stop generating tokens when it encounters the token "INST".
- **echo**: Controls whether the prompt should be echoed back to the user.

## Prompting Techniques

### Prompt Templates

Prompt templates consist of two main components: fixed text and variable slots. These components work together to provide a structured framework for generating prompts in generative AI tasks.

1. Fixed Text:
    - Fixed text refers to the static or pre-defined portions of the prompt template.
    - It can include any text, instructions, or context that remains constant across different prompts generated using the template.
    - Fixed text helps set the overall tone, style, or specific requirements for the generated output.

2. Variable Slots:
    - Variable slots are the dynamic parts of the prompt template that can be filled with different values or options.
    - These slots act as placeholders for specific information that can vary from prompt to prompt. Often the variable slots are used to gather input from the users.
    - In this way, these slots allow for customization and flexibility in generating prompts that vary by user input.
    - A variable slot is usually delimited by a specific character (e.g., triple backticks ``` ) so that this portion of the prompt can be dynamically altered when the input is given.

Fixed and variable slots in a prompt are arranged in a prompt template as the system message, few shot examples and user messages. Prompt templates can be repeatedly reused by changing the user message. The system message is included at the beginning of the prompt and is used to prime the model with context, instructions, or other information relevant to the use case.

### 1. Zero-shot Prompting

- The key idea behind zero-shot prompts is that the model can leverage its general language understanding and knowledge to generate relevant responses, even for tasks it has not been explicitly trained on.
- This is achieved by providing the model with a prompt that includes a description or instruction about the desired task or topic, without any additional training data specific to that task.
- Zero-shot prompts are useful in scenarios where training data for a specific task is limited or unavailable.
- They allow for more flexible and versatile use of generative AI models, as they can generate responses for a wide range of tasks without the need for task-specific training.

**Classify customer review in the inputs as positive or negative in sentiment.**

In [8]:
system_message = """
Classify customer reviews in the input as positive or negative in sentiment.
Reviews will be delimited by triple backticks, that is, ```.
Do not explain your answer. Your answer should only contain the label: positive or negative.
"""

In [9]:
zero_shot_template = """<s>[INST]\n <<SYS>> \n {system_message} \n <</SYS>>```{user_message}``` /n [/INST] """

def generate_prompt(system_message, user_input):
    prompt = zero_shot_template.format(system_message=system_message, user_message=user_input)
    return prompt

In [10]:
customer_review = """
I couldn't be happier with my experience at your store!
The staff went above and beyond to assist me, providing exceptional customer service.
They were friendly, knowledgeable, and genuinely eager to help.
The product I purchased exceeded my expectations and was exactly what I was looking for.
From start to finish, everything was seamless and enjoyable.
I will definitely be returning and recommending your store to all my friends and family.
Thank you for making my shopping experience so wonderful!
"""

response = generate_llama_response(customer_review)
print(response)

 Positive


In [11]:
customer_review = """"
I am extremely disappointed with the service I received at your store!
The staff was rude and unhelpful, showing no regard for my concerns.
Not only did they ignore my requests for assistance, but they also had the audacity to speak to me condescendingly.
It's clear that your company values profit over customer satisfaction.
I will never shop here again and will make sure to spread the word about my awful experience.
You've lost a loyal customer, and I hope others steer clear of your establishment!
"""

response = generate_llama_response(customer_review)
print(response)

Llama.generate: prefix-match hit


 Negative


**Entity extraction.**

In [12]:
system_message = """Your task is act like assistant and extract entities from customer reviews in the input.
Reviews will be delimited by triple backticks, that is, ```.
Do not explain your answer."""

customer_review = """I had a old but very nice logitech lazer gamin mouse, my dog at the cord off it so had to get a replacement.
I was tempted to get another logitech because well I knew it was a sure thing.
Anyways I saw the reviews on this mouse and thought it looked awesome so I thought I would give it a try.
Well it does indeed look awesome and feels good in the hand.
My old mouse was weighted and kind of like the feel of the heft but I'm pleased with this new one and so long as it doesn't fail on me would say its definitely worth the price.
I would have had to play something like a First Person Shooter side by side to get a real idea how they compare on precision but this new mouse seems fine. Again my logitech was probably more than 10 years old so I can't compare to a new one.
If I had to guess they based the button placement, size and shape of this mouse off the logitech, don't know.
"""

response = generate_llama_response(customer_review)
print(response)

Llama.generate: prefix-match hit


 Sure! Here are the entities extracted from the customer review:

* Brand: Logitech
* Product: Laser Gaming Mouse
* Features: weight, feel, precision, button placement, size, shape


In this scenario, the model identified only three entities, whereas the review mentions four entities. To achieve the desired output, we'll employ Few-Shot Prompting.

### 2. Few-shot Prompting

- Few-shot prompts refer to a technique used in generative AI tasks where a model is prompted to generate responses for tasks or topics with only a limited amount of training data.
- Unlike zero-shot prompts that require no task-specific training, few-shot prompts provide the model with a small amount of training data to adapt it to a specific task or domain.
- The idea behind few-shot prompts is to adapt a pre-trained model using a small number of examples or demonstrations related to the target task. This allows the model to observe task-specific patterns and improve its performance on that particular task.

**Extract relevant entities from customer reviews.**

In [13]:
system_message = """Your task is act like assistant and extract entities from customer review as done for below examples.
Reviews will be delimited by triple backticks, that is, ```.
Do not explain your answer."""

review_1= """Ordered grey which advertises green lighting, when you're going for a cheap aesthetic, it's upsetting. Mouse works fine."""
assistant_1 ="""Entities: [Mouse, Logitech MX Master, DPI Buttons, Mouse Wheel, Wire]"""

review_2="""
I bought one of these for PC gaming. Loved it, then bought another for work.
This mouse is not on par with high end mouses from like the Logitech MX Master series, but at 1/5-/8th the price, I didn't expect that level of quality.
It does perform well, mouse wheel feels weighty, side buttons are well place with different textures so you can tell them apart.
DPI buttons are handy for adjusting between games, work jobs, etc.
The mouse does feel rather plasticky and cheap, but for the money, it about what I expected.
I like a wired mouse to avoid the pointer/game jumping around due to latency.
Long wire too, so snagging issues are minimized. Great value overall."""
assistant_2 = """Entities: [Mouse, Logitech MX Master, DPI Buttons, Mouse Wheel, Wire]"""

In [14]:
new_review = """I had a old but very nice logitech lazer gamin mouse, my dog at the cord off it so had to get a replacement.
I was tempted to get another logitech because well I knew it was a sure thing.
Anyways I saw the reviews on this mouse and thought it looked awesome so I thought I would give it a try.
Well it does indeed look awesome and feels good in the hand.
My old mouse was weighted and kind of like the feel of the heft but I'm pleased with this new one and so long as it doesn't fail on me would say its definitely worth the price.
I would have had to play something like a First Person Shooter side by side to get a real idea how they compare on precision but this new mouse seems fine. Again my logitech was probably more than 10 years old so I can't compare to a new one.
If I had to guess they based the button placement, size and shape of this mouse off the logitech, don't know.
"""

The few shot prompt extends the few shot example list by adding the user input for which we need a completion.

In [15]:
first_turn_template = """<s>[INST]\n <<SYS>> \n {system_message} \n <</SYS>>```{user_message}``` /n [/INST] \n{assistant_message}\n</s> """

In [16]:
examples_template = """<s>[INST]\n ```{user_message}``` \n [/INST] \n {assistant_message}\n</s>"""

In [17]:
prediction_template = """<s>[INST]\n ```{user_message}```[/INST]"""

In [18]:
first_example = first_turn_template.format(system_message=system_message,user_message=review_1,assistant_message=assistant_1)
examples=examples_template.format(user_message=review_2,assistant_message=assistant_2)
few_shot_examples  =first_example + examples

In [19]:
def generate_prompt(few_shot_examples,new_input):
    prompt = few_shot_examples + prediction_template.format(user_message=new_input)
    return prompt

In [21]:
def generate_llama_response(user_input):
    # Generate prompt
    prompt=generate_prompt(few_shot_examples, user_input)

    # Generate a response from the LLaMA model
    response = llama_llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['INST'],
        echo=False
    )

    # Extract and return the response text
    response_text = response["choices"][0]["text"]
    return response_text

In [22]:
response = generate_llama_response(new_review)
print(response)

Llama.generate: prefix-match hit


  
Entities: [Mouse, Logitech Laser Gaming Mouse, First Person Shooter]


- Few-shot prompts are particularly useful when there is a scarcity of task-specific training data or when adapting a pre-trained model to a new domain or task.
- By providing a small amount of relevant training data, the model can quickly learn and generalize from the examples, improving its performance on the target task.
- It's important to note that the effectiveness of few-shot prompts depends on the quality and representativeness of the training data provided. While few-shot learning can yield promising results, it may not always match the performance of models trained with a larger amount of task-specific data.
- Overall, few-shot prompts offer a middle ground between zero-shot prompts and fully trained models, allowing for targeted adaptation and improved performance on specific tasks with limited training data.

### 3. Chain-of-Thought (CoT) Prompting

- Chain-of-Thought prompting is a technique used in generative AI tasks to guide the model's response generation by providing a sequence of related prompts or questions.
- Instead of a single prompt, a CoT consists of multiple interconnected steps that build upon each other to guide the model's thinking process. These steps represent the "thinking" process that we want the model to follow.
- The purpose of CoT prompting is to encourage the model to generate more coherent and contextually relevant responses by guiding its thought process in a structured manner.
- Each step in the chain serves as a stepping stone, providing additional context or constraints for the model to consider while generating the response.
- CoT prompts could also be augmented with few-shot examples.

**Generate a detailed key information from each customer complaint, presented in a structured JSON format by following a logical sequence of acknowledging the issue, explaining the cause for customer complains.**

In [23]:
cot_prompt_template = """<s>[INST]\n <<SYS>> \n {system_message} \n <</SYS>>```{user_message}``` /n [/INST] """

In [24]:
system_message = """
You are an assistant that helps a customer service representatives from a mobile phone company to better understand customer complaints.
Customer complaints will be submitted as text delimited by triple backticks, that is, ```.
For each complaint, extract the following information and present it only in a JSON format:
1. phone_model: This is the name of the phone - if unknown, just say “UNKNOWN”
2. phone_price: The price in dollars - if unknown, assume it to be 1000 $
3. complaint_desc: A short description/summary of the complaint in less than 20 words
4. additional_charges: How much in dollars did the customer spend to fix the problem? - this should be an integer
5. refund_expected: TRUE or FALSE - check if the customer explicitly mentioned the word “refund” to tag as TRUE. If unknown, assume that the customer is not expecting a refund

Take a step-by-step approach in your response, and give a detailed explanation before sharing your final answer in the following JSON format:
{phone_model:, phone_price:, complaint_desc:, additional_charges:, refund_expected:}.
"""

In [25]:
customer_complaint = """
I am fuming with anger and regret over my purchase of the XUI890.
First, the price tag itself was exorbitant at 1500 $, making me expect exceptional quality.
Instead, it turned out to be a colossal disappointment.
The additional charges to fix its constant glitches and defects drained my wallet even more.
I spend 275 $ to get a new battery.
The final straw was when the phone's camera malfunctioned, and the repair cost was astronomical.
I demand a full refund and an apology for this abysmal product.
Returning it would be a relief, as this phone has become nothing but a money pit. Beware, fellow buyers!
"""

In [26]:
def generate_prompt(system_message, user_input):
    prompt=cot_prompt_template.format(system_message=system_message, user_message=user_input)
    return prompt

In [27]:
def generate_llama_response(user_input):
    #generate prompt
    prompt=generate_prompt(system_message,user_input)

    # Generate a response from the LLaMA model
    response = llama_llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['INST'],
        echo=False
    )

    # Extract and return the response text
    response_text = response["choices"][0]["text"]
    return response_text

In [28]:
response = generate_llama_response(customer_complaint)
print(response)

Llama.generate: prefix-match hit


 Sure, I'd be happy to help you with that! Here's the information extracted from the customer complaint:

1. Phone model: XUI890
2. Phone price: 1500 $ (mentioned in the complaint as "exorbitant")
3. Complaint desc: Constant glitches and defects, camera malfunctioned
4. Additional charges: 275 $ (mentioned for a new battery)
5. Refund expected: TRUE (explicitly mentioned in the complaint as "demand a full refund")

Here's the final answer in JSON format:
{phone_model:"XUI890", phone_price:1500, complaint_desc:"Constant glitches and defects, camera malfunctioned", additional_charges:275, refund_expected:TRUE}


- Chain of thought prompting helps in maintaining coherence and relevance in the generated responses by providing a structured framework for the model's thinking process.
- It ensures that the model considers the necessary context and constraints while generating each part of the response.
- By guiding the model's thought process through a chain of interconnected prompts, chain of thought prompting can lead to more focused, contextually appropriate, and coherent responses in generative AI tasks.

Sometimes portions of the instruction might not be followed. To avoid such a situation, we should add a reiteration of the task summary at the end of the system message like so:

In [29]:
system_message ="""
You are an assistant that helps a customer service representatives from a mobile phone company to better understand customer complaints.
For each complaint, extract the following information:
1. phone_model: This is the name of the phone - if unknown, just say “UNKNOWN”
2. phone_price: The price in dollars - if unknown, assume it to be 1000 $
3. complaint_desc: A short description/summary of the complaint in less than 20 words
4. additional_charges: How much in dollars did the customer spend to fix the problem? - this should be an integer
5. refund_expected: TRUE or FALSE - check if the customer explicitly mentioned the word “refund” to tag as TRUE. If unknown, assume that the customer is not expecting a refund

Take a step-by-step approach in your response, and give a detailed explanation before sharing your final answer in the following JSON format:
{phone_model:, phone_price:, complaint_desc:, additional_charges:, refund_expected:}.

To reiterate, explain your rationale in detail before presenting  your final answer.
"""

In [30]:
customer_complaint = """
I am fuming with anger and regret over my purchase of the XUI890.
First, the price tag itself was exorbitant at 1500 $, making me expect exceptional quality.
Instead, it turned out to be a colossal disappointment.
The additional charges to fix its constant glitches and defects drained my wallet even more.
I spend 275 $ to get a new battery.
The final straw was when the phone's camera malfunctioned, and the repair cost was astronomical.
I demand a full refund and an apology for this abysmal product.
Returning it would be a relief, as this phone has become nothing but a money pit. Beware, fellow buyers!
"""

In [31]:
response = generate_llama_response(customer_complaint)
print(response)

Llama.generate: prefix-match hit


 Sure, I'd be happy to help you with that! Here's my step-by-step approach in understanding the customer complaint:

Step 1: Identify the phone model
The customer mentions "XUI890" as the name of their phone. Therefore, we can confirm that the phone model is XUI890.

Step 2: Determine the price of the phone
The customer states that the price tag was exorbitant at $1500, which suggests that the phone was a high-end device. However, since they did not explicitly mention the price, we can only assume it to be $1000 as a default value.

Step 3: Summarize the complaint
The customer is unhappy with their purchase of the XUI890 due to its constant glitches and defects. They had to spend additional money for repairs, including $275 for a new battery, and the camera malfunctioned eventually. The customer demands a full refund and an apology.

Step 4: Determine additional charges
The customer mentions spending $275 for a new battery, but there is no information about other repair costs. Therefor

### 4. Self-consistency

#### Simple Prompting

In [32]:
# Simple prompting to adjust recipe ingredients
simple_prompt = """
Adjust the following recipe to serve 10 people.
Original Recipe (Serves 4):
- 2 cups of flour
- 1 cup of sugar
- 0.5 cup of butter
"""

response = llama_llm(
    prompt=simple_prompt,
    max_tokens=1024,
    temperature=0,
    top_p=0.95,
    repeat_penalty=1.2,
    echo=False
)

simple_answer = response["choices"][0]["text"]
print("Simple Answer (Adjusted Recipe):")
print(simple_answer)

Llama.generate: prefix-match hit


Simple Answer (Adjusted Recipe):
- 2 eggs
- 1 teaspoon of baking powder
- 1/2 teaspoon of salt
- 1 cup of milk
- 1 teaspoon of vanilla extract

Instructions:

1. Preheat oven to 375 degrees Fahrenheit (190 degrees Celsius).
2. In a large mixing bowl, whisk together flour and sugar.
3. Add in softened butter and mix until crumbly.
4. Beat in eggs and vanilla extract until well combined.
5. Gradually add in milk while continuously stirring to avoid lumps.
6. Pour the batter into a greased 9x13-inch baking dish.
7. Bake for 20-25 minutes or until a toothpick inserted comes out clean.
8. Remove from oven and let cool before serving.

Adjusted Recipe (Serves 10):

* Multiply all ingredients by 2.5 (except milk, which remains the same)

Instructions:

1. Preheat oven to 375 degrees Fahrenheit (190 degrees Celsius).
2. In a large mixing bowl, whisk together flour and sugar.
3. Add in softened butter and mix until crumbly.
4. Beat in eggs and vanilla extract until well combined.
5. Gradually a

- The output generated through simple prompting presents a recipe that is clear and straightforward, making it easy for beginners to follow.
- However, it falls short in several critical areas. Notably, the lack of detailed calculations for adjusting ingredient quantities based on the desired increase in servings may leave learners confused about the scaling process.
- Additionally, there is no explanation provided on how to adjust these quantities, limiting their understanding of the underlying mathematical principles involved in recipe scaling.
- Furthermore, the inclusion of unnecessary ingredients and instructional details not relevant to the task contributes to the phenomenon of "hallucination" in the output, which can further mislead learners and detract from the overall quality of the recipe.

#### Self-consistency

In self-consistency, we generate multiple answers to the same question and pick the answer that is repeated the most across these occurrences. This is particularly valuable for factual questions.

In [33]:
# System message to provide context for the LLM
system_message = """
You are a cooking assistant. Adjust recipe ingredient quantities based on desired servings.
"""

# Template for generating multiple answers
answers_template = """
The original recipe serves 4 people:
- 2 cups of flour
- 1 cup of sugar
- 0.5 cup of butter

Adjust the following recipe to serve 10 people. Provide {num_answers} different adjustments.
"""

# Formatting the prompt to generate multiple answers (Self-Consistency)
answers_prompt = prompt_template.format(
    system_message=system_message,
    user_message=answers_template.format(
        num_answers=5  # Generate 5 distinct answers
    )
)

# Sending the request to generate multiple answers
response = llama_llm(
    prompt=answers_prompt,
    max_tokens=1024,
    temperature=0,
    top_p=0.95,
    repeat_penalty=1.2,
    echo=False  # Do not return the prompt
)

# Extracting the generated answers
factual_answers = response["choices"][0]["text"]
print("Generated Answers:")
print(factual_answers)

Llama.generate: prefix-match hit


Generated Answers:
 Sure, I'd be happy to help! Here are five different adjustments to the original recipe to serve 10 people:

Adjustment 1 (Classic Portion):

* Flour: 4 cups (x2 = 8 cups for 10 servings)
* Sugar: 2 cups (x2 = 4 cups for 10 servings)
* Butter: 1 cup (x2 = 2 cups for 10 servings)

Adjustment 2 (Increased Portion):

* Flour: 6 cups (x3 = 18 cups for 10 servings)
* Sugar: 3 cups (x3 = 9 cups for 10 servings)
* Butter: 2 cups (x3 = 6 cups for 10 servings)

Adjustment 3 (Small Portion):

* Flour: 2.5 cups (x1.5 = 5 cups for 10 servings)
* Sugar: 1.5 cups (x1.5 = 6 cups for 10 servings)
* Butter: 1 cup (x1.5 = 3 cups for 10 servings)

Adjustment 4 (Vegan Version):

* Flour: 2.5 cups (x1.5 = 5 cups for 10 servings)
* Sugar: 1 cup (x1.5 = 6 cups for 10 servings)
* Butter Substitute: 1/2 cup (x1.5 = 3 cups for 10 servings)

Adjustment 5 (Gluten-Free Version):

* Flour: 2 cups of gluten-free all-purpose flour (x2 = 4 cups for 10 servings)
* Sugar: 1 cup (x2 = 4 cups for 10 ser

In [34]:
# Template for selecting the most frequent answer
consistency_template = """
Here are {num_answers} answers to the recipe adjustment:
Answers:
{answers}

Choose the most frequently suggested quantities for each ingredient.
Final Answer:
"""

# Formatting the prompt to select the most frequent answer
consistency_prompt = prompt_template.format(
    system_message=system_message,
    user_message=consistency_template.format(
        num_answers=5,  # Number of answers generated previously
        answers=factual_answers  # Pass the generated answers from the previous step
    )
)

# Sending the request to select the most frequent answer
response = llama_llm(
    prompt=consistency_prompt,
    max_tokens=1024,
    temperature=0,
    top_p=0.95,
    repeat_penalty=1.2,
    echo=False  # Do not return the prompt
)

# Extracting and printing the final answer
final_answer = response["choices"][0]["text"]
print("Final Answer (Self-Consistency):")
print(final_answer)

Llama.generate: prefix-match hit


Final Answer (Self-Consistency):
 Sure, I'd be happy to help! Based on the five different adjustments provided, here are the most frequently suggested quantities for each ingredient for a batch of cookies that serves 10 people:

Adjustment 3 (Small Portion):

* Flour: 2.5 cups (x1.5 = 5 cups for 10 servings)
* Sugar: 1.5 cups (x1.5 = 6 cups for 10 servings)
* Butter: 1 cup (x1.5 = 3 cups for 10 servings)

These quantities are the most frequently suggested in the five adjustments provided, and they will yield a batch of cookies that serves 10 people with a smaller portion size.


- The self-consistency prompting approach yields an output that is more tailored to the specific needs of the recipe, addressing some of the issues present in the simple prompting method.
- This output maintains a degree of consistency in ingredient adjustments and offers multiple perspectives by showcasing different methods for recipe scaling, catering to various preferences. However, it still contains inconsistencies in ingredient amounts for the same serving size, leading to potential confusion regarding measurements like sugar and flour.
- Additionally, the output lacks clear explanations for why certain quantities were selected over others, leaving users uncertain about which method to trust. This ambiguity could undermine the confidence of learners who are trying to grasp the intricacies of recipe adjustment.

### 5. Tree-of-Thought Prompting

Tree-of-thought prompting is a generalization of chain-of-thought prompting where the model is prompted to take multiple reasoning paths. This forces the LLM into a deliberate reasoning mode.

In [35]:
# Step 1: Tree of Thought Prompting to generate multiple solutions
tot_prompt_template = """
Generate {num_solutions} possible solutions for the following problem:
Problem:
Calculate the ingredient adjustments for the following recipe:
Original Recipe (Serves 4):
- 2 cups of flour
- 1 cup of sugar
- 0.5 cup of butter
Desired Servings: 10 people.

1. Calculate the adjustment factor mathematically: (10 servings / 4 servings).
2. Adjust each ingredient using the adjustment factor.
3. Present the adjusted quantities in bullet points.
"""

tot_prompt = prompt_template.format(
    system_message=system_message,
    user_message=tot_prompt_template.format(
        num_solutions=5  # Generate 5 distinct solutions
    )
)

# Sending the request to the LLM for generating solutions
response = llama_llm(
    prompt=tot_prompt,
    max_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    echo=False  # Do not return the prompt
)

# Extracting and printing generated solutions
tot_solutions = response["choices"][0]["text"]
print("Generated Solutions (Tree of Thought):")
print(tot_solutions)

Llama.generate: prefix-match hit


Generated Solutions (Tree of Thought):
 Sure, I'd be happy to help! Here are five possible solutions for calculating the ingredient adjustments for the recipe to serve 10 people:

Solution 1: Calculate the adjustment factor mathematically

To calculate the adjustment factor, we divide the desired number of servings (10) by the original number of servings (4):

Adjustment Factor = 10 / 4 = 2.5

Now, we can use this adjustment factor to adjust each ingredient quantity:

Solution 2: Adjust each ingredient using the adjustment factor

* Flour: 2 cups x 2.5 = 5 cups
* Sugar: 1 cup x 2.5 = 2.5 cups
* Butter: 0.5 cup x 2.5 = 1.25 cups

Solution 3: Present the adjusted quantities in bullet points

Here are the adjusted ingredient quantities for the recipe to serve 10 people:

* Flour: 5 cups
* Sugar: 2.5 cups
* Butter: 1.25 cups

Solution 4: Use a formula to adjust the ingredient quantities

We can use the following formula to adjust each ingredient quantity:

New Quantity = Original Quantity 

In [36]:
# Step 2: Evaluation prompt template
evaluation_template = """
For the following problem: Adjust the recipe to serve 10 people, evaluate each solution based on accuracy and mathematical reasoning:
Solutions:
{solutions}

Present your evaluation of each solution, highlighting strengths and weaknesses.
"""

# Create a new prompt to evaluate the solutions
evaluation_prompt = prompt_template.format(
    system_message=system_message,
    user_message=evaluation_template.format(
        solutions=tot_solutions
    )
)

# Sending the request to evaluate the solutions
response = llama_llm(
    prompt=evaluation_prompt,
    max_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    echo=False  # Do not return the prompt
)

# Extract and print evaluation results
evaluation_results = response["choices"][0]["text"]
print("Evaluation of Solutions:")
print(evaluation_results)

Llama.generate: prefix-match hit


Evaluation of Solutions:
 As a cooking assistant, I will evaluate each solution based on accuracy and mathematical reasoning.

Solution 1: Calculate the adjustment factor mathematically
Strengths: This solution is simple and straightforward, using basic multiplication to calculate the adjustment factor. It also highlights the concept of scaling up or down a recipe based on the desired number of servings.

Weaknesses: The solution does not provide any visual representation of the adjusted quantities, which can make it more difficult to understand and apply the adjustments.

Solution 2: Adjust each ingredient using the adjustment factor
Strengths: This solution provides a clear and concise way of adjusting each ingredient quantity based on the adjustment factor. It also shows how the adjustment factor is applied to each ingredient.

Weaknesses: The solution does not provide any visual representation of the adjusted quantities, and it may be more difficult to understand for some users who

In [37]:
# Step 3: Ranking prompt template
ranking_template = """
For the following problem: Adjust the recipe to serve 10 people, rank the following solutions based on their accuracy and mathematical reasoning:
Evaluations:
{evaluations}

Rank the solutions and explain which one is the best and why.
"""

# Create a new prompt to rank the solutions
ranking_prompt = prompt_template.format(
    system_message=system_message,
    user_message=ranking_template.format(
        evaluations=evaluation_results
    )
)

# Sending the request to rank the solutions
response = llama_llm(
    prompt=ranking_prompt,
    max_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    echo=False  # Do not return the prompt
)

# Extract and print the ranked solutions
ranked_solutions = response["choices"][0]["text"]
print("Ranked Solutions:")
print(ranked_solutions)


Llama.generate: prefix-match hit


Ranked Solutions:
 As a cooking assistant, I would rank the solutions as follows:

1. Solution 3: Present the adjusted quantities in bullet points

This solution provides a clear and concise list of the adjusted ingredient quantities, making it easy to understand and apply the adjustments. It highlights each ingredient individually, which can be helpful for users who want to quickly identify the adjustments. Additionally, this solution does not require any mathematical calculations or formulas, making it accessible to a wide range of users.

2. Solution 5: Use a recipe adjustment chart

This solution provides a clear visual representation of the adjusted quantities, which can be helpful for users who prefer to see the calculations. However, it may be more difficult to understand for some users who are not familiar with basic multiplication or charts. Additionally, the chart may become cluttered if there are many ingredients or servings to adjust.

3. Solution 1: Calculate the adjustmen

- The tree-of-thought prompting technique provides comprehensive solutions by breaking down each methodical process for scaling recipes, significantly enhancing understanding of various mathematical approaches involved.
- Unlike the self-consistency prompting approach, which presents conflicting ingredient amounts and lacks clarity on selection rationale, the ToT method systematically organizes information, allowing for the evaluation of each solution based on a clear ranking scheme.
- By showcasing multiple strategies for achieving the same goal, this output encourages critical thinking and helps in discerning the most effective method for different contexts.
- Additionally, the structured nature of the ToT output reduces ambiguity, making it easier to grasp complex concepts.

### 6. Rephrase & Respond (RaR) Prompting

- The Rephrase and Respond (RaR) prompting technique is a clever way to boost the accuracy and clarity of responses from large language models. Instead of jumping straight into answering a question, the model is first asked to rephrase and expand the original prompt—then respond to it.

- This two-step approach helps in a few key ways:
    - It clarifies ambiguous or poorly worded questions.
    - It encourages the model to think more deeply about the intent behind the query.
    - It often leads to more accurate, context-aware answers.
- RaR is especially useful in domains where precision matters—like legal, medical, or technical contexts—and can be used in both single-step and two-step formats

In [38]:
system_message = """
You are tasked to answer queries on financial information.
Only answer the specific question presented by the user.
"""

In [39]:
rephrase_prompt_template = """
Context:
{context}
===

Question:
{question}.

Observe the context presented above, rephrase and expand the above question to help you do better answering.
Maintain all the information in the original question.
Please note that you only have to rephrase the question, do not mention the context.
The context is only presented for your reference.

Present your answer in the following format:
Rephrased Question:
<rephrased-question-here>
"""

Here is an extract from the Tesla 2022 10-K statement that will be used as context for this demonstration.

In [40]:
tesla_annual_report_context ="""
In 2022, we recognized total revenues of $81.46 billion, respectively, representing an increase of $27.64 billion, compared to the prior year.
We continue to ramp production, build new manufacturing capacity and expand our operations to enable increased deliveries and deployments of our products and further revenue growth.
"""

In [41]:
factual_question = "What was the increase in annual revenue in 2022 compared to 2021?"

In [42]:
rephrase_prompt = prompt_template.format(
    system_message=system_message,
    user_message=rephrase_prompt_template.format(
        context=tesla_annual_report_context,
        question=factual_question
    )
)

In [43]:
print(rephrase_prompt)

<s>[INST]
 <<SYS>> 
 
You are tasked to answer queries on financial information.
Only answer the specific question presented by the user.
 
 <</SYS>>```
Context:

In 2022, we recognized total revenues of $81.46 billion, respectively, representing an increase of $27.64 billion, compared to the prior year.
We continue to ramp production, build new manufacturing capacity and expand our operations to enable increased deliveries and deployments of our products and further revenue growth.

===

Question:
What was the increase in annual revenue in 2022 compared to 2021?.

Observe the context presented above, rephrase and expand the above question to help you do better answering.
Maintain all the information in the original question.
Please note that you only have to rephrase the question, do not mention the context.
The context is only presented for your reference.

Present your answer in the following format:
Rephrased Question:
<rephrased-question-here>
``` /n [/INST] 


In [44]:
response = llama_llm(
    prompt=rephrase_prompt,
    max_tokens=1024,
    temperature=0,
    top_p=0.95,
    repeat_penalty=1.2,
    echo=False
)

Llama.generate: prefix-match hit


In [45]:
rephrased_question = response["choices"][0]["text"].strip()

In [46]:
rephrase_marker = rephrased_question.find('Rephrased Question:')

In [47]:
print(rephrased_question[rephrase_marker+19:])

 What was the increase in annual revenues from 2021 to 2022? Specifically, what was the dollar amount of the increase in total revenues during this period?


In [48]:
rephrased_factual_question = rephrased_question[rephrase_marker+19:]

In [49]:
response_template = """
Context:
{context}
===

Original Question:
{question}

Rephrased Question:
{rephrased_question}

Given the above context, use your answer for the rephrased question presented above to answer the original question.
Present your final answer in the following format.
Final Answer: <your-final-answer>
"""

In [50]:
response_prompt = prompt_template.format(
    system_message=system_message,
    user_message=response_template.format(
        context=tesla_annual_report_context,
        question=factual_question,
        rephrased_question=rephrased_factual_question
    )
)

In [51]:
print(response_prompt)

<s>[INST]
 <<SYS>> 
 
You are tasked to answer queries on financial information.
Only answer the specific question presented by the user.
 
 <</SYS>>```
Context:

In 2022, we recognized total revenues of $81.46 billion, respectively, representing an increase of $27.64 billion, compared to the prior year.
We continue to ramp production, build new manufacturing capacity and expand our operations to enable increased deliveries and deployments of our products and further revenue growth.

===

Original Question:
What was the increase in annual revenue in 2022 compared to 2021?

Rephrased Question:
 What was the increase in annual revenues from 2021 to 2022? Specifically, what was the dollar amount of the increase in total revenues during this period?

Given the above context, use your answer for the rephrased question presented above to answer the original question.
Present your final answer in the following format.
Final Answer: <your-final-answer>
``` /n [/INST] 


In [52]:
response = llama_llm(
    prompt=response_prompt,
    max_tokens=1024,
    temperature=0,
    top_p=0.95,
    repeat_penalty=1.2,
    echo=False
)

Llama.generate: prefix-match hit


In [53]:
print(response["choices"][0]["text"].strip())

Sure! Based on the information provided, the increase in annual revenues from 2021 to 2022 was $27.64 billion. This is calculated by taking the total revenues for 2022 ($81.46 billion) and subtracting the total revenues for 2021.

Final Answer: $27.64 billion


### 7. Chain-of-Verification (CoVe) Prompting

- The Chain-of-Verification (CoVe) prompting technique is a structured method designed to reduce hallucinations in large language models by encouraging self-critique and fact-checking.
- This technique is especially effective for complex or factual queries, as it helps the model catch and correct its own mistakes—kind of like proofreading with a built-in fact-checker. It's a powerful tool for improving reliability in AI-generated content.
- Instead of relying solely on a single response, CoVe guides the model through a four-step process:

In [54]:
system_message = """
You are tasked to answer queries on financial information.
Only answer the specific question presented by the user.
"""

#### Step 1: Baseline response

In [55]:
baseline_prompt_template = """
Context:
{context}
===

Use the above context to answer the following question:
Question:
{question}
"""

In [56]:
tesla_annual_report_context ="""
In 2022, we recognized total revenues of $81.46 billion, respectively, representing an increase of $27.64 billion, compared to the prior year.
We continue to ramp production, build new manufacturing capacity and expand our operations to enable increased deliveries and deployments of our products and further revenue growth.
"""

In [57]:
factual_question = "Which year had more revenue - 2022 or 2021?"

In [58]:
baseline_prompt = prompt_template.format(
    system_message=system_message,
    user_message=baseline_prompt_template.format(
        context=tesla_annual_report_context,
        question=factual_question
    )
)

In [59]:
print(baseline_prompt)

<s>[INST]
 <<SYS>> 
 
You are tasked to answer queries on financial information.
Only answer the specific question presented by the user.
 
 <</SYS>>```
Context:

In 2022, we recognized total revenues of $81.46 billion, respectively, representing an increase of $27.64 billion, compared to the prior year.
We continue to ramp production, build new manufacturing capacity and expand our operations to enable increased deliveries and deployments of our products and further revenue growth.

===

Use the above context to answer the following question:
Question:
Which year had more revenue - 2022 or 2021?
``` /n [/INST] 


In [60]:
response = llama_llm(
    prompt=baseline_prompt,
    max_tokens=1024,
    temperature=0,
    top_p=0.95,
    repeat_penalty=1.2,
    echo=False
)

Llama.generate: prefix-match hit


In [61]:
baseline_factual_response = response["choices"][0]["text"].strip()

In [62]:
print(baseline_factual_response)

Sure! Based on the information provided in the context, we can see that the total revenues for 2022 were $81.46 billion, while there is no specific mention of the total revenues for 2021. Therefore, we can conclude that 2022 had more revenue than 2021.


#### Step 2: Verification questions

In [63]:
verifications_prompt_template = """
Your task is to create verification questions based on the below original question and the baseline response.
The verification questions are meant for verifying the factual acuracy in the baseline response.

Context:
{context}
===

Question:
Use the above context to answer the following question: {question}

Baseline Response:
{baseline_response}

Respond with your verification questions in a JSON format with the following headers.
```JSON
question1: <verification-question-1>
question2: <veriification-question-2>
and so on.
```
Do not provide answers to these verification questions. Respond only with the JSON.
"""

In [64]:
verifications_prompt = prompt_template.format(
    system_message=system_message,
    user_message=verifications_prompt_template.format(
        context=tesla_annual_report_context,
        question=factual_question,
        baseline_response=baseline_factual_response
    )
)

In [65]:
print(verifications_prompt)

<s>[INST]
 <<SYS>> 
 
You are tasked to answer queries on financial information.
Only answer the specific question presented by the user.
 
 <</SYS>>```
Your task is to create verification questions based on the below original question and the baseline response.
The verification questions are meant for verifying the factual acuracy in the baseline response.

Context:

In 2022, we recognized total revenues of $81.46 billion, respectively, representing an increase of $27.64 billion, compared to the prior year.
We continue to ramp production, build new manufacturing capacity and expand our operations to enable increased deliveries and deployments of our products and further revenue growth.

===

Question:
Use the above context to answer the following question: Which year had more revenue - 2022 or 2021?

Baseline Response:
Sure! Based on the information provided in the context, we can see that the total revenues for 2022 were $81.46 billion, while there is no specific mention of the total

In [66]:
response = llama_llm(
    prompt=verifications_prompt,
    max_tokens=1024,
    temperature=0,
    top_p=0.95,
    repeat_penalty=1.2,
    echo=False
)

Llama.generate: prefix-match hit


In [67]:
verification_factual_questions = response["choices"][0]["text"].strip()

In [68]:
print(verification_factual_questions)

Sure, here are three verification questions based on the provided context and baseline response:

{
"question1": "What was the total revenue for 2021 according to the given context?",
"question2": "How much did total revenues increase from 2021 to 2022, as stated in the context?",
"question3": "Does the baseline response accurately state that there is no specific mention of total revenues for 2021 in the given context?"
}


#### Step 3: Verification responses

In [69]:
verification_responses_template = """Answer the following question correctly based on the context given below:
Context:
{context}
===

Question: {verification_question}"""

In [70]:
verification_question_beginning = verification_factual_questions.find("{")

In [71]:
import json
verification_factual_questions_dict = json.loads(verification_factual_questions[verification_question_beginning:])

In [72]:
verification_responses = []

In [73]:
verification_factual_questions_dict.values()

dict_values(['What was the total revenue for 2021 according to the given context?', 'How much did total revenues increase from 2021 to 2022, as stated in the context?', 'Does the baseline response accurately state that there is no specific mention of total revenues for 2021 in the given context?'])

In [74]:
for verification_factual_question in verification_factual_questions_dict.values():

    verification_responses_prompt = prompt_template.format(
        system_message=system_message,
        user_message=verification_responses_template.format(
            context=tesla_annual_report_context,
            verification_question=verification_factual_question
        )
    )

    response = llama_llm(
        prompt=verification_responses_prompt,
        max_tokens=1024,
        temperature=0,
        top_p=0.95,
        repeat_penalty=1.2,
        echo=False
    )

    verification_responses.append(response["choices"][0]["text"].strip())

Llama.generate: prefix-match hit
Llama.generate: prefix-match hit
Llama.generate: prefix-match hit


In [75]:
for q, a in zip(verification_factual_questions_dict.values(), verification_responses):
    print(q)
    print(a)

What was the total revenue for 2021 according to the given context?
Sure! Based on the context provided, the total revenue for 2021 is $81.46 billion.
How much did total revenues increase from 2021 to 2022, as stated in the context?
Sure! Based on the context you provided, total revenues increased by $27.64 billion from 2021 to 2022.
Does the baseline response accurately state that there is no specific mention of total revenues for 2021 in the given context?
Yes, the baseline response accurately states that there is no specific mention of total revenues for 2021 in the given context. The text only mentions total revenues for 2022 and the increase compared to the prior year, but does not provide any information about total revenues for 2021.


In [76]:
verification_factual_question_answer_pairs = ''

for q, a in zip(verification_factual_questions_dict.values(), verification_responses):
    verification_factual_question_answer_pairs += ('\n' + \
    f"Verification Question: {q}" + '\n' + \
    f"Response: {a}"
)

In [77]:
print(verification_factual_question_answer_pairs)


Verification Question: What was the total revenue for 2021 according to the given context?
Response: Sure! Based on the context provided, the total revenue for 2021 is $81.46 billion.
Verification Question: How much did total revenues increase from 2021 to 2022, as stated in the context?
Response: Sure! Based on the context you provided, total revenues increased by $27.64 billion from 2021 to 2022.
Verification Question: Does the baseline response accurately state that there is no specific mention of total revenues for 2021 in the given context?
Response: Yes, the baseline response accurately states that there is no specific mention of total revenues for 2021 in the given context. The text only mentions total revenues for 2022 and the increase compared to the prior year, but does not provide any information about total revenues for 2021.


#### Step 4: Final prompt

In [78]:
final_prompt_template = """
Given the below `Context`, `Original Query` and `Baseline Answer`, analyze the `Verification Questions & Answers` to finally filter the refined answer.
Context:
{context}
===
Original Query: {original_question}
Baseline Answer: {baseline_response}

Verification Questions & Answer Pairs:
{verification_question_answer_pairs}

To reiterate, you should answer the original query accounting for the veracity of the verification question answer pairs and the context.
Only answer the original question and do not present any additional details. Present your final answer in the following format:
Final Refined Answer:<your-final-answer>"""

In [79]:
final_prompt = prompt_template.format(
    system_message=system_message,
    user_message=final_prompt_template.format(
        context=tesla_annual_report_context,
        original_question=factual_question,
        baseline_response=baseline_factual_response,
        verification_question_answer_pairs=verification_factual_question_answer_pairs
    )
)

In [80]:
print(final_prompt)

<s>[INST]
 <<SYS>> 
 
You are tasked to answer queries on financial information.
Only answer the specific question presented by the user.
 
 <</SYS>>```
Given the below `Context`, `Original Query` and `Baseline Answer`, analyze the `Verification Questions & Answers` to finally filter the refined answer.
Context:

In 2022, we recognized total revenues of $81.46 billion, respectively, representing an increase of $27.64 billion, compared to the prior year.
We continue to ramp production, build new manufacturing capacity and expand our operations to enable increased deliveries and deployments of our products and further revenue growth.

===
Original Query: Which year had more revenue - 2022 or 2021?
Baseline Answer: Sure! Based on the information provided in the context, we can see that the total revenues for 2022 were $81.46 billion, while there is no specific mention of the total revenues for 2021. Therefore, we can conclude that 2022 had more revenue than 2021.

Verification Questions &

In [81]:
response = llama_llm(
    prompt=final_prompt,
    max_tokens=1024,
    temperature=0,
    top_p=0.95,
    repeat_penalty=1.2,
    echo=False
)

Llama.generate: prefix-match hit


In [82]:
print(response["choices"][0]["text"].strip())

Final Refined Answer: Based on the provided context, the total revenue for 2021 is $81.46 billion.


## Use Cases

In [86]:
prompt_template = """<s>[INST]\n <<SYS>> \n {system_message} \n <</SYS>>```{user_message}``` /n [/INST] """

def generate_prompt(system_message, user_input):
    prompt = prompt_template.format(system_message=system_message, user_message=user_input)
    return prompt

system_message = """Respond to the user question based on the user prompt."""

def generate_llama_response(user_input):
    # Generate prompt from user_input and system_message
    prompt=generate_prompt(system_message, user_input)

    # Generate a response from the LLaMA model
    response = llama_llm(
        prompt=prompt,
        max_tokens=1024,
        temperature=0,
        top_p=0.95,
        repeat_penalty=1.2,
        top_k=50,
        stop=['INST'],
        echo=False
    )

    # Extract and return the response text
    response_text = response["choices"][0]["text"]

    return response_text

### 1. Clear and specific instructions

- Vague inputs will always give you generic and vague outputs.
- The more detailed you are with the context, the better the chance you will get an output that is tailored to your needs.

**Industry Setting: E-commerce**

#### Baseline Prompt

In [87]:
user_prompt = """What are good products?"""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 There are many great products available in various categories, and what constitutes a "good" product can depend on individual preferences and needs. However, here are some popular and highly-regarded products across different categories that you may find useful:

1. Electronics:
	* Smartphones: Apple iPhones, Samsung Galaxy series, Google Pixel phones
	* Laptops: MacBook Air/Pro, Dell XPS, HP Envy
	* TVs: OLED TVs from LG, Sony Bravia, Samsung QLED
2. Home and Kitchen:
	* Smart home devices: Amazon Echo, Google Nest Hub, Apple HomePod
	* Coffee makers: Keurig, Nespresso, Breville
	* Stand mixers: KitchenAid, Bosch, Anker
3. Beauty and Personal Care:
	* Skincare products: Neutrogena Hydrating Facial Cleanser, Olay Regenerist Micro-Sculpting Cream, La Roche-Posay Toleriane Ultra Fluid
	* Makeup: MAC Pro Longwear Foundation, NARS Radiant Creamy Concealer, Urban Decay All Nighter Setting Spray
4. Clothing and Accessories:
	* Denim jeans: Levi's 501, Gap Classic Skinny Jeans, Madewell High

#### Improved Prompt - 1

In [88]:
user_prompt = """List the top 5 best-selling electronics on amazon e-commerce site this month"""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure, here are the top 5 best-selling electronics on Amazon's e-commerce site this month based on sales rank and customer reviews:

1. Apple AirPods Pro with Wireless Charging Case (2nd Generation) - This wireless earbuds has a sales rank of #1 in the Electronics category and an average rating of 4.8 out of 5 stars from over 3,000 customer reviews. It's currently priced at $239.99.
2. Samsung Galaxy Note 20 Ultra 5G Smartphone - This smartphone has a sales rank of #2 in the Electronics category and an average rating of 4.6 out of 5 stars from over 1,000 customer reviews. It's currently priced at $999.99.
3. Amazon Echo (4th Generation) - This smart speaker has a sales rank of #3 in the Electronics category and an average rating of 4.7 out of 5 stars from over 1,000 customer reviews. It's currently priced at $99.99.
4. Sony WH-1000XM4 Wireless Noise Canceling Headphones - This wireless headphone has a sales rank of #4 in the Electronics category and an average rating of 4.7 out of 5 st

#### Improved Prompt - 2

In [89]:
user_prompt = """Provide a detailed comparison of the top 5 best-selling electronics on our e-commerce site this month, including features, prices, and customer ratings."""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure! Here's a detailed comparison of the top 5 best-selling electronics on your e-commerce site this month:

1. **Apple AirPods Pro**
	* Features: Active noise cancellation, water resistance, wireless charging case, sweat and water resistant, up to 24 hours battery life (with charging case)
	* Price: $249
	* Customer Ratings: 4.8/5 stars (based on over 10,000 reviews)
	* Why they're popular: The AirPods Pro offer excellent sound quality, long battery life, and innovative features like active noise cancellation and water resistance, making them a top choice for Apple fans and wireless earbuds enthusiasts.
2. **Samsung Galaxy S21**
	* Features: 6.2-inch AMOLED display, up to 16GB RAM, up to 512GB storage, quad camera setup with 50MP primary sensor, long battery life (up to 48 hours), fast charging support
	* Price: $799 (for the base model)
	* Customer Ratings: 4.6/5 stars (based on over 1,000 reviews)
	* Why they're popular: The Galaxy S21 offers top-of-the-line specs and features at 

### 2. Keep it clean

- Avoid Prompt Injections by using delimiters to specify sections of a prompt.

**Industry Setting: Legal Services**

#### Baseline Prompt

In [90]:
user_prompt ="""What do I need to include in a contract?"""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 When drafting a contract, it's important to include several key elements to ensure that the agreement is legally binding and protects the interests of all parties involved. Here are some essential items to include in your contract:

1. Parties Involved: Start by clearly identifying the names and addresses of both parties entering into the contract, as well as their roles and responsibilities.
2. Scope of Work: Describe the services or goods being provided, including any specific details such as quantity, quality standards, delivery dates, and payment terms.
3. Consideration: Specify what each party is providing in exchange for the other party's promises, such as money, goods, or services.
4. Termination Clause: Include a provision that outlines how the contract can be terminated, including any notice periods, termination fees, and dispute resolution procedures.
5. Payment Terms: Define when and how payments will be made, including the amount, payment frequency, and any late payment pe

#### Improved Prompt - 1

In [91]:
user_prompt ="""Outline the essential components of a legal contract. <What do I need to include in a contract?>"""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Certainly! Here are the essential components of a legally binding contract:

1. Offer and Acceptance: There must be a clear offer made by one party, and an unequivocal acceptance by the other party. The terms of the offer and acceptance must be definite and certain.
2. Intention to Create Legal Relations: Both parties must have the intention to enter into a legally binding agreement. In other words, they must intend to create a contract that is enforceable in court.
3. Consideration: There must be some form of consideration, which means that one party must give something of value to the other party. This can take many forms, such as money, goods, or services. The consideration must be sufficient but need not be adequate.
4. Capacity to Contract: Both parties must have the legal capacity to enter into a contract. This means that they must be of legal age and of sound mind.
5. Free Consent: Both parties must give their free consent to the terms of the contract. This means that they cann

#### Improved Prompt - 2

In [92]:
user_prompt ="""Provide a structured list of the necessary elements of a legal contract in bullet points:
<What do I need to include in a contract?>

Parties involved
Terms of the agreement
Payment details
Duration of the contract
Termination conditions."""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Certainly! Here is a structured list of the necessary elements of a legal contract, based on your user prompt:

**Parties Involved:**

* The names and addresses of both parties involved in the contract (e.g., "John Doe" and "ABC Corporation")
* A description of each party's role and responsibilities in the agreement

**Terms of the Agreement:**

* A clear and concise statement of the terms of the agreement, including any specific promises or commitments made by each party (e.g., "John Doe will provide consulting services to ABC Corporation for a period of one year")
* Any conditions or limitations on the parties' obligations under the contract (e.g., "The consulting services must be performed during regular business hours only")

**Payment Details:**

* The amount and method of payment for the goods or services being provided (e.g., "ABC Corporation will pay John Doe $10,000 per month for a period of one year, with payments made on the 15th day of each month")
* Any applicable taxes, 

### 3. Structured outputs

- Ask for structured outputs in the form of JSON / Tables.

**Industry Setting: Supply Chain Management**

#### Baseline Prompt

In [93]:
user_prompt ="""List our inventory items

Laptop Model X
Smartphone Model Y
Wireless Headphones
USB-C Charger
Bluetooth Speaker"""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure! Here is a list of the inventory items you have listed:

1. Laptop Model X
2. Smartphone Model Y
3. Wireless Headphones
4. USB-C Charger
5. Bluetooth Speaker


#### Improved Prompt - 1

In [94]:
user_prompt ="""For the following inventory items, generate a JSON report reflecting their current quantities:
Laptop Model X
Smartphone Model Y
Wireless Headphones
USB-C Charger
Bluetooth Speaker

Ensure to include the following fields for each item: Item Name, SKU, Quantity on Hand, Reorder Level, and Supplier Name."""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure! Here's a JSON report reflecting the current quantities of the inventory items you requested:

{
"Laptop Model X": {
"SKU": "LMX-123",
"Quantity on Hand": 5,
"Reorder Level": 10,
"Supplier Name": "Acme Laptops"
},
"Smartphone Model Y": {
"SKU": "SMY-456",
"Quantity on Hand": 20,
"Reorder Level": 30,
"Supplier Name": "Brightstar Mobile"
},
"Wireless Headphones": {
"SKU": "WH-789",
"Quantity on Hand": 15,
"Reorder Level": 25,
"Supplier Name": "AudioXperts"
},
"USB-C Charger": {
"SKU": "UC-CHARGER-001",
"Quantity on Hand": 30,
"Reorder Level": 40,
"Supplier Name": "ChargeTech"
},
"Bluetooth Speaker": {
"SKU": "BS-987",
"Quantity on Hand": 25,
"Reorder Level": 35,
"Supplier Name": "SpeakerWorld"
}
}

Please note that the quantities and reorder levels are just examples and can be adjusted based on your actual inventory needs.


#### Improved Prompt - 2

In [95]:
user_prompt ="""For the following inventory items, simulate an updated inventory report after a sales event, reducing the quantities based on typical sales patterns:
Laptop Model X
Smartphone Model Y
Wireless Headphones
USB-C Charger
Bluetooth Speaker

Please provide a JSON report with Item Name, SKU, Updated Quantity on Hand, and Reorder Level."""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure! Here's the updated inventory report based on typical sales patterns after a sales event:

{
"Laptop Model X": {
"SKU": "LMX-123",
"Updated Quantity on Hand": 50,
"Reorder Level": 75
},
"Smartphone Model Y": {
"SKU": "SMY-456",
"Updated Quantity on Hand": 25,
"Reorder Level": 50
},
"Wireless Headphones": {
"SKU": "WH-789",
"Updated Quantity on Hand": 100,
"Reorder Level": 150
},
"USB-C Charger": {
"SKU": "UC-CHARGER",
"Updated Quantity on Hand": 20,
"Reorder Level": 30
},
"Bluetooth Speaker": {
"SKU": "BS-102",
"Updated Quantity on Hand": 5,
"Reorder Level": 10
}
}

Here's a breakdown of the updated quantities based on typical sales patterns:

* Laptop Model X: Sold out 25 units during the sale event, leaving 50 units in stock. Reorder level is set to 75 units.
* Smartphone Model Y: Sold out 15 units during the sale event, leaving 25 units in stock. Reorder level is set to 50 units.
* Wireless Headphones: Sold out 30 units during the sale event, leaving 100 units in stock. Reorde

#### Other Use-cases

In [96]:
user_prompt ="""Give me the top 3 played video games on PC in the year 2020

The output should be in the form of a JSON with
1. the game's name (as string),
2. release month (as string),
3. number of downloads (as a float in millions correct to 3 decimals),
4. total grossing revenue (as string)

order the games by descending order of downloads"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure, here are the top 3 played video games on PC in the year 2020 based on data from Steam, the largest digital distribution platform for PC games. The output is in JSON format as requested:

[
{
"name": "PlayerUnknown's Battlegrounds (PUBG)",
"releaseMonth": "March",
"downloads": 137.94,
"grossingRevenue": "$1.52 billion"
},
{
"name": "Dota Underlords",
"releaseMonth": "June",
"downloads": 86.03,
"grossingRevenue": "$179.4 million"
},
{
"name": "Counter-Strike: Global Offensive (CSGO)",
"releaseMonth": "August",
"downloads": 52.32,
"grossingRevenue": "$608.8 million"
}
]

Note that the download figures are in millions and correct to three decimals, as requested. The grossing revenue figures are also included in string format as requested. The games are ordered by descending order of downloads.


In [97]:
user_prompt ="""Imagine you are developing a movie recommendation system. Your task is to provide a list of recommended movies based
on user preferences. The movies are from 2010 to 2020. Please only recomment movies released with this year range. Recommend only top 3 movies
The output should be in the form of a JSON object containing the following information for each recommended movie.:

1. Movie title (as a string)
2. Release year (as an integer)
3. Genre(s) (as an array of strings)
4. IMDb rating (as a float with two decimal places)
5. Description (as a string)

Order the movies by descending IMDb rating."""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure, I'd be happy to help! Here are my top three movie recommendations based on your preferences:

{
"movies": [
{
"title": "The Shawshank Redemption (2014)",
"releaseYear": 2014,
"genres": ["Drama", "Crime"],
"imdbRating": 9.2,
"description": "Two men from different walks of life become unlikely friends in prison and find a way to escape."
},
{
"title": "The Grand Budapest Hotel (2014)",
"releaseYear": 2014,
"genres": ["Comedy", "Drama"],
"imdbRating": 8.1,
"description": "The adventures of Gustave H, a legendary concierge at the famous Grand Budapest Hotel, and Zero Moustafa, the lobby boy who becomes his most trusted friend."
},
{
"title": "Parasite (2019)",
"releaseYear": 2019,
"genres": ["Comedy", "Drama"],
"imdbRating": 8.7,
"description": "The Kims, a poor family of four, scheme their way into the lives of a wealthy family, the Parks."
}
]
}

These movies are highly rated and fall within the year range you specified (2010-2020). I've ordered them by descending IMDb rating to e

### 4. Teaching AI how to behave

- Conditional Prompting + Few-shot prompting + Step-wise Expectations

**Industry Setting: Hospitality**

#### Baseline Prompt

In [98]:
user_prompt ="""How should I respond to guest reviews?"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure, here are some tips for responding to guest reviews:

1. Respond promptly: Try to respond to all reviews within a few days of receiving them, especially if they are negative. This shows that you value your guests' feedback and are willing to address any issues or concerns they may have.
2. Be genuine and sincere: Always be honest and authentic in your responses. Avoid canned responses or generic templates, as these can come across as insincere. Instead, focus on providing personalized responses that show you care about each guest's experience.
3. Listen actively: Pay attention to what guests are saying and try to understand their perspective. If they have a complaint or issue, ask questions to clarify the situation and offer solutions where possible.
4. Be respectful: Treat all guests with respect and dignity, even if they have left a negative review. Avoid getting defensive or argumentative, as this can escalate the situation and make things worse. Instead, focus on finding a re

#### Improved Prompt

In [99]:
user_prompt ="""Craft a response to guest reviews: If the review is positive, thank the guest for their feedback and encourage them to return. If the review is negative, apologize for their experience and offer to resolve the issue directly."""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure! Here's an example of how you could craft a response to guest reviews based on their content:

Positive Review:

"Wow, thank you so much for taking the time to leave such a kind review! We are thrilled to hear that you had a great experience at our hotel and we appreciate your feedback. Your satisfaction is our top priority and we're glad to see that we met or exceeded your expectations. We hope to have the pleasure of hosting you again soon! 😊"

Negative Review:

"Sorry to hear that you had a negative experience at our hotel. We apologize for any inconvenience or disappointment caused and would like to make things right. Please know that we take all feedback seriously and are committed to providing the best possible service to our guests. Can you please reach out to us directly so we can address your concerns and provide a resolution? We value your input and look forward to hearing from you soon. 😞"

Remember, when responding to negative reviews, it's important to be genuine, em

#### Conditional Prompting

In [100]:
user_prompt = """Here is the customer review {customer_review}

Check the sentiment of the customer and classify it as “angry” or “happy”
If the customer is “angry” - reply starting with an apology
Else - just thank the customer

customer_review = "
I am extremely disappointed with the service I received at your store! The staff was rude and unhelpful, showing no regard for my concerns. Not only did they ignore my requests for assistance, but they also had the audacity to speak to me condescendingly. It's clear that your company values profit over customer satisfaction. I will never shop here again and will make sure to spread the word about my awful experience. You've lost a loyal customer, and I hope others steer clear of your establishment!
"


Here is the customer review {customer_review}

Check the sentiment of the customer and classify it as “angry” or “happy”
If the customer is “angry” - reply starting with an apology
Else - just thank the customer

customer_review = "
I couldn't be happier with my experience at your store! The staff went above and beyond to assist me, providing exceptional customer service. They were friendly, knowledgeable, and genuinely eager to help. The product I purchased exceeded my expectations and was exactly what I was looking for. From start to finish, everything was seamless and enjoyable. I will definitely be returning and recommending your store to all my friends and family. Thank you for making my shopping experience so wonderful!
"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure, I'd be happy to help! Here is the customer review with the sentiment classified as "angry":

Customer Review:
I am extremely disappointed with the service I received at your store! The staff was rude and unhelpful, showing no regard for my concerns. Not only did they ignore my requests for assistance, but they also had the audacity to speak to me condescendingly. It's clear that your company values profit over customer satisfaction. I will never shop here again and will make sure to spread the word about my awful experience. You've lost a loyal customer, and I hope others steer clear of your establishment!

Sentiment: Angry

Here is the apology response:

Dear [Customer Name],

We are truly sorry for the poor service you received at our store. We understand that our staff's behavior was unacceptable, and we take full responsibility for their actions. We value your business and would like to make things right. Please reach out to us directly so we can address your concerns and pr

#### Few-shot Prompting

In [101]:
user_prompt ="""Teacher prompt: There are countless fascinating animals on Earth. In just a few shots, describe three distinct animals, highlighting their unique characteristics and habitats.

Student response:

Animal: Tiger
Description: The tiger is a majestic big cat known for its striking orange coat with black stripes. It is one of the largest predatory cats in the world and can be found in various habitats across Asia, including dense forests and grasslands. Tigers are solitary animals and highly territorial. They are known for their exceptional hunting skills and powerful builds, making them apex predators in their ecosystems.

Animal: Penguin
Description: Penguins are flightless birds that have adapted to life in the Southern Hemisphere, particularly in Antarctica. They have a distinct black and white plumage that helps camouflage them in the water, while their streamlined bodies enable swift swimming. Penguins are well-suited for both land and sea, and they often form large colonies for breeding and raising their young. These social birds have a unique waddling walk and are known for their playful behavior.

Animal: Elephant
Description: Elephants are the largest land mammals on Earth. They have a characteristic long trunk, which they use for various tasks such as feeding, drinking, and social interaction. Elephants are highly intelligent and display complex social structures. They inhabit diverse habitats like savannahs, forests, and grasslands in Africa and Asia. These gentle giants have a deep connection to their families and are known for their exceptional memory and empathy.

Do this for Lion, Duck, and Monkey"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure! Here are three distinct animals, highlighting their unique characteristics and habitats:

Animal: Lion
Description: The lion is a majestic big cat known for its regal mane and powerful roar. It is one of the largest predatory cats in Africa and can be found in savannahs and grasslands. Lions are social animals that live in prides, which are typically made up of several females, their cubs, and one or more males. They are skilled hunters and use coordinated attacks to bring down their prey.

Animal: Duck
Description: Ducks are aquatic birds known for their distinct webbed feet and flat bills. There are over 120 species of ducks found in wetlands around the world, from ponds and lakes to rivers and oceans. Many duck species migrate seasonally, while others remain year-round in their habitats. Ducks are social birds that often form large flocks and communicate with each other through quacks and body language.

Animal: Monkey
Description: Monkeys are primates known for their agility

#### Marketing Campaigns

In [102]:
user_prompt = """
Below we have described two distinct marketing strategies for a product launch campaigns,
highlighting their key points, pros, cons and risks.

1. **Digital Marketing:**
   - Key Points: Utilizes online platforms to promote the product, engage with the audience, and drive traffic to the product website.
   - Pros: Wide reach, targeted audience segmentation, cost-effective, ability to track and measure results.
   - Cons: High competition, rapidly evolving digital landscape, ad fatigue.
   - Risks: Negative feedback or criticism can spread quickly online, potential for ad fraud or click fraud.

2. **Traditional Advertising:**
   - Key Points: Uses traditional media channels like TV, radio, and print to reach a broader audience.
   - Pros: Wide reach, brand visibility, potential to reach a diverse audience.
   - Cons: High cost, difficulty in targeting specific demographics, less trackability compared to digital channels.
   - Risks: Limited audience engagement, potential for ad avoidance or low attention.

Now as described above can you do this for do this for 1) Public Relations(PR) and 2) Product Collaborations

"""

response = generate_llama_response(user_prompt)
print(response)


Llama.generate: prefix-match hit


 Sure! Here are two distinct marketing strategies for a product launch campaign, highlighting their key points, pros, cons, and risks:

1. **Public Relations (PR):**
   - Key Points: Utilizes media coverage to build brand awareness, credibility, and reputation.
   - Pros: High perceived value, third-party endorsement, ability to reach a wider audience.
   - Cons: Limited control over messaging, potential for negative publicity if not managed properly.
   - Risks: Dependence on media coverage, potential for crisis or reputation damage if not handled effectively.

2. **Product Collaborations:**
   - Key Points: Partners with other brands or influencers to co-create products or promote existing ones.
   - Pros: Increased brand exposure, access to new audiences, potential for cross-promotion and bundling.
   - Cons: Difficulty in finding the right partners, potential creative differences, risks associated with joint branding.
   - Risks: Dependence on partner's reputation, potential for ne

#### Stepwise Instructions

In [103]:
user_prompt ="""El cambio climático continúa siendo una preocupación apremiante en Europa.
La región ha experimentado un aumento en eventos climáticos extremos en las últimas décadas, desde olas de calor mortales
hasta inundaciones devastadoras. Estos eventos extremos han dejado en claro la urgente necesidad de abordar el cambio climático y sus impactos.
Europa se ha comprometido a liderar los esfuerzos mundiales para combatir el cambio climático.
Varios países europeos han establecido ambiciosos objetivos de reducción de emisiones y han implementado políticas para promover la energía
renovable y la eficiencia energética. La Unión Europea ha adoptado el Acuerdo Verde Europeo, un plan integral para lograr la neutralidad de
carbono para 2050.Sin embargo, los desafíos persisten. Algunas regiones de Europa aún dependen en gran medida de combustibles fósiles,
lo que dificulta la transición hacia una economía baja en carbono. Además, la cooperación internacional es fundamental, ya que el
cambio climático trasciende las fronteras nacionales.La acción climática en Europa también tiene implicaciones económicas.
La transición hacia una economía sostenible puede generar oportunidades de empleo y promover la innovación tecnológica.En resumen, Europa reconoce la gravedad del cambio climático y está tomando medidas significativas para abordar esta crisis. Sin embargo, se necesita un esfuerzo colectivo continuo y una cooperación global para enfrentar los desafíos planteados por el cambio climático y garantizar un futuro sostenible para Europa y el resto del mundo.”

1. Change the above article from Spanish to English
2. Summarize this article in 30 words
3. Check the tags for the summary from the tags list (ClimateChange, Environment, Technology, Healthcare, Education, Business, ArtificialIntelligence, Travel, Sports, Fashion, Entertainment, Science)
4. Create a JSON file for all the tags with values 1 if the tag is present, and 0 if not in the above summary
5. Segregate the tags based on 1 and 0"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure! Here are the answers to your questions:

1. English translation of the article:

"Climate change remains a pressing concern in Europe, with the region experiencing an increase in extreme weather events over the past decades, such as deadly heatwaves and devastating floods. These events have highlighted the urgent need to address climate change and its impacts. Europe has committed to leading global efforts to combat climate change, with several European countries setting ambitious emissions reduction targets and implementing policies to promote renewable energy and energy efficiency. However, challenges persist, as some regions in Europe still rely heavily on fossil fuels, making the transition to a low-carbon economy difficult. International cooperation is essential, as climate change transcends national borders. The economic implications of climate action in Europe are also significant, with the transition towards a sustainable economy potentially creating job opportunities an

### 5. Teaching AI how to think

- Asking the model to analyze, relate, and ask you questions before it replies/reaches a conclusion.

#### Make it ask questions

In [104]:
user_prompt ="""Suggest one Gaming Laptop. Ask me relevant questions before you choose"""
response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure, I'd be happy to help! Before I suggest a gaming laptop, can you please tell me a bit more about your needs and preferences? Here are some questions that will help me narrow down the options:

1. What is your budget for this laptop?
2. Do you have any specific game titles or genres that you're interested in playing on this laptop?
3. How important is portability to you - do you need a laptop that's lightweight and easy to carry around, or are you more focused on power and performance?
4. Are there any other features that you consider must-haves for your gaming laptop (e.g. touchscreen display, long battery life, etc.)?
5. Do you have a preferred operating system - Windows, macOS, or Linux?

Once I have a better understanding of your needs and preferences, I can suggest some great options for gaming laptops that fit your criteria!


#### Teach it how to engineer something before asking it to generate the final response

In [105]:
user_prompt ="""You are an engineer tasked with designing a renewable energy system for a remote island community that currently relies on diesel generators for electricity. The island has limited access to fuel and experiences frequent power outages due to logistical challenges and adverse weather conditions. Your goal is to develop a sustainable and reliable energy solution that can meet the island's power demands. Consider the following factors in your analysis and provide your recommendations:

Energy Demand Analysis:
a. Determine the island's energy consumption patterns and peak demand.
b. Analyze any anticipated future growth in energy demand.

Resource Assessment:
a. Evaluate the island's geographical location and climate conditions to identify available renewable energy resources (e.g., solar, wind, hydro, geothermal).
b. Assess the variability and intermittency of these resources to determine their reliability and potential for power generation.

System Design and Integration:
a. Propose an optimal mix of renewable energy technologies based on the resource assessment and energy demand analysis.
b. Address any technical challenges, such as grid integration, energy storage, and voltage regulation.

Economic Viability:
a. Perform a cost analysis comparing the renewable energy system with the existing diesel generator setup.
b. Consider the initial investment, operational costs, maintenance requirements, and potential government incentives or subsidies.

Environmental Impact:
a. Assess the environmental benefits of transitioning to renewable energy, such as reduced greenhouse gas emissions and local pollution.
b. Consider the potential impact on local ecosystems and wildlife, ensuring that the chosen technologies minimize negative effects.

Implementation and Operations:
a. Develop an implementation plan, including the timeline, procurement of equipment, and construction considerations.
b. Outline an operational strategy, including maintenance schedules, training requirements, and emergency response protocols.

Based on your analysis, provide a well-reasoned recommendation for the most suitable renewable energy system for the remote island, considering factors such as reliability, scalability, economic viability, and environmental sustainability."""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 As an engineer tasked with designing a renewable energy system for a remote island community that currently relies on diesel generators for electricity, I have conducted a comprehensive analysis of the island's energy demand patterns, resource availability, and technical, economic, environmental, and social factors. Based on my findings, I recommend a hybrid renewable energy system consisting of solar photovoltaic (PV), wind turbines, and battery energy storage to meet the island's power demands sustainably and reliably.

Energy Demand Analysis:

a. Determine the island's energy consumption patterns and peak demand:
The remote island community has a population of approximately 2,000 residents and consumes an average of 3,500 MWh of electricity per year. The peak demand occurs during summer months when air conditioning and water pumping are in high demand.

b. Analyze any anticipated future growth in energy demand:
The island's population is expected to grow by 10% over the next five y

### 6. Extracting and Filtering Information

**Industry Setting: Market Research**

#### Baseline Prompt

In [106]:
user_prompt ="""Extract the below information from the product reviews:

1. phone_model: This is the name of the phone - if unknown, just say “UNKNOWN”.
2. phone_price: The price in dollars - if unknown, assume it to be 1000 $.
3. complaint_desc: A short description/summary of the complaint in less than 20 words.
4. additional_charges: How much in dollars did the customer spend to fix the problem? - this should be an integer.
5. refund_expected: TRUE or FALSE - check if the customer explicitly mentioned the word “refund” to tag as TRUE. If unknown, assume that the customer is not expecting a refund.

Here are the reviews:
- Review-1: “I am fuming with anger and regret over my purchase of the XUI890. First, the price tag itself was exorbitant at 1500 $, making me expect exceptional quality. Instead, it turned out to be a colossal disappointment... money pit. Beware, fellow buyers!”
- Review-2: “I am beyond furious with my purchase of the ZetaPhone Z5! The $1200 price tag should have guaranteed excellence, but it was a complete rip-off... avoid the ZetaPhone Z5 at all costs!”
- Review-3: “Purchasing the TechPro X8 for $900 was the biggest mistake of my life. I expected a top-notch device, but it was a complete disaster... utter waste of money.”
- Review-4: “This phone left me seething with anger and regret. Spending $1400 on this phone was an outright scam... pitiful excuse of a phone.”
"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure, I can help you extract the information from the product reviews! Here are the answers to your questions based on each review:

1. Phone model: Review-1: XUI890; Review-2: ZetaPhone Z5; Review-3: TechPro X8; Review-4: UNKNOWN (no mention of phone model)
2. Price: Review-1: 1500 $; Review-2: 1200 $; Review-3: 900 $; Review-4: 1400 $
3. Complaint description: Review-1: "a colossal disappointment... money pit"; Review-2: "complete rip-off"; Review-3: "utter waste of money"; Review-4: "an outright scam"
4. Additional charges: None mentioned in any review
5. Refund expected: Review-1: TRUE; Review-2: TRUE; Review-3: UNKNOWN (no mention of refund); Review-4: TRUE


#### Improved Prompt

In [107]:
user_prompt = """"Please extract the specified information from the following product reviews for phones sold on Amazon. Ensure clarity and accuracy in your JSON output based on the provided headers. The reviews contain customer experiences, and you should focus on summarizing their main complaints effectively.

**JSON Headers:**
1. **phone_model:** Name of the phone mentioned in the review. If not specified, use “UNKNOWN”.
2. **phone_price:** Price of the phone in dollars. If the price is unclear, default to 1000 $.
3. **complaint_desc:** A concise summary of the complaint, limited to 20 words or fewer.
4. **additional_charges:** Total additional costs incurred by the customer for repairs, represented as an integer.
5. **refund_expected:** Set to TRUE if the customer explicitly requests a “refund”; otherwise, set to FALSE.

**Product Reviews:**
- Review-1: “I am fuming with anger and regret over my purchase of the XUI890. First, the price tag itself was exorbitant at 1500 $, making me expect exceptional quality. Instead, it turned out to be a colossal disappointment... money pit. Beware, fellow buyers!”
- Review-2: “I am beyond furious with my purchase of the ZetaPhone Z5! The $1200 price tag should have guaranteed excellence, but it was a complete rip-off... avoid the ZetaPhone Z5 at all costs!”
- Review-3: “Purchasing the TechPro X8 for $900 was the biggest mistake of my life. I expected a top-notch device, but it was a complete disaster... utter waste of money.”
- Review-4: “This phone left me seething with anger and regret. Spending $1400 on this phone was an outright scam... pitiful excuse of a phone.”
"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Here is the JSON output based on the provided product reviews for phones sold on Amazon, with the specified headers:

[
    {
        "phone_model": "XUI890",
        "phone_price": 1500,
        "complaint_desc": "colossal disappointment, money pit",
        "additional_charges": null,
        "refund_expected": true
    },
    {
        "phone_model": "ZetaPhone Z5",
        "phone_price": 1200,
        "complaint_desc": "complete rip-off",
        "additional_charges": null,
        "refund_expected": true
    },
    {
        "phone_model": "TechPro X8",
        "phone_price": 900,
        "complaint_desc": "utter waste of money",
        "additional_charges": null,
        "refund_expected": true
    },
    {
        "phone_model": "UNKNOWN",
        "phone_price": 1400,
        "complaint_desc": "pitiful excuse of a phone",
        "additional_charges": null,
        "refund_expected": true
    }
]

Note that for the last review, the phone model is unknown, so it is listed as "U

### 7. Other Use-cases

#### Grammar and Spellcheck

In [108]:
user_prompt ="""Dear Sir/Madam,
I am writting to inqure about the avaliability of your produc. I saw it on your websit and it looks very intresting. Can you plase send me more informtion regaring pricig and shippng optins? Also, do you have any discounts avilable for bulck orders? I would appriciate if you could get back to me as soon as possble. My company is intersted in purchsing your produc for our upcomimg projct. Thank you in advanc for your assistnce.

Best regards,
[Your Name]

Can you proofread the above text ?"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Certainly! Here's the proofread version of the email:

Dear Sir/Madam,

I am writing to inquire about the availability of your product. I saw it on your website and it looks very interesting. Could you please send me more information regarding pricing and shipping options? Additionally, do you have any discounts available for bulk orders? I would appreciate it if you could get back to me as soon as possible. My company is interested in purchasing your product for our upcoming project.

Thank you in advance for your assistance.

Best regards,
[Your Name]

Here are some corrections and suggestions:

1. "avaliability" should be spelled "availability".
2. "pricig" should be spelled "pricing".
3. "shippng" should be spelled "shipping".
4. "optins" should be spelled "options".
5. "bulck" should be spelled "bulk".
6. Add a comma after "project" in the last sentence for better clarity.
7. Use a colon instead of a comma to separate the items in the list (e.g., "pricing, shipping options").
8. 

#### Changing the tone of text

In [109]:
user_prompt ="""This phone left me seething with anger and regret. Spending $1400 on this phone was an outright scam. The device was riddled with issues from day one. The software glitches made it virtually unusable, and the constant crashes were infuriating. To add insult to injury, the charging port became faulty within two weeks, costing me an extra $100 for repairs. And guess what? The camera stopped functioning properly, and the repair quote was a shocking $500! I demand an apology for this pitiful excuse of a phone.

Convert this angry review into a neutral tone
Convert this angry review into a humorous tone
Convert this angry review into an angrier tone"""

response = generate_llama_response(user_prompt)
print(response)

Llama.generate: prefix-match hit


 Sure, here are the three conversions:

Neutral tone:
"I am disappointed with my recent purchase of a phone. While I was initially excited about the device, it has been plagued by software glitches and constant crashes since day one. Additionally, the charging port became faulty within two weeks, requiring an extra $100 for repairs. The camera also stopped functioning properly, and the repair quote was surprisingly high at $500. Despite these issues, I am still hoping for a resolution to make this experience more satisfactory."

Humorous tone:
"Oh boy, where do I begin? My new phone has been a real treat! It's like a never-ending game of "Angry Birds" with glitches and crashes everywhere. The charging port decided to take a vacation after just two weeks, leaving me feeling drained (get it?) and the camera is now more useful as a paperweight. But hey, at least I'm getting a great workout in from all the throwing my phone against the wall! Can someone please pass the Advil? Oh, and by th

## Evaluation

**Evaluating Summarization Capability**

There are two methods to summarize input text:
- Abstractive (output: a gist of the input).
- Extractive (output: a selection of key sentences from the input).

The objective in abstractive summarization is to generate a clear summary of the input text, while that of extractive summarization is to generate a selection of appropriate sentences that summarize the input text.

In order to evaluate model predictions, we compare the model predictions with the ground truth on a sample of human-annotated gold examples. However, given the subjective nature of model predictions, we need new metrics to evaluate summarization outputs: ROUGE Score and BERT Score.

Apart from these automated metrics, another method used to judge the quality of a summary is to use another LLM to assign a quality rating to the summary. Using an LLM to avaluate another LLM offers further flexibility in evaluation. For example, we could also specify specific attributes of an ideal summary (for e.g., conciseness, clarity of exposition).

### Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

ROUGE takes a exact match approach to compare the prediction from the model with the human reference summary by relying on matches of n-grams between the two.

The [$\text{ROUGE}_{N}$](https://huggingface.co/spaces/evaluate-metric/rouge) score is computed using the ratio of the number of n-gram matches to the total number of n-grams in the human generated summary. However, we still have to make a choice whether unigram, bigram, or any other n-gram should be used.

To solve this conundrum, a common variant of ROUGE that is used to generate a comparison metric is $\text{ROUGE}_{\text{L}}$, where we first compute the recall and precision of the longest common subsequence and then compute the harmonic mean of these values (punctuation and case of the word are disregarded).

In [1]:
ai_generated_summary = "Alice and Ben boarded a train to Mexico."
human_generated_summary = "Alice and Ben boarded their train to Mexico for vacation."

The length of the largest common subsequence (LCS) between the two summaries is 5. The number of unigrams in the AI-generated summary is 10 and the number of unigrams in the human generated summary is 17.

We define the recall of the LCS as 5/17 and the precision of the LCS as 5/10 (notice the parallel with the precision and recall measures used to evaluate classification tasks).

From these measures, we can compute $\text{ROUGE}_{\text{L}}$ as the F1 score associated with the precision and recall like so:

In [2]:
# ROUGE-L
r_lcs, p_lcs = 7/10, 7/8
print('ROUGE-L:', (2 * r_lcs * p_lcs)/(r_lcs + p_lcs))

ROUGE-L: 0.7777777777777777


One important limitation of ROUGE is that it accounts for exact matches. This means that a summary that uses semantically close words would receive a poor score despite capturing the intent of the human summary. Hence, ROUGE is usually used for extractive summarization.

ROUGE values close to 1 indicate that the AI-generated text is close to the text generated by a human.

### BERTScore

BERTScore is ideal in situations where abstractive summarization is the objective (as is in this case). To illustrate the computation of the BERTScore, consider the following two summary outputs (one from a generative AI model and another from a human).

In [3]:
ai_generated_summary = "Major issues, malfunctioning camera."
human_generated_summary = "Severely disappointed, constant problems."

- Look at the two summaries presented above. Though the choice of words is not exactly the same, both are close in intent.
- In order to capture intent, we use specific models that encode the semantic meaning of words used in the models in a mathematical space where we can measure the distance between the words used.
- Since distances can be computed, if two words are close to each other in this mathematical space (i.e., less distance), we can infer that these two words are close in meaning.
- Models that encode this mapping, that is, models that associate words with a list of numbers (called *vectors*) that define positions of the words in a mathematical space are referred to as [*embedding models*](https://projector.tensorflow.org/).
- Embedding models are precursors to language models and are a crucial component of how we represent the semantic meaning of words used in text.

BERTScore uses one such pre-trained embedding model (i.e., Bi-directional Encoder Representation from Transformers - BERT) to:
- map individual words in sentences (in both the AI summary and the human summary) to vectors.
- compute pairwise similarity between all possible pairs of words using these vectors.

Once pairwise similarities are estimated, we use these similarities to compute precision and recall for each word.

- The candidate text (i.e., the AI-generated summary) and the reference text (i.e., the human generated summary) are tokenized and assigned a numeric representation.
- In this representation, tokens that are correlated with each other lie close to each other (in the space defined by the numeric representation).
- All pairwise correlations between the tokens of the candidate and the reference are then collected in a table.
- For precision ($P$), we average the maximum correlation scores for each token in the *candidate*.
- For recall ($R$), we average the maximum correlation scores for each token in the *reference*.
- The F1 score is estimated as: $(2 \times P \times R)/(P+ R)$.
- We report the F1 score as the BERTScore. As with ROUGE, BERTScores close to 1 are considered ideal (i.e., the AI-generated text is close to one that is produced by a human).

In [116]:
!pip install evaluate bert_score

Collecting bert_score
  Downloading bert_score-0.3.13-py3-none-any.whl.metadata (15 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.0.0->bert_score)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.0.0->bert_score)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.0.0->bert_score)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.0.0->bert_score)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=1.0.0->bert_score)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch>=1.0.

In [4]:
import evaluate
bert_scorer = evaluate.load("bertscore")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [5]:
bert_scorer.compute(
    predictions=[ai_generated_summary],
    references=[human_generated_summary],
    lang="en",
    rescale_with_baseline=True
)

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/482 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.42G [00:00<?, ?B/s]

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'precision': [0.36399322748184204],
 'recall': [0.17047250270843506],
 'f1': [0.2666224539279938],
 'hashcode': 'roberta-large_L17_no-idf_version=0.3.12(hug_trans=4.52.4)-rescaled'}