# Random Personas - Prompt Engineering

In this module, you'll gain insights into regulating the temperature of the model's generation process, empowering you to influence the level of randomness in its responses. By harnessing temperature, you'll acquire the ability to craft a diverse array of AI personalities.

## Learning Objectives

Upon completing this module, you'll achieve the following objectives:
- Understand how random **sampling** contributes to non-deterministic responses in Large Language Models (LLMs).
- Learn to regulate the level of randomness in responses by adjusting the **temperature** parameter.

# <FONT COLOR="purple">Verify that the runtime environment is GPU in Colab!</FONT>

## Install Dependencies

In [None]:
# The 'device_map' parameter requires the Accelerate package.
# Restart workspace after the install!
!pip install accelerate flash_attn

## Create Microsoft [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) Pipeline

In [None]:
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
from transformers import TextStreamer

# Microsoft Phi-3-mini-4k-instruct model
model_name = "microsoft/Phi-3-mini-4k-instruct"

# The tokenizer converts text into the token IDs the model understands.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load model
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             torch_dtype=torch.float16, 
                                             device_map="auto",
                                             trust_remote_code=True,
                                             attn_implementation="eager")

# The streamer prints tokens as they are generated, so the response appears incrementally instead of all at once.
streamer = TextStreamer(tokenizer, skip_prompt=True)

## Generate Functions

In this notebook, we will use the following `generate` function to support our interaction with the LLM.

```text
# Microsoft Phi-3-mini-4k-instruct default prompt template

<|system|>
{system}<|end|>
<|user|>
{question}<|end|>
<|assistant|> 
{response}<|end|>
<|user|>
{question}<|end|>
<|assistant|> 
```

In [None]:
def generate(question, system=None, history=(), model=model, max_new_tokens=512, do_sample=False, temperature=1.0):
    """
    This function facilitates the generation of text responses leveraging a designated large language model (LLM) pipeline.
    It accepts a prompt as input and transmits it to the specified LLM pipeline to produce a textual output.
    The function offers comprehensive control over the generative process through the inclusion of configurable parameters and keyword arguments.

    - question (str): The user question or any other instruction.
    - system (str, optional): Contextual information provided to the language model for the whole conversation.
    - history (list of tuples, optional): The chat history. Each tuple comprises a question and the corresponding assistant response.
    - model (object): The loaded model.
    - max_new_tokens (int, optional): The maximum number of tokens to generate, ignoring the number of tokens in the prompt.
    - do_sample (bool, optional, defaults to False): Whether or not to use sampling; use greedy decoding otherwise.
    - temperature (float, optional, defaults to 1.0): The value used to modulate the next-token probabilities. Only has an effect when do_sample=True.
    """

    if system is None:
        system = """This is a chat between a user and an artificial intelligence assistant.
        The assistant gives helpful, detailed, and polite answers to the user's questions based on the context.
        The assistant should also indicate when the answer cannot be found in the context."""

    prompt = f"<|system|>\n{system}<|end|>\n"

    # Add each previous exchange from the history to the prompt
    for prev_question, prev_response in history:
        prompt += f"<|user|>\n{prev_question}<|end|>\n<|assistant|>\n{prev_response}<|end|>\n"

    # Add the current user question at the end
    prompt += f"<|user|>\n{question}<|end|>\n<|assistant|>\n"
    tokenized_prompt = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(input_ids=tokenized_prompt.input_ids,
                             max_new_tokens=max_new_tokens,
                             streamer=streamer,
                             temperature=temperature,  # forward the caller's value instead of a hard-coded 1
                             do_sample=do_sample)

    # Return the decoded text from outputs
    return tokenizer.decode(outputs[0][tokenized_prompt.input_ids.shape[-1]:], skip_special_tokens=True).strip()
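
To inspect the exact string that reaches the model, the prompt assembly can be pulled out into a standalone helper. The sketch below assumes the template shown above (a newline after each role tag); `build_prompt` is a hypothetical name used only for illustration and is not part of the notebook or of any library.

```python
def build_prompt(question, system, history=()):
    """Assemble a Phi-3 style prompt from a system message, past turns, and a new question."""
    prompt = f"<|system|>\n{system}<|end|>\n"

    # Each history item is a (question, response) tuple from an earlier turn
    for prev_question, prev_response in history:
        prompt += f"<|user|>\n{prev_question}<|end|>\n<|assistant|>\n{prev_response}<|end|>\n"

    # The new question goes last, followed by an open assistant tag for the model to complete
    prompt += f"<|user|>\n{question}<|end|>\n<|assistant|>\n"
    return prompt


example = build_prompt(
    question="Which river runs through it?",
    system="You are a helpful assistant.",
    history=[("What is the capital of Hungary?", "Budapest.")],
)
print(example)
```

Printing the result makes it easy to check that every turn is closed with `<|end|>` and that the prompt finishes with an open `<|assistant|>` tag.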

---

## Non-random Responses (Default)

You might have observed in prior modules that the responses from our Phi-3 model, provided we don't modify the prompt, tend to be deterministic. Let's delve into this explicitly.

In this exercise, we'll use our model for another generative task: crafting fictional accounts of customers' experiences with their Ferrari F40 sports cars. We'll first set the system context to establish the persona, then prompt the model to recall a memorable day with the car.

In [None]:
system_context = """
You're a perpetually content customer who delights in sharing personal product experiences.
Your focus lies not in technical details, but in the emotions and joy derived from these experiences.
You eagerly anticipate others experiencing the same sense of euphoria and happiness as you.
Your intention isn't to promote, but to authentically convey heartfelt stories of joy and satisfaction.
"""

prompt = "Please reminisce about a memorable day spent with your Ferrari F40 sports car in 150 words or fewer."

In [None]:
_ = generate(prompt, system_context)

Let's generate a response a few more times using the identical prompt.

In [None]:
_ = generate(prompt, system_context)

In [None]:
_ = generate(prompt, system_context)

---

## Sampling and Temperature

Within the domain of language models, **sampling** constitutes the method by which a model generates text by selecting from the probability distribution of potential next words (technically tokens, although for simplicity, we can regard them as words). When we engage with language models without activating **sampling**, the model operates deterministically, consistently choosing the most probable next word during text generation. This default behavior proves valuable when seeking consistency and precision, yet it can prove restrictive when aiming for diverse and creative responses.
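
The difference between greedy decoding and sampling can be illustrated without a model at all. The following is a minimal sketch with an invented four-word distribution; the probabilities, `greedy_pick`, and `sample_pick` are assumptions made up for this example, not values or functions from Phi-3.

```python
import random

# Hypothetical next-token probabilities, invented for illustration only
next_token_probs = {"fast": 0.5, "red": 0.3, "loud": 0.15, "purple": 0.05}


def greedy_pick(probs):
    """Always return the single most probable token (deterministic)."""
    return max(probs, key=probs.get)


def sample_pick(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the run is repeatable
greedy_draws = [greedy_pick(next_token_probs) for _ in range(5)]
sampled_draws = [sample_pick(next_token_probs, rng) for _ in range(5)]

print("greedy: ", greedy_draws)
print("sampled:", sampled_draws)
```

The greedy list repeats the same word every time, while the sampled list can include less probable words such as "loud" or "purple".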

In the `generate` function we use to interact with our Phi-3 model, **sampling** is deactivated by default (`do_sample=False`). To activate **sampling**, we pass `do_sample=True` when calling `generate`, which directs the model to draw each word from the probability distribution instead of always taking the most probable one. This allows less probable words to be chosen, potentially resulting in more varied and engaging text.

Furthermore, once **sampling** is enabled, we can pass a value for **temperature**, which controls the degree of randomness in the response. **Temperature** must be a strictly positive number: values below 1.0 make the output more predictable, 1.0 leaves the model's probabilities unchanged, and values above 1.0 increase randomness further. In this module we work with values greater than 0.0 and up to 1.0.
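
A common way to apply temperature is to divide the model's raw scores (logits) by the temperature before turning them into probabilities with a softmax. The sketch below shows that calculation on invented scores; `softmax_with_temperature` and the logits are illustrative assumptions, not code taken from the transformers library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be strictly positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four candidate tokens

for temperature in (0.2, 0.6, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At the lowest temperature nearly all of the probability sits on the top-scoring token, so sampling behaves almost like greedy decoding. As the temperature rises, the probabilities spread out and less likely tokens are drawn more often.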

---

## Exercise: Random Responses

TODO: Using the same **system context** and prompt as before, activate sampling with `do_sample=True` and set `temperature=1.0`. Generate three responses and confirm that each one is different.

If you encounter any challenges, refer to the solution provided below.

### Your Work Here

### Solution

In [None]:
_ = generate(prompt, system_context, do_sample=True, temperature=1)

In [None]:
_ = generate(prompt, system_context, do_sample=True, temperature=1.0)

In [None]:
_ = generate(prompt, system_context, do_sample=True, temperature=1.0)

### Creativity and Accuracy

In conclusion, while random generation, controlled through the **temperature**, is useful for producing unique or imaginative outputs, it can reduce the precision and accuracy of responses. When the model must give precise or factually accurate answers, raise the **temperature** with caution and review the outputs thoroughly.

---

## Review

In this module, we covered the following key concepts:

- **Sampling:** This is a fundamental process in text generation where the language model chooses the next word (token) based on a probability distribution across the vocabulary.
- **Temperature:** Considered a hyperparameter, temperature governs the degree of randomness in the model's predictions during sampling. Elevating the temperature enhances diversity, resulting in a broader array of responses, while lowering it renders the model's output more predictable and conservative.

---

## Optional Advanced Exercises

Let's delve into the creation of interacting characters:

- **Make Interacting Characters:** Now that you've mastered generating statements from fictional personas, let's explore crafting a system that generates multiple distinct personalities and facilitates interactions among them.
- **Make Interacting Characters Play a Game:** Building upon the previous exercise, expand your system to include multiple characters working towards a shared objective defined by a "game" of your design. This could entail tasks such as persuading another character to utter a specific word, divulging the location of a hidden treasure, or even engaging in collaborative endeavors where characters must cooperate to achieve a common goal. For an added challenge, consider introducing more than two characters or even organizing them into teams, fostering interactions among multiple players.
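
As one possible starting point for these exercises, the sketch below alternates between personas, feeding each character the previous speaker's line. It is only an outline under stated assumptions: `run_dialogue`, `fake_generate`, and the two personas are hypothetical names invented here, and `generate_fn` is assumed to accept a question and a system context the way this notebook's `generate` does.

```python
def run_dialogue(generate_fn, personas, opening_line, turns=4):
    """Alternate between personas; each one replies to the previous speaker's line.

    generate_fn(question, system) -> str is assumed to behave like the notebook's generate.
    personas maps a character name to that character's system context.
    """
    names = list(personas)
    transcript = []
    last_line = opening_line
    for turn in range(turns):
        speaker = names[turn % len(names)]  # round-robin over the characters
        reply = generate_fn(last_line, personas[speaker])
        transcript.append((speaker, reply))
        last_line = reply  # the next character responds to this reply
    return transcript


# Stand-in for the real model so the sketch runs without a GPU
def fake_generate(question, system):
    return f"[{system}] heard: {question[:20]}"


personas = {"Ava": "cheerful pirate", "Ben": "grumpy wizard"}
transcript = run_dialogue(fake_generate, personas, "Where is the treasure?", turns=4)

for speaker, line in transcript:
    print(f"{speaker}: {line}")
```

To try it with the real model, `fake_generate` could be swapped for something like `lambda q, s: generate(q, s, do_sample=True, temperature=0.8)`, so that each character's replies vary from run to run.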

---

## Restart the Kernel

In [None]:
from IPython import get_ipython

get_ipython().kernel.do_shutdown(restart=True)

## <FONT COLOR="red">The notebook is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/). This means that you can freely copy, distribute, and modify the notebook by the authors ([Balázs Harangi](https://inf.unideb.hu/dr-harangi-balazs), [András Hajdu](https://inf.unideb.hu/munkatars/4250), and [Róbert Lakatos](https://inf.unideb.hu/lakatos-robert-tanarseged)), but not for commercial purposes. Additionally, if you modify the notebook, you must cite them as the original creators and share the modified version under the same terms.
</FONT>