<center><font size=6>Hands-on: Text Generation with Transformers</font></center>

# **Installing and Importing Necessary Libraries**

In [None]:
# installing the transformers library
!pip install transformers -q

In [None]:
# to load transformer models
from transformers import T5Tokenizer, T5ForConditionalGeneration

# to load PyTorch, the deep learning library the model runs on
import torch

# **Generating Text with Transformers**

## **Step 1:** Load the pre-trained model and tokenizer


First, we load the pre-trained model and its tokenizer.

We'll be using the **Google FLAN-T5** model here.

💡 **FLAN-T5, developed by Google Research**, combines **"Fine-tuned LAnguage Net" (FLAN)** instruction tuning with the **"Text-To-Text Transfer Transformer" (T5)** architecture.

📊 **FLAN-T5 excels in various NLP tasks**, including translation and question answering, and it's known for its speed and efficiency.

📋 **FLAN-T5 comes in different sizes** (small, base, large, XL, and XXL), letting us trade output quality against memory and compute requirements.

🛠️ Potential use cases include text generation, classification, summarization, sentiment analysis, question-answering, translation, and chatbots.

In [None]:
# defining the model name
model_name = "google/flan-t5-large"

# The 'from_pretrained' method allows us to load a pre-trained tokenizer from Hugging Face's model hub.
tokenizer = T5Tokenizer.from_pretrained(model_name)

# Load the pre-trained model for text generation
model = T5ForConditionalGeneration.from_pretrained(model_name)



- **`T5Tokenizer`**: Converts text into numbers that the model can understand.
- **`T5ForConditionalGeneration`**: Loads the pretrained model for generating text.

## **Step 2:** Define the input

Next, we define the input for the model.

In [None]:
input_text = "List the steps to prepare lasagna."

## **Step 3:** Tokenize the input

Next, we tokenize the input to make it ready to be passed to the model.

In [None]:
# tokenizing the input
input_ids = tokenizer.encode(input_text, return_tensors='pt')

- **`tokenizer.encode()`**: Converts human-readable text into numeric token IDs that the model understands. Each token represents a piece of a word or an entire word; this is how the model processes language.
    - **`input_text`**: This is the text input for the model.
    - **`return_tensors='pt'`**: This specifies that the output is to be formatted as a PyTorch tensor (`'pt'` stands for PyTorch).

Let's check the output of tokenization.

In [None]:
print("Token IDs:", input_ids)

Token IDs: tensor([[ 6792,     8,  2245,    12,  2967, 12031, 11260,     5,     1]])


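- As we can see, the input text is broken down into a sequence of tokens, and the ID of each token is displayed.
- The final ID, `1`, is the end-of-sequence token (`</s>`) that the tokenizer appends automatically.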
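
To make the mapping from text to IDs concrete, here is a minimal sketch that reproduces the IDs printed above with a toy lookup table. `TOY_VOCAB` and `toy_encode` are hypothetical names, and the word-to-ID alignment is inferred from the printed output only. The real `T5Tokenizer` uses a SentencePiece subword model with a far larger vocabulary.

```python
# Toy lookup table (assumption: alignment inferred from the IDs printed above).
# A word can map to more than one ID when it is split into subword pieces.
TOY_VOCAB = {
    "List": [6792],
    "the": [8],
    "steps": [2245],
    "to": [12],
    "prepare": [2967],
    "lasagna": [12031, 11260],  # two subword pieces
    ".": [5],
}
EOS_ID = 1  # end-of-sequence token ID used by T5


def toy_encode(text):
    """Convert text to token IDs using the toy table and append the EOS ID."""
    # Separate the final period so it is looked up as its own token
    words = text.replace(".", " .").split()
    ids = []
    for word in words:
        ids.extend(TOY_VOCAB[word])
    ids.append(EOS_ID)
    return ids


toy_ids = toy_encode("List the steps to prepare lasagna.")
print("Toy token IDs:", toy_ids)
```

The input has six words and a period, but the sequence has nine IDs: one word is split into two pieces, and the end-of-sequence ID is appended at the end.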

## **Step 4:** Pass tokenized input to the model

Next, we pass the tokenized input to the model for generating outputs.

In [None]:
# generating model output
output_tokens = model.generate(input_ids, max_length=256, num_return_sequences=1)

- **`input_ids`**
  - Tokenized input text

- **`max_length=256`**
  - Sets the maximum length (in tokens) of the generated output
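  - For an encoder-decoder model like FLAN-T5, this limit applies to the generated (decoder) sequence only; the input tokens are not counted (for decoder-only models, it covers the prompt plus the generated tokens)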
  - Range of the values depends on the required output size and the model's preset limits

- **`num_return_sequences=1`**
  - Tells the model how many different outputs to generate for the same input.
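  - Values greater than 1 require sampling (`do_sample=True`) or beam search (with `num_beams` at least as large as `num_return_sequences`); the default greedy decoding supports only 1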
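
The role of `max_length` as a stopping condition can be sketched with a toy decoding loop. Everything here is hypothetical: `toy_next_token` replays a fixed script instead of running a model, and the loop only mimics the two ways generation can stop (an end-of-sequence token or the length cap). It is not how `generate` is implemented internally.

```python
DECODER_START_ID = 0  # T5 starts decoding from the padding token (ID 0)
EOS_ID = 1            # end-of-sequence token ID used by T5


def toy_next_token(generated):
    """Hypothetical stand-in for the model: replays a fixed script of token IDs."""
    script = [304, 2967, 12031, 11260, EOS_ID]
    step = len(generated) - 1  # number of tokens produced so far
    return script[step] if step < len(script) else EOS_ID


def toy_generate(max_length):
    """Append one token at a time until EOS appears or max_length is reached."""
    generated = [DECODER_START_ID]
    while len(generated) < max_length:
        token = toy_next_token(generated)
        generated.append(token)
        if token == EOS_ID:
            break
    return generated


full_output = toy_generate(max_length=256)  # stops early at EOS
capped_output = toy_generate(max_length=3)  # cut off by the length cap
print("Stopped at EOS:", full_output)
print("Stopped at cap:", capped_output)
```

In this toy, a generous cap has no effect because the end-of-sequence token arrives first, while a small cap truncates the output mid-sentence.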

Let's check the output from the model.

In [None]:
print("Output Tokens:", output_tokens)

Output Tokens: tensor([[    0,   304,  2967, 12031, 11260,     6,    25,   166,   174,    12,
          2967,     8, 12031, 11260,  3837,     5,    86,     3,     9,   508,
         22869,     6,  1678,     8,  7994,  1043,   147,  2768,  1678,     5,
          2334,     8, 12031, 11260, 22200,    11,  3989,   552,    79,    33,
           491,   177,    17,    15,     6,    81,   305,   676,     5,  2334,
             8,  3285,    11,  3989,   552,    34,    19,     3, 19293,     6,
            81,   204,   676,     5,  2334,     8,  3837,    11,  3989,   552,
            34,    19,  4126,  4632,     6,    81,   204,   676,     5,  2334,
             8,   260,  2687,   152,  3285,    11,  3989,   552,    34,    19,
             3, 19293,     6,    81,   204,   676,     5,  2334,     8,   260,
          8887,    11,  3989,   552,    34,    19,     3,   210,   173,  1054,
             6,    81,   204,   676,     5,  2334,     8,     3,    52,    23,
         10405,     9,    11,  3989, 

- As we can see, the model output is a tensor of token IDs.
- It begins with `0`, the padding token that T5 uses as the starting token for decoding.

## **Step 5:** Decode the generated output

Finally, we decode the model's output token IDs to obtain a human-readable text.

In [None]:
# decode the generated output from token IDs to human-readable text
# 'skip_special_tokens=True' ensures that special tokens are omitted from the decoded output
generated_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

In [None]:
print("Generated Text: ", generated_text)

Generated Text:  To prepare lasagna, you first need to prepare the lasagna sauce. In a large skillet, heat the olive oil over medium heat. Add the lasagna noodles and cook until they are al dente, about 5 minutes. Add the cheese and cook until it is melted, about 2 minutes. Add the sauce and cook until it is thickened, about 2 minutes. Add the parmesan cheese and cook until it is melted, about 2 minutes. Add the parsley and cook until it is wilted, about 2 minutes. Add the ricotta and cook until it is melted, about 2 minutes. Add the ricotta and cook until it is melted, about 2 minutes. Add the parmesan cheese and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes.


- As we can see, the model has listed out steps for preparing the specified dish, though the later sentences repeat themselves.
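
The effect of `skip_special_tokens` can be illustrated with a toy decoder. This is a sketch under stated assumptions: `TOY_PIECES` and `toy_decode` are hypothetical, and the piece strings are inferred from the first few IDs of the output above. In SentencePiece vocabularies, a leading `▁` (written `\u2581` below) marks the start of a new word.

```python
# Toy ID-to-piece table (assumption: pieces inferred from the output shown earlier)
TOY_PIECES = {
    0: "<pad>",
    1: "</s>",
    304: "\u2581To",
    2967: "\u2581prepare",
    12031: "\u2581las",
    11260: "agna",
}
SPECIAL_IDS = {0, 1}  # padding and end-of-sequence


def toy_decode(ids, skip_special_tokens=True):
    """Join subword pieces back into text, optionally dropping special tokens."""
    pieces = [
        TOY_PIECES[i]
        for i in ids
        if not (skip_special_tokens and i in SPECIAL_IDS)
    ]
    # Pieces are concatenated directly; the word-start marker becomes a space
    return "".join(pieces).replace("\u2581", " ").strip()


sample_ids = [0, 304, 2967, 12031, 11260, 1]
clean_text = toy_decode(sample_ids)
raw_text = toy_decode(sample_ids, skip_special_tokens=False)
print(clean_text)
print(raw_text)
```

Without the flag, the markers for padding and end-of-sequence stay in the text, which is rarely what we want to show to a reader.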

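**Note**: By default, `generate` uses greedy decoding, so repeated runs with the same input typically return the same output. The output varies between executions when sampling is enabled (for example, with `do_sample=True`).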
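
Whether the output varies depends on how the next token is chosen at each step. The sketch below contrasts always picking the most likely token (greedy) with drawing a token at random according to its probability (sampling). The distribution `TOY_PROBS` and both helper functions are hypothetical and stand in for a single decoding step.

```python
import random

# Hypothetical next-token probabilities for a single decoding step
TOY_PROBS = {"sauce": 0.5, "noodles": 0.3, "cheese": 0.2}


def pick_greedy(probs):
    """Always return the token with the highest probability."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Greedy selection gives the same answer every time
greedy_picks = {pick_greedy(TOY_PROBS) for _ in range(5)}

# Sampling can give different answers from one draw to the next
rng = random.Random(0)
sampled_picks = [pick_sampled(TOY_PROBS, rng) for _ in range(20)]

print("Greedy picks:", greedy_picks)
print("Sampled picks:", sampled_picks)
```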

## **Stitching Everything Together**

To make the process reusable and efficient, we encapsulate the entire sequence of operations above into a function. This function will

1. take an input text
2. perform necessary processing, and
3. output a human-readable response

In [None]:
def generate_response(input_text, max_length=256, num_return_sequences=1):
    """
    Generates text responses to a given input using a language model.

    Parameters:
    - input_text (str): The input text for the model.
    - max_length (int): Maximum length (in tokens) of each generated response.
    - num_return_sequences (int): Number of different responses to generate for the same input.

    Returns:
    - list[str]: The generated text responses, one per returned sequence.
    """
    # Encode the input text into token IDs and place them on the same device as the model
    input_ids = tokenizer(input_text, return_tensors='pt').input_ids.to(model.device)

    # Generate the output tokens from the model
    output_tokens = model.generate(input_ids, max_length=max_length, num_return_sequences=num_return_sequences)

    # Decode the generated tokens into human-readable text
    responses = []

    # Iterate over all responses generated, decode them one at a time, and then add them to the response list
    for generated_sequence in output_tokens:
        generated_text = tokenizer.decode(generated_sequence, skip_special_tokens=True)
        responses.append(generated_text)

    return responses

Let's test our function.

In [None]:
input_text = "List the steps to prepare lasagna."
response = generate_response(input_text, max_length=300)

print(f"Question: {input_text}")
print(f"Response: {response[0]}")

Question: List the steps to prepare lasagna.
Response: To prepare lasagna, you first need to prepare the lasagna sauce. In a large skillet, heat the olive oil over medium heat. Add the lasagna noodles and cook until they are al dente, about 5 minutes. Add the cheese and cook until it is melted, about 2 minutes. Add the sauce and cook until it is thickened, about 2 minutes. Add the parmesan cheese and cook until it is melted, about 2 minutes. Add the parsley and cook until it is wilted, about 2 minutes. Add the ricotta and cook until it is melted, about 2 minutes. Add the ricotta and cook until it is melted, about 2 minutes. Add the parmesan cheese and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about 2 minutes. Add the parsley and cook until it is melted, about

# **More Examples**

**Let's take a look at a few more examples.**

In [None]:
input_text = "What is the capital of France?"
response = generate_response(input_text)

print(f"Question: {input_text}")
print(f"Response: {response[0]}")

Question: What is the capital of France?
Response: paris


- The model provided a crisp and correct answer.

In [None]:
input_text = "Provide a brief overview of NLP."
response = generate_response(input_text)

print(f"Question: {input_text}")
print(f"Response: {response[0]}")

Question: Provide a brief overview of NLP.
Response: NLP is a branch of computer science that uses machine learning to improve the performance of human decision-making processes.


- Again, the model output is concise, although the definition is imprecise: NLP is concerned with processing and understanding human language.

In [None]:
input_text = "Provide a brief overview of attention."
response = generate_response(input_text)

print(f"Question: {input_text}")
print(f"Response: {response[0]}")

Question: Provide a brief overview of attention.
Response: Attention is the ability of a person to focus attention on a particular stimulus.


- The model's output is factually correct.
- However, the generated text is not related to the attention mechanism in transformers.
- This is because ***the input didn't specify*** that the output should be in the context of transformers.
    - The model can generate text, but it ***doesn't know what we're expecting from it implicitly***.
- We need to be ***clear and specific with the input***.

In [None]:
input_text = "Provide a brief overview of attention in the context of the transformer neural network architecture."
response = generate_response(input_text)

print(f"Question: {input_text}")
print(f"Response: {response[0]}")

Question: Provide a brief overview of attention in the context of the transformer neural network architecture.
Response: Attention is the ability to focus attention on a particular stimulus or object. The Transformer architecture uses a transformer architecture to achieve this goal.


- While the output provides a surface-level overview of the attention mechanism in transformers to some extent, it also contains parts that are unclear/incorrect.
- This likely occurred because the ***model's training data did not cover this topic in enough depth***.
- We would need to ***provide more context to the model***.

In [None]:
input_text = """
The attention mechanism in transformers is a technique that enables models to assign weights to the importance of different tokens in a sequence when embedding or generating text.
Instead of processing the input sequentially one token at a time, attention allows the model to consider all tokens simultaneously by focusing on relevant parts of the input.
It computes attention scores based on the relationships between tokens, which helps identify contextual dependencies. This results in an improved understanding of language context and meaning.
Self-attention assesses connections within the input sequence and multi-head attention allows the model to capture diverse relationships by using multiple attention heads simultaneously.

Based on the above information, provide a brief overview of attention in the context of the transformer neural network architecture.
"""
response = generate_response(input_text)

print(f"Question: {input_text}")
print(f"Response: {response[0]}")

Question: 
The attention mechanism in transformers is a technique that enables models to assign weights to the importance of different tokens in a sequence when embedding or generating text.
Instead of processing the input sequentially one token at a time, attention allows the model to consider all tokens simultaneously by focusing on relevant parts of the input.
It computes attention scores based on the relationships between tokens, which helps identify contextual dependencies. This results in an improved understanding of language context and meaning.
Self-attention assesses connections within the input sequence and multi-head attention allows the model to capture diverse relationships by using multiple attention heads simultaneously.

Based on the above information, provide a brief overview of attention in the context of the transformer neural network architecture.

Response: The attention mechanism in transformers is a technique that enables models to assign weights to the importanc

- Given more context, the model generated a more accurate output that is closer to what we wanted.
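
The pattern used above (background context first, then the instruction) can be wrapped in a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, and the joining phrase simply mirrors the wording of the example input.

```python
def build_prompt(context, instruction):
    """Combine background context and an instruction into a single input text."""
    return f"{context.strip()}\n\nBased on the above information, {instruction.strip()}"


# A shortened version of the context used in the example above
context = (
    "The attention mechanism in transformers is a technique that enables models "
    "to assign weights to the importance of different tokens in a sequence when "
    "embedding or generating text."
)
instruction = (
    "provide a brief overview of attention in the context of the "
    "transformer neural network architecture."
)

prompt = build_prompt(context, instruction)
print(prompt)
```

The resulting string can be passed to `generate_response` like any other input text.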

# **Conclusion**

We learned how to utilize transformers to generate text.
- An input text is first tokenized and then passed to the model.
- The model's generated output is then decoded to obtain a human-readable output.
- Generation parameters like `max_length` allow us to cap the length of the generated output.

**Remember!**

While Generative AI models can help in generating text or answering questions, it's important to ask the question in the right manner and provide the right context for the model to build upon.

In the upcoming weeks, we'll learn more about:

1. How to engineer the input to a language model to obtain outputs aligned with business goals?
2. How to provide relevant context to a language model via documents to generate context-relevant output?

<font size=6 color='navyblue'>Power Ahead!</font>
___