# **Introduction**

## **Objective**
This notebook explores how different transformer architectures—encoder-only, decoder-only, and encoder-decoder—are applied to various NLP tasks. Through hands-on examples, we will compare their performance and limitations to better understand their strengths and use cases.

## **Overview of Transformer Architectures**

- **Encoder-Only Models (e.g., BERT)**:
  - Specialized for understanding and extracting meaning from input text.
  - Commonly used for tasks like text classification, sentiment analysis, and feature extraction.

- **Decoder-Only Models (e.g., GPT-2)**:
  - Designed for generating coherent sequences of text.
  - Best suited for tasks like text generation and auto-completion.

- **Encoder-Decoder Models (e.g., T5, BART)**:
  - Built for sequence-to-sequence tasks like summarization and translation.
  - They first encode the input text into contextual representations, then decode those representations into the desired output.

Transformers are flexible, but no single architecture is optimal for all tasks.

| **Architecture** | **Examples**             | **Main Applications**                   |
|------------------|--------------------------|-----------------------------------------|
| Encoder-Only     | BERT, RoBERTa            | Sentiment analysis, text classification |
| Decoder-Only     | GPT-2, GPT-4o, Llama 3.x | Text generation, creative writing       |
| Encoder-Decoder  | T5, BART                 | Translation, summarization              |
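
One concrete way to see the difference between the architectures is the attention mask each one uses. The sketch below is illustrative only and does not call any library: the helper names `bidirectional_mask` and `causal_mask` are made up for this example. Encoder-style models let every token attend to every other token, while decoder-style models let a token attend only to itself and earlier positions. Encoder-decoder models use the first pattern in the encoder and the second in the decoder.

```python
def bidirectional_mask(n):
    """Encoder-style: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


tokens = ["renewable", "energy", "reduces", "costs"]
masks = {
    "encoder (bidirectional)": bidirectional_mask(len(tokens)),
    "decoder (causal)": causal_mask(len(tokens)),
}

# Print one row per token: a 1 means "this token can see that position"
for name, mask in masks.items():
    print(name)
    for tok, row in zip(tokens, mask):
        print(f"  {tok:<10} {row}")
```

The triangular shape of the causal mask is what makes decoder-only models natural text generators: each token is predicted from the tokens before it.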


# **Task 1: Sentiment Analysis with a Pre-trained RoBERTa Model (BERT Family)**

In [1]:
# Import the required libraries
from transformers import pipeline

# Step 1: Load the pre-trained sentiment analysis pipeline
sentiment_analyzer = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-sentiment")

# Step 2: Define input texts for analysis
print("Defining input texts...")
texts = [
    "I love using this library! It's so intuitive and powerful.",
    "This is the worst app I've ever used. Completely useless.",
    "It's okay, does the job but nothing extraordinary.",
    "Absolutely fantastic! Highly recommended to everyone.",
    "Terrible experience, would never use it again."
]
print(f"Input Texts: {texts}\n")

# Step 3: Perform sentiment analysis
print("Performing sentiment analysis...")
results = sentiment_analyzer(texts)

# Step 4: Map labels to human-readable sentiments
label_mapping = {
    "LABEL_0": "negative",
    "LABEL_1": "neutral",
    "LABEL_2": "positive"
}

# Step 5: Display results with human-readable labels
print("\n--- Sentiment Analysis Results ---\n")
for text, result in zip(texts, results):
    sentiment = label_mapping.get(result['label'], "unknown")  # Map label to sentiment
    score = result['score']
    print(f"Text: {text}")
    print(f"  Sentiment: {sentiment} | Confidence: {score:.2f}\n")


Device set to use mps:0


Defining input texts...
Input Texts: ["I love using this library! It's so intuitive and powerful.", "This is the worst app I've ever used. Completely useless.", "It's okay, does the job but nothing extraordinary.", 'Absolutely fantastic! Highly recommended to everyone.', 'Terrible experience, would never use it again.']

Performing sentiment analysis...

--- Sentiment Analysis Results ---

Text: I love using this library! It's so intuitive and powerful.
  Sentiment: positive | Confidence: 0.99

Text: This is the worst app I've ever used. Completely useless.
  Sentiment: negative | Confidence: 0.98

Text: It's okay, does the job but nothing extraordinary.
  Sentiment: neutral | Confidence: 0.58

Text: Absolutely fantastic! Highly recommended to everyone.
  Sentiment: positive | Confidence: 0.98

Text: Terrible experience, would never use it again.
  Sentiment: negative | Confidence: 0.98
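
The confidence values above are probabilities rather than raw model outputs. As a rough sketch of what happens behind the pipeline for a single-label classifier, the model produces one logit per class, a softmax turns the logits into probabilities, and the highest one is reported. The logits below are made-up numbers for illustration, and `top_label` is a hypothetical helper, not part of the `transformers` API.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


label_mapping = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def top_label(logits):
    """Return the most likely label and its probability, in the same shape as the pipeline result."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": f"LABEL_{best}", "score": probs[best]}


example_logits = [-1.2, 0.3, 2.9]  # made-up logits for (negative, neutral, positive)
result = top_label(example_logits)
print(f"Sentiment: {label_mapping[result['label']]} | Confidence: {result['score']:.2f}")
```

A low score such as the 0.58 for the "It's okay" sentence means the probability mass is spread across classes, so the label should be read with caution.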



# **Task 2: Text Generation with GPT-2**

In [2]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained GPT-2 model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Input text (prompt for generation)
input_text = "Please translate 'Artificial Intelligence' into German."

# Encode the input text and convert it to tensor format
inputs = tokenizer.encode(input_text, return_tensors="pt")

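# Generate the output text based on the input.
# Note: without do_sample=True, generate() uses greedy decoding, so temperature=0.7 has no effect here.
outputs = model.generate(inputs, max_length=160, num_return_sequences=1, no_repeat_ngram_size=2, temperature=0.7)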

# Decode the generated token IDs back to human-readable text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Print the generated text
print(f"Generated Text: {generated_text}")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Generated Text: Please translate 'Artificial Intelligence' into German.

The German language is a very important part of the world. It is the language of many cultures. The German people are very good at it. They are the best at reading and writing. In fact, they are so good that they can write in English. But they don't understand English very well. So they write it in German, and then they translate it into English, which is very difficult. And then, of course, the English language has a lot of problems. For example, it is not very easy to understand. You have to learn how to read. There are many problems with it, but it's very hard to translate. I think that's why the German is so important. Because it has so many languages.
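
The `generate` call above passes `temperature=0.7`, so it is worth sketching what temperature does. The example below is a plain-Python illustration with made-up logits, not the library's implementation: dividing the logits by a temperature below 1 sharpens the distribution, a temperature above 1 flattens it, and the most likely token stays the same in both cases. That is why temperature only matters when tokens are sampled rather than picked greedily.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up next-token scores for three candidate tokens

sharp = softmax_with_temperature(logits, temperature=0.7)
flat = softmax_with_temperature(logits, temperature=1.5)

# Greedy decoding always takes the argmax, which temperature does not change
greedy_choice = max(range(len(logits)), key=logits.__getitem__)

print("T=0.7:", [round(p, 3) for p in sharp])
print("T=1.5:", [round(p, 3) for p in flat])
print("Greedy choice (index):", greedy_choice)
```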


# **Pipeline of Tasks with Different Transformer Architectures**

### Pipeline Overview

This pipeline chains several Transformer models so that each step consumes the output of an earlier one. Each step showcases a different capability of Transformer architectures, highlighting their strengths and versatility.

#### Key Steps:
1. **Fill-Mask Task**: Use a masked language model (BERT) to predict the missing word in a sentence.
2. **Text Generation**: Generate a continuation of the unmasked sentence using a decoder-only model (GPT-2).
3. **Sentiment Analysis**: Analyze the sentiment of the generated text using a text classification model.
4. **Summarization**: Summarize the generated text using an encoder-decoder model (BART).
5. **Question Generation**: Generate a meaningful question based on the summary using an external language model (Llama 3.2, called through Ollama).
6. **Question Answering**: Answer the generated question using a pre-trained question-answering model (BERT fine-tuned on SQuAD).

This pipeline illustrates how Transformer models complement each other to solve complex tasks, emphasizing the interplay between different architectures like encoder-only, decoder-only, and encoder-decoder models.
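
The data flow between the steps can be sketched without loading any model. In the example below, `run_steps` is a hypothetical helper and the lambdas are trivial placeholders standing in for the real models, so only the wiring is meaningful: every step reads from a shared dictionary and adds one new entry to it.

```python
def run_steps(steps, state):
    """Run (name, function) pairs in order; each function reads the shared state and returns one new value."""
    for name, step in steps:
        state[name] = step(state)
        print(f"{name}: {state[name]}")
    return state


# Placeholder functions (NOT real models), used only to show which step feeds which
steps = [
    ("unmasked", lambda s: s["masked"].replace("[MASK]", "costs")),   # stands in for fill-mask
    ("generated", lambda s: s["unmasked"] + " <continuation>"),       # stands in for text generation
    ("sentiment", lambda s: f"label for: {s['generated'][:20]}..."),  # stands in for classification
    ("summary", lambda s: s["generated"].split(" <")[0]),             # stands in for summarization
]

state = run_steps(steps, {"masked": "Renewable energy reduces [MASK]."})
```

Keeping the intermediate results in one dictionary makes it easy to inspect what each model received when a later step produces a surprising output.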


### Step 0: Setup and Initialization

Before diving into the pipeline tasks, we need to set up the environment and initialize the required libraries and settings. This includes:

1. **Importing Necessary Libraries**:
   - `subprocess`: To interact with external tools (e.g., Ollama for advanced question generation).
   - `os`: For managing environment settings.
   - `transformers.pipeline`: The Hugging Face pipeline API to access pre-trained models for various NLP tasks.

2. **Disabling Tokenizer Parallelism**:
   - To avoid potential warnings or conflicts during execution, we explicitly disable tokenizer parallelism.

In [4]:
# Import required libraries and disable tokenizer parallelism
import subprocess
from transformers import pipeline
import os

os.environ["TOKENIZERS_PARALLELISM"] = "false"
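
Because a later step shells out to the `ollama` command, it can help to check up front whether that executable is on the `PATH`. The snippet below is a small sketch using only the standard library; `tool_available` is a hypothetical helper name introduced here.

```python
import shutil


def tool_available(name):
    """Return True if an executable with this name can be found on the PATH."""
    return shutil.which(name) is not None


if tool_available("ollama"):
    print("ollama found: question generation can call it.")
else:
    print("ollama not found: install it before running the question generation step.")
```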


### Step 1: Fill-Mask Task
The fill-mask task uses a masked language model (BERT) to predict the missing word in a sentence.
- **Input**: A sentence with a masked word (`[MASK]`).
- **Output**: The sentence with the mask replaced by the most likely word.


In [3]:
print("Step 1: Performing fill-mask task...")
mask_filler = pipeline("fill-mask", model="bert-base-uncased")
masked_sentence = "Renewable energy reduces [MASK]."
fill_results = mask_filler(masked_sentence, top_k=1)
unmasked_sentence = fill_results[0]['sequence']
print(f"Input Sentence: {masked_sentence}")
print(f"Unmasked Sentence: {unmasked_sentence}\n")


Step 1: Performing fill-mask task...


BertForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another archite

Input Sentence: Renewable energy reduces [MASK].
Unmasked Sentence: renewable energy reduces costs.
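
Each fill-mask candidate is returned as a dictionary with fields such as `score`, `token_str`, and `sequence`. The sketch below mimics the selection step with made-up scores and a hypothetical `fill_mask` helper: it picks the highest-scoring candidate and substitutes it for the mask token. Unlike the pipeline's `sequence` field, which is lowercased here because `bert-base-uncased` lowercases its input, this version keeps the original casing.

```python
def fill_mask(sentence, candidates, mask_token="[MASK]"):
    """Replace the first mask token with the highest-scoring candidate word."""
    best = max(candidates, key=lambda c: c["score"])
    return sentence.replace(mask_token, best["token_str"], 1)


# Made-up candidates and scores, shaped like the pipeline's output entries
candidates = [
    {"token_str": "costs", "score": 0.21},
    {"token_str": "emissions", "score": 0.18},
    {"token_str": "pollution", "score": 0.09},
]

filled = fill_mask("Renewable energy reduces [MASK].", candidates)
print(filled)
```

Raising `top_k` in the real call returns more candidates, which is useful for checking how close the runner-up predictions are.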



### Step 2: Text Generation
This step uses a decoder-only model (GPT-2) to generate a continuation of the unmasked sentence.
- **Input**: The output of the fill-mask task.
- **Output**: A generated continuation of the sentence.


In [5]:
print("Step 2: Generating text with GPT-2...")
text_generator = pipeline("text-generation", model="gpt2")
generated_texts = text_generator(unmasked_sentence, max_length=100, num_return_sequences=1, truncation=True)
generated_text = generated_texts[0]['generated_text']
print(f"Generated Text: {generated_text}\n")


Step 2: Generating text with GPT-2...


Device set to use mps:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text: renewable energy reduces costs. So far, however, efforts have been in vain to get nuclear power plants to meet the energy needs of the next generation.

The U.S. Energy Department, after years of neglect, began to act and move forward with a policy aimed at reducing greenhouse gas emissions associated with coal-fired power plants on Wednesday.

The policy, known as Energy Policy Reform Act of 2015 (EPRA), lays out clear rules for government and utilities to establish a
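
The generated text repeats the prompt and stops mid-sentence once the length limit is reached. A possible post-processing step, sketched below with hypothetical helper names and a shortened sample string, separates the continuation from the prompt and trims it to the last complete sentence. The sentence detection is deliberately naive and will cut at the wrong place when the text ends on an abbreviation such as "U.S.".

```python
def split_continuation(prompt, generated_text):
    """Return only the newly generated part, assuming the output starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


def trim_to_last_sentence(text):
    """Cut the text after its last '.', '!' or '?'; return it unchanged if none is found."""
    cut = max(text.rfind(ch) for ch in ".!?")
    return text[:cut + 1] if cut != -1 else text


prompt = "renewable energy reduces costs."
sample = "renewable energy reduces costs. So far, efforts have been in vain. The policy lays out clear rules to establish a"

continuation = split_continuation(prompt, sample)
trimmed = trim_to_last_sentence(continuation)
print(trimmed)
```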



### Step 3: Sentiment Analysis

The sentiment analysis task evaluates the emotional tone of the generated text. This step uses a text classification model fine-tuned on SST-2, which labels the text as **POSITIVE** or **NEGATIVE** (it has no neutral class).

- **Input**: The generated text from Step 2.
- **Output**: Sentiment label and confidence score.

In [7]:
print("Step 3: Analyzing sentiment...")
sentiment_analyzer = pipeline("sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")
sentiment_results = sentiment_analyzer(generated_text)
sentiment = sentiment_results[0]['label']
confidence = sentiment_results[0]['score']
print(f"Sentiment: {sentiment} | Confidence: {confidence:.2f}\n")

Step 3: Analyzing sentiment...


Device set to use mps:0


Sentiment: POSITIVE | Confidence: 0.93



### Step 4: Summarization

In this step, the generated text is summarized using an encoder-decoder model. Summarization compresses the information while retaining the key ideas.

- **Input**: The generated text from Step 2.
- **Output**: A concise summary of the input text.

In [8]:
print("Step 4: Summarizing the text...")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=-1)  # Force CPU

try:
    summary = summarizer(generated_text, max_length=100, min_length=10, do_sample=False)
    summary_text = summary[0]['summary_text']
    print(f"Summary: {summary_text}")
except Exception as e:
    print(f"Summarization failed: {e}")
    summary_text = "No summary available."

Step 4: Summarizing the text...


Device set to use cpu


Summary: U.S. Energy Department moves forward with policy aimed at reducing greenhouse gas emissions associated with coal-fired power plants. Energy Policy Reform Act of 2015 lays out clear rules for government and utilities to establish a renewable energy program.


### Step 5: Question Generation

Based on the summary, a meaningful question is generated. This task uses an external language model (Llama 3.2, run locally through the Ollama CLI) to create a question that aligns with the context of the summary.

- **Input**: Summary text from Step 4.
- **Output**: A generated question based on the summary.

In [9]:
def generate_question_with_instruction(summary_text):
    print("Generating a question based on the summary...")
    instruction = f"Generate a simple question based on the following summary:\n\n{summary_text}\n\nQuestion:"
    try:
        # Using a subprocess to call Ollama CLI with specific instruction
        command = ["ollama", "run", "llama3.2"]
        result = subprocess.run(command, input=instruction, text=True, capture_output=True, check=True)
        question = result.stdout.strip()
        print(f"Generated Question: {question}")
        return question
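    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        # CalledProcessError: Ollama exited with an error; FileNotFoundError: the `ollama` binary is not installed
        print(f"Error using Ollama: {e}")
        return "What are the key points mentioned in the summary?"  # Fallback question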

question = generate_question_with_instruction(summary_text)

Generating a question based on the summary...
Generated Question: What is the main goal of the U.S. Energy Department's new policy regarding coal-fired power plants?
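
Chat-style models sometimes wrap the question in extra text such as a preamble or quotation marks. The sketch below shows one possible cleanup, assuming the question is the first line that ends with a question mark. `extract_question` is a hypothetical helper, and its fallback reuses the fallback question from the cell above.

```python
def extract_question(raw_output, fallback="What are the key points mentioned in the summary?"):
    """Return the first line that looks like a question, or the fallback if none is found."""
    for line in raw_output.splitlines():
        line = line.strip().strip('"')
        if line.endswith("?"):
            return line
    return fallback


# Made-up raw output illustrating a preamble before the actual question
raw = 'Here is a simple question:\n\n"What is the main goal of the new policy?"\n'
cleaned = extract_question(raw)
print(cleaned)
```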


### Step 6: Question Answering

In the final step, a question-answering model (fine-tuned on SQuAD) is used to answer the generated question using the summary as context.

- **Input**:
  - Generated question from Step 5.
  - Summary text from Step 4 as context.
- **Output**: Answer to the question.

In [10]:
print("Step 6: Question answering task...")
qa_pipeline = pipeline("question-answering", model="google-bert/bert-large-uncased-whole-word-masking-finetuned-squad")

context = summary_text  # Using the summary as context for QA

qa_result = qa_pipeline(question=question, context=context)
answer = qa_result['answer']
print(f"Answer: {answer}\n")

Step 6: Question answering task...


Some weights of the model checkpoint at google-bert/bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


Answer: reducing greenhouse gas emissions
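
The answer is extractive: the model selects a span of the context instead of writing new text, and the pipeline's result dictionary also carries `start`, `end`, and `score` fields next to `answer`. The sketch below checks this property with plain string operations on the first sentence of the summary from Step 4. `locate_answer` is a hypothetical helper, not a library function.

```python
def locate_answer(context, answer):
    """Return (start, end) character offsets of the answer in the context, or None if absent."""
    start = context.find(answer)
    if start == -1:
        return None
    return start, start + len(answer)


context = (
    "U.S. Energy Department moves forward with policy aimed at reducing "
    "greenhouse gas emissions associated with coal-fired power plants."
)
answer = "reducing greenhouse gas emissions"

span = locate_answer(context, answer)
print(f"Answer span: {span}")
print(f"Text at span: {context[span[0]:span[1]]}")
```

If a question cannot be answered from the context, an extractive model still returns some span, so checking the `score` field is a reasonable safeguard.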



### Conclusion
This pipeline demonstrates how different Transformer architectures complement each other:
- **Encoder-only models** excel at understanding and classifying text.
- **Decoder-only models** generate fluent text, although a small model like GPT-2 may drift off topic or ignore instructions, as the translation prompt in Task 2 showed.
- **Encoder-decoder models** specialize in sequence-to-sequence tasks like summarization.

Combining these models lets each one handle the step it is best suited for in a complex, multi-step NLP workflow.