<a href="https://colab.research.google.com/github/mukeshrock7897/GenerativeAI/blob/main/1_Transformers_Beginner_Level.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction to Transformers**

**What are Transformers?**
* Transformers are a neural network architecture designed for sequential data, most notably in natural language processing (NLP). Unlike recurrent neural networks (RNNs), which consume a sequence one step at a time, transformers process all positions of the input in parallel. This makes training far more efficient on modern hardware and lets the models scale to much larger datasets and parameter counts.

**History and Evolution**
* Transformers were introduced by Vaswani et al. in the 2017 paper "Attention Is All You Need." The architecture has since reshaped NLP and underlies models such as BERT (encoder-only), GPT (decoder-only), and T5 (encoder-decoder).

**Key Concepts and Terminology**
* **Attention Mechanism:** A way for the model to weigh parts of the input sequence by their relevance to the element currently being processed.

* **Self-Attention:** Attention in which each element of the sequence attends to every element of the same sequence (including itself), so dependencies are captured regardless of distance.

* **Encoder-Decoder:** The original transformer architecture, in which an encoder processes the input and a decoder generates the output.

* **Multi-Head Attention:** Several attention operations (heads) run in parallel, each free to focus on a different part or aspect of the sequence.

**Self-Attention Mechanism**

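**Attention Mechanism Basics**
* The attention mechanism lets the model weigh the importance of different parts of the input sequence. It computes a weighted sum of the value vectors, where each weight comes from the similarity (typically a dot product) between the query and the corresponding key, normalized with a softmax. The original paper also divides the scores by the square root of the key dimension before the softmax.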

**Example of Attention Mechanism:**

In [1]:
import torch
import torch.nn.functional as F

# Define the query, key, and value matrices
query = torch.tensor([[1.0, 0.0, 1.0]])
key = torch.tensor([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
value = torch.tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

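# Compute dot-product scores between the query and every key
# (this toy example skips the 1/sqrt(d_k) scaling), then normalize with softmax
scores = torch.matmul(query, key.T)
attention_weights = F.softmax(scores, dim=-1)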

# Compute the weighted sum of the values
context_vector = torch.matmul(attention_weights, value)
print("Context Vector:", context_vector)


Context Vector: tensor([[0.3000, 0.4000]])
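
To make the arithmetic behind this result visible, here is a minimal sketch of the same computation in plain Python, with and without the 1/sqrt(d_k) scaling used in the original paper. The helper names `softmax` and `attention` are illustrative and do not belong to any library.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values, scale=True):
    # Similarity of the query to each key (dot product)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scale:
        # Divide by sqrt(d_k), where d_k is the query/key dimension
        scores = [s / math.sqrt(len(query)) for s in scores]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output component at a time
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context

# Same numbers as the PyTorch example
query = [1.0, 0.0, 1.0]
keys = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
values = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

for scale in (False, True):
    weights, context = attention(query, keys, values, scale=scale)
    print("scaled" if scale else "unscaled",
          [round(w, 4) for w in weights],
          [round(c, 4) for c in context])
```

In both cases the context vector comes out at about [0.3, 0.4]: the first and third keys score equally, and their values sit symmetrically around the second value. Scaling only flattens the weights, giving the middle key a slightly larger share.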


**Self-Attention and Its Importance**
* Self-attention relates every position of the input sequence to every other position when computing the sequence's representation. Because any two positions interact in a single step, long-range dependencies are easy to capture, and because all positions are computed at once, the operation parallelizes well.
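
As a rough illustration of the distance-independence point, the sketch below runs self-attention over four toy token vectors in plain Python. It assumes identity projections (queries, keys, and values are all the raw inputs), which a real layer would replace with learned matrices. The function name `self_attention` is made up for this example.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    # Assumption: identity projections, so queries = keys = values = x
    d = len(x[0])
    weights = []
    for q in x:
        # Every position scores itself against every position in the sequence
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    # Each output vector is a weighted mix of all input vectors
    outputs = [
        [sum(w * v[j] for w, v in zip(row, x)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs

# Four toy token embeddings; the first and last are identical
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
weights, outputs = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

The weight matrix has one row and one column per token. The first token attends to the last token exactly as strongly as to itself, even though they sit at opposite ends of the sequence, because the score depends only on content and not on distance.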

**Transformer Architecture**

**Encoder-Decoder Structure**
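* The transformer architecture consists of an encoder and a decoder:

     * **Encoder:** Processes the input sequence and produces a sequence of contextual representations, one vector per input token.
     * **Decoder:** Generates the output sequence one token at a time, attending to the encoder's representations and to the tokens it has already produced.

* Each encoder layer contains multi-head self-attention followed by a feed-forward network. Each decoder layer contains masked multi-head self-attention, attention over the encoder output (cross-attention), and a feed-forward network. Every sub-layer is wrapped in a residual connection and layer normalization.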
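
The data flow between encoder and decoder can be sketched in plain Python. This is a heavily simplified illustration, not a faithful layer: it assumes a single head with identity projections and omits the feed-forward networks and layer normalization. All function names (`attend`, `encoder_layer`, `decoder_layer`) are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values, causal=False):
    # Scaled dot-product attention; with causal=True, position i sees only positions <= i
    d = len(keys[0])
    out = []
    for i, q in enumerate(queries):
        limit = i + 1 if causal else len(keys)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys[:limit]]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values[:limit]))
                    for j in range(len(values[0]))])
    return out

def add(a, b):
    # Residual connection: element-wise sum of two sequences of vectors
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]

def encoder_layer(src):
    # Self-attention over the source (feed-forward and layer norm omitted)
    return add(src, attend(src, src, src))

def decoder_layer(tgt, memory):
    # 1) Masked self-attention over the target tokens produced so far
    h = add(tgt, attend(tgt, tgt, tgt, causal=True))
    # 2) Cross-attention: target queries, encoder outputs as keys and values
    return add(h, attend(h, memory, memory))

src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 source tokens
tgt = [[0.5, 0.5], [1.0, 0.0]]              # 2 target tokens so far

memory = encoder_layer(src)
out = decoder_layer(tgt, memory)
print(len(memory), "encoder vectors,", len(out), "decoder vectors")
```

The encoder returns one vector per source token, and the decoder returns one vector per target token. Because of the causal mask, the first decoder output does not change when later target tokens are appended.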

**Multi-Head Attention**
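* Multi-head attention lets the model focus on different parts of the sequence simultaneously. The queries, keys, and values are projected into several lower-dimensional heads, attention is computed in each head independently, and the head outputs are concatenated and projected back to the model dimension.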

**Example of Multi-Head Attention:**

In [2]:
import torch
import torch.nn as nn

# Define the multi-head attention layer
multihead_attn = nn.MultiheadAttention(embed_dim=64, num_heads=8)

# Create dummy input
query = torch.randn(10, 32, 64)  # (sequence_length, batch_size, embed_dim)
key = torch.randn(10, 32, 64)
value = torch.randn(10, 32, 64)

# Apply multi-head attention
attn_output, attn_output_weights = multihead_attn(query, key, value)
print("Attention Output Shape:", attn_output.shape)


Attention Output Shape: torch.Size([10, 32, 64])
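
With `embed_dim=64` and `num_heads=8`, each head works on 64 / 8 = 8 dimensions. The sketch below illustrates the split-attend-concatenate pattern in plain Python on a tiny 4-dimensional example with 2 heads. It assumes the learned input and output projections are the identity, which the real layer does not do, and the helper names are made up for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, keys, values):
    # Scaled dot-product attention for a single query vector
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

def split_heads(vec, num_heads):
    # Cut one vector into num_heads equal, contiguous slices
    assert len(vec) % num_heads == 0, "embed_dim must be divisible by num_heads"
    head_dim = len(vec) // num_heads
    return [vec[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]

def multi_head_attention(query, keys, values, num_heads):
    # Assumption: learned projections are omitted (treated as identity)
    q_heads = split_heads(query, num_heads)
    k_heads = [split_heads(k, num_heads) for k in keys]
    v_heads = [split_heads(v, num_heads) for v in values]
    output = []
    for h in range(num_heads):
        head_out = attention(q_heads[h],
                             [k[h] for k in k_heads],
                             [v[h] for v in v_heads])
        output.extend(head_out)  # concatenate the head outputs
    return output

query = [1.0, 0.0, 0.0, 1.0]
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
values = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

out = multi_head_attention(query, keys, values, num_heads=2)
print(len(out))  # 4: same width as the input
```

Here the first head matches the first key more closely and the second head matches the second key, so the two halves of the output lean toward different value vectors. That is the sense in which heads attend to different parts of the sequence at once.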


**Positional Encoding**
* Self-attention by itself is insensitive to token order, so positional encodings are added to the input embeddings to tell the model where each token sits in the sequence.

**Example of Positional Encoding:**



In [3]:
import numpy as np

def positional_encoding(seq_length, embed_dim):
    # Sinusoidal encoding: each (sin, cos) pair of dimensions shares one frequency
    pos_enc = np.zeros((seq_length, embed_dim))
    for pos in range(seq_length):
        for i in range(0, embed_dim, 2):
            angle = pos / (10000 ** (i / embed_dim))
            pos_enc[pos, i] = np.sin(angle)
            if i + 1 < embed_dim:  # guard against an odd embed_dim
                pos_enc[pos, i + 1] = np.cos(angle)
    return pos_enc

# Create positional encoding for a sequence of length 10 and embedding dimension 64
pos_enc = positional_encoding(10, 64)
print("Positional Encoding Shape:", pos_enc.shape)


Positional Encoding Shape: (10, 64)
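
To show what "added to the input embeddings" means in practice, here is a small plain-Python sketch that follows the sinusoidal formula from the original paper, where dimensions 2k and 2k+1 use the sine and cosine of the same angle. The all-zero embeddings are a hypothetical stand-in chosen so that the positional signal is easy to see, and `sinusoidal_position` is an illustrative name.

```python
import math

def sinusoidal_position(pos, embed_dim):
    # One row of the encoding: dimensions i and i+1 share the frequency 1 / 10000^(i / embed_dim)
    row = [0.0] * embed_dim
    for i in range(0, embed_dim, 2):
        angle = pos / (10000 ** (i / embed_dim))
        row[i] = math.sin(angle)
        if i + 1 < embed_dim:
            row[i + 1] = math.cos(angle)
    return row

embed_dim = 8
# Hypothetical token embeddings for a 3-token sequence (zeros, so only the position signal shows)
embeddings = [[0.0] * embed_dim for _ in range(3)]

# Element-wise sum of each embedding with the encoding for its position
encoded = [
    [e + p for e, p in zip(emb, sinusoidal_position(pos, embed_dim))]
    for pos, emb in enumerate(embeddings)
]
print(encoded[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

Position 0 yields alternating 0 and 1 because sin(0) = 0 and cos(0) = 1. At any position, each sine/cosine pair satisfies sin² + cos² = 1, which is a quick sanity check that both members of a pair use the same frequency.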


**Applications of Transformers in Generative AI**

**Text Generation**
* Transformers generate coherent, contextually relevant text by repeatedly predicting the next token given all of the tokens before it.

**Example of Text Generation:**

In [4]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Encode input text
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

# Generate text
output = model.generate(input_ids, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated Text:", generated_text)


Generated Text: Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a
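
The sample output repeats itself, which is consistent with greedy decoding: at every step the single most likely next token is chosen, so the model can fall into a loop. The toy sketch below reproduces that behaviour in plain Python. The probability table is entirely hypothetical, and unlike a real transformer it conditions only on the previous token rather than on the whole prefix.

```python
# Hypothetical next-token probabilities, invented for illustration only
NEXT_TOKEN_PROBS = {
    "the": {"world": 0.6, "danger": 0.4},
    "world": {"was": 0.9, "<eos>": 0.1},
    "was": {"a": 0.8, "the": 0.2},
    "a": {"place": 0.7, "time": 0.3},
    "place": {"the": 0.5, "<eos>": 0.3, "of": 0.2},
}

def greedy_generate(prompt, max_length):
    # Repeatedly append the most probable next token until max_length or end-of-sequence
    tokens = list(prompt)
    while len(tokens) < max_length:
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:
            break  # no known continuation for this token
        next_token = max(dist, key=dist.get)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(" ".join(greedy_generate(["the"], max_length=8)))
# the world was a place the world was
```

Sampling from the distribution instead of always taking the maximum is one common way to break such loops, at the cost of deterministic output.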


**Image Generation**
* Transformers can also generate images by treating an image as a sequence of pixels or patches, so that the same sequence-modeling machinery applies.
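
As a minimal sketch of the "sequence of patches" idea, the plain-Python function below cuts a small grayscale grid into non-overlapping square patches and flattens each one into a vector, giving a sequence a transformer could consume. The function name and the 4x4 toy image are assumptions made for illustration.

```python
def image_to_patches(image, patch_size):
    # Split an H x W grid into non-overlapping patches, each flattened in row-major order
    height, width = len(image), len(image[0])
    assert height % patch_size == 0 and width % patch_size == 0
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

# Toy 4x4 "image" whose pixel values are 0..15
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = image_to_patches(image, patch_size=2)

print(len(patches))  # 4 patches, i.e. a sequence of length 4
print(patches[0])    # [0, 1, 4, 5]
```

Each flattened patch plays the role that a token embedding plays for text. In a real model the patches would then be projected to the model dimension and combined with positional encodings.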

**Other Applications**
* Transformers are also used in other generative tasks, such as music generation and code generation.

**Basic Implementation**

**Installing Necessary Packages**
* To work with transformers, install the `transformers` library from Hugging Face together with PyTorch, for example with `pip install transformers torch`.

**Simple Example Using Hugging Face Transformers Library**
* Here is a basic example of using a transformer model for text generation.

**Code:**

In [5]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Encode input text; calling the tokenizer returns both input IDs and an attention mask
input_text = "The quick brown fox"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate text; GPT-2 has no pad token, so reuse the end-of-sequence token for padding
output = model.generate(
    **inputs,
    max_length=50,
    num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated Text:", generated_text)


Generated Text: The quick brown foxes are a great way to get a little bit of a kick out of your dog.

The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown fox
