
# **Intermediate Level**

1. Transformer Architecture
    * Introduction to Transformers
    * Self-attention mechanism
    * Encoder-decoder architecture

2. Pre-training and Fine-tuning
    * Pre-training objectives (e.g., masked language modeling, causal language modeling)
    * Fine-tuning for specific tasks

3. Popular LLMs
    * GPT (Generative Pre-trained Transformer)
    * BERT (Bidirectional Encoder Representations from Transformers)
    * T5 (Text-To-Text Transfer Transformer)

4. Practical Implementation
    * Using Hugging Face Transformers library
    * Training and fine-tuning LLMs on custom datasets
    * Handling large datasets and model training

# **1. Transformer Architecture**

**Introduction to Transformers**

* Transformers are a deep learning architecture introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al. They replace recurrence with attention, which lets the model capture long-range dependencies in text and process all positions of a sequence in parallel during training.

**Self-attention mechanism**

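* The self-attention mechanism lets each token in a sequence attend to every other token. Each output is a weighted sum of value vectors, with weights derived from how well the token's query matches the other tokens' keys, so the model captures contextual relationships.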
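
In the notation of Vaswani et al., scaled dot-product attention can be written as below. Here $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension. The division by $\sqrt{d_k}$ keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions with very small gradients.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
$$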

In [None]:
import torch
import torch.nn.functional as F

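# Example: Scaled dot-product self-attention
def self_attention(query, key, value):
    # Similarity of every query with every key, scaled by sqrt(d_k)
    d_k = key.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d_k ** 0.5
    # Normalize over the key dimension so each row of weights sums to 1
    attention_weights = F.softmax(scores, dim=-1)
    # Each output vector is a weighted sum of the value vectors
    output = torch.matmul(attention_weights, value)
    return output, attention_weights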

# Dummy data
query = torch.randn(1, 5, 64)  # (batch_size, sequence_length, hidden_dim)
key = torch.randn(1, 5, 64)
value = torch.randn(1, 5, 64)

output, attention_weights = self_attention(query, key, value)
print("Output:", output)
print("Attention Weights:", attention_weights)


**Encoder-decoder architecture**

* The encoder-decoder architecture is a common design for sequence-to-sequence tasks such as translation. The encoder processes the input sequence, and the decoder generates the output sequence.
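
The control flow of encoder-decoder generation can be sketched in plain Python. This is only a toy: `encode`, `decode_step`, and `greedy_generate` are hypothetical stand-ins with no learned parameters, and the "decoder" simply upper-cases the source tokens. The point is the shape of the loop: the encoder runs once over the input, and the decoder is then called repeatedly, each time seeing the encoder output plus everything generated so far.

```python
BOS, EOS = "<s>", "</s>"

def encode(source_tokens):
    # Toy "encoder": one state per source token.
    # A real encoder would return contextual vectors instead of strings.
    return [tok.upper() for tok in source_tokens]

def decode_step(memory, generated):
    # Toy "decoder": uses the encoder memory and the tokens generated so far
    # to choose the next token. `generated` always starts with BOS.
    position = len(generated) - 1
    return memory[position] if position < len(memory) else EOS

def greedy_generate(source_tokens, max_len=20):
    memory = encode(source_tokens)      # the encoder runs once
    generated = [BOS]
    while len(generated) < max_len:     # the decoder runs once per output token
        next_token = decode_step(memory, generated)
        if next_token == EOS:
            break
        generated.append(next_token)
    return generated[1:]                # drop the BOS marker

print(greedy_generate(["hello", "world"]))  # ['HELLO', 'WORLD']
```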

In [None]:
from transformers import BartTokenizer, BartForConditionalGeneration

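# Load a pre-trained BART model (encoder-decoder architecture)
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
# forced_bos_token_id=0 is the setting the BART documentation uses for mask filling
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large', forced_bos_token_id=0)

# Example: Text infilling
# Note: facebook/bart-large is pre-trained as a denoising autoencoder and is not
# fine-tuned for translation, so this example fills in a masked span instead.
input_text = "Hello, how <mask> you?"
inputs = tokenizer(input_text, return_tensors='pt')
output = model.generate(inputs['input_ids'], max_length=20)
filled_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Filled text:", filled_text)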


# **2. Pre-training and Fine-tuning**

**Pre-training objectives**

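* **Masked Language Modeling (MLM):** A fraction of the input tokens is hidden (BERT selects about 15%), and the model predicts the original tokens using context from both sides.

* **Causal Language Modeling (CLM):** The model predicts each next token from the tokens that precede it, reading left to right (used by GPT).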
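
The difference between the two objectives is easiest to see in how training pairs are built from the same sentence. The sketch below is a simplification in plain Python: `make_mlm_example` and `make_clm_example` are hypothetical helper names, tokens are whole words rather than subwords, and every selected token is replaced by `[MASK]` (BERT's actual recipe also sometimes keeps the token or swaps in a random one).

```python
import random

MASK = "[MASK]"
IGNORE = None  # placeholder for positions that do not contribute to the loss

def make_mlm_example(tokens, mask_prob=0.15, seed=0):
    """Hide random tokens; the label is the original token at masked positions."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE)
    return inputs, labels

def make_clm_example(tokens):
    """Shift by one: the target at position i is the token at position i + 1."""
    return tokens[:-1], tokens[1:]

tokens = "the cat sat on the mat".split()

mlm_inputs, mlm_labels = make_mlm_example(tokens, mask_prob=0.3)
print("MLM inputs:", mlm_inputs)
print("MLM labels:", mlm_labels)

clm_inputs, clm_targets = make_clm_example(tokens)
print("CLM inputs: ", clm_inputs)
print("CLM targets:", clm_targets)
```

In the MLM pair the model sees the whole sentence except the hidden positions; in the CLM pair every position is a prediction target, but only the tokens to its left may be used.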


**Fine-tuning for specific tasks**

* Fine-tuning involves taking a pre-trained model and training it further on a specific task with task-specific data.

In [None]:
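import torch
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments

# Load a pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# num_labels=2 adds a randomly initialized binary classification head on top of BERT
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Prepare data (dummy example)
train_texts = ["I love programming", "I hate bugs"]
train_labels = [1, 0]
train_encodings = tokenizer(train_texts, truncation=True, padding=True)

# Trainer expects each item to be a dict whose keys match the model's forward()
# arguments, so a TensorDataset of plain tuples does not work here
class TextClassificationDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

train_dataset = TextClassificationDataset(train_encodings, train_labels)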

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    warmup_steps=10,
    weight_decay=0.01,
    logging_dir='./logs'
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

# Train the model
trainer.train()


# **3. Popular LLMs**

**GPT (Generative Pre-trained Transformer)**

* GPT is a decoder-only transformer designed for text generation. It is unidirectional: each position attends only to earlier positions, and the model is trained to predict the next token in a sequence.
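
A unidirectional model is usually implemented by masking the attention scores so that a position cannot look at later positions. The following is a minimal sketch in plain Python rather than GPT's actual implementation; `causal_mask` and `masked_softmax` are illustrative names and the scores are made-up numbers.

```python
import math

def causal_mask(seq_len):
    """Lower-triangular mask: position i may attend to positions 0..i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked-out positions."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        allowed = [s for s, keep in zip(score_row, mask_row) if keep]
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if keep else 0.0
                for s, keep in zip(score_row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy attention scores for a 3-token sequence (made-up numbers)
scores = [[0.5, 1.0, 2.0],
          [1.5, 0.2, 0.7],
          [0.3, 0.9, 0.1]]

weights = masked_softmax(scores, causal_mask(3))
for row in weights:
    print([round(w, 3) for w in row])
```

Every entry above the diagonal comes out as zero, so the first token attends only to itself, the second to the first two, and so on.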

**Example: Text generation with GPT**

In [None]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load a pre-trained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

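# Generate text (greedy decoding by default, so the continuation may become repetitive)
input_text = "Artificial intelligence is"
inputs = tokenizer(input_text, return_tensors='pt')
# GPT-2 has no padding token, so reuse the end-of-sequence token to avoid a warning
output = model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated text:", generated_text)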


**BERT (Bidirectional Encoder Representations from Transformers)**

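* BERT is an encoder-only transformer designed for tasks that require understanding a token in both its left and right context, such as classification and question answering. It is pre-trained with masked language modeling, together with next-sentence prediction.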

**Example: Text classification with BERT**

In [None]:
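import torch
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments

# Load a pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# num_labels=2 adds a randomly initialized binary classification head on top of BERT
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Prepare data (dummy example)
train_texts = ["I love programming", "I hate bugs"]
train_labels = [1, 0]
train_encodings = tokenizer(train_texts, truncation=True, padding=True)

# Trainer expects each item to be a dict whose keys match the model's forward()
# arguments, so a TensorDataset of plain tuples does not work here
class TextClassificationDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

train_dataset = TextClassificationDataset(train_encodings, train_labels)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    warmup_steps=10,
    weight_decay=0.01,
    logging_dir='./logs'
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset
)

# Train the model
trainer.train()

# Classify a new sentence with the fine-tuned model
# (two training examples are far too few for a reliable prediction)
model.eval()
inputs = tokenizer("I enjoy debugging", return_tensors='pt').to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits
print("Predicted label:", logits.argmax(dim=-1).item())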


**T5 (Text-To-Text Transfer Transformer)**

* T5 is an encoder-decoder model that casts every NLP task as a text-to-text problem: the input is a string, usually prefixed with the task name (for example `summarize:`), and the output is generated as a string.
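
Because every task is plain text in and plain text out, switching tasks only means changing the prefix on the input string. The sketch below shows the idea without loading a model; the prefixes are ones used in the original T5 training mixture, while `TASK_PREFIXES` and `to_text_to_text` are hypothetical names introduced here for illustration.

```python
# Task prefixes used by the original T5 checkpoints
TASK_PREFIXES = {
    "summarization": "summarize: ",
    "translation_en_to_de": "translate English to German: ",
    "acceptability": "cola sentence: ",
}

def to_text_to_text(task, text):
    """Turn a (task, text) pair into the single input string T5 expects."""
    return TASK_PREFIXES[task] + text

print(to_text_to_text("summarization", "Artificial intelligence is intelligence demonstrated by machines."))
print(to_text_to_text("translation_en_to_de", "The house is wonderful."))
print(to_text_to_text("acceptability", "The course is jumping well."))
```

The resulting strings can be passed to the tokenizer in place of a hand-written `input_text`.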

**Example: Text summarization with T5**

In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load a pre-trained T5 model and tokenizer (T5Tokenizer requires the sentencepiece package)
tokenizer = T5Tokenizer.from_pretrained('t5-small')
model = T5ForConditionalGeneration.from_pretrained('t5-small')

# Summarize text
input_text = "summarize: Artificial intelligence is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of 'intelligent agents'."
inputs = tokenizer(input_text, return_tensors='pt')
output = model.generate(**inputs, max_length=50)
summary = tokenizer.decode(output[0], skip_special_tokens=True)
print("Summary:", summary)


# **4. Practical Implementation**

**Using Hugging Face Transformers library**

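* The Hugging Face Transformers library provides pre-trained models and their tokenizers behind a common `from_pretrained` interface, so switching between architectures such as GPT-2, BERT, and T5 mostly means changing the class and checkpoint name.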

**Training and fine-tuning LLMs on custom datasets**

* You can fine-tune pre-trained models on your own datasets using the Trainer class in the Transformers library.

**Handling large datasets and model training**

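* When dealing with large datasets, the computational load has to be managed explicitly. Common techniques include loading data lazily instead of holding it all in memory, padding each batch only to the length of its longest sequence, accumulating gradients over several small batches, and training in mixed precision.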
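
One way to keep data processing efficient is to read examples lazily and pad each batch only to its own longest sequence. The following is a minimal sketch in plain Python under simplifying assumptions: `lazy_batches`, `pad_batch`, `token_stream`, and `pad_id=0` are illustrative names and values, and the token IDs are made up. In practice, the library's data collators handle the padding step.

```python
from itertools import islice

def lazy_batches(examples, batch_size):
    """Yield lists of up to batch_size items without materializing the whole dataset."""
    iterator = iter(examples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def pad_batch(batch, pad_id=0):
    """Pad every sequence to the longest one in this batch and build the attention mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

def token_stream():
    # Stand-in for reading tokenized examples from disk one at a time
    yield [101, 7, 8, 9, 102]
    yield [101, 7, 102]
    yield [101, 5, 6, 7, 8, 9, 10, 102]

for batch in lazy_batches(token_stream(), batch_size=2):
    padded = pad_batch(batch)
    print(len(padded["input_ids"]), "sequences padded to length", len(padded["input_ids"][0]))
```

Padding per batch means a single long example only inflates its own batch, not the whole dataset.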

In [None]:
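import torch
from transformers import (BertTokenizer, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Masked language modeling needs a tokenizer with a mask token and a model with an
# MLM head. The T5 tokenizer from the previous cell has no mask token, so load
# BERT explicitly instead of reusing `tokenizer` and `model`.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

# Prepare dataset (dummy example); padding is left to the data collator
texts = ["I love programming", "I hate bugs", "Machine learning is fascinating"]
encodings = tokenizer(texts, truncation=True)

# Trainer expects dict-like items, so wrap the token IDs in a small Dataset
class TextDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.input_ids = encodings['input_ids']

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return {'input_ids': self.input_ids[idx]}

dataset = TextDataset(encodings)

# Data collator: pads each batch and randomly selects 15% of tokens for masking
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)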

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    warmup_steps=10,
    weight_decay=0.01,
    logging_dir='./logs',
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=data_collator,
)

# Train the model
trainer.train()
