
# BERT Transformers

---

## 🔷 What is a Transformer?

The Transformer is a deep learning architecture introduced in the 2017 paper
"Attention Is All You Need".

### Key Features:
- Uses a self-attention mechanism
- Processes the entire sequence in parallel
- Captures long-range dependencies directly, since every token can attend to every other token
- Has largely replaced RNNs and LSTMs in NLP tasks
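
To make the self-attention idea concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. The sizes, the random weight matrices, and the function name `self_attention` are illustrative assumptions, not part of BERT's actual implementation.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence."""
    q = x @ w_q  # queries, shape (seq_len, d_k)
    k = x @ w_k  # keys,    shape (seq_len, d_k)
    v = x @ w_v  # values,  shape (seq_len, d_k)

    # Similarity of every token with every other token, scaled by sqrt(d_k).
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len)

    # Row-wise softmax: row i holds how much token i attends to each token.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Each output vector is a weighted average of all value vectors.
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8            # toy sizes chosen for illustration
x = rng.normal(size=(seq_len, d_model))    # stand-in for 4 token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape)           # one output vector per input token
print(weights.sum(axis=-1))   # every row of attention weights sums to 1
```

Because all rows of `scores` are computed in one matrix product, the whole sequence is processed at once rather than token by token as in an RNN.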

---

## 🔷 What is BERT?

BERT = Bidirectional Encoder Representations from Transformers

It is a pretrained Transformer Encoder model developed by Google.

### Key Characteristics:
- Bidirectional (each token uses context from both its left and its right)
- Pretrained on a massive text corpus
- Fine-tuned for downstream NLP tasks

---

## 🔷 Main Functions of BERT

1. Text Classification
2. Sentiment Analysis
3. Question Answering
4. Named Entity Recognition
5. Text Similarity
6. Language Understanding Tasks

---

### 🔷 Steps used in this notebook:

1. Import all the necessary libraries
2. Load the IMDB Dataset
3. Load the BERT Tokenizer
4. Perform the Tokenization
5. Convert to TensorFlow Dataset
6. Load the Pretrained BERT Model
7. Compile the BERT Model
8. Train the BERT Model
9. Evaluate the BERT Model
10. Plot Accuracy vs Loss
11. Perform Predictions on the sample text


### Step 1: Import all the necessary libraries

In [None]:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
from datasets import load_dataset
import numpy as np
import matplotlib.pyplot as plt

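### **Explanation:**

| Library                         | Purpose                                          |
| ------------------------------- | ------------------------------------------------ |
| BertTokenizer                   | Converts text → BERT input format                |
| TFBertForSequenceClassification | Pretrained BERT model with a classification head |
| load_dataset                    | Loads the IMDB dataset                           |
| tensorflow                      | Training and evaluation                          |
| matplotlib.pyplot               | Plotting the training curves                     |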


### Step 2: Load the IMDB Dataset

In [None]:
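dataset = load_dataset("imdb")

# The IMDB splits are stored grouped by label, so shuffle before taking a
# subset; otherwise the first rows would all belong to a single class.
train_subset = dataset['train'].shuffle(seed=42).select(range(2000))
test_subset = dataset['test'].shuffle(seed=42).select(range(500))

train_texts = list(train_subset['text'])
train_labels = list(train_subset['label'])

test_texts = list(test_subset['text'])
test_labels = list(test_subset['label'])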


### **Explanation:**

1.  The IMDB dataset contains 50,000 labelled movie reviews (25,000 for training and 25,000 for testing) and is used for binary sentiment classification:

| Label | Meaning  |
| ----- | -------- |
| 0     | Negative |
| 1     | Positive |

2.  We take a small subset (2,000 training reviews and 500 test reviews) for faster training.
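
When a subset is taken by slicing, its class balance depends on how the rows are ordered, so a quick count of the labels is a cheap sanity check. The sketch below uses a hypothetical toy `labels` list that mimics a column stored grouped by class; in the notebook the same `Counter` call could be applied to `train_labels`.

```python
from collections import Counter
import random

# Hypothetical stand-in for a label column grouped by class:
# all 0s (negative) first, then all 1s (positive).
labels = [0] * 5000 + [1] * 5000

# Taking the first rows of a class-ordered column yields a single class.
head_subset = labels[:2000]
print(Counter(head_subset))

# Shuffling the row indices first gives a subset that contains both classes.
rng = random.Random(42)
indices = list(range(len(labels)))
rng.shuffle(indices)
shuffled_subset = [labels[i] for i in indices[:2000]]
print(Counter(shuffled_subset))
```

A subset with only one label cannot teach the classifier anything, and it would still report perfect accuracy on a test slice drawn the same way.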

### Step 3: Load the BERT Tokenizer

In [None]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

### **Explanation:**

1.  What happens internally? It loads:

    (a.)   The vocabulary (30,522 tokens)

    (b.)   The WordPiece tokenizer

    (c.)   The special tokens, including [CLS], [SEP], and [PAD]
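
WordPiece splits a word into subword units by repeatedly taking the longest piece found in the vocabulary. The sketch below illustrates that greedy longest-match-first idea with a tiny hypothetical vocabulary (`toy_vocab`); the function name is an assumption for illustration, and the real tokenizer also handles lowercasing, punctuation, and a far larger vocabulary.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into subwords."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical miniature vocabulary, for illustration only.
toy_vocab = {"play", "##ing", "##ed", "un", "##believ", "##able", "movie"}

print(wordpiece_tokenize("playing", toy_vocab))
print(wordpiece_tokenize("unbelievable", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```

Splitting rare words into known pieces is what lets a fixed vocabulary of about 30,000 entries cover almost any input text.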

### Step 4: Perform the Tokenization

In [None]:
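train_encodings = tokenizer(
    train_texts,
    truncation=True,        # cut reviews longer than max_length
    padding="max_length",   # pad every review to exactly max_length tokens
    max_length=128
)

test_encodings = tokenizer(
    test_texts,
    truncation=True,
    padding="max_length",
    max_length=128
)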


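### **Explanation:**

1.  It truncates reviews longer than 128 tokens.

2.  It pads shorter reviews so that every sequence has the same length (128 tokens here).

3.  All inputs therefore have shape (num_examples, 128); after batching, each batch has shape (batch_size, 128).

4.  Every sentence is wrapped in special tokens, for example: [CLS] i love this movie [SEP]

Each review is then converted into:

(a.)   input_ids: the vocabulary ID of each token

(b.)   attention_mask: 1 for real tokens, 0 for padding

(c.)   token_type_ids: the segment ID of each token (all 0 for single-sentence input)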
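
The truncation, padding, and mask logic described above can be sketched in plain Python. The helper `encode` and the toy token IDs are hypothetical; only the IDs 101, 102, and 0 correspond to [CLS], [SEP], and [PAD] in the `bert-base-uncased` vocabulary. A small `max_length` of 8 is used so the result is easy to read.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # [CLS], [SEP], [PAD] in bert-base-uncased

def encode(token_ids, max_length):
    """Add special tokens, truncate, pad, and build the attention mask."""
    # Reserve two positions for [CLS] and [SEP], then truncate the rest.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]

    # Real tokens (including the special ones) get mask value 1.
    attention_mask = [1] * len(input_ids)

    # Pad on the right up to max_length; padded positions get mask value 0.
    pad_count = max_length - len(input_ids)
    input_ids = input_ids + [PAD_ID] * pad_count
    attention_mask = attention_mask + [0] * pad_count

    # Single-sentence input: every position belongs to segment 0.
    token_type_ids = [0] * max_length
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }

short_review = [11, 12, 13]           # hypothetical token IDs
long_review = list(range(20, 32))     # 12 hypothetical token IDs

short_encoded = encode(short_review, max_length=8)
long_encoded = encode(long_review, max_length=8)
print(short_encoded)
print(long_encoded)
```

The attention mask is what tells the model to ignore the padded positions when computing attention.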

### Step 5: Convert to TensorFlow Dataset

In [None]:
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
)).shuffle(1000).batch(16)

test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_labels
)).batch(16)


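### **Explanation:**

1.  It converts the encodings dictionary and the labels into a tf.data.Dataset.

    (a.)   shuffle(1000) → Randomizes the example order using a 1,000-element buffer, so batches are not biased by the original ordering

    (b.)   batch(16) → Groups examples into batches of 16, which keeps memory use manageable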
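
A shuffle buffer only mixes examples that are close to each other in the stream, which matters when the buffer is smaller than the dataset. The following pure-Python sketch is a simplified model of that behaviour, not TensorFlow's implementation; the helper names `buffered_shuffle` and `batch` are assumptions for illustration.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in the order a fixed-size shuffle buffer would emit them."""
    rng = random.Random(seed)
    iterator = iter(items)
    buffer = []
    # Fill the buffer with the first `buffer_size` items.
    for item in iterator:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # Emit a random buffered item, then refill that slot from the stream.
    for item in iterator:
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item
    # Stream exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer

def batch(items, batch_size):
    """Group items into consecutive lists of at most `batch_size` elements."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

shuffled = list(buffered_shuffle(range(20), buffer_size=5))
batches = batch(shuffled, batch_size=8)
print(shuffled)
print([len(b) for b in batches])
```

With a buffer of 5, the item emitted at position `k` can never come from further ahead than index `k + 4`, so a buffer at least as large as the dataset is needed for a full shuffle.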

### Step 6: Load the Pretrained BERT Model

In [None]:
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2
)


### **Explanation:**


Text
 ↓
Embedding Layer
 ↓
12 Transformer Encoder Blocks
 ↓
[CLS] token representation
 ↓
Dropout
 ↓
Dense Layer (2 neurons)
 ↓
Logits


The encoder is already pretrained on:

(a.)    English Wikipedia

(b.)    BooksCorpus

The final dense layer is newly initialized. Training the whole model on the IMDB reviews is the fine-tuning step.
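
The last stages of the pipeline above can be sketched with NumPy. This is only an illustration with random numbers standing in for the encoder output and for the dense layer's weights; it skips dropout and any intermediate pooling, and it does not reproduce the library's actual layers. The hidden size of 768 matches `bert-base-uncased`.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_len, hidden_size, num_labels = 3, 128, 768, 2

# Stand-in for the encoder output: one 768-dimensional vector per token.
hidden_states = rng.normal(size=(batch_size, seq_len, hidden_size))

# [CLS] is always the first token, so its vector sits at position 0.
cls_vectors = hidden_states[:, 0, :]             # (batch_size, hidden_size)

# Randomly initialized dense layer with one output per class.
dense_weights = rng.normal(scale=0.02, size=(hidden_size, num_labels))
dense_bias = np.zeros(num_labels)

# One raw score (logit) per class for every review in the batch.
logits = cls_vectors @ dense_weights + dense_bias
print(logits.shape)
```

The logits are unnormalized scores; they only become probabilities after a softmax, which is why the loss in the next step is configured to accept logits.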


### Step 7: Compile the BERT Model

In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)


### **Explanation:**


**Important Interview Question:**

1.  Why learning rate = 3e-5?

    Because:

    (a.)   BERT is already pretrained

    (b.)   A large learning rate → Destroys the learned weights

    (c.)   Fine-tuning therefore needs a small learning rate
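
The loss passed to `model.compile` takes integer labels and raw logits. A minimal NumPy sketch of what sparse categorical cross-entropy with `from_logits=True` computes is shown below; the function name and the toy values are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean negative log-probability of the true class, computed from logits."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)

    # Numerically stable log-softmax over the class dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Pick the log-probability of the correct class for each example.
    per_example = -log_probs[np.arange(len(labels)), labels]
    return per_example.mean()

toy_logits = np.array([[2.0, 0.0],    # fairly confident in class 0
                       [0.0, 0.0]])   # undecided between the two classes
toy_labels = np.array([0, 1])         # integer labels, not one-hot vectors

loss = sparse_categorical_crossentropy_from_logits(toy_logits, toy_labels)
print(loss)
```

The first example contributes a small loss because the correct class has the larger logit, while the undecided second example contributes log 2.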

### Step 8: Train the BERT Model

In [None]:
history = model.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=2
)


### **Explanation:**

1.  What happens?

    (a.)   The pretrained weights receive only small adjustments

    (b.)   The model adapts to sentiment classification

### Step 9: Evaluate the BERT Model

In [None]:
loss, accuracy = model.evaluate(test_dataset)
print("Test Accuracy:", accuracy)


### **Explanation:**

1.  Expected accuracy:

    (a.)   Roughly 85–90%, even with this small subset (the exact value varies between runs)

### Step 10: Plot Accuracy vs Loss

In [None]:
plt.figure()
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'])
plt.show()

plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Validation'])
plt.show()


### **Explanation:**

1.  What this shows:

    (a.)   If accuracy ↑ and loss ↓ → The model is learning

    (b.)   If validation loss ↑ while training loss ↓ → Overfitting
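
The overfitting rule above can also be checked programmatically. The sketch below scans a Keras-style history dictionary for the first epoch where validation loss rises while training loss falls; the helper name and the toy numbers are hypothetical, and with only two epochs such a check has very little data to work with.

```python
def first_overfitting_epoch(history):
    """Return the first 1-based epoch where val loss rises while train loss falls."""
    train_loss = history["loss"]
    val_loss = history["val_loss"]
    for epoch in range(1, len(train_loss)):
        train_improved = train_loss[epoch] < train_loss[epoch - 1]
        val_worsened = val_loss[epoch] > val_loss[epoch - 1]
        if train_improved and val_worsened:
            return epoch + 1  # convert the 0-based index to a 1-based epoch
    return None  # no sign of overfitting under this simple rule

# Hypothetical loss curves, for illustration only.
overfit_history = {"loss": [0.60, 0.40, 0.25, 0.15],
                   "val_loss": [0.55, 0.42, 0.45, 0.52]}
healthy_history = {"loss": [0.60, 0.40, 0.30],
                   "val_loss": [0.58, 0.45, 0.40]}

print(first_overfitting_epoch(overfit_history))
print(first_overfitting_epoch(healthy_history))
```

In the notebook, `history.history` has the same keys, so it could be passed to this helper directly.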

### Step 11: Perform Predictions on the sample text

In [None]:
sample_reviews = [
    "This movie was absolutely fantastic!",
    "Worst film I have ever seen."
]

sample_encodings = tokenizer(
    sample_reviews,
    truncation=True,
    padding=True,
    max_length=128,
    return_tensors="tf"
)

outputs = model(sample_encodings)
logits = outputs.logits

predictions = tf.argmax(logits, axis=1).numpy()


In [None]:
# Convert predictions to labels

for review, pred in zip(sample_reviews, predictions):
    sentiment = "Positive" if pred == 1 else "Negative"
    print("Review:", review)
    print("Predicted Sentiment:", sentiment)
    print()


### **Output:**

**Review:** This movie was absolutely fantastic!

**Predicted Sentiment:** Positive

**Review:** Worst film I have ever seen.

**Predicted Sentiment:** Negative
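
`argmax` returns only the winning class. If a confidence score is also wanted, the logits can be passed through a softmax first. The sketch below uses hypothetical logits (`toy_logits`) in place of real model output.

```python
import numpy as np

def softmax(logits):
    """Convert each row of logits into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for two reviews: columns are [Negative, Positive].
toy_logits = np.array([[-2.0, 3.0],
                       [2.5, -1.5]])

probabilities = softmax(toy_logits)
class_names = ["Negative", "Positive"]

for row in probabilities:
    predicted = int(row.argmax())
    print(class_names[predicted], round(float(row[predicted]), 3))
```

In the notebook, the same function could be applied to `logits.numpy()`; TensorFlow's own `tf.nn.softmax` would serve equally well.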
