# BERT Embeddings and NLP Tasks

In this notebook we will learn:

- What BERT and BERT embeddings are
- Why contextual embeddings handle ambiguous words better than static embeddings
- How to use BERT for:
  - Text classification
  - Named Entity Recognition (NER)
  - Sentiment analysis

We will use the Hugging Face `transformers` library and simple toy examples.

In [None]:
!pip install -q transformers torch datasets sentencepiece

import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification
from transformers import AutoModelForTokenClassification, pipeline

## BERT and Contextual Embeddings

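BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model. Its self-attention layers let every token attend to the words on both its left and its right, so each token's representation depends on the whole sentence.

An embedding is a dense numeric vector that represents the meaning of a piece of text, such as a token or a sentence.

BERT builds its input by summing token, position, and segment embeddings for each token, then passes the result through a stack of transformer layers. The output of the final layer is one contextual embedding per token.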
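
To make the input step concrete, here is a minimal sketch that sums the three embedding types for a three-token sequence. Everything in it is made up for illustration: the 4-dimensional vectors, the table names `token_table`, `position_table`, and `segment_table`, and the helper `input_embeddings`. Real BERT learns these tables, uses 768 dimensions in the base model, and applies layer normalization and dropout after the sum, which this sketch leaves out.

```python
# Toy illustration of how BERT builds its input vectors (hypothetical values).
# Real BERT learns these tables and uses 768 dimensions in the base model.

token_table = {
    "[CLS]": [0.1, 0.0, 0.2, 0.0],
    "bank": [0.5, 0.3, 0.0, 0.1],
    "[SEP]": [0.0, 0.1, 0.0, 0.2],
}
position_table = [
    [0.01, 0.02, 0.03, 0.04],  # position 0
    [0.02, 0.03, 0.04, 0.05],  # position 1
    [0.03, 0.04, 0.05, 0.06],  # position 2
]
segment_table = [
    [0.0, 0.0, 0.0, 0.0],  # segment A
    [0.1, 0.1, 0.1, 0.1],  # segment B
]


def input_embeddings(tokens, segment_ids):
    """Sum token, position, and segment vectors element-wise for each token."""
    result = []
    for position, (token, segment) in enumerate(zip(tokens, segment_ids)):
        parts = (token_table[token], position_table[position], segment_table[segment])
        result.append([sum(values) for values in zip(*parts)])
    return result


toy_sequence = ["[CLS]", "bank", "[SEP]"]
vectors = input_embeddings(toy_sequence, [0, 0, 0])
for token, vector in zip(toy_sequence, vectors):
    print(token, [round(v, 2) for v in vector])
```

The same token placed at a different position or in a different segment would receive a different input vector, even before any transformer layer runs.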

In [None]:
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
bert_model = AutoModel.from_pretrained(model_name)

sentence = "BERT embeddings capture contextual meaning."

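# Tokenize the sentence and return PyTorch tensors.
inputs = tokenizer(sentence, return_tensors="pt")
# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = bert_model(**inputs)

# One vector per token: (batch_size, sequence_length, hidden_size).
last_hidden_state = outputs.last_hidden_state
# The first position holds the special [CLS] token.
cls_embedding = last_hidden_state[:, 0, :]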

print("Shape of last_hidden_state:", last_hidden_state.shape)
print("Shape of [CLS] embedding:", cls_embedding.shape)

## Contextual vs Static Example

In [None]:
sent1 = "The bank approved my loan."
sent2 = "We sat by the bank of the river."

batch = tokenizer([sent1, sent2], padding=True, return_tensors="pt")
with torch.no_grad():
    out = bert_model(**batch)

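bank_vectors = []
for i, sent in enumerate([sent1, sent2]):
    tokens = tokenizer.convert_ids_to_tokens(batch["input_ids"][i])
    print(sent)
    print(tokens)
    # Keep the contextual vector of "bank" from each sentence.
    bank_vectors.append(out.last_hidden_state[i, tokens.index("bank")])

# A static embedding would give exactly 1.0 here; contextual vectors differ by sentence.
similarity = torch.cosine_similarity(bank_vectors[0], bank_vectors[1], dim=0)
print("Cosine similarity between the two 'bank' vectors:", similarity.item())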
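
For contrast, a static embedding model (word2vec or GloVe style) is essentially a lookup table, so "bank" maps to the same vector in both sentences. The sketch below only illustrates that lookup behaviour and does not reproduce any real model: `static_table` and its 3-dimensional values are made up, and `cosine` is a hypothetical helper.

```python
import math

# Hypothetical static embedding table with made-up values.
static_table = {
    "bank": [0.4, 0.1, 0.7],
    "loan": [0.5, 0.0, 0.6],
    "river": [0.1, 0.9, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# A static lookup ignores the surrounding words entirely.
bank_in_loan_sentence = static_table["bank"]
bank_in_river_sentence = static_table["bank"]

static_similarity = cosine(bank_in_loan_sentence, bank_in_river_sentence)
print("Same vector in both sentences:", bank_in_loan_sentence == bank_in_river_sentence)
print("Cosine similarity:", round(static_similarity, 6))
```

A contextual model instead produces a separate vector for each occurrence of "bank", so the two vectors are no longer identical.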

## Sentence Embeddings via Mean Pooling

In [None]:
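def mean_pooling(model_output, attention_mask):
    # Average token vectors per sentence, ignoring padding positions.
    token_embeddings = model_output.last_hidden_state
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Clamp the denominator so an all-padding row cannot divide by zero.
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)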

sentences = ["I love NLP.", "BERT embeddings are powerful."]
enc = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    out = bert_model(**enc)

embeddings = mean_pooling(out, enc["attention_mask"])
print(embeddings.shape)
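
The attention mask matters because padded positions would otherwise pull the average toward the padding vectors. The following sketch repeats the masked-mean logic on plain Python lists so the effect is easy to see. The 2-dimensional token vectors are made up, and `masked_mean` and `naive_mean` are hypothetical helper names used only here.

```python
# Toy token vectors for one sentence padded to length 4 (made-up values).
token_vectors = [
    [1.0, 2.0],  # real token
    [3.0, 4.0],  # real token
    [9.0, 9.0],  # padding position
    [9.0, 9.0],  # padding position
]
attention_mask = [1, 1, 0, 0]  # 1 = real token, 0 = padding


def masked_mean(vectors, mask):
    """Average only the vectors whose mask value is 1."""
    dim = len(vectors[0])
    total = [0.0] * dim
    for vector, keep in zip(vectors, mask):
        for j in range(dim):
            total[j] += vector[j] * keep
    count = max(sum(mask), 1)  # avoid division by zero for an all-padding row
    return [value / count for value in total]


def naive_mean(vectors):
    """Average every position, including padding."""
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


print("Masked mean:", masked_mean(token_vectors, attention_mask))
print("Naive mean: ", naive_mean(token_vectors))
```

The masked version averages only the two real tokens, while the naive version is skewed by the two padding rows.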

## Text Classification with BERT

In [None]:
# DistilBERT is a smaller, distilled variant of BERT, fine-tuned here on SST-2 sentiment data.
clf_model_name = "distilbert-base-uncased-finetuned-sst-2-english"
clf_tokenizer = AutoTokenizer.from_pretrained(clf_model_name)
clf_model = AutoModelForSequenceClassification.from_pretrained(clf_model_name)

text = "This course is very interesting."
inputs = clf_tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = clf_model(**inputs).logits

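# Softmax turns the raw logits into class probabilities.
probs = torch.softmax(logits, dim=-1)[0]
for idx, label in clf_model.config.id2label.items():
    print(f"{label}: {probs[idx].item():.4f}")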
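
The model returns raw logits, and softmax converts them into probabilities that sum to 1. As a minimal sketch of that step, the snippet below applies softmax to made-up logits in plain Python. The `softmax` helper, the logit values, and the label mapping are illustrative assumptions rather than actual model output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a two-class sentiment model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
toy_logits = [-2.0, 3.0]

toy_probs = softmax(toy_logits)
predicted = id2label[toy_probs.index(max(toy_probs))]

for idx, p in enumerate(toy_probs):
    print(f"{id2label[idx]}: {p:.4f}")
print("Predicted label:", predicted)
```

The predicted class is simply the index with the highest probability, which is also the index with the highest logit.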

## Named Entity Recognition

In [None]:
# aggregation_strategy="simple" groups word pieces into whole entity spans.
ner_pipe = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
text = "Barack Obama was born in Hawaii."
print(ner_pipe(text))
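
Token-classification models label each token with a tag such as `B-PER` or `I-LOC`, and the pipeline's aggregation step groups those tags into entity spans. The sketch below is a simplified stand-in for that grouping, not the library's implementation. The `group_entities` helper and the hand-written tags are assumptions for illustration, and sub-word pieces and confidence scores are ignored.

```python
def group_entities(tokens, tags):
    """Merge consecutive B-/I- tags of the same type into (entity_type, text) spans."""
    entities = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        # An I- tag of the same type continues the open entity.
        if prefix == "I" and entity_type == current_type:
            current_words.append(token)
            continue
        # Anything else closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        # A B- tag (or a stray I- tag) opens a new entity.
        if prefix in ("B", "I"):
            current_type, current_words = entity_type, [token]
    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


# Hand-written tags for illustration (not actual model output).
toy_tokens = ["Barack", "Obama", "was", "born", "in", "Hawaii", "."]
toy_tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O"]

toy_entities = group_entities(toy_tokens, toy_tags)
print(toy_entities)
```

With these tags, "Barack" and "Obama" merge into one person entity and "Hawaii" becomes a separate location entity.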

## Sentiment Analysis

In [None]:
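# Name the model explicitly; otherwise the pipeline falls back to a default checkpoint and warns.
sentiment_pipe = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
print(sentiment_pipe(["I love this!", "This is terrible."]))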

## Recap

You learned how to extract token-level and sentence-level embeddings from BERT and how to apply BERT-family models to text classification, NER, and sentiment analysis.