
# Chapter 3: Language Models and LLMs

This notebook covers:
- The foundations of Language Models (LMs) and Large Language Models (LLMs)
- Differences between encoder-only, decoder-only, and encoder-decoder architectures
- Pretrained models such as BERT, GPT-2, and T5
- Inference, fine-tuning, and API usage

## Learning Objectives

- Understand the purpose and design of language models
- Implement and use encoder-based models (BERT) for embeddings
- Use decoder-based models (GPT-2) for generation
- Apply encoder-decoder models (T5) for translation/summarization
- Compare popular open-source LLMs



## Encoder-Only Models: BERT

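Encoder-only models such as BERT read the whole input at once and produce one contextual vector per token. They are used primarily for classification, sentence embeddings, and question answering.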


In [None]:

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentence = "Language models are the backbone of modern NLP."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

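# Mean pooling: average the token vectors along the sequence dimension (dim=1).
# A single unpadded sentence has no padding positions, so no attention mask is needed here.
embedding = outputs.last_hidden_state.mean(dim=1)
print("Sentence Embedding Shape:", embedding.shape)  # torch.Size([1, 768])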
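
The cell above averages over every token position, which is safe for one sentence. When several sentences are batched with padding, the padded positions would also enter the average. The sketch below shows mask-aware mean pooling in plain Python on toy vectors; the function name `masked_mean_pool` and the toy data are illustrative assumptions, not part of the `transformers` API. With real model outputs the mask would come from `inputs["attention_mask"]`.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average only the token vectors whose mask value is 1 (real tokens)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Toy sequence: two real tokens followed by one padding position.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]

pooled = masked_mean_pool(token_vectors, attention_mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector `[100.0, 100.0]` is ignored, so the result is the mean of the two real tokens only.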



## Decoder-Only Models: GPT-2

Decoder-only models such as GPT-2 are used primarily for autoregressive text generation: each new token is predicted from the tokens that precede it.


In [None]:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

input_text = "Explain quantum computing in simple terms"
inputs = tokenizer(input_text, return_tensors="pt")
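# max_length counts the prompt tokens plus the generated tokens.
# GPT-2 defines no pad token, so reuse the end-of-sequence token to avoid a warning.
outputs = model.generate(**inputs, max_length=60, pad_token_id=tokenizer.eos_token_id)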

print("Generated Text:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
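
To make "autoregressive" concrete, here is a minimal sketch of a greedy decoding loop. It is not how `generate` is implemented; the lookup table `NEXT_TOKEN_SCORES` is a hypothetical stand-in for a model, and it conditions only on the last token, whereas GPT-2 conditions on the entire preceding sequence.

```python
# Hypothetical toy "model": scores for the next token given the last token.
NEXT_TOKEN_SCORES = {
    "<bos>": {"language": 0.7, "models": 0.3},
    "language": {"models": 0.9, "<eos>": 0.1},
    "models": {"generate": 0.6, "<eos>": 0.4},
    "generate": {"text": 0.8, "<eos>": 0.2},
    "text": {"<eos>": 1.0},
}


def greedy_generate(prompt_tokens, max_length=10):
    """Append the highest-scoring next token until <eos> or max_length."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        scores = NEXT_TOKEN_SCORES[tokens[-1]]
        next_token = max(scores, key=scores.get)  # greedy choice
        if next_token == "<eos>":
            break
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens


generated = greedy_generate(["<bos>"])
print(generated)  # ['<bos>', 'language', 'models', 'generate', 'text']
```

As in the GPT-2 cell, `max_length` here bounds the total sequence length, prompt included.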



## Encoder-Decoder Models: T5

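T5 casts every task as text-to-text: a short prefix on the input, such as `summarize:`, tells the model which task to perform. The same model can therefore be used for translation, summarization, and QA.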


In [None]:

# T5Tokenizer depends on the `sentencepiece` package being installed.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

text = "summarize: The quick brown fox jumps over the lazy dog. This sentence contains every letter of the alphabet and is often used to test fonts or keyboards."
inputs = tokenizer(text, return_tensors="pt", padding=True)

output = model.generate(**inputs, max_length=30)
print("Summary:", tokenizer.decode(output[0], skip_special_tokens=True))
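
Because the task is selected by the input prefix, switching tasks only requires changing the string passed to the tokenizer. The helper below is a small sketch of that idea; `build_t5_input` and the dictionary keys are hypothetical names chosen for illustration, while the prefix strings themselves are the ones used by the original T5 checkpoints.

```python
# Keys are illustrative; values are prefixes the T5 checkpoints were trained with.
TASK_PREFIXES = {
    "summarize": "summarize: ",
    "en_to_fr": "translate English to French: ",
    "en_to_de": "translate English to German: ",
}


def build_t5_input(task, text):
    """Prepend the task prefix so the model knows which task to perform."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    return TASK_PREFIXES[task] + text.strip()


prompt = build_t5_input("en_to_fr", "  The weather is nice today. ")
print(prompt)  # translate English to French: The weather is nice today.
```

The resulting string can be passed to the tokenizer in place of `text` in the cell above.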



## Popular Open-Source LLMs

| Model         | Type            | Best For                       |
|---------------|-----------------|--------------------------------|
| BERT          | Encoder         | Embeddings, QA, classification |
| GPT-2         | Decoder         | Text generation, summarization |
| T5            | Encoder-Decoder | Translation, QA, summarization |
| Mistral       | Decoder         | Instruction following          |
| Falcon, LLaMA | Decoder         | Open fine-tuning               |



## Real-World Use Cases

- GPT: Chatbots, creative writing, coding assistants
- BERT: Semantic search, intent classification
- T5: Automated email summarization, document understanding
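
Semantic search, mentioned above for BERT, usually reduces to ranking stored embeddings by cosine similarity to a query embedding. The following sketch uses hand-written 2-dimensional vectors in place of real BERT embeddings; the function names and document IDs are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for pooled BERT vectors.
corpus = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
query = [1.0, 0.1]

ranking = rank_by_similarity(query, corpus)
print([doc_id for doc_id, _ in ranking])  # ['doc_a', 'doc_c', 'doc_b']
```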



## Exercises

1. Use BERT embeddings for semantic similarity search.
2. Try different prompts with GPT-2 and observe variations in outputs.
3. Use T5 for translation from English to French.
4. Compare latency and output quality between different LLMs.
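
For exercise 4, one possible way to compare latency is to time repeated calls and report the median, which is less sensitive to one-off slow runs than the mean. This is a sketch under stated assumptions: `measure_latency` is a hypothetical helper, and `fake_generate` is a stand-in that sleeps instead of calling a model. In practice you would pass a function that wraps `model.generate`.

```python
import statistics
import time


def measure_latency(fn, *args, repeats=5, **kwargs):
    """Call fn repeatedly and summarize wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


def fake_generate(prompt):
    """Stand-in for a model call; sleeps briefly to simulate work."""
    time.sleep(0.01)
    return prompt.upper()


stats = measure_latency(fake_generate, "hello", repeats=3)
print(stats)
```

The first call to a real model is often slower than later ones, so consider discarding a warm-up run before measuring.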

## References

- Hugging Face: https://huggingface.co/models
- T5 Paper: https://arxiv.org/abs/1910.10683
- GPT-2 Blog: https://openai.com/research/gpt-2
