```{contents}
```
## Pretraining

**Pretraining** is the process of training a large neural network on massive amounts of general data so that it learns **universal representations** before being adapted to specific tasks.

It is the first training stage behind modern **foundation models**.

---

### **Core Intuition**

Instead of learning every task from scratch, the model first learns the **general structure of its data** from a very large corpus.

> **Learn language, patterns, and knowledge once, then specialize later.**

This is similar to how humans learn general concepts before mastering specific skills.

---

### **What Happens During Pretraining**

The model is exposed to enormous datasets (text, images, code, audio, video) and learns by solving a **self-supervised objective**, meaning the training targets come from the data itself rather than from human labels.

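For autoregressive language models, the objective is next-token prediction: minimize the negative log-likelihood of each token given the tokens that precede it.

$$
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})
$$

Here $x_1, \dots, x_T$ is the token sequence and $p_\theta$ is the model's predicted distribution over the next token.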

Doing this well pushes the model to learn:

* Grammar
* Facts
* Reasoning patterns
* World knowledge
* Representations of meaning
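
The next-token objective above can be sketched as an average negative log-likelihood over positions. This is a minimal illustration, not a real training loop: the function names and the hand-written logits over a 3-token vocabulary are hypothetical, and a real model would produce the logits with a neural network.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_step, target_ids):
    """Average negative log-likelihood of the true next token at each position."""
    assert len(logits_per_step) == len(target_ids)
    total = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        probs = softmax(logits)
        total += -math.log(probs[target])
    return total / len(target_ids)

# Hypothetical logits over a 3-token vocabulary for two positions.
targets = [0, 1]
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # model has no preference
confident = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]  # model favors the true tokens

loss_uniform = next_token_loss(uniform, targets)      # equals log(3)
loss_confident = next_token_loss(confident, targets)  # much smaller
print(loss_uniform, loss_confident)
```

A model that spreads probability evenly pays `log(vocabulary size)` per token; training lowers the loss by shifting probability toward the tokens that actually follow.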

---

### **Training Objective Example (LLM)**

Given:

```
"The capital of France is ___"
```

The model learns to predict:

```
"Paris"
```

Repeated over billions of examples, this pushes the model to encode the facts and patterns it needs to make such predictions accurately.
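
To make the "France" example concrete, here is a toy stand-in that predicts the next word from counts of what followed the previous two words. It is deliberately not a neural network: the corpus, `train_counts`, and `predict_next` are hypothetical names for illustration, and an LLM replaces the lookup table with learned parameters that generalize to unseen contexts.

```python
from collections import Counter, defaultdict

def train_counts(sentences, context_size=2):
    """Count which token follows each context of `context_size` tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(context_size, len(tokens)):
            context = tuple(tokens[i - context_size:i])
            counts[context][tokens[i]] += 1
    return counts

def predict_next(counts, prompt, context_size=2):
    """Return the most frequent continuation, or None for an unseen context."""
    context = tuple(prompt.lower().split()[-context_size:])
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]

# Tiny made-up corpus; real pretraining corpora contain billions of tokens.
toy_corpus = [
    "The capital of France is Paris",
    "The capital of Italy is Rome",
    "The capital of France is Paris and it is large",
]

counts = train_counts(toy_corpus)
prediction = predict_next(counts, "The capital of France is")
print(prediction)  # paris
```

The table only answers prompts it has seen verbatim, which is the limitation that learned representations are meant to overcome.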

---

### **Why Pretraining Works**

| Benefit               | Explanation                                        |
| --------------------- | -------------------------------------------------- |
| Knowledge acquisition | Learns facts and concepts from the data            |
| Generalization        | One model is reusable across many tasks            |
| Transfer learning     | Reduces the labeled data needed for new tasks      |
| Emergent abilities    | Reasoning and abstraction can appear at scale      |

---

### **Applications**

#### Large Language Models

GPT, Claude, LLaMA, Mistral

#### Vision Models

CLIP, DINO, ViT

#### Multimodal Models

GPT-4V, Gemini, Flamingo

#### Speech & Audio

Whisper, wav2vec

---

### **Pretraining vs Fine-Tuning**

| Stage       | Role                         |
| ----------- | ---------------------------- |
| Pretraining | Learn general knowledge      |
| Fine-tuning | Learn task-specific behavior |
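
The two-stage workflow can be sketched with a deliberately tiny model. This is only an analogy under loud assumptions: it uses supervised linear regression rather than self-supervised pretraining, and the data, `train`, and `mse` are hypothetical. What it does show is the warm-start idea: fine-tuning continues training from pretrained parameters instead of starting from zero.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(w, b, data, steps, lr=0.05):
    """Plain gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Stage 1 ("pretraining"): plenty of general data drawn from y = 2x.
general = [(k / 10, 2 * k / 10) for k in range(-20, 21)]
w_pre, b_pre = train(0.0, 0.0, general, steps=200)

# Stage 2 ("fine-tuning"): a few task examples from the related y = 2x + 0.5.
task = [(-1.0, -1.5), (0.0, 0.5), (1.0, 2.5)]
w_ft, b_ft = train(w_pre, b_pre, task, steps=5)       # start from pretrained
w_scratch, b_scratch = train(0.0, 0.0, task, steps=5)  # start from zero

print(mse(w_ft, b_ft, task), mse(w_scratch, b_scratch, task))
```

With the same five update steps on the same three examples, the run that starts from the pretrained parameters ends with a lower task loss than the run that starts from scratch.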

---

### **Types of Pretraining**

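* Self-supervised pretraining: targets are derived from the data itself (the umbrella for most objectives below)
* Contrastive pretraining: pull matching pairs together and push mismatched pairs apart in embedding space (e.g. CLIP)
* Masked modeling: hide parts of the input and predict them from the rest (e.g. BERT)
* Next-token prediction: predict each token from the tokens before it (e.g. GPT)
* Multimodal pretraining: train on paired data from several modalities, such as images and text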
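
Two of the objectives listed above differ mainly in how training examples are cut from raw text. The sketch below builds examples for next-token prediction and for masked modeling from the same token list. The helper names, the `[MASK]` string, and the 15% default masking rate are illustrative assumptions, and real pipelines operate on token IDs rather than words.

```python
import random

MASK = "[MASK]"  # placeholder token; the exact string is an assumption

def next_token_examples(tokens):
    """Each example pairs a prefix with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens; labels keep the hidden originals."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            inputs.append(tok)
            labels.append(None)  # no loss is computed at this position
    return inputs, labels

tokens = ["the", "capital", "of", "france", "is", "paris"]

for prefix, target in next_token_examples(tokens):
    print(prefix, "->", target)

inputs, labels = masked_example(tokens, mask_prob=0.5)
print(inputs)
print(labels)
```

Next-token prediction only ever sees the left context, while masked modeling lets the model use words on both sides of each gap.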

---

### **Intuition Summary**

Pretraining builds the general-purpose **foundation** that later stages, such as fine-tuning, specialize.

