# The Pooler (Summarizing the Sentence)

A powerful encoder produces a sequence of context-aware vectors: `[v_1, v_2, ..., v_n]`. For tasks like **Sentiment Analysis** (is this positive or negative?), that is not enough: a classifier expects a fixed-size input, so we can't feed it a variable-length list of vectors. We need **one single vector** that represents the entire sentence.

### [CLS] Token 
BERT places a special token at the start of every input sequence: `[CLS]`.
During self-attention (in the Encoder Stack), this token can "attend" to every other token in the sequence. By the end of the last layer, the `[CLS]` vector can, in principle, carry an aggregate of the whole sequence's meaning.
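
As a rough illustration of that claim, the sketch below runs one self-attention layer over a toy sequence and checks that the output at position 0 changes when a different position is modified. It uses PyTorch's `nn.MultiheadAttention` with random weights and toy sizes; it is not the encoder from this project, and the names `tokens`, `perturbed`, and `cls_changed` are made up for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy sizes, chosen only for illustration (the pooler below uses hidden size 768).
hidden, heads, seq_len = 16, 2, 5
attention = nn.MultiheadAttention(hidden, heads, batch_first=True).eval()

tokens = torch.randn(1, seq_len, hidden)  # position 0 plays the role of [CLS]
perturbed = tokens.clone()
perturbed[0, 3] += 1.0                    # change only the token at position 3

with torch.no_grad():
    out_a, weights = attention(tokens, tokens, tokens)
    out_b, _ = attention(perturbed, perturbed, perturbed)

# Row 0 of the attention weights: how much position 0 looks at each position.
cls_weights = weights[0, 0]
# Did the output at position 0 react to the change at position 3?
cls_changed = not torch.allclose(out_a[0, 0], out_b[0, 0])

print(cls_weights)
print(cls_changed)
```

The weights in `cls_weights` sum to 1 across the five positions, which is why a change anywhere in the sequence can reach the vector at index 0.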

### Pooler Architecture
The Pooler is a simple layer that sits on top of the encoder:
1.  **Extract:** Take the vector at index 0 (the `[CLS]` token).
2.  **Process:** Pass it through a Linear layer + Tanh activation.

$$ \text{Output} = \tanh(W \cdot x_{CLS} + b) $$
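
The two steps and the formula can be sketched as a small PyTorch module. This is a minimal sketch, not the project's `modules.pooler.Pooler`: the class name `SketchPooler` and its `hidden_size` argument are assumptions made for illustration.

```python
import torch
from torch import nn


class SketchPooler(nn.Module):
    """Illustrative pooler: select the [CLS] vector, then Linear + Tanh."""

    def __init__(self, hidden_size: int = 768) -> None:
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)  # W and b in the formula
        self.activation = nn.Tanh()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, sequence, hidden)
        cls_vector = hidden_states[:, 0]  # Extract: (batch, hidden)
        return self.activation(self.dense(cls_vector))  # Process


sketch = SketchPooler(hidden_size=8)
pooled = sketch(torch.randn(2, 5, 8))
print(pooled.shape)  # torch.Size([2, 8])
```

Because of the Tanh, every value in the pooled vector lies between -1 and 1.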

Let's start by adding `src` to the Python system path, in case your notebook isn't already running with it.

In [1]:
import sys
from pathlib import Path
sys.path.append((Path('').resolve().parent / 'src').as_posix())

#### Inspecting the Layer
It's just a Linear layer and a Tanh. Nothing fancy.

In [2]:
from modules.pooler import Pooler
from settings import LayerCommonSettings

pooler = Pooler(LayerCommonSettings())
print(pooler)

Pooler(
  (dense): Linear(in_features=768, out_features=768, bias=True)
  (activation): Tanh()
)


#### Dimensionality Reduction
The pooler's most visible effect is on the tensor shape: because it keeps only the `[CLS]` position, the sequence dimension disappears.
* Input: `(Batch, Sequence, Hidden)`
* Output: `(Batch, Hidden)`

In [3]:
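import torch

# Dummy encoder output: a batch of 2 sequences, 5 tokens each, hidden size 768.
input_tensor = torch.randn(2, 5, 768)
pooled_output = pooler(input_tensor)

print(f"Input Shape:  {input_tensor.shape}")
print(f"Output Shape: {pooled_output.shape}")

# The sequence axis is gone; only (batch, hidden) remains.
assert pooled_output.shape == (2, 768)
print("✅ Sequence dimension collapsed.")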

Input Shape:  torch.Size([2, 5, 768])
Output Shape: torch.Size([2, 768])
✅ Sequence dimension collapsed.
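
The same collapse is what makes variable-length input manageable for a classifier. The sketch below is only an illustration under stated assumptions: it slices out index 0 directly as a stand-in for the pooled vector, and the two-class `classifier` head is hypothetical, not part of this project.

```python
import torch
from torch import nn

hidden_size = 768
classifier = nn.Linear(hidden_size, 2)  # hypothetical head: positive vs. negative

# Same batch size, different sequence lengths.
short_batch = torch.randn(2, 5, hidden_size)
long_batch = torch.randn(2, 128, hidden_size)

for batch in (short_batch, long_batch):
    cls_vector = batch[:, 0]         # (batch, hidden), whatever the length
    logits = classifier(cls_vector)  # (batch, 2)
    print(tuple(batch.shape), "->", tuple(cls_vector.shape), "->", tuple(logits.shape))
```

Both batches end up as `(2, 768)` before the classifier, so one fixed-size head can serve sequences of any length.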
