```{contents}
```
## Context Compression

### 1. Definition

**Context Compression** is the process of **reducing the size of input context** while **preserving the information necessary** for a model to generate accurate, coherent, and task-relevant outputs.

It is essential because large language models (LLMs) have **finite context windows**, yet real-world applications often involve **long documents, conversations, and knowledge bases**.

> Goal:
> **Maximize useful information per token**

---

### 2. Why Context Compression Is Necessary

| Constraint     | Description                                       |
| -------------- | ------------------------------------------------- |
| Context Window | Models accept a limited number of tokens          |
| Latency        | Larger prompts increase inference time            |
| Cost           | More tokens = higher compute cost                 |
| Noise          | Irrelevant information degrades reasoning quality |
| Memory         | Long-term interactions exceed window capacity     |

---

### 3. Core Principles

Context compression aims to maintain:

* **Semantic fidelity** — same meaning
* **Task relevance** — preserve task-critical facts
* **Structural cues** — entities, relations, constraints
* **Reasoning support** — information required for inference

While minimizing:

* Redundancy
* Irrelevance
* Surface-level verbosity

---

### 4. Types of Context Compression

| Type             | Description                           | Example                    |
| ---------------- | ------------------------------------- | -------------------------- |
| **Extractive**   | Select most important segments        | Top-k passages             |
| **Abstractive**  | Generate condensed summary            | Executive summary          |
| **Symbolic**     | Convert text → structured form        | Triples, tables            |
| **Hierarchical** | Multi-level summarization             | Chapter → section → bullet |
| **Adaptive**     | Dynamic compression per task          | Query-aware summary        |
| **Learned**      | Neural models trained for compression | Transformer compressors    |
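
To make the hierarchical row concrete, the sketch below repeatedly summarizes fixed-size chunks and then summarizes the joined summaries until a word budget is met. The names `chunk_words`, `hierarchical_compress`, and `first_sentence` are hypothetical, and `first_sentence` is only a toy stand-in for a real summarizer; treat this as an illustration under those assumptions rather than a reference implementation.

```python
from typing import Callable, List


def chunk_words(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def hierarchical_compress(text: str, summarize: Callable[[str], str],
                          chunk_size: int = 200, budget: int = 100) -> str:
    """Summarize chunks, then summarize the joined summaries, until the
    result fits within `budget` words or stops shrinking."""
    current = text
    while len(current.split()) > budget:
        chunks = chunk_words(current, chunk_size)
        reduced = " ".join(summarize(chunk) for chunk in chunks)
        if len(reduced.split()) >= len(current.split()):
            break  # the summarizer no longer shrinks the text; stop looping
        current = reduced
    return current


def first_sentence(chunk: str) -> str:
    """Toy stand-in summarizer (assumption): keep the chunk's first sentence."""
    return chunk.split(". ")[0].rstrip(".") + "."
```

Any callable that maps a string to a shorter string can be passed as `summarize`, so a model-based summarizer could replace the toy one without changing the loop.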

---

### 5. Compression Workflow

```
Raw Context
   ↓
Relevance Estimation
   ↓
Redundancy Removal
   ↓
Abstraction / Encoding
   ↓
Compressed Context
   ↓
LLM Inference
```
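
The first three stages of this workflow can be sketched with the standard library alone. The version below is extractive only: it scores sentences by word overlap with the query, drops near-duplicates by Jaccard similarity, and stops adding sentences once a word budget is reached. It skips the abstraction/encoding stage entirely. The function names, the overlap score, and the 0.8 duplicate threshold are assumptions chosen for illustration.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compress(sentences: List[str], query: str, budget_words: int,
             dedup_threshold: float = 0.8) -> List[str]:
    query_tokens = tokenize(query)
    token_sets = [tokenize(s) for s in sentences]
    # Relevance estimation: rank sentences by query-term overlap.
    order = sorted(range(len(sentences)),
                   key=lambda i: len(token_sets[i] & query_tokens),
                   reverse=True)
    kept: List[int] = []
    used = 0
    for i in order:
        if not token_sets[i] & query_tokens:
            continue  # no overlap with the query: treat as irrelevant
        # Redundancy removal: skip near-duplicates of sentences already kept.
        if any(jaccard(token_sets[i], token_sets[j]) >= dedup_threshold
               for j in kept):
            continue
        n_words = len(sentences[i].split())
        if used + n_words > budget_words:
            continue  # would exceed the budget
        kept.append(i)
        used += n_words
    # Restore document order so the compressed context still reads coherently.
    return [sentences[i] for i in sorted(kept)]
```

Word counts stand in for token counts here; a real pipeline would measure the budget with the target model's tokenizer.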

---

### 6. Mathematical View

Let the original context be $C$ and the compressed context be $\hat{C}$.

We want:

$$
\arg\min_{\hat{C}} |\hat{C}| \quad \text{subject to} \quad
\text{Utility}(\hat{C}, T) \ge \text{Utility}(C, T)
$$

Where:

* $T$ = task
* Utility = model performance on task
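
Requiring no loss of utility at all can rule out most compression, so a relaxed form is often more practical. The version below adds a tolerance $\epsilon \ge 0$, which is an assumption introduced here and not part of the definition above:

$$
\hat{C}^{*} = \arg\min_{\hat{C}} |\hat{C}| \quad \text{subject to} \quad
\text{Utility}(\hat{C}, T) \ge \text{Utility}(C, T) - \epsilon
$$

A simple companion measure is the compression ratio $\rho = |\hat{C}| / |C|$. For the demonstration in Section 8 (1,500 tokens in, 180 tokens out), $\rho = 180 / 1500 = 0.12$, meaning the compressed context is 12% of the original size.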

---

### 7. Compression Techniques

#### 7.1 Heuristic Methods

* Keyword filtering
* Sentence ranking (TF-IDF, BM25)
* Sliding window + pruning
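
As one possible illustration of sentence ranking, the sketch below scores each sentence against a query with a smoothed TF-IDF weight and keeps the top `top_k` sentences in their original order. It uses only the standard library; the function names and the particular IDF smoothing are assumptions, and a production system would more likely rely on an established BM25 or TF-IDF implementation.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sentences(sentences: List[str], query: str, top_k: int = 2) -> List[str]:
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    def idf(term: str) -> float:
        # Smoothed inverse document frequency (always positive).
        return math.log((n + 1) / (df[term] + 1)) + 1.0

    query_terms = set(tokenize(query))

    def score(doc: List[str]) -> float:
        tf = Counter(doc)
        # Sum of length-normalized term frequency times IDF over query terms.
        return sum(tf[t] / len(doc) * idf(t) for t in query_terms if t in tf)

    best = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]  # keep document order
```

Returning the selected sentences in document order, not score order, tends to keep the compressed context readable.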

#### 7.2 Model-Based Compression

* Encoder–decoder summarizers
* Query-aware summarization
* Neural retrievers with learned scoring

#### 7.3 Symbolic Compression

Convert text to compact structures:

| Format            | Example                  |
| ----------------- | ------------------------ |
| Knowledge triples | (Company, founded, 1976) |
| Tables            | Entity → attributes      |
| JSON schemas      | Key-value constraints    |
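
A minimal sketch of the triples-to-structure idea: group knowledge triples by subject into an entity-to-attributes mapping and serialize it as compact JSON. The `Triple` type, the helper names, and the second example triple are hypothetical; only `(Company, founded, 1976)` comes from the table above.

```python
import json
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def to_records(triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Group triples by subject into an entity -> attributes mapping.
    If a subject repeats a relation, the later value overwrites the earlier."""
    records: Dict[str, Dict[str, str]] = {}
    for t in triples:
        records.setdefault(t.subject, {})[t.relation] = t.obj
    return records


def render(triples: List[Triple]) -> str:
    """Serialize the grouped records as whitespace-free JSON."""
    return json.dumps(to_records(triples), separators=(",", ":"), sort_keys=True)


facts = [
    Triple("Company", "founded", "1976"),
    Triple("Company", "sector", "hardware"),  # hypothetical example value
]
compact = render(facts)
# compact == '{"Company":{"founded":"1976","sector":"hardware"}}'
```

Grouping by subject avoids repeating the entity name for every fact, which is where most of the token savings come from in this format.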

---

### 8. Practical Demonstration

#### Example: Query-Aware Context Compression

```python
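from transformers import pipeline

# BART-large-CNN is a plain summarizer, not an instruction-tuned model:
# the query prefix only biases the summary; it is not followed as an instruction.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def compress_context(context: str, query: str) -> str:
    """Summarize `context`, with `query` prepended as a weak relevance hint."""
    text = f"Query: {query}\n\n{context}"
    # truncation=True clips inputs beyond the model's 1,024-token limit;
    # longer documents should be chunked and summarized chunk by chunk.
    summary = summarizer(text, max_length=200, min_length=80,
                         do_sample=False, truncation=True)
    return summary[0]["summary_text"]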
```

**Before (1,500 tokens)**
Long document describing climate policy, history, effects, statistics.

**After (180 tokens)**
Concise summary focusing only on emissions policies relevant to the query.

---

### 9. Context Compression vs Related Concepts

| Concept                    | Difference                                                                 |
| -------------------------- | -------------------------------------------------------------------------- |
| **Summarization**          | Condenses text for a general reader; not necessarily query- or task-aware  |
| **Retrieval**              | Selects relevant passages but does not shorten their content               |
| **Knowledge Distillation** | Compresses the model, not the context                                      |
| **Prompt Engineering**     | Restructures context without necessarily reducing its size                 |
| **Memory Management**      | Uses compression as a core tool                                            |

---

### 10. Applications

| System          | Role of Context Compression      |
| --------------- | -------------------------------- |
| RAG systems     | Fit retrieved docs into window   |
| Chatbots        | Maintain long conversations      |
| Agents          | Preserve task memory             |
| Code assistants | Compress large codebases         |
| Search engines  | Generate compact knowledge views |
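
For the chatbot case, one common pattern is to keep the most recent turns verbatim and fold everything older into a single summary turn. The sketch below assumes a caller-supplied `summarize` function and a hypothetical `("summary", text)` turn format; it is an outline of the idea, not a description of how any particular chatbot implements memory.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def compress_history(turns: List[Turn], summarize: Callable[[str], str],
                     keep_recent: int = 4) -> List[Turn]:
    """Replace all but the last `keep_recent` turns with one summary turn."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(turns) <= keep_recent:
        return list(turns)  # nothing old enough to compress
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in older)
    return [("summary", summarize(transcript))] + list(recent)
```

Calling this each time the history exceeds a token budget keeps the prompt size roughly bounded while recent turns stay exact.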

---

### 11. Advanced: Learned Context Compressors

Some recent pipelines use **dedicated neural compressors**:

```
Document → Compressor Transformer → Latent Summary → LLM
```

Trained using:

* Task loss
* Reconstruction loss
* Information bottleneck objectives
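
One hedged way to write how these objectives might be combined is a weighted sum. Here $Z$ is the latent summary produced from context $C$, $Y$ is the target output, $\text{Dec}$ is a hypothetical decoder used only for reconstruction, and $\lambda, \beta \ge 0$ are assumed weighting hyperparameters:

$$
\mathcal{L} = \mathcal{L}_{\text{task}}\big(Y, \text{LLM}(Z)\big)
+ \lambda \, \mathcal{L}_{\text{recon}}\big(C, \text{Dec}(Z)\big)
+ \beta \, I(C; Z)
$$

The reconstruction term rewards keeping information about $C$, while the bottleneck term $I(C; Z)$ penalizes it, so the two pull in opposite directions. A given system may use only one of them alongside the task loss.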

---

### 12. Failure Modes

| Issue                   | Cause                                              |
| ----------------------- | -------------------------------------------------- |
| Information loss        | Over-aggressive compression                        |
| Hallucination           | Key facts dropped, so the model fills the gaps     |
| Bias amplification      | Summaries that distort or selectively omit content |
| Loss of reasoning steps | Over-abstraction                                   |

---

### 13. Design Guidelines

* Compress **after retrieval**
* Make compression **query-aware**
* Preserve **entities, numbers, constraints**
* Keep **intermediate reasoning**
* Continuously evaluate task performance
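
The guideline on preserving numbers lends itself to a cheap automated check. The sketch below lists numeric tokens that appear in the original text but not in the compressed text. It covers numbers only, not entities or constraints, and the regular expression is an assumption that handles plain integers, decimals, thousands separators, and percentages.

```python
import re
from typing import List

# Digits with optional "." or "," groups and an optional trailing percent sign.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def missing_numbers(original: str, compressed: str) -> List[str]:
    """Return numeric tokens found in `original` but absent from `compressed`,
    in order of first appearance and without duplicates."""
    kept = set(NUMBER.findall(compressed))
    seen = set()
    missing: List[str] = []
    for num in NUMBER.findall(original):
        if num not in kept and num not in seen:
            seen.add(num)
            missing.append(num)
    return missing
```

A non-empty result does not prove the compression is wrong, since some numbers may be irrelevant to the task, but it flags outputs worth reviewing.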

---

### 14. One-Sentence Summary

> **Context compression enables LLMs to operate over large information spaces by preserving only the information that matters for the current task, using principled information reduction strategies.**

