# Understanding ICL: Induction Heads

- 📺 **Video:** [https://youtu.be/mUthsZ_Aivo](https://youtu.be/mUthsZ_Aivo)

## Overview
- Analyze how attention heads implement induction: completing a repeated pattern by copying the token that followed an earlier occurrence of the current token.
- Understand circuit-level explanations of in-context learning (ICL).

## Key ideas
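- **Induction heads:** given a sequence `… A B … A`, attend from the second `A` to the token that followed its earlier occurrence (`B`) and copy it as the prediction. A helper previous-token head, which attends from token i to token i-1, supplies the "what came before me" information that this lookup needs.
- **Key-query alignment:** the induction head's query encodes the current token, while each key encodes the token one position earlier; this one-position shift between keys and queries shows up as a positional bias in the attention pattern.
- **Residual stream:** heads read from and write to a shared residual stream, so the copied token embedding is added to the representation used for the next-token prediction.
- **Circuit analysis:** interpret the learned query-key (where to attend) and output-value (what to copy) weights to explain this behavior mechanistically.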
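
The ideas above can be sketched as an idealized, hand-built circuit. This is a minimal illustration, not the mechanism of any trained model: the function name `induction_attention`, the one-hot token embeddings, and the hard (non-softmax) attention are all assumptions made for clarity.

```python
import numpy as np

def induction_attention(tokens):
    """Hard attention pattern of an idealized induction head (toy sketch)."""
    n = len(tokens)
    vocab = sorted(set(tokens))
    index = {tok: k for k, tok in enumerate(vocab)}
    one_hot = np.eye(len(vocab))[[index[t] for t in tokens]]  # shape (n, |vocab|)

    # Step 1 (previous-token head): position j stores the identity of token j-1.
    prev = np.zeros_like(one_hot)
    prev[1:] = one_hot[:-1]

    # Step 2 (induction head): the query is the current token, the key is the
    # stored previous token, so scores[i, j] = 1 when tokens[j-1] == tokens[i].
    scores = one_hot @ prev.T
    scores = scores * np.tril(np.ones((n, n)))  # causal mask: only j <= i

    # Normalize rows that have at least one match; leave the others at zero.
    row_sums = scores.sum(axis=1, keepdims=True)
    attention = np.divide(scores, row_sums,
                          out=np.zeros_like(scores), where=row_sums > 0)

    # Step 3 (residual stream): the attended token's embedding is what gets
    # written into the prediction for the next token.
    predicted = attention @ one_hot
    return attention, predicted, vocab

tokens = ["A", "B", "C", "A"]
attention, predicted, vocab = induction_attention(tokens)
print(attention[-1])                        # attention from the second "A"
print(vocab[int(predicted[-1].argmax())])   # B
```

At the final `A`, all attention falls on position 1 (`B`), the token that followed the earlier `A`, so the sketch predicts `B` as the continuation. Positions with no earlier match get an all-zero row here; a real softmax head would instead have to place its attention somewhere.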

## Demo
Construct a toy attention head that copies the previous token embedding and inspect its attention pattern, mirroring the lecture (https://youtu.be/CF3MHqYsgUo).

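```python
import numpy as np

# Four token embeddings, one row per position (3-dimensional toy vectors).
sequence = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0]
])

# Row i has a 1 in column i-1: position i attends only to position i-1.
# Position 0 has no previous token, so its row is all zeros.
shift_matrix = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0]
])

attention = shift_matrix.astype(float)
# Normalize each row; the small epsilon avoids dividing by zero on row 0
# and is why the weights print as 0.99999999 rather than 1.
attention /= attention.sum(axis=1, keepdims=True) + 1e-8
# Each position receives the embedding of the token before it.
context = attention @ sequence

print('Attention weights (previous-token head):')
print(attention)
print()
print('Copied representations:')
print(context)
```

```text
Attention weights (previous-token head):
[[0.         0.         0.         0.        ]
 [0.99999999 0.         0.         0.        ]
 [0.         0.99999999 0.         0.        ]
 [0.         0.         0.99999999 0.        ]]

Copied representations:
[[0.         0.         0.        ]
 [0.99999999 0.         0.        ]
 [0.         0.99999999 0.        ]
 [0.         0.         0.99999999]]
```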
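
The shift pattern in the demo is written by hand. As a rough sketch of how the same pattern could arise from query-key scores, the block below uses one-hot positional embeddings and a causal softmax. The names `positions`, `shift`, and `beta` are illustrative assumptions, and the weights are chosen by hand rather than learned.

```python
import numpy as np

n = 4
sequence = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0]
])

positions = np.eye(n)      # one-hot positional embeddings (assumption)
shift = np.eye(n, k=-1)    # shift[i, i-1] = 1

# Each query asks for "the position before mine"; each key advertises its own
# position. Position 0 has no predecessor, so its query is all zeros.
queries = positions @ shift
keys = positions

beta = 20.0  # hypothetical sharpness factor; larger values give harder attention
scores = beta * (queries @ keys.T)   # scores[i, j] = beta when j == i - 1

# Causal mask: a position may only attend to itself and earlier positions.
mask = np.tril(np.ones((n, n), dtype=bool))
scores = np.where(mask, scores, -np.inf)

# Numerically stable softmax over each row.
scores -= scores.max(axis=1, keepdims=True)
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)

context = weights @ sequence
print(np.round(weights, 3))
print(np.round(context, 3))
```

Unlike the hand-written matrix, every softmax row must sum to 1, so position 0 attends to itself instead of to nothing. The remaining rows put nearly all of their weight on the previous position, which reproduces the copy behavior of the demo.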


## Try it
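- Modify the demo: for example, change `shift_matrix` so that each position attends two tokens back, and check how `context` changes.
- Add a tiny dataset or counter-example, such as a sequence with a repeated token, where copying the previous token alone does not predict the continuation.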


## References
- [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
- [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)
- [Demystifying Prompts in Language Models via Perplexity Estimation](https://arxiv.org/abs/2212.04037)
- [Calibrate Before Use: Improving Few-Shot Performance of Language Models](https://arxiv.org/abs/2102.09690)
- [Holistic Evaluation of Language Models](https://arxiv.org/abs/2211.09110)
- [Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?](https://arxiv.org/abs/2202.12837)
- [In-context Learning and Induction Heads](https://arxiv.org/abs/2209.11895)
- [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207)
- [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
- [Stanford Alpaca: An Instruction-following LLaMA Model (website)](https://crfm.stanford.edu/2023/03/13/alpaca.html)
- [Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation](https://arxiv.org/abs/2212.07981)
- [WiCE: Real-World Entailment for Claims in Wikipedia](https://arxiv.org/abs/2303.01432)
- [SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization](https://arxiv.org/abs/2111.09525)
- [FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation](https://arxiv.org/abs/2305.14251)
- [RARR: Researching and Revising What Language Models Say, Using Language Models](https://arxiv.org/abs/2210.08726)


*Links only; we do not redistribute slides or papers.*