# Few-shot Prompting

- 📺 **Video:** [https://youtu.be/JSBjj09xJeM](https://youtu.be/JSBjj09xJeM)

## Overview
- Provide a handful of demonstrations in the prompt to steer large language models toward desired outputs.
- Interpret few-shot prompting as in-context learning.
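
As a rough illustration of the idea above, the sketch below assembles demonstrations and a new query into a single prompt string. The function name `build_few_shot_prompt` and the `Input:` / `Label:` template are assumptions chosen for this example, not a format prescribed by the lecture.

```python
def build_few_shot_prompt(demonstrations, query, instruction=""):
    """Format (input, label) pairs plus a new query into one prompt string."""
    parts = [instruction] if instruction else []
    for text, label in demonstrations:
        parts.append(f"Input: {text}\nLabel: {label}")
    # The final block leaves the label blank for the model to complete.
    parts.append(f"Input: {query}\nLabel:")
    return "\n\n".join(parts)

# Illustrative demonstrations (hypothetical data).
demos = [
    ("I adored the clever twists.", "positive"),
    ("The pacing was dull and lifeless.", "negative"),
]
prompt = build_few_shot_prompt(
    demos,
    "Brilliant acting kept me engaged.",
    instruction="Classify the sentiment of each review.",
)
print(prompt)
```

The model is expected to continue the text after the final `Label:`, which is why the demonstrations all share one consistent format.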

## Key ideas
- **Demonstrations:** include labeled examples illustrating input-output mapping.
- **Pattern matching:** models mimic formats, styles, and label choices.
- **Ordering:** example order and diversity influence generalization.
- **Evaluation:** craft held-out queries to measure prompt effectiveness.
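
The ordering and evaluation points can be sketched together: try every ordering of the demonstrations and measure accuracy on held-out queries. Everything below is a hypothetical setup for illustration. In particular, `recency_biased_model` is a stand-in for a real model call that simply copies the label of the last demonstration, and the held-out queries are made up.

```python
from itertools import permutations

def build_prompt(demos, query):
    # Same Input/Label template for every demonstration, blank label for the query.
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

def recency_biased_model(prompt):
    """Hypothetical stand-in for an LLM: returns the last demonstration's label."""
    labels = [line.split(": ", 1)[1]
              for line in prompt.splitlines()
              if line.startswith("Label: ")]
    return labels[-1]

def accuracy_by_order(demos, held_out, model):
    """Map each ordering (tuple of demo indices) to held-out accuracy."""
    results = {}
    for order in permutations(range(len(demos))):
        ordered = [demos[i] for i in order]
        correct = sum(model(build_prompt(ordered, q)) == gold
                      for q, gold in held_out)
        results[order] = correct / len(held_out)
    return results

demos = [
    ("I adored the clever twists.", "positive"),
    ("The pacing was dull and lifeless.", "negative"),
    ("Brilliant acting kept me engaged.", "positive"),
]
# Made-up held-out queries with gold labels.
held_out = [
    ("A moving, well-acted story.", "positive"),
    ("Tedious and forgettable.", "negative"),
    ("A delight from start to finish.", "positive"),
]

results = accuracy_by_order(demos, held_out, recency_biased_model)
for order, acc in sorted(results.items()):
    print(order, f"{acc:.2f}")
```

With this stand-in, accuracy depends only on which demonstration comes last. A real model's sensitivity to order would need to be measured by swapping in an actual model call for `recency_biased_model`.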

## Demo
Build a tiny word-count classifier that infers label-word associations from a few demonstrations. It is a toy stand-in for in-context learning, mirroring the lecture (https://youtu.be/-PMyW3F7S0M).

```python
from collections import Counter

# Labeled demonstrations: (input text, label) pairs.
demos = [
    ('Review: I adored the clever twists.', 'positive'),
    ('Review: The pacing was dull and lifeless.', 'negative'),
    ('Review: Brilliant acting kept me engaged.', 'positive')
]

def few_shot(prompt, examples):
    # Count how often each whitespace-separated token appears under each label.
    word_counts = {'positive': Counter(), 'negative': Counter()}
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
    # Total number of tokens seen per label.
    total = {label: sum(counts.values()) for label, counts in word_counts.items()}
    tokens = prompt.lower().split()
    scores = {}
    for label in word_counts:
        score = 0.0
        for token in tokens:
            # Add-one-smoothed token frequency; the denominator adds the number
            # of distinct tokens seen for this label. Scores are summed, not multiplied.
            score += (word_counts[label][token] + 1) / (total[label] + len(word_counts[label]))
        scores[label] = score
    # Predict the label with the highest summed score.
    return max(scores, key=scores.get)

# Held-out queries that do not appear among the demonstrations.
queries = [
    'Review: Heartfelt storytelling with charming characters.',
    'Review: Weak script and awkward performances.'
]

for q in queries:
    label = few_shot(q, demos)
    print(f"Prompt --> {label}")
```


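```
Prompt --> negative
Prompt --> negative
```

Both queries are labeled negative, including the clearly positive first one. Apart from `review:`, none of its words appear in the demonstrations, and add-one smoothing gives every unseen word a higher score under the label with fewer demonstration tokens, which here is negative.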


## Try it
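- Modify the demo, for example by changing the demonstrations or the queries.
- Add a tiny dataset or counter-example and check whether the predictions change.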
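
One possible modification, sketched under assumptions rather than taken from the lecture: strip the shared `Review:` prefix and punctuation, score with summed log-probabilities using add-one smoothing over a shared vocabulary, and abstain (return `None`) when a query shares no words with any demonstration. The names `tokenize` and `few_shot_logprob` are hypothetical.

```python
import math
import re
from collections import Counter

# Same demonstrations as the demo, repeated so this block runs on its own.
demos = [
    ('Review: I adored the clever twists.', 'positive'),
    ('Review: The pacing was dull and lifeless.', 'negative'),
    ('Review: Brilliant acting kept me engaged.', 'positive'),
]

def tokenize(text):
    # Lowercase, drop the shared "Review:" prefix, keep alphabetic words only.
    text = re.sub(r'^review:\s*', '', text.lower())
    return re.findall(r"[a-z']+", text)

def few_shot_logprob(prompt, examples):
    # Per-label token counts built from the demonstrations.
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(tokenize(text))
    # Shared vocabulary across all labels, used in the smoothing denominator.
    vocab = set().union(*counts.values())
    # Only tokens seen in some demonstration carry evidence.
    known = [t for t in tokenize(prompt) if t in vocab]
    if not known:
        return None  # no overlap with any demonstration: abstain
    scores = {}
    for label, c in counts.items():
        denom = sum(c.values()) + len(vocab)
        scores[label] = sum(math.log((c[t] + 1) / denom) for t in known)
    return max(scores, key=scores.get)

queries = [
    'Review: Heartfelt storytelling with charming characters.',
    'Review: Weak script and awkward performances.',
]
for q in queries:
    print(f"{q!r} --> {few_shot_logprob(q, demos)}")
```

With these three demonstrations, the first query has no word overlap and the sketch abstains. The second is labeled negative only because it contains "and", which happens to appear in the negative demonstration. This suggests the toy classifier needs more, and more varied, demonstrations before its predictions mean much.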


## References
- [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
- [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
- [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)
- [Demystifying Prompts in Language Models via Perplexity Estimation](https://arxiv.org/abs/2212.04037)
- [Calibrate Before Use: Improving Few-Shot Performance of Language Models](https://arxiv.org/abs/2102.09690)
- [Holistic Evaluation of Language Models](https://arxiv.org/abs/2211.09110)
- [Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?](https://arxiv.org/abs/2202.12837)
- [In-context Learning and Induction Heads](https://arxiv.org/abs/2209.11895)
- [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207)
- [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
- [Stanford Alpaca: An Instruction-following LLaMA Model](https://crfm.stanford.edu/2023/03/13/alpaca.html) (website)
- [Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation](https://arxiv.org/abs/2212.07981)
- [WiCE: Real-World Entailment for Claims in Wikipedia](https://arxiv.org/abs/2303.01432)
- [SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization](https://arxiv.org/abs/2111.09525)
- [FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation](https://arxiv.org/abs/2305.14251)
- [RARR: Researching and Revising What Language Models Say, Using Language Models](https://arxiv.org/abs/2210.08726)


*Links only; we do not redistribute slides or papers.*