# Easy → Medium → Hard curriculum walkthrough

This notebook demonstrates a complete curriculum that fades from easy to medium to hard data,
mirroring the minimal example shown in the project README but with richer inspection utilities.

## Setup
We make sure the local `curriculus` package is importable and prepare some toy datasets that
stand in for easy / medium / hard supervision.

In [None]:
import itertools
import random
from collections import Counter
from pathlib import Path
import sys

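# Locate the repository root (the directory containing `src/`): start from the
# current directory and walk up at most two levels, so the notebook runs from
# either the repo root or a subfolder such as `notebooks/`.
ROOT = Path.cwd()
if not (ROOT / "src").exists():
    ROOT = ROOT.parent
if not (ROOT / "src").exists():
    ROOT = ROOT.parent
sys.path.insert(0, str(ROOT / "src"))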

from curriculus import Curriculus, CurriculusPlanner

In [7]:
def make_examples(label: str, count: int, seed: int = 0):
    rng = random.Random(seed)
    base_templates = [
        "{label} sample #{idx}: classify sentiment",
        "{label} sample #{idx}: extract entities",
        "{label} sample #{idx}: summarize paragraph",
    ]
    return [
        {"difficulty": label, "text": rng.choice(base_templates).format(label=label.title(), idx=i)}
        for i in range(count)
    ]

easy_samples = make_examples("easy", 40, seed=1)
medium_samples = make_examples("medium", 30, seed=2)
hard_samples = make_examples("hard", 20, seed=3)

datasets = [
    {"name": "easy", "dataset": easy_samples},
    {"name": "medium", "dataset": medium_samples},
    {"name": "hard", "dataset": hard_samples},
]

Counter([row["difficulty"] for row in itertools.chain(easy_samples, medium_samples, hard_samples)])


Counter({'easy': 40, 'medium': 30, 'hard': 20})

## Planning the schedule
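We auto-generate a sequential curriculum and inspect the resulting sampling plan. This step is optional, but the summary flags any dataset that is too small for its share of the steps before sampling begins, which makes it useful for debugging and validation.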
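One way to picture a sequential easy → medium → hard curriculum is a piecewise-linear cross-fade: the first steps draw only from `easy`, the midpoint draws only from `medium`, and the last steps draw only from `hard`. The sketch below assumes that shape; it is not necessarily the schedule `CurriculusPlanner` builds, and `crossfade_weights` / `expected_counts` are hypothetical helpers. It does line up with the planner output that follows: over 90 steps it asks for 45 medium draws and about 22 hard draws.

```python
from __future__ import annotations


def crossfade_weights(progress: float, n_datasets: int) -> list[float]:
    """Hypothetical piecewise-linear fade across datasets ordered easy -> hard.

    `progress` runs from 0.0 (first step) to 1.0 (last step). Dataset i peaks
    (weight 1.0) at progress i / (n_datasets - 1) and fades linearly to 0.0 at
    the neighbouring peaks, so consecutive datasets cross-fade and the weights
    always sum to 1.
    """
    if n_datasets == 1:
        return [1.0]
    span = 1.0 / (n_datasets - 1)  # distance between neighbouring peaks
    weights = []
    for i in range(n_datasets):
        distance = abs(progress - i * span)
        weights.append(max(0.0, 1.0 - distance / span))
    return weights


def expected_counts(total_steps: int, n_datasets: int) -> list[float]:
    """Expected draws per dataset, evaluating the weights at each step's midpoint."""
    totals = [0.0] * n_datasets
    for step in range(total_steps):
        progress = (step + 0.5) / total_steps
        for i, weight in enumerate(crossfade_weights(progress, n_datasets)):
            totals[i] += weight
    return totals


# 90 steps over easy / medium / hard, as in the planner cell.
print([round(c, 2) for c in expected_counts(90, 3)])
```

Under this assumed schedule, `medium` needs 45 examples but only 30 exist, which is the kind of shortfall the plan summary reports.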

In [8]:
planner = CurriculusPlanner(
    datasets=datasets,
    total_steps=90,
    oversampling=False,
    best_effort=True,
)
print(planner.get_plan_summary())

Total Steps: 90
Oversampling: False
Best Effort: True
Dataset Budget:
  easy: OK           (40 available)
  medium: SCALED       (30 available, 45 needed (0.67x))
  hard: SCALED       (20 available, 22 needed (0.89x))
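The `SCALED` ratios are consistent with scale = available / needed, where "needed" is a fractional expected count that the summary prints truncated: 30 / 45 ≈ 0.67 and 20 / 22.5 ≈ 0.89 (20 / 22 would give 0.91). That is an inference from the printed numbers, not documented behaviour. The `budget_status` helper below is hypothetical, the `easy` figure of 22.5 is assumed, and the handling of `oversampling=True` and `best_effort=False` is a guess.

```python
from __future__ import annotations


def budget_status(available: int, needed: float, oversampling: bool, best_effort: bool) -> tuple[str, float]:
    """Hypothetical budget check for one dataset; returns (status, scale factor)."""
    if available >= needed:
        return "OK", 1.0
    if oversampling:
        # Assumption: with oversampling, examples may be repeated to cover the
        # shortfall. "OVERSAMPLED" is a hypothetical label, not the library's.
        return "OVERSAMPLED", 1.0
    if best_effort:
        # Shrink this dataset's share to what is actually available.
        return "SCALED", available / needed
    raise ValueError(f"only {available} examples available, {needed:g} needed")


# Needed counts inferred from the plan summary (easy's 22.5 is an assumption).
for name, available, needed in [("easy", 40, 22.5), ("medium", 30, 45.0), ("hard", 20, 22.5)]:
    status, scale = budget_status(available, needed, oversampling=False, best_effort=True)
    print(f"{name}: {status} ({available} available, {int(needed)} needed ({scale:.2f}x))")
```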


## Sampling preview
Draw the first 20 samples from a shorter, 30-step curriculum to see the fade between difficulty levels.

In [9]:
mixed_dataset = Curriculus(
    datasets=datasets,
    total_steps=30,
    oversampling=False,
    best_effort=True,
)

preview = []
for step, sample in zip(range(20), mixed_dataset['train']):
    preview.append({"step": step, "difficulty": sample["difficulty"], "text": sample["text"]})

preview


[{'step': 0,
  'difficulty': 'easy',
  'text': 'Easy sample #0: classify sentiment'},
 {'step': 1,
  'difficulty': 'easy',
  'text': 'Easy sample #1: summarize paragraph'},
 {'step': 2,
  'difficulty': 'easy',
  'text': 'Easy sample #2: classify sentiment'},
 {'step': 3, 'difficulty': 'easy', 'text': 'Easy sample #3: extract entities'},
 {'step': 4,
  'difficulty': 'easy',
  'text': 'Easy sample #4: classify sentiment'},
 {'step': 5,
  'difficulty': 'medium',
  'text': 'Medium sample #0: classify sentiment'},
 {'step': 6, 'difficulty': 'easy', 'text': 'Easy sample #5: extract entities'},
 {'step': 7, 'difficulty': 'easy', 'text': 'Easy sample #6: extract entities'},
 {'step': 8,
  'difficulty': 'medium',
  'text': 'Medium sample #1: classify sentiment'},
 {'step': 9, 'difficulty': 'easy', 'text': 'Easy sample #7: extract entities'},
 {'step': 10,
  'difficulty': 'medium',
  'text': 'Medium sample #2: classify sentiment'},
 {'step': 11,
  'difficulty': 'medium',
  'text': 'Medium sample
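In the preview, each dataset's examples come out in order (`Easy sample #0`, then `#1`, and so on) while the mix gradually shifts toward `medium`. One minimal way to get that behaviour is a weighted random choice of dataset at every step, followed by taking that dataset's next unused example, so nothing is reused (matching `oversampling=False`). This is an assumption, not the `Curriculus` implementation. `sample_curriculum`, `fade_weights`, and its fixed linear easy → hard fade are hypothetical.

```python
from __future__ import annotations

import random
from typing import Iterator


def fade_weights(progress: float) -> list[float]:
    """Hypothetical easy/medium/hard weights: a linear cross-fade over progress in [0, 1]."""
    easy = max(0.0, 1.0 - 2.0 * progress)
    hard = max(0.0, 2.0 * progress - 1.0)
    return [easy, 1.0 - easy - hard, hard]


def sample_curriculum(datasets: list[list[dict]], total_steps: int, seed: int = 0) -> Iterator[dict]:
    """Yield up to `total_steps` examples without ever reusing one (no oversampling)."""
    rng = random.Random(seed)
    cursors = [0] * len(datasets)  # next unused index in each dataset
    for step in range(total_steps):
        weights = fade_weights((step + 0.5) / total_steps)
        # Give exhausted datasets zero weight so they are never picked.
        weights = [w if cursors[i] < len(d) else 0.0 for i, (w, d) in enumerate(zip(weights, datasets))]
        if sum(weights) == 0:
            return  # nothing left to draw at this point in the schedule
        idx = rng.choices(range(len(datasets)), weights=weights)[0]
        yield datasets[idx][cursors[idx]]
        cursors[idx] += 1


# Toy pools with the same sizes as the notebook's easy / medium / hard lists.
toy = [
    [{"difficulty": name, "index": i} for i in range(n)]
    for name, n in [("easy", 40), ("medium", 30), ("hard", 20)]
]
drawn = list(sample_curriculum(toy, total_steps=30))
print([row["difficulty"] for row in drawn])
```

Because each pool is consumed through a cursor, examples within a difficulty level keep their original order, as in the preview above.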

In [12]:
mixed_dataset.to_hf_dataset()

DatasetDict({
    train: Dataset({
        features: ['difficulty', 'text'],
        num_rows: 30
    })
})

## Next steps
- Swap the toy lists for real datasets.
- Tune `total_steps`, `oversampling`, and custom schedules as needed.
- Feed the `CurriculusIterableDataset` into your trainer / dataloader.
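If your training loop consumes fixed-size batches rather than single examples, the curriculum stream can be grouped with the standard library alone. `batched_rows` is a hypothetical helper, and `rows` below stands in for whatever iterable you hand to the trainer. Real dataloaders (for example PyTorch's `DataLoader`) also handle collation and worker processes, which this sketch does not.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched_rows(stream: Iterable[T], batch_size: int, drop_last: bool = False) -> Iterator[list[T]]:
    """Group an example stream into lists of `batch_size` items, preserving its order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        if len(batch) < batch_size and drop_last:
            return  # discard the final, short batch
        yield batch


# Stand-in stream: 30 rows, like the 30-step curriculum above.
rows = [{"step": i} for i in range(30)]
for batch in batched_rows(rows, batch_size=8):
    print(len(batch), batch[0]["step"])
```

Keep the batches in stream order: a downstream shuffle would undo the easy-to-hard progression that the curriculum encodes.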