feat(personal-tutor): add desirable-difficulty hooks by minsoo-web · Pull Request #22 · hamsurang/kit

minsoo-web · 2026-05-01T14:19:39Z

Summary

Cross-session retrieval is now the only path to `understood`. Previously the personal-tutor skill let a learner pass a quiz while the answer was still echoing in working memory (the "fluency illusion") and mark a concept `understood` in the same session it was first taught, which measures short-term recall rather than retention. This PR introduces hooks based on Bjork's "desirable difficulty": same-session warm quizzes cap at `partial`, a `Cold quiz pending: yes` flag schedules retrieval verification for the next session, and a Phase 5 self-audit drift guard mechanically enforces the rule.

What changed

| Mechanism | Effect |
| --- | --- |
| Phase 2.0 Cold Quiz Sweep | New session phase that fires when any node has `Cold quiz pending: yes`. Runs before any new teaching, with no re-explanation of the concept |
| Iron Rule #6 | "Never advance to `understood` within the same session a node was first taught." Phase 5 self-audit reverts violations back to `partial` |
| R5 sole scheduler | Every new-concept warm pass (hint or no hint) sets `Cold quiz pending: yes`. Cold quiz is the only path to `understood` for first-taught nodes |
| Generation Effect | Phase 3 now starts with a Predict step: the learner guesses before Explain, and Explain anchors on the guess |
| Deterministic format rotation | `feynman → apply → analyze → apply` (no return to feynman). `Last quiz format` is unchanged on fail, so failure never escalates the learner into a higher Bloom level |
| 3-strike downgrade | 3 consecutive `failed` or `passed (hint used)` entries with no `passed (no hint)` between them trigger `partial → gap`, which breaks indefinite hint-pass loops |
| Path B escape valve | Review-slot warm pass without hints + prior cold attempt → `partial → understood`. Rescues nodes whose cold quiz was attempted but not cleanly passed |
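
The rotation and 3-strike rows can be read as two small pure functions. The following is a minimal sketch, not the skill's actual implementation (the skill is a prompt protocol, and none of these function names exist in the PR). It assumes that the rotation starts at `feynman` when no format has been recorded, that `apply` and `analyze` alternate afterwards, and that any pass (with or without a hint) advances `Last quiz format`.

```python
from typing import List, Optional

# Assumed reading of "feynman -> apply -> analyze -> apply (no return to feynman)":
# feynman is asked once, then apply and analyze alternate.
NEXT_FORMAT = {None: "feynman", "feynman": "apply", "apply": "analyze", "analyze": "apply"}

# Results that count toward the 3-strike streak, as named in the table above.
STRIKE_RESULTS = {"failed", "passed (hint used)"}


def next_quiz_format(last_format: Optional[str]) -> str:
    """Pick the next quiz format from the recorded `Last quiz format`."""
    return NEXT_FORMAT[last_format]


def updated_last_format(last_format: Optional[str], asked_format: str, result: str) -> Optional[str]:
    """Leave `Last quiz format` unchanged on a fail, so the same format is retried.

    Advancing on a hinted pass is an assumption; the PR only specifies the fail case.
    """
    return last_format if result == "failed" else asked_format


def should_downgrade(status: str, history: List[str]) -> bool:
    """True when a `partial` node's three most recent results are all strikes."""
    if status != "partial" or len(history) < 3:
        return False
    return all(result in STRIKE_RESULTS for result in history[-3:])
```

Because the next format depends only on the recorded last format, a failed quiz is retried in the same format rather than escalated, and a single `passed (no hint)` in the last three entries resets the streak.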

Schema additions: each node gains a `Cold quiz pending` line and a `Last quiz format` line. Existing graphs are handled with read-time defaults, and only nodes touched during a session are migrated at write time, so no batch rewrite of existing graphs is needed.
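
As an illustration of the migration strategy, here is a hedged sketch that models each node as a dictionary of its schema lines. The default values (`no`, `none`), the `Status` key, and both function names are assumptions made for the example; the PR does not state them.

```python
from typing import Dict

Node = Dict[str, str]

# Hypothetical defaults for the two new lines; the real values are not given in the PR.
DEFAULTS: Node = {"Cold quiz pending": "no", "Last quiz format": "none"}


def read_node(stored: Node) -> Node:
    """Read-time default: fill in missing lines without modifying the stored node."""
    return {**DEFAULTS, **stored}


def write_graph(stored: Dict[str, Node], touched: Dict[str, Node]) -> Dict[str, Node]:
    """Write-time migration: only nodes touched this session gain the new lines."""
    result = dict(stored)
    for name, node in touched.items():
        result[name] = read_node(node)
    return result


# Usage: an old graph with two nodes, only one of which is touched in this session.
graph = {"closures": {"Status": "partial"}, "promises": {"Status": "gap"}}
view = read_node(graph["closures"])
view["Cold quiz pending"] = "yes"
graph_after = write_graph(graph, {"closures": view})
```

In this sketch `promises` keeps its original shape until a later session touches it, which is what makes a batch rewrite unnecessary.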

Session flow

flowchart TB
    A[Session Start] --> B{Cold quiz<br/>pending?}
    B -->|yes| C[Phase 2.0<br/>Cold Quiz Sweep]
    B -->|no| D[Phase 2 Agenda]
    C --> D
    D --> E[Phase 3<br/>Predict → Explain → Q&A → Check]
    E --> F[Phase 4 Warm Quiz<br/>caps at gap→partial]
    F --> G[Phase 5 Archive<br/>+ self-audit]
    G --> H{Path A: cold<br/>no-hint pass?}
    G --> I{Path B: review-slot pass<br/>+ prior cold attempt?}
    H -->|yes| J[partial → understood]
    I -->|yes| J
    G --> K{3-strike or<br/>2-fail streak?}
    K -->|yes| L[partial → gap]
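
The status transitions in the flowchart can be sketched as a small state model. This is an illustrative reading under stated assumptions, not code from the PR: the `Node` fields, the function names, and the integer session counter are all hypothetical, and the handling of a cold quiz that is failed or passed with a hint (left pending here) is not specified in the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    status: str = "gap"                       # "gap" | "partial" | "understood"
    cold_quiz_pending: bool = False
    cold_attempted: bool = False
    first_taught_session: Optional[int] = None


def warm_pass(node: Node, session: int, new_concept: bool, hint_used: bool) -> None:
    """Phase 4: a new-concept warm pass caps at partial and schedules a cold quiz (R5)."""
    if new_concept:
        node.first_taught_session = session
        if node.status == "gap":
            node.status = "partial"
        node.cold_quiz_pending = True
    elif node.status == "partial" and node.cold_attempted and not hint_used:
        # Path B: review-slot pass without hints after a prior cold attempt.
        node.status = "understood"


def cold_quiz(node: Node, passed: bool, hint_used: bool) -> None:
    """Phase 2.0: Path A promotes only on a clean, no-hint cold pass."""
    node.cold_attempted = True
    if passed and not hint_used and node.status == "partial":
        node.status = "understood"
        node.cold_quiz_pending = False


def self_audit(nodes: Dict[str, Node], session: int) -> List[str]:
    """Phase 5 drift guard: revert any node marked understood in its first-taught session."""
    reverted = []
    for name, node in nodes.items():
        if node.status == "understood" and node.first_taught_session == session:
            node.status = "partial"
            reverted.append(name)
    return reverted
```

The audit runs over final state rather than trusting each transition, so a violation of Iron Rule #6 is caught even if an earlier phase drifted from the protocol.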

Validation

Post-implementation eval against the pre-improvement skill across 4 protocol-compliance scenarios (cold-pending routing, Iron Rule #6 enforcement, path B escape valve, 3-strike streak detection):

| Metric | Pre-improvement | This PR | Delta |
| --- | --- | --- | --- |
| Pass rate | 25.0% (7/30) | 100.0% (30/30) | +75pp |
| Avg time / scenario | 105.9s ± 18.7s | 92.2s ± 14.6s | −13.7s |
| Avg tokens / scenario | 34,966 | 38,119 | +3,152 |

The new version is faster despite carrying more rules — deterministic format rotation and explicit cap rules eliminate the reasoning ambiguity the old skill burned cycles on.

The eval set (plugins/personal-tutor/evals/evals.json) lands in a follow-up commit so reviewers can verify methodology against future regressions; per-run outputs and the benchmark viewer are local-only.

Why these ship together

The three feature commits (brainstorm, plan, impl) describe one cognitive intervention. Separating impl from its requirements-and-plan would force a reviewer to reconstruct the desirable-difficulty rationale from code alone — which is exactly the failure mode the rule changes are trying to fix in the learner. Keeping the why and the how in one PR means future edits to the skill can re-read the reason a rule exists before changing it.

Captures the brainstorm output for adding desirable-difficulty hooks to the personal-tutor skill. Defines R1-R12 covering Generation-first teaching, warm/cold quiz split, hint follow-through, session capacity policy, and Iron Rule #6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implementation plan for the desirable-difficulty hooks. 4 units (U1-U4): schema extension, Phase 3 Generation-first, Phase 4-5 warm semantics + Iron Rule, Phase 2.0 cold sweep + capacity policy. Plan was reviewed through ce-doc-review (4 personas, 19 actionable findings) and updated to reflect 15 Apply decisions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ration, Iron Rule #6)

Restructures the learning protocol so that "fluency illusion" (passing a quiz with the answer still in working memory) can no longer upgrade a node to understood. Implements three desirable-difficulty hooks across the session cycle:

* Phase 3 Generation-first turn: predict before explain, with strategic hints (no answer leak) when the learner is stuck.
* Phase 4 renamed to Warm Quiz, capped at gap→partial. Every new-concept warm pass schedules cold quiz for next session (R5 sole scheduler).
* New Phase 2.0 Cold Quiz Sweep at session start (when pending exists), no re-teaching, deterministic format rotation Feynman→Apply→Analyze→Apply, escalation prevented on cold-fail. partial→understood gates on cold no-hint pass (path A) or review-slot escape valve (path B).

Plus: Iron Rule #6 (no same-session understood), Phase 5 self-audit hook, 3-strike downgrade for hint-pass loops, streak counter spanning warm/cold mix, capacity policy (cap=1 when cold-pending exists), 6-line knowledge-graph schema with read-time backward-compat and write-time gradual upgrade. README + Session Flow diagram + Applied Learning Science table synced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nd status docs

- Phase 4 now runs warm quiz on review-slot nodes (path B was unreachable before: review flow ended at Socratic Q&A with no quiz step).
- Review-slot warm passes do NOT trigger R5 cold-pending; only new-concept warm passes do. Review-slot uses Phase 2.0 rotation rule for format.
- Phase 3 review branch now explicitly hands off to Phase 4.
- Phase 2.0 cap-decision wording clarified: cap is fixed at session start, remaining pending count only feeds the agenda announcement.
- README status table for `understood` now lists both path A (cold) and path B (review-slot escape valve).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

) The desirable-difficulty hooks shipped in #22 add new user-facing capability (cross-session retrieval verification, Generation Effect, mechanical Iron Rule #6 enforcement) while remaining backward-compatible with existing knowledge graphs (read-time defaults + touched-node-only migration). Minor bump per SemVer. Marketplace bumps to 1.10.0 to mirror the plugin's minor change. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

minsoo-web and others added 4 commits May 1, 2026 22:29

minsoo-web merged commit 374498d into main May 1, 2026
3 checks passed

minsoo-web deleted the feat/personal-tutor-desirable-difficulty branch May 1, 2026 14:28

minsoo-web mentioned this pull request May 1, 2026

chore(release): bump personal-tutor to 1.1.0 #23

Merged
