feat(personal-tutor): add desirable-difficulty hooks #22

Merged
minsoo-web merged 4 commits into main from feat/personal-tutor-desirable-difficulty
May 1, 2026

@minsoo-web

Summary

Cross-session retrieval is now the only path to `understood`. Previously the personal-tutor skill let a learner pass a quiz with the answer still echoing in working memory (the "fluency illusion") and mark a concept `understood` in the same session it was first taught, which shows short-term recall rather than retention. This PR introduces Bjork's "desirable difficulty" hooks: same-session warm quizzes cap at `partial`, a `Cold quiz pending: yes` flag schedules retrieval verification for the next session, and a Phase 5 self-audit drift guard mechanically enforces the rule.
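
As a rough illustration of the warm-quiz cap and the Phase 5 drift guard, here is a minimal Python sketch. The skill itself is a prompt-level protocol, so this is not its implementation; the `Node` class, the `first_taught_session` field, and both function names are hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    status: str                      # "gap", "partial", or "understood"
    first_taught_session: int        # hypothetical bookkeeping field
    cold_quiz_pending: bool = False  # stands in for the "Cold quiz pending" line


def record_new_concept_warm_pass(node: Node) -> None:
    """Same-session warm pass: cap the upgrade at partial and schedule a cold quiz."""
    if node.status == "gap":
        node.status = "partial"
    node.cold_quiz_pending = True


def self_audit(nodes: list[Node], current_session: int) -> list[str]:
    """Revert any node marked understood in the session it was first taught."""
    reverted = []
    for node in nodes:
        if node.status == "understood" and node.first_taught_session == current_session:
            node.status = "partial"
            reverted.append(node.name)
    return reverted
```

The audit runs over every node at archive time, so a same-session promotion is caught even if an earlier phase applied it by mistake.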

What changed

| Mechanism | Effect |
| --- | --- |
| Phase 2.0 Cold Quiz Sweep | New session phase that fires when any node has Cold quiz pending: yes. Runs before any new teaching, with no re-explanation of the concept |
| Iron Rule #6 | "Never advance to understood within the same session a node was first taught." Phase 5 self-audit reverts violations back to partial |
| R5 sole scheduler | Every new-concept warm pass (hint or no hint) sets Cold quiz pending: yes. Cold quiz is the only path to understood for first-taught nodes |
| Generation Effect | Phase 3 now starts with a Predict step — learner guesses before Explain, and Explain anchors on the guess |
| Deterministic format rotation | feynman → apply → analyze → apply (no return to feynman). Last quiz format unchanged on fail, so failure never escalates the learner into a higher Bloom level |
| 3-strike downgrade | 3 consecutive failed or passed (hint used) entries with no passed (no hint) between → partial → gap, breaks indefinite hint-pass loops |
| Path B escape valve | Review-slot warm pass without hints + prior cold attempt → partial → understood. Rescues nodes whose cold quiz was attempted-but-not-cleanly-passed |
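
The format rotation can be read as a small lookup table. The sketch below is one interpretation, not the skill's actual text: it assumes a node with no recorded format starts at feynman, that apply and analyze alternate after the first pass, and that "unchanged on fail" means the stored `Last quiz format` only advances on a pass. The function names are hypothetical.

```python
from typing import Optional

# Assumed reading of "feynman -> apply -> analyze -> apply (no return to feynman)".
NEXT_FORMAT = {
    None: "feynman",      # assumption: no recorded format means start at feynman
    "feynman": "apply",
    "apply": "analyze",
    "analyze": "apply",   # never rotates back to feynman
}


def next_format(last_format: Optional[str]) -> str:
    """Pick the format for the next quiz from the stored Last quiz format."""
    return NEXT_FORMAT[last_format]


def record_result(last_format: Optional[str], asked_format: str, passed: bool) -> Optional[str]:
    """Return the new Last quiz format; a failure leaves it unchanged."""
    return asked_format if passed else last_format
```

Because a failed quiz leaves the stored format untouched, the next quiz repeats the same format instead of moving the learner up a Bloom level.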

Schema additions: Cold quiz pending and Last quiz format lines per node. Read-time defaults + write-time touched-node-only migration — no batch rewrite of existing graphs.
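
The migration strategy can be sketched as follows. This is an illustration only: nodes are modeled as plain dicts rather than the real knowledge-graph file format, the default values `"no"` and `"none"` are assumptions, and `read_node` / `write_graph` are hypothetical names.

```python
# Assumed defaults for nodes written before this PR.
DEFAULTS = {"Cold quiz pending": "no", "Last quiz format": "none"}


def read_node(stored: dict) -> dict:
    """Read-time defaults: fill missing fields in memory, leave storage untouched."""
    return {**DEFAULTS, **stored}


def write_graph(graph: dict, touched: set, updates: dict) -> dict:
    """Write-time migration: only nodes touched this session gain the new fields."""
    out = {}
    for name, stored in graph.items():
        if name in touched:
            out[name] = {**read_node(stored), **updates.get(name, {})}
        else:
            out[name] = dict(stored)  # untouched nodes keep their old shape
    return out
```

Old nodes still read correctly through the defaults, and they are upgraded on disk only when a session actually modifies them.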

Session flow

flowchart TB
    A[Session Start] --> B{Cold quiz<br/>pending?}
    B -->|yes| C[Phase 2.0<br/>Cold Quiz Sweep]
    B -->|no| D[Phase 2 Agenda]
    C --> D
    D --> E[Phase 3<br/>Predict → Explain → Q&A → Check]
    E --> F[Phase 4 Warm Quiz<br/>caps at gap→partial]
    F --> G[Phase 5 Archive<br/>+ self-audit]
    G --> H{Path A: cold<br/>no-hint pass?}
    G --> I{Path B: review-slot pass<br/>+ prior cold attempt?}
    H -->|yes| J[partial → understood]
    I -->|yes| J
    G --> K{3-strike or<br/>2-fail streak?}
    K -->|yes| L[partial → gap]
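
The exits from `partial` in the flow above can be approximated in Python. This is a sketch under stated assumptions: the quiz kinds `"cold"`, `"review"`, and `"warm"` and the function names are hypothetical, the history is a list of the result strings used in the table, and the "2-fail streak" branch is not modeled because the PR text does not define it.

```python
HINTLESS = "passed (no hint)"
STRIKES = ("failed", "passed (hint used)")


def trailing_strikes(history: list) -> int:
    """Count failed or hint-assisted entries since the last clean pass."""
    count = 0
    for result in reversed(history):
        if result not in STRIKES:
            break
        count += 1
    return count


def next_status(status, history, quiz_kind, result, had_prior_cold_attempt=False):
    """Decide where a partial node goes after one more quiz result."""
    if status != "partial":
        return status  # only transitions out of partial are sketched here
    if result == HINTLESS:
        if quiz_kind == "cold":
            return "understood"  # Path A: cold no-hint pass
        if quiz_kind == "review" and had_prior_cold_attempt:
            return "understood"  # Path B: review-slot escape valve
        return status            # a new-concept warm pass stays capped at partial
    if trailing_strikes(history + [result]) >= 3:
        return "gap"             # 3-strike downgrade
    return status
```

For example, `next_status("partial", ["failed", "passed (hint used)"], "cold", "failed")` returns `"gap"`, while a hint-free review-slot pass with no prior cold attempt leaves the node at `"partial"`.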

Validation

Post-implementation eval against the pre-improvement skill across 4 protocol-compliance scenarios (cold-pending routing, Iron Rule #6 enforcement, path B escape valve, 3-strike streak detection):

| Metric | Pre-improvement | This PR | Delta |
| --- | --- | --- | --- |
| Pass rate | 25.0% (7/30) | 100.0% (30/30) | +75pp |
| Avg time / scenario | 105.9s ± 18.7s | 92.2s ± 14.6s | −13.7s |
| Avg tokens / scenario | 34,966 | 38,119 | +3,152 |

The new version is faster despite carrying more rules — deterministic format rotation and explicit cap rules eliminate the reasoning ambiguity the old skill burned cycles on.

The eval set (plugins/personal-tutor/evals/evals.json) lands in a follow-up commit so reviewers can verify methodology against future regressions; per-run outputs and the benchmark viewer are local-only.

Why these ship together

The three feature commits (brainstorm, plan, impl) describe one cognitive intervention. Separating impl from its requirements-and-plan would force a reviewer to reconstruct the desirable-difficulty rationale from code alone — which is exactly the failure mode the rule changes are trying to fix in the learner. Keeping the why and the how in one PR means future edits to the skill can re-read the reason a rule exists before changing it.



minsoo-web and others added 4 commits May 1, 2026 22:29
Captures the brainstorm output for adding desirable-difficulty hooks to
the personal-tutor skill. Defines R1-R12 covering Generation-first
teaching, warm/cold quiz split, hint follow-through, session capacity
policy, and Iron Rule #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implementation plan for the desirable-difficulty hooks. 4 units (U1-U4):
schema extension, Phase 3 Generation-first, Phase 4-5 warm semantics +
Iron Rule, Phase 2.0 cold sweep + capacity policy. Plan was reviewed
through ce-doc-review (4 personas, 19 actionable findings) and updated
to reflect 15 Apply decisions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ration, Iron Rule #6)

Restructures the learning protocol so that "fluency illusion" — passing
a quiz with the answer still in working memory — can no longer upgrade a
node to understood. Implements three desirable-difficulty hooks across
the session cycle:

* Phase 3 Generation-first turn: predict before explain, with strategic
  hints (no answer leak) when the learner is stuck.
* Phase 4 renamed to Warm Quiz, capped at gap→partial. Every new-concept
  warm pass schedules cold quiz for next session (R5 sole scheduler).
* New Phase 2.0 Cold Quiz Sweep at session start (when pending exists),
  no re-teaching, deterministic format rotation Feynman→Apply→Analyze→
  Apply, escalation prevented on cold-fail. partial→understood gates on
  cold no-hint pass (path A) or review-slot escape valve (path B).

Plus: Iron Rule #6 (no same-session understood), Phase 5 self-audit
hook, 3-strike downgrade for hint-pass loops, streak counter spanning
warm/cold mix, capacity policy (cap=1 when cold-pending exists),
6-line knowledge-graph schema with read-time backward-compat and
write-time gradual upgrade.

README + Session Flow diagram + Applied Learning Science table synced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nd status docs

- Phase 4 now runs warm quiz on review-slot nodes (path B was unreachable
  before — review flow ended at Socratic Q&A with no quiz step).
- Review-slot warm passes do NOT trigger R5 cold-pending; only new-concept
  warm passes do. Review-slot uses Phase 2.0 rotation rule for format.
- Phase 3 review branch now explicitly hands off to Phase 4.
- Phase 2.0 cap-decision wording clarified: cap is fixed at session start,
  remaining pending count only feeds the agenda announcement.
- README status table for `understood` now lists both path A (cold) and
  path B (review-slot escape valve).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@minsoo-web minsoo-web merged commit 374498d into main May 1, 2026
3 checks passed
@minsoo-web minsoo-web deleted the feat/personal-tutor-desirable-difficulty branch May 1, 2026 14:28
minsoo-web added a commit that referenced this pull request May 1, 2026
)

The desirable-difficulty hooks shipped in #22 add new user-facing
capability (cross-session retrieval verification, Generation Effect,
mechanical Iron Rule #6 enforcement) while remaining backward-compatible
with existing knowledge graphs (read-time defaults + touched-node-only
migration). Minor bump per SemVer.

Marketplace bumps to 1.10.0 to mirror the plugin's minor change.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>