Releases: product-on-purpose/thinking-framework-skills
v0.11.0 - contested lenses
Contested lenses: we now run the famous-but-weak frameworks, and tell you the truth about them. The library is built on honest evidence grading, so for years it refused to ship famous methods the research does not support (SWOT, Five Whys, ACH, and friends) and instead published a why-not dossier. People kept asking for them by name. v0.11.0 adds a better answer than a flat "no": 56 evidence-graded core skills, plus 7 contested lenses we grade honestly and hand you caveat-first.
For everyone
- Seven famous frameworks, run honestly. SWOT, Five Whys, Eisenhower / MoSCoW / Pareto, a descriptively-named Cynefin sort, Reflective Equilibrium, Analysis of Competing Hypotheses, and QCA now run when you ask for them by name. Each one leads with its weak evidence, then either produces the artifact with the discipline it usually lacks (a SWOT that prunes, tags, and matches into options; a Five Whys that stops honestly when the problem is multi-cause) or, for the methods testing found actively harmful (ACH, QCA), warns you and routes you to a better-grounded move instead of reproducing the discredited artifact.
- They never get in the way. A contested lens is explicit-request-only: the Framework Advisor will never reach for one on a generic prompt. A trigger eval confirms it - 0 false-fires, every generic prompt routed to the stronger core skill.
- Honest framing, not catalog padding. The headline is still the 56 evidence-graded core skills. The 7 contested lenses are counted and reported as their own cohort, clearly marked, so "honest grading" stays honest.
For contributors
- A new conformance-gate layer (check-contested.mjs, the 9th) makes "caveat-first" a checked contract, not an authoring style: the deficiency must lead the skill and every artifact, a branded lens must carry its trademark attribution, and tier X may now ship only as a contested lens. Cynefin ships under a descriptive name (think-complexity-domain-sort) to keep the trademark out of the invocation; the old /library/cynefin/ URL redirects.
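A caveat-first rule becomes checkable once it is stated as a property of the text. The sketch below is a minimal illustration in Python, not the contents of check-contested.mjs: the "Evidence caveat:" marker and the rule that it must open the first body line are assumptions invented for the example.

```python
def leads_with_caveat(markdown: str, marker: str = "Evidence caveat:") -> bool:
    """Return True if the first non-blank, non-heading line starts with the marker.

    Both the marker text and the "first body line" rule are hypothetical;
    the real gate may define "leads" differently.
    """
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and headings
        return stripped.startswith(marker)
    return False  # an empty document cannot lead with a caveat


good = "# Example lens\n\nEvidence caveat: weak support.\n\nSteps...\n"
bad = "# Example lens\n\nSteps...\n\nEvidence caveat: weak support.\n"

print(leads_with_caveat(good))  # True
print(leads_with_caveat(bad))   # False
```

A gate built this way fails the build on the second document even though the caveat is present, which is the difference between a checked contract and an authoring style.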
v0.10.0 - learn-by-example depth + agent discovery
Find it by example, and let agents find it at all. A worked example for every framework, a cross-library Showcase that hands off to pm-skills, the agent-discovery index switched on, and the behavioral-eval numbers refreshed across the full 56-skill catalog. No new frameworks; the catalog stays 56.
For everyone
- A quick worked example for every framework. The new Samples shelf gives each of the 56 skills one compact, end-to-end example - a real situation, the prompt, and the full artifact - so you can scan the whole library by example. Example coverage is now 56 of 56.
- A cross-library Showcase: tfs decides, pm-skills delivers. Three companies (Storevine, Brainshelf, Workbench) each take one feature from a raw decision to a launch call, then hand the reasoning artifact off to the matching pm-skills delivery artifact - so you can follow one company across both libraries. See the "Browse by company" section of the Showcase.
- The trust page is current. Both behavioral evals were re-run across all 56 skills (routing: 99% top-1, 0 false-fires across 673 cases; artifact quality: 99% of 389 checks, 53 of 56 skills perfect), and the Does this actually work? page now reflects the full catalog, not the earlier 47-skill run.
For builders
- The agent-discovery index is switched on. v0.9.0 published a machine-readable catalog; this release makes it discoverable - llms.txt is now linked from robots.txt and every page, and a new llms-full.txt inlines the whole catalog (every component plus the 79 not-shipped methods) so an agent can ingest it in one fetch.
- red-team-light re-graded P -> M (transferred). Its core move - construct the strongest contrary case - is the well-studied "consider the opposite" debiasing technique (Lord, Lepper & Preston 1984; Mussweiler et al. 2000; Hirt & Markman 1995), so its grade rises a notch, with the honest caveats kept: group-dissent research does not transfer, and the evidence is human-subject, not AI-validated.
v0.9.0 - Discoverable by agents
Discoverable by agents: the library now publishes a machine-readable index other AI agents can read to find and route to the right thinking skill.
This release makes the library legible to other software, not just people. An agent (or a crawler) can fetch a single index and learn every skill, what each produces, when to use it, and how to chain it - no scraping, no guessing. It also folds in the measurement loop and the example-coverage gate that landed after v0.8.0. No new frameworks; the catalog stays at 56.
For everyone
- An llms.txt index at the site root. Following the llmstxt.org convention, the site now serves a clean, linked index of every skill, tool, and recipe, grouped by cognitive job, plus the key getting-started pages. Point an AI assistant at it and it can discover and route to the library on its own.
- A "Was this page helpful?" prompt on every docs page. A lightweight feedback widget (no tracking by default, no backend) that offers a one-click signal or a pre-filled GitHub issue, so the docs improve from real use.
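For a sense of what an agent does with such an index, here is a minimal sketch of reading an llms.txt file in Python. It relies only on the llmstxt.org layout (H2 section headings followed by markdown link lists); the sample text, section names, and example.com URLs are invented for illustration and are not the library's real index.

```python
import re

# One list entry: "- [title](url)" with an optional ": note" suffix.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$")


def parse_llms_txt(text: str) -> dict[str, list[dict[str, str]]]:
    """Group the linked entries of an llms.txt file under their H2 section titles."""
    sections: dict[str, list[dict[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK.match(line)
            if match:
                sections[current].append(
                    {
                        "title": match["title"],
                        "url": match["url"],
                        "note": match["note"] or "",
                    }
                )
    return sections


# Hypothetical sample; the real index has its own sections and URLs.
SAMPLE = """\
# Example Library

> A short summary of what the site offers.

## Decide

- [Example skill](https://example.com/skills/example/): what it produces

## Getting started

- [Install](https://example.com/install/)
"""

index = parse_llms_txt(SAMPLE)
print(list(index))                # ['Decide', 'Getting started']
print(index["Decide"][0]["url"])  # https://example.com/skills/example/
```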
For builders
- Two machine-readable catalogs. catalog.json lists the 69 invokable components (56 skills + 4 tools + 9 recipes) with the fields an agent needs to route and chain - mechanism, when-to-use, the artifact each produces, evidence tier, recipe membership, likely companions, and a live URL. evaluated.json projects all 135 graded methods, so the 79 the library evaluated and chose not to ship are available in context, each linking to its dossier. Both are generated from the existing sources of truth and validated against the live route set, so every link resolves.
- Drift-gated like everything else. A new 8th conformance-gate layer regenerates the three artifacts and reds CI if the committed copies are stale, so the catalog can never silently fall behind the registry. The manifest diff for this release is version-only.
- Example-coverage ratchet. Every shipped skill must now have a worked example (a Showcase appearance or a sample) or be explicitly grandfathered; a new skill with no example reds the build.
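The drift gate follows a common pattern: render the artifact deterministically from the source of truth, then compare it with the committed copy. The Python sketch below shows that pattern for a single artifact under stated assumptions; the registry shape, the "id" and "tier" fields, and the function names are hypothetical, and the real gate covers three artifacts.

```python
import json


def render_catalog(registry: list[dict]) -> str:
    """Render the registry deterministically so equal input gives byte-equal output."""
    ordered = sorted(registry, key=lambda item: item["id"])
    return json.dumps(ordered, indent=2, sort_keys=True) + "\n"


def stale_artifacts(registry: list[dict], committed: dict[str, str]) -> list[str]:
    """Return the names of committed artifacts that differ from a fresh render."""
    fresh = {"catalog.json": render_catalog(registry)}
    return [name for name, text in fresh.items() if committed.get(name) != text]


# Hypothetical registry entries; the committed copy starts out in sync.
registry = [{"id": "skill-b", "tier": "M"}, {"id": "skill-a", "tier": "P"}]
committed = {"catalog.json": render_catalog(registry)}
print(stale_artifacts(registry, committed))  # []

registry.append({"id": "skill-c", "tier": "C"})  # registry changed, copy did not
print(stale_artifacts(registry, committed))  # ['catalog.json']
```

In CI, a non-empty result would map to a non-zero exit code. Sorting the entries and the keys is what makes a byte-for-byte comparison meaningful.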
Full technical changelog: CHANGELOG.md.
v0.8.0 - Learn by example
Learn by example: watch the frameworks work on real decisions, and see the numbers behind the claims.
This release adds the part a newcomer most wants - proof. You can now watch real decisions worked end to end, see real prompts in the styles people actually type, and read the measured evidence that the library routes and produces what it promises. No new frameworks; the catalog stays at 56. This is a documentation and trust release.
For everyone
- A Showcase of real decisions, prompt to finished artifact. Three people work hard problems start to finish: a founder deciding fast, an engineer making an architectural call, and a policy analyst deliberating on paper. Each page shows the exact prompt typed and the full artifact it produced - a ranked risk register, a weighted option matrix, an argument map, a stakeholder trade-off grid - so you can judge the quality before you run anything. Sixteen worked journeys, including full recipe chains and runs done entirely by hand.
- "Does this actually work?" - we measured it. A new page publishes the behavioral-eval results: the catalog routes the right framework for a situation 99% of the time with zero false-fires across 561 cases, and the artifacts meet their own quality bar on 99% of 315 checks. It also says plainly what the numbers do not prove.
- A prompt gallery, so your messy prompt is fine. Real prompts in three styles: a one-line casual ask, a structured block, or just describing the mess to the advisor. A sparse prompt produces the same complete artifact as a polished one, because the framework does the structuring.
- An operating guide. "Using the frameworks" takes you from running one framework to chaining several like a power user.
For builders
- The example surfaces are hand-authored pages on the existing Astro Starlight site; nothing about the install surface, the skills, or the manifests changed (the manifest diff is version-only).
- The behavioral-eval harness is reproducible and runs without an API key (scripts/eval/); every number on the trust page traces to a committed JSON you can audit.
- Catalog-count drift is now a hard CI failure: the count gate was extended to the repo-facing docs and the README prose counts.
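To make the two routing numbers concrete, here is a minimal sketch of how a top-1 rate and a false-fire count could be computed from recorded cases. It is not the harness in scripts/eval/; the Case shape, the skill ids, and the reading of "false-fire" as "a skill fired when none should have" are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    expected: Optional[str]  # None means no skill should fire (assumed convention)
    fired: Optional[str]     # top-ranked skill the router chose, or None


def score(cases: list[Case]) -> tuple[float, int]:
    """Return (top-1 accuracy over cases that expect a skill, false-fire count)."""
    positives = [c for c in cases if c.expected is not None]
    hits = sum(1 for c in positives if c.fired == c.expected)
    false_fires = sum(1 for c in cases if c.expected is None and c.fired is not None)
    top1 = hits / len(positives) if positives else 0.0
    return top1, false_fires


# Hypothetical cases with made-up skill ids.
cases = [
    Case("skill-a", "skill-a"),
    Case("skill-b", "skill-b"),
    Case("skill-a", "skill-b"),  # wrong skill: a top-1 miss
    Case(None, None),            # nothing should fire, nothing did
]
top1, false_fires = score(cases)
print(f"{top1:.0%} top-1, {false_fires} false-fires")  # 67% top-1, 0 false-fires
```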
Full notes: RELEASE-NOTES.md. Technical detail: CHANGELOG.md.
v0.7.1 - complete the Framework Library (45 to 75 dossiers)
The Framework Library is now complete: every method we evaluated and chose not to ship has an honest, browsable page.
- +30 documented "no"s (the library now holds 75 dossiers). The famous methods the library considered and did not ship as standalone skills - because they fold into something already shipped, carry a trademark or weak-evidence caveat, or do not survive on the merits - each now has a sourced page explaining the call. Among them: SWOT, Five Whys, Cynefin, Wardley Mapping, Jobs-to-be-Done, Porter's Five Forces, Blue Ocean, OODA, MECE, Multi-Criteria Decision Analysis, Key Assumptions Check, Double-Crux, Devil's Advocacy, How Might We, and more. "We considered it and said no, and here is exactly why" is now the rule, not the exception - documentation only, with no method's verdict changed.
v0.7.0 - behavioral evals + catalog 47 to 56 + ethics family
The library now measures its own behavior, and ships its largest catalog jump yet (now 56) with a new ethics family.
- +9 frameworks (now 56), including a new family: Ethics & Values Deliberation. Three new methods take a moral trade-off as the input and reason to a defensible position across everyone affected: Veil-of-Ignorance Reasoning (decide as if you had an equal chance of being any affected party), the Ethical Matrix (grid the stakeholders against wellbeing, autonomy, and fairness), and Speculative Harms & Anti-Goals (assume your success and name who it harms). The other six: Dialectical Bootstrapping (improve an estimate by disagreeing with yourself and averaging), Interval Calibration Check (test whether your confidence intervals are really as wide as your certainty), Consider the Unknowns (weigh what you cannot see before you commit), Process Tracing (test rival causes of a single case by the diagnostic weight of each clue), Argumentation Schemes (name the argument pattern, then ask its standard critical questions), and Interest-Based Negotiation (the library's first method for a decision with a counterparty).
- The library now measures its own behavior. Two evals run across every skill: a trigger eval (does the right skill fire for a situation - 561 cases, zero false-fires, 99% top-1) and an output eval (does a skill, once run, produce an artifact that meets its own bar - 99% of checks passed). They are model-executed and reproducible. The four skills the output eval flagged were tightened so the evidence caveat now ships with the artifact by construction, and re-scored 100%.
- Twenty new documented "no"s, including the famous personality tests. The candidates that did not clear the bar are each published in the Framework Library with their reasoning - among them honest, sourced "why we do not ship this" pages for MBTI, CliftonStrengths, DISC, the Enneagram, and learning styles, graded on what the psychometric evidence actually shows.
- Honesty held at scale. Of 30 candidates researched and adversarially re-checked, 9 became skills (a 70% fold/recipe/reject rate); independent research and verification agreed on all 30, with three grades knocked down to stop an adjacent claim's evidence from inflating the method's own. One new skill ships at an openly anecdotal grade, and says so on its face.
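As an illustration of what an interval calibration check measures, the Python sketch below compares a stated confidence level with the share of intervals that actually contained the true value. It shows the standard hit-rate calculation, not the skill's own procedure, and every number in it is made up.

```python
def interval_hit_rate(intervals: list[tuple[float, float]], actuals: list[float]) -> float:
    """Fraction of actual values that fall inside their stated interval."""
    if not intervals or len(intervals) != len(actuals):
        raise ValueError("need at least one interval and one actual value per interval")
    hits = sum(1 for (low, high), actual in zip(intervals, actuals) if low <= actual <= high)
    return hits / len(intervals)


# Five made-up estimates, each stated as a 90% confidence interval.
intervals = [(10, 20), (100, 150), (3, 5), (40, 60), (0, 1)]
actuals = [25, 120, 4, 75, 0.5]

rate = interval_hit_rate(intervals, actuals)
print(f"stated 90%, observed {rate:.0%}")  # stated 90%, observed 60%
```

A hit rate well below the stated level, as here, suggests the intervals are too narrow for the confidence claimed.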
v0.6.0 - Catalog expansion (phase 2)
Catalog expansion, phase 2: seven new methods (now 47), and the rest of the candidate field honestly resolved.
- +7 frameworks (now 47). Four practitioner-grade: Role-Storming (generate ideas as someone else, to get past your own self-censorship), Morphological Analysis (lay a solution's choices out as a grid and recombine them), Pairwise Comparison (rank options head-to-head when you cannot score them on a scale), and Minimax Regret (choose under deep uncertainty by minimizing your worst-case regret). Three honest C-tier methods (conceptually strong, not yet study-backed): Three Horizons, Causal Layered Analysis, and Contradiction / Tension Mapping.
- Two recipes. Kepner-Tregoe and PDCA / A3 ship as workflow chains of existing skills rather than as new methods, because that is honestly what they are.
- Seventeen documented "no"s. The candidates that did not clear the bar - folded into a method that already covers them, or rejected on the merits - are each published in the Framework Library with their reasoning. The Library now holds 25 such dossiers, so "we considered it and said no" stays browsable.
- Honesty held at scale. Of 26 candidates researched and adversarially re-checked, only 7 became skills (a 73% fold/reject rate). Breadth never trumped the grade - the rejections are as much the product as the additions.
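For readers new to Minimax Regret, the rule can be worked through in a few lines of Python. This is the textbook calculation, not the skill's artifact format, and the options, states, and payoffs are invented for the example.

```python
def minimax_regret(payoffs: dict[str, dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Pick the option whose worst-case regret is smallest.

    payoffs[option][state] is the payoff of choosing `option` if `state` occurs.
    Regret is the gap to the best payoff available in that state.
    """
    states = next(iter(payoffs.values())).keys()
    best_in_state = {s: max(row[s] for row in payoffs.values()) for s in states}
    worst_regret = {
        option: max(best_in_state[s] - row[s] for s in states)
        for option, row in payoffs.items()
    }
    choice = min(worst_regret, key=worst_regret.get)
    return choice, worst_regret


# Made-up payoff table: three options across three possible futures.
payoffs = {
    "build": {"boom": 100, "flat": 20, "bust": -60},
    "partner": {"boom": 60, "flat": 30, "bust": -10},
    "wait": {"boom": 10, "flat": 10, "bust": 10},
}
choice, regrets = minimax_regret(payoffs)
print(choice)   # partner
print(regrets)  # {'build': 70, 'partner': 40, 'wait': 90}
```

No probabilities are needed, which is why the rule suits deep uncertainty: "partner" wins because its worst miss (40) is smaller than the worst miss of either alternative.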
v0.5.0 - Catalog expansion
Catalog expansion: six new thinking methods, a new family, and four documented "no"s you can actually read.
- +6 frameworks (now 40). Three are problem-framing methods the research engine discovered, graded, and the library built end to end: Contradiction Resolution (dissolve a trade-off instead of splitting the difference), Boundary Critique (audit who a frame includes and excludes), and Frame Creation (reframe the problem by analogy). Three come from the latest shortlist: Theory of Constraints (find and exploit the single binding bottleneck), the Expected-Value Decision Tree (price the uncertainty and see what would flip the call), and Scenario Planning (stress-test a strategy against a set of divergent futures).
- A new family: Strategy & Opportunity. Scenario Planning opens the library's 11th cognitive-operation family.
- Four honest "no"s, each with a full dossier. Inversion and FMEA-lite fold into Premortem; a generic cognitive-bias checklist and the PR-FAQ decision memo are rejected on the merits. Every one is published in the Framework Library with its sources and the reasoning.
- Honest grading, visibly enforced. Two methods that arrived looking like "moderate evidence" were downgraded to "practitioner" once the research was read closely - the strong studies measured an adjacent claim, not the move itself.
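The two moves named for the Expected-Value Decision Tree, pricing the uncertainty and finding what would flip the call, reduce to a short calculation. The Python sketch below uses the standard formulas with made-up numbers; it does not reproduce the skill's output.

```python
def expected_value(branches: list[tuple[float, float]]) -> float:
    """Expected value of a chance node given (probability, payoff) branches."""
    total_p = sum(p for p, _ in branches)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * payoff for p, payoff in branches)


def break_even_probability(success: float, failure: float, sure_thing: float) -> float:
    """Success probability at which the gamble's expected value equals the sure payoff."""
    return (sure_thing - failure) / (success - failure)


# Made-up numbers: launching wins 500 with probability 0.4, else loses 200; holding yields 50.
launch = expected_value([(0.4, 500), (0.6, -200)])
hold = 50
print(round(launch, 2))                                   # 80.0
print("launch" if launch > hold else "hold")              # launch
print(round(break_even_probability(500, -200, hold), 3))  # 0.357
```

The break-even figure is the "what would flip the call" answer: if the estimated chance of success dropped below roughly 36%, holding would become the better choice.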
Install / update: /plugin install thinking-framework-skills@product-on-purpose
Full changelog: https://github.com/product-on-purpose/thinking-framework-skills/blob/main/CHANGELOG.md
v0.4.0 - Framework Library platform
The Framework Library platform: a trustworthy catalog you can browse, plus tools that put it to work.
- A published Framework Library - every evaluated thinking method in one honest catalog, with per-method learning dossiers (what the evidence does and does not show, with graded sources), browsable by family.
- Tools, kept honestly separate from the methods - the Framework Advisor, Top-3, and Random-Frameworks now live in their own /tools/ section rather than masquerading as graded frameworks (no evidence badge, which a router has no business showing).
- A research engine that grades honestly - think-research-framework researches a candidate method, grades it conservatively on the seven-tier model, checks overlap with what ships, and proposes a catalog entry for review. Never auto-adds.
- A more trustworthy advisor - the insufficient-signal gate was rewritten and re-measured so it stops over-asking; routing sub-grade stays an honest C.
- Clearer evidence badges - a method's library badge shows its full compound grade (e.g. S/M) to match its dossier.
- Registry single source of truth + strong CI, and a registry-era documentation refresh (architecture, contributor guide, a repeatable release process).
- Catalog: Fishbone/Ishikawa re-vetted and folded into Issue Trees, with a published rejected-with-reasoning dossier.
Full technical detail: CHANGELOG.md. Install: /plugin install thinking-framework-skills@product-on-purpose.