
JudgmentKit

JudgmentKit is an activity-first kernel for AI-generated interface work.

It is not a beautifier, design-system compliance layer, prompt library, schema browser, or MCP reference surface. Those may exist later as adapters. The core job is to help an agent generate or critique UI that is relevant, succinct, and appropriate to the activity it supports.

Product Thesis

AI-generated UI fails when the implementation model becomes the user experience. Tables become screens, schemas become forms, tool calls become buttons, and internal prompts become product vocabulary.

JudgmentKit gives the agent a better order of operations:

  1. Understand the activity.
  2. Translate the activity into interaction responsibilities.
  3. Decide what implementation detail should stay hidden, be translated, or appear only as diagnostics.
  4. Generate or critique the UI.
  5. Apply visual system choices only after the activity and interaction model are sound.

Aesthetics are adapter-layer work. They should refine a relevant UI, not rescue a broken one.

Kernel

  • ActivityModel: the activity system the UI enters.
  • InteractionContract: the specific user actions, decisions, state changes, and success criteria the UI must support.
  • DisclosurePolicy: the vocabulary and visibility rules that prevent implementation leakage.
  • JudgmentExample: a before/after case that calibrates what good and bad generated UI look like.
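Read as data, these concepts suggest shapes like the following. This TypeScript sketch is illustrative only; the interface and field names are assumptions, not the kernel's published types.

// Illustrative only: field names here are assumptions, not JudgmentKit's actual types.
interface ActivityModel {
  activity: string;        // the activity system the UI enters
  actors: string[];        // who participates in the activity
  outcomes: string[];      // what the activity is trying to produce
}

interface InteractionContract {
  actions: string[];           // user actions the UI must support
  decisions: string[];         // decisions the user must be able to make
  stateChanges: string[];      // state changes the UI must surface
  successCriteria: string[];   // how we know the UI served the activity
}

interface DisclosurePolicy {
  vocabulary: Record<string, string>;  // implementation term -> user-facing term
  hidden: string[];                    // implementation details kept out of the UI
  diagnosticsOnly: string[];           // details allowed only in diagnostic views
}

interface JudgmentExample {
  brief: string;    // the activity brief
  before: string;   // implementation-first UI (the failure mode)
  after: string;    // activity-first UI (the calibration target)
}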

Architecture

JudgmentKit keeps the core deterministic and lets model assistance enter through explicit seams:

  1. Deterministic analyzer: extracts activity evidence, implementation terms, review questions, and disclosure risks from a brief.
  2. Deterministic review packet: turns that evidence into a reviewable activity model candidate with guardrails.
  3. Model-assisted candidate review seam: accepts a model-proposed candidate through dependency injection or MCP and runs the same guardrails.
  4. Provider-neutral proposer adapter: builds a serializable activity-model request for an injected model caller and returns the proposed candidate to the review seam.
  5. UI workflow candidate review seam: accepts a model- or agent-proposed workflow candidate and checks grounding, action support, handoff clarity, and disclosure containment before UI implementation.
  6. UI generation handoff gate: turns only ready workflow reviews into compact handoffs for the next UI generation pass.
  7. Optional provider adapters: provider configuration and network calls stay outside the kernel and feed proposed candidates back through the same review contract.
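As a rough illustration of how the deterministic and model-assisted paths share one review contract, here is a minimal TypeScript sketch. Every name in it (ActivityModelCandidate, runGuardrails, ModelCaller, and so on) is hypothetical; the real seam and guardrail definitions live in specs/ and contracts/.

// Hypothetical sketch of the candidate review seam; every name and shape here is an assumption.
type CandidateSource = "deterministic" | "model" | "agent";

interface ActivityModelCandidate {
  source: CandidateSource;
  activity: string;
  evidence: string[];            // activity evidence the candidate is grounded in
  implementationTerms: string[]; // terms that must not leak into the UI
  disclosureRisks: string[];     // risks the disclosure policy must contain
}

interface ReviewResult {
  ready: boolean;      // only ready reviews become UI-generation handoffs
  findings: string[];  // guardrail findings (grounding, leakage, and so on)
}

// The same guardrails run no matter where the candidate came from.
function runGuardrails(candidate: ActivityModelCandidate): ReviewResult {
  const findings: string[] = [];
  if (candidate.evidence.length === 0) {
    findings.push("candidate is not grounded in activity evidence");
  }
  if (candidate.disclosureRisks.length > 0) {
    findings.push("disclosure risks must be contained before handoff");
  }
  return { ready: findings.length === 0, findings };
}

// Provider-neutral model caller injected from outside the kernel.
type ModelCaller = (brief: string) => Promise<ActivityModelCandidate>;

// Model-assisted path: the proposer adapter calls the injected model,
// then hands the proposed candidate to the same review contract.
async function reviewModelCandidate(brief: string, callModel: ModelCaller): Promise<ReviewResult> {
  const candidate = await callModel(brief); // provider config and network calls stay outside the kernel
  return runGuardrails(candidate);
}

The point of the shape is the order of trust: the injected caller can come from any provider, but nothing reaches UI generation without passing the same guardrails as the deterministic path.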

Structure

  • AGENTS.md: operating rules for agents working in this repository.
  • DESIGN.md: activity-first judgment contract.
  • specs/: product and interface specs for the kernel.
  • contracts/: machine-readable activity and disclosure contracts.
  • docs/: daily workflow guidance for agents and local usage.
  • examples/: copyable briefs and candidate fixtures for CLI and MCP checks.
  • tests/: checks that protect the kernel from drifting back to aesthetic-first or implementation-first work.

First Workflow

The first workflow is AI UI generation. It starts with one contract:

  • contracts/ai-ui-generation.activity-contract.json

The first validation command is:

npm test

For daily local use:

npm run mcp:smoke
judgmentkit review --input examples/refund-triage.brief.txt

For a hosted MCP install:

curl -fsSL https://judgmentkit.ai/install | bash
curl -fsSL https://judgmentkit.ai/install | bash -s -- --client claude
curl -fsSL https://judgmentkit.ai/install | bash -s -- --client cursor

From a checkout, the same installer can be dry-run locally:

npm run install:mcp -- --client codex --dry-run
npm run install:mcp -- --client claude --dry-run
npm run install:mcp -- --client cursor --dry-run

OpenAI Responses smoke checks are opt-in:

JUDGMENTKIT_OPENAI_SMOKE=1 \
OPENAI_API_KEY=... \
JUDGMENTKIT_OPENAI_MODEL=... \
npm run smoke:openai-ui-workflow

For a deterministic one-shot before/after demo:

npm run demo:one-shot

That command also writes examples/demo/one-shot-demo.html for visual review.

For an early standalone comparison harness:

npm run demo:comparison

That command writes two independently runnable apps plus a manifest under examples/comparison/. Use it for qualitative paired comparisons of the raw brief baseline versus the JudgmentKit handoff path.

For a music-app standalone comparison:

npm run demo:comparison:music

That command writes a dinner-playlist brief, two independently runnable apps, a manifest, and a facilitator scorecard under examples/comparison/music/.

To score the committed comparison artifacts as a deterministic paired UI-generation eval:

npm run eval:ui

That command writes JSON and HTML reports under evals/reports/. It is qualitative paired-artifact evidence, not a statistically powered benchmark.

For the system-map model UI matrix:

npm run demo:model-ui

That command writes a static refund-triage matrix under examples/model-ui/refund-system-map/, including deterministic, Gemma 4 local LLM, and GPT-5.5 branches with and without the example-only design-system adapter. The website build copies those committed artifacts, records provenance in the manifest, and does not call live providers.

For the replacement website build:

npm run site:build

That command writes static routes for /, /docs/, /examples/, and /install under site/dist/. The public /mcp route is served by the hosted Streamable HTTP MCP function and returns metadata for browser GET requests.

For local site review with the same /mcp behavior:

npm run site:dev -- --host 127.0.0.1 --port 4173

That command rebuilds site/dist, serves static routes locally, and routes localhost /mcp through the same Streamable HTTP handler used in production.
