Probabilistic-UI

The design vocabulary for AI surfaces that are honest about uncertainty.

This repository is the design-systems layer behind a connected trilogy of agentic-UX code prototypes — Sentinel, Recourse, and Helm — and an evolving set of notes on the patterns that should be re-used across any product surface where a probabilistic system meets a human decision.

It is not a component library you can install yet. It is the explicit vocabulary the three demos already share, written down so it can be re-used.

The premise

Probabilistic systems — LLMs, classifiers, search ranking, retrievers, agents — produce outputs that are correct with some probability and wrong with the rest. Most products today render that probability badly: a raw percentage, a "low confidence" highlight, or worse, nothing at all. The user is then asked to trust the output, override it, or escalate it without the information that decision actually requires.

Probabilistic-UI is the attempt to fix that at the primitive layer. The thesis is short:

AI claims become trustworthy only when their uncertainty is legible and their basis is checkable.

The same handful of primitives — applied with discipline — can carry that across radiology workstations, contract review, insurance appeals, agentic developer tools, and every other surface where a confident-looking output might be wrong.

The primitives, so far

These are the load-bearing pieces that already recur across all three trilogy demos. Naming and visual treatment are deliberately shared so a reviewer who has seen one project recognizes the same primitive in the next.

1. Calibrated confidence (language over numbers)

A small badge in three or four bands — High / Likely / Unsure / Low — with the exact percentage one hover away. The bands match how clinicians, lawyers, and analysts already speak; the percentage is for the moment they need it.

Why not raw percentages by default: numbers create false precision and reviewer fatigue. 73% and 74% look different and are not, behaviorally, different. Bands compress that without losing the signal.

Audience-shifted variant: for non-experts (Recourse), the bands flip to action verbs — Settled / You verify / Ask a lawyer. Same scale, different vocabulary. Confidence is for the reader, not the model.
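
As a rough illustration of the banding idea, a probability-to-band mapping might look like the sketch below. The band names and the consumer labels come from this README; the numeric cut-offs, the function names, and the way four bands collapse onto three consumer labels are assumptions made for the example, not values taken from the trilogy code.

```typescript
// Band names follow the README; everything numeric here is an illustrative assumption.
type Band = "High" | "Likely" | "Unsure" | "Low";
type Audience = "expert" | "consumer";

interface ConfidenceBadge {
  label: string;     // what renders inline
  hoverText: string; // the exact percentage, one hover away
}

// Assumed cut-offs, highest first. A real system would derive these from calibration data.
const BAND_FLOORS: ReadonlyArray<[number, Band]> = [
  [0.9, "High"],
  [0.7, "Likely"],
  [0.4, "Unsure"],
  [0.0, "Low"],
];

// Assumed collapse of four bands onto the three Recourse labels.
const CONSUMER_LABELS: Record<Band, string> = {
  High: "Settled",
  Likely: "You verify",
  Unsure: "You verify",
  Low: "Ask a lawyer",
};

function toBand(p: number): Band {
  if (!Number.isFinite(p) || p < 0 || p > 1) {
    throw new RangeError(`probability must be in [0, 1], got ${p}`);
  }
  for (const [floor, band] of BAND_FLOORS) {
    if (p >= floor) return band;
  }
  return "Low";
}

function confidenceBadge(p: number, audience: Audience): ConfidenceBadge {
  const band = toBand(p);
  const label = audience === "expert" ? band : CONSUMER_LABELS[band];
  return { label, hoverText: `${Math.round(p * 100)}%` };
}
```

With these assumed cut-offs, 0.73 and 0.74 land in the same band, which is the compression the section argues for; the exact figure is still available through `hoverText`.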

2. The cross-hatch primitive

A diagonal cross-hatch background — visually distinct from any solid color in the system — used to mark a category of failure that demands a different response, not just a degree of worry.

The same primitive is used in three contexts across the trilogy:

| Project | Cross-hatch means |
| --- | --- |
| Sentinel | This claim is ungrounded / hallucinated, not just "low confidence." Don't treat it like the others. |
| Recourse | This citation is fabricated or unverified. Don't put it in the appeal letter. |
| Helm | This tool call is irreversible: pushes, deletes, network. Can never be auto-allowed. |

The point is that the cross-hatch is learned once and recognized everywhere. A designer adding a new probabilistic surface to a product doesn't get to pick their own way of flagging the dangerous category: the cross-hatch already means "different kind, respond differently."
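
One way to keep the treatment shared is to generate it from a single helper rather than restyling it per project. The sketch below builds a CSS `background` value from two crossed `repeating-linear-gradient` layers. The function name, option names, and default color and spacing are assumptions for illustration, not the tokens the trilogy actually ships.

```typescript
// Hypothetical helper: names and default values are assumptions, not the trilogy's tokens.
interface HatchOptions {
  color: string;     // line color
  lineWidth: string; // thickness of each hatch line
  spacing: string;   // distance at which the pattern repeats
}

const DEFAULT_HATCH: HatchOptions = {
  color: "rgba(0, 0, 0, 0.35)",
  lineWidth: "1px",
  spacing: "6px",
};

// Two diagonal line layers at +45 and -45 degrees produce the cross-hatch.
function crossHatchBackground(opts: HatchOptions = DEFAULT_HATCH): string {
  const layer = (angleDeg: number): string =>
    `repeating-linear-gradient(${angleDeg}deg, ` +
    `${opts.color} 0 ${opts.lineWidth}, ` +
    `transparent ${opts.lineWidth} ${opts.spacing})`;
  return `${layer(45)}, ${layer(-45)}`;
}

// Per-project meaning, paraphrased from the table above.
const CROSS_HATCH_MEANING: Record<"Sentinel" | "Recourse" | "Helm", string> = {
  Sentinel: "ungrounded or hallucinated claim",
  Recourse: "fabricated or unverified citation",
  Helm: "irreversible tool call",
};
```

A chip in any of the three projects could then set `style.background = crossHatchBackground()` and differ only in the label it takes from `CROSS_HATCH_MEANING`.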

3. Evidence anchors

The antidote to hallucination is provenance. Every probabilistic claim should be paired with a checkable source the reader can verify in the same view — a clinical reference, a statute excerpt, a file path, a search result. If the surface can't produce an anchor, the claim should be flagged as ungrounded (see cross-hatch).

Anchors are bidirectional where possible: hovering the claim lights up the region it came from; clicking the region jumps to the claim.
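
A minimal data-model sketch of that pairing, assuming hypothetical `AnchoredClaim` and `EvidenceAnchor` shapes (these are not the `EvidenceLink` types from the demos): a claim without an anchor is reported as ungrounded, and the index supports lookups in both directions.

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
interface EvidenceAnchor {
  sourceId: string; // e.g. a document or file identifier
  regionId: string; // the region of the source the claim came from
  excerpt: string;  // the checkable text shown to the reader
}

interface AnchoredClaim {
  id: string;
  text: string;
  anchor?: EvidenceAnchor; // absent means the surface could not produce a source
}

interface AnchorIndex {
  regionByClaim: Map<string, string>;    // hover a claim -> highlight its region
  claimsByRegion: Map<string, string[]>; // click a region -> jump to its claims
  ungrounded: string[];                  // claim ids that should get the cross-hatch
}

function buildAnchorIndex(claims: AnchoredClaim[]): AnchorIndex {
  const regionByClaim = new Map<string, string>();
  const claimsByRegion = new Map<string, string[]>();
  const ungrounded: string[] = [];

  for (const claim of claims) {
    if (!claim.anchor) {
      ungrounded.push(claim.id);
      continue;
    }
    const regionId = claim.anchor.regionId;
    regionByClaim.set(claim.id, regionId);
    const existing = claimsByRegion.get(regionId) ?? [];
    existing.push(claim.id);
    claimsByRegion.set(regionId, existing);
  }
  return { regionByClaim, claimsByRegion, ungrounded };
}
```

Keeping both maps means neither interaction needs to scan the claim list at hover or click time.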

4. Reversibility as the policy axis

For agent actions specifically, the right scale isn't "is this safe?" but "what is the recovery cost?"

| Band | Definition | Treatment |
| --- | --- | --- |
| `trivial` | read-only, no side effect | Auto-allow always |
| `reversible` | can be undone with another step | Auto-allow if policy permits |
| `danger` | affects shared state (push, network, send) | Never auto-allow; explicit human approval |
| `destructive` | cannot be undone (force-push, delete) | Explicit approval + extra confirm |

This is the spine of Helm's ApprovalGate and translates one-for-one into any agentic surface.
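
The table can be read as a small decision function. The sketch below is not Helm's actual `ApprovalGate`; the `GatePolicy` shape, the `gateDecision` name, and the decision labels are assumptions chosen to mirror the four rows above.

```typescript
// Band names come from the table; the rest is a hypothetical sketch.
type ReversibilityBand = "trivial" | "reversible" | "danger" | "destructive";

type GateDecision =
  | "auto-allow"
  | "needs-approval"
  | "needs-approval-and-confirm";

interface GatePolicy {
  // The only knob the table leaves to policy: whether reversible steps may skip review.
  autoAllowReversible: boolean;
}

function gateDecision(band: ReversibilityBand, policy: GatePolicy): GateDecision {
  switch (band) {
    case "trivial":
      return "auto-allow"; // read-only, no side effect
    case "reversible":
      return policy.autoAllowReversible ? "auto-allow" : "needs-approval";
    case "danger":
      return "needs-approval"; // policy cannot relax this
    case "destructive":
      return "needs-approval-and-confirm"; // approval plus an extra confirmation
  }
}
```

Note that `policy` is consulted only in the `reversible` branch, so no configuration can turn a `danger` or `destructive` call into an automatic one.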

5. The verdict shape

A single, reusable verdict object for each thing the human decides on:

verdict = Accept | Edit | Reject | Abort
+ rationale (free text, optional but encouraged for edits/rejects)
+ timestamp + reviewer
+ before / after (if edit)

This is the unit of the audit drawer in Sentinel, the decision record in Helm, and the mailing record in Recourse. Three different surfaces, same shape — and the same shape feeds into compliance, reviewer-performance metrics, and continuous-improvement loops for the underlying model.
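
As a sketch of how that shape could be typed, the example below uses a discriminated union so that `before` and `after` exist only on edits. The type and function names are hypothetical and are not the `AuditEntry`, `DecisionRecord`, or `MailingRecord` types from the demos; storing the timestamp as an ISO 8601 string is also an assumption.

```typescript
// Hypothetical types mirroring the verdict shape described above.
interface VerdictBase {
  reviewer: string;
  timestamp: string;  // ISO 8601 string (an assumption; any sortable format would do)
  rationale?: string; // optional, encouraged for edits and rejects
}

interface SimpleVerdict extends VerdictBase {
  kind: "Accept" | "Reject" | "Abort";
}

interface EditVerdict extends VerdictBase {
  kind: "Edit";
  before: string;
  after: string;
}

type VerdictRecord = SimpleVerdict | EditVerdict;

function recordEdit(
  reviewer: string,
  before: string,
  after: string,
  rationale?: string,
  now: Date = new Date(),
): EditVerdict {
  return { kind: "Edit", reviewer, before, after, rationale, timestamp: now.toISOString() };
}

// One-line summary for an audit list; narrowing on `kind` exposes before/after only for edits.
function summarizeVerdict(v: VerdictRecord): string {
  const why = v.rationale ? ` (${v.rationale})` : "";
  if (v.kind === "Edit") {
    return `${v.reviewer} edited "${v.before}" to "${v.after}"${why}`;
  }
  return `${v.reviewer}: ${v.kind}${why}`;
}
```

Because the three surfaces share one record type, a single audit view or metrics job can consume verdicts from all of them without per-project adapters.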

6. Progressive disclosure

A probabilistic surface lies if it shows everything at once or shows nothing.

The pattern: the minimum that's load-bearing renders inline (confidence band, the claim, a single evidence chip). Hover reveals the next layer (exact percentage, top-k alternatives, statute excerpt). Click opens the full panel (model rationale, full provenance, edit history). The reader pulls more detail only when something asks for it.
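
The three layers can be sketched as a cumulative lookup: each level shows its own fields plus everything from the levels before it. The level names and field identifiers below are assumptions that paraphrase the paragraph above; they are not taken from the demos.

```typescript
// Hypothetical level and field names, paraphrasing the pattern described above.
type DisclosureLevel = "inline" | "hover" | "panel";

const LEVEL_ORDER: readonly DisclosureLevel[] = ["inline", "hover", "panel"];

const FIELDS_ADDED_AT: Record<DisclosureLevel, readonly string[]> = {
  inline: ["confidenceBand", "claim", "evidenceChip"],
  hover: ["exactPercentage", "topKAlternatives", "sourceExcerpt"],
  panel: ["modelRationale", "fullProvenance", "editHistory"],
};

// Returns every field visible at the given level, shallowest layer first.
function visibleFields(level: DisclosureLevel): string[] {
  const depth = LEVEL_ORDER.indexOf(level);
  const fields: string[] = [];
  for (const current of LEVEL_ORDER.slice(0, depth + 1)) {
    fields.push(...FIELDS_ADDED_AT[current]);
  }
  return fields;
}
```

For example, `visibleFields("hover")` returns the three inline fields followed by the three hover fields, so a deeper layer never hides what the reader has already seen.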

How the three demos use the same vocabulary

| Primitive | Sentinel | Recourse | Helm |
| --- | --- | --- | --- |
| Calibrated confidence | `Likely / Unsure / Low` (expert language) | `Settled / You verify / Ask a lawyer` (action verbs) | `ConfidenceTag` on agent intent |
| Cross-hatch | Hallucination chip | Fabricated-statute chip | Irreversibility chip on `danger`/`destructive` |
| Evidence anchors | `EvidenceLink` to clinical / legal source | Statute chip with operative excerpt + verified-on date | Diff or command surfaced inside the `ApprovalGate` |
| Verdict shape | Accept / Edit / Reject per claim | Verify / Send per claim | Allow / Edit / Reject / Abort per tool call |
| Audit object | `AuditEntry` per decision | `MailingRecord` + deadline ledger | `DecisionRecord` per step |
| Progressive disclosure | Inline → hover top-k → drawer rationale | Plain-language gloss → excerpt → full statute | Tool kind → diff → cost notice |

A reviewer who learns the vocabulary on one project can read the others at a glance. That's the whole point.

What's not in this repo yet

This is the vocabulary, not a published component library. The shipped components live in their respective trilogy repos for now:

React primitives → sentinel-react (in Human-in-the-Loop)
ApprovalGate / ReversibilityChip → Helm
Statute / cadence primitives → Recourse

A future iteration of this repo will extract the shared bits into a single installable package with the canonical types and CSS variables. For now, the README is the contract.

Why it matters for product surfaces beyond the trilogy

The same vocabulary applies cleanly to:

Search & retrieval surfaces — "this result is plausible / this result is hallucinated" is the Sentinel pattern applied to RAG output.
Agentic workflows in any domain (code, ops, support, design tooling) — Helm's reversibility-as-axis carries over without modification.
Personalized / recommendation surfaces — calibrated bands beat raw scores for the same reasons.
Compliance & audit-heavy enterprise SaaS — the verdict shape is already the audit primitive.

If you're building any of those, the patterns here are designed to be picked up and re-used. Open an issue or message me — happy to talk specifics.

Related work

Sentinel (Human-in-the-Loop) · embeddable inline AI oversight (demo)
Recourse · consumer AI vs institutional loops (demo)
Helm · real-time oversight of LLM agents (demo)

— Ankur Sinha
