Skip to content

Provenance and Auditability

Marty McEnroe edited this page May 22, 2026 · 1 revision

Provenance and Auditability

Why does Clio extract from the DOM rather than from a provider's API?

The API path is unavailable, and that is actually load-bearing

Of the three supported LLM providers, none currently offers a public, stable, individually-authenticated conversation-export API for the consumer chat product. The provider's web UI is the canonical surface; the DOM is the canonical artifact.

If a public export API existed and Clio used it, that would still be a different artifact:

  • An API would return the provider's normalized view: server-side data structure, server-side sanitization
  • The DOM is the rendered view: exactly what the user saw, with all the contextual signals the UI shows (badges, thinking sections, artifact widgets, expansion states)

For an AI-governance use case — what did the user actually see when they made decisions based on the LLM's output — the rendered view is the right artifact. The API view would be the provider's interpretation of the conversation; the rendered view is the user's interpretation, which is what audit trails are meant to capture.

What "auditability" means here

If someone in the future needs to ask "what did the assistant tell you?" — a regulator, a colleague, the user's own future self — the local archive can answer authoritatively. Three properties matter:

  1. Completeness — Clio fails closed if no messages are found; partial captures don't silently produce misleading records (SECURITY.md documents this)
  2. Structure — the JSON output is stable enough to query (message index, role, content, thinking, attachments); not just a screenshot
  3. Locality — the archive lives on the user's disk, immune to provider revisions of the same conversation later

The third property matters for governance specifically: a conversation that was later moderated or removed from the provider's UI can still be reviewed if the user had archived it.

What this doesn't claim

Clio's archive is not a legal evidentiary record. It is not signed, not timestamped by a trusted third party, not hash-chained. A determined party could modify the JSON locally. For applications that need stronger evidentiary properties — sealed timestamps, write-once storage, regulatory-grade retention — Clio's output is a starting point for those workflows, not an endpoint.

Related

Clone this wiki locally