Provenance and Auditability
Why does Clio extract from the DOM rather than from a provider's API?
Of the three supported LLM providers, none currently offers a public, stable, individually-authenticated conversation-export API for the consumer chat product. The provider's web UI is the canonical surface; the DOM is the canonical artifact.
If a public export API existed and Clio used it, that would still be a different artifact:
- An API would return the provider's normalized view: server-side data structure, server-side sanitization
- The DOM is the rendered view: exactly what the user saw, with all the contextual signals the UI shows (badges, thinking sections, artifact widgets, expansion states)
For an AI-governance use case, the question is what the user saw when they made decisions based on the LLM's output, so the rendered view is the right artifact. The API view would be the provider's interpretation of the conversation; the rendered view is what the user actually experienced, which is what audit trails are meant to capture.
If someone in the future needs to ask "what did the assistant tell you?" — a regulator, a colleague, the user's own future self — the local archive can answer authoritatively. Three properties matter:
- Completeness: Clio fails closed if no messages are found, so partial captures don't silently produce misleading records (SECURITY.md documents this)
- Structure: the JSON output is stable enough to query (message index, role, content, thinking, attachments); it is not just a screenshot
- Locality: the archive lives on the user's disk, immune to provider revisions of the same conversation later
The third property matters for governance specifically: a conversation that was later moderated or removed from the provider's UI can still be reviewed if the user had archived it.
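As a rough illustration of the first two properties, the sketch below refuses to write an archive when no messages were captured and then queries the resulting JSON by role. It is not Clio's actual code: the function names, the `EmptyCaptureError` exception, and the field names (`messages`, `index`, `role`, `content`) are assumptions made for this example, loosely based on the fields listed above.

```python
import json
from pathlib import Path


class EmptyCaptureError(Exception):
    """Hypothetical error raised when a capture contains no messages."""


def write_archive(messages: list[dict], path: Path) -> None:
    # Fail closed: an empty capture produces an error, never an empty record.
    if not messages:
        raise EmptyCaptureError("no messages found; refusing to write archive")
    payload = {"messages": messages}  # assumed top-level layout
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def messages_by_role(path: Path, role: str) -> list[dict]:
    # Because the archive is structured JSON, it can be filtered like data.
    archive = json.loads(path.read_text(encoding="utf-8"))
    return [m for m in archive["messages"] if m["role"] == role]
```

With an archive written this way, a question such as "what did the assistant tell you?" becomes a call like `messages_by_role(path, "assistant")` rather than a manual read of a screenshot.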
Clio's archive is not a legal evidentiary record. It is not signed, not timestamped by a trusted third party, not hash-chained. A determined party could modify the JSON locally. For applications that need stronger evidentiary properties — sealed timestamps, write-once storage, regulatory-grade retention — Clio's output is a starting point for those workflows, not an endpoint.
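One possible first step toward such a workflow is to record a cryptographic digest of each archive file somewhere outside the user's control, so that later edits become detectable. The sketch below is an assumption about how a downstream process might do this, not something Clio itself performs; `archive_digest` and `verify_archive` are hypothetical names. A digest alone does not make the archive an evidentiary record: it only helps if the digest is stored or timestamped by a party that the modifier cannot reach.

```python
import hashlib
import hmac
from pathlib import Path


def archive_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of an archive file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_archive(path: Path, expected_hex: str) -> bool:
    """Check the file against a digest that was recorded earlier elsewhere."""
    return hmac.compare_digest(archive_digest(path), expected_hex)
```

If the JSON is modified locally after the digest was recorded, `verify_archive` returns `False`, which signals tampering but says nothing about what the original content was.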
- User Data Sovereignty — the framing that motivates this
- Known Limitations — what Clio does not claim
- PRIVACY.md — the user-facing commitment