Doublecheck

Real-time fact-checking for live podcasts and video streams.

Doublecheck listens to any podcast or video stream, detects factual claims as they're made, and verifies them against authoritative sources — in seconds, not hours. Drop in a URL and watch the verdicts unfold in a live sidebar.


How it works

  Browser tab audio  ──►  Sidecar WS  ──►  Deepgram (transcript)
                                              │
                                              ▼
                                       Haiku (claim detection)
                                              │
                                              ▼
                              Exa (search)  +  Haiku (verdict)
                                              │
                                              ▼
                              Sonnet (optional deep-insight pass)
                                              │
                                              ▼
                                  SSE  ──►  Dashboard sidebar

For YouTube: the browser captures the tab's audio and streams it to the server. For podcast / MP3 / RSS URLs: the server downloads and transcribes directly.
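The last hop of the pipeline, SSE to the dashboard sidebar, can be sketched as follows. This is a minimal illustration of the `text/event-stream` wire format, not the repo's actual code: the `VerdictEvent` shape, the verdict labels, and the `formatSseEvent` helper are all hypothetical names, and the real types in lib/pipeline/ may look different.

```typescript
// Hypothetical event shape; the real shared types live in lib/pipeline/ and may differ.
interface VerdictEvent {
  claim: string;
  verdict: "supported" | "refuted" | "unverified"; // assumed labels
  confidence: number; // 0..1, strength of the evidence for the verdict
  sources: string[];
}

// Serialize one named event in the Server-Sent Events wire format:
// an "event:" line, a "data:" line, and a blank line that ends the frame.
// JSON.stringify never emits raw newlines, so one data line is enough.
function formatSseEvent(eventName: string, payload: unknown): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const exampleVerdict: VerdictEvent = {
  claim: "Example claim text",
  verdict: "unverified",
  confidence: 0.4,
  sources: [],
};

const exampleFrame = formatSseEvent("verdict", exampleVerdict);
```

On the browser side, an `EventSource` listener registered for the `verdict` event name would receive the JSON string in `event.data`.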


Quick start

Prerequisites

  • Node.js ≥ 18.18
  • ffmpeg on $PATH (required for podcast/MP3/RSS sources)
  • yt-dlp on $PATH (optional; only needed for non-YouTube downloads)

git clone https://github.com/metisos/doublecheck.git
cd doublecheck
npm install
cp .env.example .env.local
# fill in the keys (see below)
npm run dev

Open http://localhost:3000.


Required services

| Service | What it does | Free tier | Key |
| --- | --- | --- | --- |
| Deepgram | Streaming transcription | Yes ($200 credit) | DEEPGRAM_API_KEY |
| Exa | Neural search for evidence sources | Yes | EXA_API_KEY |
| Anthropic | Claude Haiku 4.5 (extraction + verdict) and Sonnet 4.6 (optional deep-insight pass) | Pay-as-you-go | ANTHROPIC_API_KEY |

Optional enhancements

| Service | What it adds | Key |
| --- | --- | --- |
| Twelve Labs | "VISUAL" cross-reference from an indexed video library | TWELVELABS_API_KEY + TWELVELABS_INDEX_ID |
| Cloudflare Turnstile | Invisible abuse-protection challenge on session creation | TURNSTILE_SECRET_KEY + NEXT_PUBLIC_TURNSTILE_SITE_KEY |

Leave the optional keys unset and the core pipeline still works; the corresponding features simply aren't rendered.
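One way to gate the optional integrations is to treat a feature as enabled only when every key it needs is present. The sketch below assumes that rule; the env var names come from the table above, while `detectOptionalFeatures`, `hasAllKeys`, and the `OptionalFeatures` shape are hypothetical. The environment is passed in as a plain object so the function stays easy to test; in a Node process you would pass `process.env`.

```typescript
type EnvLike = Record<string, string | undefined>;

// Hypothetical flags; the repo may represent this differently.
interface OptionalFeatures {
  twelveLabs: boolean;
  turnstile: boolean;
}

// True only if every listed key is set to a non-empty value.
function hasAllKeys(env: EnvLike, keys: string[]): boolean {
  return keys.every((key) => (env[key] ?? "").trim() !== "");
}

function detectOptionalFeatures(env: EnvLike): OptionalFeatures {
  return {
    twelveLabs: hasAllKeys(env, ["TWELVELABS_API_KEY", "TWELVELABS_INDEX_ID"]),
    turnstile: hasAllKeys(env, [
      "TURNSTILE_SECRET_KEY",
      "NEXT_PUBLIC_TURNSTILE_SITE_KEY",
    ]),
  };
}
```

A half-configured integration (for example, an API key without an index ID) is reported as disabled, so the UI never renders a feature that cannot work.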


Project layout

app/                   Next.js App Router routes
  api/                 Session, SSE events, internal IPC for the sidecar
  dashboard/           The live UI
components/            UI components — shared, landing/, dashboard/
lib/
  audio/               ffmpeg + yt-dlp audio extraction
  transcript/          Rolling-window transcript buffer
  claims/              Claim-detection prompt + Haiku client
  research/            Exa + Twelve Labs + query rewriter
  verdict/             Verdict synthesizer + deep-analysis pass
  memory/              Railroad per-source persistent memory
  pipeline/            Per-session orchestrator + shared types
  rate-limit.ts        Per-IP, global, and per-URL abuse protection
  spend.ts             Rolling 24h Anthropic spend tracker
scripts/
  audio-server.ts      WebSocket sidecar that bridges browser audio → Deepgram
public/
  audio-worklet.js     Float32 48kHz → Int16 16kHz mono PCM downsampler
examples/              Sample deployment configs (nginx, Turnstile setup)
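To make the audio-worklet.js entry concrete, here is a rough sketch of converting Float32 48kHz samples to Int16 16kHz mono PCM. It is not the repo's implementation: it assumes an integer rate ratio and uses plain block averaging as a crude low-pass filter, whereas the real worklet may filter differently. The function name `downsampleToInt16` is hypothetical.

```typescript
// Downsample mono Float32 samples (range -1..1) to Int16 PCM.
// Assumption: inputRate is an integer multiple of outputRate (48000 / 16000 = 3).
function downsampleToInt16(
  input: Float32Array,
  inputRate = 48000,
  outputRate = 16000,
): Int16Array {
  const ratio = inputRate / outputRate;
  if (!Number.isInteger(ratio) || ratio < 1) {
    throw new Error("this sketch only handles integer downsampling ratios");
  }
  // Trailing samples that do not fill a whole block are dropped.
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Average each block of `ratio` samples into one output sample.
    let sum = 0;
    for (let j = 0; j < ratio; j++) {
      sum += input[i * ratio + j];
    }
    const clamped = Math.max(-1, Math.min(1, sum / ratio));
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    out[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return out;
}
```

In a real worklet, leftover samples would be carried over to the next `process()` call rather than dropped, because render blocks are not guaranteed to be a multiple of the ratio.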

Design rules

A few non-obvious choices that contributors should respect. PRs that violate these will be asked to change.

  • Speaker reputation is display-only, forever. Per-speaker accuracy stats may appear alongside verdicts, but they NEVER bias confidence, verdict, or the prompts that produce them. Confidence always means "how strongly does the evidence support this verdict." This is a hard contract, not a v1 simplification.
  • No mock data in production routes. Empty states until real data arrives.
  • All optional AI integrations must degrade gracefully when their keys are absent. Twelve Labs is the canonical example — without keys, the "VISUAL" badge simply doesn't render.
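The speaker-reputation rule can be enforced structurally: keep the stats out of every type that feeds the verdict, and attach them only when building what the UI shows. The sketch below illustrates that separation; all names (`Verdict`, `SpeakerStats`, `SidebarRow`, `toSidebarRow`) are hypothetical and not taken from the repo.

```typescript
// Produced by the verdict step from evidence alone (hypothetical shape).
interface Verdict {
  claim: string;
  label: string;
  confidence: number; // how strongly the evidence supports the verdict
}

// Display-only data; it never appears in Verdict or in any prompt input.
interface SpeakerStats {
  speaker: string;
  checkedClaims: number;
  accurateClaims: number;
}

interface SidebarRow {
  verdict: Verdict;
  speakerAccuracy?: string;
}

// Attach reputation at render time. The verdict is copied through unchanged,
// so the stats cannot influence label or confidence.
function toSidebarRow(verdict: Verdict, stats?: SpeakerStats): SidebarRow {
  const row: SidebarRow = { verdict: { ...verdict } };
  if (stats && stats.checkedClaims > 0) {
    const pct = Math.round((stats.accurateClaims / stats.checkedClaims) * 100);
    row.speakerAccuracy = `${stats.speaker}: ${pct}% accurate over ${stats.checkedClaims} claims`;
  }
  return row;
}
```

Because `SpeakerStats` is only a parameter of the display function, a reviewer can check the contract by confirming that nothing in the claim or verdict modules imports it.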

Deployment

The repo ships an example nginx vhost in examples/nginx.conf.example with the recommended rate-limit zones and SSE tuning. Cloudflare Turnstile setup is documented in examples/turnstile.md.

Operator-specific configs (real nginx vhosts, env files, deploy scripts) belong in a local-only deploy/ directory at the repo root. That directory is gitignored except for a brief README explaining the pattern.

Rate-limit defaults

All knobs are env-overridable. Defaults:

| Limit | Default | Env var |
| --- | --- | --- |
| Sessions per IP per hour | 10 | RATELIMIT_SESSIONS_PER_HOUR |
| Concurrent sessions per IP | 2 | RATELIMIT_CONCURRENT_SESSIONS_PER_KEY |
| Concurrent SSE streams per IP | 4 | RATELIMIT_CONCURRENT_SSE_PER_KEY |
| Global concurrent orchestrators | 50 | RATELIMIT_GLOBAL_CONCURRENT_SESSIONS |
| Rolling 24h Anthropic spend cap | $50 (set in cents) | RATELIMIT_DAILY_SPEND_CAP_CENTS |
| Distinct IPs before a URL is throttled | 5 | RATELIMIT_URL_DISTINCT_IPS |

Loopback requests (127.0.0.1, ::1) bypass all limits — keeps local development unfettered.
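As an illustration of how env-overridable limits and the loopback bypass might fit together, here is a small sketch of the sessions-per-hour check. It is not the code in lib/rate-limit.ts: `readLimit`, `isLoopback`, and `HourlySessionLimiter` are hypothetical names, the sliding-window approach is an assumption, and the spend cap default of 5000 assumes the $50 figure is stored in cents.

```typescript
type EnvVars = Record<string, string | undefined>;

// Defaults from the table above (5000 cents = $50 is an assumption about units).
const LIMIT_DEFAULTS = {
  RATELIMIT_SESSIONS_PER_HOUR: 10,
  RATELIMIT_CONCURRENT_SESSIONS_PER_KEY: 2,
  RATELIMIT_CONCURRENT_SSE_PER_KEY: 4,
  RATELIMIT_GLOBAL_CONCURRENT_SESSIONS: 50,
  RATELIMIT_DAILY_SPEND_CAP_CENTS: 5000,
  RATELIMIT_URL_DISTINCT_IPS: 5,
} as const;

type LimitName = keyof typeof LIMIT_DEFAULTS;

// Use the env value when it parses as a non-negative integer, else the default.
function readLimit(env: EnvVars, name: LimitName): number {
  const raw = env[name];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : LIMIT_DEFAULTS[name];
}

function isLoopback(ip: string): boolean {
  return ip === "127.0.0.1" || ip === "::1";
}

// Sliding one-hour window of session start times, kept per IP in memory.
class HourlySessionLimiter {
  private readonly starts = new Map<string, number[]>();
  private readonly maxPerHour: number;

  constructor(maxPerHour: number) {
    this.maxPerHour = maxPerHour;
  }

  // Returns true and records the start if the session is allowed.
  tryStart(ip: string, nowMs: number): boolean {
    if (isLoopback(ip)) return true; // local development bypasses the limit
    const cutoff = nowMs - 60 * 60 * 1000;
    const recent = (this.starts.get(ip) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.maxPerHour;
    if (allowed) recent.push(nowMs);
    this.starts.set(ip, recent);
    return allowed;
  }
}
```

A caller would construct the limiter with `readLimit(process.env, "RATELIMIT_SESSIONS_PER_HOUR")` and call `tryStart` before creating a session. Behind a reverse proxy such as nginx, the IP passed in must be the real client address, otherwise every request would look like loopback.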


Contributing

Issues and pull requests welcome. See CONTRIBUTING.md for the dev setup and PR process, CODE_OF_CONDUCT.md for community expectations, and SECURITY.md for vulnerability disclosure.


License

Apache License 2.0 — see LICENSE and NOTICE.
