DocuVoice

Talk to your documents. Surface what matters.

AI-powered voice-first document analysis platform that transforms hours of manual review into a 5-minute conversation.

Quick Start • Architecture • Features • Tech Stack • Documentation

The Problem

Enterprise professionals — insurance adjusters, paralegals, financial analysts — spend 30+ hours per week manually cross-referencing documents, hunting for discrepancies, and writing findings reports. It's slow, error-prone, and soul-crushing.

The Solution

DocuVoice lets you upload documents and have a voice conversation with an AI agent that has already read, understood, and cross-referenced everything. It surfaces discrepancies, anomalies, and red flags in real time, while you talk.

Upload a claim file, a policy, medical bills, and a police report. Ask: "Are there any inconsistencies between the claimant's statement and the police report?" — and get an instant, cited answer.

Features

Voice-First Interface — Natural conversation powered by Amazon Nova Sonic 2 (speech-to-speech, 1M token context)
Intelligent Document Processing — OCR, field extraction, and cross-document analysis via AWS Bedrock + LlamaParse
Real-Time Findings — Discrepancies, anomalies, red flags surfaced during conversation with severity ratings and confidence scores
Domain Plugins — Specialized analysis for insurance claims, legal contracts, financial due diligence, and HR compliance
Function Tools — The agent can search documents, compare fields, calculate exposure, and raise red flags mid-conversation
Live Transcript — Full session transcript with document references and tool call annotations
3-Panel Workspace — Voice orb + transcript | resizable document panel | extracted fields and findings
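Findings carry a severity rating and a confidence score. The sketch below shows one way to model and triage such a record. It assumes a hypothetical `Finding` dataclass, a four-level `Severity` scale, and confidence values in [0, 1]; the backend's actual Pydantic models may use different fields and levels.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(str, Enum):
    # Hypothetical severity scale; the real levels may differ.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Sort order used by triage(): most severe first.
_SEVERITY_RANK = {Severity.CRITICAL: 0, Severity.HIGH: 1, Severity.MEDIUM: 2, Severity.LOW: 3}


@dataclass(frozen=True)
class Finding:
    kind: str            # e.g. "discrepancy", "anomaly", "red_flag"
    summary: str
    severity: Severity
    confidence: float    # confidence score in [0.0, 1.0]
    document_ids: tuple[str, ...] = field(default_factory=tuple)  # cited sources

    def __post_init__(self) -> None:
        # Reject confidence scores outside the unit interval.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")


def triage(findings: list[Finding], min_confidence: float = 0.5) -> list[Finding]:
    """Drop low-confidence findings, then order by severity and descending confidence."""
    kept = [f for f in findings if f.confidence >= min_confidence]
    return sorted(kept, key=lambda f: (_SEVERITY_RANK[f.severity], -f.confidence))
```

Keeping `document_ids` on every finding is what lets the agent cite its sources when it reads a finding aloud.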

Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Frontend                             │
│              Next.js 16 · React 19 · Tailwind v4            │
│         shadcn/ui · Zustand · LiveKit Client SDK            │
└──────────────────────┬──────────────────────────────────────┘
                       │ REST API + WebSocket
┌──────────────────────▼──────────────────────────────────────┐
│                        Backend                              │
│                  FastAPI · Pydantic v2                      │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐  │
│  │  Workspace   │  │   Document   │  │    LiveKit        │  │
│  │  Management  │  │  Processing  │  │    Token + Room   │  │
│  └──────┬───────┘  └──────┬───────┘  └─────────┬─────────┘  │
│         │                 │                    │            │
│  ┌──────▼─────────────────▼────────────────────▼─────────┐  │
│  │         Repository Layer (DynamoDB / Memory)          │  │
│  └───────────────────────────────────────────────────────┘  │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                      Voice Agent                            │
│           LiveKit Agents v1.4 · Nova Sonic 2                │
│                    Silero VAD                               │
│                                                             │
│  ┌────────────┐  ┌──────────────┐  ┌─────────────────────┐  │
│  │  Domain    │  │   Context    │  │   Function Tools    │  │
│  │  Plugins   │  │   Injector   │  │   (search, compare, │  │
│  │            │  │              │  │    flag, summarize) │  │
│  └────────────┘  └──────────────┘  └─────────────────────┘  │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────────┐
│                     AWS Services                            │
│  S3 (uploads) · DynamoDB (data) · Bedrock (AI) · Textract   │
└─────────────────────────────────────────────────────────────┘

Tech Stack

Layer	Technology
Frontend	Next.js 16, React 19, TypeScript, Tailwind CSS v4, shadcn/ui, Zustand, LiveKit Client
Backend	FastAPI, Python 3.13+, Pydantic v2, uv package manager
Voice Agent	LiveKit Agents v1.4, Amazon Nova Sonic 2, Silero VAD
AI/ML	AWS Bedrock (Nova Pro), Instructor, LlamaParse, PyMuPDF
Database	DynamoDB (single-table design)
Storage	S3 (presigned uploads)
Infra	Docker Compose, App Runner, Vercel, LiveKit Cloud
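In a DynamoDB single-table design, several entity types share one table and are distinguished by their keys. The key scheme below is a hypothetical illustration: the `PK`/`SK` attribute names and the `WS#`, `DOC#`, and `FINDING#` prefixes are assumptions, not the project's documented schema. Placing every item of a workspace under one partition key lets a single query with a sort-key prefix fetch one entity type.

```python
from typing import TypedDict


class ItemKey(TypedDict):
    PK: str
    SK: str


# Hypothetical key scheme: all items of a workspace share one partition.
def workspace_key(workspace_id: str) -> ItemKey:
    return {"PK": f"WS#{workspace_id}", "SK": "META"}


def document_key(workspace_id: str, document_id: str) -> ItemKey:
    return {"PK": f"WS#{workspace_id}", "SK": f"DOC#{document_id}"}


def finding_key(workspace_id: str, finding_id: str) -> ItemKey:
    return {"PK": f"WS#{workspace_id}", "SK": f"FINDING#{finding_id}"}


def query_prefix(entity: str) -> str:
    """Sort-key prefix for a begins_with() condition on one entity type."""
    return {"document": "DOC#", "finding": "FINDING#"}[entity]


def filter_items(items: list[ItemKey], workspace_id: str, entity: str) -> list[ItemKey]:
    # Local stand-in for Query(PK = :pk AND begins_with(SK, :prefix)).
    pk, prefix = f"WS#{workspace_id}", query_prefix(entity)
    return [i for i in items if i["PK"] == pk and i["SK"].startswith(prefix)]
```

Against a real table, the equivalent boto3 call would be `table.query(KeyConditionExpression=Key("PK").eq(pk) & Key("SK").begins_with(prefix))`, with `Key` imported from `boto3.dynamodb.conditions`.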

Project Structure

docuvoice/
├── frontend/          # Next.js 16 app (pages, components, stores, hooks)
├── backend/           # FastAPI server (API, services, processing, repositories)
├── agents/            # LiveKit voice agent (plugins, tools, context injection)
├── infra/             # AWS setup scripts (S3, DynamoDB, IAM)
├── files/             # Sample documents for demo (insurance, legal, HR)
├── docs/              # Project documentation
│   ├── architecture/  #   Tech stack, system design
│   ├── design/        #   UX design system, component mapping
│   └── planning/      #   Task plans, roadmap
└── docker-compose.yml # Full-stack orchestration

Quick Start

Prerequisites

Python 3.13+
Node.js 20+
uv (Python package manager)
pnpm (Node package manager)
AWS account with Bedrock access enabled

1. Clone and configure

git clone <repo-url> && cd docuvoice

Create environment files from the examples:

cp backend/.env.example backend/.env
cp agents/.env.example agents/.env
cp frontend/.env.example frontend/.env

Fill in your AWS credentials, LiveKit keys, and LlamaParse API key.

2. AWS infrastructure

cd infra && bash setup.sh

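This creates the S3 bucket and the DynamoDB table and verifies Bedrock model access. Steps 3 to 5 each start a long-running process, so run each one from the repository root in its own terminal.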

3. Start the backend

cd backend
uv sync
uv run python -m uvicorn app.main:app --reload --port 8000

4. Start the voice agent

cd agents
uv sync
uv run python -m agent.entrypoint dev

5. Start the frontend

cd frontend
pnpm install
pnpm dev

Open http://localhost:3000 — create a workspace, upload documents, and start talking.

Docker (all-in-one)

docker compose up --build

How It Works

Upload Documents ──► Processing Pipeline ──► Workspace Ready ──► Voice Session
                     │                                            │
                     ├─ Text extraction (PyMuPDF / LlamaParse)    ├─ Nova Sonic 2 (speech-to-speech)
                     ├─ Field extraction (Bedrock Nova Pro)       ├─ Context injection (1M tokens)
                     ├─ Cross-document analysis                   ├─ Function tool calls
                     └─ Findings generation                       └─ Real-time findings

Upload — Drop PDFs, images, or DOCX files into a workspace
Process — Backend extracts text, identifies fields, detects anomalies, and generates cross-document findings
Talk — Connect to a voice session; the agent has full context of all documents and findings
Discover — Ask questions, drill into discrepancies, get cited answers with confidence scores
Review — Findings, transcripts, and extracted fields are persisted for audit
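The cross-document analysis step compares fields extracted from different documents. DocuVoice uses Bedrock Nova Pro for this, so the rule-based sketch below is a simplified stand-in, not the project's method. It flags any field whose normalized values disagree across documents and keeps the document IDs so an answer can cite them. `ExtractedField` and the field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedField:
    document_id: str
    name: str    # e.g. "incident_date"
    value: str


def _normalize(value: str) -> str:
    # Ignore differences in letter case and whitespace.
    return " ".join(value.split()).lower()


def find_discrepancies(fields: list[ExtractedField]) -> dict[str, dict[str, list[str]]]:
    """Map each conflicting field name to {normalized value: [document ids]}."""
    by_name: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for f in fields:
        by_name[f.name][_normalize(f.value)].append(f.document_id)
    # Keep only fields with more than one distinct value across documents.
    return {name: dict(values) for name, values in by_name.items() if len(values) > 1}
```

Exact matching after normalization is deliberately strict; real documents also need tolerance for equivalent date formats, units, and paraphrases, which is where an LLM-based comparison helps.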

Domain Plugins

Domain	Document Types	Key Analysis
Insurance Claims	FNOL, policies, medical bills, police reports	Timeline inconsistencies, coverage gaps, exposure calculation
Legal Contracts	MSAs, amendments, NDAs, playbooks	Clause deviations, missing protections, liability exposure
Financial DD	P&L, balance sheets, audit reports, cap tables	Revenue anomalies, EBITDA adjustments, working capital issues
HR Compliance	Complaints, witness statements, policies, employee files	Pattern detection, policy violations, credibility assessment
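Each domain plugin specializes the agent for one kind of workspace. A minimal registry sketch follows, assuming a hypothetical `DomainPlugin` with a domain key, the document types it expects, and instructions appended to the agent's system prompt. The real plugins in `agents/` may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainPlugin:
    domain: str
    document_types: tuple[str, ...]
    instructions: str  # extra guidance appended to the system prompt (hypothetical)


_REGISTRY: dict[str, DomainPlugin] = {}


def register(plugin: DomainPlugin) -> None:
    # Refuse to silently overwrite a domain that is already registered.
    if plugin.domain in _REGISTRY:
        raise ValueError(f"plugin already registered: {plugin.domain}")
    _REGISTRY[plugin.domain] = plugin


def build_system_prompt(base: str, domain: str) -> str:
    """Combine the base prompt with the selected plugin's expectations and instructions."""
    plugin = _REGISTRY[domain]
    types = ", ".join(plugin.document_types)
    return f"{base}\n\nDomain: {plugin.domain}\nExpected documents: {types}\n{plugin.instructions}"


register(DomainPlugin(
    domain="insurance",
    document_types=("FNOL", "policy", "medical bill", "police report"),
    instructions="Check timelines across documents, identify coverage gaps, and estimate exposure.",
))
```

Adding a domain then means registering one more plugin rather than changing the agent itself.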

API Overview

The backend exposes a REST API under /api/v1/; the paths below are relative to that prefix:

Endpoint	Description
`POST /workspaces`	Create a new workspace
`POST /workspaces/:id/documents/upload`	Upload a document
`POST /workspaces/:id/prepare`	Trigger document processing pipeline
`GET /workspaces/:id/preparation-status`	Poll processing progress
`GET /workspaces/:id/extracted-fields`	Get all extracted fields
`GET /workspaces/:id/findings`	Get cross-document findings
`POST /livekit/token`	Generate voice session token
`GET /sessions/:id/transcript`	Get session transcript
`GET /workspaces/:id/context`	Get pre-built agent context

Full API docs are available at http://localhost:8000/docs while the backend is running.

Environment Variables

Backend (backend/.env)

APP_ENV=development
APP_PORT=8000
CORS_ORIGINS=["http://localhost:3000"]
STORAGE_BACKEND=memory          # or "dynamodb"

# AWS
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=us-east-1
S3_BUCKET_NAME=
DYNAMODB_TABLE_NAME=docuvoice-main

# AI
BEDROCK_MODEL_ID=us.amazon.nova-pro-v1:0
LLAMA_CLOUD_API_KEY=

# LiveKit
LIVEKIT_URL=
LIVEKIT_API_KEY=
LIVEKIT_API_SECRET=
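The backend is built on Pydantic v2 and likely loads these values through its own settings model. As a dependency-free sketch, the loader below reads the same keys from an already-populated environment, falls back to the example values shown above, and checks two things: `CORS_ORIGINS` must be a JSON array, and `STORAGE_BACKEND` must be `memory` or `dynamodb`. The `Settings` class, `load_settings`, and the rule that DynamoDB mode needs a table name are assumptions for illustration.

```python
import json
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    app_env: str
    app_port: int
    cors_origins: tuple[str, ...]
    storage_backend: str
    aws_region: str
    dynamodb_table_name: str
    bedrock_model_id: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read backend settings, falling back to the example values from backend/.env."""
    storage = env.get("STORAGE_BACKEND", "memory")
    if storage not in ("memory", "dynamodb"):
        raise ValueError(f"STORAGE_BACKEND must be 'memory' or 'dynamodb', got {storage!r}")
    # CORS_ORIGINS is written as a JSON array in backend/.env.
    origins = json.loads(env.get("CORS_ORIGINS", '["http://localhost:3000"]'))
    if not isinstance(origins, list):
        raise ValueError("CORS_ORIGINS must be a JSON array")
    table = env.get("DYNAMODB_TABLE_NAME", "docuvoice-main")
    # Hypothetical rule: DynamoDB mode needs a non-empty table name.
    if storage == "dynamodb" and not table:
        raise ValueError("DYNAMODB_TABLE_NAME is required when STORAGE_BACKEND=dynamodb")
    return Settings(
        app_env=env.get("APP_ENV", "development"),
        app_port=int(env.get("APP_PORT", "8000")),
        cors_origins=tuple(origins),
        storage_backend=storage,
        aws_region=env.get("AWS_DEFAULT_REGION", "us-east-1"),
        dynamodb_table_name=table,
        bedrock_model_id=env.get("BEDROCK_MODEL_ID", "us.amazon.nova-pro-v1:0"),
    )
```

Failing fast at startup on a misspelled `STORAGE_BACKEND` is cheaper than discovering it when the first workspace is created.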

Agent (agents/.env)

LIVEKIT_URL=
LIVEKIT_API_KEY=
LIVEKIT_API_SECRET=
BACKEND_URL=http://localhost:8000

# AWS
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=us-east-1
S3_BUCKET_NAME=
DYNAMODB_TABLE_NAME=docuvoice-main

Frontend (frontend/.env)

NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_LIVEKIT_URL=

Documentation

Document	Description
Tech Stack	Full technical architecture, module dependency map, API spec
System Design	System design document (v1.0)
UX Design System	Color system, component mapping, animation tokens
Use Cases	Detailed personas, workflows, and plugin specifications
Frontend Tasks	Development roadmap and task breakdown
