DReader

Discord message archival system with thread reconstruction capabilities. Scrapes channels via browser automation and stores messages in SQLite for analysis.

Quick Start

Prerequisites: Bun 1.3+, Claude Code CLI with --chrome flag

# 1. Clone and install
git clone https://github.com/justSteve/DReader.git
cd DReader
bun install

# 2. Start Claude Code with Chrome browser control
claude --chrome
# Log into Discord in the browser window that opens

# 3. Start the API server
bun run dev

# 4. Start scraping a channel
curl -X POST http://localhost:3001/api/scrape/start \
  -H "Content-Type: application/json" \
  -d '{"channel_id": "YOUR_CHANNEL_ID", "scrape_type": "initialization"}'

Architecture

┌─────────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ Claude Chrome       │────▶│ Scrape           │────▶│ SQLite DB   │
│ Extension           │     │ Orchestrator     │     │ (bun:sqlite)│
└─────────────────────┘     └──────────────────┘     └─────────────┘
         │                          │                       │
         │                          ▼                       ▼
    Browser DOM            ┌──────────────────┐     ┌─────────────┐
    Extraction             │ Hono REST API    │◀────│ Thread      │
                           │ :3001            │     │ Analyzer    │
                           └──────────────────┘     └─────────────┘

Components

Component	Purpose
`ClaudeChromeController`	Browser automation via Claude's Chrome extension
`InitializationScrapeOrchestrator`	Historical scraping (backward paging)
`IncrementalScrapeOrchestrator`	New message scraping (forward paging)
`DatabaseService`	SQLite storage with bun:sqlite
`ThreadAnalyzer`	Reconstruct conversation trees from reply chains
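
As an illustration of what "reconstructing conversation trees from reply chains" can mean, here is a minimal sketch. The `Message` shape, the `replyToId` field, and the `buildThreadForest` name are assumptions made for this example; they are not taken from DReader's source, and thread analysis is currently deferred in the project.

```typescript
// Hypothetical message shape; DReader's real model may differ.
interface Message {
  id: string;
  replyToId: string | null; // id of the message this one replies to, if any
  content: string;
}

interface ThreadNode {
  message: Message;
  replies: ThreadNode[];
}

// Build a forest of reply trees. Messages that reply to nothing, or whose
// parent was never scraped, become roots.
function buildThreadForest(messages: Message[]): ThreadNode[] {
  const nodes = new Map<string, ThreadNode>();
  for (const message of messages) {
    nodes.set(message.id, { message, replies: [] });
  }

  const roots: ThreadNode[] = [];
  for (const message of messages) {
    const node = nodes.get(message.id)!;
    const parent = message.replyToId ? nodes.get(message.replyToId) : undefined;
    if (parent && parent !== node) {
      parent.replies.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}
```

Treating messages with a missing parent as roots keeps partially scraped channels usable: a reply whose target lies outside the scraped range still appears in the output instead of being dropped.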

Scrape Modes

Initialization (Historical)

Scrapes backward in time to capture channel history.

# Full history
curl -X POST http://localhost:3001/api/scrape/start \
  -H "Content-Type: application/json" \
  -d '{"channel_id": "123", "scrape_type": "initialization"}'

# With date cutoff
curl -X POST http://localhost:3001/api/scrape/start \
  -H "Content-Type: application/json" \
  -d '{"channel_id": "123", "scrape_type": "initialization", "oldest_date_required": "2024-01-01"}'

Incremental (New Messages)

Scrapes forward from last known message to capture new activity.

curl -X POST http://localhost:3001/api/scrape/start \
  -H "Content-Type: application/json" \
  -d '{"channel_id": "123", "scrape_type": "incremental"}'
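
The same requests can be issued from TypeScript. The sketch below only builds the request options. The field names `channel_id`, `scrape_type`, and `oldest_date_required` come from the curl examples above, while the `StartScrapeRequest` type and the `buildStartScrapeInit` helper are hypothetical names introduced for illustration.

```typescript
type ScrapeType = "initialization" | "incremental";

interface StartScrapeRequest {
  channel_id: string;
  scrape_type: ScrapeType;
  oldest_date_required?: string; // e.g. "2024-01-01"; shown above for initialization scrapes
}

// Hypothetical helper: builds the options object for a fetch() call
// against POST /api/scrape/start.
function buildStartScrapeInit(request: StartScrapeRequest): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  };
}

// Usage (not executed here):
// await fetch(
//   "http://localhost:3001/api/scrape/start",
//   buildStartScrapeInit({ channel_id: "123", scrape_type: "incremental" }),
// );
```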

REST API

Base URL: http://localhost:3001/api

Scrape Operations

Method	Endpoint	Description
`POST`	`/scrape/start`	Start new scrape job
`POST`	`/scrape/resume/:id`	Resume interrupted job
`GET`	`/scrape/status/:id`	Get job status
`GET`	`/scrape/channel/:id/status`	Get channel scrape status
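
A client will typically poll `/scrape/status/:id` until the job finishes. The sketch below shows one way to do that. The response shape (`{ status }`) is an assumption, since the response body is not documented here; the status values are the ones listed for the `scrape_jobs` table, and `waitForJob` is a hypothetical helper name.

```typescript
type JobStatus = "pending" | "running" | "completed" | "failed";

// Assumed response shape; the real endpoint may return more fields.
interface JobStatusResponse {
  status: JobStatus;
}

type StatusFetcher = (jobId: string) => Promise<JobStatusResponse>;

// Poll until the job reaches a terminal state or the poll budget runs out.
async function waitForJob(
  jobId: string,
  fetchStatus: StatusFetcher,
  intervalMs = 2000,
  maxPolls = 300,
): Promise<JobStatus> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const { status } = await fetchStatus(jobId);
    if (status === "completed" || status === "failed") {
      return status;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxPolls} polls`);
}

// Example fetcher (not executed here):
// const fetchStatus: StatusFetcher = async (id) =>
//   (await fetch(`http://localhost:3001/api/scrape/status/${id}`)).json();
```

Passing the fetcher in as a parameter keeps the polling logic independent of the HTTP client, which also makes it easy to test with a stub.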

Data Access

Method	Endpoint	Description
`GET`	`/messages`	List messages (paginated)
`GET`	`/messages/:id`	Get single message
`GET`	`/servers`	List scraped servers
`GET`	`/channels`	List scraped channels

Thread Analysis (Deferred)

Method	Endpoint	Description
`GET`	`/threads/:id`	Get thread tree (returns 501)

Database Schema

SQLite database with these core tables:

Table	Purpose
`servers`	Discord servers
`channels`	Discord channels
`messages`	Scraped messages with metadata
`scrape_jobs`	Job tracking with progress state

Key Columns in `scrape_jobs`

`oldest_scraped_id` - Earliest message captured
`newest_scraped_id` - Latest message captured
`scrape_scope` - `initialization` or `incremental`
`status` - `pending`, `running`, `completed`, or `failed`
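
One plausible use of these columns is deciding where an interrupted job should continue. The sketch below is an inference from the column descriptions and the backward/forward paging described under Scrape Modes, not a description of DReader's actual resume logic. `ScrapeJobRow`, `ResumePoint`, and `resumePointFor` are hypothetical names, and storing IDs as strings is an assumption.

```typescript
// Hypothetical row type covering only the columns listed above.
interface ScrapeJobRow {
  oldest_scraped_id: string | null;
  newest_scraped_id: string | null;
  scrape_scope: "initialization" | "incremental";
  status: "pending" | "running" | "completed" | "failed";
}

interface ResumePoint {
  direction: "backward" | "forward";
  anchorId: string | null; // null means no message has been captured yet
}

// Initialization pages backward from the earliest captured message;
// incremental pages forward from the latest captured message.
function resumePointFor(job: ScrapeJobRow): ResumePoint {
  return job.scrape_scope === "initialization"
    ? { direction: "backward", anchorId: job.oldest_scraped_id }
    : { direction: "forward", anchorId: job.newest_scraped_id };
}
```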

Configuration

Environment Variables

Variable	Default	Description
`PORT`	3001	API server port
`DB_PATH`	`./data/dreader.db`	SQLite database path
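
A minimal sketch of reading these variables with the defaults from the table. The `loadEnvConfig` name, the `ServerEnvConfig` type, and the port validation are illustrative assumptions, not DReader's actual startup code.

```typescript
interface ServerEnvConfig {
  port: number;
  dbPath: string;
}

// Reads PORT and DB_PATH from an environment-like object, falling back
// to the documented defaults when a variable is unset or empty.
function loadEnvConfig(env: Record<string, string | undefined>): ServerEnvConfig {
  const rawPort = env["PORT"];
  const port = rawPort === undefined || rawPort === "" ? 3001 : Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  return {
    port,
    dbPath: env["DB_PATH"] || "./data/dreader.db",
  };
}

// Usage under Bun or Node (not executed here):
// const config = loadEnvConfig(process.env);
```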

config.json

{
  "scrape": {
    "scroll_delay_ms": 1000,
    "max_scrolls": 100,
    "messages_per_batch": 50
  }
}
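
The sketch below shows one way to load this file with typed fallbacks. It assumes the values shown above are the defaults, which this README does not state explicitly; `ScrapeConfig`, `DEFAULT_SCRAPE_CONFIG`, and `parseScrapeConfig` are hypothetical names.

```typescript
interface ScrapeConfig {
  scroll_delay_ms: number;
  max_scrolls: number;
  messages_per_batch: number;
}

// Assumed defaults, copied from the example config.json above.
const DEFAULT_SCRAPE_CONFIG: ScrapeConfig = {
  scroll_delay_ms: 1000,
  max_scrolls: 100,
  messages_per_batch: 50,
};

// Parse config.json text, fill in missing keys, and reject values
// that are not non-negative numbers.
function parseScrapeConfig(json: string): ScrapeConfig {
  const parsed = JSON.parse(json) as { scrape?: Partial<ScrapeConfig> } | null;
  const merged: ScrapeConfig = { ...DEFAULT_SCRAPE_CONFIG, ...(parsed?.scrape ?? {}) };

  const keys: (keyof ScrapeConfig)[] = ["scroll_delay_ms", "max_scrolls", "messages_per_batch"];
  for (const key of keys) {
    const value: unknown = merged[key];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
      throw new Error(`Invalid scrape.${key}: ${String(value)}`);
    }
  }
  return merged;
}
```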

Project Structure

DReader/
├── src/
│   ├── api/                    # Hono REST API
│   │   ├── index.ts            # Server entry point
│   │   └── routes/             # Route handlers
│   ├── domain/
│   │   ├── scrape-engine/      # Browser control & orchestration
│   │   │   ├── ClaudeChromeController.ts
│   │   │   ├── InitializationScrapeOrchestrator.ts
│   │   │   └── IncrementalScrapeOrchestrator.ts
│   │   ├── metadata-capture/   # DOM extraction
│   │   ├── thread-reconstruction/  # Thread tree building
│   │   └── models/             # TypeScript types
│   ├── services/               # Database layer
│   │   ├── DatabaseService.ts
│   │   └── schema.sql
│   └── cli/                    # CLI tools
│       └── auth-setup.ts
├── packages/                   # Shared packages
├── tests/                      # Test suites
├── bunfig.toml                 # Bun configuration
└── package.json

Development

# Install dependencies
bun install

# Run tests (50 passing, 11 skipped)
bun test

# Start dev server
bun run dev

# Type check
bun run typecheck

Authentication

DReader uses Claude Code's Chrome browser for Discord authentication:

1. Start Claude Code with Chrome: claude --chrome
2. Navigate to Discord in the browser
3. Log in with your Discord account
4. Keep Claude Code running during scraping

DReader does not use the Discord API (no bot token, no OAuth, no REST calls). All retrieval happens through computer use: browser session automation or native app control.

Documentation

Code Review - Architecture assessment and recommendations
Implementation Plan - Refactoring status and roadmap

Limitations

Thread analysis is deferred (`GET /threads/:id` returns 501 Not Implemented)
Requires Claude Code with --chrome flag
Windows-focused (PowerShell scripts)
No rate limiting on API endpoints yet

Tech Stack

Runtime: Bun 1.3+
API: Hono
Database: SQLite (bun:sqlite)
Browser: Claude Chrome Extension
Language: TypeScript (ESNext)
