Discord in, structured retrieval out.
Backscroll is a self-hostable, Postgres-backed Discord ingestion and retrieval stack. It connects to Discord, backfills message history into a clean Postgres schema, and gives you fast full-text search, filtering, export, and a reliable foundation for downstream analytics or AI workflows.
It is infrastructure, not a product. No agents, no AI summaries, no decision-making -- just solid data ingestion and retrieval you can build on top of.
- Connects to Discord via a bot token (REST API, no Gateway required for backfill)
- Ingests full message history for any channel or guild you have access to
- Stores everything in Postgres with a clean, queryable schema
- Supports resumable sync -- safe to interrupt and restart
- Incrementally syncs new messages after initial backfill
- Full-text search across messages with channel, author, and time filters
- Exports messages as JSON or NDJSON for downstream pipelines
- Node.js 20+
- PostgreSQL 12+
- A Discord bot token (Discord Developer Portal)
git clone https://github.com/signalnodes/backscroll.git
cd backscroll
npm install
npm run build
npm link # makes `backscroll` available globally
Or run directly with `npx tsx src/index.ts` during development.
cp .env.example .env
Edit `.env`:
DATABASE_URL=postgresql://user:password@localhost:5432/backscroll
DISCORD_BOT_TOKEN=your-bot-token-here
Create the database and run migrations:
createdb backscroll # or via psql
backscroll init
backscroll auth
# Authenticated as: YourBot
Initialize the database and run migrations.
backscroll init
backscroll init --migrate-only
Validate your Discord bot token.
backscroll auth
List guilds (servers) the bot has access to.
backscroll guilds
backscroll guilds --json
List channels in a guild.
backscroll channels 1234567890123456789
backscroll channels 1234567890123456789 --type text
backscroll channels 1234567890123456789 --type all --json
Sync message history. The first run performs a full backfill; subsequent runs pick up new messages incrementally.
# Sync all text channels in a guild (full backfill)
backscroll sync --guild 1234567890123456789
# Sync specific channels
backscroll sync 111111111111111111 222222222222222222
# Incremental sync of all previously synced channels
backscroll sync
# Force full re-backfill
backscroll sync --full --guild 1234567890123456789
# Control concurrency (default: 3)
backscroll sync --guild 1234567890123456789 --concurrency 5
# Dry run: see what would be synced
backscroll sync --guild 1234567890123456789 --dry-run
Sync is resumable -- if interrupted, the next run continues from where it left off.
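The resume behaviour described above can be pictured as cursor-based paging with a checkpoint saved after every page. What follows is a minimal sketch under stated assumptions, not Backscroll's actual implementation: `backfillChannel`, `Cursor`, `FetchPage`, and the `store`/`saveCursor` callbacks are hypothetical names introduced for illustration. The injected `fetchPage` stands in for Discord's channel message history endpoint, which returns messages newest-first and accepts a `before` message ID.

```typescript
// Hypothetical shapes for illustration only; not Backscroll's real types.
interface Message { id: string; content: string }
interface Cursor { oldestSeenId: string | null; done: boolean }

// Returns up to `limit` messages older than `before`, newest first.
type FetchPage = (before: string | null, limit: number) => Promise<Message[]>;

// Snowflake IDs exceed Number.MAX_SAFE_INTEGER, so compare them as decimal
// strings: a shorter string is a smaller number, equal lengths compare lexically.
function isOlder(a: string, b: string): boolean {
  return a.length !== b.length ? a.length < b.length : a < b;
}

async function backfillChannel(
  fetchPage: FetchPage,
  cursor: Cursor,
  store: (batch: Message[]) => Promise<void>,
  saveCursor: (c: Cursor) => Promise<void>,
  pageSize = 100,
): Promise<Cursor> {
  let current: Cursor = { ...cursor };
  while (!current.done) {
    const page = await fetchPage(current.oldestSeenId, pageSize);
    if (page.length === 0) {
      current = { ...current, done: true };
    } else {
      // Assumed to be idempotent (e.g. an upsert keyed on message ID), so a
      // page that is re-fetched after a crash does no harm.
      await store(page);
      const oldest = page.reduce((a, b) => (isOlder(a.id, b.id) ? a : b));
      current = { oldestSeenId: oldest.id, done: page.length < pageSize };
    }
    // Checkpoint only after the page is stored: an interrupted run resumes
    // from the last page that was fully written.
    await saveCursor(current);
  }
  return current;
}
```

Because the cursor advances only after a successful write, the worst case after an interruption is re-fetching one page, which is why the write is assumed to be idempotent.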
Show sync progress for all tracked channels.
backscroll status
backscroll status --guild 1234567890123456789
backscroll status --json
Full-text search across ingested messages.
backscroll search "deployment checklist"
backscroll search "postgres schema" --channel 111111111111111111
backscroll search "launch" --author 999999999999999999 --after 2024-01-01
backscroll search "bug" --context 3 # show 3 messages before/after each match
backscroll search "release" --limit 50 --json
Flags:
- `--channel <id>` -- filter by channel (repeatable)
- `--author <id>` -- filter by author
- `--after <date>` -- messages after date (YYYY-MM-DD or ISO8601)
- `--before <date>` -- messages before date
- `--context <n>` -- show N messages before/after each match
- `--limit <n>` -- max results (default: 25)
- `--json` -- JSON output
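To make the filters concrete, here is one way the search flags could translate into a single parameterized Postgres query. This is a sketch, not the query Backscroll actually runs: `buildSearchQuery` and `SearchFilters` are hypothetical, and the table and column names (`messages`, `search_vector`, `channel_id`, `author_id`, `created_at`) are assumptions about the schema. It uses `websearch_to_tsquery`, which exists in Postgres 11+; the real implementation may use a different function.

```typescript
interface SearchFilters {
  channels?: string[]; // --channel (repeatable)
  author?: string;     // --author
  after?: string;      // --after, YYYY-MM-DD or ISO 8601
  before?: string;     // --before
  limit?: number;      // --limit, default 25
}

interface SqlQuery { text: string; values: unknown[] }

function buildSearchQuery(term: string, f: SearchFilters = {}): SqlQuery {
  const values: unknown[] = [term];
  const where = ["search_vector @@ websearch_to_tsquery('english', $1)"];

  // Push the value first so the placeholder number matches its position.
  const add = (clause: (placeholder: string) => string, value: unknown): void => {
    values.push(value);
    where.push(clause(`$${values.length}`));
  };

  if (f.channels && f.channels.length > 0) {
    add((p) => `channel_id = ANY(${p}::bigint[])`, f.channels);
  }
  if (f.author) add((p) => `author_id = ${p}::bigint`, f.author);
  // Assumption: --after is inclusive and --before is exclusive.
  if (f.after) add((p) => `created_at >= ${p}::timestamptz`, f.after);
  if (f.before) add((p) => `created_at < ${p}::timestamptz`, f.before);

  values.push(f.limit ?? 25);
  const text =
    "SELECT id, channel_id, author_id, content, created_at FROM messages " +
    `WHERE ${where.join(" AND ")} ` +
    `ORDER BY created_at DESC LIMIT $${values.length}`;
  return { text, values };
}
```

Every user-supplied value travels as a bind parameter rather than being spliced into the SQL string, so search terms containing quotes or operators cannot alter the query.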
Stream messages to a file or stdout as JSON or NDJSON.
# Export a channel to NDJSON (default)
backscroll export --channel 111111111111111111 --output messages.ndjson
# Export to JSON
backscroll export --channel 111111111111111111 --format json --output messages.json
# Export to stdout and pipe
backscroll export --channel 111111111111111111 | jq '.content'
# Export with filters
backscroll export --after 2024-01-01 --before 2024-06-01 --output h1-2024.ndjson
NDJSON output is streamed in batches -- memory-efficient for large exports.
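Because NDJSON puts one JSON object on each line, a downstream consumer can process an export incrementally instead of loading the whole file. The sketch below shows one way to do that; `parseNdjson` is a hypothetical helper, and the `id` and `content` fields in the usage note are assumptions about the export shape. It accepts text chunks of arbitrary size, so a record split across two chunks is still parsed correctly.

```typescript
// Incrementally parse NDJSON from text chunks (e.g. pieces of a file or stdin).
function* parseNdjson(chunks: Iterable<string>): Generator<unknown> {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line !== "") yield JSON.parse(line); // skip blank lines
      newline = buffer.indexOf("\n");
    }
  }
  // Handle a final record that has no trailing newline.
  const rest = buffer.trim();
  if (rest !== "") yield JSON.parse(rest);
}
```

For example, iterating `parseNdjson(['{"id":"1","content":"he', 'llo"}\n'])` yields a single object whose `content` is `"hello"`, even though the record arrived in two pieces.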
Basic message counts and analytics.
backscroll stats
backscroll stats --guild 1234567890123456789
backscroll stats --json
Backscroll uses a Postgres schema designed for queryability and correctness.
| Table | Description |
|---|---|
| `guilds` | Discord server metadata |
| `channels` | Channel metadata, linked to guilds |
| `users` | Discord user/author data |
| `messages` | Core message table with full-text search vector |
| `attachments` | File attachments linked to messages |
| `message_mentions` | User mentions within messages |
| `sync_cursors` | Per-channel sync state and checkpoints |
The messages table uses a GENERATED ALWAYS AS stored tsvector column for zero-maintenance full-text search, with a GIN index. Postgres 12+ required.
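As an illustration of the generated-column approach, a migration might carry DDL along the following lines. This is a simplified sketch rather than Backscroll's real migration: the constant name `MESSAGES_FTS_DDL`, the column names, the index name, and the `'english'` text search configuration are all assumptions, and the real `messages` table has more columns.

```typescript
// Hypothetical, trimmed-down DDL; the actual schema is richer than this.
const MESSAGES_FTS_DDL = `
CREATE TABLE messages (
  id            bigint PRIMARY KEY,
  content       text NOT NULL DEFAULT '',
  -- Postgres recomputes this column on every INSERT or UPDATE of content,
  -- so no trigger or application code has to keep it in sync.
  search_vector tsvector
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

-- A GIN index lets the @@ match operator avoid a sequential scan.
CREATE INDEX messages_search_idx ON messages USING GIN (search_vector);
`;
```

Stored generated columns were introduced in Postgres 12, which is where the version requirement comes from. The two-argument form of `to_tsvector` is used because a generated column expression must be immutable, and the one-argument form depends on a session setting.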
- Go to discord.com/developers/applications
- Create a new application
- Go to Bot → enable the bot
- Under Privileged Gateway Intents, enable Message Content Intent
- Copy the bot token into your `.env`
- Go to OAuth2 → URL Generator:
  - Scopes: `bot`
  - Bot permissions: `Read Messages/View Channels`, `Read Message History`
- Use the generated URL to invite the bot to your server
Backscroll is designed to run anywhere Node.js and Postgres are available. There are no external service dependencies.
For a minimal production setup:
- Provision a Postgres database (local, Fly.io, Railway, Supabase, etc.)
- Set `DATABASE_URL` and `DISCORD_BOT_TOKEN` as environment variables
- Run `backscroll init` to apply the schema
- Run `backscroll sync --guild <id>` to start ingesting
For scheduled incremental sync, add a cron job or systemd timer that runs:
backscroll sync
Phase 2
- HTTP/REST API for programmatic access
- pg_trgm fuzzy/partial matching
- Gateway connection for real-time message capture
- Edit/delete tracking
- Docker Compose for one-command self-hosting
Phase 3
- pgvector semantic search
- Multi-format export (CSV, Parquet)
- Thread ingestion
- Scheduled sync with built-in cron
MIT