
Backscroll

Discord in, structured retrieval out.

Backscroll is a self-hostable, Postgres-backed Discord ingestion and retrieval stack. It connects to Discord, backfills message history into a clean Postgres schema, and gives you fast full-text search, filtering, export, and a reliable foundation for downstream analytics or AI workflows.

It is infrastructure, not a product. No agents, no AI summaries, no decision-making -- just solid data ingestion and retrieval you can build on top of.


What it does

  • Connects to Discord via a bot token (REST API, no Gateway required for backfill)
  • Ingests full message history for any channel or guild the bot has access to
  • Stores everything in Postgres with a clean, queryable schema
  • Supports resumable sync -- safe to interrupt and restart
  • Incrementally syncs new messages after initial backfill
  • Full-text search across messages with channel, author, and time filters
  • Exports messages as JSON or NDJSON for downstream pipelines

Quickstart

1. Prerequisites

  • Node.js and npm
  • Postgres 12 or newer
  • A Discord bot token (see Discord bot setup below)

2. Install

git clone https://github.com/signalnodes/backscroll.git
cd backscroll
npm install
npm run build
npm link   # makes `backscroll` available globally

Or run directly with npx tsx src/index.ts during development.

3. Configure

cp .env.example .env

Edit .env:

DATABASE_URL=postgresql://user:password@localhost:5432/backscroll
DISCORD_BOT_TOKEN=your-bot-token-here
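Both variables are required. As a rough illustration (not Backscroll's actual code), a fail-fast loader for them might look like the sketch below; the `loadConfig` name, the `Config` shape and the error wording are hypothetical.

```typescript
// Hypothetical startup check, not taken from the Backscroll source.
interface Config {
  databaseUrl: string;
  discordBotToken: string;
}

const REQUIRED_VARS = ["DATABASE_URL", "DISCORD_BOT_TOKEN"];

// Takes the environment as a plain object (e.g. process.env) so it is easy to test.
function loadConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED_VARS.filter((key) => (env[key] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const databaseUrl = (env.DATABASE_URL ?? "").trim();
  // Fail fast on anything that is not a Postgres connection URL.
  if (!/^postgres(ql)?:\/\//.test(databaseUrl)) {
    throw new Error("DATABASE_URL must start with postgresql:// or postgres://");
  }
  return { databaseUrl, discordBotToken: (env.DISCORD_BOT_TOKEN ?? "").trim() };
}
```

In a Node entry point this would be called as `loadConfig(process.env)` once the `.env` file has been loaded, so a missing token is reported before any database or Discord connection is attempted.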

4. Initialize

Create the database and run migrations:

createdb backscroll   # or via psql
backscroll init

5. Verify bot access

backscroll auth
# Authenticated as: YourBot
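A token check like this can be done with one call to Discord's `GET /users/@me` endpoint, which returns the bot's own user object when the `Authorization: Bot <token>` header is valid and HTTP 401 when it is not. The sketch below assumes that approach; it is not necessarily how `backscroll auth` is implemented. `checkBotToken` and the `FetchLike` type are hypothetical, and the fetch function is injected so the logic can run without network access.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<ResponseLike>;

const DISCORD_API = "https://discord.com/api/v10";

// Hypothetical helper: resolves to the bot's username, or throws if Discord rejects the token.
async function checkBotToken(token: string, fetchFn: FetchLike): Promise<string> {
  const res = await fetchFn(`${DISCORD_API}/users/@me`, {
    headers: { Authorization: `Bot ${token}` },
  });
  if (res.status === 401) {
    throw new Error("Discord rejected the bot token (401 Unauthorized)");
  }
  if (!res.ok) {
    throw new Error(`Unexpected response from Discord: HTTP ${res.status}`);
  }
  const user = (await res.json()) as { username: string };
  return user.username;
}
```

With Node.js 18 or newer, the built-in `fetch` should satisfy `FetchLike`, so a caller could run `checkBotToken(token, fetch)` and print `Authenticated as: ${name}`.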

Commands

backscroll init

Initialize the database and run migrations.

backscroll init
backscroll init --migrate-only

backscroll auth

Validate your Discord bot token.

backscroll auth

backscroll guilds

List guilds (servers) the bot has access to.

backscroll guilds
backscroll guilds --json

backscroll channels <guild-id>

List channels in a guild.

backscroll channels 1234567890123456789
backscroll channels 1234567890123456789 --type text
backscroll channels 1234567890123456789 --type all --json

backscroll sync

Sync message history. On first run, performs a full backfill. On subsequent runs, picks up new messages incrementally.

# Sync all text channels in a guild (full backfill)
backscroll sync --guild 1234567890123456789

# Sync specific channels
backscroll sync 111111111111111111 222222222222222222

# Incremental sync of all previously synced channels
backscroll sync

# Force full re-backfill
backscroll sync --full --guild 1234567890123456789

# Control concurrency (default: 3)
backscroll sync --guild 1234567890123456789 --concurrency 5

# Dry run: see what would be synced
backscroll sync --guild 1234567890123456789 --dry-run
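The `--concurrency` flag caps how many channels are synced at once. A common way to enforce such a cap is a small worker pool in which N workers pull items from a shared queue; the sketch below shows that pattern with a hypothetical `task` callback standing in for per-channel sync. It illustrates the technique and is not Backscroll's own scheduler.

```typescript
// Runs `task` over every item with at most `limit` tasks in flight at once.
async function runWithConcurrency<T>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0; // index of the next item to hand out; safe because JS runs one callback at a time
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next++];
      await task(item);
    }
  };
  // Never start more workers than there are items, but always at least one.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, worker));
}
```

As written, the first failing task rejects the returned promise; a sync tool would more likely catch errors per channel so one failing channel does not hide the results of the others.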

Sync is resumable -- if interrupted, the next run continues from where it left off.
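Per-channel state lives in the `sync_cursors` table. The sketch below shows one way checkpointed, resumable pagination can work against Discord's message history endpoint (`GET /channels/{channel.id}/messages`, which accepts `before`, `after` and a `limit` of up to 100). The `Cursor`, `CursorStore`, `FetchPage` and `syncChannel` names are hypothetical stand-ins, not Backscroll's schema or code. Discord IDs exceed JavaScript's safe-integer range, so they are compared as decimal strings.

```typescript
// Assumed semantics: `before` returns up to `limit` messages immediately older than the ID,
// `after` up to `limit` messages immediately newer. Response order is not relied on.
interface Message {
  id: string; // snowflake as a decimal string
  content: string;
}

// Hypothetical checkpoint, loosely modeled on a sync_cursors row.
interface Cursor {
  oldestId: string | null; // backfill position
  newestId: string | null; // incremental position
  backfillComplete: boolean;
}

interface CursorStore {
  get(channelId: string): Cursor | undefined;
  set(channelId: string, cursor: Cursor): void;
}

type FetchPage = (
  channelId: string,
  params: { before?: string; after?: string; limit: number },
) => Promise<Message[]>;

const PAGE_SIZE = 100; // Discord's maximum page size for this endpoint

// Compares two non-negative integer strings numerically.
function cmpSnowflake(a: string, b: string): number {
  if (a.length !== b.length) return a.length - b.length;
  return a < b ? -1 : a > b ? 1 : 0;
}

function sortOldestFirst(page: Message[]): Message[] {
  return [...page].sort((x, y) => cmpSnowflake(x.id, y.id));
}

async function syncChannel(
  channelId: string,
  fetchPage: FetchPage,
  store: CursorStore,
  save: (batch: Message[]) => void, // a real version would upsert into Postgres
): Promise<void> {
  // Work on a copy; progress only becomes durable through store.set().
  const cursor: Cursor = {
    ...(store.get(channelId) ?? { oldestId: null, newestId: null, backfillComplete: false }),
  };

  // Phase 1: backfill, walking backwards from the newest message or the saved checkpoint.
  while (!cursor.backfillComplete) {
    const page = sortOldestFirst(
      await fetchPage(channelId, { before: cursor.oldestId ?? undefined, limit: PAGE_SIZE }),
    );
    if (page.length > 0) {
      save(page);
      cursor.oldestId = page[0].id;
      const newest = page[page.length - 1].id;
      if (cursor.newestId === null || cmpSnowflake(newest, cursor.newestId) > 0) cursor.newestId = newest;
    }
    cursor.backfillComplete = page.length < PAGE_SIZE; // a short page means the start was reached
    store.set(channelId, { ...cursor }); // checkpoint after every page
  }

  // Phase 2: incremental, walking forwards from the newest stored message ("0" if none yet).
  for (;;) {
    const page = sortOldestFirst(
      await fetchPage(channelId, { after: cursor.newestId ?? "0", limit: PAGE_SIZE }),
    );
    if (page.length === 0) break;
    save(page);
    cursor.newestId = page[page.length - 1].id;
    store.set(channelId, { ...cursor });
    if (page.length < PAGE_SIZE) break;
  }
}
```

Because the checkpoint is written only after a page is saved, an interrupted run resumes from the last completed page. In a Postgres-backed version, writing each page and its checkpoint in the same transaction keeps the two from drifting apart.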

backscroll status

Show sync progress for all tracked channels.

backscroll status
backscroll status --guild 1234567890123456789
backscroll status --json

backscroll search <query>

Full-text search across ingested messages.

backscroll search "deployment checklist"
backscroll search "postgres schema" --channel 111111111111111111
backscroll search "launch" --author 999999999999999999 --after 2024-01-01
backscroll search "bug" --context 3   # show 3 messages before/after each match
backscroll search "release" --limit 50 --json

Flags:

  • --channel <id>: filter by channel (repeatable)
  • --author <id>: filter by author
  • --after <date>: only messages after this date (YYYY-MM-DD or ISO 8601)
  • --before <date>: only messages before this date
  • --context <n>: show N messages before and after each match
  • --limit <n>: maximum number of results (default: 25)
  • --json: JSON output
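These filters map naturally onto a single parameterized query against the generated `tsvector` column. The sketch below builds such a query; the column names (`search_vector`, `channel_id`, `author_id`, `created_at`), the `'english'` text search configuration and the use of `websearch_to_tsquery` are assumptions, not details confirmed by the Backscroll source, and `buildSearchQuery` is a hypothetical helper. The returned `{ text, values }` object has the shape node-postgres's `client.query()` accepts.

```typescript
interface SearchOptions {
  query: string;
  channelIds?: string[]; // --channel (repeatable)
  authorId?: string;     // --author
  after?: string;        // --after, YYYY-MM-DD or ISO 8601
  before?: string;       // --before
  limit?: number;        // --limit
}

// Normalizes a date flag to an ISO 8601 timestamp, rejecting unparseable input.
function parseDateFlag(value: string): string {
  const d = new Date(value);
  if (Number.isNaN(d.getTime())) throw new Error(`Invalid date: ${value}`);
  return d.toISOString();
}

function buildSearchQuery(opts: SearchOptions): { text: string; values: unknown[] } {
  const values: unknown[] = [opts.query];
  const where = ["m.search_vector @@ websearch_to_tsquery('english', $1)"];
  if (opts.channelIds && opts.channelIds.length > 0) {
    values.push(opts.channelIds);
    where.push(`m.channel_id = ANY($${values.length}::bigint[])`);
  }
  if (opts.authorId) {
    values.push(opts.authorId);
    where.push(`m.author_id = $${values.length}::bigint`);
  }
  if (opts.after) {
    values.push(parseDateFlag(opts.after));
    where.push(`m.created_at > $${values.length}`);
  }
  if (opts.before) {
    values.push(parseDateFlag(opts.before));
    where.push(`m.created_at < $${values.length}`);
  }
  values.push(opts.limit ?? 25);
  const text = [
    "SELECT m.id, m.channel_id, m.author_id, m.created_at, m.content,",
    "       ts_rank(m.search_vector, websearch_to_tsquery('english', $1)) AS rank",
    "FROM messages m",
    `WHERE ${where.join(" AND ")}`,
    "ORDER BY rank DESC, m.created_at DESC",
    `LIMIT $${values.length}`,
  ].join("\n");
  return { text, values };
}
```

Passing every user-supplied value as a bind parameter, rather than splicing it into the SQL string, keeps search input from being interpreted as SQL.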

backscroll export

Stream messages to a file or stdout as JSON or NDJSON.

# Export a channel to NDJSON (default)
backscroll export --channel 111111111111111111 --output messages.ndjson

# Export to JSON
backscroll export --channel 111111111111111111 --format json --output messages.json

# Export to stdout and pipe
backscroll export --channel 111111111111111111 | jq '.content'

# Export with filters
backscroll export --after 2024-01-01 --before 2024-06-01 --output h1-2024.ndjson

NDJSON output is streamed in batches -- memory-efficient for large exports.
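Batched streaming of this kind is typically done with keyset pagination (`WHERE id > $1 ORDER BY id LIMIT $2`), so each query picks up where the previous batch ended and only one batch is held in memory. The sketch below assumes that approach; `ExportedMessage`, `FetchBatch` and `ndjsonBatches` are hypothetical names, and the field list is illustrative.

```typescript
interface ExportedMessage {
  id: string;
  channel_id: string;
  author_id: string;
  created_at: string;
  content: string;
}

// Hypothetical loader: up to `limit` messages with id > afterId, ordered by id ascending.
type FetchBatch = (afterId: string | null, limit: number) => Promise<ExportedMessage[]>;

// Yields NDJSON text one batch at a time; only `batchSize` rows are in memory at once.
async function* ndjsonBatches(fetchBatch: FetchBatch, batchSize = 1000): AsyncGenerator<string> {
  let afterId: string | null = null;
  for (;;) {
    const rows = await fetchBatch(afterId, batchSize);
    if (rows.length === 0) return;
    // One JSON object per line, each terminated by "\n".
    yield rows.map((row) => JSON.stringify(row) + "\n").join("");
    afterId = rows[rows.length - 1].id;
    if (rows.length < batchSize) return; // short batch: nothing left
  }
}
```

Each yielded chunk can be written straight to a file stream or to standard output, which is what lets a downstream tool such as `jq` start processing before the export finishes.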

backscroll stats

Basic message counts and analytics.

backscroll stats
backscroll stats --guild 1234567890123456789
backscroll stats --json

Schema

Backscroll uses a Postgres schema designed for queryability and correctness.

Table             Description
guilds            Discord server metadata
channels          Channel metadata, linked to guilds
users             Discord user/author data
messages          Core message table with full-text search vector
attachments       File attachments linked to messages
message_mentions  User mentions within messages
sync_cursors      Per-channel sync state and checkpoints

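The messages table stores its full-text search vector in a generated tsvector column (GENERATED ALWAYS AS ... STORED) with a GIN index, so Postgres keeps the vector current on every insert and update without triggers or application code. Stored generated columns require Postgres 12 or newer.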
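For illustration, a migration for such a column might look like the following, written as a TypeScript string constant the way migration files often are. The column names, the `'english'` configuration and the index name are assumptions; Backscroll's actual migration may differ. The two-argument form of `to_tsvector` is used because generated columns require an immutable expression.

```typescript
// Hypothetical migration for the messages table; names are illustrative, not Backscroll's.
const createMessagesTable = `
CREATE TABLE IF NOT EXISTS messages (
  id            BIGINT PRIMARY KEY,   -- Discord snowflake
  channel_id    BIGINT NOT NULL,
  author_id     BIGINT NOT NULL,
  content       TEXT NOT NULL DEFAULT '',
  created_at    TIMESTAMPTZ NOT NULL,
  -- Recomputed by Postgres on every INSERT/UPDATE (stored generated column, Postgres 12+).
  search_vector TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX IF NOT EXISTS messages_search_vector_idx
  ON messages USING GIN (search_vector);
`;
```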


Discord bot setup

  1. Go to discord.com/developers/applications
  2. Create a new application
  3. Go to Bot → enable the bot
  4. Under Privileged Gateway Intents, enable Message Content Intent
  5. Copy the bot token into your .env
  6. Go to OAuth2 → URL Generator:
    • Scopes: bot
    • Bot permissions: Read Messages/View Channels, Read Message History
  7. Use the generated URL to invite the bot to your server
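The generated URL encodes the selected permissions as an integer bitfield. For reference, an equivalent URL can be built by hand as in the sketch below; `buildInviteUrl` is a hypothetical helper, and the client ID is the application ID shown in the Developer Portal. The Message Content Intent from step 4 is a portal setting and is not part of the URL.

```typescript
// Discord permission bits for the two permissions listed in step 6.
const VIEW_CHANNEL = 1 << 10;         // 1024
const READ_MESSAGE_HISTORY = 1 << 16; // 65536

// Builds a bot invite URL with scope "bot" and the permissions above combined into one bitfield.
function buildInviteUrl(clientId: string): string {
  const query = [
    `client_id=${encodeURIComponent(clientId)}`,
    "scope=bot",
    `permissions=${VIEW_CHANNEL | READ_MESSAGE_HISTORY}`,
  ].join("&");
  return `https://discord.com/oauth2/authorize?${query}`;
}
```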

Self-hosting

Backscroll is designed to run anywhere Node.js and Postgres are available. Apart from Postgres and the Discord API itself, it has no external service dependencies.

For a minimal production setup:

  1. Provision a Postgres database (local, Fly.io, Railway, Supabase, etc.)
  2. Set DATABASE_URL and DISCORD_BOT_TOKEN as environment variables
  3. Run backscroll init to apply the schema
  4. Run backscroll sync --guild <id> to start ingesting

For scheduled incremental sync, add a cron job or systemd timer that runs:

backscroll sync

Roadmap

Phase 2

  • HTTP/REST API for programmatic access
  • pg_trgm fuzzy/partial matching
  • Gateway connection for real-time message capture
  • Edit/delete tracking
  • Docker Compose for one-command self-hosting

Phase 3

  • pgvector semantic search
  • Multi-format export (CSV, Parquet)
  • Thread ingestion
  • Scheduled sync with built-in cron

License

MIT

About

Postgres-backed Discord ingestion and retrieval stack for search, analysis, and AI workflows.
