feat(analytics): analytics subcommand group (digest shim, quiet, trends)#47

Open
mvanhorn wants to merge 4 commits into steipete:main from mvanhorn:feat/analytics-subcommand-group

@mvanhorn
Simulated Demo

Summary

Introduces discrawl analytics as a namespace for activity-style queries. Three subcommands ship:

  • analytics digest - delegates to the existing digest implementation, so discrawl digest is unchanged
  • analytics quiet - channels with no activity in the lookback window (archive candidates), default --since 30d
  • analytics trends - week-bucketed message counts per channel, zero-filled across the window, default --weeks 8

Ports vincentkoc/slacrawl#13 (merged 2026-04-23). Same SQL recipes adapted to discrawl's Discord schema.
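
To make the "week-bucketed, zero-filled" behavior of analytics trends concrete, here is a minimal in-memory sketch. The PR computes this in SQL; the helper names (weekBuckets, mustDate, dates) are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// weekBuckets counts timestamps per 7-day bucket over a window of `weeks`
// weeks ending at `end`. Index 0 is the oldest week. Buckets with no
// messages keep their zero value, which is the "zero-filled" part.
// Hypothetical helper, not the PR's implementation.
func weekBuckets(timestamps []time.Time, end time.Time, weeks int) []int {
	if weeks <= 0 {
		return nil
	}
	const week = 7 * 24 * time.Hour
	counts := make([]int, weeks)
	start := end.Add(-time.Duration(weeks) * week)
	for _, ts := range timestamps {
		// Keep only timestamps inside the half-open window [start, end).
		if ts.Before(start) || !ts.Before(end) {
			continue
		}
		counts[int(ts.Sub(start)/week)]++
	}
	return counts
}

// mustDate parses a YYYY-MM-DD date in UTC and panics on bad input.
func mustDate(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

// dates converts several YYYY-MM-DD strings at once.
func dates(ss ...string) []time.Time {
	out := make([]time.Time, 0, len(ss))
	for _, s := range ss {
		out = append(out, mustDate(s))
	}
	return out
}

func main() {
	counts := weekBuckets(dates("2026-03-29", "2026-04-23", "2026-04-24"), mustDate("2026-04-25"), 4)
	fmt.Println(counts) // [1 0 0 2]
}
```

Weeks 2 and 3 stay at zero rather than being dropped, so every channel row has the same number of columns.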

Stacking

This PR is stacked on top of #46 (feat(cli): digest command). Until #46 merges, the diff here includes both sets of changes. Once #46 lands, this PR narrows to just the analytics work. I can wait for the #46 review before this one moves, or combine both into a single PR if that is preferred.

Why this matters

  • Issue #15, "Using discrawl as memory augmentation for AI agents" (@codexGW's writeup), describes building three Discord bots on top of discrawl, with custom Python to answer two questions that discrawl currently has no built-in answer for: which channels are silent, and how volume is changing week over week. quiet and trends answer them.
  • discrawl already has the schema, indexes (idx_messages_guild_created_id), and the internal/report/ package needed. The SQL is one query each; no schema changes.
  • Discord's own UI does not show silent-channel discovery or week-over-week volume per channel. The local archive is in a strictly better position to answer both.
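
As an illustration of the "which channels are silent" question, the sketch below applies the quiet rule to in-memory data. It is not the PR's SQL; the channelActivity type and the silentDays and date helpers are hypothetical, and using -1 for never-active channels is an assumption made for this example.

```go
package main

import (
	"fmt"
	"time"
)

// channelActivity is a hypothetical stand-in for a channel joined with the
// timestamp of its most recent message.
type channelActivity struct {
	Name string
	Last time.Time // zero value means no synced messages
}

// silentDays reports whether a channel had no activity inside the lookback
// window and, if so, for how many whole days it has been silent.
// Never-active channels are quiet and report -1 (an assumption of this sketch).
func silentDays(c channelActivity, now time.Time, lookbackDays int) (days int, quiet bool) {
	if c.Last.IsZero() {
		return -1, true
	}
	cutoff := now.AddDate(0, 0, -lookbackDays)
	if !c.Last.Before(cutoff) {
		return 0, false // active inside the window
	}
	return int(now.Sub(c.Last).Hours() / 24), true
}

// date parses a YYYY-MM-DD date in UTC and panics on bad input.
func date(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := date("2026-04-25")
	channels := []channelActivity{
		{Name: "general", Last: date("2026-04-24")},
		{Name: "releases", Last: date("2026-03-10")},
		{Name: "bot-test"},
	}
	for _, c := range channels {
		if d, quiet := silentDays(c, now, 30); quiet {
			fmt.Println(c.Name, d)
		}
	}
}
```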

Output

$ discrawl analytics quiet --since 30d

Quiet channels (no activity in the last 30d)

  bot-test       (text)        last: never       silent: -
  releases       (text)        last: 2026-03-10  silent: 46d
  off-topic      (text)        last: 2026-03-22  silent: 34d

Totals: 3 channels

$ discrawl analytics trends --weeks 4

Channel       Wk1   Wk2   Wk3   Wk4
general        12    18    24    31
incidents       3     1     -     7
releases        -     -     -     2
off-topic       8     5     2     -

Window: 2026-03-28 to 2026-04-25

$ discrawl analytics
Usage: discrawl analytics <subcommand> [flags]

Subcommands:
  digest  Per-channel activity summary for a window.
  quiet   Channels with no activity in the lookback window.
  trends  Week-over-week message counts per channel.
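
The subcommand routing shown above could look roughly like the following. This is a sketch under assumptions: the handler signature, the errShowUsage sentinel, and the stub handlers are hypothetical, and the real code delegates analytics digest to runDigest, whose signature is not shown in this PR description.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// handler is a hypothetical signature for a subcommand entry point.
type handler func(args []string) error

// errShowUsage signals that the caller should print the usage text.
var errShowUsage = errors.New("show usage")

const analyticsUsage = `Usage: discrawl analytics <subcommand> [flags]

Subcommands:
  digest  Per-channel activity summary for a window.
  quiet   Channels with no activity in the lookback window.
  trends  Week-over-week message counts per channel.`

// runAnalytics routes the first argument to its handler and passes the
// remaining arguments through unchanged.
func runAnalytics(args []string, handlers map[string]handler) error {
	if len(args) == 0 {
		return errShowUsage
	}
	h, ok := handlers[args[0]]
	if !ok {
		return fmt.Errorf("unknown analytics subcommand %q", args[0])
	}
	return h(args[1:])
}

func main() {
	// Stubs stand in for the real implementations.
	stub := func(name string) handler {
		return func(args []string) error {
			fmt.Println("would run", name, "with", args)
			return nil
		}
	}
	handlers := map[string]handler{
		"digest": stub("digest"), // the PR shims this to the existing digest code
		"quiet":  stub("quiet"),
		"trends": stub("trends"),
	}
	err := runAnalytics(os.Args[1:], handlers)
	if errors.Is(err, errShowUsage) {
		fmt.Println(analyticsUsage)
		return
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping the digest entry as a plain map value is what makes the shim cheap: both discrawl digest and discrawl analytics digest can point at the same function.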

Flags

  • analytics quiet: --since (default 30d), --guild. Inherits --json and --plain from the root CLI.
  • analytics trends: --weeks (default 8), --guild, --channel. Same root-CLI inheritance.
  • analytics digest: same flags as digest (delegates to runDigest, identical behavior).
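
Values such as 30d need custom parsing because Go's time.ParseDuration has no day unit. A minimal sketch of one way to handle that follows; parseLookback is a hypothetical name, and falling back to time.ParseDuration for inputs like 12h is an assumption, not confirmed behavior of discrawl.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseLookback accepts a whole number of days with a "d" suffix ("30d").
// Anything else is handed to time.ParseDuration (assumed fallback).
func parseLookback(s string) (time.Duration, error) {
	if strings.HasSuffix(s, "d") {
		n, err := strconv.Atoi(strings.TrimSuffix(s, "d"))
		if err != nil || n < 0 {
			return 0, fmt.Errorf("invalid lookback %q", s)
		}
		return time.Duration(n) * 24 * time.Hour, nil
	}
	return time.ParseDuration(s)
}

func main() {
	for _, in := range []string{"30d", "12h", "soon"} {
		d, err := parseLookback(in)
		if err != nil {
			fmt.Println(in, "error:", err)
			continue
		}
		fmt.Println(in, "=", d)
	}
}
```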

Scope

  • No schema changes. Uses existing messages / channels tables.
  • No new dependencies.
  • Both quiet and trends filter to message-bearing channel kinds (text, announcement, thread_public, thread_private, thread_announcement) so category and voice channels do not appear as never-active or all-zero rows. Mirrors the syncer's messageChannelKinds() predicate.
  • New: internal/report/quiet.go, internal/report/quiet_test.go, internal/report/trends.go, internal/report/trends_test.go, internal/cli/analytics.go, internal/cli/analytics_test.go.
  • Touched: internal/cli/cli.go (one switch case), internal/cli/output.go (printPlain + printHuman cases for Quiet and Trends, plus analytics in usage), README.md, SPEC.md.
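
For reference, the kind filter described above can be expressed as a parameterized IN clause. The sketch below only builds the SQL fragment; the kindFilter helper, the c.kind column alias, and the sample query are hypothetical, while the list of kinds is taken from this description.

```go
package main

import (
	"fmt"
	"strings"
)

// messageKinds lists the message-bearing channel kinds named in this PR.
var messageKinds = []string{
	"text",
	"announcement",
	"thread_public",
	"thread_private",
	"thread_announcement",
}

// kindFilter returns a fragment such as "c.kind IN (?,?,?,?,?)" together
// with the matching arguments. An empty list yields a clause that matches
// nothing, so the surrounding query stays valid SQL.
func kindFilter(column string, kinds []string) (string, []any) {
	if len(kinds) == 0 {
		return "1=0", nil
	}
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(kinds)), ",")
	args := make([]any, len(kinds))
	for i, k := range kinds {
		args[i] = k
	}
	return fmt.Sprintf("%s IN (%s)", column, placeholders), args
}

func main() {
	clause, args := kindFilter("c.kind", messageKinds)
	// Hypothetical query shape; table and column names are assumptions.
	fmt.Println("SELECT c.name FROM channels c WHERE " + clause)
	fmt.Println(len(args), "args")
}
```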

Test plan

  • gofmt -l . clean
  • go vet ./... clean
  • go build ./cmd/discrawl
  • go test ./... passes (11 packages, all green)

Self-reviewed via codex review before pushing. The first review pass flagged that quiet and trends would surface category and voice channels as never-active, which would be misleading. Fixed in a follow-up commit on this branch by adding the kind filter described in Scope.

Open questions

  1. Namespace name: analytics vs insights vs stats. Slacrawl went with analytics; happy to rename.
  2. internal/report/ vs new internal/analytics/ package: kept everything in internal/report/ alongside report.go and digest.go to match the current organization. If you would prefer to carve out an internal/analytics/ package, easy follow-up PR.
  3. Default --since for quiet: went with 30d (slacrawl's default). For Discord's larger guilds with many low-activity channels, 60d or 90d may be more useful. Happy to change.
  4. quiet order: never-active channels (silent: -) first, then alphabetically by name. The slacrawl version sorts by last message ascending, which puts the oldest first; this implementation puts never-active channels first since they are the strongest archive candidates. Happy to flip if you prefer slacrawl's ordering.
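
To make question 4 easier to evaluate, here is a small sketch of the ordering as described there: never-active channels first, then by name. The quietRow type and sortQuiet function are hypothetical names for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// quietRow is a hypothetical, trimmed-down result row.
type quietRow struct {
	Name        string
	NeverActive bool // true when the channel has no synced messages
}

// sortQuiet orders never-active channels first, then sorts by name.
func sortQuiet(rows []quietRow) {
	sort.SliceStable(rows, func(i, j int) bool {
		if rows[i].NeverActive != rows[j].NeverActive {
			return rows[i].NeverActive
		}
		return rows[i].Name < rows[j].Name
	})
}

func main() {
	rows := []quietRow{
		{Name: "releases"},
		{Name: "off-topic"},
		{Name: "bot-test", NeverActive: true},
	}
	sortQuiet(rows)
	for _, r := range rows {
		fmt.Println(r.Name)
	}
}
```

Switching to slacrawl's ordering would mean replacing the name comparison with a comparison on the last-message timestamp.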

Subsequent phases (not in this PR)

Slacrawl's plan doc had health and response-times as phase 3, then threads-stale and activity as phase 4. Same idea here; happy to follow up if these land well.

This contribution was developed with AI assistance.

Adds discrawl digest, a per-channel activity summary over a time window.
discrawl already has report for repo-wide README dumps and messages /
search for retrieval; digest answers what happened in this guild over
the last N days, per channel.

Ports vincentkoc/slacrawl#9 (merged 2026-04-22). Same SQL recipe,
adapted to discrawl's Discord schema (guild_id, members, mention_events)
and the existing stdlib-flag CLI dispatch.

This contribution was developed with AI assistance.

RankedCount is reused by digest's top_posters/top_mentions slices.
Without JSON tags, those nested entries serialized as {Name, Count}
while the rest of the digest schema uses snake_case ({channel_id,
messages, ...}). Tag the fields so --json output is consistent.

Surfaced by codex review on the previous commit.
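
A minimal sketch of the tagging fix described in this commit: the field names Name and Count come from the message above, while the exact tag strings and the int type for Count are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RankedCount mirrors the struct named in the commit message. Without the
// tags, encoding/json would emit the Go field names ("Name", "Count").
type RankedCount struct {
	Name  string `json:"name"`
	Count int    `json:"count"` // int is an assumption of this sketch
}

// encode marshals a single entry to its JSON text.
func encode(rc RankedCount) (string, error) {
	b, err := json.Marshal(rc)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(RankedCount{Name: "alice", Count: 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"name":"alice","count":3}
}
```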
Introduces discrawl analytics as a namespace for activity-style queries.
Three subcommands ship:
- analytics digest: delegates to the existing digest implementation, so
  discrawl digest is unchanged
- analytics quiet: channels with no activity in the lookback window
  (archive candidates), default --since 30d
- analytics trends: week-bucketed message counts per channel,
  zero-filled across the window, default --weeks 8

Ports vincentkoc/slacrawl#13 (merged 2026-04-23). Same SQL recipes
adapted to discrawl's Discord schema. Stacked on top of the digest PR
so analytics digest can shim to runDigest and share the implementation.

This contribution was developed with AI assistance.

Both quiet and trends queries left-joined the full channels table,
which includes category and voice channels. Those rows can never
have synced messages, so quiet surfaced them as never-active archive
candidates and trends emitted all-zero rows for them. Filter to the
message-bearing kinds the syncer ingests:
- text
- announcement
- thread_public
- thread_private
- thread_announcement

Forum parents are excluded since the syncer's messageChannelKinds()
also excludes them. Forum threads (kind='thread_public') are still
included.

Surfaced by codex review on the previous commit.