feat(crossposting): source-quality ranker + diversity cap (Phase 2) #209

Merged
ohld merged 1 commit into production from chore/crossposting-source-quality-spec
Apr 28, 2026

ohld (Member) commented Apr 28, 2026

Summary

Phase 2 iteration on the channel crossposting ranker — published spec (specs/completed/crossposting-source-quality.md) plus full implementation, tests, and review.

Behavior changes:

  • Per-source quality multiplier (0.5×–2.0×) using AVG(forwards × √(views/100)) over mature 30-day crossposting history per meme_source_id
  • Self-calibrating median normalization (PERCENTILE_CONT(0.5)) — set-and-forget, no hardcoded thresholds
  • Diversity cap: ≤1 post per source per 24h on each channel, filtered to confirmed posts (telegram_message_id IS NOT NULL)
  • In-bot share boost: (1 + min(invited_count, 10) × 0.1) — caps at ×2.0 for memes with 10+ user shares
  • Image-only filter in source-quality CTE (excludes April video-era confound)
  • Bot-engagement quality floor (nlikes >= 5) preserved per channel premise: "best of bot, not newest of source"
  • score_version=2 tag on new posts → enables clean v1/v2 comparison
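
The multiplier and share-boost arithmetic above can be sketched in plain Python. This is an illustration only, not the SQL that ships in the ranker: the function names, the `history` shape, and the choice to clamp the ratio-to-median into [0.5, 2.0] are assumptions, as is applying the 5-post minimum per source.

```python
import math
from statistics import mean, median

# Assumption: a source needs at least this many mature image posts to qualify.
MIN_MATURE_POSTS = 5


def post_signal(forwards: int, views: int) -> float:
    """Per-post signal: forwards x sqrt(views / 100)."""
    return forwards * math.sqrt(views / 100)


def source_multipliers(history: dict[int, list[tuple[int, int]]]) -> dict[int, float]:
    """Map meme_source_id -> multiplier in [0.5, 2.0].

    `history` (hypothetical shape) maps meme_source_id to a list of
    (forwards, views) pairs for mature image posts in the last 30 days.
    An empty result means "no data"; callers then use a neutral 1.0.
    """
    avg_signal = {
        source_id: mean(post_signal(f, v) for f, v in posts)
        for source_id, posts in history.items()
        if len(posts) >= MIN_MATURE_POSTS
    }
    if not avg_signal:
        return {}
    # statistics.median interpolates between the two middle values,
    # matching PERCENTILE_CONT(0.5) for an even number of sources.
    med = median(avg_signal.values())
    if med <= 0:
        return {}
    # Assumed normalization: ratio to the median, clamped to the stated range.
    return {sid: min(2.0, max(0.5, s / med)) for sid, s in avg_signal.items()}


def share_boost(invited_count: int) -> float:
    """In-bot share boost: 1 + min(invited_count, 10) x 0.1, capped at 2.0."""
    return 1 + min(invited_count, 10) * 0.1
```

Because the multiplier is a ratio to the median source, it rescales itself as channel traffic grows or shrinks, which is what makes fixed thresholds unnecessary.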

Production correctness fixes (from codex review iters):

  • src/flows/rewards/uploaded_memes.py: moved log_meme_sent to after bot.send_media_group so reward posts now record real telegram_message_id and properly participate in the diversity cap
  • Failed pre-send rows no longer poison the cap (telegram_message_id IS NOT NULL filter)
  • log_meme_sent keeps ON CONFLICT DO NOTHING semantics — preserves the original mature crossposting sample when reward flows repost an already-crossposted meme
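
To make the interaction between the diversity cap and the `telegram_message_id IS NOT NULL` filter concrete, here is a minimal in-memory sketch. The real check lives in the ranker SQL; the `ChannelPost` dataclass, its `sent_at` field, and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ChannelPost:
    meme_source_id: int
    sent_at: datetime                   # hypothetical column name
    telegram_message_id: Optional[int]  # None when the send was never confirmed


def blocked_sources(
    posts: list[ChannelPost],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> set[int]:
    """Sources with a confirmed post on this channel inside the window."""
    cutoff = now - window
    return {
        p.meme_source_id
        for p in posts
        # Unconfirmed rows (failed pre-send) must not count against the cap.
        if p.telegram_message_id is not None and p.sent_at >= cutoff
    }


def apply_diversity_cap(
    candidates: list[tuple[int, int]],
    posts: list[ChannelPost],
    now: datetime,
) -> list[tuple[int, int]]:
    """Drop (meme_id, meme_source_id) candidates whose source is capped."""
    blocked = blocked_sources(posts, now)
    return [(meme_id, src) for meme_id, src in candidates if src not in blocked]
```

With a confirmed post from source 1 three hours ago and a failed, unconfirmed row from source 2 two hours ago, only source 1 is blocked. Logging before the send would have recorded source 2 as posted, which is the failure mode that moving `log_meme_sent` after `bot.send_media_group` avoids.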

Graceful degradation:

  • Ranker returns None when filters exhaust the pool → flow logs warning and skips slot (no Sentry alert)
  • Cold-start safety: if no source has 5+ mature image posts, source-quality multiplier falls through to neutral 1.0
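
Both degradation paths reduce to "fall back to a harmless default instead of raising". A rough sketch, assuming a ranker callable that returns a meme id or `None`; the names `run_slot`, `pick_meme`, `post`, and `source_quality` are placeholders and do not come from the codebase.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("crossposting")


def source_quality(multipliers: dict[int, float], source_id: int) -> float:
    """Neutral 1.0 when a source has no multiplier (e.g. cold start)."""
    return multipliers.get(source_id, 1.0)


def run_slot(
    pick_meme: Callable[[], Optional[int]],
    post: Callable[[int], None],
    channel: str,
) -> bool:
    """Post one meme for this slot; return False if the slot was skipped."""
    meme_id = pick_meme()
    if meme_id is None:
        # An exhausted pool is an expected state: warn and skip, do not raise.
        logger.warning("no eligible meme for %s, skipping slot", channel)
        return False
    post(meme_id)
    return True
```

Returning a boolean rather than raising keeps an empty pool out of error tracking while still leaving a trace in the flow logs.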

Verification

  • ruff check src/ tests/ — clean
  • ruff format src/ tests/ — clean
  • pytest tests/test_crossposting_meme.py: 13/13 pass (4 new ranker tests + 9 existing)
  • All SQL CTEs verified on prod data:
    • RU: 7 qualifying sources, median signal 33.35, 5 sources blocked by cap, top-5 candidates valid
    • EN: 4 qualifying sources, median 1.67, 4 blocked, top-5 valid
  • Prod health pre-merge: 0 app errors / 0 Prefect failures / 0 Sentry crossposting issues

Success measurement

Baseline to compare against once this lands, before any further changes:

  • RU: 16.1 avg forwards/post (n=73 mature, last 14d)
  • EN: 2.7 avg forwards/post (n=61)

Re-read at +14d post-ship. Pass = v2 ≥ 1.15× v1 forwards on at least one channel + no >10% drop in views. Query in spec.
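
The pass criterion can be written as a small check. The authoritative measurement is the query in the spec; this sketch only encodes the decision rule, and it assumes the views guard applies to every channel. `ChannelStats` and `passes` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ChannelStats:
    v1_forwards: float  # avg forwards/post under score_version=1
    v2_forwards: float  # avg forwards/post under score_version=2
    v1_views: float
    v2_views: float


def passes(channels: dict[str, ChannelStats]) -> bool:
    """Pass = v2 >= 1.15x v1 forwards on at least one channel, no >10% views drop."""
    forwards_win = any(
        c.v2_forwards >= 1.15 * c.v1_forwards for c in channels.values()
    )
    # Assumption: a views drop above 10% on any channel fails the check.
    views_ok = all(c.v2_views >= 0.90 * c.v1_views for c in channels.values())
    return forwards_win and views_ok
```

Against the RU baseline of 16.1 forwards/post, v2 needs at least 18.515 to clear the 1.15× bar; against the EN baseline of 2.7, it needs at least 3.105.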

Review trail

  • /plan-ceo-review — selective expansion mode, scope/strategy
  • /plan-eng-review — architecture, tests, performance
  • Codex outside-voice (8 findings, 5 accepted, 1 rejected on user product principle)
  • ralphex — 4 implementation tasks + 9 codex iterations + 2 claude review rounds
  • Final codex iter-9: NO ISSUES FOUND

Test plan

  • All 13 unit + integration tests pass (host-mode pytest)
  • All ranker SQL CTEs return non-empty results on prod data
  • No regressions in pre-merge prod health
  • Post-merge: trigger Post to TG Channel RU flow once, verify a meme posts cleanly
  • Post-merge: trigger Post to TG Channel EN flow once, verify a meme posts cleanly
  • Post-merge: monitor Sentry + prod logs for 30min after first successful posts

Deferred (TODOS)

  • VK measurement layer
  • Channel-share deep-link signal (sc_%) materialized into meme_stats
  • Re-test videos as soft penalty post-baseline (after 2026-05-12)
  • Drop dead caption IS NULL multiplier

- specs/crossposting-source-quality.md: Phase 2 ranker iteration spec
  (CEO + Eng + Codex outside-voice reviewed). Adds per-source quality
  multiplier from crossposting forwards/views, diversity cap (≤1
  post/source/24h), invited_count signal boost, image-only filter.
  score_version=2. Spec is implementation input for ralphex.
- .gitignore: add .worktrees/ — per-developer git worktrees should
  never be committed (4 active worktrees were dirtying status).

ohld (Member, Author) commented Apr 28, 2026

STAFF ENGINEER REVIEW: APPROVED — pure docs + gitignore PR. Spec doc + .worktrees/ ignore. /review + /codex review both clean. /cso skipped (no sensitive surfaces touched).

One non-blocking note: the spec has two duplicated 'final SQL' blocks with different nlikes thresholds (>= 5 vs >= 2) — confusion risk for the implementation PR (ralphex), not a merge blocker.

CI: lint pass, test pending — --auto will gate the squash-merge until test passes.

ohld merged commit 533b06d into production Apr 28, 2026 (3 checks passed)

ohld changed the title from "chore(crossposting): add source-quality spec + gitignore .worktrees" to "feat(crossposting): source-quality ranker + diversity cap (Phase 2)" Apr 28, 2026