feat(start): document onboarding contract + ghost-user monitor (FFM-907) by ohld · Pull Request #224 · ffmemes/ff-backend

ohld · 2026-05-02T21:36:31Z

Summary

Pattern-level fix for FFM-907 — Sega's kitchen-branch leak (PR #222) was a symptom; this PR makes the same regression impossible to silently re-introduce and adds an alarm so we never depend on a friend complaining again.

Audit

handle_start does seven side effects every new user MUST get. After PR #222 they're all hoisted above the deep_link ladder:

#	Side effect	Universal?
1	`user_tg` upsert	✅ above ladder (`save_user_data`)
2	`user` upsert	✅ above ladder
3	`deep_link` saved on `user_tg`	✅ above ladder
4	`user_info` cache populated	✅ above ladder
5	`user_deep_link_log` row	✅ above ladder
6	Admin chat notification	✅ above ladder
7	`user_language` rows seeded	✅ above ladder (PR #222)

Per-branch behaviour (handle_language_settings, handle_invited_user, next_message, etc.) is intentional and stays inside the ladder.

What's in this PR

Contributor comment at the top of handle_start spelling out the rule: hoist universal side effects above the ladder; never bury them inside per-branch blocks. Cites the FFM-907 incident so future agents/humans understand the why.
src/flows/monitors/ghost_users.py — Prefect flow, runs every minute. Counts users created 1–5 min ago without a user_meme_reaction row (= WARN) and >5 min ago (= ERROR), posts a single summary to the admin chat with sample IDs. Filters out blocked-acquisition deep_links (silent drop is by design — they don't even create a user row, but the SQL is defensive).
tests/tgbot/test_start.py — regression coverage per deep_link variant: bare /start, kitchen, wrapped, giveaway_77, s_*_* share-link, likefollowbot blocked, plus existing-user path and idempotency of the lazy lang init. Each asserts the row-level post-conditions in user_tg + user + user_language + user_deep_link_log.

Why a Prefect flow vs in-process timer

Per-request scheduling would burn an event-loop slot per /start and lose state on bot restart. A 1-min Prefect cron is one query against user + user_meme_reaction (both already covered by indexes on created_at / user_id), reuses the existing failure-alert path, and is easy to pause/inspect.

Test plan

CI: tests/tgbot/test_start.py passes (8 cases)
After deploy: Prefect UI shows Monitor ghost users running every minute
Manually trigger by creating a user row with no reactions — confirm WARN appears in admin chat within 60s
No false positives on a normal day (most ghosts come from the language-picker drop-off; if rate is too noisy, raise the WARN threshold)
Confirm user.created_at is real wall-clock (not now() skew across replicas)

🤖 Generated with Claude Code

…s (FFM-907) PR #222 plugged the kitchen-branch leak (Sega's case). This audit ensures no future deep_link branch can re-introduce the same drift, and adds a runtime alarm so we don't have to wait for a friend to complain. - Contributor-facing comment in handle_start documenting the rule: every universal onboarding side effect lives ABOVE the deep_link ladder. New side effects must hoist, not bury inside per-branch blocks. - src/flows/monitors/ghost_users.py: Prefect flow runs every minute, counts new users (1-5min ago = WARN, >5min ago = ERROR) with no user_meme_reaction row, posts a single summary to admin chat. Filters out blocked-acquisition deep_links (intentional silent drop). - tests/tgbot/test_start.py: regression coverage for each known deep_link variant (none / kitchen / wrapped / giveaway_77 / s_*_* / blocked-acquisition / existing user). Asserts user_tg + user + user_language + user_deep_link_log rows after every created=True path, plus idempotency of the lazy lang init. Co-Authored-By: Paperclip <noreply@paperclip.ing>

ohld · 2026-05-05T09:42:09Z

STAFF ENGINEER REVIEW: APPROVED — Structural review clean. SQL safety verified: _build_exempt_clause interpolates Python constants only (BLOCKED_ACQUISITION_CHANNELS = frozenset({'likefollowbot'}), prefix tuple ('tapps',)), no user input. Tests cover every deep_link branch with row-level invariants. Documentation comment in handle_start correctly captures the FFM-907 contract.

Codex adversarial findings (non-blocking, suggested follow-ups)

Alert fatigue from language-picker drop-off — monitor will fire WARN/ERROR every minute even on healthy days because users who never finish onboarding count as ghosts. PR description already acknowledges this; raise the WARN threshold or add a per-WARN cooldown if signal-to-noise is bad post-deploy.
No Prefect concurrency guard — with * * * * * cron + 60s timeout, a hung run can overlap the next scheduled run. Worst case is duplicate admin alerts (read-only query, no locks). Consider prefect.deployments.concurrency_limit=1 on the deployment.
Monitor SQL untested — tests/tgbot/test_start.py regression-tests handle_start (correct scope for FFM-907) but the ghost-detection SQL itself has no integration test. Add a small tests/flows/test_ghost_users.py later: insert 4 fixture users (warn / error / has-reaction / blocked deep_link), assert counts.
Unbounded ARRAY_AGG — theoretical at current FF-memes scale; not worth fixing unless ghost rate spikes massively.

None of these are SQL injection, race conditions on writes, or secrets. Auto-merge queued.

ohld merged commit c03fa31 into production May 5, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(start): document onboarding contract + ghost-user monitor (FFM-907)#224

feat(start): document onboarding contract + ghost-user monitor (FFM-907)#224
ohld merged 1 commit into
productionfrom
fix/ffm-907-start-onboarding-audit

ohld commented May 2, 2026

Uh oh!

ohld commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ohld commented May 2, 2026

Summary

Audit

What's in this PR

Why a Prefect flow vs in-process timer

Test plan

Uh oh!

ohld commented May 5, 2026

Codex adversarial findings (non-blocking, suggested follow-ups)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant