fix(start): lazy-init languages on /start to unblock cold-start orphans #222

Merged: ohld merged 1 commit into production from fix/start-lazy-init-languages on May 2, 2026
@ohld ohld commented May 2, 2026

Summary

User #370728472 (Sega, RU) ran /start and instantly got "мемы кончились" ("the memes have run out"). The DB showed no user_language rows for him. Recommendations filter on user_language, so the query returned no candidates and cold start fell through to the "empty feed" message.

He landed there because he registered on 2026-04-01 via ?start=kitchen. The kitchen branch in handle_start returns before any call to init_user_languages_from_tg_user:

if deep_link == "kitchen":
    return await handle_show_kitchen(update, context)

The wrapped branch and the main if created: branch both call the init; the kitchen branch silently skipped it.

Blast radius

Already-active orphans backfilled manually on prod:

  • 6 users created in last 30 days (4 via kitchen deep_link, 2 via None)
  • Sega (#370728472) included
  • Set user_language from user_tg.language_code (5 → ru, 1 → en)
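
A minimal sketch of what such a backfill could look like, using an in-memory SQLite database as a stand-in. The table names user_tg and user_language and the column user_tg.language_code come from this PR; every other column name, the sample rows, and the helper backfill_orphans are assumptions for illustration, not the commands actually run on prod.

```python
import sqlite3

# In-memory stand-in for the production database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_tg (id INTEGER PRIMARY KEY, language_code TEXT);
    CREATE TABLE user_language (
        user_id INTEGER NOT NULL,
        language_code TEXT NOT NULL,
        PRIMARY KEY (user_id, language_code)
    );
    INSERT INTO user_tg VALUES (1, 'ru'), (2, 'en'), (3, 'ru');
    INSERT INTO user_language VALUES (3, 'ru');  -- user 3 is not an orphan
    """
)


def backfill_orphans(conn):
    """Give every user without language rows one row from user_tg.language_code."""
    orphans = conn.execute(
        "SELECT t.id, t.language_code FROM user_tg t "
        "LEFT JOIN user_language l ON l.user_id = t.id "
        "WHERE l.user_id IS NULL AND t.language_code IS NOT NULL "
        "ORDER BY t.id"
    ).fetchall()
    conn.executemany(
        "INSERT INTO user_language (user_id, language_code) VALUES (?, ?)",
        orphans,
    )
    conn.commit()
    return orphans


fixed = backfill_orphans(conn)
print(fixed)
```

Running the helper a second time finds no orphans, so the backfill itself is safe to repeat.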

Long tail (not in scope): 893 total historical orphans, but only 3 active in last 7d — they self-heal on their next /start once this PR ships (the new lazy-init covers them).

Critically: 38/38 new users in the last 7 days have language rows — the main onboarding funnel was not blocked. This was a long-tail correctness bug on share-link / deep-link branches that returned early.

What I changed

src/tgbot/handlers/start.py: one idempotent check hoisted above the deep_link branching:

if not await get_user_languages(user_id):
    await init_user_languages_from_tg_user(update.effective_user)

Removed the duplicate per-branch calls in wrapped and if created: (the new check supersedes them). add_user_languages uses ON CONFLICT DO NOTHING so this is safe to re-run.
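
To make the control flow concrete, here is a simplified sketch of the hoisted check. It is not the real handler: the in-memory dict, the reduced function signatures (plain user_id and language_code instead of the Telegram update objects), the "en" fallback, and the string return values are all assumptions made so the example runs on its own.

```python
import asyncio
from typing import Dict, Optional, Set

# In-memory stand-in for the user_language table (hypothetical).
USER_LANGUAGES: Dict[int, Set[str]] = {}


async def get_user_languages(user_id: int) -> Set[str]:
    return USER_LANGUAGES.get(user_id, set())


async def init_user_languages_from_tg_user(user_id: int, language_code: Optional[str]) -> None:
    # Assumed fallback to "en" when Telegram reports no language code.
    USER_LANGUAGES.setdefault(user_id, set()).add(language_code or "en")


async def handle_start(user_id: int, language_code: Optional[str], deep_link: Optional[str]) -> str:
    # Universal side effect: runs before any branch below can return early.
    if not await get_user_languages(user_id):
        await init_user_languages_from_tg_user(user_id, language_code)

    if deep_link == "kitchen":
        return "kitchen"  # early return no longer skips language init
    if deep_link == "wrapped":
        return "wrapped"
    return "feed"


if __name__ == "__main__":
    print(asyncio.run(handle_start(1, "ru", "kitchen")), USER_LANGUAGES)
```

Because the check sits above the branch ladder, a deep_link added later inherits the init without any per-branch call.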

Test plan

  • Merge → Coolify auto-deploys
  • /start from a fresh account → confirm user_language row created
  • /start ?start=kitchen from a fresh account → confirm user_language row created (was the bug)
  • One of the orphans hits /start again → confirm no double-insert (idempotent path)

Sega (#370728472) hit /start, immediately got "memes ended". Root cause:
no rows in user_language for him → recommendations query filters every
candidate out → cold start surfaces the empty-feed message.

How he ended up there: he registered on 2026-04-01 with deep_link
'kitchen'. The kitchen branch in handle_start returns BEFORE any
init_user_languages_from_tg_user call:

  if deep_link == "kitchen":
      return await handle_show_kitchen(update, context)

The wrapped + main "if created" branches both call init; kitchen
silently skipped it. Six new users in the last 30 days (4 via 'kitchen',
2 via deep_link=None) ended up the same way — all backfilled by hand
just now.

Fix: hoist a single idempotent check above the deep_link branching:

    if not await get_user_languages(user_id):
        await init_user_languages_from_tg_user(update.effective_user)

This covers every branch (current + future deep_links) and self-heals
any historical orphan the next time they /start. Removed the per-branch
init calls in wrapped + new-user paths since the new check supersedes
them. Idempotent guarantee: add_user_languages uses ON CONFLICT DO NOTHING.
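
The idempotency claim can be illustrated with a small sketch. SQLite (3.24 or newer) stands in for the project's database here, and the user_language column names and composite primary key are assumptions; only the ON CONFLICT DO NOTHING behaviour is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_language ("
    " user_id INTEGER NOT NULL,"
    " language_code TEXT NOT NULL,"
    " PRIMARY KEY (user_id, language_code))"
)


def add_user_languages(conn, user_id, language_codes):
    """Insert language rows; rows that already exist are skipped silently."""
    conn.executemany(
        "INSERT INTO user_language (user_id, language_code) VALUES (?, ?) "
        "ON CONFLICT DO NOTHING",
        [(user_id, code) for code in language_codes],
    )
    conn.commit()


# Two calls with the same data, as could happen if two /start updates race.
add_user_languages(conn, 370728472, ["ru"])
add_user_languages(conn, 370728472, ["ru"])
row_count = conn.execute("SELECT COUNT(*) FROM user_language").fetchone()[0]
print(row_count)
```

The second call neither raises nor duplicates the row, which is why a lost race between the check and the insert is harmless.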

Note on extent: 893 historical orphans exist (mostly pre-init code),
but only 6 were active in the last 30d / 3 in the last 7d — the bug
was not blocking the main onboarding funnel (38/38 new users in last
7d had languages). It was a long-tail correctness issue that bit any
share-link / deep-link path that returned early.
ohld commented May 2, 2026

STAFF ENGINEER REVIEW: APPROVED. Hoisting the idempotent guard above all deep_link branches is correct. add_user_languages uses ON CONFLICT DO NOTHING, which covers the TOCTOU window between the get_user_languages check and the insert. No regressions: save_user_data runs first, so the user row exists before the get_user_languages check. The blocked-channel early return is unaffected (lines 88-91 return BEFORE the new check). The kitchen, wrapped, and if created: branches all benefit from the hoisted call. Codex review: GATE PASS, no actionable bugs.

@ohld ohld merged commit ffc0309 into production May 2, 2026
3 checks passed
ohld added a commit that referenced this pull request May 2, 2026
Targets users registered in the last 12 months who never had a single
meme delivered (no row in user_meme_reaction). They were silently
locked out by onboarding bugs — most via the kitchen deep_link path
that returned before init_user_languages_from_tg_user (fixed forward
in PR #222), some via other early-return drift.

Sega (#370728472) was case zero: registered 2026-04-01 via
?start=kitchen, no language rows, "мемы кончились" ("the memes have
run out") on every /start.
After manual backfill + apology DM he immediately produced a healthy
session (4 likes / 1 dislike / 7 sent in 22 min, 80% positive).

Scope:
  - 66 candidates total (40 RU, 26 EN) at the time of writing
  - Filtered: blocked_bot_at IS NULL, type NOT IN blocked/banned/waitlist
  - Dedup: reuses send_broadcast Redis-set marker per broadcast_id
  - Default 0.5s delay (~2/s) — very conservative for a small list

Run:
  PYTHONPATH=/src python scripts/broadcast_ghost_recovery.py \
      ghost-recovery-2026-05 --dry-run
  PYTHONPATH=/src python scripts/broadcast_ghost_recovery.py \
      ghost-recovery-2026-05

Same shape as scripts/broadcast_wrapped.py — no new infra.
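
The dedup-plus-delay behaviour described under Scope can be sketched as follows. This is not the script itself: broadcast_once, its parameters, and the plain Python set standing in for the Redis-set marker are all hypothetical, chosen so the example runs without Redis or Telegram.

```python
import asyncio


async def broadcast_once(broadcast_id, user_ids, send, sent_markers, delay=0.5, dry_run=False):
    """Send to each user at most once per broadcast_id, pausing between sends.

    sent_markers is a set of (broadcast_id, user_id) pairs; in production
    this role is assumed to be played by a Redis set.
    """
    targeted = []
    for user_id in user_ids:
        marker = (broadcast_id, user_id)
        if marker in sent_markers:
            continue  # already delivered in an earlier run
        targeted.append(user_id)
        if dry_run:
            continue  # report who would be messaged, send nothing
        await send(user_id)
        sent_markers.add(marker)
        await asyncio.sleep(delay)  # 0.5 s between sends is about 2 messages/s
    return targeted


if __name__ == "__main__":
    async def fake_send(user_id):
        print("sending to", user_id)

    markers = set()
    print(asyncio.run(broadcast_once("ghost-recovery-2026-05", [1, 2], fake_send, markers, delay=0)))
```

Re-running with the same broadcast_id after a partial failure only reaches users who have no marker yet.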
ohld added a commit that referenced this pull request May 2, 2026
UserType has no 'banned' value (src/tgbot/constants.py) — the filter is
a no-op. Drop it so the WHERE clause reflects actual reachable types.

Branch was also rebased onto production to pull in PR #222's lazy
language-init in handle_start. The recommended SE fix B (repair-on-start)
is now active for both new and existing users (start.py:124-125), so the
broadcast's "/start" CTA will trigger language backfill and unblock the
recommendation queue for ghost recipients.

Addresses Staff Engineer review on #223.
ohld added a commit that referenced this pull request May 2, 2026
* ops(broadcast): one-shot ghost-user recovery script

* ops(broadcast): drop dead 'banned' filter from ghost recovery query

ohld added a commit that referenced this pull request May 5, 2026
…s (FFM-907) (#224)

PR #222 plugged the kitchen-branch leak (Sega's case). This audit ensures
no future deep_link branch can re-introduce the same drift, and adds a
runtime alarm so we don't have to wait for a friend to complain.

- Contributor-facing comment in handle_start documenting the rule:
  every universal onboarding side effect lives ABOVE the deep_link
  ladder. New side effects must hoist, not bury inside per-branch blocks.
- src/flows/monitors/ghost_users.py: Prefect flow runs every minute,
  counts new users (1-5min ago = WARN, >5min ago = ERROR) with no
  user_meme_reaction row, posts a single summary to admin chat.
  Filters out blocked-acquisition deep_links (intentional silent drop).
- tests/tgbot/test_start.py: regression coverage for each known
  deep_link variant (none / kitchen / wrapped / giveaway_77 / s_*_* /
  blocked-acquisition / existing user). Asserts user_tg + user +
  user_language + user_deep_link_log rows after every created=True path,
  plus idempotency of the lazy lang init.
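
The severity rule in the monitor bullet can be sketched as a pure function. The Prefect wiring, the database query, and the admin-chat post are left out; classify_ghosts and its tuple input format are assumptions for illustration, and only the 1-5 minute WARN and over-5-minute ERROR thresholds come from the commit message.

```python
from datetime import datetime, timedelta, timezone


def classify_ghosts(users, now):
    """Split users with no reaction row into WARN and ERROR buckets by age.

    users: iterable of (user_id, created_at, has_reaction) tuples, where
    has_reaction means a user_meme_reaction row exists for that user.
    """
    warn, error = [], []
    for user_id, created_at, has_reaction in users:
        if has_reaction:
            continue  # a meme was delivered, so not a ghost
        age = now - created_at
        if age > timedelta(minutes=5):
            error.append(user_id)
        elif age >= timedelta(minutes=1):
            warn.append(user_id)
        # under one minute old: onboarding may still be in flight
    return warn, error


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        (1, now - timedelta(seconds=30), False),
        (2, now - timedelta(minutes=3), False),
        (3, now - timedelta(minutes=10), False),
        (4, now - timedelta(minutes=10), True),
    ]
    print(classify_ghosts(sample, now))
```

Keeping the rule free of I/O makes it easy to unit-test separately from the scheduled flow.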

Co-authored-by: Paperclip <noreply@paperclip.ing>