
feat: early channel popup at meme #5 + describe_memes rate-limit retry + upload fixes #178

Merged
ohld merged 13 commits into production from feat/goat-recency-filter-reacted-at
Apr 20, 2026


@ohld ohld commented Apr 20, 2026

Summary

Early Channel Popup (meme #5)

describe_memes: Rate-limit retry logic

  • Replaced the immediate batch abort on 429 with a bounded retry: waits up to 65s (using the Retry-After header when present), at most 3 retries, and deadline-aware (does not wait if too little time remains)
  • Added _parse_retry_after() to extract wait time from response headers and JSON body
  • Increased min_request_interval from 3.5s to 4.0s (stays well within 20 rpm limit)
  • Restored Gemma 3 models to VISION_MODELS chain (re-listed ~2026-04-20 after delisting on Apr 15)
  • Replaced all Gemma 4/3 delisted models with currently-available ones
  • Added quota exhaustion circuit (HTTP 402 → immediate batch exit)
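
A minimal sketch of how a Retry-After hint could be extracted, to make the header-then-body fallback above concrete. This is not the PR's `_parse_retry_after()`: the function name `parse_retry_after`, the top-level `retry_after` body field, and the `now` parameter are assumptions for illustration. Only the 65s cap comes from the description.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Any, Mapping, Optional

MAX_WAIT_SECONDS = 65.0  # cap taken from the PR description


def parse_retry_after(
    headers: Mapping[str, str],
    body: Optional[Mapping[str, Any]] = None,
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return a capped wait in seconds, or None when no hint is present."""
    raw = None
    for key, value in headers.items():
        if key.lower() == "retry-after":  # header names are case-insensitive
            raw = value.strip()
            break

    seconds: Optional[float] = None
    if raw:
        try:
            seconds = float(raw)  # delta-seconds form, e.g. "12"
        except ValueError:
            try:
                when = parsedate_to_datetime(raw)  # HTTP-date form
            except (TypeError, ValueError):
                when = None
            if when is not None:
                if when.tzinfo is None:
                    when = when.replace(tzinfo=timezone.utc)
                current = now or datetime.now(timezone.utc)
                seconds = (when - current).total_seconds()

    if seconds is None and body:
        # Hypothetical body field; the real provider payload may differ.
        value = body.get("retry_after")
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            seconds = float(value)

    if seconds is None:
        return None
    return max(0.0, min(seconds, MAX_WAIT_SECONDS))
```

Returning `None` rather than a default keeps the choice of fallback wait with the caller, which is one way to keep the parser free of policy.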

upload.py: Broken content link

  • After 3 failed TG upload attempts, meme is now marked status=BROKEN_CONTENT_LINK (previously just logged and returned None)

etl.py: Instagram retry query fix

  • Moved meme.raw_meme_id from JOIN ON clause to WHERE clause in IG retry queries

ops

  • Updated Prefect serve_flows.py: removed delisted models, aligned deployment names
  • Updated agents/.paperclip.yaml: added PREFECT_API_URL, PREFECT_API_KEY vars for QA + CTO
  • Updated SENTRY_AUTH_TOKEN and Coolify config vars

Test Coverage

CODE PATHS                                            STATUS
[+] src/flows/storage/describe_memes.py               PARTIAL
  ├── _parse_retry_after()                             [GAP] 9 paths untested
  ├── call_openrouter_vision() 429 handling            [GAP] retry_after extraction untested
  ├── describe_single_meme() tuple unpacking           [GAP] new return value semantics
  └── describe_memes_flow() rate limit retry           [GAP] 15+ branches untested
[+] src/tgbot/handlers/popup.py                       GAP
  ├── _check_channel_subscription()                    [GAP] 7 paths untested
  └── handle_popup_button() channel branch             [GAP] event emission untested
[+] src/tgbot/senders/popups.py                       GAP
  ├── _get_channel_popup()                             [GAP] UI construction untested
  └── get_popup_to_send() trigger change 50→5         [GAP] off-by-one risk
[+] src/storage/upload.py                             GAP
  └── upload_meme_content_to_tg() status update       [GAP] db mutation untested
[+] tests/recommendations/test_engine_contracts.py    UPDATED (formatting fix)

Coverage: 0% of new paths — existing integration tests unchanged.
Tests: 21 files (no new tests added — test env requires Docker)

Pre-Landing Review

No issues found. Rate limit retry logic verified correct: max 3 retries, deadline-aware, capped at 65s wait. All imports verified to exist.

Greptile Review

No prior PR existed — skipped.

Plan Completion

TODOS

  • Marked "Add per-user recency filter to goat engine" as DONE (deployed Apr 13, LR 41.9% vs 39.4% baseline)
  • Last updated: 2026-04-20

Test plan

🤖 Generated with Claude Code

ohld and others added 13 commits April 14, 2026 05:52
…ries

PostgreSQL forbids referencing the UPDATE target table in a JOIN's ON
clause within the FROM clause. Moved the meme.raw_meme_id condition to
WHERE for both the IG retry (broken_content_link → created) and IG
expire (broken_content_link → expired_content_link) queries.
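
The shape of the fix can be sketched as two query strings. Apart from `meme.raw_meme_id` and the status values, every table and column name here (`meme_raw_ig`, `meme_source`, `meme_source_id`) is a hypothetical stand-in, not the schema used in etl.py.

```python
# Rejected by PostgreSQL: the UPDATE target (meme) is referenced inside
# the ON clause of a JOIN that lives in the FROM list.
BROKEN_QUERY = """
UPDATE meme
SET status = 'created'
FROM meme_raw_ig
JOIN meme_source
  ON meme_source.id = meme_raw_ig.meme_source_id
 AND meme_raw_ig.id = meme.raw_meme_id
WHERE meme.status = 'broken_content_link'
"""

# Accepted: the JOIN only relates the FROM tables to each other, and the
# link back to the target table is expressed in the filter clause.
FIXED_QUERY = """
UPDATE meme
SET status = 'created'
FROM meme_raw_ig
JOIN meme_source
  ON meme_source.id = meme_raw_ig.meme_source_id
WHERE meme_raw_ig.id = meme.raw_meme_id
  AND meme.status = 'broken_content_link'
"""
```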

Fixes FFM-456.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
…TP 402)

When OpenRouter balance drops below $0, all models return 402. Previously,
402 fell through raise_for_status() → HTTPStatusError → continued to next
model in the fallback chain. With 5 models × 20 memes, the flow burned
through the entire 900s timeout making doomed requests.

Now: 402 is detected immediately and returns a QUOTA_EXHAUSTED sentinel,
which propagates up to the main loop for an instant batch exit — no model
fallback needed since 402 is account-wide.
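
A rough sketch of the sentinel propagation described above, assuming a hypothetical `call_model(model, meme_id)` callable that returns `(status_code, text)`. The function names and return shapes are illustrative and not taken from describe_memes.py.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

# Unique marker object; compared with `is`, so it cannot collide with text.
QUOTA_EXHAUSTED = object()

CallModel = Callable[[str, int], Tuple[int, Optional[str]]]


def describe_single(meme_id: int, models: Sequence[str], call_model: CallModel):
    """Try each model in order; stop the whole chain on HTTP 402."""
    for model in models:
        status, text = call_model(model, meme_id)
        if status == 402:
            # Account-wide condition: trying another model cannot help.
            return QUOTA_EXHAUSTED
        if status == 200 and text:
            return text
        # Any other status: fall through to the next model in the chain.
    return None


def run_batch(
    meme_ids: Iterable[int], models: Sequence[str], call_model: CallModel
) -> Dict[int, str]:
    """Describe memes until done or until the quota sentinel appears."""
    described: Dict[int, str] = {}
    for meme_id in meme_ids:
        result = describe_single(meme_id, models, call_model)
        if result is QUOTA_EXHAUSTED:
            break  # instant batch exit
        if result is not None:
            described[meme_id] = result
    return described
```

With 5 models and 20 memes, a stub that always answers 402 is called once under this sketch, instead of 100 times.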

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…requency

Root cause: 5-model fallback chain with 2 models (gemma-4-*) consistently
returning 403 wasted ~40% of daily quota on guaranteed failures. Combined
with 48 runs/day (every 30min) × 20 memes = 960+ requests against a
1,000/day limit, leaving zero headroom.

Changes:
- Remove gemma-4-31b-it:free and gemma-4-26b-a4b-it:free (persistent 403)
- Reduce cron from */30 to hourly (24 runs × 20 = 480 base requests)
- Widen request interval 3.5s → 4.0s (15 rpm effective vs 20 rpm cap)
- Update specs and CLAUDE.md to reflect new schedule
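
The arithmetic behind these numbers, as a small worked example. The helper names are made up for illustration; the inputs are the figures quoted in this commit message. The counts are base requests only, so fallback attempts after a failed model come on top.

```python
def daily_requests(runs_per_day: int, batch_size: int) -> int:
    """Base requests per day, before any fallback-chain retries."""
    return runs_per_day * batch_size


def effective_rpm(min_request_interval_s: float) -> float:
    """Upper bound on requests per minute for a fixed spacing."""
    return 60.0 / min_request_interval_s


before = daily_requests(48, 20)  # every 30 min -> 960
after = daily_requests(24, 20)   # hourly       -> 480
rpm = effective_rpm(4.0)         # 15.0, under the 20 rpm cap
```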

Resolves FFM-520.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
…mma 4

All google/gemma-3-*:free models were removed from OpenRouter ~Apr 15,
causing the pipeline to fail on every attempt and trigger the circuit
breaker. The Gemma 4 free models (gemma-4-31b, gemma-4-26b-a4b) are
now available again after their earlier 403 issues were resolved.

Fixes FFM-543

Co-Authored-By: Paperclip <noreply@paperclip.ing>
…ad attempts

Previously, when all 3 upload retries were exhausted (e.g. TimedOut),
the meme was left in created status with no telegram_file_id — permanently
stuck after the 24h query window expired. Now it gets marked as
broken_content_link so the failure is visible and retried on next run.
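
A minimal sketch of the new behaviour, assuming hypothetical `upload` and `set_status` callables in place of the real Telegram and database calls. The built-in `TimeoutError` stands in for the Telegram client's TimedOut error mentioned above.

```python
from typing import Callable, Optional

MAX_UPLOAD_ATTEMPTS = 3
BROKEN_CONTENT_LINK = "broken_content_link"


def upload_with_status(
    meme_id: int,
    upload: Callable[[int], Optional[str]],
    set_status: Callable[[int, str], None],
    attempts: int = MAX_UPLOAD_ATTEMPTS,
) -> Optional[str]:
    """Return the file id, or mark the meme broken once attempts run out."""
    for _ in range(attempts):
        try:
            file_id = upload(meme_id)
        except TimeoutError:
            continue  # transient failure: try again
        if file_id:
            return file_id
    # All attempts failed: record it instead of leaving the meme in `created`.
    set_status(meme_id, BROKEN_CONTENT_LINK)
    return None
```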

Co-Authored-By: Paperclip <noreply@paperclip.ing>
…uired for QA

QA log scan routine needs SENTRY_AUTH_TOKEN to call sentry CLI — promote from
optional to required. Also grant CTO access to Sentry + Coolify vars for direct
debugging. Remove stale SENTRY_DSN from ops runbook (app-level var, not a
Paperclip company secret).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Previously, any 429 response immediately stopped the entire batch — even
transient per-minute rate limits that reset in <60s. This caused 0 memes
described for 3+ consecutive runs (FFM-574).

Now the flow waits up to 65s (using Retry-After header when available)
and retries the same meme. After 3 waits without progress, it stops the
batch (likely daily quota exhausted, not a transient spike).
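
The wait-and-retry loop can be sketched roughly as follows. It assumes a hypothetical `describe(meme_id)` callable returning `(result, retry_after)`, with an injectable `sleep` and `clock`. Resetting the wait counter after a successful meme is this sketch's reading of "without progress", not something the commit confirms.

```python
import time
from typing import Callable, Dict, Optional, Sequence, Tuple

RATE_LIMITED = object()   # marker a describe() call returns on HTTP 429
MAX_RATE_LIMIT_WAITS = 3  # from the commit message
MAX_WAIT_SECONDS = 65.0   # from the commit message


def run_batch(
    meme_ids: Sequence[int],
    describe: Callable[[int], Tuple[object, Optional[float]]],
    deadline: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[int, object]:
    described: Dict[int, object] = {}
    waits = 0
    i = 0
    while i < len(meme_ids):  # while, not for: a retry must not advance i
        meme_id = meme_ids[i]
        result, retry_after = describe(meme_id)
        if result is RATE_LIMITED:
            hint = MAX_WAIT_SECONDS if retry_after is None else retry_after
            wait = min(hint, MAX_WAIT_SECONDS)
            out_of_retries = waits >= MAX_RATE_LIMIT_WAITS
            out_of_time = clock() + wait >= deadline
            if out_of_retries or out_of_time:
                break  # likely daily quota, or no time left to wait
            sleep(wait)
            waits += 1
            continue  # retry the same meme
        if result is not None:
            described[meme_id] = result
            waits = 0  # assumption: progress resets the counter
        i += 1
    return described
```

Passing `sleep` and `clock` as parameters lets the loop be exercised without real waiting, which may help with the untested branches listed in the coverage table.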

Co-Authored-By: Paperclip <noreply@paperclip.ing>
QA agent was missing PREFECT_API_URL and PREFECT_AUTH_STRING declarations
in .paperclip.yaml, causing connection refused errors during QA log scans.
These secrets already exist in Paperclip company secrets (CTO has them).
Also documents the Prefect secrets in the ops runbook.

SENTRY_AUTH_TOKEN still needs to be created as a company secret (board action).

Refs: FFM-580

Co-Authored-By: Paperclip <noreply@paperclip.ing>
…day quota

Gemma 3 free models (27b, 12b) are back on OpenRouter as of 2026-04-20.
Re-added as fallbacks after Gemma 4 models.

Reduced cron from hourly to every 3 hours and batch_size from 20 to 6
(8 runs × 6 = 48 requests/day) to stay within the 50/day free quota.
Previous hourly×20 (480/day) was exhausting the daily limit within 2-3
runs, causing 0.7% coverage over 7 days. Revert to hourly×20 once $10+
lifetime credit unlocks 1,000/day. FFM-587.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
75% of new users leave before meme #5, so showing the channel subscribe
popup at #50 misses nearly all of them. This moves it to #5 and adds:
- URL CTA button linking to the language-appropriate channel
- "I subscribed" callback button for click tracking
- Prefect events: ff.popup.telegram_channel.{shown,clicked,subscribed}
- 30-second delayed subscription verification via TG API
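
A rough sketch of the trigger and event naming above, plus a delayed check. Whether the real trigger compares with `==` or `>=`, and how the meme counter is defined, is not stated here, so `should_show_channel_popup`, `memes_sent`, `is_subscribed`, and `emit` are all hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable

CHANNEL_POPUP_MEME_NUMBER = 5  # moved from 50
EVENT_PREFIX = "ff.popup.telegram_channel"
STAGES = ("shown", "clicked", "subscribed")

logger = logging.getLogger(__name__)


def should_show_channel_popup(memes_sent: int) -> bool:
    """True exactly when the user has just received meme #5 (assumed rule)."""
    return memes_sent == CHANNEL_POPUP_MEME_NUMBER


def event_name(stage: str) -> str:
    if stage not in STAGES:
        raise ValueError(f"unknown popup stage: {stage!r}")
    return f"{EVENT_PREFIX}.{stage}"


async def verify_subscription_later(
    user_id: int,
    is_subscribed: Callable[[int], Awaitable[bool]],
    emit: Callable[[str, int], None],
    delay_s: float = 30.0,
) -> None:
    """Wait, then check membership once; never raise into the caller."""
    await asyncio.sleep(delay_s)
    try:
        if await is_subscribed(user_id):
            emit(event_name("subscribed"), user_id)
    except Exception:
        # Fire-and-forget: log and move on rather than crash the handler.
        logger.exception("channel subscription check failed")
```

In a handler this would typically be scheduled with `asyncio.create_task(...)` so the button callback returns immediately.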

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add active experiments: goat-recency-filter, early-channel-popup
- Move cold-start-v2 experiment to completed
- Add 18 published communication docs (2026-04-02 to 2026-04-20)
- Update experiments/log.jsonl with recent activity
- Mark goat recency filter as DONE in TODOS.md
- Add uv.lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ohld commented Apr 20, 2026

Staff Engineer Review — Approved ✅

Clean, well-structured changes. No issues found.

describe_memes.py — Rate-limit retry logic is correct: max 3 retries, deadline-aware, capped at 65s wait. The for→while loop refactor properly handles continue on retry without incrementing i. _parse_retry_after() has appropriate fallbacks. Model list reorder and query priority change (recent uploads first) are sensible.

popup.py / popups.py — Channel popup trigger moved 50→5 with good data backing (89.6% vs 30.3% reach). Conversion funnel tracking (shown/clicked/subscribed) is clean. Background subscription check is fire-and-forget with proper exception handling.

upload.py — Setting BROKEN_CONTENT_LINK after 3 failed attempts is the right fix — previously these memes silently disappeared.

Minor note: PR description mentions an etl.py IG retry query fix but no etl.py change is in the diff — description is slightly stale.

CI green (lint ✅, tests ✅). Merging.

@ohld ohld merged commit da22809 into production Apr 20, 2026
3 checks passed