v0.22.0 — freshness + ToS compliance (Phase 6 final)
Phase 6 closes with v0.22.0. This release eliminates zombie vacancies (postings already archived or removed on hh.ru) and re-frames the product to comply with hh.ru ToS §3.11.
F1 — on-read freshness check
Every instant search response runs an asyncio.gather over the top-N (default 20) matches, checking each one against https://api.hh.ru/vacancies/{id} via httpx.AsyncClient. An archived: true flag or a 404 triggers a soft-delete (status='archived', archived_at=now()) and removes the row from the in-flight response. The check is wrapped in try/except: if hh.ru blips, the response still ships.
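A minimal sketch of the fail-open fan-out described above, not the project's actual code. The names filter_fresh, is_gone, the hh_id key and the injected fetch coroutine are hypothetical; in the real service fetch would presumably wrap an httpx.AsyncClient GET. The sketch swaps the outer try/except for return_exceptions=True, so a failed lookup keeps its row instead of aborting the whole check.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

# Hypothetical signature: given an hh.ru vacancy id, return
# (HTTP status, parsed JSON body or None).
Fetch = Callable[[str], Awaitable[Tuple[int, Optional[dict]]]]


def is_gone(status: int, body: Optional[dict]) -> bool:
    """True for a 404, or for a 200 whose payload carries archived: true."""
    if status == 404:
        return True
    return status == 200 and bool(body) and bool(body.get("archived"))


async def filter_fresh(
    matches: List[dict], fetch: Fetch, top_n: int = 20
) -> Tuple[List[dict], List[str]]:
    """Check the first top_n matches concurrently.

    Returns (matches to serve, ids to soft-delete). Rows beyond top_n are
    passed through unchecked.
    """
    head, tail = matches[:top_n], matches[top_n:]
    # return_exceptions=True turns per-request failures into result values.
    results = await asyncio.gather(
        *(fetch(m["hh_id"]) for m in head), return_exceptions=True
    )
    kept: List[dict] = []
    gone_ids: List[str] = []
    for match, result in zip(head, results):
        if isinstance(result, BaseException):
            kept.append(match)  # fail open: network error keeps the row
            continue
        status, body = result
        if is_gone(status, body):
            gone_ids.append(match["hh_id"])
        else:
            kept.append(match)  # alive, or 5xx: not treated as archived
    return kept + tail, gone_ids
```

The caller would then soft-delete every id in gone_ids (status='archived', archived_at=now()) and serve the first element of the returned tuple.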
F2 — nightly sweep
vacancy_warmup_worker._run_freshness_sweep_if_due re-checks up to 500 of the least recently checked rows per 24h. Order: last_freshness_check ASC NULLS FIRST, shown_count DESC, so never-checked rows come first, then the oldest-checked, with the most-shown rows winning ties. 0.5s polite delay between HH calls; ~4 minutes per sweep cycle.
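The ordering, the 24h gate and the delay budget can be sketched in plain Python. This is an illustration under assumptions, not the worker's code: rows are modelled as dicts with timezone-aware datetimes, and sweep_sort_key, pick_sweep_batch, sweep_due and delay_budget_seconds are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def sweep_sort_key(row: dict):
    """Mirror ORDER BY last_freshness_check ASC NULLS FIRST, shown_count DESC."""
    checked: Optional[datetime] = row["last_freshness_check"]
    if checked is None:
        # Never-checked rows sort ahead of every checked row.
        return (0, 0.0, -row["shown_count"])
    return (1, checked.timestamp(), -row["shown_count"])


def pick_sweep_batch(rows: List[dict], limit: int = 500) -> List[dict]:
    """Select at most `limit` rows in sweep order."""
    return sorted(rows, key=sweep_sort_key)[:limit]


def sweep_due(
    last_sweep: Optional[datetime],
    now: datetime,
    interval: timedelta = timedelta(hours=24),
) -> bool:
    """The 24h gate: run if no sweep has happened yet or the interval elapsed."""
    return last_sweep is None or now - last_sweep >= interval


def delay_budget_seconds(batch: int = 500, delay: float = 0.5) -> float:
    """Time spent sleeping between calls, excluding the requests themselves."""
    return batch * delay
```

With the defaults, delay_budget_seconds() is 250 seconds, roughly 4.2 minutes of sleep alone, which is in line with the ~4 minutes quoted for a full 500-row cycle.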
F3 — framing as a search layer
- Every VacancyCard now renders a visible source-host button (hh.ru ↗) with bordered + accent-filled styling, replacing the previous subtle "Источник →" ("Source →") link.
- README.md, README.ru.md, PRIVACY.md rewritten as "AI-assisted search layer over hh.ru" instead of "vacancy database".
- New PRIVACY.md section explicitly states: we do not republish vacancy descriptions, we link back to canonical postings, we honour archive status.
Migration
0038_vacancy_freshness adds:
- vacancies.last_freshness_check TIMESTAMPTZ NULL
- vacancies.archived_at TIMESTAMPTZ NULL
- vacancies.shown_count INTEGER NOT NULL DEFAULT 0
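The three columns translate to plain ALTER TABLE statements. The sketch below runs them against an in-memory SQLite database purely so the example is self-contained; the production database is presumably PostgreSQL (TIMESTAMPTZ is a PostgreSQL type, which SQLite accepts only as a free-form type name), and the pre-migration table shape and the names DDL, apply_migration and demo are assumptions.

```python
import sqlite3

# DDL implied by the column list above.
DDL = [
    "ALTER TABLE vacancies ADD COLUMN last_freshness_check TIMESTAMPTZ NULL",
    "ALTER TABLE vacancies ADD COLUMN archived_at TIMESTAMPTZ NULL",
    "ALTER TABLE vacancies ADD COLUMN shown_count INTEGER NOT NULL DEFAULT 0",
]


def apply_migration(conn: sqlite3.Connection) -> None:
    """Apply the three column additions in order."""
    for statement in DDL:
        conn.execute(statement)


def demo():
    """Show that existing rows pick up NULL, NULL and 0 after the migration."""
    conn = sqlite3.connect(":memory:")
    # Assumed minimal pre-migration shape; the real table has more columns.
    conn.execute(
        "CREATE TABLE vacancies (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO vacancies (id, status) VALUES (1, 'active')")
    apply_migration(conn)
    row = conn.execute(
        "SELECT last_freshness_check, archived_at, shown_count "
        "FROM vacancies WHERE id = 1"
    ).fetchone()
    conn.close()
    return row
```

Because shown_count carries a default and the two timestamps are nullable, existing rows need no backfill: a never-checked row simply has last_freshness_check NULL, which is what puts it at the front of the sweep order.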
Acceptance
- Archived vacancies make up <2% of served results.
- Every match card has a visible hh.ru link.
- Zero hh.ru complaints over 30 days (observational).
Tests
20 new tests in tests/test_vacancy_freshness.py:
- _extract_hh_id happy + invalid paths
- check_vacancy_alive: 200 + alive / 200 + archived / 404 / 5xx / network error
- sweep_stale_vacancies ordering (NULL first, shown_count tiebreak)
- Instant endpoint excludes archived from response
- shown_count bumps after instant
- Sweep-due window (_run_freshness_sweep_if_due respects 24h gate)
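To make the first bullet concrete, here is a hypothetical stand-in for an id extractor together with the kind of happy and invalid inputs such tests would cover. The real _extract_hh_id signature and accepted URL shapes are not shown in these notes; the function name extract_hh_id_sketch and the two path patterns (/vacancy/{id} and /vacancies/{id} on hh.ru hosts) are assumptions.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed path shapes: /vacancy/123 (site) and /vacancies/123 (API).
_VACANCY_PATH = re.compile(r"^/vacanc(?:y|ies)/(\d+)/?$")


def extract_hh_id_sketch(url: str) -> Optional[str]:
    """Return the numeric vacancy id from an hh.ru URL, or None if invalid."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return None  # malformed URL, e.g. a broken IPv6 literal
    host = (parsed.hostname or "").lower()
    if host != "hh.ru" and not host.endswith(".hh.ru"):
        return None  # foreign or missing host
    match = _VACANCY_PATH.match(parsed.path)
    return match.group(1) if match else None


# Happy paths
assert extract_hh_id_sketch("https://hh.ru/vacancy/12345") == "12345"
assert extract_hh_id_sketch("https://api.hh.ru/vacancies/12345") == "12345"
# Invalid paths
assert extract_hh_id_sketch("https://hh.ru/employer/1") is None
assert extract_hh_id_sketch("https://example.com/vacancy/1") is None
assert extract_hh_id_sketch("not a url") is None
```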
Full backend: 666 pass, 1 pre-existing unrelated failure.
Phase 6 closing summary
| Release | What |
|---|---|
| v0.19.0 | Persist instant matches as completed recommendation_jobs row |
| v0.20.0 | Strip Stage 2 deep-scan from the search button |
| v0.21.0 | Lazy segment populate on cold pool |
| v0.22.0 | Freshness check + nightly sweep + ToS framing |
p95 search latency dropped from ~60s to <1s. OpenAI cost dropped ~95%. The local pool is now self-cleaning. Decision-gate for Phase 7 (Telegram bot) is now open.