v0.22.0 — freshness + ToS compliance (Phase 6 final)

@fat32al1ty fat32al1ty released this 28 Apr 22:02
· 12 commits to master since this release

Phase 6 closes with v0.22.0. This release eliminates zombie vacancies and re-frames the product to comply with hh.ru ToS §3.11.

F1 — on-read freshness check

Before an instant search response is returned, the top-N matches (default 20) are checked concurrently via asyncio.gather against https://api.hh.ru/vacancies/{id} using httpx.AsyncClient. An archived: true flag or a 404 triggers a soft-delete (status='archived', archived_at=now()) and drops the row from the in-flight response. The check is wrapped in try/except, so if hh.ru blips, the response still ships.
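A minimal sketch of the on-read check described above, not the project's actual code. To keep it runnable without network access, the HTTP call is injected as a hypothetical `fetch` coroutine returning `(status_code, json_body)`; in the real service that role is played by an httpx.AsyncClient request. The names `is_alive` and `filter_fresh` and the `hh_id` key are assumptions made for illustration, and the soft-delete is represented only by a returned list of ids.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical fetcher: given a vacancy id, returns (status_code, json_body).
Fetch = Callable[[str], Awaitable[tuple[int, dict]]]


async def is_alive(hh_id: str, fetch: Fetch) -> Optional[bool]:
    """True = live, False = gone (archived or 404), None = could not tell."""
    try:
        status, body = await fetch(hh_id)
    except Exception:
        return None  # network error: do not punish the row
    if status == 404:
        return False
    if status == 200:
        return not body.get("archived", False)
    return None  # 5xx or anything unexpected: treat as unknown


async def filter_fresh(
    matches: list[dict], fetch: Fetch, top_n: int = 20
) -> tuple[list[dict], list[str]]:
    """Check the first top_n matches concurrently; drop the dead ones."""
    head, tail = matches[:top_n], matches[top_n:]
    verdicts = await asyncio.gather(*(is_alive(m["hh_id"], fetch) for m in head))
    kept: list[dict] = []
    to_archive: list[str] = []  # ids the caller would soft-delete
    for match, verdict in zip(head, verdicts):
        if verdict is False:
            to_archive.append(match["hh_id"])
        else:
            kept.append(match)  # alive or unknown: keep serving it
    return kept + tail, to_archive
```

Because `is_alive` swallows its own exceptions, a single failing request cannot fail the whole `gather`, which is one way to get the "response still ships" behaviour.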

F2 — nightly sweep

vacancy_warmup_worker._run_freshness_sweep_if_due re-checks up to 500 rows per 24h. Order: last_freshness_check ASC NULLS FIRST, shown_count DESC, so never-checked rows come first, then the least recently checked, with ties broken by highest shown_count. A 0.5s polite delay between HH calls puts a full sweep at ~4 minutes (500 × 0.5s = 250s of delay).
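The ordering, the 24h gate, and the delay arithmetic can be illustrated in plain Python. This is a sketch under assumptions: the real worker presumably does the ordering in SQL, and the helper names below (`sweep_order_key`, `pick_sweep_batch`, `sweep_is_due`, `min_sweep_seconds`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SWEEP_LIMIT = 500                    # rows per sweep (from the release notes)
DELAY_S = 0.5                        # polite delay between HH calls
SWEEP_INTERVAL = timedelta(hours=24)

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # placeholder for NULL rows


def sweep_order_key(row: dict):
    """Mimic: last_freshness_check ASC NULLS FIRST, shown_count DESC."""
    checked = row["last_freshness_check"]
    # False sorts before True, so never-checked (None) rows come first.
    return (checked is not None, checked or _EPOCH, -row["shown_count"])


def pick_sweep_batch(rows: list[dict], limit: int = SWEEP_LIMIT) -> list[dict]:
    return sorted(rows, key=sweep_order_key)[:limit]


def sweep_is_due(last_sweep_at: Optional[datetime], now: datetime) -> bool:
    """True if no sweep has run yet or the last one is at least 24h old."""
    return last_sweep_at is None or now - last_sweep_at >= SWEEP_INTERVAL


def min_sweep_seconds(n: int = SWEEP_LIMIT, delay: float = DELAY_S) -> float:
    """Lower bound on sweep duration from the delays alone."""
    return n * delay
```

With the defaults, `min_sweep_seconds()` is 250 seconds, which is where the roughly four-minute figure comes from; actual request time is added on top.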

F3 — framing as a search layer

  • Every VacancyCard now renders a visible source-host button (hh.ru ↗) with bordered + accent-filled styling, replacing the previous subtle "Источник →" ("Source →") link.
  • README.md, README.ru.md, PRIVACY.md rewritten as "AI-assisted search layer over hh.ru" instead of "vacancy database".
  • New PRIVACY.md section explicitly states: we do not republish vacancy descriptions, we link back to canonical postings, we honour archive status.

Migration

0038_vacancy_freshness adds:

  • vacancies.last_freshness_check TIMESTAMPTZ NULL
  • vacancies.archived_at TIMESTAMPTZ NULL
  • vacancies.shown_count INTEGER NOT NULL DEFAULT 0
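To show what these three columns mean for rows that already exist, here is a small stand-in using Python's built-in sqlite3. It is not the actual 0038 migration: the project's migration tooling and database are not shown in these notes, the `vacancies` table below is a hypothetical two-column stub, and SQLite only records `TIMESTAMPTZ` as a declared type name.

```python
import sqlite3

# Stand-in DDL mirroring the three columns listed above.
FRESHNESS_DDL = (
    "ALTER TABLE vacancies ADD COLUMN last_freshness_check TIMESTAMPTZ",
    "ALTER TABLE vacancies ADD COLUMN archived_at TIMESTAMPTZ",
    "ALTER TABLE vacancies ADD COLUMN shown_count INTEGER NOT NULL DEFAULT 0",
)


def apply_freshness_columns(conn: sqlite3.Connection) -> None:
    for ddl in FRESHNESS_DDL:
        conn.execute(ddl)


conn = sqlite3.connect(":memory:")
# Hypothetical pre-migration table with one existing row.
conn.execute("CREATE TABLE vacancies (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO vacancies (title) VALUES ('example')")
apply_freshness_columns(conn)

existing_row = conn.execute(
    "SELECT last_freshness_check, archived_at, shown_count FROM vacancies"
).fetchone()
column_names = [r[1] for r in conn.execute("PRAGMA table_info(vacancies)")]
```

Existing rows come out with both timestamps NULL and `shown_count` at 0, so under the NULLS FIRST ordering every pre-existing vacancy counts as never-checked and is eligible for the first sweeps.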

Acceptance

  • Archived vacancies make up <2% of served results.
  • Every match card has a visible hh.ru link.
  • Zero hh.ru complaints over 30 days (observational).

Tests

20 new tests in tests/test_vacancy_freshness.py:

  • _extract_hh_id happy + invalid paths
  • check_vacancy_alive: 200 + alive / 200 + archived / 404 / 5xx / network error
  • sweep_stale_vacancies ordering (NULL first, shown_count tiebreak)
  • Instant endpoint excludes archived from response
  • shown_count bumps after instant
  • Sweep-due window (_run_freshness_sweep_if_due respects 24h gate)
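The notes do not show `_extract_hh_id` itself. As a guess at what "happy + invalid paths" might cover, the sketch below assumes vacancy links look like `https://hh.ru/vacancy/<digits>` (optionally on a subdomain) and that anything else yields `None`. The function name `extract_hh_id` and these rules are assumptions, not the project's implementation.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed path shape: /vacancy/<digits>, with an optional trailing slash.
_VACANCY_PATH = re.compile(r"^/vacancy/(\d+)/?$")


def extract_hh_id(url: str) -> Optional[str]:
    """Return the numeric vacancy id from an hh.ru URL, or None if invalid."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "hh.ru" and not host.endswith(".hh.ru"):
        return None  # wrong host, or not a URL at all
    match = _VACANCY_PATH.match(parsed.path)
    return match.group(1) if match else None
```

Returning `None` instead of raising keeps the caller's control flow simple: a row without a parseable id can be skipped by the freshness check instead of aborting it.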

Full backend: 666 pass, 1 pre-existing unrelated failure.

Phase 6 closing summary

Release   What
v0.19.0   Persist instant matches as completed recommendation_jobs row
v0.20.0   Strip Stage 2 deep-scan from the search button
v0.21.0   Lazy segment populate on cold pool
v0.22.0   Freshness check + nightly sweep + ToS framing

p95 search latency dropped from ~60s to <1s. OpenAI cost dropped ~95%. The local pool is now self-cleaning. Decision-gate for Phase 7 (Telegram bot) is now open.