
New VK parser #7

Merged
ohld merged 2 commits into main from perfect-vk-parser
Jan 2, 2024



@aleksspevak
Contributor

No description provided.

@aleksspevak requested a review from ohld December 30, 2023 22:53
@aleksspevak
Contributor Author

@ohld How did you test the TG parser and push it to the cloud? I want to test mine the same way.
Running python flow_deployments/parsers.py doesn't work, because our variables are initialized through FastAPI.
I couldn't find any functions in the app itself for running a deployment.

Member

@VeryBigSad left a comment

Looks great; the main thing is to test that it actually works :)
I'd like to rename the .env value, and rename date to the more explicit posted_at.

Comment thread src/config.py
TELEGRAM_BOT_WEBHOOK_SECRET: str | None = None
MEME_STORAGE_TELEGRAM_CHAT_ID: str | None = None

VK_TOKEN: str | None = None
Member

Is this a bot token or a user token?

Comment thread src/database.py

Column("url", String, nullable=False),
Column("content", String),
Column("date", DateTime, nullable=False),
Member

Could we rename this to posted_at?

Member

We already use date in meme_raw_tg, because that's the field the TG parser returns. Not sure which is better.

Member

Then it's fine.
Overall the table mirrors the data we get from VK, so OK.

Comment thread src/storage/parsers/vk.py
async def get_items(self, num_of_posts: Optional[int] = None):
logger.info(f"Going to parse VK: {self.source_link}")
vk_source = _extract_username_from_url(self.source_link)
self.vk_source_link = "https://vk.com/%s" % vk_source
Member

%s looks sad.
I get that this is a copied snippet, but going forward I'd rather not have code as scary as this function in the project.
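
A minimal sketch of the f-string version the comment asks for. The real `_extract_username_from_url` body isn't shown in this PR, so `extract_vk_username` below is a hypothetical stand-in that simply takes the last path segment of the URL:

```python
from urllib.parse import urlparse


def extract_vk_username(url: str) -> str:
    """Hypothetical stand-in for _extract_username_from_url:
    "https://vk.com/memes" -> "memes", "vk.com/memes/" -> "memes"."""
    # Only run urlparse when a scheme is present; bare "vk.com/x" has no netloc.
    path = urlparse(url).path if "://" in url else url
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"no VK username in {url!r}")
    return segments[-1]


def build_vk_source_link(source_link: str) -> str:
    # f-string instead of "https://vk.com/%s" % vk_source
    return f"https://vk.com/{extract_vk_username(source_link)}"
```

Query strings and fragments are dropped by `urlparse`, so `https://m.vk.com/club123?w=wall` still yields `club123`.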

@ohld
Member

ohld commented Dec 31, 2023

@aleksspevak check the README. You can bring up the whole backend with docker compose up, then use docker compose exec python (see the README) to start Python inside the backend environment, import your function, and test it.

@ohld self-requested a review January 2, 2024 13:33
@ohld merged commit f95c691 into main Jan 2, 2024
@ohld deleted the perfect-vk-parser branch April 13, 2026 11:04
ohld added a commit that referenced this pull request Apr 27, 2026
…firefighting (#205)

* feat(agents/se): Self-Check Gate + Anti-Patterns Log to close silent firefighting

Adds a mandatory verification step before staff-engineer marks an execution
issue done. Each PR-review outcome (merged / queued / blocked-CI / changes-
requested / external / already-resolved) maps to a path with explicit checks
that must pass; failures route to status=blocked with a reason instead of a
silent close.

Path A3 probes the next-link (Coolify deploy) via last_online_at on
/api/v1/applications/<uuid>: if the container is still healthy on a pre-merge
timestamp 5 min after merge, files [chain-broken:coolify-not-triggered]
HIGH for CTO. This catches the GH→Coolify webhook drop case ohld reported.
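
A rough Python sketch of the A3 decision described above, assuming both timestamps are already parsed into timezone-aware UTC datetimes. The function name and the "too-early"/"ok" labels are hypothetical; only "not-triggered" corresponds to the issue tag in the text:

```python
from datetime import datetime, timedelta, timezone

# Wait at least this long after merge before judging (the "5 min" above).
GRACE = timedelta(minutes=5)


def a3_chain_status(merged_at: datetime, last_online_at: datetime, now: datetime) -> str:
    """Hypothetical A3 classification for one merged PR (tz-aware UTC inputs)."""
    if now - merged_at < GRACE:
        # Coolify may still be deploying, so don't draw a conclusion yet.
        return "too-early"
    if last_online_at < merged_at:
        # Container still reports a pre-merge timestamp: deploy apparently never ran.
        return "not-triggered"
    return "ok"
```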

ANTI-PATTERNS.md is the case-log feeding the gate. Each row maps to a check;
six seed rows from real PRs (#177 6-day silent trigger drop, #199 17h
zero-artifact merge, #201 auto-merge race during changes-requested, #200
bare-merge CI race, the user-reported chain-break, and the
SC=$(gh pr view --json comments) JSON corruption found while testing).

Tested locally against PRs #199 (correctly identified as silent exit, 0
review-signal artifacts) and #201 (signal found, A3 timestamp probe passes).

* fix(agents/se): address codex review of self-check gate

Two real bugs codex caught before push:

[P1] A2/B2/C2 grep was too narrow — only matched the comment-fallback form
(STAFF ENGINEER REVIEW: APPROVED) used when GitHub self-review-blocks ohld.
For non-ohld internal and external authors, step 7 posts a real `gh pr review
--approve -b "Review summary"` whose body lacks the prefix. Those valid
approvals would have failed the gate. Fix: accept EITHER a .reviews[] entry
with state="APPROVED" OR the comment-prefix. Same dual-form for D1
CHANGES_REQUESTED.
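
A minimal sketch of the dual-form check, assuming `pr` is the parsed output of `gh pr view --json reviews,comments` (reviews carry `state`, comments carry `body`). The helper name is hypothetical. For D1 it would be called with `state="CHANGES_REQUESTED"` and the matching comment prefix, which this commit message doesn't spell out:

```python
# Comment-fallback prefix used when GitHub blocks ohld from self-reviewing.
APPROVAL_PREFIX = "STAFF ENGINEER REVIEW: APPROVED"


def has_review_signal(pr: dict, state: str = "APPROVED", prefix: str = APPROVAL_PREFIX) -> bool:
    """True if EITHER a .reviews[] entry has `state`
    OR some comment body contains `prefix` (grep-style substring match)."""
    if any(r.get("state") == state for r in pr.get("reviews") or []):
        return True
    return any(prefix in (c.get("body") or "") for c in pr.get("comments") or [])
```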

[P2] A3 Coolify probe fired immediately after merge, when last_online_at is
still pre-merge from the previous deploy. Coolify needs ~3-5 min for the
deploy + healthcheck cycle. Without a grace window, every healthy PR would
file [chain-broken:coolify-not-triggered] and drown CTO in false alarms.
Fix: probe only fires when now - mergedAt >= 300s; otherwise deferred to
QA's hourly Process Health Check.

Both fixes logged as #7 and #8 in ANTI-PATTERNS.md so the same blind spots
don't reappear in future gate revisions.

* fix(agents/se): address SE agent CHANGES REQUESTED on PR #205

Two P1s the SE agent caught reviewing this PR:

[P1.1] Path A3 used the BSD-only `date -u -j -f` form, which GNU date on
the Linux agent runtime doesn't support. The probe failed at the first line,
so MERGED_EPOCH was empty and the chain-broken issue never fired. Fix: GNU
`date -u -d` auto-parses both mergedAt (ISO 8601) and last_online_at
(YYYY-MM-DD HH:MM:SS).
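
For comparison, a Python sketch of the same parsing step (a hypothetical helper, not part of the agent). It assumes a naive last_online_at string is UTC, mirroring how `date -u` interprets input without an offset:

```python
from datetime import datetime, timezone


def to_epoch(ts: str) -> int:
    """Parse mergedAt ("2026-04-27T12:00:00Z") or last_online_at
    ("2026-04-27 12:05:00", assumed UTC) into Unix seconds."""
    ts = ts.strip()
    if not ts:
        # Fail loudly instead of producing a bogus epoch.
        raise ValueError("empty timestamp")
    # fromisoformat() accepts both "T" and " " separators; a trailing "Z"
    # is only accepted from Python 3.11, so normalize it first.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```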

[P1.2] A2/B2/C2/D1/E1 jq commands grepped ALL artifacts on the PR. Spec
said "for THIS run" but had no time filter, so a stale APPROVED comment
from a prior wake would let a current silent-exit wake pass A2 — exactly
the silent-close mode the gate was meant to fix. Fix: capture
WAKE_START_ISO at the top of every wake, filter via --arg t in A2 + D1
jq calls.
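
A Python sketch of the wake-scope filter, assuming GitHub's UTC "Z" timestamps (`submittedAt` on reviews, `createdAt` on comments) and a WAKE_START_ISO in the same format. The helper names are hypothetical. Parsing the timestamps, instead of comparing them as strings the way the jq `>= $t` filter does, also tolerates differing precision:

```python
from __future__ import annotations

from datetime import datetime


def _utc(ts: str) -> datetime:
    # Normalize the trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def fresh_since(items: list[dict], wake_start_iso: str, key: str) -> list[dict]:
    """Keep only reviews/comments whose `key` timestamp is >= wake start."""
    start = _utc(wake_start_iso)
    return [it for it in items if it.get(key) and _utc(it[key]) >= start]
```

Any approval check then runs only on the filtered lists, so a stale APPROVED review from an earlier wake can no longer satisfy A2.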

Skipped per minimal-code preference: ANTI-PATTERNS rows for these (caught
pre-merge, not a production failure), Coolify UUID drift note, D3 MCP-tool
prose tweak. All non-blocking from the SE review.

Verified locally against PR #205 own data — the new wake-scope filter
correctly finds the SE review when WAKE_START < submittedAt and rejects
it otherwise.

* fix(agents/se): route all SE wake exits through Self-Check Gate

Codex adversarial review of PR #205 (5 findings, 1 P1 + 4 P2):

[P1] Steps 0/7/8 bypassed the gate entirely — `paperclipUpdateIssue
done|blocked` was called directly inside each step, so the verification
block in step 9 was dead code as wired (cases #1/#3/#4/#5 from the
ANTI-PATTERNS log all closed via those direct calls). Restructured so
each terminal branch sets `OUTCOME_PATH=A|B|C|D|E|F` and jumps to step
9; step 9 is now the single `paperclipUpdateIssue` call site for the
wake. Added an explicit terminal-status mapping table.

[P2] `/tmp/sc.json` and `/tmp/app.json` are cross-run race traps —
two parallel SE wakes overwrite each other's snapshots. PR-scoped to
`/tmp/sc-${PR_NUMBER}.json` and `/tmp/app-${PR_NUMBER}.json`.

[P2] Path A3 Coolify probe didn't validate the curl response. Empty
body on 401/404/500/network failure → `jq -r .last_online_at` empty →
`date -u -d ""` parses as midnight today, not an error → silent no-op. Now checks curl exit, HTTP
200, and non-empty `last_online_at`; on failure files
`[chain-broken:coolify-probe-unhealthy]` for CTO.
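
A Python sketch of the three validation checks, returning an (A3_RESULT, A3_DETAIL) pair. The function name and the detail strings are hypothetical. An "ok" result only means the probe itself worked; the pre/post-merge timestamp comparison still decides whether the deploy was triggered:

```python
from __future__ import annotations

import json


def classify_probe(curl_exit: int, http_status: int | None, body: str) -> tuple[str, str]:
    """Non-zero curl exit, non-200 status, non-JSON body, or a missing/empty
    last_online_at all yield "probe-unhealthy" instead of silently passing."""
    if curl_exit != 0:
        return "probe-unhealthy", f"curl exited with {curl_exit}"
    if http_status != 200:
        return "probe-unhealthy", f"HTTP {http_status}"
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return "probe-unhealthy", "response body is not valid JSON"
    last_online = data.get("last_online_at") if isinstance(data, dict) else None
    if not last_online:
        return "probe-unhealthy", "last_online_at missing or empty"
    return "ok", str(last_online)
```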

[P2] Path E was missing the `>= $WAKE_START_ISO` freshness filter that
A2/B2/C2/D1 already have. A stale APPROVED review from a prior wake
could have let a current silent-exit external-PR wake pass the gate.
E1/E2 now filter by wake-start.

[P2] A3 chain-broken contradicted the general "any failed check →
blocked" rule. Called out A3 as the explicit non-blocking exception:
SE delivered review + merge regardless; the broken handoff is a
separate `[chain-broken:*]` ticket. Updated "When a check fails" and
the terminal-status table to make this explicit.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

* fix(agents/se): round-2 review fixes — bash semantics, precheck order

Round-2 SE review of commit 72ea6ed found 2 P1 regressions and 1 P2
that the structural pass missed:

[P1] CI-red branch fall-through. Old code had `exit 0`; my refactor
replaced it with `# ... goto step 9`, but bash doesn't honor prose
comments — execution fell through to `gh pr merge --squash --auto`
and queued the merge for a PR that should stay blocked. Fix: introduce
a `SKIP_MERGE` flag set by either precheck failure (CI red OR
auto-merge disabled), and gate the `gh pr merge ...` block on
`[ -z "$SKIP_MERGE" ]`. The single exit point is still step 9.
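
The same control flow sketched in Python: both prechecks run before the merge call, and the merge is gated on an empty skip reason, as with `[ -z "$SKIP_MERGE" ]`. The function name and the reason strings are hypothetical, and `run_merge` stands in for `gh pr merge --squash --auto`. The function never updates the issue, so the caller always continues to the single step-9 exit:

```python
from __future__ import annotations

from typing import Callable


def merge_step(ci_green: bool, auto_merge_allowed: bool, run_merge: Callable[[], None]) -> str | None:
    """Return the SKIP_MERGE reason, or None if the merge was queued."""
    skip_merge = None
    if not ci_green:
        skip_merge = "ci-red"
    elif not auto_merge_allowed:
        skip_merge = "auto-merge-disabled"
    if skip_merge is None:
        run_merge()  # only reached when both prechecks pass
    return skip_merge
```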

[P1] A3 chain-broken issue never filed. Both A3 failure branches used
`: "file [chain-broken:*] PR #<n> ..."`, but `:` is the bash null
command — the string is just an evaluated argument, no issue is ever
created. The wake closed `done` silently exactly as ANTI-PATTERNS #5
warned about. Fix: bash block now computes `A3_RESULT` and `A3_DETAIL`
only; an explicit prose step below the bash block tells the agent to
invoke the `paperclipCreateIssue` MCP tool when A3_RESULT is
probe-unhealthy or not-triggered (filing an issue is a tool call, not
a shell command, so it shouldn't have been in the bash block).
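
A Python sketch of the follow-up decision after the bash block, assuming the other path-A checks have passed. A3 is the non-blocking exception, so the wake still closes "done" and a failure only adds a separate ticket. The function and the issue dict are hypothetical; the dict only illustrates what the `paperclipCreateIssue` tool call would carry and is not that tool's real schema:

```python
from __future__ import annotations

# A3_RESULT values that must produce a [chain-broken:*] ticket.
CHAIN_BROKEN_TAGS = {
    "probe-unhealthy": "coolify-probe-unhealthy",
    "not-triggered": "coolify-not-triggered",
}


def a3_followup(a3_result: str, a3_detail: str, pr_number: int) -> tuple[str, dict | None]:
    """Return (terminal_status, issue_to_file_or_None) for a path-A wake."""
    tag = CHAIN_BROKEN_TAGS.get(a3_result)
    if tag is None:
        return "done", None
    issue = {
        "title": f"[chain-broken:{tag}] PR #{pr_number}",
        "priority": "high",  # filed HIGH for CTO per the gate description
        "body": a3_detail,
    }
    # Review + merge were delivered, so the wake is still "done", not "blocked".
    return "done", issue
```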

[P2] `allow_auto_merge` precheck was documented AFTER the
`gh pr merge --squash --auto` call that depends on it. If the setting
drifts back to false, the merge call errors first and the diagnostic
recovery never runs. Moved the precheck above the merge command (now
under the same `c.` heading as the CI-red precheck), gated by the
SKIP_MERGE flag described above.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>


3 participants