Improve social review: critique loop, writing guide, and workflow fixes #18498
adamgordonbell merged 10 commits into master
… re-trigger

- Add soft fail heuristics for article summaries and LLM-speak patterns
- On FAIL or missing platforms, reviewer now drafts suggested replacement copy validated via critique sub-agent loop
- Add Agent tool to social review workflow for sub-agent validation
- Use PULUMI_BOT_TOKEN in claude.yml so pushes trigger downstream workflows

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docs Review

PR: Improve social review: rubric additions, suggested copy, and re-trigger

Overall this is a well-structured change that closes a clear loop (review → suggest → apply → re-validate). A few items to address or verify:

Issues

1. [Medium] Unbounded critique loop could run away in CI

   There's no iteration cap. If the LLM writes copy that consistently triggers its own soft-fail heuristics (easy to do with the new LLM-pattern detector), this loop runs until the CI job times out, burning tokens the whole way. Consider adding a maximum (e.g., "up to 3 attempts; if still failing, post the best attempt with a note that it didn't fully pass self-review").

2. [Medium] Missing

3. [Low] ESC secrets fetched before access check

   This is low-priority since no secrets leak to untrusted code, but reordering avoids unnecessary ESC calls.

4. [Low] "Keep guidance short" constraint may confuse the reviewer

Observations (non-blocking)
To request further reviews or fixes, mention @claude in a comment.
Your site preview for commit 06ee672 is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-18498-06ee6720.s3-website.us-west-2.amazonaws.com
- Add workflow_dispatch to social review workflow with PR number input
- Refactor workflow to resolve PR info from either event type
- Make social-media-review skill user-invocable for local testing
- Add local context: reads blog post directly, outputs to terminal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move CTA to CI context only, clean up drafting section to be context-neutral, make local flow self-contained.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add platform character limits as hard fail rule (X: 255, LinkedIn: 2950, Bluesky: 300), accounting for URL overhead per platform
- Require python3 character count verification in drafting process
- Add Bash(python3:*) to CI workflow allowed tools for counting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
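The python3 verification step can be sketched as a minimal check. The limit table comes from the commit above (the limits are body limits that already account for URL overhead); the function name and output format here are illustrative assumptions, not the workflow's actual command:

```python
# Body character limits per platform, from the rubric's hard-fail rule.
LIMITS = {"x": 255, "linkedin": 2950, "bluesky": 300}

def check_length(platform: str, body: str) -> tuple[int, int, bool]:
    """Return (count, limit, ok) for a draft's body text.

    The limits already budget for URL overhead per platform,
    so the body is counted as-is.
    """
    limit = LIMITS[platform.lower()]
    count = len(body)
    return count, limit, count <= limit

# Verify every draft, not just the suspicious-looking ones.
for platform, body in [("x", "Draft copy for X."), ("bluesky", "Draft for Bluesky.")]:
    count, limit, ok = check_length(platform, body)
    print(f"{platform}: {count}/{limit} {'OK' if ok else 'FAIL'}")
```

Running the check over all drafts before posting is what turns the limit from guidance into a verifiable gate.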
…d examples

Restructure the social-media-review skill from a single file into a directory-based skill with separated concerns:

- SKILL.md: thin orchestrator with goal, principles, and flow
- references/critique-rubric.md: standalone rubric for the critique sub-agent with borderline judgment examples, soft-fail threshold pairs, and gap-only failure cases
- references/writing-guide.md: craft guide with setup/gap/pointer structure, cross-platform adaptation example, and diverse bad-copy examples

Key changes to the rubric:

- Add Bluesky structure rule
- Require programmatic character count verification
- Add curiosity gap section with self-contained test
- Diverse examples across post types (migration, product, tutorial, intern story) — no examples from the same blog post

Key changes to the workflow:

- Update SKILL.md path reference
- Critique loop capped at 3 critiques / 2 revisions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ESC action uses OIDC auth, which may require the environment claim in the token. The social review workflow already sets this; claude.yml was missing it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critique rubric:

- Inline bad/good example pairs for every soft-fail heuristic
- Add voice section (I vs we vs naming external companies)
- Fix: character limits clarified as body limits with URL overhead
- Fix: Bluesky structure "should" → "must" under hard fail
- Fix: weak-opening example isolated (no longer multi-trigger)
- Fix: LLM patterns use consistent FAIL/PASS labels

Writing guide:

- Inline bad/good pairs for every craft topic (setup, gap, pointer, voice, specificity, announcement tone, only-Pulumi, LLM-speak)
- Fix: specificity good example no longer closes the gap
- Fix: LinkedIn section explicitly warns against using extra space for article information rather than story
- Examples differentiated from rubric (different wording/scenarios)

SKILL.md:

- Output format: "keep it short, scannable, 30 seconds"
- Flow step 3: check paragraph structure before moving on
- Flow step 4: verify ALL drafts, don't skip; note that skipping wastes rounds
- Remove examples.md references

CI workflow:

- Reinforce output format in the prompt itself

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rubric: add complete bad/good example pairs for every soft-fail heuristic, voice variety (named author, stat-first, product-as-subject, third-party), missing-pointer and curiosity-gap-closed sections with threshold calibration.

Writing guide: add setup/gap/pointer structure, pointer style variety, named-author voice, cross-platform adaptation examples, factual accuracy rule ("stick to what the post says"), and LLM-speak avoidance patterns.
- Remove backtick markdown from example social copy (hard-fail rule)
- Add backtick to rubric's banned markdown list
- Replace duplicated Fraser/policy-analyze example with Boris Schlosser/CI secrets
- Replace duplicated SST third-party example with Simen Olsen/Bjerk
- Replace duplicated Neo announcement example with Visual Import
- Swap fabricated CloudFormation migration example for real CDKTF deprecation post
- Swap fabricated HCL benchmark example for real token-efficiency post
- Add "no gap" Good example (Bun runtime) to gap section
- Add "stick to what the post says" factual accuracy section
- Soften gap language: not every post needs one
- Change LLM-speak pointer from "setup takes 15 minutes" to vary pointers
- Use Neo Plan Mode for "we" voice example (team effort)
@claude review
Claude finished @adamgordonbell's task in 2m 15s

Code Review
Overall

This is a high-quality restructure. The directory-based layout cleanly separates orchestration from rubric from craft guide. The examples throughout are diverse and concrete, and the previous reviewer's main concerns are addressed: the critique loop now has an explicit cap (3 critiques / 2 revisions).

Issues

[Low] The summary bullet at line 40 lists the structure heuristic as one of the nine soft-fail heuristics — but it's the only one without a dedicated bad/good example pair. The LLM-patterns section partially covers this (staccato fragments produce valid structure that's hard to read), but that's already its own heuristic. If this was intentional — meant to serve as a catch-all covered by the other sections — it might be cleaner to remove it from the summary rather than leave it dangling. If it points at something distinct (e.g., long undivided blocks, cramped sentence packing), it needs an example.

[Low] ESC step ordering: the ESC secrets are fetched before the access check runs.

[Very Low] Triple-single-quote edge case in the character count command. Both commands embed the copy inline, so if copy ever contains three consecutive single quotes, the quoting breaks.

Observations (non-blocking)

- Writing guide: the cross-platform adaptation example is the standout piece. The X/LinkedIn/Bluesky trio for the same 150-Terraform-module story correctly demonstrates that more room means more story depth, not more article facts. This is the right mental model and worth having.
- "Stick to what the post says" section. Prevents the obvious failure mode of the LLM embellishing with invented numbers. The "Common mistakes" list is concrete and actionable.
- Pointer variation guidance. "Vary the style across platforms. If X uses 'Here's how,' LinkedIn and Bluesky should use a different form." This is easy to overlook and good to make explicit.
- "Curiosity gap closed" vs "Missing pointer" as separate heuristics. These are distinct and easy to conflate. Having them as separate items with separate examples prevents the critique sub-agent from treating them as the same failure.
- Allowed tools in CI prompt.

Summary

One real gap (structure heuristic has no example) and two low-priority workflow issues. The skill content itself is solid — the rubric is thorough, the examples are well-chosen, and the loop behavior is now correctly bounded.
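One quoting-proof variant of the character count (a sketch, not the workflow's actual command) passes the copy to python3 over stdin, so no quote sequence inside the copy can ever break the shell command:

```python
import subprocess

# Copy deliberately containing quote characters that would break inline -c quoting.
copy = "It's got 'single quotes' and \"doubles\" in it"

# Pipe the copy over stdin; the shell never parses the copy's contents.
result = subprocess.run(
    ["python3", "-c", "import sys; print(len(sys.stdin.read()), end='')"],
    input=copy, capture_output=True, text=True, check=True,
)
count = int(result.stdout)
print(count)
```

The same shape works from a workflow step as `printf %s "$COPY" | python3 -c '…'`, sidestepping the triple-quote edge case entirely.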
Docs Review

Summary

This PR restructures the social media review skill from a single file into a directory-based skill with separated concerns (orchestrator, critique rubric, writing guide), adds a draft + critique loop for FAIL cases, and updates two GitHub Actions workflows. The skill content is well-written with strong examples. The workflow changes follow established repo patterns but warrant human review due to expanded permissions.

Skill files (SKILL.md, critique-rubric.md, writing-guide.md)

These are agent instruction files and exempt from STYLE-GUIDE.md formatting rules per AGENTS.md. The content is high quality — the examples are concrete, the rubric is well-calibrated with borderline cases, and the writing guide teaches by contrast (bad vs. good). No issues found. One behavior change to note: the skill is now user-invocable for local testing in addition to running in CI.

Workflow changes — flag for human review
No issues requiring changes

The skill content is clean, the workflow changes follow established patterns, and the examples are well-crafted. The main thing for a human reviewer to confirm is the intent behind swapping to PULUMI_BOT_TOKEN.

Mention @claude if you'd like additional reviews or fixes.
Summary
Overhauls the social media review skill and supporting CI workflows based on issues observed in #18493, where social copy took 3 round trips to get right.
Skill restructure
The single social-media-review.md file is now a directory-based skill with separated concerns:

- SKILL.md: thin orchestrator with goal, principles, and flow
- references/critique-rubric.md: standalone rubric for the critique sub-agent
- references/writing-guide.md: craft guide for drafting copy
New behavior: draft + critique loop on FAIL
When posts fail or platforms are missing, the reviewer drafts suggested replacement copy, verifies character counts programmatically, validates it via a critique sub-agent with a clean context, and iterates for up to 2 revisions. It then posts the suggested copy with a CTA for the author to apply.
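The bounded loop can be sketched as follows — `draft_fn` and `critique_fn` are hypothetical stand-ins for the reviewer and the critique sub-agent calls, not real APIs, and the cap matches the "3 critiques / 2 revisions" limit from the commits:

```python
MAX_CRITIQUES = 3  # cap: 3 critiques, hence at most 2 revisions after the first draft

def run_critique_loop(draft_fn, critique_fn):
    """Draft once, then alternate critique/revise until PASS or the cap is hit.

    Returns (final_draft, passed). On exhaustion the best attempt is returned
    with passed=False so it can be posted with a note that it didn't fully
    pass self-review, instead of looping until the CI job times out.
    """
    draft = draft_fn(feedback=None)
    for i in range(MAX_CRITIQUES):
        verdict, feedback = critique_fn(draft)
        if verdict == "PASS":
            return draft, True
        if i < MAX_CRITIQUES - 1:  # revisions allowed: MAX_CRITIQUES - 1
            draft = draft_fn(feedback=feedback)
    return draft, False
```

The key design point is that the loop always terminates with usable output: a PASS draft or the last attempt flagged as unvalidated.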
Workflow changes
Rubric additions
Test plan