
Improve social review: critique loop, writing guide, and workflow fixes#18498

Merged
adamgordonbell merged 10 commits into master from improve/social-review-workflow
Apr 14, 2026

Conversation

Contributor

@adamgordonbell adamgordonbell commented Apr 13, 2026

Summary

Overhauls the social media review skill and supporting CI workflows based on issues observed in #18493, where social copy took 3 round trips to get right.

Skill restructure

The single social-media-review.md file is now a directory-based skill with separated concerns:

  • SKILL.md — thin orchestrator with explicit goal, principles, and a 6-step flow
  • references/critique-rubric.md — standalone rubric for the critique sub-agent with borderline judgment examples, soft-fail threshold pairs, and gap-only failure cases
  • references/writing-guide.md — craft guide with setup/gap/pointer structure, cross-platform adaptation example, and diverse bad-copy examples

New behavior: draft + critique loop on FAIL

When posts fail or platforms are missing, the reviewer drafts suggested replacement copy, verifies character counts programmatically, validates it via a critique sub-agent with a clean context, and iterates through up to 2 revisions. It then posts the suggested copy with a CTA for the author to apply it.
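The bounded draft-and-critique loop described above can be sketched in Python. This is a hypothetical illustration, not the skill's actual implementation: `draft_copy`, `critique`, and `post_suggestion` are stand-in callables, and in the real skill the critique runs in a sub-agent with a clean context rather than a local function.

```python
# Sketch of the bounded draft + critique loop. All callables here are
# hypothetical stand-ins for the skill's actual steps.

MAX_REVISIONS = 2  # 3 critiques total: the initial draft plus 2 revisions


def review_loop(platforms, draft_copy, critique, post_suggestion):
    """Draft copy per platform, critique it, and revise failing drafts
    at most MAX_REVISIONS times before posting the best attempt."""
    drafts = {p: draft_copy(p) for p in platforms}
    for attempt in range(MAX_REVISIONS + 1):
        verdicts = {p: critique(p, drafts[p]) for p in platforms}
        failing = [p for p, v in verdicts.items() if v != "PASS"]
        if not failing:
            break
        if attempt == MAX_REVISIONS:
            break  # cap reached: post the best attempt rather than loop forever
        for p in failing:
            drafts[p] = draft_copy(p)  # revise only the failing platforms
    post_suggestion(drafts)
    return drafts
```

With a critique that always fails, the loop performs exactly three drafts per platform and then stops, which is the behavior the cap is meant to guarantee.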

Workflow changes

  • claude-social-review.yml: Added workflow_dispatch for manual triggering. Added Agent and Bash(python3:*) to allowed tools
  • claude.yml: Swapped GITHUB_TOKEN for PULUMI_BOT_TOKEN (via ESC) so Claude pushes trigger downstream workflows

Rubric additions

  • No-summaries and LLM-speak soft-fail heuristics
  • Character limits as hard-fail with URL overhead (X: 255, LinkedIn: 2950, Bluesky: 300)
  • Bluesky structure rule
  • Programmatic character count verification
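The hard-fail limits listed above lend themselves to a mechanical check. A minimal sketch (the per-platform budgets are the body limits after URL overhead stated in this PR; the function name is illustrative, not part of the skill):

```python
# Body-character budgets after URL overhead, per the rubric additions
# (X: 255, LinkedIn: 2950, Bluesky: 300).
BODY_LIMITS = {"x": 255, "linkedin": 2950, "bluesky": 300}


def check_length(platform, body):
    """Return (ok, count, limit) for a draft's body text."""
    limit = BODY_LIMITS[platform]
    count = len(body)
    return count <= limit, count, limit
```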

Test plan

  • Comparison post (Add blog post: AI coding agent orchestration frameworks compared #18493): critique loop caught LinkedIn over-revealing, revised successfully
  • Product feature post (Neo Plan Mode): rubric caught press-release originals, writer produced problem-first copy
  • Verify claude.yml ESC secrets resolve (PULUMI_BOT_TOKEN)
  • Test manual workflow_dispatch trigger
  • Test that Claude pushes trigger social review workflow

… re-trigger

- Add soft fail heuristics for article summaries and LLM-speak patterns
- On FAIL or missing platforms, reviewer now drafts suggested replacement
  copy validated via critique sub-agent loop
- Add Agent tool to social review workflow for sub-agent validation
- Use PULUMI_BOT_TOKEN in claude.yml so pushes trigger downstream workflows

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

claude Bot commented Apr 13, 2026

Docs Review

PR: Improve social review: rubric additions, suggested copy, and re-trigger
Files reviewed: .claude/commands/social-media-review.md, .github/workflows/claude-social-review.yml, .github/workflows/claude.yml

Overall this is a well-structured change that closes a clear loop (review → suggest → apply → re-validate). A few items to address or verify:


Issues

1. [Medium] Unbounded critique loop could run away in CI
social-media-review.md, "Drafting suggested copy" step 4 (line 108):

"If the sub-agent returns FAIL, revise and re-validate. Repeat until all platforms pass."

There's no iteration cap. If the LLM writes copy that consistently triggers its own soft-fail heuristics (easy to do with the new LLM-pattern detector), this loop runs until the CI job times out, burning tokens the whole way. Consider adding a maximum (e.g., "up to 3 attempts; if still failing, post the best attempt with a note that it didn't fully pass self-review").

2. [Medium] Missing environment key on claude.yml job — verify OIDC compatibility
claude.yml uses pulumi/esc-action@v1 with OIDC auth (lines 33-35, 94-99) but the job has no environment: key. The social review workflow (claude-social-review.yml line 18) sets environment: production for the same ESC configuration. GitHub includes the environment claim in the OIDC token only when environment is set — if the ESC OIDC trust policy checks this claim, the secret fetch will fail silently in claude.yml but work in the social review workflow. Please verify the ESC trust policy accepts tokens without an environment claim, or add environment: production (or the appropriate value) to the claude job.
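If the ESC trust policy does require the environment claim, the fix is a one-line addition to the job in claude.yml. A sketch (job name and runner are illustrative):

```yaml
jobs:
  claude:
    runs-on: ubuntu-latest
    # Makes GitHub include the environment claim in the OIDC token,
    # matching what claude-social-review.yml already does.
    environment: production
```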

3. [Low] ESC secrets fetched before access check
claude.yml lines 33-37: The "Fetch secrets from ESC" step runs before the "Check repository write access" step. While the Claude action is gated on the access check (line 74), ESC credentials are fetched on every invocation — including unauthorized ones. Consider either:

  • Moving the ESC step after the access check, or
  • Adding if: steps.check-access.outputs.has_write_access == 'true' to the ESC step (which would require reordering steps)

This is low-priority since no secrets leak to untrusted code, but it avoids unnecessary ESC calls.
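One way to implement the suggested gating, sketched as a workflow fragment (step names and the `check-access` id are illustrative and would need to match the actual workflow):

```yaml
steps:
  - name: Check repository write access
    id: check-access
    # ... existing access check ...

  - name: Fetch secrets from ESC
    # Skip the ESC call entirely for unauthorized invocations.
    if: steps.check-access.outputs.has_write_access == 'true'
    uses: pulumi/esc-action@v1
    # ... existing ESC configuration ...
```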

4. [Low] "Keep guidance short" constraint may confuse the reviewer
social-media-review.md line 129: The constraints still say "Keep guidance short and general (no line edits)." The new drafting process produces full replacement copy, which isn't "guidance" per se, but the instruction is ambiguous. Consider clarifying, e.g., "Keep guidance notes short and general — suggested copy is provided separately under its own heading."


Observations (non-blocking)

  • Soft-fail heuristic length (lines 57-58): The LLM-characteristic writing patterns bullet is quite dense. Consider splitting into sub-bullets for readability (e.g., separate "em dash chains," "staccato fragments," and "constructed parallelism" into their own items).

  • PASS/FAIL reorder: Moving PASS before FAIL in the output format section is a nice touch — simpler case first.

  • BUILD-AND-DEPLOY.md: Checked — it references claude.yml and claude-social-review.yml at a high level. The current descriptions remain accurate; no update needed for these operational changes.

  • Agent in allowed tools (claude-social-review.yml line 94): Correctly added to support the new sub-agent critique loop.


To request further reviews or fixes, mention @claude in a comment.

Collaborator

pulumi-bot commented Apr 13, 2026

- Add workflow_dispatch to social review workflow with PR number input
- Refactor workflow to resolve PR info from either event type
- Make social-media-review skill user-invocable for local testing
- Add local context: reads blog post directly, outputs to terminal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move CTA to CI context only, clean up drafting section to be
context-neutral, make local flow self-contained.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add platform character limits as hard fail rule (X: 255, LinkedIn: 2950,
  Bluesky: 300), accounting for URL overhead per platform
- Require python3 character count verification in drafting process
- Add Bash(python3:*) to CI workflow allowed tools for counting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d examples

Restructure the social-media-review skill from a single file into a
directory-based skill with separated concerns:

- SKILL.md: thin orchestrator with goal, principles, and flow
- references/critique-rubric.md: standalone rubric for the critique
  sub-agent with borderline judgment examples, soft-fail threshold
  pairs, and gap-only failure cases
- references/writing-guide.md: craft guide with setup/gap/pointer
  structure, cross-platform adaptation example, and diverse bad-copy
  examples

Key changes to the rubric:
- Add Bluesky structure rule
- Require programmatic character count verification
- Add curiosity gap section with self-contained test
- Diverse examples across post types (migration, product, tutorial,
  intern story) — no examples from the same blog post

Key changes to the workflow:
- Update SKILL.md path reference
- Critique loop capped at 3 critiques / 2 revisions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@adamgordonbell adamgordonbell changed the title from "Improve social review: rubric additions, suggested copy, and re-trigger" to "Improve social review: critique loop, writing guide, and workflow fixes" Apr 13, 2026
The ESC action uses OIDC auth which may require the environment claim
in the token. The social review workflow already sets this; claude.yml
was missing it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critique rubric:
- Inline bad/good example pairs for every soft-fail heuristic
- Add voice section (I vs we vs naming external companies)
- Fix: character limits clarified as body limits with URL overhead
- Fix: Bluesky structure "should" → "must" under hard fail
- Fix: weak-opening example isolated (no longer multi-trigger)
- Fix: LLM patterns use consistent FAIL/PASS labels

Writing guide:
- Inline bad/good pairs for every craft topic (setup, gap, pointer,
  voice, specificity, announcement tone, only-Pulumi, LLM-speak)
- Fix: specificity good example no longer closes the gap
- Fix: LinkedIn section explicitly warns against using extra space
  for article information rather than story
- Examples differentiated from rubric (different wording/scenarios)

SKILL.md:
- Output format: "keep it short, scannable, 30 seconds"
- Flow step 3: check paragraph structure before moving on
- Flow step 4: verify ALL drafts (skipping verification wastes revision rounds)
- Remove examples.md references

CI workflow:
- Reinforce output format in the prompt itself

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rubric: add complete bad/good example pairs for every soft-fail heuristic,
voice variety (named author, stat-first, product-as-subject, third-party),
missing-pointer and curiosity-gap-closed sections with threshold calibration.

Writing guide: add setup/gap/pointer structure, pointer style variety,
named-author voice, cross-platform adaptation examples, factual accuracy
rule ("stick to what the post says"), and LLM-speak avoidance patterns.
- Remove backtick markdown from example social copy (hard-fail rule)
- Add backtick to rubric's banned markdown list
- Replace duplicated Fraser/policy-analyze example with Boris Schlosser/CI secrets
- Replace duplicated SST third-party example with Simen Olsen/Bjerk
- Replace duplicated Neo announcement example with Visual Import
- Swap fabricated CloudFormation migration example for real CDKTF deprecation post
- Swap fabricated HCL benchmark example for real token-efficiency post
- Add "no gap" Good example (Bun runtime) to gap section
- Add "stick to what the post says" factual accuracy section
- Soften gap language: not every post needs one
- Change LLM-speak pointer from "setup takes 15 minutes" to vary pointers
- Use Neo Plan Mode for "we" voice example (team effort)
@adamgordonbell
Contributor Author

@claude review

Contributor

claude Bot commented Apr 14, 2026

Claude finished @adamgordonbell's task in 2m 15s.


Code Review

  • Read changed files
  • Review SKILL.md
  • Review critique-rubric.md
  • Review writing-guide.md
  • Review workflow changes
  • Post final review

Overall

This is a high-quality restructure. The directory-based layout cleanly separates orchestration from rubric from craft guide. The examples throughout are diverse and concrete, and the previous reviewer's main concerns are addressed: the critique loop now has an explicit cap (up to 2 revisions, 3 total critiques) and environment: production is set in claude.yml.


Issues

[Low] critique-rubric.md: "Structure technically valid but still hard to read" has no example section

The summary bullet at line 40 lists this as one of the nine soft-fail heuristics — but it's the only one without a dedicated ### section and FAIL/PASS pair. Every other item in the list has detailed examples to guide the sub-agent. Without examples, a critique sub-agent has no ground truth for what "structure technically valid but hard to read" actually means.

The LLM-patterns section partially covers this (staccato fragments produce valid structure that's hard to read), but that's already its own heuristic. If this was intentional — meant to serve as a catch-all covered by the other sections — it might be cleaner to remove it from the summary rather than leave it dangling. If it points at something distinct (e.g., long undivided blocks, cramped sentence packing), it needs an example.

[Low] ESC step ordering in claude.yml (noted in previous review, not yet addressed)

The Fetch secrets from ESC step runs unconditionally before Check repository write access. ESC credentials are consumed on every trigger, including unauthorized ones. No secrets leak to untrusted code, but it's an unnecessary ESC call. Adding if: steps.check-access.outputs.has_write_access == 'true' to the ESC step would require reordering, but would be cleaner. Low priority — not blocking.

[Very Low] Triple-single-quote edge case in character count command

Both SKILL.md (step 4) and critique-rubric.md use:

python3 -c "print(len('''<copy>'''))"

If copy ever contains ''', the shell command breaks. Using """ instead of ''' would be more robust. Extremely unlikely to matter in practice.
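A quoting-proof alternative (a sketch, not what the skill currently specifies) is to read the copy from stdin instead of embedding it in the command line, e.g. via a heredoc: `python3 count_chars.py <<'EOF' ... EOF`. The script name and helper are hypothetical:

```python
# count_chars.py (hypothetical) — counts characters of whatever arrives
# on stdin, so the copy never needs shell or Python quoting.
import sys


def count_chars(text: str) -> int:
    # Strip one trailing newline (heredocs append one) before counting.
    if text.endswith("\n"):
        text = text[:-1]
    return len(text)


if __name__ == "__main__":
    print(count_chars(sys.stdin.read()))
```

Because the copy arrives as raw bytes on stdin, embedded `'''`, `"""`, backticks, and `$` are all counted literally with no escaping.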


Observations (non-blocking)

  • Writing guide: the cross-platform adaptation example is the standout piece. The X/LinkedIn/Bluesky trio for the same 150-Terraform-module story correctly demonstrates that more room means more story depth, not more article facts. This is the right mental model and worth having.

  • "Stick to what the post says" section: prevents the obvious failure mode of the LLM embellishing with invented numbers. The "Common mistakes" list is concrete and actionable.

  • Pointer variation guidance: "Vary the style across platforms. If X uses 'Here's how,' LinkedIn and Bluesky should use a different form." This is easy to overlook and good to make explicit.

  • "Curiosity gap closed" vs. "Missing pointer" as separate heuristics: these are distinct and easy to conflate. Having them as separate items with separate examples prevents the critique sub-agent from treating them as the same failure.

  • workflow_dispatch inputs: the Resolve PR info step correctly falls back to github.event.inputs.pr_number for manual runs, and the concurrency group uses || github.event.inputs.pr_number too. Clean.

  • Allowed tools in CI prompt: Agent and Bash(python3:*) are now explicitly allowed — required for the critique loop to function.


Summary

One real gap (structure heuristic has no example) and two low-priority workflow issues. The skill content itself is solid — the rubric is thorough, the examples are well-chosen, and the loop behavior is now correctly bounded.

@adamgordonbell adamgordonbell marked this pull request as ready for review April 14, 2026 12:25
@adamgordonbell adamgordonbell merged commit 99f1c24 into master Apr 14, 2026
8 checks passed
@adamgordonbell adamgordonbell deleted the improve/social-review-workflow branch April 14, 2026 12:26
Contributor

claude Bot commented Apr 14, 2026

Docs Review

Summary

This PR restructures the social media review skill from a single file into a directory-based skill with separated concerns (orchestrator, critique rubric, writing guide), adds a draft + critique loop for FAIL cases, and updates two GitHub Actions workflows. The skill content is well-written with strong examples. The workflow changes follow established repo patterns but warrant human review due to expanded permissions.

Skill files (SKILL.md, critique-rubric.md, writing-guide.md)

These are agent instruction files and exempt from STYLE-GUIDE.md formatting rules per AGENTS.md. The content is high quality — the examples are concrete, the rubric is well-calibrated with borderline cases, and the writing guide teaches by contrast (bad vs. good). No issues found.

One behavior change to note: the skill changed from user-invocable: false to user-invocable: true (SKILL.md:2), meaning users can now invoke /social-media-review directly. This seems intentional given the added argument: field but isn't called out in the PR description.

Workflow changes — flag for human review

.github/workflows/claude.yml — PULUMI_BOT_TOKEN swap

  • Lines 34–37: Adds ESC secret fetching step.
  • Line 81: Replaces the default GITHUB_TOKEN with PULUMI_BOT_TOKEN for the Claude Code action. The comment explains the intent (pushes trigger downstream workflows like social review), and PULUMI_BOT_TOKEN is already used this way across many other workflows in the repo (customer-managed-workflow-agent-cli.yml, pulumi-cli.yml, etc.). The pattern is established, but this does expand what actions Claude's pushes can trigger — worth confirming this is the desired scope.
  • Line 21: Adds environment: production (required for ESC OIDC). Consistent with claude-social-review.yml and other workflows.
  • Lines 96–99: Workflow-level env: block for ESC config. Currently only one job exists, but if jobs are added later they'd inherit these ESC settings. This matches the pattern in claude-social-review.yml and other workflows.

.github/workflows/claude-social-review.yml — workflow_dispatch + expanded tools

  • Lines 8–12: Adds workflow_dispatch input for manual triggering. The pr_number input is type: string, which accepts non-numeric values, but gh pr view would simply fail on invalid input so this is low risk.
  • Lines 32–48: New Resolve PR info step that handles both event types. Uses GH_TOKEN and --repo flag so it works before checkout. Logic is clean.
  • Line 131: claude_args now includes Agent and Bash(python3:*). These are needed for the critique sub-agent loop (step 5 in SKILL.md) and programmatic character count verification (step 4). The Bash(python3:*) permission is scoped to python3 commands only, which limits the surface area.
  • Line 124: Updated skill path reference from .claude/commands/social-media-review.md to .claude/commands/social-media-review/SKILL.md. Correct.
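The low-risk `pr_number` input noted above could be hardened with an explicit guard step. A hypothetical sketch (not in the PR; input description and step name are illustrative):

```yaml
on:
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to review"
        required: true
        type: string

jobs:
  social-review:
    steps:
      - name: Validate pr_number
        run: |
          # Fail fast on empty or non-numeric input before calling gh.
          case "${{ inputs.pr_number }}" in
            ''|*[!0-9]*) echo "pr_number must be numeric" >&2; exit 1 ;;
          esac
```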

No issues requiring changes

The skill content is clean, the workflow changes follow established patterns, and the examples are well-crafted. The main thing for a human reviewer to confirm is the intent behind swapping to PULUMI_BOT_TOKEN in claude.yml and the expanded tool permissions in claude-social-review.yml.


Mention @claude if you'd like additional reviews or fixes.

