Add blog post: dark factory pattern for Pulumi infrastructure #18771
Conversation
Maps the dark factory pattern (planner/generator/validator/orchestrator with holdout scenarios) onto Pulumi primitives (Automation API, ESC, CrossGuard, Deployments, Neo). Includes the four-layer architecture, a minimal Python orchestrator using LocalWorkspace, and a four-phase rollout for IaC.
Social Media Review — content/blog/dark-factory-pattern-pulumi-autonomous-iac/index.md

X — PASS
LinkedIn — PASS
Bluesky — PASS

Suggestions (advisory): these are stylistic notes; they don't block the post.

X

Bluesky
To apply these suggestions, comment:
To re-run the social media review after updates, comment:
Your site preview for commit 1d4baf7 is ready! 🎉 http://www-testing-pulumi-docs-origin-pr-18771-1d4baf7a.s3-website.us-west-2.amazonaws.com
- Replace the Tyagi/HackerNoon ladder with Dan Shapiro's original 0-5 framing (the canonical source the YouTube transcript credits).
- Attribute holdout scenarios to StrongDM directly and lead the isolation rationale with sycophancy, which is the stronger framing.
- Drop the Python orchestrator code block; describe the orchestrator shape in prose so the section keeps its narrative pace.
- Loosen the forced "every Pulumi primitive maps to a layer" framing. The post now treats Pulumi as the worked example without claiming every dark-factory part has a perfect Pulumi equivalent.
🟰 Skipping social media review — 1 blog file(s) changed, but social block unchanged. The verdict from the prior run still applies.
Four targeted edits, no structural changes:
- Promote the StrongDM blog post to the primary link in the opener; Simon Willison's coverage was secondary anyway.
- Replace the rollback-vs-leak framing with the IAM-policy-postmortem pair, which is sharper and more vivid.
- Add the "pulumi preview = holdout" insight at the end of the infrastructure-is-harder section: preview produces a deterministic diff that CrossGuard evaluates without seeing the conversation that produced the program. For infrastructure, half the wall ships in the box.
- Replace the abstract destroy-permission risk paragraph with concrete examples (database drops, hosted zone deletes, root key rotations) and the earned-autonomy gradient (tag changes today, release-branch cuts next quarter, destructive class over months).
🟰 Skipping social media review — 1 blog file(s) changed, but social block unchanged. The verdict from the prior run still applies.
- Lead the opener with the StrongDM specifics (32k lines of production code, no human author, no human reviewer) instead of the generic three-engineer line. The number is verifiable from Simon Willison's October 2025 site visit; dropped the loose seven-month timeframe.
- Add a one-line clarification to the sycophancy paragraph: "It isn't malice; it's the agent doing exactly what it was asked, badly."
- Add a three-layer rationale to the destroy-permission risk paragraph spelling out the failure mode of each layer in isolation: approvals without policy, policy without approvals, both without a kill switch.
🟰 Skipping social media review — 1 blog file(s) changed, but social block unchanged. The verdict from the prior run still applies.
After verifying every claim against primary sources, fix six over-attributions and link choices:
- The Stripe 1,300 PRs link now points to Stripe's own dev blog (the primary source) instead of MindStudio's secondary coverage.
- The StrongDM 32k lines link now points to Simon Willison's site visit, which is the primary source for the line breakdown (16k Rust + 9.5k Go + 6.7k TypeScript). The StrongDM blog post itself does not state that number.
- Drop the "As StrongDM put it" framing on the lazy-rewrite line; that language is a paraphrase from a third party, not a direct StrongDM quote. Reframe as "The practical concern is direct".
- The four-phase rollout is Tyagi's framing, not StrongDM's. Change "the same path StrongDM walked" to "the same path the application-code factories walked".
- The 90/5/10 measurable gates are also Tyagi's. Drop the StrongDM attribution.
- The triple-run/2-of-3/audit-first-fifty mitigations are best practice, not specifically what StrongDM published. Reframe as "The standard mitigation is layered".
- Drop the "the next phase of enterprise software delivery" gloss on BCG; BCG only used the term, the gloss was mine.

Also update the LinkedIn social copy: replace the HackerNoon reference with the Shapiro/StrongDM/Stripe framing the body now uses, so social and body agree.

Verified: Fanuc claims accurate (Mount Fuji, robots build robots, 30-day unattended runs); Shapiro/Glowforge primary post intact; $1,000/day per engineer-equivalent intact (Willison). All eight internal links and four external links return 200.
The opener references Fanuc as the canonical lights-out factory, but many readers will not know it. The IMechE piece names the Oshino site specifically and confirms that robots build robots and run unattended, which is the framing the post uses.
Claude finished @dirien's task in 40s — View job

PR Review
Confidence: MEDIUM · 🤖 AI-suspect (prose patterns + slash-command workflow) · scrutiny: heightened · 5 Pulumi claims checked · 1 contradicted · external citations unverifiable (WebFetch blocked) · contributor: @dirien (external) · risk: minor

The Dark Factory Pattern for Infrastructure: Running Pulumi Lights-Out
Check: heading structure, Mermaid diagram renders, all 6 internal links navigate correctly.

Issues introduced by this PR
🚨 Needs your eyes — Neo "three modes" claim
The current
The description says the post "Cross-links Engin's prior posts on Agent Sprawl, Grounded AI", and the test plan lists them. The description also mentions "a minimal Python orchestrator using the Automation API" (code block). That was also removed during rewrites; the final post has no code examples.

🔬 Fact-check triage

🚨 Needs your eyes
✅ Verified
Code verification
No fenced code blocks remain in the final version (the original Python example was removed during rewrites). Nothing to test.

Trivial fix candidates
Overall assessment
Issues found — one high-confidence contradiction (Neo modes) blocks clean approval. The rest (long paragraph, stale description) are minor. The post itself is well-written and structurally solid; fixing the Neo claim is the only substantive blocker.

Recommendations
Social Media Review — content/blog/dark-factory-pattern-pulumi-autonomous-iac/index.md

X — PASS
LinkedIn — PASS
Bluesky — PASS

Suggestions (advisory): these are stylistic notes; they don't block the post.

X

Bluesky
To apply these suggestions, comment:
To re-run the social media review after updates, comment:
🟰 Skipping social media review — 1 blog file(s) changed, but social block unchanged. The verdict from the prior run still applies.
@claude is there a preview link that is up to date?
Claude finished @adamgordonbell's task in 24s — View job

Yes — the latest preview was generated for the most recent commit on this branch ("Trim filler in the pulumi preview paragraph opener").
I like the concept of the dark factory! There is part of me that wants more caution in the article. I assume very few orgs have the maturity to build this out right now, so I kind of want the Pulumi version of "this is not medical advice, please consult your doctor" :)

The "what goes wrong" section presents problems, but then has one-sentence solutions for each. If it were my article, I would say: here is a path towards this, each step gets harder, and you may never safely attain the 'dark factory'; doing so is high risk, but each step towards it can force you to become more efficient and consider all your human bottlenecks.

Also, the LinkedIn post I like. The Twitter/Bluesky I think should be more like: "Stripe ships 1,300 AI-authored PRs a week. The pattern behind it has a name: the dark factory. The infrastructure factory is different. Here’s what happens when the factory floor is your Pulumi state graph." So changing "What it looks" to "Here's what happens" sounds better to me.
adamgordonbell left a comment
I think the tweet text change and some caveat to approach with caution would be good. But approving and leaving to your judgement.
Three pieces of feedback applied:
- Sharpen the Level 5 framing in "What a dark factory actually is". Add "Most teams won't get there safely, and probably shouldn't try to" and reframe the design question as "safe at all". This names what the post had been implying.
- Add a framing paragraph to "What could go wrong" acknowledging these are unsolved problems with partial mitigations, not the easy fixes the prior rhythm implied. "Any team running level 5 should expect to eat one or two of these in the first year."
- Add a journey paragraph to "Where to start" acknowledging that Phase 1 alone is the win for most teams and full Level 5 may stay out of reach indefinitely. The path itself forces useful work (clearer specs, named bottlenecks, the deterministic gates humans had been running in their heads).

Social copy:
- X and Bluesky now use Adam's suggested "Here's what happens" framing, with "over a thousand" instead of "1,300" to match the body's primary-source phrasing.
- LinkedIn "1,300" replaced with "merge over a thousand" for the same body consistency.

Each addition is concrete and substantive (no manufactured triads, no AI-pattern fluff). Humanizer-scanned before commit.
Social Media Review — content/blog/dark-factory-pattern-pulumi-autonomous-iac/index.md

X — PASS
LinkedIn — PASS
Bluesky — PASS

Suggestions (advisory): these are stylistic notes; they don't block the post.
To apply these suggestions, comment:
To re-run the social media review after updates, comment:
📆 Social media posts scheduled:
…step S41's fresh-fixture battery showed blog/claims-heavy PR reviews aren't single-run-reproducible at the 🚨 tier — claim *discovery* is model-generated and varies run to run, so one run catches a real blocking finding the next misses (pulumi#18771 StrongDM misattribution, pulumi#18743 p5.48xlarge price vs Llama-3.3 nonexistence). Discovery is the weak link; verification is fine. This lifts claim extraction out of the variable Opus review into a pre-step:

- extract-claims.py — Layer A: deterministic regex floor (numbers, version pins, temporal words, source attributions, URLs, named-entity/spec claims, positioning/comparison triggers) over the whole diff. Guarantees the concrete claims can never be silently dropped. safe_main().
- extract-claims-llm.py — Layer B: two redundant, differently-framed Sonnet passes (atomic/per-sentence and holistic/paragraph), direct /v1/messages call with temperature 0 + forced extract_claims tool schema, one call per changed content/**/*.md file, prompt-cached system prompt. Prompted with the new references/claim-extraction.md (taxonomy + the "what is NOT a claim" list incl. the third-party-attribution flip + framing rule + ≥10 worked examples, the S41 misses among them). safe_main(); degrades gracefully.
- merge-claims.py — unions the three layers into .candidate-claims.json: dedup by overlapping line range + token overlap, anchor LLM line ranges to file content, found_by provenance, pass-count → confidence.
- claude-code-review.yml — wires the four pre-steps; timeout-minutes: 25 on the claude-review job (S41 saw a review hang ~18 min).
- fact-check.md — .candidate-claims.json is the claim *floor* the review MUST verify (MAY add more); the in-review 4-way claim-finder dispatch retires on the normal path (the pre-step subsumes it), kept as a degraded-pre-step fallback; frontmatter-sweep scope pinned to frontmatter-validate.py's new per-file frontmatter_keys (fixes the pulumi#18745-r2 social.* omission).
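The Layer A "deterministic regex floor" described above can be sketched roughly as follows. This is a minimal illustration, not the real extract-claims.py: the pattern set, the Claim shape, and the function names are assumptions for the sketch.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of a regex "floor" for claim extraction: each pattern
# flags one category of concrete, checkable claim. Being deterministic, the
# same diff always yields the same candidate set, unlike an LLM pass.
PATTERNS = {
    "number": re.compile(r"\b\d[\d,.]*\s*(?:%|lines|PRs|ms|GB)?\b"),
    "version_pin": re.compile(r"\b\d+\.\d+(?:\.\d+)?\b"),
    "temporal": re.compile(r"\b(?:today|last year|in 20\d\d|per week|per day)\b", re.I),
    "attribution": re.compile(r"\b(?:according to|as \w+ put it|cited by)\b", re.I),
    "url": re.compile(r"https?://\S+"),
}

@dataclass
class Claim:
    category: str
    line_no: int
    text: str

def extract_claims(diff_lines):
    """Return every line that trips at least one pattern, tagged by category.

    A line can yield multiple claims (one per matching category); purely
    narrative lines yield none, which is exactly the floor's guarantee.
    """
    claims = []
    for i, line in enumerate(diff_lines, start=1):
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                claims.append(Claim(category, i, line.strip()))
    return claims
```

Run against a two-line sample, a sentence with a figure trips the "number" category while a purely narrative sentence yields nothing, which is the property the commit message relies on: concrete claims cannot be silently dropped.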
- validate-pinned.py (schema v6→v7) — candidate-claims-coverage rule fails the review (soft-flooring loudly) if a candidate claim has no overlapping trail record; trail-bucket-consistency relaxed for pure-layout/0-claim PRs (pulumi#18857-r1 over-trigger).
- test_extract_claims.py + testdata/ — synthetic per-category tests + the 3 real S41-fixture diffs (assert the dropped claims surface) + merge-claims dedup/anchor/provenance tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
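The candidate-claims-coverage rule amounts to an interval-overlap check between candidate claims and the review's verification trail. A minimal sketch, assuming each record carries a file plus an inclusive (start, end) line range; the real v7 schema presumably carries more fields:

```python
def ranges_overlap(a, b):
    """True when two inclusive (start, end) line ranges share at least one line."""
    return a[0] <= b[1] and b[0] <= a[1]

def uncovered_claims(candidate_claims, trail_records):
    """Return candidate claims with no overlapping trail record in the same file.

    candidate_claims / trail_records: lists of dicts with "file" and "lines"
    keys, where "lines" is an inclusive (start, end) line-number tuple.
    A non-empty result is what the coverage rule turns into a loud failure.
    """
    uncovered = []
    for claim in candidate_claims:
        covered = any(
            record["file"] == claim["file"]
            and ranges_overlap(record["lines"], claim["lines"])
            for record in trail_records
        )
        if not covered:
            uncovered.append(claim)
    return uncovered
```

If the result is non-empty, the rule fails the review rather than letting a candidate claim slip through unverified, which is the "soft-flooring loudly" behavior described above.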
…pulumi#18771)" This reverts commit e1eed1f.
pulumi#18771)" This reverts commit f0c9bef.
Summary
AGENTS.md + CrossGuard error-as-instruction → spec-driven with holdouts → selective auto-apply on measurable gates → lights-out), and includes the insight that pulumi preview is structurally a holdout artifact.

Test plan
- make lint passes (markdown lint + prettier)
- make format clean
- make serve renders the post (HTTP 200), title correct, mermaid diagram renders, ladder table renders with all 6 rows
- Internal links: /automation/, /docs/esc/, /docs/iac/using-pulumi/crossguard/, /docs/pulumi-cloud/deployments/, /product/neo/, /blog/neo-levels-up/, /blog/pulumi-neo-now-supports-agentsmd/
- Frontmatter: title, allow_long_title, date, meta_desc (≤160), meta_image, feature_image, authors, tags, three-channel social block (X 235/255, LinkedIn 894/2950, Bluesky 255/300)
- meta.png and feature.png rendered via /blog-meta-image (platform template)
- <!--more--> excerpt break in place
- /humanizer, /docs-review, /social-media-review