Investigate-bot fix gate, preview self-consistency, and bot identity fixes #1259
Derive the pkg.pr.new publish set from globs (./packages/*, ./packages/plugins/*) instead of a hardcoded list. pkg.pr.new skips private packages, so the test fixtures are excluded automatically while every public package, including the previously missing registry-client, plugin-types and auth-atproto, now gets a preview. This keeps preview installs self-consistent: emdash's preview references sibling packages via workspace:*, and pkg.pr.new can only rewrite those to matching preview URLs when the siblings are published in the same run. Omitting one made the dep fall back to npm's released version, which broke when source had drifted (e.g. registry-client's ./env export added without a release). Also pin pkg-pr-new as a root devDependency and run it via pnpm exec rather than pnpm dlx, per the tool's CI guidance.
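The fallback described in this commit message can be modelled roughly as follows. This is a hypothetical sketch, not pkg.pr.new's implementation: `resolveWorkspaceDeps` and the `Resolution` type are names invented here for illustration, and the only behaviour assumed is what the message states (a `workspace:*` sibling resolves to a preview only when it is published in the same run).

```typescript
// Hypothetical model of how a workspace:* sibling resolves in a preview install.
// Not pkg.pr.new's real code; names and shapes are illustrative only.
type Resolution = { name: string; source: "preview" | "npm" };

function resolveWorkspaceDeps(
  deps: string[],
  publishedInRun: Set<string>,
): Resolution[] {
  // A sibling published in the same run can be rewritten to its preview URL;
  // anything else falls back to the last version released on npm.
  return deps.map((name): Resolution => ({
    name,
    source: publishedInRun.has(name) ? "preview" : "npm",
  }));
}

// With the old hardcoded list, registry-client was not in the publish set.
const oldRun = new Set(["emdash", "@emdash-cms/auth"]);
const resolved = resolveWorkspaceDeps(
  ["@emdash-cms/auth", "@emdash-cms/registry-client"],
  oldRun,
);
console.log(resolved);
```

Under this model, the glob-based publish set fixes the problem simply by making every public sibling a member of `publishedInRun`.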
Two orchestrator-workflow fixes:
- Commit identity was emdash-bot[bot] / bot@emdashcms.com, but the GitHub App slug is emdashbot. Use emdashbot[bot] and the users.noreply.github.com attribution email, matching every other workflow. This also makes the github.actor guards in auto-format and auto-extract match the bot's own commits.
- intended-behavior outcomes now get a new triage/by-design label instead of sharing triage/reproduced with confirmed bugs. The two need opposite follow-up (likely close vs. needs a fix), so they should not share a label.
The fix stage only ran at verify=bug AND diagnose.confidence=high, where high meant 'mechanical, one-line, no ambiguity'. That conflated two independent questions: is the root cause certain, and is the fix obvious? Real, fixable bugs (e.g. #1178, #1199) were parked at triage/reproduced because one clearly-correct fix existed among several shapes, which forced a medium rating.

Decouple the axes:
- confidence now rates root-cause certainty only.
- a new fixApproach (mechanical | clear-best-option | needs-design-decision) rates fix clarity.
- the gate becomes verdict=bug AND confidence!=low AND fixApproach!=needs-design-decision.

Diagnose also now emits a concrete proposedFix (always), which feeds the fix stage as its spec and doubles as the maintainer's starting point when the fix is deferred.

The fix stage runs on a separate, cheaper agent (kimi-k2.6) in its own session: the reasoning is already done, so it is guided implementation. It shares the on-disk checkout, so staged edits still reach the orchestrator. Configurable via FLUE_FIX_MODEL.

Reframe the skill cost model: the output is a reporter-verified candidate branch a maintainer reviews, not a merge, so a clear, test-backed fix is worth attempting even when it is more than a one-liner.

Update diagnose/verify/fix skills, _INVESTIGATE.md and README; delete the stale PLAN.md.
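The new gate and the model override can be sketched as two small functions. This is a minimal illustration under stated assumptions, not the code in `.flue/workflows/investigate.ts`: the type names, `shouldAttemptFix`, and `pickFixModel` are hypothetical, the enum values are taken from the commit message, and the exact model identifier string the runtime expects is assumed.

```typescript
// Hypothetical types; the values mirror the commit message, the names do not
// necessarily match the real schema.
type Verdict = "bug" | "intended-behavior" | "unclear";
type Confidence = "low" | "medium" | "high";
type FixApproach = "mechanical" | "clear-best-option" | "needs-design-decision";

interface Diagnosis {
  confidence: Confidence;   // root-cause certainty only
  fixApproach: FixApproach; // fix clarity, rated independently
  proposedFix: string;      // always present, even when the fix is deferred
}

// Gate: verdict=bug AND confidence!=low AND fixApproach!=needs-design-decision.
function shouldAttemptFix(verdict: Verdict, d: Diagnosis): boolean {
  return (
    verdict === "bug" &&
    d.confidence !== "low" &&
    d.fixApproach !== "needs-design-decision"
  );
}

// Assumed default identifier; FLUE_FIX_MODEL overrides it when set and non-empty.
function pickFixModel(env: Record<string, string | undefined>): string {
  return env.FLUE_FIX_MODEL || "kimi-k2.6";
}

// A medium-confidence bug with one clearly best fix now passes the gate,
// whereas the old confidence=high rule would have parked it.
const example: Diagnosis = {
  confidence: "medium",
  fixApproach: "clear-best-option",
  proposedFix: "Guard the missing export before importing it.",
};
console.log(shouldAttemptFix("bug", example), pickFixModel({}));
```

Passing the environment in as a parameter keeps the sketch free of Node-specific globals; real code would more likely read `process.env` directly.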
PR template validation failed
Please fix the following issues by editing your PR description:
See CONTRIBUTING.md for the full contribution policy.
Scope check
This PR changes 526 lines across 11 files. Large PRs are harder to review and more likely to be closed without review. If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs. See CONTRIBUTING.md for contribution guidelines.
    # source has drifted (e.g. a new exports subpath added without a release).
    # pkg.pr.new skips `private: true` packages automatically, so the test
    # fixtures under packages/plugins/* are excluded without enumerating them.
    - run: pnpm exec pkg-pr-new publish --pnpm './packages/*' './packages/plugins/*'
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | docs | e502307 | Jun 01 2026, 10:53 AM |
| ✅ Deployment successful! | emdash-playground | e502307 | Jun 01 2026, 10:54 AM |
| ✅ Deployment successful! | emdash-demo-cache | e502307 | Jun 01 2026, 10:54 AM |
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/auth-atproto
@emdash-cms/blocks
@emdash-cms/cloudflare
@emdash-cms/contentful-to-portable-text
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/plugin-cli
@emdash-cms/plugin-types
@emdash-cms/registry-client
@emdash-cms/registry-lexicons
@emdash-cms/sandbox-workerd
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-field-kit
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
commit: |
Pull request overview
This PR updates the repo’s bot/CI tooling to (1) publish self-consistent preview releases across all public workspace packages, and (2) improve the investigate-bot pipeline by correcting bot identity, splitting “by design” triage outcomes from confirmed bugs, and broadening/clarifying the automated-fix gate with a cheaper fix-stage model.
Changes:
- Publish preview releases using package globs (and run `pkg-pr-new` via a pinned devDependency + `pnpm exec`) so all public workspace packages land in the same preview run.
- Fix investigate workflow git identity to match the GitHub App slug, and introduce a dedicated `triage/by-design` label for intended behavior outcomes.
- Decouple “root-cause confidence” from “fix clarity” in `.flue`, add `fixApproach` + always-present `proposedFix`, and run the fix stage in a separate cheaper model session.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `pnpm-lock.yaml` | Adds pkg-pr-new lock entries to support running via pnpm exec. |
| `package.json` | Pins pkg-pr-new as a root devDependency for CI usage. |
| `.github/workflows/preview-releases.yml` | Switches to glob-based publishing and pnpm exec pkg-pr-new. |
| `.github/workflows/investigate.yml` | Updates intended-behavior labeling + bot git identity. |
| `.flue/workflows/investigate.ts` | Implements new fix gate (confidence + fixApproach), adds proposedFix, and runs fix on a separate agent/session. |
| `.flue/skills/verify/SKILL.md` | Updates gate description to match the new fix criteria. |
| `.flue/skills/fix/SKILL.md` | Reframes fix stage as guided implementation based on diagnose’s proposedFix. |
| `.flue/skills/diagnose/SKILL.md` | Defines the new two-axis output (confidence vs fixApproach) and requires proposedFix. |
| `.flue/skills/_INVESTIGATE.md` | Updates investigate reference doc to the new gating logic and outputs. |
| `.flue/README.md` | Documents the updated pipeline and label meanings, including triage/by-design. |
| `.flue/PLAN.md` | Removes obsolete design plan doc. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
    # `triage/by-design` (not `triage/reproduced`): the bot
    # reproduced the described behavior but believes it is
    # intentional. This is a "likely close / convert to discussion"
    # signal, the opposite follow-up from a confirmed bug, so it
    # gets its own label rather than sharing triage/reproduced.
    gh issue edit "$ISSUE_NUMBER" --repo emdash-cms/emdash --remove-label "triage/reproducing" --add-label "triage/by-design"
Good catch on both. Fixed in e502307:
- Added `triage/by-design` to the reproducing-transition cleanup loop in `investigate.yml`, so a re-triggered by-design issue no longer ends up with stacked `triage/*` labels.
- Added `triage/by-design` -> `By design` to `STATE_BY_LABEL` and `PRECEDENCE` in `triage-project-sync.yml` (ranked just below `triage/reproduced`).
I also created the By design single-select option on Project #3's Triage State field, so by-design issues now sync to a real board state rather than being skipped.
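A label-to-state lookup with a precedence list might look roughly like the sketch below. Only the `triage/by-design` -> `By design` mapping and its rank just below `triage/reproduced` come from this thread; the other board state names, the position of `triage/reproducing`, and the `resolveState` helper are assumptions made for illustration, and the real logic lives in a workflow file rather than in TypeScript.

```typescript
// Hypothetical mirror of STATE_BY_LABEL / PRECEDENCE from triage-project-sync.yml.
const STATE_BY_LABEL: Record<string, string> = {
  "triage/reproduced": "Reproduced",   // assumed board option name
  "triage/by-design": "By design",     // from the review reply
  "triage/reproducing": "Reproducing", // assumed board option name
};

// Highest-priority label first; by-design sits just below reproduced.
const PRECEDENCE: string[] = [
  "triage/reproduced",
  "triage/by-design",
  "triage/reproducing", // assumed position
];

// Returns the board state for the highest-precedence label present,
// or undefined when no known triage label is set (the sync would skip).
function resolveState(labels: string[]): string | undefined {
  const winner = PRECEDENCE.find((label) => labels.indexOf(label) !== -1);
  return winner === undefined ? undefined : STATE_BY_LABEL[winner];
}

console.log(resolveState(["bug", "triage/by-design"]));
```

Before the mapping was added, a by-design issue would have fallen through to the `undefined` case in a lookup of this shape, which matches the "skipped" behaviour described above.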
This is a sensible, well-scoped PR that fixes three real problems: preview releases were missing packages because of a hardcoded list; the bot committed under the wrong GitHub App identity; and the fix gate conflated "certainty of cause" with "clarity of fix", which starved the fix stage of valid bugs. The decoupling into confidence + fixApproach + proposedFix is the right conceptual split, and running the implementation stage on a cheaper model with a fresh session is a clean way to save cost without losing correctness.
I checked the following:
- Workflow YAMLs for token hygiene, label transitions, git identity, and branch push logic.
- `.flue/workflows/investigate.ts` for schema changes, session wiring, and fix-gate logic.
- Skill markdown files for consistency with the new gate conditions.
- `package.json` and `pnpm-lock.yaml` for dependency hygiene.
Headline conclusion: One real bug in label management; everything else is clean.
Still needs fixing
- Missing `triage/by-design` label cleanup in `investigate.yml`. The transition step that strips old labels before adding `triage/reproducing` does not include the new `triage/by-design` label. If an issue was previously triaged as `by-design` and a maintainer re-triggers it, the stale label will remain alongside the new terminal label.
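The stacking problem can be illustrated with a small model of the transition step. This is a sketch under assumptions: the actual cleanup is a shell loop in `investigate.yml`, and `STALE_ON_RETRIGGER` and `transitionToReproducing` are hypothetical names; the list contains only the labels named in this review.

```typescript
// Hypothetical list of terminal labels stripped when an issue is re-triggered.
// The review's point is that triage/by-design must be a member of this list.
const STALE_ON_RETRIGGER: string[] = ["triage/reproduced", "triage/by-design"];

// Returns the label set after moving an issue back to triage/reproducing.
function transitionToReproducing(current: string[]): string[] {
  const kept = current.filter((label) => STALE_ON_RETRIGGER.indexOf(label) === -1);
  return kept.indexOf("triage/reproducing") === -1
    ? [...kept, "triage/reproducing"]
    : kept;
}

// A previously by-design issue ends up with exactly one triage/* label.
console.log(transitionToReproducing(["bug", "triage/by-design"]));
```

If `triage/by-design` were left out of the list, the same call would return both triage labels, which is the stale-label case described in the bullet above.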
Clean / no action needed
- The preview globs (`./packages/*`, `./packages/plugins/*`) correctly include all public packages and rely on `pkg-pr-new` skipping private ones.
- The `pkg-pr-new` dependency is correctly added as a root `devDependency`; using `pnpm exec` instead of `pnpm dlx` is appropriate.
- Bot identity (`emdashbot[bot]` / `emdashbot[bot]@users.noreply.github.com`) now matches the App slug used in other workflows.
- The valibot schema additions (`fixApproach`, `proposedFix`) are wired through correctly to both the fix session args and the fallback comment notes.
- The `notAttemptedReason` ternary in `investigate.ts` is correct: `intended-behavior` returns earlier, so the `"bug"` branch at that point can only mean `unclear`.
- The `fixAgent` gets its own `init`/`session` and shares the same sandbox cwd, so staged edits reach the orchestrator as intended.
Unverified but noted
- `pnpm-lock.yaml` contains a small amount of unrelated drift (`workerd` snapshot downgrade, `vite` patch bump in `@vitest/ui` snapshot). This is harmless lockfile churn from adding the new dependency, but it adds noise to the diff. It would be cleaner if the lockfile only contained the `pkg-pr-new` addition and its transitive deps.
Addresses review on #1259.

The new triage/by-design label was missing from two places:
- investigate.yml's reproducing-transition cleanup loop, so a re-triggered by-design issue could carry triage/by-design alongside triage/reproducing.
- triage-project-sync.yml's STATE_BY_LABEL/PRECEDENCE, so by-design issues resolved to no state and stopped syncing to the board.

Maps triage/by-design -> 'By design' board option (terminal verdict, ranked just below reproduced in precedence). The 'By design' single-select option must be added to Project #3's 'Triage State' field; until then the sync warns-and-skips rather than failing.
What does this PR do?
Three related improvements to the bot/CI tooling. Each is a separate commit.
1. Preview releases publish all public packages (`ci:`)

The `pkg-pr-new publish` call used a hardcoded list of 16 paths that omitted `registry-client`, `plugin-types`, and `auth-atproto`. Because pkg.pr.new only rewrites `workspace:*` deps to preview URLs for packages it publishes in the same run, `emdash`'s preview pinned `@emdash-cms/registry-client` to its plain version and resolved it from npm instead. When source had drifted from the last npm release (`registry-client`'s `./env` export was added in #1238 without a version bump), preview installs broke with `Package subpath './env' is not defined`.

The publish set is now derived from globs (`./packages/*`, `./packages/plugins/*`); pkg.pr.new skips private packages, so test fixtures are excluded automatically and all 23 public packages get a self-consistent preview. This change also pins `pkg-pr-new` as a root devDependency and uses `pnpm exec` instead of `pnpm dlx`, per the tool's CI guidance.

2. Bot identity + triage label split (`fix(ci):`)

- Commit identity was `emdash-bot[bot]` / `bot@emdashcms.com`, but the GitHub App slug is `emdashbot`. Corrected to `emdashbot[bot]` / `emdashbot[bot]@users.noreply.github.com`, matching every other workflow (and making the `github.actor` guards in `auto-format` / `auto-extract` actually match the bot's commits).
- `intended-behavior` outcomes shared `triage/reproduced` with confirmed bugs, despite needing opposite follow-up. They now get a dedicated `triage/by-design` label (created in the repo).

3. Broaden the investigate-bot fix gate; run fix on a cheaper model (`feat(flue):`)

The fix stage ran only at `verify=bug AND diagnose.confidence=high`, where `high` was defined as "mechanical, one line, no ambiguity." That conflated two independent questions (is the cause certain? and is the fix obvious?), so confidently-diagnosed bugs with one clearly-correct fix among several shapes (e.g. #1178, #1199) were forced to `medium` and parked.

- `confidence` rates root-cause certainty only; a new `fixApproach` (`mechanical` | `clear-best-option` | `needs-design-decision`) rates fix clarity. Gate becomes `verdict=bug AND confidence!=low AND fixApproach!=needs-design-decision`.
- `proposedFix`: diagnose now always emits a concrete plan, fed to the fix stage as its spec and surfaced to the maintainer when the fix is deferred.
- The fix stage runs on a separate, cheaper `kimi-k2.6` agent in its own session (reasoning is already done; it's guided implementation), sharing the on-disk checkout so staged edits still reach the orchestrator. Overridable via `FLUE_FIX_MODEL`.

Type of change

Checklist

- `pnpm typecheck` passes (`.flue` workspace; `flue build` also succeeds)
- `pnpm lint` passes (no diagnostics in changed files)
- `pnpm test` passes (or targeted tests for my change): n/a; CI-workflow and agent-prompt changes, no unit-testable surface
- `pnpm format` has been run
- `.flue` changed

AI-generated code disclosure
Try this PR
Open a fresh playground →
A full working EmDash site, deployed from this branch. Each visit gets its own session-scoped sandbox: no login needed and no shared state. Try the admin, edit content, hit the public site.
Tracks `feat/investigate-bot-improvements`. Updated automatically when the playground redeploys.