Investigate-bot fix gate, preview self-consistency, and bot identity fixes #1259

Merged
ascorbic merged 4 commits into main from feat/investigate-bot-improvements, Jun 1, 2026

Collaborator

@ascorbic ascorbic commented Jun 1, 2026

What does this PR do?

Three related improvements to the bot/CI tooling. Each is a separate commit.

1. Preview releases publish all public packages (ci:)

The pkg-pr-new publish call used a hardcoded list of 16 paths that omitted registry-client, plugin-types, and auth-atproto. Because pkg.pr.new only rewrites workspace:* deps to preview URLs for packages it publishes in the same run, emdash's preview pinned @emdash-cms/registry-client to its plain version and resolved it from npm instead. When source had drifted from the last npm release (registry-client's ./env export was added in #1238 without a version bump), preview installs broke with Package subpath './env' is not defined.

The publish set is now derived from globs (./packages/*, ./packages/plugins/*); pkg.pr.new skips private packages, so test fixtures are excluded automatically and all 23 public packages get a self-consistent preview. The change also pins pkg-pr-new as a root devDependency and uses pnpm exec instead of pnpm dlx, per the tool's CI guidance.
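A minimal sketch of the failure mode described above, assuming a simplified in-memory model of the workspace. The `WorkspacePackage` shape, the function names, and the `fixture-plugin` entry are hypothetical illustrations; they are not taken from the repo or from pkg.pr.new itself.

```typescript
// Hypothetical, simplified view of a workspace package.
interface WorkspacePackage {
  name: string;
  private: boolean;
  // Names of sibling packages referenced via workspace:*.
  workspaceDeps: string[];
}

// For each published package, list the sibling deps that are NOT part of the
// same publish run. Those are the deps that cannot be rewritten to preview
// URLs and would fall back to the released npm version.
function findUnpublishedSiblings(
  all: WorkspacePackage[],
  publishedNames: Set<string>,
): Map<string, string[]> {
  const gaps = new Map<string, string[]>();
  for (const pkg of all) {
    if (!publishedNames.has(pkg.name)) continue;
    const missing = pkg.workspaceDeps.filter((dep) => !publishedNames.has(dep));
    if (missing.length > 0) gaps.set(pkg.name, missing);
  }
  return gaps;
}

// Models the glob-derived publish set: every non-private package.
function publicPackageNames(all: WorkspacePackage[]): Set<string> {
  return new Set(all.filter((p) => !p.private).map((p) => p.name));
}

const workspace: WorkspacePackage[] = [
  { name: "emdash", private: false, workspaceDeps: ["@emdash-cms/registry-client"] },
  { name: "@emdash-cms/registry-client", private: false, workspaceDeps: [] },
  { name: "fixture-plugin", private: true, workspaceDeps: [] }, // hypothetical test fixture
];

// A hardcoded list that forgets registry-client leaves a gap...
const gapsBefore = findUnpublishedSiblings(workspace, new Set(["emdash"]));
// ...while deriving the set from all public packages closes it.
const gapsAfter = findUnpublishedSiblings(workspace, publicPackageNames(workspace));
```

With the hardcoded set, `gapsBefore` reports `@emdash-cms/registry-client` as an unpublished sibling of `emdash`; with the derived set, `gapsAfter` is empty and the private fixture is still excluded.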

2. Bot identity + triage label split (fix(ci):)

  • The investigate workflow committed as emdash-bot[bot] / bot@emdashcms.com, but the GitHub App slug is emdashbot. Corrected to emdashbot[bot] / emdashbot[bot]@users.noreply.github.com, matching every other workflow (and making the github.actor guards in auto-format/auto-extract actually match the bot's commits).
  • intended-behavior outcomes shared triage/reproduced with confirmed bugs, despite needing opposite follow-up. They now get a dedicated triage/by-design label (created in the repo).
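As a rough illustration of the identity fix, the sketch below derives both strings from the App slug and shows why an exact-match guard never matched the old name. `botIdentity` and `isBotActor` are hypothetical helpers, and the guard is an assumption about how the auto-format/auto-extract checks compare names; the real workflows are YAML.

```typescript
interface BotIdentity {
  name: string;
  email: string;
}

// Derive the commit identity from the GitHub App slug, following the
// format quoted in this PR: "<slug>[bot]" and
// "<slug>[bot]@users.noreply.github.com".
function botIdentity(appSlug: string): BotIdentity {
  const name = `${appSlug}[bot]`;
  return { name, email: `${name}@users.noreply.github.com` };
}

// Assumed shape of an actor guard: an exact string comparison against the
// bot name derived from the slug.
function isBotActor(actor: string, appSlug: string): boolean {
  return actor === botIdentity(appSlug).name;
}

const identity = botIdentity("emdashbot");
// The old hyphenated name does not match the slug-derived name.
const oldNameMatches = isBotActor("emdash-bot[bot]", "emdashbot");
const newNameMatches = isBotActor(identity.name, "emdashbot");
```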

3. Broaden the investigate-bot fix gate; run fix on a cheaper model (feat(flue):)

The fix stage ran only at verify=bug AND diagnose.confidence=high, where high was defined as "mechanical, one line, no ambiguity." That conflated two independent questions — is the cause certain? and is the fix obvious? — so confidently-diagnosed bugs with one clearly-correct fix among several shapes (e.g. #1178, #1199) were forced to medium and parked.

  • Decouple the axes: confidence rates root-cause certainty only; a new fixApproach (mechanical | clear-best-option | needs-design-decision) rates fix clarity. Gate becomes verdict=bug AND confidence!=low AND fixApproach!=needs-design-decision.
  • proposedFix: diagnose now always emits a concrete plan, fed to the fix stage as its spec and surfaced to the maintainer when the fix is deferred.
  • Cheaper fix model: the fix stage runs on a separate kimi-k2.6 agent in its own session (reasoning is already done; it's guided implementation), sharing the on-disk checkout so staged edits still reach the orchestrator. Overridable via FLUE_FIX_MODEL.
  • Reframed stakes: skills now treat the output as a reporter-verified candidate branch a maintainer reviews, not a merge.
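The old and new gates can be sketched as two predicates. This is an illustration under assumptions: the type and function names are hypothetical plain TypeScript rather than the schema used in .flue/workflows/investigate.ts, and the exact model identifier string passed to the agent may differ from the bare kimi-k2.6 shown here.

```typescript
type Verdict = "bug" | "intended-behavior" | "unclear";
type Confidence = "low" | "medium" | "high";
type FixApproach = "mechanical" | "clear-best-option" | "needs-design-decision";

interface Diagnosis {
  confidence: Confidence; // root-cause certainty only
  fixApproach: FixApproach; // fix clarity
  proposedFix: string; // always emitted by diagnose
}

// Old gate: a single axis, where "high" also implied a one-line fix.
function oldGate(verdict: Verdict, d: Diagnosis): boolean {
  return verdict === "bug" && d.confidence === "high";
}

// New gate: verdict=bug AND confidence!=low AND
// fixApproach!=needs-design-decision.
function shouldAttemptFix(verdict: Verdict, d: Diagnosis): boolean {
  return (
    verdict === "bug" &&
    d.confidence !== "low" &&
    d.fixApproach !== "needs-design-decision"
  );
}

// The fix model defaults to the cheaper agent unless FLUE_FIX_MODEL is set.
// The environment is passed in so the sketch does not depend on a runtime.
function resolveFixModel(env: Record<string, string | undefined>): string {
  return env["FLUE_FIX_MODEL"] ?? "kimi-k2.6";
}

// A bug with one clearly-correct fix among several shapes: parked by the
// old gate, attempted by the new one.
const parked: Diagnosis = {
  confidence: "medium",
  fixApproach: "clear-best-option",
  proposedFix: "(concrete plan emitted by diagnose)",
};
const attemptedBefore = oldGate("bug", parked);
const attemptedNow = shouldAttemptFix("bug", parked);
```

Keeping the two axes as separate fields means a diagnosis can be certain about the cause and still defer the fix, which the single confidence rating could not express.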

Type of change

  • Chore (dependencies, CI, tooling)
  • Bug fix

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes (.flue workspace; flue build also succeeds)
  • pnpm lint passes (no diagnostics in changed files)
  • pnpm test passes (or targeted tests for my change) — n/a; CI-workflow and agent-prompt changes, no unit-testable surface
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable) — n/a
  • User-visible strings: n/a (no admin UI changes)
  • I have added a changeset (if this PR changes a published package) — n/a; only the root (private) package.json, workflows, and .flue changed

AI-generated code disclosure

  • This PR includes AI-generated code — model/tool: Claude Opus 4.8 (opencode)

Try this PR

Open a fresh playground →

A full working EmDash site, deployed from this branch. Each visit gets its own session-scoped sandbox: no login needed and no shared state. Try the admin, edit content, hit the public site.

Tracks feat/investigate-bot-improvements. Updated automatically when the playground redeploys.

ascorbic added 3 commits June 1, 2026 11:37
Derive the pkg.pr.new publish set from globs (./packages/*,
./packages/plugins/*) instead of a hardcoded list. pkg.pr.new skips
private packages, so the test fixtures are excluded automatically while
every public package, including the previously missing registry-client,
plugin-types and auth-atproto, now gets a preview.

This keeps preview installs self-consistent: emdash's preview references
sibling packages via workspace:*, and pkg.pr.new can only rewrite those
to matching preview URLs when the siblings are published in the same
run. Omitting one made the dep fall back to npm's released version,
which broke when source had drifted (e.g. registry-client's ./env
export added without a release).

Also pin pkg-pr-new as a root devDependency and run it via pnpm exec
rather than pnpm dlx, per the tool's CI guidance.

Two orchestrator-workflow fixes:

- Commit identity was emdash-bot[bot] / bot@emdashcms.com, but the
  GitHub App slug is emdashbot. Use emdashbot[bot] and the
  users.noreply.github.com attribution email, matching every other
  workflow. This also makes the github.actor guards in auto-format and
  auto-extract match the bot's own commits.
- intended-behavior outcomes now get a new triage/by-design label
  instead of sharing triage/reproduced with confirmed bugs. The two
  need opposite follow-up (likely close vs. needs a fix), so they
  should not share a label.

The fix stage only ran at verify=bug AND diagnose.confidence=high, where
high meant 'mechanical, one-line, no ambiguity'. That conflated two
independent questions: is the root cause certain, and is the fix
obvious? Real, fixable bugs (e.g. #1178, #1199) were parked at
triage/reproduced because one clearly-correct fix existed among several
shapes, which forced a medium rating.

Decouple the axes:
- confidence now rates root-cause certainty only.
- a new fixApproach (mechanical | clear-best-option | needs-design-decision)
  rates fix clarity.
- the gate becomes verdict=bug AND confidence!=low AND
  fixApproach!=needs-design-decision.

Diagnose also now emits a concrete proposedFix (always), which feeds the
fix stage as its spec and doubles as the maintainer's starting point
when the fix is deferred.

The fix stage runs on a separate, cheaper agent (kimi-k2.6) in its own
session: the reasoning is already done, so it is guided implementation.
It shares the on-disk checkout, so staged edits still reach the
orchestrator. Configurable via FLUE_FIX_MODEL.

Reframe the skill cost model: the output is a reporter-verified
candidate branch a maintainer reviews, not a merge, so a clear,
test-backed fix is worth attempting even when it is more than a
one-liner. Update diagnose/verify/fix skills, _INVESTIGATE.md and
README; delete the stale PLAN.md.

Copilot AI review requested due to automatic review settings June 1, 2026 10:39

changeset-bot Bot commented Jun 1, 2026

⚠️ No Changeset found

Latest commit: e502307

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Contributor

github-actions Bot commented Jun 1, 2026

PR template validation failed

Please fix the following issues by editing your PR description:

  • Check the "I have read CONTRIBUTING.md" checkbox.

See CONTRIBUTING.md for the full contribution policy.

Contributor

github-actions Bot commented Jun 1, 2026

Scope check

This PR changes 526 lines across 11 files. Large PRs are harder to review and more likely to be closed without review.

If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs.

See CONTRIBUTING.md for contribution guidelines.

@github-actions github-actions Bot added the review/needs-review label (No maintainer or bot review yet) Jun 1, 2026

# source has drifted (e.g. a new exports subpath added without a release).
# pkg.pr.new skips `private: true` packages automatically, so the test
# fixtures under packages/plugins/* are excluded without enumerating them.
- run: pnpm exec pkg-pr-new publish --pnpm './packages/*' './packages/plugins/*'

cloudflare-workers-and-pages Bot commented Jun 1, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! (View logs) | docs | e502307 | Jun 01 2026, 10:53 AM |

@github-actions github-actions Bot added review/awaiting-author (Reviewed; waiting on the author to respond) and removed review/needs-review (No maintainer or bot review yet) labels Jun 1, 2026

cloudflare-workers-and-pages Bot commented Jun 1, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! (View logs) | emdash-playground | e502307 | Jun 01 2026, 10:54 AM |

cloudflare-workers-and-pages Bot commented Jun 1, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! (View logs) | emdash-demo-cache | e502307 | Jun 01 2026, 10:54 AM |

pkg-pr-new Bot commented Jun 1, 2026

@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@1259

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@1259

@emdash-cms/auth-atproto

npm i https://pkg.pr.new/@emdash-cms/auth-atproto@1259

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@1259

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@1259

@emdash-cms/contentful-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/contentful-to-portable-text@1259

emdash

npm i https://pkg.pr.new/emdash@1259

create-emdash

npm i https://pkg.pr.new/create-emdash@1259

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@1259

@emdash-cms/plugin-cli

npm i https://pkg.pr.new/@emdash-cms/plugin-cli@1259

@emdash-cms/plugin-types

npm i https://pkg.pr.new/@emdash-cms/plugin-types@1259

@emdash-cms/registry-client

npm i https://pkg.pr.new/@emdash-cms/registry-client@1259

@emdash-cms/registry-lexicons

npm i https://pkg.pr.new/@emdash-cms/registry-lexicons@1259

@emdash-cms/sandbox-workerd

npm i https://pkg.pr.new/@emdash-cms/sandbox-workerd@1259

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@1259

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@1259

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@1259

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@1259

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@1259

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@1259

@emdash-cms/plugin-field-kit

npm i https://pkg.pr.new/@emdash-cms/plugin-field-kit@1259

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@1259

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@1259

commit: e502307

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the repo’s bot/CI tooling to (1) publish self-consistent preview releases across all public workspace packages, and (2) improve the investigate-bot pipeline by correcting bot identity, splitting “by design” triage outcomes from confirmed bugs, and broadening/clarifying the automated-fix gate with a cheaper fix-stage model.

Changes:

  • Publish preview releases using package globs (and run pkg-pr-new via a pinned devDependency + pnpm exec) so all public workspace packages land in the same preview run.
  • Fix investigate workflow git identity to match the GitHub App slug, and introduce a dedicated triage/by-design label for intended behavior outcomes.
  • Decouple “root-cause confidence” from “fix clarity” in .flue, add fixApproach + always-present proposedFix, and run the fix stage in a separate cheaper model session.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pnpm-lock.yaml | Adds pkg-pr-new lock entries to support running via pnpm exec. |
| package.json | Pins pkg-pr-new as a root devDependency for CI usage. |
| .github/workflows/preview-releases.yml | Switches to glob-based publishing and pnpm exec pkg-pr-new. |
| .github/workflows/investigate.yml | Updates intended-behavior labeling + bot git identity. |
| .flue/workflows/investigate.ts | Implements new fix gate (confidence + fixApproach), adds proposedFix, and runs fix on a separate agent/session. |
| .flue/skills/verify/SKILL.md | Updates gate description to match the new fix criteria. |
| .flue/skills/fix/SKILL.md | Reframes fix stage as guided implementation based on diagnose’s proposedFix. |
| .flue/skills/diagnose/SKILL.md | Defines the new two-axis output (confidence vs fixApproach) and requires proposedFix. |
| .flue/skills/_INVESTIGATE.md | Updates investigate reference doc to the new gating logic and outputs. |
| .flue/README.md | Documents the updated pipeline and label meanings, including triage/by-design. |
| .flue/PLAN.md | Removes obsolete design plan doc. |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment on lines +357 to +362
# `triage/by-design` (not `triage/reproduced`): the bot
# reproduced the described behavior but believes it is
# intentional. This is a "likely close / convert to discussion"
# signal, the opposite follow-up from a confirmed bug, so it
# gets its own label rather than sharing triage/reproduced.
gh issue edit "$ISSUE_NUMBER" --repo emdash-cms/emdash --remove-label "triage/reproducing" --add-label "triage/by-design"
Collaborator Author

Good catch on both. Fixed in e502307:

  • Added triage/by-design to the reproducing-transition cleanup loop in investigate.yml, so a re-triggered by-design issue no longer ends up with stacked triage/* labels.
  • Added triage/by-design -> By design to STATE_BY_LABEL and PRECEDENCE in triage-project-sync.yml (ranked just below triage/reproduced).

I also created the By design single-select option on Project #3's Triage State field, so by-design issues now sync to a real board state rather than being skipped.
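A sketch of how a label-to-state mapping with precedence can resolve a single board state, and why a label missing from the map leaves the issue without one. The names STATE_BY_LABEL and PRECEDENCE come from the reply above; everything else is assumed, including the "Reproduced" option name, the omission of other triage labels, and the resolution logic, since the real code lives in triage-project-sync.yml.

```typescript
// Label -> board option. "By design" is the option named in this PR;
// "Reproduced" is an assumed name for the existing option.
const STATE_BY_LABEL: Record<string, string> = {
  "triage/reproduced": "Reproduced",
  "triage/by-design": "By design",
};

// Highest precedence first; by-design ranks just below reproduced.
const PRECEDENCE: readonly string[] = ["triage/reproduced", "triage/by-design"];

// Pick the state for the highest-precedence label present on the issue.
// Returns undefined when no known triage label is present, which models the
// "resolved to no state" case for an unmapped label.
function resolveState(labels: readonly string[]): string | undefined {
  const winner = PRECEDENCE.find((label) => labels.includes(label));
  return winner === undefined ? undefined : STATE_BY_LABEL[winner];
}

const byDesign = resolveState(["area/ci", "triage/by-design"]);
const stacked = resolveState(["triage/by-design", "triage/reproduced"]);
const untriaged = resolveState(["area/ci"]);
```

Here `byDesign` resolves to "By design", `stacked` resolves to "Reproduced" because it ranks higher, and `untriaged` is undefined.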

Contributor

@emdashbot emdashbot Bot left a comment

This is a sensible, well-scoped PR that fixes three real problems: preview releases were missing packages because of a hardcoded list; the bot committed under the wrong GitHub App identity; and the fix gate conflated "certainty of cause" with "clarity of fix", which starved the fix stage of valid bugs. The decoupling into confidence + fixApproach + proposedFix is the right conceptual split, and running the implementation stage on a cheaper model with a fresh session is a clean way to save cost without losing correctness.

I checked the following:

  • Workflow YAMLs for token hygiene, label transitions, git identity, and branch push logic.
  • .flue/workflows/investigate.ts for schema changes, session wiring, and fix-gate logic.
  • Skill markdown files for consistency with the new gate conditions.
  • package.json and pnpm-lock.yaml for dependency hygiene.

Headline conclusion: One real bug in label management; everything else is clean.

Still needs fixing

  1. Missing triage/by-design label cleanup in investigate.yml. The transition step that strips old labels before adding triage/reproducing does not include the new triage/by-design label. If an issue was previously triaged as by-design and a maintainer re-triggers it, the stale label will remain alongside the new terminal label.
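The stacking problem can be modelled as a pure function over the issue's labels. This is a hypothetical sketch: `nextLabelsOnRetrigger` and its cleanup-list parameter are illustrative, and the actual workflow performs the cleanup in a shell loop.

```typescript
// Strip every label in the cleanup list, then make sure the in-progress
// label triage/reproducing is present.
function nextLabelsOnRetrigger(
  current: readonly string[],
  cleanup: readonly string[],
): string[] {
  const kept = current.filter((label) => !cleanup.includes(label));
  return kept.includes("triage/reproducing")
    ? kept
    : [...kept, "triage/reproducing"];
}

const issueLabels = ["area/ci", "triage/by-design"];

// Cleanup list without the new label: the stale verdict survives.
const stale = nextLabelsOnRetrigger(issueLabels, ["triage/reproduced"]);

// Cleanup list including triage/by-design: only the in-progress label remains.
const clean = nextLabelsOnRetrigger(issueLabels, [
  "triage/reproduced",
  "triage/by-design",
]);
```

In the first call `stale` still carries triage/by-design next to triage/reproducing; in the second, `clean` contains only area/ci and triage/reproducing.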

Clean / no action needed

  • The preview globs (./packages/*, ./packages/plugins/*) correctly include all public packages and rely on pkg-pr-new skipping private ones.
  • The pkg-pr-new dependency is correctly added as a root devDependency; using pnpm exec instead of pnpm dlx is appropriate.
  • Bot identity (emdashbot[bot] / emdashbot[bot]@users.noreply.github.com) now matches the App slug used in other workflows.
  • The valibot schema additions (fixApproach, proposedFix) are wired through correctly to both the fix session args and the fallback comment notes.
  • The notAttemptedReason ternary in investigate.ts is correct: intended-behavior returns earlier, so the "bug" branch at that point can only mean unclear.
  • The fixAgent gets its own init/session and shares the same sandbox cwd, so staged edits reach the orchestrator as intended.

Unverified but noted

  • pnpm-lock.yaml contains a small amount of unrelated drift (workerd snapshot downgrade, vite patch bump in @vitest/ui snapshot). This is harmless lockfile churn from adding the new dependency, but it adds noise to the diff. It would be cleaner if the lockfile only contained the pkg-pr-new addition and its transitive deps.

Addresses review on #1259. The new triage/by-design label was missing
from two places:

- investigate.yml's reproducing-transition cleanup loop, so a
  re-triggered by-design issue could carry triage/by-design alongside
  triage/reproducing.
- triage-project-sync.yml's STATE_BY_LABEL/PRECEDENCE, so by-design
  issues resolved to no state and stopped syncing to the board.

Maps triage/by-design -> 'By design' board option (terminal verdict,
ranked just below reproduced in precedence). The 'By design' single-
select option must be added to Project #3's 'Triage State' field; until
then the sync warns-and-skips rather than failing.

@github-actions github-actions Bot added review/needs-rereview (Author pushed changes since the last review) and removed review/awaiting-author (Reviewed; waiting on the author to respond) labels Jun 1, 2026
@ascorbic ascorbic merged commit d4e306c into main Jun 1, 2026
39 of 40 checks passed
@ascorbic ascorbic deleted the feat/investigate-bot-improvements branch June 1, 2026 11:24
Labels

area/ci, review/needs-rereview (Author pushed changes since the last review), size/XL
