feat(chat): reorder harness markers and split compaction buckets #222

Merged
dcramer merged 7 commits into main from devin/1776625437-harness-markers-instruction-precedence
Apr 19, 2026

devin-ai-integration Bot commented Apr 19, 2026

Summary

Reshapes the user-turn prompt wrapper and thread-background rendering so Claude Sonnet and GPT-5 treat the current instruction as authoritative and prior thread context as read-only reference material. Addresses the failure mode tracked in #221 and getsentry/junior-prod#35, where Junior drifts onto a narrowed-but-superseded ask from earlier in a thread.

Changes in packages/junior/src/chat/respond-helpers.ts (buildUserTurnText):

  • Order (top → bottom): <thread-background>, <session-context>, <turn-context>, <current-instruction priority="highest">. <current-instruction> is always the final block, matching Anthropic's long-context guidance to place the active query last.
  • Drops legacy <current-message> / <thread-conversation-context> wrappers.
  • No explanatory prose inside markers — tag names carry the signal.
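
The block ordering described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the `UserTurnParts` shape, the `wrapBlock` helper, and the name `buildUserTurnTextSketch` are hypothetical, and the sketch assumes empty blocks are simply omitted.

```typescript
// Hypothetical input shape: the real buildUserTurnText signature is not shown in this PR.
interface UserTurnParts {
  threadBackground?: string;
  sessionContext?: string;
  turnContext?: string;
  currentInstruction: string;
}

// Wrap a body in a bare marker; no descriptive sentence is added inside the tag.
function wrapBlock(tag: string, body: string, attrs: string = ""): string {
  return `<${tag}${attrs}>\n${body}\n</${tag}>`;
}

function buildUserTurnTextSketch(parts: UserTurnParts): string {
  const sections: string[] = [];
  // Reference material first (assumption: absent blocks are skipped).
  if (parts.threadBackground) {
    sections.push(wrapBlock("thread-background", parts.threadBackground));
  }
  if (parts.sessionContext) {
    sections.push(wrapBlock("session-context", parts.sessionContext));
  }
  if (parts.turnContext) {
    sections.push(wrapBlock("turn-context", parts.turnContext));
  }
  // The active ask is always pushed last so it is the final block of the turn.
  sections.push(
    wrapBlock("current-instruction", parts.currentInstruction, ' priority="highest"'),
  );
  return sections.join("\n\n");
}
```

Because the instruction block is pushed unconditionally and last, the output always ends with `</current-instruction>` regardless of which context blocks are present.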

Changes in packages/junior/src/chat/services/conversation-memory.ts:

  • buildConversationContext wraps each compaction in <compaction index=… covered_messages=… created_at=…> and each transcript entry in <message index=… ts=… role=… author=… slack_ts=…>, so each prior item is an individually addressable reference instead of a flat blob.
  • summarizeConversationChunk prompt now produces three fixed sections — <active-asks>, <superseded-or-completed-asks>, <facts> — so stale or already-acted-on asks stop reading as live constraints after compaction.
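
A minimal sketch of the per-item wrapping, under stated assumptions: the `CompactionSketch` and `MessageSketch` field names, the zero-based `index`, the compactions-before-transcript order, and the `undefined` return for an empty conversation are all guesses for illustration. Only the tag and attribute names come from this PR.

```typescript
// Hypothetical record shapes; the stored types are not shown in this PR.
interface CompactionSketch {
  coveredMessages: number;
  createdAt: string;
  summary: string;
}
interface MessageSketch {
  ts: string;
  role: "user" | "assistant";
  author: string;
  slackTs: string;
  text: string;
}

// Escape the five XML special characters so attribute values stay well-formed.
function escapeAttrValue(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render key="value" pairs in insertion order.
function renderAttrs(pairs: Record<string, string | number>): string {
  return Object.entries(pairs)
    .map(([key, value]) => `${key}="${escapeAttrValue(String(value))}"`)
    .join(" ");
}

function buildConversationContextSketch(
  compactions: CompactionSketch[],
  messages: MessageSketch[],
): string | undefined {
  // Assumption: an empty conversation yields no context block at all.
  if (compactions.length === 0 && messages.length === 0) return undefined;
  const sections: string[] = [];
  if (compactions.length > 0) {
    const items = compactions.map((c, index) => {
      const attrs = renderAttrs({ index, covered_messages: c.coveredMessages, created_at: c.createdAt });
      return `<compaction ${attrs}>\n${c.summary}\n</compaction>`;
    });
    sections.push(`<thread-compactions>\n${items.join("\n")}\n</thread-compactions>`);
  }
  if (messages.length > 0) {
    const items = messages.map((m, index) => {
      const attrs = renderAttrs({ index, ts: m.ts, role: m.role, author: m.author, slack_ts: m.slackTs });
      // Message bodies are left as-is here; only attribute values are escaped.
      return `<message ${attrs}>\n${m.text}\n</message>`;
    });
    sections.push(`<thread-transcript>\n${items.join("\n")}\n</thread-transcript>`);
  }
  return sections.join("\n\n");
}
```

Giving every item its own `index` and timestamp lets the model refer to "message 3" or "the compaction created at …" rather than treating the background as one undifferentiated block.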

Rationale and authoritative prior art (Anthropic long-context guide, OpenAI GPT-5 prompting guide, OpenAI Model Spec chain-of-command) are cited in #221.

Review & Testing Checklist for Human

  • Sanity-check the new buildUserTurnText output shape against a real thread turn (e.g. local dev or an eval snapshot) and confirm the final tag emitted is </current-instruction> and <thread-background> precedes it.
  • Spot-check one compacted conversation in a real thread to confirm the summarizer is producing the three-bucket XML (active / superseded / facts) rather than a free-form paragraph. Because the summarizer is model-generated, the prompt change only shapes output — run against the production fast model to verify it complies.
  • Decide whether this should be gated behind an eval sweep on both Sonnet and GPT-5 gateway models before relying on the new marker shape for production traffic. This PR does not add such an eval.
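
For the two manual spot-checks above, a small throwaway helper along these lines might be convenient when pasting real output into a scratch script. It is a hypothetical aid, not part of this PR, and it is meant for ad hoc or eval use rather than unit tests. The function names are invented, and the check is a plain substring scan rather than a real XML parse.

```typescript
// Section tags the updated summarizer prompt is expected to emit.
const REQUIRED_SECTIONS = ["active-asks", "superseded-or-completed-asks", "facts"] as const;

// Return the section tags that are missing an opening or closing tag.
function missingSummarySections(summary: string): string[] {
  return REQUIRED_SECTIONS.filter(
    (tag) => !(summary.includes(`<${tag}>`) && summary.includes(`</${tag}>`)),
  );
}

// True when the turn text ends with </current-instruction> and, if a
// <thread-background> block is present, it starts before the instruction block.
function hasExpectedTurnShape(userTurnText: string): boolean {
  const trimmed = userTurnText.trimEnd();
  const backgroundAt = trimmed.indexOf("<thread-background>");
  const instructionAt = trimmed.lastIndexOf("<current-instruction");
  return (
    trimmed.endsWith("</current-instruction>") &&
    instructionAt !== -1 &&
    (backgroundAt === -1 || backgroundAt < instructionAt)
  );
}
```

A free-form paragraph from the summarizer would report all three sections as missing, which is the non-compliance case the checklist asks reviewers to look for.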

Notes

  • Intentionally preserved the <thread-transcript> / <thread-compactions> marker names; routing fixtures in tests/unit/routing/subscribed-decision.test.ts still reference them.
  • No runtime behavior change beyond the emitted prompt text; no new dependencies, no schema changes. Compaction storage format (summary: string) is unchanged — only the prompt that generates it is updated.
  • Pre-existing unit-test failure tests/unit/services/turn-checkpoint.test.ts > reuses the latest stored transcript… reproduces on main (requires REDIS_URL) and is unrelated to this PR.
  • Follow-up candidates (not in this PR): add an eval that exercises narrow-then-broaden instruction drift across a compacted thread; consider also marking the assistant's own prior tool calls with an executed flag in <message> wrappers.

Link to Devin session: https://app.devin.ai/sessions/f46faf27a4354f7dab95abd8dfc50211
Requested by: @dcramer

…ecedence

Put thread background first, latest user instruction last, and add an
explicit instruction-precedence block. Wrap per-compaction and per-message
items with metadata attributes so they read as individual references
instead of one flat blob. Split compaction summaries into active-asks /
superseded-or-completed-asks / facts buckets so stale or completed asks
stop reading as currently active.

Rationale and citations in #221.

Co-Authored-By: Devin <devin@cognition.ai>

vercel Bot commented Apr 19, 2026

Project: junior-docs
Deployment: Ready
Updated (UTC): Apr 19, 2026 9:01pm

Shrink JSDocs on buildUserTurnText and buildConversationContext to the
intent only, drop the redundant <thread-background> preamble (the
precedence block already covers it), fold the single-use attr helpers
into buildConversationContext, and collapse single-line section pushes.
No behavior change.

Co-Authored-By: Devin <devin@cognition.ai>

…er block

Keep <latest-user-instruction> as the final block of the user turn so the
model sees the active ask last, and move <instruction-precedence> to the
top of the wrapper so the reconciliation rules frame the context that
follows. Each marker (<thread-background>, <session-context>, <turn-context>,
<latest-user-instruction>, and the <thread-compactions>/<thread-transcript>
blocks inside background) now opens with a one-line purpose statement so
the role of every block is self-describing.

Co-Authored-By: Claude sonnet-4.5 <devin-ai-integration[bot]@users.noreply.github.com>

Tag names are the system markers; they do not need an explanatory
sentence inside each block. Remove the <instruction-precedence> wrapper
and the descriptor lines from <thread-background>, <session-context>,
<turn-context>, <latest-user-instruction>, <thread-compactions>, and
<thread-transcript>. Behavior-relevant structure (ordering,
per-compaction/per-message metadata, priority="highest" on the latest
instruction) is preserved.

Co-Authored-By: Claude sonnet-4.5 <devin-ai-integration[bot]@users.noreply.github.com>

devin-ai-integration Bot changed the title from "feat(chat): reorder harness markers for Sonnet + GPT-5 instruction precedence" to "feat(chat): reorder harness markers and split compaction buckets" on Apr 19, 2026

…ion>

The 'user' qualifier is implicit in the turn context and 'current' is
more direct than 'latest'. Attribute (priority="highest") and placement
(final block of the wrapper) are unchanged.

Co-Authored-By: Claude sonnet-4.5 <devin-ai-integration[bot]@users.noreply.github.com>

cursor[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

specs/testing/unit-spec.md:47 bans unit tests that assert exact or
substring prompt prose on prompt builders. Keep only the pure-logic
branch cases (raw pass-through, empty-conversation undefined) and
defer structural XML validation to integration or eval coverage.

Co-Authored-By: Claude sonnet-4.5 <devin-ai-integration[bot]@users.noreply.github.com>

The local escapeAttr only handled double quotes, so author names and
slack_ts values containing &, <, or > would produce malformed XML
attributes. Swap to the shared escapeXml utility from @/chat/xml,
which covers all five XML special characters.

Co-Authored-By: Claude sonnet-4.5 <devin-ai-integration[bot]@users.noreply.github.com>
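
To make the escaping fix in the commit above concrete, here is a hedged comparison. Both functions are stand-ins written for this illustration: `quoteOnlyEscape` approximates the old local helper as the commit describes it, and `escapeXmlSketch` approximates a five-character escaper. The real `escapeXml` in `@/chat/xml` is not shown in this PR and may encode the apostrophe differently (for example as `&#39;`).

```typescript
// Stand-in for the old local helper: only double quotes are escaped.
function quoteOnlyEscape(value: string): string {
  return value.replace(/"/g, "&quot;");
}

// Stand-in for a shared utility covering all five XML special characters.
// The ampersand is replaced first so later entities are not double-escaped.
function escapeXmlSketch(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

const author = 'R&D <ops> "oncall"';

// Raw & and < survive, so the attribute is not well-formed XML:
// <message author="R&D <ops> &quot;oncall&quot;">
console.log(`<message author="${quoteOnlyEscape(author)}">`);

// Every special character is encoded:
// <message author="R&amp;D &lt;ops&gt; &quot;oncall&quot;">
console.log(`<message author="${escapeXmlSketch(author)}">`);
```
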
dcramer merged commit 7f8f845 into main on Apr 19, 2026
15 checks passed
dcramer deleted the devin/1776625437-harness-markers-instruction-precedence branch on April 19, 2026 21:08