copilot-driver: restart fresh on null-type tool_call 400; permanently disable --continue after any fallback #29329

Merged
pelikhan merged 4 commits into main from copilot/fix-copilot-driver-restart-strategy
Apr 30, 2026

Contributor

Copilot AI commented Apr 30, 2026

When the model emits a tool call with type: null, the API returns a 400 that poisons conversation history. Retrying with --continue re-injects the same broken state — all 4 attempts fail identically. A fresh restart discards the corrupt history and gives the run a real chance to succeed.

Changes

  • New error detection: NULL_TYPE_TOOL_CALL_PATTERN matches the specific API 400: "Invalid type for '...tool_calls[N].type': ... but got null instead."
  • Fresh restart on null-type — when detected, clears --continue and continues the retry loop as a fresh run (works whether the failing attempt used --continue or not)
  • continueDisabledPermanently guard — once a fallback to fresh run is triggered (null-type or auth error), --continue is never re-enabled for subsequent attempts; previously a fresh-run success with output would re-arm --continue
  • Auth fallback hardened — existing auth-error fallback now also sets continueDisabledPermanently, closing the same re-arm bug
  • Diagnostic log — per-attempt failure line gains isNullTypeToolCallError= field
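
As a rough illustration of the detection step, the sketch below applies the pattern quoted later in this conversation (from the test-file review excerpt) to a few messages. The helper name `isNullTypeToolCallError` and the sample strings are assumptions for illustration; the samples follow the elided message format shown above and are not verbatim API output.

```javascript
// Pattern as quoted in the review excerpt of copilot_harness.test.cjs.
const NULL_TYPE_TOOL_CALL_PATTERN = /tool_calls\[.*?\]\.type.*null/;

// Hypothetical helper; the harness may wire the check differently.
function isNullTypeToolCallError(output) {
  return typeof output === "string" && NULL_TYPE_TOOL_CALL_PATTERN.test(output);
}

// Illustrative inputs only: the elisions ("...") are kept from the PR description.
const samples = [
  "Invalid type for '...tool_calls[0].type': ... but got null instead.",
  "Invalid type for '...tool_calls[12].type': ... but got null instead.",
  "Invalid value for '...tool_calls[0].id': ... expected a string.",
  "",
];

for (const sample of samples) {
  console.log(isNullTypeToolCallError(sample), JSON.stringify(sample));
}
```

The first two samples match regardless of the array index; the unrelated `tool_calls` error and the empty string do not.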

Retry flow example

attempt 1 (fresh):      partial output, normal error  → retry with --continue
attempt 2 (--continue): tool_calls[0].type: null 400  → restart fresh, --continue permanently disabled
attempt 3 (fresh):      partial output, other error   → retry fresh  (NOT --continue)
attempt 4 (fresh):      ...
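
The flow above can be sketched as a small state machine. This is a simplified model, not the harness code: the names `applyRetryPolicy`, `continueDisabledPermanently`, `hasOutput`, and `MAX_RETRIES` appear in this PR, but the signature, the value of `MAX_RETRIES`, and the shape of the state object are assumptions made for the example.

```javascript
// Assumed budget: 3 retries after the first attempt, i.e. 4 attempts in total.
const MAX_RETRIES = 3;

// Hypothetical pure form of the retry decision. `state` carries the flags between attempts.
function applyRetryPolicy(state, result) {
  const { attempt, hasOutput, isNullTypeToolCall, isAuthError } = result;

  // Stop when the budget is spent or the attempt produced no output.
  if (attempt > MAX_RETRIES || !hasOutput) {
    return { ...state, retry: false };
  }

  // Fallback cases: restart fresh and never re-arm --continue afterwards.
  if (isNullTypeToolCall || isAuthError) {
    return { retry: true, useContinue: false, continueDisabledPermanently: true };
  }

  // Ordinary failure with partial output: resume with --continue unless it was disabled.
  return {
    retry: true,
    useContinue: !state.continueDisabledPermanently,
    continueDisabledPermanently: state.continueDisabledPermanently,
  };
}

// Replay the four attempts from the flow above.
const failures = [
  { hasOutput: true },
  { hasOutput: true, isNullTypeToolCall: true },
  { hasOutput: true },
  { hasOutput: true },
];

let state = { retry: true, useContinue: false, continueDisabledPermanently: false };
const modes = [];
failures.forEach((failure, index) => {
  if (!state.retry) return;
  modes.push(state.useContinue ? "--continue" : "fresh");
  state = applyRetryPolicy(state, { attempt: index + 1, ...failure });
});

console.log(modes.join(" -> ")); // fresh -> --continue -> fresh -> fresh
```

Attempt 3 runs fresh even though attempt 2 produced output, because the guard set on the null-type fallback is never cleared.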

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • invalid.example.invalid
    • Triggering command: /usr/lib/git-core/git-remote-https /usr/lib/git-core/git-remote-https origin https://invalid.example.invalid/nonexistent-repo.git e/git init ndor/bin/git git ode_modules/.bin/git =receive test@example.com--git-dir=/tmp/bare-incremental-UQfpU1 /git (dns block)
    • Triggering command: /usr/lib/git-core/git-remote-https /usr/lib/git-core/git-remote-https origin https://invalid.example.invalid/nonexistent-repo.git e/git init ndor/bin/git git ode_modules/.bin/git =receive test@example.com--git-dir=/tmp/bare-incremental-QCGtrx /git (dns block)


Copilot AI and others added 2 commits April 30, 2026 14:41
Copilot AI changed the title from "[WIP] Fix copilot-driver to restart fresh on null-type tool_call error" to "copilot-driver: restart fresh on null-type tool_call 400; permanently disable --continue after any fallback" Apr 30, 2026
Copilot AI requested a review from pelikhan April 30, 2026 14:46
@pelikhan pelikhan marked this pull request as ready for review April 30, 2026 14:50
Copilot AI review requested due to automatic review settings April 30, 2026 14:50
Contributor

Copilot AI left a comment


Pull request overview

Updates the Copilot CLI harness retry policy to detect a specific “null-type tool call” API 400 that corrupts conversation history, and to recover by restarting fresh while preventing --continue from being reused after any such fallback.

Changes:

  • Add detection for null-type tool_calls[N].type 400 and force a fresh restart on retry.
  • Introduce a continueDisabledPermanently guard so --continue can’t be re-enabled after a fallback (null-type or auth).
  • Add unit tests covering the new pattern and retry-mode transitions; add a patch changeset entry.
| File | Description |
| --- | --- |
| actions/setup/js/copilot_harness.cjs | Implements null-type tool-call detection, fresh-restart fallback, and permanent --continue disable guard; expands per-attempt diagnostics. |
| actions/setup/js/copilot_harness.test.cjs | Adds tests for the null-type error regex and for retry policy behavior around fresh restarts and permanent --continue disabling. |
| .changeset/patch-copilot-driver-null-type-tool-call-fresh-restart.md | Documents the behavior change as a patch release note. |

Copilot's findings


  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment on lines +484 to +488
// Null-type tool_call error: the model emitted a malformed tool call that poisons the
// conversation history. Retrying with --continue re-injects the same broken history and
// produces the same 400 on every subsequent attempt. Restart fresh to discard the poisoned
// history, and permanently disable --continue so the corrupt state is never re-loaded.
if (isNullTypeToolCall) {

describe("null-type tool_call detection pattern", () => {
const NULL_TYPE_TOOL_CALL_PATTERN = /tool_calls\[.*?\]\.type.*null/;

it("matches the exact error from the failed workflow run", () => {
@github-actions github-actions Bot mentioned this pull request Apr 30, 2026
@github-actions
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 80/100

Excellent

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 10 |
| ✅ Design tests (behavioral contracts) | 10 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 10 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | ⚠️ Yes (200 test lines / 43 production lines ≈ 4.6:1) |
| 🚨 Coding-guideline violations | None |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| matches the exact error from the failed workflow run | copilot_harness.test.cjs | ✅ Design | None (validates real-world error string) |
| matches with different array indices | copilot_harness.test.cjs | ✅ Design | None (validates index-variant matching) |
| does not match unrelated tool_calls errors | copilot_harness.test.cjs | ✅ Design | Negative/edge cases covered |
| does not match unrelated null errors | copilot_harness.test.cjs | ✅ Design | Negative + empty-string edge case |
| restarts fresh when null-type error occurs on a --continue attempt | copilot_harness.test.cjs | ✅ Design | Behavioral contract: retry=true, continue=false, disabled=true |
| restarts fresh when null-type error occurs on a fresh run | copilot_harness.test.cjs | ✅ Design | Behavioral contract: fresh-start path |
| does not retry when budget is exhausted | copilot_harness.test.cjs | ✅ Design | Edge case: MAX_RETRIES boundary |
| does not retry when no output was produced | copilot_harness.test.cjs | ✅ Design | Edge case: hasOutput=false guard |
| does not re-enable --continue after a null-type fresh restart | copilot_harness.test.cjs | ✅ Design | Multi-step invariant: permanent disable guard |
| does not re-enable --continue after an auth-error fresh restart | copilot_harness.test.cjs | ✅ Design | Multi-step invariant: auth-error permanent disable |

Observations (Not Blocking)

⚠️ Inlined retry logic in test file

The describe blocks "null-type tool_call restarts fresh instead of --continue" and "permanent --continue disable guard" each define a local applyRetryPolicy function that mirrors the production logic rather than importing it. This means:

  • The tests verify the policy's behavioral contract correctly — the invariants are real and well-chosen.
  • However, if the production applyRetryPolicy is refactored (e.g., variable names or branching order change), these tests would still pass while the production code silently regresses.

Suggested improvement: If the production retry policy is extractable as a pure function, export it and import it directly in tests. If not (e.g., it's deeply embedded in a closure), the current inline approach is acceptable — just document why.

ℹ️ Test inflation (4.6:1 ratio)

200 test lines were added against 43 production lines. This triggered the inflation penalty (−10 pts), but the tests are genuinely thorough — the ratio is explained by multi-step scenario tests that require test-local helper functions and detailed assertion chains. No artificial inflation detected.


Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 10 tests (vitest) — actions/setup/js/copilot_harness.test.cjs
  • 🐹 Go (*_test.go): 0 tests changed

Score Breakdown

| Component | Weight | Score |
| --- | --- | --- |
| Behavioral Coverage (10/10 design tests) | 40 pts | 40 |
| Error/Edge Case Coverage (10/10 tests) | 30 pts | 30 |
| Low Duplication (0 clusters) | 20 pts | 20 |
| Proportional Growth (4.6:1 ratio > 2:1) | 10 pts | 0 ⚠️ |
| Total | 100 pts | 80 |

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). All 10 tests enforce genuine behavioral contracts: the regex detection patterns are validated against real-world error strings (including the exact error from the failed workflow run), and the retry policy invariants are tested end-to-end including budget exhaustion, no-output guards, and multi-step permanent-disable scenarios.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 399.8K ·

Contributor

@github-actions github-actions Bot left a comment


✅ Test Quality Sentinel: 80/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 10 new tests enforce genuine behavioral contracts covering the null-type tool_call detection regex, retry policy decisions, budget exhaustion, no-output guards, and multi-step permanent --continue disable invariants.

@pelikhan
Collaborator

@copilot review all comments

…st for accuracy

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1eb81da3-5f18-4bab-803c-d9ba0be9fca8

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 30, 2026

@copilot review all comments

Done in 2b52491:

  • Header comment (copilot_harness.cjs top-level retry-policy docs): added a bullet documenting that null-type tool_calls[N].type 400 errors poison conversation history, trigger a fresh restart, and permanently disable --continue for the rest of the run.
  • In-loop comment (before the retry-decision block): rewrote the "Exceptions" line to enumerate MCP policy, model-not-supported, auth, and null-type tool_call cases explicitly.
  • Test name: renamed "matches the exact error from the failed workflow run" to "matches the error format observed in failed workflow runs" to avoid implying the fixture string is a verbatim copy of the API message.


@pelikhan pelikhan merged commit 7ce8d1c into main Apr 30, 2026
@pelikhan pelikhan deleted the copilot/fix-copilot-driver-restart-strategy branch April 30, 2026 15:18
Development

Successfully merging this pull request may close these issues.

copilot-driver: restart fresh instead of --continue when 400 null-type tool_call poisons history

3 participants