fix(file_write): non-greedy tag, last-block fence, schema content fallback, debug placeholder by voidborne-d · Pull Request #246 · lsdefine/GenericAgent

voidborne-d · 2026-05-01T02:51:06Z

Closes #241.

Summary

Issue #241 reports 3 extraction bugs + 1 design gap in file_write. Reproduced and fixed all four with a 12-line, 4-file change. Net code size change: 0 (LoC down by 1 in ga.py, +1 in agent_loop.py, +1 each in the two schema files).

Changes

Bug 1 — greedy `(.*)` swallows multi-tag bodies (`ga.py`)

<file_content[^>]*>(.*)</file_content> matched to the last </file_content> in the reply, so two adjacent <file_content> blocks (or one whose body contains the literal closing string) produced corrupt content. Switched to re.findall(... .*? ...) and take the last match — non-greedy + last-wins matches the LLM's most-recent intent.

Bug 2 — first-fence-to-last-fence span (`ga.py`)

The fallback text.find('\``'), text.rfind('```')returned the entire span between the first and last triple backtick — when the reply contained a prose code snippet *before* the file-content fence, the prose plus its closing fence got concatenated into the file. Replaced withre.findall(r"```[^\n\`]\n([\s\S]?)```", ...)` and take the last fence body. Single-fence behavior unchanged.

Bug 3 — `<file_content>` silently stripped from logs (`agent_loop.py`)

_clean_content removed <file_content>...</file_content> entirely, so when a write turned out wrong there was no way to confirm what the model actually emitted. Now substitutes a <file_content: N chars> placeholder — visible in turn logs, zero token overhead, debugging restored.

Bug 4 — schema lacks a `content` parameter (`tools_schema*.json`)

file_write's parameter set was just path + mode; if reply-body parsing missed the content (Bugs 1/2 territory or bad LLM formatting) there was no fallback. Added an optional content string. do_file_write now does args.get(\"content\") or extract_robust_content(response.content) — body-extraction stays the canonical path, schema arg is a tail-end safety net.

Verification (no test infra in repo, so behavior verified by direct run)

✓ multi tag: returns the LAST <file_content> body
✓ article + file-content block: returns last fence body, prose ignored
✓ single fence: unchanged
✓ <file_content lang=\"py\"> attribute: matched
✓ no content: returns None
✓ <file_content> shown inside a fence + real one outside: real one wins
✓ _clean_content emits <file_content: N chars> placeholder
✓ schema includes content param (both en + cn)
✓ args.content fallback wired into ga.py

python -m py_compile ga.py agent_loop.py clean. Both schemas parse as valid JSON.

Notes

All four fixes are surgical and isolated to the boundary of the bug. No collateral refactor.
Schema description for the new content field calls it [Optional fallback] so the LLM keeps preferring <file_content> (preserves token-efficient streaming).
Net delta is neutral (3 added / 4 removed in ga.py, +1 elsewhere). Aligns with the project's "ideally negative or zero" line-count guidance.

…lback, debug placeholder Closes lsdefine#241. - ga.py: <file_content> tag match becomes non-greedy + uses last tag/fence to avoid swallowing prose between unrelated triple backticks. - ga.py: file_write accepts optional args.content as a direct fallback when reply body has no <file_content> or trailing code block. - agent_loop.py: _clean_content replaces <file_content> with a length placeholder instead of stripping silently, so writes are visible in logs. - tools_schema(_cn).json: declare optional content parameter on file_write.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(file_write): non-greedy tag, last-block fence, schema content fallback, debug placeholder#246

fix(file_write): non-greedy tag, last-block fence, schema content fallback, debug placeholder#246
voidborne-d wants to merge 1 commit intolsdefine:mainfrom
voidborne-d:fix/file-write-extraction

voidborne-d commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

voidborne-d commented May 1, 2026

Summary

Changes

Bug 1 — greedy (.*) swallows multi-tag bodies (ga.py)

Bug 2 — first-fence-to-last-fence span (ga.py)

Bug 3 — <file_content> silently stripped from logs (agent_loop.py)

Bug 4 — schema lacks a content parameter (tools_schema*.json)

Verification (no test infra in repo, so behavior verified by direct run)

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bug 1 — greedy `(.*)` swallows multi-tag bodies (`ga.py`)

Bug 2 — first-fence-to-last-fence span (`ga.py`)

Bug 3 — `<file_content>` silently stripped from logs (`agent_loop.py`)

Bug 4 — schema lacks a `content` parameter (`tools_schema*.json`)