Skip to content

fix(import): truncate at UTF-8 char boundary to avoid panic on multibyte chars#201

Merged
pszymkowiak merged 2 commits into
developfrom
fix/import-utf8-boundary
May 10, 2026
Merged

fix(import): truncate at UTF-8 char boundary to avoid panic on multibyte chars#201
pszymkowiak merged 2 commits into
developfrom
fix/import-utf8-boundary

Conversation

@pszymkowiak
Copy link
Copy Markdown
Contributor

Summary

  • Closes Bug: icm import panics when a session file contains a multibyte UTF-8 character (). #199
  • icm import panicked when a session contained a multibyte UTF-8 character (e.g. ) straddling byte offset 500 (raw_excerpt cap) or 4096 (format-peek cap)
  • Both call sites used a raw byte slice &s[..N], which panics if N falls inside a char
  • Replaced both with the existing truncate_at_char_boundary helper (crates/icm-cli/src/main.rs:2541) — same helper already used for transcript truncation in the hook path
  • Added two regression tests building strings where (3 bytes) straddles the cap exactly

Reproducer (from #199)

thread 'main' panicked at crates/icm-cli/src/import.rs:524:26:
end byte index 500 is not a char boundary; it is inside '→' (bytes 498..501)

Test plan

  • cargo test -p icm-cli --bin icm import:: — 13 passed
  • cargo test --workspace — 400 passed, 4 ignored
  • cargo fmt --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean

🤖 Generated with Claude Code

pszymkowiak and others added 2 commits May 10, 2026 09:07
…yte chars

Closes #199.

`icm import` panicked when a session contained a multibyte UTF-8 character
(e.g. `→`) straddling byte offset 500 (raw_excerpt cap) or 4096 (format-peek
cap). Both call sites used a raw byte slice `&s[..N]` which panics if `N`
falls inside a char.

Reuse the existing `truncate_at_char_boundary` helper at both sites, and add
two regression tests that build strings where the multibyte char straddles
the cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two existing tests cover `truncate_at_char_boundary` directly. This
adds an end-to-end test that runs the actual `cmd_import` flow on a JSONL
session whose joined-text byte 500 falls inside `→`, confirming the bug
path (raw_excerpt slice) is exercised. Verified against pre-fix code:
panics with the exact error message reported in #199.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pszymkowiak pszymkowiak merged commit 12b3606 into develop May 10, 2026
6 checks passed
@pszymkowiak pszymkowiak deleted the fix/import-utf8-boundary branch May 10, 2026 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant