fix(import): truncate at UTF-8 char boundary to avoid panic on multibyte chars#201
Merged
Conversation
…yte chars Closes #199. `icm import` panicked when a session contained a multibyte UTF-8 character (e.g. `→`) straddling byte offset 500 (raw_excerpt cap) or 4096 (format-peek cap). Both call sites used a raw byte slice `&s[..N]` which panics if `N` falls inside a char. Reuse the existing `truncate_at_char_boundary` helper at both sites, and add two regression tests that build strings where the multibyte char straddles the cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The two existing tests cover `truncate_at_char_boundary` directly. This adds an end-to-end test that runs the actual `cmd_import` flow on a JSONL session whose joined-text byte 500 falls inside `→`, confirming the bug path (raw_excerpt slice) is exercised. Verified against pre-fix code: panics with the exact error message reported in #199. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 10, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
icm importpanics when a session file contains a multibyte UTF-8 character (→). #199icm importpanicked when a session contained a multibyte UTF-8 character (e.g.→) straddling byte offset 500 (raw_excerptcap) or 4096 (format-peek cap)&s[..N], which panics ifNfalls inside a chartruncate_at_char_boundaryhelper (crates/icm-cli/src/main.rs:2541) — same helper already used for transcript truncation in the hook path→(3 bytes) straddles the cap exactlyReproducer (from #199)
Test plan
cargo test -p icm-cli --bin icm import::— 13 passedcargo test --workspace— 400 passed, 4 ignoredcargo fmt --checkcleancargo clippy --workspace --all-targets -- -D warningsclean🤖 Generated with Claude Code