@mznet
Contributor

@mznet mznet commented Jan 17, 2026

Calling strip_identifier with identifiers that contain CJK characters causes a Rust panic, as shown below:

assert!(is_valid_javascript_identifier("한글"));
thread 'js_identifiers::tests::test_is_valid_javascript_identifier' (1076350) panicked at src/js_identifiers.rs:49:12:
byte index 4 is not a char boundary; it is inside '글' (bytes 3..6) of `한글`
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/panicking.rs:75:14
   2: core::str::slice_error_fail_rt
   3: core::str::slice_error_fail
             at /rustc/f8297e351a40c1439a467bbbb6879088047f50b3/library/core/src/str/mod.rs:69:5
   4: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::Range<usize>>::index
             at /Users/mjet.plane/.rustup/toolchains/1.91.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/str/traits.rs:248:21
   5: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::RangeInclusive<usize>>::index
             at /Users/mjet.plane/.rustup/toolchains/1.91.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/str/traits.rs:664:33
   6: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::RangeToInclusive<usize>>::index
             at /Users/mjet.plane/.rustup/toolchains/1.91.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/str/traits.rs:751:24
   7: core::str::traits::<impl core::ops::index::Index<I> for str>::index
             at /Users/mjet.plane/.rustup/toolchains/1.91.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/str/traits.rs:63:15
   8: sourcemap::js_identifiers::strip_identifier
             at ./src/js_identifiers.rs:49:12
   9: sourcemap::js_identifiers::is_valid_javascript_identifier
             at ./src/js_identifiers.rs:54:5

CJK characters in identifiers are uncommon, but I found real examples where they are used, and those triggered the panic.
JavaScript identifiers are not limited to ASCII and can include Unicode characters.

Before this change, the implementation stored only the start byte index of each character in end_idx while iterating, and then sliced the string with an inclusive range, &s[..=end_idx].

For example, in the string "한글", '한' occupies bytes 0–2 and '글' occupies bytes 3–5. When processing '글', i = 3 is stored as end_idx, and &s[..=3] is equivalent to &s[..4], which ends inside '글' rather than on a UTF-8 character boundary.
This produces the "byte index 4 is not a char boundary" panic shown above.
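
To make the byte layout concrete, here is a minimal sketch that prints the byte range of each character in "한글" and checks the failing slice without panicking. The helper name char_byte_ranges is hypothetical and is not part of the crate; s.get(..=3) is used as the non-panicking counterpart of &s[..=3].

```rust
/// Hypothetical helper (not part of the crate): for each char in `s`,
/// returns the char with its start and exclusive end byte offsets.
fn char_byte_ranges(s: &str) -> Vec<(char, usize, usize)> {
    s.char_indices()
        .map(|(i, c)| (c, i, i + c.len_utf8()))
        .collect()
}

fn main() {
    let s = "한글";
    for (c, start, end) in char_byte_ranges(s) {
        println!("{c}: bytes {start}..{end}");
    }

    // `..=3` means "up to and including byte 3", i.e. `..4`,
    // and byte 4 is in the middle of '글'.
    assert!(!s.is_char_boundary(4));
    assert!(s.get(..=3).is_none());

    // Slicing at the exclusive end of the last char is valid.
    assert_eq!(s.get(..6), Some("한글"));
}
```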

To fix this, the code now tracks the end position of each character instead of the start position by computing end_idx = i + c.len_utf8(), and slices with an exclusive range, &s[..end_idx].

This change covers not only CJK characters but also other non-ASCII Unicode identifiers.
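
For illustration, the approach can be sketched roughly as follows. This is a simplified stand-in and not the actual strip_identifier code: ident_prefix and is_ident_char are hypothetical names, and the predicate is a loose approximation, not the real JavaScript identifier rules.

```rust
/// Stand-in predicate for this sketch only; it is NOT the real
/// ECMAScript identifier rule set.
fn is_ident_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '$'
}

/// Hypothetical helper: returns the longest prefix of `s` whose chars
/// all satisfy `is_ident_char`.
fn ident_prefix(s: &str) -> &str {
    // Exclusive end of the last accepted char, so it is always
    // a valid char boundary (or 0 for an empty prefix).
    let mut end_idx = 0;
    for (i, c) in s.char_indices() {
        if !is_ident_char(c) {
            break;
        }
        end_idx = i + c.len_utf8();
    }
    &s[..end_idx]
}

fn main() {
    assert_eq!(ident_prefix("한글"), "한글");
    assert_eq!(ident_prefix("变量名 = 1"), "变量名");
    assert_eq!(ident_prefix("café.x"), "café");
    assert_eq!(ident_prefix("foo"), "foo");
    assert_eq!(ident_prefix(""), "");
}
```

Because end_idx always lands just past a whole character, the exclusive slice cannot split a multi-byte sequence, whatever the character width.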

mznet added 2 commits January 17, 2026 15:40
Change "変数名" (Japanese) to "变量名" (Chinese) for better CJK coverage.
Contributor

@loewenheim loewenheim left a comment

Hi, thank you very much for the report and fix. Nice catch!

ETA: Can you please run cargo fmt and update the PR?

@loewenheim loewenheim enabled auto-merge (squash) January 19, 2026 14:35
auto-merge was automatically disabled January 20, 2026 00:15

Head branch was pushed to by a user without write access

@mznet
Contributor Author

mznet commented Jan 20, 2026

@loewenheim I applied cargo fmt to the code to improve readability.

@loewenheim loewenheim merged commit c3c213d into getsentry:master Jan 20, 2026
8 checks passed