fix PANIC Sorry, Pyrefly crashed, this is always a bug in Pyrefly itself #2722#2755
fix PANIC Sorry, Pyrefly crashed, this is always a bug in Pyrefly itself #2722#2755asukaminato0721 wants to merge 1 commit intofacebook:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a Pyrefly crash in signature-diff rendering by ensuring computed diff spans never split multi-byte UTF-8 characters, addressing issue #2722.
Changes:
- Reworked
diff_rangesto compute longest common prefix/suffix using character boundaries and convert back to byte offsets safely. - Updated the
diff_rangesdoc comment to describe UTF-8-safe behavior. - Added a regression test covering Unicode content in a
Literal[...]return type.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| /// Find the UTF-8-safe byte ranges where two strings differ, using longest | ||
| /// common prefix and suffix. Returns `None` if the strings are equal. | ||
| /// | ||
| /// The returned ranges highlight the "differing middle" of each string. | ||
| /// When one string is a strict prefix/suffix of the other, a minimal | ||
| /// single-byte range is returned to ensure there's always something to annotate. | ||
| /// | ||
| /// Note: operates on raw bytes, which is correct for ASCII type names but | ||
| /// could produce ranges that split multi-byte UTF-8 characters for non-ASCII | ||
| /// identifiers. | ||
| /// single-character range is returned to ensure there's always something to | ||
| /// annotate. | ||
| fn diff_ranges(expected: &str, found: &str) -> Option<(Range<usize>, Range<usize>)> { | ||
| if expected == found { | ||
| return None; | ||
| } | ||
| let expected_bytes = expected.as_bytes(); | ||
| let found_bytes = found.as_bytes(); | ||
| let expected_chars = expected.chars().collect::<Vec<_>>(); | ||
| let found_chars = found.chars().collect::<Vec<_>>(); | ||
| let expected_boundaries = expected | ||
| .char_indices() | ||
| .map(|(idx, _)| idx) | ||
| .chain(std::iter::once(expected.len())) | ||
| .collect::<Vec<_>>(); | ||
| let found_boundaries = found | ||
| .char_indices() | ||
| .map(|(idx, _)| idx) | ||
| .chain(std::iter::once(found.len())) | ||
| .collect::<Vec<_>>(); | ||
|
|
||
| let mut lcp = 0; | ||
| while lcp < expected_bytes.len() | ||
| && lcp < found_bytes.len() | ||
| && expected_bytes[lcp] == found_bytes[lcp] | ||
| while lcp < expected_chars.len() | ||
| && lcp < found_chars.len() | ||
| && expected_chars[lcp] == found_chars[lcp] | ||
| { | ||
| lcp += 1; | ||
| } | ||
| let mut lcs = 0; | ||
| while expected_bytes.len() > lcp + lcs | ||
| && found_bytes.len() > lcp + lcs | ||
| && expected_bytes[expected_bytes.len() - 1 - lcs] | ||
| == found_bytes[found_bytes.len() - 1 - lcs] | ||
| while expected_chars.len() > lcp + lcs | ||
| && found_chars.len() > lcp + lcs | ||
| && expected_chars[expected_chars.len() - 1 - lcs] | ||
| == found_chars[found_chars.len() - 1 - lcs] | ||
| { | ||
| lcs += 1; | ||
| } | ||
| let expected_end = expected_bytes.len().saturating_sub(lcs); | ||
| let found_end = found_bytes.len().saturating_sub(lcs); | ||
| let expected_end = expected_chars.len().saturating_sub(lcs); | ||
| let found_end = found_chars.len().saturating_sub(lcs); | ||
| let expected_span = if expected_end > lcp { | ||
| lcp..expected_end | ||
| expected_boundaries[lcp]..expected_boundaries[expected_end] | ||
| } else { | ||
| // The expected params are a prefix of the found params (or vice versa). | ||
| // Point at the first character after the shared prefix, which in the | ||
| // full source corresponds to the closing `)` or `,` — indicating where | ||
| // parameters are missing or extra. Clamp to the string length to avoid | ||
| // producing an out-of-bounds range when the entire string is a prefix | ||
| // (e.g., for Callable types whose return type ends at the string boundary). | ||
| lcp..(lcp + 1).min(expected_bytes.len()) | ||
| let next = (lcp + 1).min(expected_chars.len()); | ||
| expected_boundaries[lcp]..expected_boundaries[next] |
There was a problem hiding this comment.
The doc comment says a strict prefix/suffix case returns a “minimal single-character range”, but the current implementation can return a zero-length (point) span when the common prefix reaches the end of the shorter string (e.g. lcp == expected_chars.len() => expected_boundaries[lcp]..expected_boundaries[lcp]). Please update the comment to reflect that a point span may be returned, or adjust the logic to always produce a 1-character span when possible (e.g. fall back to highlighting the last character when the diff is at end-of-string).
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
|
@yangdanny97 has imported this pull request. If you are a Meta employee, you can view this in D95951583. |
grievejia
left a comment
There was a problem hiding this comment.
Review automatically exported from Phabricator review in Meta.
|
@yangdanny97 merged this pull request in ff72f08. |
…elf facebook#2722 (facebook#2755) Summary: Fixes facebook#2722 Fixed the crash by making override signature diff spans UTF-8 safe. It now computes prefix/suffix matches by character and converts back to byte offsets only at valid char boundaries. Pull Request resolved: facebook#2755 Test Plan: added regression test mypy primer does not panic Reviewed By: grievejia Differential Revision: D95951583 Pulled By: yangdanny97 fbshipit-source-id: b147fc2756fc132ea73dc03efce52bb99d1c9fb7
Summary
Fixes #2722
Fixed the crash by making override signature diff spans UTF-8 safe.
It now computes prefix/suffix matches by character and converts back to byte offsets only at valid char boundaries.
Test Plan
Added a regression test from the GitHub comment