fix(at-command): prevent stack overflow from regex backtracking on large inputs#27580
fix(at-command): prevent stack overflow from regex backtracking on large inputs#27580Sauravdas007 wants to merge 2 commits into
Conversation
adding state-machine version of parseAllAtCommands() to resolve // regex error
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical performance issue where complex regular expressions caused stack overflow errors during the processing of large or malformed inputs. By transitioning to an iterative, manual scanning approach, the system now handles large pastes with linear-time complexity, ensuring stability and reliability for the '@' command parsing logic. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request replaces the regex-based parsing of @ commands in atCommandProcessor.ts with a manual iterative character scanner to better handle quoted paths and escaped characters. However, the new scanner introduces two correctness regressions: it incorrectly includes trailing periods in file paths and swallows the rest of the line when encountering unclosed quotes. A code suggestion has been provided to address these issues by checking for closing quotes and explicitly breaking on trailing periods.
| let inQuotes = false; | ||
|
|
||
| while (i < query.length) { | ||
| const ch = query[i]; | ||
|
|
||
| // Handle quoted paths | ||
| if (ch === '"') { | ||
| inQuotes = !inQuotes; | ||
| i++; | ||
| continue; | ||
| } | ||
|
|
||
| // Handle escaped characters | ||
| if (ch === '\\' && i + 1 < query.length) { | ||
| i += 2; | ||
| continue; | ||
| } | ||
|
|
||
| // We strip the @ before unescaping so that unescapePath can handle quoted paths correctly on Windows. | ||
| const atPath = '@' + unescapePath(fullMatch.substring(1)); | ||
| parts.push({ type: 'atPath', content: atPath }); | ||
| // Stop at delimiters when not inside quotes | ||
| if (!inQuotes && /[ \t\n\r,;!?()[\]{}]/.test(ch)) { | ||
| break; | ||
| } | ||
|
|
||
| i++; | ||
| } |
There was a problem hiding this comment.
The new iterative scanner introduces two correctness regressions compared to the original regex parser:
- Trailing Periods: In the original regex, a period
.was only matched if it was not followed by the end of the string or whitespace (\.(?!$|[ \t\n\r])). This prevented trailing periods at the end of sentences (e.g.,Please check @foo.txt.) from being incorrectly included in the file path. The new scanner incorrectly includes trailing periods, which breaks path resolution. - Unclosed Quotes: If a quote is unclosed (e.g.,
@foo"bar baz), the original regex would treat the quote as a regular character and stop at the space delimiter. The new scanner setsinQuotes = trueand swallows the rest of the line, including delimiters.
We can resolve both issues elegantly by checking for a closing quote using indexOf (to avoid swallowing delimiters on unclosed quotes) and explicitly breaking on trailing periods.
while (i < query.length) {
const ch = query[i];
// Handle quoted paths
if (ch === '"') {
const closingQuoteIndex = query.indexOf('"', i + 1);
if (closingQuoteIndex !== -1) {
i = closingQuoteIndex + 1;
continue;
} else {
i++;
continue;
}
}
// Handle escaped characters
if (ch === '\\' && i + 1 < query.length) {
i += 2;
continue;
}
// Stop at delimiters
if (/[ \t\n\r,;!?()[\\]{}]/.test(ch)) {
break;
}
// Stop at trailing periods
if (ch === '.' && (i + 1 === query.length || /[ \t\n\r]/.test(query[i + 1]))) {
break;
}
i++;
}
Summary
Fixes #27539
This PR replaces the regex-based
@command parser with an iterative scanner to prevent catastrophic backtracking when processing large pasted inputs.Fixes #27539.
Root Cause
The previous implementation relied on a complex regular expression:
Under large malformed inputs (logs, JSON dumps, session histories, etc.) containing
@-prefixed content, the overlapping alternations inside a greedy repetition could trigger excessive backtracking, eventually causing:Changes
Replaced regex-based parsing with a deterministic character-by-character scanner.
Added explicit handling for:
@Preserved existing output structure and command categorization behavior.
Benefits
Testing