Conversation
🛑 Action Required: Evaluation ApprovalSteering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged. Maintainers:
Once approved, the evaluation results will be posted here automatically. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue on Windows where the ripgrep binary could fail due to architecture mismatches, leading to EFTYPE errors. It introduces a validation step to verify the binary before use and provides a robust fallback to the system's ripgrep. Additionally, it improves the SDK's observability by implementing a configurable logging interface. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a validation check for the ripgrep binary and a fallback to the system "rg" command, alongside a new AgentLogger interface in the SDK for improved logging flexibility. The review feedback highlights a security concern regarding the unverified fallback to system binaries, performance issues stemming from the use of synchronous process spawning, and documentation that contradicts the new implementation.
|
|
||
| debugLogger.warn( | ||
| '[grep_search] Bundled ripgrep unavailable or invalid. Falling back to system "rg" command.', | ||
| ); |
There was a problem hiding this comment.
The code falls back to using the system's rg command if the bundled ripgrep binary is unavailable or invalid. However, the comments in the same file (lines 91-95) explicitly state that using a system-installed ripgrep is "intentionally not used at this time" and is "blocked on checksum verification of external binaries." By adding this fallback without implementing the necessary verification, the PR introduces a security regression that could allow the execution of an untrusted or malicious binary from the system's PATH, potentially leading to Remote Code Execution (RCE). Remove the fallback to the system rg command until proper checksum verification or other security measures are implemented to ensure the integrity of the binary being executed.
| */ | ||
| function isValidRipgrepBinary(binaryPath: string): boolean { | ||
| try { | ||
| const testRun = spawnSync(binaryPath, ['--version'], { | ||
| encoding: 'utf-8', | ||
| windowsHide: true, | ||
| }); | ||
|
|
||
| if (testRun.error) { | ||
| debugLogger.warn( | ||
| `[grep_search] Bundled binary validation failed: ${testRun.error.message}`, | ||
| ); | ||
| return false; | ||
| } | ||
|
|
||
| return testRun.status === 0; | ||
| } catch (err) { | ||
| debugLogger.warn( | ||
| `[grep_search] Exception during binary validation: ${getErrorMessage(err)}`, | ||
| ); | ||
| return false; | ||
| } |
There was a problem hiding this comment.
The isValidRipgrepBinary function uses spawnSync, which blocks the Node.js event loop. This function is called during every grep_search tool invocation. To align with repository guidelines, we should avoid synchronous operations that block the event loop. Furthermore, instead of performing an explicit pre-check for the command's existence or validity, we should rely on the built-in error handling of spawn when the command is actually executed. If a validation check is necessary, it should be asynchronous and its result should be cached to avoid redundant process spawns.
References
- Rely on the built-in error handling of spawn and spawnSync for 'command not found' errors instead of adding an explicit commandExists pre-check.
- Use asynchronous file system operations (e.g., fs.promises.readFile) instead of synchronous ones (e.g., fs.readFileSync) to avoid blocking the event loop.
| * into its global bin directory. | ||
| * - Even if ripgrep is available on the system PATH, it is intentionally |
There was a problem hiding this comment.
The comment stating that ripgrep on the system PATH is "intentionally not used at this time" is now contradictory to the changes in ensureRgPath (line 149), which introduces a fallback to the system rg command. This fallback also bypasses the security concern mentioned in line 95 regarding checksum verification of external binaries. Please update the documentation to reflect the new fallback behavior and ensure it aligns with the project's security policies.
| * into its global bin directory. | |
| * - Even if ripgrep is available on the system PATH, it is intentionally | |
| * - Even if ripgrep is available on the system PATH, it is primarily | |
| * not used, falling back to it only if the managed binary is invalid. |
e682d9d to
e9f3016
Compare
e9f3016 to
47b89b0
Compare
|
@gemini-code-assist /gemini review @cocosheng-g I have pushed a new commit (47b89b0) to address the automated review feedback: Security/Documentation: I completely removed the system rg fallback to eliminate the RCE risk and maintain the project's strict binary checksum policy. If the bundled binary is invalid, it now throws a clear, safe error. Performance: I refactored isValidRipgrepBinary to use the asynchronous spawn method instead of spawnSync, preventing the Node.js event loop from blocking. Please re-review the current state of the PR. |
There was a problem hiding this comment.
Code Review
This pull request adds a validation step for the ripgrep binary to ensure compatibility and prevent execution errors on Windows. The isValidRipgrepBinary utility is introduced and integrated into the acquisition flow. Reviewers identified several improvement opportunities: implementing a timeout for the spawned process to avoid potential hangs, caching validation results, and using resolveToRealPath for path security. Furthermore, using an explicit state enum for the acquisition process is recommended to prevent redundant downloads and improve state management reliability.
| async function isValidRipgrepBinary(binaryPath: string): Promise<boolean> { | ||
| return new Promise((resolve) => { | ||
| try { | ||
| const testRun = spawn(binaryPath, ['--version'], { | ||
| windowsHide: true, | ||
| }); | ||
|
|
||
| testRun.on('error', (err) => { | ||
| debugLogger.warn( | ||
| `[grep_search] Bundled binary validation failed: ${getErrorMessage(err)}`, | ||
| ); | ||
| resolve(false); | ||
| }); | ||
|
|
||
| testRun.on('close', (code) => { | ||
| resolve(code === 0); | ||
| }); | ||
| } catch (err) { | ||
| debugLogger.warn( | ||
| `[grep_search] Exception during binary validation: ${getErrorMessage(err)}`, | ||
| ); | ||
| resolve(false); | ||
| } | ||
| }); | ||
| } |
There was a problem hiding this comment.
The isValidRipgrepBinary function executes a binary using spawn without a timeout, which could lead to a Denial of Service if the binary hangs. For performance, the validation result should be cached. Additionally, this utility performs file system operations and should validate its path input internally using a robust function like resolveToRealPath to prevent path traversal vulnerabilities, as per repository rules.
async function isValidRipgrepBinary(binaryPath: string, signal?: AbortSignal): Promise<boolean> {
const resolvedPath = resolveToRealPath(binaryPath);
if (!resolvedPath) return false;
return new Promise((resolve) => {
const timeout = setTimeout(() => {
testRun.kill();
resolve(false);
}, 10000);
const testRun = spawn(resolvedPath, ['--version'], {
windowsHide: true,
signal,
});
testRun.on('error', (err) => {
clearTimeout(timeout);
resolve(false);
});
testRun.on('close', (code) => {
clearTimeout(timeout);
resolve(code === 0);
});
});
}References
- Ensure consistent path resolution by using a single, robust function (e.g., resolveToRealPath) for all related path validations, including internal validations in components like WorkspaceContext.
- Utility functions that perform file system operations should validate their path inputs internally to prevent path traversal vulnerabilities, rather than relying solely on callers to perform validation.
| if (!ripgrepAcquisitionPromise) { | ||
| ripgrepAcquisitionPromise = (async () => { | ||
| try { | ||
| await downloadRipGrep(Storage.getGlobalBinDir()); | ||
| return await resolveExistingRgPath(); | ||
| const newPath = await resolveExistingRgPath(); | ||
| if (newPath && (await isValidRipgrepBinary(newPath))) { | ||
| return newPath; | ||
| } | ||
| return null; | ||
| } catch (e) { | ||
| debugLogger.warn( | ||
| `[grep_search] Failed to acquire valid ripgrep: ${getErrorMessage(e)}`, | ||
| ); | ||
| return null; | ||
| } finally { | ||
| ripgrepAcquisitionPromise = null; | ||
| } |
There was a problem hiding this comment.
The current implementation of ensureRipgrepAvailable can lead to an infinite loop of redundant downloads when an incompatible binary is present on disk. Additionally, it manages state by checking for the existence of ripgrepAcquisitionPromise and resetting it in a finally block. Following repository guidelines, you should use an explicit state variable (e.g., an enum with values like IDLE, ACQUIRING, READY, FAILED) to manage the lifecycle of the acquisition. This is more reliable than checking for the existence of a promise object, which may be cleared in finally blocks.
References
- When managing the state of asynchronous operations, rely on an explicit state variable (e.g., a state enum) rather than checking for the existence of a promise object. Promise objects may be cleared in finally blocks upon completion, making them unreliable for state checks after the operation has finished.
47b89b0 to
b2d7f2d
Compare
|
@gemini-code-assist /gemini review I have updated the PR to strictly adhere to the repository guidelines mentioned in the review: Validation Security & Performance: isValidRipgrepBinary now uses resolveToRealPath to prevent path traversal, implements a 10-second timeout to prevent DoS via hanging processes, and caches the validation result to avoid redundant spawns. State Management: Replaced the fragile Promise-existence check with an explicit RipgrepState enum (IDLE, ACQUIRING, READY, FAILED). This ensures we don't fall into an infinite download loop if the binary is continuously flagged as incompatible on Windows. Please review! |
There was a problem hiding this comment.
Code Review
This pull request enhances the ripgrep tool by implementing a binary validation check and a state machine to manage the acquisition process. It introduces the isValidRipgrepBinary function to verify executability and uses a new RipgrepState enum to prevent redundant downloads. Feedback indicates a Temporal Dead Zone risk in the validation logic, potential race conditions due to concurrent process spawning, and a violation of architectural standards regarding the use of module-level global variables for state management.
| // Global cache for binary validation | ||
| let cachedValidationResult: boolean | null = null; | ||
|
|
||
| /** | ||
| * Asynchronously validates if the ripgrep binary is executable on the current OS/architecture. | ||
| * Uses a timeout and robust path resolution to adhere to security and performance guidelines. | ||
| */ | ||
| async function isValidRipgrepBinary( | ||
| binaryPath: string, | ||
| signal?: AbortSignal, | ||
| ): Promise<boolean> { | ||
| if (cachedValidationResult !== null) { | ||
| return cachedValidationResult; | ||
| } | ||
|
|
||
| const resolvedPath = resolveToRealPath(binaryPath); | ||
| if (!resolvedPath) { | ||
| debugLogger.warn( | ||
| `[grep_search] Validation failed: Could not resolve path for ${binaryPath}`, | ||
| ); | ||
| return false; | ||
| } | ||
|
|
||
| return new Promise((resolve) => { | ||
| const timeout = setTimeout(() => { | ||
| if (testRun && !testRun.killed) { | ||
| testRun.kill(); | ||
| } | ||
| debugLogger.warn( | ||
| `[grep_search] Binary validation timed out for ${resolvedPath}`, | ||
| ); | ||
| cachedValidationResult = false; | ||
| resolve(false); | ||
| }, 10000); | ||
|
|
||
| const testRun = spawn(resolvedPath, ['--version'], { | ||
| windowsHide: true, | ||
| signal, | ||
| }); | ||
|
|
||
| testRun.on('error', (err) => { | ||
| clearTimeout(timeout); | ||
| debugLogger.warn( | ||
| `[grep_search] Bundled binary validation error: ${getErrorMessage(err)}`, | ||
| ); | ||
| cachedValidationResult = false; | ||
| resolve(false); | ||
| }); | ||
|
|
||
| testRun.on('close', (code) => { | ||
| clearTimeout(timeout); | ||
| cachedValidationResult = code === 0; | ||
| resolve(cachedValidationResult); | ||
| }); | ||
| }); | ||
| } |
There was a problem hiding this comment.
The isValidRipgrepBinary implementation contains several critical issues:
- Temporal Dead Zone (TDZ) Bug:
testRunis accessed in thesetTimeoutcallback before its declaration. Ifspawnthrows synchronously, the timeout will trigger and cause aReferenceError. - Concurrency: Multiple concurrent calls will trigger multiple
spawnprocesses. Caching the promise itself resolves this. - Path Resolution: Use
resolveToRealPathfor consistent path validation as per repository standards. - Error Handling: Rely on the built-in error handling of
spawnfor 'command not found' errors instead of manual checks, and ensureresolveToRealPatherrors are caught.
let validationPromise: Promise<boolean> | null = null;
async function isValidRipgrepBinary(
binaryPath: string,
signal?: AbortSignal,
): Promise<boolean> {
if (validationPromise) {
return validationPromise;
}
validationPromise = (async () => {
try {
const resolvedPath = resolveToRealPath(binaryPath);
return await new Promise<boolean>((resolve) => {
const testRun = spawn(resolvedPath, ['--version'], {
windowsHide: true,
signal,
});
const timeout = setTimeout(() => {
if (!testRun.killed) {
testRun.kill();
}
debugLogger.warn(
'[grep_search] Binary validation timed out for ' + resolvedPath,
);
resolve(false);
}, 2000);
testRun.on('error', (err) => {
clearTimeout(timeout);
debugLogger.warn(
'[grep_search] Bundled binary validation error: ' + getErrorMessage(err),
);
resolve(false);
});
testRun.on('close', (code) => {
clearTimeout(timeout);
resolve(code === 0);
});
});
} catch (err) {
debugLogger.warn(
'[grep_search] Validation failed: ' + getErrorMessage(err),
);
return false;
}
})();
return validationPromise;
}References
- Ensure consistent path resolution by using a single, robust function (e.g., resolveToRealPath) for all related path validations, including internal validations in components like WorkspaceContext.
- Rely on the built-in error handling of spawn and spawnSync for 'command not found' errors instead of adding an explicit commandExists pre-check.
| FAILED, | ||
| } | ||
|
|
||
| let currentAcquisitionState: RipgrepState = RipgrepState.IDLE; |
There was a problem hiding this comment.
The introduction of currentAcquisitionState as a module-level global variable violates the repository's general rules. State like this should be instance-scoped or session-scoped to prevent race conditions and memory issues in concurrent environments. Consider moving this state into the RipGrepTool class or a dedicated service.
References
- Avoid module-level global variables for state like caches to prevent race conditions and memory issues in concurrent environments. Instead, use session-scoped or instance-scoped state and leverage standard cache implementations like LRUCache.
b2d7f2d to
9eac580
Compare
|
@gemini-code-assist /gemini review The bot provided excellent feedback on concurrency and state management. I have updated the code: TDZ & Concurrency: Implemented the exact isValidRipgrepBinary structure suggested by the bot. It now caches the Promise itself to handle concurrent requests and resolves the Temporal Dead Zone issue with the timeout variable. Global State: Completely removed the RipgrepState enum and the global currentAcquisitionState variable to respect the repository's rule against module-level state. Ready for re-review! |
There was a problem hiding this comment.
Code Review
This pull request introduces a validation step for the ripgrep binary to ensure it is executable on the current system, specifically addressing EFTYPE errors on Windows. It adds a new isValidRipgrepBinary function that performs a test execution with a timeout and caches the result. Review feedback correctly identifies a logic error in the caching mechanism: using a single module-level global promise fails to account for different binary paths and could lead to race conditions. It is recommended to use a keyed cache and move the state to an instance-scoped level.
| if (validationPromise) { | ||
| return validationPromise; | ||
| } |
There was a problem hiding this comment.
The validationPromise cache in isValidRipgrepBinary is flawed and violates repository rules. First, it uses a module-level global variable, which can lead to race conditions and memory issues; state should be instance-scoped or session-scoped, preferably using an LRUCache. Second, checking for the existence of a promise object to determine operation state is unreliable as promises may be cleared in finally blocks; use an explicit state enum instead. Finally, the cache must be keyed by binaryPath to prevent stale results when different binaries are checked.
| if (validationPromise) { | |
| return validationPromise; | |
| } | |
| if (this.validationCache.has(binaryPath)) { | |
| return this.validationCache.get(binaryPath)!; | |
| } |
References
- Avoid module-level global variables for state like caches to prevent race conditions and memory issues in concurrent environments. Instead, use session-scoped or instance-scoped state and leverage standard cache implementations like LRUCache.
- When managing the state of asynchronous operations, rely on an explicit state variable (e.g., a state enum) rather than checking for the existence of a promise object. Promise objects may be cleared in finally blocks upon completion, making them unreliable for state checks after the operation has finished.
|
@cocosheng-g The AI reviewer seems to be stuck in an architectural loop here. It requested that I add caching to isValidRipgrepBinary for performance, but is now rejecting the cache because it's a module-level variable (requesting an instance-scoped LRUCache instead). Because canUseRipgrep and ensureRgPath are exported standalone utility functions, moving this validation state into a class instance would require refactoring the broader architecture of this file and its external consumers. That feels wildly out of scope for a simple EFTYPE fallback fix. The current commit (9eac580) safely resolves the core Windows architecture mismatch, utilizes the asynchronous spawn to protect the event loop, and uses robust path resolution. Could a maintainer please take a look and manually override the bot's block if this implementation is acceptable? |
Description:
Fixes #22784
What happened?
On Windows (
win32), thegrep_searchtool was failing with aspawn EFTYPEerror. This occurs whenchild_process.spawnattempts to execute a downloaded binary that doesn't match the host architecture (like ARM on x64) or is corrupted.What I changed:
Implemented the solution discussed in #22786. Before
grep_searchtrusts the downloadedripgrepbinary, it now runs a quick validation check (--versionviaspawnSync). If Node.js catches anEFTYPE(or any other execution error), it gracefully falls back to usingrgfrom the systemPATH.Testing:
grep_searchcontinues to function properly without throwing the architecture mismatch error.