Conversation
Summary of Changes

Hello @nv-alicheng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request moves the entire LiveCodeBench evaluation process into a dedicated, security-hardened Docker container. The change mitigates the security risks of executing untrusted code and simplifies dependency management; it also refines the project's overall dependency structure and adds a flexible mechanism for interacting with the new containerized evaluation service.
Code Review
This pull request significantly improves the LiveCodeBench (LCB) evaluation workflow by introducing a containerized FastAPI WebSocket service. The change addresses critical security concerns around executing untrusted LLM-generated code by isolating the evaluation environment. The accompanying documentation is comprehensive, covering security-hardening best practices, container setup, and usage. Dependency management has been modernized by moving to `pyproject.toml` with optional dependency groups. The `LiveCodeBenchScorer` now attempts WebSocket communication first, with a secure, opt-in fallback to local subprocess execution, and the `Extractor` classes gained a `default` parameter that makes downstream processing more reliable. Overall, these changes are a well-executed architectural improvement to both the security and maintainability of the LCB integration.
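As a rough illustration of the WebSocket-first scoring flow described above, the decision logic might look like the sketch below. All names, the URL, and the message schema here are assumptions for illustration, not the PR's actual API; the key point is that local execution only happens when the caller explicitly opts in.

```python
import asyncio
import json
import subprocess
import sys


async def score_via_websocket(code: str, tests: list[str], url: str) -> dict:
    """Send the candidate code to the containerized eval service (hypothetical schema)."""
    import websockets  # third-party client; assumed available alongside the service

    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({"code": code, "tests": tests}))
        return json.loads(await ws.recv())


def score(code: str, tests: list[str],
          url: str = "ws://localhost:8000/evaluate",
          allow_local_fallback: bool = False) -> dict:
    """Try the isolated service first; run locally only if explicitly opted in."""
    try:
        return asyncio.run(score_via_websocket(code, tests, url))
    except Exception:
        if not allow_local_fallback:
            # Fail closed: without opt-in, never execute untrusted code locally.
            raise
        # Opt-in fallback: plain subprocess, with none of the container's isolation.
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=30)
        return {"passed": proc.returncode == 0, "stderr": proc.stderr}
```

Failing closed by default means a misconfigured or unreachable service surfaces as an error rather than silently degrading the security posture.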
src/inference_endpoint/evaluation/livecodebench/run_lcb_tests.py
Dismissed
…f the library, fix type annotations
src/inference_endpoint/dataset_manager/predefined/livecodebench/__init__.py
Outdated
src/inference_endpoint/evaluation/livecodebench/run_lcb_tests.py
Dismissed
…nerate as subprocess instead of handling with lcb-service
…st suite JSON files
Code review

Found 3 issues:
endpoints/src/inference_endpoint/evaluation/livecodebench/lcb_serve.py Lines 467 to 477 in 318e948
endpoints/src/inference_endpoint/evaluation/scoring.py Lines 510 to 514 in 318e948
endpoints/src/inference_endpoint/evaluation/extractor.py Lines 1 to 40 in 318e948
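The third issue above points at extractor.py; per the review summary, the `Extractor` classes gained a `default` parameter so downstream code always receives a usable value even when extraction fails. A minimal sketch of that pattern follows; the class name, method name, and regex are guesses for illustration, not the file's real API.

```python
import re
from typing import Optional


class CodeBlockExtractor:
    """Pulls the first fenced code block out of an LLM response.

    The `default` value is returned when no block is found, so callers
    never have to special-case a failed extraction.
    """

    # Matches a fenced block: three backticks, optional language tag, body.
    FENCE = re.compile(r"`{3}(?:\w+)?\n(.*?)`{3}", re.DOTALL)

    def __init__(self, default: Optional[str] = None):
        self.default = default

    def extract(self, text: str) -> Optional[str]:
        match = self.FENCE.search(text)
        return match.group(1) if match else self.default
```

Passing `default=""` (rather than letting `None` propagate) keeps a scoring pipeline total: malformed model output scores as a failed test instead of crashing the run.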
What does this PR do?
Moves the LiveCodeBench eval into a security-hardened Docker container that runs as a web service.
Decouples the LCB eval from the official lcb_runner repo to reduce the number of dependencies.
Type of change
Related issues
Testing
Checklist