-
Notifications
You must be signed in to change notification settings - Fork 54
feat: Add secure Python code execution with llm-sandbox support #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add secure Python code execution with llm-sandbox support #217
Conversation
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
jakelorocco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I left a few minor comments but the core of it looks nice. Please feel free to push back on anything you disagree with / let me know if you already talked to Nathan about any choices I've commented on.
Also, can you please rebase/merge main and resolve the uv.lock conflict as well?
- Add PythonExecutesWithoutError requirement with three execution backends: - SafeBackend: Validates syntax and imports without execution (default) - UnsafeBackend: Direct subprocess execution with warnings - LLMSandboxBackend: Docker-based execution using llm-sandbox - Implement allow_unsafe_execution flag with explicit opt-in and warnings - Add import restriction support for defense-in-depth security - Support use_sandbox flag for secure Docker-based execution - Include comprehensive test suite with 21 test cases - Maintain backward compatibility while defaulting to safe mode - Add llm-sandbox[docker] dependency for optional sandbox functionality
Improves code formatting and readability in python.py by splitting long lines, adding whitespace, and updating argument formatting. Also updates test import order in test_reqlib_python.py for consistency.
c38e456 to
5aad5cb
Compare
As suggested in PR review, the class name now includes 'Req' to better align with naming conventions for Requirement classes.
All subclasses had identical __init__ methods, so this reduces code duplication by implementing it once in the base class.
- Renamed ExecutionBackend to ExecutionEnvironment and all subclasses - Updated documentation to clarify that allowed_imports=None means any import is allowed - This avoids confusion with the main Backend classes used for LLM generation
Clarified why blocks with less than 2 non-trivial lines are penalized and added TODO for future improvements using comment-to-code ratio
- Enhanced docstring to explain code extraction and 'best block' selection - Preserve underlying extraction failure reasons for better debugging - Error messages now include specific details about why extraction failed
Removed the unused context_text parameter from _score_code_block function and updated all call sites to match the simplified signature.
- Added _get_unauthorized_imports function to return specific unauthorized import names - Enhanced all import restriction error messages to show which imports are unauthorized - Improved debugging experience by providing actionable error details - Maintained backward compatibility with existing tests
Added comment explaining that sys.executable uses the same Python interpreter and environment as the current process, ensuring access to all installed packages and dependencies.
- Added stdout output to success messages for both subprocess and sandbox execution - Provides valuable debugging information and execution feedback - Only includes output when present, keeps messages clean when no output - Helps users understand what their code actually did during execution
- Added detailed logging for unknown sandbox errors - Include exit code and available attributes for better debugging - Provide more informative error messages when stderr is not available - Helps diagnose sandbox execution issues more effectively
Documented all scoring criteria including length bonus, function/class detection, control flow analysis, and non-trivial content filtering to help developers understand code block prioritization logic.
- Replaced hard-coded skip with runtime Docker availability detection - Added llm_sandbox import check and Docker connectivity test - Sandbox tests now run when Docker is available, skip gracefully when not - All 21 tests now pass when Docker is running
- Fixed ruff formatting across all files - Added explicit type annotation for unauthorized list in _get_unauthorized_imports - Resolved MyPy type annotation error in python.py
|
I pushed fixes for pre-commit errors relating to my code. There are still more errors but they seem to be unrelated to my changes. I would be happy to fix those too, but everything on my end should be good. |
@0xCUB3, yeah it looks like we likely haven't touched the folder your editing since we apparently forced those mypy changes in. Could you please just add |
I added the ignores. The checks should hopefully pass now. |
|
Sorry about that... ran the linter again to fix the one remaining error. |
|
Looks like a test not related to your stuff failed. Let me take a look to make confirm, but we should be good to merge your PR. |
|
Okay, confirmed that it's a separate issue with an update to the litellm package we use. I will approve and merge this. |
Flexible Python code execution validation system with three execution modes:
Elements
API
Dependencies
llm-sandbox[docker]Testing
No breaking changes. Existing code using
PythonExecutesWithoutError()continues to work with safe validation mode.TODO: Documentation, perhaps more rigorous testing on larger code chunks