Skip to content

Conversation

@0xCUB3
Copy link
Contributor

@0xCUB3 0xCUB3 commented Oct 25, 2025

Flexible Python code execution validation system with three execution modes:

  • Safe mode (default): Syntax validation only
  • Unsafe execution: Direct subprocess execution with warnings
  • Sandbox execution: Secure Docker-based execution via llm-sandbox

Elements

  • Abstract backend architecture with pluggable execution strategies
  • Import restrictions as an add'l security layer (AST-based analysis)
  • Configurable timeouts for both unsafe and sandbox execution
  • Safety warnings when using unsafe execution mode
  • Fallbacks when llm-sandbox is not available

API

# Safe mode (default) - validation only
req = PythonExecutesWithoutError()

# Unsafe execution with warning
req = PythonExecutesWithoutError(allow_unsafe_execution=True, timeout=10)

# Secure sandbox execution
req = PythonExecutesWithoutError(use_sandbox=True, timeout=10)

# With import restrictions
req = PythonExecutesWithoutError(
    use_sandbox=True,
    allowed_imports=["os", "sys", "json"],
    timeout=10
)

Dependencies

  • Adds llm-sandbox[docker]
  • Requires Docker for sandbox functionality

Testing

# Run core tests (no Docker required)
python -m pytest test/stdlib_basics/test_reqlib_python.py -k "not sandbox"

# Run all tests including sandbox (requires Docker)
python -m pytest test/stdlib_basics/test_reqlib_python.py

No breaking changes. Existing code using PythonExecutesWithoutError() continues to work with safe validation mode.

TODO: Documentation, perhaps more rigorous testing on larger code chunks

@mergify
Copy link

mergify bot commented Oct 25, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

Copy link
Contributor

@jakelorocco jakelorocco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I left a few minor comments but the core of it looks nice. Please feel free to push back on anything you disagree with / let me know if you already talked to Nathan about any choices I've commented on.

Also, can you please rebase/merge main and resolve the uv.lock conflict as well?

- Add PythonExecutesWithoutError requirement with three execution backends:
  - SafeBackend: Validates syntax and imports without execution (default)
  - UnsafeBackend: Direct subprocess execution with warnings
  - LLMSandboxBackend: Docker-based execution using llm-sandbox
- Implement allow_unsafe_execution flag with explicit opt-in and warnings
- Add import restriction support for defense-in-depth security
- Support use_sandbox flag for secure Docker-based execution
- Include comprehensive test suite with 21 test cases
- Maintain backward compatibility while defaulting to safe mode
- Add llm-sandbox[docker] dependency for optional sandbox functionality
Improves code formatting and readability in python.py by splitting long lines, adding whitespace, and updating argument formatting. Also updates test import order in test_reqlib_python.py for consistency.
@0xCUB3 0xCUB3 force-pushed the feat/llm-sandbox-execution branch from c38e456 to 5aad5cb Compare November 11, 2025 15:19
0xCUB3 and others added 13 commits November 11, 2025 10:20
As suggested in PR review, the class name now includes 'Req' to better align with naming conventions for Requirement classes.
All subclasses had identical __init__ methods, so this reduces code duplication by implementing it once in the base class.
- Renamed ExecutionBackend to ExecutionEnvironment and all subclasses
- Updated documentation to clarify that allowed_imports=None means any import is allowed
- This avoids confusion with the main Backend classes used for LLM generation
Clarified why blocks with less than 2 non-trivial lines are penalized and added TODO for future improvements using comment-to-code ratio
- Enhanced docstring to explain code extraction and 'best block' selection
- Preserve underlying extraction failure reasons for better debugging
- Error messages now include specific details about why extraction failed
Removed the unused context_text parameter from _score_code_block function and updated all call sites to match the simplified signature.
- Added _get_unauthorized_imports function to return specific unauthorized import names
- Enhanced all import restriction error messages to show which imports are unauthorized
- Improved debugging experience by providing actionable error details
- Maintained backward compatibility with existing tests
Added comment explaining that sys.executable uses the same Python interpreter and environment as the current process, ensuring access to all installed packages and dependencies.
- Added stdout output to success messages for both subprocess and sandbox execution
- Provides valuable debugging information and execution feedback
- Only includes output when present, keeps messages clean when no output
- Helps users understand what their code actually did during execution
- Added detailed logging for unknown sandbox errors
- Include exit code and available attributes for better debugging
- Provide more informative error messages when stderr is not available
- Helps diagnose sandbox execution issues more effectively
Documented all scoring criteria including length bonus, function/class detection, control flow analysis, and non-trivial content filtering to help developers understand code block prioritization logic.
- Replaced hard-coded skip with runtime Docker availability detection
- Added llm_sandbox import check and Docker connectivity test
- Sandbox tests now run when Docker is available, skip gracefully when not
- All 21 tests now pass when Docker is running
@0xCUB3 0xCUB3 requested a review from jakelorocco November 11, 2025 16:12
- Fixed ruff formatting across all files
- Added explicit type annotation for unauthorized list in _get_unauthorized_imports
- Resolved MyPy type annotation error in python.py
@0xCUB3
Copy link
Contributor Author

0xCUB3 commented Nov 11, 2025

I pushed fixes for pre-commit errors relating to my code. There are still more errors but they seem to be unrelated to my changes. I would be happy to fix those too, but everything on my end should be good.

@jakelorocco
Copy link
Contributor

I pushed fixes for pre-commit errors relating to my code. There are still more errors but they seem to be unrelated to my changes. I would be happy to fix those too, but everything on my end should be good.

@0xCUB3, yeah it looks like we likely haven't touched the folder your editing since we apparently forced those mypy changes in. Could you please just add # type: ignore to those lines? That way we can make sure your tests run on the github runner and merge your changes. Everything looks good to me!

@0xCUB3
Copy link
Contributor Author

0xCUB3 commented Nov 11, 2025

I pushed fixes for pre-commit errors relating to my code. There are still more errors but they seem to be unrelated to my changes. I would be happy to fix those too, but everything on my end should be good.

@0xCUB3, yeah it looks like we likely haven't touched the folder your editing since we apparently forced those mypy changes in. Could you please just add # type: ignore to those lines? That way we can make sure your tests run on the github runner and merge your changes. Everything looks good to me!

I added the ignores. The checks should hopefully pass now.

@0xCUB3
Copy link
Contributor Author

0xCUB3 commented Nov 11, 2025

Sorry about that... ran the linter again to fix the one remaining error.

@jakelorocco
Copy link
Contributor

Looks like a test not related to your stuff failed. Let me take a look to make confirm, but we should be good to merge your PR.

@jakelorocco
Copy link
Contributor

Okay, confirmed that it's a separate issue with an update to the litellm package we use. I will approve and merge this.

@jakelorocco jakelorocco merged commit 9d12458 into generative-computing:main Nov 11, 2025
1 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants