feat: Workspace environment persisted in the network volume #10
Merged
The coverage check fails when coverage drops below 50%. We can raise this threshold later.
Establish testing infrastructure and protocol validation tests. Create shared fixtures and validate the FunctionRequest/FunctionResponse data models that will be extended for volume workspace functionality.
Tests volume detection, virtual environment creation, file-based locking for concurrency, and timeout handling mechanisms.
Validates that functions execute in the volume workspace, can access persistent packages, and fall back gracefully when the volume is unavailable.
Tests complete request workflows, concurrent access safety, mixed execution scenarios, and realistic error handling patterns.
Adds volume detection logic, workspace initialization with file-based locking, virtual environment creation, and timeout handling to make some tests pass.
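The volume detection and virtual environment creation steps can be sketched as below. This is a minimal illustration, not the PR's code: `volume_available` and `ensure_virtual_environment` are hypothetical names, the `/runpod-volume` mount point comes from the workspace paths described later in this PR, and the stdlib `venv` module stands in for whatever tool the implementation actually uses (it may well use uv).

```python
import os
import venv
from pathlib import Path

# Mount point taken from the PR's workspace paths; treat as configurable.
DEFAULT_VOLUME_ROOT = Path("/runpod-volume")


def volume_available(volume_root: Path = DEFAULT_VOLUME_ROOT) -> bool:
    """Hypothetical check: the volume counts as usable if it is a writable directory."""
    return volume_root.is_dir() and os.access(volume_root, os.W_OK)


def ensure_virtual_environment(workspace: Path) -> Path:
    """Create <workspace>/.venv if it is missing and return its interpreter path."""
    venv_dir = workspace / ".venv"
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    python_name = "python.exe" if os.name == "nt" else "python"
    python_path = venv_dir / bin_dir / python_name
    if not python_path.exists():
        workspace.mkdir(parents=True, exist_ok=True)
        # with_pip=False keeps creation fast; packages are installed separately.
        venv.create(venv_dir, with_pip=False, symlinks=(os.name != "nt"))
    return python_path
```

Calling `ensure_virtual_environment` a second time is a no-op, which is what makes it safe to run on every worker start once it is wrapped in a lock.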
Implement smart dependency installation that only installs missing packages. Optimizes performance by leveraging persistent volume storage and avoiding redundant package installations.
…uration
Enable functions to execute in the volume workspace with access to persistent packages. Configures the Python path, environment variables, and UV cache to use volume storage.
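The environment configuration can be pictured as building one environment mapping that points at the volume venv and cache. This is a sketch under assumptions: `build_execution_env` is a hypothetical name, it assumes a POSIX venv layout (`lib/pythonX.Y/site-packages`) built with the same Python version as the running process, and it returns a dict you could pass to `subprocess` or merge into `os.environ`. `VIRTUAL_ENV` and `UV_CACHE_DIR` are the standard variable names read by Python tooling and uv.

```python
import os
import sys
from pathlib import Path
from typing import Dict, Mapping, Optional


def build_execution_env(
    venv_dir: Path, uv_cache_dir: Path, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    env = dict(os.environ if base_env is None else base_env)
    # Assumes a POSIX venv created by the same Python version as this process.
    site_packages = (
        venv_dir / "lib" / f"python{sys.version_info.major}.{sys.version_info.minor}" / "site-packages"
    )
    env["VIRTUAL_ENV"] = str(venv_dir)
    # Put the venv's bin/ first so "python" and console scripts resolve to it.
    env["PATH"] = os.pathsep.join(filter(None, [str(venv_dir / "bin"), env.get("PATH", "")]))
    # Prepend site-packages so subprocesses also see the persistent packages.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = os.pathsep.join([str(site_packages)] + ([existing] if existing else []))
    # Share one uv cache across workers on the volume.
    env["UV_CACHE_DIR"] = str(uv_cache_dir)
    return env
```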
Update existing tests to work with new volume workspace functionality. Ensures backward compatibility and validates that all existing functionality continues to work with the new volume-aware implementation.
- Move all Python modules to src/ for better organization
- Update Docker files to copy from src/ directory
- Update pyproject.toml with src/ in pythonpath
- Update Makefile to copy remote_execution.py to src/
- All tests pass with new structure
- Add _validate_virtual_environment() method to WorkspaceManager with symlink chain resolution using os.path.realpath()
- Add _remove_broken_virtual_environment() cleanup method
- Enhance initialize_workspace() with validation checks and automatic repair
- Add validation calls in setup_python_path() and dependency installer
- Update Docker files to work with src/ directory structure
- Update tests to mock new validation methods
- Fix pyproject.toml pythonpath configuration for tests
This resolves broken virtual environment symlinks when different endpoints create venvs with different Python interpreter paths on shared volumes.
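The validation idea can be illustrated with standalone functions. This is a sketch, not the WorkspaceManager methods themselves: the names drop the leading underscore to mark them as hypothetical stand-ins, and the sketch assumes a POSIX venv where `bin/python` is a symlink into an interpreter that may not exist on another endpoint's image.

```python
import os
import shutil
from pathlib import Path


def validate_virtual_environment(venv_dir: Path) -> bool:
    python_link = venv_dir / "bin" / "python"
    # lexists() is True even for a dangling symlink, so we can inspect it.
    if not os.path.lexists(python_link):
        return False
    # realpath follows the whole chain (python -> python3 -> /usr/bin/python3.x).
    target = os.path.realpath(python_link)
    return os.path.isfile(target) and os.access(target, os.X_OK)


def remove_broken_virtual_environment(venv_dir: Path) -> bool:
    """Delete the venv if it exists but is broken; return True if removed."""
    if venv_dir.exists() and not validate_virtual_environment(venv_dir):
        shutil.rmtree(venv_dir)
        return True
    return False
```

Resolving the full chain matters because checking only the first hop would accept a link to `python3` that itself points at an interpreter path that only existed on the endpoint that created the venv.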
- Add RUNPOD_ENDPOINT_ID environment variable support for endpoint isolation
- Workspace paths now: /runpod-volume/runtimes/{endpoint_id}
- Shared UV cache at volume root: /runpod-volume/.uv-cache
- Add RUNTIMES_DIR_NAME constant for endpoint workspace organization
- Update WorkspaceManager to create endpoint-specific workspace paths
- Add comprehensive endpoint isolation tests
- Update integration tests for new workspace structure
- Resolve merge conflicts from virtual environment validation features
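The path layout above can be expressed as a small pure function. In this sketch the value `"runtimes"` for RUNTIMES_DIR_NAME is inferred from the documented path, while `UV_CACHE_DIR_NAME`, `DEFAULT_ENDPOINT_ID`, `workspace_paths`, and the fallback used when RUNPOD_ENDPOINT_ID is unset are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

RUNTIMES_DIR_NAME = "runtimes"   # value inferred from /runpod-volume/runtimes/{endpoint_id}
UV_CACHE_DIR_NAME = ".uv-cache"  # hypothetical constant name for the shared cache dir
DEFAULT_ENDPOINT_ID = "default"  # hypothetical fallback when the env var is missing


def workspace_paths(
    volume_root: Path = Path("/runpod-volume"), env: Optional[Mapping[str, str]] = None
) -> Dict[str, Path]:
    env = os.environ if env is None else env
    endpoint_id = env.get("RUNPOD_ENDPOINT_ID") or DEFAULT_ENDPOINT_ID
    return {
        # Per-endpoint workspace: venvs never collide across endpoints.
        "workspace": volume_root / RUNTIMES_DIR_NAME / endpoint_id,
        # One cache at the volume root, shared by every endpoint.
        "uv_cache": volume_root / UV_CACHE_DIR_NAME,
    }
```

Isolating the venv per endpoint while sharing the uv cache keeps interpreter-specific files apart but still lets every endpoint reuse downloaded wheels.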
- Add HF_CACHE_DIR_NAME constant for .hf-cache directory
- Implement _configure_huggingface_cache() method in WorkspaceManager
- Set HF environment variables (HF_HOME, TRANSFORMERS_CACHE, etc.) to use volume paths
- Update unit and integration tests to mock os.makedirs calls
- Fix "No space left on device" errors when downloading HF models
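A minimal version of the Hugging Face cache redirection might look like this. Assumptions: `configure_huggingface_cache` is a hypothetical standalone counterpart of the method, the cache is placed at the volume root, and the subdirectory layout mirrors the libraries' defaults (`hub/`, `datasets/`). The variables must be set before `transformers` or `huggingface_hub` is imported, since those libraries read them at import time; newer `transformers` releases prefer `HF_HOME` over the older `TRANSFORMERS_CACHE`.

```python
import os
from pathlib import Path
from typing import MutableMapping, Optional

HF_CACHE_DIR_NAME = ".hf-cache"  # directory name from the PR


def configure_huggingface_cache(
    volume_root: Path, env: Optional[MutableMapping[str, str]] = None
) -> Path:
    env = os.environ if env is None else env
    hf_home = volume_root / HF_CACHE_DIR_NAME
    os.makedirs(hf_home, exist_ok=True)
    env["HF_HOME"] = str(hf_home)
    env["HF_HUB_CACHE"] = str(hf_home / "hub")
    env["TRANSFORMERS_CACHE"] = str(hf_home / "hub")  # read by older transformers versions
    env["HF_DATASETS_CACHE"] = str(hf_home / "datasets")
    return hf_home
```

Pointing every cache variable at the volume is what moves multi-gigabyte model downloads off the container's small root disk.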
- Add make test-handler command that tests all test_*.json files locally
- Update CI to use make test-handler for consistency between local and CI testing
- Ensure local development environment matches CI testing exactly
- Remove code duplication between Makefile and CI configuration
- Support cross-platform testing (handles timeout command availability)
- Update CLAUDE.md documentation with new testing commands
…ization
- Add configurable timeout constants (WORKSPACE_INIT_TIMEOUT, WORKSPACE_LOCK_POLL_INTERVAL)
- Implement atomic lock file operations with proper file descriptor management
- Enhance lock file cleanup with comprehensive error handling
- Add workspace directory validation before lock acquisition
- Fix race condition in workspace functionality checks by making them atomic
- Add comprehensive timeout and edge case tests for concurrency scenarios
- Improve error messages and fallback behavior for various failure modes
- Maintain backward compatibility while significantly improving reliability
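Atomic lock acquisition with a timeout and poll interval is commonly built on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists. The sketch below illustrates that technique; it is not the reverted or merged code. The constant names come from the commit, but their values are assumptions, `workspace_lock` is hypothetical, and it does not handle stale locks left by a crashed process. Whether `O_EXCL` is atomic on a given network filesystem also depends on that filesystem.

```python
import os
import time
from contextlib import contextmanager, suppress

WORKSPACE_INIT_TIMEOUT = 30.0       # seconds; value is an assumption
WORKSPACE_LOCK_POLL_INTERVAL = 0.5  # seconds; value is an assumption


@contextmanager
def workspace_lock(lock_path, timeout=WORKSPACE_INIT_TIMEOUT, poll_interval=WORKSPACE_LOCK_POLL_INTERVAL):
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL: creation fails if the file exists, so only one holder wins.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path} within {timeout}s")
            time.sleep(poll_interval)
    try:
        try:
            # Record the owner PID for debugging, and always release the descriptor.
            os.write(fd, str(os.getpid()).encode())
        finally:
            os.close(fd)
        yield
    finally:
        # Remove the lock even if the protected block raised.
        with suppress(FileNotFoundError):
            os.unlink(lock_path)
```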
… initialization" This reverts commit a411bfe.
- Add BaseExecutor abstract base class with common functionality
- Update FunctionExecutor and ClassExecutor to inherit from BaseExecutor
- Standardize execution environment setup via _setup_execution_environment
- Update ClassExecutor constructor to accept workspace_manager parameter
- Fix ClassExecutor tests to mock workspace_manager dependency
- Add logging support to DependencyInstaller and WorkspaceManager
- Replace print calls with appropriate log levels (info, warning, error)
- Improve debugging and monitoring capabilities
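The executor hierarchy follows the template-method pattern: the base class owns environment setup, subclasses own execution. In this toy sketch, `run`, `execute`, and the dict-shaped request are hypothetical (the real executors work with FunctionRequest objects), and calling `setup_python_path()` on the workspace manager is an assumption based on the method named elsewhere in this PR.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseExecutor(ABC):
    def __init__(self, workspace_manager: Optional[Any] = None):
        self.workspace_manager = workspace_manager

    def _setup_execution_environment(self) -> None:
        # Assumed behavior: delegate to the manager when a volume workspace exists.
        if self.workspace_manager is not None:
            self.workspace_manager.setup_python_path()

    def run(self, request: dict) -> Any:
        # Shared setup runs once, in one place, for every executor type.
        self._setup_execution_environment()
        return self.execute(request)

    @abstractmethod
    def execute(self, request: dict) -> Any: ...


class FunctionExecutor(BaseExecutor):
    def execute(self, request: dict) -> Any:
        return request["function"](*request.get("args", ()))


class ClassExecutor(BaseExecutor):
    def execute(self, request: dict) -> Any:
        instance = request["cls"](*request.get("init_args", ()))
        return getattr(instance, request["method"])()
```

Passing `workspace_manager` through the constructor is also what makes the ClassExecutor tests easy to mock.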
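Replacing `print` with module-level loggers lets the deployment decide what to surface. A hedged sketch of the level mapping: `log_workspace_status` is hypothetical, and the choice of WARNING for a fallback and ERROR for a broken venv is an assumption about how the levels would be used.

```python
import logging

# One logger per module; handlers and levels are configured once at the entry point.
logger = logging.getLogger(__name__)


def log_workspace_status(volume_available: bool, venv_ok: bool) -> None:
    if not volume_available:
        logger.warning("Volume not mounted; falling back to the container environment")
    elif not venv_ok:
        logger.error("Volume virtual environment is broken; it will be recreated")
    else:
        logger.info("Using volume workspace")
```

The entry point would then call something like `logging.basicConfig(level=logging.INFO)` so these messages reach the worker logs.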
This fix resolves the vLLM subprocess errors while maintaining compatibility with existing functionality. When deployed to RunPod with volumes, libraries like vLLM that hardcode /app/.venv paths use the volume's virtual environment instead.
- Add symlink from /app/.venv to volume venv to handle hardcoded paths
- Configure PYTHONPATH environment variable for subprocess compatibility
- Ensure libraries like vLLM can spawn subprocesses that find installed packages
- Add comprehensive test coverage for symlink functionality
- Maintain backward compatibility when no volume is present
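The symlink step can be sketched as an idempotent function. Assumptions: `link_app_venv` is a hypothetical name, and moving a pre-existing real `/app/.venv` directory aside to `.venv.bak` is an illustrative policy (it assumes no earlier backup exists), not necessarily what the PR does. With no volume venv present, it leaves the image untouched, matching the backward-compatibility goal.

```python
import os
from pathlib import Path


def link_app_venv(volume_venv: Path, app_venv: Path = Path("/app/.venv")) -> bool:
    """Point app_venv at volume_venv. Returns True if a link was (re)created."""
    if not volume_venv.is_dir():
        return False  # no volume: keep the image's own environment
    if app_venv.is_symlink():
        if os.path.realpath(app_venv) == os.path.realpath(volume_venv):
            return False  # already correct; nothing to do
        app_venv.unlink()
    elif app_venv.exists():
        # Hypothetical policy: keep the baked-in venv aside instead of deleting it.
        app_venv.rename(app_venv.with_name(app_venv.name + ".bak"))
    app_venv.parent.mkdir(parents=True, exist_ok=True)
    app_venv.symlink_to(volume_venv, target_is_directory=True)
    return True
```

Because the hardcoded path now resolves to the volume venv, subprocesses that vLLM spawns with `/app/.venv/bin/python` load the persistent packages without any change to vLLM itself.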
This PR implements persistent workspace management with RunPod network volumes and introduces a complete architectural refactor. The changes include endpoint-specific workspaces, virtual environments, shared package caching, concurrency safety, and structured logging throughout the codebase.
Key Changes Summary