feat(cli): Add flash undeploy command for endpoint management
#121
Pull request overview
This PR implements a comprehensive flash undeploy command to manage and delete RunPod serverless endpoints tracked by the Flash CLI. The command provides multiple interaction modes (list, by name, --all, --interactive, --cleanup-stale) with safety features including confirmation prompts and detailed error reporting.
Key Changes:
- New `undeploy` command with multiple interaction modes and safety confirmations
- Extended `ResourceManager` with `list_all_resources()` and `find_resources_by_name()` methods
- Comprehensive test coverage for all command modes and helper functions
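The two new `ResourceManager` lookups can be pictured with the sketch below. It assumes tracked resources live in a dict keyed by resource ID and that each resource has a `name` attribute; `TrackedResource` and `ResourceManagerSketch` are hypothetical stand-ins, not the PR's classes, and the real manager persists its state to `.tetra_resources.pkl`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TrackedResource:
    """Hypothetical stand-in for a tracked DeployableResource."""
    name: str
    endpoint_id: str


class ResourceManagerSketch:
    """Illustrative only: keeps state in memory instead of a .pkl file."""

    def __init__(self) -> None:
        # Assumed layout: resource_id -> tracked resource
        self._resources: Dict[str, TrackedResource] = {}

    def add(self, resource_id: str, resource: TrackedResource) -> None:
        self._resources[resource_id] = resource

    def list_all_resources(self) -> Dict[str, TrackedResource]:
        # Shallow copy so callers cannot mutate the manager's internal dict
        return dict(self._resources)

    def find_resources_by_name(self, name: str) -> List[Tuple[str, TrackedResource]]:
        # Names are not assumed to be unique, so every match is returned
        return [(rid, res) for rid, res in self._resources.items() if res.name == name]
```

Returning `(resource_id, resource)` pairs lets a caller delete the remote endpoint and then drop the matching tracking entry without a second lookup.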
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/tetra_rp/cli/commands/undeploy.py | Implements the undeploy command with list, interactive, and deletion logic |
| src/tetra_rp/core/resources/resource_manager.py | Adds methods to list and find resources by name |
| src/tetra_rp/core/api/runpod.py | Updates delete_endpoint success detection logic |
| src/tetra_rp/cli/main.py | Registers the undeploy command |
| tests/unit/resources/test_resource_manager.py | Tests for new ResourceManager methods |
| tests/unit/cli/test_undeploy.py | Comprehensive tests for undeploy command |
| src/tetra_rp/cli/docs/flash-undeploy.md | Complete documentation for the undeploy command |
| src/tetra_rp/cli/docs/README.md | Updates main CLI docs with undeploy command |
| `tests/unit/cli/__init__.py` | Initializes CLI test module |
| Makefile | Removes .pkl file deletion from clean target |
Implements comprehensive error handling for missing API keys with actionable guidance for users. When RUNPOD_API_KEY is missing, users now receive helpful error messages that include:
- Documentation URL for obtaining API keys
- Three setup methods (env var, .env file, shell profile)
- Context about which operation requires the key
- Troubleshooting guidance for .env file location
Implementation:
- Created RunpodAPIKeyError with a helpful default message
- Added validate_api_key() helper functions
- Updated API clients to use the custom exception
- Added resource deployment error context
- Enhanced flash init with an API key documentation link
- Added 15 comprehensive tests (all passing)
Users previously saw generic "Runpod API key is required" errors. Now they get clear, actionable steps to resolve the issue.
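A minimal sketch of the exception and validator this commit describes. The names `RunpodAPIKeyError` and `validate_api_key` come from the commit message, but the signatures, the message wording, and reading the key from `os.environ` are assumptions; the real message also carries the documentation URL, which is omitted here rather than guessed.

```python
import os
from typing import Optional


class RunpodAPIKeyError(Exception):
    """Raised when RUNPOD_API_KEY is missing or blank (illustrative sketch)."""

    def __init__(self, context: Optional[str] = None) -> None:
        lines = ["RUNPOD_API_KEY is not set."]
        if context:
            # Tell the user which operation needed the key
            lines.append(f"Required for: {context}")
        lines += [
            "Set it in one of these ways:",
            "  1. export RUNPOD_API_KEY=... in the current shell",
            "  2. add RUNPOD_API_KEY=... to a .env file",
            "  3. add the export line to your shell profile",
        ]
        super().__init__("\n".join(lines))


def validate_api_key(context: Optional[str] = None) -> str:
    """Return the API key from the environment or raise RunpodAPIKeyError."""
    api_key = os.environ.get("RUNPOD_API_KEY")
    # `not api_key.strip()` also treats whitespace-only values as missing
    if api_key is None or not api_key.strip():
        raise RunpodAPIKeyError(context)
    return api_key
```

Putting the setup steps in the exception's default message means every call site that raises it gives the same guidance without repeating the text.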
Simplify validation logic and extract duplicated error handling:
- Simplify the API key validation condition in validation.py: changed `api_key.strip() == ""` to `not api_key.strip()` for clarity
- Extract duplicated error handling in resource_manager.py: created a `_deploy_with_error_context` helper method to handle RunpodAPIKeyError with resource context, eliminating code duplication
All tests pass (231 unit tests).
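The `_deploy_with_error_context` refactor follows a common "catch, add context, re-raise" pattern. The sketch below is a hypothetical version of that helper. It assumes deployment is an async callable and that the added context is the resource name; the real method's signature is not shown in this PR. `RunpodAPIKeyError` is redefined minimally so the block runs on its own.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RunpodAPIKeyError(Exception):
    """Minimal stand-in for the custom exception added in this PR."""


async def deploy_with_error_context(
    resource_name: str, deploy: Callable[[], Awaitable[T]]
) -> T:
    """Hypothetical version of the `_deploy_with_error_context` helper."""
    try:
        return await deploy()
    except RunpodAPIKeyError as exc:
        # Keep the same exception type so callers' except clauses still match,
        # prefix the resource being deployed, and chain the original error
        raise RunpodAPIKeyError(f"Cannot deploy resource '{resource_name}': {exc}") from exc


if __name__ == "__main__":
    async def fake_deploy() -> str:
        return "ep-123"  # pretend endpoint ID

    print(asyncio.run(deploy_with_error_context("my-endpoint", fake_deploy)))
```

Centralizing this in one helper is what removes the duplication: each deploy path calls the helper instead of carrying its own `try`/`except` block.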
Implements a comprehensive undeploy command with multiple interaction modes:
- List all tracked endpoints with status indicators
- Delete a specific endpoint by name
- Delete all endpoints with double confirmation
- Interactive checkbox selection for batch deletion
Changes:
- Add undeploy.py command with Rich/questionary UI
- Extend ResourceManager with list_all_resources() and find_resources_by_name()
- Integrate the undeploy command into the CLI
- Add comprehensive unit tests with mocking
- Fix duplicate import in resource_manager.py
Safety features include confirmation prompts, keyboard interrupt handling, and detailed error reporting per endpoint.
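The double confirmation for deleting all endpoints (a yes/no prompt, then typing "DELETE ALL") can be sketched as a small pure function. The prompt callables are injected so the logic is testable without a terminal. In the CLI they would presumably be backed by Rich/questionary prompts, but that wiring, the function name `confirm_delete_all`, and treating Ctrl+C as "no" are assumptions for illustration.

```python
from typing import Callable


def confirm_delete_all(
    count: int,
    ask_yes_no: Callable[[str], bool],
    ask_text: Callable[[str], str],
) -> bool:
    """Hypothetical two-step confirmation for deleting every tracked endpoint."""
    try:
        # Step 1: a plain yes/no question
        if not ask_yes_no(f"Delete all {count} tracked endpoints?"):
            return False
        # Step 2: require the exact phrase (surrounding whitespace ignored),
        # so a reflexive "y" is not enough to delete everything
        typed = ask_text('Type "DELETE ALL" to confirm')
        return typed.strip() == "DELETE ALL"
    except KeyboardInterrupt:
        # Treat Ctrl+C at either prompt as a refusal
        return False
```

Separating the decision from the prompting also makes the unit tests simple: each branch is exercised with plain lambdas instead of mocked terminal input.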
Fixes two critical bugs preventing flash undeploy from working correctly:
1. GraphQL deleteEndpoint success detection
- RunPod API returns {"deleteEndpoint": null} on success
- Changed from checking null value to checking key presence
- Added detailed comments explaining GraphQL response pattern
2. Unclosed aiohttp session
- Wrapped RunpodGraphQLClient in async context manager
- Ensures proper session cleanup after deletion
- Eliminates "Unclosed client session" warnings
3. Updated test mocks to support async context manager usage
Tested with real endpoint deletion - successfully removes endpoint
from both RunPod and local .tetra_resources.pkl tracking file.
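The first fix comes down to how a `null` GraphQL result is interpreted. A minimal sketch, assuming the function receives the already-parsed `data` object of the `deleteEndpoint` mutation; `delete_succeeded` is a hypothetical name, not the function in `runpod.py`.

```python
from typing import Any, Dict


def delete_succeeded(data: Dict[str, Any]) -> bool:
    """Interpret the `data` payload of a deleteEndpoint mutation.

    RunPod returns {"deleteEndpoint": null} on success, so a truthiness
    check on the value would report every successful deletion as a failure.
    """
    # Buggy approach: bool(data.get("deleteEndpoint")) is False for {"deleteEndpoint": None}
    # Fixed approach: the presence of the key signals that the mutation ran
    return "deleteEndpoint" in data
```

A JSON `null` becomes Python `None`, which is falsy; that is why the value check misfired while the key-presence check does not.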
Removes automatic deletion of .pkl files from the 'make clean' target to prevent accidental loss of endpoint tracking state.
Rationale:
- .pkl files like .tetra_resources.pkl contain critical state for tracking deployed RunPod endpoints
- Deleting this file orphans deployed endpoints (still running, still costing money) with no way to manage them via the CLI
- Users should explicitly manage endpoints via the 'flash undeploy' command
- .pkl files are state/cache files, not build artifacts
Impact:
- make clean still removes build artifacts (dist, build, egg-info, pycache)
- .tetra_resources.pkl persists across clean operations
- Users must use 'flash undeploy' for proper endpoint cleanup
- Prevents accidental resource leaks and unexpected cloud costs
Adds the ability to clean up stale endpoint entries from .tetra_resources.pkl when endpoints have been deleted externally (via the RunPod UI/API).
Changes:
- Added a --cleanup-stale flag to the flash undeploy command
- New _cleanup_stale_endpoints() function that:
  - Checks all tracked endpoints for inactive status
  - Lists inactive endpoints for user review
  - Prompts for confirmation before removal
  - Removes entries only from tracking (the endpoints are already deleted remotely)
- Added imports: Dict, DeployableResource, Confirm
Use case: When users delete endpoints via the RunPod UI/API instead of the flash CLI, the tracking file (.tetra_resources.pkl) becomes stale. This flag identifies and removes those orphaned tracking entries.
Usage: flash undeploy --cleanup-stale
Note: The "Status" column in `flash undeploy list` makes one health check API call per endpoint to determine Active/Inactive state. This adds latency (6 endpoints = 6 API calls), but it is valuable for identifying stale entries that need cleanup.
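The steps listed for `_cleanup_stale_endpoints()` (check status, list inactive entries, confirm, remove from tracking only) can be sketched as follows. The tracking dict layout (resource ID to endpoint ID), the `is_active` and `confirm` callables, and the function name are assumptions. One deliberate choice here: an endpoint whose status cannot be determined (`None`) is not treated as stale, so a transient API error never removes a live endpoint from tracking.

```python
from typing import Callable, Dict, List, Optional


def cleanup_stale(
    tracked: Dict[str, str],
    is_active: Callable[[str], Optional[bool]],
    confirm: Callable[[List[str]], bool],
) -> List[str]:
    """Hypothetical sketch: drop tracking entries for endpoints deleted elsewhere.

    tracked maps resource_id -> endpoint_id; is_active returns True, False,
    or None when the status is unknown. Returns the removed resource IDs.
    """
    # Only a definite "inactive" (False) counts as stale; unknown (None) is kept
    stale = [rid for rid, endpoint_id in tracked.items() if is_active(endpoint_id) is False]
    if not stale or not confirm(stale):
        return []
    for rid in stale:
        # Local tracking only: the remote endpoint is already gone, so no API delete call
        del tracked[rid]
    return stale
```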
Added documentation for the new flash undeploy command covering all usage modes and features.
Changes:
- Updated src/tetra_rp/cli/docs/README.md with a flash undeploy section
  - Command synopsis and options
  - Usage examples for all modes
  - Status indicator explanations
- Created src/tetra_rp/cli/docs/flash-undeploy.md
  - Detailed documentation (240+ lines)
  - All usage modes: list, by name, --all, --interactive, --cleanup-stale
  - Status check explanation and value proposition
  - Safety features and confirmations
  - Integration with @Remote decorator
  - Troubleshooting guide
  - Examples and workflows
Documentation covers:
- List endpoints with health status
- Undeploy by name with confirmation
- Undeploy all with double confirmation
- Interactive checkbox selection
- Cleanup stale tracking (--cleanup-stale flag)
- Status indicators (Active/Inactive/Unknown)
- Tracking file management
- Error handling and troubleshooting
25265cc to 00ad37d
- Remove debugging print statements from test_undeploy.py
- Use Tuple from the typing module instead of lowercase tuple for Python 3.9 consistency
- Update type hints in undeploy.py and resource_manager.py
- Add Tuple to imports in both files
All 289 tests pass; quality checks pass.
        Dict with success status and message
    """
    try:
        async with RunpodGraphQLClient() as client:
Nit, but would it be cleaner to have the delete endpoint logic self-contained somewhere, either in the resource manager or in another layer? It feels a little weird to have this live inside the CLI, but it's pretty lightweight, and that could well be over-abstracting before it's needed. I.e., could `_delete_endpoint` be a generally useful util for flash?
Good call. I'll do a follow-up PR for this refactor.
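A rough sketch of what the suggested standalone util could look like, outside the CLI. Everything here is hypothetical: `delete_endpoint`, the injected `client_factory`, and the client's `delete_endpoint` method are illustrative names, not `RunpodGraphQLClient`'s real API. It keeps the two properties the PR relies on: `async with` closes the client session even when deletion fails, and success is judged by the presence of the `deleteEndpoint` key. It returns a dict with success status and message, like the CLI helper.

```python
import asyncio
from typing import Any, AsyncContextManager, Callable, Dict, Protocol


class DeleteClient(Protocol):
    async def delete_endpoint(self, endpoint_id: str) -> Dict[str, Any]: ...


async def delete_endpoint(
    endpoint_id: str,
    client_factory: Callable[[], AsyncContextManager[DeleteClient]],
) -> Dict[str, Any]:
    """Hypothetical reusable util: delete one endpoint, report the outcome."""
    try:
        # `async with` runs __aexit__ (closing the session) on success or error
        async with client_factory() as client:
            data = await client.delete_endpoint(endpoint_id)
        ok = "deleteEndpoint" in data  # success response is {"deleteEndpoint": null}
        return {"success": ok, "message": "Deleted" if ok else "Unexpected response"}
    except Exception as exc:
        return {"success": False, "message": str(exc)}


class FakeClient:
    """Test double that records whether its session was closed."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.closed = False

    async def __aenter__(self) -> "FakeClient":
        return self

    async def __aexit__(self, *exc_info: Any) -> bool:
        self.closed = True
        return False  # do not swallow exceptions

    async def delete_endpoint(self, endpoint_id: str) -> Dict[str, Any]:
        if self.fail:
            raise RuntimeError("GraphQL error")
        return {"deleteEndpoint": None}


if __name__ == "__main__":
    print(asyncio.run(delete_endpoint("ep-1", FakeClient)))
```

Injecting the client factory keeps the util independent of the CLI and makes it easy to test without network access.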
Summary
Implements a comprehensive `flash undeploy` command to manage and delete RunPod serverless endpoints tracked by the Flash CLI.
Changes
Core Implementation
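- New `src/tetra_rp/cli/commands/undeploy.py` (~572 lines)
- Fixed `delete_endpoint` success detection in `runpod.py` to handle the `{"deleteEndpoint": null}` success response
- Wrapped `RunpodGraphQLClient` in an async context manager
- Extended `ResourceManager` with `list_all_resources()` and `find_resources_by_name()`
Build System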
- Removed `.tetra_resources.pkl` cleanup from `make clean`
- Users should use `flash undeploy` commands to manage endpoints
Documentation
Usage Examples
List all endpoints with status
Shows a table with Name, Endpoint ID, Status (🟢 Active/🔴 Inactive/❓ Unknown), Type, and Resource ID columns
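The three status values can be derived from one health check per endpoint. A minimal sketch, assuming an async `health_check(endpoint_id)` callable that returns a bool; that callable, the function name, and mapping any exception to Unknown are assumptions, not the PR's exact logic.

```python
import asyncio
from typing import Awaitable, Callable


async def endpoint_status(
    endpoint_id: str, health_check: Callable[[str], Awaitable[bool]]
) -> str:
    """Hypothetical mapping from a health check result to a status label."""
    try:
        healthy = await health_check(endpoint_id)
    except Exception:
        # The check itself failed (network, auth, ...), so the state is unknown
        return "❓ Unknown"
    return "🟢 Active" if healthy else "🔴 Inactive"


if __name__ == "__main__":
    async def always_up(endpoint_id: str) -> bool:
        return True

    print(asyncio.run(endpoint_status("ep-1", always_up)))
```

Keeping "Unknown" separate from "Inactive" matters for `--cleanup-stale`: only endpoints that are definitely inactive should be candidates for removal from tracking.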
Undeploy specific endpoint
Undeploy all endpoints (double confirmation)
Interactive selection
Clean up stale tracking
Removes tracking for endpoints deleted externally (via RunPod UI/API)
Features
Status Checking
Safety Features
- `--all` requires double confirmation (yes/no + type "DELETE ALL")
Cleanup Stale Tracking
- Removes stale entries only from the local `.tetra_resources.pkl` file
Technical Details
GraphQL deleteEndpoint Behavior
- Returns `{"deleteEndpoint": null}` on success (HTTP 200)
- Errors are raised by `_execute_graphql` on failure
Async Session Management
- `RunpodGraphQLClient` implements `__aenter__`/`__aexit__`
Tracking File Protection
- `.tetra_resources.pkl` excluded from `make clean`
- `.gitignore`
Test Coverage
Testing Checklist
Related Issues