feat(cli): Add `flash undeploy` command for endpoint management #121

deanq · 2025-11-19T18:50:47Z

Summary

Implements a comprehensive flash undeploy command to manage and delete RunPod serverless endpoints tracked by the Flash CLI.

Changes

Core Implementation

New Command: src/tetra_rp/cli/commands/undeploy.py (~572 lines)
- Multiple interaction modes: list, by name, --all, --interactive, --cleanup-stale
- Rich formatted tables and panels for UI
- Questionary for interactive checkbox selection
- Health status checking for all endpoints
GraphQL Fix: Fixed delete_endpoint success detection in runpod.py
- Changed from checking null value to checking key presence
- Handles GraphQL's {"deleteEndpoint": null} success response
Session Management: Implemented async context manager for RunpodGraphQLClient
ResourceManager Extensions: Added list_all_resources() and find_resources_by_name()
CLI Integration: Registered undeploy command in main.py
Tests: Comprehensive unit tests with async mocking (~355 lines)
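
For orientation, the two ResourceManager lookups might look roughly like the sketch below. It is a simplified stand-in, not the code in resource_manager.py: the `TrackedResource` type, the in-memory dict, the `add` helper, and the return shapes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TrackedResource:
    """Hypothetical stand-in for a tracked endpoint."""
    name: str
    endpoint_id: str


class ResourceManagerSketch:
    """Illustrative only; the real manager persists state to .tetra_resources.pkl."""

    def __init__(self) -> None:
        self._resources: Dict[str, TrackedResource] = {}

    def add(self, resource_id: str, resource: TrackedResource) -> None:
        self._resources[resource_id] = resource

    def list_all_resources(self) -> Dict[str, TrackedResource]:
        # Return a copy so callers cannot mutate tracking state by accident.
        return dict(self._resources)

    def find_resources_by_name(self, name: str) -> List[Tuple[str, TrackedResource]]:
        # This sketch assumes names are not unique, so it returns every match.
        return [(rid, res) for rid, res in self._resources.items() if res.name == name]
```

Returning (resource ID, resource) pairs lets a caller such as the CLI delete by ID after matching by name.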

Build System

Makefile: Removed .tetra_resources.pkl cleanup from make clean
- Tracking file now persists across builds
- Use flash undeploy commands to manage endpoints

Documentation

README.md: Added flash undeploy section with examples
flash-undeploy.md: Comprehensive 280+ line documentation
- All usage modes with examples
- Status indicator explanations
- Troubleshooting guide
- Integration with @Remote decorator

Usage Examples

List all endpoints with status

flash undeploy list

Shows table with Name, Endpoint ID, Status (🟢 Active/🔴 Inactive/❓ Unknown), Type, Resource ID

Undeploy specific endpoint

flash undeploy my-api

Undeploy all endpoints (double confirmation)

flash undeploy --all

Interactive selection

flash undeploy --interactive

Clean up stale tracking

flash undeploy --cleanup-stale

Removes tracking for endpoints deleted externally (via RunPod UI/API)

Features

Status Checking

🟢 Active: Endpoint exists and health check succeeds
🔴 Inactive: Tracking exists but endpoint deleted externally
❓ Unknown: Exception during health check
Performed via health check API calls (1 per endpoint)
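
A minimal sketch of the three-way status mapping described above. The `health_check` callable is hypothetical: it is assumed to return True when the endpoint responds, return False when RunPod no longer has the endpoint, and raise on any other error. The actual helper in undeploy.py may be structured differently.

```python
from typing import Callable


def endpoint_status(health_check: Callable[[], bool]) -> str:
    """Map one health check call to a status label."""
    try:
        return "Active" if health_check() else "Inactive"
    except Exception:
        # Any error (network, auth, timeout) leaves the state undetermined.
        return "Unknown"
```

Keeping "Unknown" separate from "Inactive" means a transient failure is not mistaken for an externally deleted endpoint.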

Safety Features

Confirmation prompts before all deletions
Double confirmation for --all (yes/no + type "DELETE ALL")
Keyboard interrupt handling (Ctrl+C to cancel)
"Cannot be undone" warnings
Detailed error reporting per endpoint
Success/failure counts in summary
Continues processing remaining endpoints on failure
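
The double confirmation and the continue-on-failure behavior can be sketched as follows. Function names, the summary shape, and the exact (case-sensitive) phrase match are assumptions for illustration; the real command prompts interactively through Rich and questionary.

```python
from typing import Callable, Dict, Iterable, List


def confirm_delete_all(first_answer: bool, typed_phrase: str) -> bool:
    """Both steps must pass: a yes/no answer and the typed phrase."""
    # Assumption: the phrase must match exactly, including case.
    return first_answer and typed_phrase == "DELETE ALL"


def delete_many(
    names: Iterable[str], delete_one: Callable[[str], None]
) -> Dict[str, List[str]]:
    """Try every endpoint; record failures instead of stopping at the first."""
    summary: Dict[str, List[str]] = {"succeeded": [], "failed": []}
    for name in names:
        try:
            delete_one(name)
            summary["succeeded"].append(name)
        except Exception as exc:
            # Keep the per-endpoint error so the final report can show it.
            summary["failed"].append(f"{name}: {exc}")
    return summary
```

The lengths of the two lists give the success/failure counts shown in the summary.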

Cleanup Stale Tracking

Identifies endpoints deleted via RunPod UI/API
Removes orphaned tracking entries
No API deletion (endpoints already gone)
Keeps .tetra_resources.pkl free of stale entries
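
A rough sketch of the cleanup step, assuming tracking is a dict keyed by resource ID and that statuses have already been computed. Both shapes are hypothetical, and the real `_cleanup_stale_endpoints()` also lists the entries and asks for confirmation first.

```python
from typing import Dict, List


def cleanup_stale(tracked: Dict[str, str], statuses: Dict[str, str]) -> List[str]:
    """Drop tracking entries whose status is "Inactive"; return removed IDs.

    `tracked` maps resource ID to endpoint name; `statuses` maps resource ID
    to a status label.
    """
    stale = [rid for rid in tracked if statuses.get(rid) == "Inactive"]
    for rid in stale:
        # Local bookkeeping only: the endpoint is already gone remotely.
        del tracked[rid]
    return stale
```

Entries with an "Unknown" status are left in place in this sketch, so a failed health check does not cost the user their tracking entry.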

Technical Details

GraphQL deleteEndpoint Behavior

Returns {"deleteEndpoint": null} on success (HTTP 200)
Success determined by key presence, not value
Exceptions thrown by _execute_graphql on failure
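
The difference between the two checks can be shown in a few lines. This is an illustrative helper, not the code in runpod.py; it assumes the dict passed in is the `data` portion of the GraphQL response.

```python
from typing import Any, Dict


def delete_succeeded(data: Dict[str, Any]) -> bool:
    """True when the response data carries the deleteEndpoint key at all."""
    # A value check such as bool(data.get("deleteEndpoint")) would treat the
    # null success payload as a failure.
    return "deleteEndpoint" in data
```

With this check, `{"deleteEndpoint": None}` counts as success, while a response without the key does not.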

Async Session Management

RunpodGraphQLClient implements __aenter__/__aexit__
Properly closes aiohttp sessions
Prevents "Unclosed client session" warnings
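
The pattern is sketched below with a fake session so that it runs without network access or aiohttp installed. `FakeSession` and `GraphQLClientSketch` are hypothetical names; the real RunpodGraphQLClient wraps an aiohttp session and may differ in detail.

```python
import asyncio
from typing import Optional


class FakeSession:
    """Stand-in for an aiohttp.ClientSession (which also has close() and closed)."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class GraphQLClientSketch:
    """Illustrative client; not the real RunpodGraphQLClient."""

    def __init__(self) -> None:
        self.session: Optional[FakeSession] = None

    async def __aenter__(self) -> "GraphQLClientSketch":
        self.session = FakeSession()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and on exceptions, so the session never leaks.
        if self.session is not None and not self.session.closed:
            await self.session.close()


async def main() -> bool:
    async with GraphQLClientSketch() as client:
        session = client.session
    # After the block exits, the session has been closed.
    return session.closed
```

Calling `asyncio.run(main())` exercises the open and close path once.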

Tracking File Protection

.tetra_resources.pkl excluded from make clean
Contains deployment state for @Remote decorator
Already in .gitignore

Test Coverage

List command (no endpoints, with endpoints)
Undeploy by name (cancelled, success, nonexistent)
Undeploy --all (wrong confirmation, success)
Delete endpoint function (success, API failure, exception)
Helper functions (status checking, resource type formatting)
Async context manager mocking

Testing Checklist

Unit tests pass (289 tests)
Quality checks pass (format, lint, coverage 37.37%)
GraphQL deletion fix verified
Async session cleanup verified
All interaction modes tested
Error handling works as expected
Confirmation flows prevent accidental deletions
Documentation accuracy verified

Related Issues

Fixes endpoint deletion bug (GraphQL null handling)
Fixes unclosed aiohttp session warnings
Protects tracking file from accidental deletion
Enables cleanup of externally deleted endpoints

Copilot

Pull request overview

This PR implements a comprehensive flash undeploy command to manage and delete RunPod serverless endpoints tracked by the Flash CLI. The command provides multiple interaction modes (list, by name, --all, --interactive, --cleanup-stale) with safety features including confirmation prompts and detailed error reporting.

Key Changes:

New undeploy command with multiple interaction modes and safety confirmations
Extended ResourceManager with list_all_resources() and find_resources_by_name() methods
Comprehensive test coverage for all command modes and helper functions

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/tetra_rp/cli/commands/undeploy.py	Implements the undeploy command with list, interactive, and deletion logic
src/tetra_rp/core/resources/resource_manager.py	Adds methods to list and find resources by name
src/tetra_rp/core/api/runpod.py	Updates delete_endpoint success detection logic
src/tetra_rp/cli/main.py	Registers the undeploy command
tests/unit/resources/test_resource_manager.py	Tests for new ResourceManager methods
tests/unit/cli/test_undeploy.py	Comprehensive tests for undeploy command
src/tetra_rp/cli/docs/flash-undeploy.md	Complete documentation for the undeploy command
src/tetra_rp/cli/docs/README.md	Updates main CLI docs with undeploy command
tests/unit/cli/__init__.py	Initializes CLI test module
Makefile	Removes .pkl file deletion from clean target

src/tetra_rp/core/resources/resource_manager.py

src/tetra_rp/cli/commands/undeploy.py

tests/unit/cli/test_undeploy.py

Implements comprehensive error handling for missing API keys with actionable guidance for users.
When RUNPOD_API_KEY is missing, users now receive helpful error messages that include:
- Documentation URL for obtaining API keys
- Three setup methods (env var, .env file, shell profile)
- Context about which operation requires the key
- Troubleshooting guidance for .env file location
Implementation:
- Created RunpodAPIKeyError with helpful default message
- Added validate_api_key() helper functions
- Updated API clients to use custom exception
- Added resource deployment error context
- Enhanced flash init with API key documentation link
- Added 15 comprehensive tests (all passing)
Users previously saw generic "Runpod API key is required" errors. Now they get clear, actionable steps to resolve the issue.

Simplify validation logic and extract duplicated error handling:
- Simplify API key validation condition in validation.py: changed `api_key.strip() == ""` to `not api_key.strip()` for clarity
- Extract duplicated error handling in resource_manager.py: created `_deploy_with_error_context` helper method to handle RunpodAPIKeyError with resource context, eliminating code duplication
All tests pass (231 unit tests).

Implements comprehensive undeploy command with multiple interaction modes:
- List all tracked endpoints with status indicators
- Delete specific endpoint by name
- Delete all endpoints with double confirmation
- Interactive checkbox selection for batch deletion
Changes:
- Add undeploy.py command with Rich/questionary UI
- Extend ResourceManager with list_all_resources() and find_resources_by_name()
- Integrate undeploy command into CLI
- Add comprehensive unit tests with mocking
- Fix duplicate import in resource_manager.py
Safety features include confirmation prompts, keyboard interrupt handling, and detailed error reporting per endpoint.

Fixes two critical bugs preventing flash undeploy from working correctly, and updates the test mocks to match:
1. GraphQL deleteEndpoint success detection
   - RunPod API returns {"deleteEndpoint": null} on success
   - Changed from checking null value to checking key presence
   - Added detailed comments explaining GraphQL response pattern
2. Unclosed aiohttp session
   - Wrapped RunpodGraphQLClient in async context manager
   - Ensures proper session cleanup after deletion
   - Eliminates "Unclosed client session" warnings
3. Updated test mocks to support async context manager usage
Tested with real endpoint deletion: successfully removes the endpoint from both RunPod and the local .tetra_resources.pkl tracking file.

Removes automatic deletion of .pkl files from 'make clean' target to prevent accidental loss of endpoint tracking state.
Rationale:
- .pkl files like .tetra_resources.pkl contain critical state for tracking deployed RunPod endpoints
- Deleting this file orphans deployed endpoints (still running, still costing money) without ability to manage them via CLI
- Users should explicitly manage endpoints via 'flash undeploy' command
- .pkl files are state/cache files, not build artifacts
Impact:
- make clean still removes build artifacts (dist, build, egg-info, pycache)
- .tetra_resources.pkl persists across clean operations
- Users must use 'flash undeploy' for proper endpoint cleanup
- Prevents accidental resource leaks and unexpected cloud costs

Adds ability to clean up stale endpoint entries from .tetra_resources.pkl when endpoints have been deleted externally (via RunPod UI/API).
Changes:
- Added --cleanup-stale flag to flash undeploy command
- New _cleanup_stale_endpoints() function that:
  - Checks all tracked endpoints for inactive status
  - Lists inactive endpoints for user review
  - Prompts for confirmation before removal
  - Removes only from tracking (endpoints already deleted remotely)
- Added imports: Dict, DeployableResource, Confirm
Use case: When users delete endpoints via RunPod UI/API instead of flash CLI, the tracking file (.tetra_resources.pkl) becomes stale. This flag identifies and removes those orphaned tracking entries.
Usage: flash undeploy --cleanup-stale
Note: The "Status" column in `flash undeploy list` makes a health check API call for each endpoint to determine Active/Inactive state. While this adds latency (6 endpoints = 6 API calls), it's valuable for identifying stale entries that need cleanup.

Added documentation for the new flash undeploy command covering all usage modes and features.
Changes:
- Updated src/tetra_rp/cli/docs/README.md with flash undeploy section
  - Command synopsis and options
  - Usage examples for all modes
  - Status indicator explanations
- Created src/tetra_rp/cli/docs/flash-undeploy.md
  - Detailed documentation (240+ lines)
  - All usage modes: list, by name, --all, --interactive, --cleanup-stale
  - Status check explanation and value proposition
  - Safety features and confirmations
  - Integration with @Remote decorator
  - Troubleshooting guide
  - Examples and workflows
Documentation covers:
- List endpoints with health status
- Undeploy by name with confirmation
- Undeploy all with double confirmation
- Interactive checkbox selection
- Cleanup stale tracking (--cleanup-stale flag)
- Status indicators (Active/Inactive/Unknown)
- Tracking file management
- Error handling and troubleshooting

- Remove debugging print statements from test_undeploy.py
- Use Tuple from typing module instead of lowercase tuple for Python 3.9 consistency
- Update type hints in undeploy.py and resource_manager.py
- Add Tuple to imports in both files
All 289 tests pass, quality checks pass.

jhcipar · 2025-11-25T01:07:04Z

src/tetra_rp/cli/commands/undeploy.py

+        Dict with success status and message
+    """
+    try:
+        async with RunpodGraphQLClient() as client:


nit, but would it be cleaner to have the delete endpoint logic self-contained somewhere, either in the resource manager or in another layer? It feels a little weird to have this live inside the CLI, but it's pretty lightweight, and that could well be over-abstracting before it's needed. i.e., could _delete_endpoint be a generally useful util for flash?

Good call. I'll do a follow-up PR for this refactor.

deanq changed the title ~~feat(cli): Add flash undeploy command for endpoint management~~ feat(cli): Add flash undeploy command for endpoint management Nov 19, 2025

deanq requested a review from Copilot November 22, 2025 10:36

Copilot AI reviewed Nov 22, 2025

deanq added 8 commits November 22, 2025 02:38

chore: quality check

feac7a8

deanq force-pushed the deanq/ae-1482-flash-cli-undeploy branch from 25265cc to 00ad37d Compare November 22, 2025 10:38

deanq requested a review from jhcipar November 22, 2025 11:14

deanq marked this pull request as ready for review November 22, 2025 11:14

deanq requested a review from justinwlin November 22, 2025 11:15

jhcipar approved these changes Nov 25, 2025

deanq merged commit cd32ffc into main Nov 25, 2025
7 checks passed

deanq deleted the deanq/ae-1482-flash-cli-undeploy branch November 25, 2025 08:02

runpod-release-please-bot bot mentioned this pull request Nov 25, 2025

chore: release 0.18.0 #123

Merged

deanq mentioned this pull request Nov 25, 2025

refactor: move endpoint deletion logic to proper abstraction layers #124

Merged

This was referenced Feb 6, 2026

chore: release 2.0.0 #184

Closed

chore: release 2.0.0 #186

Closed

chore: release 1.1.0 #188

Closed
