feat: Add database restart command #421

@joshrotenberg

Overview

Add a command to restart Redis processes for a database, similar to rladmin's restart db <id> command.

Use Case

Database restarts are needed for:

  • Applying certain configuration changes
  • Recovering from hung processes
  • Clearing memory after heavy operations
  • Testing failover behavior
  • Troubleshooting connectivity issues

Current Workaround

There is currently no direct way to do this via redisctl. You must either use rladmin or trigger a manual failover:

# Option 1: SSH and use rladmin
ssh admin@cluster-node
rladmin restart db db:1

# Option 2: Trigger failover (not a true restart)
redisctl enterprise database failover <id>

Desired Behavior

# Restart database (with automatic role preservation)
redisctl enterprise database restart 1

# Output
Restarting database 'prod-db' (db:1)...
  ✓ Restarting slave shards (redis:2, redis:4, redis:6)
  ✓ Restarting master shards (redis:1, redis:3, redis:5)
  ✓ Verifying connectivity
Database restart complete (45 seconds)

# Restart without preserving roles (faster, but roles may swap)
redisctl enterprise database restart 1 --no-preserve-roles

# Force restart even with warnings
redisctl enterprise database restart 1 --force

# Discard data (for non-persisted databases)
redisctl enterprise database restart 1 --discard-data

# Wait for restart to complete
redisctl enterprise database restart 1 --wait

# JSON output
redisctl enterprise database restart 1 -o json
{
  "database_id": 1,
  "status": "success",
  "shards_restarted": 6,
  "duration_seconds": 45,
  "roles_preserved": true
}

Implementation Requirements

1. API Verification

First, verify if the REST API exposes a restart endpoint:

# Test against local Docker cluster
curl -k -u "admin@redis.local:Redis123!" \
  -X POST \
  https://localhost:9443/v1/bdbs/1/restart

If the endpoint exists: the implementation is straightforward.
If the endpoint doesn't exist: alternatives need to be investigated (possibly a failover-based approach).
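As a rough illustration of how the result of the curl probe above might be interpreted, here is a minimal sketch. The `EndpointProbe` enum and `classify_probe` function are hypothetical names, and the mapping from status codes to conclusions is an assumption about how the REST API responds, not documented behavior.

```rust
/// Hypothetical summary of what a probe of `POST /v1/bdbs/<id>/restart` suggests.
#[derive(Debug, PartialEq, Eq)]
pub enum EndpointProbe {
    /// The request was accepted, so the endpoint appears to exist.
    Exists,
    /// The route (or the method on that route) was not found.
    Missing,
    /// The response says nothing definite about the endpoint itself.
    Inconclusive,
}

/// Classify the HTTP status code returned by the probe.
///
/// Assumption: a 404 can also mean the database ID does not exist,
/// so confirm that `GET /v1/bdbs/<id>` succeeds before trusting `Missing`.
pub fn classify_probe(status: u16) -> EndpointProbe {
    match status {
        200..=299 => EndpointProbe::Exists,
        404 | 405 => EndpointProbe::Missing,
        // 401/403 point at credentials, 5xx at the cluster; neither
        // tells us whether the endpoint is implemented.
        _ => EndpointProbe::Inconclusive,
    }
}
```

A `Missing` result would lead to the "evaluate alternatives" branch, while `Inconclusive` means the probe should be repeated after fixing credentials or cluster state.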

2. Command Structure

#[derive(Debug, Clone, Parser)]
pub struct RestartArgs {
    /// Database ID
    pub database_id: i32,
    
    /// Don't preserve master/slave roles (faster)
    #[arg(long)]
    pub no_preserve_roles: bool,
    
    /// Discard data (for non-persisted DBs)
    #[arg(long)]
    pub discard_data: bool,
    
    /// Force restart even with warnings
    #[arg(long)]
    pub force: bool,
    
    /// Wait for restart to complete
    #[arg(long)]
    pub wait: bool,
    
    /// Timeout for wait (seconds)
    #[arg(long, default_value = "300")]
    pub timeout: u64,
    
    /// Output format
    #[arg(short, long, value_enum, default_value = "table")]
    pub output_format: OutputFormat,
}

3. Safety Checks

Before restarting, validate:

// Pre-restart validation
async fn validate_restart(client: &Client, args: &RestartArgs) -> CliResult<()> {
    let db = client.databases().get(args.database_id).await?;
    
    // Refuse to restart an inactive database unless forced
    if db.status != "active" && !args.force {
        bail!("Database is not active (status: {}). Use --force to restart anyway.", db.status);
    }
    
    // Without persistence a restart can lose data, so require --discard-data or --force
    if !db.persistence_enabled && !args.discard_data && !args.force {
        eprintln!("⚠️  Warning: Database has no persistence. Data may be lost.");
        eprintln!("Use --discard-data to confirm, or --force to proceed anyway.");
        bail!("Restart cancelled for safety");
    }
    
    // Without replication there is no replica to take over during the restart
    if db.replication_enabled {
        eprintln!("✓ Replication enabled - safe restart");
    } else if !args.force {
        eprintln!("⚠️  Warning: Database has no replication. Restart will cause downtime.");
        eprintln!("Use --force to proceed.");
        bail!("Restart cancelled for safety");
    }
    
    Ok(())
}

4. Restart Workflow

pub async fn restart_database(
    client: &Client,
    args: RestartArgs,
) -> CliResult<()> {
    // 1. Validate (takes the full args so --discard-data and --force are both honored)
    validate_restart(client, &args).await?;
    
    // 2. Get database info
    let db = client.databases().get(args.database_id).await?;
    let shards = client.shards().list_for_database(args.database_id).await?;
    
    println!("Restarting database '{}' (db:{})...", db.name, args.database_id);
    
    // 3. Call restart API
    let response = client
        .databases()
        .restart(args.database_id, RestartRequest {
            preserve_roles: !args.no_preserve_roles,
            discard_data: args.discard_data,
        })
        .await?;
    
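    // 4. Wait if requested; without --wait we only know the restart was accepted
    if args.wait {
        wait_for_database_active(client, args.database_id, args.timeout).await?;
        println!("✓ Database restart complete");
    } else {
        println!("✓ Database restart requested (use --wait to block until it completes)");
    }
    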
    Ok(())
}
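The workflow calls `wait_for_database_active` without defining it. Below is a minimal blocking sketch of the polling logic, under stated assumptions: the name `wait_until_active`, the `WaitError` type, and the closure-based status fetch are all hypothetical, and a real implementation in redisctl would presumably be async and call the client directly.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical error type for the wait loop.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    /// The database did not become active before the deadline.
    Timeout { last_status: String },
    /// Fetching the status failed.
    Fetch(String),
}

/// Poll `fetch_status` until it reports "active" or `timeout` elapses.
/// Returns how long the wait took on success.
pub fn wait_until_active<F>(
    mut fetch_status: F,
    timeout: Duration,
    interval: Duration,
) -> Result<Duration, WaitError>
where
    F: FnMut() -> Result<String, String>,
{
    let start = Instant::now();
    loop {
        let status = fetch_status().map_err(WaitError::Fetch)?;
        if status == "active" {
            return Ok(start.elapsed());
        }
        // Check the deadline after each poll so we always poll at least once.
        if start.elapsed() >= timeout {
            return Err(WaitError::Timeout { last_status: status });
        }
        sleep(interval);
    }
}
```

Passing the status fetch as a closure keeps the loop testable without a cluster. The `--timeout` value from `RestartArgs` would map to `timeout` via `Duration::from_secs`.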

rladmin Equivalent

# Basic restart
rladmin restart db db:1

# Restart with role preservation
rladmin restart db db:1 preserve_roles

# Restart and discard data
rladmin restart db db:1 discard_data

# Force discard even with persistence
rladmin restart db db:1 discard_data force_discard
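To make the correspondence with the rladmin invocations above concrete, here is a small sketch that builds the equivalent command line from a set of options. `RladminRestartOptions` and `rladmin_restart_command` are hypothetical names. The sketch uses only the keywords shown above, and the ordering when `preserve_roles` and `discard_data` are combined is an assumption.

```rust
/// Hypothetical options mirroring the rladmin keywords listed above.
#[derive(Debug, Clone, Copy, Default)]
pub struct RladminRestartOptions {
    pub preserve_roles: bool,
    pub discard_data: bool,
    /// Only meaningful together with `discard_data`.
    pub force_discard: bool,
}

/// Build the `rladmin restart db` command line for a database ID.
pub fn rladmin_restart_command(db_id: u32, opts: &RladminRestartOptions) -> String {
    let mut parts = vec![
        "rladmin".to_string(),
        "restart".to_string(),
        "db".to_string(),
        format!("db:{}", db_id),
    ];
    if opts.preserve_roles {
        parts.push("preserve_roles".to_string());
    }
    if opts.discard_data {
        parts.push("discard_data".to_string());
        // force_discard is only emitted alongside discard_data,
        // matching the last example above.
        if opts.force_discard {
            parts.push("force_discard".to_string());
        }
    }
    parts.join(" ")
}
```

Note that the proposed redisctl command preserves roles by default, so `preserve_roles` here would correspond to the absence of `--no-preserve-roles`.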

Benefits

  1. Remote restart - No SSH required (vs rladmin)
  2. Safety checks - Validates before restarting
  3. Wait mode - Automatic polling until restart completes
  4. Structured output - JSON for automation
  5. Clear feedback - Progress indicators and status

Risks & Considerations

  1. Downtime - Restarting causes temporary unavailability
  2. Data loss - Possible if the database is neither persisted nor replicated
  3. Connection disruption - All clients will be disconnected
  4. Timing - Consider maintenance windows

Testing Plan

  1. Test against Docker cluster first
  2. Verify API endpoint exists
  3. Test with persisted vs non-persisted databases
  4. Test with replicated vs non-replicated databases
  5. Test role preservation
  6. Test wait mode and timeout
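The persisted/replicated combinations in items 3 and 4 are easier to cover if the safety rules are expressed as a pure function that can be unit tested without a cluster. The following is a sketch under assumptions: `DbState`, `RestartFlags`, `Decision`, and `decide_restart` are hypothetical names, and the rules are one reading of the safety checks described in this issue.

```rust
/// Hypothetical snapshot of the database properties the checks care about.
#[derive(Debug, Clone, Copy)]
pub struct DbState {
    pub active: bool,
    pub persisted: bool,
    pub replicated: bool,
}

/// Hypothetical subset of the CLI flags that affect the decision.
#[derive(Debug, Clone, Copy, Default)]
pub struct RestartFlags {
    pub force: bool,
    pub discard_data: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Proceed,
    Refuse(&'static str),
}

/// Decide whether a restart may proceed. `--force` overrides every check.
pub fn decide_restart(db: DbState, flags: RestartFlags) -> Decision {
    if flags.force {
        return Decision::Proceed;
    }
    if !db.active {
        return Decision::Refuse("database is not active");
    }
    if !db.persisted && !flags.discard_data {
        return Decision::Refuse("no persistence: pass --discard-data or --force");
    }
    if !db.replicated {
        return Decision::Refuse("no replication: pass --force");
    }
    Decision::Proceed
}
```

Each row of the test matrix then becomes a one-line assertion, for example that a persisted but non-replicated database is refused unless `--force` is set.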

Related

Priority

Medium - Useful operational command, but requires API verification first.

Next Steps

  1. ✅ Create this issue to track the feature
  2. ⏳ Verify if /v1/bdbs/<id>/restart endpoint exists in REST API
  3. ⏳ If exists, implement command
  4. ⏳ If not, evaluate alternatives (failover-based approach)

Metadata

Labels: enhancement (New feature or request)