Backend.AI environment auto-detection not working when executing commands #66

@inureyes

Problem Description

When Backend.AI environment variables (BACKENDAI_CLUSTER_HOSTS, BACKENDAI_CLUSTER_HOST, BACKENDAI_CLUSTER_ROLE) are set, bssh should automatically detect the cluster and execute commands on all nodes. However, the current implementation incorrectly interprets the command as a hostname and tries to connect in SSH single-host mode.

Root Cause

The issue is in the CLI mode detection logic (src/cli.rs:407-411):

pub fn is_ssh_mode(&self) -> bool {
    // Only SSH mode if destination is provided and no cluster/hosts
    self.destination.is_some() && self.cluster.is_none() && self.hosts.is_none()
}

This logic only checks for explicit -C (cluster) or -H (hosts) options, and doesn't consider auto-detected Backend.AI environments.

Execution flow:

  1. User runs: bssh "whoami"
  2. CLI parsing: destination="whoami", cluster=None, hosts=None
  3. is_ssh_mode() returns true (destination exists, no cluster/hosts)
  4. ❌ Tries to SSH connect to host "whoami" instead of executing command on cluster
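The flow above can be reproduced in isolation. This is a minimal sketch, not the project's code: the `Cli` struct here is a simplified stand-in with only the three fields named in this issue (the type of `hosts` is an assumption), and `parse_bare_command` is a hypothetical helper that mimics what argument parsing produces for `bssh "whoami"`.

```rust
/// Simplified stand-in for the parsed CLI arguments.
/// Assumption: the real `Cli` in src/cli.rs has more fields, and the
/// exact type of `hosts` may differ.
#[derive(Debug, Default)]
pub struct Cli {
    pub destination: Option<String>,
    pub cluster: Option<String>,
    pub hosts: Option<Vec<String>>,
}

impl Cli {
    /// Same predicate as the one quoted from src/cli.rs.
    pub fn is_ssh_mode(&self) -> bool {
        self.destination.is_some() && self.cluster.is_none() && self.hosts.is_none()
    }
}

/// Hypothetical helper: models the parse result of `bssh "<arg>"`,
/// where the single positional argument lands in `destination`.
pub fn parse_bare_command(arg: &str) -> Cli {
    Cli {
        destination: Some(arg.to_string()),
        ..Cli::default()
    }
}
```

With this model, `parse_bare_command("whoami").is_ssh_mode()` evaluates to `true`, matching step 3, while the same value with `cluster` set to `Some("bai_auto".to_string())` evaluates to `false`.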

Expected Behavior

export BACKENDAI_CLUSTER_HOSTS="node1,node2,node3"
export BACKENDAI_CLUSTER_HOST="node1"
export BACKENDAI_CLUSTER_ROLE="main"

bssh "whoami"
# Should execute "whoami" on node1, node2, node3
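For illustration, the three variables could be turned into a node list roughly as follows. This is a sketch under assumptions: `BackendAiEnv`, `from_vars`, and `from_env` are hypothetical names, the requirement that all three variables be present is assumed, and the actual detection in src/config/loader.rs may work differently.

```rust
use std::env;

/// Hypothetical container for the detected Backend.AI cluster settings.
#[derive(Debug, PartialEq)]
pub struct BackendAiEnv {
    pub hosts: Vec<String>,
    pub current_host: String,
    pub role: String,
}

impl BackendAiEnv {
    /// Builds the settings from raw variable values.
    /// Assumption: detection requires all three values and at least one host.
    pub fn from_vars(hosts: Option<&str>, host: Option<&str>, role: Option<&str>) -> Option<Self> {
        // Split the comma-separated host list, dropping blanks and whitespace.
        let hosts: Vec<String> = hosts?
            .split(',')
            .map(str::trim)
            .filter(|h| !h.is_empty())
            .map(String::from)
            .collect();
        if hosts.is_empty() {
            return None;
        }
        Some(Self {
            hosts,
            current_host: host?.to_string(),
            role: role?.to_string(),
        })
    }

    /// Reads the values from the process environment.
    pub fn from_env() -> Option<Self> {
        let hosts = env::var("BACKENDAI_CLUSTER_HOSTS").ok();
        let host = env::var("BACKENDAI_CLUSTER_HOST").ok();
        let role = env::var("BACKENDAI_CLUSTER_ROLE").ok();
        Self::from_vars(hosts.as_deref(), host.as_deref(), role.as_deref())
    }
}
```

Keeping the parsing in `from_vars` separate from the environment lookup makes the logic testable without mutating process-wide state.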

Actual Behavior

bssh "whoami"
# Output: Failed to connect to inureyes@whoami:22
# Tries to connect to "whoami" as a hostname

Solution

Implement early Backend.AI environment detection in src/app/initialization.rs:

pub async fn initialize_app(cli: &mut Cli, args: &[String]) -> Result<AppContext> {
    // Check for Backend.AI environment early and auto-set cluster
    if Config::from_backendai_env().is_some() 
        && cli.cluster.is_none() 
        && cli.hosts.is_none() 
        && !cli.is_ssh_mode() {
        cli.cluster = Some("bai_auto".to_string());
    }
    
    // ... rest of initialization
}

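This is intended to make Backend.AI environments be treated as multi-server mode before the mode decision logic runs. One caveat: because cluster and hosts are already known to be unset at that point, !cli.is_ssh_mode() reduces to cli.destination.is_none(), so the guard as written does not fire for bssh "whoami", where destination="whoami". That check likely needs to be dropped or refined for the failing case to reach the cluster path.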
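The guard can be evaluated in isolation against the failing invocation. The sketch below is a minimal model, not the project's code: `Cli` is a simplified stand-in, the boolean `backendai_detected` stands in for `Config::from_backendai_env().is_some()`, and `guard_relaxed` is a hypothetical variant introduced only for comparison.

```rust
/// Simplified stand-in for the parsed CLI arguments (assumption: the real
/// struct has more fields and `hosts` may have a different type).
#[derive(Debug, Default)]
pub struct Cli {
    pub destination: Option<String>,
    pub cluster: Option<String>,
    pub hosts: Option<Vec<String>>,
}

impl Cli {
    /// Same predicate as the one quoted from src/cli.rs.
    pub fn is_ssh_mode(&self) -> bool {
        self.destination.is_some() && self.cluster.is_none() && self.hosts.is_none()
    }
}

/// The condition as written in the proposed `initialize_app` change.
pub fn guard_as_proposed(cli: &Cli, backendai_detected: bool) -> bool {
    backendai_detected && cli.cluster.is_none() && cli.hosts.is_none() && !cli.is_ssh_mode()
}

/// Hypothetical relaxed condition without the `is_ssh_mode` check.
/// This is an assumption for comparison, not the project's actual fix.
pub fn guard_relaxed(cli: &Cli, backendai_detected: bool) -> bool {
    backendai_detected && cli.cluster.is_none() && cli.hosts.is_none()
}

/// Applies the relaxed guard and selects the auto-generated cluster.
pub fn apply_auto_cluster(cli: &mut Cli, backendai_detected: bool) {
    if guard_relaxed(cli, backendai_detected) {
        cli.cluster = Some("bai_auto".to_string());
    }
}
```

For a `Cli` with `destination` set to `"whoami"` and nothing else, `guard_as_proposed` is `false` while `guard_relaxed` is `true`, and after `apply_auto_cluster` the value no longer reports SSH mode. The trade-off of the relaxed form is that, inside a Backend.AI environment, an invocation such as `bssh user@host` would also be routed to the cluster path, so a real fix may need some way to tell a host from a command.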

Test Cases

Working cases:

  • bssh list - Shows bai_auto cluster correctly
  • bssh ping - Detects all 3 nodes correctly (subcommand path)
  • bssh -C bai_auto "whoami" - Works with explicit cluster

Failing case:

  • bssh "whoami" - Treats "whoami" as hostname instead of command

Additional Context

  • Backend.AI detection code exists in src/config/loader.rs:51-93
  • The auto-generated cluster name is bai_auto
  • Test script available: test_backendai_env.sh

Priority

High - This breaks a core feature documented in CLAUDE.md and ARCHITECTURE.md


Labels: priority:high, type:bug
