refactor: Implement EnvGuard RAII wrapper and #[serial] for safe environment variable handling in Rust 2024 edition

## Overview

Rust 2024 edition marks `std::env::set_var` and `std::env::remove_var` as `unsafe` because they can cause undefined behavior in multi-threaded environments. During the 2024 edition migration (commit `73b78571`), these calls were left in an inconsistent state:

- **177 `env::set_var`/`env::remove_var` calls** across **17 test-bearing files**
- **The test build currently fails with 119 `E0133` errors** on `cargo check --lib --tests`
- 58 calls already wrapped in ad-hoc `unsafe {}` blocks (partial manual fix)
- 7 files already have `#[serial]` applied to some (but not all) of their env-touching tests
- `serial_test = "3.4"` already exists in `[dev-dependencies]`

This issue captures the plan to finish the migration via a safe `EnvGuard` RAII wrapper plus `#[serial]`, removing all `unsafe {}` blocks from test code and retiring the custom `once_cell::Mutex` pattern in `tests/backendai_env_test.rs`.

## Current State (verified 2026-04-14)

### Build Status

```
$ cargo check --lib --tests
error[E0133]: call to unsafe function `set_var` is unsafe and requires unsafe block
...
error: could not compile `bssh` (lib test) due to 123 previous errors
```

119 of those 123 errors are `E0133` against `std::env::set_var` / `remove_var`; the remainder are cascading type errors in the same modules.

### File-by-file Inventory

| File | env calls | `unsafe {}` blocks | `#[serial]` already? | `E0133` errors |
|---|---:|---:|:---:|---:|
| `src/jump/chain/auth.rs` *(test module)* | 20 | 0 | no | 20 |
| `src/executor/rank_detector.rs` *(test module)* | 17 | 0 | partial | 17 |
| `src/config/tests.rs` | 17 | 6 | no | 3 |
| `src/server/config/loader.rs` *(test module)* | 15 | 0 | no | 15 |
| `src/ssh/client/connection.rs` *(test module)* | 15 | 0 | partial | 10 |
| `src/cli/mode_detection_tests.rs` | 13 | 0 | partial | 13 |
| `src/jump/parser/tests.rs` | 13 | 5 | no | 8 |
| `tests/backendai_env_test.rs` | 23 | 6 | no *(uses `ENV_MUTEX`)* | 0 |
| `tests/jump_host_config_test.rs` | 8 | 2 | no | 4 |
| `tests/exit_code_integration_test.rs` | 8 | 0 | partial | 8 |
| `src/ssh/auth.rs` *(test module)* | 7 | 0 | no | 7 |
| `src/security/sudo.rs` *(test module)* | 7 | 0 | partial | 7 |
| `tests/pdsh_compat_test.rs` | 5 | 0 | partial | 5 |
| `tests/interactive_test.rs` | 3 | 2 | no | 0 |
| `src/commands/interactive_unit_test.rs` | 3 | 0 | no | 3 |
| `src/ssh/ssh_config/integration_tests/env_cache_integration_test.rs` | 2 | 0 | no | 2 |
| `src/cli/pdsh.rs` *(test module)* | 1 | 0 | no | 1 |
| **Total** | **177** | **21** | **7 files partial** | **119** |

Notes:

- *"partial"* means the file imports `serial_test::serial` and applies `#[serial]` to some — but not all — env-touching test functions.
- `tests/fail_fast_test.rs` already imports `serial_test` and uses `#[serial]`, but contains **zero** `env::set_var`/`remove_var` calls and is not in scope for this issue.
- Counts come from `grep -rn 'env::set_var\|env::remove_var'` and the error locations in `cargo check --lib --tests` output.

## Solution: `EnvGuard` RAII + `#[serial]`

Encapsulate every `unsafe` environment variable mutation inside a single RAII type that restores the original value on drop, and serialize every test that touches environment variables with `#[serial]` from the `serial_test` crate.

**Benefits**

1. **Zero `unsafe` in test bodies** — all unsafety lives in one audited module with a documented `SAFETY:` comment.
2. **Guaranteed cleanup** — `Drop` restores state even on panic or early `return`.
3. **Thread-safety via `#[serial]`** — eliminates cross-test interference without hand-rolled mutexes.
4. **Removes `ENV_MUTEX`** in `tests/backendai_env_test.rs` (obsolete once `#[serial]` is applied everywhere).
5. **Minimal, measurable perf cost** — expected +10–15% test duration; no impact on release build or runtime.

## Design

### `EnvGuard` (`src/test_helpers/env_guard.rs`)

```rust
// Copyright 2025 Lablup Inc. and Jeongkyu Shin
// SPDX-License-Identifier: Apache-2.0

//! RAII wrapper for safe environment variable mutation in tests.
//!
//! All `unsafe` calls to `std::env::set_var` / `std::env::remove_var` are
//! encapsulated here. Combined with `#[serial_test::serial]`, this gives tests
//! a safe, cleanup-guaranteed way to manipulate process-wide environment
//! variables without scattering `unsafe {}` blocks across the codebase.

#![cfg(test)]

use std::ffi::{OsStr, OsString};

/// RAII guard that sets or removes an environment variable on construction
/// and restores the previous value (or unset state) on drop.
///
/// Tests that use `EnvGuard` **must** also be annotated with
/// `#[serial_test::serial]` to prevent races with other tests that read or
/// mutate the same variable.
#[must_use = "EnvGuard must be bound to a local; dropping it immediately restores the variable"]
pub struct EnvGuard {
    key: OsString,
    original: Option<OsString>,
}

impl EnvGuard {
    /// Set an environment variable, saving its prior value for restoration.
    pub fn set(key: impl Into<OsString>, value: impl AsRef<OsStr>) -> Self {
        let key = key.into();
        let original = std::env::var_os(&key);
        // SAFETY: tests that construct `EnvGuard` are `#[serial]`, so no
        // other thread reads or writes the environment concurrently.
        unsafe { std::env::set_var(&key, value); }
        Self { key, original }
    }

    /// Remove an environment variable, saving its prior value for restoration.
    pub fn remove(key: impl Into<OsString>) -> Self {
        let key = key.into();
        let original = std::env::var_os(&key);
        // SAFETY: same rationale as `set`.
        unsafe { std::env::remove_var(&key); }
        Self { key, original }
    }
}

impl Drop for EnvGuard {
    fn drop(&mut self) {
        // SAFETY: same rationale as `set` / `remove`.
        unsafe {
            match self.original.take() {
                Some(v) => std::env::set_var(&self.key, v),
                None => std::env::remove_var(&self.key),
            }
        }
    }
}
```

**Design choices**

- **`OsString` / `OsStr`** instead of `String` preserves non-UTF-8 values (e.g. `PATH` on Windows) correctly. Tests using `HOME` pass `&str` transparently via `AsRef<OsStr>`.
- **`#![cfg(test)]`** at the module level guarantees the code never ships in a release binary.
- **`#[must_use]`** flags the common mistake of `let _ = EnvGuard::set(...)`, which would drop the guard immediately and restore the variable before the test body ran.
- **LIFO drop order** when a test holds multiple guards matches the user's mental model:
  ```rust
  let _home = EnvGuard::set("HOME", "/tmp/test");
  let _user = EnvGuard::set("USER", "testuser");
  // Drops in reverse: USER first, then HOME.
  ```
- v1 deliberately has no `set_many` / `remove_many` helpers — multiple guards compose cleanly.

### Module wiring

`src/test_helpers/mod.rs`:

```rust
#![cfg(test)]

mod env_guard;
pub use env_guard::EnvGuard;
```

`src/lib.rs` (test-only):

```rust
#[cfg(test)]
pub(crate) mod test_helpers;
```

For integration tests under `tests/` (which can't see `pub(crate)` items), use a thin shared module:

`tests/common/mod.rs`:

```rust
// Re-export via a `#[path]` include of the library's `env_guard.rs` so the
// integration tests use the *same* implementation as unit tests.
#[path = "../../src/test_helpers/env_guard.rs"]
mod env_guard_impl;
pub use env_guard_impl::EnvGuard;
```

Each integration test file then adds `mod common;` and `use common::EnvGuard;`. The `#[path]` approach avoids publishing `EnvGuard` as a public API of the `bssh` crate. (Alternative: mark the module `pub` and re-export; we'll pick one during Phase 1 depending on which compiles more cleanly.)

## Implementation Phases

### Phase 1 — Foundation *(blocks everything else)*

1. Create `src/test_helpers/env_guard.rs` with `EnvGuard` + `Drop`.
2. Create `src/test_helpers/mod.rs`.
3. Add `#[cfg(test)] pub(crate) mod test_helpers;` to `src/lib.rs`.
4. Create `tests/common/mod.rs` exposing `EnvGuard` to integration tests (or finalize an alternative visibility path).
5. Verify `serial_test = "3.4"` is in `[dev-dependencies]` (already present ✅).

**Exit criteria:** `cargo check --lib --tests 2>&1 | grep "test_helpers"` reports no errors referencing the new module.

### Phase 2 — Convert unit tests in `src/` *(12 files, 119 − 30 = 89 compile errors cleared)*

Fix `src/` test modules in ascending order of complexity to catch pattern issues early:

1. `src/cli/pdsh.rs` (1 call)
2. `src/commands/interactive_unit_test.rs` (3)
3. `src/ssh/ssh_config/integration_tests/env_cache_integration_test.rs` (2)
4. `src/security/sudo.rs` (7)
5. `src/ssh/auth.rs` (7)
6. `src/cli/mode_detection_tests.rs` (13)
7. `src/jump/parser/tests.rs` (13)
8. `src/config/tests.rs` (17)
9. `src/server/config/loader.rs` (15)
10. `src/ssh/client/connection.rs` (15)
11. `src/executor/rank_detector.rs` (17)
12. `src/jump/chain/auth.rs` (20)

After each file: `cargo check --lib --tests` should have one fewer batch of errors. Commit per file (or per logical pair) for reviewability.

**Exit criteria:** `cargo check --lib --tests` reports 0 errors from `src/`.

### Phase 3 — Convert integration tests in `tests/` *(5 files, ~30 remaining errors cleared)*

1. `tests/interactive_test.rs` (3) — smallest, validates the `tests/common/` wiring
2. `tests/pdsh_compat_test.rs` (5)
3. `tests/jump_host_config_test.rs` (8)
4. `tests/exit_code_integration_test.rs` (8)
5. `tests/backendai_env_test.rs` (23) — **also delete the `ENV_MUTEX` / `once_cell::Lazy<Mutex<()>>` boilerplate at the top of the file and every `let _guard = ENV_MUTEX.lock().await;` call site**

**Exit criteria:** `cargo test --lib --bins --tests --no-run` compiles; no `unsafe { env::set_var }` / `unsafe { env::remove_var }` remain outside `src/test_helpers/env_guard.rs`.

### Phase 4 — Validation

1. `cargo build --all-targets` — clean.
2. `cargo clippy --all-targets --all-features -- -D warnings` — clean.
3. `cargo test --lib --bins --tests` — all green.
4. `grep -rn "unsafe.*env::set_var\|unsafe.*env::remove_var" src/ tests/` → 0 matches outside `src/test_helpers/env_guard.rs`.
5. `grep -rn "ENV_MUTEX\|once_cell::sync::Lazy" tests/` → 0 matches in converted files.
6. Record before/after test duration and include the delta in the PR description.

## Conversion Pattern Reference

### Before (current broken state — `src/config/tests.rs`)

```rust
#[test]
fn test_expand_tilde() {
    let original_home = std::env::var("HOME").ok();
    std::env::set_var("HOME", "/home/user");        // E0133

    let expanded = expand_tilde(Path::new("~/.ssh/config"));

    if let Some(home) = original_home {
        std::env::set_var("HOME", home);            // E0133
    } else {
        std::env::remove_var("HOME");                // E0133
    }

    assert_eq!(expanded, PathBuf::from("/home/user/.ssh/config"));
}
```

### After

```rust
use serial_test::serial;
use crate::test_helpers::EnvGuard;

#[test]
#[serial]
fn test_expand_tilde() {
    let _home = EnvGuard::set("HOME", "/home/user");

    let expanded = expand_tilde(Path::new("~/.ssh/config"));

    assert_eq!(expanded, PathBuf::from("/home/user/.ssh/config"));
    // `_home` dropped here → HOME automatically restored.
}
```

### Multi-variable test (replaces the `ENV_MUTEX` pattern)

```rust
// tests/backendai_env_test.rs — after conversion
use serial_test::serial;
mod common;
use common::EnvGuard;

#[tokio::test]
#[serial]
async fn test_backendai_env_auto_detection() {
    let _hosts = EnvGuard::set("BACKENDAI_CLUSTER_HOSTS", "node1.ai,node2.ai,node3.ai");
    let _host  = EnvGuard::set("BACKENDAI_CLUSTER_HOST", "node1.ai");
    let _role  = EnvGuard::set("BACKENDAI_CLUSTER_ROLE", "main");

    let temp_dir = tempfile::tempdir().unwrap();
    let nonexistent_path = temp_dir.path().join("nonexistent.yaml");

    let config = Config::load_with_priority(&nonexistent_path)
        .await
        .expect("Config should load with Backend.AI env");

    assert!(config.clusters.contains_key("bai_auto"));
    // ...remaining assertions unchanged...
}
```

No more `ENV_MUTEX.lock()`, no more manual `orig_hosts` save/restore, no more `unsafe {}`.

## Completion Checklist

**Phase 1 — Foundation**

- [ ] `src/test_helpers/env_guard.rs` created with `EnvGuard` + `Drop`
- [ ] `src/test_helpers/mod.rs` created, exports `EnvGuard`
- [ ] `src/lib.rs` gains `#[cfg(test)] pub(crate) mod test_helpers;`
- [ ] `tests/common/mod.rs` created (or alternative visibility path chosen)

**Phase 2 — `src/` test modules** *(119 of 119 E0133 errors must be resolved across both phases)*

- [ ] `src/cli/pdsh.rs` — converted, `#[serial]` applied
- [ ] `src/commands/interactive_unit_test.rs` — converted, `#[serial]` applied
- [ ] `src/ssh/ssh_config/integration_tests/env_cache_integration_test.rs` — converted, `#[serial]` applied
- [ ] `src/security/sudo.rs` — converted, existing `#[serial]` preserved
- [ ] `src/ssh/auth.rs` — converted, `#[serial]` applied
- [ ] `src/cli/mode_detection_tests.rs` — converted, existing `#[serial]` preserved + extended
- [ ] `src/jump/parser/tests.rs` — converted, `#[serial]` applied
- [ ] `src/config/tests.rs` — converted, `#[serial]` applied
- [ ] `src/server/config/loader.rs` — converted, `#[serial]` applied
- [ ] `src/ssh/client/connection.rs` — converted, existing `#[serial]` preserved + extended
- [ ] `src/executor/rank_detector.rs` — converted, existing `#[serial]` preserved + extended
- [ ] `src/jump/chain/auth.rs` — converted, `#[serial]` applied

**Phase 3 — `tests/` integration tests**

- [ ] `tests/interactive_test.rs` — converted, `#[serial]` applied
- [ ] `tests/pdsh_compat_test.rs` — converted, existing `#[serial]` preserved
- [ ] `tests/jump_host_config_test.rs` — converted, `#[serial]` applied
- [ ] `tests/exit_code_integration_test.rs` — converted, existing `#[serial]` preserved
- [ ] `tests/backendai_env_test.rs` — converted; `ENV_MUTEX` + `once_cell::sync::Lazy` deleted

**Phase 4 — Validation**

- [ ] `cargo build --all-targets` — clean
- [ ] `cargo clippy --all-targets --all-features -- -D warnings` — clean
- [ ] `cargo test --lib --bins --tests` — all green
- [ ] `grep -rn "unsafe.*env::set_var\|unsafe.*env::remove_var" src/ tests/` = 0 matches outside `env_guard.rs`
- [ ] `grep -rn "ENV_MUTEX" tests/` = 0 matches
- [ ] Test duration delta documented in the PR (expected +10–15%)

## Risks & Mitigations

| Risk | Mitigation |
|---|---|
| `#[serial]` serializes all env-touching tests → CI slowdown | Accept initially. If slowdown > 20%, group with `#[serial(key)]` by variable family. Measure in PR. |
| A test uses `EnvGuard` but forgets `#[serial]` → race with other tests | Add a grep assertion in CI: any test file importing `EnvGuard` must also import `serial_test::serial`. |
| Drop order surprises with multiple guards in one test | Document LIFO order in the module doc; prefer one guard per variable in the pattern examples. |
| Integration tests can't access crate-private `test_helpers` | Use `#[path]` include in `tests/common/mod.rs`, so unit and integration tests share one implementation file without making `EnvGuard` public API. |
| `var_os` + `set_var` cycle loses OS-level encoding fidelity | Not a concern for our test values (all ASCII); documented in the module header for future readers. |
| Existing `ENV_MUTEX.lock().await` in `backendai_env_test.rs` is async, `#[serial]` is sync | `serial_test` provides `#[serial]` that works with `#[tokio::test]` — no change needed. Verified on crate docs. |

## Performance Impact

- **Test execution time:** +10–15% expected (driven by `#[serial]` enforcement, not by `EnvGuard` itself). Actual delta measured in the PR.
- **Release build:** zero impact (`#![cfg(test)]` on the module).
- **Runtime performance:** zero impact (never linked into the production binary).

## Future Work

Once this lands, a follow-up issue should:

1. **Eliminate env var dependence from unit tests** by injecting configuration through function parameters or a `Config::builder()` pattern, so env-touching is reserved for tests that validate env var parsing itself.
2. **Audit `src/ssh/client/connection.rs`** — can the SSH agent detection tests inject a socket path parameter instead of mutating `SSH_AUTH_SOCK`?
3. **Audit `src/executor/rank_detector.rs`** — 17 env var calls suggest the rank detector could accept an injected env-like map, removing the need for process-global mutation in tests.

Each of these would reduce the `#[serial]` surface and let more tests run in parallel.

## References

- Rust 2024 edition changes: <https://doc.rust-lang.org/edition-guide/rust-2024/>
- `std::env::set_var` unsafe soundness issue: <https://github.com/rust-lang/rust/issues/90308>
- `serial_test` crate: <https://crates.io/crates/serial_test>
- RAII pattern: <https://doc.rust-lang.org/rust-by-example/scope/raii.html>
- Related commits: `73b78571` (2024 edition migration), `65fca4d4` (clippy guard-clause cleanup)


Risk	Mitigation
`#[serial]` serializes all env-touching tests → CI slowdown	Accept initially. If slowdown > 20%, group with `#[serial(key)]` by variable family. Measure in PR.
A test uses `EnvGuard` but forgets `#[serial]` → race with other tests	Add a grep assertion in CI: any test file importing `EnvGuard` must also import `serial_test::serial`.
Drop order surprises with multiple guards in one test	Document LIFO order in the module doc; prefer one guard per variable in the pattern examples.
Integration tests can't access crate-private `test_helpers`	Use `#[path]` include in `tests/common/mod.rs`, so unit and integration tests share one implementation file without making `EnvGuard` public API.
`var_os` + `set_var` cycle loses OS-level encoding fidelity	Not a concern for our test values (all ASCII); documented in the module header for future readers.
Existing `ENV_MUTEX.lock().await` in `backendai_env_test.rs` is async, `#[serial]` is sync	`serial_test` provides `#[serial]` that works with `#[tokio::test]` — no change needed. Verified on crate docs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Implement EnvGuard RAII wrapper and #[serial] for safe environment variable handling in Rust 2024 edition #179

Overview

Current State (verified 2026-04-14)

Build Status

File-by-file Inventory

Solution: `EnvGuard` RAII + `#[serial]`

Design

`EnvGuard` (`src/test_helpers/env_guard.rs`)

Module wiring

Implementation Phases

Phase 1 — Foundation (blocks everything else)

Phase 2 — Convert unit tests in `src/` (12 files, 119 − 30 = 89 compile errors cleared)

Phase 3 — Convert integration tests in `tests/` (5 files, ~30 remaining errors cleared)

Phase 4 — Validation

Conversion Pattern Reference

Before (current broken state — `src/config/tests.rs`)

After

Multi-variable test (replaces the `ENV_MUTEX` pattern)

Completion Checklist

Risks & Mitigations

Performance Impact

Future Work

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

File	env calls	`unsafe {}` blocks	`#[serial]` already?	`E0133` errors
`src/jump/chain/auth.rs` (test module)	20	0	no	20
`src/executor/rank_detector.rs` (test module)	17	0	partial	17
`src/config/tests.rs`	17	6	no	3
`src/server/config/loader.rs` (test module)	15	0	no	15
`src/ssh/client/connection.rs` (test module)	15	0	partial	10
`src/cli/mode_detection_tests.rs`	13	0	partial	13
`src/jump/parser/tests.rs`	13	5	no	8
`tests/backendai_env_test.rs`	23	6	no (uses `ENV_MUTEX`)	0
`tests/jump_host_config_test.rs`	8	2	no	4
`tests/exit_code_integration_test.rs`	8	0	partial	8
`src/ssh/auth.rs` (test module)	7	0	no	7
`src/security/sudo.rs` (test module)	7	0	partial	7
`tests/pdsh_compat_test.rs`	5	0	partial	5
`tests/interactive_test.rs`	3	2	no	0
`src/commands/interactive_unit_test.rs`	3	0	no	3
`src/ssh/ssh_config/integration_tests/env_cache_integration_test.rs`	2	0	no	2
`src/cli/pdsh.rs` (test module)	1	0	no	1
Total	177	21	7 files partial	119

refactor: Implement EnvGuard RAII wrapper and #[serial] for safe environment variable handling in Rust 2024 edition #179

Description

Overview

Current State (verified 2026-04-14)

Build Status

File-by-file Inventory

Solution: EnvGuard RAII + #[serial]

Design

EnvGuard (src/test_helpers/env_guard.rs)

Module wiring

Implementation Phases

Phase 1 — Foundation (blocks everything else)

Phase 2 — Convert unit tests in src/ (12 files, 119 − 30 = 89 compile errors cleared)

Phase 3 — Convert integration tests in tests/ (5 files, ~30 remaining errors cleared)

Phase 4 — Validation

Conversion Pattern Reference

Before (current broken state — src/config/tests.rs)

After

Multi-variable test (replaces the ENV_MUTEX pattern)

Completion Checklist

Risks & Mitigations

Performance Impact

Future Work

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Solution: `EnvGuard` RAII + `#[serial]`

`EnvGuard` (`src/test_helpers/env_guard.rs`)

Phase 2 — Convert unit tests in `src/` (12 files, 119 − 30 = 89 compile errors cleared)

Phase 3 — Convert integration tests in `tests/` (5 files, ~30 remaining errors cleared)

Before (current broken state — `src/config/tests.rs`)

Multi-variable test (replaces the `ENV_MUTEX` pattern)