Skip to content

refactor: Implement EnvGuard RAII wrapper and #[serial] for safe environment variable handling in Rust 2024 edition #179

@inureyes

Description

@inureyes

Overview

Rust 2024 edition marks std::env::set_var and std::env::remove_var as unsafe because they can cause undefined behavior in multi-threaded environments. During the 2024 edition migration (commit 73b78571), these calls were left in an inconsistent state:

  • 177 env::set_var/env::remove_var calls across 17 test-bearing files
  • The test build currently fails with 119 E0133 errors on cargo check --lib --tests
  • 58 calls already wrapped in ad-hoc unsafe {} blocks (partial manual fix)
  • 7 files already have #[serial] applied to some (but not all) of their env-touching tests
  • serial_test = "3.4" already exists in [dev-dependencies]

This issue captures the plan to finish the migration via a safe EnvGuard RAII wrapper plus #[serial], removing all unsafe {} blocks from test code and retiring the custom once_cell::Mutex pattern in tests/backendai_env_test.rs.

Current State (verified 2026-04-14)

Build Status

$ cargo check --lib --tests
error[E0133]: call to unsafe function `set_var` is unsafe and requires unsafe block
...
error: could not compile `bssh` (lib test) due to 123 previous errors

119 of those 123 errors are E0133 against std::env::set_var / remove_var; the remainder are cascading type errors in the same modules.

File-by-file Inventory

File env calls unsafe {} blocks #[serial] already? E0133 errors
src/jump/chain/auth.rs (test module) 20 0 no 20
src/executor/rank_detector.rs (test module) 17 0 partial 17
src/config/tests.rs 17 6 no 3
src/server/config/loader.rs (test module) 15 0 no 15
src/ssh/client/connection.rs (test module) 15 0 partial 10
src/cli/mode_detection_tests.rs 13 0 partial 13
src/jump/parser/tests.rs 13 5 no 8
tests/backendai_env_test.rs 23 6 no (uses ENV_MUTEX) 0
tests/jump_host_config_test.rs 8 2 no 4
tests/exit_code_integration_test.rs 8 0 partial 8
src/ssh/auth.rs (test module) 7 0 no 7
src/security/sudo.rs (test module) 7 0 partial 7
tests/pdsh_compat_test.rs 5 0 partial 5
tests/interactive_test.rs 3 2 no 0
src/commands/interactive_unit_test.rs 3 0 no 3
src/ssh/ssh_config/integration_tests/env_cache_integration_test.rs 2 0 no 2
src/cli/pdsh.rs (test module) 1 0 no 1
Total 177 21 7 files partial 119

Notes:

  • "partial" means the file imports serial_test::serial and applies #[serial] to some — but not all — env-touching test functions.
  • tests/fail_fast_test.rs already imports serial_test and uses #[serial], but contains zero env::set_var/remove_var calls and is not in scope for this issue.
  • Counts come from grep -rn 'env::set_var\|env::remove_var' and the error locations in cargo check --lib --tests output.

Solution: EnvGuard RAII + #[serial]

Encapsulate every unsafe environment variable mutation inside a single RAII type that restores the original value on drop, and serialize every test that touches environment variables with #[serial] from the serial_test crate.

Benefits

  1. Zero unsafe in test bodies — all unsafety lives in one audited module with a documented SAFETY: comment.
  2. Guaranteed cleanupDrop restores state even on panic or early return.
  3. Thread-safety via #[serial] — eliminates cross-test interference without hand-rolled mutexes.
  4. Removes ENV_MUTEX in tests/backendai_env_test.rs (obsolete once #[serial] is applied everywhere).
  5. Minimal, measurable perf cost — expected +10–15% test duration; no impact on release build or runtime.

Design

EnvGuard (src/test_helpers/env_guard.rs)

// Copyright 2025 Lablup Inc. and Jeongkyu Shin
// SPDX-License-Identifier: Apache-2.0

//! RAII wrapper for safe environment variable mutation in tests.
//!
//! All `unsafe` calls to `std::env::set_var` / `std::env::remove_var` are
//! encapsulated here. Combined with `#[serial_test::serial]`, this gives tests
//! a safe, cleanup-guaranteed way to manipulate process-wide environment
//! variables without scattering `unsafe {}` blocks across the codebase.

#![cfg(test)]

use std::ffi::{OsStr, OsString};

/// RAII guard that sets or removes an environment variable on construction
/// and restores the previous value (or unset state) on drop.
///
/// Tests that use `EnvGuard` **must** also be annotated with
/// `#[serial_test::serial]` to prevent races with other tests that read or
/// mutate the same variable.
#[must_use = "EnvGuard must be bound to a local; dropping it immediately restores the variable"]
pub struct EnvGuard {
    key: OsString,
    original: Option<OsString>,
}

impl EnvGuard {
    /// Set an environment variable, saving its prior value for restoration.
    pub fn set(key: impl Into<OsString>, value: impl AsRef<OsStr>) -> Self {
        let key = key.into();
        let original = std::env::var_os(&key);
        // SAFETY: tests that construct `EnvGuard` are `#[serial]`, so no
        // other thread reads or writes the environment concurrently.
        unsafe { std::env::set_var(&key, value); }
        Self { key, original }
    }

    /// Remove an environment variable, saving its prior value for restoration.
    pub fn remove(key: impl Into<OsString>) -> Self {
        let key = key.into();
        let original = std::env::var_os(&key);
        // SAFETY: same rationale as `set`.
        unsafe { std::env::remove_var(&key); }
        Self { key, original }
    }
}

impl Drop for EnvGuard {
    fn drop(&mut self) {
        // SAFETY: same rationale as `set` / `remove`.
        unsafe {
            match self.original.take() {
                Some(v) => std::env::set_var(&self.key, v),
                None => std::env::remove_var(&self.key),
            }
        }
    }
}

Design choices

  • OsString / OsStr instead of String preserves non-UTF-8 values (e.g. PATH on Windows) correctly. Tests using HOME pass &str transparently via AsRef<OsStr>.
  • #![cfg(test)] at the module level guarantees the code never ships in a release binary.
  • #[must_use] flags the common mistake of let _ = EnvGuard::set(...), which would drop the guard immediately and restore the variable before the test body ran.
  • LIFO drop order when a test holds multiple guards matches the user's mental model:
    let _home = EnvGuard::set("HOME", "/tmp/test");
    let _user = EnvGuard::set("USER", "testuser");
    // Drops in reverse: USER first, then HOME.
  • v1 deliberately has no set_many / remove_many helpers — multiple guards compose cleanly.

Module wiring

src/test_helpers/mod.rs:

#![cfg(test)]

mod env_guard;
pub use env_guard::EnvGuard;

src/lib.rs (test-only):

#[cfg(test)]
pub(crate) mod test_helpers;

For integration tests under tests/ (which can't see pub(crate) items), use a thin shared module:

tests/common/mod.rs:

// Re-export via a `#[path]` include of the library's `env_guard.rs` so the
// integration tests use the *same* implementation as unit tests.
#[path = "../../src/test_helpers/env_guard.rs"]
mod env_guard_impl;
pub use env_guard_impl::EnvGuard;

Each integration test file then adds mod common; and use common::EnvGuard;. The #[path] approach avoids publishing EnvGuard as a public API of the bssh crate. (Alternative: mark the module pub and re-export; we'll pick one during Phase 1 depending on which compiles more cleanly.)

Implementation Phases

Phase 1 — Foundation (blocks everything else)

  1. Create src/test_helpers/env_guard.rs with EnvGuard + Drop.
  2. Create src/test_helpers/mod.rs.
  3. Add #[cfg(test)] pub(crate) mod test_helpers; to src/lib.rs.
  4. Create tests/common/mod.rs exposing EnvGuard to integration tests (or finalize an alternative visibility path).
  5. Verify serial_test = "3.4" is in [dev-dependencies] (already present ✅).

Exit criteria: cargo check --lib --tests 2>&1 | grep "test_helpers" reports no errors referencing the new module.

Phase 2 — Convert unit tests in src/ (12 files, 119 − 30 = 89 compile errors cleared)

Fix src/ test modules in ascending order of complexity to catch pattern issues early:

  1. src/cli/pdsh.rs (1 call)
  2. src/commands/interactive_unit_test.rs (3)
  3. src/ssh/ssh_config/integration_tests/env_cache_integration_test.rs (2)
  4. src/security/sudo.rs (7)
  5. src/ssh/auth.rs (7)
  6. src/cli/mode_detection_tests.rs (13)
  7. src/jump/parser/tests.rs (13)
  8. src/config/tests.rs (17)
  9. src/server/config/loader.rs (15)
  10. src/ssh/client/connection.rs (15)
  11. src/executor/rank_detector.rs (17)
  12. src/jump/chain/auth.rs (20)

After each file: cargo check --lib --tests should have one fewer batch of errors. Commit per file (or per logical pair) for reviewability.

Exit criteria: cargo check --lib --tests reports 0 errors from src/.

Phase 3 — Convert integration tests in tests/ (5 files, ~30 remaining errors cleared)

  1. tests/interactive_test.rs (3) — smallest, validates the tests/common/ wiring
  2. tests/pdsh_compat_test.rs (5)
  3. tests/jump_host_config_test.rs (8)
  4. tests/exit_code_integration_test.rs (8)
  5. tests/backendai_env_test.rs (23) — also delete the ENV_MUTEX / once_cell::Lazy<Mutex<()>> boilerplate at the top of the file and every let _guard = ENV_MUTEX.lock().await; call site

Exit criteria: cargo test --lib --bins --tests --no-run compiles; no unsafe { env::set_var } / unsafe { env::remove_var } remain outside src/test_helpers/env_guard.rs.

Phase 4 — Validation

  1. cargo build --all-targets — clean.
  2. cargo clippy --all-targets --all-features -- -D warnings — clean.
  3. cargo test --lib --bins --tests — all green.
  4. grep -rn "unsafe.*env::set_var\|unsafe.*env::remove_var" src/ tests/ → 0 matches outside src/test_helpers/env_guard.rs.
  5. grep -rn "ENV_MUTEX\|once_cell::sync::Lazy" tests/ → 0 matches in converted files.
  6. Record before/after test duration and include the delta in the PR description.

Conversion Pattern Reference

Before (current broken state — src/config/tests.rs)

#[test]
fn test_expand_tilde() {
    let original_home = std::env::var("HOME").ok();
    std::env::set_var("HOME", "/home/user");        // E0133

    let expanded = expand_tilde(Path::new("~/.ssh/config"));

    if let Some(home) = original_home {
        std::env::set_var("HOME", home);            // E0133
    } else {
        std::env::remove_var("HOME");                // E0133
    }

    assert_eq!(expanded, PathBuf::from("/home/user/.ssh/config"));
}

After

use serial_test::serial;
use crate::test_helpers::EnvGuard;

#[test]
#[serial]
fn test_expand_tilde() {
    let _home = EnvGuard::set("HOME", "/home/user");

    let expanded = expand_tilde(Path::new("~/.ssh/config"));

    assert_eq!(expanded, PathBuf::from("/home/user/.ssh/config"));
    // `_home` dropped here → HOME automatically restored.
}

Multi-variable test (replaces the ENV_MUTEX pattern)

// tests/backendai_env_test.rs — after conversion
use serial_test::serial;
mod common;
use common::EnvGuard;

#[tokio::test]
#[serial]
async fn test_backendai_env_auto_detection() {
    let _hosts = EnvGuard::set("BACKENDAI_CLUSTER_HOSTS", "node1.ai,node2.ai,node3.ai");
    let _host  = EnvGuard::set("BACKENDAI_CLUSTER_HOST", "node1.ai");
    let _role  = EnvGuard::set("BACKENDAI_CLUSTER_ROLE", "main");

    let temp_dir = tempfile::tempdir().unwrap();
    let nonexistent_path = temp_dir.path().join("nonexistent.yaml");

    let config = Config::load_with_priority(&nonexistent_path)
        .await
        .expect("Config should load with Backend.AI env");

    assert!(config.clusters.contains_key("bai_auto"));
    // ...remaining assertions unchanged...
}

No more ENV_MUTEX.lock(), no more manual orig_hosts save/restore, no more unsafe {}.

Completion Checklist

Phase 1 — Foundation

  • src/test_helpers/env_guard.rs created with EnvGuard + Drop
  • src/test_helpers/mod.rs created, exports EnvGuard
  • src/lib.rs gains #[cfg(test)] pub(crate) mod test_helpers;
  • tests/common/mod.rs created (or alternative visibility path chosen)

Phase 2 — src/ test modules (119 of 119 E0133 errors must be resolved across both phases)

  • src/cli/pdsh.rs — converted, #[serial] applied
  • src/commands/interactive_unit_test.rs — converted, #[serial] applied
  • src/ssh/ssh_config/integration_tests/env_cache_integration_test.rs — converted, #[serial] applied
  • src/security/sudo.rs — converted, existing #[serial] preserved
  • src/ssh/auth.rs — converted, #[serial] applied
  • src/cli/mode_detection_tests.rs — converted, existing #[serial] preserved + extended
  • src/jump/parser/tests.rs — converted, #[serial] applied
  • src/config/tests.rs — converted, #[serial] applied
  • src/server/config/loader.rs — converted, #[serial] applied
  • src/ssh/client/connection.rs — converted, existing #[serial] preserved + extended
  • src/executor/rank_detector.rs — converted, existing #[serial] preserved + extended
  • src/jump/chain/auth.rs — converted, #[serial] applied

Phase 3 — tests/ integration tests

  • tests/interactive_test.rs — converted, #[serial] applied
  • tests/pdsh_compat_test.rs — converted, existing #[serial] preserved
  • tests/jump_host_config_test.rs — converted, #[serial] applied
  • tests/exit_code_integration_test.rs — converted, existing #[serial] preserved
  • tests/backendai_env_test.rs — converted; ENV_MUTEX + once_cell::sync::Lazy deleted

Phase 4 — Validation

  • cargo build --all-targets — clean
  • cargo clippy --all-targets --all-features -- -D warnings — clean
  • cargo test --lib --bins --tests — all green
  • grep -rn "unsafe.*env::set_var\|unsafe.*env::remove_var" src/ tests/ = 0 matches outside env_guard.rs
  • grep -rn "ENV_MUTEX" tests/ = 0 matches
  • Test duration delta documented in the PR (expected +10–15%)

Risks & Mitigations

Risk Mitigation
#[serial] serializes all env-touching tests → CI slowdown Accept initially. If slowdown > 20%, group with #[serial(key)] by variable family. Measure in PR.
A test uses EnvGuard but forgets #[serial] → race with other tests Add a grep assertion in CI: any test file importing EnvGuard must also import serial_test::serial.
Drop order surprises with multiple guards in one test Document LIFO order in the module doc; prefer one guard per variable in the pattern examples.
Integration tests can't access crate-private test_helpers Use #[path] include in tests/common/mod.rs, so unit and integration tests share one implementation file without making EnvGuard public API.
var_os + set_var cycle loses OS-level encoding fidelity Not a concern for our test values (all ASCII); documented in the module header for future readers.
Existing ENV_MUTEX.lock().await in backendai_env_test.rs is async, #[serial] is sync serial_test provides #[serial] that works with #[tokio::test] — no change needed. Verified on crate docs.

Performance Impact

  • Test execution time: +10–15% expected (driven by #[serial] enforcement, not by EnvGuard itself). Actual delta measured in the PR.
  • Release build: zero impact (#![cfg(test)] on the module).
  • Runtime performance: zero impact (never linked into the production binary).

Future Work

Once this lands, a follow-up issue should:

  1. Eliminate env var dependence from unit tests by injecting configuration through function parameters or a Config::builder() pattern, so env-touching is reserved for tests that validate env var parsing itself.
  2. Audit src/ssh/client/connection.rs — can the SSH agent detection tests inject a socket path parameter instead of mutating SSH_AUTH_SOCK?
  3. Audit src/executor/rank_detector.rs — 17 env var calls suggest the rank detector could accept an injected env-like map, removing the need for process-global mutation in tests.

Each of these would reduce the #[serial] surface and let more tests run in parallel.

References

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions