
Always-on LD_LIBRARY_PATH stripping causes 11x performance regression for CUDA/MKL users #8945

@johnzfitch

Description

What version of Codex is running?

0.00

What subscription do you have?

Pro

Which model were you using?

gpt-5.2

What platform is your computer?

Linux 6.17.9-arch1-1 x86_64

What issue are you seeing?

Pre-main hardening in release builds strips LD_LIBRARY_PATH unconditionally, causing severe performance regressions and broken functionality for:

  • CUDA workloads: 11-300x slower (GPU libraries not found, fallback to CPU)
  • Intel MKL workloads: 11x slower (optimized BLAS not found, slow fallback)
  • Conda environments: Legacy installations relying on LD_LIBRARY_PATH fail
  • Enterprise deployments: Custom DB drivers, Oracle clients fail to load
  • HPC clusters: Module systems and custom toolchains broken

This regression was introduced in PR #4521 (commit b8e1fe6), which made codex_process_hardening::pre_main_hardening() always-on in release builds.

The hardening strips all LD_* environment variables before main() runs, and because Codex spawns child processes with the parent's environment, every child inherits the stripped values.
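
A quick way to observe the inheritance directly (the CUDA path is illustrative, matching the repro steps below):

export LD_LIBRARY_PATH=/opt/cuda/lib64:$LD_LIBRARY_PATH
codex exec 'env | grep ^LD_'

On an affected build the second command prints nothing, because the child only sees the stripped environment.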

What steps can reproduce the bug?

  1. Install a Codex release build (non-debug)
  2. Point LD_LIBRARY_PATH at a custom CUDA installation:
    export LD_LIBRARY_PATH=/opt/cuda/lib64:$LD_LIBRARY_PATH
  3. Run a CUDA workload:
    codex exec "python3 -c 'import torch; print(torch.cuda.is_available())'"
  4. Observe: Returns False (CUDA libraries not found)
  5. Verify LD_LIBRARY_PATH is stripped inside the child (single quotes, so the parent shell does not expand the variable first):
    codex exec 'echo $LD_LIBRARY_PATH'
  6. Output: empty (or different from the parent's value)

The performance regression can be measured with:
time codex exec "python3 -c 'import numpy as np; np.dot(np.random.rand(1000,1000), np.random.rand(1000,1000))'"
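
For a baseline, the same computation can be timed directly in the parent shell, where LD_LIBRARY_PATH is still intact:

time python3 -c 'import numpy as np; np.dot(np.random.rand(1000,1000), np.random.rand(1000,1000))'

The difference between the two timings isolates the cost of the stripped environment.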

What is the expected behavior?

LD_LIBRARY_PATH should be preserved by default to support legitimate use cases (CUDA, MKL, custom libraries).

Users who need maximum security can opt-in via environment variable:
CODEX_SECURE_MODE=1 codex exec "..."

This approach:

  • Restores performance and functionality for CUDA/MKL users
  • Maintains strong security through existing Landlock/Seatbelt boundaries
  • Allows opt-in maximum hardening when truly needed
  • Has zero breaking changes for existing users

Additional information

I have a tested fix ready for PR submission:

  • Makes pre_main_hardening() opt-in via CODEX_SECURE_MODE=1 environment variable
  • Default mode: Preserves LD_LIBRARY_PATH (fast, compatible)
  • Secure mode: Strips LD_* variables (maximum hardening)
  • Only 2 files modified: cli/src/main.rs and responses-api-proxy/src/main.rs
  • Security maintained through Landlock/Seatbelt (primary boundaries)
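
A sketch of the intended command-line behavior (CODEX_SECURE_MODE is the proposed interface from this fix, not part of any current release):

codex exec 'echo $LD_LIBRARY_PATH'                      # default: parent value preserved
CODEX_SECURE_MODE=1 codex exec 'echo $LD_LIBRARY_PATH'  # opt-in: LD_* stripped, as today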

This primarily affects users in ML/AI research, scientific computing, HPC, and enterprise environments.


TL;DR: The pre-main hardening behaves like a "ghost": it runs before main(), strips the environment, and then disappears, leaving no trace except confused users reporting slowness.
