fix(serve): respect user-set CUDA_VISIBLE_DEVICES in gpu_env #751

Open

paxiaatucsdedu wants to merge 1 commit into lightseekorg:main from paxiaatucsdedu:pan/gpu-assignment-control

Conversation


@paxiaatucsdedu paxiaatucsdedu commented Mar 13, 2026

Description

Problem

When running smg serve inside a Docker container with CUDA_VISIBLE_DEVICES set
(e.g. docker run -e CUDA_VISIBLE_DEVICES=4 ...), the gpu_env method in
WorkerLauncher unconditionally overrides the variable with sequential IDs starting
from 0. This causes workers to land on the wrong GPU, leading to OOM errors when
another process is already using GPU 0.

Solution

Modify gpu_env to check whether CUDA_VISIBLE_DEVICES is already set in the
environment. If it is, treat it as the available GPU pool and slice into it by
dp_rank and tp_size. If it is not set (or empty), fall back to the original
sequential assignment (0,1,...). A bounds check raises a clear ValueError when
the pool has fewer GPUs than the requested dp_rank * tp_size range.

Changes

  • bindings/python/src/smg/serve.py: Rewrite WorkerLauncher.gpu_env to index
    into an existing CUDA_VISIBLE_DEVICES pool instead of overriding it. Add bounds
    check with descriptive error message.
  • bindings/python/tests/test_serve.py: Add unit tests for pool indexing
    (parametrized across multiple GPU layouts), cross-backend verification (vllm),
    bounds check (ValueError), and empty-string fallback.
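The test matrix can be sketched along these lines. `slice_visible_pool` is a hypothetical stand-in for the slicing behavior; the real tests in bindings/python/tests/test_serve.py call WorkerLauncher.gpu_env directly and are parametrized with pytest.

```python
def slice_visible_pool(visible: str, dp_rank: int, tp_size: int) -> str:
    """Stand-in for the pool-slicing behavior the new tests cover."""
    pool = [g.strip() for g in visible.split(",") if g.strip()]
    start, end = dp_rank * tp_size, (dp_rank + 1) * tp_size
    if not pool:
        # Empty string behaves like unset: sequential fallback.
        return ",".join(str(i) for i in range(start, end))
    if end > len(pool):
        raise ValueError(f"pool has {len(pool)} GPU(s), need {start}..{end - 1}")
    return ",".join(pool[start:end])

# Layouts mirroring the cases described above.
CASES = [
    ("4", 0, 1, "4"),          # single GPU pinned via docker -e
    ("2,3,6,7", 0, 2, "2,3"),  # first worker takes the first TP slice
    ("2,3,6,7", 1, 2, "6,7"),  # second worker takes the next slice
    ("", 1, 2, "2,3"),         # empty string: sequential fallback
]

def test_pool_slicing():
    for visible, dp_rank, tp_size, expected in CASES:
        assert slice_visible_pool(visible, dp_rank, tp_size) == expected

def test_insufficient_pool_raises():
    try:
        slice_visible_pool("0,1", 1, 2)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for out-of-range slice")
```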

Test Plan

Manual verification (Docker):

# Before fix: worker ignores CUDA_VISIBLE_DEVICES=4 and launches on GPU 0
docker run -itd --gpus all -e CUDA_VISIBLE_DEVICES=4 \
  <smg-image> smg serve --backend sglang --model-path /models/test
# Logs show: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 150.00 MiB. GPU 0 has a total capacity of 79.44 GiB of which 19.00 MiB is free.
 
# After fix: worker correctly uses GPU 4
docker run -itd --gpus all -e CUDA_VISIBLE_DEVICES=4 \
  -v ./serve.py:/usr/local/lib/python3.12/dist-packages/smg/serve.py:ro \
  <smg-image> smg serve --backend sglang --model-path /models/test
# Logs show: CUDA_VISIBLE_DEVICES=4

Added unit tests at bindings/python/tests/test_serve.py

Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • Improvements

    • Improved GPU assignment for multi-worker setups: respects externally provided GPU visibility, partitions available GPUs per worker, and validates tensor-parallel sizing to prevent misconfiguration.
    • Ensures unbuffered Python output for worker processes.
  • Tests

    • Added unit tests covering GPU selection, partitioning, fallback behavior, and invalid tensor-parallel configurations.

@github-actions github-actions bot added python-bindings Python bindings changes tests Test changes labels Mar 13, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where the smg serve command would ignore user-defined CUDA_VISIBLE_DEVICES settings, causing GPU allocation problems, especially in containerized environments. The changes ensure that the system now correctly interprets and utilizes the CUDA_VISIBLE_DEVICES environment variable as a pool for GPU assignment, improving resource management and preventing out-of-memory errors. This enhancement provides greater flexibility and control for users deploying smg serve in diverse GPU setups.

Highlights

  • GPU Assignment Logic: The gpu_env method in WorkerLauncher has been updated to respect an existing CUDA_VISIBLE_DEVICES environment variable. Previously, it would unconditionally override this variable, leading to incorrect GPU assignments when running smg serve in environments where CUDA_VISIBLE_DEVICES was already set.
  • Dynamic GPU Pool Indexing: The new logic treats the user-defined CUDA_VISIBLE_DEVICES as an available GPU pool and indexes into it based on dp_rank and tp_size. This allows for external control over GPU visibility while still supporting multi-worker partitioning.
  • Robustness and Error Handling: A bounds check has been added to prevent out-of-range GPU assignments, raising a ValueError with a descriptive message if the requested GPU range exceeds the available GPUs in CUDA_VISIBLE_DEVICES. If CUDA_VISIBLE_DEVICES is not set or is empty, the system gracefully falls back to sequential GPU assignment.
  • Comprehensive Unit Tests: New unit tests have been introduced to cover various scenarios, including indexing into different GPU layouts, cross-backend verification (vllm), bounds checking for insufficient GPUs, and fallback behavior for empty CUDA_VISIBLE_DEVICES.
Changelog
  • bindings/python/src/smg/serve.py
    • Modified the gpu_env method to check for and utilize an existing CUDA_VISIBLE_DEVICES environment variable as the source for available GPUs.
    • Implemented logic to slice the CUDA_VISIBLE_DEVICES pool based on dp_rank and tp_size for multi-worker configurations.
    • Added a ValueError check to ensure that the requested GPU range does not exceed the number of available GPUs specified in CUDA_VISIBLE_DEVICES.
    • Ensured fallback to sequential GPU assignment if CUDA_VISIBLE_DEVICES is not set or is empty.
  • bindings/python/tests/test_serve.py
    • Added test_gpu_env_indexes_into_visible_pool to verify correct indexing into the CUDA_VISIBLE_DEVICES pool across various configurations.
    • Included test_gpu_env_visible_pool_vllm for cross-backend validation of the new GPU assignment logic.
    • Introduced test_gpu_env_raises_on_insufficient_visible_gpus to confirm proper error handling when insufficient GPUs are available.
    • Added test_gpu_env_empty_visible_falls_back to test the default sequential assignment behavior when CUDA_VISIBLE_DEVICES is not specified.
Activity
  • The author has completed the cargo +nightly fmt and cargo clippy checks.
  • The author has provided manual verification steps using Docker to demonstrate the fix.
  • New unit tests have been added to bindings/python/tests/test_serve.py to validate the changes.


coderabbitai bot commented Mar 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 740537b3-bad3-4775-bebc-42f7a8cc8d94

📥 Commits

Reviewing files that changed from the base of the PR and between 8dd3105 and ced6ba0.

📒 Files selected for processing (2)
  • bindings/python/src/smg/serve.py
  • bindings/python/tests/test_serve.py

📝 Walkthrough

Walkthrough

gpu_env in bindings/python/src/smg/serve.py now validates tensor-parallel size and partitions GPUs for multi-worker setups by slicing CUDA_VISIBLE_DEVICES (if set) or generating a sequential range. PYTHONUNBUFFERED is set to "1". New unit tests exercise normal and error cases.

Changes

  • GPU assignment logic enhancement — bindings/python/src/smg/serve.py:
    Added tp_size validation, partitioning of CUDA_VISIBLE_DEVICES by dp_rank and tp_size (with bounds checks), fallback to sequential GPU indices when absent, always setting PYTHONUNBUFFERED="1", and an expanded docstring.
  • GPU environment test coverage — bindings/python/tests/test_serve.py:
    Added unit tests covering slicing into the visible GPU pool, vLLM launcher behavior, the insufficient-visible-GPUs error, the invalid tp_size error, and the empty-visible fallback.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

run-ci

Suggested reviewers

  • key4ng
  • CatherineSue
  • slin1237


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — The title directly and accurately summarizes the main change: respecting user-set CUDA_VISIBLE_DEVICES in gpu_env, the core fix for the GPU assignment problem.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly modifies the gpu_env method to respect an existing CUDA_VISIBLE_DEVICES environment variable. The implementation is robust, handling various formats of the variable and including a clear bounds check to prevent errors when not enough GPUs are available. The fallback to the original sequential assignment when the variable is not set is preserved. The changes are accompanied by a comprehensive set of unit tests that cover the new logic, including edge cases and different backend launchers. The code quality is high, and the changes effectively address the described problem.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindings/python/tests/test_serve.py`:
- Around line 463-505: Add a unit test that asserts the tp_size <= 0 guard in
gpu_env raises: create a pytest that constructs the appropriate launcher (e.g.,
SglangWorkerLauncher), sets args = argparse.Namespace(tensor_parallel_size=0)
(and/or a negative value), and calls launcher.gpu_env(args, dp_rank=0,
env={"CUDA_VISIBLE_DEVICES": "0,1"}) inside pytest.raises(ValueError,
match="tp_size|tensor_parallel_size|<= 0") to ensure the check at line 68 in
bindings/python/src/smg/serve.py is exercised; name the test something like
test_gpu_env_raises_on_non_positive_tp_size and add it alongside the existing
gpu_env tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a593117e-1bd3-4baf-85db-73334ed4b5a6

📥 Commits

Reviewing files that changed from the base of the PR and between 50aa6dc and 8dd3105.

📒 Files selected for processing (2)
  • bindings/python/src/smg/serve.py
  • bindings/python/tests/test_serve.py

When CUDA_VISIBLE_DEVICES is already set (e.g. via Docker -e), treat it
as the available GPU pool instead of overriding it with sequential IDs.
Adds bounds check, tp_size validation, and unit tests.

Signed-off-by: paxiaatucsdedu <paxia@ucsd.edu>
@paxiaatucsdedu paxiaatucsdedu force-pushed the pan/gpu-assignment-control branch from 8dd3105 to ced6ba0 Compare March 13, 2026 04:22
