fix(serve): respect user-set CUDA_VISIBLE_DEVICES in gpu_env #751

Open

paxiaatucsdedu wants to merge 1 commit into lightseekorg:main from paxiaatucsdedu:pan/gpu-assignment-control

Conversation


@paxiaatucsdedu paxiaatucsdedu commented Mar 13, 2026

Description

Problem

When running smg serve inside a Docker container with CUDA_VISIBLE_DEVICES set
(e.g. docker run -e CUDA_VISIBLE_DEVICES=4 ...), the gpu_env method in
WorkerLauncher unconditionally overrides the variable with sequential IDs starting
from 0. This causes workers to land on the wrong GPU, leading to OOM errors when
another process is already using GPU 0.

Solution

Modify gpu_env to check whether CUDA_VISIBLE_DEVICES is already set in the
environment. If it is, treat it as the available GPU pool and slice into it by
dp_rank and tp_size. If it is not set (or empty), fall back to the original
sequential assignment (0,1,...). A bounds check raises a clear ValueError when
the pool has fewer GPUs than the requested dp_rank * tp_size range.

Changes

  • bindings/python/src/smg/serve.py: Rewrite WorkerLauncher.gpu_env to index
    into an existing CUDA_VISIBLE_DEVICES pool instead of overriding it. Add bounds
    check with descriptive error message.
  • bindings/python/tests/test_serve.py: Add unit tests for pool indexing
    (parametrized across multiple GPU layouts), cross-backend verification (vllm),
    bounds check (ValueError), and empty-string fallback.
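The test matrix can be sketched along these lines. `slice_visible_pool` is a hypothetical stand-in for the slicing behavior; the real tests in bindings/python/tests/test_serve.py call WorkerLauncher.gpu_env directly and are parametrized with pytest.

```python
def slice_visible_pool(visible: str, dp_rank: int, tp_size: int) -> str:
    """Stand-in for the pool-slicing behavior the new tests cover."""
    pool = [g.strip() for g in visible.split(",") if g.strip()]
    start, end = dp_rank * tp_size, (dp_rank + 1) * tp_size
    if not pool:
        # Empty string behaves like unset: sequential fallback.
        return ",".join(str(i) for i in range(start, end))
    if end > len(pool):
        raise ValueError(f"pool has {len(pool)} GPU(s), need {start}..{end - 1}")
    return ",".join(pool[start:end])

# Layouts mirroring the cases described above.
CASES = [
    ("4", 0, 1, "4"),          # single GPU pinned via docker -e
    ("2,3,6,7", 0, 2, "2,3"),  # first worker takes the first TP slice
    ("2,3,6,7", 1, 2, "6,7"),  # second worker takes the next slice
    ("", 1, 2, "2,3"),         # empty string: sequential fallback
]

def test_pool_slicing():
    for visible, dp_rank, tp_size, expected in CASES:
        assert slice_visible_pool(visible, dp_rank, tp_size) == expected

def test_insufficient_pool_raises():
    try:
        slice_visible_pool("0,1", 1, 2)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for out-of-range slice")
```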

Test Plan

Manual verification (Docker):

# Before fix: worker ignores CUDA_VISIBLE_DEVICES=4 and launches on GPU 0
docker run -itd --gpus all -e CUDA_VISIBLE_DEVICES=4 \
  <smg-image> smg serve --backend sglang --model-path /models/test
# Logs show: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 150.00 MiB. GPU 0 has a total capacity of 79.44 GiB of which 19.00 MiB is free.
 
# After fix: worker correctly uses GPU 4
docker run -itd --gpus all -e CUDA_VISIBLE_DEVICES=4 \
  -v ./serve.py:/usr/local/lib/python3.12/dist-packages/smg/serve.py:ro \
  <smg-image> smg serve --backend sglang --model-path /models/test
# Logs show: CUDA_VISIBLE_DEVICES=4

Added unit tests at bindings/python/tests/test_serve.py

Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • Improvements

    • Improved GPU assignment for multi-worker setups: respects externally provided GPU visibility, partitions available GPUs per worker, and validates tensor-parallel sizing to prevent misconfiguration.
    • Ensures unbuffered Python output for worker processes.
  • Tests

    • Added unit tests covering GPU selection, partitioning, fallback behavior, and invalid tensor-parallel configurations.

@github-actions github-actions bot added python-bindings Python bindings changes tests Test changes labels Mar 13, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where the smg serve command would ignore user-defined CUDA_VISIBLE_DEVICES settings, causing GPU allocation problems, especially in containerized environments. The changes ensure that the system now correctly interprets and utilizes the CUDA_VISIBLE_DEVICES environment variable as a pool for GPU assignment, improving resource management and preventing out-of-memory errors. This enhancement provides greater flexibility and control for users deploying smg serve in diverse GPU setups.

Highlights

  • GPU Assignment Logic: The gpu_env method in WorkerLauncher has been updated to respect an existing CUDA_VISIBLE_DEVICES environment variable. Previously, it would unconditionally override this variable, leading to incorrect GPU assignments when running smg serve in environments where CUDA_VISIBLE_DEVICES was already set.
  • Dynamic GPU Pool Indexing: The new logic treats the user-defined CUDA_VISIBLE_DEVICES as an available GPU pool and indexes into it based on dp_rank and tp_size. This allows for external control over GPU visibility while still supporting multi-worker partitioning.
  • Robustness and Error Handling: A bounds check has been added to prevent out-of-range GPU assignments, raising a ValueError with a descriptive message if the requested GPU range exceeds the available GPUs in CUDA_VISIBLE_DEVICES. If CUDA_VISIBLE_DEVICES is not set or is empty, the system gracefully falls back to sequential GPU assignment.
  • Comprehensive Unit Tests: New unit tests have been introduced to cover various scenarios, including indexing into different GPU layouts, cross-backend verification (vllm), bounds checking for insufficient GPUs, and fallback behavior for empty CUDA_VISIBLE_DEVICES.
Changelog
  • bindings/python/src/smg/serve.py
    • Modified the gpu_env method to check for and utilize an existing CUDA_VISIBLE_DEVICES environment variable as the source for available GPUs.
    • Implemented logic to slice the CUDA_VISIBLE_DEVICES pool based on dp_rank and tp_size for multi-worker configurations.
    • Added a ValueError check to ensure that the requested GPU range does not exceed the number of available GPUs specified in CUDA_VISIBLE_DEVICES.
    • Ensured fallback to sequential GPU assignment if CUDA_VISIBLE_DEVICES is not set or is empty.
  • bindings/python/tests/test_serve.py
    • Added test_gpu_env_indexes_into_visible_pool to verify correct indexing into the CUDA_VISIBLE_DEVICES pool across various configurations.
    • Included test_gpu_env_visible_pool_vllm for cross-backend validation of the new GPU assignment logic.
    • Introduced test_gpu_env_raises_on_insufficient_visible_gpus to confirm proper error handling when insufficient GPUs are available.
    • Added test_gpu_env_empty_visible_falls_back to test the default sequential assignment behavior when CUDA_VISIBLE_DEVICES is not specified.
Activity
  • The author has completed the cargo +nightly fmt and cargo clippy checks.
  • The author has provided manual verification steps using Docker to demonstrate the fix.
  • New unit tests have been added to bindings/python/tests/test_serve.py to validate the changes.


coderabbitai bot commented Mar 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 740537b3-bad3-4775-bebc-42f7a8cc8d94

📥 Commits

Reviewing files that changed from the base of the PR and between 8dd3105 and ced6ba0.

📒 Files selected for processing (2)
  • bindings/python/src/smg/serve.py
  • bindings/python/tests/test_serve.py

📝 Walkthrough

Walkthrough

gpu_env in bindings/python/src/smg/serve.py now validates tensor-parallel size and partitions GPUs for multi-worker setups by slicing CUDA_VISIBLE_DEVICES (if set) or generating a sequential range. PYTHONUNBUFFERED is set to "1". New unit tests exercise normal and error cases.

Changes

  • GPU assignment logic enhancement — bindings/python/src/smg/serve.py:
    Added tp_size validation, partitioning of CUDA_VISIBLE_DEVICES by dp_rank and tp_size (with bounds checks), fallback to sequential GPU indices when absent, always setting PYTHONUNBUFFERED="1", and an expanded docstring.
  • GPU environment test coverage — bindings/python/tests/test_serve.py:
    Added unit tests covering slicing into the visible GPU pool, vLLM launcher behavior, the insufficient-visible-GPUs error, the invalid tp_size error, and the empty-visible fallback.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

run-ci

Suggested reviewers

  • key4ng
  • CatherineSue
  • slin1237


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — The title directly and accurately summarizes the main change: respecting user-set CUDA_VISIBLE_DEVICES in gpu_env, the core fix for the GPU assignment problem.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly modifies the gpu_env method to respect an existing CUDA_VISIBLE_DEVICES environment variable. The implementation is robust, handling various formats of the variable and including a clear bounds check to prevent errors when not enough GPUs are available. The fallback to the original sequential assignment when the variable is not set is preserved. The changes are accompanied by a comprehensive set of unit tests that cover the new logic, including edge cases and different backend launchers. The code quality is high, and the changes effectively address the described problem.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindings/python/tests/test_serve.py`:
- Around line 463-505: Add a unit test that asserts the tp_size <= 0 guard in
gpu_env raises: create a pytest that constructs the appropriate launcher (e.g.,
SglangWorkerLauncher), sets args = argparse.Namespace(tensor_parallel_size=0)
(and/or a negative value), and calls launcher.gpu_env(args, dp_rank=0,
env={"CUDA_VISIBLE_DEVICES": "0,1"}) inside pytest.raises(ValueError,
match="tp_size|tensor_parallel_size|<= 0") to ensure the check at line 68 in
bindings/python/src/smg/serve.py is exercised; name the test something like
test_gpu_env_raises_on_non_positive_tp_size and add it alongside the existing
gpu_env tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a593117e-1bd3-4baf-85db-73334ed4b5a6

📥 Commits

Reviewing files that changed from the base of the PR and between 50aa6dc and 8dd3105.

📒 Files selected for processing (2)
  • bindings/python/src/smg/serve.py
  • bindings/python/tests/test_serve.py

When CUDA_VISIBLE_DEVICES is already set (e.g. via Docker -e), treat it
as the available GPU pool instead of overriding it with sequential IDs.
Adds bounds check, tp_size validation, and unit tests.

Signed-off-by: paxiaatucsdedu <paxia@ucsd.edu>
@paxiaatucsdedu paxiaatucsdedu force-pushed the pan/gpu-assignment-control branch from 8dd3105 to ced6ba0 Compare March 13, 2026 04:22
