[CI] Reorg test file and fix moonlight oom#593

Merged
guapisolo merged 4 commits into radixark:main from guapisolo:ci/reorg on Feb 17, 2026

guapisolo (Collaborator) commented on Feb 13, 2026

TODO: Replace the moonlight-16B-A3B test with GLM-4.7-flash after the new sglang docker bump and once #566 is merged. GLM-4.7-flash is a 30B model with MLA and MTP.

Update: the new dev docker has been bumped. #566 is still WIP.

guapisolo changed the title from "[CI] Delete redundant test file and reorg" to "[CI] Reorg test file and fix moonlight oom" on Feb 13, 2026
gemini-code-assist (Contributor) commented

Summary of Changes

Hello @guapisolo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on refining the continuous integration (CI) test infrastructure by eliminating a significant number of redundant test files and updating configurations for existing tests. The changes aim to improve the efficiency and relevance of the CI pipeline, ensuring that tests reflect current model and hardware setups, and reducing unnecessary overhead. This cleanup is part of an ongoing effort to maintain a robust and optimized testing environment.

Highlights

  • Test File Deletion: Numerous redundant or outdated end-to-end test files have been removed from the tests/e2e/image and tests/e2e/precision directories, streamlining the test suite.
  • GPU Configuration Updates: Adjustments were made to GPU allocation parameters (actor-num-gpus-per-node and rollout-num-gpus) in test_qwen2.5_0.5B_gsm8k_async.py for improved resource utilization.
  • Model Parallelism and Memory Optimization: The test_moonlight_16B_A3B.py and test_moonlight_16B_A3B_r3.py files received updates to their tensor model parallel size and SGLang memory fraction settings, likely optimizing performance for specific hardware configurations.
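
To make the GPU reallocation in the second highlight concrete, here is a minimal sketch of the arithmetic. It assumes a single actor node and a non-colocated setup in which training and rollout GPUs are disjoint; neither assumption is confirmed by this PR, and `total_gpus` is a hypothetical helper rather than a function from the repository.

```python
def total_gpus(actor_num_nodes: int, actor_gpus_per_node: int, rollout_gpus: int) -> int:
    """GPUs needed when training and rollout use disjoint devices (an assumption)."""
    return actor_num_nodes * actor_gpus_per_node + rollout_gpus


# Values of --actor-num-gpus-per-node and --rollout-num-gpus in the FEW_GPU branch,
# before and after this PR. One actor node is assumed, not stated in the PR.
before = total_gpus(actor_num_nodes=1, actor_gpus_per_node=4, rollout_gpus=4)
after = total_gpus(actor_num_nodes=1, actor_gpus_per_node=2, rollout_gpus=6)

print(f"GPUs before: {before}, after: {after}")
```

Under those assumptions the overall GPU budget stays the same, and two GPUs move from training to rollout.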

Changelog

  • tests/e2e/image/test_mimo_7B_mtp_only_grad.py
    • Removed.
  • tests/e2e/image/test_moonlight_16B_A3B.py
    • Removed.
  • tests/e2e/image/test_quick_start_glm4_9B.py
    • Removed.
  • tests/e2e/image/test_qwen2.5_0.5B_gsm8k.py
    • Removed.
  • tests/e2e/image/test_qwen2.5_0.5B_gsm8k_async.py
    • Removed.
  • tests/e2e/image/test_qwen2.5_0.5B_gsm8k_async_short.py
    • Removed.
  • tests/e2e/image/test_qwen2.5_0.5B_gsm8k_short.py
    • Removed.
  • tests/e2e/image/test_qwen3_0.6B_fsdp_colocated_2xGPU.py
    • Removed.
  • tests/e2e/image/test_qwen3_0.6B_fsdp_distributed.py
    • Removed.
  • tests/e2e/image/test_qwen3_0.6B_megatron_fsdp_align.py
    • Removed.
  • tests/e2e/image/test_qwen3_0.6B_parallel_check.py
    • Removed.
  • tests/e2e/image/test_qwen3_30B_A3B.py
    • Removed.
  • tests/e2e/image/test_qwen3_4B_ckpt.py
    • Removed.
  • tests/e2e/image/test_qwen3_4B_fsdp_true_on_policy.py
    • Removed.
  • tests/e2e/image/test_qwen3_4B_ppo.py
    • Removed.
  • tests/e2e/image/test_qwen3_vl_4B_fsdp.py
    • Removed.
  • tests/e2e/long/test_qwen2.5_0.5B_gsm8k_async.py
    • Updated actor-num-gpus-per-node from 4 to 2 and rollout-num-gpus from 4 to 6 when FEW_GPU is true.
  • tests/e2e/megatron/test_moonlight_16B_A3B.py
    • Updated tensor-model-parallel-size from 2 to 4.
    • Adjusted sglang-mem-fraction-static to be conditional on TIGHT_HOST_MEMORY.
  • tests/e2e/megatron/test_moonlight_16B_A3B_r3.py
    • Updated tensor-model-parallel-size from 2 to 4.
    • Adjusted sglang-mem-fraction-static to be conditional on TIGHT_HOST_MEMORY.
  • tests/e2e/precision/test_qwen3_0.6B_megatron_fsdp_align.py
    • Removed.
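
The effect of raising tensor-model-parallel-size can be sketched with the usual Megatron-style decomposition, in which the world size is divided among tensor, pipeline, context, and data parallelism. The 8-GPU world size and the pipeline and context sizes of 1 used below are assumptions for illustration only; the PR does not state them, and `data_parallel_size` is a hypothetical helper.

```python
def data_parallel_size(world_size: int, tp: int, pp: int = 1, cp: int = 1) -> int:
    """Number of data-parallel replicas left after model parallelism is applied."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by tp*pp*cp = {model_parallel}"
        )
    return world_size // model_parallel


WORLD_SIZE = 8  # assumed GPU count for the training side, not taken from the PR

dp_before = data_parallel_size(WORLD_SIZE, tp=2)  # tensor-model-parallel-size 2
dp_after = data_parallel_size(WORLD_SIZE, tp=4)   # tensor-model-parallel-size 4

print(f"data-parallel replicas: {dp_before} -> {dp_after}")
```

With these assumed numbers, moving from TP 2 to TP 4 halves the number of data-parallel replicas, and each GPU holds a smaller slice of the tensor-parallel weights.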

Ignored Files

  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/pr-test.yml
    • .github/workflows/pr-test.yml.j2

Activity

  • guapisolo initiated this pull request to delete redundant test files and reorganize the CI setup.
  • The author suggested replacing the moonlight-16B-A3B test with GLM-4.7-flash following a new SGLang docker bump and a merge in the radixark/miles repository.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request primarily reorganizes e2e test files and addresses an out-of-memory (OOM) issue with the moonlight model tests. The changes to fix the OOM, such as increasing tensor parallelism and adjusting sglang's static memory fraction, are appropriate. The GPU reallocation in the qwen async test is also a reasonable configuration adjustment. However, a large number of test files have been removed without corresponding additions in this PR. While this is labeled as a reorganization, the absence of the new files raises concerns about a potential loss of test coverage. It's important to ensure these tests are properly relocated or their functionality is otherwise preserved. I've also left a couple of comments regarding a potentially confusing variable name used for GPU memory configuration.
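
One way to follow up on the coverage concern is to check whether each removed file still has a same-named counterpart elsewhere in the tree. The sketch below is illustrative only: `find_counterparts` is a hypothetical helper, and the `remaining` list contains just the files named in the changelog above, not the full repository, so an empty result here does not prove that a test was lost.

```python
from pathlib import PurePosixPath


def find_counterparts(removed: list[str], remaining: list[str]) -> dict[str, list[str]]:
    """Map each removed path to remaining paths that share its file name."""
    by_name: dict[str, list[str]] = {}
    for path in remaining:
        by_name.setdefault(PurePosixPath(path).name, []).append(path)
    return {path: by_name.get(PurePosixPath(path).name, []) for path in removed}


# A sample of the paths listed in the changelog.
removed = [
    "tests/e2e/image/test_moonlight_16B_A3B.py",
    "tests/e2e/image/test_qwen2.5_0.5B_gsm8k_async.py",
    "tests/e2e/image/test_qwen3_4B_ppo.py",
]
remaining = [
    "tests/e2e/long/test_qwen2.5_0.5B_gsm8k_async.py",
    "tests/e2e/megatron/test_moonlight_16B_A3B.py",
    "tests/e2e/megatron/test_moonlight_16B_A3B_r3.py",
]

counterparts = find_counterparts(removed, remaining)
for path, matches in counterparts.items():
    status = ", ".join(matches) if matches else "no same-named file in this sample"
    print(f"{path}: {status}")
```

In a real check, `remaining` would be built from the repository at the merge commit, for example by listing every file under tests/e2e.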


 sglang_args = (
-    "--rollout-num-gpus-per-engine 2 " "--sglang-mem-fraction-static 0.8 " "--sglang-max-running-requests 512 "
+    "--rollout-num-gpus-per-engine 2 " f"--sglang-mem-fraction-static {0.7 if TIGHT_HOST_MEMORY else 0.8} " "--sglang-max-running-requests 512 "

medium

The variable TIGHT_HOST_MEMORY is used to configure --sglang-mem-fraction-static, which controls GPU memory allocation. This is confusing because the variable name implies it's related to host (CPU) memory, while the parameter affects device (GPU) memory. Other tests in this repository use a variable named TIGHT_DEVICE_MEMORY for this purpose. For clarity and consistency, it would be better to use a variable that accurately reflects its purpose, like TIGHT_DEVICE_MEMORY. This would involve changing the variable definition on line 5 as well.
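
A minimal sketch of the suggested rename, assuming the flag is read from an environment variable. The `env_flag` helper and the environment variable name are hypothetical; this page does not show how TIGHT_HOST_MEMORY or TIGHT_DEVICE_MEMORY is actually defined in the repository. The flag strings are copied from the diff above.

```python
import os


def env_flag(name: str, default: str = "0") -> bool:
    """Hypothetical helper: treat "1", "true", or "yes" as True."""
    return os.environ.get(name, default).strip().lower() in {"1", "true", "yes"}


def build_sglang_args(tight_device_memory: bool) -> str:
    """Build the SGLang argument string with a memory-dependent static fraction."""
    mem_fraction = 0.7 if tight_device_memory else 0.8
    return (
        "--rollout-num-gpus-per-engine 2 "
        f"--sglang-mem-fraction-static {mem_fraction} "
        "--sglang-max-running-requests 512 "
    )


# The name now describes device (GPU) memory, matching what the flag controls.
# The environment variable name is an assumption for this sketch.
TIGHT_DEVICE_MEMORY = env_flag("TIGHT_DEVICE_MEMORY")
sglang_args = build_sglang_args(TIGHT_DEVICE_MEMORY)
print(sglang_args)
```

Passing the flag as a function argument keeps the choice of variable name in one place, so a later rename touches only the definition and the call site.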


 sglang_args = (
-    "--rollout-num-gpus-per-engine 2 " "--sglang-mem-fraction-static 0.8 " "--sglang-max-running-requests 512 "
+    "--rollout-num-gpus-per-engine 2 " f"--sglang-mem-fraction-static {0.7 if TIGHT_HOST_MEMORY else 0.8} " "--sglang-max-running-requests 512 "

medium

The variable TIGHT_HOST_MEMORY is used to configure --sglang-mem-fraction-static, which controls GPU memory allocation. This is confusing because the variable name implies it's related to host (CPU) memory, while the parameter affects device (GPU) memory. Other tests in this repository use a variable named TIGHT_DEVICE_MEMORY for this purpose. For clarity and consistency, it would be better to use a variable that accurately reflects its purpose, like TIGHT_DEVICE_MEMORY. This would involve changing the variable definition on line 5 as well.

yushengsu-thu (Collaborator) left a comment

It looks good, but I have some advice.

Comments on deleting CI under image tags:

For the CI under image tags, I planned to run all tests under those images to validate the dev Docker image, even though this may leave duplicate CI jobs under different tags.

If we adopt the PR above:

We may need to update/add tags in the Docker release workflow (it seems this has not been added yet; see #573).
I think it's better to add/update these tags every time we release a new Docker image.

guapisolo merged commit eb8d974 into radixark:main on Feb 17, 2026
58 of 60 checks passed