Conversation

@kaiyuan-li
Contributor

@kaiyuan-li kaiyuan-li commented Sep 19, 2025

  1. Rebuilt the monarch wheel with Python 3.10 and replaced the current wheel, which is broken.
  2. Replaced the conda installation with plain Python for simplicity.
  3. The monarch wheel is Python-version specific, so a 3.10 wheel cannot work with Python 3.11 or 3.12. The 3.11 and 3.12 tests are therefore removed from CI (for now).
  4. Excluded test_sharding.py, which takes too long to run in CI.
  5. Excluded some memory-heavy tests, since they cause OOM in CI.
  6. Some other test consistency fixes.
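
For readers unfamiliar with excluding a test module from collection, here is a minimal sketch of one way to do it with pytest. It is an assumption that the exclusion lives in a `conftest.py` next to the test file; the PR description does not say how the exclusion was actually wired up, and CI could equally pass an ignore option on the command line.

```python
# conftest.py (hypothetical location: the same directory as test_sharding.py)

# pytest reads the module-level name `collect_ignore` from conftest.py and
# skips collecting the listed paths, which are resolved relative to the
# directory that contains this conftest.py.
collect_ignore = ["test_sharding.py"]
```

With this in place, a plain `pytest` run in that directory would not collect test_sharding.py, while the file itself stays in the repository.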

@meta-cla meta-cla bot added the CLA Signed label Sep 19, 2025
@kaiyuan-li kaiyuan-li changed the title fix tests in CI [WIP] fix tests in CI Sep 19, 2025
@kaiyuan-li kaiyuan-li requested a review from LucasLLC September 23, 2025 19:02
rdma_options = [True, False]
rdma_options = (
    [True, False]
    if os.environ.get("TORCHSTORE_RDMA_ENABLED", "0") == "1"
Contributor Author

The TORCHSTORE_RDMA_ENABLED env var is overloaded. We use it in the program to switch between network protocols, so we cannot use it again to branch parametrized tests. So for tests, if this env var is set to 1, we test both branches; if it is set to 0, we test only the non-RDMA branch.
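
The gating described in this comment can be sketched roughly as follows. Only the env var name and the `[True, False]` list come from the diff; the helper name `rdma_options_from_env` and its `environ` parameter are hypothetical, introduced here so the logic can be exercised without touching the real environment.

```python
import os


def rdma_options_from_env(environ=None):
    """Return the RDMA settings a parametrized test should cover.

    Hypothetical helper for illustration; not code from this PR.
    """
    # Accept a plain mapping so the logic can be checked without
    # mutating the real process environment.
    environ = os.environ if environ is None else environ
    if environ.get("TORCHSTORE_RDMA_ENABLED", "0") == "1":
        # RDMA is enabled: cover both the RDMA and non-RDMA branches.
        return [True, False]
    # RDMA is disabled or unset: cover only the non-RDMA branch.
    return [False]


# Evaluated once, when the test module is imported.
rdma_options = rdma_options_from_env()
```

With pytest, the resulting list could be passed to `@pytest.mark.parametrize("use_rdma", rdma_options)`, where `use_rdma` is an assumed parameter name. Because the list is computed at import time, the env var has to be set before test collection starts.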

Contributor

Don't we automatically switch to non-rdma even if this is enabled?

Contributor

@LucasLLC LucasLLC left a comment

Incredible! Made my day!

@kaiyuan-li kaiyuan-li merged commit b0a8c19 into main Sep 23, 2025
3 checks passed