
[distributed][elastic/rendezvous] TypeError: '<' not supported between instances of 'datetime.datetime' and 'MagicMock' #2120

@zxd1997066

🐛 Describe the bug

Please get the wheels from https://github.com/intel/torch-xpu-ops/actions/runs/18029979174, or download them with the GitHub CLI (gh), then install them:

gh run download 18029979174 --repo intel/torch-xpu-ops --name Torch-XPU-Wheel-1826 --dir path --pattern "*.zip"

Then check out the test sources and run the failing tests:

git clone -b distributed_2.9 https://github.com/daisyden/pytorch.git
cd pytorch
pip install -r requirements.txt
cd test/distributed/elastic/rendezvous/
python -m unittest dynamic_rendezvous_test.BackendRendezvousStateHolderTest.test_sync_gets_backend_state_if_cached_state_has_expired
python -m unittest dynamic_rendezvous_test.BackendRendezvousStateHolderTest.test_sync_gets_backend_state_if_local_state_is_clean
python -m unittest dynamic_rendezvous_test.BackendRendezvousStateHolderTest.test_sync_gets_backend_state_if_local_state_is_old_and_dirty
python -m unittest dynamic_rendezvous_test.BackendRendezvousStateHolderTest.test_sync_sanitizes_state
python -m unittest dynamic_rendezvous_test.BackendRendezvousStateHolderTest.test_sync_sanitizes_state_if_no_participants_is_left
python -m unittest dynamic_rendezvous_test.BackendRendezvousStateHolderTest.test_sync_uses_cached_state_if_cache_duration_is_specified
python -m unittest dynamic_rendezvous_test.BackendRendezvousStateHolderTest.test_keep_alive_updates_last_heartbeat
python -m unittest dynamic_rendezvous_test.TestRendezvousJoinOp.test_keep_alive_for_redundant_node
python -m unittest dynamic_rendezvous_test.TestRendezvousJoinOp.test_marks_rendezvous_complete_if_node_is_participant_and_last_call_deadline_exceeded
python -m unittest dynamic_rendezvous_test.TestRendezvousJoinOp.test_waits_next_round_if_rendezvous_is_complete_and_node_is_in_wait_list
python -m unittest dynamic_rendezvous_test.TestRendezvousJoinOp.test_waits_next_round_if_rendezvous_is_complete_and_node_is_redundant
python -m unittest dynamic_rendezvous_test.TestRendezvousJoinOp.test_waits_rendezvous_to_complete_if_node_is_participant
python -m unittest dynamic_rendezvous_test.TestRendezvousKeepAliveOp.test_finishes_if_no_keep_alive_update_is_needed
python -m unittest dynamic_rendezvous_test.TestRendezvousKeepAliveOp.test_raises_timeout_if_deadlined_exceeded
python -m unittest dynamic_rendezvous_test.TestRendezvousKeepAliveOp.test_updates_keep_alive_if_needed
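To run all 15 tests in one process instead of one command each, python -m unittest also accepts several test names in a single call. The Python sketch below does the same programmatically. The file name run_failing_tests.py and the helper build_suite are hypothetical, and the script assumes it is saved in and run from test/distributed/elastic/rendezvous/ so that dynamic_rendezvous_test is importable.

```python
import unittest

# Test names copied from the reproduction steps, grouped by test class.
MODULE = "dynamic_rendezvous_test"
TESTS_BY_CLASS = {
    "BackendRendezvousStateHolderTest": [
        "test_sync_gets_backend_state_if_cached_state_has_expired",
        "test_sync_gets_backend_state_if_local_state_is_clean",
        "test_sync_gets_backend_state_if_local_state_is_old_and_dirty",
        "test_sync_sanitizes_state",
        "test_sync_sanitizes_state_if_no_participants_is_left",
        "test_sync_uses_cached_state_if_cache_duration_is_specified",
        "test_keep_alive_updates_last_heartbeat",
    ],
    "TestRendezvousJoinOp": [
        "test_keep_alive_for_redundant_node",
        "test_marks_rendezvous_complete_if_node_is_participant_and_last_call_deadline_exceeded",
        "test_waits_next_round_if_rendezvous_is_complete_and_node_is_in_wait_list",
        "test_waits_next_round_if_rendezvous_is_complete_and_node_is_redundant",
        "test_waits_rendezvous_to_complete_if_node_is_participant",
    ],
    "TestRendezvousKeepAliveOp": [
        "test_finishes_if_no_keep_alive_update_is_needed",
        "test_raises_timeout_if_deadlined_exceeded",
        "test_updates_keep_alive_if_needed",
    ],
}

# Fully qualified names in the "module.Class.method" form unittest expects.
FAILING_TESTS = [
    f"{MODULE}.{cls}.{method}"
    for cls, methods in TESTS_BY_CLASS.items()
    for method in methods
]


def build_suite(names):
    # Imports each dotted name; a name whose module cannot be imported
    # becomes a synthetic failing test instead of raising here.
    return unittest.defaultTestLoader.loadTestsFromNames(names)


if __name__ == "__main__":
    result = unittest.TextTestRunner(verbosity=2).run(build_suite(FAILING_TESTS))
    print(f"{len(result.errors)} errors, {len(result.failures)} failures")
```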
The tests fail with errors such as:

ERROR: test_sync_gets_backend_state_if_cached_state_has_expired (dynamic_rendezvous_test.BackendRendezvousStateHolderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sdp/xiangdong/pytorch/test/distributed/elastic/rendezvous/dynamic_rendezvous_test.py", line 427, in test_sync_gets_backend_state_if_cached_state_has_expired
    state_holder.sync()
  File "/home/sdp/miniforge-pypy3/envs/xccl_ww27/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/dynamic_rendezvous.py", line 467, in sync
    self._sanitize()
  File "/home/sdp/miniforge-pypy3/envs/xccl_ww27/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/dynamic_rendezvous.py", line 479, in _sanitize
    self._dead_nodes = [
  File "/home/sdp/miniforge-pypy3/envs/xccl_ww27/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/dynamic_rendezvous.py", line 482, in <listcomp>
    if last_heartbeat < expire_time
TypeError: '<' not supported between instances of 'datetime.datetime' and 'MagicMock'
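The last frame shows the failing comparison: last_heartbeat is a real datetime.datetime, but expire_time evaluated to a MagicMock. A plausible explanation, not confirmed in this report, is that the test patches an input to expire_time (for example a clock or the settings) but configures a different call than the one the installed dynamic_rendezvous.py actually makes, so the unconfigured call returns a MagicMock. The tests here come from the distributed_2.9 checkout while torch is imported from the installed wheel, so a mismatch between the two is worth ruling out. The sketch below reproduces the same TypeError with a hypothetical find_dead_nodes helper that only mimics the comparison pattern from the traceback; it is not the PyTorch implementation.

```python
from datetime import datetime, timedelta, timezone
from unittest import mock


# Hypothetical stand-in for the list comprehension in _sanitize(): a node is
# "dead" if its last heartbeat is older than now - ttl. Not the real PyTorch code.
def find_dead_nodes(last_heartbeats, now, ttl):
    expire_time = now - ttl
    return [node for node, hb in last_heartbeats.items() if hb < expire_time]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
heartbeats = {"node0": t0, "node1": t0 + timedelta(seconds=30)}
ttl = timedelta(seconds=15)

# Assumption: a patched clock whose return value was never configured.
# Calling it yields a MagicMock, and MagicMock - timedelta is another MagicMock,
# so the comparison becomes datetime < MagicMock.
unconfigured_clock = mock.MagicMock()
error_message = None
try:
    find_dead_nodes(heartbeats, unconfigured_clock.now(timezone.utc), ttl)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Configuring the exact call the code under test makes yields real datetimes.
configured_clock = mock.MagicMock()
configured_clock.now.return_value = t0 + timedelta(seconds=40)
dead = find_dead_nodes(heartbeats, configured_clock.now(timezone.utc), ttl)
print(dead)
```

Arithmetic on a MagicMock silently produces another MagicMock, so the problem only surfaces at the < comparison, which MagicMock does not support by default. Setting return_value on the method the library really calls gives real datetime objects, and the comparison succeeds.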

Versions

pytorch: https://github.com/daisyden/pytorch/tree/distributed_2.9

Labels: bug, module: distributed
