[Core] Deflake test_placement_group_reschedule_node_dead with ps wide output #60034
Signed-off-by: yicheng <yicheng@anyscale.com>
Code Review
This pull request addresses a test flakiness issue in test_placement_group_reschedule_node_dead by using ps auxww to prevent command line truncation. The change is correct and directly solves the problem described.
I've added one suggestion to refactor the kill_node function to use the psutil library. This would make the implementation more robust, secure, and idiomatic Python by avoiding shell=True and making the process search more specific. This is an optional improvement for better code quality.
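A rough sketch of what the psutil-based variant could look like, for illustration only. The function names find_node_processes and kill_node and the optional procs parameter are assumptions made here so the matching logic can be exercised without real processes; the actual helper in the test file may look different. psutil is a third-party package and is imported lazily, only when no process list is passed in.

```python
from typing import Iterable, List, Optional


def find_node_processes(node_id: str, procs: Optional[Iterable] = None) -> List:
    """Return processes whose command-line arguments mention node_id.

    `procs` is a hypothetical hook for tests; by default the live process
    table is scanned with psutil.
    """
    if procs is None:
        import psutil  # third-party; only needed when scanning real processes

        procs = psutil.process_iter(attrs=["pid", "cmdline"])

    matches = []
    for proc in procs:
        # cmdline is the full argv list, so it is not subject to any
        # terminal-width truncation; it can be None if access was denied.
        cmdline = proc.info.get("cmdline") or []
        if any(node_id in arg for arg in cmdline):
            matches.append(proc)
    return matches


def kill_node(node_id: str, procs: Optional[Iterable] = None) -> int:
    """Kill every process matching node_id and return how many were killed."""
    killed = 0
    for proc in find_node_processes(node_id, procs):
        proc.kill()  # no shell involved, unlike a ps | grep | kill pipeline
        killed += 1
    return killed
```

Because the match runs against the argv list rather than a rendered text table, output width stops being a concern. A real helper would probably also want to handle psutil.NoSuchProcess for processes that exit between the scan and the kill.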
617f092 to e13015d
nice fix!! are there other tests that grep ps aux?
… output (ray-project#60034) Signed-off-by: yicheng <yicheng@anyscale.com> Co-authored-by: yicheng <yicheng@anyscale.com> Signed-off-by: jasonwrwang <jasonwrwang@tencent.com>
Yes! I found a new one: ray/python/ray/tests/test_node_manager.py Line 338 in f182131 but it greps
… output (ray-project#60034) Signed-off-by: yicheng <yicheng@anyscale.com> Co-authored-by: yicheng <yicheng@anyscale.com>
Description
test_placement_group_reschedule_node_dead uses ps aux | grep {node_id} to find and kill a node. Recently, I found it failed to kill the node because grep found nothing. The root cause is that ps aux truncates long command lines, so the --node_id parameter is cut off and grep fails to find the process (maybe the raylet command line is longer now, or a terminal setting changed).
This PR uses ps auxww, which ensures unlimited output width (the ww option means "wide output, use twice for unlimited width").
See the CI failure: it checks after killing the node. The count should be 3 − 1 = 2, but it is still 3 after the timeout.
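The failure mode can be reproduced in isolation with a small sketch. Everything below is illustrative: pids_matching and ps_wide are hypothetical helpers, the sample row is made up, and the 80-column cut is only an assumed width for demonstration, not necessarily what the CI machine used.

```python
import subprocess
from typing import List


def pids_matching(ps_output: str, needle: str) -> List[int]:
    """Mimic `ps ... | grep needle`: return the PID of every row containing needle."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # In `ps aux` output the PID is the second column; the header row
        # is skipped because "PID" is not numeric.
        if needle in line and len(fields) > 1 and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


def ps_wide() -> str:
    """Run `ps auxww` without a shell and return its full-width output."""
    result = subprocess.run(
        ["ps", "auxww"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Made-up sample: the same raylet row at full width and cut at 80 columns.
HEADER = "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND"
FULL_ROW = (
    "ray 4242 0.5 0.1 1000 500 ? Sl 10:00 0:01 /path/to/raylet "
    "--some_long_option=value --another_option=value --node_id=abc123"
)
TRUNCATED_ROW = FULL_ROW[:80]  # --node_id falls beyond the cut

full_output = HEADER + "\n" + FULL_ROW
truncated_output = HEADER + "\n" + TRUNCATED_ROW
```

With this data, pids_matching(full_output, "abc123") finds PID 4242, while pids_matching(truncated_output, "abc123") returns an empty list, which is the "grep found nothing" situation described above. Calling pids_matching(ps_wide(), node_id) would apply the same search to real, untruncated output.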
https://buildkite.com/ray-project/microcheck/builds/35178/steps/canvas?jid=019ba263-a01d-4ef2-9d7e-d3bfb149e638#019ba263-a01d-4ef2-9d7e-d3bfb149e638/L191
https://buildkite.com/ray-project/microcheck/builds/35178/steps/canvas?jid=019ba332-55cb-4669-ae55-99572de4ffbf#019ba332-55cb-4669-ae55-99572de4ffbf/L1348