-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tests] Fix optimizer test by leaving out unsupported clouds #2976
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Michaelvll! LGTM. Left one minor comment and a question.
tests/test_optimizer_random_dag.py
Outdated
resources.cloud.check_features_are_supported( | ||
resources, requested_features) | ||
except exceptions.NotSupportedError: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question - instead of continuing, can we set op.num_nodes = 1
and then pass
? That way we can still include clouds which don't support multi-node in this test. (I may be missing something here)
except exceptions.NotSupportedError:
op.num_nodes = 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We randomly set the op.num_nodes
at the beginning so it should have some task that has single node, but this is a good point. I now enforce some of the tasks to be single node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we could end up having no candidate resources if it chooses a cloud that does not support multiple-node too many times? Maybe change to
while True:
candidate = random.choice(ALL_INSTANCE_TYPE_INFOS)
instead of using a fixed-sized array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we are already choosing from the whole list of ALL_INSTANCE_TYPE_INFOS
by setting k=len(ALL_INSTANCE_TYPE_INFOS)
. It should be fine with the current way? Using while True
may cause unexpected infinite loop if all instance type fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!! Left several comments 🫡
tests/test_optimizer_random_dag.py
Outdated
resources.cloud.check_features_are_supported( | ||
resources, requested_features) | ||
except exceptions.NotSupportedError: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we could end up having no candidate resources if it chooses a cloud that does not support multiple-node too many times? Maybe change to
while True:
candidate = random.choice(ALL_INSTANCE_TYPE_INFOS)
instead of using a fixed-sized array?
Co-authored-by: Tian Xia <cblmemo@gmail.com>
Co-authored-by: Tian Xia <cblmemo@gmail.com>
… into test-optimizer-fix
Separating this fix from #2829, as it may affect other clouds as well.
Our brute force includes the clouds that do not support multiple nodes, while the optimizer does not, which will cause issues.
This is blocking #2951
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh