[Tests] Fix optimizer test by leaving out unsupported clouds #2976

Merged
Michaelvll merged 10 commits into master from test-optimizer-fix on Jan 14, 2024

@Michaelvll (Collaborator) commented Jan 11, 2024

Separating this fix from #2829, as it may affect other clouds as well.

The brute-force search in the test includes clouds that do not support multiple nodes, while the optimizer excludes them, so the two can disagree and the test fails.
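A minimal sketch of the mismatch described above, using hypothetical stand-ins rather than SkyPilot's real classes. `Candidate`, `supports_multi_node`, `feasible`, and `cheapest` are all assumptions made up for illustration. It shows how an unfiltered baseline and a filtered search can pick different results for a multi-node task.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a (cloud, instance type) choice."""
    cloud: str
    hourly_cost: float
    supports_multi_node: bool


def feasible(candidates: List[Candidate], num_nodes: int) -> List[Candidate]:
    """Drop candidates whose cloud cannot run a task with `num_nodes` nodes."""
    if num_nodes == 1:
        return list(candidates)
    return [c for c in candidates if c.supports_multi_node]


def cheapest(candidates: List[Candidate]) -> Candidate:
    """Return the lowest-cost candidate."""
    return min(candidates, key=lambda c: c.hourly_cost)


CANDIDATES = [
    Candidate('cloud-a', 1.0, supports_multi_node=False),
    Candidate('cloud-b', 2.0, supports_multi_node=True),
]

# An unfiltered baseline picks cloud-a even for a 2-node task.
unfiltered_choice = cheapest(CANDIDATES)
# A search that filters unsupported clouds first picks cloud-b instead.
filtered_choice = cheapest(feasible(CANDIDATES, num_nodes=2))
```

Applying the same support filter to both sides keeps the comparison meaningful, which is the idea behind leaving unsupported clouds out of the test.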

This is blocking #2951

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@romilbhardwaj (Collaborator) left a comment

Thanks @Michaelvll! LGTM. Left one minor comment and a question.

tests/test_optimizer_random_dag.py (outdated, resolved)

                    resources.cloud.check_features_are_supported(
                        resources, requested_features)
                except exceptions.NotSupportedError:
                    continue

Collaborator

Quick question - instead of continuing, can we set op.num_nodes = 1 and then pass? That way we can still include clouds which don't support multi-node in this test. (I may be missing something here)

                except exceptions.NotSupportedError:
                    op.num_nodes = 1

Collaborator Author

We randomly set op.num_nodes at the beginning, so some tasks should already be single-node, but this is a good point. I now force some of the tasks to be single-node.
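One way to guarantee single-node tasks in a randomly generated DAG could look like the sketch below. This is not the code from the PR: `assign_num_nodes`, `max_nodes`, and `single_node_fraction` are hypothetical names and values chosen for illustration.

```python
import random
from typing import List


def assign_num_nodes(num_tasks: int,
                     max_nodes: int = 4,
                     single_node_fraction: float = 0.25,
                     seed: int = 0) -> List[int]:
    """Randomly assign node counts, forcing a fraction of tasks to 1 node."""
    rng = random.Random(seed)
    counts = [rng.randint(1, max_nodes) for _ in range(num_tasks)]
    # Force at least one task to be single-node whenever there are tasks.
    num_forced = 0
    if num_tasks > 0:
        num_forced = max(1, int(num_tasks * single_node_fraction))
    for idx in rng.sample(range(num_tasks), k=num_forced):
        counts[idx] = 1
    return counts


counts = assign_num_nodes(num_tasks=8)
```

Forcing the assignment avoids relying on chance for coverage of clouds that only support single-node tasks.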

Collaborator

It seems we could end up with no candidate resources if it picks a cloud that does not support multiple nodes too many times. Maybe change to

while True:
    candidate = random.choice(ALL_INSTANCE_TYPE_INFOS)

instead of using a fixed-sized array?

Collaborator Author

It seems we are already choosing from the whole list of ALL_INSTANCE_TYPE_INFOS by setting k=len(ALL_INSTANCE_TYPE_INFOS), so the current approach should be fine. Using while True may cause an unexpected infinite loop if every instance type fails.
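The trade-off discussed here can be sketched as follows. The thread only mentions `k=len(ALL_INSTANCE_TYPE_INFOS)`; the use of `random.sample` below is an assumption, and `pick_supported` and `is_supported` are hypothetical names. Visiting a shuffled copy of the whole pool is bounded by the pool size, so it terminates even when nothing is supported, unlike an unbounded `while True` retry loop.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar('T')


def pick_supported(pool: Sequence[T],
                   is_supported: Callable[[T], bool],
                   num_wanted: int,
                   seed: int = 0) -> List[T]:
    """Visit every item once in random order; stop once enough are found."""
    rng = random.Random(seed)
    picked: List[T] = []
    # Sampling k=len(pool) items without replacement is a shuffled copy,
    # so this loop runs at most len(pool) times.
    for item in rng.sample(list(pool), k=len(pool)):
        if not is_supported(item):
            continue
        picked.append(item)
        if len(picked) == num_wanted:
            break
    return picked


none_supported = pick_supported(['a', 'b', 'c'], lambda _: False, num_wanted=2)
some_supported = pick_supported(range(10), lambda x: x % 2 == 0, num_wanted=3)
```

When no item passes the check, the function returns an empty list instead of looping forever, and the caller can decide how to handle that case.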

@cblmemo (Collaborator) left a comment

Thanks for the fix!! Left several comments 🫡

@Michaelvll Michaelvll mentioned this pull request Jan 13, 2024
11 tasks
Michaelvll and others added 2 commits January 12, 2024 20:47
Co-authored-by: Tian Xia <cblmemo@gmail.com>
Co-authored-by: Tian Xia <cblmemo@gmail.com>
@Michaelvll Michaelvll merged commit 804dc0d into master Jan 14, 2024
19 checks passed
@Michaelvll Michaelvll deleted the test-optimizer-fix branch January 14, 2024 06:17