[Tests] Fix optimizer test by leaving out unsupported clouds #2976

Michaelvll · 2024-01-11T18:13:29Z

Separating this fix from #2829, as it may affect other clouds as well.

The brute-force search in the test includes clouds that do not support multiple nodes, while the optimizer excludes them, so the two can disagree and the test can fail.
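
To illustrate the mismatch, here is a minimal sketch with toy data. `ToyCandidate`, the cloud names, and the costs are hypothetical and are not part of SkyPilot; the real test draws from its own instance-type list. The point is only that a brute-force search over unfiltered candidates can pick an option that a feature-aware optimizer would never consider.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyCandidate:
    """Hypothetical stand-in for one (cloud, instance type) option."""
    cloud: str
    hourly_cost: float
    supports_multi_node: bool


def feasible(candidates: List[ToyCandidate],
             num_nodes: int) -> List[ToyCandidate]:
    """Keep only candidates that can run a task with `num_nodes` nodes."""
    return [
        c for c in candidates if num_nodes == 1 or c.supports_multi_node
    ]


def cheapest(candidates: List[ToyCandidate]) -> Optional[ToyCandidate]:
    """Brute-force pick: the lowest hourly cost, or None if empty."""
    return min(candidates, key=lambda c: c.hourly_cost, default=None)


candidates = [
    ToyCandidate('cloud-a', 1.0, supports_multi_node=False),
    ToyCandidate('cloud-b', 2.0, supports_multi_node=True),
]

# Unfiltered brute force picks cloud-a, which cannot run a 2-node task.
unfiltered_pick = cheapest(candidates)
# Filtering first matches an optimizer that skips unsupported clouds.
filtered_pick = cheapest(feasible(candidates, num_nodes=2))
```

With the filter applied on both sides, the brute-force answer and the optimizer's answer are computed over the same candidate set.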

This is blocking #2951

Tested (run the relevant ones):

Code formatting: bash format.sh
Any manual or new tests for this PR (please specify below)
All smoke tests: pytest tests/test_smoke.py
Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

romilbhardwaj

Thanks @Michaelvll! LGTM. Left one minor comment and a question.

romilbhardwaj · 2024-01-12T20:29:26Z

tests/test_optimizer_random_dag.py

+                    resources.cloud.check_features_are_supported(
+                        resources, requested_features)
+                except exceptions.NotSupportedError:
+                    continue


Quick question: instead of continuing, can we set op.num_nodes = 1 and then pass? That way we can still include clouds that don't support multi-node in this test. (I may be missing something here.)

except exceptions.NotSupportedError: op.num_nodes = 1

Michaelvll · 2024-01-12

We randomly set op.num_nodes at the beginning, so some tasks should already be single-node, but this is a good point. I now force some of the tasks to be single-node.
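
A rough sketch of the two ideas in this thread, for readers following along. Everything here is a hypothetical stand-in: `NotSupportedError` mimics `exceptions.NotSupportedError`, `check_multi_node_supported` mimics the feature check, and `assign_num_nodes` is just one possible way to force some tasks to be single-node. The PR does not show how that is actually done.

```python
import random
from typing import List


class NotSupportedError(Exception):
    """Hypothetical stand-in for exceptions.NotSupportedError."""


def check_multi_node_supported(cloud: str, num_nodes: int) -> None:
    # Assumption for illustration: only 'cloud-b' supports multi-node.
    if num_nodes > 1 and cloud != 'cloud-b':
        raise NotSupportedError(f'{cloud} does not support multi-node')


def pick_candidates(clouds: List[str], num_nodes: int) -> List[str]:
    """Skip approach: drop clouds that cannot satisfy the task."""
    kept = []
    for cloud in clouds:
        try:
            check_multi_node_supported(cloud, num_nodes)
        except NotSupportedError:
            continue
        kept.append(cloud)
    return kept


def assign_num_nodes(num_tasks: int,
                     rng: random.Random,
                     max_nodes: int = 4,
                     single_node_every: int = 2) -> List[int]:
    """Random node counts; every `single_node_every`-th task is forced to 1."""
    return [
        1 if i % single_node_every == 0 else rng.randint(1, max_nodes)
        for i in range(num_tasks)
    ]
```

Forcing a share of tasks to one node keeps single-node-only clouds exercised by the test without mutating a task inside the exception handler.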

cblmemo · 2024-01-13T01:13:01Z

It seems we could end up with no candidate resources if it picks a cloud that does not support multi-node too many times. Maybe change to

while True: candidate = random.choice(ALL_INSTANCE_TYPE_INFOS)

instead of using a fixed-size array?

Michaelvll · 2024-01-13

It seems we are already choosing from the whole list of ALL_INSTANCE_TYPE_INFOS by setting k=len(ALL_INSTANCE_TYPE_INFOS), so the current approach should be fine? Using while True could cause an unexpected infinite loop if every instance type fails.
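
For comparison, a minimal sketch of the two sampling strategies discussed above. It assumes the fixed-size draw is a `random.choices`-style sample with replacement, which the thread does not confirm. `ALL_INFOS` and `is_supported` are hypothetical stand-ins for `ALL_INSTANCE_TYPE_INFOS` and the feature check.

```python
import random
from typing import Callable, List, Optional

# Hypothetical stand-in for ALL_INSTANCE_TYPE_INFOS.
ALL_INFOS = ['cloud-a/small', 'cloud-b/small', 'cloud-b/large']


def is_supported(info: str) -> bool:
    # Assumption for illustration: only cloud-b entries pass the check.
    return info.startswith('cloud-b/')


def sample_fixed(rng: random.Random, infos: List[str],
                 supported: Callable[[str], bool]) -> List[str]:
    """Draw k=len(infos) times with replacement, then filter.

    Always terminates, but the result can be empty.
    """
    draws = rng.choices(infos, k=len(infos))
    return [d for d in draws if supported(d)]


def sample_bounded(rng: random.Random,
                   infos: List[str],
                   supported: Callable[[str], bool],
                   max_attempts: int = 100) -> Optional[str]:
    """Retry with a cap, so it cannot loop forever if nothing is supported."""
    for _ in range(max_attempts):
        candidate = rng.choice(infos)
        if supported(candidate):
            return candidate
    return None
```

A capped retry loop is one middle ground: it keeps drawing like the `while True` proposal but returns `None` instead of hanging when every instance type fails the check.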

cblmemo

Thanks for the fix!! Left several comments 🫡

Fix optimizer test by leaving out unsupported clouds

db40e78

Michaelvll requested a review from concretevitamin January 11, 2024 18:15

concretevitamin requested a review from romilbhardwaj January 11, 2024 18:16

Michaelvll requested a review from cblmemo January 12, 2024 17:11

romilbhardwaj approved these changes Jan 12, 2024

Michaelvll added 2 commits January 12, 2024 23:03

address comment

0acb5c6

fix

e3572de

cblmemo reviewed Jan 13, 2024

Michaelvll mentioned this pull request Jan 13, 2024

New provisioner for RunPod #2829

Merged

11 tasks

Michaelvll and others added 2 commits January 12, 2024 20:47

Update tests/test_optimizer_random_dag.py

f097e4b

Co-authored-by: Tian Xia <cblmemo@gmail.com>

Update tests/test_optimizer_random_dag.py

91fb939

Co-authored-by: Tian Xia <cblmemo@gmail.com>

Michaelvll force-pushed the master branch from 71213e5 to 9743aa0 Compare January 13, 2024 05:30

Michaelvll added 5 commits January 13, 2024 05:33

Merge branch 'master' of github.com:skypilot-org/skypilot into test-optimizer-fix

076c3be

Merge branch 'test-optimizer-fix' of github.com:skypilot-org/skypilot into test-optimizer-fix

3e2d39f

format

223eaa4

fix

3b3cf81

import Dict

4dc4e25

Michaelvll merged commit 804dc0d into master Jan 14, 2024
19 checks passed

Michaelvll deleted the test-optimizer-fix branch January 14, 2024 06:17
