Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Provisioner] Check ports for cloud not specified cases #3139

Merged
merged 6 commits into from
Feb 14, 2024

Conversation

cblmemo
Copy link
Collaborator

@cblmemo cblmemo commented Feb 9, 2024

As mentioned in #3125, previously SkyPilot would not print correct ports not supported message if the resources has no cloud specified, and all enabled cloud has no ports support. This PR fixes the problem.

Tested (run the relevant ones):

(base) root@26a1fb705d3d:/sky_repo/skypilot# SKYPILOT_DEBUG=0 sky serve up ../a.yaml 
Service from YAML spec: ../a.yaml
ValueError: Cannot specify ports without enabling a cloud that supports open ports.
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@concretevitamin
Copy link
Collaborator

For discussion: How about adding OPEN_PORTS to requested features in execution.py#_execute()? This seems to make failover (from a cloud that doesn't support ports to one that does) work?

sky/execution.py Outdated
@@ -261,6 +261,9 @@ def _execute(
# NOTE: in general we may not have sufficiently specified info
# (cloud/resource) to check STOP_SPOT_INSTANCE here. This is checked in
# the backend.
if any(r.ports is not None for r in task.resources):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember if task.resources being a set means any_of is being used. (We should add a comment to Task.resources.) In that case, we can have any_of { ..requesting ports.., ..not requesting ports.. }, and the logic here will effectively disallow the second resource.

Is the above right? Wdyt is the best way to handle this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you are right... And it seems like we don't need the handle here since we will add resources-specific cloud required features and remove any related cloud for each resources in current master:

resources_required_features = resources.get_required_cloud_features()
if num_nodes > 1:
resources_required_features.add(
CloudImplementationFeatures.MULTI_NODE)
try:
self.check_features_are_supported(resources,
resources_required_features)
except exceptions.NotSupportedError:
# TODO(zhwu): The resources are now silently filtered out. We
# should have some logging telling the user why the resources
# are not considered.
return ([], [])

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually here we should only add task-level features like stop and multi-node

sky/resources.py Outdated
Comment on lines 872 to 884
else:
at_least_one_cloud_supports_ports = False
for cloud in global_user_state.get_enabled_clouds():
try:
cloud.check_features_are_supported(
self, {clouds.CloudImplementationFeatures.OPEN_PORTS})
at_least_one_cloud_supports_ports = True
except exceptions.NotSupportedError:
pass
if not at_least_one_cloud_supports_ports:
with ux_utils.print_exception_no_traceback():
raise ValueError('Cannot specify ports without enabling '
'a cloud that supports open ports.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in execution.py should've caught this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for better UX. The current master is already disabling the resources with ports if only cloud that does not support ports are enabled; but it will print some message like in #3125:

I 02-08 22:42:32 optimizer.py:1219] No resource satisfying <Cloud>(ports=['8080']) on RunPod.
sky.exceptions.ResourcesUnavailableError: Catalog does not contain any instances satisfying the request:
Task(run='conda activate vllm\...')
  resources: <Cloud>(ports=['8080']).

sky/resources.py Outdated Show resolved Hide resolved
cblmemo and others added 2 commits February 14, 2024 08:39
Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
@cblmemo cblmemo merged commit 30f3eb5 into master Feb 14, 2024
19 checks passed
@cblmemo cblmemo deleted the upd-check-ports-for-no-cloud branch February 14, 2024 00:52
cblmemo added a commit that referenced this pull request Feb 17, 2024
concretevitamin pushed a commit that referenced this pull request Feb 17, 2024
Revert "[Provisioner] Check ports for cloud not specified cases (#3139)"

This reverts commit 30f3eb5.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants