Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add core programmatic API align with CLI #978

Merged
merged 40 commits into from
Aug 8, 2022
Merged

Add core programmatic API align with CLI #978

merged 40 commits into from
Aug 8, 2022

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Jul 15, 2022

Closes #871 and #980. Also, makes #972 easier.

Our current implementation has many functionalities in the cli.py, which makes python API users have to use subprocess to do the cluster, jobs, and storage management, and cannot handle easily handle the return values (needs to be done with string process).

This PR currently just has the skeleton of SDK and APIs. The implementation will be filled later. Please take a look at the API hierarchy and let me know if you find it not appropriate. : )

I am not sure whether the stop, start, down, autostop should be placed under sky or sky.sdk. If placed under sky.sdk, it is a bit weird to only have launch and exec under sky.

I updated the hierarchy of the APIs based on the suggestion of @romilbhardwaj.

Other changes

  1. Added a smoke test for the sky logs.

Hierarchy of the APIs:

sky

  • sky.launch
  • sky.exec
  • sky.stop
  • sky.start
  • sky.down
  • sky.autostop
  • sky.status
  • sky.queue
  • sky.cancel
  • sky.spot_status
  • sky.spot_cancel
  • sky.get_storage
  • sky.delete_storage

TODOs

Tested:

CLIs

  • sky launch -c min ./example/minimal.yaml
  • sky stop min
  • sky start min
  • sky autostop min
  • sky autostop --cancel min
  • sky queue
  • sky spot launch -n test-queue 'sleep 10000; echo hi'
  • sky spot status
  • sky spot cancel -n test-queue
  • sky logs test-multi-echo-zhwu-164b 1 2 3 4 -s
  • sky logs test-multi-echo-zhwu-164b 3 -s
  • sky logs test-multi-echo-zhwu-164b 3
  • sky logs test-multi-echo-zhwu-164b 1 2 3 4 fails
  • tests/run_smoke_test.sh (Failed TPU VM as TPU VM: guard against spurious and transient failures #962)

@romilbhardwaj
Copy link
Collaborator

Thanks for sharing the draft of the PR for review!

I am not sure whether the stop, start, down, autostop should be placed under sky or sky.sdk. If placed under sky.sdk, it is a bit weird to only have launch and exec under sky.

Another way to achieve our goal is to have each functionality implemented in its module instead of sdk.py (e.g., storage functions go in sky.storage, spot functions go in sky.spot etc.). Functions which cannot be put in a specific module go in sdk.py. All methods get imported in sky's __init__.py, so they are directly accessible with sky.xyz.

So our __init__.py would look like:

from sky.execution import launch
from sky.execution import exec
from sky.execution import stop
from sky.execution import start
from sky.execution import down
from sky.sdk import autostop # Should it be in sky.execution? idk
from sky.sdk import status
from sky.sdk import queue
from sky.sdk import cancel
from sky.spot import spot_status
from sky.spot import spot_cancel
from sky.storage import get_storage
from sky.storage import delete_storage

wdyt?

@concretevitamin
Copy link
Member

(have not read the PR; just a note)

This is timely! @pounde reported that there's no simple API to call for the effects of sky down. The current alternative is, I think, too verbose:

from sky import backends
from sky import global_user_state

handle = global_user_state.get_handle_from_cluster_name(CLUSTER)
backends.CloudVmRayBackend().teardown(handle, terminate=True)

@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Jul 16, 2022

Thanks for sharing the draft of the PR for review!

I am not sure whether the stop, start, down, autostop should be placed under sky or sky.sdk. If placed under sky.sdk, it is a bit weird to only have launch and exec under sky.

Another way to achieve our goal is to have each functionality implemented in its module instead of sdk.py (e.g., storage functions go in sky.storage, spot functions go in sky.spot etc.). Functions which cannot be put in a specific module go in sdk.py. All methods get imported in sky's __init__.py, so they are directly accessible with sky.xyz.

So our __init__.py would look like:

from sky.execution import launch
from sky.execution import exec
from sky.execution import stop
from sky.execution import start
from sky.execution import down
from sky.sdk import autostop # Should it be in sky.execution? idk
from sky.sdk import status
from sky.sdk import queue
from sky.sdk import cancel
from sky.spot import spot_status
from sky.spot import spot_cancel
from sky.storage import get_storage
from sky.storage import delete_storage

wdyt?

Good idea! The only drawback of this would be there might be too many functions under sky, not sure if that will cause any confusion (or a bit overwhelming for the users to look at the APIs, please feel free to push back). Do you think it would be better to make spot_launch as sky.spot.launch and delete_storage as sky.storage.delete?
After thinking more about that, I feel like my proposal is a bad idea, as that will cause many unnecessary API exposures from sky.spot and sky.storage.

@Michaelvll Michaelvll changed the title Add SDK for managing cluster, jobs and storage Add core programmetic API align with CLI Jul 18, 2022
@Michaelvll Michaelvll marked this pull request as ready for review July 18, 2022 06:15
@Michaelvll Michaelvll linked an issue Jul 18, 2022 that may be closed by this pull request
@Michaelvll Michaelvll changed the title Add core programmetic API align with CLI Add core programmatic API align with CLI Jul 19, 2022
@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Aug 5, 2022

Merged the master branch.
Tested:

  • ./tests/run_smoke_tests.sh
  • pytest ./tests/test_onprem.py

For the onprem and benchmark, I have not moved them into the core as they may not be immediately needed yet, and may experience some API change in the short future. We can move them in a future PR. Wdyt @romilbhardwaj @concretevitamin ?

Future TODO:

  • Move the Onprem and Benchmark CLI into sdk.

@concretevitamin
Copy link
Member

For the onprem and benchmark, I have not moved them into the core as they may not be immediately needed yet, and may experience some API change in the short future. We can move them in a future PR. Wdyt @romilbhardwaj @concretevitamin ?

This sounds good to me.

@romilbhardwaj romilbhardwaj self-requested a review August 5, 2022 04:21
Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! We will have a much cleaner python API after this PR. Left some comments

sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/utils/command_runner.py Show resolved Hide resolved
sky/backends/backend.py Outdated Show resolved Hide resolved
sky/cli.py Outdated Show resolved Hide resolved
sky/cli.py Outdated Show resolved Hide resolved
sky/cli.py Outdated Show resolved Hide resolved
if sync_down:
click.secho('Syncing down logs to local...', fg='yellow')
backend.sync_down_logs(handle, job_id)
core.download_logs(cluster, job_ids)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: It would be nice to have a flag to specify the log download destination. They get put into ~/sky_logs/, which is okay for now, but something we should keep an eye out for (or mark as a good first issue ;) )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great point! I just added the option for the core.download_logs, so that the user can specify the local_dir. For the CLI, it may be better to keep it simpler without the option?

sky/backends/cloud_vm_ray_backend.py Show resolved Hide resolved
sky/core.py Outdated Show resolved Hide resolved
sky/data/storage_utils.py Show resolved Hide resolved
@Michaelvll Michaelvll mentioned this pull request Aug 6, 2022
2 tasks
@Michaelvll Michaelvll added this to the Programetic API milestone Aug 6, 2022
Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @Michaelvll! Should be good to merge after smoke tests. We should also add examples for the new APIs as a new PR

@Michaelvll
Copy link
Collaborator Author

Thank you for the review @romilbhardwaj! Just passed the smoke tests. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[API] sky.launch() should not expose is_spot_controller_task Move the functionality of cli to sdk
3 participants