[Spot] Fix memory contraint for spot controller and backward compatibility for spot commands #3208

Merged (20 commits) on Feb 22, 2024

@Michaelvll Michaelvll commented Feb 21, 2024

Fix the memory constraint for the spot controller.
This is a bug introduced by #3191.

This also fixes the issue with the job_lib assertion for spot jobs. To reproduce:

  1. git checkout 053d0ba2b58705c3bc5540729eb5084911e7fa9a; sky spot launch -n test-long-run --cloud gcp --cpus 2 -d "echo hi; sleep 100000000000000"
  2. git checkout master
  3. sky spot launch -n test-1 echo hi
  4. sky spot logs 1 fails due to the assertion
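
The reproduction steps above can be sketched as a script. This is only an illustration, not part of the PR: the `run` and `repro` helpers and the `DRY_RUN` variable are hypothetical names, step 2 is assumed to mean `git checkout master`, and the `sleep` command is assumed to be what step 1 intended. By default the script only prints the commands; set `DRY_RUN=0` to execute them against a real cloud account.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps. Defaults to a dry run that only prints
# each command; set DRY_RUN=0 to actually execute them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
OLD_COMMIT="053d0ba2b58705c3bc5540729eb5084911e7fa9a"

# Hypothetical helper: print the command, and run it unless in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

repro() {
  # 1. Launch a long-running spot job from the older commit.
  run git checkout "$OLD_COMMIT"
  run sky spot launch -n test-long-run --cloud gcp --cpus 2 -d "echo hi; sleep 100000000000000"
  # 2. Switch the client back to master (assumed to be a git checkout).
  run git checkout master
  # 3. Launch a second job against the already-running controller.
  run sky spot launch -n test-1 echo hi
  # 4. Before this PR, this step failed on the job_lib assertion.
  run sky spot logs 1
}

repro
```

With the fix applied, the last command should stream the logs of job 1 instead of failing.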

This PR should fix the backward compatibility issue in the following two cases:

  1. If the original commit predates 053d, everything should work normally with this PR.
  2. If the original commit is the current master and the spot controller is then updated to the current master as in the repro steps above, all commands should work normally after running sky start sky-spot-controller-<hash>.

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • sky spot launch -n test echo hi without a controller
    • Reproducible code above.
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll changed the title [Spot] Fix memory contraint for spot controller [Spot] Fix memory contraint for spot controller and backward compatibility for spot commands Feb 22, 2024

@concretevitamin concretevitamin left a comment

LGTM

@Michaelvll Michaelvll merged commit f4d2296 into master Feb 22, 2024
19 checks passed
@Michaelvll Michaelvll deleted the fix-spot-launch-memory branch February 22, 2024 03:58