Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCP: Support for custom VPC. #2764

Merged
merged 7 commits into from
Nov 12, 2023
Merged

GCP: Support for custom VPC. #2764

merged 7 commits into from
Nov 12, 2023

Conversation

concretevitamin
Copy link
Collaborator

@concretevitamin concretevitamin commented Nov 9, 2023

Support specifying a specific VPC to use for GCP.

In ~/.sky/config.yaml:

gcp:
  # VPC to use (optional).
  #
  # Default: None, which implies the following behavior. First, the VPC named
  # 'default' is checked against minimal recommended firewall rules for
  # SkyPilot to function. If it satisfies these rules, this VPC is used.
  # Otherwise, a new VPC named 'skypilot-vpc' is automatically created with
  # the minimal recommended firewall rules and will be used.
  #  
  # If this field is set, SkyPilot will use the VPC with this name. Useful for
  # when users want to manually set up a VPC and precisely control its
  # firewall rules. If no region restrictions are given, SkyPilot only
  # provisions in regions for which a subnet of this VPC exists. Errors are
  # thrown if VPC with this name is not found. The VPC does not get modified
  # in any way, except when opening ports (e.g., via `resources.ports`) in
  # which case new firewall rules permitting public traffic to those ports
  # will be added.
  vpc_name: skypilot-vpc

TODO

  • check if statement "VPC does not get modified" above is correct; e.g., does exposing ports modify the VPC
    • incorrect; reworded

Tested (run the relevant ones):

  • Code formatting: bash format.sh

  • Any manual or new tests for this PR (please specify below)

    • With 'default' vpc, that only has 1 subnet (us-central1):
      • sky launch --cloud gcp --cpus 2+ -c dbg2 --region us-west4 correctly fails + skips the entire region.
    • With a 'nonexistent' vpc
      • sky launch --cpus 2+ -c dbg --use-spot correctly skips GCP and fails over to AWS
      • sky launch --cloud gcp --cpus 2+ -c dbg --use-spot correctly errors out with no failover
      • sky tpunode correctly errors out with no failover
    • Existing GCP spot controller under 'default' vpc (has 1 region, us-central1)
      • set config.yaml to use 'skypilot-vpc' vpc (has all regions), launch a new spot job in GCP us-west4, check that new spot cluster is launched under this VPC and region
  • All smoke tests: pytest tests/test_smoke.py: pytest tests/test_smoke.py --gcp

  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name

  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this @concretevitamin! This is very useful for our users. Just left sevral comments, and I will try it out soon.

docs/source/reference/config.rst Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
sky/skylet/providers/gcp/config.py Show resolved Hide resolved
sky/skylet/providers/gcp/config.py Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
sky/skylet/providers/gcp/constants.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, PTAL.

docs/source/reference/config.rst Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Show resolved Hide resolved
sky/skylet/providers/gcp/config.py Show resolved Hide resolved
sky/skylet/providers/gcp/config.py Show resolved Hide resolved
sky/skylet/providers/gcp/constants.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @concretevitamin! LGTM.

docs/source/reference/config.rst Outdated Show resolved Hide resolved
Comment on lines +1133 to +1140
try:
vpc_name = gcp_config.get_usable_vpc(config)
except RuntimeError as e:
# Launching a TPU and encountering a bootstrap-phase error, no point
# in failover unless:
# TODO(zongheng): handle failover when multi-resource is added.
with ux_utils.print_exception_no_traceback():
raise e
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May need to coordinate with #2772. We can merge this PR first and then update #2772.

Copy link
Collaborator Author

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL.

docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the doc and the fixes @concretevitamin! LGTM.

_skypilot_log_error_and_exit_for_failover(
f"No VPC with name {specific_vpc_to_use!r} is found. "
"To fix: specify a correct VPC name."
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert specific_vpc_to_use is None or add a comment for readability : )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added # Should not reach here.

@concretevitamin
Copy link
Collaborator Author

  • pytest tests/test_smoke.py --gcp

@concretevitamin concretevitamin merged commit aecb2de into master Nov 12, 2023
19 checks passed
@concretevitamin concretevitamin deleted the gcp-vpc branch November 12, 2023 06:57
@dongreenberg
Copy link
Contributor

Thank you @concretevitamin !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants