Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: make public / private subnet selection robust. #2867

Merged
merged 13 commits into from
Dec 16, 2023
Merged

Conversation

concretevitamin
Copy link
Collaborator

@concretevitamin concretevitamin commented Dec 14, 2023

Removes two issues from before

  • Previously, we test if a subnet is private by a hack (check if private is a substr in its name tag)
  • Previously, SkyPilot does not allow launching VMs in a public subnet with subnet.map_public_ip_on_launch set to False.
    • This PR now allows so -- and changes the underlying ec2.create_instances call to assign public IPs

This PR uses a more robust method to test if a subnet is public:

  • Check if its associated route tables have a route to an IGW (igw-*)
  • If it has no explicitly associated route tables, check if any "main" route table has a route to an IGW (igw-*)

Tested:

  • smoke
    • No VPC/private IP settings
    • With aws.vpc_name: skypilot-vpc and corresponding use_internal_ips,ssh_proxy_command
  • back compat
    • No VPC/private IP settings
    • With aws.vpc_name: skypilot-vpc and corresponding use_internal_ips,ssh_proxy_command

Speed

  • for i in $(seq 3); do sky launch --cloud aws -i0 --down --use-spot --region us-east-1 -y >log-${i}; done
    • Grepped for provisioner.log "Provisioning ... took"
    • Before (one region): 11.22, 11.47, 11.44 [median 11.44s]
    • This PR (one region): 11.47, 11.81, 11.58 [median 11.58s]

No VPC/private IP settings

  • Region only has a default main route table (ap-northeast-3)
    • sky launch -c dbg --cloud aws -i0 --down --use-spot -y --region ap-northeast-3
  • Region has both two main route tables (only one of them is in effect + has an IGW route), and several skypilot-vpc route tables (us-east-2)
    • sky launch --cloud aws -i0 --down --use-spot -y --region us-east-2
  • Failover correctly
    • sky launch --gpus A100:8 --cloud aws -i0 --down --use-spot -y

With aws.vpc_name: skypilot-vpc only

  • Can now launch in a public subnet with subnet.map_public_ip_on_launch set to False.
    • sky launch --cloud aws -i0 --down --use-spot -y --region us-east-2

With aws.vpc_name: skypilot-vpc and corresponding use_internal_ips,ssh_proxy_command

  • Can launch: sky launch --cloud aws -i0 --down --use-spot -y --region us-east-2
  • Failover correctly: sky launch --cloud aws -i0 --down --gpus A10G --use-spot -y

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below): see above
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making figuring out this issue @concretevitamin! The PR looks mostly good to me. Left several comments.

sky/provision/aws/config.py Outdated Show resolved Hide resolved
sky/provision/aws/config.py Outdated Show resolved Hide resolved
sky/provision/aws/instance.py Show resolved Hide resolved
Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this @concretevitamin! LGTM.

sky/provision/aws/instance.py Outdated Show resolved Hide resolved
sky/provision/aws/instance.py Show resolved Hide resolved
Comment on lines 351 to 353
else:
new_skypilot_config = {}
common_utils.dump_yaml(f.name, new_skypilot_config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Can we remove the entry in the file mounts instead of creating an empty file to save the overhead for uploading the empty file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@Michaelvll Michaelvll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@concretevitamin concretevitamin merged commit 7d8dec2 into master Dec 16, 2023
19 checks passed
@concretevitamin concretevitamin deleted the aws-subnet branch December 16, 2023 05:03
remyleone pushed a commit to remyleone/skypilot that referenced this pull request Dec 26, 2023
* AWS: make public / private subnet selection robust.

* Update docs

* Fix: handle regions with main route table(s) only

* assert

* comment

* Cache once

* format

* Add asserts

* fix placeholder replacement logic

* Minor

* Fix bug where we launch a private IP-only VM in a public subnet

* Avoid uploading empty config yaml

* format
@Michaelvll Michaelvll mentioned this pull request Dec 28, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants