Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[k8s] Add Kubernetes support for ports #2713

Merged
merged 56 commits into from
Jan 12, 2024

Conversation

hemildesai
Copy link
Contributor

@hemildesai hemildesai commented Oct 17, 2023

Background for this can be found here

This PR adds Kubernetes support for ports. It uses Ingress to accomplish this. For each port, it creates a new service and a new ingress rule. The port can then be accessed by visiting the url obtained by running sky status {cluster_name} --ports. In the initial PR, we assume that the ingress controller is already created by the Kubernetes cluster admin, but in the future we will add support for creating one automatically if none exists. We can add documentation on how to manually install the ingress controller for now. Currently, the ingress controller of choice is nginx.

The new cli command looks like

sky status {cluster_name} --ports              
33828: http://{external_ip}/skypilot/{cluster_name_on_cloud}_33828, https://{external_ip}/skypilot/{cluster_name_on_cloud}_33828

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this exciting feature! 🚀 Left several comments on the ports related part 👀

sky/cli.py Outdated Show resolved Hide resolved
# so that our backend do not need to know the specific details
# for different clouds.
if isinstance(cloud, (clouds.AWS, clouds.GCP, clouds.Azure)):
if provision_lib.supports(repr(cloud), 'cleanup_ports'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! 🚀

sky/cli.py Outdated Show resolved Hide resolved
sky/cli.py Outdated
@@ -1750,17 +1761,23 @@ def status(all: bool, refresh: bool, ip: bool, show_spot_jobs: bool,
show_all=False,
limit_num_jobs_to_show=not all,
is_called_by_user=False))
if ip:
if ip or ports:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To discuss: Probably we could have an enum containing ip and ports, and use sth like --target ip/--target ports in the CLI?

sky/clouds/kubernetes.py Outdated Show resolved Hide resolved
sky/provision/kubernetes/__init__.py Outdated Show resolved Hide resolved
sky/provision/kubernetes/network.py Outdated Show resolved Hide resolved
@@ -25,6 +28,18 @@ provider:

ssh_jump_image: {{k8s_ssh_jump_image}}

ingress_name: {{cluster_name_on_cloud}}-skypilot-ingress

{%- if ports is not none %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current ports implementation won't put ports in the provider config. Instead, we use handle.launched_resources.ports to keep track of all opened ports. Reference:

prev_ports = None
if prev_handle is not None:
prev_ports = prev_handle.launched_resources.ports
current_ports = handle.launched_resources.ports
open_new_ports = bool(
resources_utils.port_ranges_to_set(current_ports) -
resources_utils.port_ranges_to_set(prev_ports))
if open_new_ports:
with rich_utils.safe_status(
'[bold cyan]Launching - Opening new ports'):
self._open_ports(handle)

Can we change to the same implementation logic to support sky launch --ports <new-ports> to open new ports on an existing cluster? It seems like we can construct all other fields using port and cluster_name so keeping track of ports should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 186193c

@romilbhardwaj romilbhardwaj changed the title Add Kubernetes support for ports [k8s] Add Kubernetes support for ports Oct 18, 2023
Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hemildesai! This is super cool. Tested some functionality and seems to work nicely for a simple http server. Jupyter seems to have some problems, see comments below. Will take a closer look at the code soon.

sky/provision/kubernetes/network_utils.py Outdated Show resolved Hide resolved
def delete_namespaced_service(namespace: str, service_name: str) -> None:
"""Deletes a service resource."""
core_api = kubernetes.core_api()
core_api.delete_namespaced_service(service_name, namespace)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into a bug here - my previous provisioning had failed due to a library mismatch and ended up not creating the service in k8s.

When I tried to terminate the skypilot cluster (stuck in a bad INIT state), I got:

kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'f463ef5b-3fd6-488d-8c32-1ef4bcce7240', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'c57dbea3-732c-471b-840e-3fd589a06584', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7187fedc-79b8-4e30-933f-705406be228f', 'Date': 'Thu, 19 Oct 2023 17:28:55 GMT', 'Content-Length': '240'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \"test-2ea4-skypilot-service--8888\" not found","reason":"NotFound","details":{"name":"test-2ea4-skypilot-service--8888","kind":"services"},"code":404}

We should probably handle any deletion failures due to port not existing by wrapping them into a new sky.exceptions.PortDoesNotExist class here, re-raising it and handling it upstream (perhaps by simply pass) when provision_lib.cleanup_ports is called in cloud_vm_ray_backend. cc @cblmemo - does that make sense to you?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! The current implementation just prints some warning messages in the provision lib API. I think it is a good idea to handle them in a unified place 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 8a70791

sky/cli.py Outdated
@@ -1672,6 +1677,12 @@ def _get_spot_jobs(
'clusters, the returned IP address is the internal IP '
'of the head pod, and may not be accessible from outside '
'the cluster.'))
@click.option('--ports',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm for some reason getting two endpoints listed:

$ sky status --ports test2                          
8888: http://35.222.205.70/skypilot/test2-2ea4_8888, https://35.222.205.70/skypilot/test2-2ea4_8888

Repro:

# task.yaml
resources:
  ports: 8888

run: python -m http.server 8888
sky launch -c test2 task.yaml --cloud kubernetes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, this currently returns HTTP and HTTPS url but will default to HTTP for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3d411a1

Comment on lines 141 to 153
def query_ports(
provider_name: str,
cluster_name_on_cloud: str,
provider_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Tuple[str, str]]:
"""Query details about ports on a cluster.

Returns a dict with port as the key and the value as the http and https url to access the port.
"""
raise NotImplementedError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! We should also probably add something for gcp and aws too, simply returning <ip>:<port> so sky status --endpoints can be used consistently across clouds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3d411a1

sky/cli.py Outdated
@@ -1672,6 +1677,12 @@ def _get_spot_jobs(
'clusters, the returned IP address is the internal IP '
'of the head pod, and may not be accessible from outside '
'the cluster.'))
@click.option('--ports',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought - we should support this API for easy programmatic use too. For example, many users query their skypilot endpoints with:

$ curl http://$(sky status mycluster --ip):8888/myapi

Since the current output is of the format 8888: http://35.222.205.70/skypilot/test2-2ea4_8888, it would require some parsing on the user's part.

Can we:

  1. Rename --ports to --endpoints
  2. If a port number is specified to --endpoints, then we simply return the URL. E.g.,:
$sky status test2 --endpoint 8888
http://35.222.205.70/skypilot/test2-2ea4_8888
  1. If a port number is not specified, then we list all ports open on the cluster:
$ sky status --endpoints test2                          
8888: http://35.222.205.70/skypilot/test2-2ea4_8888
8889: https://35.222.205.70/skypilot/test2-2ea4_8889

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the latest CLI command be sky status --endpoints test2?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - that should work. Would be nice to change --ports to --endpoint(s).

from sky.adaptors import kubernetes


def create_namespaced_ingress(namespace: str, ingress_name: str,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jupyter does not seems to play nicely with ingress:

#jupyter.yaml
resources:
  ports: 8888

setup: pip install jupyter

run: jupyter lab --port 8888 --no-browser --ip=0.0.0.0

On accessing https://35.222.205.70/skypilot/test-2ea4_8888/ in Chrome, I get 404:

image

Note that it tried to redirect me to https://35.222.205.70/lab, which is incorrect.

On accessing the exact suburl (/lab?token=....), I get an empty white page but loading some js/css is likely failing with 404:
image

Can we configure the ingress to correctly redirect to the sub-path we have under the ingress?

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt fix! Left some other comments ;)

sky/cli.py Outdated Show resolved Hide resolved
sky/clouds/kubernetes.py Outdated Show resolved Hide resolved
sky/provision/aws/instance.py Outdated Show resolved Hide resolved
sky/provision/gcp/instance.py Outdated Show resolved Hide resolved
sky/provision/kubernetes/network.py Outdated Show resolved Hide resolved
assert provider_config is not None, cluster_name_on_cloud
for port in ports:
service_name = f"{cluster_name_on_cloud}-skypilot-service--{port}"
ingress_name = f"{cluster_name_on_cloud}-skypilot-ingress--{port}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is there any way in k8s that we can register multiple ports in a single service? That way, we could get rid of the ports argument; But if there is no such API I'm happy with the current implementation 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is possible. But, thinking about it, this becomes tricky for maintenance and modification. If we want to add ports or remove some, we will have to edit the service in place.

The benefit of multiple ports under a single service is that it shares the IP, but since the services are internal and the ingress routes are port specific, I don't think a shared IP provides much benefit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this wasn't so bad and is the default way for opening ports using the LoadBalancer service type.

sky/provision/kubernetes/network.py Outdated Show resolved Hide resolved
sky/provision/kubernetes/network.py Outdated Show resolved Hide resolved
@hemildesai
Copy link
Contributor Author

To resolve the issue with Jupyter and similar services, we decided to support LoadBalancer service type (which creates an external LB per cluster) so that each cluster can have a dedicated IP. The user can configure whether to use LoadBalancer or Ingress using skypilot config. Related changes in 55f2ece and 430e6a7

Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome @hemildesai! This works very nicely - I had a quick go with both nginx ingress and loadbalancer. There are a few edge cases to handle, mentioned in comments below. We should also update the --ports flag to --endpoint(s).

Will take a closer look at the code soon.

sky/provision/kubernetes/network.py Show resolved Hide resolved
sky/cli.py Outdated
head_ip = handle.external_ips()[0]
if ports:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should perhaps add some error handling when clusters have different port modes. E.g., if user switches their global config.yaml from loadbalancer to ingress or vice-versa.

For instance, I launched one cluster with ports: loadbalancer, then changed my config to ports: ingress. On running sky status --ports jupyter, I get this error. It would be nice to either fix this or cleanly ask the user to not mix port modes.

(base) ➜  ~ sky status --ports jupyter
D 11-02 16:31:37 skypilot_config.py:157] Using config path: /Users/romilb/.sky/config.yaml
D 11-02 16:31:37 skypilot_config.py:160] Config loaded:
D 11-02 16:31:37 skypilot_config.py:160] {'kubernetes': {'ports': 'loadbalancer'}}
D 11-02 16:31:37 skypilot_config.py:166] Config syntax check passed.
D 11-02 16:31:38 skypilot_config.py:93] User config: kubernetes.ports -> loadbalancer
D 11-02 16:31:38 skypilot_config.py:157] Using config path: /Users/romilb/.sky/config.yaml
D 11-02 16:31:38 skypilot_config.py:160] Config loaded:
D 11-02 16:31:38 skypilot_config.py:160] {'kubernetes': {'ports': 'loadbalancer'}}
D 11-02 16:31:38 skypilot_config.py:166] Config syntax check passed.
Traceback (most recent call last):
  File "/Users/romilb/tools/anaconda3/bin/sky", line 8, in <module>
    sys.exit(cli())
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/utils/common_utils.py", line 311, in _record
    return f(*args, **kwargs)
  File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/cli.py", line 1163, in invoke
    return super().invoke(ctx)
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/utils/common_utils.py", line 332, in _record
    return f(*args, **kwargs)
  File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/cli.py", line 1821, in status
    port_details = provision_lib.query_ports(
  File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/__init__.py", line 41, in _wrapper
    return impl(*args, **kwargs)
  File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/kubernetes/network.py", line 162, in query_ports
    return _query_ports_for_loadbalancer(
  File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/kubernetes/network.py", line 184, in _query_ports_for_loadbalancer
    external_ip = network_utils.get_loadbalancer_ip(
  File "/Users/romilb/Romil/Berkeley/Research/sky-experiments/sky/provision/kubernetes/network_utils.py", line 169, in get_loadbalancer_ip
    service = core_api.read_namespaced_service(service_name, namespace)
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 25141, in read_namespaced_service
    return self.read_namespaced_service_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 25228, in read_namespaced_service_with_http_info
    return self.api_client.call_api(
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/rest.py", line 244, in GET
    return self.request("GET", url,
  File "/Users/romilb/tools/anaconda3/lib/python3.9/site-packages/kubernetes/client/rest.py", line 238, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ba277f1b-fb86-41c8-888b-32896a6f1e9c', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'cf43db27-7222-4741-adea-8b7f8f0e681e', 'X-Kubernetes-Pf-Prioritylevel-Uid': '17aff9c6-0fa7-4909-a811-4644194e9001', 'Date': 'Thu, 02 Nov 2023 23:31:38 GMT', 'Content-Length': '244'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services \"jupyter-2ea4-skypilot-loadbalancer\" not found","reason":"NotFound","details":{"name":"jupyter-2ea4-skypilot-loadbalancer","kind":"services"},"code":404}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can persist the mode for each cluster in the provider config during creation time. This will avoid this issue and allow users to have separate modes for separate clusters. Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed in 4ae6602

Comment on lines 16 to 21
class KubernetesPortMode(enum.Enum):
"""Enum for the different types of modes supported for opening
ports on Kubernetes.
"""
INGRESS = 'ingress'
LOADBALANCER = 'loadbalancer'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! I love the flexibility this provides. Would be nice to document this in docs/source/reference/config.rst :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure, will update.

sky/cli.py Outdated
@@ -1672,6 +1677,12 @@ def _get_spot_jobs(
'clusters, the returned IP address is the internal IP '
'of the head pod, and may not be accessible from outside '
'the cluster.'))
@click.option('--ports',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - that should work. Would be nice to change --ports to --endpoint(s).

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the awesome work!! 🚀 Left one comment to discuss 🫡

selector_key='skypilot-cluster',
selector_value=cluster_name_on_cloud,
)
network_utils.create_namespaced_service(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current ports implementation is incremental, i.e. if I do sky launch -c mycluster --ports 8080 and then sky launch -c mycluster --ports 8081, the second sky launch will update the ports to '8080-8081'. In

def _open_ports(self, handle: CloudVmRayResourceHandle) -> None:
cloud = handle.launched_resources.cloud
logger.debug(
f'Opening ports {handle.launched_resources.ports} for {cloud}')
config = common_utils.read_yaml(handle.cluster_yaml)
provider_config = config['provider']
provision_lib.open_ports(repr(cloud), handle.cluster_name_on_cloud,
handle.launched_resources.ports,
provider_config)

We passed the updated list to the open_ports API ('8080-8081' in the example above). Therefore, the open_ports API should update the ports to what ports describe. I'm not very familiar with k8s API, but the API is named create_xxx rather than create_or_update_xxx; so just want to make sure, does this API support update semantic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I will update the ingress implementation to be create_or_update. LoadBalancer is already create_or_update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 1d95434

Comment on lines 228 to 225
deploy_vars = {
return {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we retain the deploy_vars so the return statement remains clean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4ae6602

@@ -123,12 +131,28 @@ def open_ports(
def cleanup_ports(
provider_name: str,
cluster_name_on_cloud: str,
ports: List[str],
provider_config: Optional[Dict[str, Any]] = None,
) -> None:
"""Delete any opened ports."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we update the docstr and make this an optional arg? Then the semantics can be:

  • If ports are specified, clean up only the specific ports
  • If not specified, clean up all ports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. However, for loadbalancer based services, a single service contains all ports. In that case, we'll have to get all_ports - cleanup_ports and recreate the service using that set.

Also, we'll have to support this for other clouds as well. Maybe we can leave it out of scope for this PR and do it in a followup PR. Wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - we should update the docstr and mark it as a TODO to implement the desired behavior

Comment on lines 148 to 152
) -> Dict[str, Tuple[str, str]]:
"""Query details about ports on a cluster.

Returns a dict with port as the key and the value as the http and https url to access the port.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ports may not necessarily be exposing both HTTP and HTTPS urls. Should we change the return type to Dict[str, List[Endpoint]], which is a dict of {port: [endpoint]}?

Endpoint is new class like so:

class Endpoint:
    pass

class SocketEndpoint(Endpoint):
    def __init__(self, host, port):
        self.host = host
        self.port = port

    def __str__(self):
        return f'{self.host}:{self.port}'

class HTTPEndpoint(Endpoint):
    def __init__(self, url):
        self.url = url

    def __str__(self):
        return self.url

class HTTPSEndpoint(Endpoint):
    def __init__(self, url):
        self.url = url

    def __str__(self):
        return self.url

Example return values:

# Example for k8s
{8080: 
  [
   HTTPEndpoint("http://www.example.com/mycluster/8080"),
   HTTPSEndpoint("https://www.example.com/mycluster/8080")
  ]
}

# Example for raw socket
{8080: 
  [
   SocketEndpoint(ip='1.1.1.1', port=8080),
  ]
}

cc @cblmemo

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4ae6602

sky/provision/kubernetes/network_utils.py Outdated Show resolved Hide resolved
sky/provision/kubernetes/network.py Outdated Show resolved Hide resolved
sky/templates/kubernetes-ray.yml.j2 Show resolved Hide resolved
sky/provision/kubernetes/network_utils.py Outdated Show resolved Hide resolved
sky/provision/kubernetes/network_utils.py Outdated Show resolved Hide resolved
Comment on lines -4144 to -4145
if (cloud.PROVISIONER_VERSION >= clouds.ProvisionerVersion.
RAY_PROVISIONER_SKYPILOT_TERMINATOR or
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was recently added, not sure if it's necessary along with if provision_lib.supports(repr(cloud), 'cleanup_ports')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @cblmemo - can you take a quick look at this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is added in #2681, https://github.com/skypilot-org/skypilot/pull/2681/files#diff-f14940b9d929546868fb41cb0b27a1db08b321b7bbc0cb3ab3b864589aa82aa8
I still think the provision_lib.supports(cloud, 'cleanup_ports') is needed since clouds.ProvisionerVersion is not related to what cloud supports open ports. The new condition here is essentially still isinstance(cloud, (GCP, AWS, Azure)). The current implementation LGTM; cc @Michaelvll for a double-check

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we change the check to the following:

try:
    cloud.check_features_are_supported(handle.launched_resources, {clouds.CloudImplementationFeatures.OPEN_PORTS})
    provision_lib.cleanup_ports(repr(cloud),
                                cluster_name_on_cloud,
                                handle.launched_resources.ports,
                                config['provider'])
except exceptions.NotSupportedError:
    pass
except ...

The provision_lib.supports seems duplicated with the CloudImplementationFeatures in Cloud class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! Let's use cloud.check_features_are_supported. This should be fine since all ports related functions is implemented in new provisioned API.

Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hemildesai! Pushed some additional fixes for error msg handling. Ran smoke tests, LGTM!

EDIT - testing some final local kubernetes integration before merging

@romilbhardwaj
Copy link
Collaborator

Made some changes to error handling and disallowed opening ports sky local up clusters for now till we figure out a good solution there. Re-running smoke tests, will merge when they pass.

@romilbhardwaj
Copy link
Collaborator

Fixed other comments and removed provision_lib.supports(). Smoke tests pass, merging now!

@romilbhardwaj romilbhardwaj merged commit 26900ff into skypilot-org:master Jan 12, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants