
Getting ValueError: Error calling check on server: Internal Server Error when checking server on an aws cluster #83

Closed
shyamsn97 opened this issue Jul 11, 2023 · 13 comments


@shyamsn97

shyamsn97 commented Jul 11, 2023

Hi! First off, I just want to say runhouse is an awesome project! It's really going to revolutionize how people run machine learning workflows!

Describe the bug
I'm running into an issue where I can't run any remote functions on the cluster, even though cluster.run_python(...) works fine.

Here's the code I'm running:

    import runhouse as rh
    cluster = rh.OnDemandCluster(
                name="cpu-cluster",
                instance_type="CPU:8",
                provider="aws",      # options: "aws", "gcp", "azure", "lambda", or "cheapest"
            )

    cluster.up_if_not()
    cluster.run_python(['import numpy', 'print(numpy.__version__)'])
    print(cluster.check_server()) # ERRORS HERE

This runs fine until cluster.check_server(), as you can see here:

INFO | 2023-07-10 21:51:56,953 | Loaded Runhouse config from /home/shyam/.rh/config.yaml
Refreshing status for 1 cluster ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:--INFO | 2023-07-10 21:51:58,623 | Found credentials in shared credentials file: ~/.aws/credentials
INFO | 2023-07-10 21:52:05,743 | Running command on cpu-cluster: python3 -c "import numpy; print(numpy.__version__)"
1.25.1
INFO | 2023-07-10 21:52:07,304 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-07-10 21:52:07,855 | Authentication (publickey) successful!
INFO | 2023-07-10 21:52:08,095 | Checking server cpu-cluster
Traceback (most recent call last):
  File "/home/shyam/Code/trainyard/examples/test.py", line 54, in <module>
    print(cluster.check_server())
  File "/home/shyam/miniconda3/envs/py310/lib/python3.10/site-packages/runhouse/rns/hardware/cluster.py", line 363, in check_server
    self.client.check_server(cluster_config=cluster_config)
  File "/home/shyam/miniconda3/envs/py310/lib/python3.10/site-packages/runhouse/servers/http/http_client.py", line 48, in check_server
    self.request(
  File "/home/shyam/miniconda3/envs/py310/lib/python3.10/site-packages/runhouse/servers/http/http_client.py", line 41, in request
    raise ValueError(
ValueError: Error calling check on server: Internal Server Error

Not sure if I'm doing something wrong here, but I think my credentials work, because I can see that the cluster is being created and I can ssh into it. My package versions are listed below; let me know if you need more information. Thanks!

Versions

Python Platform: Linux-5.8.0-36-generic-x86_64-with-glibc2.31
Python Version: 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0]

Relevant packages: 
awscli==1.25.60
azure-cli==2.31.0
azure-cli-core==2.31.0
azure-cli-telemetry==1.0.6
azure-core==1.28.0
boto3==1.24.59
docker==6.1.3
fsspec==2023.1.0
gcsfs==2023.1.0
google-api-python-client==2.92.0
google-cloud-storage==2.10.0
pyarrow==12.0.1
pycryptodome==3.12.0
rich==13.4.2
runhouse==0.0.7
s3fs==2023.1.0
skypilot==0.3.1
sshfs==2023.4.1
sshtunnel==0.4.0
typer==0.9.0
wheel==0.38.4

Checking credentials to enable clouds for SkyPilot.
  AWS: enabled          
  Azure: disabled          
    Reason: Azure credential is not set. Run the following commands:
      $ az login
      $ az account set -s <subscription_id>
    For more info: https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli
  GCP: disabled          
    Reason: GCP tools are not installed or credentials are not set. Run the following commands:
      $ pip install google-api-python-client
      $ conda install -c conda-forge google-cloud-sdk -y
      $ gcloud init
      $ gcloud auth application-default login
    For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html
  Lambda: enabled          
  IBM: disabled          
    Reason: Missing credential file at /home/shyam/.ibm/credentials.yaml.
    Store your API key and Resource Group id in ~/.ibm/credentials.yaml in the following format:
      iam_api_key: <IAM_API_KEY>
      resource_group_id: <RESOURCE_GROUP_ID>
  Cloudflare (for R2 object store): disabled          
    Reason: [r2] profile is not set in ~/.cloudflare/r2.credentials. Additionally, Account ID from R2 dashboard is not set. Run the following commands:
      $ pip install boto3
      $ AWS_SHARED_CREDENTIALS_FILE=~/.cloudflare/r2.credentials aws configure --profile r2
      $ mkdir -p ~/.cloudflare
      $ echo <YOUR_ACCOUNT_ID_HERE> > ~/.cloudflare/accountid
    For more info: https://skypilot.readthedocs.io/en/latest/getting-started/installation.html#cloudflare-r2

SkyPilot will use only the enabled clouds to run tasks. To change this, configure cloud credentials, and run sky check.
If any problems remain, please file an issue at https://github.com/skypilot-org/skypilot/issues/new
Clusters
NAME         LAUNCHED     RESOURCES            STATUS  AUTOSTOP  COMMAND  
cpu-cluster  13 mins ago  1x AWS(m6i.2xlarge)  INIT    (down)    test.py  

Managed spot jobs
No in progress jobs. (See: sky spot -h)

In addition, here's the end of the setup of the cluster:

--------------------
Ray runtime started.
--------------------

Next steps
  To add another node to this Ray cluster, run
    ray start --address='172.31.46.12:6380'
  
  To connect to this Ray cluster:
    import ray
    ray.init()
Shared connection to 54.166.159.228 closed.
  
  To submit a Ray job using the Ray Jobs CLI:
    RAY_ADDRESS='http://127.0.0.1:8266' ray job submit --working-dir . -- python my_script.py
  
  See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html 
  for more information on submitting Ray jobs to the Ray cluster.
  
  To terminate the Ray runtime, run
    ray stop
  
  To view the status of the cluster, use
    ray status
  
  To monitor and debug Ray, view the dashboard at 
    127.0.0.1:8266
  
  If connection to the dashboard fails, check your firewall settings and network configuration.
/usr/bin/prlimit
2023-07-10 21:35:36,790	INFO log_timer.py:25 -- NodeUpdater: i-036c634eb67821936: Setup commands succeeded [LogTimer=92341ms]
2023-07-10 21:35:36,791	INFO updater.py:489 -- [7/7] Starting the Ray runtime
2023-07-10 21:35:36,792	VINFO command_runner.py:371 -- Running `export RAY_USAGE_STATS_ENABLED=0;export RAY_OVERRIDE_RESOURCES='{"CPU":8}';((ps aux | grep -v nohup | grep -v grep | grep -q -- "python3 -m sky.skylet.skylet") || nohup python3 -m sky.skylet.skylet >> ~/.sky/skylet.log 2>&1 &); ray stop; RAY_SCHEDULER_EVENTS=0 RAY_DEDUP_LOGS=0 ray start --disable-usage-stats --head --port=6380 --dashboard-port=8266 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml  --temp-dir /tmp/ray_skypilot || exit 1; which prlimit && for id in $(pgrep -f raylet/raylet); do sudo prlimit --nofile=1048576:1048576 --pid=$id || true; done; python -c 'import json, os; json.dump({"ray_port":6380, "ray_dashboard_port":8266}, open(os.path.expanduser("~/.sky/ray_port.json"), "w"))';`
2023-07-10 21:35:36,792	VVINFO command_runner.py:373 -- Full command is `ssh -tt -i ~/.ssh/sky-key -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ExitOnForwardFailure=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -o ControlMaster=auto -o ControlPath=/tmp/ray_ssh_5a4cd850fc/7112f145b3/%C -o ControlPersist=10s -o ConnectTimeout=120s ubuntu@54.166.159.228 bash --login -c -i 'source ~/.bashrc; export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_USAGE_STATS_ENABLED=0;export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":8}'"'"';((ps aux | grep -v nohup | grep -v grep | grep -q -- "python3 -m sky.skylet.skylet") || nohup python3 -m sky.skylet.skylet >> ~/.sky/skylet.log 2>&1 &); ray stop; RAY_SCHEDULER_EVENTS=0 RAY_DEDUP_LOGS=0 ray start --disable-usage-stats --head --port=6380 --dashboard-port=8266 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml  --temp-dir /tmp/ray_skypilot || exit 1; which prlimit && for id in $(pgrep -f raylet/raylet); do sudo prlimit --nofile=1048576:1048576 --pid=$id || true; done; python -c '"'"'import json, os; json.dump({"ray_port":6380, "ray_dashboard_port":8266}, open(os.path.expanduser("~/.sky/ray_port.json"), "w"))'"'"';)'`
2023-07-10 21:35:41,238	INFO log_timer.py:25 -- NodeUpdater: i-036c634eb67821936: Ray start commands succeeded [LogTimer=4447ms]
2023-07-10 21:35:41,238	INFO log_timer.py:25 -- NodeUpdater: i-036c634eb67821936: Applied config f62a597a450a8281871e7ace3caa155afb5dfe65  [LogTimer=183192ms]
2023-07-10 21:35:42,755	INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-node-status=up-to-date on ['i-036c634eb67821936']  [LogTimer=515ms]
2023-07-10 21:35:42,925	INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-runtime-config=f62a597a450a8281871e7ace3caa155afb5dfe65 on ['i-036c634eb67821936']  [LogTimer=170ms]
2023-07-10 21:35:43,090	INFO log_timer.py:25 -- AWSNodeProvider: Set tag ray-file-mounts-contents=24403a03b3acb79e10305dbf19904b00a057a0a1 on ['i-036c634eb67821936']  [LogTimer=165ms]
2023-07-10 21:35:43,091	INFO updater.py:188 -- New status: up-to-date
2023-07-10 21:35:43,273	INFO commands.py:836 -- Useful commands
2023-07-10 21:35:43,273	INFO commands.py:838 -- Monitor autoscaling with
2023-07-10 21:35:43,274	INFO commands.py:839 --   ray exec /home/shyam/.sky/generated/cpu-cluster.yml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
2023-07-10 21:35:43,274	INFO commands.py:846 -- Connect to a terminal on the cluster head:
2023-07-10 21:35:43,274	INFO commands.py:847 --   ray attach /home/shyam/.sky/generated/cpu-cluster.yml
2023-07-10 21:35:43,274	INFO commands.py:850 -- Get a remote shell to the cluster manually:
2023-07-10 21:35:43,274	INFO commands.py:851 --   ssh -o IdentitiesOnly=yes -i ~/.ssh/sky-key ubuntu@54.166.159.228
@shyamsn97 shyamsn97 changed the title Getting ValueError: Error calling check on server: Internal Server Error when checking server Getting ValueError: Error calling check on server: Internal Server Error when checking server on an aws cluster Jul 11, 2023
@shyamsn97
Author

After configuring GCP correctly, this works when specifying "gcp" as the provider:

INFO | 2023-07-10 22:37:35,284 | Loaded Runhouse config from /home/shyam/.rh/config.yaml
INFO | 2023-07-10 22:37:44,473 | Running command on cpu-cluster: python3 -c "import numpy; print(numpy.__version__)"
1.21.6
INFO | 2023-07-10 22:37:46,311 | Connected (version 2.0, client OpenSSH_7.9p1)
INFO | 2023-07-10 22:37:46,765 | Authentication (publickey) successful!
INFO | 2023-07-10 22:37:46,767 | Checking server cpu-cluster
INFO | 2023-07-10 22:37:48,945 | Server cpu-cluster is up.
None

@dongreenberg
Contributor

Hi! Thank you for the detailed bug report. I actually noticed some new breakage this evening related to the FastAPI 0.100.0 release, but given that it's working on GCP, I'm not sure that's it. Trying to reproduce on AWS now.

@dongreenberg
Contributor

I was able to reproduce this, and it is indeed an issue with the latest fastapi release. (You can confirm by sshing into the server, viewing the server logs at ~/.rh/cluster_server_cpu-cluster.log, and checking that the latest exception is a pydantic issue; the latest fastapi release was a massive update to Pydantic v2.) I'm pushing a fix to main and releasing shortly.
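For anyone hitting this before upgrading, the bad combination can also be detected from the client side. Here's a minimal sketch (the helper names and version thresholds are my own, not a runhouse API) that flags fastapi >= 0.100 paired with pydantic v2:

```python
from importlib.metadata import version, PackageNotFoundError

def major_minor(v):
    # "0.100.0" -> (0, 100); tolerant of suffixes like "1.10.11rc1".
    parts = []
    for p in v.split(".")[:2]:
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def incompatible(fastapi_v, pydantic_v):
    # fastapi >= 0.100 pulls in Pydantic v2, which runhouse 0.0.7's
    # server code was not written against.
    return major_minor(fastapi_v) >= (0, 100) and major_minor(pydantic_v) >= (2, 0)

def check_installed():
    # Returns True if the locally installed pair is the known-bad combination.
    try:
        return incompatible(version("fastapi"), version("pydantic"))
    except PackageNotFoundError:
        return False
```

Pinning fastapi<=0.99.0 (which the 0.0.8 release does in its requirements, as the install log below shows) sidesteps the Pydantic v2 migration entirely.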

@dongreenberg
Contributor

Just pushed https://github.com/run-house/runhouse/releases/tag/v0.0.8, which should fix this.

@shyamsn97
Author

Awesome! Thanks so much!

@dongreenberg
Contributor

@shyamsn97 can you confirm the fix worked?

@shyamsn97
Author

Yep, seems like that worked; I'm no longer getting an error on check_server. However, the remote function call on AWS now hangs indefinitely.

Code from the example:

import runhouse as rh

def num_cpus():
    import multiprocessing
    return f"Num cpus: {multiprocessing.cpu_count()}"

num_cpus()

# Using a Cloud provider
cluster = rh.ondemand_cluster(
              name="runhouse",
              instance_type="CPU:8",
              provider="aws",      # options: "aws", "gcp", "azure", "lambda", or "cheapest"
          )

cluster.up_if_not()

num_cpus_cluster = rh.function(name="num_cpus_cluster", fn=num_cpus).to(system=cluster, reqs=["./"])

num_cpus_cluster()
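As a stopgap while debugging hangs like this, a client-side timeout at least turns the silent hang into an error you can report. A generic sketch (plain Python, not a runhouse feature; runhouse may well have its own streaming/timeout options I'm not using here):

```python
import concurrent.futures

def call_with_timeout(fn, timeout_s, *args, **kwargs):
    # Run fn in a worker thread; raise TimeoutError if it exceeds timeout_s.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_s)
    finally:
        # Don't block on the (possibly hung) worker while unwinding; note a
        # truly hung non-daemon thread can still delay interpreter exit.
        pool.shutdown(wait=False)

print(call_with_timeout(lambda: "ok", timeout_s=5))  # prints "ok"
```

For example, call_with_timeout(num_cpus_cluster, timeout_s=120) would raise concurrent.futures.TimeoutError after two minutes instead of hanging forever.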

Output:

(py310) shyam@shyam-ThinkPad-P53:~/Code/test-runhouse$ python test.py 
INFO | 2023-07-11 11:50:06,755 | Loaded Runhouse config from /home/shyam/.rh/config.yaml
INFO | 2023-07-11 11:50:07,937 | Found credentials in shared credentials file: ~/.aws/credentials
I 07-11 11:50:08 optimizer.py:636] == Optimizer ==
I 07-11 11:50:08 optimizer.py:647] Target: minimizing cost
I 07-11 11:50:08 optimizer.py:659] Estimated cost: $0.4 / hour
I 07-11 11:50:08 optimizer.py:659] 
I 07-11 11:50:08 optimizer.py:732] Considered resources (1 node):
I 07-11 11:50:08 optimizer.py:781] ------------------------------------------------------------------------------------------
I 07-11 11:50:08 optimizer.py:781]  CLOUD   INSTANCE      vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
I 07-11 11:50:08 optimizer.py:781] ------------------------------------------------------------------------------------------
I 07-11 11:50:08 optimizer.py:781]  AWS     m6i.2xlarge   8       32        -              us-east-1     0.38          ✔     
I 07-11 11:50:08 optimizer.py:781] ------------------------------------------------------------------------------------------
I 07-11 11:50:08 optimizer.py:781] 
I 07-11 11:50:08 cloud_vm_ray_backend.py:3495] Creating a new cluster: "runhouse" [1x AWS(m6i.2xlarge)].
I 07-11 11:50:08 cloud_vm_ray_backend.py:3495] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 07-11 11:50:08 cloud_vm_ray_backend.py:1215] To view detailed progress: tail -n100 -f /home/shyam/sky_logs/sky-2023-07-11-11-50-07-852797/provision.log
I 07-11 11:50:09 cloud_vm_ray_backend.py:1539] Launching on AWS us-east-1 (us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1f)
I 07-11 11:51:42 log_utils.py:89] Head node is up.
I 07-11 11:53:34 cloud_vm_ray_backend.py:1352] Successfully provisioned or found existing VM.
I 07-11 11:53:41 cloud_vm_ray_backend.py:3544] Processing file mounts.
I 07-11 11:53:41 cloud_vm_ray_backend.py:3575] To view detailed progress: tail -n100 -f ~/sky_logs/sky-2023-07-11-11-50-07-852797/file_mounts.log
I 07-11 11:53:41 backend_utils.py:1254] Syncing (to 1 node): ~/.rh -> ~/.rh
I 07-11 11:53:44 cloud_vm_ray_backend.py:2808] Run commands not specified or empty.
Clusters
NAME         LAUNCHED        RESOURCES              STATUS  AUTOSTOP  COMMAND  
runhouse     a few secs ago  1x AWS(m6i.2xlarge)    UP      (down)    test.py  
cpu-cluster  20 mins ago     1x GCP(n2-standard-8)  UP      (down)    test.py  

INFO | 2023-07-11 11:53:51,543 | Restarting HTTP server on runhouse.
INFO | 2023-07-11 11:53:51,543 | Running command on runhouse: pip install runhouse==0.0.8
Collecting runhouse==0.0.8
  Downloading runhouse-0.0.8-py3-none-any.whl (142 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 142.7/142.7 kB 3.5 MB/s eta 0:00:00
Collecting sshfs
  Downloading sshfs-2023.4.1-py3-none-any.whl (15 kB)
Collecting typer
  Downloading typer-0.9.0-py3-none-any.whl (45 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.9/45.9 kB 2.6 MB/s eta 0:00:00
Collecting fsspec
  Downloading fsspec-2023.6.0-py3-none-any.whl (163 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 163.8/163.8 kB 7.9 MB/s eta 0:00:00
Collecting uvicorn
  Downloading uvicorn-0.22.0-py3-none-any.whl (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 4.4 MB/s eta 0:00:00
Collecting sshtunnel>=0.3.0
  Downloading sshtunnel-0.4.0-py2.py3-none-any.whl (24 kB)
Requirement already satisfied: pyOpenSSL>=21.1.0 in /opt/conda/lib/python3.10/site-packages (from runhouse==0.0.8) (22.1.0)
Collecting fastapi<=0.99.0
  Downloading fastapi-0.99.0-py3-none-any.whl (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.2/58.2 kB 321.7 kB/s eta 0:00:00
Collecting pyarrow
  Downloading pyarrow-12.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.9/38.9 MB 13.5 MB/s eta 0:00:00
Requirement already satisfied: rich in /opt/conda/lib/python3.10/site-packages (from runhouse==0.0.8) (13.4.2)
Requirement already satisfied: wheel in /opt/conda/lib/python3.10/site-packages (from runhouse==0.0.8) (0.38.4)
Requirement already satisfied: skypilot==0.3.1 in /opt/conda/lib/python3.10/site-packages (from runhouse==0.0.8) (0.3.1)
Requirement already satisfied: filelock>=3.6.0 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (3.12.2)
Requirement already satisfied: pulp in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (2.7.0)
Requirement already satisfied: click<=8.0.4,>=7.0 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (8.0.4)
Requirement already satisfied: pycryptodome==3.12.0 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (3.12.0)
Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (3.1)
Requirement already satisfied: pandas in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (2.0.3)
Requirement already satisfied: grpcio<=1.51.3,>=1.42.0 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (1.51.3)
Requirement already satisfied: awscli in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (1.29.2)
Requirement already satisfied: oauth2client in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (4.1.3)
Requirement already satisfied: pendulum in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (2.1.2)
Requirement already satisfied: psutil in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (5.9.5)
Requirement already satisfied: tabulate in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (0.9.0)
Requirement already satisfied: cryptography in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (38.0.3)
Requirement already satisfied: colorama<0.4.5 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (0.4.4)
Requirement already satisfied: boto3 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (1.28.2)
Requirement already satisfied: ray[default]<=2.4.0,>=2.2.0 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (2.4.0)
Requirement already satisfied: protobuf!=3.19.5,>=3.15.3 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (4.23.4)
Requirement already satisfied: PrettyTable>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (3.8.0)
Requirement already satisfied: packaging in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (23.1)
Requirement already satisfied: jsonschema in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (4.18.0)
Requirement already satisfied: jinja2>=3.0 in /opt/conda/lib/python3.10/site-packages (from skypilot==0.3.1->runhouse==0.0.8) (3.1.2)
Requirement already satisfied: typing-extensions>=4.5.0 in /opt/conda/lib/python3.10/site-packages (from fastapi<=0.99.0->runhouse==0.0.8) (4.7.1)
Collecting starlette<0.28.0,>=0.27.0
  Downloading starlette-0.27.0-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.0/67.0 kB 588.7 kB/s eta 0:00:00
Collecting pydantic!=1.8,!=1.8.1,<2.0.0,>=1.7.4
  Downloading pydantic-1.10.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 16.3 MB/s eta 0:00:00
Collecting paramiko>=2.7.2
  Downloading paramiko-3.2.0-py3-none-any.whl (224 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 224.2/224.2 kB 30.5 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.16.6 in /opt/conda/lib/python3.10/site-packages (from pyarrow->runhouse==0.0.8) (1.25.1)
Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/conda/lib/python3.10/site-packages (from rich->runhouse==0.0.8) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/conda/lib/python3.10/site-packages (from rich->runhouse==0.0.8) (2.15.1)
Collecting asyncssh<3,>=2.11.0
  Downloading asyncssh-2.13.2-py3-none-any.whl (349 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 349.3/349.3 kB 2.8 MB/s eta 0:00:00
Collecting h11>=0.8
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 11.9 MB/s eta 0:00:00
Requirement already satisfied: cffi>=1.12 in /opt/conda/lib/python3.10/site-packages (from cryptography->skypilot==0.3.1->runhouse==0.0.8) (1.15.1)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2>=3.0->skypilot==0.3.1->runhouse==0.0.8) (2.1.3)
Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->runhouse==0.0.8) (0.1.2)
Collecting pynacl>=1.5
  Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 856.7/856.7 kB 11.9 MB/s eta 0:00:00
Collecting bcrypt>=3.2
  Downloading bcrypt-4.0.1-cp36-abi3-manylinux_2_28_x86_64.whl (593 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 593.7/593.7 kB 67.6 MB/s eta 0:00:00
Requirement already satisfied: wcwidth in /opt/conda/lib/python3.10/site-packages (from PrettyTable>=2.0.0->skypilot==0.3.1->runhouse==0.0.8) (0.2.6)
Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (2.28.1)
Requirement already satisfied: aiosignal in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (1.3.1)
Requirement already satisfied: virtualenv<20.21.1,>=20.0.24 in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (20.21.0)
Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (1.0.5)
Requirement already satisfied: frozenlist in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (1.3.3)
Requirement already satisfied: pyyaml in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (5.4.1)
Requirement already satisfied: attrs in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (23.1.0)
Requirement already satisfied: aiohttp>=3.7 in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (3.8.4)
Requirement already satisfied: aiohttp-cors in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (0.7.0)
Requirement already satisfied: opencensus in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (0.11.2)
Requirement already satisfied: smart-open in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (6.3.0)
Requirement already satisfied: prometheus-client>=0.7.1 in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (0.17.1)
Requirement already satisfied: colorful in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (0.5.5)
Requirement already satisfied: gpustat>=1.0.0 in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (1.1)
Requirement already satisfied: py-spy>=0.2.0 in /opt/conda/lib/python3.10/site-packages (from ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (0.3.14)
Collecting anyio<5,>=3.4.0
  Downloading anyio-3.7.1-py3-none-any.whl (80 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80.9/80.9 kB 16.7 MB/s eta 0:00:00
Requirement already satisfied: s3transfer<0.7.0,>=0.6.0 in /opt/conda/lib/python3.10/site-packages (from awscli->skypilot==0.3.1->runhouse==0.0.8) (0.6.1)
Requirement already satisfied: docutils<0.17,>=0.10 in /opt/conda/lib/python3.10/site-packages (from awscli->skypilot==0.3.1->runhouse==0.0.8) (0.16)
Requirement already satisfied: rsa<4.8,>=3.1.2 in /opt/conda/lib/python3.10/site-packages (from awscli->skypilot==0.3.1->runhouse==0.0.8) (4.7.2)
Requirement already satisfied: botocore==1.31.2 in /opt/conda/lib/python3.10/site-packages (from awscli->skypilot==0.3.1->runhouse==0.0.8) (1.31.2)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.10/site-packages (from botocore==1.31.2->awscli->skypilot==0.3.1->runhouse==0.0.8) (1.0.1)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /opt/conda/lib/python3.10/site-packages (from botocore==1.31.2->awscli->skypilot==0.3.1->runhouse==0.0.8) (1.26.11)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /opt/conda/lib/python3.10/site-packages (from botocore==1.31.2->awscli->skypilot==0.3.1->runhouse==0.0.8) (2.8.2)
Requirement already satisfied: referencing>=0.28.4 in /opt/conda/lib/python3.10/site-packages (from jsonschema->skypilot==0.3.1->runhouse==0.0.8) (0.29.1)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /opt/conda/lib/python3.10/site-packages (from jsonschema->skypilot==0.3.1->runhouse==0.0.8) (2023.6.1)
Requirement already satisfied: rpds-py>=0.7.1 in /opt/conda/lib/python3.10/site-packages (from jsonschema->skypilot==0.3.1->runhouse==0.0.8) (0.8.10)
Requirement already satisfied: six>=1.6.1 in /opt/conda/lib/python3.10/site-packages (from oauth2client->skypilot==0.3.1->runhouse==0.0.8) (1.16.0)
Requirement already satisfied: pyasn1>=0.1.7 in /opt/conda/lib/python3.10/site-packages (from oauth2client->skypilot==0.3.1->runhouse==0.0.8) (0.5.0)
Requirement already satisfied: httplib2>=0.9.1 in /opt/conda/lib/python3.10/site-packages (from oauth2client->skypilot==0.3.1->runhouse==0.0.8) (0.22.0)
Requirement already satisfied: pyasn1-modules>=0.0.5 in /opt/conda/lib/python3.10/site-packages (from oauth2client->skypilot==0.3.1->runhouse==0.0.8) (0.3.0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->skypilot==0.3.1->runhouse==0.0.8) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas->skypilot==0.3.1->runhouse==0.0.8) (2023.3)
Requirement already satisfied: pytzdata>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pendulum->skypilot==0.3.1->runhouse==0.0.8) (2020.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.10/site-packages (from aiohttp>=3.7->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (6.0.4)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.10/site-packages (from aiohttp>=3.7->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (4.0.2)
Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp>=3.7->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (1.9.2)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp>=3.7->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (2.1.1)
Collecting exceptiongroup
  Downloading exceptiongroup-1.1.2-py3-none-any.whl (14 kB)
Collecting sniffio>=1.1
  Downloading sniffio-1.3.0-py3-none-any.whl (10 kB)
Requirement already satisfied: idna>=2.8 in /opt/conda/lib/python3.10/site-packages (from anyio<5,>=3.4.0->starlette<0.28.0,>=0.27.0->fastapi<=0.99.0->runhouse==0.0.8) (3.4)
Requirement already satisfied: pycparser in /opt/conda/lib/python3.10/site-packages (from cffi>=1.12->cryptography->skypilot==0.3.1->runhouse==0.0.8) (2.21)
Requirement already satisfied: nvidia-ml-py>=11.450.129 in /opt/conda/lib/python3.10/site-packages (from gpustat>=1.0.0->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (12.535.77)
Requirement already satisfied: blessed>=1.17.1 in /opt/conda/lib/python3.10/site-packages (from gpustat>=1.0.0->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (1.20.0)
Requirement already satisfied: pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2 in /opt/conda/lib/python3.10/site-packages (from httplib2>=0.9.1->oauth2client->skypilot==0.3.1->runhouse==0.0.8) (3.1.0)
Requirement already satisfied: distlib<1,>=0.3.6 in /opt/conda/lib/python3.10/site-packages (from virtualenv<20.21.1,>=20.0.24->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (0.3.6)
Requirement already satisfied: platformdirs<4,>=2.4 in /opt/conda/lib/python3.10/site-packages (from virtualenv<20.21.1,>=20.0.24->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (3.8.1)
Requirement already satisfied: opencensus-context>=0.1.3 in /opt/conda/lib/python3.10/site-packages (from opencensus->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (0.1.3)
Requirement already satisfied: google-api-core<3.0.0,>=1.0.0 in /opt/conda/lib/python3.10/site-packages (from opencensus->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (2.11.1)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (2022.9.24)
Requirement already satisfied: google-auth<3.0.dev0,>=2.14.1 in /opt/conda/lib/python3.10/site-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (2.22.0)
Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /opt/conda/lib/python3.10/site-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (1.59.1)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]<=2.4.0,>=2.2.0->skypilot==0.3.1->runhouse==0.0.8) (5.3.1)
Installing collected packages: typer, sniffio, pydantic, pyarrow, h11, fsspec, exceptiongroup, bcrypt, uvicorn, pynacl, anyio, starlette, paramiko, asyncssh, sshtunnel, sshfs, fastapi, runhouse
  Attempting uninstall: pydantic
    Found existing installation: pydantic 2.0.2
    Uninstalling pydantic-2.0.2:
      Successfully uninstalled pydantic-2.0.2
Successfully installed anyio-3.7.1 asyncssh-2.13.2 bcrypt-4.0.1 exceptiongroup-1.1.2 fastapi-0.99.0 fsspec-2023.6.0 h11-0.14.0 paramiko-3.2.0 pyarrow-12.0.1 pydantic-1.10.11 pynacl-1.5.0 runhouse-0.0.8 sniffio-1.3.0 sshfs-2023.4.1 sshtunnel-0.4.0 starlette-0.27.0 typer-0.9.0 uvicorn-0.22.0
INFO | 2023-07-11 11:53:58,889 | Running command on runhouse: pkill -f "python -m runhouse.servers.http.http_server"
INFO | 2023-07-11 11:54:00,015 | Running command on runhouse: pkill -f ".*ray.*6379.*"
INFO | 2023-07-11 11:54:01,244 | Running command on runhouse: ray start --head --port 6379 --autoscaling-config=~/ray_bootstrap_config.yaml
2023-07-11 18:54:03,331	INFO usage_lib.py:398 -- Usage stats collection is enabled by default without user confirmation because this terminal is detected to be non-interactive. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
2023-07-11 18:54:03,331	INFO scripts.py:710 -- Local node IP: 172.31.46.221
2023-07-11 18:54:05,396	SUCC scripts.py:747 -- --------------------
2023-07-11 18:54:05,397	SUCC scripts.py:748 -- Ray runtime started.
2023-07-11 18:54:05,397	SUCC scripts.py:749 -- --------------------
2023-07-11 18:54:05,397	INFO scripts.py:751 -- Next steps
2023-07-11 18:54:05,397	INFO scripts.py:754 -- To add another node to this Ray cluster, run
2023-07-11 18:54:05,397	INFO scripts.py:757 --   ray start --address='172.31.46.221:6379'
2023-07-11 18:54:05,397	INFO scripts.py:766 -- To connect to this Ray cluster:
2023-07-11 18:54:05,397	INFO scripts.py:768 -- import ray
2023-07-11 18:54:05,397	INFO scripts.py:769 -- ray.init()
2023-07-11 18:54:05,397	INFO scripts.py:781 -- To submit a Ray job using the Ray Jobs CLI:
2023-07-11 18:54:05,397	INFO scripts.py:782 --   RAY_ADDRESS='http://127.0.0.1:8265' ray job submit --working-dir . -- python my_script.py
2023-07-11 18:54:05,397	INFO scripts.py:791 -- See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html 
2023-07-11 18:54:05,397	INFO scripts.py:795 -- for more information on submitting Ray jobs to the Ray cluster.
2023-07-11 18:54:05,397	INFO scripts.py:800 -- To terminate the Ray runtime, run
2023-07-11 18:54:05,397	INFO scripts.py:801 --   ray stop
2023-07-11 18:54:05,397	INFO scripts.py:804 -- To view the status of the cluster, use
2023-07-11 18:54:05,397	INFO scripts.py:805 --   ray status
2023-07-11 18:54:05,397	INFO scripts.py:809 -- To monitor and debug Ray, view the dashboard at 
2023-07-11 18:54:05,397	INFO scripts.py:810 --   127.0.0.1:8265
2023-07-11 18:54:05,397	INFO scripts.py:817 -- If connection to the dashboard fails, check your firewall settings and network configuration.
INFO | 2023-07-11 11:54:05,750 | Running command on runhouse: screen -dm bash -c 'python -m runhouse.servers.http.http_server |& tee -a ~/.rh/cluster_server_runhouse.log 2>&1'
/home/shyam/miniconda3/envs/py310/lib/python3.10/site-packages/runhouse/rns/function.py:113: UserWarning: ``reqs`` and ``setup_cmds`` arguments has been deprecated. Please use ``env`` instead.
  warnings.warn(
INFO | 2023-07-11 11:54:12,193 | Setting up Function on cluster.
INFO | 2023-07-11 11:54:12,564 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-07-11 11:54:13,099 | Authentication (publickey) successful!
INFO | 2023-07-11 11:54:13,108 | Checking server runhouse
INFO | 2023-07-11 11:54:14,768 | Server runhouse is up.
INFO | 2023-07-11 11:54:14,914 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-07-11 11:54:14,928 | Authentication (publickey) failed.
INFO | 2023-07-11 11:54:14,934 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-07-11 11:54:14,949 | Authentication (publickey) successful!
2023-07-11 11:54:14,949| ERROR   | Problem setting SSH Forwarder up: Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable
ERROR | 2023-07-11 11:54:14,949 | Problem setting SSH Forwarder up: Couldn't open tunnel :50052 <> 127.0.0.1:50052 might be in use or destination not reachable
INFO | 2023-07-11 11:54:15,083 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-07-11 11:54:15,097 | Authentication (publickey) failed.
INFO | 2023-07-11 11:54:15,103 | Connected (version 2.0, client OpenSSH_8.2p1)
INFO | 2023-07-11 11:54:15,118 | Authentication (publickey) successful!
INFO | 2023-07-11 11:54:15,119 | Checking server local-cluster
INFO | 2023-07-11 11:54:16,445 | Server local-cluster is up.
INFO | 2023-07-11 11:54:16,445 | Running command on local-cluster: ray start --head
INFO | 2023-07-11 11:54:17,903 | Copying folder from file:///home/shyam/Code/test-runhouse to: runhouse
INFO | 2023-07-11 11:54:19,653 | Installing packages on cluster runhouse: ['Package: test-runhouse']
INFO | 2023-07-11 11:54:20,726 | Function setup complete.
INFO | 2023-07-11 11:54:20,731 | Running num_cpus_cluster via HTTP
INFO | 2023-07-11 11:54:21,326 | Submitted remote call to cluster for num_cpus_cluster_20230711_115420
:job_id:01000000
:task_name:get_fn_from_pointers
:job_id:01000000
INFO | 2023-07-11 18:54:21,732 | Loaded Runhouse config from /home/ubuntu/.rh/config.yaml
:task_name:get_fn_from_pointers
INFO | 2023-07-11 18:54:22,423 | Appending /home/ubuntu/test-runhouse to sys.path
INFO | 2023-07-11 18:54:22,424 | Importing module test
SkyPilot collects usage data to improve its services. `setup` and `run` commands are not collected to ensure privacy.
Usage logging can be disabled by setting the environment variable SKYPILOT_DISABLE_USAGE_COLLECTION=1.
INFO | 2023-07-11 18:54:22,944 | Found credentials in shared credentials file: ~/.aws/credentials
I 07-11 18:54:23 aws_catalog.py:120] Fetching availability zones mapping for AWS...
I 07-11 18:54:25 optimizer.py:636] == Optimizer ==
I 07-11 18:54:25 optimizer.py:647] Target: minimizing cost
I 07-11 18:54:25 optimizer.py:659] Estimated cost: $0.4 / hour
I 07-11 18:54:25 optimizer.py:659] 
I 07-11 18:54:25 optimizer.py:732] Considered resources (1 node):
I 07-11 18:54:25 optimizer.py:781] ------------------------------------------------------------------------------------------
I 07-11 18:54:25 optimizer.py:781]  CLOUD   INSTANCE      vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE   COST ($)   CHOSEN   
I 07-11 18:54:25 optimizer.py:781] ------------------------------------------------------------------------------------------
I 07-11 18:54:25 optimizer.py:781]  AWS     m6i.2xlarge   8       32        -              us-east-1     0.38          ✔     
I 07-11 18:54:25 optimizer.py:781] ------------------------------------------------------------------------------------------
I 07-11 18:54:25 optimizer.py:781] 
I 07-11 18:54:25 cloud_vm_ray_backend.py:3495] Creating a new cluster: "runhouse" [1x AWS(m6i.2xlarge)].
I 07-11 18:54:25 cloud_vm_ray_backend.py:3495] Tip: to reuse an existing cluster, specify --cluster (-c). Run `sky status` to see existing clusters.
I 07-11 18:54:25 cloud_vm_ray_backend.py:1215] To view detailed progress: tail -n100 -f /home/ubuntu/sky_logs/sky-2023-07-11-18-54-22-466640/provision.log
I 07-11 18:54:26 cloud_vm_ray_backend.py:1539] Launching on AWS us-east-1 (us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1f)

@shyamsn97 (Author)

Weird, because it seems to try to start the cluster twice? Unless I'm misunderstanding the process there.

shyamsn97 reopened this Jul 11, 2023
@dongreenberg (Contributor)

I think I see the issue. Your script isn't running inside an if __name__ == "__main__": block, so when the module is imported on the cluster to run the function, the code that ups the cluster runs again on the cluster. The corrected code would be:

import runhouse as rh

def num_cpus():
    import multiprocessing
    return f"Num cpus: {multiprocessing.cpu_count()}"

if __name__ == "__main__":
    num_cpus()

    cluster = rh.ondemand_cluster(
              name="runhouse",
              instance_type="CPU:8",
              provider="aws",      # options: "AWS", "GCP", "Azure", "Lambda", or "cheapest"
          )
    cluster.up_if_not()
    num_cpus_cluster = rh.function(name="num_cpus_cluster", fn=num_cpus).to(system=cluster, reqs=["./"])
    num_cpus_cluster()

We're actually introducing new logic now that will make it impossible for a cluster to start itself again, because that's just silly, so your code may actually work as-is in the next release. But using the if __name__ block is best practice anyway.
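For anyone hitting this pattern elsewhere: the behavior above is plain Python import semantics, not anything Runhouse-specific. A minimal, self-contained sketch (the module source and names here are hypothetical, written to a temp file just for the demo) shows that top-level code runs on every import, while the __main__ block only runs when the file is executed directly:

```python
import importlib.util
import os
import tempfile

# Hypothetical script mimicking the bug: a function definition plus
# top-level "launch the cluster" code outside the __main__ guard.
MODULE_SRC = """
calls = []

def num_cpus():
    import multiprocessing
    return f"Num cpus: {multiprocessing.cpu_count()}"

# Top-level code: executes on *every* import, e.g. when a remote worker
# imports this module just to look up `num_cpus`.
calls.append("launched cluster!")

if __name__ == "__main__":
    # Only executes when the file is run directly, never on import.
    calls.append("main block")
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_script.py")
    with open(path, "w") as f:
        f.write(MODULE_SRC)
    # Import the file as a module, the way a worker process would.
    spec = importlib.util.spec_from_file_location("my_script", path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)

print(mod.calls)  # → ['launched cluster!']
```

Only the unguarded line ran on import, which is exactly why the cluster tried to launch itself a second time.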

@shyamsn97 (Author)

I see! Will try that now!

@shyamsn97 (Author)

That worked! Thanks! This is good to close now :)

@tullie commented Aug 28, 2023

@dongreenberg I'm running into this issue but require FastAPI > 0.1 and Pydantic 2.x. How easy would it be to support these with the latest version of Runhouse (which seems to fix this issue)?

@dongreenberg (Contributor)

This is fixed. I've tested with a few of the new versions of fastapi and pydantic on main, and something changed about our code, their code, or both such that they now work out of the box. I've relaxed the requirement on main, which should be in today's release.
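Not from the thread, but when debugging version-pin conflicts like this, a small stdlib-only helper can confirm which fastapi/pydantic/runhouse versions pip actually resolved in your environment (the helper name is my own):

```python
import importlib.metadata as md

def resolved_versions(pkgs):
    """Return {package: installed version, or None if not installed}."""
    out = {}
    for pkg in pkgs:
        try:
            out[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            out[pkg] = None
    return out

print(resolved_versions(("fastapi", "pydantic", "runhouse")))
```

Running this in both the local environment and on the cluster (e.g. via cluster.run_python) makes it easy to spot mismatched resolutions on either side.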
