Getting ValueError: Error calling check on server: Internal Server Error when checking server on an aws cluster #83
After configuring gcp correctly, looks like this works when specifying "gcp" in provider:
|
Hi! Thank you for the detailed bug report. I actually noticed some new breakage this evening relating to the FastAPI 0.100.0 release, but given it's working on GCP I'm not sure that's it. Trying to reproduce on AWS now. |
I was able to reproduce this and it is indeed an issue with the latest fastapi release (you can confirm by sshing into the server, viewing the server logs at |
Just pushed https://github.com/run-house/runhouse/releases/tag/v0.0.8, which should fix this. |
Awesome! Thanks so much! |
@shyamsn97 confirming that the fix worked? |
Yep, seems like that worked, and now I'm not getting an error on check_server. However, the function call hangs indefinitely on AWS. Code from the example:
import runhouse as rh

def num_cpus():
    import multiprocessing
    return f"Num cpus: {multiprocessing.cpu_count()}"

num_cpus()

# Using a Cloud provider
cluster = rh.ondemand_cluster(
    name="runhouse",
    instance_type="CPU:8",
    provider="aws",  # options: "AWS", "GCP", "Azure", "Lambda", or "cheapest"
)
cluster.up_if_not()

num_cpus_cluster = rh.function(name="num_cpus_cluster", fn=num_cpus).to(system=cluster, reqs=["./"])
num_cpus_cluster()
Output:
|
Weird, because it tries to start the cluster twice? Unless I'm misunderstanding the process there. |
I think I see the issue. Your script isn't running inside an
We're actually introducing new logic now that will make it impossible for a cluster to start itself again because that's just silly, so your code may actually work as is in the next release, but using the |
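A general Python illustration of how a script can end up re-running its own setup: top-level statements execute again whenever the file is imported as a module, while code under an `if __name__ == "__main__":` guard does not. Whether this guard is the exact fix suggested above is an assumption. `launch_cluster` and `user_script` are hypothetical stand-ins, and no Runhouse API is used here.

```python
# Shared body of a hypothetical user script. launch_cluster() is a stand-in
# for any expensive setup step (it only records that it was called).
COMMON = '''
events = []

def num_cpus():
    import multiprocessing
    return f"Num cpus: {multiprocessing.cpu_count()}"

def launch_cluster():
    events.append("cluster launched")
'''

# Variant 1: setup runs at module top level.
UNGUARDED = COMMON + "\nlaunch_cluster()\n"

# Variant 2: setup runs only when the file is executed as a script.
GUARDED = COMMON + '\nif __name__ == "__main__":\n    launch_cluster()\n'


def simulate_import(source):
    """Execute source the way an import would: __name__ is the module name,
    not "__main__". Returns the list of recorded setup events."""
    namespace = {"__name__": "user_script"}
    exec(source, namespace)
    return namespace["events"]


unguarded_events = simulate_import(UNGUARDED)
guarded_events = simulate_import(GUARDED)

print("unguarded import ran setup:", bool(unguarded_events))
print("guarded import ran setup:", bool(guarded_events))
```

In the unguarded variant, importing the file to get at `num_cpus` also triggers the setup call; in the guarded variant it does not.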
I see! Will try that now! |
That worked! Thanks! This is good to close now :) |
@dongreenberg I'm running into this issue but require FastAPI > 0.1 and Pydantic 2.x. How easy would it be to support these with the latest version of Runhouse (which seems to fix this issue)? |
This is fixed. I've tested with a few of the new versions of fastapi and pydantic on main and something changed about our code, their code, or both such that they now work out of the box. I've relaxed the requirement on main, which should be in the release today. |
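For anyone comparing environments while debugging a version conflict like this one, a minimal sketch that reports which versions of the relevant packages are installed. It uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list is just the three projects named in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string,
    or None if the package is not installed in this environment."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["runhouse", "fastapi", "pydantic"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running the same snippet locally and on the cluster makes it easy to spot a mismatch between the two environments.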
Hi! First off, I just wanna say runhouse is an awesome project! Really gonna revolutionize how people run machine learning workflows!
Describe the bug
I'm running into an issue where I can't run any remote functions on the cluster, but I can do a cluster.run_python(...)
Here's the code I'm running:
This runs fine until the cluster.check_server(), as you can see here:
Not sure if I'm doing something wrong here, but I think my credentials work because I can see that the cluster is being created and I can ssh into it. My package versions can be seen below, let me know if you need more information! Thanks!
Versions
In addition, here's the end of the setup of the cluster: