The CodeFlare Stack - Scenario 1 doesn't work #783

Open
nheidloff opened this issue Feb 17, 2023 · 8 comments · Fixed by #784

@nheidloff

Hi, I tried to run https://github.com/project-codeflare/codeflare-cli/blob/main/docs/scenarios/1.md, but the ray install doesn't work.

Screenshot 2023-02-17 at 13 37 23

I use ROKS on VPC eu-gb-1-bx2.4x16. Is my cluster maybe not big enough?

Thanks!

@nheidloff
Author

I also tried IKS, but got the same error.

@starpit
Collaborator

starpit commented Feb 17, 2023

Hello there! It looks like, in your case, our auto-generated name for the Helm chart exceeds Helm's 53-character limit. Thanks for the bug report!
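For illustration, here is a minimal sketch of enforcing that limit, assuming the generated name is simply truncated to 53 characters and any trailing hyphen left by the cut is removed. `fit_release_name` and the example name are hypothetical; this is not the code from the CodeFlare CLI or its guidebooks.

```python
# Hypothetical helper (not the actual CodeFlare/guidebooks code): fit a
# generated Helm release name into Helm's 53-character limit.
HELM_RELEASE_NAME_MAX = 53


def fit_release_name(name: str, limit: int = HELM_RELEASE_NAME_MAX) -> str:
    """Truncate `name` to at most `limit` characters.

    Cutting through a UUID can leave a trailing '-', which is not a valid
    last character for Kubernetes-style names, so it is stripped.
    """
    return name[:limit].rstrip("-")


# Made-up example: prefix + long username + a full UUID (62 characters).
generated = "ray-" + "averyverylongusername" + "-" + "123e4567-e89b-12d3-a456-426614174000"
print(len(generated), len(fit_release_name(generated)))  # prints: 62 53
```

Plain truncation can make two names collide if they differ only after the cut; appending a short hash of the full name is a common alternative.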

@starpit
Collaborator

starpit commented Feb 17, 2023

fix in progress: guidebooks/store#550

@starpit
Collaborator

starpit commented Feb 17, 2023

2.6.1 has been released. This release attempts to limit the length of the generated Helm chart name to 53 characters.

Update: hmm, more testing may be needed, investigating...

Update 2: OK, we should be fine in terms of length now. But we will also need to downcase $USER, as Helm (and Kubernetes) do not accept uppercase characters in names. This is less likely to be a problem, but it's something we should protect against: guidebooks/store#551
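A minimal sketch of that kind of protection, assuming a conservative character set of lowercase letters, digits, and hyphens (the rule used for Kubernetes DNS-1123 label names). `sanitize_user_for_k8s` is a hypothetical helper, not the fix that landed in guidebooks/store.

```python
import os
import re


def sanitize_user_for_k8s(user: str) -> str:
    """Hypothetical helper: make a username usable inside Helm/Kubernetes names.

    Lowercases the input, replaces every character outside [a-z0-9-] with '-',
    and trims leading/trailing hyphens, since names must start and end with
    an alphanumeric character. May return "" if nothing usable remains.
    """
    lowered = user.lower()
    cleaned = re.sub(r"[^a-z0-9-]", "-", lowered)
    return cleaned.strip("-")


user = os.environ.get("USER", "unknown")
print(sanitize_user_for_k8s(user))
```

A caller would still need a fallback (for example a fixed default) when the sanitized result is empty.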

@nheidloff
Author

Wow, that was fast. Thanks @starpit! The error disappeared, but now I get another one:

Waiting for Ray Head node
Waiting for Ray Head node
Waiting for Ray Head node
pod/ray-niklasheidloff-1fac99cb-c4ec-46cf-a932-ac52b24695-ray-rlxsl condition met
Head node is ready
Ray API is active with RAY_ADDRESS=http://localhost:8684
No resources found in madns namespace.
Waiting for Ray Worker nodes
No resources found in madns namespace.
Waiting for Ray Worker nodes
No resources found in madns namespace.
Waiting for Ray Worker nodes
E0217 17:17:27.726527   23068 portforward.go:234] lost connection to pod
No resources found in madns namespace.
Waiting for Ray Worker nodes
...
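The output shows the script printing "Waiting for Ray Worker nodes" over and over with no visible upper bound. The script's actual implementation is not shown in this thread; as a minimal sketch, assuming the wait is a simple poll loop, a bounded version could look like this. `wait_for` and `workers_ready` are hypothetical names, and the real readiness check (such as looking for Ray worker pods in the namespace) is left abstract.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_s: float, interval_s: float = 2.0) -> bool:
    """Poll `condition` until it returns True or `timeout_s` seconds pass.

    Returns True if the condition was met and False on timeout, so the
    caller can stop with a clear error instead of waiting forever.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Hypothetical usage: `workers_ready` stands in for whatever check the real
# script performs; here it always reports "not ready".
def workers_ready() -> bool:
    return False


if not wait_for(workers_ready, timeout_s=1.0, interval_s=0.25):
    print("Timed out waiting for Ray Worker nodes")
```

Returning a boolean instead of raising keeps the helper reusable; the caller decides whether a timeout is fatal.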

@starpit
Collaborator

starpit commented Feb 17, 2023

Does the job ever start up? I think that error should only affect the availability of Ray API/dashboard access:

Ray API is active with RAY_ADDRESS=http://localhost:8684

but should not otherwise affect submission of the job. 🤔

@nheidloff
Author

Just tried it again. The script never finishes. I don't see a job in my namespace. I only see the Ray pod in my ns:

Screenshot 2023-02-20 at 08 28 48

@starpit
Collaborator

starpit commented Feb 20, 2023

2.8.1 includes more fixes for too-long helm chart resources. #792
