KubeRay Deployment Failure with Large ServeZip File in Working_Dir #44614

Open
USER-HFC opened this issue Apr 10, 2024 · 3 comments
Labels
- bug: Something that is supposed to be working; but isn't
- core: Issues that should be addressed in Ray Core
- core-runtime-env: Issues related to Ray environment dependencies
- @external-author-action-required: Alternate tag for PRs where the author doesn't have labeling permission.
- P1: Issue that should be fixed within a few weeks
- serve: Ray Serve Related Issue

Comments

@USER-HFC

USER-HFC commented Apr 10, 2024

What happened + What you expected to happen

I am using KubeRay with the ray_ml:2.9.0 image. I created a 92 MB Serve zip file and set it as the working_dir in the YAML. After startup, the head node's pod did not fully download the zip file: in the container's tmp folder I found my zip package, but it was incomplete, so unzipping it produced an empty folder and the deployment failed. When I set working_dir to a smaller Serve zip, the problem does not occur.
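To confirm that the zip left in the container's tmp folder is really truncated (rather than, say, a valid archive with the wrong layout), a quick integrity check can help. The sketch below is a standalone diagnostic, not part of Ray: `check_zip_complete` is a hypothetical helper that relies only on Python's `zipfile` module. A truncated download normally lacks the ZIP end-of-central-directory record, so `zipfile.is_zipfile()` returns False; for archives that pass, `testzip()` re-reads every member and checks its CRC-32.

```python
import os
import tempfile
import zipfile


def check_zip_complete(path: str) -> bool:
    """Return True if `path` looks like a complete, readable ZIP archive.

    Hypothetical diagnostic helper (not a Ray API). A partially downloaded
    file usually has no end-of-central-directory record at its tail, so
    is_zipfile() fails; otherwise testzip() verifies each member's CRC-32.
    """
    if not os.path.isfile(path) or not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False


if __name__ == "__main__":
    # Demo: build a small archive, then simulate an interrupted download.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "serve.zip")
        with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
            zf.writestr("app.py", b"print('hello')\n" * 1000)
        print(check_zip_complete(path))  # True
        with open(path, "r+b") as f:
            f.truncate(os.path.getsize(path) // 2)  # drop the second half
        print(check_zip_complete(path))  # False
```

Pointing this at the zip found in the head pod's tmp folder (and comparing its size with the original 92 MB file) would show whether the download stopped partway, which is useful detail to include in the report.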

Versions / Dependencies

ray_ml:2.9.0 image, Ubuntu 18.04, KubeRay

Reproduction script

pass

USER-HFC added the bug and triage labels on Apr 10, 2024
@USER-HFC
Author

@kevin85421

kevin85421 added the serve, core, core-runtime-env, and P1 labels and removed the triage label on Apr 11, 2024
@kevin85421
Member

cc @GeneDer @fishbone

@GeneDer
Contributor

GeneDer commented Apr 11, 2024

@USER-HFC Can you share a minimal reproducible example? The logs from the runtime env agent might also show why it is failing.

Also, just a suspicion: if your download is known to take longer than 600 s (the default), you can set config.setup_timeout_seconds in the runtime env to allow a longer setup time: https://docs.ray.io/en/latest/ray-core/api/doc/ray.runtime_env.RuntimeEnvConfig.html#ray-runtime-env-runtimeenvconfig
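As a rough illustration of that suggestion, the sketch below builds a runtime_env dict in which the `config` field (the dict form of `RuntimeEnvConfig`) raises `setup_timeout_seconds` above the 600 s default. The helper name `build_runtime_env`, the 1800 s value, and the zip URI are assumptions for illustration only; the block deliberately avoids importing Ray so it runs anywhere.

```python
def build_runtime_env(working_dir_uri: str, setup_timeout_seconds: int = 1800) -> dict:
    """Return a runtime_env dict with a longer setup timeout.

    Hypothetical helper: the "working_dir" and "config" keys follow Ray's
    runtime_env format, where "config" accepts a dict equivalent to
    ray.runtime_env.RuntimeEnvConfig.
    """
    return {
        # Remote zip used as the working directory (URI is a placeholder).
        "working_dir": working_dir_uri,
        # Upper bound, in seconds, on runtime env setup, which includes
        # downloading and unpacking the working_dir package.
        "config": {"setup_timeout_seconds": setup_timeout_seconds},
    }


if __name__ == "__main__":
    env = build_runtime_env("https://example.com/serve.zip")  # hypothetical URI
    print(env)
    # With Ray installed, this dict could be passed as ray.init(runtime_env=env).
```

The same `working_dir` and `config.setup_timeout_seconds` keys can be written under `runtime_env` in a Serve config YAML. Note that a longer timeout only helps if the download is slow rather than being cut off for another reason, which is why the runtime env agent logs are still worth checking.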

anyscalesam added the @external-author-action-required label on May 17, 2024

4 participants