Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[autoscaler] Make log file mount path more specific. #391

Merged

Conversation

DmitriGekhtman
Copy link
Collaborator

@DmitriGekhtman DmitriGekhtman commented Jul 19, 2022

Signed-off-by: Dmitri Gekhtman dmitri.m.gekhtman@gmail.com

Why are these changes needed?

Should resolve ray-project/ray#26064, where users report the autoscaler crash-looping due to a missing log directory.

Gives the shared log mount configured by the KubeRay operator for Ray logs a more specific path.
(This log mount was introduced in #274)

Previously the path was /tmp/ray. Instead, we should use /tmp/ray/session_latest/logs, which is where Ray and the autoscaler would like to write logs.

By using the more specific path, we ensure that the path will exist on container startup, preventing the file-not-found issues described in ray-project/ray#26064.

Ray attempts to create this directory with os.makedirs(path, exist_ok=True), so this should not interfere with Ray. (Obviously, this will be confirmed by the tests.)

To be extra safe, I'll go ahead and make the autoscaler also os.makedirs(path, exist_ok=True): ray-project/ray#26748

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
@DmitriGekhtman
Copy link
Collaborator Author

DmitriGekhtman commented Jul 20, 2022

Extra paranoia --- this Ray PR should also solve the issue: ray-project/ray#26748

@DmitriGekhtman
Copy link
Collaborator Author

Oh I need to fix the test...

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
@DmitriGekhtman
Copy link
Collaborator Author

DmitriGekhtman commented Jul 20, 2022

There was an update to the RayJobs CRD when I ran make test locally -- I'm guessing we just forgot to do a final make manifests on the recent Jobs PR, so this is ok. cc @harryge00

@harryge00
Copy link
Contributor

There was an update to the RayJobs CRD when I ran make test locally -- I'm guessing we just forgot to do a final make manifests on the recent Jobs PR, so this is ok. cc @harryge00

I added some comments to the job CRD and forgot to run make manifests later. Thanks to fix that

@DmitriGekhtman DmitriGekhtman changed the title [autoscaler] Make file mount path more specific. [autoscaler] Make log file mount path more specific. Jul 20, 2022
@Jeffwan
Copy link
Collaborator

Jeffwan commented Jul 20, 2022

To be extra safe, I'll go ahead and make the autoscaler also os.makedirs(path, exist_ok=True): ray-project/ray#26748

that's great!

@Jeffwan Jeffwan merged commit dc77177 into ray-project:master Jul 20, 2022
scv119 pushed a commit to ray-project/ray that referenced this pull request Jul 20, 2022
Why are these changes needed?
Together with ray-project/kuberay#391, this should address #26064

Have the autoscaler try to make the Ray log directory before setting up logging.
ray-project/kuberay#391 should be good enough for this, but this PR makes things safer in case the KubeRay user overrides log mounts or something like that.
@DmitriGekhtman DmitriGekhtman deleted the dmitri/autoscaler-mount-name branch July 20, 2022 21:47
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022
Why are these changes needed?
Together with ray-project/kuberay#391, this should address ray-project#26064

Have the autoscaler try to make the Ray log directory before setting up logging.
ray-project/kuberay#391 should be good enough for this, but this PR makes things safer in case the KubeRay user overrides log mounts or something like that.


Signed-off-by: Rohan138 <rapotdar@purdue.edu>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
Why are these changes needed?
Together with ray-project/kuberay#391, this should address ray-project#26064

Have the autoscaler try to make the Ray log directory before setting up logging.
ray-project/kuberay#391 should be good enough for this, but this PR makes things safer in case the KubeRay user overrides log mounts or something like that.

Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
* Fix log path.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

* Rerun CI.

* Fix test.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

* Rerun CI.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[kuberay][autoscaler] monitor.log might go missing, leading to crashlooping autoscaler
5 participants