polish observability (o11y) docs #39069
doc/source/cluster/configure-manage-dashboard.md change looks good to me, giving codeowner approval
Generally LGTM
doc/source/ray-observability/user-guides/debug-apps/debug-hangs.rst
- memray
- GPU profiling
- PyTorch Profiler
- Ray Task / Actor profiling
Suggested change:
- Ray Task / Actor profiling
- Ray Task / Actor timeline tracing
I'm not sure if it's tracing. When talking about tracing, to me, it's usually about profiling the time a request takes to go through different services.
I think profiling is too general. Maybe call it timeline?
Removed this title. Only kept the "Ray Task/Actor timeline"
## GPU profiling
GPU and GRAM profiling for your GPU workloads like distributed training. This helps you analyze performance and debug memory issues.
- PyTorch profiler is supported out of box when used with Ray Train
Add a link?
What link? Ray Train doesn't have a doc for it.
Hmm so we don't have any example for this?
Neither Ray Train nor Ray Data has docs about it yet. We should ask them to add them later.
6637b70 to 276a9e3
LGTM (changes about KubeRay)
4ea2a0b to 3c28699
Just some suggestions. Nice job!
doc/source/ray-observability/user-guides/debug-apps/optimize-performance.rst
(profiling-memory)=
## Memory profiling
Memory profiling for Driver and Worker processes. This helps you analyze memory allocations in applications, trace memory leaks, and debug high/low memory or out of memory issues.
"Memory profiling for Driver and Worker processes." is not a complete sentence.
Edited it. WDYT?
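
For readers of this thread, the allocation analysis described in the quoted "Memory profiling" text can be illustrated with a minimal sketch. This is an assumption-laden example: it uses Python's standard-library `tracemalloc` rather than memray or any Ray tooling, and `top_allocations` and `leaky_workload` are hypothetical names invented for illustration.

```python
import tracemalloc


def top_allocations(workload, limit=3):
    """Run `workload` and return its result plus the source lines that
    allocated the most memory while it ran.

    Hypothetical helper for illustration; not part of Ray or memray.
    """
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = workload()  # keep the result alive so its memory is still counted
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Differences are grouped by source line, largest change first.
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]


def leaky_workload():
    # Hypothetical workload: holds on to about 1 MB of small buffers.
    return [bytes(1024) for _ in range(1000)]


kept, top = top_allocations(leaky_workload)
for stat in top:
    print(stat)
```

The same before/after comparison idea applies when hunting for leaks: a line whose allocated size keeps growing across snapshots is a candidate to inspect.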
(profiling-timeline)=
## Ray Task / Actor timeline
Profiling the execution time of Ray Tasks and Actors with Ray Timeline. This helps you analyze performance, identify the stragglers, and understand the distribution of workloads.
"Profiling the execution time of Ray Tasks and Actors with Ray Timeline." is not a complete sentence.
Edited it. WDYT?
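
To illustrate what "identify the stragglers" in the quoted timeline text could look like in practice, here is a minimal sketch. It assumes the timeline has been exported as Chrome-tracing-style events (dictionaries with `name`, `ph`, and `dur` fields); the `find_stragglers` helper, the `factor` threshold, and the sample events are all hypothetical and made up for this example.

```python
from statistics import median


def find_stragglers(events, factor=2.0):
    """Return complete ("X") events that ran longer than `factor` times the
    median duration of events with the same name, slowest first.

    Hypothetical helper; assumes Chrome-tracing-style event dictionaries.
    """
    by_name = {}
    for event in events:
        if event.get("ph") == "X" and "dur" in event:
            by_name.setdefault(event["name"], []).append(event)

    stragglers = []
    for group in by_name.values():
        typical = median(e["dur"] for e in group)
        stragglers.extend(e for e in group if e["dur"] > factor * typical)
    return sorted(stragglers, key=lambda e: e["dur"], reverse=True)


# Made-up sample data: four runs of the same task, one much slower.
sample_events = [
    {"name": "train_step", "ph": "X", "tid": "worker-a", "ts": 0, "dur": 100},
    {"name": "train_step", "ph": "X", "tid": "worker-b", "ts": 0, "dur": 110},
    {"name": "train_step", "ph": "X", "tid": "worker-c", "ts": 0, "dur": 105},
    {"name": "train_step", "ph": "X", "tid": "worker-d", "ts": 0, "dur": 900},
]

slow = find_stragglers(sample_events)
for event in slow:
    print(event["tid"], event["dur"])
```

Comparing against the per-task median rather than a fixed threshold keeps the check meaningful across tasks with very different typical durations.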
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
5ff24cc to 3a27ec6
…erformance.rst Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: matthewdeng <matt@anyscale.com>
#39510)
* Update metrics.md (#38512)
  1. there are 3 dashboards in the folder now. Refer to the folder instead of only 1 dashboard
  2. include "Copy" since people need to copy this from the head node to the Grafana server
  Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
* polish observability (o11y) docs (#39069)
  Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
  Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
  Co-authored-by: matthewdeng <matt@anyscale.com>
* [Doc] Unbold "Use Cases" in sidebar (#39295)
  Signed-off-by: pdmurray <peynmurray@gmail.com>
* [docs] Cleanup for other AIR concepts (#39400)
* [doc][clusters] add doc for setting up Ray and K8s (#39408)
---------
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: pdmurray <peynmurray@gmail.com>
Co-authored-by: Huaiwei Sun <scottsun94@gmail.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
Co-authored-by: Peyton Murray <peynmurray@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: matthewdeng <matt@anyscale.com> Signed-off-by: Jim Thompson <jimthompson5802@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Co-authored-by: matthewdeng <matt@anyscale.com> Signed-off-by: Victor <vctr.y.m@example.com>
Why are these changes needed?
Related issue number
Checks
`git commit -s`) in this PR.
`scripts/format.sh` to lint the changes in this PR.
method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.