Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

polish observability (o11y) docs #39069

Merged
merged 12 commits into from
Sep 8, 2023

Conversation

scottsun94
Copy link
Contributor

@scottsun94 scottsun94 commented Aug 29, 2023

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@scottsun94 scottsun94 marked this pull request as ready for review August 29, 2023 18:13
@scottsun94 scottsun94 changed the title polish o11y docs polish observability (o11y) docs Aug 29, 2023
Copy link
Contributor

@architkulkarni architkulkarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc/source/cluster/configure-manage-dashboard.md change looks good to me, giving codeowner approval

Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM

doc/source/cluster/configure-manage-dashboard.md Outdated Show resolved Hide resolved
doc/source/ray-observability/key-concepts.rst Outdated Show resolved Hide resolved
- memray
- GPU profiling
- PyTorch Profiler
- Ray Task / Actor profiling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Ray Task / Actor profiling
- Ray Task / Actor timeline tracing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's tracing. When talking about tracing, to me, it's usually about profiling the time a request takes to go through different services.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think profiling is too general. Maybe call it timeline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this title. Only kept the "Ray Task/Actor timeline"

doc/source/ray-observability/user-guides/profiling.md Outdated Show resolved Hide resolved

## GPU profiling
GPU and GRAM profiling for your GPU workloads like distributed training. This helps you analyze performance and debug memory issues.
- PyTorch profiler is supported out of box when used with Ray Train
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what link? Ray Train doesn't have doc for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so we don't have any example for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither Ray Train nor Ray Data has doc about it yet. We should ask them to add it later.

doc/source/ray-observability/key-concepts.rst Outdated Show resolved Hide resolved
@scottsun94 scottsun94 force-pushed the 0828-doc-fix branch 4 times, most recently from 6637b70 to 276a9e3 Compare September 5, 2023 18:11
Copy link
Member

@kevin85421 kevin85421 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (changes about KubeRay)

- memray
- GPU profiling
- PyTorch Profiler
- Ray Task / Actor profiling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think profiling is too general. Maybe call it timeline?

@scottsun94 scottsun94 force-pushed the 0828-doc-fix branch 2 times, most recently from 4ea2a0b to 3c28699 Compare September 6, 2023 20:28
Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some suggestions. Nice job!

doc/source/ray-observability/key-concepts.rst Show resolved Hide resolved

(profiling-memory)=
## Memory profiling
Memory profiling for Driver and Worker processes. This helps you analyze memory allocations in applications, trace memory leaks, and debug high/low memory or out of memory issues.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory profiling for Driver and Worker processes. is not a complete sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited it. WDYT?


(profiling-timeline)=
## Ray Task / Actor timeline
Profiling the execution time of Ray Tasks and Actors with Ray Timeline. This helps you analyze performance, identify the stragglers, and understand the distribution of workloads.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Profiling the execution time of Ray Tasks and Actors with Ray Timeline. is not a complete sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited it. WDYT?

Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
scottsun94 and others added 4 commits September 6, 2023 19:20
…erformance.rst

Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
@matthewdeng matthewdeng merged commit daf0dc1 into ray-project:master Sep 8, 2023
12 of 15 checks passed
@angelinalg angelinalg added docs An issue or change related to documentation v2.7.0-pick labels Sep 8, 2023
angelinalg added a commit to angelinalg/ray that referenced this pull request Sep 9, 2023
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
GeneDer pushed a commit that referenced this pull request Sep 9, 2023
#39510)

* Update metrics.md (#38512)

1. there are 3 dashboards in the folder now. Refer to the folder instead of only 1 dashboard
2. include "Copy" since people need to copy this from the head node to the Grafana server

Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>

* polish observability (o11y) docs (#39069)

Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: matthewdeng <matt@anyscale.com>

* [Doc] Unbold "Use Cases" in sidebar (#39295)

Signed-off-by: pdmurray <peynmurray@gmail.com>

* [docs] Cleanup for other AIR concepts (#39400)

* [doc][clusters] add doc for setting up Ray and K8s (#39408)

---------

Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Signed-off-by: pdmurray <peynmurray@gmail.com>
Co-authored-by: Huaiwei Sun <scottsun94@gmail.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
Co-authored-by: Peyton Murray <peynmurray@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
Signed-off-by: Jim Thompson <jimthompson5802@gmail.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
Signed-off-by: Huaiwei Sun <scottsun94@gmail.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Co-authored-by: matthewdeng <matt@anyscale.com>
Signed-off-by: Victor <vctr.y.m@example.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs An issue or change related to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Dashboard] Document that people need to specify --dashboard-host when starting ray within a docker container
6 participants