[Feature] Improve and fix Prometheus & Grafana integrations #895

Merged: 6 commits merged Feb 15, 2023


@kevin85421 (Member) commented Feb 6, 2023

Why are these changes needed?

The old Prometheus and Grafana document (observability.md) has several issues that make it hard for users to follow.

  • Inconsistency between install.sh and observability.md

    • The ServiceMonitor example in the doc differs from the ServiceMonitor instance created by the script.
    • The PodMonitor example in the doc differs from the PodMonitor instance created by the script.
  • Inconsistency between observability.md and the KubeRay operator (kuberay/#230).

    • The document uses port 9001 as the metrics endpoint, but the KubeRay operator uses port 8080 as its default metrics endpoint.
  • Some missing pieces:

    • The ServiceMonitor and PodMonitor examples in observability.md both lack the labels needed to match serviceMonitorSelector and podMonitorSelector.
    • How to access Prometheus UI?
    • How to access Grafana UI?
    • How to import Ray Dashboard into Grafana?
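For the missing-label item, here is a minimal sketch of what the two monitors could look like, written as a shell function that prints the manifests so they can be piped to `kubectl apply -f -`. Everything specific in it is an assumption, not a copy of the PR's files: that kube-prometheus-stack was installed as Helm release `prometheus` in namespace `prometheus-system` (consistent with the `deployment/prometheus-grafana` name used in the review thread), so its default selectors match the label `release: prometheus`; that the RayCluster runs in namespace `default`; and that KubeRay labels Ray resources with `ray.io/node-type` and names the 8080 port `metrics`. The function name `render_ray_monitors` is hypothetical.

```sh
# Hypothetical helper: print a ServiceMonitor (head) and a PodMonitor (workers).
# ASSUMPTIONS: Helm release "prometheus" in "prometheus-system" (so selectors match
# "release: prometheus"), RayCluster in "default", port named "metrics" (8080),
# and Ray resources labeled with ray.io/node-type.
render_ray_monitors() {
  cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ray-head-monitor
  namespace: prometheus-system
  labels:
    release: prometheus   # must match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      ray.io/node-type: head
  endpoints:
    - port: metrics
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: ray-worker-monitor
  namespace: prometheus-system
  labels:
    release: prometheus   # must match the Prometheus podMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      ray.io/node-type: worker
  podMetricsEndpoints:
    - port: metrics
EOF
}
```

Usage: `render_ray_monitors | kubectl apply -f -`. Keeping the `release` label next to a comment naming the selector it satisfies makes the dependency on the Helm release name easy to spot.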

Related issue number

Closes #871

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

[Screenshot: Prometheus web UI]

  • Prometheus UI: Prometheus can detect both ServiceMonitor (ray-head-monitor) and PodMonitor (ray-worker-monitor).
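To check the same thing from the command line, here is a sketch that queries Prometheus's `/api/v1/targets` HTTP API and looks for both monitors among the active scrape pools. It assumes Prometheus is reachable at `PROM_URL` (default `http://localhost:9090`, for example after `kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n prometheus-system 9090:9090`; that service name is itself an assumption based on a Helm release named `prometheus`), that the API returns compact JSON, and that prometheus-operator names scrape pools `serviceMonitor/<namespace>/<name>/<index>` and `podMonitor/<namespace>/<name>/<index>`. Both helper names are hypothetical.

```sh
# Print the distinct scrape pools of Prometheus's active targets, one per line.
# ASSUMPTION: Prometheus is port-forwarded to PROM_URL (default localhost:9090).
list_scrape_pools() {
  curl -s "${PROM_URL:-http://localhost:9090}/api/v1/targets?state=active" \
    | grep -o '"scrapePool":"[^"]*"' \
    | sed 's/^"scrapePool":"\(.*\)"$/\1/' \
    | sort -u
}

# Succeed only if both Ray monitors appear as scrape pools.
# ASSUMPTION: prometheus-operator's "<kind>/<namespace>/<name>/<index>" naming.
check_ray_monitors() {
  pools=$(list_scrape_pools)
  echo "$pools" | grep -q '^serviceMonitor/prometheus-system/ray-head-monitor/' &&
    echo "$pools" | grep -q '^podMonitor/prometheus-system/ray-worker-monitor/'
}
```

Plain `grep`/`sed` are used instead of `jq` only to avoid an extra dependency; with `jq` installed, parsing the JSON properly would be more robust.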

[Screenshot: Ray Dashboard in Grafana]

  • Grafana UI: Load the Ray Dashboard into Grafana via config/grafana/dashboard_default.json.

@kevin85421 kevin85421 marked this pull request as ready for review February 6, 2023 15:06
@kevin85421 (Member Author):

cc @Yicheng-Lu-llll would you mind reviewing this PR? Thanks!

@kevin85421 kevin85421 added this to the v0.5.0 release milestone Feb 6, 2023
@kevin85421 (Member Author):

Gentle ping @scarlet25151. Would you mind taking a look at this PR? Thank you!

@architkulkarni architkulkarni self-assigned this Feb 14, 2023

```sh
# Forward the port of Grafana
kubectl port-forward --address 0.0.0.0 deployment/prometheus-grafana -n prometheus-system 3000:3000
```

Review comment (Contributor):

Got this error at this step:

```
Unable to listen on port 3000: Listeners failed to create with the following errors: [unable to create listener: Error listen tcp4 0.0.0.0:3000: bind: address already in use]
error: unable to listen on any of the requested ports: [{3000 3000}]
```

But I'm not sure what could be using port 3000. Do I need to cancel one of the previous steps?

Reply (Contributor):

Looks like I already have Grafana running for some reason; I think it's unrelated to this tutorial:

```
sudo lsof -i :3000
Password:
COMMAND    PID   USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
grafana-s 3915 archit   21u  IPv6 0xd6d807306e5fa78f      0t0  TCP *:hbci (LISTEN)
```

Let me figure out how to stop it. I don't know if it's a common enough situation to mention in this document.

Reply (Contributor):

I naively killed the process, but it kept restarting. The solution here, https://unix.stackexchange.com/questions/601063/port-3000-is-always-being-hogged-by-grafana-server, is to run `brew services stop grafana`. We probably don't need to mention this in the doc.
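If stopping the local Grafana is not an option, one alternative is to forward to the first free local port instead of 3000, since `kubectl port-forward` accepts `LOCAL:REMOTE` port pairs. This is a sketch: the helper names are hypothetical, and the port check assumes `lsof` is installed (as in the thread).

```sh
# Succeed if some process is already listening on local TCP port $1.
# ASSUMPTION: lsof is available (-t prints PIDs only; exit status 1 if none).
port_in_use() {
  lsof -nP -iTCP:"$1" -sTCP:LISTEN -t >/dev/null 2>&1
}

# Print the first free local port in [$1, $1+19]; fail if all are taken.
pick_local_port() {
  port=$1
  end=$(( $1 + 20 ))
  while [ "$port" -lt "$end" ]; do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$(( port + 1 ))
  done
  return 1
}

# Forward Grafana's port 3000 to the first free local port starting at 3000.
forward_grafana() {
  local_port=$(pick_local_port 3000) || { echo "no free local port found" >&2; return 1; }
  echo "Grafana: http://localhost:${local_port}" >&2
  kubectl port-forward --address 0.0.0.0 deployment/prometheus-grafana -n prometheus-system "${local_port}:3000"
}
```

With a local grafana-server holding port 3000, `forward_grafana` would forward to 3001 instead (if that port is free), and the Grafana UI is then reachable at http://localhost:3001.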

@architkulkarni (Contributor) left a comment:

Went through the tutorial, it all works! Left some minor suggestions.

@kevin85421 kevin85421 merged commit 4714892 into ray-project:master Feb 15, 2023
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…ect#895)

The old document (observability.md) of Prometheus and Grafana has some issues that make users hard to follow. This PR fixes these issues.
Successfully merging this pull request may close these issues.

[Feature] Enable Grafana Dashboard integration with KubeRay
4 participants