-
Notifications
You must be signed in to change notification settings - Fork 683
[Prometheus] Add ray_clusters_created_total metric
#3204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Prometheus] Add ray_clusters_created_total metric
#3204
Conversation
ray_clusters_created_total metric
|
@troychiu PTAL, thx |
9976c1b to
92e7b83
Compare
c446b4a to
6fa17fc
Compare
| } else if len(headPods.Items) == 0 { | ||
| // Create head Pod if it does not exist. | ||
| logger.Info("reconcilePods: Found 0 head Pods; creating a head Pod for the RayCluster.") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should increment after successfully create the head pod to avoid incrementing multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still possible to be counted multiple times. KubeRay is possible to delete the head Pod if the restart policy is Never.
| // Create head Pod if it does not exist. | ||
| logger.Info("reconcilePods: Found 0 head Pods; creating a head Pod for the RayCluster.") | ||
|
|
||
| if r.EnableMetrics { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have a function in metrics.go handle EnableMetrics in a general way?
|
|
||
| if r.EnableMetrics { | ||
| creatorCRDType := getCreatorCRDType(*instance) | ||
| // Increment the counter for ray_clusters_created_total metric based on the creator CRD type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can move these checkings to ray_cluster_metrics.go to simplify the operator's logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe still in this file but move to a function? What do you think it's better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a function called emitRayClusterMetrics in raycluster_controller.go. It takes a metric name and decides which metric to emit based on that.
WDYT?
a4290f9 to
8baf776
Compare
8baf776 to
ef1a740
Compare
| "sigs.k8s.io/controller-runtime/pkg/metrics" | ||
| ) | ||
|
|
||
| var rayClustersCreatedCounter = prometheus.NewCounterVec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better not to use package-level variable. Can you open another PR first to improve it by:
ray_cluster_metric.go
type RayClusterMetricCollector struct {
// Metrics
rayClusterCreatedTotal: ...
}
func NewRayClusterMetricCollector(...) ... {
}main.go: Initialize the collectors if this feature is enabled. Pass the collectors into controllers.
Example:
| } else if len(headPods.Items) == 0 { | ||
| // Create head Pod if it does not exist. | ||
| logger.Info("reconcilePods: Found 0 head Pods; creating a head Pod for the RayCluster.") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still possible to be counted multiple times. KubeRay is possible to delete the head Pod if the restart policy is Never.
151382b to
34ba8b5
Compare
Signed-off-by: win5923 <ken89@kimo.com>
Signed-off-by: win5923 <ken89@kimo.com>
Signed-off-by: win5923 <ken89@kimo.com>
34ba8b5 to
0a3d445
Compare
Why are these changes needed?
Add
ray_clusters_created_totalmetric to track the total number of RayClusters created.created_by_ray_job: true/false-
created_by_ray_service: true/false-
namespaceManual test:
Related issue number
Closes #3175
Checks