-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grafana dashboard improvement #1254
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we actually sure what kind of disk usage this really is? This would suggest it is indeed a disk space usage, but on the container root. Not in the data directory. While the root disk space is probably uninteresting, the disk space usage of the volume with Kafka logs would be the interesting part.
@@ -783,7 +783,7 @@ | |||
"thresholds": [], | |||
"timeFrom": null, | |||
"timeShift": null, | |||
"title": "Disk Usage", | |||
"title": "Disk Space Usage", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we shorten this to simply say "Disk Space"?
What do others think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho it needs to make clear whether it's used space, free space or total space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Tom. This needs to make clear if it is used disk space or free disk space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can keep "Disk Space Usage" or do we want here something like "Used Disk Space"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me "Used Disk Space" is the most clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should clarify my comment above first before deciding the name :-). What kind of disk space is this and is it really useful? :-o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This metric is sum(container_fs_usage_bytes ... )
so it sums all used file system space used by kafka brokers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @stanlyDoge
Building on Tom's suggestion, we could name it:
Used Disk Space Across All Brokers
OR
Used Disk Space (All Brokers)
I prefer option 1 if there's space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a separate chart for each broker, or? So it is IMHO not across all brokers.
I'm still a bit unsure about what does this mean ... what exactly does sums all used file system space used by kafka brokers
mean? Is it disk usage in /var/lib/kafka/
? Or not?
@@ -7,6 +7,14 @@ data: | |||
groups: | |||
- name: kafka | |||
rules: | |||
- alert: RunningOutOfSpace | |||
expr: kubelet_volume_stats_available_bytes < 5368709120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the test of whether the metric is available or not. Can we append conditions here @ppatierno ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We should try to test this on a real cluster.
@@ -252,3 +256,6 @@ This alert is raised when this value is greate than 10 (ticks), providing the ac | |||
This value goes up when the server receives more requests than it can process. | |||
This alert is raised when this value is greate than 10 (ticks), providing the actual number of outstanding requests for each server. | |||
|
|||
* `RunningOutOfSpace`: the available volumes bytes metric gives the size of disk which can be used for writing Zookeeper's data. | |||
This alert is raised when this value is lower than 5GiB, providing the information of the disk running out of the space for each persistent volume claim. | |||
Note: Availability of this metric (and alert) is dependent on your {ProductPlatformName}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Availability of this metric (and alert) is dependent on your {ProductPlatformName}. | |
NOTE: The availability of this metric and alert is dependent on your version of {ProductPlatformName}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I've made a few Suggested Changes.
Co-Authored-By: stanlyDoge <sknot@redhat.com>
Co-Authored-By: stanlyDoge <sknot@redhat.com>
Co-Authored-By: stanlyDoge <sknot@redhat.com>
Co-Authored-By: stanlyDoge <sknot@redhat.com>
Type of change
Description
Changed ambiguous label and fixed unit.
Checklist
./design