
[Grafana] Use container_memory_working_set_bytes to report memory consumption #3098

Merged · 1 commit merged into fluxcd:main on Sep 29, 2022
Conversation

@Santosh1176 (Contributor) commented Sep 12, 2022

This PR could close #3086

This replaces the current go_memstats_alloc_bytes_total metric with container_memory_working_set_bytes so that the dashboard correctly reports memory consumption.
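For illustration, the updated panel query might look roughly like the sketch below. This is only a sketch: it assumes the panel keeps the namespace and controller-pod filters discussed later in this thread, and the exact label filters live in the dashboard JSON:

sum(container_memory_working_set_bytes{namespace="$namespace",container!="POD",container!="",pod=~".*-controller-.*"})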

@stefanprodan (Member)

@Santosh1176 can you post a screenshot with the two graphs, showing how it looked before and after? Thanks

@Santosh1176 (Contributor, Author)

@stefanprodan Apologies for not testing it before raising a PR; I shall keep that in mind for the future.
I referred to the Prometheus docs, which recommend not using rate() on Gauge metrics. But this wasn't working when I tried it locally:

[Screenshot: error-monitor1]

When I applied rate() to the metric, it shows as below:

[Screenshot: with-rate-monitor]

@stefanprodan (Member)

this wasn't working when I tried it locally

It doesn't work because the query is wrong; remove the ()[1m].
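For example (illustrative only; the actual panel query lives in the dashboard JSON, so the broken form below is just a guess at its shape):

container_memory_working_set_bytes{namespace="$namespace"}()[1m]   # invalid: stray parentheses and a range selector on an instant-vector query
container_memory_working_set_bytes{namespace="$namespace"}         # valid: the gauge can be graphed directly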

@Santosh1176 (Contributor, Author)

Thank you @stefanprodan
The updated graph:

[Screenshot: time]

@stefanprodan (Member)

Looks good now! Thank you. Can you please squash the commits into a single one, rebase with upstream, and force push?
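A minimal sketch of that workflow, assuming the upstream repository is configured as a remote named upstream, its default branch is main, and <your-branch> stands in for the PR branch name:

git fetch upstream
git rebase -i upstream/main    # in the editor, keep the first commit as "pick" and mark the rest as "squash" or "fixup"
git push --force origin <your-branch>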

@aryan9600 (Member)

@Santosh1176 I have tested your changes, and while using go_memstats_heap_sys_bytes does make the metrics much more accurate, I don't think it's the right one to go for. container_memory_working_set_bytes is the query that we should use, as it's the standard query used in other dashboards such as Pod Stats & Info, and it's also what the OOM killer tracks before it OOMKills the container (ref: https://faun.pub/how-much-is-too-much-the-linux-oomkiller-and-used-memory-d32186f29c9d).

@Santosh1176 (Contributor, Author)

container_memory_working_set_bytes is the query that we should use

@aryan9600 @stefanprodan Thanks for this valuable information; so far this has been a good learning experience.
I tried applying the suggested metric to the dashboard; the results are below:

[Screenshot: container]

I have a doubt: unlike with the other metric types, I could not select individual pods to show their memory usage stats. Am I doing something wrong here, or is this behavior specific to this metric?

@aryan9600 (Member)

I have a doubt: unlike with the other metric types, I could not select individual pods to show their memory usage stats. Am I doing something wrong here?

Since we are using sum() and the filter passed to the query selects all pods in the namespace with "controller" in their names, you're seeing the memory consumed by all of these pods combined. You need to use a group by, like:

sum(container_memory_working_set_bytes{namespace="$namespace",container!="POD",container!="",pod=~".*-controller-.*"}) by (pod)
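Grouping by pod keeps one series per pod, so the panel legend can show each controller's memory separately instead of a single aggregated line.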

@stefanprodan (Member)

@Santosh1176 please rebase instead of merging. The 0540722 commit has nothing to do with this PR.

@Santosh1176 (Contributor, Author)

0540722 commit has nothing to do with this PR.

Sorry, my bad. I accidentally added the unrelated commit while squashing with rebase; apologies. Can I now use the drop option while rebasing, or would that further complicate the mess?
Thanks

@stefanprodan (Member)

Please squash the commits into a single one and it's ready to go. Thanks!

@Santosh1176 (Contributor, Author)

Please squash the commits into a single one and it's ready to go. Thanks!

Done. Requesting review from @stefanprodan @aryan9600

@aryan9600 (Member)

@Santosh1176 you need to force push the squashed commit as well.

@stefanprodan stefanprodan changed the title from "Updating with Guage to update dashboard correctly" to "[Grafana] Use container_memory_working_set_bytes to report memory consumption" Sep 28, 2022
@stefanprodan stefanprodan added the area/monitoring Monitoring related issues and pull requests label Sep 28, 2022
@stefanprodan stefanprodan mentioned this pull request Sep 28, 2022
Signed-off-by: Santosh Kaluskar <dtshbl@gmail.com>
@stefanprodan (Member) left a comment


LGTM

Thanks @Santosh1176 🏅

@stefanprodan stefanprodan merged commit b8fd46d into fluxcd:main Sep 29, 2022
@Santosh1176 (Contributor, Author)

Thank you @stefanprodan @aryan9600

Labels
area/monitoring Monitoring related issues and pull requests
Development

Successfully merging this pull request may close these issues.

Monitoring - Flux Control Plane Dashboard shows incorrect memory consumption
3 participants