'Ram used' pie graph isn't correct. #13

Closed
linuxshokunin opened this issue Apr 11, 2019 · 11 comments

@linuxshokunin

linuxshokunin commented Apr 11, 2019

Hi

Total and used RAM seem to be the other way around.
"Ram used" is showing 123.6GB of 14.4GB.

@herbrandson
Collaborator

Are you seeing this for pods or for nodes? This is technically possible for pods, since it is a measure of how many resources they have been allocated (i.e. the requested resources) vs. how many resources are actually being used. This is especially likely in the case where you have some number of pods that don't specify resource requests/limits.

That being said... it's clearly confusing. Any thoughts about how to better communicate what's going on there?
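
For illustration, a minimal TypeScript sketch (made-up pod data, not K8Dash's actual code) of how an "actual vs. requested" ratio can sail past 100% when most pods don't declare requests:

```typescript
// Hypothetical pod summaries; memory figures are in GiB.
interface PodMemory {
  name: string;
  actualGiB: number;      // what the pod is really using
  requestedGiB?: number;  // undefined when no resource request is set
}

// Aggregate the way an "actual of requested" chart might:
// pods with no request add to usage but contribute nothing to the denominator.
function ramUsed(pods: PodMemory[]): { actual: number; requested: number } {
  const actual = pods.reduce((sum, p) => sum + p.actualGiB, 0);
  const requested = pods.reduce((sum, p) => sum + (p.requestedGiB ?? 0), 0);
  return { actual, requested };
}

const pods: PodMemory[] = [
  { name: 'api', actualGiB: 2.4, requestedGiB: 14.4 },
  { name: 'worker', actualGiB: 60.0 },  // no request declared
  { name: 'cache', actualGiB: 61.2 },   // no request declared
];

const { actual, requested } = ramUsed(pods);
console.log(`Ram used: ${actual.toFixed(1)}GiB of ${requested.toFixed(1)}GiB`);
// "Ram used: 123.6GiB of 14.4GiB"
```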

@linuxshokunin
Author

Sorry, I didn't specify the page properly.

It's on the cluster screen.
It shows the total requested RAM for pods vs. actual usage. Most of the pods don't have resource requests, so I get it now.

Is it possible to exclude pods without resource requests?

@frohikey
Contributor

I think there's too much emphasis on requested resources. I was a little confused at first, but that doesn't change the fact that it's not a very usable metric, IMHO.

Requests are only used when resources are being created in K8s. The scheduler does a simple check: is there enough free memory/CPU for this deployment on a particular node (if requests are specified)? No? Move on to the next node. If there's no free capacity anywhere, you get an error.

It's usually smart to set requests much lower than the actual limits, because of natural spikes in pods under heavy workload. Even omitting them entirely is kind of OK.

Anyway, I like this metric because it motivates you to write proper request definitions. But I think it'd be cool to have usage / limits stats everywhere as well, say with a 90% threshold for the "warning color", because beyond that you're risking OOMKilled. With requests you only risk not being able to deploy new stuff.
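
Roughly, in TypeScript (illustrative only; the names are made up and this isn't the real scheduler or K8Dash code):

```typescript
// All values in MiB; names are illustrative, not real Kubernetes/K8Dash APIs.
interface NodeCapacity {
  allocatableMiB: number;  // what the node can hand out
  requestedMiB: number;    // sum of requests already reserved on it
}

// The scheduling-time check described above: a pod only fits a node if its
// request still fits into the node's unreserved allocatable memory.
function fitsOnNode(node: NodeCapacity, podRequestMiB: number): boolean {
  return node.requestedMiB + podRequestMiB <= node.allocatableMiB;
}

// The suggested "warning color" rule: flag a pod once actual usage crosses
// 90% of its limit, since past that point it risks being OOMKilled.
function nearLimit(actualMiB: number, limitMiB: number, threshold = 0.9): boolean {
  return actualMiB / limitMiB >= threshold;
}

console.log(fitsOnNode({ allocatableMiB: 8192, requestedMiB: 7900 }, 512)); // false -> move on to the next node
console.log(nearLimit(950, 1024)); // true -> ~93% of the limit, show the warning color
```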

@linuxshokunin
Author

Yeah, that's true. I don't strictly need it fixed, but it looks bad when it's red.
Do you know what happens when you fill up a node with pods that have requested CPU/memory and then deploy another pod with no requested CPU/memory to that node?

@frohikey
Contributor

Well, it's still good practice to set it up. Personally, I always configure requests for deployments.

Since one part of the formula is missing, a pod without requests will deploy just fine. There are still hard limits, though, like the maximum number of pods per node; you need to stay within those.

IMHO, the best way to treat this scenario would be not to ignore such pods but to treat them as requests RAM/CPU = actual RAM/CPU, so they contribute 100% to the aggregated metrics (see the sketch below). There should also be a clear indication in the UI that no requests are defined:

Line 1: requests defined
Line 2: no requests defined

[screenshot]
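
Something like this minimal TypeScript sketch of the aggregation (illustrative only, not the actual K8Dash implementation):

```typescript
// Illustrative only; one unit (RAM or CPU) used throughout.
interface PodMetric {
  actual: number;     // current usage
  requested?: number; // undefined when the pod defines no request
}

// Proposed treatment: a pod with no request counts as requested = actual,
// so it contributes exactly 100% to the aggregated "actual vs. requested" chart.
function aggregate(pods: PodMetric[]): { actual: number; requested: number } {
  let actual = 0;
  let requested = 0;
  for (const p of pods) {
    actual += p.actual;
    requested += p.requested ?? p.actual; // missing request -> treat as actual
  }
  return { actual, requested };
}

console.log(aggregate([
  { actual: 300, requested: 500 }, // requests defined
  { actual: 200 },                 // no requests defined -> counts as 200 / 200
])); // { actual: 500, requested: 700 } -> ~71%; request-less pods can no longer push it past 100%
```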

How does it sound?

@herbrandson I can make another weekend PR if it's ok with you 😊

@herbrandson
Collaborator

Yeah, I like that idea. I'd love a PR. Thx!

Also, another thing that I'd love to fit into this UI somehow is configured "limits". Today it's just showing the "requests", but if "limits" are also configured it would be nice to display that as well. I'm still thinking through a good way to display all of this information w/o it becoming so much data that it's hard to mentally parse what's going on.

Also also, WRT the original question, what if we simply added a second line to the description in that graph? Something like...

Ram Used
Actual vs. Requested

That might help at least clarify what's going on. We could do the same thing for the CPU graph. Thoughts @linuxshokunin?

I do have to say that I personally have found displaying the metric of Requested/Actual to be very useful a couple of times already. As soon as I turned it on we spotted a handful of pods that were using a lot more resources than we realized. And, with the latest version of K8Dash you can actually sort pods by these metrics which makes it really easy to see what the "hot" pods are (which I thought was kinda nice).

So, TL;DR I do think this data is useful, but totally agree that it's a bit confusing as it is today. Hopefully a few simple UX tweaks will fix that up :)

Anyhow, don't feel the need to fit all of that into a PR @frohikey. The part you originally suggested is a big enough step forward to be its own PR for sure.

@linuxshokunin
Author

@herbrandson
That sounds good to me, too.

Personally, I would count pods without requests/limits in the chart, because I could use it as a measure of whether or not to add another node.
Anyway, I like @frohikey's approach.

@herbrandson
Collaborator

I pushed an update to the :dev label that addresses some of the low hanging fruit here. Mostly it adds more charts to more views. What does everyone think?

Cluster Overview
[screenshot]

Namespace View
[screenshot]

Node View
[screenshot]

@frohikey
Contributor

Personally, I like it. More graphs feel fine. Since I'm not a big fan of Grafana, it's nice to get a graphical view of the cluster's state quickly. If it feels like too much for anyone, a switch (graphs/table) or something similar can be added in the future.

@linuxshokunin
Author

@herbrandson It looks really good. More graphs don't hurt me either.

@herbrandson
Collaborator

@linuxshokunin Great! I've pushed all the changes above to the :latest branch. There's still an outstanding issue for graphs that display "Actual vs Limits". However, I'm going to go ahead and close this ticket, as I believe the original request has been addressed.

Thanks for all of your feedback on this!
