
[Feat]: ability to "expand by" or similar for charts (e.g. expand by container, mount point etc) #894

Open
andrewm4894 opened this issue Jul 31, 2023 · 4 comments

Comments

@andrewm4894

Problem

Some users have expressed a preference for the old agent dashboard approach of a chart per container etc., so that by default they can easily see metrics split out a bit more by something that makes sense, e.g. by container or mount point.


Description

Some options:

A. Some sort of "expand by" or "split by" option that simply breaks a chart back out into its constituent pieces (see the sketch below).
B. Specific point solutions around custom dashboards, for example a custom dashboard specifically for each container etc., that is also in some way dynamic as more container views are added.
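
Purely as an illustration of option A (not a proposed implementation), here is a rough client-side sketch of what "breaking a chart back out into its constituent pieces" means, using the agent's /api/v1/charts and /api/v1/data endpoints; the chart id (services.cpu) and host URL are placeholders/assumptions.

```python
# Rough sketch only: approximate an "expand by" view client-side by fetching
# each dimension of a merged chart separately. The chart id and host URL are
# placeholders, not part of the proposal.
import requests

NETDATA = "http://localhost:19999"   # assumed agent address
CHART = "services.cpu"               # assumed merged chart to expand

# /api/v1/charts lists every chart together with its dimensions.
chart_meta = requests.get(f"{NETDATA}/api/v1/charts", timeout=10).json()["charts"][CHART]

for dim in chart_meta["dimensions"]:
    # Fetch one dimension at a time, i.e. one "constituent piece" per request.
    resp = requests.get(
        f"{NETDATA}/api/v1/data",
        params={"chart": CHART, "dimensions": dim, "after": -600, "points": 60},
        timeout=10,
    )
    series = resp.json()   # default JSON datasource: {"labels": [...], "data": [...]}
    print(dim, series["labels"])
```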

Importance

really want

Value proposition

  1. better and easier visualizations for users

Proposed implementation

TBD

@andrewm4894

andrewm4894 commented Jul 31, 2023

The ability to set and override chart defaults at the space and room level could be a partial solution here too.

e.g. a user could just set the default group-by for various container charts to "by cgroup" (a rough sketch of what such a setting could look like is below).

#789
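
Purely to illustrate the shape such a setting could take (this is hypothetical, not an existing Netdata Cloud feature or API), a per-room chart-defaults mapping might look something like:

```python
# Hypothetical illustration only: what room-level chart defaults could look
# like if they existed. The room name, chart ids and keys below are made up.
ROOM_CHART_DEFAULTS = {
    "production": {
        "cgroup.cpu": {"group_by": "cgroup"},       # one line per container
        "disk.space": {"group_by": "mount_point"},  # one line per mount point
    },
}

def apply_room_defaults(room: str, chart: str, query: dict) -> dict:
    """Merge the room's saved defaults into a chart query without
    overriding anything the user set explicitly."""
    defaults = ROOM_CHART_DEFAULTS.get(room, {}).get(chart, {})
    return {**defaults, **query}

# The user supplied no group_by, so the room default ("cgroup") applies.
print(apply_room_defaults("production", "cgroup.cpu", {"after": -600}))
```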

@netdata-community-bot

This issue has been mentioned on the Netdata Community Forums. There might be relevant details there:

https://community.netdata.cloud/t/6-years-experience-but-can-not-use-netdata/4994/2

@hugovalente-pm

@Pingger I saw you discussing this feature in a few places; would you be willing to jump on a call with me (a PM from Netdata) and our Product Designer to discuss your use case and expectations in more detail?

@Pingger

Pingger commented Jan 11, 2024

@hugovalente-pm
While getting on a call might be difficult, here are a few relevant things:

  • I am demonstrating based on my private setup (https://netdata.iskariot.info), but I also manage a bigger one for the company I work for.
  • I manage multiple servers and netdata nodes on those servers
  • Those Nodes should be able to be put into groups in the left pane, without a cloud account!
    • e.g. by the tags defined in netdata.conf, or a specific one like "dashboard-groups" (see the sketch after this list)
    • at the moment my private "servers" (2 root-servers and a NAS) are shown in the same graphs as my notebooks, which leads to confusing information about a server suddenly having wifi, until you realize that a notebook got mixed in again
  • I also use netdata on my private Devices (basically everything linux I have, uses netdata)
  • All netdata instances feed into a single "big" netdata instance that holds the stats for the previous ~14 months (currently ~10 GiB of dbengine; the "tiers" update cost me my database!).
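
To make the grouping wish above concrete, here is a rough sketch of the kind of grouping I mean (my own illustration, not a Netdata feature). It assumes each node has a label such as "dashboard-groups" set under the [host labels] section of its netdata.conf, and that the agent's /api/v1/info endpoint exposes those labels as "host_labels"; the node URLs are placeholders.

```python
# Rough illustration only: group nodes client-side by a custom host label.
# Assumptions: a "dashboard-groups" host label is configured on each node,
# and /api/v1/info exposes it under "host_labels". Node URLs are placeholders.
from collections import defaultdict
import requests

NODES = [
    "http://server1:19999",
    "http://nas:19999",
    "http://notebook:19999",
]

groups = defaultdict(list)
for node in NODES:
    info = requests.get(f"{node}/api/v1/info", timeout=10).json()
    label = info.get("host_labels", {}).get("dashboard-groups", "ungrouped")
    groups[label].append(info.get("hostname", node))

for group, hosts in sorted(groups.items()):
    print(f"{group}: {', '.join(hosts)}")
```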

The following mainly boils down to:

  • It is very clunky to get the graphs to be filtered the way you want
  • and the filters don't persist
  • I need more than 1 filter preset / a graph for each filter I want
  • Because of those reasons, I have started to write my own dashboard (which is still in its infancy and thus has quite a lot of hardcoding going on...)

Containers/Cgroups:

  • the nodes are specialised to run specific stuff:
    • Webservers
      • (expected to be) low cpu
      • low ram
      • high network
    • Databases
      • low cpu
      • medium to high ram
      • low network
    • Git/CI
      • low cpu with high spikes
      • low ram with high spikes
      • low network
    • Game-Servers
      • depending on game cpu/ram
      • medium network
    • tor-nodes
      • high cpu
      • medium ram
      • high network
    • DNS-Server (can be wrapped into webservers)
      • low cpu
      • low ram
      • low network
    • Backup infrastructure
      • low cpu
      • low ram
      • SHITLOAD of network
  • Everything in those groups is inside a linux container/cgroup
  • Those types/groups I would like to be displayed distinct from each other, so that I can, without having to change anything upon loading the page, compare databases to databases, webservers to webservers, and definitely NOT webservers to game servers (see the sketch after this list).
    • For that it would be nice to be able to flag containers/cgroups (like you can add custom netdata flags for netdata instances)
    • Alternatively to just be able to use the WebUI, WITHOUT A CLOUD ACCOUNT!, to configure how to split groups apart.
  • Also I'd like to have a similar graph for networking, and not a "Total" gauge that just sums the traffic.
    (screenshots)
  • There was already some improvement on this graph, but it is still somewhat confusing ... note the "13 more values" on 2 of the graphs and "11 more values" on the other one. That makes no sense.
  • adding netdata to each and every container is not feasible, as that overhead would add up. netdata is very resource friendly, but the idle usage I observed is 10-50% of a core (average of 25%); multiply that by sometimes more than 20 containers and you see quite an impact.
    (screenshot)
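
As a stopgap for the flagging/grouping wish above, a rough client-side sketch of my workaround idea (not a Netdata feature): bucket the per-container cgroup charts into roles via name patterns, assuming the agent exposes one chart family per container with ids like cgroup_<name>.cpu; the role patterns below are examples.

```python
# Rough workaround sketch: bucket per-container charts into roles by name,
# so that like can be compared to like. Assumes chart ids of the form
# "cgroup_<name>.cpu"; the role patterns and host URL are examples/assumptions.
import re
import requests

NETDATA = "http://localhost:19999"   # assumed agent address

ROLE_PATTERNS = {
    "webservers":  re.compile(r"^cgroup_(nginx|web).*\.cpu$"),
    "databases":   re.compile(r"^cgroup_(postgres|mysql|db).*\.cpu$"),
    "gameservers": re.compile(r"^cgroup_(minecraft|game).*\.cpu$"),
}

charts = requests.get(f"{NETDATA}/api/v1/charts", timeout=10).json()["charts"]

for role, pattern in ROLE_PATTERNS.items():
    members = sorted(c for c in charts if pattern.match(c))
    print(role, "->", members or "no matching containers")
```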

Similar issues:

  • Systemd services are currently ALL merged into a single graph. The usefulness of that single graph is exactly 0.
    (screenshot)
    • How many services are active? (in the screenshot 4? Of the few hundreds that are actually running across all nodes?!)
      (screenshot of a single node, one of many, feeding that graph)
    • The CPU/RAM/...-Graphs for the systemd-units have the same issue the cgroups have.
      • systemd base unit files should all be low cpu/ram/net...
      • some services, e.g. the vpn-client, are instead low cpu/ram but high net
    • Some graphs just don't show some information for no apparent reason:
      (screenshot)
    • I would like to group the systemd units in a similar manner to the containers
  • (Hard-)Drives and mount-points: A summary graph is fine, but I'd also like to have each drive/partition/volume by itself.
  • Network interfaces: same, but in addition an up/down listing in the summary graphs would be nice, like the cgroups have for CPU and RAM

Other issues I have noticed:

  • the health notifications sent to root via the system mail command ignore the delay rules and fire instantly instead, sometimes causing quite a spam of mails.
  • dbengine tiers VERY often lose data for entire weeks or even months! (which is why I disabled those)
  • A way to configure the time selector at the top to always default to "force play".
  • The Dashboard pausing while hovering a graph is just plain annoying and should also be configurable
  • health-configs can't be properly debugged. There is no apparent log or method to find out why a specific alarm doesn't register with a chart, or whether there are syntax errors (partial workaround sketch after this list).
  • plugin configs should all be by themselves! (e.g. cgroups is configured in netdata.conf, while go has its own config); netdata.conf should be responsible only for netdata, not for every tiny subsetting of the plugins! On a clean installation it is 741 lines ... most of it being the proc plugin, with commented-out settings that should be put into a proc.d folder instead.
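
For the health-config debugging point, a partial workaround sketch (my own idea, not an official debugging method): list which alarms the agent actually registered and on which charts via /api/v1/alarms, then compare against what the health config defines. The "all" parameter is assumed to also return alarms that are configured but not currently raised.

```python
# Partial debugging aid, sketch only: list the alarms the agent actually
# registered, grouped by chart. If an alarm from a health config is missing
# here, it did not attach to any chart (the reason still has to be hunted
# down by hand). Assumes /api/v1/alarms accepts an "all" parameter.
from collections import defaultdict
import requests

NETDATA = "http://localhost:19999"   # assumed agent address

alarms = requests.get(f"{NETDATA}/api/v1/alarms", params={"all": ""}, timeout=10).json()

by_chart = defaultdict(list)
for name, alarm in alarms.get("alarms", {}).items():
    by_chart[alarm.get("chart", "?")].append(name)

for chart in sorted(by_chart):
    print(chart, "->", ", ".join(sorted(by_chart[chart])))
```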

I'll try to keep this comment updated and with a changelog as issues/ideas arise, for the coming week or so.

Changelog:

  • 2024-01-11 21:31 fixed typos, reordered a few points, because my jumping around while writing didn't help readability
  • 2024-01-12 11:22 Added health-config debugging note
  • 2024-01-12 17:30 Added netdata.conf size and inconsistency grievances
