Bug report summary
The streaming docs have oversights.
OS / Environment
Remotely: https://learn.netdata.com
Locally:
Netdata version
netdata v1.19.0, but I'm reading the online docs as of 9ba79d3, which has already confused me a couple of times (e.g. `dbengine multihost disk space` doesn't exist in 1.19, only `dbengine disk space`).
Component Name
streaming, docs
🧶 🧶 🐈
I tried to follow https://learn.netdata.cloud/docs/agent/streaming#database-replication but got detoured several times.
I have suggestions for clarifications in these docs:
The docs covering "archiving" mention everything but netdata as an option. I'm not interested in compliance archives, but I do want some amount of archiving: I need to review infrastructure patterns to debug glitches. And I don't want to have to learn and maintain a second set of software; I would rather use netdata everywhere.
If I just set up a netdata node with a very large `dbengine disk space`, shouldn't it be able to function like an archive? The way this section is phrased makes me think there's some reason it is impossible for netdata to retain a child node's data after the child node stops sending it, which I know is not true because of "Netdata in master/slave deployment losing metrics after unsolicited restarts across server estate (caused by cron daily)" #7303 and "Netdata in master/slave deployment losing metrics after restart" #7360, and because I've produced such a situation myself.
I had to figure it out by reading between the lines from e.g.
> The child and the parent may have different data retention policies for the same metrics.
> Any number of daisy chaining Netdata servers are supported, each with or without a database and with or without alarms for the child metrics.
and
> A proxy, which receives metrics from other hosts and pushes them immediately to other Netdata servers. Netdata proxies can also be `store and forward proxies` meaning that they are able to maintain a local database for all metrics passing through them (with or without alarms).
(and by the way, why is "`store and forward proxies`" code-quoted there?)
The docs never define "ephemeral" very well. How does a parent server know that a child server is ephemeral? What counts as ephemeral? Are my prone-to-crashing servers ephemeral?
My guess is that netdata doesn't define it in code, instead letting newer data naturally replace older data, and thus gradually forgetting "ephemeral" nodes. Is that right? It would help us all if that was clearer in the docs.
The documentation about the global retention options is vague:
> `delete obsolete charts files` (default = `yes`): See monitoring ephemeral containers, also affects the deletion of files for obsolete dimensions
> `delete orphan hosts files` (default = `yes`): Set to `no` to disable non-responsive host removal.
This doesn't mention that these options are key to making streaming work reliably (see the comments on #7360, "Netdata in master/slave deployment losing metrics after restart", and on #7303, "Netdata in master/slave deployment losing metrics after unsolicited restarts across server estate (caused by cron daily)"), or really give any clue about what they do. The link just talks about containers, which are a niche sub-case of the more general metric retention rules. It's all pretty opaque to me right now. And from what I can tell, these options partially contradict my assumption about what "ephemeral" means: on their default setting, a parent that is shutting down immediately deletes logs for servers that are not currently connected.
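For anyone landing here with the same problem, this is roughly what overriding the two retention options, `delete obsolete charts files` and `delete orphan hosts files`, would look like on a parent. It is only a sketch: it assumes both options belong in the `[global]` section of `netdata.conf` (as the 1.19 configuration docs list them), and it does not claim that this alone fixes the metric loss discussed in #7303 and #7360.

```conf
# netdata.conf on the parent (sketch; section placement is an assumption)
[global]
    # Both options default to yes. Setting them to no asks netdata to keep
    # files for charts/dimensions and for hosts that are no longer reporting.
    delete obsolete charts files = no
    delete orphan hosts files = no
```

Whether keeping these files is desirable depends on how "ephemeral" the children really are, which is the part the docs leave undefined.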
The separate "replication" and "proxies" section headers here, and the follow-up table here ("headless collector", "headless proxy", "proxy with db", "central netdata"), made me think these were all separate modes netdata can run in. They're not: they're orthogonal features that can be combined.
The default `stream.conf` that I got from `/etc/netdata/edit-config stream.conf` was configured to use `memory mode = save`; I think this is confusing because https://learn.netdata.cloud/docs/agent/streaming#database-replication says to use `dbengine`, and implies that `dbengine` covers the data for child nodes too. I replaced that with `default memory mode = dbengine` in my `stream.conf`, but I don't have a good way to check if it worked beyond looking at what kinds of files netdata has open.
With `default memory mode = save` and `history = 3600`, the retention period of child nodes is easy to understand, but with `default memory mode = dbengine` it's a lot more opaque. It would help if the streaming docs addressed this: how does `dbengine`'s space get shared out amongst child nodes? What happens if there's a tree of nodes like in your diagrams: does the space get bucketed according to what node the data came from, or does each leaf node get an equal share in the central collector?
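To make the setup in question concrete, here is a sketch of the parent-side configuration described above. `API_KEY` is a placeholder for whatever key the children stream with, and the `256` value for `dbengine disk space` is only an illustrative number; neither comes from my actual files.

```conf
# stream.conf on the parent (sketch; API_KEY is a placeholder)
[API_KEY]
    enabled = yes
    # Stock file had "save" here; changed to dbengine per the streaming docs.
    default memory mode = dbengine

# netdata.conf on the parent (illustrative value, in MiB)
[global]
    memory mode = dbengine
    dbengine disk space = 256
```

With this in place, the open question stands: the config gives one disk space figure, and nothing in the streaming docs says how that figure relates to the data of each child.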