Check for chart obsoletion on children re-connections#12707
Check for chart obsoletion on children re-connections#12707MrZammler merged 2 commits intonetdata:masterfrom
Conversation
|
Hello @MrZammler , I was reading the description, and I think we really need to obsolete dimensions. It is good to remember that when Best regards! |
Yes indeed. We need to check. Do you agree to check and do this on another PR? Let's cover the charts obsoletion here, and I'll need to check for dimensions in another one (and investigate since I'm not 100% sure). |
| rpt->host->hostname); | ||
| } | ||
| } | ||
| rpt->host->trigger_chart_obsoletion_check = now_realtime_sec(); |
There was a problem hiding this comment.
Don't you want to set this after the parser terminates?
There was a problem hiding this comment.
Hmm, sorry, not sure what you mean? I set it to 0 after the parser terminates... We basically want to trigger the check when the child connects. When the check happens, it will be set to 0 and will not retry, or if the child disconnects.
Yes, I do not have objections for this idea. |
odynik
left a comment
There was a problem hiding this comment.
Set-up a parent (dbengine) <-> child (ram) arch. I removed some plugins before the restart and I could get the alarms (REMOVED) on local agent dashboard and on the cloud.
|
Thanks!! |
Summary
When a child is connected and streams to a parent, the parent has a set of the streamed charts from the child. However, if the child at some point disconnects, and during that disconnection time a chart it has disappear, then on re-connection, the parent will have no information that that specific chart has been gone.
If a chart is gone during the child being connected to the parent, the child will inform the parent about that chart's obsoletion, but not in the above case.
If in addition, there is a raised alert on the gone chart, then the alert will remain on the parent, until the parent restarts. The same alert will also remain on the cloud since there won't be any event after that.
This PR adds a small check on the parent, after a child is connected. It will go through the charts for this host, and if some are not updated after the child's connection, it will set them as OBSOLETE. The check will happen 2 minutes after the child's connection.
We might, or not need to do the same for separate dimensions...
Also the check will run on first child's connect, when it's not really needed. If there is an easy way to check if it is a fresh (for this parent's session) connection, we can skip it. But it shouldn't make much difference.
With this PR, a
continuestatement when a REMOVED event is created because of chart obsoletion, is removed. This is to allow to mark thercas notRRDCALC_FLAG_RUNNABLEto prevent it from being calculated.Test Plan
Tests can be conducted by examining what happens to a raised alert when this situation occurs.
Without this PR, the raised alert will remain on the agent.
With this PR, after a while, the alert will be removed from the parent (and a REMOVED event will be transmitted to the cloud). If the chart re-appears then the alert should be raised again normally.
Additional Information
For users: How does this change affect me?