ACLK: Implemented Last Will and Testament #8410
Test this branch here: https://stelfraglast-will-testament-83-fy4pe.squash.io
When I press CTRL+C I get a SEGV.
It seems to be related to pressing CTRL+C when the node is not claimed.
When we are stuck waiting for claiming (unclaimed agent) and you call _link_event_loop from aclk_main_cleanup (by hitting CTRL+C), mosquitto is not initialized yet, causing a SEGV.
Thanks Timo, will fix!
Added a check to make sure the agent is claimed.
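The fix amounts to skipping link teardown on an unclaimed agent, since it is still waiting for claiming and has never created an MQTT client. A minimal sketch, assuming a hypothetical aclk_state struct, a hypothetical link_event_loop() stand-in and a hypothetical aclk_cleanup_sketch() entry point; the real aclk_main_cleanup() keeps this state in its own globals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state aclk_main_cleanup() cares about. */
struct aclk_state {
    bool claimed;   /* the agent has been claimed */
    void *mosq;     /* MQTT client handle; only created after claiming */
    int polls;      /* how many times the event loop was driven */
};

/* Hypothetical stand-in for driving the MQTT event loop.
 * The real loop would dereference s->mosq, so it must not run unclaimed. */
static void link_event_loop(struct aclk_state *s)
{
    s->polls++;
}

/* Cleanup with the claim check: an unclaimed agent has no link to flush. */
void aclk_cleanup_sketch(struct aclk_state *s)
{
    if (!s->claimed)
        return;
    link_event_loop(s);
}
```

With the check in place, CTRL+C on an unclaimed agent returns from cleanup without touching the event loop.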
Some warnings:
CC aclk/agent_cloud_link.o
aclk/agent_cloud_link.c: In function ‘aclk_main_cleanup’:
aclk/agent_cloud_link.c:962:9: warning: implicit declaration of function ‘aclk_lws_wss_mqtt_layer_disconect_notif’ [-Wimplicit-function-declaration]
962 | aclk_lws_wss_mqtt_layer_disconect_notif();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CC aclk/mqtt.o
aclk/mqtt.c: In function ‘_link_set_lwt’:
aclk/mqtt.c:271:19: warning: implicit declaration of function ‘get_topic’ [-Wimplicit-function-declaration]
271 | final_topic = get_topic(sub_topic, topic, ACLK_MAX_TOPIC);
| ^~~~~~~~~
aclk/mqtt.c:271:17: warning: assignment to ‘char *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
271 | final_topic = get_topic(sub_topic, topic, ACLK_MAX_TOPIC);
| ^
CC aclk/aclk_lws_wss_client.o
aclk/mqtt.c:271:19: warning: type of ‘get_topic’ does not match original declaration [-Wlto-type-mismatch]
271 | final_topic = get_topic(sub_topic, topic, ACLK_MAX_TOPIC);
| ^
aclk/agent_cloud_link.c:469:7: note: return value type mismatch
469 | char *get_topic(char *sub_topic, char *final_topic, int max_size)
| ^
aclk/agent_cloud_link.c:469:7: note: ‘get_topic’ was previously declared here
aclk/agent_cloud_link.c:469:7: note: code may be misoptimized unless ‘-fno-strict-aliasing’ is used
The unclaimed-agent SEGV appears to be fixed, and both test cases with a claimed agent seem to work. I will approve once the compiler warnings and related problems are fixed.
Added functions to clear the compiler warnings.
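The warnings above come from calling functions with no prototype in scope: GCC then assumes get_topic() returns int, which triggers the -Wint-conversion and -Wlto-type-mismatch follow-ups. A minimal sketch of the usual fix, assuming the prototype is made visible to mqtt.c (for example through a shared header). The signature is the one shown in the warning output; ACLK_MAX_TOPIC's value, EXAMPLE_TOPIC_PREFIX, the function body and build_lwt_topic() are hypothetical. The same approach applies to aclk_lws_wss_mqtt_layer_disconect_notif().

```c
#include <stdio.h>
#include <string.h>

#define ACLK_MAX_TOPIC 255                    /* hypothetical value */
#define EXAMPLE_TOPIC_PREFIX "/agent/example" /* hypothetical topic prefix */

/* The prototype mqtt.c was missing. With it in scope, GCC no longer
 * assumes an implicit int return type for get_topic(). */
char *get_topic(char *sub_topic, char *final_topic, int max_size);

/* Hypothetical body: writes "<prefix>/<sub_topic>", or returns NULL if
 * the result would not fit in max_size bytes (including the NUL). */
char *get_topic(char *sub_topic, char *final_topic, int max_size)
{
    size_t needed = strlen(EXAMPLE_TOPIC_PREFIX) + 1 + strlen(sub_topic) + 1;

    if (max_size <= 0 || needed > (size_t)max_size)
        return NULL;
    snprintf(final_topic, (size_t)max_size, "%s/%s", EXAMPLE_TOPIC_PREFIX, sub_topic);
    return final_topic;
}

/* Hypothetical caller in the style of _link_set_lwt(): the returned
 * pointer is now a real char *, not an int converted to a pointer. */
char *build_lwt_topic(char *buffer)
{
    return get_topic("outbound/meta", buffer, ACLK_MAX_TOPIC);
}
```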
Now probing the lws layer to call the event loop enough times
… triggered by the broker
…a #define for clarity * Clear warning errors during compile
… not been established
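The commit that probes the lws layer to call the event loop "enough times" suggests a bounded drain: keep running the event loop while the transport still has data queued (libmosquitto's mosquitto_want_write(), which appears in the valgrind trace below, answers exactly that question), but give up after a fixed number of passes. A minimal sketch under that assumption; fake_transport, event_loop_step(), wants_write(), drain_link() and LINK_DRAIN_MAX_ITERATIONS are hypothetical stand-ins, not the PR's code.

```c
#include <stdbool.h>

#define LINK_DRAIN_MAX_ITERATIONS 50 /* hypothetical upper bound */

/* Hypothetical transport: bytes still queued for the socket. */
struct fake_transport {
    int pending_bytes;
    int bytes_per_iteration; /* how much one event-loop pass can flush */
};

/* One event-loop pass: flush up to bytes_per_iteration queued bytes. */
static void event_loop_step(struct fake_transport *t)
{
    t->pending_bytes -= t->bytes_per_iteration;
    if (t->pending_bytes < 0)
        t->pending_bytes = 0;
}

/* Plays the role of mosquitto_want_write(): is anything left to send? */
static bool wants_write(const struct fake_transport *t)
{
    return t->pending_bytes > 0;
}

/* Run the loop until the queue is empty or max_iterations is reached.
 * Returns the number of passes used, or -1 if data is still pending. */
int drain_link(struct fake_transport *t, int max_iterations)
{
    int i;

    for (i = 0; i < max_iterations && wants_write(t); i++)
        event_loop_step(t);

    return wants_write(t) ? -1 : i;
}
```

The bound keeps shutdown from hanging on a dead connection, at the cost of possibly dropping the final message.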
Branch force-pushed from 5769ea7 to c72f825.
It segfaults when I hit CTRL+C at random points during the start-up / pop-corning phase:
==12081== Process terminating with default action of signal 11 (SIGSEGV)
==12081== Access not within mapped region at address 0x80
==12081== at 0x203B97: mosquitto_want_write (in /home/amoss/netdata/dev-install3/netdata/usr/sbin/netdata)
==12081== by 0x1704FB: UnknownInlinedFun (mqtt.c:183)
==12081== by 0x1704FB: UnknownInlinedFun (mqtt.c:227)
==12081== by 0x1704FB: aclk_main_cleanup (agent_cloud_link.c:982)
==12081== by 0x1728B7: aclk_main (agent_cloud_link.c:1293)
==12081== by 0x1E3B34: thread_start (threads.c:170)
==12081== by 0x523FFA2: start_thread (pthread_create.c:486)
==12081== by 0x53544CE: clone (clone.S:95)
OK, will take a look ASAP.
Here is another one (might not be due to this PR):
Caused by the internal mosq structure being destroyed and set to NULL. When the cleanup code is called, netdata_exit is not necessarily set to 1 yet, which is what the link teardown code assumed.
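The valgrind trace (a read at address 0x80 inside mosquitto_want_write()) is consistent with dereferencing a NULL client pointer. The lesson is to guard on the pointer that will actually be dereferenced, not on a shutdown flag that may lag behind. A minimal sketch contrasting the two checks; link_state, safe_to_poll_by_flag() and safe_to_poll() are hypothetical names, and in the agent the flag and the handle live in separate globals.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link state. */
struct link_state {
    int netdata_exit; /* shutdown flag; may still be 0 during cleanup */
    void *mosq;       /* internal client; set to NULL once destroyed */
};

/* The assumption that crashed: "not exiting" implies "client alive". */
bool safe_to_poll_by_flag(const struct link_state *s)
{
    return s->netdata_exit == 0;
}

/* Check the handle that mosquitto_want_write() and friends would read. */
bool safe_to_poll(const struct link_state *s)
{
    return s->mosq != NULL;
}
```

In the racy case (client already freed, flag not yet set) the flag-based check says "go ahead" while the pointer check correctly refuses.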
Re-tested. Both cases work. LGTM.
* Added support for Last Will and Testament to the ACLK
* On normal agent shutdown an alternate "graceful shutdown" message is published
Fixes #8369
Summary
Defines a Last Will and Testament (LWT) message that the broker will send to the outbound/meta topic if an unexpected (agent or link) failure occurs.
When the agent performs a normal shutdown, an MQTT disconnect packet tells the broker not to send the LWT message. In that case the agent instead transmits a different disconnect message indicating a graceful agent shutdown.
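Both messages share the JSON shape shown in the Test Plan and differ only in the payload field. A minimal sketch of building them, assuming the caller supplies the msg-id and timestamp; build_disconnect_msg() is a hypothetical helper, not the PR's code. For the "unexpected" variant, libmosquitto provides mosquitto_will_set(mosq, topic, payloadlen, payload, qos, retain), which must be called before connecting; whether the PR calls it directly or through a wrapper is not shown here.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Builds a disconnect message matching the Test Plan examples.
 * reason must be "graceful" (published on a normal shutdown) or
 * "unexpected" (registered as the LWT for the broker to publish).
 * Returns the message length, or -1 on bad input or truncation. */
int build_disconnect_msg(char *out, size_t out_size, const char *msg_id,
                         time_t timestamp, const char *reason)
{
    int n;

    if (strcmp(reason, "graceful") != 0 && strcmp(reason, "unexpected") != 0)
        return -1;

    n = snprintf(out, out_size,
                 "{ \"type\" : \"disconnect\", \"version\" : 1, \"msg-id\" : \"%s\", "
                 "\"timestamp\" : %lld, \"payload\" : \"%s\" }",
                 msg_id, (long long)timestamp, reason);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

The LWT payload has to be fixed at connect time, so its timestamp reflects when the link came up rather than when the failure happened.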
Component Name
ACLK
Test Plan
Graceful agent shutdown
Subscribe to the outbound/meta topic.
Send a kill signal to the agent.
Expect a message on outbound/meta similar to { "type" : "disconnect", "version" : 1, "msg-id" : "e7558504-eb6f-4bdf-9700-df972e0940e6", "timestamp" : 1584372301, "payload" : "graceful" }
Abnormal agent shutdown
Subscribe to the outbound/meta topic.
Send a kill -9 signal to the agent.
Expect a message on outbound/meta similar to { "type" : "disconnect", "version" : 1, "msg-id" : "8ecde43f-5e97-4443-b314-5abab793a8d7", "timestamp" : 1584374934, "payload" : "unexpected" }
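The two test cases hinge on which signal is sent: plain kill delivers SIGTERM, which the agent can handle and shut down normally, while kill -9 delivers SIGKILL, which cannot be caught, so no disconnect packet is sent and the broker publishes the LWT. A minimal sketch of checking a captured outbound/meta message against that expectation; classify_disconnect() and expected_kind() are hypothetical helpers, and the substring match is a crude stand-in for real JSON parsing.

```c
#include <signal.h>
#include <string.h>

enum disconnect_kind { DISCONNECT_UNKNOWN, DISCONNECT_GRACEFUL, DISCONNECT_UNEXPECTED };

/* Crude classification of a message captured on outbound/meta. */
enum disconnect_kind classify_disconnect(const char *msg)
{
    if (strstr(msg, "\"disconnect\"") == NULL)
        return DISCONNECT_UNKNOWN;
    if (strstr(msg, "\"payload\" : \"graceful\"") != NULL)
        return DISCONNECT_GRACEFUL;
    if (strstr(msg, "\"payload\" : \"unexpected\"") != NULL)
        return DISCONNECT_UNEXPECTED;
    return DISCONNECT_UNKNOWN;
}

/* Expected outcome for the two signals used in the Test Plan. */
enum disconnect_kind expected_kind(int sig)
{
    if (sig == SIGTERM)
        return DISCONNECT_GRACEFUL;   /* agent runs its normal shutdown */
    if (sig == SIGKILL)
        return DISCONNECT_UNEXPECTED; /* uncatchable: broker sends the LWT */
    return DISCONNECT_UNKNOWN;
}
```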
Additional Information