ACLK new cloud architecture new TBEB #10941

Merged
merged 12 commits into from Apr 26, 2021

underhood
Contributor

@underhood underhood commented Apr 8, 2021

Summary

Parses error messages from the cloud and honors the backoff returned by the cloud.
Implements the new TBEB and uses parameters from the env endpoint.

Component Name

ACLK-NG

Test Plan
  • Error messages can be tested by starting the agent a few times and killing it quickly. Eventually, the password endpoint will return 409 with a 1-minute backoff, i.e. you will get rate limited by the cloud for a minute or two.
  • The rest can be tested by checking that the parameters from the cloud are used to compute the backoff.
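
To make the first test case concrete, here is a minimal sketch of what "honoring the backoff returned by cloud" could look like. It is not the code from this PR: `struct cloud_error`, `retry_after_s`, and `effective_wait_ms` are hypothetical names, and taking the larger of the two delays is an assumed policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of parsing a cloud error response (not netdata's type). */
struct cloud_error {
    int     http_status;    /* e.g. 409 from the password endpoint */
    int64_t retry_after_s;  /* backoff requested by the cloud; <= 0 if none was given */
};

/*
 * Assumed policy: never reconnect sooner than the cloud asked for,
 * but keep the locally computed delay when it is already longer.
 */
static uint64_t effective_wait_ms(const struct cloud_error *err, uint64_t local_wait_ms)
{
    if (err == NULL || err->retry_after_s <= 0)
        return local_wait_ms;

    uint64_t cloud_ms = (uint64_t)err->retry_after_s * 1000;
    return cloud_ms > local_wait_ms ? cloud_ms : local_wait_ms;
}
```

With a 409 carrying a 60-second backoff and a local delay of 1293 ms, this sketch would wait 60000 ms before the next attempt.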
Additional Information

@vkalintiris
Contributor

Also, what does the TBEB acronym stand for?

@underhood
Contributor Author

underhood commented Apr 22, 2021

TBEB is truncated binary exponential backoff, although we diverged from the exact definition of that algorithm :D I guess it is only TEB now or something.
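
For readers unfamiliar with the term, the following is a sketch of the textbook truncated binary exponential backoff, not the variant merged in this PR (which, as noted above, departs from the exact definition). The struct, field, and function names are illustrative assumptions, and the random value is passed in by the caller so the behavior is easy to check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters; in this PR the real ones come from the env endpoint. */
struct backoff_params {
    uint64_t slot_ms;  /* duration of one backoff slot in milliseconds */
    unsigned max_exp;  /* truncation point: the exponent stops growing here */
};

/* Largest number of slots (inclusive) to wait after `failures` consecutive failures. */
static uint64_t tbeb_max_slots(const struct backoff_params *p, unsigned failures)
{
    assert(p->max_exp < 64); /* keep the shift below well defined */
    unsigned exp = failures < p->max_exp ? failures : p->max_exp;
    return (UINT64_C(1) << exp) - 1; /* 2^exp - 1 */
}

/*
 * Map a caller-supplied random value onto [0, max_slots] slots.
 * The modulo introduces a slight bias, which is acceptable for a sketch.
 */
static uint64_t tbeb_delay_ms(const struct backoff_params *p, unsigned failures, uint64_t rnd)
{
    uint64_t max_slots = tbeb_max_slots(p, failures);
    return (rnd % (max_slots + 1)) * p->slot_ms;
}
```

With `slot_ms = 1000` and `max_exp = 5`, the upper bound grows as 0, 1, 3, 7, 15, 31 slots and then stays at 31 however many further failures occur; that cap is the "truncated" part.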

@stelfrag
Collaborator

stelfrag commented Apr 23, 2021

The connection is established, then after a while I kill the TCP connection and wait for the reconnection.

It seems to go into a loop:

2021-04-23 10:24:27: netdata ERROR : ACLK_Query_4 : ACLK version negotiation failed. No reply to "hello" with "version" from cloud in time of 3s. Reverting to default ACLK version of 2.
2021-04-23 10:41:12: netdata ERROR : ACLK_Main : Connection Error or Dropped (errno 104, Connection reset by peer)
2021-04-23 10:41:12: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 0.000 seconds
2021-04-23 10:41:12: netdata INFO  : ACLK_Main : Attempting connection now
2021-04-23 10:41:13: netdata ERROR : ACLK_Main : SSL_read Err: Unknown!!! (errno 104, Connection reset by peer)
2021-04-23 10:41:13: netdata ERROR : ACLK_Main : Error reading or parsing response from server
2021-04-23 10:41:13: netdata ERROR : ACLK_Main : Couldn't process request
2021-04-23 10:41:13: netdata ERROR : ACLK_Main : Error trying to contact env endpoint
2021-04-23 10:41:13: netdata ERROR : ACLK_Main : Failed to Get ACLK environment
2021-04-23 10:41:13: netdata INFO  : ACLK_Main : Wait before attempting to reconnect in 1.293 seconds
2021-04-23 10:41:14: netdata INFO  : ACLK_Main : Attempting connection now
2021-04-23 10:41:14: netdata ERROR : ACLK_Main : SSL_write Err: Unknown!!! (errno 104, Connection reset by peer)
2021-04-23 10:41:14: netdata ERROR : ACLK_Main : Couldn't write HTTP request header into SSL connection
2021-04-23 10:41:14: netdata ERROR : ACLK_Main : Couldn't process request
2021-04-23 10:41:14: netdata ERROR : ACLK_Main : Error trying to contact env endpoint
 : 
 : 

trimmed message to avoid noise

@underhood
Contributor Author

underhood commented Apr 23, 2021

@stelfrag I will have to investigate that. It does not seem related to this PR; it is probably a bug in the HTTPS client itself.

We seem to be kicked from the cloud side right after connecting, for some reason:

2021-04-23 10:43:36: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: Websocket Connection Accepted By Server
2021-04-23 10:43:37: netdata INFO  : ACLK_Main : [mqtt_wss] I: ws_client: WebSocket server closed the connection with EC=1000. Without message.

Collaborator

@stelfrag stelfrag left a comment

Approving this as the reconnection error is not related to this PR.

@underhood underhood merged commit 690df2d into netdata:master Apr 26, 2021
@underhood underhood deleted the TBEB_updates branch April 26, 2021 08:32

3 participants