Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance HA take/hand over log messages #688

Closed
yhabteab opened this issue Mar 11, 2024 · 0 comments · Fixed by #692
Closed

Enhance HA take/hand over log messages #688

yhabteab opened this issue Mar 11, 2024 · 0 comments · Fixed by #692
Assignees
Labels
area/ha enhancement New feature or request
Milestone

Comments

@yhabteab
Copy link
Member

Describe the bug

The Handing over and Taking over log message should include the reason, why it's handing/taking over. Additionally, we should also include the environment ID in these messages.

@yhabteab yhabteab added enhancement New feature or request area/ha labels Mar 11, 2024
@yhabteab yhabteab added this to the 1.1.2 milestone Mar 11, 2024
oxzi added a commit that referenced this issue Mar 12, 2024
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional logging, indicating the reasoning
for either taking over or handing over the HA responsibility.

Next to additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with a reason.

This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
on first glance, I renamed it to routineLogTicker.

While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.

Closes #688.
oxzi added a commit that referenced this issue Mar 14, 2024
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional logging, indicating the reasoning
for either taking over or handing over the HA responsibility.

Next to additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with a reason.

This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
on first glance, I renamed it to routineLogTicker.

While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.

Closes #688.
oxzi added a commit that referenced this issue Mar 14, 2024
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional logging, indicating the reasoning
for either taking over or handing over the HA responsibility.

Next to additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with a reason.

This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
on first glance, I renamed it to routineLogTicker.

While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.

Closes #688.
oxzi added a commit that referenced this issue Mar 19, 2024
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional debug logging, indicating the
reasoning for either taking over or handing over the HA responsibility.

Next to additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with a reason.

This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
on first glance, I renamed it to routineLogTicker.

While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.

Additionally, the error handling of the SQL query selecting active
Icinga DB instances was changed slightly to also handle wrapped
sql.ErrNoRows errors.

Closes #688.
oxzi added a commit that referenced this issue Mar 20, 2024
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional debug logging, indicating the
reasoning for either taking over or handing over the HA responsibility.

An antagonistic logging message to "Another instance is active" was
introduced, indicating that "[n]o other instance is active, considering
[itself] as active". However, as both those messages are containing
debug information, their log level was degraded to DEBUG. This allowed
removing the whole shouldLog logic.

Next to additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with reason.

While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.

Additionally, the error handling of the SQL query selecting active
Icinga DB instances was changed slightly to also handle wrapped
sql.ErrNoRows errors.

Closes #688.
oxzi added a commit that referenced this issue Mar 25, 2024
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional debug logging, indicating the
reasoning for either taking over or handing over the HA responsibility.

An antagonistic logging message to "Another instance is active" was
introduced, indicating that "[n]o other instance is active, considering
[itself] as active".

Next to additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with reason.

This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
on first glance, I renamed it to routineLogTicker.

While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.

Additionally, the error handling of the SQL query selecting active
Icinga DB instances was changed slightly to also handle wrapped
sql.ErrNoRows errors.

Closes #688.
oxzi added a commit that referenced this issue Mar 27, 2024
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional debug logging, indicating the
reasoning for either taking over or handing over the HA responsibility.

First, some logic was moved from the SQL query selecting active Icinga
DB instances to Go code. This allowed distinguishing between no
available responsible instances and responsible instances with an
expired heartbeat.

However, as both Icinga DB instances are still being active during a
handover, it might happen that the handing over happens slightly after a
forceful take over. Thus, a grace period was added to the timeout, only
being used when waiting for other instances.

Next to the additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with reason.

This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
on first glance, I renamed it to routineLogTicker.

While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.

Additionally, the error handling of the SQL query selecting active
Icinga DB instances was changed slightly to also handle wrapped
sql.ErrNoRows errors.

Closes #688.
oxzi added a commit that referenced this issue Mar 27, 2024
The reason for a switch in the HA roles was not always directly clear.
This change now introduces additional debug logging, indicating the
reasoning for either taking over or handing over the HA responsibility.

First, some logic was moved from the SQL query selecting active Icinga
DB instances to Go code. This allowed distinguishing between no
available responsible instances and responsible instances with an
expired heartbeat.

However, as both Icinga DB instances are still being active during a
handover, it might happen that the handing over happens slightly after a
forceful take over. Thus, a grace period was added to the timeout, only
being used when waiting for other instances.

As the old code indicated a takeover on the fact that no other instance
is active, it will now additionally check if it is already being the
active/responsible node. In this case, the takeover logic - which will
be interrupted at a later point as the node is already responsible - can
be skipped.

Next to the additional logging messages, both the takeover and handover
channel are now transporting a string to communicate the reason instead
of an empty struct{}. By doing so, both the "Taking over" and "Handing
over" log messages are enriched with reason.

This also required a change in the suppressed logging handling of the
HA.realize method, which got its logging enabled through the shouldLog
parameter. Now, there are both recurring events, which might be
suppressed, as well as state changing events, which should be logged.
Therefore, and because the logTicker's functionality was not clear to me
on first glance, I renamed it to routineLogTicker.

While dealing with the code, some function signature documentation were
added, to ease both mine as well as the understanding of future readers.

Additionally, the error handling of the SQL query selecting active
Icinga DB instances was changed slightly to also handle wrapped
sql.ErrNoRows errors.

Closes #688.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ha enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants