Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a single health log table #15157

Merged
merged 35 commits into from Jun 21, 2023
Merged

Conversation

MrZammler
Copy link
Contributor

@MrZammler MrZammler commented Jun 7, 2023

Summary

This PR changes health to use a single health_log table, instead of a table for each host.

The PR will update database to version 9 and will move data from all the health_log_XXXX tables to the new one. A new host_id column is added, which is the host's host_uuid.

Test Plan

After this PR all of health functionality (including cleaning up and cloud connectivity) should remain as before.

Additional Information
For users: How does this change affect me?

@MrZammler
Copy link
Contributor Author

What has changed with the PR:

  • Create a single health_log table. Creating happens during db init, so no need to re-try from various places.
  • The new table adds a new field insert_mark_timestamp. It is a monotonic_usec added when there is an insert. We'll see if it does what we want.
  • A check during filtering of alert events to be streamed to cloud has been removed, since that check was based on the assumption that the REMOVED events where created during shutdown which is no longer the case.
  • Added the source field to alert_hash which stores the line number and filename of the alert.
  • Removed fields hostname, source, class, component and type from the health_log table. Those can be retrieved from the alert_hash table. This should reduce the database size for same amount of stored events.

@MrZammler MrZammler marked this pull request as ready for review June 13, 2023 09:05
@MrZammler MrZammler requested review from Dim-P and ilyam8 June 13, 2023 09:06
@@ -307,19 +274,15 @@ void aclk_push_alert_event(struct aclk_sync_host_config *wc)

buffer_sprintf(sql, "select aa.sequence_id, hl.unique_id, hl.alarm_id, hl.config_hash_id, hl.updated_by_id, hl.when_key, " \
" hl.duration, hl.non_clear_duration, hl.flags, hl.exec_run_timestamp, hl.delay_up_to_timestamp, hl.name, " \
" hl.chart, hl.family, hl.exec, hl.recipient, hl.source, hl.units, hl.info, hl.exec_code, hl.new_status, " \
" hl.chart, hl.family, hl.exec, hl.recipient, ha.source, hl.units, hl.info, hl.exec_code, hl.new_status, " \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are inserting a field from ha among fields from hl table. To simplify developer read and probably database engine, I suggest to keep fields from the same table together.

@@ -689,12 +660,11 @@ void sql_check_removed_alerts_state(char *uuid_str)
/* Health related SQL queries
Load from the health log table
*/
#define SQL_LOAD_HEALTH_LOG(guid) "SELECT hostname, unique_id, alarm_id, alarm_event_id, config_hash_id, updated_by_id, updates_id, when_key, duration, non_clear_duration, flags, exec_run_timestamp, delay_up_to_timestamp, name, chart, family, exec, recipient, source, units, info, exec_code, new_status, old_status, delay, new_value, old_value, last_repeat, class, component, type, chart_context, transition_id FROM health_log_%s group by alarm_id having max(alarm_event_id);", guid
#define SQL_LOAD_HEALTH_LOG "SELECT hl.unique_id, hl.alarm_id, hl.alarm_event_id, hl.config_hash_id, hl.updated_by_id, hl.updates_id, hl.when_key, hl.duration, hl.non_clear_duration, hl.flags, hl.exec_run_timestamp, hl.delay_up_to_timestamp, hl.name, hl.chart, hl.family, hl.exec, hl.recipient, ah.source, hl.units, hl.info, hl.exec_code, hl.new_status, hl.old_status, hl.delay, hl.new_value, hl.old_value, hl.last_repeat, ah.class, ah.component, ah.type, hl.chart_context, hl.transition_id FROM health_log hl, alert_hash ah where hl.config_hash_id = ah.hash_id and host_id = @host_id group by hl.alarm_id having max(hl.alarm_event_id);"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like one of previous comment, I suggest to reorganize fields per table.

@@ -1208,7 +1187,7 @@ int sql_health_get_last_executed_event(RRDHOST *host, ALARM_ENTRY *ae, RRDCALC_S
return ret;
}

#define SQL_SELECT_HEALTH_LOG(guid) "SELECT hostname, unique_id, alarm_id, alarm_event_id, config_hash_id, updated_by_id, updates_id, when_key, duration, non_clear_duration, flags, exec_run_timestamp, delay_up_to_timestamp, name, chart, family, exec, recipient, source, units, info, exec_code, new_status, old_status, delay, new_value, old_value, last_repeat, class, component, type, chart_context, transition_id FROM health_log_%s WHERE 1=1 ", guid
#define SQL_SELECT_HEALTH_LOG "SELECT hl.unique_id, hl.alarm_id, hl.alarm_event_id, hl.config_hash_id, hl.updated_by_id, hl.updates_id, hl.when_key, hl.duration, hl.non_clear_duration, hl.flags, hl.exec_run_timestamp, hl.delay_up_to_timestamp, hl.name, hl.chart, hl.family, hl.exec, hl.recipient, ah.source, hl.units, hl.info, hl.exec_code, hl.new_status, hl.old_status, hl.delay, hl.new_value, hl.old_value, hl.last_repeat, ah.class, ah.component, ah.type, hl.chart_context, hl.transition_id FROM health_log hl, alert_hash ah WHERE hl.config_hash_id = ah.hash_id and host_id = @host_id "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you accept suggestion to reorganize, please, reorganize this too.

@MrZammler MrZammler marked this pull request as draft June 14, 2023 20:20
@MrZammler MrZammler marked this pull request as ready for review June 20, 2023 15:17
@thiagoftsm thiagoftsm self-requested a review June 21, 2023 12:06
Copy link
Contributor

@thiagoftsm thiagoftsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is working as expected, LGTM!

@stelfrag stelfrag merged commit 6e1e97c into netdata:master Jun 21, 2023
128 of 129 checks passed
@MrZammler MrZammler mentioned this pull request Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants