Alerting: Move legacy alert migration from sqlstore migration to service #72702

Merged: 39 commits merged into main from jacobsonmt/migration_as_service on Oct 12, 2023

@JacobsonMT JacobsonMT commented Aug 1, 2023

This PR moves the legacy alert migration out of the sqlstore db migrations and into ngalert.

There are multiple reasons behind this change, I'll do my best to explain them here:

Legacy alert migration doesn't fit the definition of a sqlstore migration

Initially, this was a natural place for the migration code to live. It does, in fact, migrate database schema and contents. However, it breaks two key requirements of sqlstore migrations:

  1. It is not guaranteed to run at a specific point in time relative to other migrations.
  2. It changes over time.

These two differences are caused by the fact that the legacy migration is currently optional. Users may decide when to migrate, or even roll back the migration and try again later. This optional status also means we were able to improve the migration over time, thus causing the migration to change.

This has contributed to numerous bugs in the past, either directly or indirectly. To name a few recent ones:

Legacy provisioned alerts cannot currently be migrated in stateless grafana deployments

Since sqlstore migrations run before any provisioning, this means that the migration code does not see that legacy provisioned alerts exist.

By moving this into ngalert.Run() we allow a followup PR to run dashboard provisioning before we start the legacy migration.
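
To illustrate the ordering problem, here is a toy sketch and not the actual Grafana startup code. The `store` type and the functions `provisionDashboards`, `migrateLegacyAlerts`, and `alertsSeenByMigration` are hypothetical names invented for this example. It only shows that a migration which runs before provisioning cannot see provisioned alerts, while one that runs afterwards can.

```go
package main

import "fmt"

// store is a toy database that only tracks legacy alerts attached to dashboards.
// It is a hypothetical stand-in, not a Grafana type.
type store struct {
	legacyAlerts []string
}

// provisionDashboards stands in for dashboard provisioning, which may create
// dashboards that carry legacy alerts.
func provisionDashboards(s *store) {
	s.legacyAlerts = append(s.legacyAlerts, "provisioned-cpu-alert")
}

// migrateLegacyAlerts returns the legacy alerts visible at the moment it runs.
func migrateLegacyAlerts(s *store) []string {
	return append([]string(nil), s.legacyAlerts...)
}

// alertsSeenByMigration runs provisioning and migration in the given order
// against an empty (stateless) store and reports what the migration saw.
func alertsSeenByMigration(migrationFirst bool) []string {
	s := &store{}
	if migrationFirst {
		seen := migrateLegacyAlerts(s) // sqlstore-style: runs before provisioning
		provisionDashboards(s)
		return seen
	}
	provisionDashboards(s)
	return migrateLegacyAlerts(s) // service-style: can run after provisioning
}

func main() {
	fmt.Println(len(alertsSeenByMigration(true)))  // 0
	fmt.Println(len(alertsSeenByMigration(false))) // 1
}
```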

The migration code is saving vendored models to the database

This would make some sense if the migration were a true sqlstore migration, where followup migrations would update the vendored models to more current ngalert versions. This is not currently what is happening. Instead, we periodically update the vendored models in the main migration in an attempt to keep the vendored copies in sync with the ngalert models.

This is prone to drift and is not obvious in the context of sqlstore migrations.

With the migration code part of the ngalert package, a followup PR can ensure it uses the real ngalert models, and drift should be caught alongside any other ngalert code changes we make.

This unlocks features that were not easily done before

If the migration code is in ngalert as a service, then we are free to use it in new/existing endpoints. This opens up interesting feature possibilities. For example (not saying we will do all of these and some are mutually exclusive):

  • No longer migrate legacy alerts on startup, instead make it an endpoint with UI.
    • This button could show you the yaml of all the UA resources that would be created after the migration. The yaml could then be imported into another instance for testing in dev, for example.
    • Or maybe, the UI will let you live toggle between legacy and UA as we migrate/revert over and over again while you check out the differences.
  • Create an endpoint that can take legacy dashboard yaml containing alerts and migrate them ad-hoc directly into UA (or maybe convert to UA yaml)
  • Migrate provisioned legacy alerts to provisioned UA alerts with a special provenance (provenance legacy_migration or something). As opposed to most alerts, these legacy_migration provenance alerts can be overwritten on a subsequent migration run. This could allow us to let users keep their as-code provisioned alerts in legacy format for a while if desired.
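
As a rough sketch of the last idea only, the overwrite rule could look something like the following. Everything here is hypothetical: the `Provenance` type, its constants, the `rule` struct, and `applyMigratedRule` are invented for illustration and are not the ngalert API. The point is simply that a rerun of the migration may replace rules it created earlier but must leave rules with any other provenance alone.

```go
package main

import "fmt"

// Provenance is a hypothetical marker for where a rule came from.
type Provenance string

const (
	ProvenanceNone            Provenance = ""
	ProvenanceFile            Provenance = "file"
	ProvenanceLegacyMigration Provenance = "legacy_migration" // name suggested in the PR description
)

// rule is a minimal illustrative alert rule.
type rule struct {
	UID        string
	Title      string
	Provenance Provenance
}

// applyMigratedRule stores a rule produced by a migration run. It overwrites an
// existing rule only if that rule was itself created by a previous migration,
// and reports whether the rule was written.
func applyMigratedRule(existing map[string]rule, migrated rule) bool {
	if cur, ok := existing[migrated.UID]; ok && cur.Provenance != ProvenanceLegacyMigration {
		return false // created or edited by something else: do not touch it
	}
	migrated.Provenance = ProvenanceLegacyMigration
	existing[migrated.UID] = migrated
	return true
}

func main() {
	rules := map[string]rule{
		"a": {UID: "a", Title: "old migrated", Provenance: ProvenanceLegacyMigration},
		"b": {UID: "b", Title: "file provisioned", Provenance: ProvenanceFile},
	}
	fmt.Println(applyMigratedRule(rules, rule{UID: "a", Title: "re-migrated"})) // true
	fmt.Println(applyMigratedRule(rules, rule{UID: "b", Title: "re-migrated"})) // false
}
```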

Edit:

For the purposes of keeping the change self-contained in a single commit, this PR has been combined with a 2nd PR to replace the vendored models in the migration with their equivalent ngalert models. It also replaces the raw SQL selects and inserts with service calls:

  • Dashboard Alerts: 9f5b6b2
    • alertRule -> ngmodels.AlertRule
  • Alertmanager Configuration: 9f5b6b2
    • PostableUserConfig, PostableApiAlertingConfig, PostableApiReceiver, Route, ObjectMatchers, PostableGrafanaReceiver to their apimodels equivalents of the same name.
  • Secure settings encryption: 149ec38
    • util.Encrypt -> secrets.Service.Encrypt
  • Legacy notification channels: 42dcb80
    • notificationChannel -> legacymodels.AlertNotification
  • Dashboards:
    • dashboard -> dashboards.Dashboard d023023
    • Raw SQL -> services a41f32d
  • Datasources: 53b9e47
    • dsUIDLookup -> datasources.CacheService

Some gaps in the testing suite are filled:

  • Migration of alert rules: verifying that the actual data model (queries, conditions) is correct 9a7cfa9
  • Secure settings migration: verifying that secure fields remain encrypted for all available notifiers and certain fields migrate from plain text to encrypted secure settings correctly e7d3993

Replacing the checks for custom dashboard ACLs will be done in a separate targeted PR as it adds functionality instead of moving it around.

github-actions bot commented Sep 3, 2023

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale (Issue with no recent activity) label Sep 3, 2023
@JacobsonMT JacobsonMT force-pushed the jacobsonmt/migration_as_service branch from d36812c to 73a06ff Compare September 6, 2023 05:27

@JacobsonMT JacobsonMT left a comment

Some review notes

NamespaceUID string `xorm:"namespace_uid"`
UID string `xorm:"uid"`
NamespaceUID string `xorm:"namespace_uid"`
DashboardUID *string `xorm:"dashboard_uid"`

Added because AddDashboardUIDPanelIDMigration used to be conditional on migration (runs and reverts with migration). Since we can't rely on it anymore, the migration service needs to populate the fields. See 2fdf1d0


// FixEarlyMigration fixes UA configs created before 8.2 with org_id=0 and moves some files like __default__.tmpl.
// The only use of this migration is when a user enabled ng-alerting before 8.2.
func FixEarlyMigration(mg *migrator.Migrator) {

This used to be called RerunDashAlertMigration which was wholly confusing as it didn't actually rerun the migration. Renamed to better represent what it was doing, and removed the condition on migration.

The code is not added to the migration service as it's no longer relevant. See 60d35fb

@@ -492,9 +492,6 @@ func addAlertImageMigrations(mg *migrator.Migrator) {
}

func extractAlertmanagerConfigurationHistoryMigration(mg *migrator.Migrator) {
if !mg.Cfg.UnifiedAlerting.IsEnabled() {

This always ran before the migration, so it never did anything to new
legacy migrations anyways.


// UpdateRuleGroupIndexMigration updates a new field rule_group_index for alert rules that belong to a group with more than 1 alert.
func UpdateRuleGroupIndexMigration(mg *migrator.Migrator) {
mg.AddMigration("update group index for alert rules", &updateRulesOrderInGroup{})

Moved as-is from ualert along with the vendored structs it needed from elsewhere in the migration model.

The only change here is removing the condition to only add it if UA is enabled.

}

// CreateDefaultFoldersForAlertingMigration creates a folder dedicated for alerting if no folders exist
func CreateDefaultFoldersForAlertingMigration(mg *migrator.Migrator) {

This migration was removed entirely as it's no longer necessary.

  • The migration service will create the general alerting folder when run.
  • Creating a new first folder via the rule creation page is simpler than when this migration was created.

@@ -0,0 +1,102 @@
package fakes

Moved as-is from notifier.testing so it can be used for migration tests.

var migratedKey = "migrated"

// MigrationServiceMigration moves the legacy alert migration status from the migration log to kvstore.
func MigrationServiceMigration(mg *migrator.Migrator) {

This is necessary so that we retain the current migration state between when sqlstore was in charge of UA migration and now. See 79781ae
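
A minimal sketch of the idea, assuming a simple in-memory key-value store: if the old migration log shows the legacy alert migration as applied, record that under the `migrated` key so the new service picks up where sqlstore left off. The `kvStore` type, the `legacyMigrationID` value, and `carryOverMigrationState` are hypothetical and do not reflect the actual implementation in 79781ae.

```go
package main

import (
	"fmt"
	"strconv"
)

// kvStore is a hypothetical in-memory stand-in for a key-value store.
type kvStore map[string]string

// migratedKey matches the key name shown in the diff above.
const migratedKey = "migrated"

// legacyMigrationID is a made-up migration log entry used only in this sketch.
const legacyMigrationID = "example legacy alert migration"

// carryOverMigrationState copies the "was the legacy migration applied?" fact
// from the migration log into the kvstore, unless it is already recorded there.
func carryOverMigrationState(migrationLog []string, kv kvStore) {
	if _, ok := kv[migratedKey]; ok {
		return // state already tracked in the kvstore, nothing to do
	}
	migrated := false
	for _, id := range migrationLog {
		if id == legacyMigrationID {
			migrated = true
			break
		}
	}
	kv[migratedKey] = strconv.FormatBool(migrated)
}

func main() {
	kv := kvStore{}
	carryOverMigrationState([]string{"create alert table", legacyMigrationID}, kv)
	fmt.Println(kv[migratedKey]) // true
}
```

Recording the state once and then treating the kvstore as the source of truth keeps the carry-over idempotent: running it again does not overwrite a later revert.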

@JacobsonMT JacobsonMT marked this pull request as ready for review September 6, 2023 22:50
@JacobsonMT JacobsonMT requested a review from a team as a code owner September 6, 2023 22:50
@JacobsonMT JacobsonMT requested a review from a team September 6, 2023 22:50
@JacobsonMT JacobsonMT requested review from a team as code owners September 6, 2023 22:50
@JacobsonMT JacobsonMT requested review from rwwiv, yuri-tceretian, grobinson-grafana, papagian, zserge and suntala and removed request for a team September 6, 2023 22:50

@JacobsonMT JacobsonMT left a comment

Review notes part 2

@@ -0,0 +1,300 @@
package migration

This is moved nearly 100% as-is from sqlstore, not sure why it's not detected as a move. The only changes are to use the session through the context instead of through folderHelper. Browser diff: https://www.diffchecker.com/ldxQCY9E/

type secureJsonData map[string][]byte

// getEncryptedJsonData returns map where all keys are encrypted.
func getEncryptedJsonData(sjd map[string]string, log log.Logger) secureJsonData {

Leftover from securejsondata.go

@github-actions github-actions bot removed the stale (Issue with no recent activity) label Sep 7, 2023
Leaves custom ACL logic for a followup as it needs to be completely rewritten for RBAC.

This is needed for upcoming work, so we do it separately for clarity.

Setting dashboard.created_by to -8 as a way to track which folders were created by the migration is no longer possible. Instead, we store the UIDs of the newly created folders in the kvstore and use them on revert.

If the migrationLock feature flag was enabled and the max db connections was 2 or less, a deadlock would occur. This is because the GetMigrationLog() method spawns a new session and, when migrationLock is used, the migration is executed in a nested transaction. Together, these deplete the connection pool.
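
The pool exhaustion behind that deadlock can be sketched with a toy connection pool. This is an illustration under assumptions, not the sqlstore code: `pool`, `tryAcquire`, and `canOpenExtraSession` are hypothetical names, and the assumption that the lock and the nested transaction together hold two connections is inferred from the commit message. A real pool would block on the failed acquire instead of returning false, and since the blocked caller is the one holding the other connections, nothing would ever be released.

```go
package main

import "fmt"

// pool is a toy connection pool: a buffered channel used as a counting semaphore.
type pool chan struct{}

func newPool(maxConns int) pool { return make(pool, maxConns) }

// tryAcquire takes a connection if one is free and reports whether it succeeded.
// It is non-blocking so the sketch can show exhaustion without actually hanging.
func (p pool) tryAcquire() bool {
	select {
	case p <- struct{}{}:
		return true
	default:
		return false
	}
}

// canOpenExtraSession models a migration that already holds heldByMigration
// connections (assumed here: one for the lock, one for the nested transaction)
// and then needs one more session, for example to read the migration log.
func canOpenExtraSession(maxConns, heldByMigration int) bool {
	p := newPool(maxConns)
	for i := 0; i < heldByMigration; i++ {
		if !p.tryAcquire() {
			return false
		}
	}
	return p.tryAcquire()
}

func main() {
	fmt.Println(canOpenExtraSession(2, 2)) // false: pool is already depleted
	fmt.Println(canOpenExtraSession(3, 2)) // true: one connection is still free
}
```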
@JacobsonMT JacobsonMT force-pushed the jacobsonmt/migration_as_service branch from 117aa81 to 137e6bd Compare October 11, 2023 23:54
@JacobsonMT

/deploy-to-hg

@ephemeral-instances-bot

  • Preparing your instance. A comment containing your instance's url will be added to this PR when the instance is ready.
  • Your instance will be ready in ~10 minutes.
  • Check the GitHub actions tab to follow the workflow progress
  • Slack channel: #proj-ephemeral-hg-instances
  • Building instance with jacobsonmt/migration_as_service oss branch and main enterprise branch. How to choose a branch

@grafana grafana deleted a comment from ephemeral-instances-bot bot Oct 12, 2023

@yuri-tceretian yuri-tceretian left a comment

LGTM

@JacobsonMT JacobsonMT merged commit 82f3127 into main Oct 12, 2023
14 checks passed
@JacobsonMT JacobsonMT deleted the jacobsonmt/migration_as_service branch October 12, 2023 12:43
@zerok zerok modified the milestones: 10.2.x, 10.2.0 Oct 23, 2023