
An one time job to init send-pulse trigger and migration down to clean up send-pulse triggers #42316

Merged
merged 21 commits into from
May 15, 2024

Conversation

qnkhuat
Contributor

@qnkhuat qnkhuat commented May 7, 2024

A follow-up of #41772.

In that PR we added a method called update-send-pulse-trigger-if-needed! that runs as part of task/init! for the SendPulses job.

It was meant to be a migration to create SendPulse triggers for existing triggers that don't have one.

Ideally it runs only once, so this PR moves it into a job that is triggered exactly once, inspired by the BackfillQueryField job from @piranha.

This also adds 2 migrations that run only on downgrade, to remove the SendPulse job and InitSendPulseTriggers. We need this to avoid the problem Cal described here.

By having these 2 migrations, if users downgrade and then upgrade again, InitSendPulseTriggers will be triggered again and all SendPulse jobs will be scheduled correctly.

This PR also includes a typo fix for DeleteSendPulsesTask migration.
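The run-once-plus-downgrade-cleanup idea above can be sketched roughly as follows. This is a minimal Python sketch, not the actual Metabase/Quartz code; names like `run_once` and `ran_once_markers` are hypothetical stand-ins for Quartz's record of completed jobs:

```python
# Hypothetical sketch of a "run exactly once" job plus the downgrade
# migration that clears its marker so it re-runs on the next upgrade.

ran_once_markers = set()  # stand-in for Quartz's completed-job bookkeeping

def init_send_pulse_triggers(pulses, triggers):
    """Idempotently create a trigger for every pulse that lacks one."""
    for pulse in pulses:
        triggers.setdefault(pulse, "scheduled")

def run_once(job_name, job, *args):
    """Run `job` only if it has never run before (the one-time job)."""
    if job_name in ran_once_markers:
        return False
    job(*args)
    ran_once_markers.add(job_name)
    return True

def downgrade_migration():
    """The down-migration: forget the marker so that a subsequent
    upgrade schedules InitSendPulseTriggers again."""
    ran_once_markers.discard("InitSendPulseTriggers")
```

Deleting the marker on downgrade is the key move: it is what lets the one-time job fire a second time after the user upgrades again.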

@qnkhuat qnkhuat requested a review from camsaul as a code owner May 7, 2024 08:44
@qnkhuat qnkhuat requested a review from calherries May 7, 2024 08:44
@metabase-bot metabase-bot bot added the .Team/BackendComponents also known as BEC label May 7, 2024
Copy link

replay-io bot commented May 7, 2024

Status: Complete
Commit: 4284e2a
Results: ⚠️ 3 Flaky, 2505 Passed

@@ -1275,6 +1275,6 @@
(set-jdbc-backend-properties!)
(let [scheduler (qs/initialize)]
(qs/start scheduler)
(qs/delete-trigger scheduler (triggers/key "metabase.task.send-pulses.job"))
Contributor Author


urgh, it was swapped.

the fact that we have to do this every time we remove a job class is also bad. I'll explore a way to make this happen automatically.

Contributor

@calherries calherries May 9, 2024


Triggers get deleted with the job, as you said here, so we don't need line 1266.

@@ -6603,7 +6603,7 @@ databaseChangeLog:
DELETE FROM data_permissions where perm_type = 'perms/data-access' OR perm_type = 'perms/native-query-editing';

- changeSet:
id: v50.2024-04-25T01:04:04
id: v50.2024-04-25T01:04:05
Contributor Author


Increased so stats will also run this migration.

@qnkhuat qnkhuat requested a review from a team May 8, 2024 08:00
@qnkhuat qnkhuat added the no-backport Do not backport this PR to any branch label May 8, 2024
Contributor

@calherries calherries left a comment


Aside from the existing issue with pulse triggers and rollbacks I posted about here, this doesn't seem like it will work correctly for customers who roll back a major version upgrade.

Imagine this sequence of events:

  • start on 49, with many pulses
  • upgrade to 50
  • roll back to 49
  • add some more pulses
  • upgrade to 50 again
  • the InitSendPulseTriggers job won't execute again, because it's already run once in the past

That means that any pulses created between the rollback and the subsequent upgrade will fail to have any triggers created for them.

What if we paused this instead, and did it in 51, when we don't need to worry about rollbacks from 51 to 49?

(Btw, this is also an issue with #42279 @piranha, but maybe it doesn't matter as much?)
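The failure mode in that sequence can be illustrated with a toy simulation (hypothetical Python names, not Metabase code): without downgrade cleanup, the one-time job refuses its second run, and pulses added between the rollback and the re-upgrade end up with no triggers:

```python
# Toy walk-through of the rollback sequence described above.

triggers = {}
job_already_ran = False

def init_triggers(pulses):
    """One-time job with no downgrade cleanup: never runs twice."""
    global job_already_ran
    if job_already_ran:
        return  # skipped: the marker from the first upgrade survives
    for p in pulses:
        triggers[p] = "scheduled"
    job_already_ran = True

pulses = ["pulse-1", "pulse-2"]  # start on 49, with pulses
init_triggers(pulses)            # upgrade to 50: triggers created
# roll back to 49, add another pulse
pulses.append("pulse-3")
init_triggers(pulses)            # upgrade to 50 again: job is skipped
missing = [p for p in pulses if p not in triggers]
# missing == ["pulse-3"]: the pulse created after the rollback never
# gets a trigger
```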

@piranha
Contributor

piranha commented May 8, 2024

@calherries it indeed seems like a problem, I'd love to solve that, but not sure if quartz has any good solution for us here? We can of course just make "once a day" cron (provided the job is idempotent)... feels a bit dirty, but will do its job.

@qnkhuat
Contributor Author

qnkhuat commented May 8, 2024

how about making quartz auto-delete triggers that don't have a job class? this should make this type of problem error-proof going forward.

@calherries
Contributor

calherries commented May 8, 2024

how about making quartz auto-delete triggers that don't have a job class

seems sensible, but doesn't help sanya's case now, since 49 is already released

@piranha I don't have any better ideas right now, but I would think a once-a-day solution would do the job! Even if it feels dirty. We can remove it after one release cycle too
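The once-a-day fallback is safe precisely because the job is idempotent: re-running it never duplicates triggers, so most daily runs are no-ops. A rough sketch (hypothetical names, not the actual implementation):

```python
# Hypothetical sketch: an idempotent init job that is safe to run on a
# daily schedule because repeat runs create nothing new.

triggers = {}

def init_triggers_idempotent(pulses):
    """Create triggers only for pulses that don't have one yet.
    Returns how many triggers this run created."""
    created = 0
    for p in pulses:
        if p not in triggers:
            triggers[p] = "scheduled"
            created += 1
    return created

pulses = ["pulse-1", "pulse-2"]
first = init_triggers_idempotent(pulses)   # day 1: creates 2
second = init_triggers_idempotent(pulses)  # day 2: creates 0, a no-op
pulses.append("pulse-3")                   # a pulse added in between
third = init_triggers_idempotent(pulses)   # day 3: picks up only the new one
```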

@qnkhuat
Contributor Author

qnkhuat commented May 8, 2024

ok I have #42383 to do that, I'll backport it to 49 as well so we can still get this in.

@calherries
Contributor

Backporting that to 49 doesn't solve the issue for already released 49 versions. You'll have to wait a full release cycle

@qnkhuat qnkhuat changed the title Add a job to init send pulse trigger only once An one time job to init send-pulse trigger and migration down to clean up send-pulse triggers May 9, 2024
@qnkhuat qnkhuat requested a review from calherries May 9, 2024 09:07
(pulse-channel-test/with-send-pulse-setup!
(let [user-id (:id (new-instance-with-default :core_user))
pulse-id (:id (new-instance-with-default :pulse {:creator_id user-id}))
pc (new-instance-with-default :pulse_channel {:pulse_id pulse-id})]
Contributor

@calherries calherries May 9, 2024


Why use new-instance-with-default here, rather than inlining the data in the migration? I don't think the DRY principle should apply here. We don't want to couple migrations to each other in case the underlying schema or defaults for the tables change over time.

Contributor Author


that's the idea of new-instance-with-default being defined in this namespace, otherwise we could use t2.with-temp/with-temp-defaults.

new-instance-with-default should only have fields that are less likely to be changed, like core_user.email, created_at ...

maybe schedule_type and schedule_hour are not the best candidates, as we plan to remove them. Nevertheless I think some amount of DRY is valuable here as these migrations are often very long and take some amount of setup to test.

Contributor


some amount of DRY is valuable here

agreed, DRYing up things to keep code shorter is good, except for when it can cause issues like here

new-instance-with-default should only have fields that are less likely to be changed

true, it was maybe okay for users. but it becomes a problem when we use it for things that are more likely to change, so I think it should be discouraged. Not a strong opinion and I see the upsides too. Just thought I might point it out

@@ -138,26 +137,6 @@
(pulse-channel-test/send-pulse-triggers pulse-1)
(pulse-channel-test/send-pulse-triggers pulse-2))))))))

(deftest init-will-schedule-triggers-test
Contributor


Why did you remove this test? As far as I can see we still need it for the new init-send-pulse-triggers! function

Contributor Author


Because it's no longer triggered by calling (task/init! ::task.send-pulses/SendPulses); it still runs, but on a different thread.

Also this should be covered by the new migration test.

Contributor

@calherries calherries May 9, 2024


this should be covered by the new migration test.

They're not exactly equivalent, so I'm just checking we're okay with weakening the test coverage. This test covers the property that the pulse trigger's info includes the schedule. The migration test is weaker in that it only tests that triggers are created for the pulse.

Contributor Author


I updated it: instead of checking the count, we now check the trigger info in the migration test: 0f75110

@qnkhuat qnkhuat requested a review from noahmoss as a code owner May 9, 2024 11:53
@qnkhuat qnkhuat requested a review from calherries May 9, 2024 11:54
(defn- init-send-pulse-triggers!
[]
(let [trigger-slot->pc-ids (as-> (t2/select :model/PulseChannel :enabled true) results
(group-by #(select-keys % [:pulse_id :schedule_type :schedule_day :schedule_hour :schedule_frame]) results)
Contributor


Can we group simply by pulse_id instead? I'm concerned that we'll do a migration on the :schedule_ columns in the future and leaving them here will be confusing

Suggested change
(group-by #(select-keys % [:pulse_id :schedule_type :schedule_day :schedule_hour :schedule_frame]) results)
(group-by :pulse_id results)

Contributor Author


I don't think this removes the confusion from a modeling perspective. Once we migrate to a cron string we can group by that instead.

Contributor

@calherries calherries left a comment


We need to be really careful with test coverage for these pseudo-migrations that we schedule as quartz jobs. The tests need to cover not only existing version upgrades, but also all future version upgrades, e.g. going straight from 49 to 51.

init-send-pulse-triggers! runs once after all migrations finish, and that will remain the case forever into the future. Currently I only see a test covering the case of upgrading from 49 to 50. That's not enough, given that in the future we'll have more major versions, and the things the migration depends on (like the schema of pulse channels) can change over time.

@qnkhuat
Contributor Author

qnkhuat commented May 10, 2024

I've updated the migration test so it migrates up to the latest migration, so it should work for future migrations: 1dd523d

@qnkhuat qnkhuat requested review from calherries and a team May 10, 2024 11:09
Contributor

@calherries calherries left a comment


LGTM

@qnkhuat
Contributor Author

qnkhuat commented May 14, 2024

@calherries updated to make sure we don't schedule triggers for archived dashboards: 1ffe69f

@qnkhuat qnkhuat enabled auto-merge (squash) May 15, 2024 05:08
@qnkhuat qnkhuat merged commit fe1f595 into master May 15, 2024
111 checks passed
@qnkhuat qnkhuat deleted the init-trigger-once branch May 15, 2024 05:42

@qnkhuat Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?
