A one-time job to init send-pulse triggers, and a migration-down to clean up send-pulse triggers #42316
Conversation
@@ -1275,6 +1275,6 @@
 (set-jdbc-backend-properties!)
 (let [scheduler (qs/initialize)]
   (qs/start scheduler)
   (qs/delete-trigger scheduler (triggers/key "metabase.task.send-pulses.job"))
Urgh, it was swapped.
The fact that we have to do this every time we remove a job class is also bad. I'll explore a way to do this automatically.
Triggers get deleted with the job, as you said here, so we don't need line 1266.
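A hedged Quartzite sketch of the point being made (the job key is illustrative): deleting the job itself removes its triggers, so no explicit delete-trigger call per trigger is needed.

```clojure
(ns example.cleanup
  (:require [clojurewerkz.quartzite.jobs :as jobs]
            [clojurewerkz.quartzite.scheduler :as qs]))

;; Deleting a job through the scheduler also deletes every trigger
;; pointing at it, so a separate delete-trigger call is redundant.
(defn delete-job-and-its-triggers! [scheduler]
  (qs/delete-job scheduler (jobs/key "metabase.task.send-pulses.job")))
```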
@@ -6603,7 +6603,7 @@ databaseChangeLog:
         DELETE FROM data_permissions where perm_type = 'perms/data-access' OR perm_type = 'perms/native-query-editing';

   - changeSet:
-      id: v50.2024-04-25T01:04:04
+      id: v50.2024-04-25T01:04:05
Increased the id so stats will also run this.
Aside from the existing issue with pulse triggers and rollbacks I posted about here, this doesn't seem like it will work correctly for customers who roll back a major version upgrade.
Imagine this sequence of events:
- start on 49, with many pulses
- upgrade to 50
- roll back to 49
- add some more pulses
- upgrade to 50 again
- the InitSendPulseTriggers job won't execute again, because it has already run once in the past

That means any pulses created between the rollback and the subsequent upgrade will never get triggers created for them.
What if we paused this instead, and did it in 51, when we don't need to worry about rollbacks from 51 to 49?
(Btw, this is also an issue with #42279 @piranha, but maybe it doesn't matter as much?)
@calherries it indeed seems like a problem. I'd love to solve that, but I'm not sure quartz has any good solution for us here. We can of course just make a "once a day" cron (provided the job is idempotent)... feels a bit dirty, but it will do its job.
How about making quartz auto-delete triggers that don't have a job class? That should make this type of problem error-proof going forward.
Seems sensible, but it doesn't help sanya's case now, since 49 is already released. @piranha I don't have any better ideas right now, but I would think a once-a-day solution would do the job, even if it feels dirty. We can remove it after one release cycle too.
Ok, I have #42383 to do that. I'll backport it to 49 as well so we can still get this in.
Backporting that to 49 doesn't solve the issue for already-released 49 versions. You'll have to wait a full release cycle.
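The "once a day" idea could look something like this Quartzite sketch. The namespace, job/trigger keys, and the 03:00 cron are illustrative, not what was merged; the real init fn would have to be idempotent since it now runs daily rather than once ever.

```clojure
(ns example.daily-init
  (:require [clojurewerkz.quartzite.jobs :as jobs]
            [clojurewerkz.quartzite.scheduler :as qs]
            [clojurewerkz.quartzite.schedule.cron :as cron]
            [clojurewerkz.quartzite.triggers :as triggers]))

;; Placeholder job body: re-scan pulse channels and (re)create any
;; missing send-pulse triggers. Must be safe to run repeatedly.
(jobs/defjob InitSendPulseTriggers [_ctx]
  (comment "idempotent trigger back-fill goes here"))

(defn schedule-daily-init! [scheduler]
  (let [job     (jobs/build
                 (jobs/of-type InitSendPulseTriggers)
                 (jobs/with-identity (jobs/key "example.init-send-pulse-triggers")))
        trigger (triggers/build
                 (triggers/with-identity (triggers/key "example.init-send-pulse-triggers.trigger"))
                 (triggers/with-schedule
                   ;; run at 03:00 every day
                   (cron/schedule (cron/cron-schedule "0 0 3 * * ? *"))))]
    (qs/schedule scheduler job trigger)))
```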
(pulse-channel-test/with-send-pulse-setup!
  (let [user-id  (:id (new-instance-with-default :core_user))
        pulse-id (:id (new-instance-with-default :pulse {:creator_id user-id}))
        pc       (new-instance-with-default :pulse_channel {:pulse_id pulse-id})]
Why use new-instance-with-default here, rather than inlining the data in the migration? I don't think the DRY principle should apply here: we don't want to couple migrations to each other in case the underlying schema or defaults for the tables change over time.
That's the idea of new-instance-with-default being defined in this namespace; otherwise we could use t2.with-temp/with-temp-defaults.
new-instance-with-default should only have fields that are unlikely to change, like core_user.email, created_at, etc.
Maybe schedule_type and schedule_hour are not the best candidates, as we plan to remove them. Nevertheless, I think some amount of DRY is valuable here, as these migrations are often very long and take some amount of setup to test.
> some amount of DRY is valuable here

Agreed, DRYing things up to keep code shorter is good, except when it can cause issues like here.

> new-instance-with-default should only have fields that are less likely to be changed

True, it was maybe okay for users, but it becomes a problem when we use it for things that are more likely to change, so I think it should be discouraged. Not a strong opinion and I see the upsides too; just thought I'd point it out.
@@ -138,26 +137,6 @@
   (pulse-channel-test/send-pulse-triggers pulse-1)
   (pulse-channel-test/send-pulse-triggers pulse-2))))))))

-(deftest init-will-schedule-triggers-test
Why did you remove this test? As far as I can see we still need it for the new init-send-pulse-triggers! function.
Because it's no longer triggered by calling (task/init! ::task.send-pulses/SendPulses); it just happens to run now, but on a different thread.
Also, this should be covered by the new migration test.
> this should be covered by the new migration test

They're not exactly equivalent, so I'm just checking we're okay with weakening the test coverage. This test covers the property that the pulse trigger's info includes the schedule. The migration test is weaker in that it only checks that triggers are created for the pulse.
I updated it: instead of checking the count, we check the trigger info in the migration test: 0f75110
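A hedged sketch of the stronger assertion being discussed, not the actual committed test: the trigger-info map shape, the cron string, and the pulse id are illustrative. The point is to assert on the trigger's schedule, not just on how many triggers exist.

```clojure
(deftest send-pulse-trigger-info-test
  (let [pulse-id 1]  ; hypothetical pulse created by the migration setup
    ;; compare the schedule carried by the trigger info, not the count
    (is (= [{:schedule "0 0 6 * * ? *"}]
           (map #(select-keys % [:schedule])
                (pulse-channel-test/send-pulse-triggers pulse-id))))))
```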
(defn- init-send-pulse-triggers!
  []
  (let [trigger-slot->pc-ids (as-> (t2/select :model/PulseChannel :enabled true) results
                               (group-by #(select-keys % [:pulse_id :schedule_type :schedule_day :schedule_hour :schedule_frame]) results)
Can we group simply by pulse_id instead? I'm concerned that we'll do a migration on the :schedule_ columns in the future, and leaving them here will be confusing.
-(group-by #(select-keys % [:pulse_id :schedule_type :schedule_day :schedule_hour :schedule_frame]) results)
+(group-by :pulse_id results)
I don't think this removes the confusion from the modeling perspective. Once we migrate to a cron string, we can group by that.
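The suggestion above changes the grouping semantics, not just readability. A toy Clojure sketch of the difference (field names follow the pulse_channel columns; the values are made up):

```clojure
(def pcs
  [{:id 10 :pulse_id 1 :schedule_type "hourly" :schedule_hour nil}
   {:id 11 :pulse_id 1 :schedule_type "hourly" :schedule_hour nil}
   {:id 12 :pulse_id 1 :schedule_type "daily"  :schedule_hour 8}])

;; Grouping by the full schedule slot keeps pulse 1's two distinct
;; schedules in separate groups, i.e. one quartz trigger per schedule:
(group-by #(select-keys % [:pulse_id :schedule_type :schedule_hour]) pcs)

;; Grouping by :pulse_id alone collapses both schedules into a single
;; group, losing the per-schedule trigger slots:
(group-by :pulse_id pcs)
```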
We need to be really careful with test coverage for these pseudo-migrations that we schedule using quartz jobs. Not only do tests need to cover existing version upgrades; they also need to cover that the migration works for all future version upgrades, e.g. going straight from 49 to 51.
init-send-pulse-triggers! runs once after all migrations finish, and that's the case forever into the future. Currently I only see a test covering the case that it works when upgrading from 49 to 50. That's not enough, given that in the future we'll have more major versions, and the things the migration depends on (like the schema of pulse channels) can change over time.
I've updated the migration test so it migrates up to the latest migration, so it should work for future migrations: 1dd523d
LGTM
@calherries updated to make sure we don't schedule triggers for archived dashboards: 1ffe69f
@qnkhuat Did you forget to add a milestone to the issue for this PR?
When and where should I add a milestone?
A follow-up of #41772.
In that PR we added a method called update-send-pulse-trigger-if-needed! as part of task/init! for SendPulses jobs. It was meant to be a migration that creates SendPulse triggers for existing pulses that don't have one. Ideally it only runs once, so this PR puts it into a job that is only triggered once, inspired by the BackfillQueryField job from @piranha (see).
This also adds 2 migrations that run only on downgrade, to remove the SendPulse job and InitSendPulseTriggers. We need this to avoid the problem Cal described here.
With these 2 migrations, if a user downgrades and then upgrades, InitSendPulseTriggers will be triggered again and all SendPulse jobs will be correctly scheduled.
This PR also includes a typo fix for the DeleteSendPulsesTask migration.
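For context, a "runs only on downgrade" migration in Liquibase is a changeSet that does nothing on update and performs its cleanup in the rollback block. A hypothetical sketch, assuming a custom-migration class; the id, author, and class name are illustrative, not the ones merged in this PR:

```yaml
databaseChangeLog:
  - changeSet:
      id: v50.2024-04-25T01:04:06
      author: example
      changes: []            # no-op on upgrade
      rollback:
        - customChange:
            # hypothetical class that deletes the quartz job (and, with
            # it, its triggers) so a later re-upgrade re-runs the init job
            class: metabase.custom_migrations.DeleteInitSendPulseTriggersOnDowngrade
```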