Metric v2 migration #42613
Conversation
Include commented out code for removing cards not consuming queries.
LGTM, but probably worth thinking about the couple of questions.
```yaml
- column:
    name: dataset_query_metrics_v2_migration_backup
    remarks: The copy of dataset_query before the metrics v2 migration
    type: ${text.type}
```
I wonder if this should be a separate table [report-card-id, old-query]
Having this hang around on report_card could be rather annoying long-term.
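For illustration, the shape such a table might take, just mirroring the [report-card-id, old-query] pair above (the table name and constraints are hypothetical, not from the PR):

```sql
-- Hypothetical backup table; one row per rewritten card.
CREATE TABLE metrics_v2_migration_backup (
    report_card_id INTEGER NOT NULL PRIMARY KEY REFERENCES report_card (id),
    old_query      TEXT    NOT NULL
);
```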
I was considering this too. Do you know of prior art? I assumed we would drop that field a few releases later. We would have to do that with a separate table too, I guess. I wonder if such an extra column can mess up serdes (although the tests don't fail).
Yeah, my concern was serdes as well. Just from the cleanliness of the serialized cards (in a git-backed serdes system), I think it would probably be better as a separate table if it's not too much more cumbersome.
I think creating a separate table is ultimately going to be more code and more noise; I think we should just have SerDes ignore the backup column entirely. It's only there in case we broke things anyway.
Populating the backup column with values can and should be done in Liquibase; look at migration 85 for an example. Only do stuff in Clojure migrations if it can't be done in SQL.
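A minimal sketch of what such a Liquibase changeset could look like, in the same YAML style as the column definition above. The changeset id and author are placeholders, not the PR's actual migration numbers:

```yaml
# Sketch only: id/author are placeholders; a real changeset would follow
# the numbering scheme of migration 85 mentioned above.
- changeSet:
    id: vXX.0000-00-00T00:00:00
    author: example
    changes:
      - sql:
          sql: >-
            UPDATE report_card
            SET dataset_query_metrics_v2_migration_backup = dataset_query
    rollback:
      - sql:
          sql: >-
            UPDATE report_card
            SET dataset_query = dataset_query_metrics_v2_migration_backup
            WHERE dataset_query_metrics_v2_migration_backup IS NOT NULL
```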
@camsaul, the custom migration populates this field only if it rewrites dataset_query. Are you suggesting always setting this field, adding a condition approximating what's in the custom migration, or something else?
See my other comment; I think it's probably fine to just always set this field, since it's just copying pointers and we're already doing a full table scan anyway.
Wouldn't that imply that on rollback we lose changes that we could keep? I ranked the importance of this higher than speed.
I think you have to put all of your

```clojure
(:id card)
(-> rewritten
    (assoc :dataset_query_metrics_v2_migration_backup (:dataset_query card)
           :updated_at (Instant/now))))))))
```
I think it's usually better to use current_timestamp or now() (or now(3) for MySQL/MariaDB to get millisecond precision, although that's probably not important in this case), so you don't have to worry about weirdness happening here with timezone conversion if this is a datetime or timestamp [without time zone] column. But I guess it doesn't really matter that much.
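For comparison, a sketch of the database-side alternative being suggested; the column names follow the PR, but the WHERE clause is illustrative only:

```sql
-- Letting the database supply the timestamp sidesteps client-side timezone
-- conversion entirely (use now(3) on MySQL/MariaDB for millisecond precision).
UPDATE report_card
SET updated_at = current_timestamp
WHERE dataset_query_metrics_v2_migration_backup IS NOT NULL;
```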
Yes, in fact, I'm not 100% sure that updating this column is what everybody would wish.
I've removed this occurrence altogether, but there is another one. Is this important? As long as setting the timestamp works, I think we are fine.
```clojure
(t2/update! :report_card
            (:id card)
            (-> rewritten
                (assoc :dataset_query_metrics_v2_migration_backup (:dataset_query card)
```
I think it's probably easier to just do

```sql
UPDATE report_card
SET dataset_query_metrics_v2_migration_backup = dataset_query
```

or

```sql
UPDATE report_card
SET dataset_query_metrics_v2_migration_backup = dataset_query
WHERE dataset_query LIKE '%["metric",%'
```

in a Liquibase migration to back up all the Card queries in one go. Then you have a discrete step that is super easy to reverse, because all you need to do is

```sql
UPDATE report_card
SET dataset_query = dataset_query_metrics_v2_migration_backup
WHERE dataset_query_metrics_v2_migration_backup IS NOT NULL
```

Those should both be reasonably quick (other than the full table scan), since the strings should be stored separately anyway, so it's just copying pointers rather than entire strings.

Not sure if the `WHERE dataset_query LIKE '%["metric",%'` condition would actually make it faster: on the one hand you're avoiding a lot of updates, but on the other hand now you have to fetch and scan every string. It's simpler to just copy the value for every Card.
I think knowing which cards actually got rewritten (and might have to be restored) is more important than speed, but I can be convinced otherwise.
test/metabase/models/card_test.clj
```clojure
:made_public_by_id
;; we don't expect a description for this column because it should never change
;; once created by the migration
:dataset_query_metrics_v2_migration_backup nil} col)
```
Same suggestion: just change the assertion from `=` to `=?` instead of adding this.
Can you be a bit more precise about what you mean? This is an explicit set of fields we want to exclude from testing.
Other than relatively small style suggestions, my main concern here is that the migration is doing things really inefficiently. You're doing things like

```clojure
(doseq [card (select :report_card :collection_id 1)]
  (t2/delete! :report_card :id (:id card)))
```

which is pretty slow, when you could have just done

```clojure
(t2/delete! :report_card :collection_id 1)
```

instead. This can be pretty important since it affects startup time, and we're dealing with potentially hundreds of things that need migration; one DML query versus 100 is a pretty big performance difference at startup time.
But it would have been even better to make most of these things plain SQL/Liquibase migrations in the first place anyway. It's way easier to write efficient and unbuggy SQL queries than it is to write Clojure migrations.
Creating the migrations collection, granting it perms, and populating the backup column for dataset_query can all be done in Liquibase migrations easily, as can reversing those migrations.
The only thing that really needs to be done as a Clojure migration is the part where you migrate queries for Cards that referenced v1 Metrics.
Migrating V1 Metrics to Cards could be done in pure SQL if you wanted to. Something like this would probably do the trick:

```sql
INSERT INTO report_card
  (name, type, creator_id, migrated_from_v1_metric_id, dataset_query, database_id, table_id, collection_id)
SELECT
  concat(metric.name, ' (Migrated from metric ', cast(metric.id AS text), ')'),
  'metric',
  metric.creator_id,
  metric.id,
  -- Maybe using native JSON manipulation functions here would be easier
  concat(
    '{"database":',
    cast(t.db_id AS text),
    ',"type":"query","query":',
    concat(
      trim(trailing '}' FROM metric.definition),
      ',"source-table":',
      metric.table_id,
      '}'
    ),
    '}'
  ),
  t.db_id,
  metric.table_id,
  (SELECT id FROM collection WHERE slug = 'migrated_metrics_v1')
FROM metric
LEFT JOIN metabase_table t
  ON metric.table_id = t.id;
```
You could add a new column called migrated_from_v1_metric_id and then build a map of migrated_from_v1_metric_id => id to migrate queries in Clojure land.
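A rough sketch of building that map with Toucan 2 (the `t2/select` column syntax and kv-arg condition are assumed from Toucan 2 usage elsewhere in the thread; `migrated_from_v1_metric_id` is the proposed column, not yet in the schema):

```clojure
;; Build {v1-metric-id -> new-card-id} from the cards inserted by the SQL above.
(def metric-id->card-id
  (into {}
        (map (juxt :migrated_from_v1_metric_id :id))
        (t2/select [:report_card :id :migrated_from_v1_metric_id]
                   :migrated_from_v1_metric_id [:not= nil])))
```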
For the Clojure migrations, select-reducible + reduce is going to be a lot more efficient than select, which fetches everything into memory at once and then iterates over it with doseq; even more so if you only fetch the columns you actually need.
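A hedged sketch of the pattern being suggested, assuming Toucan 2's `reducible-select`; `rewrite-metrics` is a hypothetical stand-in for the PR's actual query-rewriting logic:

```clojure
;; Stream cards through a reducible query instead of realizing the whole
;; table in memory; only the columns we need are fetched.
(reduce
 (fn [migrated card]
   (if-let [rewritten (rewrite-metrics (:dataset_query card))] ; hypothetical helper
     (do (t2/update! :report_card (:id card)
                     {:dataset_query                             rewritten
                      :dataset_query_metrics_v2_migration_backup (:dataset_query card)})
         (inc migrated))
     migrated))
 0
 (t2/reducible-select [:report_card :id :dataset_query]))
```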
Ok, I was thinking about doing the Metric => Card migration in SQL a little more, and the only real tricky part here is populating dataset_query:

```sql
jsonb_set(
  jsonb_set(
    jsonb_set('{"type":"query"}'::jsonb,
              '{database}', to_jsonb(t.db_id)),
    '{query}', metric.definition::jsonb),
  '{query, source-table}', to_jsonb(metric.table_id)
)::text
```

This seems a lot more readable to me; the only downside is you'd have to do different versions for MySQL and H2, but maybe you'd have to do that anyway. Take a look at

Ok, so I guess I misspoke a bit: the version of H2 we're using doesn't have enough JSON functions to make this work with native JSON manipulation in H2. So I guess you'd have to do
@camsaul, I have the feeling that either we go all in and do the whole migration in SQL, or we keep the question rewriting part in Clojure, in which case it makes little sense to do the v2 metric card creation in SQL, because that's orders of magnitude less work. I expect the updates to the question cards to dominate the whole process. I'm not super keen on moving the whole thing to SQL, because reproducing all the tests I did with the Clojure-land rewrite would be quite a large amount of work. Unless you think it's crucial to do as much as possible in standard Liquibase, I would prefer to keep the data-manipulating parts of the migration together.
fixes #42186
fixes #42187
fixes #42188
fixes #42189
fixes #42190
fixes #42191