Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicitly delete obsolete FieldValues when fetching the current ones #38396

Merged
merged 13 commits into from
Feb 13, 2024

Conversation

crisptrutski
Copy link
Contributor

@crisptrutski crisptrutski commented Feb 2, 2024

Relates to #668

Description

Before this change it was non-deterministic which FieldValues would be used in the presence of stale records. This change ensures that we always use the values that were last inserted, and delete the older rows in the process.

It also fixes a query that failed to specify the :type, and so was totally non-deterministic.

Checklist

  • Improve test quality
  • Implement a Snowplow event to track when customer databases encounter duplicates Going to use idempotent operations for all inserts, which will make it safe to insert the constraint. Don't think events will be worth the faff.

@crisptrutski crisptrutski marked this pull request as draft February 2, 2024 15:23
@metabase-bot metabase-bot bot added the .Team/BackendComponents also known as BEC label Feb 2, 2024
@crisptrutski crisptrutski requested a review from a team February 2, 2024 15:24
Copy link

replay-io bot commented Feb 2, 2024

Status In Progress ↗︎ 51 / 52
Commit ff00a9b
Results
⚠️ 2 Flaky
2307 Passed

Copy link
Contributor

@calherries calherries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solid start

@qnkhuat
Copy link
Contributor

qnkhuat commented Feb 5, 2024

Is it better to write a migration to delete duplicates then add a unique constraint on type, hash?

@crisptrutski
Copy link
Contributor Author

crisptrutski commented Feb 12, 2024

Is it better to write a migration to delete duplicates then add a unique constraint on type, hash?

That's the end goal - see discussion on #37794. This is a short term idea.

@crisptrutski crisptrutski added the no-backport Do not backport this PR to any branch label Feb 12, 2024
@crisptrutski crisptrutski marked this pull request as ready for review February 12, 2024 14:49
@@ -493,7 +516,8 @@
(defmethod serdes/load-find-local "FieldValues" [path]
;; Delegate to finding the parent Field, then look up its corresponding FieldValues.
(let [field (serdes/load-find-local (pop path))]
(t2/select-one FieldValues :field_id (:id field))))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this fixes a subtle bug where we could lose a customer's custom field labels

Copy link
Contributor Author

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

@crisptrutski crisptrutski requested a review from a team February 12, 2024 16:49
Copy link
Contributor

@calherries calherries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Just left some nits. This is strictly better than the current situation where there's unpredictable behaviour in the case of duplicate FieldValues instances. The downside of this approach is it's not particularly future-proof to stop the original problem happening again. We'll need the uniqueness constraint and a toucan hook to require :type in a select for that.

then (t/plus now (t/millis 1))]
(mt/with-temp [Database {database-id :id} {}
Table {table-id :id} {:db_id database-id}
Field {field-id :id} {:table_id table-id}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Field {field-id :id} {:table_id table-id}
Field {field-id :id} {:table_id table-id}

(deftest implicit-deduplication-test
(let [now (t/zoned-date-time)
then (t/plus now (t/millis 1))]
(mt/with-temp [Database {database-id :id} {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: technically vars for model identifiers are deprecated and apparently meant to be removed one day. We should honor that (even if I don't believe it will happen soon)

e.g.

Suggested change
(mt/with-temp [Database {database-id :id} {}
(mt/with-temp [:model/Database {database-id :id} {}

test/metabase/models/field_values_test.clj Outdated Show resolved Hide resolved
test/metabase/models/field_values_test.clj Show resolved Hide resolved
src/metabase/models/field_values.clj Outdated Show resolved Hide resolved
Copy link
Contributor Author

The plan is indeed to follow with a db constraint and hooks, I just feel it's worth fixing the pain sooner than later, and the new "higher level" mutation methods will encapsulate the retries for concurrent inserts.

@crisptrutski crisptrutski enabled auto-merge (squash) February 13, 2024 11:15
@crisptrutski crisptrutski enabled auto-merge (squash) February 13, 2024 11:15
@crisptrutski crisptrutski merged commit 5bc74cd into master Feb 13, 2024
109 checks passed
@crisptrutski crisptrutski deleted the dont-forget-full branch February 13, 2024 13:07
Copy link

@crisptrutski Did you forget to add a milestone to the issue for this PR? When and where should I add a milestone?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-backport Do not backport this PR to any branch .Team/BackendComponents also known as BEC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants