Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use transaction for CSV replacement atomicity #40610

Closed
wants to merge 5 commits into from

Conversation

crisptrutski
Copy link
Contributor

Refers #38788

Description

This introduces an over-arching transaction when replacing Upload tables, so that any failures during insertion will rollback the preceding truncation.

[driver db-id & body]
`(with-transaction* ~driver ~db-id (fn [] ~@body)))

(defmulti delete!
Copy link
Contributor Author

@crisptrutski crisptrutski Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to what an LLM told me to expect, I couldn't get TRUNCATE to work reliably within a transaction. Given there were already some semantic mismatches between databases with this command, the camel's back broke and I opted for a more general DELETE wrapper.

Supporting the WHERE component is left out for now, as I'm not sure how expressive we want it to be, and whether we want to support any other options. Supporting the syntactic flexibility of toucan2 might be nice, but seems non-trivial even if we reuse some of that library's internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, even with this change it seems that MariaDB is still hitting some kind of deadlock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems strange to support delete without conditions.

and with our promise to having a 3-deprecation cycle on every method changes I suspect we'll come to a point where we introduce a new delete-where! defmulti just to get around that.

How about we take it a step further and implement it now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we could add no-condition delete now and let one of the & args be a condition map in 51 without breaking our promises about deprecation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the idea is that args will leave things open, and we won't need to depreciate. Whether we add a new method or require implementations to handle new parameters, it can still block an upgrade.

I'm a bit loathe to jump into the details of how we filter, as doing it as generally as toucan does will be complex, and it's unclear how flexibly we'll ever need it to be.

Another option is to keep calling this method truncate. Semantically that's what it's doing, but violates least surprise that it's not how it's implemented.

@crisptrutski crisptrutski added the no-backport Do not backport this PR to any branch label Mar 26, 2024
@crisptrutski crisptrutski requested a review from a team March 26, 2024 10:08
@crisptrutski crisptrutski changed the title Use atomic transaction for CSV replacement Use transaction for CSV replacement atomicity Mar 26, 2024
(apply not= (map (partial driver/upload-type->database-type driver) [::upload/varchar-255 ::upload/text])))
(mt/normal-drivers-with-feature :uploads))
(with-mysql-local-infile-off
;; TODO we are abusing this detail in order to trigger a failure, and should decouple these tests in future
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to the test tech debt... working on this next.

Copy link

replay-io bot commented Mar 26, 2024

Status Complete ↗︎
Commit 76e4061
Results
⚠️ 6 Flaky
2375 Passed

;; TODO - Seems like this is only used in a handful of places, consider moving to util namespace
(defn query
"Execute a `honeysql-form` query against `database`, `driver`, and optionally `table`."
([driver database honeysql-form]
(jdbc/query (sql-jdbc.conn/db->pooled-connection-spec database)
(jdbc/query (local-conn database)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't yet rely on queries being transactional anywhere, in fact we do not even run any queries within a transaction. This change is done in the spirit of "least surprise" for code written in the future.

@@ -940,13 +940,26 @@
dispatch-on-initialized-driver
:hierarchy #'hierarchy)

(defmulti truncate!
(defmulti with-transaction*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean to keep both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also it feels like this could belong to metabase.driver.sql-jdbc.execute along with do-with-connection-with-options

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by both. Truncate has not been in a release yet, so OK to remove, and the new macro delegates to the new function.

Since this method is part of the public API and used with the other methods in this namespace, it feels like it belongs here. I'll take a look at the other namespace though and mull it over, thanks for the tip.

[driver db-id table-name]
(defmethod driver/with-transaction* :sql-jdbc
[_ db-id thunk]
(jdbc/with-db-transaction [conn (local-conn db-id)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt about using next.jdbc instead? if yes then we might want to change the arity of this defmulti to add another opts argument for next.jdbc/with-transcation

related: #39549

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other options do you anticipate us needing? Since this interface may be used more generally than just JDBC databases I'm not sure we need the power.

Otherwise, return a non-transactional connection."
[db-id]
(or (get *tx-connections* db-id)
(sql-jdbc.conn/db->pooled-connection-spec db-id)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to call do-with-connection-with-options here to have all the default setting right. Role, timezone etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wrapping the call that was already being used by all these methods, and changing that feels orthogonal to this PR.

Copy link
Contributor

@qnkhuat qnkhuat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the upload-related change is good, but I think we should think more carefully about how we want to implement with-transaction. If you want to get this in faster, I propose breaking this into 2 PRs, where in one PR, you introduce a with-transaction multimethod. we probably want QP people to take a look too.

@crisptrutski
Copy link
Contributor Author

I'm not certain we'll get any other use cases for a driver-level transaction, so I don't want to over engineer it. I also worry that this is the wrong level of abstraction to expose given that we plan to extend uploads to non-JDBC drivers in future.

As far as replacing truncate with delete, since I don't have a sense of what other options we'd want, I've also made peace with just leaving that alone as well. It's not the end of the world if that means having both truncate and delete in future, since we can always give a default implementation of the former using the latter.

Ultimately I think we underestimated the complexity here when discussing our options, and the "temporary table" solution now seems preferable to me due to its driver agnosticism and ability to revert the schema changes atomically as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-backport Do not backport this PR to any branch .Team/BackendComponents also known as BEC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants