Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Allow DDL statements in transactions? #31

Closed
olavloite opened this issue Aug 20, 2021 · 5 comments · Fixed by #41
Closed

Discussion: Allow DDL statements in transactions? #31

olavloite opened this issue Aug 20, 2021 · 5 comments · Fixed by #41
Assignees

Comments

@olavloite
Copy link
Collaborator

olavloite commented Aug 20, 2021

The initial (and current) implementation of the go-sql-spanner driver allows DDL statements to be executed using a transaction. That is however slightly confusing, as:

  1. The DDL statement is not really executed as part of the transaction, as Cloud Spanner does not support that. This means that the DDL statement will not be affected by any Commit or Rollback of the transaction, and the changes that are made by a DDL statement in a transaction are directly visible to all other transactions after it has been executed.
  2. This (currently) also means that DDL statements can be executed as part of a read-only transaction.

My personal opinion is that we should disallow both the above as the statement is not part of the transaction. Accepting the statement and executing it outside of the transaction will probably cause more confusion than if we were to return an error in such a case.
Note that a user can always execute a DDL statement when a transaction is active, but they will need to do so directly on the database instead of using a transaction.

query := "CREATE TABLE Singers (SingerId INT64, FirstName STRING(100), LastName STRING(100)) PRIMARY KEY (SingerId)"

// Start a (read/write) transaction.
tx, err := db.BeginTx(context.Background(), &sql.TxOptions{})
if err != nil {
  t.Fatal(err)
}

// This currently works but effectively executes the statement outside of the transaction.
// The question is whether this should always return an error.
res, err := tx.ExecContext(context.Background(), query)

// It would always be possible to execute the following statement, also while the above transaction is
// still active. (Note the difference: The above statement uses `tx`, the below statement uses `db`)
res, err := db.ExecContext(context.Background(), query)

What are your thoughts on this?

@IlyaFaer
Copy link

I second the idea to disallow running DDLs "inside" transaction first of all.

@IlyaFaer
Copy link

IlyaFaer commented Aug 20, 2021

As I see, Go implementation executes a DDL at once, meaning for one DDL it uses one separate request:
https://github.com/cloudspannerecosystem/go-sql-spanner/blob/d4bbe4fc855af9010b03ceb73895e07168b9ec29/driver.go#L288-L297

adminpb.UpdateDatabaseDdlRequest however can accept an array of statements. Here I'd remind how we're processing DDL statements in Python.

When a DDL statement is executed, we're not sending it to the backend. Instead we insert it into internal list, pumping up all the DDLs. When a non-DDL statement is executed or the commit() method is called, we're executing all the pumped up DDL statements with a single request. Note: in autocommit mode DDL statement will be executed at once.

I can't say I like very much how it all looks in code, and in some cases it requires to call commit() method to actually run DDL statements, which can create a feeling that DDLs are executed with transactions, but I propose to consider the variant, as sending DDLs in a grouped form seems to be more frugally to me then executing them one by one.

Python code where it all implemented:
https://github.com/googleapis/python-spanner/blob/b294dbf9e02e807883d4de515eada0f161ce9414/google/cloud/spanner_dbapi/cursor.py#L189-L207

https://github.com/googleapis/python-spanner/blob/b294dbf9e02e807883d4de515eada0f161ce9414/google/cloud/spanner_dbapi/connection.py#L266

So, finalizing it all from user's position: it creates a feeling that DDLs are executed with transactions. I'm okay with raising an error in case DDL is executed while transaction is in progress, but probably for users it'll be easier not to think about DDLs are executed outiside of transactions, and just work like they are a part of transactions.

@olavloite
Copy link
Collaborator Author

DDL Transactions

When a DDL statement is executed, we're not sending it to the backend. Instead we insert it into internal list, pumping up all the DDLs. When a non-DDL statement is executed or the commit() method is called, we're executing all the pumped up DDL statements with a single request. Note: in autocommit mode DDL statement will be executed at once.

I feel that that behavior is also slightly confusing, for two reasons:

  1. It still gives the idea that the DDL statements are part of the transaction, which they are not. Even if they are executed in a batch, they are not atomic. If one of the statements fail, some or all of the other statements may still be executed.
  2. The fact that the DDL statement(s) are executed automatically when a non-DDL statement is executed as part of the transaction, means that (an) invalid DDL statement(s) might cause a completely unrelated statement to fail. So if for example the following statements are executed as part of a transaction:
CREATE TABLE Singers (SingerId INT64, Name STRING(MAX)) PRIMARY KEY (SingerId);
CREATE TABLE Albums (AlbumId INT64, Title STRING(MAX)) PRIMARY KEY (Id); -- Note the mistyped Id column name

-- This will fail with an error that points to the CREATE TABLE Albums statement.
INSERT INTO Singers (id, Name) VALUES (1, 'Test'); 

DDL Batches

Batching DDL statements together is certainly something that you sometimes want, especially as it makes creating a large set of new tables or indexes a lot faster than executing them one by one. This is however different from DDL transactions as a transaction is something that a user would expect to be atomic.

The generic Connection API specification therefore included the ability to define DDL batches using custom SQL statements. So if you want to execute a set of DDL statements as one batch, you would then execute the following set of statements:

START BATCH DDL;
CREATE TABLE Singers (SingerId INT64, Name STRING(MAX)) PRIMARY KEY (SingerId);
CREATE TABLE Albums (AlbumId INT64, Title STRING(MAX)) PRIMARY KEY (AlbumId);
RUN BATCH;

A DDL batch is not intended to be atomic.

This feature would then be implemented as part of #16

@IlyaFaer
Copy link

IlyaFaer commented Aug 20, 2021

The fact that the DDL statement(s) are executed automatically when a non-DDL statement is executed as part of the transaction, means that (an) invalid DDL statement(s) might cause a completely unrelated statement to fail.

It's true. As I said, it creates feeling that you're executing those DDLs in a transaction with other statements.

The generic Connection API specification therefore included the ability to define DDL batches using custom SQL statements. So if you want to execute a set of DDL statements as one batch, you would then execute the following set of statements:

Sounds good. If so, pumping up DDLs into an array will be redundant anyway (as I mentioned, I don't actually like how it looks in the code, so this custom statements idea sounds very good to me). With this in mind, I'm okay to raise an exception in case DDL is executed while in transaction 👍

@hengfengli
Copy link
Collaborator

@olavloite Sorry for the late reply. I agree with (1) and (2) and should raise an exception in case DDL is executed in a transaction.

olavloite added a commit that referenced this issue Sep 15, 2021
DDL statements cannot be executed as part of a transaction on Cloud Spanner.
Although it would be technically possible to execute a DDL statement while a
transaction is active, it could cause confusion whether the statement is part
of the transaction or not. The driver therefore should return an error if an
application tries to execute a DDL statement during an active transaction.

Fixes #31
olavloite added a commit that referenced this issue Sep 17, 2021
DDL statements cannot be executed as part of a transaction on Cloud Spanner.
Although it would be technically possible to execute a DDL statement while a
transaction is active, it could cause confusion whether the statement is part
of the transaction or not. The driver therefore should return an error if an
application tries to execute a DDL statement during an active transaction.

Fixes #31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants