Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Implementation of client side statements that return #1046

Merged
merged 11 commits into from
Dec 12, 2023

Conversation

ankiaga
Copy link
Contributor

@ankiaga ankiaga commented Dec 4, 2023

Implementation of SHOW_COMMIT_TIMESTAMP and SHOW_READ_TIMESTAMP client statements

@ankiaga ankiaga requested review from a team as code owners December 4, 2023 04:27
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: spanner Issues related to the googleapis/python-spanner API. labels Dec 4, 2023
@ankiaga ankiaga changed the title Implementation of client side statements that return feat: Implementation of client side statements that return Dec 4, 2023
Copy link

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 4, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 4, 2023
parsed_statement.client_side_statement_type
== ClientSideStatementType.SHOW_COMMIT_TIMESTAMP
):
if connection.is_closed:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this to the start of this function. I think that we want to do this for any type of statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -23,6 +23,12 @@
RE_BEGIN = re.compile(r"^\s*(BEGIN|START)(TRANSACTION)?", re.IGNORECASE)
RE_COMMIT = re.compile(r"^\s*(COMMIT)(TRANSACTION)?", re.IGNORECASE)
RE_ROLLBACK = re.compile(r"^\s*(ROLLBACK)(TRANSACTION)?", re.IGNORECASE)
RE_SHOW_COMMIT_TIMESTAMP = re.compile(
r"^\s*(SHOW VARIABLE COMMIT_TIMESTAMP)", re.IGNORECASE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this regex accept show variable commit timestamp? And show\nvariable\ncommit\ntimestamp?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully not because commit_timestamp is one token :-) But as to spacing elsewhere, good point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed regex for spacing

bool: True if Spanner transaction started, False otherwise.
"""
def inside_transaction(self):
"""Deprecated property which won't be supported in future versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
"""Deprecated property which won't be supported in future versions.
"""Deprecated: This property may be removed in a future release.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is deprecated, should it be tagged @deprecated?

(Python does support stacking decorators. Order matters; they are applied in the reverse of the order in which they are listed. @deprecated deprecates a function and @property turns a function into a property, so probably the deprecation should be listed last so that it's evaluated first, on the function rather than its derived property.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Added @deprecated

google/cloud/spanner_dbapi/connection.py Show resolved Hide resolved
@@ -405,13 +397,15 @@ def begin(self):
Marks the transaction as started.

:raises: :class:`InterfaceError`: if this connection is closed.
:raises: :class:`OperationalError`: if there is an existing transaction that has begin or is running
:raises: :class:`OperationalError`: if there is an existing transaction
that has begin or is running
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I know that it's not a change in this PR, but I just noticed it now):

Suggested change
that has begin or is running
that has has been started

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -152,6 +153,87 @@ def test_begin_client_side(self, shared_instance, dbapi_database):
conn3.close()
assert got_rows == [updated_row]

def test_commit_timestamp_client_side(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
def test_commit_timestamp_client_side(self):
def test_commit_timestamp_client_side_transaction(self):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tests/system/test_dbapi.py Show resolved Hide resolved
Comment on lines 202 to 203
self._cursor.execute("SELECT * FROM contacts")
self._cursor.execute("SELECT * FROM contacts")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this test should run two selects, but the results of both should also be consumed and verified that they contain what we expect.

Also, it would be good to verify that we can see the read timestamp already after executing the first query, and that it stays the same after the second query and after the commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tests/system/test_dbapi.py Show resolved Hide resolved
tests/system/test_dbapi.py Show resolved Hide resolved
connection.rollback()
return None
if (
parsed_statement.client_side_statement_type
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are going to be lots of these ifs and they'll all be long due to the long variable name, you might consider doing statement_type = parsed_statement.client_side_statement_type at the top of this block of ifs. Then you can use the shorter name throughout here.

You could also do something like Type = ClientSideStatementType. Aliasing a class is a little less common, though. And, I think, probably not necessary to get the line length consistently under the length limit?

Copy link

@aseering aseering Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related, very minor thing:

In Python, everything is interpreted and there's no JIT compiler doing type inference, so every time you do a.b, the Python interpreter has to actually interpret that logic at runtime. Specifically -- go look up the type of a, look up and run code associated with any overrides on a's . operator (for example properties are implemented under the hood as an overload to the dot operator that checks whether the name of the field that you're looking up is the same as the name of a property on the class and, if so, replaces that lookup with a call to the underlying getter(/setter(/deleter)) function), then if still applicable go look up b in a's member dictionary.

As a result:

a.b
a.b
a.b
a.b
(... lots of times ...)

is actually slightly-but-measurably slower than

c = a.b
c
c
c
c
(... lots of times ...)

In most other languages, unless a pointer dereference is required (which is still cheap), nested member-variable access is zero cost at runtime; all of this resolution is sorted out at compile time or by a JIT compiler.

The performance difference is not usually enough (and Python code is usually not performance-sensitive enough) that it matters. But it gently nudges Python to tend to use simple variables rather than complex nested structures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined a variable statement_type. Thanks for the explanation, was not aware of it. Will try to remember and take care of this in future code

@@ -23,6 +23,12 @@
RE_BEGIN = re.compile(r"^\s*(BEGIN|START)(TRANSACTION)?", re.IGNORECASE)
RE_COMMIT = re.compile(r"^\s*(COMMIT)(TRANSACTION)?", re.IGNORECASE)
RE_ROLLBACK = re.compile(r"^\s*(ROLLBACK)(TRANSACTION)?", re.IGNORECASE)
RE_SHOW_COMMIT_TIMESTAMP = re.compile(
r"^\s*(SHOW VARIABLE COMMIT_TIMESTAMP)", re.IGNORECASE
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully not because commit_timestamp is one token :-) But as to spacing elsewhere, good point.

bool: True if Spanner transaction started, False otherwise.
"""
def inside_transaction(self):
"""Deprecated property which won't be supported in future versions.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is deprecated, should it be tagged @deprecated?

(Python does support stacking decorators. Order matters; they are applied in the reverse of the order in which they are listed. @deprecated deprecates a function and @property turns a function into a property, so probably the deprecation should be listed last so that it's evaluated first, on the function rather than its derived property.)

self._read_request_count += 1
self._execute_sql_count += 1

if self._read_only and not transaction_id_set:
peek = next(iterator)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this advancing the iterator by one?

It looks to me like line 485 is effectively rewinding the iterator? So this does implement the peek idiom (assuming no exception is thrown between here and there), though I'd agree that it's confusing.

Reading the implementation of _restart_on_unavailable, though: Do I read correctly that it's implemented by buffering the full resultset in memory? I'm curious if you know why we do that rather than yielding results as they arrive and, in case of ServiceUnavailable, just replay to the point where we left off and keep streaming? The semantics would be slightly different, so there might be a good reason for this choice.

(I started reading that function because I was wondering if this was a custom iterator and if we could add a peek() method to it. It looks to me like the current algorithm would in fact support peek() just fine (because it buffers all results in memory anyway), but it's not a custom iterator so it wouldn't be trivial to add such a method. The more-itertools library contains a peekable wrapper that would do this for us, though that would add a library dependency.)

self._read_request_count += 1
self._execute_sql_count += 1

if self._read_only and not transaction_id_set:
peek = next(iterator)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this fail if the query returns no results? (IIRC next() with no second/default argument throws a StopIteration exception or something if there's nothing left to return?) Should it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There would be one result (PartialResultSet object) where metadata field would be present and values would be an empty list

@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 7, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 7, 2023
return _get_streamed_result_set(
ClientSideStatementType.SHOW_COMMIT_TIMESTAMP.name,
TypeCode.TIMESTAMP,
connection._transaction.committed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if _transaction is None? Or if the transaction has not yet committed? Will this then return a result set containing a single row/column containing a NULL? Or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made the change to return an empty result set for both cases

return _get_streamed_result_set(
ClientSideStatementType.SHOW_READ_TIMESTAMP.name,
TypeCode.TIMESTAMP,
connection._snapshot._transaction_read_timestamp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as above? What happens is _snapshot is None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same response as above

Comment on lines 39 to 42
"This method is non-operational as transaction has not been started at " "client."
)
SPANNER_TRANSACTION_NOT_STARTED_WARNING = (
"This method is non-operational as transaction has not been started at " "spanner."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should make a difference between these two in any warnings that we return to the user. Whether a transaction has been started on Spanner or not, is essentially an implementation detail.

(Note: Internally, we can make this difference. But I don't think we should communicate it like that to a user.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this warning as not needed now

made atleast one call to Spanner. Property client_transaction_started
would always be true if this is true as transaction has to start first
at clientside than at Spanner
def ddl_statements(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new property? If so, any specific reason that we are adding this in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intellij was giving a suggestion so created it. Let me know if I should revert it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would in that case at least make it internal (so let it start with an underscore). We should be as conservative as possible when it comes to adding public methods and properties to the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the property as there is no different in accessing a property or a field in python when they both start with underscore and Intellij will give the same warning

if not self._client_transaction_started:
warnings.warn(
CLIENT_TRANSACTION_NOT_STARTED_WARNING, UserWarning, stacklevel=2
)
return
if not self._spanner_transaction_started:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: I don't think we want to make this difference, but also this should be supported:

begin;
rollback;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the warning and added the system test test_begin_and_rollback to test the mentioned use case

tests/system/test_dbapi.py Show resolved Hide resolved
assert len(got_rows[0]) == 1
assert len(self._cursor.description) == 1
assert self._cursor.description[0].name == "SHOW_READ_TIMESTAMP"
assert isinstance(got_rows[0][0], DatetimeWithNanoseconds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add another query to this test, and then verify that the next query gives us a new read timestamp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tests/system/test_dbapi.py Show resolved Hide resolved
tests/unit/spanner_dbapi/test_connection.py Show resolved Hide resolved

CONNECTION_CLOSED_ERROR = "This connection is closed"
TRANSACTION_NOT_STARTED_WARNING = (
"This method is non-operational as transaction has not been started."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
"This method is non-operational as transaction has not been started."
"This method is non-operational as a transaction has not been started."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -35,7 +36,7 @@


CLIENT_TRANSACTION_NOT_STARTED_WARNING = (
"This method is non-operational as transaction has not started"
"This method is non-operational as transaction has not been started."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
"This method is non-operational as transaction has not been started."
"This method is non-operational as a transaction has not been started."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

google/cloud/spanner_dbapi/connection.py Outdated Show resolved Hide resolved
tests/unit/spanner_dbapi/test_connection.py Show resolved Hide resolved
@ankiaga ankiaga enabled auto-merge (squash) December 11, 2023 17:41
@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 12, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 12, 2023
@ankiaga ankiaga merged commit bb5fa1f into googleapis:main Dec 12, 2023
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants