Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Fixing and refactoring transaction retry logic in dbapi. Also adding interceptors support for testing #1056

Merged
merged 11 commits into from Jan 17, 2024

Conversation

ankiaga
Copy link
Contributor

@ankiaga ankiaga commented Dec 15, 2023

We found couple of bugs in the existing retry logic, the major one being b/315807641
Other one being assuming tuple to be a list (line 316 in old connection.py) and comparing which would never work.
These issues were not caught as there are no proper tests for these.

So this PR is fixing those issues. Also the current code was not very easy to understand so this PR also refactor the retry logic from connection.py to transaction_helper.py along with the fixes. Tests from test_connection.py have also moved to test_transaction_helper.py along with many new tests (mainly on batch statements) added

Also added interceptors support so we can test retry logic in system tests. Added system tests for the same

@ankiaga ankiaga requested review from a team as code owners December 15, 2023 01:32
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: spanner Issues related to the googleapis/python-spanner API. labels Dec 15, 2023
Copy link
Contributor

@olavloite olavloite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a description to the PR what the faulty behavior was that you found, and how you fixed it? The PR is relatively big and complex, so it would be good to have some context.

Also: After an initial pass, it is very clear that this feature does not have a good enough test coverage. There are some big problems with the current implementation (probably much of it was already there), and the fact that none of the tests have caught that is a clear sign that it is not tested well enough.

tests/system/test_dbapi.py Show resolved Hide resolved
tests/unit/spanner_dbapi/test_connection.py Show resolved Hide resolved
google/cloud/spanner_dbapi/transaction_helper.py Outdated Show resolved Hide resolved
google/cloud/spanner_dbapi/transaction_helper.py Outdated Show resolved Hide resolved
@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 16, 2023
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 16, 2023
@ankiaga ankiaga changed the title fix: Fixing and refactoring transaction retry logic in dbapi feat: Fixing and refactoring transaction retry logic in dbapi. Also adding interceptors support for testing Dec 20, 2023
google/cloud/spanner_dbapi/connection.py Outdated Show resolved Hide resolved
google/cloud/spanner_dbapi/cursor.py Outdated Show resolved Hide resolved
google/cloud/spanner_dbapi/cursor.py Show resolved Hide resolved
google/cloud/spanner_dbapi/cursor.py Show resolved Hide resolved
google/cloud/spanner_dbapi/transaction_helper.py Outdated Show resolved Hide resolved
google/cloud/spanner_dbapi/transaction_helper.py Outdated Show resolved Hide resolved
google/cloud/spanner_dbapi/transaction_helper.py Outdated Show resolved Hide resolved
tests/unit/spanner_dbapi/test_connection.py Show resolved Hide resolved
tests/system/test_dbapi.py Show resolved Hide resolved
tests/system/test_dbapi.py Outdated Show resolved Hide resolved
tests/system/test_dbapi.py Outdated Show resolved Hide resolved
@olavloite
Copy link
Contributor

My general impression is that we need a more extensive test suite for transaction retries. Tests that I would like to see are:

  1. Tests that show that a retry fails if the underlying data has changed.
  2. Tests that show that a retry succeeds if the underlying data has remained unchanged. The test should include a query that selects data from a table that uses all data types that are supported by Cloud Spanner. This will show that the checksum calculation works correctly for all data types.
  3. Tests that verify that the same retry as in test 2 fails if one of the columns in the result has changed. This also verifies that the checksum calculation is 'correct' for all data types in the sense that it produces a different outcome if a value has changed.
  4. (Potentially in a separate PR) A test that shows that a Batch DML that contains N elements and that successfully executes M=N-X of the DML statements and returns an error for statement number M+1, is correctly handled. That is: During a retry, both the M update counts + the error must be the same for the retry to be successful, and otherwise it should fail.
  5. (Potentially in a separate PR) A test that shows that a fetchxxx call that returns an Aborted error is correctly handled. 'Correctly' in this case means either:
    5.1. Propagates the Aborted error to the user without retrying
    5.2. OR: Internally retries the transaction and continues the cursor from the same point.

@ankiaga
Copy link
Contributor Author

ankiaga commented Jan 10, 2024

My general impression is that we need a more extensive test suite for transaction retries. Tests that I would like to see are:

  1. Tests that show that a retry fails if the underlying data has changed.
  2. Tests that show that a retry succeeds if the underlying data has remained unchanged. The test should include a query that selects data from a table that uses all data types that are supported by Cloud Spanner. This will show that the checksum calculation works correctly for all data types.
  3. Tests that verify that the same retry as in test 2 fails if one of the columns in the result has changed. This also verifies that the checksum calculation is 'correct' for all data types in the sense that it produces a different outcome if a value has changed.
  4. (Potentially in a separate PR) A test that shows that a Batch DML that contains N elements and that successfully executes M=N-X of the DML statements and returns an error for statement number M+1, is correctly handled. That is: During a retry, both the M update counts + the error must be the same for the retry to be successful, and otherwise it should fail.
  5. (Potentially in a separate PR) A test that shows that a fetchxxx call that returns an Aborted error is correctly handled. 'Correctly' in this case means either:
    5.1. Propagates the Aborted error to the user without retrying
    5.2. OR: Internally retries the transaction and continues the cursor from the same point.

Thanks for spending time and figuring out all tests missing here. As discussed created a bug https://buganizer.corp.google.com/issues/319404848 for this

@olavloite
Copy link
Contributor

Thanks for your patience and perseverance on this 👍

@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 16, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 16, 2024
@ankiaga ankiaga enabled auto-merge (squash) January 16, 2024 10:32
@ankiaga ankiaga added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 17, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jan 17, 2024
@ankiaga ankiaga merged commit 6640888 into googleapis:main Jan 17, 2024
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/python-spanner API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants