UNAVAILABLE (503) error was not retried on MutateRows #115
Is this failure reliably reproducible?
I think this should be a retryable error. I suspect the problem is that when we migrated to service config JSON files, we removed the retry configs. The reasoning was that service configs would eventually be pulled into gRPC, and gRPC will never be able to handle smart retries for bulk mutations (retrying only the entries that failed) or ReadRows stream resumption. Unless I'm misunderstanding the code, this client still relies on the service config file for its retry settings: python-bigtable/google/cloud/bigtable/table.py Lines 1068 to 1071 in 1fbadf9
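To illustrate the kind of "smart" bulk-mutation retry that a generic gRPC retry policy cannot express, here is a minimal sketch in plain Python. It assumes a hypothetical `send_batch(entries)` callable that stands in for one MutateRows call and returns one gRPC status code per entry. Only entries that came back with a retryable code are resent; entries that already succeeded are left alone. The function name, the retryable set (DEADLINE_EXCEEDED, ABORTED, UNAVAILABLE) and the backoff numbers are assumptions for illustration, not python-bigtable's actual settings.

```python
import time
from typing import Callable, List, Sequence

# Numeric gRPC status codes (values from the gRPC status code list).
OK = 0
DEADLINE_EXCEEDED = 4
ABORTED = 10
UNAVAILABLE = 14

# Assumed retryable set, hardcoded in client code instead of being read
# from a service config file.
RETRYABLE_CODES = frozenset({DEADLINE_EXCEEDED, ABORTED, UNAVAILABLE})


def mutate_with_entry_retries(
    entries: Sequence[object],
    send_batch: Callable[[List[object]], List[int]],
    max_attempts: int = 5,
    initial_delay: float = 0.01,
    multiplier: float = 2.0,
) -> List[int]:
    """Send all entries, then resend only those whose status is retryable.

    send_batch is a hypothetical stand-in for a single MutateRows call: it
    takes a list of entries and returns one status code per entry, in order.
    """
    # Placeholder status for entries that have not been sent yet.
    statuses: List[int] = [UNAVAILABLE] * len(entries)
    pending = list(range(len(entries)))  # indexes still to be (re)sent
    delay = initial_delay
    attempt = 0
    while pending and attempt < max_attempts:
        if attempt > 0:
            time.sleep(delay)  # exponential backoff before each retry
            delay *= multiplier
        attempt += 1
        results = send_batch([entries[i] for i in pending])
        next_pending = []
        for index, code in zip(pending, results):
            statuses[index] = code
            if code in RETRYABLE_CODES:
                next_pending.append(index)
        pending = next_pending
    return statuses
```

Entries that fail with a non-retryable code (for example INVALID_ARGUMENT) are reported as-is after the first attempt, which is the per-entry behavior a whole-RPC retry policy cannot reproduce.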
I have started looking into this, and it seems we do set retries to none in gRPC, managing retries in the client code instead. That said, I am digging into one area where I think we may not know how to retry: a failure at the highest level of the mutate retry, before we start streaming. If this line fails, it won't be retried, and we won't record any retryable rows: python-bigtable/google/cloud/bigtable/table.py Line 1081 in 1fbadf9
From the stack trace, I think this is where it is failing (although things don't quite line up with the current versions).
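The gap described above is that an exception raised by the MutateRows call itself, before any per-entry statuses stream back, escapes the retry bookkeeping. Below is a minimal sketch of one way to close it; it is not the actual python-bigtable fix. It assumes a hypothetical `RpcError` exception carrying a numeric gRPC `code` and a hypothetical `start_stream` callable that issues the RPC and yields `(index, status_code)` pairs. If the call fails with a retryable code, every entry that has not yet received its own status is recorded with that code, so the outer retry loop resends it instead of giving up.

```python
from typing import Callable, Dict, Iterable, List, Tuple

UNAVAILABLE = 14
# Assumed retryable set: DEADLINE_EXCEEDED, ABORTED, UNAVAILABLE.
RETRYABLE_CODES = frozenset({4, 10, 14})


class RpcError(Exception):
    """Hypothetical stand-in for a gRPC error with a numeric status code."""

    def __init__(self, code: int):
        super().__init__(f"rpc failed with status {code}")
        self.code = code


def run_attempt(
    pending: List[int],
    start_stream: Callable[[List[int]], Iterable[Tuple[int, int]]],
) -> List[int]:
    """Run one MutateRows-style attempt and return the indexes to retry.

    start_stream is hypothetical: it issues the RPC for the pending indexes
    and yields (index, status_code) pairs as responses arrive.
    """
    received: Dict[int, int] = {}
    try:
        # Both the initial call and the streaming iteration are covered.
        for index, code in start_stream(pending):
            received[index] = code
    except RpcError as exc:
        if exc.code not in RETRYABLE_CODES:
            raise  # non-retryable failures still surface to the caller
        # The RPC failed, possibly before streaming anything: every entry
        # without a status of its own inherits the RPC's status code.
        for index in pending:
            received.setdefault(index, exc.code)
    return [i for i in pending if received.get(i) in RETRYABLE_CODES]
```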
I have made a fix locally and am attempting to reproduce this by mutating 1000 rows in a loop. If you have thoughts on how to trigger this failure, do let me know; if not, I can just let this run for a while :)

import datetime

# Runs inside a system-test method: self._table and the COLUMN_FAMILY_ID1,
# COL_NAME1 and CELL_VAL1 constants are assumed to be defined by the
# surrounding system test.
ROW_COUNT = 1000

# Create rows
print(f"Seeding Rows. Count:{ROW_COUNT} T:{datetime.datetime.utcnow()}")
for i in range(ROW_COUNT):
    row_key = f"row_key_{i}".encode()
    row1 = self._table.row(row_key)
    row1.set_cell(COLUMN_FAMILY_ID1, COL_NAME1, CELL_VAL1)
    row1.commit()

loop_count = 0
while True:
    loop_count += 1
    print(f"Mutate Loop {loop_count} T:{datetime.datetime.utcnow()}")
    cell_val = CELL_VAL1 + f"{loop_count}".encode()

    # Change the contents of every row in a single MutateRows call
    rows = []
    for i in range(ROW_COUNT):
        row_key = f"row_key_{i}".encode()
        row = self._table.row(row_key)
        row.set_cell(COLUMN_FAMILY_ID1, COL_NAME1, cell_val)
        rows.append(row)
    statuses = self._table.mutate_rows(rows)

    # Every entry should come back OK (status code 0)
    result = [status.code for status in statuses]
    expected_result = [0] * ROW_COUNT
    self.assertEqual(result, expected_result)
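Looping against a live table may take a long time to hit a transient UNAVAILABLE. One alternative, sketched here under assumptions, is to inject the failure deterministically with `unittest.mock`: a mock whose `side_effect` raises on the first call and returns a result on the second. The `mutate_rows_with_retry` helper and the `ServiceUnavailable` class are hypothetical stand-ins, not python-bigtable code; in a real test the mock would presumably replace the underlying gRPC call that the client wraps.

```python
from unittest import mock


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for a transient UNAVAILABLE (503) error."""


def mutate_rows_with_retry(rpc, rows, max_attempts=3):
    """Hypothetical client logic under test.

    A failure before streaming means no row has a status yet, so every
    row is resent on the next attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc(rows)
        except ServiceUnavailable:
            if attempt == max_attempts:
                raise


# The first call fails before any statuses come back; the second succeeds.
rpc = mock.Mock(side_effect=[ServiceUnavailable(), [0, 0, 0]])

statuses = mutate_rows_with_retry(rpc, ["r1", "r2", "r3"])
print(statuses)        # [0, 0, 0]
print(rpc.call_count)  # 2
```

Because `side_effect` is consumed one item per call, the test controls exactly which attempt fails, so the top-level failure path is exercised on every run instead of by chance.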
Environment details
SDK Version: Apache Beam Python 3.7 SDK 2.22.0
Steps to reproduce
Stack trace
As @igorbernstein2 mentioned, MutateRows and ReadRows should ignore the service_config and hardcode the retryable error codes in the manual (handwritten) layer.
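For ReadRows, the equivalent manual retry is stream resumption: after a retryable error, reissue the read starting after the last row key already delivered, so rows are neither skipped nor duplicated. A minimal sketch, assuming a hypothetical `read_stream(start_after)` callable that yields row keys in ascending order and raises a hypothetical `RpcError` with a numeric gRPC `code`. The retryable set is hardcoded in the sketch rather than taken from a service config, in the spirit of the comment above; the exact set is an assumption.

```python
from typing import Callable, Iterator, Optional

# Hardcoded in the manual layer (assumed set): DEADLINE_EXCEEDED, ABORTED, UNAVAILABLE.
RETRYABLE_CODES = frozenset({4, 10, 14})


class RpcError(Exception):
    """Hypothetical stand-in for a gRPC error with a numeric status code."""

    def __init__(self, code: int):
        super().__init__(f"rpc failed with status {code}")
        self.code = code


def read_rows_resumable(
    read_stream: Callable[[Optional[bytes]], Iterator[bytes]],
    max_attempts: int = 5,
) -> Iterator[bytes]:
    """Yield row keys, resuming after the last delivered key on retryable errors.

    read_stream is hypothetical: read_stream(None) reads from the start, and
    read_stream(key) reads only rows strictly after key.
    """
    last_key: Optional[bytes] = None
    attempts = 0
    while True:
        attempts += 1
        try:
            for key in read_stream(last_key):
                last_key = key  # remember progress before handing the row out
                yield key
            return  # the stream finished normally
        except RpcError as exc:
            if exc.code not in RETRYABLE_CODES or attempts >= max_attempts:
                raise
            # Otherwise loop and resume after last_key.
```

Tracking `last_key` in the client is what lets a retry continue the scan instead of restarting it, which is why this logic has to live in the client rather than in a generic gRPC retry policy. A real implementation would also back off between attempts; that is omitted here for brevity.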
b/165938205