fix: Remove custom readrows retry logic and rely on gax for retries #1422

danieljbruce · 2024-05-28T17:49:04Z

Summary:

For ReadRows calls, this PR removes the custom retry logic in createReadStream and sets the function up to rely on google-gax for retries instead.

Changes:

In src/index.ts, the data client is told to opt into the new streaming retry behaviour and retryRequestOptions are not passed along in every streaming call.

In src/table.ts, the custom retry behaviour is removed for ReadRows calls, and the logic that controls resumption and decides whether a retry should happen is moved to ReadRowsResumptionStrategy in src/utils/read-rows-resumption.ts. The other streaming call types, such as mutateRows, are adjusted to work with the new changes in src/table.ts.

In src/utils/read-rows-resumption.ts, code that previously lived in table.ts now establishes the retry behaviour, encapsulated in a more modular form.

In system-test/read-rows.ts, the tests are rewritten to still test the same retry behaviour but use the mock server to do so.

In system-test/testTypes.ts, interfaces are added to allow for stronger type enforcement in the test files.

In src/utils/retry-options.ts there are some constants that will be used for all streaming retry calls.

In system-test/data/read-rows-retry-test.json, we add tests that previously lived in test/table.ts.

In test/table.ts, we move some of the tests over to system-test/data/read-rows-retry-test.json and add new tests that ensure the right data reaches the gapic layer.

In test/util/gapic-layer-tester.ts we define a new class that is useful for testing data that reaches read-rows.ts.

In test/utils/read-rows-resumption.ts, we unit test the ReadRowsResumptionStrategy object.
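The resumption behaviour described in the changes above can be sketched roughly as follows. This is a minimal illustration and not the code in src/utils/read-rows-resumption.ts: the names `ResumptionSketch`, `ReadRowsRequestSketch`, and `RowRangeSketch` are hypothetical, row keys are treated as plain strings, and only closed end keys are modelled. The idea is that the strategy records the last row key delivered and the number of rows delivered, then builds a retry request that asks only for what remains.

```typescript
// Hypothetical, simplified shapes; the real ReadRows request has more fields.
interface RowRangeSketch {
  startKeyClosed?: string; // inclusive start
  startKeyOpen?: string; // exclusive start
  endKeyClosed?: string; // inclusive end
}

interface ReadRowsRequestSketch {
  tableName: string;
  rowKeys: string[];
  rowRanges: RowRangeSketch[];
  rowsLimit?: number;
}

class ResumptionSketch {
  private original: ReadRowsRequestSketch;
  private lastRowKey: string | undefined = undefined;
  private rowsRead = 0;

  constructor(original: ReadRowsRequestSketch) {
    this.original = original;
  }

  // Call once for every row handed to the user.
  recordRow(rowKey: string): void {
    this.lastRowKey = rowKey;
    this.rowsRead++;
  }

  // Build the request to send on a retry attempt.
  getResumeRequest(): ReadRowsRequestSketch {
    const last = this.lastRowKey;
    if (last === undefined) {
      return {...this.original}; // nothing delivered yet: resend as-is
    }
    const {tableName, rowKeys, rowRanges, rowsLimit} = this.original;
    // No keys and no ranges means "whole table", modelled as one unbounded range.
    const ranges: RowRangeSketch[] =
      rowKeys.length === 0 && rowRanges.length === 0 ? [{}] : rowRanges;
    // Assumption: JavaScript string comparison stands in for row-key ordering.
    const resumedRanges = ranges
      .filter(r => r.endKeyClosed === undefined || r.endKeyClosed > last)
      .map(r => {
        const start = r.startKeyClosed ?? r.startKeyOpen;
        if (start !== undefined && start > last) {
          return r; // range lies entirely after the last row read
        }
        // Range is partly consumed: restart just after the last row read.
        const resumed: RowRangeSketch = {startKeyOpen: last};
        if (r.endKeyClosed !== undefined) {
          resumed.endKeyClosed = r.endKeyClosed;
        }
        return resumed;
      });
    const request: ReadRowsRequestSketch = {
      tableName,
      rowKeys: rowKeys.filter(k => k > last),
      rowRanges: resumedRanges,
    };
    if (rowsLimit !== undefined) {
      request.rowsLimit = rowsLimit - this.rowsRead;
    }
    return request;
  }
}
```

A caller would invoke `recordRow` for each row it emits and `getResumeRequest` when a retryable error arrives. A real implementation also has to avoid sending a resumed request whose keys and ranges are both empty, because an empty row set is interpreted as a read of the whole table.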

Future work:

This PR doesn't intend to remove custom retry logic or change retry behaviour for any of the streaming calls other than readrows. In future PRs, changes will be made so that mutateRows and sampleRowKeys calls rely on google-gax for retries.

leahecole

LGTM when @daniel-sanche is good with it!

daniel-sanche

Sorry, a couple new thoughts, now that I understand the implementation a bit more.

My main concern at this point is how these retry options are exposed to the end-user, and how we test different user inputs

daniel-sanche · 2024-06-17T23:42:46Z

src/utils/read-rows-resumption.ts

+ *   {tableName: 'projects/my-project/instances/my-instance/tables/my-table'}
+ * )
+ * gaxOpts.retry = strategy.toRetryOptions(gaxOpts);
+ * ```


I assume this is meant to be internal, right? (How is that usually communicated in node?)

Either way, I don't think this example makes a lot of sense. Is options meant to be the same variable as gaxOpts? How is it usually created in the first place? It's not really clear to me how those are meant to interact

danieljbruce

This is internal, so I just removed the example.

daniel-sanche · 2024-06-17T23:45:21Z

src/utils/read-rows-resumption.ts

+      : arrify(RETRYABLE_STATUS_CODES);
+    if (
+      error.code &&
+      (retryCodesUsed.includes(error.code) || isRstStreamError(error))


Is it intentional that there is no way to disable the check for isRstStreamError? (All the other retryableErrors can be overridden)

danieljbruce

I am not sure what to do about this. Currently, isRstStreamError checks whether the error code is 13 and the message mentions an RST stream, because that is what the client library did before. If the user provides 13 in a custom set of codes, should it always retry? If they don't provide 13 in a custom set of codes, should it never retry on code 13, or should it retry only on RST stream errors? If we allow the user to override canResume, then they will have control over this behaviour.

Longer term, I think this retry logic should be generated code that is the same across client library languages at some point.

daniel-sanche

Yeah, I'm not really sure what a good solution is either, just wanted to make sure you were aware of the situation, and maybe discussed it with stakeholders.

It still seems strange to me to retry based on the content of the error, instead of just the error code itself.

danieljbruce

It is a bit specific. Please look at the YAQs question I sent you.
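The question raised in this thread may be easier to see in code. The following is a sketch under stated assumptions, not the library's implementation: the default code list, the `ErrorSketch` type, the `*Sketch` function names, and the message substrings checked are all hypothetical. Only the shape of the condition (retry if the code is in the list, or if the error is an RST stream error) is taken from the diff above.

```typescript
// gRPC status code numbers.
const DEADLINE_EXCEEDED = 4;
const ABORTED = 10;
const INTERNAL = 13;
const UNAVAILABLE = 14;

// Assumed default set, for illustration only.
const DEFAULT_RETRYABLE_CODES_SKETCH = [DEADLINE_EXCEEDED, ABORTED, UNAVAILABLE];

// Hypothetical minimal error shape.
interface ErrorSketch {
  code?: number;
  message?: string;
}

// INTERNAL errors are treated as retryable only when the message looks like
// an RST stream failure. The substrings checked here are assumptions.
function isRstStreamErrorSketch(error: ErrorSketch): boolean {
  if (error.code !== INTERNAL || !error.message) {
    return false;
  }
  const message = error.message.toLowerCase();
  return message.includes('rst_stream') || message.includes('rst stream');
}

// The user can replace the code list, but the RST stream check always runs.
function canResumeSketch(error: ErrorSketch, customCodes?: number[]): boolean {
  if (error.code === undefined) {
    return false;
  }
  const codes = customCodes ?? DEFAULT_RETRYABLE_CODES_SKETCH;
  return codes.includes(error.code) || isRstStreamErrorSketch(error);
}
```

With this shape, `canResumeSketch({code: 13, message: 'Received RST_STREAM'}, [4])` still returns `true` even though the caller's custom list omits 13, which is the behaviour the review comment asks about.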

daniel-sanche · 2024-06-17T23:49:54Z

src/utils/read-rows-resumption.ts

+      [],
+      gaxOpts?.retry?.backoffSettings || DEFAULT_BACKOFF_SETTINGS,
+      gaxOpts?.retry?.shouldRetryFn || canResume,
+      gaxOpts?.retry?.getResumptionRequestFn || getResumeRequest


It seems like we allow end users to completely override some internal callbacks (canResume, getResumeRequest). Is there a good reason for this? It seems to complicate things immensely, and I can't really think of a good reason to have customers provide custom resumption logic

If we are going to expose these, are these well covered by tests? Can we be sure that customizing these doesn't break things in unexpected ways? (It looks like the custom retryable errors weren't covered by tests, and replacing internal functions seems to open up even more room for issues)

danieljbruce

Yes, you are right. If shouldRetryFn is provided, it will override canResume, and if getResumptionRequestFn is provided, it will override getResumeRequest. The advantage is that it gives the user complete control over how they want retries to work. The disadvantages are the ones you mentioned. "should use a different set of retry codes" is now a test case that covers users providing custom retry codes, but overriding canResume and getResumeRequest isn't explicitly tested. Let's decide whether the overrides are features we will support. Do you think we should not allow the user to override canResume and getResumeRequest? I wonder what @leahecole thinks.

daniel-sanche

Personally, I would recommend making them non-customizable (i.e. raising an exception if the user sets these themselves). It would be easy enough to make them customizable in the future if there's demand, but once it's part of the public surface, we have to maintain it going forward. And I can imagine a bunch of weird edge cases coming up that I'm not confident are covered by tests.

But I'll leave that up to you and the product team.

danieljbruce

I think this is a good point. We can always add customizability later.

leahecole

+1, customizability can always happen later. I think that client library teams should be able to customize them (which is what we are doing with this PR), but giving the user too much control here could lead to a lot of issues and definitely could be hard to maintain. Good point, @daniel-sanche
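One way the "non-customizable" option suggested in this thread could look is sketched below. This is an assumption about a possible approach, not what the PR ships: the `RetryOptionsSketch` type and the `assertNoCallbackOverrides` function are hypothetical, and only the field names `shouldRetryFn` and `getResumptionRequestFn` come from the diff above.

```typescript
// Hypothetical subset of the retry options a caller might pass in.
interface RetryOptionsSketch {
  retryCodes?: number[];
  shouldRetryFn?: (error: Error) => boolean;
  getResumptionRequestFn?: (request: object) => object;
}

// Reject user-supplied callbacks up front instead of letting them replace
// the internal resumption logic.
function assertNoCallbackOverrides(retry?: RetryOptionsSketch): void {
  if (!retry) {
    return;
  }
  const unsupported: string[] = [];
  if (retry.shouldRetryFn !== undefined) {
    unsupported.push('shouldRetryFn');
  }
  if (retry.getResumptionRequestFn !== undefined) {
    unsupported.push('getResumptionRequestFn');
  }
  if (unsupported.length > 0) {
    throw new Error(
      `Custom ${unsupported.join(' and ')} is not supported for ReadRows calls.`
    );
  }
}
```

Custom retry codes still pass this check, so the narrower form of customization discussed earlier would remain available while the callbacks stay internal.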

daniel-sanche

Overall LGTM

I still have a couple concerns about allowing custom callbacks for stream resumption logic you might want to re-consider. But I think that's mostly out of scope for my review

danieljbruce added 30 commits June 30, 2022 15:03

getRanges

a9b8d1b

Slight refactor of createReadStream

0f1b9eb

Add header to table utils

c75b4ed

Refactor range and keys getting

fcbd461

Pull request opts into a separate function

c2f4dfe

Revert "Pull request opts into a separate function"

4817863

This reverts commit c2f4dfe.

logical separation of ranges and keys

810db82

Merge branch 'actually-refactor-createreadstream-2' of https://github…

648cc92

….com/danieljbruce/nodejs-bigtable into actually-refactor-createreadstream-3

Revert "Revert "Pull request opts into a separate function""

b25b2c2

This reverts commit 4817863.

set up the test frame

1b1947a

Test is set up to evaluate streaming behavior

59b2c26

Modify test with the mock server to pass first tst

baa5d8f

Fix tests that are appending startKey and endKey

869d5ce

Getting all the tests working

5e69834

- Transform the rowsLimit parameter into an integer - Change the hook into a before hook so that we don’t attempt to create multiple mock servers - Create a guard so that the stream only writes if there are row keys to write

eliminate old createreadstream test

bf29061

define new interfaces too

Remove only. rowsLimit should be optional

ad36097

Make rowKeysRead type more specific

2c57e61

Add after hook to shut down server

95b9463

Merge branch 'main' of https://github.com/googleapis/nodejs-bigtable …

1c66b3b

…into move-retries-from-createreadstream # Conflicts: # src/table.ts # src/utils/table.ts

Add the less than or equal to fn to utils

6324ad9

remove server start

8f51ddf

Add splice ranges back to diagnose problem

5edaf82

Revert "Add splice ranges back to diagnose problem"

575ae2b

This reverts commit 5edaf82.

get all tests passing

0f80c35

readRowsReqOpts should have an ECMAscript prefix to completely hide it from the user. Also remove a useless Filter.parse.

Use tableUtils lessthanorequalto

b46f6b1

refactor: Move retries finish making createreadstream smaller

577ba7a

More specific type for options

03fb022

Import type and replace with any

752bf0b

Turn on gax streaming retries

4989fa4

Try out the shouldRetryFn

1f66d43

leahecole approved these changes Jun 11, 2024

danieljbruce added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 11, 2024

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 11, 2024

danieljbruce added 11 commits June 17, 2024 13:29

Allow user to override codes

e01b96f

Also update the keys and ranges every time.

Add another test for overriding the retry codes

642433c

Make the test a little bit more robust

77ce9b6

Update the test framework to measure shouldRetry

377185b

Add another test ensuring a retry doesn’t happen

4d288e8

rename the test

3df79ce

Make canResume always run unless shouldRetryFn pr

3b3196a

Gapic layer should expect empty retry codes

984fc8f

Remove copied file

f0bdc90

Change the resumption strategy

17aaae5

Replace 13 with enum

572b323

daniel-sanche requested changes Jun 18, 2024

danieljbruce and others added 4 commits June 18, 2024 10:31

Removed example

149afaa

npm run fix

dd42118

Revert "npm run fix"

93b7fdb

This reverts commit dd42118.

Merge branch 'main' into move-retries-createreadstream-get-resumption…

79ff3d9

…-logic-working

daniel-sanche approved these changes Jun 18, 2024

danieljbruce added 2 commits June 19, 2024 09:44

Remove overrides

e22af7b

Remove irrelevant test

66e4b5a

danieljbruce added the owlbot:run Add this label to trigger the Owlbot post processor. label Jun 24, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jun 24, 2024

🦉 Updates from OwlBot post-processor

1398dca

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

danieljbruce merged commit 3e0a46e into googleapis:main Jun 24, 2024
15 of 19 checks passed

release-please bot mentioned this pull request Jun 24, 2024

chore(main): release 5.1.1 #1431

Closed

danieljbruce mentioned this pull request Jul 3, 2024

Revert "fix: Remove custom readrows retry logic and rely on gax for retries" #1434

Merged

release-please bot mentioned this pull request Jul 3, 2024

chore(main): release 5.1.1 #1435

Closed

release-please bot mentioned this pull request Jul 11, 2024

chore(main): release 5.1.1 #1446

Merged
