Retryable writes spec does not mention the time limits within which driver should retry a write #842

Open
sar-gup opened this issue Aug 5, 2020 · 7 comments

@sar-gup

sar-gup commented Aug 5, 2020

The driver specification for retryable writes (retryable-writes.rst) does not mention a time limit within which the mongo driver should issue a retry of a write (if applicable). The spec only limits the number of times a write can be retried (once).

Does this mean that it is legal for a driver to retry a write after waiting for an arbitrary amount of time?

@p-mongo
Contributor

p-mongo commented Aug 5, 2020

Read and write retries generally do not have the kind of forced delay you seem to be alluding to. Retries are performed as soon as they can be.

For example, if you are reading from a secondary, and one secondary goes down but another one is available, the driver would immediately retry the read on the available secondary.

There are various conditions that may result in an application-perceived delay:

  • There isn't a suitable server to send the operation to. For example, if performing primary reads or writes and the primary becomes unavailable, the driver would wait until a primary is available again (up to the server selection timeout) before retrying.
  • There may not be enough (or any) existing connections established to the server selected for the retry. If so, the retry needs to wait until there is an established (and authenticated, if appropriate) connection. If the selected server is at connection pool capacity, the retry may need to wait for some other operation to complete.

But in these situations, the driver itself does not add a forced delay.
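
To illustrate the behavior described above, here is a minimal Python sketch of a single immediate retry. It is not taken from any driver: `RetryableError`, `run_with_single_retry`, `select_server`, and `send` are hypothetical names used only for illustration.

```python
class RetryableError(Exception):
    """Hypothetical stand-in for a network error or other retryable failure."""


def run_with_single_retry(select_server, send):
    """Run send(server) and retry it at most once, with no added delay.

    select_server: callable returning a server; in a real driver this step
        may block up to the server selection timeout.
    send: callable taking a server; in a real driver this step may wait for
        a connection to be established or checked out of the pool.
    """
    server = select_server()
    try:
        return send(server)
    except RetryableError:
        # No sleep or backoff: the retry starts right away. Any delay the
        # application observes comes from select_server() or send() waiting.
        server = select_server()
        return send(server)
```

If the second attempt also fails, its exception propagates to the caller, which matches the "retry once" limit mentioned in the original question.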

@divjotarora
Contributor

@p-mongo's response covers the current state of drivers very well. To add to it, there is a drivers specification in progress regarding client-side timeouts. Part of this specification will include a forced backoff period during retries to avoid spamming the server with consecutive requests.

@sar-gup
Author

sar-gup commented Aug 6, 2020

Thanks for the responses @p-mongo and @divjotarora.

My concern is more about cases where the client cannot guarantee that the retry is sent as soon as a write fails. Here's an example scenario:

T = 0sec
Retry writes enabled. Initial write request sent. TransactionId = Tx

T = 30sec
TCP response not received. Client throws a timeout exception.
At this point, the client will try to retry the write.

Assume the client application freezes for 20 mins.

T = 10000sec
Client retries the write request. TransactionId = Tx.

This retry might cause the backend to process the write again, thus violating the write-once semantics.
To be safe in such scenarios, shouldn't drivers enforce some time limit on retries?

@p-mongo
Contributor

p-mongo commented Aug 6, 2020

"Write once" means the write isn't performed multiple times. Specifically, that refers to the scenario where the first write succeeded and the server responded with success, but the response was lost (e.g. due to a network problem) and the client subsequently retried the write. In this case, because the same transaction number is used, the server would know that the write was already performed and won't perform it again.

With regard to your scenario, different programming languages provide various facilities to impose time limits on operations. Both Go and Ruby, to my knowledge, provide general-purpose timeouts, for example. As Divjot mentioned, work is also in progress to provide similar operation-level timeout functionality for driver operations.
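
As an illustration of an application-level time limit, the following Python sketch refuses to retry once a deadline has passed. It is an assumption-laden example, not a driver feature: `write_with_deadline`, `RetryableError`, and `DeadlineExceeded` are hypothetical names, and `attempt` stands in for whatever call performs the write.

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for a timeout or network error."""


class DeadlineExceeded(Exception):
    """Raised when the retry window has already elapsed."""


def write_with_deadline(attempt, timeout_s, clock=time.monotonic):
    """Call attempt(); retry it once, but only while the deadline holds.

    attempt: zero-argument callable that performs the write (hypothetical).
    timeout_s: total time budget in seconds, measured from the first call.
    clock: time source; injectable so the logic can be tested without waiting.
    """
    deadline = clock() + timeout_s
    try:
        return attempt()
    except RetryableError:
        # Re-check the clock right before retrying. If the application was
        # stalled for a long time, give up instead of sending a late retry.
        if clock() >= deadline:
            raise DeadlineExceeded("retry window elapsed; not retrying")
        return attempt()
```

In the scenario from the question, a check like this would surface an error to the application at the late retry point instead of resending the write.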

@sar-gup
Author

sar-gup commented Aug 7, 2020

@p-mongo, I'm referring to the same scenario that you mentioned, where the request was processed by the server but the response couldn't reach the client:

> "Write once" means the write isn't performed multiple times. Specifically, that refers to the scenario where the first write succeeded and the server responded with success, but the response was lost (e.g. due to a network problem) and the client subsequently retried the write. In this case, because the same transaction number is used, the server would know that the write was already performed and won't perform it again.

After the client retries, will the server be able to tell that the write was already performed even if the retried request reaches the server after an interval of 1 hour?

@p-mongo
Contributor

p-mongo commented Aug 7, 2020

That is my impression, but for practical reasons we don't have spec test coverage for this specific scenario.

You could hand-craft a write command, send it via the generic command helper, and then rerun the command an hour later with the same txnum to see what would happen.
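
A rough Python sketch of what such a hand-crafted command document could look like follows. It only builds the document; how to send it depends on the driver, and some command helpers may add or overwrite session fields themselves, so treat this as an assumption to verify. `build_retryable_insert` and the collection name `test` are hypothetical; `lsid` and `txnNumber` are the session ID and transaction number fields used by retryable writes, and `txnNumber` must go over the wire as a 64-bit integer.

```python
import uuid


def build_retryable_insert(collection, documents, lsid, txn_number):
    """Build an insert command document carrying retryable-write fields.

    The command name must be the first key; Python dicts keep insertion order.
    """
    return {
        "insert": collection,
        "documents": documents,
        "ordered": True,
        "lsid": {"id": lsid},      # session ID: a document holding a UUID
        "txnNumber": txn_number,   # must be encoded as a 64-bit integer
    }


# Reusing the same session ID and transaction number makes the second
# document a retry of the first rather than a new write.
session_id = uuid.uuid4()
first = build_retryable_insert("test", [{"_id": 1, "x": 1}], session_id, 1)
retry = build_retryable_insert("test", [{"_id": 1, "x": 1}], session_id, 1)
```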

@jmikola
Member

jmikola commented Aug 7, 2020

> You could hand-craft a write command, send it via the generic command helper, and then rerun the command an hour later with the same txnum to see what would happen.

FYI, the session for that retryable write (i.e. lsid) would still need to be active on the server. Otherwise, the retry attempt would be indistinguishable from a new write. If the write has previously been committed, a successful retry (which would be a no-op with the server returning the original write result) might also depend on the oplog retention.
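
The dependence on an active session can be pictured with a toy model in Python. This is only a sketch and not how the server is implemented: `ToyServer`, its fields, and the 30-minute session lifetime are illustrative assumptions, and the model ignores oplog retention, unique-index checks, and transaction number ordering.

```python
class ToyServer:
    """Toy model of retry detection keyed by (lsid, txnNumber)."""

    def __init__(self, session_ttl_s):
        self.session_ttl_s = session_ttl_s
        self.sessions = {}  # lsid -> (last_use_time, {txn_number: result})
        self.applied = []   # documents that were actually written

    def insert(self, lsid, txn_number, doc, now):
        record = self.sessions.get(lsid)
        if record is not None and now - record[0] > self.session_ttl_s:
            record = None  # session expired: its write history is gone
        history = record[1] if record else {}
        if txn_number in history:
            # Known retry: do not write again, return the original result.
            self.sessions[lsid] = (now, history)
            return history[txn_number]
        # Unknown (lsid, txnNumber): indistinguishable from a new write.
        self.applied.append(doc)
        result = {"n": 1, "ok": 1}
        history[txn_number] = result
        self.sessions[lsid] = (now, history)
        return result


server = ToyServer(session_ttl_s=30 * 60)
server.insert("s1", 1, {"_id": 1}, now=0)
server.insert("s1", 1, {"_id": 1}, now=60)         # retry, session active
server.insert("s1", 1, {"_id": 1}, now=60 + 3600)  # retry after expiry
```

In this model the retry at 60 seconds is a no-op, while the retry an hour later is applied as if it were a new write because the session record has expired.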
