From 86d961fee4c5e92fbcff76a62abe1aea3fafd451 Mon Sep 17 00:00:00 2001 From: Dmitry Rybakov Date: Fri, 25 Aug 2023 21:26:17 +0300 Subject: [PATCH] DRIVERS-1571 Retry on different mongos when possible (#1450) Co-authored-by: Alex Bevilacqua Co-authored-by: Preston Vasquez --- source/retryable-reads/retryable-reads.rst | 30 +++++++-- source/retryable-reads/tests/README.rst | 70 ++++++++++++++++++++ source/retryable-writes/retryable-writes.rst | 24 +++++-- source/retryable-writes/tests/README.rst | 67 +++++++++++++++++++ source/server-selection/server-selection.rst | 13 +++- 5 files changed, 188 insertions(+), 16 deletions(-) diff --git a/source/retryable-reads/retryable-reads.rst b/source/retryable-reads/retryable-reads.rst index db9b1ddf28..be8a26a62f 100644 --- a/source/retryable-reads/retryable-reads.rst +++ b/source/retryable-reads/retryable-reads.rst @@ -268,10 +268,13 @@ selecting a server for a retry attempt. 3a. Selecting the server for retry '''''''''''''''''''''''''''''''''' -If the driver cannot select a server for a retry attempt or the newly selected -server does not support retryable reads, retrying is not possible and drivers -MUST raise the previous retryable error. In both cases, the caller is able to -infer that an attempt was made. +In a sharded cluster, the server on which the operation failed MUST be provided +to the server selection mechanism as a deprioritized server. + +If the driver cannot select a server for +a retry attempt or the newly selected server does not support retryable reads, +retrying is not possible and drivers MUST raise the previous retryable error. +In both cases, the caller is able to infer that an attempt was made. 3b. Sending an equivalent command for a retry attempt ''''''''''''''''''''''''''''''''''''''''''''''''''''''' @@ -357,9 +360,17 @@ and reflects the flow described above. */ function executeRetryableRead(command, session) { Exception previousError = null; + Server previousServer = null; while true { try { - server = selectServer(); + if (previousServer == null) { + server = selectServer(); + } else { + // If a previous attempt was made, deprioritize the previous server + // where the command failed. + deprioritizedServers = [ previousServer ]; + server = selectServer(deprioritizedServers); + } } catch (ServerSelectionException exception) { if (previousError == null) { // If this is the first attempt, propagate the exception. @@ -416,9 +427,11 @@ and reflects the flow described above. } catch (NetworkException networkError) { updateTopologyDescriptionForNetworkError(server, networkError); previousError = networkError; + previousServer = server; } catch (NotWritablePrimaryException notPrimaryError) { updateTopologyDescriptionForNotWritablePrimaryError(server, notPrimaryError); previousError = notPrimaryError; + previousServer = server; } catch (DriverException error) { if ( previousError != null ) { throw previousError; @@ -614,8 +627,8 @@ The spec concerns itself with retrying read operations that encounter a retryable error (i.e. no response due to network error or a response indicating that the node is no longer a primary). A retryable error may be classified as either a transient error (e.g. dropped connection, replica set failover) or -persistent outage. If a transient error results in the server being marked as -"unknown", a subsequent retry attempt will allow the driver to rediscover the +persistent outage. 
+If a transient error results in the server being marked as
+"unknown", a subsequent retry attempt will allow the driver to rediscover the
 primary within the designated server selection timeout period (30 seconds by
 default). If server selection times out during this retry attempt, we can
 reasonably assume that there is a persistent outage. In the case of a persistent
@@ -678,6 +691,9 @@ degraded performance can simply disable ``retryableReads``.
 
 Changelog
 =========
 
+:2023-08-??: Require that in a sharded cluster the server on which the
+             operation failed MUST be provided to the server selection
+             mechanism as a deprioritized server.
 :2023-08-21: Update Q&A that contradicts SDAM transient error logic
 :2022-11-09: CLAM must apply both events and log messages.
 :2022-10-18: When CSOT is enabled multiple retry attempts may occur.
diff --git a/source/retryable-reads/tests/README.rst b/source/retryable-reads/tests/README.rst
index 59e09e69b8..4f45feb506 100644
--- a/source/retryable-reads/tests/README.rst
+++ b/source/retryable-reads/tests/README.rst
@@ -232,10 +232,80 @@ This test requires MongoDB 4.2.9+ for ``blockConnection`` support in the failpoint
 
 9. Disable the failpoint.
 
+Retrying Reads in a Sharded Cluster
+===================================
+
+These tests will be used to ensure drivers properly retry reads on a different
+mongos.
+
+Retryable Reads Are Retried on a Different mongos if One is Available
+---------------------------------------------------------------------
+
+This test MUST be executed against a sharded cluster that has at least two
+mongos instances.
+
+1. Ensure that a test is run against a sharded cluster that has at least two
+   mongoses. If there are more than two mongoses in the cluster, pick two to
+   test against.
+
+2. Create a client per mongos using the direct connection, and configure the
+   following fail point on each mongos::
+
+     {
+         configureFailPoint: "failCommand",
+         mode: { times: 1 },
+         data: {
+             failCommands: ["find"],
+             errorCode: 6,
+             closeConnection: true
+         }
+     }
+
+3. Create a client with ``retryReads=true`` that connects to the cluster,
+   providing the two selected mongoses as seeds.
+
+4. Enable command monitoring, and execute a ``find`` command that is
+   supposed to fail on both mongoses.
+
+5. Assert that there were failed command events from each mongos.
+
+6. Disable the fail points.
+
+Retryable Reads Are Retried on the Same mongos if No Others are Available
+---------------------------------------------------------------------------
+
+1. Ensure that a test is run against a sharded cluster. If there are multiple
+   mongoses in the cluster, pick one to test against.
+
+2. Create a client that connects to the mongos using the direct connection,
+   and configure the following fail point on the mongos::
+
+     {
+         configureFailPoint: "failCommand",
+         mode: { times: 1 },
+         data: {
+             failCommands: ["find"],
+             errorCode: 6,
+             closeConnection: true
+         }
+     }
+
+3. Create a client with ``retryReads=true`` that connects to the cluster,
+   providing the selected mongos as the seed.
+
+4. Enable command monitoring, and execute a ``find`` command.
+
+5. Assert that there was a failed command and a successful command event.
+
+6. Disable the fail point.
+
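+The following non-normative sketch shows one way to implement the first test
+above with PyMongo. The mongos addresses, namespace, and listener wiring are
+illustrative assumptions, not requirements of this specification::
+
+    from pymongo import MongoClient, monitoring
+
+    class FailedCommandListener(monitoring.CommandListener):
+        """Records which server produced each failed command event."""
+        def __init__(self):
+            self.failed_hosts = set()
+
+        def started(self, event):
+            pass
+
+        def succeeded(self, event):
+            pass
+
+        def failed(self, event):
+            self.failed_hosts.add(event.connection_id)
+
+    mongoses = ["mongos1:27017", "mongos2:27017"]  # assumed addresses
+    fail_point = {
+        "configureFailPoint": "failCommand",
+        "mode": {"times": 1},
+        "data": {
+            "failCommands": ["find"],
+            "errorCode": 6,
+            "closeConnection": True,
+        },
+    }
+
+    # Configure the fail point on each mongos via a direct connection.
+    for host in mongoses:
+        with MongoClient(host, directConnection=True) as setup:
+            setup.admin.command(fail_point)
+
+    listener = FailedCommandListener()
+    client = MongoClient(mongoses, retryReads=True, event_listeners=[listener])
+    try:
+        client.test.coll.find_one()  # fails once on each mongos
+    except Exception:
+        pass
+    assert len(listener.failed_hosts) == 2
+
+    # Disable the fail points.
+    for host in mongoses:
+        with MongoClient(host, directConnection=True) as setup:
+            setup.admin.command({"configureFailPoint": "failCommand",
+                                 "mode": "off"})
+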
 Changelog
 =========
 
+:2023-08-??: Add prose tests for retrying in a sharded cluster.
+
 :2022-04-22: Clarifications to ``serverless`` and ``useMultipleMongoses``.
 :2022-01-10: Create legacy and unified subdirectories for new unified tests
diff --git a/source/retryable-writes/retryable-writes.rst b/source/retryable-writes/retryable-writes.rst
index bd6aafdbfa..ea0ddabcbb 100644
--- a/source/retryable-writes/retryable-writes.rst
+++ b/source/retryable-writes/retryable-writes.rst
@@ -395,11 +395,14 @@ of the following conditions is reached:
    <../client-side-operations-timeout/client-side-operations-timeout.rst#retryability>`__.
 
 - CSOT is not enabled and one retry was attempted.
 
-For each retry attempt, drivers MUST select a writable server. If the driver
-cannot select a server for a retry attempt or the selected server does not
-support retryable writes, retrying is not possible and drivers MUST raise the
-retryable error from the previous attempt. In both cases, the caller is able
-to infer that an attempt was made.
+For each retry attempt, drivers MUST select a writable server. In a sharded
+cluster, the server on which the operation failed MUST be provided to
+the server selection mechanism as a deprioritized server.
+
+If the driver cannot select a server for a retry attempt
+or the selected server does not support retryable writes, retrying is not
+possible and drivers MUST raise the retryable error from the previous attempt.
+In both cases, the caller is able to infer that an attempt was made.
 
 If a retry attempt also fails, drivers MUST update their topology according to
 the SDAM spec (see: `Error Handling`_). If an error would not allow the caller
@@ -492,11 +495,15 @@ The above rules are implemented in the following pseudo-code:
         }
     }
 
-    /* If we cannot select a writable server, do not proceed with retrying and
+    /*
+     * We try to select a server other than the one that failed, by passing
+     * the failed server as a deprioritized server.
+     * If we cannot select a writable server, do not proceed with retrying and
      * throw the previous error. The caller can then infer that an attempt was
      * made and failed. */
     try {
-      server = selectServer("writable");
+      deprioritizedServers = [ server ];
+      server = selectServer("writable", deprioritizedServers);
     } catch (Exception ignoredError) {
       throw previousError;
     }
@@ -822,6 +829,9 @@ inconsistent with the server and potentially confusing to developers.
 
 Changelog
 =========
 
+:2023-08-??: Require that in a sharded cluster the server on which the
+             operation failed MUST be provided to the server selection
+             mechanism as a deprioritized server.
 :2022-11-17: Add logic for persisting "currentError" as "previousError" on
              first retry attempt, avoiding raising "null" errors.
 :2022-11-09: CLAM must apply both events and log messages.
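
The pseudo-code above can be summarized, non-normatively, as the following
Python sketch. ``select_server``, ``run_command``,
``supports_retryable_writes``, and ``is_retryable_write_error`` are
hypothetical stand-ins for driver internals, not actual driver APIs, and CSOT
handling is omitted::

    def execute_retryable_write(command, session):
        # The first attempt may run on any writable server.
        server = select_server("writable")
        try:
            return run_command(server, command, session)
        except Exception as error:
            if not is_retryable_write_error(error):
                raise
            previous_error = error

        # Retry once. Deprioritizing the failed server means that in a
        # sharded cluster the retry is sent to a different mongos when one
        # is available; other topologies ignore the deprioritized list.
        try:
            server = select_server("writable", deprioritized=[server])
        except Exception:
            raise previous_error
        if not supports_retryable_writes(server):
            raise previous_error
        return run_command(server, command, session)
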
diff --git a/source/retryable-writes/tests/README.rst b/source/retryable-writes/tests/README.rst
index 5ddfb6a2d0..2f6cd75c42 100644
--- a/source/retryable-writes/tests/README.rst
+++ b/source/retryable-writes/tests/README.rst
@@ -456,9 +456,76 @@ and sharded clusters.
      mode: "off",
    })
 
+#. Test that in a sharded cluster writes are retried on a different mongos if
+   one is available
+
+   This test MUST be executed against a sharded cluster that has at least two
+   mongos instances.
+
+   1. Ensure that a test is run against a sharded cluster that has at least two
+      mongoses. If there are more than two mongoses in the cluster, pick two to
+      test against.
+
+   2. Create a client per mongos using the direct connection, and configure the
+      following fail point on each mongos::
+
+        {
+            configureFailPoint: "failCommand",
+            mode: { times: 1 },
+            data: {
+                failCommands: ["insert"],
+                errorCode: 6,
+                errorLabels: ["RetryableWriteError"],
+                closeConnection: true
+            }
+        }
+
+   3. Create a client with ``retryWrites=true`` that connects to the cluster,
+      providing the two selected mongoses as seeds.
+
+   4. Enable command monitoring, and execute a write command that is
+      supposed to fail on both mongoses.
+
+   5. Assert that there were failed command events from each mongos.
+
+   6. Disable the fail points.
+
+#. Test that in a sharded cluster writes are retried on the same mongos if no
+   other is available
+
+   This test MUST be executed against a sharded cluster.
+
+   1. Ensure that a test is run against a sharded cluster. If there are multiple
+      mongoses in the cluster, pick one to test against.
+
+   2. Create a client that connects to the mongos using the direct connection,
+      and configure the following fail point on the mongos::
+
+        {
+            configureFailPoint: "failCommand",
+            mode: { times: 1 },
+            data: {
+                failCommands: ["insert"],
+                errorCode: 6,
+                errorLabels: ["RetryableWriteError"],
+                closeConnection: true
+            }
+        }
+
+   3. Create a client with ``retryWrites=true`` that connects to the cluster,
+      providing the selected mongos as the seed.
+
+   4. Enable command monitoring, and execute a write command that is
+      supposed to fail.
+
+   5. Assert that there was a failed command and a successful command event.
+
+   6. Disable the fail point.
+
 Changelog
 =========
 
+:2023-08-??: Add prose tests for retrying in a sharded cluster.
+
 :2022-08-30: Add prose test verifying correct error handling for errors with
              the NoWritesPerformed label, which is to return the original
              error.
diff --git a/source/server-selection/server-selection.rst b/source/server-selection/server-selection.rst
index 8a22dbbf0b..f13f15b170 100644
--- a/source/server-selection/server-selection.rst
+++ b/source/server-selection/server-selection.rst
@@ -843,7 +843,11 @@ For multi-threaded clients, the server selection algorithm is as follows:
 2. If the topology wire version is invalid, raise an error and log a
    `"Server selection failed" message`_.
 
-3. Find suitable servers by topology type and operation type
+3. Find suitable servers by topology type and operation type. If a list of
+   deprioritized servers is provided, and the topology is a sharded cluster,
+   these servers should be selected only if there are no other suitable servers.
+   The server selection algorithm MUST ignore the deprioritized servers if the
+   topology is not a sharded cluster.
 
 4. Filter the suitable servers by calling the optional, application-provided
    server selector.
@@ -915,7 +919,11 @@ as follows:
 5. If the topology wire version is invalid, raise an error and log a
    `"Server selection failed" message`_.
 
-6. Find suitable servers by topology type and operation type
+6. Find suitable servers by topology type and operation type. If a list of
+   deprioritized servers is provided, and the topology is a sharded cluster,
+   these servers should be selected only if there are no other suitable servers.
+   The server selection algorithm MUST ignore the deprioritized servers if the
+   topology is not a sharded cluster.
 
 7. Filter the suitable servers by calling the optional, application-provided
    server selector.
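
Non-normatively, the deprioritization rule in the two steps above can be
sketched as follows, where ``suitable`` is the list of suitable servers the
step has already computed and ``topology_type`` comes from the current
topology description; the function name is illustrative only::

    def apply_deprioritization(suitable, deprioritized, topology_type):
        # Deprioritized servers are only meaningful for sharded clusters.
        if topology_type != "Sharded":
            return suitable
        preferred = [s for s in suitable if s not in deprioritized]
        # Fall back to the deprioritized servers when no others are suitable.
        return preferred or suitable
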
@@ -2070,3 +2078,4 @@ Changelog :2022-01-19: Require that timeouts be applied per the client-side operations timeout spec :2022-10-05: Remove spec front matter, move footnote, and reformat changelog. :2022-11-09: Add log messages and tests. +:2023-08-??: Add list of deprioritized servers for sharded cluster topology.
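
A condensed, non-normative sketch of configuring the labeled fail point from
the first new write test in PyMongo; the mongos addresses and namespace are
illustrative assumptions::

    from pymongo import MongoClient

    mongoses = ["mongos1:27017", "mongos2:27017"]  # assumed addresses
    for host in mongoses:
        with MongoClient(host, directConnection=True) as setup:
            setup.admin.command({
                "configureFailPoint": "failCommand",
                "mode": {"times": 1},
                "data": {
                    # Fail point parameters copied from the prose test above.
                    "failCommands": ["insert"],
                    "errorCode": 6,
                    "errorLabels": ["RetryableWriteError"],
                    "closeConnection": True,
                },
            })

    client = MongoClient(mongoses, retryWrites=True)
    try:
        client.test.coll.insert_one({"x": 1})  # fails once on each mongos
    except Exception:
        pass

    # Clean up: disable the fail point on both mongoses.
    for host in mongoses:
        with MongoClient(host, directConnection=True) as setup:
            setup.admin.command({"configureFailPoint": "failCommand",
                                 "mode": "off"})

A full implementation would also attach a command listener, as in the read
test sketch earlier, and assert that a failed command event was observed from
each mongos.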