
GraphQL replication plugin fires exponentially increasing number of requests in case of server errors #2048

Closed
gautambt opened this issue Apr 7, 2020 · 13 comments

Comments

@gautambt

gautambt commented Apr 7, 2020

Issue

In scenarios where the server returns a GraphQL error, the number of retry requests being fired keeps increasing over time. This happens because a retry is scheduled from both the runPush method and the _run method.

On an error, the runPush method will schedule a retry and the _run method will also schedule one. When both retries fire, each of them schedules two more retries, and so on.

The fix seems to be to remove the retries from runPush & runPull and do the retry once, in the _run method, if either of them fails. I can raise a pull request if this looks like the correct fix.
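Roughly what I have in mind (just a sketch, not the actual plugin code; the this.push / this.pull guards and the return type are my simplification):

    // sketch: runPush()/runPull() only report failure, _run() owns the single retry
    async _run(): Promise<boolean> {
        let willRetry = false;

        if (this.push) {
            const pushOk = await this.runPush(); // no setTimeout(() => this.run(), ...) in here anymore
            if (!pushOk) willRetry = true;
        }

        if (this.pull) {
            const pullOk = await this.runPull(); // no setTimeout(() => this.run(), ...) in here anymore
            if (!pullOk) willRetry = true;
        }

        // exactly one retry gets scheduled per failed _run() cycle
        if (willRetry) {
            setTimeout(() => this.run(), this.retryTime);
        }

        return willRetry;
    }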

Info

  • Environment: Browser
  • Adapter: IndexedDB
  • Stack: TypeScript

Code

The issue can be reproduced by applying the following patch to the heroes example and trying to insert a hero. In the network console you will see 3 requests (1 OPTIONS request, 1 POST for setHuman and 1 POST for feedForRxDBReplication). After 10 seconds you will see 6 requests, after 20 seconds 12 requests, and so on.

diff --git a/examples/graphql/client/index.js b/examples/graphql/client/index.js
index c3d95640..624cf44b 100644
--- a/examples/graphql/client/index.js
+++ b/examples/graphql/client/index.js
@@ -157,7 +157,8 @@ async function run() {
          * we can set the liveIntervall to a high value
          */
         liveInterval: 1000 * 60 * 10, // 10 minutes
-        deletedFlag: 'deleted'
+        deletedFlag: 'deleted',
+        retryTime: 10000,
     });
     // show replication-errors in logs
     heroesList.innerHTML = 'Subscribe to errors..';
diff --git a/examples/graphql/server/index.js b/examples/graphql/server/index.js
index e852f1c6..3559eb2b 100644
--- a/examples/graphql/server/index.js
+++ b/examples/graphql/server/index.js
@@ -99,6 +99,7 @@ export async function run() {
             return limited;
         },
         setHuman: args => {
+            throw 'error';
             log('## setHuman()');
             log(args);
             const doc = args.human;
@gautambt gautambt changed the title GraphQL replication pugin fires exponentially increasing requests in case of server errors GraphQL replication pugin fires exponentially increasing number of requests in case of server errors Apr 7, 2020
@pubkey
Owner

pubkey commented Apr 13, 2020

I am not sure if this is a bug. run() was designed so that it can be called as often as wanted, because many users call it whenever their websocket fires and do not want to implement blocking logic themselves.
There is the _runQueueCount which ensures that the calls do not stack up.

EDIT: OK, the network inspector shows that we do run too many requests. I will investigate.
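For context, the relevant part of run() looks roughly like this (condensed; the full code appears in the debug patch later in this thread):

    // condensed sketch of run(): drop the call if the queue is already full,
    // otherwise chain _run() onto the shared promise so cycles never overlap
    async run(): Promise<void> {
        if (this.isStopped()) {
            return;
        }
        if (this._runQueueCount > 2) {
            return this._runningPromise;
        }
        this._runQueueCount++;
        this._runningPromise = this._runningPromise.then(async () => {
            this._subjects.active.next(true);
            await this._run();
            this._runQueueCount--;
        });
        return this._runningPromise;
    }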

@pubkey pubkey changed the title GraphQL replication pugin fires exponentially increasing number of requests in case of server errors GraphQL replication plugin fires exponentially increasing number of requests in case of server errors Apr 13, 2020
pubkey added a commit that referenced this issue Apr 13, 2020
@pubkey
Owner

pubkey commented Apr 13, 2020

I played around a bit and also added a test. Everything looks fine to me.

@gautambt
Author

From the code, it looks like _runQueueCount only prevents the parallel execution of more than two instances of the run method. But since the retries are scheduled using a timeout, isn't it possible that they are scheduled at different times and hence many of them execute one after the other?

@pubkey
Owner

pubkey commented Apr 19, 2020

See the test. There I call run() many times but it actually runs only 2 times.

@gautambt
Author

I am adding logs to better understand what's happening. But off the top of my head, that test does not cover the scenario where runPush (or runPull) fails. I am only observing this bug when there is a failure in push or pull.

@gautambt
Author

gautambt commented Apr 21, 2020

So I've added logs whenever the run method is scheduled (patch attached below).

Here is what is happening:

  1. Initially the run method gets triggered; it calls _run, which calls runPush, which fires the GraphQL request, which fails.
  2. runPush then schedules the run method (r1) to run after 10 seconds and returns false.
  3. Since runPush returns false, _run also schedules the run method (r2) to run after 10 seconds.

Steps 2 & 3 happen nearly at the same time.

  4. After 10 seconds both the runs (r1 & r2) scheduled by steps 2 & 3 execute in parallel. Both of them run because we check for _runQueueCount > 2 in the run method, and since _runQueueCount is initialized to 0, up to 3 requests can run in parallel.
  5. runPush gets called twice in parallel and two GraphQL requests get fired (and fail). Since the requests go over the network, the response times differ.
  6. r1 will now schedule the run method twice more (once from runPush and once from _run) - r3 and r4.
  7. r2 will also schedule the run method twice more - r5 & r6.

r3 & r4 will execute in parallel. r5 & r6 will execute in parallel at a different time, since the network response times (and node.js scheduler latencies) lead to them being scheduled at different times. Each of them will schedule 2 more runs.

Occasionally _runQueueCount becomes > 3 and a run request gets ignored.
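In other words, every failed run schedules two new runs, so the number of runs per retryTime window roughly doubles. A rough back-of-the-envelope model of the request counts from the original report (just a sketch, not plugin code):

    // each failed run schedules 2 retries, so after k retry intervals there are ~2^k runs;
    // each run fires roughly 3 requests (OPTIONS preflight + push POST + pull POST),
    // matching the 3, 6, 12, ... requests per 10-second window reported above
    function requestsPerWindow(k: number): number {
        return 3 * Math.pow(2, k);
    }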

Log output:

Initial run invocation: Running sync scheduled at at 2020-04-21T09:02:08.540Z runQueueCount: 0
r1 scheduled: index.js:501 Retry from runPush 2020-04-21T09:02:08.564Z
r2 scheduled: index.js:240 Retry from run after push 2020-04-21T09:02:08.564Z
index.js:183 Sync complete for run tirggered at in 36ms runQueueCount: 1
r1 starts: index.js:160 Running sync scheduled at 2020-04-21T09:02:08.564Z at 2020-04-21T09:02:18.564Z runQueueCount: 0
r2 starts: index.js:160 Running sync scheduled at 2020-04-21T09:02:08.564Z at 2020-04-21T09:02:18.566Z runQueueCount: 1
r1 schedules r3: index.js:501 Retry from runPush 2020-04-21T09:02:18.581Z
r1 schedules r4: index.js:240 Retry from run after push 2020-04-21T09:02:18.581Z
r1 ends: index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:08.564Zin 25ms runQueueCount: 2
r2 schedules r5: index.js:501 Retry from runPush 2020-04-21T09:02:18.609Z
r2 schedules r6: index.js:240 Retry from run after push 2020-04-21T09:02:18.610Z
r2 ends: index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:08.564Zin 52ms runQueueCount: 1
index.js:160 Running sync scheduled at 2020-04-21T09:02:18.581Z at 2020-04-21T09:02:28.581Z runQueueCount: 0
index.js:160 Running sync scheduled at 2020-04-21T09:02:18.581Z at 2020-04-21T09:02:28.584Z runQueueCount: 1
index.js:501 Retry from runPush 2020-04-21T09:02:28.598Z
index.js:240 Retry from run after push 2020-04-21T09:02:28.598Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:18.581Zin 26ms runQueueCount: 2
index.js:160 Running sync scheduled at 2020-04-21T09:02:18.609Z at 2020-04-21T09:02:28.610Z runQueueCount: 1
index.js:160 Running sync scheduled at 2020-04-21T09:02:18.610Z at 2020-04-21T09:02:28.611Z runQueueCount: 2
index.js:501 Retry from runPush 2020-04-21T09:02:28.624Z
index.js:240 Retry from run after push 2020-04-21T09:02:28.625Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:18.581Zin 48ms runQueueCount: 3
index.js:501 Retry from runPush 2020-04-21T09:02:28.639Z
index.js:240 Retry from run after push 2020-04-21T09:02:28.640Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:18.609Zin 41ms runQueueCount: 2
index.js:501 Retry from runPush 2020-04-21T09:02:28.662Z
index.js:240 Retry from run after push 2020-04-21T09:02:28.662Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:18.610Zin 59ms runQueueCount: 1
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.598Z at 2020-04-21T09:02:38.599Z runQueueCount: 0
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.598Z at 2020-04-21T09:02:38.600Z runQueueCount: 1
index.js:501 Retry from runPush 2020-04-21T09:02:38.618Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.619Z
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.624Z at 2020-04-21T09:02:38.625Z runQueueCount: 2
index.js:155 Ignoring run request scheduled at 2020-04-21T09:02:28.625Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.598Zin 29ms runQueueCount: 3
index.js:501 Retry from runPush 2020-04-21T09:02:38.641Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.641Z
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.639Z at 2020-04-21T09:02:38.643Z runQueueCount: 2
index.js:155 Ignoring run request scheduled at 2020-04-21T09:02:28.640Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.598Zin 51ms runQueueCount: 3
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.662Z at 2020-04-21T09:02:38.662Z runQueueCount: 2
index.js:155 Ignoring run request scheduled at 2020-04-21T09:02:28.662Z
index.js:501 Retry from runPush 2020-04-21T09:02:38.664Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.664Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.624Zin 49ms runQueueCount: 3
index.js:501 Retry from runPush 2020-04-21T09:02:38.685Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.685Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.639Zin 50ms runQueueCount: 2
index.js:501 Retry from runPush 2020-04-21T09:02:38.701Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.701Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.662Zin 49ms runQueueCount: 1

diff --git a/src/plugins/replication-graphql/index.ts b/src/plugins/replication-graphql/index.ts
index 0006ac31..cb4eef94 100644
--- a/src/plugins/replication-graphql/index.ts
+++ b/src/plugins/replication-graphql/index.ts
@@ -137,15 +137,18 @@ export class RxGraphQLReplicationState {
     }
 
     // ensures this._run() does not run in parallel
-    async run(): Promise<void> {
+    async run(t: string = ""): Promise<void> {
         if (this.isStopped()) {
             return;
         }
 
         if (this._runQueueCount > 2) {
+            console.log("Ignoring run request scheduled at ", t)
             return this._runningPromise;
         }
 
+        const startTime = new Date();
+        console.log(`Running sync scheduled at ${t} at ${startTime.toISOString()} runQueueCount: ${this._runQueueCount}`);
         this._runQueueCount++;
         this._runningPromise = this._runningPromise.then(async () => {
             this._subjects.active.next(true);
@@ -155,6 +158,8 @@ export class RxGraphQLReplicationState {
             if (!willRetry && this._subjects.initialReplicationComplete['_value'] === false)
                 this._subjects.initialReplicationComplete.next(true);
 
+
+            console.log(`Sync complete for run tirggered at ${t}in ${new Date().valueOf() - startTime.valueOf()}ms runQueueCount: ${this._runQueueCount}`);
             this._runQueueCount--;
         });
         return this._runningPromise;
@@ -167,7 +172,9 @@ export class RxGraphQLReplicationState {
             const ok = await this.runPush();
             if (!ok) {
                 willRetry = true;
-                setTimeout(() => this.run(), this.retryTime);
+                const t = new Date().toISOString();
+                console.log("Retry from run after push", t)
+                setTimeout(() => this.run(t), this.retryTime);
             }
         }
 
@@ -175,6 +182,7 @@ export class RxGraphQLReplicationState {
             const ok = await this.runPull();
             if (!ok) {
                 willRetry = true;
+                console.log("Retry from run after pull", new Date().toISOString())
                 setTimeout(() => this.run(), this.retryTime);
             }
         }
@@ -203,6 +211,8 @@ export class RxGraphQLReplicationState {
             }
         } catch (err) {
             this._subjects.error.next(err);
+
+            console.log("Retry from runPull", new Date().toISOString())
             setTimeout(() => this.run(), this.retryTime);
             return false;
         }
@@ -308,7 +318,10 @@ export class RxGraphQLReplicationState {
             }
 
             this._subjects.error.next(err);
-            setTimeout(() => this.run(), this.retryTime);
+            const t = new Date().toISOString();
+            console.log("Retry from runPush", t)
+
+            setTimeout(() => this.run(t), this.retryTime);
             return false;
         }

@gautambt
Author

I just realized the reason the second run (r2) finishes in 52ms (almost exactly twice as long as r1) is that in the run method we chain the runs:

this._runningPromise = this._runningPromise.then(async () => { ... });

Only after the current runningPromise is completed will the next run be executed. So r1 and r2 execute one after another and not in parallel.
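For anyone reading along, here is a small standalone sketch of that chaining pattern (a simplification, not the plugin code) showing why two calls scheduled at the same moment still execute back to back:

    let runningPromise: Promise<void> = Promise.resolve();

    function run(label: string): Promise<void> {
        // every call appends to the same chain, so the bodies never overlap
        runningPromise = runningPromise.then(async () => {
            const started = Date.now();
            await new Promise(resolve => setTimeout(resolve, 25)); // simulate one ~25ms sync cycle
            console.log(`${label} took ${Date.now() - started}ms`);
        });
        return runningPromise;
    }

    // r1 and r2 are scheduled at the same moment, but r2 only starts after r1 finishes,
    // so r2 completes roughly twice the single-cycle time after being scheduled
    setTimeout(() => run('r1'), 0);
    setTimeout(() => run('r2'), 0);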

@pubkey
Owner

pubkey commented Apr 23, 2020

Yes, the run() method ensures that _run() does not run multiple times in parallel.
Can you try adding a test that shows there is something wrong with the current implementation?

gautambt added a commit to gautambt/rxdb that referenced this issue Apr 26, 2020
@gautambt
Author

gautambt commented Apr 26, 2020

@pubkey I have added a test case. Please take a look.

@pubkey pubkey closed this as completed in 3cd84ef Apr 26, 2020
@pubkey pubkey reopened this Apr 26, 2020
@pubkey
Owner

pubkey commented Apr 26, 2020

@gautambt thank you, that helped. I added a fix and also changed the behavior on errors; please check the attached commit.

@gautambt
Author

Awesome :) This looks good to me

@zefman
Contributor

zefman commented May 11, 2020

I am still seeing this issue on 9.0.0-beta.11

@pubkey
Owner

pubkey commented May 16, 2020

I'm closing this because I think the original issue is fixed.
@zefman please open a new issue if you have problems with this. Also a PR with a failing test would help more.

@pubkey pubkey closed this as completed May 16, 2020