
GraphQL replication plugin fires exponentially increasing number of requests in case of server errors #2048

Closed
gautambt opened this issue Apr 7, 2020 · 13 comments

Comments

@gautambt

gautambt commented Apr 7, 2020

Issue

In scenarios where the server returns a GraphQL error, the number of retry requests being fired keeps increasing over time. This happens because a retry is scheduled from both the runPush method and the _run method.

On an error, the runPush method will schedule a retry and the _run method will also schedule one. When both retries fire, each of them schedules two more retries, and so on.

The fix seems to be to remove the retries from runPush & runPull and do the retry once, in the _run method, if either of them fails. I can raise a pull request if this looks like the correct fix.
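Roughly what I have in mind (just a sketch, not the actual plugin code; the this.push / this.pull guards and the return type are my simplification):

    // sketch: runPush()/runPull() only report failure, _run() owns the single retry
    async _run(): Promise<boolean> {
        let willRetry = false;

        if (this.push) {
            const pushOk = await this.runPush(); // no setTimeout(() => this.run(), ...) in here anymore
            if (!pushOk) willRetry = true;
        }

        if (this.pull) {
            const pullOk = await this.runPull(); // no setTimeout(() => this.run(), ...) in here anymore
            if (!pullOk) willRetry = true;
        }

        // exactly one retry gets scheduled per failed _run() cycle
        if (willRetry) {
            setTimeout(() => this.run(), this.retryTime);
        }

        return willRetry;
    }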

Info

  • Environment: Browser
  • Adapter: IndexedDB
  • Stack: TypeScript

Code

The issue can be reproduced by applying the following patch to the heroes example and trying to insert a hero. In the network console you will see 3 requests (1 OPTIONS request, 1 POST for setHuman and 1 POST for feedForRxDBReplication). After 10 seconds you will see 6 requests, after 20 seconds 12 requests, and so on.

diff --git a/examples/graphql/client/index.js b/examples/graphql/client/index.js
index c3d95640..624cf44b 100644
--- a/examples/graphql/client/index.js
+++ b/examples/graphql/client/index.js
@@ -157,7 +157,8 @@ async function run() {
          * we can set the liveIntervall to a high value
          */
         liveInterval: 1000 * 60 * 10, // 10 minutes
-        deletedFlag: 'deleted'
+        deletedFlag: 'deleted',
+        retryTime: 10000,
     });
     // show replication-errors in logs
     heroesList.innerHTML = 'Subscribe to errors..';
diff --git a/examples/graphql/server/index.js b/examples/graphql/server/index.js
index e852f1c6..3559eb2b 100644
--- a/examples/graphql/server/index.js
+++ b/examples/graphql/server/index.js
@@ -99,6 +99,7 @@ export async function run() {
             return limited;
         },
         setHuman: args => {
+            throw 'error';
             log('## setHuman()');
             log(args);
             const doc = args.human;
@gautambt gautambt changed the title GraphQL replication pugin fires exponentially increasing requests in case of server errors GraphQL replication pugin fires exponentially increasing number of requests in case of server errors Apr 7, 2020
@pubkey
Owner

pubkey commented Apr 13, 2020

I am not sure if this is a bug. run() was designed so that it can be called as often as wanted, because many users call it whenever their websocket fires and do not want to implement blocking logic themselves.
There is the _runQueueCount which ensures that the calls do not stack up.

EDIT: OK, the network inspector shows that we do run too many requests. I will investigate.
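For context, the relevant part of run() looks roughly like this (condensed; the full code appears in the debug patch later in this thread):

    // condensed sketch of run(): drop the call if the queue is already full,
    // otherwise chain _run() onto the shared promise so cycles never overlap
    async run(): Promise<void> {
        if (this.isStopped()) {
            return;
        }
        if (this._runQueueCount > 2) {
            return this._runningPromise;
        }
        this._runQueueCount++;
        this._runningPromise = this._runningPromise.then(async () => {
            this._subjects.active.next(true);
            await this._run();
            this._runQueueCount--;
        });
        return this._runningPromise;
    }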

@pubkey pubkey changed the title GraphQL replication pugin fires exponentially increasing number of requests in case of server errors GraphQL replication plugin fires exponentially increasing number of requests in case of server errors Apr 13, 2020
pubkey added a commit that referenced this issue Apr 13, 2020
@pubkey
Owner

pubkey commented Apr 13, 2020

I played around a bit and also added a test. Everything looks fine to me.

@gautambt
Author

From the code, it looks like _runQueueCount only prevents the parallel execution of more than two instances of the run method. But since the retries are scheduled using a timeout, isn't it possible that they are scheduled at different times and hence many of them execute one after the other?

@pubkey
Owner

pubkey commented Apr 19, 2020

See the test. There I call run() many times but it actually runs only 2 times.

@gautambt
Author

I am adding logs to better understand what's happening. But off the top of my head, that test does not cover the scenario where runPush (or runPull) fails. I am only observing this bug when there is a failure in push or pull.

@gautambt
Author

gautambt commented Apr 21, 2020

So I've added logs whenever the run method is scheduled (patch attached below).

Here is what is happening:

  1. Initially the run method gets triggered; it calls _run, which calls runPush, which fires the GraphQL request, which fails.
  2. runPush then schedules the run method (r1) to run after 10 seconds and returns false.
  3. Since runPush returns false, _run also schedules the run method (r2) to run after 10 seconds.

Steps 2 & 3 happen nearly at the same time.

  4. After 10 seconds both the runs (r1 & r2) scheduled by steps 2 & 3 execute in parallel. Both of them run because we check for _runQueueCount > 2 in the run method, and since _runQueueCount is initialized to 0, up to 3 requests can run in parallel.
  5. runPush gets called twice in parallel and two GraphQL requests get fired (and fail). Since the requests go over the network, the response times differ.
  6. r1 will now schedule the run method twice more (once from runPush and once from _run) - r3 and r4.
  7. r2 will also schedule the run method twice more - r5 & r6.

r3 & r4 will execute in parallel. r5 & r6 will execute in parallel at a different time, since the network response times (and node.js scheduler latencies) lead to them being scheduled at different times. Each of them will schedule 2 more runs.

Occasionally _runQueueCount becomes > 3 and a run request gets ignored.
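In other words, every failed run schedules two new runs, so the number of runs per retryTime window roughly doubles. A rough back-of-the-envelope model of the request counts from the original report (just a sketch, not plugin code):

    // each failed run schedules 2 retries, so after k retry intervals there are ~2^k runs;
    // each run fires roughly 3 requests (OPTIONS preflight + push POST + pull POST),
    // matching the 3, 6, 12, ... requests per 10-second window reported above
    function requestsPerWindow(k: number): number {
        return 3 * Math.pow(2, k);
    }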

Log output:

Initial run invocation: Running sync scheduled at at 2020-04-21T09:02:08.540Z runQueueCount: 0
r1 scheduled: index.js:501 Retry from runPush 2020-04-21T09:02:08.564Z
r2 scheduled: index.js:240 Retry from run after push 2020-04-21T09:02:08.564Z
index.js:183 Sync complete for run tirggered at in 36ms runQueueCount: 1
r1 starts: index.js:160 Running sync scheduled at 2020-04-21T09:02:08.564Z at 2020-04-21T09:02:18.564Z runQueueCount: 0
r2 starts: index.js:160 Running sync scheduled at 2020-04-21T09:02:08.564Z at 2020-04-21T09:02:18.566Z runQueueCount: 1
r1 schedules r3: index.js:501 Retry from runPush 2020-04-21T09:02:18.581Z
r1 schedules r4: index.js:240 Retry from run after push 2020-04-21T09:02:18.581Z
r1 ends: index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:08.564Zin 25ms runQueueCount: 2
r2 schedules r5: index.js:501 Retry from runPush 2020-04-21T09:02:18.609Z
r2 schedules r6: index.js:240 Retry from run after push 2020-04-21T09:02:18.610Z
r2 ends: index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:08.564Zin 52ms runQueueCount: 1
index.js:160 Running sync scheduled at 2020-04-21T09:02:18.581Z at 2020-04-21T09:02:28.581Z runQueueCount: 0
index.js:160 Running sync scheduled at 2020-04-21T09:02:18.581Z at 2020-04-21T09:02:28.584Z runQueueCount: 1
index.js:501 Retry from runPush 2020-04-21T09:02:28.598Z
index.js:240 Retry from run after push 2020-04-21T09:02:28.598Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:18.581Zin 26ms runQueueCount: 2
index.js:160 Running sync scheduled at 2020-04-21T09:02:18.609Z at 2020-04-21T09:02:28.610Z runQueueCount: 1
index.js:160 Running sync scheduled at 2020-04-21T09:02:18.610Z at 2020-04-21T09:02:28.611Z runQueueCount: 2
index.js:501 Retry from runPush 2020-04-21T09:02:28.624Z
index.js:240 Retry from run after push 2020-04-21T09:02:28.625Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:18.581Zin 48ms runQueueCount: 3
index.js:501 Retry from runPush 2020-04-21T09:02:28.639Z
index.js:240 Retry from run after push 2020-04-21T09:02:28.640Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:18.609Zin 41ms runQueueCount: 2
index.js:501 Retry from runPush 2020-04-21T09:02:28.662Z
index.js:240 Retry from run after push 2020-04-21T09:02:28.662Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:18.610Zin 59ms runQueueCount: 1
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.598Z at 2020-04-21T09:02:38.599Z runQueueCount: 0
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.598Z at 2020-04-21T09:02:38.600Z runQueueCount: 1
index.js:501 Retry from runPush 2020-04-21T09:02:38.618Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.619Z
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.624Z at 2020-04-21T09:02:38.625Z runQueueCount: 2
index.js:155 Ignoring run request scheduled at 2020-04-21T09:02:28.625Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.598Zin 29ms runQueueCount: 3
index.js:501 Retry from runPush 2020-04-21T09:02:38.641Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.641Z
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.639Z at 2020-04-21T09:02:38.643Z runQueueCount: 2
index.js:155 Ignoring run request scheduled at 2020-04-21T09:02:28.640Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.598Zin 51ms runQueueCount: 3
index.js:160 Running sync scheduled at 2020-04-21T09:02:28.662Z at 2020-04-21T09:02:38.662Z runQueueCount: 2
index.js:155 Ignoring run request scheduled at 2020-04-21T09:02:28.662Z
index.js:501 Retry from runPush 2020-04-21T09:02:38.664Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.664Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.624Zin 49ms runQueueCount: 3
index.js:501 Retry from runPush 2020-04-21T09:02:38.685Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.685Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.639Zin 50ms runQueueCount: 2
index.js:501 Retry from runPush 2020-04-21T09:02:38.701Z
index.js:240 Retry from run after push 2020-04-21T09:02:38.701Z
index.js:183 Sync complete for run tirggered at 2020-04-21T09:02:28.662Zin 49ms runQueueCount: 1

diff --git a/src/plugins/replication-graphql/index.ts b/src/plugins/replication-graphql/index.ts
index 0006ac31..cb4eef94 100644
--- a/src/plugins/replication-graphql/index.ts
+++ b/src/plugins/replication-graphql/index.ts
@@ -137,15 +137,18 @@ export class RxGraphQLReplicationState {
     }
 
     // ensures this._run() does not run in parallel
-    async run(): Promise<void> {
+    async run(t: string = ""): Promise<void> {
         if (this.isStopped()) {
             return;
         }
 
         if (this._runQueueCount > 2) {
+            console.log("Ignoring run request scheduled at ", t)
             return this._runningPromise;
         }
 
+        const startTime = new Date();
+        console.log(`Running sync scheduled at ${t} at ${startTime.toISOString()} runQueueCount: ${this._runQueueCount}`);
         this._runQueueCount++;
         this._runningPromise = this._runningPromise.then(async () => {
             this._subjects.active.next(true);
@@ -155,6 +158,8 @@ export class RxGraphQLReplicationState {
             if (!willRetry && this._subjects.initialReplicationComplete['_value'] === false)
                 this._subjects.initialReplicationComplete.next(true);
 
+
+            console.log(`Sync complete for run tirggered at ${t}in ${new Date().valueOf() - startTime.valueOf()}ms runQueueCount: ${this._runQueueCount}`);
             this._runQueueCount--;
         });
         return this._runningPromise;
@@ -167,7 +172,9 @@ export class RxGraphQLReplicationState {
             const ok = await this.runPush();
             if (!ok) {
                 willRetry = true;
-                setTimeout(() => this.run(), this.retryTime);
+                const t = new Date().toISOString();
+                console.log("Retry from run after push", t)
+                setTimeout(() => this.run(t), this.retryTime);
             }
         }
 
@@ -175,6 +182,7 @@ export class RxGraphQLReplicationState {
             const ok = await this.runPull();
             if (!ok) {
                 willRetry = true;
+                console.log("Retry from run after pull", new Date().toISOString())
                 setTimeout(() => this.run(), this.retryTime);
             }
         }
@@ -203,6 +211,8 @@ export class RxGraphQLReplicationState {
             }
         } catch (err) {
             this._subjects.error.next(err);
+
+            console.log("Retry from runPull", new Date().toISOString())
             setTimeout(() => this.run(), this.retryTime);
             return false;
         }
@@ -308,7 +318,10 @@ export class RxGraphQLReplicationState {
             }
 
             this._subjects.error.next(err);
-            setTimeout(() => this.run(), this.retryTime);
+            const t = new Date().toISOString();
+            console.log("Retry from runPush", t)
+
+            setTimeout(() => this.run(t), this.retryTime);
             return false;
         }

@gautambt
Author

I just realized the reason the second run (r2) finishes in 52ms (almost exactly twice as long as r1) is that in the run method we chain the runs:

this._runningPromise = this._runningPromise.then(async () => { ... });

Only after the current runningPromise is completed will the next run be executed. So r1 and r2 execute one after another and not in parallel.
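For anyone reading along, here is a small standalone sketch of that chaining pattern (a simplification, not the plugin code) showing why two calls scheduled at the same moment still execute back to back:

    let runningPromise: Promise<void> = Promise.resolve();

    function run(label: string): Promise<void> {
        // every call appends to the same chain, so the bodies never overlap
        runningPromise = runningPromise.then(async () => {
            const started = Date.now();
            await new Promise(resolve => setTimeout(resolve, 25)); // simulate one ~25ms sync cycle
            console.log(`${label} took ${Date.now() - started}ms`);
        });
        return runningPromise;
    }

    // r1 and r2 are scheduled at the same moment, but r2 only starts after r1 finishes,
    // so r2 completes roughly twice the single-cycle time after being scheduled
    setTimeout(() => run('r1'), 0);
    setTimeout(() => run('r2'), 0);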

@pubkey
Owner

pubkey commented Apr 23, 2020

Yes, the run() method ensures that _run() does not run multiple times in parallel.
Can you try adding a test that shows there is something wrong with the current implementation?

gautambt added a commit to gautambt/rxdb that referenced this issue Apr 26, 2020
@gautambt
Author

gautambt commented Apr 26, 2020

@pubkey I have added a test case. Please take a look.

@pubkey pubkey closed this as completed in 3cd84ef Apr 26, 2020
@pubkey pubkey reopened this Apr 26, 2020
@pubkey
Owner

pubkey commented Apr 26, 2020

@gautambt thank you, that helped. I added a fix and also changed the behavior on errors; please check the attached commit.

@gautambt
Author

Awesome :) This looks good to me

@zefman
Contributor

zefman commented May 11, 2020

I am still seeing this issue on 9.0.0-beta.11

@pubkey
Owner

pubkey commented May 16, 2020

I'm closing this because I think the original issue is fixed.
@zefman please open a new issue if you have problems with this. Also a PR with a failing test would help more.

@pubkey pubkey closed this as completed May 16, 2020