Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition when closing LDS #755

Open
ab-pm opened this issue Sep 9, 2021 · 2 comments
Open

Race condition when closing LDS #755

ab-pm opened this issue Sep 9, 2021 · 2 comments

Comments

@ab-pm
Copy link
Contributor

ab-pm commented Sep 9, 2021

Summary

In our integration test suite, we do teardown the LDS Live Source by calling schema.__pgLdsSource?.close(). This does however not work consistently, and from time to time (apparently when the server - that also runs the db in a docker container - is under higher load) it fails to stop the LDS loop and the test job (github action) is only stopped by timing out, failing the build step.

In the meantime, it does keep logging

Error during LDS loop: Pool has been closed

I believe I tracked this down to a bug in

const ldSubscription = {
close: async () => {
clearTimeout(loopTimeout);
await client.close();
},
};

  async function loop() {
    try {
      const rows = await client.getChanges(null, 500);
      
    } catch (e) {
      console.error("Error during LDS loop:", e.message);
      // Recovery time...
      loopTimeout = setTimeout(loop, sleepDuration * 10);
      return;
    }
    loopTimeout = setTimeout(loop, sleepDuration);
  }
  loop();

While the client.getChanges query is awaited, no timeout is scheduled and the clearTimeout will be ineffective, so the loop will start over even after .close() has been called.

Steps to reproduce

By setting the sleepDuration to a minimum value, the loop spends much more time in the query so a close() will hit the sleep much less likely.

Possible Solution

I can see a few approaches:

  • just set a closed flag and check this before calling setTimeout again
  • .unref() each timer? I think this might have worked in my case, but probably not a generic solution
  • check whether the client.pool is closed before calling getChanges(), and breaking out of the loop
  • check whether the client.pool is closed in the catch handler of the loop, breaking out from there
  • a specific "client closed" exception that is handled differently in the catch of the loop, breaking out from there
@benjie
Copy link
Member

benjie commented Sep 10, 2021

Bullet 1 was my first thought too; I'd go with that 👍 Do you fancy raising a PR?

ab-pm added a commit to ab-pm/graphile-engine that referenced this issue Sep 10, 2021
@ab-pm
Copy link
Contributor Author

ab-pm commented Sep 10, 2021

Sorry I don't really have the resources right now to fix the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants