
Restarting the Informer on AKS never succeeds and ends in an immediate ECONNRESET #589

@ivanstanev

Description


Hey 👋

We've been using the client library in our product for a while now. Recently we noticed that connections to AKS get interrupted roughly 5 minutes after start, and after that we stop getting notified of new workloads in the cluster. We traced this to two things: we didn't have an error handler (as defined in the example: https://github.com/kubernetes-client/javascript/blob/master/examples/typescript/informer/informer.ts#L16-L22), and AKS has a load balancer in front of the Kubernetes API server that, by default, interrupts long-running connections after 5 minutes.

So we added an error handler that calls informer.start() again via setTimeout(), to try to fix this.
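For reference, the restart pattern from the linked example boils down to: handle the informer's `error` event and call `start()` again after a delay (5 seconds). The sketch below models that pattern without depending on a cluster. `StartableInformer`, `Scheduler` and `restartOnError` are hypothetical names, not part of @kubernetes/client-node; the injectable scheduler exists only so the logic can be exercised without real timers. The `restartPending` guard is an illustrative addition and is not in the example.

```typescript
// Hypothetical minimal shape of an informer. The informer returned by makeInformer()
// in @kubernetes/client-node is used via on('error', ...) and start() in the linked example.
interface StartableInformer {
  on(event: 'error', handler: (err: unknown) => void): unknown;
  start(): void | Promise<void>;
}

// Injectable timer so the logic can run without real timers (defaults to setTimeout).
type Scheduler = (fn: () => void, delayMs: number) => void;

const defaultScheduler: Scheduler = (fn, delayMs) => {
  setTimeout(fn, delayMs);
};

// Restart the informer delayMs after an 'error' event, as in the linked example.
// Illustrative extra: at most one restart is pending at a time, so a burst of
// errors does not queue several overlapping start() calls.
function restartOnError(
  informer: StartableInformer,
  delayMs = 5000,
  schedule: Scheduler = defaultScheduler,
): void {
  let restartPending = false;
  informer.on('error', (err) => {
    console.error(`informer error, restarting in ${delayMs} ms:`, err);
    if (restartPending) {
      return;
    }
    restartPending = true;
    schedule(() => {
      restartPending = false;
      // start() may return a promise; log a failed restart instead of leaving it unhandled.
      Promise.resolve(informer.start()).catch((e: unknown) => {
        console.error('informer restart failed:', e);
      });
    }, delayMs);
  });
}
```

With the real client, this would be wired up by calling `restartOnError(informer)` on the informer from `makeInformer(...)` before the first `informer.start()`. Note that this only reproduces the retry mechanism; on its own it does nothing about the ECONNRESET loop described next.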

However, this does not help. The informer ends up in an infinite loop: the API server immediately returns ECONNRESET, the informer tries to restart after 5 seconds (because of the setTimeout()), and our app never recovers, stuck receiving ECONNRESET and retrying forever. Killing our Pod and starting from scratch fixes it, until the API server closes the connection again and we end up back in the same loop of ECONNRESET and restart attempts.

We are using version 0.13.2 of the library.

I noticed that the recent PR #576 fixes a connection leak and ensures abort() is called on the connection. Do you think this is related, and would it help once it lands in a new release?
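For illustration, here is a minimal sketch of the idea as described for #576, not the PR's actual code: before a restart opens a new watch connection, the previous one is explicitly aborted, so each retry does not leave another connection behind. `ConnectionHandle`, `OpenConnection` and `SingleConnectionWatcher` are hypothetical names, not library APIs, and whether leaked connections actually cause the ECONNRESET loop above is exactly the open question.

```typescript
// Hypothetical handle for one open watch connection; all that matters here is that
// it exposes abort(), like the connection PR #576 is described as aborting.
interface ConnectionHandle {
  abort(): void;
}

// Hypothetical function that opens a new watch connection and returns its handle.
type OpenConnection = () => ConnectionHandle;

// Keeps at most one live connection: each restart aborts the previous handle
// before opening a new one, so old connections are released rather than leaked.
class SingleConnectionWatcher {
  private current: ConnectionHandle | null = null;
  private readonly open: OpenConnection;

  constructor(open: OpenConnection) {
    this.open = open;
  }

  restart(): void {
    this.stop();
    this.current = this.open();
  }

  stop(): void {
    if (this.current !== null) {
      this.current.abort();
      this.current = null;
    }
  }
}
```

In the retry loop described above, `restart()` is what the error handler would call after its delay, so every retry starts from exactly one open connection.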

Metadata


Assignees: No one assigned
Labels: lifecycle/rotten (denotes an issue or PR that has aged beyond stale and will be auto-closed)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
