NATS streaming reconnect handlers for gateway and queue-workers #17
Signed-off-by: Vincent Smith firstname.lastname@example.org
This PR adds support for the NATS streaming clients to reconnect to the NATS Streaming servers when a NATS connection is reset.
The handler for the gateway and the queue-worker both have wrappers around their NATS connections to handle an
Motivation and Context
Previously, when a NATS connection was reset, the clients would not be able to communicate with the NATS streaming server. This is due to the NATS streaming server running in in-memory mode as detailed in this issue.
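The general idea of wrapping a connection so that a failed publish triggers a reconnect can be sketched in Go as follows. This is a minimal illustration, not the code from this PR: `Publisher`, `Dialer`, `ReconnectingPublisher`, and `fakeConn` are hypothetical names, and the interface is a stand-in rather than the real NATS Streaming client API. As in the test results reported in this PR, the publish that hits the broken connection still fails, and later publishes succeed on the new connection.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Publisher is a hypothetical stand-in for a streaming connection.
// It is NOT the real NATS Streaming client interface.
type Publisher interface {
	Publish(subject string, data []byte) error
	Close() error
}

// Dialer opens a fresh connection (assumed to register the client again).
type Dialer func() (Publisher, error)

// ReconnectingPublisher wraps a Publisher and re-dials after a failed publish.
type ReconnectingPublisher struct {
	mu   sync.Mutex
	dial Dialer
	conn Publisher
}

// NewReconnectingPublisher dials once up front and returns the wrapper.
func NewReconnectingPublisher(dial Dialer) (*ReconnectingPublisher, error) {
	conn, err := dial()
	if err != nil {
		return nil, err
	}
	return &ReconnectingPublisher{dial: dial, conn: conn}, nil
}

// Publish sends on the current connection. If that fails, it replaces the
// connection and returns the original error, so only the in-flight message
// is lost and the next call uses the new connection.
func (r *ReconnectingPublisher) Publish(subject string, data []byte) error {
	r.mu.Lock()
	defer r.mu.Unlock()

	pubErr := r.conn.Publish(subject, data)
	if pubErr == nil {
		return nil
	}
	_ = r.conn.Close() // the old connection is considered stale
	conn, err := r.dial()
	if err != nil {
		return fmt.Errorf("publish failed (%v) and reconnect failed: %w", pubErr, err)
	}
	r.conn = conn
	return pubErr
}

// errConnReset simulates the error seen after the server goes away.
var errConnReset = errors.New("connection reset")

// fakeConn is a test double that fails every publish once broken is set.
type fakeConn struct {
	broken bool
	sent   int
}

func (f *fakeConn) Publish(subject string, data []byte) error {
	if f.broken {
		return errConnReset
	}
	f.sent++
	return nil
}

func (f *fakeConn) Close() error { return nil }
```

A real client would more likely hook into the library's own disconnect or reconnect callbacks than detect failure on publish; the sketch only shows the shape of the wrapper.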
How Has This Been Tested?
Using the OpenFaaS helm chart, I deployed an OpenFaaS stack on a Kubernetes cluster running Kubernetes 1.8.8. I deployed a simple echo function using faas-cli. While tailing the logs of the gateway and the queue-worker pods, I invoked the echo function with --async and watched the logs show the messages being passed successfully. I killed the NATS container and waited for it to start up again. I invoked the function with --async again and waited for the publish ack timeout to occur.
Using a modified helm chart and Docker images, I deployed an OpenFaaS stack with the changes from this PR on the same K8s cluster. I deployed a simple echo function using faas-cli. While tailing the logs of the gateway and the queue-worker pods, I invoked the echo function with --async and watched the messages being passed successfully. I killed the NATS container and waited for it to start up again. A log message is printed by the queue worker on successful reconnection. I invoked the function again and saw the messages being successfully passed.
I also ran the function in a loop, invoking the call every 1 second. When killing the NATS container, 1 publish ack timeout occurs and the subsequent messages are published successfully.
Types of changes
Mar 19, 2018
Thank you for helping us out with this change - and especially for testing it with a failing/passing scenario. That's exactly what we look for from new contributions.
Unfortunately, I'm having a hard time reviewing the code because it seems like there has been a fair amount of refactoring and change of style.
I'd rather you applied the minimum changeset to make this work and then raised a separate PR later for your refactoring ideas if that still makes sense.
We also need more detail on the commit message for the history of the codebase. Right now the commit message is blank with just a subject given.
I find this useful as a reference point - https://chris.beams.io/posts/git-commit
First of all, thank you for the link to the git-commit message post. It was certainly enlightening, and I'll definitely be more descriptive in the future!
In a broad response to the comments on
I hadn't considered the nightmare of a diff that it was going to create, so I apologize for that. I can try to put the
Hey @vosmith perhaps a practical way to move it forward would be to take a backup of the branch where it is now - then start over from master/head and apply the minimum set of changes to make the fix and force push that back up.
After we've merged we can look at the best way to refactor the code.
How does that sound? Do you have an ETA on when this could be ready?
General comment (will comment specifically on the PR): If you use a NATS Streaming server with memory store, it is true that if the server is restarted, since no state is being restored, the previously "connected" clients will stop receiving messages. Publishers would fail too since the server would reject published messages for unknown client IDs.
Note: If the NATS Streaming server connects to a non-embedded NATS Server and the NATS Server itself is restarted, that is fine: the client library's underlying NATS connection will reconnect and everything will keep working (some operations that were in flight when the NATS server was restarted may time out). This is because the Streaming server would still be running with its state maintained, so the communication can continue.
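The memory-store case described above can be illustrated with a toy model in Go. This is only a sketch of the behaviour as described in this comment, not NATS Streaming server code: `memoryStoreServer` and its methods are hypothetical names, and the model keeps nothing but a set of known client IDs.

```go
package main

import "fmt"

// memoryStoreServer is a toy model of a server that keeps its client
// registrations only in memory. It is NOT the NATS Streaming server.
type memoryStoreServer struct {
	clients map[string]bool
}

func newMemoryStoreServer() *memoryStoreServer {
	return &memoryStoreServer{clients: map[string]bool{}}
}

// Connect registers a client ID with the server.
func (s *memoryStoreServer) Connect(clientID string) {
	s.clients[clientID] = true
}

// Restart models a restart with no persisted state: every registration is lost.
func (s *memoryStoreServer) Restart() {
	s.clients = map[string]bool{}
}

// Publish is rejected unless the client ID is currently registered.
func (s *memoryStoreServer) Publish(clientID, subject string, data []byte) error {
	if !s.clients[clientID] {
		return fmt.Errorf("unknown client ID %q", clientID)
	}
	return nil
}
```

In this model a client that connected before `Restart` gets an error on its next `Publish` until it calls `Connect` again, which is why the clients need to re-establish their streaming connection rather than rely on the underlying NATS reconnect alone.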