Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

configuring the nats target to reconnect forever #16050

Merged

Conversation

rayjanoka
Copy link
Contributor

@rayjanoka rayjanoka commented Nov 11, 2022

Description

We observed bucket notification events do not resume sending after the NATS server has been offline for more than 5 minutes. A shorter outage, of only a minute or so, does not experience this issue.

In the nats.go lib the client's MaxReconnect default setting is 60 attempts and ReconnectWait is 2 seconds, so it seems the client will only attempt reconnect for a short period before giving up.

I found that setting the nats.MaxReconnects(-1) on the client resolved the issue. I tested after a 20 minute outage, and the messages resumed sending on their own without a minio restart. 馃憤馃徏

Motivation and Context

During testing we found that after an extended network outage was resolved, bucket events do not resume sending until minio is restarted.

How to test this PR?

  1. turn up event notifications for bucket PUTs to nats
  2. upload objects to the bucket and verify events are delivered to nats successfully
  3. stop the nats cluster
  4. wait 5 minutes
  5. start the nats cluster
  6. upload objects to the bucket
  7. observe that messages are not delivered to the nats server
  8. if a MINIO_NOTIFY_NATS_QUEUE_DIR is defined we can observe that messages are stuck queued in the minio folder, and are not delivered until a minio restart

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • [-] Fixes a regression (If yes, please add commit-id or PR # here)
  • [-] Unit tests added/updated
  • [-] Internal documentation updated
  • [-] Create a documentation update request here

@minio-trusted
Copy link
Contributor

Mint Automation

Test Result
mint-large-bucket.sh 鉁旓笍
mint-fs.sh 鉁旓笍
mint-erasure.sh 鉁旓笍
mint-dist-erasure.sh 鉁旓笍
mint-compress-encrypt-dist-erasure.sh 鉁旓笍
mint-pools.sh 鉁旓笍
Deleting image on docker hub
Deleting image locally

@rayjanoka
Copy link
Contributor Author

rayjanoka commented Nov 11, 2022

hmm, not sure why that one check failed. can we run that one again?

rebased, but it looks like it needs an approval to try again.

@harshavardhana harshavardhana merged commit 66239f3 into minio:master Nov 11, 2022
@bh4t bh4t added the bugfix label May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants