
improved logging in nats to nats connections #622

Closed
ripienaar opened this issue Feb 19, 2018 · 7 comments

@ripienaar
Contributor

  • Defect
  • Feature Request or Change Proposal

Feature Requests

Use Case:

improved operability of clusters

Proposed Change:

We need better logging in clusters. For example, I found slow consumer messages in my logs; I believe these come from the cluster node <-> cluster node connections, but it's hard to say.

Elevating these lines to info would help:

https://github.com/nats-io/gnatsd/blob/ee7b97e6ee3068900d39f1fe4ae7b75f358416ab/server/route.go#L142

https://github.com/nats-io/gnatsd/blob/ee7b97e6ee3068900d39f1fe4ae7b75f358416ab/server/route.go#L737

Elevating this to error would help:

https://github.com/nats-io/gnatsd/blob/ee7b97e6ee3068900d39f1fe4ae7b75f358416ab/server/route.go#L168

An additional change would be to better log, on the side that initiated the connection, when cluster connections drop, so we know this happens.
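
To make the intent concrete, here is a minimal, hypothetical Go sketch (not gnatsd's actual logger or the code at the linked lines): route lifecycle events move from the debug level, which is hidden by default, to info/error so they always reach the log. The logger type, method names, and messages below are illustrative assumptions only.

```go
// Hypothetical sketch of the proposed change; not gnatsd code.
package main

import "log"

// logger mimics a leveled logger with a debug switch, similar in
// spirit to running the server with or without -D.
type logger struct {
	debug bool
}

func (l *logger) Debugf(format string, v ...interface{}) {
	if l.debug {
		log.Printf("[DBG] "+format, v...)
	}
}

func (l *logger) Noticef(format string, v ...interface{}) {
	log.Printf("[INF] "+format, v...)
}

func (l *logger) Errorf(format string, v ...interface{}) {
	log.Printf("[ERR] "+format, v...)
}

func main() {
	l := &logger{debug: false} // typical production setting: debug off

	// Before: route lifecycle events logged at debug level are invisible
	// unless the server runs with debug enabled.
	l.Debugf("Route connection closed")

	// After (the proposal): lifecycle events at info, failures at error,
	// so they always reach the log and can be reviewed after an incident.
	l.Noticef("Route connection to %s closed", "nats-2:6222")
	l.Errorf("Error trying to connect to route %s: %v", "nats-2:6222", "connection refused")
}
```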

Who Benefits From The Change(s)?

cluster operators

Alternative Approaches

n/a

@derekcollison
Member

I think being able to dynamically turn on tracing and debug modes without restart might be a better direction here.
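
As a rough illustration of this idea (not gnatsd's actual mechanism), a debug flag could be flipped at runtime, for example via a signal, without restarting the server. The signal choice, flag name, and log format below are assumptions for the sketch only.

```go
// Hypothetical sketch of toggling debug logging at runtime; not gnatsd code.
package main

import (
	"log"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// debugEnabled can be flipped while the process keeps running.
var debugEnabled atomic.Bool

func debugf(format string, v ...interface{}) {
	if debugEnabled.Load() {
		log.Printf("[DBG] "+format, v...)
	}
}

func main() {
	// SIGUSR1 toggles debug logging on and off without a restart.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)
	go func() {
		for range sigs {
			debugEnabled.Store(!debugEnabled.Load())
			log.Printf("[INF] debug logging enabled: %v", debugEnabled.Load())
		}
	}()

	for {
		debugf("verbose route activity details")
		time.Sleep(time.Second)
	}
}
```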

@ripienaar
Contributor Author

Debugging would instantly overwhelm my /var partition :)

It might be useful, but basic error logging and info for major events will go very far.

@ripienaar
Contributor Author

ripienaar commented Feb 20, 2018

Just to expand on that: I generally never want to see debug output in a production setting. With very large numbers of nodes connected, sending many things, and clients coming and going, debug would create a storm of noise and you'd miss what's going on. Further, it's not retrospective; you cannot run in debug all the time.

The lines I highlighted, though, I always want to see; they are critical for the operability of the software imo. I want to be able to go back and review logs after an incident and know this happened, and it should be safe to always expose this data.

Logging appropriately is the correct action here - this is not debug information.

Not to say being able to enable debug/trace dynamically would not be good - but it would not solve this problem.

@kozlovic
Member

In general, I kind of agree that these could be elevated, but I have a concern about the one trying to establish the route. If you have a static route but the remote server is not running, this notice/error would then be printed every 2 seconds. With config reload you should now be able to remove the route, though.

I agree that the dynamic nature of enabling logging may not help once the event you are interested in has already happened.
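
One possible way to keep the elevated messages without flooding the log when a static route keeps failing every couple of seconds is to rate-limit the repeated error line. The sketch below is hypothetical and not gnatsd code; the type name, interval, and message format are chosen only for illustration.

```go
// Hypothetical sketch of rate-limited route error logging; not gnatsd code.
package main

import (
	"errors"
	"log"
	"time"
)

// routeLogger logs repeated connect failures for a solicited route at most
// once per interval, instead of on every retry.
type routeLogger struct {
	lastLogged time.Time
	interval   time.Duration
}

func (r *routeLogger) connectFailed(remote string, err error) {
	if time.Since(r.lastLogged) >= r.interval {
		log.Printf("[ERR] Error trying to connect to route %q: %v (suppressing repeats for %s)",
			remote, err, r.interval)
		r.lastLogged = time.Now()
	}
}

func main() {
	rl := &routeLogger{interval: time.Minute}
	errDown := errors.New("connection refused")

	// Simulate a static route retrying every 2 seconds against a stopped
	// remote server: only the first failure per minute reaches the log.
	for i := 0; i < 5; i++ {
		rl.connectFailed("nats-2:6222", errDown)
		time.Sleep(2 * time.Second)
	}
}
```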

@ripienaar
Contributor Author

@kozlovic good point about the remote server messages, and this will be made worse by the business of announced cluster members never expiring when a node is down, which is in itself probably something worth knowing about.

In general, though, I think it's normal and expected that those messages would appear; admins are used to that kind of thing.

@ripienaar
Contributor Author

ripienaar commented Mar 29, 2018

@derekcollison
Member

I believe we have addressed this in #692
