Support Customization of Agent to Collector Connection Check Frequency #1006

Open
brodmerkle opened this issue Aug 19, 2018 · 1 comment

@brodmerkle

Requirement - what kind of business use case are you trying to solve?

When rolling out Jaeger agents on our VMs (using the standalone binary wrapped in a systemd unit), we sometimes deploy agents before a working collector exists for that environment, and the collector may also be down at times for planned or unplanned outages. During such events the Jaeger agent logs 3 lines of output (including a stack trace) on each connection retry. This wouldn't be much of a concern except that the agent retries connecting to the collector once per second, and I haven't been able to identify any mechanism to customize this interval. The logs grow at roughly 4MB/hour, or about 100MB/day, so depending on the journald/syslog rotation configuration and filesystem sizes there is some potential for disk-space issues.

Problem - what in Jaeger blocks you from solving the requirement?

In https://www.jaegertracing.io/docs/deployment/ the guidance for configuration is to run the help command, but the output of that command (on a 1.6.0 Jaeger agent) includes no flags for customizing the connection check frequency, although it does include the closely related --discovery.conn-check-timeout for controlling timeouts on connection attempts.

The gap is in https://github.com/jaegertracing/jaeger/blob/master/cmd/agent/app/flags.go, specifically in InitFromViper, where connCheckTimeout is set from discoveryConnCheckTimeout but there is no corresponding logic for connCheckFrequency. The connCheckFrequency option itself already exists in the peer list manager (https://github.com/jaegertracing/jaeger/blob/master/pkg/discovery/peerlistmgr/options.go).

Proposal - what do you suggest to solve the problem or improve the existing situation?

An enhancement to add a new --discovery.conn-check-frequency flag to the agent would look similar to the PR for adding conn-check-timeout itself - #911.

@jpkrohling
Contributor

Sounds reasonable -- would you like to send a PR to support this?

By the way:

> using the standalone binary wrapped in a systemd unit

This seems like a good candidate for a contribution as well :-) Would you have similar unit files for the collector and query as well? I think we could place them where we place the Dockerfile and distribute those unit files with the Linux binaries.
