Provide APIs to monitor pipeline #2611

Closed
suyograo opened this issue Feb 13, 2015 · 12 comments

Comments

@suyograo
Contributor

Today, most Logstash monitoring functions are accomplished by tailing logs or outputting debug messages. Users typically send specially tagged tracer events to check the health of the system. These special events are also used to measure the latency of the pipeline. This is definitely not straightforward and it becomes hard to administer a large-scale Logstash cluster.

We plan to introduce a Logstash monitoring API endpoint, which will provide visibility into the pipeline. Some important metrics are:

In the medium term, we should provide plugin-level granularity. For example, it would be great to know how long (on average) an event spends in grok filters, geoip filters, etc. This would help users drill into the expensive parts of the pipeline.

Care should be taken to make sure that metrics collection does not put additional stress on the pipeline or affect the latency and throughput of events.
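
A minimal sketch of the per-plugin timing idea, assuming each filter can be wrapped in a function call. This is not Logstash code: `PluginTimings`, `timed`, and `fake_grok` are hypothetical names used only for illustration. It keeps one counter and one running total per plugin, so the per-event overhead is two clock reads and two additions.

```python
import time
from collections import defaultdict


class PluginTimings:
    """Accumulates event counts and total time per plugin (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def timed(self, plugin_name, func, event):
        """Run func(event) and record how long it took under plugin_name."""
        start = self._clock()
        try:
            return func(event)
        finally:
            self._count[plugin_name] += 1
            self._total[plugin_name] += self._clock() - start

    def average_seconds(self, plugin_name):
        """Mean time per event for a plugin, or 0.0 if it has seen no events."""
        count = self._count[plugin_name]
        return self._total[plugin_name] / count if count else 0.0


def fake_grok(event):
    # Stand-in for a real filter (assumption); it only tags the event.
    event["tags"] = event.get("tags", []) + ["grokked"]
    return event


timings = PluginTimings()
for i in range(3):
    timings.timed("grok", fake_grok, {"message": f"line {i}"})
print(timings.average_seconds("grok"))
```

Storing only a count and a sum avoids keeping per-event samples, which is one way to limit the cost of collection; percentiles would need a different structure.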

@ph
Contributor

ph commented Feb 27, 2015

This is certainly something I would like to improve. I was bitten while debugging an issue on a cluster, and the tools aren't that great. I am pretty sure @abonuccelli-es would give us some awesome input :)

@nellicus

nellicus commented Apr 7, 2015

@suyograo @ph sorry, a bit late here.
Not sure if these were mentioned somewhere else, but here are a couple of ideas:

  • Have Logstash produce internal events providing metrics (e.g. print regularly every T seconds, even if no actual events are coming in: SLC events sent/received, SLC events filtered, queue sizes, etc.; also some one-off ones, e.g. instance started/stopped, queue full, destination down).
  • Have an array to which each Logstash instance appends its own ID (if anything like that will exist?). Imagine one event going through multiple Logstash layers; this would help in understanding which Logstash instances the event has passed through.
  • For events coming in via tcp/udp, add a @source field where we stamp the source IP of the sender, a bit like 'path' when we read files.

Any other info available at runtime via the API would of course be great to have.
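
To make the second and third ideas concrete, here is a rough sketch in Python. It is only an illustration: the `hops` field name, the `stamp_event` function, and the instance IDs are assumptions, and `@source` is the field name proposed above. Keeping the first sender's IP rather than overwriting it at each layer is a design choice of this sketch, not something stated in the proposal.

```python
def stamp_event(event, instance_id, sender_ip=None):
    """Return a copy of event with hop and source information added."""
    stamped = dict(event)
    # Append this instance to the list of instances the event has passed through.
    stamped["hops"] = list(event.get("hops", [])) + [instance_id]
    # Record the sender only on first receipt, so later layers keep the origin.
    if sender_ip is not None and "@source" not in stamped:
        stamped["@source"] = sender_ip
    return stamped


event = {"message": "hello"}
event = stamp_event(event, "shipper-1", sender_ip="10.0.0.5")
event = stamp_event(event, "indexer-1", sender_ip="10.0.0.9")
print(event["hops"])     # ['shipper-1', 'indexer-1']
print(event["@source"])  # 10.0.0.5
```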

@KlavsKlavsen

I would definitely prefer that Logstash have the same kind of API that MySQL, Varnish and many others have, where you connect to a management port and get numbers out.
It's pretty cheap for Logstash to keep a memory segment for performance stats and update it; then I can poll the counters at whatever interval I want (every minute, for example) and feed them into my favourite monitoring stack (I use Graphite) to get graphs and to be able to do alerting over time.

The ability to have Logstash simply send performance counters to APIs such as Graphite would also be super cool.
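
A minimal sketch of that counter model, assuming counters live in process memory and a poller reads a snapshot at its own interval. The `Counters` class, the `to_graphite_lines` function, and the metric names are hypothetical and are not part of Logstash. The output follows Graphite's plaintext format of `<path> <value> <timestamp>`, one metric per line.

```python
import threading
import time


class Counters:
    """Thread-safe in-memory counters (hypothetical, not a Logstash class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def increment(self, name, amount=1):
        with self._lock:
            self._values[name] = self._values.get(name, 0) + amount

    def snapshot(self):
        """Return a copy so callers can read without holding the lock."""
        with self._lock:
            return dict(self._values)


def to_graphite_lines(snapshot, prefix, timestamp):
    """Format counters as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    return [
        f"{prefix}.{name} {value} {int(timestamp)}"
        for name, value in sorted(snapshot.items())
    ]


counters = Counters()
counters.increment("events.received", 3)
counters.increment("events.filtered")
for line in to_graphite_lines(counters.snapshot(), "logstash.host1", time.time()):
    print(line)
```

The pipeline only pays for an increment under a lock; formatting and shipping happen on the poller's side, whether that is a management-port handler or a periodic push to Graphite.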

@ph ph removed their assignment Apr 30, 2015
@simmel
Contributor

simmel commented May 21, 2015

To expose those numbers via JMX would be perfect!
Then we could just use jmxtrans-agent to output it to a console, file, graphite or statsd.

@purbon
Contributor

purbon commented May 21, 2015

+1 on improving these capabilities of Logstash. Monitoring is something that needs a lot of improvement nowadays. Using JMX might be a nice option that would enable other Java components to get data out of LS naturally.

@m1k3ga

m1k3ga commented May 21, 2015

+1 ;)

@ph
Contributor

ph commented May 22, 2015

We need better introspection into what a filter worker is actually doing; we should be able to output which plugin and which configuration is actually running. See #3294 for a use case that metrics and stats should help to solve.

@ph
Contributor

ph commented May 27, 2015

Another cool feature would be the ability to turn on a flag and see which event is currently in transit in a specific plugin; this would help people debug problems with blocking plugins.
A real-world example is when a regex in a grok filter blocks a thread with high CPU usage. See #3302.
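
One way this could look, sketched in Python under the assumption that each plugin call can be wrapped. `InFlightTracker` and `slow_grok` are hypothetical names and none of this reflects how Logstash is implemented. Entries are keyed by plugin name and thread ID so that several workers running the same plugin do not overwrite each other.

```python
import threading


class InFlightTracker:
    """Records which event each plugin is currently processing (hypothetical)."""

    def __init__(self, enabled=False):
        self.enabled = enabled  # the "flag"; tracking is skipped when False
        self._lock = threading.Lock()
        self._current = {}

    def run(self, plugin_name, func, event):
        """Call func(event), exposing the event via in_flight() while it runs."""
        if not self.enabled:
            return func(event)
        key = (plugin_name, threading.get_ident())
        with self._lock:
            self._current[key] = event
        try:
            return func(event)
        finally:
            with self._lock:
                self._current.pop(key, None)

    def in_flight(self):
        """Snapshot of {(plugin_name, thread_id): event} for running plugins."""
        with self._lock:
            return dict(self._current)


tracker = InFlightTracker(enabled=True)


def slow_grok(event):
    # While this stand-in filter runs, the tracker can report the event it holds.
    print(tracker.in_flight())
    return event


tracker.run("grok", slow_grok, {"message": "aaaaaaaaaaaaaaaa!"})
print(tracker.in_flight())  # empty again once the plugin returns
```

With the flag off, the only cost is a single attribute check per call, which fits the concern about not adding stress to the pipeline.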

@svenmueller

JVM metrics would be nice

@suyograo
Contributor Author

Implementation details are in #3908

@jakauppila

+1

@suyograo
Contributor Author

suyograo commented May 9, 2016

Fixed in 5.0

@suyograo suyograo closed this as completed May 9, 2016