Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Supporting Linux Perf events with Opentracing #36
This issue is to discuss a proposal on adding perf events support within a span. Perf events (https://perf.wiki.kernel.org/index.php/Tutorial#Events) are platform side counters which can provide valuable insights on what is happening inside a span and can help us in identifying resource bottlenecks.
We would like to hear your thoughts and suggestions on the same.
We did a small proof-of-concept implementation with opentracing-go and zipkin-go-opentracing to show how the eventual integration may look.
Opentracing-go api will have a PerfString event as an addition to the span creation function. And, the backend Zipkin collector will be responsible for starting, reading, logging and closing the required counter.
To try out the opentracing-go with perfevents, this commit can be used :
A "perfevents" library has been added to zipkin-go-opentracing which assists in creating, maintaining and reading the perf events like cpu-cycles, instructions, cache-misses etc. An example usage in a sample application:
referenced this issue
Jan 31, 2017
This is fun stuff, thank you for bringing it up.
@hkshaw1990 I think the crux of this is finding a way to record only the performance data that's relevant when the associated Span is "active" or in the foreground. This reminds me of the https://github.com/opentracing-contrib/java-spanmanager work that @sjoerdtalsma did... the tricky thing is that a Span is not necessarily "active" just because it hasn't been finish()ed yet. Have you thought through this aspect of things yet?
Hi @bensigelman, thanks for referring to this point. I agree, that a span is not necessarily active even if it hasn't finished. But, the way the perf events are opened is to associate these counters with the (self) process. And a span is running in the context of a process. So, when a span is inactive, probably due to some I/O call, n/w call etc, then, the process is inactive too. So, the counters won't count for those inactive periods. The active/inactive period can be inferred by looking at the cpu-cycles consumed by the span over a given period of time. I hope that addresses your concerns.
Hi @bensigelman to add to the above comment, I want to add something more here. There are multiple ways on how we can start a perf event within a span.
One is to create the event for a process (as mentioned in the above comment), wherein, the event will be responsible to count only for the process that is being monitored. Whenever, the process schedules out of the cpu, the event stops counting.
There are other useful perf events which can be used to monitor the system, and is not necessarily attached to a single process. These kind of events need root access to be created. What this implies is they will need a different way to be created than the above approach. We will need to look into this to find out a way to monitor these events too.
With that said, the provided commits and git branches in comment no.1 implement the first approach of opening events for the self process (which works as it is, we don't need any special access permissions).
Also, note that there is a certain overhead in creating and reading these events. One simple example that we tried over multiple runs with a simple span with and without monitoring a perf event :
@hkshaw1990 sorry for the delay, I'm too busy these days :-/
For languages that support/encourage concurrency (Go, Java, C++, arguably Node.js), process-level measurements are better than nothing but not as helpful as stats for the current thread / goroutine / etc.
As for actually implementing this: it seems as though you were imagining this would be turned on or off on a per-Span basis. Another approach might be to enable it for the entire Tracer, perhaps by creating a Tracer adapter that wraps an existing implementation and adds these tags to all Spans, or perhaps to all Spans that take longer than
This was referenced
Feb 15, 2017
As discussed with @bensigelman (please point out if I missed something) on gitter, here is a brief summary :
The above suggestions are being planned to be taken care of in the subsequent versions. Have sent some basic PRs to support perfevents to these repos :