stats: add built-in log linear histogram support #2932

Closed
Changes from 15 commits
43 commits
f0a7faa
per thread collection implementation
ramaraochavali Mar 28, 2018
1a35b0e
implemented merge logic
ramaraochavali Mar 29, 2018
5b66b74
fixed test cases
ramaraochavali Mar 30, 2018
8052ae7
formatted
ramaraochavali Mar 30, 2018
5d30b85
added support for flushHistograms and implemented for metrics sink
ramaraochavali Mar 30, 2018
a0bf03c
added runOnAllThreadsWithBarrier to TLS system
ramaraochavali Mar 31, 2018
dede1fd
addressed review comments
ramaraochavali Mar 31, 2018
e744253
handle hot restart case
ramaraochavali Mar 31, 2018
0aeac15
reacquire lock in mergeInternal
ramaraochavali Apr 1, 2018
c200974
new lock for merge process
ramaraochavali Apr 1, 2018
dd79e35
remove handle main thread
ramaraochavali Apr 1, 2018
9dcddf6
move worker_count to class level variable
ramaraochavali Apr 1, 2018
5e8169a
remove separate merge lock
ramaraochavali Apr 2, 2018
f35083c
use separate merge lock
ramaraochavali Apr 2, 2018
08cc211
renamed few methods, added used support for histograms
ramaraochavali Apr 2, 2018
a107b0f
removed extern C usage and moved worker_count to shared ptr
ramaraochavali Apr 3, 2018
dadba03
added HistogramStatistics interface
ramaraochavali Apr 3, 2018
e41d7ef
moved histogram impl to threadlocal store along with merge_lock
ramaraochavali Apr 3, 2018
9da426e
fixed admin tsan violation
ramaraochavali Apr 3, 2018
92620e5
formatted
ramaraochavali Apr 3, 2018
6e6253a
compilation failure
ramaraochavali Apr 3, 2018
6ca87c4
calculate stats only once
ramaraochavali Apr 3, 2018
fa8f646
initialize stats
ramaraochavali Apr 3, 2018
2b5772f
corrected ref returns for other histograms
ramaraochavali Apr 3, 2018
1032ffc
merged master
ramaraochavali Apr 4, 2018
8266868
fixed compilation issues
ramaraochavali Apr 4, 2018
a041a3e
addressed review comments
ramaraochavali Apr 4, 2018
c9fef4b
added callback for mergeHistogram
ramaraochavali Apr 4, 2018
29646d0
addressed review comments and added basic test
ramaraochavali Apr 5, 2018
df348b2
added some tests
ramaraochavali Apr 5, 2018
68039f2
formatted
ramaraochavali Apr 5, 2018
1b75b93
formatted
ramaraochavali Apr 5, 2018
d5d4125
address review comments, added more tests
ramaraochavali Apr 5, 2018
16003e7
address review comments
ramaraochavali Apr 6, 2018
05d00a2
added more tests
ramaraochavali Apr 6, 2018
9189b9e
formatted
ramaraochavali Apr 6, 2018
1837956
updated the libcircl lib and added integration test
ramaraochavali Apr 7, 2018
8b1c7ab
added test case for runAllThreadsWithBarrier
ramaraochavali Apr 10, 2018
2894da8
removed unnecessary arg
ramaraochavali Apr 10, 2018
1e1f494
remove unnecessary code
ramaraochavali Apr 10, 2018
5ae99d3
addressed review comments, simplified tests
ramaraochavali Apr 11, 2018
7d3844e
revert the time interval
ramaraochavali Apr 11, 2018
5ccad60
formatted
ramaraochavali Apr 11, 2018
17 changes: 17 additions & 0 deletions bazel/external/libcircllhist.BUILD
@@ -0,0 +1,17 @@
cc_library(
    name = "libcircllhist",
    srcs = ["src/circllhist.c"],
    hdrs = [
        "src/circllhist.h",
        "src/circllhist_config.h", # Generated.
    ],
    includes = ["src"],
    visibility = ["//visibility:public"],
)

genrule(
    name = "circllhist_config",
    srcs = ["src/circllhist_config.h.in"],
Member
Can we follow up on @jmillikin-stripe comment here? It does seem weird that the generated header is just being copied. Can you add a comment as to why this is OK if it is? cc @postwait

The _config.h.in is an autoconf input template. It is not safe to copy it on all platforms. There is a

#ifdef HAVE_ALLOCA_H

in the circllhist.c file that needs to be set conditionally, depending on the system headers.

Contributor Author
@postwait thanks. What is your suggestion here? Should I copy it based on the platform? Or on which platforms is it safe to copy?

libcircllhist (like many open source projects) is autotools-based. When you build it, it detects at build time which features are available on your platform and acts on that. I don't know how to do the same thing in cmake. I am pretty sure that HAVE_ALLOCA_H is the only important thing in that header though, so if you can get cmake to provide that, you're set.

Looking at this, it is only alloca() that is a problem. I've opened a PR on libcircllhist to resolve this. It should be possible shortly to just ignore that header completely.

Contributor Author
@postwait thank you so much.

Contributor Author
@postwait Awesome. Thanks. I got the new version and removed the genrule, and it works.

    outs = ["src/circllhist_config.h"],
    cmd = "cp $< $@",
)
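The thread above turns on a single feature macro, HAVE_ALLOCA_H, that autoconf would normally set. As a rough illustration only (this is not what the PR did; upstream libcircllhist was changed so the header is no longer needed), the sketch below shows one way a C++ translation unit could detect `<alloca.h>` itself with `__has_include` and fall back to heap storage. The macro name `SKETCH_HAVE_ALLOCA_H` and the function `sumWithScratch` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Detect the header at compile time instead of relying on a generated config header.
// __has_include is standard since C++17 and a common compiler extension before that.
#if defined(__has_include)
#if __has_include(<alloca.h>)
#define SKETCH_HAVE_ALLOCA_H 1
#endif
#endif

#ifdef SKETCH_HAVE_ALLOCA_H
#include <alloca.h>
#endif

// Sums 0..n-1 using scratch space from alloca() when the header exists,
// and from a heap-backed vector otherwise.
long sumWithScratch(std::size_t n) {
#ifdef SKETCH_HAVE_ALLOCA_H
  long* scratch = static_cast<long*>(alloca(n * sizeof(long)));
#else
  std::vector<long> heap(n);
  long* scratch = heap.data();
#endif
  long total = 0;
  for (std::size_t i = 0; i < n; ++i) {
    scratch[i] = static_cast<long>(i);
    total += scratch[i];
  }
  return total;
}
```

Either branch produces the same result; only the source of the scratch buffer changes, which is why a blindly copied config header is unsafe on platforms where the header is missing.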
11 changes: 11 additions & 0 deletions bazel/repositories.bzl
@@ -177,6 +177,7 @@ def envoy_dependencies(path = "@envoy_deps//", skip_targets = []):
    _boringssl()
    _com_google_absl()
    _com_github_bombela_backward()
    _com_github_circonus_labs_libcircllhist()
    _com_github_cyan4973_xxhash()
    _com_github_eile_tclap()
    _com_github_fmtlib_fmt()
@@ -215,6 +216,16 @@ def _com_github_bombela_backward():
        actual = "@com_github_bombela_backward//:backward",
    )

def _com_github_circonus_labs_libcircllhist():
    _repository_impl(
        name = "com_github_circonus_labs_libcircllhist",
        build_file = "@envoy//bazel/external:libcircllhist.BUILD",
    )
    native.bind(
        name = "libcircllhist",
        actual = "@com_github_circonus_labs_libcircllhist//:libcircllhist",
    )

def _com_github_cyan4973_xxhash():
    _repository_impl(
        name = "com_github_cyan4973_xxhash",
4 changes: 4 additions & 0 deletions bazel/repository_locations.bzl
@@ -12,6 +12,10 @@ REPOSITORY_LOCATIONS = dict(
        commit = "44ae9609e860e3428cd057f7052e505b4819eb84", # 2018-02-06
        remote = "https://github.com/bombela/backward-cpp",
    ),
    com_github_circonus_labs_libcircllhist = dict(
        commit = "bb73e93ba23b5746ea15d95c21ee8536168c486c", # 2018-03-18
        remote = "https://github.com/circonus-labs/libcircllhist",
    ),
    com_github_cyan4973_xxhash = dict(
        commit = "7caf8bd76440c75dfe1070d3acfbd7891aea8fca", # v0.6.4
        remote = "https://github.com/Cyan4973/xxHash",
39 changes: 37 additions & 2 deletions include/envoy/stats/stats.h
@@ -114,6 +114,11 @@ class Metric {
   * Returns the name of the Metric with the portions designated as tags removed.
   */
  virtual const std::string& tagExtractedName() const PURE;

  /**
   * Indicates whether a metric has been used.
Contributor
has been used by what? What does 'used' mean here? From reading the code, does it mean non-empty? If so, could you call this empty() and invert it, per C++ conventions?

Contributor Author
This method already exists on the other Metric types. "Used" here means that the metric has been updated, i.e. a value has been set because of some event such as a connection closing. This is helpful for flushing only the metrics whose value has been modified. I moved it to the Metric interface because Histogram also uses it; previously Counter and Gauge each had this method. I am not sure it is equivalent to empty.

Contributor

@jmarantz jmarantz Apr 10, 2018
OK you shouldn't change the name as that would make the PR bigger.

But could you change the comment to something like:

  * Indicates whether the metric contains new data since the last flush().

@mattklein123 confirming that's accurate....

Contributor Author
It is not since the last flush; it indicates whether the metric has ever had a value set.

Contributor
No? Then what is it used for? Above you said "flushing only metrics whose value has been modified."

Over time, couldn't you wind up with every metric in this state, if the bit isn't cleared during flush?

Contributor Author
So it works like this. If a metric value has ever been set, it is considered "used" and it will be pushed to the stats sinks on every subsequent flush, irrespective of whether it was updated in that interval. I think this is reasonable because it allows only the metrics that are really used in that Envoy instance to be pushed, rather than all metrics. @mattklein123 may have a better idea of the history. So I am not changing anything here.

   */
  virtual bool used() const PURE;
};
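The semantics discussed in the thread above (a metric becomes "used" the first time a value is set, the flag is never cleared, and flushes push only used metrics) can be sketched as follows. This is a simplified illustration under those stated assumptions; `SketchCounter` and `flushUsed` are hypothetical names and not Envoy's implementation.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal counter; not Envoy's Stats::Counter.
class SketchCounter {
public:
  explicit SketchCounter(std::string name) : name_(std::move(name)) {}

  // Any write marks the metric as used; the flag is never reset afterwards.
  void inc() {
    ++value_;
    used_ = true;
  }

  bool used() const { return used_; }
  uint64_t value() const { return value_; }
  const std::string& name() const { return name_; }

private:
  std::string name_;
  uint64_t value_{0};
  bool used_{false};
};

// Returns the names a flush would push: every metric that has ever been written,
// whether or not it changed since the previous flush.
std::vector<std::string> flushUsed(const std::vector<SketchCounter>& counters) {
  std::vector<std::string> flushed;
  for (const auto& counter : counters) {
    if (counter.used()) {
      flushed.push_back(counter.name());
    }
  }
  return flushed;
}
```

With this behaviour a counter that was incremented once keeps appearing in every later flush, while a counter that was only created is never pushed.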

/**
@@ -128,7 +133,6 @@ class Counter : public virtual Metric {
  virtual void inc() PURE;
  virtual uint64_t latch() PURE;
  virtual void reset() PURE;
  virtual bool used() const PURE;
  virtual uint64_t value() const PURE;
};

@@ -146,12 +150,13 @@ class Gauge : public virtual Metric {
  virtual void inc() PURE;
  virtual void set(uint64_t value) PURE;
  virtual void sub(uint64_t amount) PURE;
  virtual bool used() const PURE;
  virtual uint64_t value() const PURE;
};

typedef std::shared_ptr<Gauge> GaugeSharedPtr;

struct HistogramStatistics;

/**
 * A histogram that records values one at a time.
 * Note: Histograms now incorporate what used to be timers because the only difference between the
@@ -167,6 +172,21 @@ class Histogram : public virtual Metric {
   * Records an unsigned value. If a timer, values are in units of milliseconds.
   */
  virtual void recordValue(uint64_t value) PURE;

  /**
   * Merges the histogram values collected during the flush interval.
   */
  virtual void merge() PURE;

  /**
   * Returns the Histogram Summary Statistics for the flush interval.
Member
super nit: don't capitalize "Histogram Summary Statistics"

   */
  virtual HistogramStatistics intervalStatistics() const PURE;
Member
I'm not actually sure how this compiles. I think it's luck if HistogramStatistics is returned by value. We either need to define HistogramStatistics in this file, or I think probably more optimally, create an abstract interface for HistogramStatistics and return by const ref. I'm not sure which one will be easier. (The question on whether we can return by const ref is not really dependent on whether we use an abstract interface or not.)


  /**
   * Returns the Cumulative Histogram Summary Statistics.
Member
super nit: don't capitalize "Cumulative Histogram Summary Statistics"

   */
  virtual HistogramStatistics cumulativeStatistics() const PURE;
};

typedef std::shared_ptr<Histogram> HistogramSharedPtr;
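On the review question of why returning a forward-declared `HistogramStatistics` by value compiles: C++ allows a function to be declared with an incomplete return type, and the type only has to be complete where the function is defined or called. The sketch below illustrates that rule and the alternative suggested in the review, returning by const reference. All names here (`SketchStats`, `SketchSource`, `SketchImpl`) are hypothetical and unrelated to Envoy's classes.

```cpp
#include <vector>

struct SketchStats; // Forward declaration only: the type is incomplete here.

class SketchSource {
public:
  virtual ~SketchSource() = default;
  // Legal: a declaration may name an incomplete type as its return type.
  virtual SketchStats byValue() const = 0;
  // Returning a reference never requires a complete type at the declaration.
  virtual const SketchStats& byRef() const = 0;
};

// The type must be complete before any definition or call of byValue().
struct SketchStats {
  std::vector<double> quantiles;
};

class SketchImpl : public SketchSource {
public:
  SketchStats byValue() const override { return stats_; } // Copies the struct.
  const SketchStats& byRef() const override { return stats_; } // No copy.

private:
  SketchStats stats_{{0.5, 0.99}};
};
```

Returning by const reference avoids copying the statistics on every call, at the cost of tying the result's lifetime to the histogram object.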
@@ -194,6 +214,11 @@ class Sink {
   */
  virtual void flushGauge(const Gauge& gauge, uint64_t value) PURE;

  /**
   * Flush a histogram.
   */
  virtual void flushHistogram(const Histogram&) PURE;
Member
nit: const Histogram& histogram


  /**
   * This will be called after beginFlush(), some number of flushCounter(), and some number of
   * flushGauge(). Sinks can use this to optimize writing if desired.
@@ -263,6 +288,11 @@ class Store : public Scope {
   * @return a list of all known gauges.
   */
  virtual std::list<GaugeSharedPtr> gauges() const PURE;

  /**
   * @return a list of all known histograms.
   */
  virtual std::list<HistogramSharedPtr> histograms() const PURE;
};

typedef std::unique_ptr<Store> StorePtr;
@@ -294,6 +324,11 @@ class StoreRoot : public Store {
   * down.
   */
  virtual void shutdownThreading() PURE;

  /**
   * Called during the flush process to merge all the thread local histograms.
   */
  virtual void mergeHistograms() PURE;
};

typedef std::unique_ptr<StoreRoot> StoreRootPtr;
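The interface above separates per-thread recording from a merge step that produces interval and cumulative views. The following single-threaded sketch illustrates that flow with a simple log-linear bucketing that keeps two significant decimal digits. It is an assumption-laden illustration: the bucketing is not taken from libcircllhist, the names `bucketFloor`, `record`, `mergeInto` and `flushInterval` are hypothetical, and the locking and thread-local storage that the PR's commits mention are omitted.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Lower bound of a bucket that keeps two significant decimal digits
// (illustrative log-linear scheme; not libcircllhist's code).
uint64_t bucketFloor(uint64_t value) {
  uint64_t scale = 1;
  while (value >= 100) {
    value /= 10;
    scale *= 10;
  }
  return value * scale;
}

using Buckets = std::map<uint64_t, uint64_t>; // bucket floor -> sample count

// Records one sample into a (conceptually thread-local) histogram.
void record(Buckets& histogram, uint64_t value) { ++histogram[bucketFloor(value)]; }

// Adds every bucket count of `from` into `into`.
void mergeInto(Buckets& into, const Buckets& from) {
  for (const auto& entry : from) {
    into[entry.first] += entry.second;
  }
}

// Hypothetical flush step: fold the per-thread histograms into an interval
// histogram, reset the per-thread ones, then fold the interval into the
// cumulative histogram. Returns the interval histogram.
Buckets flushInterval(std::vector<Buckets>& per_thread, Buckets& cumulative) {
  Buckets interval;
  for (auto& histogram : per_thread) {
    mergeInto(interval, histogram);
    histogram.clear();
  }
  mergeInto(cumulative, interval);
  return interval;
}
```

Because merging is just bucket-wise addition, the interval view covers only samples since the previous flush while the cumulative view keeps growing across flushes.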
7 changes: 7 additions & 0 deletions include/envoy/thread_local/thread_local.h
@@ -46,6 +46,13 @@ class Slot {
   */
  virtual void runOnAllThreads(Event::PostCb cb) PURE;

  /**
   * Run a callback on all registered threads with a barrier.
   * @param cb supplies the callback to run on each thread.
   * @param main_callback supplies the callback to run on main thread after threads are done.
Member
Can you add a warning here that there is no guarantee that this callback will fire during shutdown? I think that's OK for now, but I would like to clearly mark that in case we need that later.

Member
s/main_callback/all_threads_complete_cb (or something). Also per above clarify this runs on the main thread. You should also ASSERT in the impl that this function is only called on the main thread.

   */
  virtual void runOnAllThreadsWithBarrier(Event::PostCb cb, Event::PostCb main_callback) PURE;

  /**
   * Set thread local data on all threads previously registered via registerThread().
   * @param initializeCb supplies the functor that will be called *on each thread*. The functor
3 changes: 3 additions & 0 deletions source/common/stats/BUILD
@@ -12,6 +12,9 @@ envoy_cc_library(
    name = "stats_lib",
    srcs = ["stats_impl.cc"],
    hdrs = ["stats_impl.h"],
    external_deps = [
        "libcircllhist",
    ],
    deps = [
        "//include/envoy/common:time_interface",
        "//include/envoy/server:options_interface",
35 changes: 35 additions & 0 deletions source/common/stats/grpc_metrics_service_impl.cc
@@ -63,6 +63,41 @@ void GrpcMetricsStreamerImpl::ThreadLocalStreamer::send(

MetricsServiceSink::MetricsServiceSink(const GrpcMetricsStreamerSharedPtr& grpc_metrics_streamer)
    : grpc_metrics_streamer_(grpc_metrics_streamer) {}

void MetricsServiceSink::flushCounter(const Counter& counter, uint64_t) {
  io::prometheus::client::MetricFamily* metrics_family = message_.add_envoy_metrics();
  metrics_family->set_type(io::prometheus::client::MetricType::COUNTER);
  metrics_family->set_name(counter.name());
  auto* metric = metrics_family->add_metric();
  metric->set_timestamp_ms(std::chrono::system_clock::now().time_since_epoch().count());
  auto* counter_metric = metric->mutable_counter();
  counter_metric->set_value(counter.value());
}

void MetricsServiceSink::flushGauge(const Gauge& gauge, uint64_t value) {
  io::prometheus::client::MetricFamily* metrics_family = message_.add_envoy_metrics();
  metrics_family->set_type(io::prometheus::client::MetricType::GAUGE);
  metrics_family->set_name(gauge.name());
  auto* metric = metrics_family->add_metric();
  metric->set_timestamp_ms(std::chrono::system_clock::now().time_since_epoch().count());
  auto* gauage_metric = metric->mutable_gauge();
  gauage_metric->set_value(value);
}
void MetricsServiceSink::flushHistogram(const Histogram& histogram) {
  io::prometheus::client::MetricFamily* metrics_family = message_.add_envoy_metrics();
  metrics_family->set_type(io::prometheus::client::MetricType::SUMMARY);
  metrics_family->set_name(histogram.name());
  auto* metric = metrics_family->add_metric();
  metric->set_timestamp_ms(std::chrono::system_clock::now().time_since_epoch().count());
  auto* summary_metric = metric->mutable_summary();
  const HistogramStatistics hist_stats = histogram.intervalStatistics();
  for (size_t i = 0; i < ARRAY_SIZE(hist_stats.quantiles_in_); i++) {
    auto* quantile = summary_metric->add_quantile();
    quantile->set_quantile(hist_stats.quantiles_in_[i]);
    quantile->set_value(hist_stats.quantiles_out_[i]);
  }
}

} // namespace Metrics
} // namespace Stats
} // namespace Envoy
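One detail in the flush functions above is worth illustrating: `time_since_epoch().count()` returns a tick count in the clock's native period, which the C++ standard leaves implementation-defined, so it equals milliseconds only if the clock happens to tick in milliseconds. A minimal sketch of an explicit conversion follows; `epochMillis` is a hypothetical helper and not part of this PR.

```cpp
#include <chrono>
#include <cstdint>

// Milliseconds since the clock's epoch, independent of the clock's native tick period.
int64_t epochMillis(
    std::chrono::system_clock::time_point tp = std::chrono::system_clock::now()) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(tp.time_since_epoch()).count();
}
```

A call such as `metric->set_timestamp_ms(epochMillis())` would then pass a value whose unit matches the field name on any standard library.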
28 changes: 5 additions & 23 deletions source/common/stats/grpc_metrics_service_impl.h
@@ -14,7 +14,7 @@
#include "envoy/thread_local/thread_local.h"
#include "envoy/upstream/cluster_manager.h"

#include "common/buffer/buffer_impl.h"
#include "common/stats/stats_impl.h"

namespace Envoy {
namespace Stats {
@@ -115,25 +115,9 @@ class MetricsServiceSink : public Sink {

  void beginFlush() override { message_.clear_envoy_metrics(); }

  void flushCounter(const Counter& counter, uint64_t) override {
    io::prometheus::client::MetricFamily* metrics_family = message_.add_envoy_metrics();
    metrics_family->set_type(io::prometheus::client::MetricType::COUNTER);
    metrics_family->set_name(counter.name());
    auto* metric = metrics_family->add_metric();
    metric->set_timestamp_ms(std::chrono::system_clock::now().time_since_epoch().count());
    auto* counter_metric = metric->mutable_counter();
    counter_metric->set_value(counter.value());
  }

  void flushGauge(const Gauge& gauge, uint64_t value) override {
    io::prometheus::client::MetricFamily* metrics_family = message_.add_envoy_metrics();
    metrics_family->set_type(io::prometheus::client::MetricType::GAUGE);
    metrics_family->set_name(gauge.name());
    auto* metric = metrics_family->add_metric();
    metric->set_timestamp_ms(std::chrono::system_clock::now().time_since_epoch().count());
    auto* gauage_metric = metric->mutable_gauge();
    gauage_metric->set_value(value);
  }
  void flushCounter(const Counter& counter, uint64_t) override;
  void flushGauge(const Gauge& gauge, uint64_t value) override;
  void flushHistogram(const Histogram& histogram) override;

  void endFlush() override {
    grpc_metrics_streamer_->send(message_);
@@ -143,9 +127,7 @@ class MetricsServiceSink : public Sink {
    }
  }

  void onHistogramComplete(const Histogram&, uint64_t) override {
    // TODO : Need to figure out how to map existing histogram to Proto Model
  }
  void onHistogramComplete(const Histogram&, uint64_t) override {}

private:
  GrpcMetricsStreamerSharedPtr grpc_metrics_streamer_;