-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: add built-in log linear histogram support #3059
stats: add built-in log linear histogram support #3059
Conversation
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
include/envoy/stats/stats.h
Outdated
virtual void beginMerge() PURE; | ||
}; | ||
|
||
class ThreadLocalHistogramImpl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to forward declare this because with out this, I have to explicitly add merge(histogram_t* target)
to ThreadLocalHistogram interface which means we have to expose the libcircl
to stats.h interface definition. Otherwise I have to use dynamic_cast
in the ParentHistogramImpl
merge method. LMK if there are any other better solutions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be exposing libcircl to stats.h interface. Nor do I think we should be exposing ThreadLocalHistogramImpl. Can't you put the methods you need into the interface and use abstract classes except in the impl itself?
Scanning through stats.h, there are actually a couple of references to 'ThreadLocal' in StoreRoot, but otherwise there shouldn't really be references to a particular implementation from an interface. There's some kind of abstraction leak here, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. Thinking further, I think agree with you that ThreadLocalHistogram
should not be in stats.h
as it is not used by others in the system. I moved it to thread_local_store.h. Also I removed the addTlsHistogram
method in ParentHistogram
as that is an implementation detail. Now I think it looks clean. PTAL.
@jmarantz @mattklein123 thanks for your quick reviews. Today I have addressed the review feedback.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flushing comments...
include/envoy/stats/stats.h
Outdated
virtual void beginMerge() PURE; | ||
}; | ||
|
||
class ThreadLocalHistogramImpl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should be exposing libcircl to stats.h interface. Nor do I think we should be exposing ThreadLocalHistogramImpl. Can't you put the methods you need into the interface and use abstract classes except in the impl itself?
Scanning through stats.h, there are actually a couple of references to 'ThreadLocal' in StoreRoot, but otherwise there shouldn't really be references to a particular implementation from an interface. There's some kind of abstraction leak here, IMO.
include/envoy/stats/stats.h
Outdated
/** | ||
* This is used to keep track of all worker thread local histograms. | ||
*/ | ||
virtual void addTlsHistogram(TlsHistogramSharedPtr hist_ptr) PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about making this a HistogramSharedPtr -- which references only the interface.
source/common/stats/stats_impl.cc
Outdated
@@ -273,5 +273,34 @@ void RawStatData::initialize(absl::string_view key) { | |||
name_[xfer_size] = '\0'; | |||
} | |||
|
|||
const std::vector<double> HistogramStatisticsImpl::supported_quantiles_ = { | |||
0, 0.25, 0.5, 0.75, 0.90, 0.95, 0.99, 0.999, 1}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comments in this PR about how to initialize std::vector, and https://github.com/envoyproxy/envoy/blob/master/STYLE.md
The Google C++ style guide points out that non-PoD static and global variables are forbidden. This includes types such as std::string. We encourage the use of the advice in the C++ FAQ on the static initialization fiasco for how to best handle this.
source/common/stats/stats_impl.cc
Outdated
void HistogramStatisticsImpl::refresh(const histogram_t* new_histogram_ptr) { | ||
for (size_t i = 0; i < computed_quantiles_.size(); ++i) { | ||
computed_quantiles_[i] = 0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine; you could also use std::fill(computed_quantiles_.begin(), computed_quantiles_.end(), 0.0);
which is fewer lines. Up to you.
FYI in practice what you had earlier with clear() had two potential problems with it:
- it would be legal for std::vector to release the backing store for .data() after the clear(), in which case you might possibly SEGV. But it's also possible std::vector would retain the storage until the next operation. This should also be caught by running CI with a debug STL. See envoy has 33 failing tests when run in an STL debug mode #2556 for why we are not doing this yet.
- I don't think clear() would have actually had this behavior of resetting the values to zero. So it's surprising that it would work in a test. But maybe they happened to be zero anyway. Which test was catching this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it to use the way you suggested. All the merge tests execute this logic. I think even though clear does not set it to 0, the hist_accumulate
function might be updating it with the latest computed value. But I agree it is safe to initialize to zero and also added the ASSERT
as you suggested in other comment.
source/common/stats/stats_impl.cc
Outdated
for (size_t i = 0; i < computed_quantiles_.size(); ++i) { | ||
computed_quantiles_[i] = 0; | ||
} | ||
hist_approx_quantile(new_histogram_ptr, supported_quantiles_.data(), supported_quantiles_.size(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I'd put:
ASSERT(supported_quantiles_.size() == computed_quantiles_.size());
as that is what's being assumed in this line, without it being locally obvious that it must be the case.
source/common/stats/stats_impl.h
Outdated
HistogramStatisticsImpl(const histogram_t* histogram_ptr); | ||
|
||
std::string summary() const override; | ||
const std::vector<double>& supportedQuantiles() const override { return supported_quantiles_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably this method is where you'll want to lazy-initialize supported_quantiles_, and you should probably un-inline it.
if (params.size() == 0) { | ||
// No Arguments so use the standard. | ||
for (auto stat : all_stats) { | ||
response.add(fmt::format("{}: {}\n", stat.first, stat.second)); | ||
} | ||
for (auto histogram : all_histograms) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misunderstood your comment. What if interval_value is NaN and cumulative_value is not, or vice versa? Does the algorithm guarantee that can't happen?
|
||
EXPECT_EQ(2, validateMerge()); | ||
|
||
store_->shutdownThreading(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just thinking about calling initializeThreading() in SetUp(), and putting InSequence in the class object (or maybe a histogram-related subclass. I wasn't thinking about initiating the histograms themselves as they are not all common.
But for TearDown, can't you do all these 3 lines? They seem pretty common. I'm not sure about the EXPECT_CALL below...could that be done in TearDown, or in the test body, which would re-order it?
|
||
// Validate the summary format is as expected. | ||
std::string expected_summary = "P0: 0, P25: 14, P50: 44, P75: 420, P90: 3220, P95: 3260, P99: " | ||
"3292, P99.9: 3299.2, P100: 3300"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the math here. None of the inputs are over 3200; why is P100=3300?
Can you add a simpler example and/or explain in a code comment why the value is that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@@ -80,6 +80,19 @@ class StatsThreadLocalStoreTest : public testing::Test, public RawStatDataAlloca | |||
store_->addSink(sink_); | |||
} | |||
|
|||
void setUp() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void SetUp() override { ....
then you can remove the explicit calls from the test methods. It has to be applicable to all the test-methods in the class, so you might want to make a subclass just for the histograms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a added a new sub class and made this override and removed explicit calls. similar change for TearDown.
store_->initializeThreading(main_thread_dispatcher_, tls_); | ||
} | ||
|
||
void tearDown() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void TearDown() override { ....
*all_threads_complete = true; | ||
}); | ||
|
||
EXPECT_TRUE(*all_threads_complete); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks racy. runOnAllThreads returns instantly, calling the 'done' callback when it is done. The pattern I expect here is for you to establish a condition variable in your test, which you wait on here, and signal that condvar on the done callback.
You can also spin-loop on all_threads_complete since it is an atomic, but IMO it's better to use a condvar and make all_threads_complete a simple bool (not atomic, not shared).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the test to use condvar.
// Ensure that the thread local call back and all_thread_complete call back are called. | ||
std::shared_ptr<std::atomic<bool>> all_threads_complete = | ||
std::make_shared<std::atomic<bool>>(false); | ||
std::shared_ptr<std::atomic<uint64_t>> thread_local_calls = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be shared; it's owned by the test method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think earlier we ran in to issue if it is not shared with TSAN build, because multiple thread update it may be. That is why we changed it to shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's because before you put the condvar in, the test method could exit while threads were active. The condvar solves that properly, so you shouldn't need to share it anymore.
Signed-off-by: Rama <rama.rao@salesforce.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ramaraochavali thank you for your patience here. I think we are very close. This is looking awesome. I dropped in a few more comments. I still haven't looked through tests. @jmarantz LMK when the tests pass muster for you and I will skim though them.
@mrice32 can you also look at this point?
source/common/stats/stats_impl.h
Outdated
void refresh(const histogram_t* new_histogram_ptr); | ||
|
||
private: | ||
static const std::vector<double> supported_quantiles_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used anymore? I don't think so?
source/common/stats/stats_impl.h
Outdated
@@ -446,6 +482,7 @@ class IsolatedStoreImpl : public Store { | |||
// Stats::Store | |||
std::list<CounterSharedPtr> counters() const override { return counters_.toList(); } | |||
std::list<GaugeSharedPtr> gauges() const override { return gauges_.toList(); } | |||
std::list<ParentHistogramSharedPtr> histograms() const override { return parent_histograms_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You don't need the member variable, just returning {}
should work.
std::unordered_set<std::string> names; | ||
std::unique_lock<std::mutex> lock(lock_); | ||
// TODO(ramaraochavali): As histograms don't share storage, there is a chance of duplicate names | ||
// here. We need to process global storage for histograms similar to how we have a central storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"We need process global storage..."
if (names.insert(hist_name).second) { | ||
ret.push_back(parent_hist); | ||
} else { | ||
ENVOY_LOG(debug, "duplicate histogram {}.{}", scope->prefix_, hist_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this will potentially have an effect on output, should this one actually be a warn? I could go either way on this. It's conceivable this would happen in the data path, though really it should only happen at config load time, so I'm not that worried about log spam.
You might also make the log message something like "duplicate histogram {}.{}: data loss will occur on output".
Can you also open a tracking issue for this? I have ideas on how to fix, but given that this won't effect too many people I don't think it's super urgent (ironically it will for sure effect us at Lyft since we share stats across listeners in certain cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to warn and added this issue #3098
for (const TlsHistogramSharedPtr& tls_histogram : tls_histograms_) { | ||
tls_histogram->merge(interval_histogram_); | ||
} | ||
histogram_t* hist_array[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my comment was lost, but I'm sure you can do lock.unlock()
before this line. The lock only needs to protect tls_histograms_
. I would add a comment along these lines.
* current_active index via which it writes to the correct histogram. | ||
* - When all workers have done, the main thread continues with the flush process where the | ||
* "actual" merging happens. | ||
* - As the active histograms are swapped in TLS histograms, On the main thread, we can be sure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/On/on
@@ -62,8 +166,11 @@ class ThreadLocalStoreImpl : public StoreRoot { | |||
}; | |||
|
|||
// Stats::Store | |||
// TODO(ramaraochavali): Consider changing the implementation of these methods to use vectors and | |||
// use std::sort, rather than inserting into a map and pulling it out for better performance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: period end of sentence.
source/server/http/admin.cc
Outdated
for (const Stats::CounterSharedPtr& counter : server_.stats().counters()) { | ||
all_stats.emplace(counter->name(), counter->value()); | ||
} | ||
|
||
for (const Stats::GaugeSharedPtr& gauge : server_.stats().gauges()) { | ||
all_stats.emplace(gauge->name(), gauge->value()); | ||
} | ||
const std::list<Stats::ParentHistogramSharedPtr>& histograms = server_.stats().histograms(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically you are storing a reference to a temporary here. Just copy. The compiler will use RVO.
if (params.size() == 0) { | ||
// No Arguments so use the standard. | ||
for (auto stat : all_stats) { | ||
response.add(fmt::format("{}: {}\n", stat.first, stat.second)); | ||
} | ||
for (auto histogram : all_histograms) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ramaraochavali my concern about the text output version is that it is not good for 1-line grep, which I think is a common use case (at least it is for me). It might be ugly, but can we figure out a format for the text version which has everything on a single line so we can grep for the stat?
source/server/http/admin.cc
Outdated
|
||
rapidjson::Value quantile_array(rapidjson::kArrayType); | ||
|
||
for (size_t i = 0; i < histogram->intervalStatistics().supportedQuantiles().size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah +1 I think we can optimize. Right now since we don't have locked protos (or any protos) for the JSON output, I think we can change this stuff at any time. I would put in a TODO to think about optimizing when we switch over to protos for all admin JSON output.
EXPECT_EQ(*thread_local_calls, 1); | ||
*all_threads_complete = true; | ||
all_threads_complete_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to hold the mutex while modifying all_threads_complete_ and notifying the condvar. I'd do that using a std::unique-lock scoped around the those two lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. please check once
}); | ||
|
||
EXPECT_TRUE(*all_threads_complete); | ||
std::unique_lock<std::mutex> lock(cv_mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just hold this over the cv_.wait call and not during the shutdown calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. Must be sleeping when i coded this
@@ -47,6 +47,9 @@ class ThreadLocalInstanceImplTest : public testing::Test { | |||
InstanceImpl tls_; | |||
Event::MockDispatcher main_dispatcher_; | |||
Event::MockDispatcher thread_dispatcher_; | |||
std::condition_variable cv_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you spell out "condvar_" in the variable rather this abbreviation?
https://google.github.io/styleguide/cppguide.html#General_Naming_Rules
More importantly, this is only used in one test, so can you move these variables into the test? You could put them in a local struct there if you like, so that it's easier to capture them all at once for the nested function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved them to a local struct and renamed the variables. Can you PTAL at this specific test case and see if that is what you are looking for?
// Ensure that the thread local call back and all_thread_complete call back are called. | ||
std::shared_ptr<std::atomic<bool>> all_threads_complete = | ||
std::make_shared<std::atomic<bool>>(false); | ||
std::shared_ptr<std::atomic<uint64_t>> thread_local_calls = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's because before you put the condvar in, the test method could exit while threads were active. The condvar solves that properly, so you shouldn't need to share it anymore.
|
||
EXPECT_EQ(1, validateMerge()); | ||
|
||
std::string expected_summary = "P0: 1, P25: 1.025, P50: 1.05, P75: 1.075, P90: 1.09, P95: 1.095, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a weird artifact of the algorithm extrapolating off a degenerate case. Can I ask you to add another case where you add an even distribution of all integer values from 1 to 100 inclusive? I'm curious what you'll get. Maybe
PO: not sure....
P25: 25
P50: 50
P75: 75
P90: 90
P95: 95
P99: 99
P100: 100
or something close to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added that test case and this is what we get
P0: 0, P25: 25, P50: 50, P75: 75, P90: 90, P95: 95, P99: 99, P99.9: 99.9, P100: 100
store_->addSink(sink_); | ||
} | ||
|
||
void SetUp() override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect you want the InSequence object to be declared as a the test class member var, rather than a local member, I confess I don't know what that does. I'm suggesting that because maybe it enforces some deterministic threading model, but I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also not sure on this. So just leaving it as is.
class HistogramTest : public testing::Test, public RawStatDataAllocator { | ||
public: | ||
HistogramTest() { | ||
ON_CALL(*this, alloc(_)).WillByDefault(Invoke([this](const std::string& name) -> RawStatData* { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the ctor and move these lines to SetUp().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
source/common/stats/stats_impl.cc
Outdated
} | ||
|
||
const std::vector<double>& HistogramStatisticsImpl::supportedQuantiles() const { | ||
static const std::vector<double> supported_quantiles_ = {0, 0.25, 0.5, 0.75, 0.90, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local var should not have _ suffix.
} | ||
histogram_t* hist_array[1]; | ||
hist_array[0] = interval_histogram_; | ||
hist_accumulate(cumulative_histogram_, hist_array, ARRAY_SIZE(hist_array)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just write
hist_accumulate(cumulative_histogram_, &interval_histogram_, 1);
and skip setting up the temp array
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@mattklein123 addressed all the comments. PTAL. |
Signed-off-by: Rama <rama.rao@salesforce.com>
bool all_threads_complete_; | ||
}; | ||
|
||
thread_local_verification_info thread_local_data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are going to define a type it should be called ThreadLocalVerificationInfo, but you don't need to define a type. Just do:
struct {
std::atomic<uint64_t> thread_local_calls_{0};
std::condition_variable condvar_;
std::mutex condvar_mutex_;
bool all_threads_complete_;
} thread_local_data;
thread_local_data.all_threads_complete_ = true; | ||
thread_local_data.condvar_.notify_one(); | ||
}); | ||
std::unique_lock<std::mutex> lock(thread_local_data.condvar_mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the mutex lock and the wait into a scope, so the lock is released before you call shutdownGlobalThreading. This sounds familiar, right? I am never sure if all my github comments are getting through.
In general hold locks for the minimum amount of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i did not get this specific comment. Will change now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was "just hold this over the cv_.wait call and not during the shutdown calls." and your response was "done. Must be sleeping when i coded this".
If you could iterate through the PR and look at all the comments exposed by "show outdated" links it would reduce the number of comments that wind up getting repeated, and speed up iteration.
std::unique_lock<std::mutex> lock(thread_local_data.condvar_mutex_); | ||
thread_local_data.condvar_.wait( | ||
lock, [&thread_local_data] { return thread_local_data.all_threads_complete_; }); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You had EXPECT_TRUE(thread_local_data.all_threads_complete) here, right? Where did that go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. Some how got deleted looks like. Will add it back
std::string h1_expected_summary = | ||
"P0: 1, P25: 1.025, P50: 1.05, P75: 1.075, P90: 1.09, P95: 1.095, " | ||
"P99: 1.099, P99.9: 1.0999, P100: 1.1"; | ||
std::string h2_expected_summary = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make these locals const. or just move the literals into their single use.
BTW I'm not really sure why you need the degenerate case; seems like you are testing behavior which is really at the boundary of sanity, and probably not that important. But it's nice to see that the even-distribution case is working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i want to verify that case as well. I will leave it like that. I will make those variable local const.
expectCallAndAccumulate(h2, 1); | ||
expectCallAndAccumulate(h2, 2); | ||
|
||
EXPECT_EQ(2, validateMerge()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are not really validating that the merge did something sane because you are not showing the summary. WDYT about adding some data to both histograms so that, when merged, you get a result you expect?
Maybe have one histogram with [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]; and see what the summary for that is. Make another histogram with [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]. and see what the summary for that is. EXPECT_EQ on both of those. Then merge them, and EXPECT_EQ on the summary for that. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the merge logic on our side is validated by the MultiMerge test. Are you trying to validate the libcircl hist_accumulate function? I tested that earlier IIRC and they were not same probably because of the insertion order. They will be close but little different. I will create a test case and send for @postwait's review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no test called 'MultiMerge' but there are a number of tests that do merges, which check that the number of merged histograms is as expected. However, there's none I can see that looked at the values in the merged histogram. I scanned your message to @postwait and it might be reasonable for there to be some floating point sloppiness, in which case you could do inexact data matching in the test.
But it seems wrong to have no functional testing at all of the merged data.
I'm also fine with changing the data from what I suggested above to something that's easier to reason about in the context of the math behind circlehist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmarantz Now I added the test BasicHistogramMergeSummary that validates two set of values 1-50 and 51-100 and validates the merged summary is equal to [1-100].
* that can be asserted later. | ||
*/ | ||
uint64_t validateMerge() { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove blank line here.
h2_interval_values; | ||
|
||
/** | ||
* Validates taht Histogram merge happens as desired and returns the processed histogram count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/taht/that/
source/server/http/admin.cc
Outdated
|
||
rapidjson::Value quantile_array(rapidjson::kArrayType); | ||
|
||
for (size_t i = 0; i < histogram->intervalStatistics().supportedQuantiles().size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right a TODO should be here -- actually a warning not to use the data until another round of optimization.
if (params.size() == 0) { | ||
// No Arguments so use the standard. | ||
for (auto stat : all_stats) { | ||
response.add(fmt::format("{}: {}\n", stat.first, stat.second)); | ||
} | ||
for (auto histogram : all_histograms) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this format LGTM; easy to parse with a shell script if needed.
EXPECT_CALL(*this, alloc("stats.overflow")); | ||
store_.reset(new ThreadLocalStoreImpl(*this)); | ||
store_->addSink(sink_); | ||
InSequence s; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit I've never been a huge gmock user. But I looked up InSequence and I suspect this usage model is not quite right. Check out https://github.com/google/googletest/blob/master/googlemock/docs/ForDummies.md#ordered-vs-unordered-calls
I think this only affects what's in the scope of that declaration, which in this case is only one call. I would think it should either be declared at the top of SetUp or as a class member var.
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
@jmarantz reg the json compact issue, I moved the TODO to the right place and created a envoy issue. Not sure if we want to warn not to use that till next optimization. @mattklein123 WDYT? |
I did not realize that you wanted me to "scope" it.
…On Tue, Apr 17, 2018 at 9:05 PM, Joshua Marantz ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In test/common/thread_local/thread_local_impl_test.cc
<#3059 (comment)>:
> + std::condition_variable condvar_;
+ std::mutex condvar_mutex_;
+ bool all_threads_complete_;
+ };
+
+ thread_local_verification_info thread_local_data;
+
+ tlsptr->runOnAllThreads(
+ [&thread_local_data]() -> void { ++thread_local_data.thread_local_calls_; },
+ [&thread_local_data]() -> void {
+ EXPECT_EQ(thread_local_data.thread_local_calls_, 1);
+ std::unique_lock<std::mutex> lock(thread_local_data.condvar_mutex_);
+ thread_local_data.all_threads_complete_ = true;
+ thread_local_data.condvar_.notify_one();
+ });
+ std::unique_lock<std::mutex> lock(thread_local_data.condvar_mutex_);
The comment was "just hold this over the cv_.wait call and not during the
shutdown calls." and your response was "done. Must be sleeping when i coded
this".
If you could iterate through the PR and look at all the comments exposed
by "show outdated" links it would reduce the number of comments that wind
up getting repeated, ands speed up iteration.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3059 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AQaGqHOljcLO_1-I4hsihFmAFC8kZf7Uks5tpgu_gaJpZM4TRBTs>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking through this. Mostly nits on my part now, except for the lack of a functional test for the merged histogram data.
I still want to look at the threading model -- I haven't dived into that yet, and I want to mainly to learn myself more about Envoy threading. I expect @mattklein123 will scrutinize that.
I raised an issue with github on why this PR unicorns...it's not that big. But I would suggest, if we need to clone it again, to squash the commits into one. We'd be losing the comments anyway, I think. But maybe the high commit-count is what blows up GH.
HistogramStatisticsImpl h2_interval_statistics(hist2_interval); | ||
|
||
for (const Stats::ParentHistogramSharedPtr& histogram : histogram_list) { | ||
if (histogram->name() == "h1") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK this pattern of looping over the histograms is a bit awkward and I think it might be worth having a test-class helper-method:
typedef std::map<std::string, Stats::ParentHistogramSharedPtr> NameHistogramMap
NameHistogramMap makeHistogramMap(list) { ... };
with usage in tests
const Stats::ParentHistogramSharedPtr& h1 = histogram_map["h1"];
EXPECT_EQ(h1->cumulativeStatistics().summary(), h1_cumulative_statistics.summary());
...
rather than looping and if/elseing the cases. If it were just one of these then it might not be worth it, but there are several, here & below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
EXPECT_CALL(sink_, onHistogramComplete(Ref(histogram), record_value)); | ||
histogram.recordValue(record_value); | ||
|
||
if (histogram.name().find("h1") != std::string::npos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
@@ -82,6 +82,43 @@ TEST_F(ThreadLocalInstanceImplTest, All) { | |||
tls_.shutdownThread(); | |||
} | |||
|
|||
// Validate ThreadLocal::runOnAllThreads behavior with all_thread_complete call back. | |||
TEST_F(ThreadLocalInstanceImplTest, RunOnAllThreads) { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rm blank line
TEST_P(MetricsServiceIntegrationTest, BasicFlow) { | ||
initialize(); | ||
// Send an empty request so that histogram values merged for cluster_0. | ||
codec_client_ = makeHttpConnection(makeClientConnection((lookupPort("http")))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra parens around lookupPort
expectCallAndAccumulate(h2, 1); | ||
expectCallAndAccumulate(h2, 2); | ||
|
||
EXPECT_EQ(2, validateMerge()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no test called 'MultiMerge' but there are a number of tests that do merges, which check that the number of merged histograms is as expected. However, there's none I can see that looked at the values in the merged histogram. I scanned your message to @postwait and it might be reasonable for there to be some floating point sloppiness, in which case you could do inexact data matching in the test.
But it seems wrong to have no functional testing at all of the merged data.
I'm also fine with changing the data from what I suggested above to something that's easier to reason about in the context of the math behind circlehist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay. Just starting to look through this - here are a few surface-level comments. Really cool stuff! Awesome work!
source/common/stats/stats_impl.cc
Outdated
} | ||
|
||
const std::vector<double>& HistogramStatisticsImpl::supportedQuantiles() const { | ||
static const std::vector<double> supported_quantiles_ = {0, 0.25, 0.5, 0.75, 0.90, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: supported_quantiles_
-> supported_quantiles
// here. We need to process global storage for histograms similar to how we have a central storage | ||
// in shared memory for counters/gauges. | ||
for (ScopeImpl* scope : scopes_) { | ||
for (auto name_histogram_pair : scope->central_cache_.histograms_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not auto &
?
tls_->runOnAllThreads( | ||
[this]() -> void { | ||
for (ScopeImpl* scope : scopes_) { | ||
for (auto name_histogram_pair : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not auto &
?
void recordValue(uint64_t value) override; | ||
|
||
bool used() const override { return flags_ & Flags::Used; } | ||
void beginMerge() override { current_active_ = 1 - current_active_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
} | ||
|
||
void ThreadLocalHistogramImpl::merge(histogram_t* target) { | ||
hist_accumulate(target, &histograms_[1 - current_active_], 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make an inline class method "otherHistogramIndex()" for 1 - currentActive_, and also save it in a temp and use it in both places in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
bool used() const override { return flags_ & Flags::Used; } | ||
void beginMerge() override { | ||
// this switches the current_active_ between 1 and 0. | ||
current_active_ = 1 - current_active_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
factor out '1 - current_active_' here too.
Signed-off-by: Rama <rama.rao@salesforce.com>
Closing this and will create another PR as this is taking time to load getting unicorn for most of the folks. |
Description:
Implements built-in log linear histogram. Fixes the issue #1965
Risk Level: High
Testing:
Unit, Integration and Manual testing.
Docs Changes:
NA
Release Notes:
Will create PR for version update once this is approved.