New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert Mixer self-monitoring to use OpenCensus libraries #7989
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7989 +/- ##
======================================
+ Coverage 71% 71% +1%
======================================
Files 374 370 -4
Lines 32673 32620 -53
======================================
+ Hits 22906 22937 +31
+ Misses 8760 8692 -68
+ Partials 1007 991 -16
Continue to review full report at Codecov.
|
cc: @Ramonza |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good.
Is there a solution to deal with the metrics used by our dependencies? This is an identical problem we had when replacing glog, which led to the istio/glog repo. Do we need something similar here?
mixer/pkg/checkcache/cache.go
Outdated
return nil | ||
} | ||
|
||
// Get looks up an attribute bag in the cache. | ||
func (cc *Cache) Get(attrs attribute.Bag) (Value, bool) { | ||
defer func() { cc.recordStats() }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere in this file, I'm not thrilled with "defer" here since it induces an allocation (last I looked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
mixer/pkg/server/server.go
Outdated
|
||
exporter, err := prometheus.NewExporter(prometheus.Options{Registry: oprometheus.DefaultRegisterer.(*oprometheus.Registry)}) | ||
if err != nil { | ||
return nil, fmt.Errorf("could not build prometheus exporter: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add a unit test to cover this new code.
mixer/pkg/server/server.go
Outdated
|
||
// Register the views to collect server request count. | ||
if err := view.Register(ocgrpc.DefaultServerViews...); err != nil { | ||
return nil, fmt.Errorf("could not register default server views: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need unit test for this.
mixer/pkg/checkcache/cache.go
Outdated
missesTotal = stats.Int64("mixer/checkcache/cache_misses_total", "The number of times a cache lookup operation failed to find an entry in the cache.", stats.UnitDimensionless) | ||
evictionsTotal = stats.Int64("mixer/checkcache/cache_evictions_total", "The number of entries that have been evicted from the cache.", stats.UnitDimensionless) | ||
|
||
writesView = newView(writesTotal, []tag.Key{}, view.Sum()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want the LastValue
aggregation type here. Otherwise, with Sum you will keep adding to a cumulative total and it looks like the actual values you're recording already represent a cumulative total so you really just want to record the value and not total up all the values you ever passed to stats.Record I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!
|
||
func newView(measure stats.Measure, keys []tag.Key, aggregation *view.Aggregation) *view.View { | ||
return &view.View{ | ||
Name: measure.Name(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is the default if no name is specified
mixer/pkg/checkcache/cache.go
Outdated
@@ -200,3 +183,10 @@ func (cc *Cache) Set(attrs attribute.Bag, value Value) { | |||
|
|||
cc.cache.SetWithExpiration(shape.makeKey(attrs), value, value.Expiration.Sub(now)) | |||
} | |||
|
|||
func (cc *Cache) recordStats() { | |||
stats.Record(context.Background(), writesTotal.M(int64(cc.cache.Stats().Writes))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be coalesced into a single Record call:
cs := cc.cache.Stats()
stats.Record(
context.Background(),
hitsTotal.M(int64(cs.Hits)),
missesTotal.M(int64(cs.Misses)),
...
@@ -105,7 +107,9 @@ func (s *session) dispatch() error { | |||
namespace, err := getIdentityNamespace(s.bag) | |||
if err != nil { | |||
// early return. | |||
updateRequestCounters(0, 0) | |||
stats.Record(s.ctx, monitoring.DestinationsPerRequest.M(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be coalesced to a single Record call
@@ -186,7 +190,9 @@ func (s *session) dispatch() error { | |||
} | |||
} | |||
|
|||
updateRequestCounters(ndestinations, ninputs) | |||
stats.Record(s.ctx, monitoring.DestinationsPerRequest.M(int64(ndestinations))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
ffbed4c
to
950fb11
Compare
Reviewers: PTAL. I believe I have addressed the existing comments (adding unit tests, etc.). I have also updated the |
@geeknoid was there something you had in mind re: metrics used by our deps? I was not aware of any dependencies that were using prometheus to monitor anything prior to this PR. I don't know that we need to solve that particular issue right now, but it does raise an interesting question of package instrumentation and playing well with others. @Ramonza does OC have any thoughts on such issues? |
/test istio-unit-tests and, of course, linting produces different results on every run... sigh. |
Sorry for my unclear comment. I was referring to this from your PR comment:
Can these be addressed too at some point in a systematic way? On a related note, the ControlZ library is currently mining Prometheus metrics for display in the UI. What happens in ControlZ after this PR goes in? |
@geeknoid ah, gotcha. Yes, probably. But rather than duplicate that functionality in this PR (the key is really finding some hook to generate them on the fly the same way prometheus does, I believe), I've skipped over providing those in an OpenCensus way. I think, ultimately, it probably makes more sense for OpenCensus to provide those measures than for Istio to build them. For now, skipping them seems fine. @Ramonza thoughts here? RE ControlZ: It still works. We use the default registry (gatherer) for the prometheus exporter here, and that is what is currently powering the |
/approve
/lgtm
…On Wed, Aug 22, 2018 at 8:42 AM Douglas Reid ***@***.***> wrote:
@geeknoid <https://github.com/geeknoid> ah, gotcha. Yes, probably. But
rather than duplicate that functionality in this PR (the key is really
finding some hook to generate them on the fly the same way prometheus does,
I believe), I've skipped over providing those in an OpenCensus way. I
think, ultimately, it probably makes more sense for OpenCensus to provide
those measures than for Istio to build them. For now, skipping them seems
fine. @Ramonza <https://github.com/ramonza> thoughts here?
RE ControlZ:
It still works. We use the default registry (gatherer) for the prometheus
exporter here, and that is what is currently powering the metrics portion
of ControlZ.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7989 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AVucHYZ8x4mX7qP9bQWH-hf9VylOus0yks5uTXvvgaJpZM4WAS-9>
.
|
c01f2fb
to
c76dd92
Compare
Thanks for all the reviews! I've fixed up all of the linter issues and dep diff stuff now. PTAL for final approval. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: geeknoid The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@douglas-reid: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This PR is part of an exploration of moving Istio self-monitoring from pure Prometheus libraries to OpenCensus (OC) libraries. The ultimate goal being explored is the possibility of creating an Istio OpenCensus exporter mechanism that would allow flexible export of Istio self-monitoring metrics to different backends (cloudwatch, datadog, stackdriver, etc.).
This PR does the following:
checkcache
,runtime
)This PR does not replace the Prometheus collectors that provide additional metrics (not from Mixer). These include:
Reviewers:
I am interested in feedback on a few items. First, does this library change make sense? Second, is the way that OC is being used seem appropriate?
Important items to note when reviewing:
OC models things differently to prometheus and this results in some interesting changes:
view.Count
. This is primarily because that results in exporting Prometheus metrics of typeCounter
. Usingview.Sum
results in untyped prometheus metrics. This ultimately impacts how we use the Stats, asview.Count
records the number of measurements, not the total of those measurements. I think this is the right thing to do for now, but maybe not. The Prometheus docs include the following related documentation:0
s for things that haven't happened, etc.)