Skip to content

Conversation

rlinehan
Copy link
Contributor

This PR improves on #51 and hopefully addresses all feedback. In particular, it improves the metrics API by:

  • Moving the get-client-metrics(-data) functions off of the client
  • Adding a with-url-and-method namespace
  • Adding a ClientTimer class that wraps Timer and holds onto the metric name, and url, method, or metric id used to create the timer
  • Changing the getClientMetrics(Data) methods to return a map of metric category to an array of timers or timer data
  • Adding getClientMetricsBy* methods to filter by a url, url and method, or metric id

This commit adds metrics support to the http client (clojure and java, sync
and async). A metric registry can optionally be passed into the client as a
client option on creation. If a metric registry is present, timers will be
added to time each request.

By default, a timer is added for the URL (stripped of username, password,
query string, and path fragments) and the URL plus the method used for the
request. In addition, a request can include a `metric-id` option, which takes
a tuple of metric ids. If this request option is specified, a timer will be
created for each element of the metric id tuple - thus if the tuple is [:foo
:bar :baz] there will be a foo timer, a foo.bar timer, and a foo.bar.baz
timer.

In addition, each timer has a "MetricType" - currently there is only one
metric type, bytes-read, which is stopped when the full response has been
read. In the future, we may add "response-init" timers that get stopped when
the first byte of the response has been read.

This commit also adds a `get-client-metrics`/`.getClientMetrics` function that
takes a client instance and returns the http client-specific metrics from the
metric registry and a `get-client-metrics-data`/`.getClientMetricsData`
function for clojure and java sync and async clients to get out metrics data
from the client. This function takes a client instance and returns a map of
metric name to a map of metric data (for clojure) or a ClientMetricData object
(for java), both of which include the mean, count, and aggregate for the timer

These `get-client-metrics*`/`.getClientMetrics*` functions also have versions
that take a url, url and method, or metric id to allow for filtering of the
timers/metrics data returned by these functions.

The clojure versions of these functions take a metric filter map. There are
also metric filter builder functions to build up the type of metric filter
desired from a url, a url and method, or a metric id. These will prevent users
from having to know the specifics of how to build a metric themselves; instead
they can use a convenience function.

An empty metric id can be passed in to the filter to return all metric-id
timers.
This commit does several things to improve the metrics API:

* Move get-client-metrics(-data) functions off of client:
Previously, the `get-client-metrics(-data)`/`.getClientMetrics(Data)`
functions were on the client. This didn't entirely make sense because if two
clients were using the same metric registry, these functions would actually
return metrics data for *all* clients, rather than just for the client the
function was called on. This commit removes these functions and changes the
tests to use the metrics namepsace/Metrics class versions instead, which take
a metric registry return all http-client related metrics on that metric
registry.

* Add a `with-url-and-method` namespace:
Move the timers that have both the url and the method from the `with-url`
namespace to a new `with-url-and-method` namepsace. This better matches the
data structure are returned in and the filtering mechanisms for getting them
back out (discussed below), and also removes a slight chance of having a
conflict with a url timer.

* Add a ClientTimer class:
Add a ClientTimer class that wraps the Timer class and also holds onto the
url, method, or metric id used to create the timer. Use this to add url,
method, metricId to ClientMetricData.

* Change the `getClientMetrics(Data)` methods to return a map of metric
category to an array of timers or timer data:
Previously, the methods for getting http client metrics/metric data out of a
metric registry returned a map of metric name to timer instance or metric
data. However, the metric name was most likely not useful to users, who
probably just want to iterate through the timers/data. This commit makes the
output of these functions more useful by returning arrays of timers/data
sorted into a map indexed by metric category (url, url and method, or metric
id).

* Add `getClientMetricsBy*` methods:
Add `getClientMetrics(Data)ByUrl`, `getClientMetrics(Data)ByUrlAndMethod`, and
`getClientMetrics(Data)ByMetricId` methods (and clojure versions) that allow
for filtering by a specific metric category and return an array of timers or
timer data that match the url, url and method, or metric id specified.

* Remove the filter-builder functions:
Previously, the `get-client-metrics(-data)` functions did filtering by taking
a filter map, and there were filter-builder functions to build these filter
maps. Now that there are separate filtering methods, these filter maps are no
longer used and the filter-builder functions are removed.
@rlinehan
Copy link
Contributor Author

I think this addresses everything @cprice404 and I were discussing on #51. I decided to just return arrays of timers/metric data, rather than having it indexed by metric tuple/url. It seems like most of the time users will want to iterate over all of these. Furthermore, I ran into difficulties trying to do this for the case where all the different categories are being returned, because then some of the keys would be arrays and some would be strings, which doesn't work. In addition, it didn't seem like indexing by metric id would be that useful even when only metric id timers were being returned, since much of the time users will probably want to dump this data into json for a status endpoint, and then they will need to serialize the metric ids into a string, since json doesn't allow arrays as keys. Thus, if we were to do this, we would be doing extra work that might be for nothing and cause users to also do more work.

There are a few things that I think still need to be done, the main thing being renaming. I talked with @nfagerlund about how things are named in this PR, and he felt that the one thing that should be changed are the names of the metric type, which are currently "bytes-read" and "response-init." He recommended going with names that show how they are semantically related - for example "full-response" and "initial-response" or "response-complete" and "response-init." I like either of these options, with maybe a slight preference to "full-response" and "initial-response."

I have two other questions that I think affect the API:

  1. should we return nil if a nil metric registry is passed in to the get-client-metrics-* functions (and java versions), or throw an error. I have this currently implemented to return nil, but I think that it actually makes more sense (now that these functions aren't on the client) to just throw an error.
  2. right now all of the get-client-metrics-* functions have two arities to allow for taking an optional metric type. Right now there's only one metric type, but this is done for forwards compatibility (the functions with no metric type return only the bytes read timers/data, and they can continue doing that in the future). However, I think we could eliminate the arities that take the metric type and just have it hardcoded everywhere, and then later on add in another arity or an option map or something when/if we add additional metric types.

In addition, I would like to do some refactoring of some of the tests, and there are a couple places where the implementation could probably do with some refactoring, but I wanted to get a PR up and make sure the API was agreed on, and then these changes could even be made after this PR is been merged.

@rlinehan
Copy link
Contributor Author

ping @camlow325 for review


private static final Logger LOGGER = LoggerFactory.getLogger(Metrics.class);

synchronized private static ClientTimer getOrAddTimer(MetricRegistry metricRegistry,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is very non-performant. This is implemented upstream in MetricRegistry with https://github.com/dropwizard/metrics/blob/3.1-maintenance/metrics-core/src/main/java/com/codahale/metrics/MetricRegistry.java#L311-L326 (which is unfortunately private), and while it's rather gross it's what we've basically been using already, and in discussions offline @cprice404 and I were thinking that consequently maybe trying to register and then catching the IllegalArgumentException is an okay way do to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with what the getOrAdd implementation does more closely, not sure this would need to be marked as synchronized. Seems like their implementation - which uses a ConcurrentMap under the hood for synchronization - would allow for a metric of the same type that has been registered to be returned or a new one added without needing the extra level of locking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that was my thinking - we just go with a version of their getOrAdd implementation and remove the synchronized here. If you're cool with that then I'll change this method to follow the logic in their implementation. The way I currently have it implemented was really just meant to be a first pass to get it working, and I wanted to make sure that others were okay with the general direction this whole thing was making before making the logic more complicated.

@camlow325
Copy link
Contributor

nfagerlund ... felt that the one thing that should be changed are the names of the metric type, which are currently "bytes-read" and "response-init." He recommended going with names that show how they are semantically related - for example "full-response" and "initial-response" or "response-complete" and "response-init." I like either of these options, with maybe a slight preference to "full-response" and "initial-response."

Either of the two alternatives sound better to me as well than what we currently have. If you like "full-response" and "initial-response" best, I'm good with that.

@camlow325
Copy link
Contributor

  1. should we return nil if a nil metric registry is passed in to the get-client-metrics-* functions (and java versions), or throw an error. I have this currently implemented to return nil, but I think that it actually makes more sense (now that these functions aren't on the client) to just throw an error.

If I were implementing this from scratch, I probably would have gone with throwing an exception over returning nil. If we had cases where nil could be returned for cases where no metrics were found vs. just the metrics registry passed in being nil, I'd feel even more strongly that throwing an error to clearly distinguish between the two cases. I think we're always returning non-nil data structures if a valid registry is passed in, though, so probably not relevant here. Suppose I'd defer to your judgement on this one.

@camlow325
Copy link
Contributor

  1. right now all of the get-client-metrics-* functions have two arities to allow for taking an optional metric type. Right now there's only one metric type, but this is done for forwards compatibility (the functions with no metric type return only the bytes read timers/data, and they can continue doing that in the future). However, I think we could eliminate the arities that take the metric type and just have it hardcoded everywhere, and then later on add in another arity or an option map or something when/if we add additional metric types.

Even though we only support 1 type at this point, I like the idea of having it possible up front to filter for only the id. Might be annoying for clients that just are interested in the data for the 1 type to be using a function that suddenly starts returning a bunch of extra data that they don't want. For those clients, using the form of the API scoped to a specific type upfront would seem like a good safeguard.

As to whether to have the type be a named parameter vs. a parameter in an options map, good question. Seems like it's going to be common enough for clients to want to filter for a specific metric that having it be a named parameter initially might be the best way to go.

So I suppose overall, I'd lean toward going with what you've already done on the PR for this.


public boolean matches(String s, Metric metric) {
if ( metric instanceof ClientTimer ){
return isMatch((ClientTimer) metric, url, method, metricId, metricType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need to pass in the url, method, metricId, and metricType since the method being called isn't static and so should have access to these as member variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right! I forgot I'm not in functional programming land.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlinehan
Copy link
Contributor Author

Even though we only support 1 type at this point, I like the idea of having it possible up front to filter for only the id. Might be annoying for clients that just are interested in the data for the 1 type to be using a function that suddenly starts returning a bunch of extra data that they don't want. For those clients, using the form of the API scoped to a specific type upfront would seem like a good safeguard.

As to whether to have the type be a named parameter vs. a parameter in an options map, good question. Seems like it's going to be common enough for clients to want to filter for a specific metric that having it be a named parameter initially might be the best way to go.

I was thinking that even when we add another type, we would continue having the methods that don't include the specific type as a parameter return only the :bytes-read/:full-response, for backward compatibility, and then add not only the new type but also an ALL option for filtering. I worry that since we aren't completely sure how the API will look as we add more things (such as an option for returning everything under a metric-id, rather than just the explicit metric-id), keeping the methods that take the metric type will make expanding the API more difficult.

Metric timer = metricRegistry.getMetrics().get(name);

if ( timer == null ) {
return metricRegistry.register(name, newTimer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do something more like what the MetricRegistry.getOrAdd call does that you mentioned? They have a try/catch block around this part that would allow for the metric to still be returned rather than throwing an IllegalArgumentException if a metric with the same name and class type happens to be registered right around when this is called. Not sure exactly why we'd want to deviate from their implementation for what we're doing?

@camlow325
Copy link
Contributor

I can't remember if we discussed this previously or not but did we decide that we wanted to have the timer objects returned back through the Clojure get-client-metrics API to be Java objects - like as opposed to Clojure maps? I had been thinking we would want a layer that translates those for the benefit of those who would want to consume the metrics in a format that looks more native to Clojure? Looks like the get-client-metrics-data API result consists of primitive Clojure objects all the way down.

@rlinehan
Copy link
Contributor Author

The get-client-metrics methods return Java ClientTimer objects. The get-client-metrics-data methods return Clojure maps of data (and the Java version of that - getClientMetricsData returns instances of the Java class ClientMetricsData) . Does that answer your question/provide what you were asking about?

@camlow325
Copy link
Contributor

I was thinking that even when we add another type, we would continue having the methods that don't include the specific type as a parameter return only the :bytes-read/:full-response, for backward compatibility, and then add not only the new type but also an ALL option for filtering.

I guess it would be more intuitive to me as an API consumer to have the version that you call with no argument for filtering just return everything it knows about - rather than designating a separate ALL option later for that purpose.

I worry that since we aren't completely sure how the API will look as we add more things (such as an option for returning everything under a metric-id, rather than just the explicit metric-id), keeping the methods that take the metric type will make expanding the API more difficult.

Yeah, good point. A bit hard to anticipate that at this point. I suppose I'd be okay with moving it into an options map in anticipation of that if it makes sense to you.

@camlow325
Copy link
Contributor

The get-client-metrics methods return Java ClientTimer objects.

Yeah, that's what I'm seeing. I was just questioning if that's what we want.

I guess there's a lot of stuff inherited from a com.codahale.metrics.Timer object that would be tedious to coerce into a Clojure map. It wasn't obvious to me that clients of this API would want/need to get the full Java form of the object back in order to be able to do anything with it other than read data - i.e., not to call .stop or .close on it. I'm good with it as it is. It just surprised me a bit to see that form of the object come back when calling from the Clojure REPL.

@rlinehan
Copy link
Contributor Author

I guess there's a lot of stuff inherited from a com.codahale.metrics.Timer object that would be tedious to coerce into a Clojure map. It wasn't obvious to me that clients of this API would want/need to get the full Java form of the object back in order to be able to do anything with it other than read data - i.e., not to call .stop or .close on it. I'm good with it as it is. It just surprised me a bit to see that form of the object come back when calling from the Clojure REPL.

When I was first discussing this API with @cprice404 we settled on having two methods/APIs - one that returned a map of some of the most useful data (same as what the tk-comidi-metrics returns https://github.com/puppetlabs/trapperkeeper-comidi-metrics/blob/master/src/puppetlabs/metrics/http.clj#L31) and one that returned the Timer instance, which users could then use as they needed.

@camlow325
Copy link
Contributor

When I was first discussing this API with @cprice404 we settled on having two methods/APIs - one that returned a map of some of the most useful data (same as what the tk-comidi-metrics returns https://github.com/puppetlabs/trapperkeeper-comidi-metrics/blob/master/src/puppetlabs/metrics/http.clj#L31) and one that returned the Timer instance, which users could then use as they needed.

Fair enough. Thanks for the explanation.

@camlow325
Copy link
Contributor

Done reviewing this for now. Overall, I'm still 👍 on the direction.

I suppose the biggest question about the shape of the API is whether returning lists of flat metric maps for the get-client-metrics-data-* calls is sufficient or if we need to index them further, e.g., by url. We'll probably want to have @cprice404 make the final call on that.

I'm curious to hear what you think about the .getOrAddTimer synchronization questions.

As for remaining naming questions and how to pass in the metric type parameter on the get-client-metrics-* function, I'll defer to your judgment.

I really like the general shape of the get-client-metrics-* APIs as being separate from the clients themselves and pretty lightweight in terms of what you have to do to construct a query. @cprice404 will presumably have much stronger opinions on the exact nature of how things are aggregated in the query responses than I do.

Looking good to me! Can't wait to get this stuff up into Puppet Server!

We are subclassing `Timer` to create our own `ClientTimer` instances.
Unfortunately, the method we would like to use to register these on the
MetricRegistry, `getOrAdd()`, is private. Instead, we have to have our own
`getOrAddTimer()` to handle getting the timer if it has already been
registered, or registering a new one. This commit updates our implementation
of `getOrAddTimer()` to match the logic of `getOrAdd()`.
Rename the Metric Types from `bytes-read`/`init-repsonse` to
`full-response`/`initial-response` (note that the `initial-response` timer has
not yet been implemented, but when it is this will be its name).
* Don't pass args into `isMatch()` in `ClientMetricFilter`, since this method
isn't static and so has access to the member variables.
Relevant gif: https://aphyr.com/data/posts/317/state.gif.

* Remove unused `metricTypeString` method
@rlinehan
Copy link
Contributor Author

@camlow325 I've pushed up several commits that update the getOrAddTimer implementation, rename bytes-read to full-response, and respond to a couple other bits of PR feedback.

I'll leave the other questions about nil, versions of the get- functions that explicitly take metric type, and anything else there still might be discussion about til Chris gets back.

@camlow325
Copy link
Contributor

👍 to the latest changes. Sounds like we're waiting for feedback from @cprice404 at this point on some of the remaining API design questions.

@cprice404
Copy link

Attempt to roll up some responses to all of the non-threaded comments:

  • I'm good with the new name suggestions
  • Definitely good with getting rid of that synchronized block
  • Agree with @camlow325 that I'd prefer we throw rather than return a nil if someone asks for metrics when there was no registry
  • I am in favor of removing the arities that take the metric type and then surfacing that in a backward-compatible way sometime in the future. When we get to that point, I'd vote for an options map so that we don't end up in a situation where we have to juggle additional arities in the future.
  • Slight preference for switching the strippedURL thing to call the URI constructor rather than concatenating strings.
  • I'm good with the different sigs of get-client-metrics-* so that you can get either the data or the actual Timer objects.
  • I'm good with just returning vectors/seqs of the metrics rather than trying to put another layer of nested maps in. The concerns that @rlinehan raised seem compelling to me, and I imagine that most consumers would just be calling val on those in most cases anyway.

Unrelated to any of the previous comments:

  • We are definitely going to need a way to ask for only the "leaf node" metrics in the metrics-id code paths. For displaying data in a status endpoint, we almost certainly only want the leaves. That doesn't have to come in this PR as long as we have an idea of how the API will need to change to support that before we do a release. But I think it has to happen before we can expect to be able to consume this easily for our status endpoints.
  • I think it'd probably be best to use more specific schemas for the metrics-data for the different types; one for metric-id, one for url, and one for url-and-method or whatever. Having the nil values in there reminds me of running into similar stuff like this when trying to deal with polymorphic objects in OO land in the past, and my gut is that having nil-able fields is probably not the best solution. We can discuss this further when we pair tomorrow, hopefully if we decide to change it it won't be too big of a change.
  • We are returning ClientMetricData from public APIs, but that class is defined in the impl namespace, which somewhat implies that it is private and that we can change it in the future. We should probably change that in some way, and the decision may end up involving the same questions about polymorphism from my previous comment. Hopefully that won't be a big deal either, though, and we can poke at it tomorrow as well.

I think that's all from my end!

rlinehan added 2 commits May 4, 2016 10:07
Rather than having e.g a nil url and method in the metric data when metric-id
is filtered on, or a nil metric id if url and method are filtered on, use
specific schemas for metrics data returned for each metric category - metric
id, url, and url and method.
Previously, all the get-client-metrics* functions had two arities - one with a
metric type and one without. This commit removes the arity with a metric type
from all these functions since currently we only have one metric type and
since we aren't entirely certain of what we want the API to be when we add
additional metric types. When we add more metric types, we will change the API
in a backward compatible way.
rlinehan and others added 14 commits May 4, 2016 15:10
…rics* fns

Previously, if any of the `get-client-metrics*` methods (in Clojure or Java)
had a nil metric registry passed into them, they would return nil. This commit
updates the behavior of all of these methods to instead return an error - a
schema error (clojure.lang.ExceptionInfo) for the clojure functions, and an
IllegalArgumentException for the java methods.
(TK-316) Move Java classes to separate metrics namespaces
When constructing url timers for request metrics, we strip off any username,
password, query params, and path fragments on the request uri. This commit
provides a public method to do this conversion of a url, so that users can
take the url they used for their request and easily figure out the url we used
for the metric.
…ING_ON_INSTANCEOF_REFACTOR

(TK-316) refactor java code to reduce casting and simplify logic
@cprice404
Copy link

For posterity: @rlinehan and I discussed all of the bullet points from my previous comment IRL yesterday and I think we are on the same page on all of them now.

@cprice404
Copy link

Reviewed the last few commits that @rlinehan added; I'm officially +1 now. @camlow325 if you get a chance to take a peek at this last slew of changes, I think we're ready to push the button!

finally

import com.puppetlabs.http.client.RequestOptions;
import com.puppetlabs.http.client.ResponseBodyType;
import com.codahale.metrics.MetricRegistry;
import com.puppetlabs.http.client.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to expand these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I blame @cprice404 and his incorrect intellij settings, but I can fix.

@camlow325
Copy link
Contributor

👍 I really like the recent changes which add in polymorphism / reduce conditional logic for the Filter and Timer classes.

@rlinehan - I can merge this as soon as you fix up the one nit with importing "metrics.*" from CategoryClientTimerMetricFilter.java.

@rlinehan
Copy link
Contributor Author

rlinehan commented May 6, 2016

@camlow325 I've fixed up the imports

@camlow325
Copy link
Contributor

👍 💯 🉑 🍰 💃 👯 📦 🐰 🐇 🐎 🍶 🎉 ✌️

@camlow325 camlow325 merged commit 70b5d6d into puppetlabs:master May 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants