(TK-316) Improved client metrics support #52
This commit adds metrics support to the http client (clojure and java, sync and async). A metric registry can optionally be passed into the client as a client option on creation. If a metric registry is present, timers will be added to time each request.

By default, a timer is added for the URL (stripped of username, password, query string, and path fragments) and for the URL plus the method used for the request. In addition, a request can include a `metric-id` option, which takes a tuple of metric ids. If this request option is specified, a timer will be created for each element of the metric id tuple - thus if the tuple is [:foo :bar :baz] there will be a foo timer, a foo.bar timer, and a foo.bar.baz timer. In addition, each timer has a "MetricType" - currently there is only one metric type, bytes-read, which is stopped when the full response has been read. In the future, we may add "response-init" timers that get stopped when the first byte of the response has been read.

This commit also adds a `get-client-metrics`/`.getClientMetrics` function that takes a client instance and returns the http client-specific metrics from the metric registry, and a `get-client-metrics-data`/`.getClientMetricsData` function for clojure and java sync and async clients to get metrics data out of the client. This function takes a client instance and returns a map of metric name to a map of metric data (for clojure) or a ClientMetricData object (for java), both of which include the mean, count, and aggregate for the timer.

These `get-client-metrics*`/`.getClientMetrics*` functions also have versions that take a url, url and method, or metric id to allow for filtering of the timers/metrics data returned by these functions. The clojure versions of these functions take a metric filter map. There are also metric filter builder functions to build up the type of metric filter desired from a url, a url and method, or a metric id. These will prevent users from having to know the specifics of how to build a metric filter themselves; instead they can use a convenience function. An empty metric id can be passed in to the filter to return all metric-id timers.
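To make the `metric-id` behavior above concrete, here is a minimal sketch of how a tuple could expand into one cumulative timer name per element. `MetricIdNames` and `cumulativeNames` are hypothetical names used only for illustration, and the names actually registered by the client also involve a namespace and metric type, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the PR: expands a metric id tuple into
// one cumulative, dot-joined name per element.
final class MetricIdNames {
    static List<String> cumulativeNames(List<String> metricId) {
        List<String> names = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : metricId) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(part);
            // Each element gets its own timer name: foo, foo.bar, foo.bar.baz
            names.add(prefix.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [foo, foo.bar, foo.bar.baz]
        System.out.println(cumulativeNames(List.of("foo", "bar", "baz")));
    }
}
```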
This commit does several things to improve the metrics API:

* Move get-client-metrics(-data) functions off of client: Previously, the `get-client-metrics(-data)`/`.getClientMetrics(Data)` functions were on the client. This didn't entirely make sense because if two clients were using the same metric registry, these functions would actually return metrics data for *all* clients, rather than just for the client the function was called on. This commit removes these functions and changes the tests to use the metrics namespace/Metrics class versions instead, which take a metric registry and return all http-client related metrics on that metric registry.
* Add a `with-url-and-method` namespace: Move the timers that have both the url and the method from the `with-url` namespace to a new `with-url-and-method` namespace. This better matches the data structure they are returned in and the filtering mechanisms for getting them back out (discussed below), and also removes a slight chance of having a conflict with a url timer.
* Add a ClientTimer class: Add a ClientTimer class that wraps the Timer class and also holds onto the url, method, or metric id used to create the timer. Use this to add url, method, metricId to ClientMetricData.
* Change the `getClientMetrics(Data)` methods to return a map of metric category to an array of timers or timer data: Previously, the methods for getting http client metrics/metric data out of a metric registry returned a map of metric name to timer instance or metric data. However, the metric name was most likely not useful to users, who probably just want to iterate through the timers/data. This commit makes the output of these functions more useful by returning arrays of timers/data sorted into a map indexed by metric category (url, url and method, or metric id).
* Add `getClientMetricsBy*` methods: Add `getClientMetrics(Data)ByUrl`, `getClientMetrics(Data)ByUrlAndMethod`, and `getClientMetrics(Data)ByMetricId` methods (and clojure versions) that allow for filtering by a specific metric category and return an array of timers or timer data that match the url, url and method, or metric id specified.
* Remove the filter-builder functions: Previously, the `get-client-metrics(-data)` functions did filtering by taking a filter map, and there were filter-builder functions to build these filter maps. Now that there are separate filtering methods, these filter maps are no longer used and the filter-builder functions are removed.
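As a rough illustration of the output shape described in this commit message, the sketch below sorts a flat list of timer data into a map indexed by metric category and does a by-url lookup. All types here (`MetricCategories`, `Category`, `TimerData`, `byCategory`, `byUrl`) are stand-ins invented for the example; the PR's real `ClientTimer` and `ClientMetricData` classes carry more data than this.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Stand-in types for illustration only, not the PR's classes.
final class MetricCategories {
    enum Category { URL, URL_AND_METHOD, METRIC_ID }

    static final class TimerData {
        final Category category;
        final String url;     // null for METRIC_ID entries
        final String method;  // only set for URL_AND_METHOD entries
        final long count;

        TimerData(Category category, String url, String method, long count) {
            this.category = category;
            this.url = url;
            this.method = method;
            this.count = count;
        }
    }

    // Sort a flat list of timer data into a map indexed by metric category.
    static Map<Category, List<TimerData>> byCategory(List<TimerData> all) {
        Map<Category, List<TimerData>> result = new EnumMap<>(Category.class);
        for (Category category : Category.values()) {
            result.put(category, new ArrayList<>());
        }
        for (TimerData data : all) {
            result.get(data.category).add(data);
        }
        return result;
    }

    // Rough analogue of a by-url lookup: only url-category entries for that url.
    static List<TimerData> byUrl(List<TimerData> all, String url) {
        List<TimerData> matches = new ArrayList<>();
        for (TimerData data : all) {
            if (data.category == Category.URL && url.equals(data.url)) {
                matches.add(data);
            }
        }
        return matches;
    }
}
```

Returning plain lists per category keeps the result easy to iterate over, which is the use case the commit message calls out.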
I think this addresses everything @cprice404 and I were discussing on #51.

I decided to just return arrays of timers/metric data, rather than having it indexed by metric tuple/url. It seems like most of the time users will want to iterate over all of these. Furthermore, I ran into difficulties trying to do this for the case where all the different categories are being returned, because then some of the keys would be arrays and some would be strings, which doesn't work. In addition, it didn't seem like indexing by metric id would be that useful even when only metric id timers were being returned, since much of the time users will probably want to dump this data into json for a status endpoint, and then they will need to serialize the metric ids into a string, since json doesn't allow arrays as keys. Thus, if we were to do this, we would be doing extra work that might be for nothing and cause users to also do more work.

There are a few things that I think still need to be done, the main thing being renaming. I talked with @nfagerlund about how things are named in this PR, and he felt that the one thing that should be changed is the names of the metric types, which are currently "bytes-read" and "response-init." He recommended going with names that show how they are semantically related - for example "full-response" and "initial-response" or "response-complete" and "response-init." I like either of these options, with maybe a slight preference for "full-response" and "initial-response."

I have two other questions that I think affect the API:
In addition, I would like to do some refactoring of some of the tests, and there are a couple of places where the implementation could probably do with some refactoring, but I wanted to get a PR up and make sure the API was agreed on, and then these changes could even be made after this PR has been merged.
ping @camlow325 for review
private static final Logger LOGGER = LoggerFactory.getLogger(Metrics.class);

synchronized private static ClientTimer getOrAddTimer(MetricRegistry metricRegistry,
I suspect this is very non-performant. This is implemented upstream in MetricRegistry with https://github.com/dropwizard/metrics/blob/3.1-maintenance/metrics-core/src/main/java/com/codahale/metrics/MetricRegistry.java#L311-L326 (which is unfortunately private), and while it's rather gross it's what we've basically been using already, and in discussions offline @cprice404 and I were thinking that consequently maybe trying to register and then catching the IllegalArgumentException is an okay way to do this.
If we go with what the `getOrAdd` implementation does more closely, not sure this would need to be marked as `synchronized`. Seems like their implementation - which uses a `ConcurrentMap` under the hood for synchronization - would allow for a metric of the same type that has been registered to be returned or a new one added without needing the extra level of locking?
Yep, that was my thinking - we just go with a version of their `getOrAdd` implementation and remove the `synchronized` here. If you're cool with that then I'll change this method to follow the logic in their implementation. The way I currently have it implemented was really just meant to be a first pass to get it working, and I wanted to make sure that others were okay with the general direction this whole thing was taking before making the logic more complicated.
Either of the two alternatives sounds better to me as well than what we currently have. If you like "full-response" and "initial-response" best, I'm good with that.
If I were implementing this from scratch, I probably would have gone with throwing an exception over returning |
Even though we only support 1 type at this point, I like the idea of making it possible up front to filter for only the id. Might be annoying for clients that are just interested in the data for the 1 type to be using a function that suddenly starts returning a bunch of extra data that they don't want. For those clients, using the form of the API scoped to a specific type upfront would seem like a good safeguard. As to whether to have the type be a named parameter vs. a parameter in an options map, good question. Seems like it's going to be common enough for clients to want to filter for a specific metric that having it be a named parameter initially might be the best way to go. So I suppose overall, I'd lean toward going with what you've already done on the PR for this.
public boolean matches(String s, Metric metric) {
    if ( metric instanceof ClientTimer ){
        return isMatch((ClientTimer) metric, url, method, metricId, metricType);
Probably don't need to pass in the `url`, `method`, `metricId`, and `metricType` since the method being called isn't static and so should have access to these as member variables?
oh right! I forgot I'm not in functional programming land.
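For reference, a minimal sketch of the suggestion in this thread: `isMatch` reads the filter's own fields instead of receiving them as arguments. The `Metric` and `ClientTimer` types below are simplified stand-ins, and the field set is cut down to `url` and `method`; the filter in the diff also carries `metricId` and `metricType`.

```java
import java.util.Objects;

// Simplified stand-ins for illustration; not the classes from the diff.
final class FilterSketch {
    interface Metric {}

    static final class ClientTimer implements Metric {
        final String url;
        final String method;

        ClientTimer(String url, String method) {
            this.url = url;
            this.method = method;
        }
    }

    static final class ClientMetricFilter {
        private final String url;
        private final String method;

        ClientMetricFilter(String url, String method) {
            this.url = url;
            this.method = method;
        }

        public boolean matches(String name, Metric metric) {
            // Only ClientTimer instances can match; other metrics are skipped.
            return metric instanceof ClientTimer && isMatch((ClientTimer) metric);
        }

        // Instance method: uses this.url and this.method directly.
        private boolean isMatch(ClientTimer timer) {
            return Objects.equals(url, timer.url) && Objects.equals(method, timer.method);
        }
    }
}
```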
I was thinking that even when we add another type, we would continue having the methods that don't include the specific type as a parameter return only the |
Metric timer = metricRegistry.getMetrics().get(name);

if ( timer == null ) {
    return metricRegistry.register(name, newTimer);
Should we do something more like what the `MetricRegistry.getOrAdd` call does that you mentioned? They have a try/catch block around this part that would allow for the metric to still be returned rather than throwing an `IllegalArgumentException` if a metric with the same name and class type happens to be registered right around when this is called. Not sure exactly why we'd want to deviate from their implementation for what we're doing?
I can't remember if we discussed this previously or not but did we decide that we wanted to have the timer objects returned back through the Clojure |
The |
I guess it would be more intuitive to me as an API consumer to have the version that you call with no argument for filtering just return everything it knows about - rather than designating a separate ALL option later for that purpose.
Yeah, good point. A bit hard to anticipate that at this point. I suppose I'd be okay with moving it into an options map in anticipation of that if it makes sense to you.
Yeah, that's what I'm seeing. I was just questioning if that's what we want. I guess there's a lot of stuff inherited from a |
When I was first discussing this API with @cprice404 we settled on having two methods/APIs - one that returned a map of some of the most useful data (same as what the tk-comidi-metrics returns https://github.com/puppetlabs/trapperkeeper-comidi-metrics/blob/master/src/puppetlabs/metrics/http.clj#L31) and one that returned the Timer instance, which users could then use as they needed.
Fair enough. Thanks for the explanation.
Done reviewing this for now. Overall, I'm still 👍 on the direction.
I suppose the biggest question about the shape of the API is whether returning lists of flat metric maps for the
I'm curious to hear what you think about the
As for remaining naming questions and how to pass in the metric type parameter on the
I really like the general shape of the
Looking good to me! Can't wait to get this stuff up into Puppet Server!
We are subclassing `Timer` to create our own `ClientTimer` instances. Unfortunately, the method we would like to use to register these on the MetricRegistry, `getOrAdd()`, is private. Instead, we have to have our own `getOrAddTimer()` to handle getting the timer if it has already been registered, or registering a new one. This commit updates our implementation of `getOrAddTimer()` to match the logic of `getOrAdd()`.
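The get-or-register logic this commit message refers to can be sketched roughly as follows. This is a self-contained approximation written from a reading of the upstream `getOrAdd()`, not the code in this PR: `Registry` is a minimal stand-in for `MetricRegistry` (backed by a `ConcurrentMap`, with a `register` that throws `IllegalArgumentException` on a duplicate name), and `Metric`/`ClientTimer` are empty placeholder types.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stand-in types for illustration; the real code uses the Dropwizard
// MetricRegistry and the PR's ClientTimer.
final class GetOrAddSketch {
    interface Metric {}

    static final class ClientTimer implements Metric {}

    static final class Registry {
        private final ConcurrentMap<String, Metric> metrics = new ConcurrentHashMap<>();

        // Mirrors register-style behavior: a duplicate name is an error.
        <T extends Metric> T register(String name, T metric) {
            if (metrics.putIfAbsent(name, metric) != null) {
                throw new IllegalArgumentException("A metric named " + name + " already exists");
            }
            return metric;
        }

        Metric get(String name) {
            return metrics.get(name);
        }
    }

    // No synchronized needed here: the concurrent map handles the race.
    static ClientTimer getOrAddTimer(Registry registry, String name, ClientTimer newTimer) {
        Metric existing = registry.get(name);
        if (existing instanceof ClientTimer) {
            return (ClientTimer) existing;
        }
        if (existing == null) {
            try {
                return registry.register(name, newTimer);
            } catch (IllegalArgumentException e) {
                // Another thread registered this name first; use its timer
                // if it is of the expected type.
                Metric added = registry.get(name);
                if (added instanceof ClientTimer) {
                    return (ClientTimer) added;
                }
            }
        }
        throw new IllegalArgumentException(name + " is already used for a different type of metric");
    }
}
```

Because the losing thread in a registration race falls back to reading the winner's timer, callers always end up sharing one timer per name without an extra lock.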
Rename the Metric Types from `bytes-read`/`init-response` to `full-response`/`initial-response` (note that the `initial-response` timer has not yet been implemented, but when it is this will be its name).
* Don't pass args into `isMatch()` in `ClientMetricFilter`, since this method isn't static and so has access to the member variables. Relevant gif: https://aphyr.com/data/posts/317/state.gif.
* Remove unused `metricTypeString` method
@camlow325 I've pushed up several commits that update the I'll leave the other questions about |
👍 to the latest changes. Sounds like we're waiting for feedback from @cprice404 at this point on some of the remaining API design questions.
Attempt to roll up some responses to all of the non-threaded comments:
Unrelated to any of the previous comments:
I think that's all from my end!
Rather than having e.g. a nil url and method in the metric data when metric-id is filtered on, or a nil metric id if url and method are filtered on, use specific schemas for metrics data returned for each metric category - metric id, url, and url and method.
Previously, all the get-client-metrics* functions had two arities - one with a metric type and one without. This commit removes the arity with a metric type from all these functions since currently we only have one metric type and since we aren't entirely certain of what we want the API to be when we add additional metric types. When we add more metric types, we will change the API in a backward compatible way.
…rics* fns Previously, if any of the `get-client-metrics*` methods (in Clojure or Java) had a nil metric registry passed into them, they would return nil. This commit updates the behavior of all of these methods to instead return an error - a schema error (clojure.lang.ExceptionInfo) for the clojure functions, and an IllegalArgumentException for the java methods.
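On the Java side, the behavior change described above amounts to a guard like the one sketched here. `RegistryChecks` and `requireRegistry` are hypothetical names for illustration, and the generic parameter stands in for the `MetricRegistry` type so the example has no external dependencies.

```java
// Hypothetical guard, not the PR's code: reject a null metric registry
// with an IllegalArgumentException instead of quietly returning null.
final class RegistryChecks {
    static <T> T requireRegistry(T metricRegistry) {
        if (metricRegistry == null) {
            throw new IllegalArgumentException("metric registry must not be null");
        }
        return metricRegistry;
    }
}
```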
(TK-316) Move Java classes to separate metrics namespaces
When constructing url timers for request metrics, we strip off any username, password, query params, and path fragments on the request uri. This commit provides a public method to do this conversion of a url, so that users can take the url they used for their request and easily figure out the url we used for the metric.
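A minimal sketch of this kind of url conversion, using only `java.net.URI`, is shown below. `MetricUrls.stripForMetric` is a hypothetical name (the commit message does not name the public method), and the exact set of components the client keeps is an assumption based on the description above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not the PR's method: rebuilds a url from its scheme,
// host, port, and path only.
final class MetricUrls {
    static String stripForMetric(String url) {
        URI uri = URI.create(url);
        try {
            // Passing null for user info, query, and fragment drops them.
            return new URI(uri.getScheme(), null, uri.getHost(), uri.getPort(),
                    uri.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://example.com:8140/status/v1/services
        System.out.println(stripForMetric(
                "http://user:secret@example.com:8140/status/v1/services?level=debug#top"));
    }
}
```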
…ING_ON_INSTANCEOF_REFACTOR (TK-316) refactor java code to reduce casting and simplify logic
For posterity: @rlinehan and I discussed all of the bullet points from my previous comment IRL yesterday and I think we are on the same page on all of them now.
Reviewed the last few commits that @rlinehan added; I'm officially +1 now. @camlow325 if you get a chance to take a peek at this last slew of changes, I think we're ready to push the button!
import com.puppetlabs.http.client.RequestOptions;
import com.puppetlabs.http.client.ResponseBodyType;
import com.codahale.metrics.MetricRegistry;
import com.puppetlabs.http.client.*;
Would be nice to expand these.
I blame @cprice404 and his incorrect intellij settings, but I can fix.
👍 I really like the recent changes which add in polymorphism / reduce conditional logic for the Filter and Timer classes. @rlinehan - I can merge this as soon as you fix up the one nit with importing "metrics.*" from |
@camlow325 I've fixed up the imports
👍 💯 🉑 🍰 💃 👯 📦 🐰 🐇 🐎 🍶 🎉 ✌️
This PR improves on #51 and hopefully addresses all feedback. In particular, it improves the metrics API by:

* Moving the `get-client-metrics(-data)` functions off of the client
* Adding a `with-url-and-method` namespace
* Adding a `ClientTimer` class that wraps `Timer` and holds onto the metric name, and url, method, or metric id used to create the timer
* Changing the `getClientMetrics(Data)` methods to return a map of metric category to an array of timers or timer data
* Adding `getClientMetricsBy*` methods to filter by a url, url and method, or metric id