
Move metrics in lbcore #300

Merged · 20 commits · Sep 1, 2017

Conversation

@svkm102 (Contributor) commented Aug 11, 2017

No description provided.

@svkm102 svkm102 changed the title WIP : Move metrics in lbcore Move metrics in lbcore Aug 16, 2017
@alechenninger (Contributor) left a comment:

Thanks! This makes more sense alongside the lightblue-platform/lightblue-core#808 review.

* CDI injection.
*/
private static final RequestMetrics metrics =
new DropwizardRequestMetrics(MetricRegistryFactory.getMetricRegistry());
Contributor:

This should probably move to the RestConfiguration class, where it can pass it to the LightblueFactory.

Also, I forgot that getMetricRegistry() also initializes the JMX reporter. We should probably make that clearer somehow. I'd go with simply getJmxMetricRegistry().

Contributor Author:

Changed getMetricRegistry() to getJmxMetricRegistry() for clarity.
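
For readers following along, a minimal sketch of what a factory method with that name might look like with Dropwizard Metrics 3.x. The class name comes from the diff above, but the body is an assumption, not the actual lightblue implementation:

```java
import com.codahale.metrics.JmxReporter;
import com.codahale.metrics.MetricRegistry;

public final class MetricRegistryFactory {
    private static MetricRegistry registry;

    private MetricRegistryFactory() {}

    // The name makes it explicit that a JMX reporter is started as a side effect.
    public static synchronized MetricRegistry getJmxMetricRegistry() {
        if (registry == null) {
            registry = new MetricRegistry();
            // Publishes every metric in the registry as a JMX MBean.
            JmxReporter.forRegistry(registry).build().start();
        }
        return registry;
    }
}
```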

@@ -111,7 +122,7 @@ public Response getSearchesForEntity(@PathParam("entity") String entity,
}
CallStatus st=new FindCommand(freq.getEntityVersion().getEntity(),
freq.getEntityVersion().getVersion(),
freq.toJson().toString()).run();
freq.toJson().toString(), metrics).run();
Contributor:

I'm surprised we need to pass in the metrics here. The Mediator should already know how to track this; it's no longer the responsibility of the HTTP layer to track metrics.

@alechenninger (Contributor) commented Aug 21, 2017:

Actually, I had an idea about this that I just commented on in the other PR. We have the option of starting tracking in the REST layer (here) and then passing the context along.

Another alternative is to go back to doing most tracking in the REST layer (not passing the ctx along), but for bulk requests just pass along the RequestMetrics so it can track the individual requests itself. You'll still need the RequestMetrics interface in core to accomplish that.

Contributor Author:

I would prefer starting metric tracking at the REST layer and then passing the ctx along to the mediator for all requests (including bulk). One benefit of tracking metrics at the mediator level is that we get more insight into errors that occur there. If we track everything at the REST level, mediator errors just get marked as completed requests instead of error requests.

So RequestMetrics will be passed along with each request to the commands, the commands will initialize the context, and then the ctx is passed to the mediator for further tracking at the mediator level (error tracking and marking the request as completed).

I was thinking of marking request completion at the REST level, but that would not be able to handle bulk requests, so ending the request has to be handled at the mediator level itself.

Contributor:

> If we track everything at the REST level, mediator errors just get marked as completed requests instead of error requests.

We can use the errors in the Response object and add all of those to the context. This is the same error resolution that was being done in the mediator, so it's no different; it just happens at a different time and place, which doesn't matter for error counters.

> I was thinking of marking request completion at the REST level, but that would not be able to handle bulk requests, so ending the request has to be handled at the mediator level itself.

Why not? The bulk request code would then do the job of tracking the individual requests (start, end, and errors); the REST layer would still wrap the bulk request itself.
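
A rough sketch of that proposal; the Response.getErrors() accessor and the context method names here are assumptions based on the identifiers visible elsewhere in this diff:

```java
// Start monitoring at the REST layer, run the request through the mediator,
// then fold any errors the mediator collected into the metrics context.
RequestMetrics.Context metricCtx = metrics.startEntityRequest("find", entity, version);
try {
    Response response = getMediator().find(ireq);
    if (response.getErrors() != null) {
        for (Error error : response.getErrors()) {
            metricCtx.markRequestException(error);
        }
    }
    return new CallStatus(response);
} finally {
    metricCtx.endRequestMonitoring();
}
```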

Contributor Author:

I wrote that comment before I made the changes to track errors via the Response object. With Response-object-based tracking, bulk requests can be handled easily, as can mediator errors.

Member:

@svkm102 Can you do a find/replace on the version number to substitute an underscore (_) for the dot (.) in the version? Graphite treats . specially, so it will throw things off if we pass that along. So something like metrics:name=api.find.test.1.0.0.requests.active would become metrics:name=api.find.test.1_0_0.requests.active.

Member:

Note that I didn't mean to change it in the command calls, just wherever you are defining the name of the metric being captured.

Contributor Author:

Updated the version handling to replace "." with "_" in lightblue-core. That should take care of versions in all commands.
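
For illustration, the substitution when building a metric name might look roughly like this (a sketch; the actual helper lives in lightblue-core and may differ):

```java
// Graphite treats '.' as a path separator, so "1.0.0" would otherwise be split
// across three metric path segments.
static String escapeVersion(String version) {
    return version.replace(".", "_");
}

// escapeVersion("1.0.0") -> "1_0_0", giving
// metrics:name=api.find.test.1_0_0.requests.active
```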

@@ -86,6 +90,7 @@ public static AbstractLockCommand getLockCommand(String request) {

@Override
public CallStatus run() {
RequestMetrics.Context context = metrics.startLockRequest(getLockCommandName(), domain);
Contributor:

Does this have to be done at the HTTP layer?

I understand if locking is special, just wondering.

Contributor Author:

I don't think we have a lock implementation at the CRUD level; everything lock-related seems to be at the REST level only. The same is true of GenerateCommand, so this is the only place their metrics can be tracked.

Contributor:

Makes sense

// only when stream is fully written.
if(stream) {
context = metrics.startStreamingEntityRequest("find", entity, version);
}
Contributor:

We don't need to pollute the HTTP layer with this here; we can still track it in the Mediator. We just have to listen for when the streaming results are depleted.

Contributor Author:

Moved streaming request tracking to the mediator.
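
One hypothetical way the mediator side could "listen for depletion" is to wrap the result iterator and close the metrics context once it is exhausted; all names here are illustrative rather than the actual core API:

```java
import java.util.Iterator;

// Illustrative wrapper: ends request monitoring only once the streamed results
// have been fully consumed, instead of when the HTTP handler returns.
class MonitoredResultStream<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final RequestMetrics.Context context;
    private boolean ended;

    MonitoredResultStream(Iterator<T> delegate, RequestMetrics.Context context) {
        this.delegate = delegate;
        this.context = context;
    }

    @Override
    public boolean hasNext() {
        boolean hasNext = delegate.hasNext();
        if (!hasNext && !ended) {
            // Stream fully written: the request is only now complete.
            ended = true;
            context.endRequestMonitoring();
        }
        return hasNext;
    }

    @Override
    public T next() {
        return delegate.next();
    }
}
```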

@@ -65,13 +66,13 @@ public AbstractRestCommand() {
* @return
* @throws Exception
*/
protected Mediator getMediator() {
protected Mediator getMediator(RequestMetrics metrics) {
Contributor:

I suggest not passing RequestMetrics to getMediator like this. It's a bit odd and makes the code a bit messy. See comments in lightblue-platform/lightblue-core#808.

Contributor Author:

Removed.

@alechenninger (Contributor) left a comment:

Just about there! Thanks!


if (context != null) {
context.endRequestMonitoring();
}
// TODO: What if there is IOException?
Contributor:

Still need to mark any errors that occur here I think. Otherwise I suspect they'll bubble up to RESTEasy and get lost.

Contributor Author:

Added a catch block to mark the request exception.
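
The general shape being applied to the commands, expressed as a hypothetical helper (the method name and wrapping are mine; the markRequestException/endRequestMonitoring calls follow the diff):

```java
// Hypothetical helper: run a command body, mark any exception on the metrics
// context, and always end monitoring, so failures can't bubble up to RESTEasy
// without being counted.
static CallStatus runWithMetrics(RequestMetrics.Context context,
                                 java.util.concurrent.Callable<CallStatus> body) {
    try {
        return body.call();
    } catch (Exception e) {
        context.markRequestException(e, e.getMessage());
        return new CallStatus(Error.get(RestCrudConstants.ERR_REST_ERROR, e.toString()));
    } finally {
        context.endRequestMonitoring();
    }
}
```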

return new CallStatus(new Response());
} else {
return new CallStatus(getMediator().find(ireq));
return new CallStatus(getMediator().find(ireq, metricCtx));
}
} catch (Error e) {
Contributor:

I think we also need to mark errors for conditions above:

             try {
                 ireq = getJsonTranslator().parse(FindRequest.class, JsonUtils.json(request));
             } catch (Exception e) {
                 LOGGER.error("find:parse failure: {}", e);
                 return new CallStatus(Error.get(RestCrudConstants.ERR_REST_FIND, "Error during the parse of the request"));
             }
             LOGGER.debug("Find request:{}", ireq);
             try {
                 validateReq(ireq, entity, version);
             } catch (Exception e) {
                 LOGGER.error("find:validate failure: {}", e);
                 return new CallStatus(Error.get(RestCrudConstants.ERR_REST_FIND, "Request is not valid"));
             }

Similar errors can probably occur in other commands, too.

Contributor Author:

Added error marking for these conditions in all the missing places across the commands.

@alechenninger (Contributor) left a comment:

A few small fixes; I think one more update and we should be good.

@@ -155,14 +163,26 @@ public CallStatus run() {
streamResponse=getMediator().findAndStream(ireq);
return new CallStatus(new Response());
} else {
return new CallStatus(getMediator().find(ireq));
metricCtx.endRequestMonitoring();
r = getMediator().find(ireq);
Contributor:

I think you'll want to remove metricCtx.endRequestMonitoring(); here since it's in the finally block (also it's in the wrong order here).

Contributor Author:

Copy/paste at the wrong location. Thanks for catching it :). Fixed.

LOGGER.error("find:parse failure: {}", e);
return new CallStatus(Error.get(RestCrudConstants.ERR_REST_FIND, "Error during the parse of the request"));
}
LOGGER.debug("Find request:{}", ireq);
try {
validateReq(ireq, entity, version);
} catch (Exception e) {
metricCtx.markRequestException(e);
Contributor:

For these cases we might want to mark with the exception instance returned from Error.get(...) to be consistent with other errors. That said, it looks like we would lose the context of what the specific exception is if we did that, because lightblue's Error exceptions will just be logged simply as "Error". Unfortunately this is already the case for all of the other Errors in the Response. Not very specific!

I have an idea to fix that though. Will comment about that in core.

Contributor:

Oh, never mind: it looks like you already added improvements around the lightblue Error objects :-). Nice! So yes, using the Error.get value to log the errors here like the others should be more consistent.

Contributor Author:

Replaced all instances of metricCtx.markRequestException(e) with metricCtx.markRequestException(e, e.getErrorCode()) (for lightblue-specific errors) and metricCtx.markRequestException(e, e.getMessage()) (for other exceptions), to get error details into the metric.

}
}
};
}

@Override
public CallStatus run() {
metricCtx = metrics.startStreamingEntityRequest("find", entity, version);
Contributor:

We only want to start the streaming version if stream is true. Otherwise, use the regular version.

Also, a slight nitpick: it looks like there is a mix of tabs and spaces in some of the changes.

Contributor Author:

I had it originally, but it looks like it got removed during refactoring. Thanks for pointing it out. Fixed.

LOGGER.error("failure: {}", e);
return new CallStatus(e);
} catch (Exception e) {
metricCtx.markRequestException(e, e.getMessage());
LOGGER.error("failure: {}", e);
return new CallStatus(Error.get(RestCrudConstants.ERR_REST_ERROR, e.toString()));
Contributor:

So I think this case will still benefit from using the Error instance so it's consistent with the response like other Errors that we handle, like so:

              LOGGER.error("failure: {}", e);
              Error error = Error.get(RestCrudConstants.ERR_REST_ERROR, e.toString());
              metricCtx.markRequestException(error);
              return new CallStatus(error);

This implies an overload like:

        @Override
        public void markRequestException(Error e) {
            markRequestException(e, e.getErrorCode());
        }

Which I think is a good idea anyway. We want to encapsulate as much as possible, not rely on clients to know how to map an Error to marking exceptions.

I would use that for the catch above, too.

Contributor Author:

I misunderstood your earlier comment about this. Thanks for the clarification and example. Updated it as suggested.

}
}

protected abstract JsonNode runLockCommand(Locking locking);

public abstract String getLockCommandName();
Contributor:

I would probably keep this protected if possible, FWIW.

Contributor Author:

Changed visibility.

@@ -50,6 +53,7 @@ public CallStatus run() {
try {
req = getJsonTranslator().parse(BulkRequest.class, JsonUtils.json(request));
} catch (Exception e) {
metricCtx.markRequestException(e, e.getMessage());
LOGGER.error("bulk:parse failure: {}", e);
return new CallStatus(Error.get(RestCrudConstants.ERR_REST_ERROR, "Error parsing request"));
Contributor:

Again for these I would use the Error type returned from Error.get(RestCrudConstants.ERR_REST_ERROR, "Error parsing request").

Contributor Author:

Fixed.

@alechenninger (Contributor) left a comment:

Missed the saved search stuff before. Thanks!

}

@Override
public CallStatus run() {
RequestMetrics.Context savedSearchMetricCtx = metrics.startEntityRequest("savedsearch", entity, version);
Contributor:

Can we actually use the saved search name in the metrics for this? That would tell us exactly how certain queries are performing. Because this has an additional parameter, I'd probably track this as startSavedSearchRequest.

Contributor Author:

Added startSavedSearchRequest to track saved search monitoring, with searchName as an added parameter.

}

@Override
public CallStatus run() {
RequestMetrics.Context savedSearchMetricCtx = metrics.startEntityRequest("savedsearch", entity, version);
RequestMetrics.Context findMetricCtx = null;
Contributor:

I'm not sure we need to track both a saved search and a find for a saved search. The saved search metric counts as a find, so you could argue that we can combine them with Graphite queries if we want. Then we don't have to worry about tracking two things in the code. This also technically lets us do things like compare non-saved searches with saved searches; if we put saved searches in finds too, then it is sort of lossy.

Contributor Author:

Removed redundant find monitoring.

}

@Override
public CallStatus run() {
RequestMetrics.Context metricCtx = metrics.startSavedSearchRequest("savedsearch", searchName, entity, version);
Contributor:

I think passing the "savedsearch" argument is not needed now (we know it is a saved search already).

Contributor Author:

Forgot to remove the extra param. Thanks for catching this one. Fixed.
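
Pulling the thread together, the slice of RequestMetrics exercised by these commands presumably ends up looking roughly like this (reconstructed from the calls in this PR, not the authoritative lightblue-core definition):

```java
public interface RequestMetrics {
    Context startEntityRequest(String operation, String entity, String version);
    Context startStreamingEntityRequest(String operation, String entity, String version);
    Context startLockRequest(String lockOperation, String domain);
    Context startSavedSearchRequest(String searchName, String entity, String version);

    interface Context {
        void endRequestMonitoring();
        // Detail is the error code for lightblue Errors, or the message otherwise.
        void markRequestException(Exception e, String detail);
        // Convenience overload suggested above, delegating to the two-argument form.
        void markRequestException(Error e);
    }
}
```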

return new CallStatus(Error.get(RestCrudConstants.ERR_REST_FIND, e.toString()));
return new CallStatus(error);
} finally {
if (!stream) {
Contributor:

There are failure cases where we are streaming, but the context is not ended. We should unit test these cases to make sure they are caught and handled. For example, the catch block at line 150: a failure there with a streaming request will never close the context.

I think if you make line 185 if (streamResponse == null) { instead of if (!stream) { it will fix these cases, but I'd highly recommend having unit tests to make sure these branches are all handled, as it's too easy to miss one.

Contributor Author:

Using the streamResponse null check handles both streaming and non-streaming cases better. Changed the condition accordingly and added test cases in FindCommandTest that exercise both streaming and non-streaming cases with metrics handling, to cover all the branches.
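
For example, a test along these lines (Mockito-based, with illustrative names rather than the exact FindCommandTest methods) can assert that a parse failure still marks the error and closes the context:

```java
import static org.mockito.Mockito.*;

import org.junit.Test;

public class FindCommandMetricsSketchTest {

    @Test
    public void findWithParseProblemMarksErrorAndEndsContext() {
        RequestMetrics metrics = mock(RequestMetrics.class);
        RequestMetrics.Context ctx = mock(RequestMetrics.Context.class);
        when(metrics.startEntityRequest(anyString(), anyString(), anyString())).thenReturn(ctx);

        // "not json" forces the parse branch to fail before the mediator is reached.
        // Constructor arguments follow the FindCommand call shown in the diff above.
        new FindCommand("test", "1.0.0", "not json", metrics).run();

        verify(ctx).markRequestException(any(Error.class));
        verify(ctx).endRequestMonitoring();
    }
}
```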

@alechenninger (Contributor):

Did a final once-over; all I can see is the issue with errors during streaming find requests, as mentioned. Otherwise looks good!

Sunny Mourya added 3 commits August 29, 2017 11:44
Conflicts:
	crud/src/test/java/com/redhat/lightblue/rest/crud/cmd/FindCommandTest.java
@alechenninger (Contributor) left a comment:

Nice tests you added. Thank you so much for your patience; I think this is ready to try out in dev. Thanks!

(Note there is one change I requested to core, but this PR is good to go when that is).

@derek63 (Member) commented Aug 31, 2017:

@svkm102 I merged your core PR, but this one has some failing tests. Can you have a look when you have the chance?

Failed tests:   runStreamFindWithParseProblemAndMetrics(com.redhat.lightblue.rest.crud.cmd.FindCommandTest): expected:<1> but was:<0>
  runFindWithInvalidAndMetrics(com.redhat.lightblue.rest.crud.cmd.FindCommandTest): expected:<1> but was:<0>
  runStreamFindWithInvalidAndMetrics(com.redhat.lightblue.rest.crud.cmd.FindCommandTest): expected:<1> but was:<0>
  runFindWithParseProblemAndMetrics(com.redhat.lightblue.rest.crud.cmd.FindCommandTest): expected:<1> but was:<0>

@svkm102 svkm102 merged commit 753b819 into lightblue-platform:master Sep 1, 2017
@svkm102 svkm102 deleted the metrics-refractoring branch November 7, 2017 06:15