
JBPM-6740 - Expose readiness and liveness checks in KIE Server #1332

Merged on Jan 17, 2018 (1 commit)


mswiderski
Contributor

@etirelli @ge0ffrey @tarilabs @sutaakar

Here is an implementation that aims to provide basic readiness and liveness checks for KIE Server and all activated extensions.
In general:

  • readiness responds with 200 (OK) when the server is actually ready, or with 503 (Service Unavailable) while it is still booting, deploying containers, or waiting for the controller.
  • liveness (aka health check) performs the following:
    • checks readiness
    • checks for failed KIE containers
    • asks each active extension to health check itself
      Response codes for the health check are the same as for readiness: any error found results in a 503, whether it is a failed container, a failed extension, or a server that is not ready yet.
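The check order above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the KIE Server code: the `ExtensionCheck` interface, the boolean inputs, and the `readiness`/`liveness`/`fixedCheck` helpers are hypothetical stand-ins for the server's real state and extension API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the readiness/liveness ordering; not the KIE Server API.
public class LivenessSketch {

    static final int OK = 200;
    static final int SERVICE_UNAVAILABLE = 503;

    /** Hypothetical per-extension health check. */
    interface ExtensionCheck {
        String name();
        boolean isHealthy();
    }

    /** Readiness: 200 only once booting, container deployment and controller sync are done. */
    static int readiness(boolean serverReady) {
        return serverReady ? OK : SERVICE_UNAVAILABLE;
    }

    /** Liveness: check readiness, then failed containers, then each active extension. */
    static int liveness(boolean serverReady, boolean anyContainerFailed,
                        List<ExtensionCheck> activeExtensions) {
        if (readiness(serverReady) != OK) {
            return SERVICE_UNAVAILABLE;           // not ready yet
        }
        if (anyContainerFailed) {
            return SERVICE_UNAVAILABLE;           // at least one KIE container failed
        }
        for (ExtensionCheck extension : activeExtensions) {
            if (!extension.isHealthy()) {
                return SERVICE_UNAVAILABLE;       // an extension reported a problem
            }
        }
        return OK;
    }

    /** Builds a check with a fixed result, for illustration only. */
    static ExtensionCheck fixedCheck(String name, boolean healthy) {
        return new ExtensionCheck() {
            public String name() { return name; }
            public boolean isHealthy() { return healthy; }
        };
    }

    public static void main(String[] args) {
        List<ExtensionCheck> extensions =
                Arrays.asList(fixedCheck("Drools", true), fixedCheck("jBPM", false));
        System.out.println(liveness(true, false, extensions)); // 503: jBPM check fails
    }
}
```

Because any single failure short-circuits to 503, a probe only needs the status code; the individual reasons matter only when a report is requested.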

The health check can be invoked in two modes:

  • basic - returns only the status, 200 or 503
  • report - returns both the status and a report in the response body with info like the sample below (the body can be XML or JSON)

I have implemented some checks in the extensions, but I'm not sure whether they make sense or whether more should be added. So please take a moment and let me know what you would like to health check in the extensions you deal with.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<list-type>
    <items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>KIE Server 'managed-server' is ready to serve requests true</content>
            <content>Server is up for 0 days, 0 hours, 2 minutes, 31 seconds</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.375+01:00</timestamp>
        </items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>Drools is alive</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.375+01:00</timestamp>
        </items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>jBPM is alive</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.378+01:00</timestamp>
        </items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>Case-Mgmt is alive</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.378+01:00</timestamp>
        </items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>jBPM-UI is alive</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.378+01:00</timestamp>
        </items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>OptaPlanner is alive</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.378+01:00</timestamp>
        </items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>DMN is alive</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.378+01:00</timestamp>
        </items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>Swagger is alive</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.378+01:00</timestamp>
        </items>
        <items xsi:type="kie-message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
            <content>Health check done in 3 ms</content>
            <severity>INFO</severity>
            <timestamp>2018-01-12T10:40:34.378+01:00</timestamp>
        </items>
    </items>
</list-type>


}

private String calculateUptime() {
Contributor

Did you consider using java.time.Duration for these calculations?

Contributor Author

I actually did, but the problem is that it does not account for the parts already consumed by the larger units; see this example:

0 days, 0 hours, 30 minutes, 1800 seconds
0 days, 0 hours, 30 minutes, 0 seconds

The first line is produced with duration.toMinutes() and duration.toMillis()/1000, as there seems to be no toSeconds() method.

Or did you have something else in mind?
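The difference between the two outputs comes from `toMinutes()` and `toMillis()` returning totals rather than remainders. A minimal sketch of both variants follows; the class and method names are illustrative only, and `totals` is a reconstruction of the first attempt, not the actual PR code. The remainder version derives each field with div/mod from `getSeconds()`, which has existed since Java 8. On Java 9 and later, `toDaysPart()`, `toHoursPart()`, `toMinutesPart()` and `toSecondsPart()` return these remainders directly.

```java
import java.time.Duration;

// Illustrative comparison; class and method names are hypothetical.
public class UptimeFormat {

    // Reconstruction of the first attempt: toMinutes() and toMillis()/1000 are
    // totals, so the seconds field repeats time already counted as minutes.
    static String totals(Duration d) {
        return d.toDays() + " days, " + d.toHours() + " hours, "
                + d.toMinutes() + " minutes, " + (d.toMillis() / 1000) + " seconds";
    }

    // Each field is the remainder left after the larger units are taken out.
    static String remainders(Duration d) {
        long total = d.getSeconds();              // whole seconds in the duration
        long days = total / 86_400;
        long hours = (total % 86_400) / 3_600;
        long minutes = (total % 3_600) / 60;
        long seconds = total % 60;
        return days + " days, " + hours + " hours, "
                + minutes + " minutes, " + seconds + " seconds";
    }

    public static void main(String[] args) {
        Duration uptime = Duration.ofMinutes(30);
        System.out.println(totals(uptime));      // 0 days, 0 hours, 30 minutes, 1800 seconds
        System.out.println(remainders(uptime));  // 0 days, 0 hours, 30 minutes, 0 seconds
    }
}
```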

Contributor

I see. I was thinking of using a formatter to convert the Duration to a string representation, but there doesn't seem to be one out of the box.
The current approach seems OK.

Member

You can use the plain toString() method on a Duration; it generates, for example, PT41M29.064S, which is ISO 8601 format. In fact, Duration's toString() method does the same manual div/mod computation internally.
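For comparison, a small sketch of what `Duration.toString()` produces (the class and method names are illustrative). It also shows one reason the ISO 8601 form can be harder to read for uptime: `Duration.toString()` has no day component, so whole days are folded into hours.

```java
import java.time.Duration;

// Illustrative only: shows the ISO 8601 text produced by Duration.toString().
public class IsoDurationDemo {

    static Duration sampleUptime() {
        // 41 minutes, 29 seconds and 64 milliseconds
        return Duration.ofMinutes(41).plusSeconds(29).plusMillis(64);
    }

    public static void main(String[] args) {
        System.out.println(sampleUptime());          // PT41M29.064S
        System.out.println(Duration.ofDays(1));      // PT24H (no day component)
        System.out.println(Duration.ofMinutes(30));  // PT30M
    }
}
```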

Contributor Author

@MarianMacik Correct, but in my opinion it's far less readable, especially for those who are not familiar with the ISO 8601 format.

jhrcek

There is also the easy-to-use DurationFormatUtils in commons-lang3.

Contributor Author

@jhrcek You rock! Replaced with DurationFormatUtils, thanks!

@sutaakar (Contributor) left a comment

Looks great, thanks. I've included one idea in a comment.

@mswiderski force-pushed the JBPM-6740 branch 2 times, most recently from 2666487 to acdc678 on January 12, 2018 19:02
response=Void.class, code=200)
@ApiResponses(value = { @ApiResponse(code = 503, message = "Service not yet available") })
@GET
@Path("readiness")


I would rename this to "readycheck" (same for the method name).

Contributor Author

done

response=Void.class, code=200)
@ApiResponses(value = { @ApiResponse(code = 503, message = "If any of the checks failed") })
@GET
@Path("health")


I would rename this to "healthcheck" (same for the method name).

Contributor Author

done

if (server.isKieServerReady()) {
    return Response.status(Response.Status.OK).build();
}
return serviceUnavailable();


I think all check/monitoring methods should always return an OK status and show issues to the user as messages in case of error. That might work better for users who build monitoring apps against these checks.

Contributor Author

These endpoints are meant to be read by machines rather than humans, so the most important thing is to communicate clearly through status codes with a minimal payload; that's why, by default, they return only the status and no payload. When users want to see what is not healthy, the ?report=true query parameter returns a payload with messages regardless of the status code (both 200 and 503).
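The default-versus-report behaviour described above can be sketched without a JAX-RS runtime. This is a hedged illustration, not the merged code: `Result` is a hypothetical stand-in for a JAX-RS `Response`, the `report` argument mirrors the `?report=true` query parameter (in a real resource it would typically be bound with `@QueryParam("report")`), and the string list stands in for the KIE message report.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; Result is a stand-in for a JAX-RS Response.
public class HealthCheckResponseSketch {

    /** Status code plus optional report messages. */
    static final class Result {
        final int status;
        final List<String> messages; // empty unless a report was requested

        Result(int status, List<String> messages) {
            this.status = status;
            this.messages = messages;
        }
    }

    /**
     * healthy: outcome of all checks; messages: one line per check;
     * report: mirrors the ?report=true query parameter.
     */
    static Result healthcheck(boolean healthy, List<String> messages, boolean report) {
        int status = healthy ? 200 : 503;
        // The status code alone carries the result for machine clients; the
        // report body is added only on request, for both 200 and 503.
        List<String> body = report
                ? new ArrayList<String>(messages)
                : Collections.<String>emptyList();
        return new Result(status, body);
    }

    public static void main(String[] args) {
        List<String> checks = Collections.singletonList("Drools is alive");
        Result basic = healthcheck(true, checks, false);
        Result full = healthcheck(true, checks, true);
        System.out.println(basic.status + " " + basic.messages); // 200 []
        System.out.println(full.status + " " + full.messages);   // 200 [Drools is alive]
    }
}
```

Keeping the status independent of the report flag means a probe and a human reading the report always agree on whether the server is healthy.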

@mswiderski merged commit dccaa60 into kiegroup:master on Jan 17, 2018
@mswiderski deleted the JBPM-6740 branch on January 17, 2018 12:53
5 participants