New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Mongo health checks #16094
Improve Mongo health checks #16094
Conversation
cescoffier
commented
Mar 29, 2021
- use parallel composition to avoid cumulative timeouts
- allows the user to set the checked database
Hey @loicmathieu, can you have a look? It's about #15945. |
private final Map<String, MongoClient> clients = new HashMap<>(); | ||
private final Map<String, ReactiveMongoClient> reactiveClients = new HashMap<>(); | ||
|
||
@ConfigProperty(name = "quarkus.mongodb.health.database", defaultValue = "admin") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an hidden property as you didn't define it inside the MongoClient configuration object.
I'd rather add this config item to the MongoClient configuration object and add the default value there, this will also allow to add a documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, that's just a quick hack to be sure we agree on that.
I wanted to get a timeout too, but not sure which one to use? read timeout or heartbeat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read timeout seems to be right here.
But timeout on healthcheck is not straightforward as their could be multiple health checks (mongo, hibernate, vault) on the same application so maybe it should be done globally on health check implementation not check by check.
And if using some orchestration tool to call the health check, it can have a timeout at its side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple issues. Adding a reasonable timeout is a defensive approach to avoid thread starvation.
- You must not keep the thread block for more than what you expect to be a failure (right now it's waiting indefinitely and you may end up in a situation where it's really really long - high latency, corrupted data... yes the client should handle that... or not)
- While the infrastructure has a timeout, it's not the case everywhere - the dev UI has a different timeout (I need to check to be sure it has one actually)
- I believe SmallRye Health Check has a global timeout (@xstefank can you confirm?) - however, a blocked thread is a blocked thread and even if the check reports the failure, you may have a blocked thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I pushed the config change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that adding a timeout at this level is needed to catch all possible issue.
dce8a4b
to
9f3b198
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, having a timeout at this level is OK, as you said, to avoid having blocking thread.
@@ -183,4 +183,11 @@ | |||
*/ | |||
@ConfigDocSection | |||
public CredentialConfig credentials; | |||
|
|||
/** | |||
* The database used durong the readiness health checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The database used durong the readiness health checks | |
* The database used during the readiness health checks |
} | ||
} | ||
return builder.build(); | ||
} | ||
|
||
private class MongoHealthConfig implements Supplier<Uni<Tuple2<String, String>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not really a config object but a check right ?
Anyway, naming is hard, we can live with this one ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right! Let me change that!
The test failures seem relevant |
9f3b198
to
82863e7
Compare
@geoand yes, my bad, I fixed it. |
Awesome, all green! |
- use parallel composition to avoid cumulative timeouts - allows the user to set the checked database - only configure health checks if the health is on the classpath
82863e7
to
72fd868
Compare