Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Mongo health checks #16094

Merged
merged 1 commit into from Mar 31, 2021

Conversation

cescoffier
Copy link
Member

  • use parallel composition to avoid cumulative timeouts
  • allows the user to set the checked database

@cescoffier
Copy link
Member Author

Hey @loicmathieu, can you have a look? It's about #15945.

@cescoffier cescoffier linked an issue Mar 29, 2021 that may be closed by this pull request
private final Map<String, MongoClient> clients = new HashMap<>();
private final Map<String, ReactiveMongoClient> reactiveClients = new HashMap<>();

@ConfigProperty(name = "quarkus.mongodb.health.database", defaultValue = "admin")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an hidden property as you didn't define it inside the MongoClient configuration object.
I'd rather add this config item to the MongoClient configuration object and add the default value there, this will also allow to add a documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, that's just a quick hack to be sure we agree on that.

I wanted to get a timeout too, but not sure which one to use? read timeout or heartbeat?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read timeout seems to be right here.

But timeout on healthcheck is not straightforward as their could be multiple health checks (mongo, hibernate, vault) on the same application so maybe it should be done globally on health check implementation not check by check.
And if using some orchestration tool to call the health check, it can have a timeout at its side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple issues. Adding a reasonable timeout is a defensive approach to avoid thread starvation.

  • You must not keep the thread block for more than what you expect to be a failure (right now it's waiting indefinitely and you may end up in a situation where it's really really long - high latency, corrupted data... yes the client should handle that... or not)
  • While the infrastructure has a timeout, it's not the case everywhere - the dev UI has a different timeout (I need to check to be sure it has one actually)
  • I believe SmallRye Health Check has a global timeout (@xstefank can you confirm?) - however, a blocked thread is a blocked thread and even if the check reports the failure, you may have a blocked thread.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I pushed the config change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that adding a timeout at this level is needed to catch all possible issue.

@cescoffier cescoffier force-pushed the mongo-health-enhancement branch 2 times, most recently from dce8a4b to 9f3b198 Compare March 29, 2021 15:54
@cescoffier cescoffier marked this pull request as ready for review March 30, 2021 05:46
Copy link
Contributor

@loicmathieu loicmathieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, having a timeout at this level is OK, as you said, to avoid having blocking thread.

@@ -183,4 +183,11 @@
*/
@ConfigDocSection
public CredentialConfig credentials;

/**
* The database used durong the readiness health checks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The database used durong the readiness health checks
* The database used during the readiness health checks

}
}
return builder.build();
}

private class MongoHealthConfig implements Supplier<Uni<Tuple2<String, String>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really a config object but a check right ?
Anyway, naming is hard, we can live with this one ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right! Let me change that!

@geoand
Copy link
Contributor

geoand commented Mar 30, 2021

The test failures seem relevant

@cescoffier
Copy link
Member Author

@geoand yes, my bad, I fixed it.

@geoand
Copy link
Contributor

geoand commented Mar 30, 2021

Awesome, all green!

- use parallel composition to avoid cumulative timeouts
- allows the user to set the checked database
- only configure health checks if the health is on the classpath
@cescoffier cescoffier merged commit 65ab459 into quarkusio:main Mar 31, 2021
@quarkus-bot quarkus-bot bot added this to the 1.14 - main milestone Mar 31, 2021
@cescoffier cescoffier deleted the mongo-health-enhancement branch March 31, 2021 05:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MongoDB Health Check may be waiting forever
3 participants