-
Notifications
You must be signed in to change notification settings - Fork 24.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend repository_integrity
health indicator for unknown and invalid repos
#104614
Conversation
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @nielsbauman, I've created a changelog YAML for you. |
server/src/main/java/org/elasticsearch/health/node/LocalHealthMonitor.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/LocalHealthMonitor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @nielsbauman for picking this up. This is a great start, I focused my comments at 2 points that I found a bit more difficult to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @nielsbauman . I haven't finished the review but because you have been waiting long, I send this initial comments and I will continue tomorrow.
server/src/main/java/org/elasticsearch/health/node/HealthInfo.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/RepositoriesHealthInfo.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/UpdateHealthInfoCacheAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/UpdateHealthInfoCacheAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/UpdateHealthInfoCacheAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/UpdateHealthInfoCacheAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/UpdateHealthInfoCacheAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/UpdateHealthInfoCacheAction.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/snapshots/RepositoryIntegrityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/snapshots/RepositoryIntegrityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @nielsbauman great progress, I think the structure is already in place and now we are moving towards the small changes.
I moved further with the review; however I would like to ask you if you can revert the conversion of assertThat
to assertEquals
, since the deprecation is fixed and the code is already there it will greatly help the review process. Changing these assertions introduces extra noise in the PR and it's harder to find the actual changes.
Thank you for the great work!
In the new tests you write you can use the assertion you prefer, no problem there.
docs/reference/troubleshooting/snapshot/add-repository.asciidoc
Outdated
Show resolved
Hide resolved
docs/reference/troubleshooting/snapshot/add-repository.asciidoc
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/HealthInfoCache.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/LocalHealthMonitor.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/health/node/LocalHealthMonitor.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/health/node/tracker/RepositoriesHealthTrackerTests.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/health/node/FetchHealthInfoCacheActionTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/health/node/FetchHealthInfoCacheActionTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/health/node/FetchHealthInfoCacheActionTests.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/snapshots/RepositoryIntegrityHealthIndicatorService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/snapshots/RepositoryIntegrityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/snapshots/RepositoryIntegrityHealthIndicatorService.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/snapshots/RepositoryIntegrityHealthIndicatorService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, I think this is the last round :)
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, assuming the CI agrees too :) . Congrats on completing an important and complex feature @nielsbauman !!!
This page was split up in elastic#104614 but the `ReferenceDocs` symbol links to the top-level page still rather than the correct subpage. This fixes the link.
Since elastic#104614 the top-level repo troubleshooting page is just a short paragraph which talks about "this page" but in fact refers to information spread across a number of subsequent pages. It's not obvious to the reader that they need to use the navigation menu to get to the information they seek. Moreover we link to this page from an exception message today so there's a reasonable chance that users will find it when trying to troubleshoot a genuine problem. This commit rewords things slightly and adds links to the subsequent pages to the body of the page to avoid this confusion.
Since #104614 the top-level repo troubleshooting page is just a short paragraph which talks about "this page" but in fact refers to information spread across a number of subsequent pages. It's not obvious to the reader that they need to use the navigation menu to get to the information they seek. Moreover we link to this page from an exception message today so there's a reasonable chance that users will find it when trying to troubleshoot a genuine problem. This commit rewords things slightly and adds links to the subsequent pages to the body of the page to avoid this confusion.
This page was split up in elastic#104614 but the `ReferenceDocs` symbol links to the top-level page still rather than the correct subpage. This fixes the link.
Since elastic#104614 the top-level repo troubleshooting page is just a short paragraph which talks about "this page" but in fact refers to information spread across a number of subsequent pages. It's not obvious to the reader that they need to use the navigation menu to get to the information they seek. Moreover we link to this page from an exception message today so there's a reasonable chance that users will find it when trying to troubleshoot a genuine problem. This commit rewords things slightly and adds links to the subsequent pages to the body of the page to avoid this confusion.
Since elastic#104614 the top-level repo troubleshooting page is just a short paragraph which talks about "this page" but in fact refers to information spread across a number of subsequent pages. It's not obvious to the reader that they need to use the navigation menu to get to the information they seek. Moreover we link to this page from an exception message today so there's a reasonable chance that users will find it when trying to troubleshoot a genuine problem. This commit rewords things slightly and adds links to the subsequent pages to the body of the page to avoid this confusion.
Since #104614 the top-level repo troubleshooting page is just a short paragraph which talks about "this page" but in fact refers to information spread across a number of subsequent pages. It's not obvious to the reader that they need to use the navigation menu to get to the information they seek. Moreover we link to this page from an exception message today so there's a reasonable chance that users will find it when trying to troubleshoot a genuine problem. This commit rewords things slightly and adds links to the subsequent pages to the body of the page to avoid this confusion.
Since #104614 the top-level repo troubleshooting page is just a short paragraph which talks about "this page" but in fact refers to information spread across a number of subsequent pages. It's not obvious to the reader that they need to use the navigation menu to get to the information they seek. Moreover we link to this page from an exception message today so there's a reasonable chance that users will find it when trying to troubleshoot a genuine problem. This commit rewords things slightly and adds links to the subsequent pages to the body of the page to avoid this confusion.
The main objective of this PR is to address the linked issue below. The main changes in this PR include:
LocalHealthMonitor
: to allow for a more "scalable" setup, I added the abstract classHealthCheck<T>
, which represents a health check that will be performed on every node, and madeLocalHealthMonitor
work with a list ofHealthCheck
s. This reduces future effort to add new health checks.DiskCheck
to a separate class/file.RepositoriesCheck
for this new feature.RepositoryIntegrityHealthIndicatorService
: refactored this class to do a more extensive analysis of the repository health in the cluster.TransportVersion
, constructor updates, etc.Fixes #103784