Add tier_preference and node.role columns to _cat/shards API #138405

pswaao88 · 2025-11-21T09:46:02Z

Description

This PR adds tier_preference and node.role columns to the _cat/shards API to facilitate troubleshooting of ILM allocation issues, as requested in #136895.

Implementation Details

- tier_preference (tp): Retrieves the index.routing.allocation.include._tier_preference setting from IndexMetadata.
  - Used Metadata#findIndex(Index) instead of the deprecated index(String) or getProject() methods to safely handle index lookup in the current architecture.
- node.role (r): Retrieves the node role abbreviation using DiscoveryNode#getRoleAbbreviationString(), ensuring consistency with the _cat/nodes API.
- Safe Access: Implemented using getOrNull to prevent NullPointerException when shards are unassigned or metadata is missing.
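
For readers unfamiliar with the null-safe accessor idea mentioned under Safe Access, here is a minimal, self-contained sketch. It is not the code from this PR: the `Node` record and the `NullSafeCells` class are hypothetical stand-ins, and the three-argument `getOrNull` shape is only assumed from the call visible in the review diff further down.

```java
import java.util.function.Function;

public class NullSafeCells {
    // Hypothetical stand-in for a node; NOT the Elasticsearch DiscoveryNode class.
    record Node(String roleAbbreviation) {}

    // Applies accessor, then func, and returns null as soon as any step yields null.
    // The signature is an assumption modelled on the getOrNull call shown in the diff.
    static <S, T> Object getOrNull(S source, Function<S, T> accessor, Function<T, Object> func) {
        if (source == null) {
            return null;
        }
        T value = accessor.apply(source);
        return value == null ? null : func.apply(value);
    }

    public static void main(String[] args) {
        Node assigned = new Node("dhs");
        Node unassigned = null; // e.g. a shard that has no node yet

        System.out.println(getOrNull(assigned, Node::roleAbbreviation, r -> r));   // prints dhs
        System.out.println(getOrNull(unassigned, Node::roleAbbreviation, r -> r)); // prints null
    }
}
```

The point of the helper is that the caller never dereferences a missing object; whether the resulting null is acceptable as a table cell is a separate question, which the review below touches on.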

Related Issues

Closes #136895

Note to Reviewers

This is my first contribution to Elasticsearch! 🚀
As a non-native English speaker and a first-time contributor, I apologize in advance if I missed any conventions or used awkward phrasing. If there is anything I overlooked or need to improve, please let me know, and I will address it immediately. Thank you for the opportunity to contribute.

cla-checker-service · 2025-11-21T09:46:07Z

💚 CLA has been signed

github-actions · 2025-11-21T09:57:59Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

elasticsearchmachine · 2025-11-21T11:20:35Z

Pinging @elastic/es-data-management (Team:Data Management)

szybia · 2025-11-21T11:34:31Z

hi @pswaao88, thank you for your interest in elasticsearch!

few preliminary things before reviewing:

- if you could refrain from git force-pushing/rebasing and use merging instead, if needed. force-pushing makes it more difficult to review and reason about things
- regarding testing: i've run the CI for you. if there are any failures, please delve into these and investigate them. but whether there are or aren't failures, i'd suggest we should be adding some tests here to test the changes you're making

pswaao88 · 2025-11-21T11:43:46Z

Hi @szybia,

Thank you very much for running the CI and for the clear feedback. As this is my first contribution to Elasticsearch, I wasn't fully aware of these processes. Thank you for the guidance!

Regarding the preliminary items:

Git History:
I sincerely apologize for the force-pushes; they were unfortunately necessary to correct the failed CLA signature and the Changelog filename. I fully understand the policy and will use standard merging going forward to ensure a clean history.

Testing:
I am currently reviewing the CI results, and based on the feedback, I will add the required unit and integration tests. Thank you again for pointing this out.

Thank you again for your time!

szybia

i'll run the CI for you again, but you'll still probably get a bunch of failures due to not adjusting the existing tests that assert the body of the response (have a scan through all the different failures in buildkite/CI)

helpful suggestion: ctrl-f for "reproduce with" once you expand a job that failed, and that will highlight all the tests that have failed within that job. run the gradle command locally, and then you can start figuring out why it failed and how to fix it

szybia · 2025-11-21T15:51:19Z

server/src/main/java/org/elasticsearch/rest/action/cat/RestShardsAction.java

            table.addCell(getOrNull(commonStats, CommonStats::getSparseVectorStats, SparseVectorStats::getValueCount));

+            table.addCell(
+                Optional.ofNullable(getOrNull(


without having a deeper look, it surprises me that we need a null check here when the other cells/fields above seem to be fine with null

mind helping me understand why this is needed? 🙏

pswaao88 · 2025-11-21

Hi @szybia,

Thank you for the guidance. As a university student who has recently started studying this field (and is new to contributing to complex systems), I view each CI failure as a valuable learning opportunity.

I've reviewed the failure logs and have made the following attempt to fix the issue:

Issue Identification: I found that the widespread CI failures were related to a NullPointerException (NPE) occurring during Backward Compatibility (BWC) tests.

Hypothesis: I suspect the issue is that the newly added String fields receive a raw null value from older nodes, causing the system to crash later.

Attempted Solution: To fix this, I applied the Optional.ofNullable().orElse("") pattern to ensure a safe String is returned instead of null.
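
As an illustration of the pattern named in the attempted solution, here is a minimal sketch of substituting an empty string for a null value with `Optional`. The `EmptyStringFallback` class and `cellValue` helper are hypothetical names for this example only; the sketch shows the mechanics and does not confirm that the diagnosis is right or that this is the project's preferred convention.

```java
import java.util.Optional;

public class EmptyStringFallback {
    // Hypothetical helper: converts a possibly-null cell value to a String,
    // substituting "" when the value is null.
    static String cellValue(Object maybeNull) {
        return Optional.ofNullable(maybeNull)
            .map(Object::toString)
            .orElse("");
    }

    public static void main(String[] args) {
        System.out.println("[" + cellValue("data_hot") + "]"); // prints [data_hot]
        System.out.println("[" + cellValue(null) + "]");       // prints []
    }
}
```

One trade-off to keep in mind: an empty string hides the difference between "no value" and "value is empty", which a plain null would preserve.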

I would be very grateful if you could confirm my understanding.

Could you please confirm if my diagnosis of the root cause and the need for a null check is fundamentally correct? Also, if my approach is missing the preferred project convention, could you kindly provide a small hint or gentle guidance on the correct direction? I want to ensure I'm adopting the best practices for future contributions. 🙏

Additionally, is it okay for me to manually trigger the tests by commenting 'buildkite test this' if they don't start automatically?

Thank you for your patience and guidance!

DaveCTurner · 2025-11-21

Hi @pswaao88, and thanks for your contribution here.

You should be running these tests locally as part of your development process - see these docs. To be fair ./gradlew check takes a while and mostly will be running tests that are not germane to your change, but the ones that you've seen fail in CI are the important ones and you should be re-running them yourself before asking for another CI run and code review. Looking at the recent failures that means you need to make sure that at least the following command completes successfully on your local machine first:

./gradlew :server:test :rest-api-spec:yamlRestTest :qa:smoke-test-multinode:yamlRestTest

> Additionally, is it okay for me to manually trigger the tests by commenting 'buildkite test this' if they don't start automatically?

Unfortunately no, for security reasons we can't allow external contributors to trigger their own test runs in CI. We have to check there's at least nothing obviously malicious in the changes we're about to test before running anything.

> I would be very grateful if you could confirm my understanding.

This (and more) will come out in the code review, once the tests are all passing and you've added some more tests to support your own change. Please bear in mind that our capacity for reviewing contributions like this is bounded, so please try not to exhaust your share of this capacity on relatively minor questions like this. I'd love it if we could welcome contributions of all levels but unfortunately we do not have the infinite time that this would require.

Please also carefully read this section of the contributing guide noting particularly (emphasis mine):

> We sometimes reject contributions due to the low quality of the submission since low-quality submissions tend to take unreasonable effort to review properly. Quality is rather subjective so it is hard to describe exactly how to avoid this, but there are some basic steps you can take to reduce the chances of rejection. Follow the guidelines listed above when preparing your changes. You should add tests that correspond with your changes, and your PR should pass affected test suites too. It makes it much easier to review if your code is formatted correctly and does not include unnecessary extra changes.

szybia · 2025-11-21T16:04:46Z

buildkite test this

pswaao88 · 2025-11-24T00:30:54Z

Hi @DaveCTurner and @szybia,

Thank you both for the guidance and patience. As a student new to contributing, I really appreciate your help in getting the process right.

Following @DaveCTurner's instructions, I have successfully run and passed the local tests (:server:test, :rest-api-spec:yamlRestTest, :qa:smoke-test-multinode:yamlRestTest) on my machine.

I have pushed the commits that resolve the BWC NPE (using Optional for safety) and adjusted the existing tests to account for the new columns. I would appreciate it if you could take a look.

Thanks!

elasticsearchmachine added the needs:triage, v9.3.0, and external-contributor labels · Nov 21, 2025

pswaao88 force-pushed the feature/136895-cat-shards-columns branch from d559f89 to 2bf3173 · November 21, 2025 09:56

github-actions bot deployed to docs-preview · November 21, 2025 09:56

pswaao88 force-pushed the feature/136895-cat-shards-columns branch from 2bf3173 to 49dcfbe · November 21, 2025 11:17

github-actions bot deployed to docs-preview · November 21, 2025 11:18

szybia added the :Data Management/CAT APIs label and removed the needs:triage label · Nov 21, 2025

elasticsearchmachine added the Team:Data Management label · Nov 21, 2025

Add tier_preference and node.role columns to _cat/shards API

fc1aa0c

pswaao88 force-pushed the feature/136895-cat-shards-columns branch from 49dcfbe to fc1aa0c · November 21, 2025 11:22

fix: resolve CI failure by handling null

93daa6b

szybia reviewed Nov 21, 2025

[CI] Auto commit changes from spotless

db16b70

pswaao88 added 2 commits November 23, 2025 17:22

Fix: Resolve BWC NPE, update tests, and fix changelog

96611d3

Merge branch 'feature/136895-cat-shards-columns' of https://github.com/pswaao88/elasticsearch into feature/136895-cat-shards-columns

e1bb7a2

Meta: Adjust changelog issue format to follow convention

558125f

szybia self-assigned this Nov 27, 2025