Skip to content

fix(search): scope alias lookups to cluster prefix on shared OpenSearch clusters#27466

Merged
pmbrull merged 6 commits intomainfrom
claude/wizardly-lamarr-1e1ca0
Apr 17, 2026
Merged

fix(search): scope alias lookups to cluster prefix on shared OpenSearch clusters#27466
pmbrull merged 6 commits intomainfrom
claude/wizardly-lamarr-1e1ca0

Conversation

@mohityadav766
Copy link
Copy Markdown
Member

Summary

  • Fixes 403 Forbidden on shared OpenSearch/Elasticsearch clusters where tenant roles only grant indices:admin/aliases/get on their own prefix. The orphaned index cleanup and metrics refresh were issuing unscoped GET /*/_alias and stats("*") calls that the tenant role is not allowed to perform.
  • Routes listIndicesByPrefix and getAllIndexStats through a shared buildScopedPattern() in both OpenSearchIndexManager and ElasticSearchIndexManager. When the caller passes an empty/null prefix and a clusterAlias is configured, the request is scoped to {clusterAlias}_* so each deployment only reads its own indices. Explicit non-empty prefixes (already cluster-qualified by their callers) are left untouched; deployments with no cluster alias keep the existing * behavior.

Context

Reported in open-metadata/openmetadata-collate#3557:

security_exception: no permissions for [indices:admin/aliases/get]
URI [/*/_alias], status [403]

The unscoped calls come from:

  • OrphanedIndexCleaner.findAllRebuildIndices()listIndicesByPrefix("")
  • SearchIndexMetrics.countTotalIndices()listIndicesByPrefix("")
  • getAllIndexStats() → hard-coded stats("*")

Both managers already held a clusterAlias field (from ElasticSearchConfiguration.getClusterAlias()); this PR simply uses it for the two pattern builders that were ignoring it. Callers that already pass cluster-qualified prefixes (DefaultRecreateHandler, OpenMetadataOperations, internal addAliasesInternal) are unaffected.

Test plan

  • mvn test -Dtest='OpenSearchIndexManagerTest,ElasticSearchIndexManagerTest,OrphanedIndexCleanerTest,SearchIndexMetricsTest' → 129/129 pass
  • Added testListIndicesByPrefix_EmptyPrefixScopesToClusterAlias and testListIndicesByPrefix_EmptyPrefixWithoutClusterAliasUsesWildcard for both ES and OS managers, asserting the emitted GetAliasRequest index pattern via ArgumentCaptor
  • mvn spotless:apply — no formatting drift
  • Verify in a shared-OpenSearch environment that orphaned index cleanup no longer produces 403 and only touches {clusterAlias}_* indices

🤖 Generated with Claude Code

On shared OpenSearch/Elasticsearch clusters where tenant roles only
grant indices:admin/aliases/get on their own prefix, the orphaned
index cleanup and metrics refresh were failing with 403 Forbidden
because listIndicesByPrefix("") and getAllIndexStats() issued
unscoped GET /*/_alias and stats("*") requests.

Route both through a shared buildScopedPattern() that substitutes
{clusterAlias}_* when the caller passes an empty prefix and a
cluster alias is configured, so each deployment only reads its own
indices. Explicit non-empty prefixes are already cluster-qualified
by their callers and are left untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 09:42
@mohityadav766 mohityadav766 self-assigned this Apr 17, 2026
@github-actions github-actions bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 17, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes permission failures on shared OpenSearch/Elasticsearch clusters by scoping previously-unbounded alias and stats lookups to the deployment’s configured clusterAlias prefix, preventing GET /*/_alias and stats("*") from hitting indices outside the tenant’s allowed namespace.

Changes:

  • Updated listIndicesByPrefix to use a cluster-scoped pattern ({clusterAlias}_*) when called with an empty/null prefix and a clusterAlias is configured.
  • Updated getAllIndexStats to use the same scoped pattern instead of hard-coded "*".
  • Added unit tests (ES + OS) asserting the emitted alias lookup pattern for empty/null prefix with/without clusterAlias.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchIndexManager.java Introduces scoped pattern builder and applies it to alias listing + index stats retrieval.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java Mirrors scoped pattern behavior for Elasticsearch alias listing + index stats retrieval.
openmetadata-service/src/test/java/org/openmetadata/service/search/opensearch/OpenSearchIndexManagerTest.java Adds coverage to validate scoping behavior for empty/null prefixes.
openmetadata-service/src/test/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManagerTest.java Adds coverage to validate scoping behavior for empty/null prefixes.

mohityadav766 and others added 2 commits April 17, 2026 15:17
Verifies that on a shared search cluster where the app is configured
with clusterAlias="openmetadata", the orphaned-index cleanup and
index-listing paths only read / touch indices matching
{clusterAlias}_*.

The test provisions a "foreign tenant" directly against the real
OpenSearch/Elasticsearch container by creating indices under a
different prefix (foreigntenant_it_orphans_*), then asserts:

  1. listIndicesByPrefix("") never returns foreign-prefixed indices
  2. getAllIndexStats() never returns foreign-prefixed indices
  3. OrphanedIndexCleaner.cleanupOrphanedIndices() only deletes
     orphans under the configured cluster prefix, leaving foreign
     tenant indices (both orphaned and live) intact

Security plugin is disabled in the IT bootstrap, so the exact 403
cannot be reproduced — but the behavioral guarantee that prevents
it (never issuing unscoped GET /*/_alias) is verified here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tern

Centralize the cluster-prefix separator so the scoped wildcard is
built from the same constant used by getIndexName() / getAlias().

Addresses review feedback on #27466.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 09:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

- Use IndexMapping.INDEX_NAME_SEPARATOR in unit test assertions for
  parity with production code.
- Rewrite the IT's cleanup test as a read-only discovery test via
  findOrphanedRebuildIndices(). cleanupOrphanedIndices() is a
  globally-scoped destructive op that could race with parallel ITs
  creating _rebuild_ indices under the same shared openmetadata_*
  namespace. Discovery-scope is the invariant that produces the 403
  prevention; per-index deletion is already covered by unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 11:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

@BeforeAll was calling createIndex() directly, which returns 400 if
the index already exists from a prior failed run (or a re-run against
a reused container). Delete first, then create, so setUp is safe to
rerun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Apr 17, 2026

Code Review ✅ Approved

Scopes alias lookups to the cluster prefix on shared OpenSearch clusters to improve multi-tenancy isolation. No issues found.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown
Contributor

🔴 Playwright Results — 1 failure(s), 17 flaky

✅ 3667 passed · ❌ 1 failed · 🟡 17 flaky · ⏭️ 89 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 478 0 2 4
🟡 Shard 2 648 0 3 7
🔴 Shard 3 656 1 3 1
🟡 Shard 4 632 0 2 27
🟡 Shard 5 610 0 1 42
🟡 Shard 6 643 0 6 8

Genuine Failures (failed on all attempts)

Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3)
Error: Waiting for data product "PW Data Product dfaa77fb" to appear after save

�[2mexpect(�[22m�[31mreceived�[39m�[2m).�[22mtoBe�[2m(�[22m�[32mexpected�[39m�[2m) // Object.is equality�[22m

Expected: �[32mtrue�[39m
Received: �[31mfalse�[39m

Call Log:
- Timeout 60000ms exceeded while waiting on the predicate
🟡 17 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › Dashboard - customization should work (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/ChangeSummaryBadge.spec.ts › AI badge should NOT appear for manually-edited descriptions (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should inherit reviewers from glossary when term is created (shard 2, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 2 retries)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain with tags and glossary terms preserves associations (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › Glossary Term Add, Update and Remove (shard 5, 1 retry)
  • Pages/InputOutputPorts.spec.ts › Lineage controls work (shard 6, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Container (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Tag.spec.ts › Verify Owner Add Delete (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Directory (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@pmbrull pmbrull merged commit 958b33f into main Apr 17, 2026
53 of 55 checks passed
@pmbrull pmbrull deleted the claude/wizardly-lamarr-1e1ca0 branch April 17, 2026 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants