Skip to content

fix(delete): handle time-series children in bulk hard-delete cascade#28367

Merged
harshach merged 4 commits into
mainfrom
fix/bulk-hard-delete-time-series-children
May 24, 2026
Merged

fix(delete): handle time-series children in bulk hard-delete cascade#28367
harshach merged 4 commits into
mainfrom
fix/bulk-hard-delete-time-series-children

Conversation

@Khairajani
Copy link
Copy Markdown
Contributor

@Khairajani Khairajani commented May 22, 2026

Summary

bulkHardDeleteSubtree walks CONTAINS + PARENT_OF children at every level and calls Entity.getEntityRepository on each child type. Time-series entities (e.g. testCaseResolutionStatus, stored as TestCase --PARENT_OF--> testCaseResolutionStatus) are registered in ENTITY_TS_REPOSITORY_MAP, not ENTITY_REPOSITORY_MAP, so the cascade crashed with Entity repository for testCaseResolutionStatus not found. Is the ENTITY_TYPE_MAP initialized? the moment it reached a test case that had at least one resolution-status row.

This regressed in #27997, which rewrote the recursive hard-delete cascade. The pre-existing per-entity path was unaffected because TestCaseRepository.deleteChildren correctly routed the cleanup through getEntityTimeSeriesRepository.

Fix: dispatchToContainedChildren now takes an optional timeSeriesDispatcher. Hard-delete passes one that purges each row via EntityTimeSeriesRepository.deleteById(id, true); bulk restore / soft-delete pass null since time-series rows are immutable and have no deleted state.

Test plan

  • New TestCaseResourceIT#test_recursiveHardDeleteCascadesPastResolutionStatusChildren — fails before the fix with EntityRepositoryNotFound, passes after.
  • Existing TestCaseResourceIT#test_deleteTableDeletesTestCases still passes (no resolution-status rows on the test case).
  • Existing RestoreHierarchyIT (added in Fixes #4003: bulk + async restore for large entity hierarchies #27997) still passes — restore path was not behaviorally changed for non-time-series children, and time-series children are correctly skipped.

Summary by Gitar

  • Refactored core logic:
    • Simplified dispatchToContainedChildren by removing the timeSeriesDispatcher parameter as a cleaner alternative to explicit handling.
  • Updated deletion cascade:
    • Explicitly skip time-series entities in bulk operations to prevent EntityRepositoryNotFound errors and avoid table locking.
    • Delegated orphan time-series row cleanup to DataRetention app, consistent with handling for large-scale execution logs.
  • Enhanced integration test coverage:
    • Added test_recursiveHardDeleteCascadesPastAgentExecutionChildren in AIApplicationResourceIT.java to verify cascade stability.
    • Added testRecursiveHardDeleteCascadesPastMcpExecutionChildren in McpServerResourceIT.java to ensure proper handling of mcpExecution relationships.
  • AI Application API:
    • Exposed recursive flag in AIApplicationResource endpoints to enable recursive subtree deletion for AI entities.

This will update automatically on new commits.

bulkHardDeleteSubtree walks CONTAINS + PARENT_OF children at every level
and calls Entity.getEntityRepository on each child type, but time-series
entities (e.g. testCaseResolutionStatus, stored as TestCase --PARENT_OF-->
testCaseResolutionStatus) are registered in ENTITY_TS_REPOSITORY_MAP, not
ENTITY_REPOSITORY_MAP. Recursive hard-delete of any service that owned a
test case with at least one resolution-status row crashed with
"Entity repository for testCaseResolutionStatus not found".

Route time-series children through EntityTimeSeriesRepository.deleteById
for hard-delete; bulk restore / soft-delete skip them since time-series
rows are immutable and have no deleted state.

Regression introduced in #27997.
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 22, 2026

🟡 Playwright Results — all passed (14 flaky)

✅ 4241 passed · ❌ 0 failed · 🟡 14 flaky · ⏭️ 87 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
✅ Shard 2 803 0 0 8
🟡 Shard 3 796 0 4 8
🟡 Shard 4 839 0 2 12
✅ Shard 5 719 0 0 47
🟡 Shard 6 786 0 7 8
🟡 14 flaky test(s) (passed on retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/OntologyExplorerCardinality.spec.ts › edges for cardinality-typed relations appear in the graph edge data (shard 3, 1 retry)
  • Features/OntologyExplorerCardinality.spec.ts › stats reflect the cardinality-typed edges in the relation count (shard 3, 1 retry)
  • Features/OntologyExplorerE2E.spec.ts › toggling edge labels off and back on leaves the graph and cardinality map intact (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 2 retries)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Hyperlink (shard 4, 1 retry)
  • Features/AutoPilot.spec.ts › Agents created by AutoPilot should be deleted (shard 6, 1 retry)
  • Pages/Glossary.spec.ts › Column dropdown drag-and-drop functionality for Glossary Terms table (shard 6, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for apiEndpoint -> dashboard (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify Impact Analysis service filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Tier Add, Update and Remove (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

… DataRetention

PR review feedback: the previous fix routed time-series children through a per-row
tsRepo.deleteById loop. That's 2N DB round-trips per parent (getById + delete) and
under heavy ingestion the tables can hold millions of rows (active AIApplication
agentExecutions, MCP server executions, profile / queryCost time-series). A
synchronous bulk DELETE on those would lock the table for the duration of the
cascade and stall the user's delete request.

Skip time-series children entirely in the cascade. Restore and soft-delete already
have nothing to do (rows are immutable, no deleted state); hard-delete now leaves
them as orphans whose parent refs resolve to null via the existing
getFromEntityRef + shouldSkipSearchResultOnInheritedFieldError path. The
DataRetention app will sweep them on its regular cadence (TODO #28367).

Contract cleanup: dispatchToContainedChildren drops the timeSeriesDispatcher
parameter and the null callsites in bulkRestore / bulkSoftDelete. The skip
becomes a single inverted guard inside the existing loop. Net change vs main
is smaller than the original fix.

Adds two ITs that exercise the cascade across additional time-series parent
shapes — AIApplication --CONTAINS--> agentExecution and McpServer --CONTAINS-->
mcpExecution — alongside the existing TestCase --PARENT_OF-->
testCaseResolutionStatus regression test. All three assert the parent is
deleted, which is the regression-guard the original EntityRepositoryNotFound
crash would fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re AIApplication recursive flag

Two CI test failures from the prior commit (ec3c776) traced to two
separate gaps:

1. EntityRepository.deleteChildren(List, hardDelete, updatedBy) — the first-
   level entry point from the recursive DELETE API — was still calling
   Entity.getEntityRepository(childType) unconditionally and crashing with
   "Entity repository for mcpExecution not found" the moment the user
   hard-deleted an MCP server with execution rows. The dispatcher fix in
   dispatchToContainedChildren only covered nested recursion levels;
   the entry-level loop needed the same inverted guard.

2. AIApplicationResource.delete(id) and deleteByIdAsync(id) were missing
   the @QueryParam("recursive") annotation, hardcoding recursive=false at
   the dispatch site. The FQN-based delete had the annotation but ignored
   the parsed value (also hardcoded false on the call). Users could not
   hard-delete an AI Application with any agentExecution rows via the ID
   endpoint — the request was rejected upstream with "aiApplication is not
   empty". Mirror the McpServerResource pattern: add the annotation on
   both ID variants and thread the parsed value through all three
   delete dispatches.

Verified end-to-end against the new ITs locally: cascade now succeeds and
the parent entity is gone post-delete in both AIApplication and McpServer
recursive hard-delete paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@DefaultValue("false")
boolean hardDelete,
@Parameter(
description = "Recursively delete this entity and it's children. (Default `false`)")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Typo: "it's" should be "its" (possessive) in API docs

The @Parameter description uses "it's children" (contraction of "it is") instead of the correct possessive "its children". This appears in both the delete and deleteByIdAsync endpoint descriptions.

Was this helpful? React with 👍 / 👎

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 22, 2026

Code Review 👍 Approved with suggestions 1 resolved / 2 findings

Refactors the bulk hard-delete cascade to delegate time-series cleanup to the DataRetention app, resolving EntityRepositoryNotFound errors during recursive operations. Please correct the 'it's' to 'its' possessive typo within the AIApplication API documentation.

💡 Quality: Typo: "it's" should be "its" (possessive) in API docs

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/ai/AIApplicationResource.java:483 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/ai/AIApplicationResource.java:513

The @Parameter description uses "it's children" (contraction of "it is") instead of the correct possessive "its children". This appears in both the delete and deleteByIdAsync endpoint descriptions.

✅ 1 resolved
Performance: Time-series hard-delete loops one DB call per child ID

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:5984-5988
The hard-delete time-series dispatcher at line 5984-5988 iterates over childIds calling tsRepo.deleteById(childId, true) one at a time. Each call issues a getById SELECT followed by a deleteById DELETE — so 2 DB round-trips per child. For a test case with many resolution-status rows this becomes an N+1 (really 2N) problem.

Since EntityTimeSeriesRepository currently has no bulk-delete method this is an acceptable interim solution, but adding a batch DELETE … WHERE id IN (?) would be more efficient. This is a minor concern since the overall cascade already batches at the tree level and resolution-status rows per test case are typically few.

🤖 Prompt for agents
Code Review: Refactors the bulk hard-delete cascade to delegate time-series cleanup to the DataRetention app, resolving EntityRepositoryNotFound errors during recursive operations. Please correct the 'it's' to 'its' possessive typo within the AIApplication API documentation.

1. 💡 Quality: Typo: "it's" should be "its" (possessive) in API docs
   Files: openmetadata-service/src/main/java/org/openmetadata/service/resources/ai/AIApplicationResource.java:483, openmetadata-service/src/main/java/org/openmetadata/service/resources/ai/AIApplicationResource.java:513

   The `@Parameter` description uses "it's children" (contraction of "it is") instead of the correct possessive "its children". This appears in both the `delete` and `deleteByIdAsync` endpoint descriptions.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link
Copy Markdown

harshach added a commit that referenced this pull request May 24, 2026
#28384)

* feat(dataRetention): sweep orphan time-series rows for per-type tables

Adds `cleanOrphanedTimeSeriesRows()` to `DataRetention` alongside the existing
orphan-relationship and orphan-tag sweeps. Each of the five affected DAOs gets a
`deleteOrphanedRecords(int limit)` query (MySQL + PostgreSQL) that left-joins
to its parent and deletes rows the parent no longer covers:

- `testCaseResolutionStatus`: parent link via `entity_relationship` PARENT_OF
- `agentExecution`: `agentId` → `ai_application_entity.id`
- `mcpExecution`: `serverId` → `mcp_server_entity.id`
- `profile_data`: `entityFQNHash` → `table_entity.fqnHash`
- `query_cost_time_series`: `entityFQNHash` → `query_entity.fqnHash`

The sweep runs after `cleanOrphanedRelationshipsAndHierarchies()` so the
PARENT_OF check sees the post-cleanup `entity_relationship` state. Pairs with
PR #28367, where the bulk hard-delete cascade now skips time-series children
and relies on `DataRetention` to reclaim them out-of-band.

Adds `OrphanedTimeSeriesCleanupIT` covering all five per-type queries:
inserts a real-parent row and a bogus-parent row through the existing DAO
`insert(...)` paths, runs `deleteOrphanedRecords(BATCH)`, asserts the orphan
is gone and the valid row is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(dataRetention): fix orphan-cleanup IT name lengths and McpExecution table arg

- McpExecutionDAO.insertWithoutExtension uses `<table>` placeholder; the test
  was passing `null`, which made Jdbi fail to render the SQL template. Pass
  the literal table name `mcp_execution_entity`.
- `ns.prefix(...)` embeds class + method names, so chaining it through
  database -> schema -> table -> auto-created test_suite pushed the test_suite
  `name` column past its VARCHAR(256) bound. Use `ns.uniqueShortId()` for the
  hierarchy components and shorten the test method names so the resulting
  FQN stays well under the column limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review(dataRetention): bound profile-data orphan delete by rows + clarify PARENT_OF=9

Addresses two PR review findings:

- `profiler_data_time_series.deleteOrphanedRecords` previously LIMITed distinct
  `entityFQNHash` values, then deleted every row for each hash — a batch could
  delete tens of millions of rows if many orphan hashes each had thousands of
  rows. Switch to row-level limiting: single-table `DELETE ... WHERE NOT EXISTS
  (...) LIMIT N` on MySQL, and `ctid IN (SELECT ... LIMIT N)` on PostgreSQL
  (the table has no `id` column, so we use Postgres ctid for the inner subquery).
  This matches the row-count cap used by the other four orphan-cleanup queries.

- Annotate `er.relation = 9` in the testCaseResolutionStatus query with a
  `// 9 = Relationship.PARENT_OF` inline comment plus a leading block comment
  noting the ordinal is stable because the enum appends new values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@harshach harshach merged commit aebdf23 into main May 24, 2026
55 checks passed
@harshach harshach deleted the fix/bulk-hard-delete-time-series-children branch May 24, 2026 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants