fix(delete): handle time-series children in bulk hard-delete cascade #28367
bulkHardDeleteSubtree walks CONTAINS + PARENT_OF children at every level and calls Entity.getEntityRepository on each child type, but time-series entities (e.g. testCaseResolutionStatus, stored as TestCase --PARENT_OF--> testCaseResolutionStatus) are registered in ENTITY_TS_REPOSITORY_MAP, not ENTITY_REPOSITORY_MAP. Recursive hard-delete of any service that owned a test case with at least one resolution-status row crashed with "Entity repository for testCaseResolutionStatus not found". Route time-series children through EntityTimeSeriesRepository.deleteById for hard-delete; bulk restore / soft-delete skip them since time-series rows are immutable and have no deleted state. Regression introduced in #27997.
🟡 Playwright Results: all passed (14 flaky)
✅ 4241 passed · ❌ 0 failed · 🟡 14 flaky (passed on retry) · ⏭️ 87 skipped
How to debug locally:
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
… DataRetention

PR review feedback: the previous fix routed time-series children through a per-row tsRepo.deleteById loop. That's 2N DB round-trips per parent (getById + delete), and under heavy ingestion the tables can hold millions of rows (active AIApplication agentExecutions, MCP server executions, profile / queryCost time-series). A synchronous bulk DELETE on those would lock the table for the duration of the cascade and stall the user's delete request.

Skip time-series children entirely in the cascade. Restore and soft-delete already have nothing to do (rows are immutable, no deleted state); hard-delete now leaves them as orphans whose parent refs resolve to null via the existing getFromEntityRef + shouldSkipSearchResultOnInheritedFieldError path. The DataRetention app will sweep them on its regular cadence (TODO #28367).

Contract cleanup: dispatchToContainedChildren drops the timeSeriesDispatcher parameter and the null callsites in bulkRestore / bulkSoftDelete. The skip becomes a single inverted guard inside the existing loop. Net change vs main is smaller than the original fix.

Adds two ITs that exercise the cascade across additional time-series parent shapes (AIApplication --CONTAINS--> agentExecution and McpServer --CONTAINS--> mcpExecution) alongside the existing TestCase --PARENT_OF--> testCaseResolutionStatus regression test. All three assert the parent is deleted, which is the regression-guard the original EntityRepositoryNotFound crash would fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
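The skip described in this commit can be pictured with a small sketch. It is an illustration only, not the OpenMetadata implementation: the class `CascadeGuardSketch`, the two string sets standing in for `ENTITY_REPOSITORY_MAP` and `ENTITY_TS_REPOSITORY_MAP`, and the simplified `dispatchToContainedChildren` signature are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CascadeGuardSketch {
  // Hypothetical stand-ins for the two registries named in the PR.
  static final Set<String> ENTITY_REPOSITORIES = Set.of("table", "testCase", "mcpServer");
  static final Set<String> TIME_SERIES_REPOSITORIES =
      Set.of("testCaseResolutionStatus", "agentExecution", "mcpExecution");

  // Mimics the lookup that crashed: only regular entity types resolve.
  public static String getEntityRepository(String entityType) {
    if (!ENTITY_REPOSITORIES.contains(entityType)) {
      throw new IllegalStateException("Entity repository for " + entityType + " not found");
    }
    return entityType + "Repository";
  }

  // Returns the repositories the cascade would dispatch to, in order.
  public static List<String> dispatchToContainedChildren(List<String> childTypes) {
    List<String> dispatched = new ArrayList<>();
    for (String childType : childTypes) {
      // Inverted guard: only non-time-series children reach the entity lookup.
      // Time-series rows are left behind as orphans for a later retention sweep.
      if (!TIME_SERIES_REPOSITORIES.contains(childType)) {
        dispatched.add(getEntityRepository(childType));
      }
    }
    return dispatched;
  }

  public static void main(String[] args) {
    System.out.println(
        dispatchToContainedChildren(List.of("testCase", "testCaseResolutionStatus", "table")));
  }
}
```

In this sketch the guard sits in front of the only call that can throw, so a time-series child type never reaches the entity-repository lookup.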
…re AIApplication recursive flag

Two CI test failures from the prior commit (ec3c776) traced to two separate gaps:

1. EntityRepository.deleteChildren(List, hardDelete, updatedBy), the first-level entry point from the recursive DELETE API, was still calling Entity.getEntityRepository(childType) unconditionally and crashing with "Entity repository for mcpExecution not found" the moment the user hard-deleted an MCP server with execution rows. The dispatcher fix in dispatchToContainedChildren only covered nested recursion levels; the entry-level loop needed the same inverted guard.

2. AIApplicationResource.delete(id) and deleteByIdAsync(id) were missing the @QueryParam("recursive") annotation, hardcoding recursive=false at the dispatch site. The FQN-based delete had the annotation but ignored the parsed value (also hardcoded false on the call). Users could not hard-delete an AI Application with any agentExecution rows via the ID endpoint: the request was rejected upstream with "aiApplication is not empty". Mirror the McpServerResource pattern: add the annotation on both ID variants and thread the parsed value through all three delete dispatches.

Verified end-to-end against the new ITs locally: cascade now succeeds and the parent entity is gone post-delete in both AIApplication and McpServer recursive hard-delete paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
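The second gap, a parsed `recursive` value that never reaches the delete call, can be sketched in plain Java. This is a hedged illustration rather than the resource code: `RecursiveFlagSketch`, `deleteEntity`, `deleteByIdBefore`, and `deleteByIdAfter` are hypothetical names, the in-memory map replaces the real repository, and the JAX-RS annotations are omitted so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecursiveFlagSketch {
  // Hypothetical repository-level delete: a parent with children is rejected
  // unless the caller asked for a recursive delete.
  public static void deleteEntity(
      Map<String, List<String>> childrenByParent, String id, boolean recursive) {
    List<String> children = childrenByParent.getOrDefault(id, List.of());
    if (!children.isEmpty() && !recursive) {
      throw new IllegalArgumentException("aiApplication is not empty");
    }
    childrenByParent.remove(id);
  }

  // Before the fix (as described above): the flag is hardcoded to false at the
  // dispatch site, so the caller's value is ignored.
  public static void deleteByIdBefore(
      Map<String, List<String>> store, String id, boolean recursiveParam) {
    deleteEntity(store, id, false);
  }

  // After the fix: the parsed value is threaded through to the delete call.
  public static void deleteByIdAfter(
      Map<String, List<String>> store, String id, boolean recursiveParam) {
    deleteEntity(store, id, recursiveParam);
  }

  public static void main(String[] args) {
    Map<String, List<String>> store = new HashMap<>();
    store.put("app-1", List.of("exec-1"));
    deleteByIdAfter(store, "app-1", true);
    System.out.println("app-1 still present: " + store.containsKey("app-1"));
  }
}
```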
| @DefaultValue("false") | ||
| boolean hardDelete, | ||
| @Parameter( | ||
| description = "Recursively delete this entity and it's children. (Default `false`)") |
💡 Quality: Typo: "it's" should be "its" (possessive) in API docs
The @Parameter description uses "it's children" (contraction of "it is") instead of the correct possessive "its children". This appears in both the delete and deleteByIdAsync endpoint descriptions.
Code Review 👍 Approved with suggestions · 1 resolved / 2 findings
Refactors the bulk hard-delete cascade to delegate time-series cleanup to the DataRetention app, resolving EntityRepositoryNotFound errors during recursive operations. Please correct the 'it's' to 'its' possessive typo within the AIApplication API documentation.
💡 Quality: Typo: "it's" should be "its" (possessive) in API docs
📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/ai/AIApplicationResource.java:483
📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/ai/AIApplicationResource.java:513
✅ 1 resolved
✅ Performance: Time-series hard-delete loops one DB call per child ID
#28384)

* feat(dataRetention): sweep orphan time-series rows for per-type tables

Adds `cleanOrphanedTimeSeriesRows()` to `DataRetention` alongside the existing orphan-relationship and orphan-tag sweeps. Each of the five affected DAOs gets a `deleteOrphanedRecords(int limit)` query (MySQL + PostgreSQL) that left-joins to its parent and deletes rows the parent no longer covers:

- `testCaseResolutionStatus`: parent link via `entity_relationship` PARENT_OF
- `agentExecution`: `agentId` → `ai_application_entity.id`
- `mcpExecution`: `serverId` → `mcp_server_entity.id`
- `profile_data`: `entityFQNHash` → `table_entity.fqnHash`
- `query_cost_time_series`: `entityFQNHash` → `query_entity.fqnHash`

The sweep runs after `cleanOrphanedRelationshipsAndHierarchies()` so the PARENT_OF check sees the post-cleanup `entity_relationship` state. Pairs with PR #28367, where the bulk hard-delete cascade now skips time-series children and relies on `DataRetention` to reclaim them out-of-band.

Adds `OrphanedTimeSeriesCleanupIT` covering all five per-type queries: inserts a real-parent row and a bogus-parent row through the existing DAO `insert(...)` paths, runs `deleteOrphanedRecords(BATCH)`, asserts the orphan is gone and the valid row is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(dataRetention): fix orphan-cleanup IT name lengths and McpExecution table arg

- McpExecutionDAO.insertWithoutExtension uses `<table>` placeholder; the test was passing `null`, which made Jdbi fail to render the SQL template. Pass the literal table name `mcp_execution_entity`.
- `ns.prefix(...)` embeds class + method names, so chaining it through database -> schema -> table -> auto-created test_suite pushed the test_suite `name` column past its VARCHAR(256) bound. Use `ns.uniqueShortId()` for the hierarchy components and shorten the test method names so the resulting FQN stays well under the column limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review(dataRetention): bound profile-data orphan delete by rows + clarify PARENT_OF=9

Addresses two PR review findings:

- `profiler_data_time_series.deleteOrphanedRecords` previously LIMITed distinct `entityFQNHash` values, then deleted every row for each hash, so a batch could delete tens of millions of rows if many orphan hashes each had thousands of rows. Switch to row-level limiting: single-table `DELETE ... WHERE NOT EXISTS (...) LIMIT N` on MySQL, and `ctid IN (SELECT ... LIMIT N)` on PostgreSQL (the table has no `id` column, so we use Postgres ctid for the inner subquery). This matches the row-count cap used by the other four orphan-cleanup queries.
- Annotate `er.relation = 9` in the testCaseResolutionStatus query with a `// 9 = Relationship.PARENT_OF` inline comment plus a leading block comment noting the ordinal is stable because the enum appends new values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
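The row-capped `deleteOrphanedRecords(int limit)` contract described in these commits can be modelled in memory. The sketch below rests on stated assumptions: `OrphanSweepSketch`, the map of row id to parent id, and the `sweep` loop that repeats until a batch comes back short are illustrative choices. The commit messages only say that each DAO exposes a limited delete, not how `DataRetention` drives it.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class OrphanSweepSketch {
  // Deletes at most `limit` rows whose parent id is not in liveParents.
  // Returns the number of rows deleted, mirroring a row-level LIMIT.
  public static int deleteOrphanedRecords(
      Map<String, String> parentByRowId, Set<String> liveParents, int limit) {
    int deleted = 0;
    Iterator<Map.Entry<String, String>> it = parentByRowId.entrySet().iterator();
    while (deleted < limit && it.hasNext()) {
      if (!liveParents.contains(it.next().getValue())) {
        it.remove();
        deleted++;
      }
    }
    return deleted;
  }

  // Assumed driver loop: keep issuing bounded batches until one is not full.
  public static int sweep(
      Map<String, String> parentByRowId, Set<String> liveParents, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int total = 0;
    int deleted;
    do {
      deleted = deleteOrphanedRecords(parentByRowId, liveParents, batchSize);
      total += deleted;
    } while (deleted == batchSize);
    return total;
  }

  public static void main(String[] args) {
    Map<String, String> rows = new LinkedHashMap<>();
    rows.put("r1", "p1");
    rows.put("r2", "gone");
    rows.put("r3", "gone");
    System.out.println("deleted: " + sweep(rows, Set.of("p1"), 2));
    System.out.println("remaining: " + rows.keySet());
  }
}
```

Capping by rows rather than by distinct parent keys keeps the size of each batch predictable, which is the concern the third commit above addresses.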



Summary
`bulkHardDeleteSubtree` walks CONTAINS + PARENT_OF children at every level and calls `Entity.getEntityRepository` on each child type. Time-series entities (e.g. `testCaseResolutionStatus`, stored as `TestCase --PARENT_OF--> testCaseResolutionStatus`) are registered in `ENTITY_TS_REPOSITORY_MAP`, not `ENTITY_REPOSITORY_MAP`, so the cascade crashed with `Entity repository for testCaseResolutionStatus not found. Is the ENTITY_TYPE_MAP initialized?` the moment it reached a test case that had at least one resolution-status row.

This regressed in #27997, which rewrote the recursive hard-delete cascade. The pre-existing per-entity path was unaffected because `TestCaseRepository.deleteChildren` correctly routed the cleanup through `getEntityTimeSeriesRepository`.

Fix:
`dispatchToContainedChildren` now takes an optional `timeSeriesDispatcher`. Hard-delete passes one that purges each row via `EntityTimeSeriesRepository.deleteById(id, true)`; bulk restore / soft-delete pass `null` since time-series rows are immutable and have no deleted state.

Test plan
- `TestCaseResourceIT#test_recursiveHardDeleteCascadesPastResolutionStatusChildren`: fails before the fix with `EntityRepositoryNotFound`, passes after.
- `TestCaseResourceIT#test_deleteTableDeletesTestCases` still passes (no resolution-status rows on the test case).
- `RestoreHierarchyIT` (added in Fixes #4003: bulk + async restore for large entity hierarchies #27997) still passes: the restore path was not behaviorally changed for non-time-series children, and time-series children are correctly skipped.
Summary by Gitar
- … `dispatchToContainedChildren` by removing the `timeSeriesDispatcher` parameter as a cleaner alternative to explicit handling.
- … `EntityRepositoryNotFound` errors and avoid table locking.
- … `DataRetention` app, consistent with handling for large-scale execution logs.
- … `test_recursiveHardDeleteCascadesPastAgentExecutionChildren` in `AIApplicationResourceIT.java` to verify cascade stability.
- … `testRecursiveHardDeleteCascadesPastMcpExecutionChildren` in `McpServerResourceIT.java` to ensure proper handling of `mcpExecution` relationships.
- … `recursive` flag in `AIApplicationResource` endpoints to enable recursive subtree deletion for AI entities.