fix(search): treat stale parentOf as warning during time-series reindex (#27417)#27800
fix(search): treat stale parentOf as warning during time-series reindex (#27417)#27800mohityadav766 wants to merge 18 commits intomainfrom
Conversation
…ex (#27417) Time-series records (testCaseResolutionStatus, testCaseResult, ...) whose parentOf entity_relationship row is missing surface as "does not have expected relationship parentOf to/from entity type ..." from EntityRepository.ensureSingleRelationship and were failing the entire reindex batch. Mirror the warning-vs-failure split already used for EntityInterface sources: extract isEntityNotFoundError into ReindexingUtil, broaden it to match the relationship-not-found message, and apply the same partitioning to PaginatedEntityTimeSeriesSource.read/readWithCursor/ readNextKeyset so orphaned rows are counted as warnings via result.getWarningsCount() instead of failing the job. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes time-series reindex batches failing on orphaned testCaseResolutionStatus records by reclassifying “stale relationship / missing entity” reader errors as warnings (instead of fatal failures), aligning behavior with existing entity reindexing.
Changes:
- Centralizes stale/missing-entity error matching in
ReindexingUtil.isEntityNotFoundErrorand addspartitionErrorshelper. - Updates
PaginatedEntityTimeSeriesSourceto filter stale relationship errors into warnings (settingwarningsCount) for offset and keyset reads. - Updates
PaginatedEntitiesSourceto use the shared matcher and adds focused regression tests for both the matcher and time-series reader behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/workflows/searchIndex/ReindexingUtil.java | Adds shared stale/missing-entity matcher and error partitioning helper for reindex readers. |
| openmetadata-service/src/main/java/org/openmetadata/service/workflows/searchIndex/PaginatedEntitiesSource.java | Reuses shared matcher (removes local duplicate) to keep stale relationship errors as warnings. |
| openmetadata-service/src/main/java/org/openmetadata/service/workflows/searchIndex/PaginatedEntityTimeSeriesSource.java | Filters stale relationship errors into warnings for time-series reads and tracks warning counts. |
| openmetadata-service/src/test/java/org/openmetadata/service/workflows/searchIndex/ReindexingUtilStaleRelationshipTest.java | Adds unit tests covering matcher behavior and partitioning. |
| openmetadata-service/src/test/java/org/openmetadata/service/workflows/searchIndex/PaginatedEntityTimeSeriesSourceStaleRelationshipTest.java | Adds tests ensuring stale relationship errors become warnings (not failures) for time-series reads. |
| public static List<EntityError> partitionErrors( | ||
| List<EntityError> errors, List<EntityError> warningsOut) { | ||
| if (CommonUtil.nullOrEmpty(errors)) { | ||
| return new ArrayList<>(); | ||
| } | ||
| List<EntityError> realErrors = new ArrayList<>(errors.size()); | ||
| for (EntityError error : errors) { | ||
| if (isEntityNotFoundError(error)) { | ||
| warningsOut.add(error); | ||
| } else { |
There was a problem hiding this comment.
partitionErrors assumes warningsOut is non-null and mutable; passing null (or an unmodifiable list) will throw at warningsOut.add(error). Since this is a public utility, consider either validating warningsOut (e.g., initialize when null / throw an explicit IllegalArgumentException) or documenting/annotating it as non-null to avoid surprising NPEs at runtime.
Address PR #27800 review: - Drop bare "not found" from isEntityNotFoundError; add "entity not found" so we still match EntityNotFoundException's "Entity not found:" form but no longer misclassify "Column 'foo' not found in result set" or "SSL certificate not found" as warnings. - Call updateStats(success, failed, warnings) in readWithCursor and readNextKeyset so the source's StepStats.warningRecords reflects warnings for cursor- and keyset-based reads (was already correct in read()). - partitionErrors: requireNonNull(warningsOut) and document the contract. - Tests: cover the new bare-"not found" exclusion, the "Entity not found:" inclusion, the readWithCursor stats propagation, and the null-warningsOut guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the review — addressed all three findings in 91bbdc1: 1. 2. Overly broad 3. Tests updated to cover all three: the new bare- |
🟡 Playwright Results — all passed (8 flaky)✅ 3990 passed · ❌ 0 failed · 🟡 8 flaky · ⏭️ 86 skipped
🟡 8 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
| // Always recreate | ||
| true, | ||
| Boolean.TRUE.equals(jobData.getAutoTune()), |
| LOG.debug( | ||
| "[PaginatedEntitiesSource] Batch Stats :- %n Submitted : {} Success: {} Failed: {}", | ||
| batchSize, result.getData().size(), result.getErrors().size()); | ||
| "[PaginatedEntityTimeSeriesSource] Batch Stats :- Submitted: {} Success: {} Failed: {} Warnings: {}", | ||
| batchSize, | ||
| result.getData().size(), | ||
| result.getErrors().size(), |
| int warningsCount = filterStaleRelationshipErrors(result); | ||
|
|
||
| if (!result.getErrors().isEmpty()) { | ||
| LOG.warn( | ||
| "[PaginatedEntityTimeSeriesSource] Real errors found: {}", result.getErrors().size()); |
| keysetCursor, | ||
| cachedTotalCount, | ||
| true, | ||
| Entity.getFields(entityType, fields)); | ||
| Entity.getOnlySupportedFields(entityType, fields)); | ||
|
|
| // Always recreate | ||
| true, |
This reverts commit 23578b0.
| LOG.warn( | ||
| "Skipping reader failure record for entityType={}: entityId is null, message={}", | ||
| entityType, | ||
| entityError.getMessage()); | ||
| continue; |
| String message = error.getMessage().toLowerCase(); | ||
| return message.contains("instance for") | ||
| || message.contains("entity not found") | ||
| || message.contains("does not exist") | ||
| || message.contains("entitynotfoundexception") |
Address two review concerns on PR #27800: 1. The stale-reference matcher missed EntityNotFoundException's primary factory messages — byId ("Entity with id [...] not found."), byName ("Entity with name [...] not found."), byVersion ("Entity with id [...] and version [...] not found."), and byParserSchema ("Parser schema not found ..."). These would have been classified as real failures instead of warnings. Add specific contains() patterns ("entity with id", "entity with name", "parser schema not found") that match each factory's exact prefix without reintroducing the over-broad bare "not found" check. byFilter ("Entity not found for query params [...]") was already covered by "entity not found". 2. PartitionWorker.recordReaderFailures emitted WARN per error when entityId was null. EntityTimeSeriesRepository builds EntityError with only a message (no entity reference), so every time-series error hit that branch — log spam under load. Downgrade to DEBUG with a comment explaining why the id is absent. Tests: ReindexingUtilStaleRelationshipTest now exercises every EntityNotFoundException factory message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Both concerns addressed in 0601641. 1. Missed
Specific patterns instead of bare 2. WARN log spam in |
…adoc Address remaining Copilot comments on PR #27800: 1. EntityTimeSeriesRepository.getResultList(...) returns ResultList with errors=null on the success path. PaginatedEntityTimeSeriesSource was reading result.getErrors().size() / .isEmpty() right after, which would NPE on the common no-error path. Normalize errors to an empty list inside filterStaleRelationshipErrors so callers can rely on non-null. 2. ReindexingUtil.isStaleReferenceError now lowercases with Locale.ROOT instead of the platform default — avoids Turkish-locale style edge cases where 'I' lowercases differently and substring matches fail. 3. PaginatedEntityTimeSeriesSource.read() was logging every real reader error message at WARN. For large failed batches this floods logs. Switch to a single WARN with the error count plus the first 5 message details at DEBUG (gated by isDebugEnabled). 4. OmAppJobListener.fillTerminalTimings Javadoc claimed "does nothing if endTime is already set" but the body still backfills executionTime in that case. Rewrite to accurately describe the per-field idempotency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed 4 more Copilot comments in eac76b6:
Skipping the 109/109 tests still pass. |
Code Review 👍 Approved with suggestions 2 resolved / 3 findingsRefactors search reindexing logic to classify stale parentOf relationship errors as warnings rather than failures, preventing batch interruptions. Consider updating the "Always recreate" comment to explain the reasoning behind the forced recreation. 💡 Quality: Comment explains "what" not "why" for forced recreateIndexThe comment Suggested fix✅ 2 resolved✅ Bug: readWithCursor and readNextKeyset never update warning stats
✅ Edge Case: isEntityNotFoundError matcher is overly broad for 'not found'
🤖 Prompt for agentsOptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
| /** | ||
| * Relationship/enrichment fields fetched by {@code EntityRepository.setFields} that every search | ||
| * document populates via {@link #populateCommonFields(Map, EntityInterface, String)}. Stored-JSON | ||
| * fields (name, displayName, description, service, entity-native counts) are NOT in this set — | ||
| * they live on the entity row and need no extra fetch. | ||
| */ | ||
| Set<String> COMMON_REINDEX_FIELDS = | ||
| Set.of( | ||
| "owners", | ||
| "domains", | ||
| "reviewers", | ||
| "followers", | ||
| "votes", | ||
| "extension", | ||
| "tags", | ||
| "certification", | ||
| "dataProducts"); |
| /** | ||
| * Returns true when an EntityError represents a stale reference — either a missing entity | ||
| * (canonical {@code EntityNotFoundException}) or a missing entity_relationship row (raised by | ||
| * {@code EntityRepository.ensureSingleRelationship} as "does not have expected relationship | ||
| * ..."). Both are expected during reindexing of long-lived records: e.g. a | ||
| * {@code testCaseResolutionStatus} migrated without a corresponding {@code parentOf} row, or | ||
| * an entity hard-deleted out-of-band leaving its relationship rows behind. Such records | ||
| * cannot be meaningfully indexed and are reported as warnings rather than failing the entire | ||
| * batch. | ||
| * | ||
| * <p>The patterns are deliberately specific so we do not misclassify unrelated errors that | ||
| * happen to contain {@code "not found"} (e.g. {@code "Column 'foo' not found in result set"} | ||
| * or {@code "SSL certificate not found"}). They cover every {@code EntityNotFoundException} | ||
| * factory message ({@code byId}, {@code byName}, {@code byFilter}, {@code byVersion}, | ||
| * {@code byParserSchema}) plus the legacy {@code CatalogExceptionMessage.entityNotFound} | ||
| * format and the relationship-not-found shape. | ||
| */ | ||
| public static boolean isStaleReferenceError(EntityError error) { |
|



Summary
testCaseResolutionStatusentities whoseparentOfentity_relationshiprow to atestCaseis missing.EntityRepository.ensureSingleRelationshipraises"... does not have expected relationship parentOf to/from entity type testCase"(viaCatalogExceptionMessage.entityRelationshipNotFound).PaginatedEntityTimeSeriesSourcewas returning these as failures, killing the batch. The orphans themselves originate from the 1.4.0migrateTestCaseResolutionmigration (MigrationUtil.java:80–124), which silently skips rows with a nulltestCaseReferenceor that hit any per-row exception — leaving the time-series row in place with noentity_relationshipPARENT_OF row.PaginatedEntitiesSourceand broadens it to also catch the relationship-not-found message.Changes
ReindexingUtil: extract sharedisEntityNotFoundErrormatcher (now matches"expected relationship"in addition to"not found","instance for","does not exist","entitynotfoundexception"); addpartitionErrorshelper.PaginatedEntitiesSource: replace local matcher copy with the shared helper. Same behavior for entities, plus the broader matcher.PaginatedEntityTimeSeriesSource: apply the warning-vs-failure split toread,readWithCursor, andreadNextKeyset. AddsfilterStaleRelationshipErrors, setsresult.setWarningsCount(...)(already consumed bySearchIndexExecutorandPartitionWorker), and adds a 3-argupdateStats(success, failed, warnings)overload to record warnings on the source'sStepStats.Why this is safe
warningRecordsand in DEBUG logs, so they remain observable.failedRecordsand still setlastFailedCursorso the failure cursor is preserved.warningRecordsalready exists ineventPublisherJob.jsonand is already populated byPaginatedEntitiesSource.Follow-up not in this PR
testCaseResolutionStatusrows (those whoseentity_relationshipparentOf row is missing AND whose JSONtestCaseReferenceis null) to actually shrink the orphan set in production. This PR just stops them from failing reindex.Test plan
ReindexingUtilStaleRelationshipTest— 5 tests covering matcher + partitioning, including the exactensureSingleRelationshipmessage formatPaginatedEntityTimeSeriesSourceStaleRelationshipTest— 4 tests coveringreadNextandreadWithCursorwith stale-only, mixed (stale + real), and exception cases; assertsresult.getWarningsCount(),result.getErrors(), and sourceStepStatsEntityReaderLifecycleTest(8),SearchIndexExecutorControlFlowTest(60),PartitionWorkerTest(30) all still passmvn spotless:applycleantestCaseResolutionStatus, confirmwarningRecords > 0,failedRecords == 0, and run completes🤖 Generated with Claude Code
Summary by Gitar
PaginatedEntityTimeSeriesSourcetoMAX_ERROR_DETAILS_LOGGEDto prevent log flooding.result.getErrors()infilterStaleRelationshipErrorsto ensure non-null list for downstream processing.ReindexingUtil.isStaleReferenceErrorto useLocale.ROOTfor case-insensitive string matching to ensure locale-stability.OmAppJobListener.fillTerminalTimingsto independently computeexecutionTimeifendTimeis already present.This will update automatically on new commits.