Conversation

fix(csv): correct entity type in recursive import extension validation and fix row-count accounting
Fixes two bugs in recursive CSV import (PUT /api/v1/services/databaseServices/name/{svc}/import?recursive=true):
1. **Wrong entityType in extension validation**: All rows in a recursive import were validated
against the top-level entityType (e.g. "database"), so custom properties registered on "table"
triggered false "Unknown custom field" errors. Added `rowEntityType` field on EntityCsv set
per-row in DatabaseServiceRepository, DatabaseRepository, and DatabaseSchemaRepository.
2. **Row-count accounting bugs**:
- Header row was counted as a processed/passed row
- `getRecordNumber()` is 1-indexed including the header, so subtract 1 from all processed counts
- Multiple field failures on one row incremented `numberOfRowsFailed` once per field; added
`countedFailureRecords` Set to deduplicate per-row failure counting
Fixes: open-metadata/openmetadata-collate#3744
- Update EntityCsvTest assertSummary counts and queuePendingTableUpdate to match corrected behavior
- Add unit tests: header not counted, multi-field dedup, rowEntityType override
- Add IT test in DatabaseServiceResourceIT for recursive import with custom property extension
- Fix GlossaryResourceIT and TestCaseResourceIT expected counts
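The per-row failure dedup described in bug 2 can be illustrated with a minimal sketch (hypothetical class and method names; the real change lives inside `EntityCsv`):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the per-row failure dedup: multiple field errors on the same
// CSV record should increment the failed-row counter only once.
public class FailureCounter {
  private final Set<Long> countedFailureRecords = new HashSet<>();
  private int numberOfRowsFailed = 0;

  // Called once per failing field; recordNumber identifies the CSV row.
  public void recordFieldFailure(long recordNumber) {
    // Set.add returns true only the first time this record number is seen
    if (countedFailureRecords.add(recordNumber)) {
      numberOfRowsFailed++;
    }
  }

  public int getNumberOfRowsFailed() {
    return numberOfRowsFailed;
  }
}
```

With this shape, three field failures spread over two rows count as two failed rows, matching the corrected accounting.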
Pull request overview
This PR fixes CSV import correctness for recursive entity imports by (1) validating extension custom properties against the per-row entityType (instead of the top-level CSV entity type) and (2) correcting row-count accounting to exclude the header row and deduplicate per-row failure counting.
Changes:
- Add per-row entity typing for extension validation via `rowEntityType` + `currentEntityType()` and set it in recursive CSV importers.
- Fix row counts: exclude header from processed/passed, apply `getRecordNumber() - 1`, and dedupe failure counting per record.
- Update/add unit + integration tests to match and cover the corrected behavior.
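Loosely, the per-row type override works like this sketch (hypothetical class mirroring the described `rowEntityType` + `currentEntityType()` change, not the actual EntityCsv source):

```java
// Sketch of the per-row entity-type override: extension custom-property
// validation resolves the row's type when a recursive importer has set one,
// otherwise it falls back to the CSV's top-level entity type.
public class EntityTypeResolver {
  private final String entityType; // top-level type, e.g. "database"
  private String rowEntityType;    // set per row by recursive importers

  public EntityTypeResolver(String entityType) {
    this.entityType = entityType;
  }

  public void setRowEntityType(String rowEntityType) {
    this.rowEntityType = rowEntityType;
  }

  // Mirrors the described currentEntityType() helper.
  public String currentEntityType() {
    return rowEntityType != null ? rowEntityType : entityType;
  }
}
```

A recursive importer would call `setRowEntityType("table")` before dispatching a table row, so custom properties registered on `table` validate against the right type.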
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java | Core fixes: per-row entity type for extension validation; corrected processed/passed/failed accounting and failure dedup. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseServiceRepository.java | Set rowEntityType per recursive row before dispatching entity-specific creation. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseRepository.java | Set rowEntityType per recursive row for schema/table/SP/column import dispatch. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseSchemaRepository.java | Set rowEntityType per recursive row for table/SP/column import dispatch. |
| openmetadata-service/src/test/java/org/openmetadata/csv/EntityCsvTest.java | Update expected counts and add unit coverage for header exclusion, deduped failures, and row entity type override. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DatabaseServiceResourceIT.java | Add IT to validate recursive import custom-property extension behavior + deduped failure counting. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/GlossaryResourceIT.java | Update expected processed/passed counts to exclude header row. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TestCaseResourceIT.java | Update expected processed counts to exclude header row. |
```java
CSVRecord csvRecord = getNextRecord(printer, csvRecords);

// Get entityType and fullyQualifiedName if provided
String entityType = csvRecord.size() > 12 ? csvRecord.get(12) : DATABASE;
String entityFQN = csvRecord.size() > 13 ? csvRecord.get(13) : null;
rowEntityType = entityType;

if (DATABASE.equals(entityType)) {
```
getNextRecord(...) can return null (e.g., invalid column count / missing required values). createEntityWithRecursion dereferences csvRecord immediately (csvRecord.size() / csvRecord.get(...)) which can throw a NullPointerException and abort the import instead of reporting the row-level failure. Add the same if (csvRecord == null) { return; } guard used in the other recursive CSV implementations before reading entityType/entityFQN (and consider recording an import failure for unknown/invalid rows if not already handled).
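The suggested guard can be sketched as follows (stand-in types; the real code uses Apache Commons CSV's `CSVRecord`, and `resolveRowEntityType` is a hypothetical helper name):

```java
import java.util.List;

// Sketch of the null guard suggested by the review: bail out before
// dereferencing a record that getNextRecord(...) may have returned as null
// (invalid column count / missing required values already reported upstream).
public class RecursiveRowGuard {
  static final String DATABASE = "database";

  // Returns the per-row entity type, or null when the record itself was null,
  // so the caller can skip the row instead of throwing a NullPointerException.
  public static String resolveRowEntityType(List<String> csvRecord) {
    if (csvRecord == null) {
      return null; // row-level failure was already recorded; skip this row
    }
    return csvRecord.size() > 12 ? csvRecord.get(12) : DATABASE;
  }
}
```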
🟡 Playwright Results — all passed (19 flaky)
✅ 3692 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 89 skipped
🟡 19 flaky test(s) (passed on retry)

How to debug locally:

```shell
# Download the playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip  # view trace
```
Playwright E2E tests and IT cleanup code were written against the old row-count behavior where the header row was counted in processed/passed. Now that EntityCsv correctly excludes the header, update all hardcoded counts (N+1 -> N) and fix the custom property cleanup in DatabaseServiceResourceIT to use PATCH instead of a non-existent DELETE endpoint.
Force-pushed 34add12 to f6a7753
Code Review ✅ Approved
Corrects the entityType validation during recursive import and ensures accurate row-count accounting. No issues found.
Failed to cherry-pick changes to the 1.12.7 branch.
fix(csv): correct entity type in recursive import extension validation + row-count accounting (#27593) (#27669)

Cherry-pick of 4930b10 to 1.12.7. Key changes:
- CsvUtil.addExtension: filter blank/empty extension tokens on export
- EntityCsv: `rowEntityType` field for per-row entity type override in extension validation
- EntityCsv: header row excluded from numberOfRowsProcessed/Passed counts
- EntityCsv: `countedFailureRecords` dedup to count per-row failures once
- EntityCsv: skip empty-value extension tokens instead of failing
- DatabaseServiceRepository/DatabaseRepository/DatabaseSchemaRepository: set `rowEntityType` per row
- BulkEntityImportPage: only short-circuit to upload on aborted or failure + processed=0
- GlossaryImportExport.spec.ts: fix version-history row counts (3→2) for header exclusion
- BulkEditEntity/BulkImport/TestCaseImportExport E2E: update row counts for header exclusion
- GlossaryResourceIT/TestCaseResourceIT: update expected counts for header exclusion
- DatabaseServiceResourceIT: add recursive import custom property extension IT test
- EntityCsvTest/CsvUtilTest: update assertSummary counts and blank extension filter assertion
fix(csv): correct entity type in recursive import extension validation + row-count accounting (#27593)

* fix(csv): correct entity type in recursive import extension validation and fix row-count accounting

  Fixes two bugs in recursive CSV import (PUT /api/v1/services/databaseServices/name/{svc}/import?recursive=true):

  1. **Wrong entityType in extension validation**: All rows in a recursive import were validated against the top-level entityType (e.g. "database"), so custom properties registered on "table" triggered false "Unknown custom field" errors. Added `rowEntityType` field on EntityCsv set per-row in DatabaseServiceRepository, DatabaseRepository, and DatabaseSchemaRepository.
  2. **Row-count accounting bugs**:
     - Header row was counted as a processed/passed row
     - `getRecordNumber()` is 1-indexed including the header, so subtract 1 from all processed counts
     - Multiple field failures on one row incremented `numberOfRowsFailed` once per field; added `countedFailureRecords` Set to deduplicate per-row failure counting

  Fixes: open-metadata/openmetadata-collate#3744

  - Update EntityCsvTest assertSummary counts and queuePendingTableUpdate to match corrected behavior
  - Add unit tests: header not counted, multi-field dedup, rowEntityType override
  - Add IT test in DatabaseServiceResourceIT for recursive import with custom property extension
  - Fix GlossaryResourceIT and TestCaseResourceIT expected counts

* address copilot review: null guard for csvRecord + assert cleanup DELETE status

* fix(csv): update test expectations to reflect header-excluded row counts

  Playwright E2E tests and IT cleanup code were written against the old row-count behavior where the header row was counted in processed/passed. Now that EntityCsv correctly excludes the header, update all hardcoded counts (N+1 -> N) and fix the custom property cleanup in DatabaseServiceResourceIT to use PATCH instead of a non-existent DELETE endpoint.

* fix(csv): skip empty-value extension tokens instead of failing import

  Empty-valued tokens like 'inputformat:;outputformat:' are emitted by export when custom properties are stored as empty strings (valid per JSON Schema). Re-importing such CSVs failed with INVALID_FIELD. Treat empty value as a cleared key (skip it) since withExtension replaces the whole map.

* fix(playwright): correct bulk-edit processed/passed row counts in E2E flow

  Missed in f6a7753 — the bulk-edit validateImportStatus step also counts only data rows now that the header is excluded, so 3→2 for passed/processed.

* fix(csv): filter empty extension tokens on export + fix UI grid regression for all-fail imports

  - CsvUtil.addExtension: filter entries whose formatted value is empty so null/empty-list custom properties are never written as bare "key:" tokens in the exported CSV
  - CsvUtilTest: assert empty list key is absent from the extension string
  - BulkEntityImportPage: stop routing FAILURE back to upload step when rows were processed; only short-circuit for aborted or failure with processed=0 (malformed CSV) — all-fail CSVs now show the validation grid so users can inspect and fix errors
  - GlossaryImportExport.spec.ts: fix version-history row-count assertions ('3'→'2') missed when header was excluded from counts in the prior commit

* fix(csv): filter blank extension values on export to match import-side behavior

(cherry picked from commit 4930b10)
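The export-side filter described in these commits can be sketched roughly like this (hypothetical class name; the actual change is in `CsvUtil.addExtension`):

```java
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the export-side fix: drop extension entries whose formatted value
// is null or empty so bare "key:" tokens never reach the exported CSV.
public class ExtensionFormatter {
  public static String formatExtension(Map<String, String> extension) {
    return extension.entrySet().stream()
        .filter(e -> e.getValue() != null && !e.getValue().isEmpty())
        .map(e -> e.getKey() + ":" + e.getValue())
        .collect(Collectors.joining(";"));
  }
}
```

This matches the import-side behavior of treating an empty value as a cleared key: the token is simply never written.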



Summary

- **Wrong entityType in extension validation**: Rows in a `?recursive=true` CSV import were validated against the top-level entityType (e.g. `"database"`), so custom properties registered on `table` or `databaseSchema` triggered false `#INVALID_FIELD: Unknown custom field` errors. Fixed by adding a `rowEntityType` field on `EntityCsv`, set per-row in `DatabaseServiceRepository`, `DatabaseRepository`, and `DatabaseSchemaRepository` before dispatching each row.
- **Row-count accounting**: (a) the header row was counted as a processed/passed row, (b) `CSVRecord.getRecordNumber()` is 1-indexed including the header (subtract 1 everywhere), (c) multiple field failures on one row incremented `numberOfRowsFailed` once per field — added a `countedFailureRecords` Set to deduplicate per-row failure counting.
- **Export of empty custom properties**: `CsvUtil.addExtension` emitted bare `key:` tokens for null/empty-list custom properties. Re-importing such a CSV would fail on the empty value. Fixed by filtering out entries whose formatted value is empty before building the extension string.
- **UI regression for all-fail imports**: an import whose rows all failed returns `ApiStatus.FAILURE` instead of `PARTIAL_SUCCESS`. `BulkEntityImportPage` was short-circuiting on `failure` and bouncing the user back to the upload step, so the validation grid never rendered and users couldn't see what failed. Fixed by only short-circuiting for `aborted` or `failure` with `numberOfRowsProcessed=0` (genuinely malformed CSV); all-fail imports now fall through and show the validation grid with failures highlighted.

Fixes: https://github.com/open-metadata/openmetadata-collate/issues/3744
Changes

- `EntityCsv.java`: `rowEntityType` field + `currentEntityType()` helper; header exclusion from counts; `-1` on all `getRecordNumber()` processed calls; `countedFailureRecords` dedup in `deferredFailure`/`importFailure`
- `DatabaseServiceRepository.java`, `DatabaseRepository.java`, `DatabaseSchemaRepository.java`: set `rowEntityType` per-row in `createEntityWithRecursion`
- `EntityCsvTest.java`: fix expected counts in existing tests; add 3 new unit tests (header not counted, multi-field dedup, rowEntityType override)
- `DatabaseServiceResourceIT.java`: new IT test `test_recursiveImportCustomPropertyExtension` covering valid extension, unknown field, and multi-field dedup
- `GlossaryResourceIT.java`, `TestCaseResourceIT.java`: fix expected row counts to match corrected behavior
- `CsvUtil.java`: filter empty/null extension values in `addExtension` before serialising to CSV
- `CsvUtilTest.java`: assert empty-list key is absent from the extension string
- `BulkEntityImportPage.tsx`: narrow the failure short-circuit — only reset to upload for aborted or unprocessed failures; processed-but-all-failed imports render the validation grid
- `GlossaryImportExport.spec.ts`: fix version-history row-count assertions ('3'→'2') missed in the header-exclusion sweep

Test plan

- `mvn test -pl openmetadata-service -Dtest=EntityCsvTest` — 58 tests, all green
- `DatabaseServiceResourceIT#test_recursiveImportCustomPropertyExtension` — register `potato` on `table`, import recursive CSV; assert SUCCESS + correct counts for valid, unknown-field, and dedup cases

🤖 Generated with Claude Code
Summary by Gitar

- Replace `getDataQualityPagePath` with `observabilityRouterClassBase.getDataQualityPagePath` in `BulkEntityImportPage` to improve modularity in route resolution.