
fix(csv): correct entityType in recursive import extension validation + row-count accounting #27593

Merged
yan-3005 merged 9 commits into main from ram/fix-csv-recursive-import-entity-type-counts
Apr 23, 2026

Conversation

@yan-3005
Contributor

@yan-3005 yan-3005 commented Apr 21, 2026

Summary

  • Bug 1 — wrong entityType in recursive import extension validation: All rows in a ?recursive=true CSV import were validated against the top-level entityType (e.g. "database"), so custom properties registered on table or databaseSchema triggered false #INVALID_FIELD: Unknown custom field errors. Fixed by adding a rowEntityType field on EntityCsv set per-row in DatabaseServiceRepository, DatabaseRepository, and DatabaseSchemaRepository before dispatching each row.
  • Bug 2 — row-count accounting: Three fixes: (a) header row was being counted as processed/passed, (b) CSVRecord.getRecordNumber() is 1-indexed including the header (subtract 1 everywhere), (c) multiple field failures on one row incremented numberOfRowsFailed once per field — added countedFailureRecords Set to deduplicate per-row failure counting.
  • Bug 3 — empty extension tokens on export: CsvUtil.addExtension was emitting bare key: tokens for null/empty-list custom properties. Re-importing such a CSV would fail on the empty value. Fixed by filtering out entries whose formatted value is empty before building the extension string.
  • Bug 4 — validation grid hidden on all-fail imports: After the header-exclusion fix (Bug 2), an all-fail CSV import returns ApiStatus.FAILURE instead of PARTIAL_SUCCESS. BulkEntityImportPage was short-circuiting on failure and bouncing the user back to the upload step, so the validation grid never rendered and users couldn't see what failed. Fixed by only short-circuiting for aborted or failure with numberOfRowsProcessed=0 (genuinely malformed CSV); all-fail imports now fall through and show the validation grid with failures highlighted.
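The Bug 2 accounting can be sketched in isolation. This is an illustrative model, not the actual EntityCsv code: the method and field names below are stand-ins, and only the two described behaviors are shown — subtracting 1 from the 1-indexed, header-inclusive record number, and deduplicating per-row failure counting with a Set.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the Bug 2 fixes (names are illustrative, not the EntityCsv API).
public class FailureCounter {
    private final Set<Long> countedFailureRecords = new HashSet<>();
    private int numberOfRowsFailed = 0;

    // Invoked once per failed field validation on a row.
    public void countFailure(long recordNumber) {
        // CSVRecord.getRecordNumber() is 1-indexed and includes the header,
        // so subtract 1 to get the data-row number.
        long dataRowNumber = recordNumber - 1;
        // The Set ensures a row with several bad fields still increments
        // the failure count only once.
        if (countedFailureRecords.add(dataRowNumber)) {
            numberOfRowsFailed++;
        }
    }

    public int failed() {
        return numberOfRowsFailed;
    }
}
```

With this shape, two field failures on record 2 plus one on record 3 yield a failure count of 2, matching the deduplicated accounting the PR describes.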

Fixes: https://github.com/open-metadata/openmetadata-collate/issues/3744

Changes

  • EntityCsv.java: rowEntityType field + currentEntityType() helper; header exclusion from counts; -1 on all getRecordNumber() processed calls; countedFailureRecords dedup in deferredFailure/importFailure
  • DatabaseServiceRepository.java, DatabaseRepository.java, DatabaseSchemaRepository.java: set rowEntityType per-row in createEntityWithRecursion
  • EntityCsvTest.java: fix expected counts in existing tests; add 3 new unit tests (header not counted, multi-field dedup, rowEntityType override)
  • DatabaseServiceResourceIT.java: new IT test test_recursiveImportCustomPropertyExtension covering valid extension, unknown field, and multi-field dedup
  • GlossaryResourceIT.java, TestCaseResourceIT.java: fix expected row counts to match corrected behavior
  • CsvUtil.java: filter empty/null extension values in addExtension before serialising to CSV
  • CsvUtilTest.java: assert empty-list key is absent from the extension string
  • BulkEntityImportPage.tsx: narrow the failure short-circuit — only reset to upload for aborted or unprocessed failures; processed-but-all-failed imports render the validation grid
  • GlossaryImportExport.spec.ts: fix version-history row-count assertions ('3'→'2') missed in the header-exclusion sweep
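The CsvUtil.addExtension change (Bug 3) amounts to filtering empty-valued entries before joining tokens. A hedged sketch follows; the "key:value;key:value" token layout is taken from the PR description, but the method shape and names here are assumptions, not the actual CsvUtil API.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the export-side filter: drop entries whose formatted value is
// empty so no bare "key:" token reaches the exported CSV (names assumed).
public class ExtensionFormatter {
    public static String formatExtension(Map<String, String> extension) {
        return extension.entrySet().stream()
            .filter(e -> e.getValue() != null && !e.getValue().isEmpty())
            .map(e -> e.getKey() + ":" + e.getValue())
            .collect(Collectors.joining(";"));
    }
}
```

A null or empty-list custom property simply disappears from the extension string, so re-importing the exported CSV no longer trips over an empty value.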

Test plan

  • mvn test -pl openmetadata-service -Dtest=EntityCsvTest — 58 tests, all green
  • DatabaseServiceResourceIT#test_recursiveImportCustomPropertyExtension — register potato on table, import recursive CSV; assert SUCCESS + correct counts for valid, unknown-field, and dedup cases
  • Existing glossary and test-case IT counts updated
  • Glossary Playwright spec — version-history row counts + "Import validation" grid tests pass end-to-end

🤖 Generated with Claude Code


Summary by Gitar

  • Refactored Routing:
    • Replaced getDataQualityPagePath with observabilityRouterClassBase.getDataQualityPagePath in BulkEntityImportPage to improve modularity in route resolution.


…n and fix row-count accounting

Fixes two bugs in recursive CSV import (PUT /api/v1/services/databaseServices/name/{svc}/import?recursive=true):

1. **Wrong entityType in extension validation**: All rows in a recursive import were validated
   against the top-level entityType (e.g. "database"), so custom properties registered on "table"
   triggered false "Unknown custom field" errors. Added `rowEntityType` field on EntityCsv set
   per-row in DatabaseServiceRepository, DatabaseRepository, and DatabaseSchemaRepository.

2. **Row-count accounting bugs**:
   - Header row was counted as a processed/passed row
   - `getRecordNumber()` is 1-indexed including the header, so subtract 1 from all processed counts
   - Multiple field failures on one row incremented `numberOfRowsFailed` once per field; added
     `countedFailureRecords` Set to deduplicate per-row failure counting

Fixes: open-metadata/openmetadata-collate#3744

- Update EntityCsvTest assertSummary counts and queuePendingTableUpdate to match corrected behavior
- Add unit tests: header not counted, multi-field dedup, rowEntityType override
- Add IT test in DatabaseServiceResourceIT for recursive import with custom property extension
- Fix GlossaryResourceIT and TestCaseResourceIT expected counts
Copilot AI review requested due to automatic review settings April 21, 2026 13:45
@yan-3005 yan-3005 added the safe to test (run secure GitHub workflows on PRs), To release (will cherry-pick this PR into the release branch), and backend labels Apr 21, 2026
@yan-3005 yan-3005 self-assigned this Apr 21, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes CSV import correctness for recursive entity imports by (1) validating extension custom properties against the per-row entityType (instead of the top-level CSV entity type) and (2) correcting row-count accounting to exclude the header row and deduplicate per-row failure counting.

Changes:

  • Add per-row entity typing for extension validation via rowEntityType + currentEntityType() and set it in recursive CSV importers.
  • Fix row counts: exclude header from processed/passed, apply getRecordNumber() - 1, and dedupe failure counting per record.
  • Update/add unit + integration tests to match and cover the corrected behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/csv/EntityCsv.java Core fixes: per-row entity type for extension validation; corrected processed/passed/failed accounting and failure dedup.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseServiceRepository.java Set rowEntityType per recursive row before dispatching entity-specific creation.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseRepository.java Set rowEntityType per recursive row for schema/table/SP/column import dispatch.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DatabaseSchemaRepository.java Set rowEntityType per recursive row for table/SP/column import dispatch.
openmetadata-service/src/test/java/org/openmetadata/csv/EntityCsvTest.java Update expected counts and add unit coverage for header exclusion, deduped failures, and row entity type override.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/DatabaseServiceResourceIT.java Add IT to validate recursive import custom-property extension behavior + deduped failure counting.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/GlossaryResourceIT.java Update expected processed/passed counts to exclude header row.
openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TestCaseResourceIT.java Update expected processed counts to exclude header row.

Comment on lines 300 to 307
    CSVRecord csvRecord = getNextRecord(printer, csvRecords);

    // Get entityType and fullyQualifiedName if provided
    String entityType = csvRecord.size() > 12 ? csvRecord.get(12) : DATABASE;
    String entityFQN = csvRecord.size() > 13 ? csvRecord.get(13) : null;
    rowEntityType = entityType;

    if (DATABASE.equals(entityType)) {

Copilot AI Apr 21, 2026


getNextRecord(...) can return null (e.g., invalid column count / missing required values). createEntityWithRecursion dereferences csvRecord immediately (csvRecord.size() / csvRecord.get(...)) which can throw a NullPointerException and abort the import instead of reporting the row-level failure. Add the same if (csvRecord == null) { return; } guard used in the other recursive CSV implementations before reading entityType/entityFQN (and consider recording an import failure for unknown/invalid rows if not already handled).
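The guard Copilot asks for can be modeled in a few lines. This is a simplified stand-in, not the real code: String[] replaces org.apache.commons.csv.CSVRecord, the literal "database" replaces the DATABASE constant, and the method name is invented for illustration.

```java
// Sketch of the null guard: when the record reader returns null (invalid
// column count / missing required values), skip the row instead of
// dereferencing it and throwing a NullPointerException.
public class RecursiveRowDispatch {
    public static String resolveEntityType(String[] csvRecord) {
        if (csvRecord == null) {
            // Row-level failure was already recorded upstream; bail out.
            return null;
        }
        // Column 12 carries the per-row entityType when present.
        return csvRecord.length > 12 ? csvRecord[12] : "database";
    }
}
```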

harshach
harshach previously approved these changes Apr 21, 2026
@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

🟡 Playwright Results — all passed (19 flaky)

✅ 3692 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 89 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 481 0 0 4
🟡 Shard 2 653 0 3 7
🟡 Shard 3 662 0 4 1
🟡 Shard 4 646 0 2 27
🟡 Shard 5 609 0 2 42
🟡 Shard 6 641 0 8 8
🟡 19 flaky test(s) (passed on retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › EditAll User: Complete export-import-validate flow (shard 2, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with only VIEW cannot PATCH results (shard 2, 1 retry)
  • Features/OntologyExplorer.spec.ts › should accept a search query in the graph search input (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/ClassificationConditionalRendering.spec.ts › Should render correct content when switching between classifications (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Table (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › User as Owner Add, Update and Remove (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Create term with related terms, tags and owners during creation (shard 6, 1 retry)
  • Pages/HyperlinkCustomProperty.spec.ts › should display URL when no display text is provided (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • VersionPages/ServiceEntityVersionPage.spec.ts › Dashboard Service (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot AI review requested due to automatic review settings April 21, 2026 17:43
@yan-3005 yan-3005 requested a review from a team as a code owner April 21, 2026 17:43
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

Playwright E2E tests and IT cleanup code were written against the old
row-count behavior where the header row was counted in processed/passed.
Now that EntityCsv correctly excludes the header, update all hardcoded
counts (N+1 -> N) and fix the custom property cleanup in
DatabaseServiceResourceIT to use PATCH instead of a non-existent DELETE
endpoint.
@yan-3005 yan-3005 force-pushed the ram/fix-csv-recursive-import-entity-type-counts branch from 34add12 to f6a7753 on April 21, 2026 17:56
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.

@gitar-bot

gitar-bot Bot commented Apr 22, 2026

Code Review ✅ Approved

Corrects the entityType validation during recursive import and ensures accurate row-count accounting. No issues found.


Copilot AI review requested due to automatic review settings April 22, 2026 17:46
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated no new comments.

@sonarqubecloud

@yan-3005 yan-3005 merged commit 4930b10 into main Apr 23, 2026
72 checks passed
@yan-3005 yan-3005 deleted the ram/fix-csv-recursive-import-entity-type-counts branch April 23, 2026 10:46
@github-actions
Contributor

Failed to cherry-pick changes to the 1.12.7 branch.
Please cherry-pick the changes manually.

yan-3005 added a commit that referenced this pull request Apr 23, 2026
… extension validation + row-count accounting (#27593) (#27669)

Cherry-pick of 4930b10 to 1.12.7.

Key changes:
- CsvUtil.addExtension: filter blank/empty extension tokens on export
- EntityCsv: rowEntityType field for per-row entity type override in extension validation
- EntityCsv: header row excluded from numberOfRowsProcessed/Passed counts
- EntityCsv: countedFailureRecords dedup to count per-row failures once
- EntityCsv: skip empty-value extension tokens instead of failing
- DatabaseServiceRepository/DatabaseRepository/DatabaseSchemaRepository: set rowEntityType per row
- BulkEntityImportPage: only short-circuit to upload on aborted or failure+processed=0
- GlossaryImportExport.spec.ts: fix version-history row counts (3->2) for header exclusion
- BulkEditEntity/BulkImport/TestCaseImportExport E2E: update row counts for header exclusion
- GlossaryResourceIT/TestCaseResourceIT: update expected counts for header exclusion
- DatabaseServiceResourceIT: add recursive import custom property extension IT test
- EntityCsvTest/CsvUtilTest: update assertSummary counts and blank extension filter assertion
yan-3005 added a commit that referenced this pull request Apr 30, 2026
… + row-count accounting (#27593)

* fix(csv): correct entity type in recursive import extension validation and fix row-count accounting

Fixes two bugs in recursive CSV import (PUT /api/v1/services/databaseServices/name/{svc}/import?recursive=true):

1. **Wrong entityType in extension validation**: All rows in a recursive import were validated
   against the top-level entityType (e.g. "database"), so custom properties registered on "table"
   triggered false "Unknown custom field" errors. Added `rowEntityType` field on EntityCsv set
   per-row in DatabaseServiceRepository, DatabaseRepository, and DatabaseSchemaRepository.

2. **Row-count accounting bugs**:
   - Header row was counted as a processed/passed row
   - `getRecordNumber()` is 1-indexed including the header, so subtract 1 from all processed counts
   - Multiple field failures on one row incremented `numberOfRowsFailed` once per field; added
     `countedFailureRecords` Set to deduplicate per-row failure counting

Fixes: open-metadata/openmetadata-collate#3744

- Update EntityCsvTest assertSummary counts and queuePendingTableUpdate to match corrected behavior
- Add unit tests: header not counted, multi-field dedup, rowEntityType override
- Add IT test in DatabaseServiceResourceIT for recursive import with custom property extension
- Fix GlossaryResourceIT and TestCaseResourceIT expected counts

* address copilot review: null guard for csvRecord + assert cleanup DELETE status

* fix(csv): update test expectations to reflect header-excluded row counts

Playwright E2E tests and IT cleanup code were written against the old
row-count behavior where the header row was counted in processed/passed.
Now that EntityCsv correctly excludes the header, update all hardcoded
counts (N+1 -> N) and fix the custom property cleanup in
DatabaseServiceResourceIT to use PATCH instead of a non-existent DELETE
endpoint.

* fix(csv): skip empty-value extension tokens instead of failing import

Empty-valued tokens like 'inputformat:;outputformat:' are emitted by export
when custom properties are stored as empty strings (valid per JSON Schema).
Re-importing such CSVs failed with INVALID_FIELD. Treat empty value as a
cleared key (skip it) since withExtension replaces the whole map.

* fix(playwright): correct bulk-edit processed/passed row counts in E2E flow

Missed in f6a7753 — the bulk-edit validateImportStatus step also counts
only data rows now that the header is excluded, so 3→2 for passed/processed.

* fix(csv): filter empty extension tokens on export + fix UI grid regression for all-fail imports

- CsvUtil.addExtension: filter entries whose formatted value is empty so null/empty-list
  custom properties are never written as bare "key:" tokens in the exported CSV
- CsvUtilTest: assert empty list key is absent from the extension string
- BulkEntityImportPage: stop routing FAILURE back to upload step when rows were processed;
  only short-circuit for aborted or failure with processed=0 (malformed CSV) — all-fail
  CSVs now show the validation grid so users can inspect and fix errors
- GlossaryImportExport.spec.ts: fix version-history row-count assertions ('3'→'2')
  missed when header was excluded from counts in the prior commit

* fix(csv): filter blank extension values on export to match import-side behavior

(cherry picked from commit 4930b10)
jatinmasaram pushed a commit to jatinmasaram/OpenMetadata that referenced this pull request May 2, 2026
… + row-count accounting (open-metadata#27593)

* fix(csv): correct entity type in recursive import extension validation and fix row-count accounting

Fixes two bugs in recursive CSV import (PUT /api/v1/services/databaseServices/name/{svc}/import?recursive=true):

1. **Wrong entityType in extension validation**: All rows in a recursive import were validated
   against the top-level entityType (e.g. "database"), so custom properties registered on "table"
   triggered false "Unknown custom field" errors. Added `rowEntityType` field on EntityCsv set
   per-row in DatabaseServiceRepository, DatabaseRepository, and DatabaseSchemaRepository.

2. **Row-count accounting bugs**:
   - Header row was counted as a processed/passed row
   - `getRecordNumber()` is 1-indexed including the header, so subtract 1 from all processed counts
   - Multiple field failures on one row incremented `numberOfRowsFailed` once per field; added
     `countedFailureRecords` Set to deduplicate per-row failure counting

Fixes: open-metadata/openmetadata-collate#3744

- Update EntityCsvTest assertSummary counts and queuePendingTableUpdate to match corrected behavior
- Add unit tests: header not counted, multi-field dedup, rowEntityType override
- Add IT test in DatabaseServiceResourceIT for recursive import with custom property extension
- Fix GlossaryResourceIT and TestCaseResourceIT expected counts

* address copilot review: null guard for csvRecord + assert cleanup DELETE status

* fix(csv): update test expectations to reflect header-excluded row counts

Playwright E2E tests and IT cleanup code were written against the old
row-count behavior where the header row was counted in processed/passed.
Now that EntityCsv correctly excludes the header, update all hardcoded
counts (N+1 -> N) and fix the custom property cleanup in
DatabaseServiceResourceIT to use PATCH instead of a non-existent DELETE
endpoint.

* fix(csv): skip empty-value extension tokens instead of failing import

Empty-valued tokens like 'inputformat:;outputformat:' are emitted by export
when custom properties are stored as empty strings (valid per JSON Schema).
Re-importing such CSVs failed with INVALID_FIELD. Treat empty value as a
cleared key (skip it) since withExtension replaces the whole map.

* fix(playwright): correct bulk-edit processed/passed row counts in E2E flow

Missed in f6a7753 — the bulk-edit validateImportStatus step also counts
only data rows now that the header is excluded, so 3→2 for passed/processed.

* fix(csv): filter empty extension tokens on export + fix UI grid regression for all-fail imports

- CsvUtil.addExtension: filter entries whose formatted value is empty so null/empty-list
  custom properties are never written as bare "key:" tokens in the exported CSV
- CsvUtilTest: assert empty list key is absent from the extension string
- BulkEntityImportPage: stop routing FAILURE back to upload step when rows were processed;
  only short-circuit for aborted or failure with processed=0 (malformed CSV) — all-fail
  CSVs now show the validation grid so users can inspect and fix errors
- GlossaryImportExport.spec.ts: fix version-history row-count assertions ('3'→'2')
  missed when header was excluded from counts in the prior commit

* fix(csv): filter blank extension values on export to match import-side behavior
