Pull request overview
Adds asynchronous CSV export for Explore/search results, with UI wiring, backend endpoint support, and WebSocket-based progress/completion notifications.
Changes:
- UI: adds an “Export” dropdown in Explore and a REST helper to start /search/exportAsync jobs.
- Backend: introduces CSV row formatting for search hits and a batched export flow using search_after, exposed via a new /search/exportAsync endpoint.
- Tests: adds a Playwright e2e test for the new Explore export UI and a Java unit test suite for CSV formatting.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openmetadata-ui/src/main/resources/ui/src/rest/searchAPI.ts | Adds REST client helper to initiate async search CSV export. |
| openmetadata-ui/src/main/resources/ui/src/components/ExploreV1/ExploreV1.component.tsx | Adds Explore export dropdown and hooks it into the export modal/provider. |
| openmetadata-ui/src/main/resources/ui/playwright/e2e/Features/SearchExport.spec.ts | Adds e2e coverage for the Explore export dropdown + modal. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/en-us.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-tw.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-cn.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/tr-tr.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/th-th.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/ru-ru.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-pt.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-br.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/pr-pr.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/nl-nl.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/mr-in.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/ko-kr.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/ja-jp.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/he-he.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/gl-es.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/fr-fr.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/es-es.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/de-de.json | Adds new translation keys used by the export UI. |
| openmetadata-ui/src/main/resources/ui/src/locale/languages/ar-sa.json | Adds new translation keys used by the export UI. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchResultCsvExporter.java | New utility to render ES _source maps into CSV rows with escaping and field extraction. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchResultCsvExporterTest.java | Unit tests for CSV escaping/field extraction (needs assertion fixes). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Implements batched CSV export of search hits with progress callbacks. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java | Adds /search/exportAsync endpoint to start async export job and emit WebSocket notifications. |
| void testExportSourceFieldsContainsRequiredFields() { | ||
| List<String> fields = SearchResultCsvExporter.EXPORT_SOURCE_FIELDS; | ||
|
|
||
| assert fields.contains("entityType"); | ||
| assert fields.contains("name"); |
These checks use the Java assert keyword, which is commonly disabled in unit test runs unless -ea is enabled. As a result, the test may not actually validate EXPORT_SOURCE_FIELDS in CI. Replace these with JUnit assertions (e.g., assertTrue / assertAll).
TeddyCr left a comment
Should we have a progress bar? I have a vague memory that we do that somewhere. I am assuming "all data assets" means everything in OpenMetadata, right? That might take a while.
if (progressCallback != null) {
progressCallback.onProgress(
exported, totalHits, String.format("Exported %d of %d rows", exported, totalHits));
}
I see we have that in the code as well; maybe this is something we can leverage to report progress back?
| writer.write(SearchResultCsvExporter.CSV_HEADER); | ||
| writer.newLine(); | ||
| writer.flush(); | ||
|
|
||
| if (totalHits > 0) { | ||
| writeCsvBatches(baseRequest, subjectContext, writer, totalHits, from); | ||
| } | ||
| } finally { | ||
| writer.flush(); | ||
| } | ||
| } | ||
|
|
||
| private void writeCsvBatches( | ||
| SearchRequest baseRequest, | ||
| SubjectContext subjectContext, |
resolveSortField currently rewrites _score (and empty) sorts to fullyQualifiedName. That makes CSV export ordering differ from the actual search ordering when the UI requests sort_field=_score (default), and it prevents leveraging the _score + tiebreaker logic you added in the search managers. Consider preserving _score when explicitly requested, and only falling back to fullyQualifiedName when the sort field is missing/blank; also special-case _score in buildSourceFields so it’s not added to _source.
| const url = URL.createObjectURL(blob); | ||
| const a = document.createElement('a'); | ||
| a.href = url; | ||
| a.download = `Search_Results_${new Date().toISOString()}.csv`; |
new Date().toISOString() contains characters (notably :) that produce invalid filenames on Windows. Consider formatting the timestamp into a filesystem-safe form (e.g., replace : with - or use a dedicated date formatter) before setting a.download.
| - a.download = `Search_Results_${new Date().toISOString()}.csv`; | |
| + a.download = `Search_Results_${new Date() | |
| + .toISOString() | |
| + .replace(/:/g, '-')}.csv`; | |
| <Download01 height={16} width={16} /> | ||
| } | ||
| size="sm" | ||
| onClick={handleOpenExportScopeModal}> |
@openmetadata/ui-core-components Button is based on react-aria-components and supports onPress for accessible mouse/keyboard interactions. Using onClick here can miss non-mouse activation paths depending on how the underlying component is wired. Prefer onPress={handleOpenExportScopeModal} for consistency with other usages (e.g., ExportGraphPanel).
| - onClick={handleOpenExportScopeModal}> | |
| + onPress={handleOpenExportScopeModal}> | |
| <div | ||
| className={`export-scope-option-card${ | ||
| exportScope === 'visible' ? ' selected' : '' | ||
| }`} | ||
| onClick={() => setExportScope('visible')}> | ||
| <div |
The export scope “cards” are clickable <div> elements without keyboard/focus semantics. If the intent is that the whole card is selectable, consider rendering them as <label> elements bound to the radio inputs, or add appropriate role, tabIndex, and key handlers so keyboard users can activate the card area.
| description = | ||
| "Maximum number of rows to export. When null, exports all matching results up to the hard cap.") | ||
| @QueryParam("size") | ||
| Integer size, | ||
| @Parameter( | ||
| description = | ||
| "Starting offset for export. Use with size to export a specific page of results (e.g., from=30&size=15 for page 3).") | ||
| @DefaultValue("0") | ||
| @QueryParam("from") | ||
| int from) |
The export endpoint accepts from and size but does not validate that they are non-negative. A negative from (or size) can lead to surprising behavior (e.g., effectiveTotal increasing) or downstream search errors. Consider rejecting negative values with a clear 400 response.
| await page.route('**/api/v1/search/export?*', async (route) => { | ||
| await route.fulfill({ | ||
| status: 200, | ||
| contentType: 'text/csv', | ||
| headers: { | ||
| 'Content-Disposition': 'attachment; filename="search_export.csv"', | ||
| }, | ||
| body: 'Entity Type,Service Name,Service Type,FQN,Name,Display Name,Description,Owners,Tags,Glossary Terms,Domains,Tier\ntable,mysql,Mysql,sample_data.ecommerce_db.shopify.dim_address,dim_address,dim_address,,,,,,', | ||
| }); | ||
| }); | ||
|
|
||
| await openExportScopeModal(page); | ||
|
|
||
| await test.step('Export button shows loading state while downloading', async () => { | ||
| await page.route('**/api/v1/search/export?*', async (route) => { | ||
| await new Promise<void>((resolve) => setTimeout(resolve, 1500)); | ||
| await route.fulfill({ | ||
| status: 200, | ||
| contentType: 'text/csv', | ||
| body: 'Entity Type\ntable', | ||
| }); | ||
| }); | ||
|
|
This test registers multiple page.route('**/api/v1/search/export?*', ...) handlers in the same test. Depending on Playwright routing precedence, this can be flaky or cause the later handler to never run. Consider using a single route handler that can emulate both behaviors, or unroute the previous handler before re-registering.
| createTestTable(ns, "export_size_test_1"); | ||
| createTestTable(ns, "export_size_test_2"); | ||
| createTestTable(ns, "export_size_test_3"); | ||
|
|
||
| Thread.sleep(2000); | ||
|
|
Using Thread.sleep(2000) to wait for search indexing makes the test timing-dependent and potentially flaky under load/slow CI. Prefer polling the search/export endpoint until the created entities appear (with a bounded timeout) instead of sleeping a fixed duration.
| Thread.sleep(2000); | ||
|
|
Same as above: this fixed Thread.sleep(2000) makes the pagination export test timing-dependent. Consider replacing it with a retry/poll loop that waits until the expected search results are visible before asserting pagination behavior.
| - Thread.sleep(2000); | |
| + String visibilityCheckPath = | |
| + "/v1/search/export?q=export_page_test&index=table_search_index" | |
| + + "&sort_field=name.keyword&sort_order=asc&from=0&size=3"; | |
| + long deadlineNanos = System.nanoTime() + Duration.ofSeconds(15).toNanos(); | |
| + HttpResponse<String> visibilityResponse = null; | |
| + boolean resultsVisible = false; | |
| + while (System.nanoTime() < deadlineNanos) { | |
| + visibilityResponse = httpGetExport(visibilityCheckPath); | |
| + if (visibilityResponse.statusCode() == 200) { | |
| + String[] visibleLines = visibilityResponse.body().split("\n"); | |
| + if (visibleLines.length >= 4) { | |
| + resultsVisible = true; | |
| + break; | |
| + } | |
| + } | |
| + Thread.sleep(200); | |
| + } | |
| + // Message is built eagerly: a lambda cannot capture visibilityResponse because it is reassigned in the loop. | |
| + assertTrue( | |
| + resultsVisible, | |
| + "Expected exported search results to become visible before asserting pagination. " | |
| + + "Last status: " | |
| + + (visibilityResponse == null ? "no response" : visibilityResponse.statusCode()) | |
| + + ", last body: " | |
| + + (visibilityResponse == null ? "" : visibilityResponse.body())); | |
| .withIncludeSourceFields(sourceFields) | ||
| .withSortFieldParam(sortField) | ||
| .withSortOrder(baseRequest.getSortOrder() != null ? baseRequest.getSortOrder() : "asc") | ||
| .withSearchAfter(searchAfter); |
CSV export uses search_after pagination, but the batch request only passes a single sortFieldParam (and the ES/OS request builders currently only add a tiebreaker when sorting by _score, using name.keyword). For non-unique sort fields (e.g. name.keyword, service.name.keyword), this can still lead to skipped/duplicated rows across batches. Add a deterministic secondary sort (e.g. fullyQualifiedName asc) whenever the primary sort is neither _score nor already fullyQualifiedName, so hit.sort[] always includes a stable tiebreaker for search_after.
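To illustrate the skipped-row risk described above, here is a minimal in-memory sketch of search_after-style paging. It is not OpenMetadata or Elasticsearch code: the `Hit` shape, `nextBatch`, and `exportAll` are hypothetical names, and the cursor comparison only approximates how search_after resumes strictly after the last sort values.

```typescript
// Hypothetical in-memory model of search_after paging (illustration only).
interface Hit {
  name: string;
  fullyQualifiedName: string;
}

type SortKey = keyof Hit;

// Compare two hits field by field, ascending, over the given sort keys.
function compareByKeys(a: Hit, b: Hit, keys: SortKey[]): number {
  for (const key of keys) {
    if (a[key] < b[key]) return -1;
    if (a[key] > b[key]) return 1;
  }
  return 0;
}

// Return the next batch: hits that sort strictly after the cursor.
function nextBatch(hits: Hit[], keys: SortKey[], size: number, cursor?: Hit): Hit[] {
  const sorted = [...hits].sort((a, b) => compareByKeys(a, b, keys));
  const remaining = cursor
    ? sorted.filter((hit) => compareByKeys(hit, cursor, keys) > 0)
    : sorted;
  return remaining.slice(0, size);
}

// Page through all hits, using the last hit of each batch as the cursor.
function exportAll(hits: Hit[], keys: SortKey[], size: number): string[] {
  const exported: string[] = [];
  let cursor: Hit | undefined;
  for (;;) {
    const batch = nextBatch(hits, keys, size, cursor);
    if (batch.length === 0) break;
    exported.push(...batch.map((hit) => hit.fullyQualifiedName));
    cursor = batch[batch.length - 1];
  }
  return exported;
}

const hits: Hit[] = [
  { name: 'orders', fullyQualifiedName: 'svc.db.a.orders' },
  { name: 'orders', fullyQualifiedName: 'svc.db.b.orders' },
  { name: 'orders', fullyQualifiedName: 'svc.db.c.orders' },
  { name: 'users', fullyQualifiedName: 'svc.db.a.users' },
];

// Sorting on the non-unique name alone skips svc.db.c.orders at the batch boundary.
const withoutTiebreaker = exportAll(hits, ['name'], 2);
// Adding fullyQualifiedName as a secondary key makes every cursor unique.
const withTiebreaker = exportAll(hits, ['name', 'fullyQualifiedName'], 2);
```

With a batch size of 2, the first run returns three rows and the second returns all four, which is the behavior a stable secondary sort is meant to guarantee.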
| a.href = url; | ||
| a.download = `Search_Results_${new Date().toISOString()}.csv`; | ||
| a.click(); | ||
| URL.revokeObjectURL(url); |
The download cleanup revokes the object URL immediately after a.click(). This is known to be unreliable in some browsers; elsewhere in the UI (e.g. src/rest/rdfAPI.ts) the URL revocation is deferred via setTimeout and the link is appended/removed from the DOM. Consider following the same pattern here, and also sanitize toISOString() (it contains :) to avoid problematic filenames on some platforms.
| - a.href = url; | |
| - a.download = `Search_Results_${new Date().toISOString()}.csv`; | |
| - a.click(); | |
| - URL.revokeObjectURL(url); | |
| + const sanitizedTimestamp = new Date().toISOString().replace(/:/g, '-'); | |
| + a.href = url; | |
| + a.download = `Search_Results_${sanitizedTimestamp}.csv`; | |
| + document.body.appendChild(a); | |
| + a.click(); | |
| + setTimeout(() => { | |
| + document.body.removeChild(a); | |
| + URL.revokeObjectURL(url); | |
| + }, 0); | |
Remove WebsocketNotificationHandler UUID overloads (dead code after async endpoint deletion) and revert OntologyExplorer files that are unrelated to search export.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Create SubjectContext once in exportSearchResults and pass to buildExportSearchRequest instead of constructing it twice
- Remove inaccurate count from "All matching assets" option since it showed the current tab total, not the cross-index total
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When the export modal opens, fire a lightweight count query against the dataAsset index to show the real total instead of the current tab's count. The count renders once available; if the query fails the modal remains usable without a count.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review 👍 Approved with suggestions
18 resolved / 21 findings
Export functionality for search now properly fetches cross-index counts for the 'All matching assets' option and addresses duplicate context construction, resolving 17 issues including memory leaks, pagination bugs, and CSV injection vulnerabilities. Consider constraining the sample data switches to enforce reciprocal relationships between store and read toggles.
💡 Edge Case: SAML metadata upload clears existing IDP fields on parse error
In Consider either leaving existing fields unchanged on error, or showing a more prominent warning.
Suggested fix
💡 Quality: Redundant ternary:
| const currentPage = isString(parsedSearch.page) | ||
| ? Number.parseInt(parsedSearch.page, 10) || 1 | ||
| : 1; | ||
| const pageSize = isString(parsedSearch.size) | ||
| ? Number.parseInt(parsedSearch.size, 10) || visibleResultCount | ||
| : visibleResultCount; | ||
|
|
||
| params.size = visibleResultCount; | ||
| params.from = (currentPage - 1) * pageSize; |
For the "Visible results" export scope, pageSize falls back to visibleResultCount when ?size= is missing, but the actual pagination size comes from user preferences (globalPageSize) / parseSearchParams (see utils/ExploreUtils.tsx:473-477 and SearchedData.tsx:101-130). On a partially-filled last page (or a URL with page but no size), this can compute an incorrect from offset and export the wrong rows. Consider deriving pageSize using the same logic as parseSearchParams/SearchedData (or guaranteeing size is always present in the URL before exporting).
| - const currentPage = isString(parsedSearch.page) | |
| - ? Number.parseInt(parsedSearch.page, 10) || 1 | |
| - : 1; | |
| - const pageSize = isString(parsedSearch.size) | |
| - ? Number.parseInt(parsedSearch.size, 10) || visibleResultCount | |
| - : visibleResultCount; | |
| - params.size = visibleResultCount; | |
| - params.from = (currentPage - 1) * pageSize; | |
| + const parsedCurrentPage = isString(parsedSearch.page) | |
| + ? Number.parseInt(parsedSearch.page, 10) | |
| + : Number.NaN; | |
| + const parsedPageSize = isString(parsedSearch.size) | |
| + ? Number.parseInt(parsedSearch.size, 10) | |
| + : Number.NaN; | |
| + const currentPage = parsedCurrentPage > 0 ? parsedCurrentPage : 1; | |
| + const pageSize = parsedPageSize > 0 ? parsedPageSize : undefined; | |
| + params.size = visibleResultCount; | |
| + if (!isUndefined(pageSize)) { | |
| + params.from = (currentPage - 1) * pageSize; | |
| + } | |
| const blob = await exportSearchResultsCsvStream(params); | ||
| const url = URL.createObjectURL(blob); | ||
| const a = document.createElement('a'); | ||
| a.href = url; | ||
| a.download = `Search_Results_${new Date().toISOString()}.csv`; | ||
| a.click(); | ||
| URL.revokeObjectURL(url); | ||
| setShowExportScopeModal(false); |
The download filename uses new Date().toISOString(), which includes : characters (invalid in Windows filenames) and may lead to a sanitized/incorrect filename in some environments. Prefer formatting the timestamp to a filesystem-safe form (e.g., YYYYMMDD_HHmmss) like other downloads in the UI (e.g., AuditLogsPage.tsx).
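A filesystem-safe timestamp in the YYYYMMDD_HHmmss shape mentioned above might be produced as in the sketch below. `formatFileTimestamp` is a hypothetical helper, and formatting in UTC is an assumption; the formatter used elsewhere in the UI may differ.

```typescript
// Hypothetical helper (illustration only): format a Date as YYYYMMDD_HHmmss in UTC,
// which avoids ':' and other characters that are invalid in Windows filenames.
function formatFileTimestamp(date: Date): string {
  const pad = (value: number): string => String(value).padStart(2, '0');
  const datePart = `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}`;
  const timePart = `${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}`;
  return `${datePart}_${timePart}`;
}

// Fixed instant used so the result is reproducible: 2 April 2026, 19:17:45 UTC.
const sampleDate = new Date(Date.UTC(2026, 3, 2, 19, 17, 45));
const exportFileName = `Search_Results_${formatFileTimestamp(sampleDate)}.csv`;
// exportFileName === 'Search_Results_20260402_191745.csv'
```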
| <div | ||
| className={`export-scope-option-card${ | ||
| exportScope === 'visible' ? ' selected' : '' | ||
| }`} | ||
| onClick={() => setExportScope('visible')}> | ||
| <div | ||
| className={`d-flex items-start gap-2 border-radius-sm tw:p-4 border ${ | ||
| exportScope === 'visible' | ||
| ? 'tw:border-brand' | ||
| : 'tw:border-secondary' | ||
| }`}> | ||
| <Radio value="visible" /> | ||
| <div> | ||
| <div className="d-flex items-center gap-2"> | ||
| <CoreTypography | ||
| className="tw:text-primary d-flex items-center tw:gap-0.5" | ||
| size="text-sm" | ||
| weight="semibold"> | ||
| {`${t('label.visible-result-plural')} `} | ||
| <CoreTypography | ||
| className="tw:text-tertiary" | ||
| size="text-sm" | ||
| weight="regular"> | ||
| ({visibleResultCount} {t('label.result-plural')}) | ||
| </CoreTypography> | ||
| </CoreTypography> | ||
| </div> | ||
| <CoreTypography | ||
| className="tw:text-tertiary" | ||
| size="text-sm" | ||
| weight="regular"> | ||
| {t('message.export-visible-results-description')} | ||
| </CoreTypography> | ||
| </div> | ||
| </div> | ||
| </div> | ||
| <div | ||
| className={`export-scope-option-card${ | ||
| exportScope === 'all' ? ' selected' : '' | ||
| }`} | ||
| onClick={() => setExportScope('all')}> | ||
| <div |
The export-scope option cards are clickable <div> elements (onClick) without button semantics. This makes the selection harder/impossible to use via keyboard and less accessible to assistive tech. Consider using a <label> associated with the radio, or giving the clickable wrapper appropriate role="radio"/tabIndex + key handlers, or relying on the Radio label click behavior instead of a custom clickable div.
Failed to cherry-pick changes to the 1.12.5 branch.
Describe your changes:
I worked on adding streaming CSV export functionality for search results and improving pagination stability because the async export approach had memory and WebSocket payload limitations, and search_after pagination was non-deterministic for non-unique sort fields.
Screen.Recording.2026-04-02.at.7.17.45.PM.mov
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>
Backend export endpoint:
- GET /v1/search/export in SearchRepository and SearchResource
- /v1/search/exportAsync in favor of streaming approach
- writeCsvBatches implementation to prevent logic drift
- fullyQualifiedName as a deterministic tiebreaker sort in ElasticSearchSearchManager and OpenSearchSearchManager for all non-_score, non-fullyQualifiedName sort fields; ensures stable search_after pagination and prevents skipped/duplicated rows during CSV export
- _source via buildSourceFields to ensure completeness (note: sort values for search_after come from hit.sort[], not _source)
Frontend export integration:
- ExploreV1 component to use streaming exportSearchResultsCsvStream API instead of async modal
UI simplifications:
- ExportGraphPanel, now exports PNG directly
Test updates:
- SearchExport.spec.ts to verify streaming API endpoint instead of async export
Bug fixes:
- useGraphData.ts map assignment
- McpExecutionResource for paired timestamp parameters
Fixes 2837