Skip to content

Add export functionality for search#26900

Merged
harshach merged 41 commits intomainfrom
search_export
Apr 5, 2026
Merged

Add export functionality for search#26900
harshach merged 41 commits intomainfrom
search_export

Conversation

@harshach
Copy link
Copy Markdown
Collaborator

@harshach harshach commented Apr 1, 2026

Describe your changes:

I worked on adding streaming CSV export functionality for search results and improving pagination stability because the async export approach had memory and WebSocket payload limitations, and search_after pagination was non-deterministic for non-unique sort fields.

Screen.Recording.2026-04-02.at.7.17.45.PM.mov

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Backend export endpoint:

  • Added streaming CSV export endpoint GET /v1/search/export in SearchRepository and SearchResource
  • Deprecated async export endpoint /v1/search/exportAsync in favor of streaming approach
  • Both async and streaming export paths share the same writeCsvBatches implementation to prevent logic drift
  • Added fullyQualifiedName as a deterministic tiebreaker sort in ElasticSearchSearchManager and OpenSearchSearchManager for all non-_score, non-fullyQualifiedName sort fields — ensures stable search_after pagination and prevents skipped/duplicated rows during CSV export
  • User-provided sort fields are dynamically included in _source via buildSourceFields to ensure completeness (note: sort values for search_after come from hit.sort[], not _source)

Frontend export integration:

  • Updated ExploreV1 component to use streaming exportSearchResultsCsvStream API instead of async modal
  • Replaced entity export modal provider with direct streaming download with ISO date filename

UI simplifications:

  • Removed dropdown menu from ExportGraphPanel, now exports PNG directly
  • Removed SVG export option from ontology explorer

Test updates:

  • Updated SearchExport.spec.ts to verify streaming API endpoint instead of async export

Bug fixes:

  • Fixed null check logic in useGraphData.ts map assignment
  • Added validation in McpExecutionResource for paired timestamp parameters

fixes 2837

@harshach harshach requested a review from a team as a code owner April 1, 2026 00:13
Copilot AI review requested due to automatic review settings April 1, 2026 00:13
@github-actions github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 1, 2026
TeddyCr
TeddyCr previously approved these changes Apr 1, 2026
@TeddyCr TeddyCr dismissed their stale review April 1, 2026 00:21

comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds asynchronous CSV export for Explore/search results, with UI wiring, backend endpoint support, and WebSocket-based progress/completion notifications.

Changes:

  • UI: adds an “Export” dropdown in Explore and a REST helper to start /search/exportAsync jobs.
  • Backend: introduces CSV row formatting for search hits and a batched export flow using search_after, exposed via a new /search/exportAsync endpoint.
  • Tests: adds a Playwright e2e test for the new Explore export UI and a Java unit test suite for CSV formatting.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
openmetadata-ui/src/main/resources/ui/src/rest/searchAPI.ts Adds REST client helper to initiate async search CSV export.
openmetadata-ui/src/main/resources/ui/src/components/ExploreV1/ExploreV1.component.tsx Adds Explore export dropdown and hooks it into the export modal/provider.
openmetadata-ui/src/main/resources/ui/playwright/e2e/Features/SearchExport.spec.ts Adds e2e coverage for the Explore export dropdown + modal.
openmetadata-ui/src/main/resources/ui/src/locale/languages/en-us.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-tw.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-cn.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/tr-tr.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/th-th.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ru-ru.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-pt.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-br.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pr-pr.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/nl-nl.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/mr-in.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ko-kr.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ja-jp.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/he-he.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/gl-es.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/fr-fr.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/es-es.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/de-de.json Adds new translation keys used by the export UI.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ar-sa.json Adds new translation keys used by the export UI.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchResultCsvExporter.java New utility to render ES _source maps into CSV rows with escaping and field extraction.
openmetadata-service/src/test/java/org/openmetadata/service/search/SearchResultCsvExporterTest.java Unit tests for CSV escaping/field extraction (needs assertion fixes).
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java Implements batched CSV export of search hits with progress callbacks.
openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java Adds /search/exportAsync endpoint to start async export job and emit WebSocket notifications.

Comment on lines +213 to +217
void testExportSourceFieldsContainsRequiredFields() {
List<String> fields = SearchResultCsvExporter.EXPORT_SOURCE_FIELDS;

assert fields.contains("entityType");
assert fields.contains("name");
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks use the Java assert keyword, which is commonly disabled in unit test runs unless -ea is enabled. As a result, the test may not actually validate EXPORT_SOURCE_FIELDS in CI. Replace these with JUnit assertions (e.g., assertTrue / assertAll).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

@TeddyCr TeddyCr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a progress bar? I have a vague memory we do that somewhere. i am assuming all data asset is everything in OpenMetadata right? That might take a while.

     if (progressCallback != null) {
        progressCallback.onProgress(
            exported, totalHits, String.format("Exported %d of %d rows", exported, totalHits));
      }

I see we have that in the code as well, maybe this is something we can leverage to report progress back?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 1, 2026

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 64%
64.52% (59052/91524) 44.01% (30941/70298) 47.33% (9340/19731)

TeddyCr
TeddyCr previously approved these changes Apr 1, 2026
@github-actions

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 43 out of 43 changed files in this pull request and generated 8 comments.

Comment on lines +2522 to +2536
writer.write(SearchResultCsvExporter.CSV_HEADER);
writer.newLine();
writer.flush();

if (totalHits > 0) {
writeCsvBatches(baseRequest, subjectContext, writer, totalHits, from);
}
} finally {
writer.flush();
}
}

private void writeCsvBatches(
SearchRequest baseRequest,
SubjectContext subjectContext,
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolveSortField currently rewrites _score (and empty) sorts to fullyQualifiedName. That makes CSV export ordering differ from the actual search ordering when the UI requests sort_field=_score (default), and it prevents leveraging the _score + tiebreaker logic you added in the search managers. Consider preserving _score when explicitly requested, and only falling back to fullyQualifiedName when the sort field is missing/blank; also special-case _score in buildSourceFields so it’s not added to _source.

Copilot uses AI. Check for mistakes.
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `Search_Results_${new Date().toISOString()}.csv`;
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new Date().toISOString() contains characters (notably :) that produce invalid filenames on Windows. Consider formatting the timestamp into a filesystem-safe form (e.g., replace : with - or use a dedicated date formatter) before setting a.download.

Suggested change
a.download = `Search_Results_${new Date().toISOString()}.csv`;
a.download = `Search_Results_${new Date()
.toISOString()
.replace(/:/g, '-')}.csv`;

Copilot uses AI. Check for mistakes.
<Download01 height={16} width={16} />
}
size="sm"
onClick={handleOpenExportScopeModal}>
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@openmetadata/ui-core-components Button is based on react-aria-components and supports onPress for accessible mouse/keyboard interactions. Using onClick here can miss non-mouse activation paths depending on how the underlying component is wired. Prefer onPress={handleOpenExportScopeModal} for consistency with other usages (e.g., ExportGraphPanel).

Suggested change
onClick={handleOpenExportScopeModal}>
onPress={handleOpenExportScopeModal}>

Copilot uses AI. Check for mistakes.
Comment on lines +642 to +647
<div
className={`export-scope-option-card${
exportScope === 'visible' ? ' selected' : ''
}`}
onClick={() => setExportScope('visible')}>
<div
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The export scope “cards” are clickable <div> elements without keyboard/focus semantics. If the intent is that the whole card is selectable, consider rendering them as <label> elements bound to the radio inputs, or add appropriate role, tabIndex, and key handlers so keyboard users can activate the card area.

Copilot uses AI. Check for mistakes.
Comment on lines +304 to +313
description =
"Maximum number of rows to export. When null, exports all matching results up to the hard cap.")
@QueryParam("size")
Integer size,
@Parameter(
description =
"Starting offset for export. Use with size to export a specific page of results (e.g., from=30&size=15 for page 3).")
@DefaultValue("0")
@QueryParam("from")
int from)
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The export endpoint accepts from and size but does not validate that they are non-negative. A negative from (or size) can lead to surprising behavior (e.g., effectiveTotal increasing) or downstream search errors. Consider rejecting negative values with a clear 400 response.

Copilot uses AI. Check for mistakes.
Comment on lines +207 to +229
await page.route('**/api/v1/search/export?*', async (route) => {
await route.fulfill({
status: 200,
contentType: 'text/csv',
headers: {
'Content-Disposition': 'attachment; filename="search_export.csv"',
},
body: 'Entity Type,Service Name,Service Type,FQN,Name,Display Name,Description,Owners,Tags,Glossary Terms,Domains,Tier\ntable,mysql,Mysql,sample_data.ecommerce_db.shopify.dim_address,dim_address,dim_address,,,,,,',
});
});

await openExportScopeModal(page);

await test.step('Export button shows loading state while downloading', async () => {
await page.route('**/api/v1/search/export?*', async (route) => {
await new Promise<void>((resolve) => setTimeout(resolve, 1500));
await route.fulfill({
status: 200,
contentType: 'text/csv',
body: 'Entity Type\ntable',
});
});

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test registers multiple page.route('**/api/v1/search/export?*', ...) handlers in the same test. Depending on Playwright routing precedence, this can be flaky or cause the later handler to never run. Consider using a single route handler that can emulate both behaviors, or unroute the previous handler before re-registering.

Copilot uses AI. Check for mistakes.
Comment on lines +1509 to +1514
createTestTable(ns, "export_size_test_1");
createTestTable(ns, "export_size_test_2");
createTestTable(ns, "export_size_test_3");

Thread.sleep(2000);

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Thread.sleep(2000) to wait for search indexing makes the test timing-dependent and potentially flaky under load/slow CI. Prefer polling the search/export endpoint until the created entities appear (with a bounded timeout) instead of sleeping a fixed duration.

Copilot uses AI. Check for mistakes.
Comment on lines +1567 to +1568
Thread.sleep(2000);

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: this fixed Thread.sleep(2000) makes the pagination export test timing-dependent. Consider replacing it with a retry/poll loop that waits until the expected search results are visible before asserting pagination behavior.

Suggested change
Thread.sleep(2000);
String visibilityCheckPath =
"/v1/search/export?q=export_page_test&index=table_search_index"
+ "&sort_field=name.keyword&sort_order=asc&from=0&size=3";
long deadlineNanos = System.nanoTime() + Duration.ofSeconds(15).toNanos();
HttpResponse<String> visibilityResponse = null;
boolean resultsVisible = false;
while (System.nanoTime() < deadlineNanos) {
visibilityResponse = httpGetExport(visibilityCheckPath);
if (visibilityResponse.statusCode() == 200) {
String[] visibleLines = visibilityResponse.body().split("\n");
if (visibleLines.length >= 4) {
resultsVisible = true;
break;
}
}
Thread.sleep(200);
}
assertTrue(
resultsVisible,
() ->
"Expected exported search results to become visible before asserting pagination. "
+ "Last status: "
+ (visibilityResponse == null ? "no response" : visibilityResponse.statusCode())
+ ", last body: "
+ (visibilityResponse == null ? "" : visibilityResponse.body()));

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 43 out of 43 changed files in this pull request and generated 2 comments.

Comment on lines +2635 to +2638
.withIncludeSourceFields(sourceFields)
.withSortFieldParam(sortField)
.withSortOrder(baseRequest.getSortOrder() != null ? baseRequest.getSortOrder() : "asc")
.withSearchAfter(searchAfter);
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSV export uses search_after pagination, but the batch request only passes a single sortFieldParam (and the ES/OS request builders currently only add a tiebreaker when sorting by _score, using name.keyword). For non-unique sort fields (e.g. name.keyword, service.name.keyword), this can still lead to skipped/duplicated rows across batches. Add a deterministic secondary sort (e.g. fullyQualifiedName asc) whenever the primary sort is neither _score nor already fullyQualifiedName, so hit.sort[] always includes a stable tiebreaker for search_after.

Copilot uses AI. Check for mistakes.
Comment on lines +225 to +228
a.href = url;
a.download = `Search_Results_${new Date().toISOString()}.csv`;
a.click();
URL.revokeObjectURL(url);
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The download cleanup revokes the object URL immediately after a.click(). This is known to be unreliable in some browsers; elsewhere in the UI (e.g. src/rest/rdfAPI.ts) the URL revocation is deferred via setTimeout and the link is appended/removed from the DOM. Consider following the same pattern here, and also sanitize toISOString() (it contains :) to avoid problematic filenames on some platforms.

Suggested change
a.href = url;
a.download = `Search_Results_${new Date().toISOString()}.csv`;
a.click();
URL.revokeObjectURL(url);
const sanitizedTimestamp = new Date().toISOString().replace(/:/g, '-');
a.href = url;
a.download = `Search_Results_${sanitizedTimestamp}.csv`;
document.body.appendChild(a);
a.click();
setTimeout(() => {
document.body.removeChild(a);
URL.revokeObjectURL(url);
}, 0);

Copilot uses AI. Check for mistakes.
Remove WebsocketNotificationHandler UUID overloads (dead code after
async endpoint deletion) and revert OntologyExplorer files that are
unrelated to search export.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
harshach and others added 2 commits April 4, 2026 19:10
- Create SubjectContext once in exportSearchResults and pass to
  buildExportSearchRequest instead of constructing it twice
- Remove inaccurate count from "All matching assets" option since it
  showed the current tab total, not the cross-index total

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the export modal opens, fire a lightweight count query against
the dataAsset index to show the real total instead of the current
tab's count. The count renders once available; if the query fails
the modal remains usable without a count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 5, 2026

Code Review 👍 Approved with suggestions 18 resolved / 21 findings

Export functionality for search now properly fetches cross-index counts for the 'All matching assets' option and addresses duplicate context construction, resolving 17 issues including memory leaks, pagination bugs, and CSV injection vulnerabilities. Consider constraining the sample data switches to enforce reciprocal relationships between store and read toggles.

💡 Edge Case: SAML metadata upload clears existing IDP fields on parse error

In handleMetadataFileUpload, when XML parsing fails (catch block at lines 367-374), the handler sets entityId, ssoLoginUrl, and idpX509Certificate to empty strings via updateIdpFields. If the admin already had valid IDP fields configured and accidentally uploads an invalid XML file, those fields are blanked out in the form state. While the form hasn't been saved yet (so it's recoverable), this could be surprising and lead to accidental data loss if the user then hits Save without noticing.

Consider either leaving existing fields unchanged on error, or showing a more prominent warning.

Suggested fix
// On parse error, don't clear existing fields:
catch {
  setMetadataUploadFileName(file.name);
  setMetadataUploadStatus('error');
  // Remove the updateIdpFields call that blanks fields
}
💡 Quality: Redundant ternary: sample_data_config if sample_data_config else None

At line 155, sample_data_config if sample_data_config else None is equivalent to just sample_data_config since the variable is already either a value or None (set at lines 147-151). The ternary adds noise without changing behavior.

Suggested fix
sample_data = SampleData(
    data=sampler_interface.generate_sample_data(
        sample_data_config
    ),
    ...
💡 Edge Case: Store/read sample data switches lack reciprocal constraint

In the Profiler Configuration page, turning ON storeSampleData auto-enables readSampleData (line 356-359), but turning OFF storeSampleData does NOT disable readSampleData, and the user can independently disable readSampleData while storeSampleData remains ON. The combination store=true, read=false (storing data that won't be read) may be a valid but confusing configuration. Consider adding a reciprocal constraint: when storeSampleData is turned OFF, also turn OFF readSampleData, or at minimum disable the read switch when store is off to communicate the dependency.

✅ 18 resolved
Bug: SecurityContext used in async thread causes runtime failure

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java:333-347
The JAX-RS SecurityContext is request-scoped and bound to the HTTP request thread. In SearchResource.java, it is captured by the async lambda (lines 335-351) and later used by WebsocketNotificationHandler.sendCsvExportProgressNotification/Complete/Failed which call securityContext.getUserPrincipal().getName(). By the time the async task runs, the HTTP request has completed and the SecurityContext is no longer valid, causing a NullPointerException or HTTP 500 error.

The SubjectContext (an immutable record) is safe to use across threads, but SecurityContext is not.

Performance: Entire CSV (up to 200K rows) held in memory and sent via WebSocket

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2452-2466 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/search/SearchResource.java:338-346 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchResultCsvExporter.java:27
The exportSearchResultsCsv method builds the entire CSV in a StringBuilder (SearchRepository.java:2452-2536), which is then returned as a String and sent as a single WebSocket message via sendCsvExportCompleteNotification. With 200,000 rows and potentially wide fields (descriptions, owners, tags), this could easily be 100+ MB. The data is duplicated multiple times: StringBuilder → String → JSON serialization → WebSocket buffer. This creates significant heap pressure and risks OOM for large exports.

Additionally, most WebSocket implementations have default max message size limits (e.g., 64KB-1MB) which would silently truncate or reject large payloads.

Bug: Export pagination may stop early if totalHits changes

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2469 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2501-2502 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2515-2528
The export loop (SearchRepository.java:2469) uses while (exported < totalHits) where totalHits was computed from a separate count query. If documents are deleted between the count query and the paginated fetches, the search_after pagination may return empty results before exported reaches totalHits, which is correctly handled by the break at line 2502. However, if documents are added, the loop will stop at the original totalHits, which is acceptable. The more subtle issue: if the last batch returns exactly BATCH_SIZE hits but the sort values don't produce a valid searchAfter (e.g., sort node is missing), the loop breaks without exporting all data. This is an edge case but worth noting.

Security: CSV injection via formula characters not sanitized

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchResultCsvExporter.java:78-89
The escapeCsv method in SearchResultCsvExporter handles commas, quotes, and newlines, but does not guard against CSV injection. Values starting with =, +, -, or @ can be interpreted as formulas by spreadsheet applications (Excel, Google Sheets). If user-controlled fields like description, displayName, or name contain such values, opening the exported CSV could execute arbitrary formulas.

This is low severity because the data originates from the organization's own metadata catalog, but it's a defense-in-depth concern.

Bug: Temp file leaks when export loop throws an exception

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:2456-2470
The temp file created at line 2456 is only deleted in a try-finally block (lines 2544-2548) that is outside the try-with-resources block containing the export loop (lines 2457-2542). If any exception is thrown during writing (e.g., search failure, JSON parse error, timeout), the exception propagates past the outer try-finally, and the temp file is never cleaned up. Under repeated failures this could fill the temp directory.

...and 13 more resolved from earlier reviews

🤖 Prompt for agents
Code Review: Export functionality for search now properly fetches cross-index counts for the 'All matching assets' option and addresses duplicate context construction, resolving 17 issues including memory leaks, pagination bugs, and CSV injection vulnerabilities. Consider constraining the sample data switches to enforce reciprocal relationships between store and read toggles.

1. 💡 Edge Case: SAML metadata upload clears existing IDP fields on parse error

   In `handleMetadataFileUpload`, when XML parsing fails (catch block at lines 367-374), the handler sets `entityId`, `ssoLoginUrl`, and `idpX509Certificate` to empty strings via `updateIdpFields`. If the admin already had valid IDP fields configured and accidentally uploads an invalid XML file, those fields are blanked out in the form state. While the form hasn't been saved yet (so it's recoverable), this could be surprising and lead to accidental data loss if the user then hits Save without noticing.
   
   Consider either leaving existing fields unchanged on error, or showing a more prominent warning.

   Suggested fix:
   // On parse error, don't clear existing fields:
   catch {
     setMetadataUploadFileName(file.name);
     setMetadataUploadStatus('error');
     // Remove the updateIdpFields call that blanks fields
   }

2. 💡 Quality: Redundant ternary: `sample_data_config if sample_data_config else None`

   At line 155, `sample_data_config if sample_data_config else None` is equivalent to just `sample_data_config` since the variable is already either a value or `None` (set at lines 147-151). The ternary adds noise without changing behavior.

   Suggested fix:
   sample_data = SampleData(
       data=sampler_interface.generate_sample_data(
           sample_data_config
       ),
       ...

3. 💡 Edge Case: Store/read sample data switches lack reciprocal constraint

   In the Profiler Configuration page, turning ON `storeSampleData` auto-enables `readSampleData` (line 356-359), but turning OFF `storeSampleData` does NOT disable `readSampleData`, and the user can independently disable `readSampleData` while `storeSampleData` remains ON. The combination `store=true, read=false` (storing data that won't be read) may be a valid but confusing configuration. Consider adding a reciprocal constraint: when `storeSampleData` is turned OFF, also turn OFF `readSampleData`, or at minimum disable the read switch when store is off to communicate the dependency.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 3 comments.

Comment on lines +230 to +238
const currentPage = isString(parsedSearch.page)
? Number.parseInt(parsedSearch.page, 10) || 1
: 1;
const pageSize = isString(parsedSearch.size)
? Number.parseInt(parsedSearch.size, 10) || visibleResultCount
: visibleResultCount;

params.size = visibleResultCount;
params.from = (currentPage - 1) * pageSize;
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the "Visible results" export scope, pageSize falls back to visibleResultCount when ?size= is missing, but the actual pagination size comes from user preferences (globalPageSize) / parseSearchParams (see utils/ExploreUtils.tsx:473-477 and SearchedData.tsx:101-130). On a partially-filled last page (or a URL with page but no size), this can compute an incorrect from offset and export the wrong rows. Consider deriving pageSize using the same logic as parseSearchParams/SearchedData (or guaranteeing size is always present in the URL before exporting).

Suggested change
const currentPage = isString(parsedSearch.page)
? Number.parseInt(parsedSearch.page, 10) || 1
: 1;
const pageSize = isString(parsedSearch.size)
? Number.parseInt(parsedSearch.size, 10) || visibleResultCount
: visibleResultCount;
params.size = visibleResultCount;
params.from = (currentPage - 1) * pageSize;
const parsedCurrentPage = isString(parsedSearch.page)
? Number.parseInt(parsedSearch.page, 10)
: Number.NaN;
const parsedPageSize = isString(parsedSearch.size)
? Number.parseInt(parsedSearch.size, 10)
: Number.NaN;
const currentPage = parsedCurrentPage > 0 ? parsedCurrentPage : 1;
const pageSize = parsedPageSize > 0 ? parsedPageSize : undefined;
params.size = visibleResultCount;
if (!isUndefined(pageSize)) {
params.from = (currentPage - 1) * pageSize;
}

Copilot uses AI. Check for mistakes.
Comment on lines +244 to +251
const blob = await exportSearchResultsCsvStream(params);
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `Search_Results_${new Date().toISOString()}.csv`;
a.click();
URL.revokeObjectURL(url);
setShowExportScopeModal(false);
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The download filename uses new Date().toISOString(), which includes : characters (invalid in Windows filenames) and may lead to a sanitized/incorrect filename in some environments. Prefer formatting the timestamp to a filesystem-safe form (e.g., YYYYMMDD_HHmmss) like other downloads in the UI (e.g., AuditLogsPage.tsx).

Copilot uses AI. Check for mistakes.
Comment on lines +664 to +705
<div
className={`export-scope-option-card${
exportScope === 'visible' ? ' selected' : ''
}`}
onClick={() => setExportScope('visible')}>
<div
className={`d-flex items-start gap-2 border-radius-sm tw:p-4 border ${
exportScope === 'visible'
? 'tw:border-brand'
: 'tw:border-secondary'
}`}>
<Radio value="visible" />
<div>
<div className="d-flex items-center gap-2">
<CoreTypography
className="tw:text-primary d-flex items-center tw:gap-0.5"
size="text-sm"
weight="semibold">
{`${t('label.visible-result-plural')} `}
<CoreTypography
className="tw:text-tertiary"
size="text-sm"
weight="regular">
({visibleResultCount} {t('label.result-plural')})
</CoreTypography>
</CoreTypography>
</div>
<CoreTypography
className="tw:text-tertiary"
size="text-sm"
weight="regular">
{t('message.export-visible-results-description')}
</CoreTypography>
</div>
</div>
</div>
<div
className={`export-scope-option-card${
exportScope === 'all' ? ' selected' : ''
}`}
onClick={() => setExportScope('all')}>
<div
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The export-scope option cards are clickable <div> elements (onClick) without button semantics. This makes the selection harder/impossible to use via keyboard and less accessible to assistive tech. Consider using a <label> associated with the radio, or giving the clickable wrapper appropriate role="radio"/tabIndex + key handlers, or relying on the Radio label click behavior instead of a custom clickable div.

Copilot uses AI. Check for mistakes.
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 5, 2026

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 5, 2026

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 5, 2026

Failed to cherry-pick changes to the 1.12.5 branch.
Please cherry-pick the changes manually.
You can find more details here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs To release Will cherry-pick this PR into the release branch UI UI specific issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants