feat: cleanse search response to strip UI-specific fields #71

Merged
steve-calvert-glean merged 2 commits into main from scalvert/search-response-cleansing
Apr 3, 2026

@steve-calvert-glean
Collaborator

Summary

  • Strips the SDK's bloated SearchResponse down to only RFC-aligned fields relevant to programmatic consumers (stopgap until POST /api/search ships)
  • Adds --raw flag to bypass cleansing and get the full SDK response
  • Warns on stderr when --fields requests a field that was removed by cleansing

Details

The current search response includes 17+ top-level fields and deeply nested UI-specific data (tracking tokens, session info, structured results, generated QnA, experiment IDs, facet rendering config, etc.). This change uses an allowlist approach to keep only:

  • Response level: results, cursor, hasMoreResults, requestID
  • Result level: title, url, snippets, document
  • Document level: title, url, datasource, docType, metadata (filtered to author/times)
  • Author level: name, email

Empty structured results (knowledge cards with no document/title) are filtered out.
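
For readers who want a concrete picture of the allowlist approach, here is a minimal sketch. It is not the code in cleanse.go. The function names (`pick`, `cleanseDocument`, `cleanseResponse`), the metadata time keys (`createTime`, `updateTime`), and the reading of "empty" as "neither a document nor a non-empty title" are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Allowlists taken from the PR description.
var (
	responseFields = []string{"results", "cursor", "hasMoreResults", "requestID"}
	resultFields   = []string{"title", "url", "snippets", "document"}
	documentFields = []string{"title", "url", "datasource", "docType", "metadata"}
	authorFields   = []string{"name", "email"}
	// Assumption: the PR only says "times"; these key names are illustrative.
	timeFields = []string{"createTime", "updateTime"}
)

// pick returns a new map holding only the allowed keys that exist in src.
func pick(src map[string]any, allowed []string) map[string]any {
	out := make(map[string]any, len(allowed))
	for _, k := range allowed {
		if v, ok := src[k]; ok {
			out[k] = v
		}
	}
	return out
}

// cleanseDocument keeps the document allowlist and narrows metadata
// to the time fields plus an author reduced to name and email.
func cleanseDocument(doc map[string]any) map[string]any {
	out := pick(doc, documentFields)
	if meta, ok := out["metadata"].(map[string]any); ok {
		kept := pick(meta, timeFields)
		if author, ok := meta["author"].(map[string]any); ok {
			kept["author"] = pick(author, authorFields)
		}
		out["metadata"] = kept
	}
	return out
}

// cleanseResponse applies the allowlists top-down and drops results
// that have neither a document nor a non-empty title.
func cleanseResponse(resp map[string]any) map[string]any {
	out := pick(resp, responseFields)
	raw, ok := out["results"].([]any)
	if !ok {
		return out
	}
	results := make([]any, 0, len(raw))
	for _, item := range raw {
		r, ok := item.(map[string]any)
		if !ok {
			continue
		}
		_, hasDoc := r["document"].(map[string]any)
		title, _ := r["title"].(string)
		if !hasDoc && title == "" {
			continue // empty structured result
		}
		clean := pick(r, resultFields)
		if doc, ok := clean["document"].(map[string]any); ok {
			clean["document"] = cleanseDocument(doc)
		}
		results = append(results, clean)
	}
	out["results"] = results
	return out
}

func main() {
	raw := []byte(`{"requestID": "req-1", "trackingToken": "abc", "results": [
		{"trackingToken": "x"},
		{"title": "Design doc", "trackingToken": "y", "document": {"id": "D1", "datasource": "gdrive",
		 "metadata": {"score": 0.9, "author": {"name": "Ada", "email": "ada@example.com", "obfuscatedId": "z"}}}}]}`)
	var resp map[string]any
	if err := json.Unmarshal(raw, &resp); err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(cleanseResponse(resp), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the filter is an allowlist rather than a denylist, any field the SDK adds later is dropped by default until it is explicitly listed.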

Deletion path

When POST /api/search ships: delete cleanse.go and cleanse_test.go, remove the --raw flag from search.go, and remove the map[string]any branch from formatter.go.

Test plan

  • go test ./... — 365 tests pass
  • golangci-lint run — no issues
  • glean search "query" — cleansed output verified
  • glean search --raw "query" — full SDK response verified
  • glean search --output ndjson "query" — one cleansed result per line
  • glean search --fields "results.trackingToken,results.title" "query" — warning on stderr, allowed fields shown
  • glean search --fields "results.title,results.document.url" "query" — no warning, projection works
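
The two --fields checks above imply a lookup of each requested path against the allowlist. The following is a rough sketch of how such a check might work, not the PR's implementation. The `node` tree, the function names `survives`, `cleansedAway`, and `warnRemoved`, and the warning wording are hypothetical. For brevity the tree treats everything below `metadata` and `snippets` as allowed.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// node is an allowlist tree; a nil child means "anything below is allowed".
type node map[string]node

// allowed mirrors the field levels listed in the PR description.
var allowed = node{
	"cursor":         nil,
	"hasMoreResults": nil,
	"requestID":      nil,
	"results": {
		"title":    nil,
		"url":      nil,
		"snippets": nil,
		"document": {
			"title":      nil,
			"url":        nil,
			"datasource": nil,
			"docType":    nil,
			"metadata":   nil, // simplified: sub-filtering not modelled here
		},
	},
}

// survives reports whether a dotted field path is still present after cleansing.
func survives(path string) bool {
	cur := allowed
	for _, seg := range strings.Split(path, ".") {
		if cur == nil {
			return true // reached an open leaf; deeper segments are fine
		}
		next, ok := cur[seg]
		if !ok {
			return false
		}
		cur = next
	}
	return true
}

// cleansedAway returns the requested fields that cleansing would remove.
func cleansedAway(fields string) []string {
	var removed []string
	for _, f := range strings.Split(fields, ",") {
		f = strings.TrimSpace(f)
		if f != "" && !survives(f) {
			removed = append(removed, f)
		}
	}
	return removed
}

// warnRemoved writes one warning line per removed field to w.
func warnRemoved(w io.Writer, fields string) {
	for _, f := range cleansedAway(fields) {
		// Hypothetical wording; the real message may differ.
		fmt.Fprintf(w, "warning: field %q is removed by cleansing; use --raw to include it\n", f)
	}
}

func main() {
	warnRemoved(os.Stderr, "results.trackingToken,results.title")
	warnRemoved(os.Stderr, "results.title,results.document.url")
}
```

Writing the warning to stderr keeps stdout clean, so piping the JSON output into another tool still works when a removed field is requested.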

🤖 Generated with Claude Code

Strip the SDK's bloated SearchResponse down to only the fields
relevant to programmatic consumers, aligning with the RFC for
POST /api/search. This is a stopgap until the new API ships.

- Add allowlist-based response cleansing (internal/output/cleanse.go)
- Filter out empty structured results that have no document or title
- Sub-filter author to only name and email
- Add --raw flag to bypass cleansing for full SDK response
- Warn on stderr when --fields requests a cleansed-away field
- Extend WriteNDJSON to handle cleansed map[string]any responses
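
As an illustration of the last item, here is one way a cleansed map[string]any response could be emitted as NDJSON, one result per line. This is a sketch and not the WriteNDJSON code in formatter.go. The function name `encodeNDJSON` is hypothetical, and falling back to a single line when there is no results array is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"os"
)

// encodeNDJSON renders each element of resp["results"] as one JSON line.
// If there is no results array, the whole response becomes a single line
// (an assumed fallback, not confirmed by the PR).
func encodeNDJSON(resp map[string]any) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends a newline after each value

	results, ok := resp["results"].([]any)
	if !ok {
		if err := enc.Encode(resp); err != nil {
			return nil, err
		}
		return buf.Bytes(), nil
	}
	for _, r := range results {
		if err := enc.Encode(r); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	resp := map[string]any{
		"requestID": "req-1",
		"results": []any{
			map[string]any{"title": "Design doc", "url": "https://example.com/a"},
			map[string]any{"title": "Runbook", "url": "https://example.com/b"},
		},
	}
	out, err := encodeNDJSON(resp)
	if err != nil {
		panic(err)
	}
	os.Stdout.Write(out)
}
```

The type assertion on `resp["results"]` stands in for the map[string]any branch mentioned in the deletion path. Once a typed response is available again, a branch like this would no longer be needed.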

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@steve-calvert-glean steve-calvert-glean added the enhancement New feature or request label Apr 3, 2026
@steve-calvert-glean steve-calvert-glean merged commit 7260bc2 into main Apr 3, 2026
7 checks passed
3 participants