PPL Search Result Highlighting — Design
Author: Jialiang Liang
Date: 2026-02-17
1. Scope
This design covers search result highlighting for PPL queries executed through the
Calcite engine. There are two distinct user stories:
User Story 1: OSD Explore users
When a user switches to PPL mode in Explore and runs search source=logs "error",
matching terms should be highlighted in the results table — the same experience they get
with DSL today. The user does not configure highlighting; OSD handles it automatically
using its existing getHighlightRequest() function.
User Story 2: API / CLI users
API consumers sending PPL queries via POST /_plugins/_ppl may want to highlight
specific fields with custom tags (e.g., <em> for HTML). They need full control over
which fields are highlighted, what tags are used, and the fragment size.
Both stories, one mechanism
Both are served by the same API: an optional highlight object in the PPL request body.
OSD and API users construct the config differently, but the backend treats them
identically — it is a pure pass-through. When no highlight is provided, behavior is
unchanged (backward compatible).
Out of scope
- Porting the V2 highlight() PPL function to the Calcite engine
- Customizable highlight settings in OSD UI (e.g., per-field configuration from Explore)
2. Highlighting Behavior
PPL highlighting is fully aligned with the existing DSL highlight feature — the same
OpenSearch highlighting engine, the same rules for what gets highlighted, and the same
rendering in OSD Explore.
When do highlights appear?
Only when the PPL query contains a full-text search — e.g.,
search source=logs "error" or search source=logs "connection timeout". The search
term is translated to a query_string query, and OpenSearch's highlighter identifies
where that term appears in the document.
If the query contains only structured filters (e.g., search source=logs | where status = 200),
no highlights are produced — even if the highlight config is present in the request.
What gets highlighted?
The matching search terms inside text and keyword field values. For example,
searching "Holmes" highlights the term Holmes wherever it appears — in firstname,
lastname, address, or any other string field that contains the match.
What does NOT get highlighted?
- Non-string fields: Numeric, date, boolean, and other non-text field types never
produce highlight fragments, even if the document matched because of those fields.
- Structured filters: PPL commands like where, stats, or conditions using
comparison operators (>, <, =) do not produce highlights. These are translated
to range/term/bool filters in the DSL, which OpenSearch's highlighter does not act on.
What about piped commands?
Piped commands narrow the result set but do not affect which terms are highlighted.
For example:
search source=logs "error" | where status > 400
The where status > 400 filters rows, but only the "error" full-text search produces
highlights. This is the same behavior as a DSL bool query with a query_string in
must and a range in filter — the filter narrows results without contributing to
highlights.
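As a rough illustration of the comparison above, the DSL that such a pipeline resembles can be written as a Python dict. The exact request body the plugin generates is not shown in this design, so the shape below is an assumption; only the query text and the status field come from the example.

```python
# Hypothetical DSL equivalent of:
#   search source=logs "error" | where status > 400
# The body actually generated by the plugin may differ.
equivalent_dsl = {
    "query": {
        "bool": {
            # Full-text clause: the only part the highlighter acts on.
            "must": [{"query_string": {"query": "error"}}],
            # Structured filter: narrows results, contributes no highlights.
            "filter": [{"range": {"status": {"gt": 400}}}],
        }
    },
    # Highlight config supplied by the caller and forwarded unchanged.
    "highlight": {"fields": {"*": {}}},
}
```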
What does it look like?
Matching terms in field values are wrapped in configurable tags:
- OSD Explore: Tags are OSD internal markers (@opensearch-dashboards-highlighted-field@)
that OSD renders as bold/colored text in the results table, identical to DSL behavior.
- API/CLI: Users choose their own tags (e.g., <em>, <mark>, or custom markers)
and see them in the JSON response.
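To make the marker convention concrete, here is a minimal sketch of how a consumer might turn OSD-style markers into HTML. This is not OSD's rendering code; markers_to_html is a hypothetical helper, and the choice of <mark> as the output tag is an assumption.

```python
import html

# OSD marker tags as given in this design.
PRE_TAG = "@opensearch-dashboards-highlighted-field@"
POST_TAG = "@/opensearch-dashboards-highlighted-field@"


def markers_to_html(fragment: str, open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Escape the field value first, then swap the markers for HTML tags."""
    escaped = html.escape(fragment)  # the markers contain no escapable characters
    return escaped.replace(PRE_TAG, open_tag).replace(POST_TAG, close_tag)
```

Escaping before substitution keeps any markup already present in the field value from being rendered, which is presumably one reason to use non-HTML markers in the first place.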
3. API Design
Principle: caller-driven, backend pass-through
- The caller (OSD, API client, CLI) controls highlighting by providing a highlight
object in the PPL request body
- The backend forwards the config as-is to OpenSearch and returns highlight data in
the response
- When no highlight is provided, no highlighting is applied
This is consistent with how DSL works: OSD injects the highlight clause, not OpenSearch.
Request API
POST /_plugins/_ppl
{
"query": "search source=logs \"error\"",
"highlight": {
"fields": { "*": {} },
"pre_tags": ["@opensearch-dashboards-highlighted-field@"],
"post_tags": ["@/opensearch-dashboards-highlighted-field@"],
"fragment_size": 2147483647
}
}
The highlight object supports the same structure as OpenSearch's highlighting API:
| Field | Description |
|---|---|
| fields | Map of field names to per-field config. "*" for wildcard. |
| pre_tags | Array of tags inserted before highlighted terms |
| post_tags | Array of tags inserted after highlighted terms |
| fragment_size | Max character length of each fragment. OSD sets 2^31 - 1 so the entire field value is returned rather than OpenSearch's default 100-char truncation. |
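A caller-side sketch of assembling the request body, assuming a plain JSON client. build_ppl_body is a hypothetical helper, not part of any SDK; the field names and the OSD values are the ones listed above.

```python
import json
from typing import Optional

MAX_INT = 2**31 - 1  # 2147483647, the fragment_size OSD sends


def build_ppl_body(query: str, highlight: Optional[dict] = None) -> str:
    """Return a JSON body for POST /_plugins/_ppl; omit highlight when not requested."""
    body = {"query": query}
    if highlight is not None:
        body["highlight"] = highlight
    return json.dumps(body)


# OSD-style config: wildcard fields, OSD markers, whole field value per fragment.
osd_style = {
    "fields": {"*": {}},
    "pre_tags": ["@opensearch-dashboards-highlighted-field@"],
    "post_tags": ["@/opensearch-dashboards-highlighted-field@"],
    "fragment_size": MAX_INT,
}
```

Calling build_ppl_body('search source=logs "error"', osd_style) yields a body with a highlight object; calling it without the second argument leaves the key out entirely, which is the backward-compatible path.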
API/CLI example — specific field with custom tags:
POST /_plugins/_ppl
{
"query": "search source=logs \"error\"",
"highlight": {
"fields": { "message": {} },
"pre_tags": ["<em>"],
"post_tags": ["</em>"],
"fragment_size": 200
}
}
Response format
The response includes a highlights array parallel to datarows:
{
"schema": [{ "name": "firstname", "type": "string" }, ...],
"datarows": [["Holmes", ...], ["Blanche", ...], ["Amber", ...]],
"highlights": [
{ "firstname": ["<tag>Holmes</tag>"], "firstname.keyword": ["<tag>Holmes</tag>"] },
{ "lastname": ["<tag>Holmes</tag>"], "lastname.keyword": ["<tag>Holmes</tag>"] },
{ "address": ["880 <tag>Holmes</tag> Lane"] }
],
"total": 3,
"size": 3
}
- Each entry in highlights corresponds to the row at the same index in datarows
- Entries are null when a row has no highlight data for the requested fields
- The highlights array is omitted entirely when no highlight config is provided
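The three rules above suggest a simple way for a client to consume the response. The following is a sketch under those rules only; rows_with_highlights is a hypothetical helper name.

```python
from typing import Any, List, Optional, Tuple


def rows_with_highlights(response: dict) -> List[Tuple[List[Any], Optional[dict]]]:
    """Pair each datarow with its highlight entry (None when there is none)."""
    datarows = response.get("datarows", [])
    # "highlights" is absent when no highlight config was sent with the request.
    highlights = response.get("highlights") or [None] * len(datarows)
    return list(zip(datarows, highlights))
```

Treating an omitted array the same as an array of nulls lets downstream code handle highlighted and non-highlighted responses with one code path.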
4. Design Decisions
4.1 No command-level scoping — follow DSL behavior
DSL does not scope highlighting to specific query clauses — getHighlightRequest()
applies the same wildcard config regardless of query structure. To match this behavior,
when a caller provides a highlight config, the backend attaches it to the OpenSearch
request regardless of which PPL commands are in the pipeline.
4.2 Relationship with V2 highlight() function
PPL has an existing per-field highlight() function in the V2 engine (e.g.,
highlight(msg, pre_tags='<em>')). This is not supported in the Calcite engine.
The request-level API covers the same use cases:
| Capability | V2 highlight() function | Request-level API |
|---|---|---|
| Per-field control | highlight(msg) in PPL query | "fields": { "msg": {} } in request body |
| Custom tags | highlight(msg, pre_tags='<em>') | "pre_tags": ["<em>"] in request body |
| Wildcard fields | highlight(*) in PPL | "fields": { "*": {} } in request body |
Porting highlight() to Calcite is out of scope. The backend plumbing supports both
approaches if an in-query function is added later.
4.3 Backend is a pure pass-through — no hardcoded defaults
An alternative considered was a ?highlight=true query parameter that triggers the
backend to inject a default config. This was rejected because it would hardcode
OSD-specific knowledge (tags, fragment size) into the backend.
Instead, the backend simply forwards whatever config the caller provides. One mechanism
for all callers, no OSD-specific knowledge in the backend.
5. Performance Evaluation
Methodology
A/B benchmark comparing PPL query execution with and without highlighting on a
worst-case dataset where every document matches and every text field produces highlights:
- Dataset: 10,000 documents, search term "error" in 4 text fields per document
- Query: search source=highlight_perf_test "error" (returns all 10,000 rows)
- Iterations: 20 measured runs after 3 warmup runs
- Environment: Single-node OpenSearch (1 shard, 0 replicas), local dev machine
Results
| Scenario | Avg Latency | P50 | P95 | Response Size |
|---|---|---|---|---|
| No highlight | 163 ms | 161 ms | 197 ms | 2.84 MB |
| Highlight (wildcard *) | 617 ms | 613 ms | 669 ms | 9.29 MB |
| Highlight (single field) | 299 ms | 298 ms | 325 ms | 4.06 MB |
| Comparison | Latency Overhead | Size Overhead |
|---|---|---|
| Wildcard vs no highlight | +278% (~3.8x) | +227% (~3.3x) |
| Single field vs no highlight | +83% (~1.8x) | +43% (~1.4x) |
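The overhead figures can be re-derived from the averages in the results table. The sketch below uses only the rounded table values, so its output can differ from the reported percentages by about a point; overhead is a hypothetical helper name.

```python
def overhead(baseline: float, measured: float) -> tuple:
    """Return (percent increase, multiplier) of measured over baseline."""
    ratio = measured / baseline
    return round((ratio - 1) * 100, 1), round(ratio, 1)


# Average latency in ms and response size in MB, taken from the results table.
wildcard_latency = overhead(163, 617)
wildcard_size = overhead(2.84, 9.29)
single_latency = overhead(163, 299)
single_size = overhead(2.84, 4.06)
```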
Analysis
- Worst-case benchmark. Every row matches, every text field highlights. Real-world
result sets are typically smaller and sparser.
- The overhead is in OpenSearch, not the SQL plugin. The backend is a pure
pass-through. The latency and size cost is OpenSearch's native highlighting, the
same cost DSL users already pay today.
- Single-field highlighting is significantly cheaper. API users who specify only the
fields they need pay roughly 30% of wildcard's added latency (136 ms vs 454 ms) and
about 19% of its added response size (1.22 MB vs 6.45 MB).
- Response size is proportional to highlight data. Wildcard adds ~6.45 MB of
highlight fragments (tags + full field values for 4 fields x 10k rows).
6. Limitations
- Same limitations as DSL highlighting. Only full-text queries produce highlights;
only text/keyword fields; no query-aware scoping. These are inherent OpenSearch
limitations (see Background: DSL Highlighting in the Appendix).
- shouldHighlight is hardcoded to true. The OSD frontend currently always sends
the highlight config. It should read from the doc_table:highlight UI setting.
- No in-query highlight() function in Calcite. The request-level API covers the
same capabilities, but interactive PPL users cannot use highlight() in queries.
Appendix
Background: DSL Highlighting
How OSD enables DSL highlighting
OSD's SearchSource.flatten() injects a highlight clause into every DSL request when
the doc_table:highlight UI setting is enabled (default: true). The
getHighlightRequest() function returns a hardcoded config: wildcard fields, OSD custom
tags, and fragment_size: 2^31 - 1.
| Aspect | Behavior |
|---|---|
| Fields | Always "*" (wildcard) — hardcoded |
| Tags | Always OSD custom tags — hardcoded |
| Fragment size | Always max int — hardcoded |
| On/off | doc_table:highlight UI setting (default: true) |
| Query-aware scoping | None — receives the query but never inspects it |
DSL highlighting limitations
These are inherent OpenSearch limitations, not OSD-specific:
- Only full-text queries (query_string, match, match_phrase, etc.) produce
highlights. Structured filters (range, term, bool filter) do not.
- Only text and keyword fields are highlighted. Numeric, boolean, date, and
other non-string types produce no fragments.
- No query-aware scoping. Same wildcard config regardless of query structure.
- Keyword subfields are included. "*" matches both firstname and
firstname.keyword.
These limitations apply equally to PPL highlighting since we use the same mechanism.
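Because a wildcard config returns both a field and its .keyword subfield, an API consumer may want to collapse the duplicates. The helper below is a hypothetical client-side sketch; nothing in this design says the backend or OSD performs this step.

```python
def drop_keyword_duplicates(entry: dict) -> dict:
    """Drop 'x.keyword' highlight keys when the parent field 'x' is also present."""
    suffix = ".keyword"
    return {
        field: fragments
        for field, fragments in entry.items()
        # Keep a .keyword key only if its parent field has no highlight of its own.
        if not (field.endswith(suffix) and field[: -len(suffix)] in entry)
    }
```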
Backend Implementation Details
Data flow
PPLQueryRequest.getHighlight()
-> PPLService.setHighlightOnPlan()
-> AbstractPlan (carries highlightConfig across thread boundary)
-> QueryPlan.execute() (on worker thread)
-> CalcitePlanContext ThreadLocal: .setHighlightConfig()
-> CalciteLogicalIndexScan (3-arg constructor)
     buildInitialSchema() checks ThreadLocal:
     if highlight config present, appends _highlight (SqlTypeName.ANY)
     to the Calcite RowType
-> CalciteEnumerableIndexScan.scan()
     applyHighlightConfig(): reads ThreadLocal, builds HighlightBuilder,
     attaches to OpenSearchRequestBuilder
-> OpenSearch response (highlight fragments)
-> OpenSearchResponse.addHighlightsToBuilder()
     builds _highlight ExprTupleValue per hit
-> OpenSearchIndexEnumerator.current()
     carries _highlight as opaque ExprValue in Calcite row
     (SqlTypeName.ANY: Calcite passes it through without conversion)
     Calcite operators (filter, sort, dedup) naturally preserve
     _highlight as a regular column, so there is no positional misalignment
-> OpenSearchExecutionEngine.buildResultSet()
     reads _highlight inline from each ResultSet row,
     embeds in ExprTupleValue, excludes _highlight from response schema
-> QueryResult.highlights()
     extracts _highlight from each row tuple
-> JdbcResponseFormatter / SimpleJsonResponseFormatter
     writes "highlights" array in JSON response
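The reason for carrying highlight data as a column rather than a side array can be shown with a small Python analogue. The rows below are made-up illustration data, and the filter and sort stand in for Calcite operators; only the _highlight column name comes from this design.

```python
HIGHLIGHT_FIELD = "_highlight"  # column name used in this design

# Hypothetical rows as they might leave the index scan: data plus highlight column.
rows = [
    {"lastname": "Holmes", "balance": 45000, HIGHLIGHT_FIELD: {"lastname": ["<em>Holmes</em>"]}},
    {"lastname": "Bond", "balance": 5000, HIGHLIGHT_FIELD: {"address": ["880 <em>Holmes</em> Lane"]}},
    {"lastname": "Duke", "balance": 48000, HIGHLIGHT_FIELD: {"firstname": ["<em>Holmes</em>"]}},
]

# Filter and sort operate on whole rows, so each highlight stays with its row.
kept = sorted((r for r in rows if r["balance"] > 40000), key=lambda r: r["balance"], reverse=True)

# Split the internal column back out when building the response.
datarows = [[v for k, v in r.items() if k != HIGHLIGHT_FIELD] for r in kept]
highlights = [r[HIGHLIGHT_FIELD] for r in kept]
```

Had the highlights been kept in a separate list indexed by original position, the filter and the reordering would both have broken the row-to-highlight correspondence.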
Key files
| File | Role |
|---|---|
| PPLQueryRequest.getHighlight() | Extracts highlight JSONObject from request body |
| PPLService.setHighlightOnPlan() | Attaches config to AbstractPlan for cross-thread transport |
| AbstractPlan.highlightConfig | Carries config from REST handler thread to worker thread |
| QueryPlan.execute() / ExplainPlan.execute() | Sets CalcitePlanContext ThreadLocal on worker thread |
| CalciteLogicalIndexScan.buildInitialSchema() | Conditionally appends _highlight column to RowType |
| AbstractCalciteIndexScan.applyHighlightConfig() | Converts config map to HighlightBuilder |
| OpenSearchResponse.addHighlightsToBuilder() | Builds _highlight ExprTupleValue from OpenSearch response |
| OpenSearchIndexEnumerator.current() | Carries _highlight as opaque ExprValue in Calcite row |
| OpenSearchExecutionEngine.buildResultSet() | Reads _highlight inline from row, excludes from schema |
| QueryResult.highlights() | Extracts highlight data from row tuples |
| JdbcResponseFormatter / SimpleJsonResponseFormatter | Writes highlights array in JSON response |
| HighlightExpression.HIGHLIGHT_FIELD | Constant "_highlight" used across all files |
Threading model
The PPL endpoint runs on a REST handler thread, but query execution runs on a separate
sql-worker thread pool. ThreadLocals do not cross thread boundaries.
Solution: The highlight config is carried via the AbstractPlan object (a normal
Java reference). execute() runs on the worker thread and sets the ThreadLocal there.
REST handler thread: PPLService -> setHighlightOnPlan(plan) -> queryManager.submit(plan)
|
Worker thread: plan.execute() -> setHighlightThreadLocal() -> CalcitePlanContext.set()
|
analyze() -> buildInitialSchema() adds _highlight to RowType
-> optimize() -> scan() -> applyHighlightConfig()
-> enumerator carries _highlight as column
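The same hand-off can be sketched in Python, where threading.local plays the role of a Java ThreadLocal. This is an analogue for illustration only: Plan, scan, and _context are hypothetical stand-ins for AbstractPlan, the index scan, and CalcitePlanContext, not the plugin's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_context = threading.local()  # stands in for the CalcitePlanContext ThreadLocal


class Plan:
    """Stand-in for AbstractPlan: a plain object that carries the config."""

    def __init__(self, highlight_config):
        self.highlight_config = highlight_config

    def execute(self):
        # Runs on the worker thread, so the thread-local is set where it is read.
        _context.highlight = self.highlight_config
        return scan()


def scan():
    # Reads whatever the current thread has stored, or None if nothing was set.
    return getattr(_context, "highlight", None)


_context.highlight = {"fields": {"*": {}}}  # set on the submitting thread only

with ThreadPoolExecutor(max_workers=1) as pool:
    # A thread-local set on the submitting thread is not visible to the worker.
    seen_without_plan = pool.submit(scan).result()
    # Carrying the config on the plan object makes it available to the worker.
    seen_with_plan = pool.submit(Plan({"fields": {"message": {}}}).execute).result()
```

The first submission sees no config even though the submitting thread set one; the second sees it because the plan object travelled with the task.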
Frontend Implementation Details (OSD)
Changes
| File | Change |
|---|---|
| ppl_search_strategy.ts | Calls getHighlightRequest() and attaches config to request.body.highlight |
| facet.ts | Forwards request.body.highlight into params.body sent to /_plugins/_ppl |
Response handling
| File | Role |
|---|---|
| ppl_search_strategy.ts | Stores rawResponse.data.highlights on dataFrame.meta |
| data_frames/utils.ts | Attaches meta.highlights[i] as hit.highlight per row in convertResult() |
getHighlightRequest() is the same function OSD uses for DSL — no new OSD-specific code
needed.
Sample Queries & Responses
Test data: accounts.json (1000 documents with fields like firstname, lastname,
address, age, etc.)
With highlight (OSD-style — wildcard fields, OSD tags)
curl -s -X POST "localhost:9200/_plugins/_ppl" \
-H 'Content-Type: application/json' -d '{
"query": "search source=accounts \"Holmes\"",
"highlight": {
"fields": { "*": {} },
"pre_tags": ["@opensearch-dashboards-highlighted-field@"],
"post_tags": ["@/opensearch-dashboards-highlighted-field@"],
"fragment_size": 2147483647
}
}' | jq
Response (trimmed to highlights):
{
"highlights": [
{
"firstname": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"],
"firstname.keyword": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"]
},
{
"lastname": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"],
"lastname.keyword": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"]
},
{
"address": ["880 @opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@ Lane"]
}
],
"total": 3,
"size": 3
}
Without highlight (backward compatible)
curl -s -X POST "localhost:9200/_plugins/_ppl" \
-H 'Content-Type: application/json' -d '{
"query": "search source=accounts \"Holmes\""
}' | jq
Response: No highlights field (backward compatible).
Explain (highlight appears in generated DSL)
curl -s -X POST "localhost:9200/_plugins/_ppl/_explain" \
-H 'Content-Type: application/json' -d '{
"query": "search source=accounts \"Holmes\"",
"highlight": {
"fields": { "*": {} },
"pre_tags": ["@opensearch-dashboards-highlighted-field@"],
"post_tags": ["@/opensearch-dashboards-highlighted-field@"],
"fragment_size": 2147483647
}
}' | jq
Response (relevant portion):
{
"query": { "query_string": { "query": "Holmes" } },
"highlight": {
"pre_tags": ["@opensearch-dashboards-highlighted-field@"],
"post_tags": ["@/opensearch-dashboards-highlighted-field@"],
"fields": { "*": { "fragment_size": 2147483647 } }
}
}
Custom tags + specific field (API user scenario)
curl -s -X POST "localhost:9200/_plugins/_ppl" \
-H 'Content-Type: application/json' -d '{
"query": "search source=accounts \"Holmes\"",
"highlight": {
"fields": { "address": {} },
"pre_tags": ["<em>"],
"post_tags": ["</em>"],
"fragment_size": 200
}
}' | jq
Response:
{
"highlights": [
null,
null,
{ "address": ["880 <em>Holmes</em> Lane"] }
],
"total": 3,
"size": 3
}
Only address is highlighted. Rows 1 and 2 are null because "Holmes" appears in
firstname/lastname, not address.
Text search + piped filter
curl -s -X POST "localhost:9200/_plugins/_ppl" \
-H 'Content-Type: application/json' -d '{
"query": "search source=accounts \"Holmes\" | where balance > 40000",
"highlight": {
"fields": { "*": {} },
"pre_tags": ["@opensearch-dashboards-highlighted-field@"],
"post_tags": ["@/opensearch-dashboards-highlighted-field@"],
"fragment_size": 2147483647
}
}' | jq
Response:
{
"highlights": [
{
"lastname": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"],
"lastname.keyword": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"]
}
],
"total": 1,
"size": 1
}
The where balance > 40000 narrows results but does not produce highlights (it is a
range filter, not a full-text query).