Query Information
PPL Command/Query:
source=test-isnotnull-bug | where isnotnull(description) and description != '' | stats count() by description
Expected Result:
Results should not contain null values for the description field when isnotnull(description) filter is applied.
Actual Result:
{
"datarows": [
[3, null],
[1, "Short description 1"],
[1, "Short description 2"],
[1, "Short description 3"]
]
}
The first row contains null for description despite the isnotnull(description) filter.
Dataset Information
Dataset/Schema Type
Index Mapping
{
"mappings": {
"properties": {
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 50
}
}
},
"value": {
"type": "long"
}
}
}
}
Sample Data
{"description": "Short description 1", "value": 1}
{"description": "This is a very long description that definitely exceeds the 50 character limit set in ignore_above", "value": 100}
{"description": "Another extremely long description that will be ignored by the keyword subfield due to exceeding limit", "value": 101}
{"description": "", "value": 200}
Bug Description
Issue Summary
PPL query with where isnotnull(field) filter incorrectly returns null values in aggregation results when the field is a text type with a keyword subfield that has an ignore_above limit, and documents contain values exceeding that limit.
Preconditions for Bug to Occur
The bug requires ALL of the following conditions:
-
Field Mapping Structure:
- Field must be
text type with a keyword subfield
- The
keyword subfield must have ignore_above setting
{
"field_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": N
}
}
}
}
-
Data Condition:
- Documents must exist where the field value exceeds the
ignore_above character limit
- For these documents:
- The
text field is populated (has a value)
- The
keyword subfield is null (ignored due to length)
-
Query Pattern:
- PPL query with
isnotnull() filter on the field
- Aggregation (
stats ... by) on the same field
source=index | where isnotnull(field) | stats count() by field
When Bug Will NOT Occur
The bug will NOT reproduce if:
-
Simple keyword field (no text parent):
{"description": {"type": "keyword"}}
❌ No bug - filter and aggregation use same field
-
No ignore_above limit:
{
"description": {
"type": "text",
"fields": {"keyword": {"type": "keyword"}}
}
}
❌ No bug - keyword subfield always populated
-
All values within limit:
- All descriptions < 50 characters (if
ignore_above: 50)
❌ No bug - keyword subfield populated for all docs
-
Truly null/missing fields:
{"description": null}
{"value": 100}
❌ No bug - both text and keyword fields are null
Steps to Reproduce
# 1. Create index with text field + keyword subfield with ignore_above
PUT /test-isnotnull-bug
{
"mappings": {
"properties": {
"description": {
"type": "text",
"fields": {
"keyword": {"type": "keyword", "ignore_above": 50}
}
}
}
}
}
# 2. Insert data with descriptions exceeding 50 characters
POST /test-isnotnull-bug/_bulk
{"index":{}}
{"description":"Short description 1","value":1}
{"index":{}}
{"description":"This is a very long description that definitely exceeds the 50 character limit","value":100}
# 3. Execute PPL query
POST /_plugins/_ppl
{
"query": "source=test-isnotnull-bug | where isnotnull(description) and description != '' | stats count() by description"
}
# 4. Observe null values in results despite isnotnull() filter
Root Cause Analysis
Explain Plan Output:
Physical Plan (OpenSearch DSL):
{
"query": {
"bool": {
"must": [{"exists": {"field": "description"}}],
"must_not": [{"term": {"description.keyword": {"value": ""}}}]
}
},
"aggregations": {
"composite_buckets": {
"composite": {
"sources": [{
"description": {
"terms": {
"field": "description.keyword",
"missing_bucket": true,
"missing_order": "first"
}
}
}]
}
}
}
}
The Problem:
-
Filter operates on: description (text field)
isnotnull(description) → {"exists": {"field": "description"}}
- Documents with long descriptions PASS this filter (text field exists)
-
Aggregation operates on: description.keyword (keyword subfield)
- Aggregates by
"field": "description.keyword"
- For documents with descriptions > 50 chars,
description.keyword is NULL due to ignore_above: 50
-
The Mismatch:
- 3 documents have
description text field populated → filter passes ✓
- But their
description.keyword is null → aggregation sees null ✗
- With
missing_bucket: true, these appear as [3, null] in results
Verification:
# Confirm: 3 documents have description but no description.keyword
POST /test-isnotnull-bug/_count
{
"query": {
"bool": {
"must": [{"exists": {"field": "description"}}],
"must_not": [{"exists": {"field": "description.keyword"}}]
}
}
}
# Returns: {"count": 3}
Tentative Root Cause
In /opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java (lines 207-215), the bucketNullable flag defaults to true:
boolean bucketNullable =
Boolean.parseBoolean(
aggregate.getHints().stream()
.filter(hits -> hits.hintName.equals("stats_args"))
.map(hint -> hint.kvOptions.getOrDefault(Argument.BUCKET_NULLABLE, "true"))
.findFirst()
.orElseGet(() -> "true"));
This flag controls missing_bucket in /opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/dsl/CompositeAggregationBuilder.java (lines 72-74):
if (bucketNullable) {
sourceBuilder.missingBucket(true).missingOrder(missingOrder);
}
The system does not analyze WHERE clause filters (specifically isnotnull()) to automatically set bucketNullable=false for fields with non-null constraints.
Tentative Proposed Fix
Enhance the aggregation analyzer to detect isnotnull() filters on fields being aggregated:
- In
AggregateAnalyzer.analyze(), extract fields with isnotnull() predicates from the Filter node
- Pass this information to
createTermsSourceBuilder()
- Set
missingBucket(false) when the aggregation field has an isnotnull() filter
Suggested Code Change:
// Extract non-nullable fields from filter predicates
private static Set<String> extractNonNullableFields(Filter filterNode) {
// Parse filter conditions to find isnotnull() predicates
// Return set of field names that should not have missing buckets
}
// In createTermsSourceBuilder()
private static CompositeValuesSourceBuilder<?> createTermsSourceBuilder(
String bucketName, RexNode group, AggregateBuilderHelper helper,
Set<String> nonNullableFields) {
TermsValuesSourceBuilder termsBuilder =
new TermsValuesSourceBuilder(bucketName).order(SortOrder.ASC);
String fieldName = extractFieldName(group);
boolean shouldIncludeMissing = helper.bucketNullable
&& !nonNullableFields.contains(fieldName);
if (shouldIncludeMissing) {
termsBuilder.missingBucket(true).missingOrder(MissingOrder.FIRST);
}
return helper.build(group, termsBuilder);
}
Workaround
- Increase
ignore_above limit to accommodate all expected values
- Use keyword field directly if text analysis is not needed
- Post-filter null values in application code
- Remove
ignore_above if all values should be indexed
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Additional Details:
Impact
This bug violates the semantic meaning of isnotnull() filter in aggregation queries, leading to incorrect query results. It particularly affects:
- OpenTelemetry metrics with long descriptions (common scenario)
- Any text fields with keyword subfields using
ignore_above
- Data analysis and reporting where null exclusion is critical
- Dashboard visualizations that rely on accurate null filtering
The bug is subtle because the filter appears to work (exists check passes) but the aggregation produces incorrect results due to the field type mismatch between text and keyword subfields.
Query Information
PPL Command/Query:
Expected Result:
Results should not contain null values for the
descriptionfield whenisnotnull(description)filter is applied.Actual Result:
{ "datarows": [ [3, null], [1, "Short description 1"], [1, "Short description 2"], [1, "Short description 3"] ] }The first row contains
nullfor description despite theisnotnull(description)filter.Dataset Information
Dataset/Schema Type
Index Mapping
{ "mappings": { "properties": { "description": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 50 } } }, "value": { "type": "long" } } } }Sample Data
{"description": "Short description 1", "value": 1} {"description": "This is a very long description that definitely exceeds the 50 character limit set in ignore_above", "value": 100} {"description": "Another extremely long description that will be ignored by the keyword subfield due to exceeding limit", "value": 101} {"description": "", "value": 200}Bug Description
Issue Summary
PPL query with
where isnotnull(field)filter incorrectly returns null values in aggregation results when the field is atexttype with akeywordsubfield that has anignore_abovelimit, and documents contain values exceeding that limit.Preconditions for Bug to Occur
The bug requires ALL of the following conditions:
Field Mapping Structure:
texttype with akeywordsubfieldkeywordsubfield must haveignore_abovesetting{ "field_name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": N } } } }Data Condition:
ignore_abovecharacter limittextfield is populated (has a value)keywordsubfield is null (ignored due to length)Query Pattern:
isnotnull()filter on the fieldstats ... by) on the same fieldWhen Bug Will NOT Occur
The bug will NOT reproduce if:
Simple keyword field (no text parent):
{"description": {"type": "keyword"}}❌ No bug - filter and aggregation use same field
No ignore_above limit:
{ "description": { "type": "text", "fields": {"keyword": {"type": "keyword"}} } }❌ No bug - keyword subfield always populated
All values within limit:
ignore_above: 50)❌ No bug - keyword subfield populated for all docs
Truly null/missing fields:
{"description": null} {"value": 100}❌ No bug - both text and keyword fields are null
Steps to Reproduce
Root Cause Analysis
Explain Plan Output:
The Problem:
Filter operates on:
description(text field)isnotnull(description)→{"exists": {"field": "description"}}Aggregation operates on:
description.keyword(keyword subfield)"field": "description.keyword"description.keywordis NULL due toignore_above: 50The Mismatch:
descriptiontext field populated → filter passes ✓description.keywordis null → aggregation sees null ✗missing_bucket: true, these appear as[3, null]in resultsVerification:
Tentative Root Cause
In
/opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java(lines 207-215), thebucketNullableflag defaults totrue:This flag controls
missing_bucketin/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/aggregation/dsl/CompositeAggregationBuilder.java(lines 72-74):The system does not analyze WHERE clause filters (specifically
isnotnull()) to automatically setbucketNullable=falsefor fields with non-null constraints.Tentative Proposed Fix
Enhance the aggregation analyzer to detect
isnotnull()filters on fields being aggregated:AggregateAnalyzer.analyze(), extract fields withisnotnull()predicates from the Filter nodecreateTermsSourceBuilder()missingBucket(false)when the aggregation field has anisnotnull()filterSuggested Code Change:
Workaround
ignore_abovelimit to accommodate all expected valuesignore_aboveif all values should be indexedEnvironment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT
Additional Details:
isnotnull()condition explicitly, only<>($2, ''), but the physical plan correctly includes theexistsfilterImpact
This bug violates the semantic meaning of
isnotnull()filter in aggregation queries, leading to incorrect query results. It particularly affects:ignore_aboveThe bug is subtle because the filter appears to work (exists check passes) but the aggregation produces incorrect results due to the field type mismatch between text and keyword subfields.