HdrHistogram percentiles null pointer exception #96626

Closed
salvatore-campagna opened this issue Jun 6, 2023 · 2 comments · Fixed by #96668
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@salvatore-campagna
Contributor

salvatore-campagna commented Jun 6, 2023

Elasticsearch Version

8.7.0

Installed Plugins

No response

Java Version

bundled

OS Version

All

Problem Description

Previewing a transform whose pivot contains a percentiles aggregation configured with hdr fails with a null pointer exception.

The request below is used to transform some data at ingestion time:

POST _transform/_preview
{
  "source": {
    "index": [
      "<.ds-service-proxy-requests-filebeat-*-{now/d-1d{yyyy.MM.dd}}-*>",
      "<.ds-service-proxy-requests-filebeat-*-{now/d{yyyy.MM.dd}}-*>"
    ]
  },
  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "fixed_interval": "5m"
        }
      }
    },
    "aggregations": {
      "total_request": {
        "value_count": {
          "field": "status_code"
        }
      },
      "successful_request": {
        "filter": {
          "bool": {
            "should": [{
              "bool": {
                "must_not": {
                  "exists": { "field": "proxy_status_code" }
                }
              }
            }, {
              "range": {
                "proxy_status_code": { "lt": 500 }
              }
            }],
            "minimum_should_match": 1
          }
        },
        "aggregations": {
          "count": {
            "value_count": {
              "field": "status_code"
            }
          }
        }
      },
      "private_link_request": {
        "filter": {
          "bool": {
            "must": {
              "exists": { "field": "link_id" }
            }
          }
        },
        "aggregations": {
          "count": {
            "value_count": {
              "field": "link_id"
            }
          },
          "successful_request": {
            "filter": {
              "bool": {
                "should": [{
                  "bool": {
                    "must_not": {
                      "exists": { "field": "proxy_status_code" }
                    }
                  }
                }, {
                  "range": {
                    "proxy_status_code": { "lt": 500 }
                  }
                }],
                "minimum_should_match": 1
              }
            },
            "aggregations": {
              "count": {
                "value_count": {
                  "field": "status_code"
                }
              }
            }
          }
        }
      },
      "firehose_request": {
        "filter": {
          "bool": {
            "must": {
              "exists": { "field": "firehose_request_id" }
            }
          }
        },
        "aggregations": {
          "count": {
            "value_count": {
              "field": "firehose_request_id"
            }
          },
          "successful_request": {
            "filter": {
              "bool": {
                "should": [{
                  "bool": {
                    "must_not": {
                      "exists": { "field": "proxy_status_code" }
                    }
                  }
                }, {
                  "range": {
                    "proxy_status_code": { "lt": 500 }
                  }
                }],
                "minimum_should_match": 1
              }
            },
            "aggregations": {
              "count": {
                "value_count": {
                  "field": "status_code"
                }
              }
            }
          }
        }
      },
      "request_length": {
        "sum": {
          "field": "request_length"
        }
      },
      "response_length": {
        "sum": {
          "field": "response_length"
        }
      },
      "proxy_errors_breakdown": {
        "terms": {
          "field": "status_reason",
          "size": 20,
          "exclude": ["-"]
        }
      },
      "proxy_internal_time": {
        "percentiles": {
          "field": "proxy_internal_time",
          "percents": [
            50,
            95,
            99
          ],
          "missing": 0, 
          "hdr": {                                  
            "number_of_significant_value_digits": 2
          }
        }
      }
    }
  },
  "description": "Dataset that powers the Proxy review dashboard",
  "dest": {
    "index": "transform-proxy-service-review"
  },
  "settings": {
    "max_page_search_size": 1000,
    "use_point_in_time": false
  },
  "sync": {
    "time": {
      "field": "@timestamp"
    }
  }
}

The response, including the stack trace, is:

{
  "error": {
    "root_cause": [
      {
        "type": "null_pointer_exception",
        "reason": """Cannot invoke "org.HdrHistogram.DoubleHistogram.getTotalCount()" because "this.state" is null""",
        "stack_trace": """org.elasticsearch.ElasticsearchException$1: Cannot invoke "org.HdrHistogram.DoubleHistogram.getTotalCount()" because "this.state" is null
	at org.elasticsearch.server@8.7.0/org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:668)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:596)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.rest.RestResponse.build(RestResponse.java:175)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.rest.RestResponse.<init>(RestResponse.java:123)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.rest.RestResponse.<init>(RestResponse.java:102)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.rest.action.RestActionListener.onFailure(RestActionListener.java:55)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.rest.action.RestCancellableNodeClient$1.onFailure(RestCancellableNodeClient.java:96)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onFailure(NodeClient.java:170)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.tasks.TaskManager$1.onFailure(TaskManager.java:218)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.support.ContextPreservingActionListener.onFailure(ContextPreservingActionListener.java:38)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$Delegating.onFailure(ActionListener.java:97)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:175)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:175)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:169)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onResponse(NodeClient.java:160)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:209)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:203)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31)
	at org.elasticsearch.security@8.7.0/org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$2(SecurityActionFilter.java:165)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:250)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$RunAfterActionListener.onResponse(ActionListener.java:392)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:722)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchLookupFieldsPhase.run(FetchLookupFieldsPhase.java:75)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:469)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:463)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.ExpandSearchPhase.onPhaseDone(ExpandSearchPhase.java:151)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:105)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:469)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:463)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:271)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$2(FetchSearchPhase.java:108)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:125)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:90)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:958)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: java.lang.NullPointerException: Cannot invoke "org.HdrHistogram.DoubleHistogram.getTotalCount()" because "this.state" is null
	at org.elasticsearch.server@8.7.0/org.elasticsearch.search.aggregations.metrics.InternalHDRPercentiles$Iter.next(InternalHDRPercentiles.java:98)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.search.aggregations.metrics.InternalHDRPercentiles$Iter.next(InternalHDRPercentiles.java:78)
	at org.elasticsearch.xpack.transform.transforms.pivot.AggregationResultUtils$PercentilesAggExtractor.value(AggregationResultUtils.java:331)
	at org.elasticsearch.xpack.transform.transforms.pivot.AggregationResultUtils.lambda$extractCompositeAggregationResults$1(AggregationResultUtils.java:143)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
	at org.elasticsearch.xpack.transform.transforms.common.AbstractCompositeAggFunction.lambda$preview$0(AbstractCompositeAggFunction.java:96)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:167)
	... 27 more
"""
      }
    ],
    "type": "null_pointer_exception",
    "reason": """Cannot invoke "org.HdrHistogram.DoubleHistogram.getTotalCount()" because "this.state" is null""",
    "stack_trace": """java.lang.NullPointerException: Cannot invoke "org.HdrHistogram.DoubleHistogram.getTotalCount()" because "this.state" is null
	at org.elasticsearch.server@8.7.0/org.elasticsearch.search.aggregations.metrics.InternalHDRPercentiles$Iter.next(InternalHDRPercentiles.java:98)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.search.aggregations.metrics.InternalHDRPercentiles$Iter.next(InternalHDRPercentiles.java:78)
	at org.elasticsearch.xpack.transform.transforms.pivot.AggregationResultUtils$PercentilesAggExtractor.value(AggregationResultUtils.java:331)
	at org.elasticsearch.xpack.transform.transforms.pivot.AggregationResultUtils.lambda$extractCompositeAggregationResults$1(AggregationResultUtils.java:143)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
	at org.elasticsearch.xpack.transform.transforms.common.AbstractCompositeAggFunction.lambda$preview$0(AbstractCompositeAggFunction.java:96)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:167)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onResponse(NodeClient.java:160)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:209)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:203)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31)
	at org.elasticsearch.security@8.7.0/org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$2(SecurityActionFilter.java:165)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:250)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.ActionListener$RunAfterActionListener.onResponse(ActionListener.java:392)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:722)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchLookupFieldsPhase.run(FetchLookupFieldsPhase.java:75)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:469)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:463)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.ExpandSearchPhase.onPhaseDone(ExpandSearchPhase.java:151)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:105)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:469)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:463)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:271)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$2(FetchSearchPhase.java:108)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:125)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:90)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:958)
	at org.elasticsearch.server@8.7.0/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1589)
"""
  },
  "status": 500
}

Steps to Reproduce

We have not been able to reproduce this other than by running the query on https://logging.us-east-1.aws.elastic-cloud.com/

Logs (if relevant)

No response

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 6, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@salvatore-campagna
Contributor Author

salvatore-campagna commented Jun 6, 2023

I investigated the issue, and this is what I found:

  1. state is set to null in only two cases: when deserializing AbstractInternalHDRPercentiles from Elasticsearch nodes running versions < 8.7.0 (not the case here, since all nodes in the cluster run 8.7.0), and when constructing empty aggregations (see the InternalHDRPercentiles#empty method).
  2. The failure happens when iterating over the percentile values with the iterator (see the InternalHDRPercentiles#Iter class). Its next method uses state without checking for null.
  3. The iterator is used by AggregationResultUtils.PercentilesAggExtractor#value, which is called from AggregationResultUtils#extractCompositeAggregationResults, which in turn is used by Pivot (the composite aggregation).

So even though we cannot reproduce the issue with the information available right now, a good first step is probably to test a percentiles aggregation nested in a pivot aggregation that returns empty results.
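
To make the suspected failure mode concrete, here is a minimal, self-contained Java sketch. It does not use Elasticsearch or HdrHistogram code: FakeHistogram, Percentiles, unguardedValue and guardedValue are hypothetical stand-ins, and returning NaN for an empty result is only one possible guard, not necessarily what the eventual fix does.

```java
import java.util.Arrays;

// Simplified stand-ins only; none of these classes are Elasticsearch or HdrHistogram code.
class EmptyPercentilesSketch {

    /** Hypothetical stand-in for org.HdrHistogram.DoubleHistogram. */
    static final class FakeHistogram {
        private final double[] sorted;

        FakeHistogram(double... values) {
            sorted = values.clone();
            Arrays.sort(sorted);
        }

        long getTotalCount() {
            return sorted.length;
        }

        /** Nearest-rank percentile; assumes at least one recorded value. */
        double getValueAtPercentile(double percent) {
            int rank = (int) Math.ceil(percent / 100.0 * sorted.length);
            return sorted[Math.max(0, Math.min(rank, sorted.length) - 1)];
        }
    }

    /** Hypothetical stand-in for a percentiles result whose state may be null. */
    static final class Percentiles {
        private final FakeHistogram state; // null models an "empty" aggregation result
        private final double[] percents;

        Percentiles(FakeHistogram state, double... percents) {
            this.state = state;
            this.percents = percents;
        }

        /** Mirrors the failure mode: dereferences state without a null check. */
        double unguardedValue(int i) {
            if (state.getTotalCount() == 0) {
                return Double.NaN;
            }
            return state.getValueAtPercentile(percents[i]);
        }

        /** One possible guard (an assumption): report NaN when there is no state. */
        double guardedValue(int i) {
            if (state == null || state.getTotalCount() == 0) {
                return Double.NaN;
            }
            return state.getValueAtPercentile(percents[i]);
        }
    }

    public static void main(String[] args) {
        Percentiles empty = new Percentiles(null, 50.0, 95.0, 99.0);
        System.out.println("guarded: " + empty.guardedValue(0));
        try {
            empty.unguardedValue(0);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
    }
}
```

A real reproduction would still need a pivot bucket whose percentiles result is built through the empty path; the sketch only shows why an unchecked state dereference fails in that situation.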
