This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Query on rollup index for average aggregation metric is giving incorrect results #440

Open
Sreevani871 opened this issue Apr 19, 2021 · 3 comments
Labels
bug Something isn't working

Comments


Sreevani871 commented Apr 19, 2021

Describe the bug
The same aggregation query was run against the source index and the rollup index to compare metric values, and the results do not match. The average aggregation on the rollup index returns incorrect results.

Rollup Job Configuration
curl -XPUT "localhost:9200/_opendistro/_rollup/jobs/rollup-test?pretty" -H "Content-Type:application/json" -d '
{
  "rollup": {
    "enabled": true,
    "schedule": {
      "cron": { "expression": "*/1 * * * *", "timezone": "UTC" }
    },
    "description": "Test rollup job",
    "source_index": "jaeger-span-2021.04.17-000103",
    "target_index": "rollup-test",
    "page_size": 5000,
    "delay": 300,
    "continuous": false,
    "dimensions": [
      { "date_histogram": { "source_field": "startTimeMillis", "fixed_interval": "1h", "timezone": "UTC" } },
      { "terms": { "source_field": "process.serviceName" } },
      { "terms": { "source_field": "process.tag.application@version" } },
      { "terms": { "source_field": "operationName" } },
      { "terms": { "source_field": "exception.type" } },
      { "terms": { "source_field": "exception.message" } }
    ],
    "metrics": [
      {
        "source_field": "duration",
        "metrics": [
          { "avg": {} },
          { "max": {} },
          { "min": {} },
          { "sum": {} },
          { "value_count": {} }
        ]
      }
    ]
  }
}'
Query on Rollup Index
Request
curl -X GET "localhost:9200/rollup-test/_search?pretty&size=0" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "terms": { "process.serviceName": [ "service-xxxxxx" ] } }
      ]
    }
  },
  "aggregations": {
    "timeline": {
      "date_histogram": { "field": "startTimeMillis", "fixed_interval": "1h" },
      "aggs": {
        "service": {
          "terms": { "field": "process.serviceName" },
          "aggs": {
            "avg_duration": { "avg": { "field": "duration" } },
            "max_duration": { "max": { "field": "duration" } },
            "min_duration": { "min": { "field": "duration" } },
            "count": { "value_count": { "field": "duration" } },
            "sum": { "sum": { "field": "duration" } }
          }
        }
      }
    }
  }
}'
Response
rollup-index-response.txt

Query on Source Index
Request
curl -X GET "localhost:9200/jaeger-span-2021.04.17-000103/_search?pretty&size=0" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "terms": { "process.serviceName": [ "service-xxxxxx" ] } }
      ]
    }
  },
  "aggregations": {
    "timeline": {
      "date_histogram": { "field": "startTimeMillis", "fixed_interval": "1h" },
      "aggs": {
        "service": {
          "terms": { "field": "process.serviceName" },
          "aggs": {
            "avg_duration": { "avg": { "field": "duration" } },
            "max_duration": { "max": { "field": "duration" } },
            "min_duration": { "min": { "field": "duration" } },
            "count": { "value_count": { "field": "duration" } },
            "sum": { "sum": { "field": "duration" } }
          }
        }
      }
    }
  }
}'

Response
source-index-response.txt

Setup Details

All other metrics (SUM, VALUE_COUNT, MIN, MAX) give correct results that match the aggregation metrics of the source index. Only the average is incorrect.
Consider the following bucket, taken from the response of the rollup index query:
{
  "key_as_string" : "2021-04-17T02:00:00.000Z",
  "key" : 1618624800000,
  "doc_count" : 562,
  "service" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "service-xxxxxx",
        "doc_count" : 562,
        "avg_duration" : { "value" : 754.1463076048377 },
        "count" : { "value" : 2847569 },
        "min_duration" : { "value" : 37.0 },
        "sum" : { "value" : 1.5818190941E10 },
        "max_duration" : { "value" : 2.07551568E8 }
      }
    ]
  }
}
Here the expected avg_duration is 1.5818190941E10 / 2847569 = 5554.9807365511, but the avg_duration actually returned in the response is 754.1463076048377.
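The arithmetic above can be reproduced with a short script. This is only a sketch that recomputes the average from the sum and count reported in the bucket; the variable names are illustrative and not part of the plugin.

```python
# Values copied from the rollup-index bucket for 2021-04-17T02:00:00.000Z.
rollup_sum = 1.5818190941e10
rollup_count = 2847569
reported_avg = 754.1463076048377

# An average consistent with the other metrics would be sum / count.
expected_avg = rollup_sum / rollup_count

print(f"expected avg: {expected_avg:.4f}")
print(f"reported avg: {reported_avg:.4f}")
print(f"expected / reported: {expected_avg / reported_avg:.2f}x")
```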

Can anyone explain the reason behind this discrepancy?

Sreevani871 added the bug label Apr 19, 2021

RashmiRam commented Apr 22, 2021

This line https://github.com/opendistro-for-elasticsearch/index-management/blob/v1.12.0.0/src/main/kotlin/com/amazon/opendistroforelasticsearch/indexmanagement/rollup/util/RollupUtils.kt#L246 should be changed to state.sums = 0L; state.counts = 0L; so that both values are initialized with long literals rather than int literals.

Ref: https://www.elastic.co/guide/en/elasticsearch/painless/7.10/painless-literals.html#integer-literals
Ref: elastic/elasticsearch#27199

Every aggregation that shows a wrong avg value appears to use 2147483647 (the maximum 32-bit signed integer) as the sum and divide it by the count, which produces the wrong value. This can be verified by multiplying each wrong avg in the rollup search response by its count: the product comes out to 2147483647.
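The check described above can be sketched as follows, using the one bucket quoted in the issue description. It only tests the arithmetic (avg multiplied by count) against the 32-bit signed integer maximum; it does not run the rollup script itself, and the variable names are illustrative.

```python
# Maximum value of a 32-bit signed integer.
INT_MAX = 2**31 - 1

# avg and count as reported for the 2021-04-17T02:00:00.000Z bucket.
reported_avg = 754.1463076048377
rollup_count = 2847569

# If avg was computed as (some sum) / count, this recovers that sum.
implied_sum = reported_avg * rollup_count

print(f"implied sum: {implied_sum:.1f}")
print(f"INT_MAX:     {INT_MAX}")
print("matches INT_MAX:", abs(implied_sum - INT_MAX) < 1)
```

The same check can be repeated for any other bucket in the rollup response by substituting its avg and count.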


Sreevani871 commented Apr 23, 2021

Any help here, @dbbaughe?
There is one more issue, with the delay field in the rollup job configuration: when I configured the job with continuous set to true and delay set to 300000 (milliseconds), the job execution did not honour the delay.
In the code, the delay field is defined as a long. Which time unit is it interpreted in during execution?

@Sreevani871

Any help here?
