Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sum Aggregation On Nested Doc Not Handling Negative Numbers #9979

Closed
natenash203 opened this issue Mar 4, 2015 · 7 comments
Closed

Sum Aggregation On Nested Doc Not Handling Negative Numbers #9979

natenash203 opened this issue Mar 4, 2015 · 7 comments

Comments

@natenash203
Copy link

I have a an index with 30M documents, spread across 4 nodes, 30 shards, with 2 replicas each. Each document has a nested mapping type "a", representing a transaction log entry. Each transaction log entry has a positive or negative dollar value and a timestamp. When I attempt to bucket the nested objects using the date range and sum aggregations, the sum aggregation appears to break on transactions with both positive and negative numbers.

For example, consider the following query. I include a reverse nested agg to show that the nested objects are different than the output of the sum agg.

{
    "size": 0,
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "timestamp": {
                                    "gte": "2011-10-01"
                                }
                            }
                        },
                        {
                            "term": {
                                "bar.name": "ASpecialName"
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs": {
        "BAR": {
            "terms": {
                "field": "bar.name",
                "size": 2,
                "order": {
                    "NESTED>TOTAL": "asc"
                }
            },
            "aggs": {
                "NESTED": {
                    "nested": {
                        "path": "a"
                    },
                    "aggs": {
                        "TOTAL": {
                            "sum": {
                                "field": "a.obligatedamount"
                            }
                        },
                        "DATES": {
                            "date_range": {
                                "field": "a.signeddate",
                                "keyed": true,
                                "ranges": [
                                    {
                                        "key": "FY2012",
                                        "from": "2011-10-01",
                                        "to": "2012-09-30"
                                    },
                                    {
                                        "key": "FY2013",
                                        "from": "2012-10-01",
                                        "to": "2013-09-30"
                                    },
                                    {
                                        "key": "FY2014",
                                        "from": "2013-10-01",
                                        "to": "2014-09-30"
                                    },
                                    {
                                        "key": "FY2015",
                                        "from": "2014-10-01",
                                        "to": "2015-09-30"
                                    }
                                ]
                            },
                            "aggs": {
                                "DATEBUCKET_SUBTOTAL": {
                                    "sum": {
                                        "field": "a.obligatedamount"
                                    }
                                },
                                "HITS_REVERSE": {
                                    "reverse_nested": {},
                                    "aggs": {
                                        "HITS": {
                                            "top_hits": {
                                                "_source": {
                                                    "include": [
                                                        "a.obligatedamount",
                                                        "a.signeddate"
                                                    ]
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

The following result is returned. Note the odd number in the DATEBUCKET_SUBTOTAL in FY2012 for "ASpecialPlace". This appears to only be an issue when the transaction log contains both negative and positive numbers. (Response slightly truncated...)

{
    "aggregations": {
        "BAR": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "ASpecialPlace",
                    "doc_count": 2,
                    "NESTED": {
                        "doc_count": 3,
                        "DATES": {
                            "buckets": {
                                "FY2012": {
                                    "from_as_string": "2011-10-01T00:00:00.000Z",
                                    "to_as_string": "2012-09-30T00:00:00.000Z",
                                    "doc_count": 1,
                                    "HITS_REVERSE": {
                                        "doc_count": 1,
                                        "HITS": {
                                            "hits": {
                                                "total": 1,
                                                "max_score": 1,
                                                "hits": [
                                                    {
                                                        "_index": "myIndex",
                                                        "_type": "myType",
                                                        "_id": "AG9J61P110025",
                                                        "_score": 1,
                                                        "_source": {
                                                            "a": [
                                                                {
                                                                    "signeddate": "2011-02-01T00:00:00+0000",
                                                                    "obligatedamount": 4000
                                                                },
                                                                {
                                                                    "signeddate": "2012-07-11T00:00:00+0000",
                                                                    "obligatedamount": -1694
                                                                }
                                                            ]
                                                        }
                                                    }
                                                ]
                                            }
                                        }
                                    },
                                    "DATEBUCKET_SUBTOTAL": {
                                        "value": -8.365e-321
                                    }
                                },
                                "FY2013": {
                                    "from_as_string": "2012-10-01T00:00:00.000Z",
                                    "to": 1380499200000,
                                    "doc_count": 0,
                                    "HITS_REVERSE": {
                                        "doc_count": 0,
                                        "HITS": {
                                            "hits": {
                                                "total": 0,
                                                "max_score": null,
                                                "hits": []
                                            }
                                        }
                                    },
                                    "DATEBUCKET_SUBTOTAL": {
                                        "value": 0
                                    }
                                },
                                "FY2014": {
                                    "from_as_string": "2013-10-01T00:00:00.000Z",
                                    "to_as_string": "2014-09-30T00:00:00.000Z",
                                    "doc_count": 1,
                                    "HITS_REVERSE": {
                                        "doc_count": 1,
                                        "HITS": {
                                            "hits": {
                                                "total": 1,
                                                "max_score": 1,
                                                "hits": [
                                                    {
                                                        "_index": "myIndex",
                                                        "_type": "myType",
                                                        "_id": "AG04GGP140011",
                                                        "_score": 1,
                                                        "_source": {
                                                            "a": [
                                                                {
                                                                    "signeddate": "2013-12-19T00:00:00+0000",
                                                                    "obligatedamount": 3449
                                                                }
                                                            ]
                                                        }
                                                    }
                                                ]
                                            }
                                        }
                                    },
                                    "DATEBUCKET_SUBTOTAL": {
                                        "value": 3449
                                    }
                                },
                                "FY2015": {
                                    "from_as_string": "2014-10-01T00:00:00.000Z",
                                    "to_as_string": "2015-09-30T00:00:00.000Z",
                                    "doc_count": 0,
                                    "HITS_REVERSE": {
                                        "doc_count": 0,
                                        "HITS": {
                                            "hits": {
                                                "total": 0,
                                                "max_score": null,
                                                "hits": []
                                            }
                                        }
                                    },
                                    "DATEBUCKET_SUBTOTAL": {
                                        "value": 0
                                    }
                                }
                            }
                        },
                        "TOTAL": {
                            "value": 3449
                        }
                    }
                }
            ]
        }
    }
}

If I set a filter at the nested level to remove any nested "a" objects with a negative value, everything works fine. Note, this behavior is also apparent if using a stats agg instead of just a sum.

@clintongormley
Copy link

@martijnvg @jpountz any idea where things might be going wrong here?

@jpountz
Copy link
Contributor

jpountz commented Mar 16, 2015

The exponent of the sum is suspicious (-321), it's typically what you would get when interpreting bits of a small long as a double. For instance Double.longBitsToDouble(2000) returns 9.88E-321. So I'm wondering if this could be this mapping issue we have seen a couple of times where shards would mistakenly interpret long as doubles because of inconsistent mappings across shards.

@jpountz
Copy link
Contributor

jpountz commented Mar 16, 2015

I quickly tried to reproduce the issue with no success. Here is the script I used:

curl -XDELETE 'localhost:9200/test'

curl -XPUT 'localhost:9200/test' -d '
{
  "mappings": {
    "test": {
      "properties": {
        "a": {
          "type": "nested",
          "properties": {
            "d": {
              "type": "date"
            },
            "v": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}'

curl -XPUT 'localhost:9200/test/test/1' -d '
{
  "a": [
    {
      "d": 10000000,
      "v": 4000
    },
    {
      "d": 20000000,
      "v": -1694
    }
  ]
}'

curl -XPOST 'localhost:9200/test/_refresh'

curl -XGET 'localhost:9200/test/_search' -d '
{
  "aggs": {
    "NESTED": {
      "nested": {
        "path": "a"
      },
      "aggs": {
        "TOTAL": {
          "sum": {
            "field": "a.v"
          }
        },
        "HISTO": {
          "date_histogram": {
            "field": "a.d",
            "interval": "day"
          },
          "aggs": {
            "TOTAL2": {
              "sum": {
                "field": "a.v"
              }
            }
          }
        }
      }
    }
  }
}'

@jpountz
Copy link
Contributor

jpountz commented Mar 16, 2015

@natenash203 Could you let us know which elasticsearch version you are running and whether you rely on dynamic mappings?

@natenash203
Copy link
Author

@jpountz Currently running 1.4.4. However, I am pretty sure I was running 1.4.0 when I initially created the index, set the mappings, and indexed the data.

W/re to mapping, I use a mix of explicit and dynamic mappings. The field in question however, is mapped dynamically. Currently, the index is reporting it's mapped as a double.

For reference, the range of possible values for the "v" field go from roughly -11,000,000,000.00 to 11,000,000,000.00 and as they are dollars, will include decimal places to the hundreds (i.e 12.34).

Do you think I should explicitly map the "v" field to a float? Perhaps double, but isn't that not good for currency values?

@jpountz
Copy link
Contributor

jpountz commented Mar 16, 2015

OK, then I probably know what happened:

  • two documents have been indexed on two different shards at about the same time
  • one shard mapped the field as a long and the other one as a double
  • the double mapping was the first one propagated to the master node so it "won"
  • later the shard mapped with a long has been relocated to somewhere else where the field was mapped as a double and so its indexed fields are now interpreted as a double although they were indexed as a long

This is unfortunately a know issue, see #8688 for more background.

Do you think I should explicitly map the "v" field to a float? Perhaps double, but isn't that not good for currency values?

Explicit mapping is certainly a good idea as it avoids running into #8688. Regarding the type of the field, a good option is usually to store the number of cents as a long in order to avoid floating-point rounding issues. 64 bits would be more than enough to store the range of values you are interested in.

Closing as a duplicate of #8688

@jpountz jpountz closed this as completed Mar 16, 2015
@natenash203
Copy link
Author

After looking at #8688, that looks right to me. When I initially indexed, my "v" values were sometimes floats and sometimes doubles. I index at about 1000 docs/sec so with 30 shards, it's entirely possible, one shard dynamically mapped a float and another mapped a double.

I will reindex with an explicit mapping and go from there. Thanks much for the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants