
scaled_float still has precision problem #32570

Closed
kyreddie opened this issue Aug 2, 2018 · 17 comments
Labels
>bug help wanted adoptme :Search/Mapping Index mappings, including merging and defining field types

Comments

@kyreddie

kyreddie commented Aug 2, 2018

Reproduced on Elasticsearch version 6.3.2 in Elastic Cloud

Looks like this was addressed in https://github.com/elastic/elasticsearch/pull/27207, but the problem still exists.

Steps to reproduce:

PUT testindex
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "properties": {
        "field1": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}

PUT /testindex/type1/1?pretty
{
  "field1": 79.99
}

GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "field1": {
              "gte": 0,
              "lte": 79.99
            }
          }
        }
      ]
    }
  }
}

The query results in 0 hits. It looks like the query is being resolved to field1:[0 TO 7998], which is incorrect. I would expect the document to be returned.

Looking at the code, could we cast dValue to a float? hi = Math.round(Math.floor((float) dValue));

@kyreddie kyreddie added >bug :Search/Mapping Index mappings, including merging and defining field types labels Aug 2, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@kyreddie
Author

kyreddie commented Aug 2, 2018

cc: @original-brownbear

@original-brownbear
Member

I'm not sure this is actually a bug. The scaling factor is just too small, as far as I understand its definition.
If you scale by 100, you still effectively lose the last digit here, since the value has four significant digits (e.g. on my box 79.99d * 100 = 7998.9999999..., which casts to 7998, since we always round down for lte comparisons).

@jpountz probably knows this better than me though :) so ping :)
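The rounding described above is easy to reproduce in plain Java (a standalone sketch, not Elasticsearch code):

```java
public class ScaledFloatRounding {
    public static void main(String[] args) {
        // 79.99 has no exact binary representation, so the product falls just short of 7999
        double product = 79.99 * 100;
        System.out.println(product);                     // slightly below 7999 on any IEEE 754 JVM
        System.out.println((long) Math.floor(product));  // 7998: an lte bound that rounds down misses the doc
        System.out.println(Math.round(product));         // 7999: rounding to the closest long recovers it
    }
}
```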

@sergiosalvatore

Hmm -- seems like there's definitely some confusion then. From the Elasticsearch documentation:

For instance, a price field could be stored in a scaled_float with a scaling_factor of 100. All APIs would work as if the field was stored as a double, but under the hood Elasticsearch would be working with the number of cents, price*100, which is an integer.

In terms of "round tripping," my understanding according to the documentation referenced above is that internally, when a document with the value 79.99 is stored in a scaled_float field, it's stored as 7999 in the index. When I request that stored document, the value is indeed returned as 79.99, scaled appropriately. It seems filters should follow the same process as storing/retrieving the value.
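The documented round trip can be sketched in a few lines of Java (the encode/decode method names are illustrative, not Elasticsearch's actual implementation):

```java
public class ScaledFloatRoundTrip {
    // Per the docs: values are multiplied by the factor at index time
    // and rounded to the closest long value
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // On retrieval, the stored long is scaled back down
    static double decode(long scaled, double scalingFactor) {
        return scaled / scalingFactor;
    }

    public static void main(String[] args) {
        long stored = encode(79.99, 100);   // 7999: Math.round compensates for the binary error
        double back = decode(stored, 100);  // 79.99: the round trip is lossless here
        System.out.println(stored + " -> " + back);
    }
}
```

Because index time uses Math.round but the lte bound rounds down, storage and filtering disagree about 79.99, which is the asymmetry this issue is about.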

@eskibars
Contributor

eskibars commented Aug 3, 2018

At minimum, we need to make a better declaration of how this works and its limitations. The documentation currently reads:

Values will be multiplied by this factor at index time and rounded to the closest long value.

For an input value of 79.99 and a scaling factor of 100, I think it's reasonable for a user to expect 79.99 * 100 = 7999. I don't think the user reasonably expects that 79.99 * 100 = 7998.9999... unless they're familiar with the internals. The docs either need to spell out the round-down logic, or the feature needs to round to the closest value; either way, we need to define how a user should understand it to work.

@colings86
Contributor

We discussed this in FixItFriday and the outcome is that we feel this is a bug, as users do not know (and should not reasonably be expected to know) the ins and outs of binary floating-point representations. The problem here seems to be centred around us doing the wrong thing when resolving the value of the lte bound. We should fix this so that instead of just doing 79.99 * 100 here, we do Math.floor(79.99 * 100 + 0.5) and make the bound lt instead of lte, so we search from 0 to less than the next representable value higher than 79.99.
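A minimal sketch of the proposal above, assuming the current lte bound is resolved with a plain floor (variable names are illustrative):

```java
public class ProposedRounding {
    public static void main(String[] args) {
        double scaled = 79.99 * 100;                      // lands just below 7999 in binary
        long current = (long) Math.floor(scaled);         // 7998: today's bound excludes the document
        long proposed = (long) Math.floor(scaled + 0.5);  // 7999: the floor(x + 0.5) proposal includes it
        System.out.println(current + " -> " + proposed);
    }
}
```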

@jpountz
Contributor

jpountz commented Aug 13, 2018

This approach would still have similar problems; they would just occur at the half point between representable values rather than at representable values themselves. Alternatively, maybe we could use BigDecimal to delay rounding until after the multiplication (which is where the fact that doubles do not represent numbers accurately gets amplified), by replacing Double.parseDouble(boundary) * scalingFactor with something like new BigDecimal(boundary).multiply(BigDecimal.valueOf(scalingFactor)).doubleValue()?

@original-brownbear
Member

@jpountz

I don't think using BigDecimal here will help; if I try it in the debugger, then:

new BigDecimal(parse(upperTerm)).multiply(BigDecimal.valueOf(scalingFactor)).doubleValue() == parse(upperTerm) * scalingFactor

returns true.

I think this is a fundamental problem with double and equals-type comparisons:

  • We can't just use arithmetic rounding here to resolve the fact that 79.99 will be slightly smaller internally, because that would mess up the behaviour for things like < 17.55 with a small scaling factor, since it could round to 18.

=> we can't really fix the feature to get the desired behaviour here as far as I can see.

=> We should probably fix the docs?
... the other option I see would be to change the storage behaviour of scaled_float slightly and simply use another digit internally. That way we'd get the expected behaviour, because we wouldn't have a rounding error on the last digit, at the cost of storing an extra digit, wouldn't we?

@jpountz
Contributor

jpountz commented Aug 13, 2018

@original-brownbear Try new BigDecimal(Double.toString(parse(upperTerm))).multiply(BigDecimal.valueOf(scalingFactor)).doubleValue(). The BigDecimal needs to be constructed out of the string representation of the number, otherwise it inherits the accuracy loss of the double representation which causes this issue.
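The difference between the two BigDecimal constructors can be seen in a standalone snippet:

```java
import java.math.BigDecimal;

public class BigDecimalConstruction {
    public static void main(String[] args) {
        double upper = 79.99;
        // Constructed from the double value: the BigDecimal inherits the binary
        // accuracy loss, so the product still falls just short of 7999
        double viaDouble = new BigDecimal(upper)
                .multiply(BigDecimal.valueOf(100)).doubleValue();
        // Constructed from the string representation: the decimal is exact,
        // so the product is exactly 7999.0
        double viaString = new BigDecimal(Double.toString(upper))
                .multiply(BigDecimal.valueOf(100)).doubleValue();
        System.out.println(viaDouble + " vs " + viaString);
    }
}
```

Double.toString produces the shortest decimal string that round-trips to the same double ("79.99" here), which is why the string-based constructor recovers the value the user actually typed.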

@original-brownbear
Member

@jpountz huh you're right! That works :)

jshell> new BigDecimal(Double.toString(79.99)).multiply(BigDecimal.valueOf(100)).doubleValue()
$6 ==> 7999.0

@DaveCTurner
Contributor

I think I suggested the Math.floor(79.99 * 100 + 0.5) idea to fix this query for representable values, but it's right that it doesn't work for some values around half-way between two representable values. BigDecimal seems like a good answer for dealing with unrepresentable values correctly, but any double in the process is going to lead to difficulty because of facts like 79.999999999999993 == 80.0:

DELETE /i

PUT /i
{
  "mappings": {
    "_doc": {
      "properties": {
        "f": {
          "scaling_factor": 100,
          "type": "scaled_float"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

POST /i/_doc/_bulk?refresh
{"index":{}}
{"f":79.98}
{"index":{}}
{"f":79.99}
{"index":{}}
{"f":80.00}

GET /i/_search
{
  "query": {
    "range": {
      "f": {
        "lte": 79.999999999999993
      }
    }
  }
}

I think this should not yield all 3 docs, but it does.
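The parsing problem can be shown without Elasticsearch at all; by the time the JSON number has been parsed into a double, the distinction is already gone:

```java
public class UnrepresentableBound {
    public static void main(String[] args) {
        // 79.999999999999993 is closer to 80.0 than to any other double,
        // so it *is* 80.0 by the time any double-based code sees it
        System.out.println(79.999999999999993 == 80.0);                       // true
        System.out.println(Double.parseDouble("79.999999999999993") == 80.0); // true
    }
}
```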

@jpountz
Contributor

jpountz commented Aug 14, 2018

Yeah, I was going to comment with an example like that after reading that "users do not know (and should not reasonably know) the ins and outs of binary floating point representations". I think this is a good example of why we can't (reasonably) hide this problem entirely from our users.

I'm not too worried about this issue since users will only face it if they represent numbers on client-side with something that is more accurate than doubles, which is uncommon.

Also for the record, scaled_float is not the only type affected by this issue, all our numeric types are.

@jpountz
Contributor

jpountz commented Aug 27, 2018

Even though this won't solve all issues, we agreed to implement the above proposal (#32570 (comment)), which will remove some surprises. We should also document that scaled floats are prone to floating-point accuracy issues, especially since our docs use scaling factors of 100 or 1000, which might suggest that this is actually a decimal type.

@jpountz jpountz added help wanted adoptme and removed team-discuss labels Aug 27, 2018
@sergiosalvatore

If this is the final solution here, I might recommend that you change the example in the documentation that specifically refers to pricing (which is how we used it). While I do understand the intricacies of floating-point math, my reading of the documentation left me with the sense that I didn't have to worry about it, because the work was being done behind the scenes by Elasticsearch. Since that's not really the case, I don't think using this feature for pricing is appropriate. Instead, one should just store the smallest non-fractional amount (like cents) and deal with the rounding outside.

@jpountz
Contributor

jpountz commented Sep 12, 2018

👍 We should show an example that uses a percentage or something like that instead.

@sergiosalvatore

Is there a place where I can see when this gets slotted in for development?

@mayya-sharipova
Contributor

@sergiosalvatore When there is a PR for this feature, it will be linked from this issue.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Jan 6, 2019
* Use `toString` and `Bigdecimal` parsing to get intuitive behaviour for `scaled_float` as discussed in elastic#32570
* Closes elastic#32570
original-brownbear added a commit that referenced this issue Jan 11, 2019
* Use `toString` and `Bigdecimal` parsing to get intuitive behaviour for `scaled_float` as discussed in #32570
* Closes #32570
original-brownbear added a commit that referenced this issue Jan 11, 2019
* Use `toString` and `Bigdecimal` parsing to get intuitive behaviour for `scaled_float` as discussed in #32570
* Closes #32570