
scaled_float still has precision problem #32570

Closed
kyreddie opened this issue Aug 2, 2018 · 17 comments
Labels
>bug help wanted adoptme :Search/Mapping Index mappings, including merging and defining field types

Comments

@kyreddie

kyreddie commented Aug 2, 2018

Reproduced on Elasticsearch version 6.3.2 in Elastic Cloud

Looks like this was addressed in https://github.com/elastic/elasticsearch/pull/27207, but the problem still exists.

Steps to reproduce:

PUT testindex
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "properties": {
        "field1": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}

PUT /testindex/type1/1?pretty
{
  "field1": 79.99
}

GET testindex/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "field1": {
              "gte": 0,
              "lte": 79.99
            }
          }
        }
      ]
    }
  }
}

The query results in 0 hits. It looks like the query is being resolved to field1:[0 TO 7998], which is incorrect. I would expect the document to be returned.

Looking at the code, could we cast dValue to a float? hi = Math.round(Math.floor((float) dValue));

@kyreddie kyreddie added >bug :Search/Mapping Index mappings, including merging and defining field types labels Aug 2, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@kyreddie
Author

kyreddie commented Aug 2, 2018

cc: @original-brownbear

@original-brownbear
Member

I'm not sure this is actually a bug. The scaling factor is just too small, as far as I understand its definition.
If you scale by 100, you still effectively lose the last digit here, since the value has four significant digits (e.g. on my box 79.99d * 100 = 7998.9999999..., which casts to 7998, since we always round down for lte comparisons).

@jpountz probably knows this better than me though :) so ping :)
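The rounding described above is easy to reproduce in plain Java (a standalone sketch, not Elasticsearch code):

```java
public class ScaledFloatRounding {
    public static void main(String[] args) {
        // 79.99 has no exact binary representation, so the product falls just short of 7999
        double product = 79.99 * 100;
        System.out.println(product);                     // slightly below 7999 on any IEEE 754 JVM
        System.out.println((long) Math.floor(product));  // 7998: an lte bound that rounds down misses the doc
        System.out.println(Math.round(product));         // 7999: rounding to the closest long recovers it
    }
}
```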

@sergiosalvatore

Hmm -- seems like there's definitely some confusion then. From the Elasticsearch documentation:

For instance, a price field could be stored in a scaled_float with a scaling_factor of 100. All APIs would work as if the field was stored as a double, but under the hood Elasticsearch would be working with the number of cents, price*100, which is an integer.

In terms of "round tripping," my understanding according to the documentation referenced above is that internally, when a document with the value 79.99 is stored in a scaled_float field, it's stored as 7999 in the index. When I request that stored document, the value is indeed returned as 79.99, scaled appropriately. It seems filters should follow the same process as storing/retrieving the value.
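The documented round trip can be sketched in a few lines of Java (the encode/decode method names are illustrative, not Elasticsearch's actual implementation):

```java
public class ScaledFloatRoundTrip {
    // Per the docs: values are multiplied by the factor at index time
    // and rounded to the closest long value
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // On retrieval, the stored long is scaled back down
    static double decode(long scaled, double scalingFactor) {
        return scaled / scalingFactor;
    }

    public static void main(String[] args) {
        long stored = encode(79.99, 100);   // 7999: Math.round compensates for the binary error
        double back = decode(stored, 100);  // 79.99: the round trip is lossless here
        System.out.println(stored + " -> " + back);
    }
}
```

Because index time uses Math.round but the lte bound rounds down, storage and filtering disagree about 79.99, which is the asymmetry this issue is about.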

@eskibars
Contributor

eskibars commented Aug 3, 2018

At minimum, we need to make a better declaration of how this works and its limitations. The documentation currently reads:

Values will be multiplied by this factor at index time and rounded to the closest long value.

For an input value of 79.99 and a scaling factor of 100, I think it's reasonable for a user to expect 79.99 * 100 = 7999. I don't think the user reasonably expects that 79.99 * 100 = 7998.9999... unless they're familiar with the internals. The docs either need to spell out the round-down logic, or the feature needs to round to the closest value; either way, we need to define how a user should understand it to work.

@colings86
Contributor

We discussed this in FixItFriday and the outcome is that we feel this is a bug, as users do not know (and should not reasonably be expected to know) the ins and outs of binary floating-point representations. The problem here seems to be centred around us doing the wrong thing when resolving the value of the lte bound. We should fix this so that instead of just doing 79.99 * 100 here, we do Math.floor(79.99 * 100 + 0.5) and make the bound lt instead of lte, so we search from 0 to less than the next representable value higher than 79.99.
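A minimal sketch of the proposal above, assuming the current lte bound is resolved with a plain floor (variable names are illustrative):

```java
public class ProposedRounding {
    public static void main(String[] args) {
        double scaled = 79.99 * 100;                      // lands just below 7999 in binary
        long current = (long) Math.floor(scaled);         // 7998: today's bound excludes the document
        long proposed = (long) Math.floor(scaled + 0.5);  // 7999: the floor(x + 0.5) proposal includes it
        System.out.println(current + " -> " + proposed);
    }
}
```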

@jpountz
Contributor

jpountz commented Aug 13, 2018

This approach would still have similar problems; they would just occur at the half point between representable values rather than at representable values themselves. Alternatively, maybe we could use BigDecimal to delay rounding until after the multiplication (which is where the fact that doubles do not represent numbers accurately gets amplified), by replacing Double.parseDouble(boundary) * scalingFactor with something like new BigDecimal(boundary).multiply(BigDecimal.valueOf(scalingFactor)).doubleValue()?

@original-brownbear
Member

@jpountz

I don't think using BigDecimal here will help; if I try it in the debugger, then:

new BigDecimal(parse(upperTerm)).multiply(BigDecimal.valueOf(scalingFactor)).doubleValue() == parse(upperTerm) * scalingFactor

returns true.

I think this is a fundamental problem with double and equals-type comparisons:

  • We can't just use arithmetic rounding here to resolve the fact that 79.99 will be slightly smaller internally, because that would mess up the behaviour for things like < 17.55 with a small scaling factor, since it could round to 18.

=> we can't really fix the feature to get the desired behaviour here as far as I can see.

=> We should probably fix the docs?
... the other option I see would be to change the storage behaviour of scaled_float slightly and simply use another digit internally. That way we'd get the expected behaviour, because we wouldn't have a rounding error on the last digit, at the cost of storing an extra digit, wouldn't we?

@jpountz
Contributor

jpountz commented Aug 13, 2018

@original-brownbear Try new BigDecimal(Double.toString(parse(upperTerm))).multiply(BigDecimal.valueOf(scalingFactor)).doubleValue(). The BigDecimal needs to be constructed out of the string representation of the number, otherwise it inherits the accuracy loss of the double representation which causes this issue.
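The difference between the two BigDecimal constructors can be seen in a standalone snippet:

```java
import java.math.BigDecimal;

public class BigDecimalConstruction {
    public static void main(String[] args) {
        double upper = 79.99;
        // Constructed from the double value: the BigDecimal inherits the binary
        // accuracy loss, so the product still falls just short of 7999
        double viaDouble = new BigDecimal(upper)
                .multiply(BigDecimal.valueOf(100)).doubleValue();
        // Constructed from the string representation: the decimal is exact,
        // so the product is exactly 7999.0
        double viaString = new BigDecimal(Double.toString(upper))
                .multiply(BigDecimal.valueOf(100)).doubleValue();
        System.out.println(viaDouble + " vs " + viaString);
    }
}
```

Double.toString produces the shortest decimal string that round-trips to the same double ("79.99" here), which is why the string-based constructor recovers the value the user actually typed.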

@original-brownbear
Member

@jpountz huh you're right! That works :)

jshell> new BigDecimal(Double.toString(79.99)).multiply(BigDecimal.valueOf(100)).doubleValue()
$6 ==> 7999.0

@DaveCTurner
Contributor

I think I suggested the Math.floor(79.99 * 100 + 0.5) idea to fix this query for representable values, but it's right that it doesn't work for some values around half-way between two representable values. BigDecimal seems like a good answer for dealing with unrepresentable values correctly, but any double in the process is going to lead to difficulty because of facts like 79.999999999999993 == 80.0:

DELETE /i

PUT /i
{
  "mappings": {
    "_doc": {
      "properties": {
        "f": {
          "scaling_factor": 100,
          "type": "scaled_float"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

POST /i/_doc/_bulk?refresh
{"index":{}}
{"f":79.98}
{"index":{}}
{"f":79.99}
{"index":{}}
{"f":80.00}

GET /i/_search
{
  "query": {
    "range": {
      "f": {
        "lte": 79.999999999999993
      }
    }
  }
}

I think this should not yield all 3 docs, but it does.
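The parsing problem can be shown without Elasticsearch at all; by the time the JSON number has been parsed into a double, the distinction is already gone:

```java
public class UnrepresentableBound {
    public static void main(String[] args) {
        // 79.999999999999993 is closer to 80.0 than to any other double,
        // so it *is* 80.0 by the time any double-based code sees it
        System.out.println(79.999999999999993 == 80.0);                       // true
        System.out.println(Double.parseDouble("79.999999999999993") == 80.0); // true
    }
}
```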

@jpountz
Contributor

jpountz commented Aug 14, 2018

Yeah, I was going to comment with an example like that after reading that "users do not know (and should not reasonably know) the ins and outs of binary floating point representations". I think this is a good example of why we can't (reasonably) hide this problem entirely from our users.

I'm not too worried about this issue since users will only face it if they represent numbers on client-side with something that is more accurate than doubles, which is uncommon.

Also for the record, scaled_float is not the only type affected by this issue, all our numeric types are.

@jpountz
Contributor

jpountz commented Aug 27, 2018

Even though this won't solve all issues, we agreed to implement the above proposal (#32570 (comment)), which will remove some surprises. We should also document that scaled floats are prone to floating-point accuracy issues, especially since our docs use scaling factors of 100 or 1000, which might suggest that this is actually a decimal type.

@jpountz jpountz added help wanted adoptme and removed team-discuss labels Aug 27, 2018
@sergiosalvatore

If this is the final solution here, I might recommend that you change the example in the documentation that specifically refers to pricing (which is how we used it). While I do understand the intricacies of floating-point math, my reading of the documentation left me with the sense that I didn't have to worry about it, because the work was being done behind the scenes by Elasticsearch. Since that's not really the case, I don't think using this feature for pricing is appropriate. Instead, one should just store the smallest non-fractional amount (like cents) and deal with the rounding outside.

@jpountz
Contributor

jpountz commented Sep 12, 2018

👍 We should show an example that uses a percentage or something like that instead.

@sergiosalvatore

Is there a place where I can see when this gets slotted in for development?

@mayya-sharipova
Contributor

@sergiosalvatore When there is a PR for this feature, it will be linked from this issue.

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Jan 6, 2019
* Use `toString` and `Bigdecimal` parsing to get intuitive behaviour for `scaled_float` as discussed in elastic#32570
* Closes elastic#32570
original-brownbear added a commit that referenced this issue Jan 11, 2019
* Use `toString` and `Bigdecimal` parsing to get intuitive behaviour for `scaled_float` as discussed in #32570
* Closes #32570
original-brownbear added a commit that referenced this issue Jan 11, 2019
* Use `toString` and `Bigdecimal` parsing to get intuitive behaviour for `scaled_float` as discussed in #32570
* Closes #32570