Remove the coerce option on numeric fields. #25861

jpountz · 2017-07-24T13:13:27Z

I don't like the fact that the coerce option combines a useful feature: the ability to convert strings to a number (which we are supposed to do in most APIs anyway) with a dangerous feature: the truncation (it does not even round) of floating-point numbers when indexed as a byte/short/integer/long.

I think we should remove this option, accept a string in all cases and require users to use an ingest processor if they want to ignore the decimal part of their floating-point numbers. Note that eg. "1.0" would still be accepted in an integer field since its decimal part is equal to zero.

jpountz · 2017-07-24T15:53:47Z

We had a brief discussion about it in the search meeting and there was consensus that enabling truncation by default can be trappy. We discussed the following alternatives:

keep the coerce option but make it only about accepting strings
add an option that tells the field what to do if there is a decimal part (truncate/floor/ceil/round)

We should also try to gather feedback about use-cases for truncating, if any, and what users should migrate to instead (the convert processor?).

jimczi · 2017-07-28T13:53:10Z

Discussed in Fix it Friday and we agreed that we should keep the coerce option and make it only about accepting strings. The automatic truncation is not a good option and should be replaced with an ingest processor that does the right thing (floor/ceil/round).

nablyy · 2017-07-31T06:05:26Z

hi I'd like to look into this as my first PR.
is there a good place to start looking in the code to address this?

jpountz · 2017-07-31T07:22:20Z

@nablyy You should look at NumberFieldMapper, and try to disallow floating-point numbers that have a decimal part on integer types and 7.x indices.

liketic · 2017-09-29T03:18:41Z

@nablyy If you're not working on this, can I take this?

nablyy · 2017-10-19T12:09:38Z

@liketic If you want, take this.

PnPie · 2017-11-01T22:18:54Z

Hello everyone,
If no one is working on this anymore, I would like to make a PR #27219 :) thx in advance for all your reviews and comments.

akhilravipati97 · 2018-02-15T18:36:37Z

@jpountz Hi, I am trying to understand the issue and I think I need some clarification. When you say make coerce all about accepting strings, do you mean that from now on all it should mean to a user is that it will convert strings into the target type as long as there is no need to manipulate the value itself and that invalid strings will now be simply errored instead of truncated etc?

i.e Do you mean, the change should be so that, "5.0" --> Integer [coerce, in this context would actually coerce the value to 5], but "5.1" --> Integer [coerce, in this context means nothing, because "5.1" can't be coerced to integer without manipulation, and so should result in error]

akotlar · 2018-03-09T01:47:55Z

I think this is a frankly terrible idea. It is perfectly understandable what happens when you convert a float to an int. It truncated the float in every sensible language.

Why conflate this wth questions about rounding? In what language does int(float) result in a rounded number?

I find this feature eminantly useful because it allows me to define int(float) in my indexing config. Please leave it in. This is a documentation issue at worst.

jpountz · 2018-03-13T18:09:04Z

@akotlar Agreed casting may be useful. The thing I dislike is that it happens implicitly.
cc @elastic/es-search-aggs

akotlar · 2018-03-13T18:12:48Z

@jpountz Sorry, maybe I misunderstood. Does it not only happen when coerce: true ? If so, isn't that explicit? I would agree if it did something like round implicitly: when coercing from float to int I expect truncation.

jpountz · 2018-04-04T09:52:21Z

@akotlar Could you give more details about the kind of fields that you index and want to truncate?

akotlar · 2018-04-04T17:19:44Z

Absolutely. I work with genetic data, indexing all mutations, and a lot of annotated information about those mutations, found in say 1000 people.

One of those annotations comes from dbSNP. As it arrives from UCSC Genome Browser, there is a field, alleleNs, which is an integer count ostensibly, because it is the averaging of number of chromosomes (most people have 2 copies of each chromosome) by number of people. Unfortunately, on rare occasion someone will have 3 or 1 copies of a chromosome. So the average ends up being a float. This happens very rarely, and I mostly don’t care about being off by < 1 once in a great while.

In general, genetic data can be really inconsistent, and so say 1/10000 entries will be badly formatted. These I want to cast to the majority type.

The way I wrote the program that handles all of this stuff (and which uses ES), I have attempted to place as much of my api as possible into YAML configs, so that one program could accommodate many permutations of this messy data. This maps really well to ES, since the search layer can be largely configured from a single Json/YAML. I don’t want to write programming logic or invest functions per annotation feature, since there are on the order of a thousand possible features, and millions of possibly desirable combinations of these (and the feature space grows monthly more or less). That is a good use case for config files.

So I realize that I could use an ingestion pipeline, but for simple transformations like this, I vastly prefer to just use an already existing mechanism (existant in the mapping configs ES supports) that can handle this in a perfectly understandable way.

jpountz · 2018-04-04T17:48:17Z

OK, I'm adding back the discuss label to make sure that we take your feedback into account.

On my end I would still prefer that we do this change. I understand there could be disagreement. And I also think there might be work to do on the integration of index APIs with pipelines. For instance the mapping option has the benefit that you can never skip it, while it's easy to just forget to pass the name of the ingest pipeline in a call to the index APIs.

akotlar · 2018-04-04T18:30:32Z

Sounds good

jpountz · 2018-08-24T14:11:48Z

We discussed this in FixitFriday and agreed to still remove support for coerce despite latest arguments. For the record, here is an example how to handle rounding on top of Elasticsearch thanks to an ingest pipeline. Another option would be to do the same on client-side before sending data to Elasticsearch.

PUT test
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_long": {
          "type": "long",
          "coerce": false
        }
      }
    }
  }
}

PUT _ingest/pipeline/round_my_long
{
  "description": "describe pipeline",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx[params.field] = (long) Math.round(ctx[params.field])",
        "params": {
          "field": "my_long"
        }
      }
    }
  ]
}

PUT test/_doc/1?pipeline=round_my_long
{
  "my_long": 3.6
}

akotlar · 2018-08-24T15:20:07Z

My position hasn't changed, it is clear that this is a needless complication.

Thanks for the ingestion pipeline tip.

mayya-sharipova · 2020-07-13T21:11:05Z

I am wondering if we still want to coerce values during queries. Currently we truncate decimal part on range queries on integer data types. I assume this is useful if we ran a query across different indexes where the same field name can have different field types.

jpountz · 2020-07-16T16:01:18Z

@mayya-sharipova Indeed we want to keep doing the right thing when querying integer field types with ranges on floating-point numbers. (We're not always truncating, we sometimes round up, sometimes down, so that querying integer fields with floating-point numbers always gives the same answer as if the field had been indexed as a floating-point number too.)

Float values (like 7.5 or 8.5) get stored in the Integer-mapped confidence field due to coercion [1], but when querying ES (as in a range query), the value is truncated/rounded [2]. E.g., if an indicator's conf is stored as 7.5, but you later query for indicators at 7.5, that indicator won't be returned b/c ES truncates the field value before querying to comply with the Integer mapping type. Fix this problem by remapping as Float. [1]: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/coerce.html [2]: elastic/elasticsearch#25861 (comment) Co-authored-by: wes <wesyoung@users.noreply.github.com>

Float values (like 7.5 or 8.5) get stored in the Integer-mapped confidence field due to coercion [1], but when querying ES (as in a range query), the value is truncated/rounded [2]. E.g., if an indicator's conf is stored as 7.5, but you later query for indicators at 7.5, that indicator won't be returned b/c ES truncates the field value before querying to comply with the Integer mapping type. [1]: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/coerce.html [2]: elastic/elasticsearch#25861 (comment) Co-authored-by: wes <wesyoung@users.noreply.github.com>

hendrikmuhs · 2020-11-20T14:48:15Z

I think this is a good initiative. However I am concerned about the breaking change when a field is mapped as integer type but _source contains a floating point part. Users might rely on this feature, a re-index would suddenly fail.

I think such a change should be done with at least 2 major release cycles: 1 to flip the default, so its strict by default and 2 to remove it completely.

I am not fully sure if accepting .0 is good or bad, unsigned_long which lacks coerce by design errors for .0. If .0 gets accepted, unsigned_long should be relaxed, too. It's probably better to be fully strict which is also more compliant with e.g. JSON.

mayya-sharipova · 2020-11-20T17:43:30Z

We've discussed this issue today in the Search meeting, and decided for now to postpone implementation of this change. One reason is that we still don't have an appealing way to index numbers with floating point as integers. Ingest pipeline can be complex for some use cases.
Another reason is runtime fields. As we change the behaviour of coerce option, it may be break runtime fields that were relying on the original behaviour.
As runtime fields are shaping up, they may provide better ways to extract values depending on index version and to index documents from source instead of using injest pipelines.

copernicus365 · 2022-05-12T18:09:25Z

Is it at least possible to have an opt-in setting that allows floating point numbers (doubles, decimals, etc), to not be truncated as integers (long)? That way there's no breaking change, while those of us who are being bit by floats / doubles being truncated have a way ff. Thanks

javanna · 2022-08-18T16:07:07Z

We have re-discussed this with the team. We are reluctant to change the behaviour of coerce due to the breaking nature of such change. We have discussed the possibility of adding a new option that would allow users to not truncate floating point numbers, but we are not sold on it given that documents can be adapted before ingestion, or through an ingest pipeline. Also, it is now possible to load the original value as float, or do any needed computation around it if necessary through a runtime field, that can even shadow the original field transparently for search consumers.

With this, I am leaning towards closing this issue. Please feel free to comment below if you think otherwise.

jpountz added :Search/Mapping Index mappings, including merging and defining field types discuss labels Jul 24, 2017

jpountz changed the title ~~Remove the coerce option.~~ Remove the coerce option on numeric fields. Jul 24, 2017

jpountz mentioned this issue Jul 24, 2017

Coerce decimal strings for whole number types by truncating the decimal part #25835

Merged

jimczi added help wanted adoptme good first issue low hanging fruit and removed discuss labels Jul 28, 2017

jpountz mentioned this issue Aug 1, 2017

Dynamically map all numerics to floats by default? #16018

Closed

fred84 mentioned this issue Aug 17, 2017

Stricter validation for min/max values for whole numbers #26137

Merged

colings86 added this to Search & Aggs in Background tasks Oct 11, 2017

scottsom mentioned this issue Oct 20, 2017

Out of range query against field of long type gets clipped #27052

Closed

PnPie mentioned this issue Nov 1, 2017

Change coerce and range query behavior #27219

Closed

jpountz added discuss and removed help wanted adoptme labels Apr 4, 2018

colings86 added the >deprecation label Apr 24, 2018

jpountz removed the discuss label Aug 24, 2018

akotlar mentioned this issue Aug 24, 2018

Coerce ints due to Elasticsearch removing coerce option on integers bystrogenomics/bystro#39

Open

andrershov mentioned this issue Mar 25, 2019

Bulk requests ignore "coerce" option of dynamically created fields #40409

Open

jtibshirani added the help wanted adoptme label Apr 30, 2019

jtibshirani mentioned this issue May 20, 2019

Elasticsearch accepts and stores floats in keyword mapped field #34222

Closed

rjernst added the Team:Search Meta label for search team label May 4, 2020

mdavis332 mentioned this issue Sep 24, 2020

Updates confidence mapping from Integer to Float csirtgadgets/csirtg-smrt-v1#293

Merged

mdavis332 mentioned this issue Oct 4, 2020

Updates conf mapping from int to float csirtgadgets/bearded-avenger#490

Merged

mayya-sharipova removed the help wanted adoptme label Jan 19, 2021

mayya-sharipova mentioned this issue Jan 19, 2021

Unsigned long field mapping behaviour is inconsistent with other numeric fields #67565

Closed

mayya-sharipova added the team-discuss label May 16, 2022

javanna closed this as completed Aug 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the coerce option on numeric fields. #25861

Remove the coerce option on numeric fields. #25861

jpountz commented Jul 24, 2017

jpountz commented Jul 24, 2017

jimczi commented Jul 28, 2017

nablyy commented Jul 31, 2017

jpountz commented Jul 31, 2017

liketic commented Sep 29, 2017

nablyy commented Oct 19, 2017

PnPie commented Nov 1, 2017 •

edited

akhilravipati97 commented Feb 15, 2018

akotlar commented Mar 9, 2018

jpountz commented Mar 13, 2018

akotlar commented Mar 13, 2018

jpountz commented Apr 4, 2018

akotlar commented Apr 4, 2018

jpountz commented Apr 4, 2018

akotlar commented Apr 4, 2018

jpountz commented Aug 24, 2018

akotlar commented Aug 24, 2018

mayya-sharipova commented Jul 13, 2020

jpountz commented Jul 16, 2020

hendrikmuhs commented Nov 20, 2020

mayya-sharipova commented Nov 20, 2020

copernicus365 commented May 12, 2022

javanna commented Aug 18, 2022

Remove the coerce option on numeric fields. #25861

Remove the coerce option on numeric fields. #25861

Comments

jpountz commented Jul 24, 2017

jpountz commented Jul 24, 2017

jimczi commented Jul 28, 2017

nablyy commented Jul 31, 2017

jpountz commented Jul 31, 2017

liketic commented Sep 29, 2017

nablyy commented Oct 19, 2017

PnPie commented Nov 1, 2017 • edited

akhilravipati97 commented Feb 15, 2018

akotlar commented Mar 9, 2018

jpountz commented Mar 13, 2018

akotlar commented Mar 13, 2018

jpountz commented Apr 4, 2018

akotlar commented Apr 4, 2018

jpountz commented Apr 4, 2018

akotlar commented Apr 4, 2018

jpountz commented Aug 24, 2018

akotlar commented Aug 24, 2018

mayya-sharipova commented Jul 13, 2020

jpountz commented Jul 16, 2020

hendrikmuhs commented Nov 20, 2020

mayya-sharipova commented Nov 20, 2020

copernicus365 commented May 12, 2022

javanna commented Aug 18, 2022

PnPie commented Nov 1, 2017 •

edited