Support for a fully numeric flattened field #61550

jimczi · 2020-08-25T17:31:55Z

This issue is a spinoff of #43805 that focuses on a specific use case: supporting numeric fields in the flattened field.
We've discussed this internally and agreed that it is something that we'd like to provide.
This new field could be considered as the numeric version of the flattened field where all values should be parseable as numbers. The details of the implementation are still unclear but multiple ideas were shared internally:

We could reuse the framework added for the rank_feature query where field names could be indexed as terms and values as frequencies.
We could use points with multiple dimensions and/or prefixes/suffixes to index the pair field name, value.

This issue is a placeholder to provide feedback and updates on the overall plan (supporting a fully numeric flattened field).
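
As a rough illustration of the second idea (prefixes used to index the field name and value as a pair), here is a minimal Python sketch. The byte layout, the `encode_pair` and `range_query` helpers, and the sample pairs are all assumptions made for illustration; this is not how Lucene points or the eventual field type necessarily work.

```python
import struct

def encode_pair(field: str, value: int) -> bytes:
    """Encode (field, value) so byte order matches (field, numeric value) order."""
    # Hypothetical layout: UTF-8 field name, a 0x00 separator, then the value
    # as 8 big-endian bytes with the sign bit flipped so negatives sort first.
    # Assumes the value fits in a signed 64-bit long and names contain no 0x00.
    sortable = (value + (1 << 63)) & 0xFFFFFFFFFFFFFFFF
    return field.encode("utf-8") + b"\x00" + struct.pack(">Q", sortable)

def range_query(index: list[bytes], field: str, low: int, high: int) -> list[bytes]:
    """Return the encoded terms for `field` whose value lies in [low, high]."""
    lo, hi = encode_pair(field, low), encode_pair(field, high)
    return [term for term in index if lo <= term <= hi]

# Hypothetical sample data: three values for "count", one for "size".
pairs = [("count", 42), ("count", -7), ("count", 100), ("size", 50)]
index = sorted(encode_pair(f, v) for f, v in pairs)
hits = range_query(index, "count", 0, 100)
```

Because the field name is a prefix of every encoded term, a numeric range on one field becomes a contiguous byte range, and values of other fields never fall inside it.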


elasticmachine · 2020-08-25T17:31:56Z

Pinging @elastic/es-search (:Search/Mapping)

jpountz · 2020-08-26T12:22:52Z

Once we have this field, I guess the next question will be how to deal with objects that have a mix of strings and numbers. This makes me wonder whether we should try to fold this functionality into the existing flattened field, or start thinking about whether we could have a sort of wrapper that redirects fields to either flattened or its numeric variant at both index and search time, e.g. something like this:

{
  "foo": {
    "type": "flattened",
    "numeric_field_pattern": [ "*.count" ]
  }
}

so that an object like

{
  "foo": {
    "tags": [ "x", "y" ],
    "count": 42
  },
  "bar": {
    "tags": [ "x" ],
    "count": 100
  }
}

would have its foo.tags/bar.tags fields indexed and searched with flattened while the foo.count/bar.count fields would be indexed and searched with the numeric variant.
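
The routing described above could be sketched roughly as follows in Python. The `flatten` and `route` helpers are hypothetical, and glob matching against the full dotted path is an assumption about how `numeric_field_pattern` might behave, not a specified semantics.

```python
from fnmatch import fnmatchcase

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"dotted.path": leaf_value}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def route(doc: dict, numeric_patterns: list[str]) -> tuple[dict, dict]:
    """Split leaf fields into (keyword_fields, numeric_fields) by path pattern."""
    keyword, numeric = {}, {}
    for path, value in flatten(doc).items():
        # Assumption: a pattern is a glob matched against the whole dotted path.
        if any(fnmatchcase(path, p) for p in numeric_patterns):
            numeric[path] = value
        else:
            keyword[path] = value
    return keyword, numeric

doc = {
    "foo": {"tags": ["x", "y"], "count": 42},
    "bar": {"tags": ["x"], "count": 100},
}
keyword, numeric = route(doc, ["*.count"])
```

The same pattern check would have to run at search time as well, so that a query on `foo.count` is sent to the numeric variant rather than to the keyword-based flattened field.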

jtibshirani · 2020-10-28T18:57:48Z

@polyfractal brought up the good point that in some telemetry use cases, all values represent counts. This type of data is similar to a histogram, but with labeled buckets. For example, we could be tracking the usage of every aggregation:

{
  "agg_usage": {
    "terms": 101,
    "date_histogram": 2450,
    ...
  }
}

It would be natural to perform a histogram-like aggregation on agg_usage to sum up the counts for each entry (terms, date_histogram, etc.). When designing the feature, it'd be good to keep this case in mind -- for example, it could affect whether we want to distinguish long counts vs. arbitrary numerics.
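
To make the "histogram with labeled buckets" idea concrete, here is a minimal Python sketch of the per-label sum. The `sum_counts` helper and the second document are made up for illustration; no such aggregation exists for flattened fields today.

```python
from collections import Counter

def sum_counts(docs: list[dict], field: str) -> Counter:
    """Sum the counts under `field` across documents, one bucket per label."""
    total = Counter()
    for doc in docs:
        # Counter.update adds the counts of a mapping instead of replacing them.
        total.update(doc.get(field, {}))
    return total

docs = [
    {"agg_usage": {"terms": 101, "date_histogram": 2450}},
    {"agg_usage": {"terms": 9, "max": 3}},  # hypothetical second document
]
totals = sum_counts(docs, "agg_usage")
```

Integer counts can be summed exactly like this, which is one reason it may matter whether the field distinguishes long counts from arbitrary numerics.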

hendrikmuhs · 2020-11-18T07:11:37Z

> it could affect whether we want to distinguish long counts vs. arbitrary numerics

In a similar fashion, this feature might be useful for ML use cases. It seems to me that being able to specify the sub-type (long, float, double, ...) would be good. For ML these vectors can become huge, but on the other hand they don't necessarily require a double. Being able to define the sub-type (e.g. float) would be a way to choose between precision and space.
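
The precision versus space trade-off can be seen with a small Python sketch that packs the same value as a 32-bit float and as a 64-bit double. The `roundtrip` helper is hypothetical and only illustrates the encodings, not how Elasticsearch would store either sub-type.

```python
import struct

def roundtrip(value: float, fmt: str) -> tuple[int, float]:
    """Pack `value` with the given struct format; return (byte size, decoded value)."""
    packed = struct.pack(fmt, value)
    return len(packed), struct.unpack(fmt, packed)[0]

# "<f" is a 4-byte IEEE 754 float, "<d" an 8-byte double.
float_size, float_value = roundtrip(0.1, "<f")
double_size, double_value = roundtrip(0.1, "<d")
```

The float takes half the space but no longer decodes to exactly the same value as the original Python float, whereas the double round-trips unchanged.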

axw · 2021-02-03T00:41:06Z

Does this issue cover support for histogram and aggregate_metric_double fields? For the APM/Metrics use-case of #63530, we will need to store basic numbers, histograms, and at some point probably aggregate metrics.

egalpin · 2021-05-06T16:38:04Z

+1, following. This feature will unblock the ability to remove nested fields in a use case I have 😁

baybatu · 2021-05-31T17:51:37Z

+1, following. I need numeric (float) flattened fields so that I can use field_value_factor functions across thousands of unique field names. Currently, I had to increase the default mapping count, but that's bad practice according to the docs.

patodevilla · 2021-07-03T00:31:11Z

+1, following!

yshyshkin · 2021-07-30T16:33:13Z

+1. It would really help in storing lots of financial information without a mapping explosion.

Fgerthoffert · 2022-06-09T12:20:13Z

+1

koenbouwmans · 2023-02-07T16:59:54Z

+1

vchhabra · 2023-08-01T07:05:38Z

While this is being worked on, I am able to work around the lack of numeric range queries on the flattened type by leveraging runtime fields at query time ('query time' because in my case the numeric field names are not known in advance).

Example:

Index Mappings

{
  "flattened_test": {
    "mappings": {
      "properties": {
        "host": {
          "type": "flattened"
        }
      }
    }
  }
}

Sample documents

"host": {
  "hostname": "bionic_1",
  "name": "bionic_1",
  "num_one": 1323
}
---
"host": {
  "hostname": "bionic_2",
  "name": "bionic_2",
  "num_one": 2323
}
---
"host": {
  "hostname": "bionic_3",
  "name": "bionic_3",
  "num_one": 3323
}

Sample Range Query

GET flattened_test/_search
{
  "runtime_mappings": {
    "doc['host.num_one']": {
      "type": "long"
    }
  },
  
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "host.num_one": {
              "gte": 4000,
              "lte": 7000
            }
          }
        }
      ]
    }
  }
}

This serves me well for the use-case at hand. And I understand the performance implications of query time runtime fields and the trade-off is acceptable in my case.

However, being new to ES, I wanted to check here whether I am overlooking anything obvious, and I'd welcome any other feedback.

Thanks,
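
A note on why the numeric typing matters in the range query above: flattened values are indexed as keywords, so a range over them compares strings rather than numbers. The Python sketch below only models that difference. The helper names and the extra value 50000 are made up for illustration, and it does not call Elasticsearch.

```python
def in_range_as_keyword(value, low, high) -> bool:
    """Compare as strings, the way keyword terms are ordered."""
    return str(low) <= str(value) <= str(high)

def in_range_as_long(value, low, high) -> bool:
    """Compare as integers, the way a long field is ordered."""
    return int(low) <= int(value) <= int(high)

# The three num_one values from the sample documents, plus a hypothetical 50000.
values = [1323, 2323, 3323, 50000]
keyword_hits = [v for v in values if in_range_as_keyword(v, 4000, 7000)]
long_hits = [v for v in values if in_range_as_long(v, 4000, 7000)]
```

Compared as strings, 50000 falls between "4000" and "7000" and would match the range, while the numeric comparison correctly matches nothing. With values that all have the same number of digits the two orderings agree, so it is worth testing a workaround like this with values of different lengths.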

felixbarny · 2023-08-17T07:44:45Z

To follow up and update on the use case in Elastic APM (#61550 (comment)):

We're not planning to use flattened. Instead, we'll use subobjects: false at the root of the metric mappings. This will allow ingesting metrics such as connections and connections.idle in the same index, without causing a mapping conflict. Currently, this requires all incoming documents to be flat but the ES team is working on also supporting nested object notations in documents where subobjects are disabled in the mapping: #97972. This makes adding the subobjects: false flag backwards compatible.
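
The conflict that subobjects: false avoids can be sketched in Python as follows. The `expand_dotted` helper and the metric values are hypothetical; the sketch only models the shape of the problem, not the actual Elasticsearch mapping code.

```python
def expand_dotted(fields: dict) -> dict:
    """Expand {"a.b": 1} into {"a": {"b": 1}}, as object-style mappings do."""
    root: dict = {}
    for name, value in fields.items():
        node = root
        *parents, leaf = name.split(".")
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{part!r} is both a value and an object")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{leaf!r} is both a value and an object")
        node[leaf] = value
    return root

# Hypothetical metric values for the two field names mentioned above.
metrics = {"connections": 12, "connections.idle": 3}

try:
    expand_dotted(metrics)
    conflict = None
except ValueError as err:
    conflict = str(err)

# Without subobjects, each dotted name is just a leaf field, so nothing clashes.
flat_fields = dict(metrics)
```

With object expansion, connections would have to be both a number and an object that contains idle. When the dotted names are kept as plain leaf names, both metrics can live side by side.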

I'm sure there are other valid use cases for numeric flattened fields, though, such as avoiding field explosions.

Having said that, we're also working on a new way of dealing with field explosions by ignoring fields that exceed the limit instead of rejecting documents: #96235

leehaotan · 2024-03-28T01:54:02Z

+1 need numeric fields in flattened types to be fully supported for range queries

elasticsearchmachine · 2024-07-16T09:27:41Z

Pinging @elastic/es-search-foundations (Team:Search Foundations)

jimczi added the >enhancement and :Search Foundations/Mapping labels Aug 25, 2020

elasticmachine added the Team:Search label Aug 25, 2020

jtibshirani mentioned this issue Aug 25, 2020

Follow-up improvements to flattened object fields. #43805

Open

3 tasks

mayya-sharipova mentioned this issue Oct 30, 2020

Please undo sparse vector deprecation #62273

Closed

mayya-sharipova self-assigned this Nov 17, 2020

jimczi mentioned this issue Feb 2, 2021

Dotted field names that conflict with objects #63530

Closed

mikecote mentioned this issue Feb 19, 2021

Make alert params searchable elastic/kibana#50213

Open

axw mentioned this issue Feb 21, 2021

Make mapping explosion on tags visible elastic/apm-server#1292

Closed

mayya-sharipova removed their assignment Mar 5, 2021

mayya-sharipova self-assigned this Apr 27, 2021

axw mentioned this issue May 28, 2021

model: align labels with ECS elastic/apm-server#3873

Closed

jsoriano mentioned this issue Jun 23, 2021

[filebeat] Add option to dedot keys to decode_json_fields processor elastic/beats#26154

Open

simitt mentioned this issue Aug 18, 2021

Switch labels and numeric_labels type to flattened elastic/apm-server#5963

Open

ChrsMark mentioned this issue Sep 1, 2021

[Agent] Support labels dedot in k8s provider elastic/beats#27019

Closed

josefschiefer27 mentioned this issue Feb 16, 2023

Add flattened field type opensearch-project/OpenSearch#1018

Closed

danielmitterdorfer mentioned this issue Oct 4, 2023

Support a keyed histogram #100242

Closed

javanna added the Team:Search Foundations label and removed the Team:Search label Jul 16, 2024
